@encukou (Member) commented Aug 27, 2025

The lexical analysis docs have notes like this at the end:

  • The period can also occur in floating-point and imaginary literals.

  • The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: ' " # \

  • The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error: $ ? `

The intent behind these notes seems to be to provide a "map" of what each ASCII character does in Python, but the map is incomplete and isn't really kept up to date.
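As a quick illustration of the third note, here is a minimal sketch that checks how `compile()` treats `$`, `?` and the backquote outside and inside string literals and comments. The helper name `is_syntax_error` is made up for this example and is not part of the docs or of CPython.

```python
def is_syntax_error(source):
    """Return True if compiling `source` raises SyntaxError (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# Each unused character placed between two names, outside any string or comment.
outside = {ch: is_syntax_error(f"a {ch} b\n") for ch in "$?`"}

# The same characters inside a string literal and inside a comment.
inside = is_syntax_error("s = '$ ? `'  # $ ? `\n")

print(outside)
print(inside)
```

This only exercises `compile()` on short snippets; it says nothing about how the error message is worded.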

This PR instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternative "table of contents".

The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.


📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/

The lexical analyzer determines the program text's :ref:`encoding <encodings>`
(UTF-8 by default), and decodes the text into
:ref:`source characters <lexical-source-character>`.
If the text cannot be decoded, a :exc:`SyntaxError` is raised.

Member

Can this not also raise a Unicode*Error? For example, running a file with a surrogate gives:

UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1-2: surrogates not allowed
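A rough way to compare the two cases, assuming `compile()` is a fair stand-in for running a file (it may not match every entry point): the sketch below feeds it undecodable bytes and a `str` that contains a lone surrogate, and reports which exception type comes back. The helper `compile_error` is hypothetical, introduced only for this illustration.

```python
def compile_error(source):
    """Return the exception raised by compile(), or None on success (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except Exception as exc:
        return exc
    return None


# Bytes that are not valid UTF-8: decoding the program text fails.
undecodable = compile_error(b"x = '\xff'\n")

# A str holding a lone surrogate: the text cannot be encoded to UTF-8.
lone_surrogate = compile_error("x = '\ud800'\n")

print(type(undecodable).__name__)
print(type(lone_surrogate).__name__)
```

On the CPython versions I am aware of, the first case reports a `SyntaxError` and the second a `UnicodeEncodeError`, which is consistent with the traceback above, but that is an observation rather than a documented guarantee.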

* formfeed
* * :ref:`Whitespace <whitespace>`

* * * CR, LF

Member

Should we also list CRLF, as the formal grammar does?

newline: <ASCII LF> | <ASCII CR> <ASCII LF> | <ASCII CR>
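To make the three alternatives in that rule concrete, here is a small sketch that compiles the same two statements joined by LF, CRLF and CR, and checks that each variant runs the same way. The function name `run_with_newline` is an assumption made for this example.

```python
def run_with_newline(newline):
    """Compile and run two statements separated by `newline` (hypothetical helper)."""
    source = newline.join(["a = 1", "b = a + 1", ""])
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["a"], namespace["b"]


# One entry per alternative in the grammar rule: LF, CR LF, CR.
results = {repr(nl): run_with_newline(nl) for nl in ("\n", "\r\n", "\r")}

for name, values in results.items():
    print(name, values)
```

This relies on `compile()` accepting Windows and old Mac newlines in `str` input, which its documentation has described since Python 3.2.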

.. (the following uses zero-width-joiner characters to render
.. a literal backquote)

backquote (``‍`‍``)

Member

Could a substitution be used here? See the example for nbsp in math.rst.

Labels
docs Documentation in the Doc dir skip news