gh-135676: Add a summary of source characters #138194
Conversation
The lexical analyzer determines the program text's :ref:`encoding <encodings>`
(UTF-8 by default), and decodes the text into
:ref:`source characters <lexical-source-character>`.
If the text cannot be decoded, a :exc:`SyntaxError` is raised.
Can this also not raise a Unicode*Error? For example, running a file with a surrogate:

UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1-2: surrogates not allowed
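A minimal repro sketch of the behavior being described, assuming the surrogate gets into the source as undecodable bytes (the temp-file setup is just for illustration):

```python
# Sketch: write a .py file whose bytes are not valid UTF-8 (the UTF-8-style
# encoding of the lone surrogate U+D800), run it, and look at which
# exception the interpreter reports on stderr.
import subprocess
import sys
import tempfile

with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
    f.write(b"x = '\xed\xa0\x80'\n")
    path = f.name

result = subprocess.run([sys.executable, path], capture_output=True, text=True)
print(result.stderr)
```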
* formfeed
* * :ref:`Whitespace <whitespace>`
* * * CR, LF
Should we also list CRLF, as we list in the formal grammar:

newline: <ASCII LF> | <ASCII CR> <ASCII LF> | <ASCII CR>
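As a side note, all three sequences are already accepted as line terminators by the compiler; a quick check (just a sketch):

```python
# Sketch: LF, CRLF, and a lone CR should each terminate a logical line,
# matching the `newline` production quoted above.
for ending in ("\n", "\r\n", "\r"):
    source = "a = 1" + ending + "b = a + 1" + ending
    namespace = {}
    exec(compile(source, "<newline-check>", "exec"), namespace)
    print(repr(ending), namespace["a"], namespace["b"])
```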
.. (the following uses zero-width-joiner characters to render
.. a literal backquote)

backquote (`````)
Could a substitution be used here? See example for nbsp in math.rst.
The lexical analysis docs have notes like this at the end:
The period can also occur in floating-point and imaginary literals.
The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer:
' " # \
The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error:
$ ? `
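That last note is easy to spot-check (a throwaway sketch, not part of this change):

```python
# Sketch: $, ?, and ` outside string literals and comments are rejected
# with a SyntaxError.
for ch in "$?`":
    try:
        compile(f"x {ch} y", "<char-check>", "exec")
    except SyntaxError as exc:
        print(f"{ch!r}: {exc.msg}")
```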
The intent behind these notes seems to be to provide a "map" of what all the ASCII characters do in Python, but as it stands that map is incomplete and isn't really kept up to date.
This instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternate "table of contents".
The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.
📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/