bpo-12486: Document tokenize.generate_tokens() as public API #6957

Merged: 5 commits, Jun 5, 2018. The diff below shows the changes from all commits.
13 changes: 12 additions & 1 deletion Doc/library/tokenize.rst
@@ -57,6 +57,16 @@ The primary entry point is a :term:`generator`:
:func:`.tokenize` determines the source encoding of the file by looking for a
UTF-8 BOM or encoding cookie, according to :pep:`263`.

.. function:: generate_tokens(readline)

Tokenize a source reading unicode strings instead of bytes.

Like :func:`.tokenize`, the *readline* argument is a callable returning
a single line of input. However, :func:`generate_tokens` expects *readline*
to return a str object rather than bytes.

The result is an iterator yielding named tuples, exactly like
:func:`.tokenize`. It does not yield an :data:`~token.ENCODING` token.

All constants from the :mod:`token` module are also exported from
:mod:`tokenize`.
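
For orientation, here is a minimal usage sketch of the API documented above; the sample source string and the printed fields are illustrative only, not part of the patch:

    import io
    import tokenize

    source = "answer = 42\n"

    # generate_tokens() takes a readline callable that returns str and yields
    # TokenInfo named tuples, exactly like tokenize().
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

    # Unlike tokenize(), no ENCODING token is produced, because the input is
    # already decoded text.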
@@ -79,7 +89,8 @@ write back the modified script.
positions) may change.

It returns bytes, encoded using the :data:`~token.ENCODING` token, which
-is the first token sequence output by :func:`.tokenize`.
+is the first token sequence output by :func:`.tokenize`. If there is no
+encoding token in the input, it returns a str instead.


:func:`.tokenize` needs to detect the encoding of source files it tokenizes. The
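
The changed sentence above pins down the return type of :func:`untokenize`. A short sketch of both paths, assuming only the behaviour stated in the patch:

    import io
    import tokenize

    src = "spam = 'eggs'\n"

    # tokenize() yields an ENCODING token first, so untokenize() returns
    # bytes encoded with that encoding.
    byte_tokens = list(tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline))
    assert isinstance(tokenize.untokenize(byte_tokens), bytes)

    # generate_tokens() never yields ENCODING, so untokenize() returns a str.
    str_tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
    assert isinstance(tokenize.untokenize(str_tokens), str)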
17 changes: 15 additions & 2 deletions Lib/test/test_tokenize.py
@@ -1,8 +1,8 @@
from test import support
from tokenize import (tokenize, _tokenize, untokenize, NUMBER, NAME, OP,
STRING, ENDMARKER, ENCODING, tok_name, detect_encoding,
-open as tokenize_open, Untokenizer)
-from io import BytesIO
+open as tokenize_open, Untokenizer, generate_tokens)
+from io import BytesIO, StringIO
import unittest
from unittest import TestCase, mock
from test.test_grammar import (VALID_UNDERSCORE_LITERALS,
@@ -919,6 +919,19 @@ async def bar(): pass
DEDENT '' (7, 0) (7, 0)
""")

class GenerateTokensTest(TokenizeTest):
    def check_tokenize(self, s, expected):
        # Format the tokens in s in a table format.
        # The ENDMARKER is omitted.
        result = []
        f = StringIO(s)
        for type, token, start, end, line in generate_tokens(f.readline):
            if type == ENDMARKER:
                break
            type = tok_name[type]
            result.append(f"    {type:10} {token!r:13} {start} {end}")
        self.assertEqual(result, expected.rstrip().splitlines())


def decistmt(s):
result = []
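
GenerateTokensTest above inherits from TokenizeTest and overrides only the check_tokenize helper, so every table-driven case in the parent class is re-run against generate_tokens with str input. A self-contained sketch of that reuse pattern, with hypothetical class and test names (not the ones in test_tokenize.py):

    import io
    import tokenize
    import unittest

    class BytesApiTests(unittest.TestCase):
        # Stand-in for TokenizeTest: every test method goes through check().
        def check(self, source):
            return list(tokenize.tokenize(io.BytesIO(source.encode()).readline))

        def test_simple_assignment(self):
            tokens = self.check("x = 1\n")
            self.assertTrue(any(t.string == "=" for t in tokens))

    class StrApiTests(BytesApiTests):
        # Stand-in for GenerateTokensTest: same inherited tests,
        # different tokenizer entry point.
        def check(self, source):
            return list(tokenize.generate_tokens(io.StringIO(source).readline))

    if __name__ == "__main__":
        unittest.main()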
9 changes: 6 additions & 3 deletions Lib/tokenize.py
@@ -37,7 +37,7 @@
blank_re = re.compile(br'^[ \t\f]*(?:[#\r\n]|$)', re.ASCII)

import token
-__all__ = token.__all__ + ["tokenize", "detect_encoding",
+__all__ = token.__all__ + ["tokenize", "generate_tokens", "detect_encoding",
"untokenize", "TokenInfo"]
del token
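
Since the hunk above adds generate_tokens to __all__, it now appears among the module's advertised public names; a tiny check of that, assuming a Python version that includes this change:

    import tokenize

    # generate_tokens is now part of the documented public surface.
    assert "generate_tokens" in tokenize.__all__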

@@ -653,9 +653,12 @@ def _tokenize(readline, encoding):
yield TokenInfo(ENDMARKER, '', (lnum, 0), (lnum, 0), '')


-# An undocumented, backwards compatible, API for all the places in the standard
-# library that expect to be able to use tokenize with strings
def generate_tokens(readline):
+    """Tokenize a source reading Python code as unicode strings.
+
+    This has the same API as tokenize(), except that it expects the *readline*
+    callable to return str objects instead of bytes.
+    """
    return _tokenize(readline, None)

def main():
2 changes: 2 additions & 0 deletions (new NEWS entry)
@@ -0,0 +1,2 @@
:func:`tokenize.generate_tokens` is now documented as a public API to
tokenize unicode strings. It was previously present but undocumented.