A crash in Python 3.5: lib2to3.pgen2.parse.ParseError: bad input: type=16, value='*' #825

Closed
yanqd0 opened this issue Mar 16, 2020 · 22 comments


@yanqd0

yanqd0 commented Mar 16, 2020

This crash happens only on Python 3.5.

The Test file

def func(iterable, *args, **kwargs):
    other(*iterable, *args, **kwargs)


def other(*args, **kwargs):
    print(args)
    print(kwargs)


func([1, 2], 'arg0', 'arg1', arg2=2, arg3=3)

Its runtime result is just as expected.

$ python3.5 test.py
(1, 2, 'arg0', 'arg1')
{'arg3': 3, 'arg2': 2}

Crash

$ yapf test.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/pytree_utils.py", line 115, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 106, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/usr/lib/python3.5/lib2to3/pgen2/parse.py", line 159, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=16, value='*', context=(' ', (2, 21))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/yapf", line 8, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 344, in run_main
    sys.exit(main(sys.argv))
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 226, in main
    verbose=args.verbose)
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 278, in FormatFiles
    in_place, print_diff, verify, quiet, verbose)
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 305, in _FormatFile
    logger=logging.warning)
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/yapf_api.py", line 91, in FormatFile
    verify=verify)
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/yapf_api.py", line 129, in FormatCode
    tree = pytree_utils.ParseCodeToTree(unformatted_source)
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/pytree_utils.py", line 121, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 106, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/usr/lib/python3.5/lib2to3/pgen2/parse.py", line 159, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=16, value='*', context=(' ', (2, 21))

The parser fails at the second * in other(*iterable, *args, **kwargs), which is line 2, column 21 of the test file.
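To read the context=(' ', (2, 21)) part of the error, it can help to look up which token starts at that position. The following is a minimal sketch that uses the standard library's tokenize module as a stand-in for lib2to3's own tokenizer, on the assumption that both report the same (row, column) coordinates (1-based row, 0-based column). The helper name token_at is hypothetical.

```python
import io
import token
import tokenize

# The first two lines of the test file from this report.
SOURCE = (
    "def func(iterable, *args, **kwargs):\n"
    "    other(*iterable, *args, **kwargs)\n"
)


def token_at(source, position):
    """Return the first token that starts at (row, column), or None."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start == position:
            return tok
    return None


# (2, 21) is the position reported in the ParseError context.
tok = token_at(SOURCE, (2, 21))
print(tok.string, token.tok_name[tok.exact_type])  # prints: * STAR
```

The token at that position is the second star in the call, which matches value='*' in the error message.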

Environment

$ python3.5 --version
Python 3.5.2
$ yapf --version
yapf 0.29.0
@bwendling
Member

Does this happen with Python 3.7 or 3.8?

@yanqd0
Author

yanqd0 commented Mar 18, 2020

No, only with Python 3.5 (or maybe below).

@kamahen
Contributor

kamahen commented Mar 18, 2020

Probably you're using a syntax feature that was introduced in Python 3.6.

I think that Yapf uses the Python grammar that comes with the runtime, so if you use Yapf with Python 3.5, you're restricted to a Python 3.5 grammar.

@yanqd0
Author

yanqd0 commented Mar 20, 2020

> Its runtime result is just as expected.

So it is not syntax that requires Python 3.6.

@kamahen
Contributor

kamahen commented Mar 20, 2020

Python 3.5 doesn't seem to be available for Ubuntu 18.04, so I can't reproduce this. (Python 3.5 was released Sept 2015.)

The error message is typical of what happens when there's a syntax error according to the grammar. For the fun of it, I tried your code with Python 2.7; it failed at line 2 column 21 (at *args), so it's possible that Python 3.5's lib2to3 grammar isn't quite in sync with the grammar that Python 3.5 will accept.
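One way to test the "out of sync" hypothesis is to ask both parsers about the same line. The sketch below compares the interpreter's own parser (via ast.parse) with lib2to3. It assumes that pygram.python_grammar_no_print_statement is close to the grammar yapf builds for Python 3, which this thread does not confirm. The function names are hypothetical, and lib2to3 may be missing on newer Pythons, in which case the second check returns None.

```python
import ast
import warnings

SNIPPET = "other(*iterable, *args, **kwargs)\n"


def runtime_accepts(code):
    """True if the running interpreter's parser accepts the code."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


def lib2to3_accepts(code):
    """True/False if lib2to3 can be imported, None if it is unavailable."""
    try:
        with warnings.catch_warnings():
            # lib2to3 is deprecated on some versions; silence that warning.
            warnings.simplefilter("ignore", DeprecationWarning)
            from lib2to3 import pygram, pytree
            from lib2to3.pgen2 import driver, parse
    except ImportError:
        return None
    parser_driver = driver.Driver(
        pygram.python_grammar_no_print_statement, convert=pytree.convert)
    try:
        parser_driver.parse_string(code, debug=False)
    except parse.ParseError:
        return False
    return True


print("runtime:", runtime_accepts(SNIPPET))
print("lib2to3:", lib2to3_accepts(SNIPPET))
```

If the first check reports True and the second reports False on the same interpreter, the two grammars disagree, which is the situation described in this report.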

@yanqd0
Author

yanqd0 commented Mar 20, 2020

You can use docker to reproduce it.

docker pull python:3.5-buster
docker run --rm -it python:3.5-buster bash
...

In the crash code (yapf/yapflib/pytree_utils.py):

def ParseCodeToTree(code):
  """Parse the given code to a lib2to3 pytree.

  Arguments:
    code: a string with the code to parse.

  Raises:
    SyntaxError if the code is invalid syntax.
    parse.ParseError if some other parsing failure.

  Returns:
    The root node of the parsed tree.
  """
  # This function is tiny, but the incantation for invoking the parser correctly
  # is sufficiently magical to be worth abstracting away.
  try:
    # Try to parse using a Python 3 grammar, which is more permissive (print and
    # exec are not keywords).
    parser_driver = driver.Driver(_GRAMMAR_FOR_PY3, convert=pytree.convert)
    tree = parser_driver.parse_string(code, debug=False)
  except parse.ParseError:
    # Now try to parse using a Python 2 grammar; If this fails, then
    # there's something else wrong with the code.
    try:
      parser_driver = driver.Driver(_GRAMMAR_FOR_PY2, convert=pytree.convert)
      tree = parser_driver.parse_string(code, debug=False)
    except parse.ParseError:
      # Raise a syntax error if the code is invalid python syntax.
      try:
        ast.parse(code)
      except SyntaxError as e:
        raise e
      else:
        raise
  return _WrapEndMarker(tree)

yapf first tries to parse the code with _GRAMMAR_FOR_PY3. When that fails, it falls back to _GRAMMAR_FOR_PY2, which fails as well, and because ast.parse accepts the code, the ParseError is re-raised.

I agree with you. It seems to be a bug in lib2to3.

But my code targets Python 3 only, so why does yapf use lib2to3?
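The control flow of ParseCodeToTree can be sketched without lib2to3 at all. The version below is an illustration only, not yapf's code: ParseError, parse_with_fallback, and the two stand-in parsers are hypothetical names, and the stand-ins simply mimic "the Python 3 grammar rejects a second star" and "the Python 2 grammar rejects everything".

```python
import ast


class ParseError(Exception):
    """Hypothetical stand-in for lib2to3.pgen2.parse.ParseError."""


def parse_with_fallback(code, parsers):
    """Try each parser in order and return the first successful result."""
    if not parsers:
        raise ValueError("at least one parser is required")
    last_error = None
    for parser in parsers:
        try:
            return parser(code)
        except ParseError as err:
            last_error = err
    # Every grammar rejected the code. If the interpreter rejects it too,
    # ast.parse raises SyntaxError, which is the more useful error.
    ast.parse(code)
    # The interpreter accepts the code, so the grammars are at fault:
    # surface the last parser error, as in the traceback above.
    raise last_error


def py3_like_parser(code):
    """Stand-in that rejects a second star-argument in a call."""
    if "*a, *b" in code:
        raise ParseError("bad input: value='*'")
    return ("tree", code)


def py2_like_parser(code):
    """Stand-in that rejects everything."""
    raise ParseError("bad input")


PARSERS = [py3_like_parser, py2_like_parser]
print(parse_with_fallback("f(x)\n", PARSERS))
```

With these stand-ins, "f(*a, *b)\n" ends in ParseError even though it is valid Python 3.5+, while "f(\n" ends in SyntaxError. That is the same split the docstring describes.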

@gslavin

gslavin commented Mar 25, 2020

I've hit similar crashes using Python 3.5, which also seem to be related to the syntax used for unpacking generalizations (https://www.python.org/dev/peps/pep-0448/).

Based on this bug (https://bugs.python.org/issue25969), lib2to3 should support this syntax, so maybe there's a bug in lib2to3.
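For reference, these are the kinds of expressions PEP 448 allows from Python 3.5 on, and so the forms a parser with an older grammar might trip over. It is a small sketch with made-up variable names; only the call form appears in the original report.

```python
def other(*args, **kwargs):
    """Return the received arguments so they can be inspected."""
    return args, kwargs


iterable = [1, 2]
args = ("arg0", "arg1")
kwargs = {"arg2": 2, "arg3": 3}

# More than one * unpacking in a single call (the form from this report).
call_result = other(*iterable, *args, **kwargs)

# More than one ** unpacking in a single call.
double_star = other(**{"a": 1}, **{"b": 2})

# Unpacking inside list, tuple, set, and dict displays.
merged_list = [*iterable, *args]
merged_tuple = (*iterable, *args)
merged_set = {*iterable, *iterable}
merged_dict = {**kwargs, "arg4": 4}

print(call_result)
print(merged_list)
```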

@chrisspen

Is there any workaround for this? Yapf seems to be broken in all recent Python versions. I get this error in Python 3.8.

@kamahen
Contributor

kamahen commented Jun 25, 2021

Lib2to3 is on its way to deprecation because future Pythons will use a different parsing technology.

I proposed doing some work to make a lib2to3-like interface to the new Python parser, but nobody seemed interested, and it's a fair bit of work.

There are some alternative parsers, but they'd require some work to integrate into yapf, and it's not clear how well they'll work in future either. I've looked at another parser for "leo-editor" but it seems overly complicated for what it does ... I might be persuaded to make a simpler version of that if enough people are interested.

@volareneo

> Is there any workaround for this? Yapf seems to be broken in all recent Python versions. I get this error in Python 3.8.

Hi, how did you solve the problem? My environment is Python 3.8, and I get the same problem.

@kamahen
Contributor

kamahen commented Jul 2, 2021

As I said a few days ago, it's a significant amount of work to switch from lib2to3 parser to the new parser. I might work on it, one of these days, but don't have an immediate need and there are more interesting things that I'd rather do first.

@NeilGirdhar

NeilGirdhar commented Jul 2, 2021

@kamahen You are right though that if an alternate parser isn't eventually used, yapf will unfortunately end up broken. Have you looked at parso? That's what the Python docs recommend as a replacement.

Also, you asked if people were interested. Please put me down as interested 😄

@kamahen
Contributor

kamahen commented Jul 2, 2021 via email

@kamahen
Contributor

kamahen commented Jul 4, 2021

It appears that the "black" formatter uses a slightly modified lib2to3, so it would likely have a similar problem.
https://github.com/psf/black/tree/main/src/blib2to3

But it might be worthwhile tracking black's version, as it could handle some things that have been reported in this thread.

@kamahen
Contributor

kamahen commented Jul 5, 2021

See also kamahen/pykythe#27

@NeilGirdhar

Why not just use black's blib2to3 in yapf?

@kamahen
Contributor

kamahen commented Jul 23, 2021

AFAICT black's blib2to3 uses the same compiler technology that lib2to3 uses, and therefore won't work in the future -- the PEG parsers can handle things that the lib2to3 parsers can't.

I'm going to contact one of the black developers about ... hopefully, I'll report back soon. (I need to read up on a few things first ...)

@NeilGirdhar

@kamahen Good point, and good idea.

@kamahen
Contributor

kamahen commented Jul 24, 2021

It appears that asttokens could be used with the new PEG parser. I'll try converting some of my code to use asttokens and see how it goes. Don't expect a quick response ... I'm going to be out of town for a while.

@kamahen
Contributor

kamahen commented Dec 18, 2021

It seems that there's now a PEG parser (implemented in Rust) that's aimed towards ASTs. Somebody might want to investigate whether it'll suffice for yapf.
Instagram/LibCST#566

@kkew3

kkew3 commented May 7, 2022

Same problem in Python 3.9.

@bwendling
Member

Closed with 7c408b9.
