Black cannot parse previously parseable file in 24.4.1 #4329

Closed
mrmundt opened this issue Apr 24, 2024 · 14 comments · Fixed by #4332
Labels
T: bug Something isn't working

Comments

@mrmundt

mrmundt commented Apr 24, 2024

Describe the bug

The newest version of Black started failing in our CI with an error about being unable to parse a line that earlier versions parsed just fine.

To Reproduce

Run Black on the file pyomo/contrib/pyros/util.py:

black pyomo/contrib/pyros/util.py

The resulting error is:

error: cannot format pyomo/contrib/pyros/util.py: Cannot parse: 2009:77: return f"{attr_val_str:{f'{self._ATTR_FORMAT_LENGTHS[attr_name]}'}}"
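
For context, the inner f-string on the reported line only stringifies the value used as the format spec, so a flattened spelling yields the same result in Python. The sketch below illustrates that equivalence. The class `Demo`, its dictionary contents, and the method names are hypothetical stand-ins; only the attribute name `_ATTR_FORMAT_LENGTHS` and the shape of the expression come from the report. Whether Black 24.4.1 accepts the flattened form was not tested in this thread.

```python
class Demo:
    # Hypothetical data: the real contents of this mapping are not shown in the report.
    _ATTR_FORMAT_LENGTHS = {"iteration": 8}

    def nested(self, attr_name, attr_val_str):
        # Same shape as the line Black 24.4.1 fails to parse.
        return f"{attr_val_str:{f'{self._ATTR_FORMAT_LENGTHS[attr_name]}'}}"

    def flat(self, attr_name, attr_val_str):
        # Flattened spelling: the nested replacement field is converted to a
        # string before being used as the format spec, so no inner f-string is needed.
        return f"{attr_val_str:{self._ATTR_FORMAT_LENGTHS[attr_name]}}"


demo = Demo()
print(repr(demo.nested("iteration", "abc")))
print(demo.nested("iteration", "abc") == demo.flat("iteration", "abc"))
```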

Expected behavior

Black formats the file, as earlier versions did, instead of failing with a parse error.

Environment

  • Black's version: 24.4.1
  • OS and Python version: macOS with Python 3.11; Ubuntu 22.04 with Python 3.10

mrmundt added the T: bug label Apr 24, 2024

@JelleZijlstra
Collaborator

Thanks! cc @tusharsadhwani.

@mrmundt
Author

mrmundt commented Apr 24, 2024

No no, thank YOU! We love your tool :) (Well, I love it. My team grumbles when they forget to run it and our linting job snarks at them.)

@tusharsadhwani
Contributor

This seems like the minimal reproduction:

f"{1:{f'{2}'}}"
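
As a sanity check that this reproducer is valid Python and not merely something older Black versions tolerated, it can be parsed and evaluated with the standard library. This is a small sketch; the file name `<reproducer>` is just a placeholder label.

```python
import ast

# The reproducer, kept as source text so it can be parsed explicitly.
source = '''f"{1:{f'{2}'}}"'''

# ast.parse raises SyntaxError if Python itself rejects the expression.
tree = ast.parse(source, mode="eval")

# The inner f-string evaluates to "2", which becomes the format spec (width 2).
value = eval(compile(tree, "<reproducer>", "eval"))
print(repr(value))
```

The inner f-string produces the format spec for the outer replacement field, so the whole expression behaves like `format(1, "2")`.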

@tusharsadhwani
Contributor

Actually, using the same or different quotes gives us two different crash scenarios:

f'{1:{f'{2}'}}'

If the quotes of the outer and inner fstring are the same, we get a different crash.

@tarper24

Using the same quotes is a syntax error in Python itself. You terminate the string early.

>>> f'{1:{f'{2}'}}'
  File "<stdin>", line 1
    f'{1:{f'{2}'}}'
            ^
SyntaxError: f-string: expecting '}'

@tusharsadhwani
Contributor

@tarper24 it works fine on Python 3.12 onwards.

@JelleZijlstra
Collaborator

@tarper24 not in Python 3.12 any more. That's actually why we made this change; we had to revamp the parser around f-strings to support the new syntax. Unfortunately that caused us to start failing on some f-strings that were already valid. We found a few such cases before release by running Black on various codebases, but unfortunately we missed your case.
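
The version dependence discussed here can be checked directly. The sketch below compiles the same-quotes variant and reports whether the running interpreter accepts it; the helper name `compiles` is made up for illustration.

```python
import sys

# Outer and inner f-string both use single quotes.
SAME_QUOTES = "f'{1:{f'{2}'}}'"


def compiles(source: str) -> bool:
    """Return True if the running interpreter accepts `source` as an expression."""
    try:
        compile(source, "<same-quotes>", "eval")
    except SyntaxError:
        return False
    return True


accepted = compiles(SAME_QUOTES)
print(sys.version_info[:2], accepted)
```

On Python 3.11 and earlier this reports a rejection, matching the SyntaxError quoted above; on 3.12 and later the expression is accepted.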

@JelleZijlstra
Collaborator

I spent some time on this but couldn't figure out a solution yet.

The reproducer gets tokenized like this:

% python -m blib2to3.pgen2.tokenize 4329.py
1,0-1,2:	FSTRING_START	'f"'
1,2-1,2:	FSTRING_MIDDLE	''
1,2-1,3:	LBRACE	'{'
1,3-1,4:	NUMBER	'1'
1,4-1,5:	OP	':'
1,5-1,5:	FSTRING_MIDDLE	''
1,5-1,6:	OP	'{'
1,6-1,8:	FSTRING_START	"f'"
1,8-1,8:	FSTRING_MIDDLE	''
1,8-1,9:	LBRACE	'{'
1,9-1,10:	NUMBER	'2'
1,10-1,11:	OP	'}'
1,11-1,12:	FSTRING_MIDDLE	"'"
1,12-1,13:	RBRACE	'}'
Traceback (most recent call last):

The FSTRING_MIDDLE "'" near the end is wrong; it should be an FSTRING_END, closing the inner f-string.

My current thinking is that inside_fstring_colon in the tokenizer gets set to True for the outer f-string and is then applied incorrectly while we're parsing the inner f-string. To address that, I tried turning inside_fstring_colon into a stack with an entry for each nested f-string, but so far that doesn't work.
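
To illustrate the general idea of keeping state per nesting level instead of in one shared flag, here is a deliberately tiny toy scanner. It is not blib2to3's tokenizer and not the eventual fix: the function name `toy_fstring_tokens` and the token kinds `EXPR` and `COLON` are invented for this sketch, and it ignores escapes, `{{`/`}}`, brackets, and multi-character expressions. Each stack entry records whether the scanner is in literal text, in an expression, or in a format spec, so the "after the colon" state of the outer f-string cannot leak into the inner one.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]


def toy_fstring_tokens(src: str) -> List[Token]:
    """Toy scanner: one stack entry per nesting level, innermost last."""
    tokens: List[Token] = []
    # Entries are (mode, quote); mode is "expr", "literal", or "spec".
    stack: List[Tuple[str, Optional[str]]] = [("expr", None)]
    i = 0
    while i < len(src):
        mode, quote = stack[-1]
        ch = src[i]
        if mode == "expr" and ch == "f" and src[i + 1 : i + 2] in ("'", '"'):
            # A new f-string starts: push its own literal context.
            stack.append(("literal", src[i + 1]))
            tokens.append(("FSTRING_START", src[i : i + 2]))
            i += 2
            continue
        if mode == "literal" and ch == quote:
            # Only the innermost f-string's quote can close it.
            stack.pop()
            tokens.append(("FSTRING_END", ch))
        elif mode in ("literal", "spec") and ch == "{":
            stack.append(("expr", None))
            tokens.append(("LBRACE", ch))
        elif mode == "expr" and ch == ":" and len(stack) > 1:
            # The colon switches this replacement field, and only this one, to spec mode.
            stack[-1] = ("spec", None)
            tokens.append(("COLON", ch))
        elif mode in ("expr", "spec") and ch == "}" and len(stack) > 1:
            stack.pop()
            tokens.append(("RBRACE", ch))
        elif mode == "expr":
            tokens.append(("EXPR", ch))
        else:
            tokens.append(("FSTRING_MIDDLE", ch))
        i += 1
    return tokens


for kind, text in toy_fstring_tokens('''f"{1:{f'{2}'}}"'''):
    print(kind, repr(text))
```

In this toy, the quote that ends the inner f-string is seen while the innermost context is that f-string's own literal entry, so it is reported as an end token even though an enclosing field is in spec mode.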

@tusharsadhwani
Contributor

Commenting out the "and bracelev == 0" condition in the part that yields RBRACE fixes this case, but it breaks other cases. That's as far as I got last night.

@tusharsadhwani
Contributor

tusharsadhwani commented Apr 25, 2024

Also it's not the FSTRING_MIDDLE that's incorrect, it's the OP just above it, which should be an RBRACE to match the LBRACE.

@tusharsadhwani
Contributor

The minimised case that breaks when making the bracelev change is:

f'{1:{2}d}'
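
This case is a useful regression check because the format spec mixes a nested replacement field with trailing literal text: after the inner `}` the tokenizer has to go back to collecting spec text (`d`) before the outer `}`. A short sketch of what the expression means in Python:

```python
# The nested field supplies the width; "d" is literal format-spec text after it.
value = f'{1:{2}d}'

# Equivalent call with the spec written out by hand.
same = format(1, "2d")

print(repr(value), value == same)
```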

@JelleZijlstra
Collaborator

What is the difference between OP and LBRACE/RBRACE here? I noticed the variation but it wasn't clear to me which one is correct.

@tusharsadhwani
Contributor

tusharsadhwani commented Apr 25, 2024

In the original implementation it's very blurry which one to use, but I went with yielding LBRACE whenever we go from collecting FSTRING_MIDDLE tokens back to parsing Python expressions.

@JelleZijlstra
Collaborator

I got something that appears to work: #4332.
