
Make pasta indentation easier. #97

Merged: 8 commits into google:astlib, Dec 9, 2021

Conversation

martinwicke (Contributor):
pasta guarantees a lossless round-trip but doesn't provide an easy way to change indentation levels. This change adds a special @@indent@@ token which is embedded into the contents of pasta's annotations dict whenever whitespace is detected to be for the purpose of indentation. It is stored in place of the indentation whitespace it represents. If the 'indent' dict entry is changed, the generated code's indentation level will change as a result.
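The idea can be illustrated with a small hypothetical sketch. The annotations-dict layout and the render helper below are invented for illustration; only the @@indent@@ token and the 'indent' entry come from this change, and pasta's real implementation differs:

```python
# Hypothetical sketch: store a placeholder instead of the literal
# indentation whitespace, then substitute the chosen indent string
# when regenerating code.
annotations = {'prefix': '@@indent@@return x\n', 'indent': '    '}

def render(annotations, indent=None):
    # Substitute the indentation string for the @@indent@@ placeholder.
    level = indent if indent is not None else annotations['indent']
    return annotations['prefix'].replace('@@indent@@', level)

# Default indent reproduces the original source exactly.
assert render(annotations) == '    return x\n'
# Changing the 'indent' entry changes the generated indentation.
assert render(annotations, indent='  ') == '  return x\n'
```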

martinwicke and others added 7 commits December 8, 2021 12:02
pasta guarantees a lossless round-trip but doesn't provide an easy way to change indentation levels. This change adds a special @@indent@@ token which is embedded into the contents of pasta's annotations dict whenever whitespace is detected to be for the purpose of indentation. It is stored in place of the indentation whitespace it represents. If the 'indent' dict entry is changed, the generated code's indentation level will change as a result.
Recursive calls in to_tree_str had the wrong argument order.
Block whitespace uses the token generator directly. Therefore, factor out the
@@nl@@ to @@indent@@ logic from ws() and also apply it wherever block_suffix_*
is set.

Clean up the logic for @block_statement: Make an is_annotator property on the
Annotator classes. Inline block_suffix, which is confusing, and use the same
logic we use in other places where we read block_whitespace from the tokenizer.

Fix tests and add one for block whitespace indentation handling.
Each node starting at (1,0) was generating a '@@nl@@' for its prefix. This adds
a check to ensure that only a single '@@nl@@' is generated at the beginning
of the source.
Now with indentation handled explicitly, the previous practice of indenting
once per with, even in the continued case, is shown to be buggy. This might
have shown up in a test that tests automatic indentation after a continued
with, but we have no such test.

This commit fixes the visitation such that a continued with statement is
visited without an indented wrapper.
try...except...finally is handled in 2.7 with a TryFinally containing a
TryExcept as its only body node. Because the inner TryExcept does not absorb
any tokens, it "steals" the whitespace from its first body node. That in turn
leads to the indent_diff and indent being computed wrongly, and messes up
everything else.

To fix, don't use @block_statement on TryExcept, and call prefix only if the
statement is not continued. Continued statements don't represent any tokens, and
therefore shouldn't have any prefix.
is_continued (for 2.7 try/with statements) was added as a node attribute. This
moves the information to pasta's formatting dict, which is cleaner (and allows
to_tree_str to print the information).
# Replace @@NL@@<indent> with @@indent@@. Note this string munging is safe
# because inside whitespace, @@indent@@ is never legal after a newline
# (only after a comment, which needs a '#' after a newline).
# This might not work in the presence of '\' line continuations.
Collaborator:
Is this still true / okay? Is there a TODO here?

martinwicke (Author):
Ah. I haven't tested, and I don't think that there are any tests that consider \ continuations. So I expect this to not work in general right now.

The tokenizer will not emit anything for a '\', not even in the .src attribute of the tokens:

"this is a" + \
"continued line"

turns into

1,0-1,11:	STRING	'"this is a"'
1,12-1,13:	OP	'+'
2,0-2,16:	STRING	'"continued line"'
2,16-2,17:	NEWLINE	'\n'
3,0-3,1:	NL	'\n'
4,0-4,0:	ENDMARKER	''

So I think handling this properly might be infeasible?

martinwicke (Author):

Interestingly, I think this means that ''.join([t.src for t in tokenize.generate_tokens(src)]) != src. That pretty much guarantees that none of the usual guarantees pasta makes will hold in the presence of '\'.
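This claim is easy to check with the standard library tokenizer (stdlib tokens expose their text as .string rather than pasta's .src attribute; otherwise the behavior is as described above):

```python
import io
import tokenize

# Source containing a backslash line continuation.
src = '"this is a" + \\\n"continued line"\n'

tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
joined = ''.join(t.string for t in tokens)

# No token carries the backslash-newline, so concatenating the
# token text does not reproduce the original source.
assert '\\' not in joined
assert joined != src
```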

martinwicke (Author):
This is fixable if we want to add back the '\\n' into the whitespace in token_generator. I think that might work, but I'm not sure it's worth doing.

@soupytwist soupytwist merged commit a99ab7b into google:astlib Dec 9, 2021