
Make pasta indentation easier. #97

Merged: 8 commits into google:astlib, Dec 9, 2021

Conversation

martinwicke (Contributor):
pasta guarantees a lossless round-trip but doesn't provide an easy way to change indentation levels. This change adds a special @@indent@@ token which is embedded into the contents of pasta's annotations dict whenever whitespace is detected to be for the purpose of indentation. It is stored in place of the indentation whitespace it represents. If the 'indent' dict entry is changed, the generated code's indentation level will change as a result.
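The idea can be illustrated with a small hypothetical sketch. The annotations-dict layout and the render helper below are invented for illustration; only the @@indent@@ token and the 'indent' entry come from this change, and pasta's real implementation differs:

```python
# Hypothetical sketch: store a placeholder instead of the literal
# indentation whitespace, then substitute the chosen indent string
# when regenerating code.
annotations = {'prefix': '@@indent@@return x\n', 'indent': '    '}

def render(annotations, indent=None):
    # Substitute the indentation string for the @@indent@@ placeholder.
    level = indent if indent is not None else annotations['indent']
    return annotations['prefix'].replace('@@indent@@', level)

# Default indent reproduces the original source exactly.
assert render(annotations) == '    return x\n'
# Changing the 'indent' entry changes the generated indentation.
assert render(annotations, indent='  ') == '  return x\n'
```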

martinwicke and others added 7 commits December 8, 2021 12:02
pasta guarantees a lossless round-trip but doesn't provide an easy way to change indentation levels. This change adds a special @@indent@@ token which is embedded into the contents of pasta's annotations dict whenever whitespace is detected to be for the purpose of indentation. It is stored in place of the indentation whitespace it represents. If the 'indent' dict entry is changed, the generated code's indentation level will change as a result.
Recursive calls in to_tree_str had the wrong argument order.
Block whitespace uses the token generator directly. Therefore, factor out the
@@nl@@ to @@indent@@ logic from ws() and also apply it wherever block_suffix_*
is set.

Clean up the logic for @block_statement: Make an is_annotator property on the
Annotator classes. Inline block_suffix, which is confusing, and use the same
logic we use in other places where we read block_whitespace from the tokenizer.

Fix tests and add one for block whitespace indentation handling.
Each node starting at (1,0) was generating a '@@nl@@' for its prefix. This adds
a check to ensure that only a single '@@nl@@' is generated at the beginning
of the source.
Now with indentation handled explicitly, the previous practice of indenting
once per with, even in the continued case, is shown to be buggy. This might
have shown up in a test that tests automatic indentation after a continued
with, but we have no such test.

This commit fixes the visitation such that a continued with statement is
visited without an indented wrapper.
try...except...finally is handled in 2.7 with a TryFinally containing a
TryExcept as its only body node. Because the inner TryExcept does not absorb
any tokens, it "steals" the whitespace from its first body node. That in turn
leads to the indent_diff and indent being computed wrongly, and messes up
everything else.

To fix, don't use @block_statement on TryExcept, and call prefix only if the
statement is not continued. Continued statements don't represent any tokens, and
therefore shouldn't have any prefix.
is_continued (for 2.7 try/with statements) was added as a node attribute. This
moves the information to pasta's formatting dict, which is cleaner (and allows
to_tree_str to print the information).
# Replace @@NL@@<indent> with @@indent@@. Note this string munging is safe
# because inside whitespace, @@indent@@ is never legal after a newline
# (only after a comment, which needs a '#' after a newline).
# This might not work in the presence of '\' line continuations.
Collaborator:
Is this still true / okay? Is there a TODO here?

martinwicke (Author):
Ah. I haven't tested, and I don't think that there are any tests that consider \ continuations. So I expect this to not work in general right now.

The tokenizer will not emit anything for a '\', not even in the .src attribute of the tokens:

"this is a" + \
"continued line"

turns into

1,0-1,11:	STRING	'"this is a"'
1,12-1,13:	OP	'+'
2,0-2,16:	STRING	'"continued line"'
2,16-2,17:	NEWLINE	'\n'
3,0-3,1:	NL	'\n'
4,0-4,0:	ENDMARKER	''

So I think handling this properly might be infeasible?

martinwicke (Author):

Interestingly, I think this means that ''.join([t.src for t in tokenize.generate_tokens(src)]) != src. That pretty much guarantees that none of the usual guarantees pasta makes will hold in the presence of '\'.
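This claim is easy to check with the standard library tokenizer (stdlib tokens expose their text as .string rather than pasta's .src attribute; otherwise the behavior is as described above):

```python
import io
import tokenize

# Source containing a backslash line continuation.
src = '"this is a" + \\\n"continued line"\n'

tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
joined = ''.join(t.string for t in tokens)

# No token carries the backslash-newline, so concatenating the
# token text does not reproduce the original source.
assert '\\' not in joined
assert joined != src
```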

martinwicke (Author):
This is fixable if we want to add back the '\\n' into the whitespace in token_generator. I think that might work, but I'm not sure it's worth doing.

@soupytwist soupytwist merged commit a99ab7b into google:astlib Dec 9, 2021