Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: intra-line withPosition formatting #1872

Merged
merged 1 commit into from
Nov 28, 2022

Conversation

digama0
Copy link
Collaborator

@digama0 digama0 commented Nov 22, 2022

Now that people are starting to use mathport more heavily, I think it is more important that we get the formatting right so that people don't blindly copy current formatting mistakes as though they are recommended practice.

This PR is primarily aimed at fixing the formatting of the · tac combinator that is ubiquitous in mathported proofs.

  1. There was an extra newline after each use of · tac. So for example you would get:
by
  · skip
    done
  
  · · skip
      done
    
    
  · done
  
  1. The cause of (1) is that the · parser uses cdotTk many1Indent(tactic ";"? ppLine), and here we can see a second issue: since it's not using sepBy1IndentSemicolon we get the same issues as used to be happening in by, e.g. · skip skip works, and the tactics don't have to be strictly aligned resulting in weird parses.

If we just switch out the many1Indent for sepBy1IndentSemicolon we get some improvements and some regressions:

by
  ·
    skip
    done
  · ·
      skip
      done
  · done

The dot now uses the same "hanging" indentation style as by,try and all the other combinators. But we'd really like it to fit on the same line if possible.

After investigating how to do this in Formatter for a while I came to the conclusion that it would be very hackish and special cased to the · tactic, when what we really want is a way to get withPosition-safe line wrapping. So instead, this implements it inside the Format algorithm itself.

  • We add a new Format constructor, Format.align (force : Bool). The semantics are that if the current position is less than the indentation level (such as would happen immediately after a nest 2), align will pad with spaces until the column matches the indent. If the current column is greater than the indent, then it adds a newline and indents as normal.
  • When the force flag is disabled, align becomes nil when the enclosing group is flattened; otherwise it always pads or line-breaks to the indent. (Currently only force := true is being used, see below.)
  • The sepByIndent combinator used to conservatively put a Format.line before the list when there is at least one newline in the list (that's the spurious newlines we are seeing above) to prevent it from breaking withPosition parsing. Now it can use .align true instead.

The results look pretty good to me, see the formatTerm test. Because the align is used in sepByIndent the same behavior is inherited by other parsers as well. It doesn't make too much difference with lean's default 2-space indent but if you increase the indent to 4 then you will see things like by, try, do also getting the same nesting behavior when they appear at the start of a line.

I had to remove the HACK which forcibly sets sepByIndent to ungrouped, because this causes there to be no indent on the last line, as in:

· skip
  done
·
skip
done

(which still parses correctly, but is clearly not desirable). This was being done in order to allow structure { to hang on the previous line, as a way of recovering from the forced newline that used to be there. Now the { is at the beginning of the line, which also looks fine and matches with the } that hangs at the end of the last line. (I think we have the tools to do hanging { properly now if we need to.)

One issue that existed before and has not been fixed in this PR is that the check for whether to insert a newline, now an align, at the start of sepByIndent is imprecise: it looks for the existence of forced newlines in the sepBy list, e.g. skip\n skip instead of skip; skip. The problem with this is that even if every tactic has a ; after it, if the sequence is long you can get line wrapping caused by the Format because it is using the fill combinator. After the line wrapping, you might get a tactic at the indent line even though we did not use an align at the start, as in:

try pos; tac; tac; tac; tac; tac; tac; tac; tac; tac; tac; tac; 
  ohno; tac; tac; tac;

The ohno tactic is to the left of the pos tactic, so it is either a parse error or will be parsed as part of the surrounding block. It is not possible to fix this in the Formatter though because it doesn't know whether the Format will be line broken or not. We would either need some more Format primitives or a general mechanism that I have seen in some implementations of the Wadler-Leijen pretty printer to allow choice (useIfFlattened useIfNormal : Format) nodes.

@Kha
Copy link
Member

Kha commented Nov 22, 2022

Mmh, this does look a bit complicated for a single tactic. Don't we have basically all the tools to emulate sepBy1Indent with correct formatting in syntax?

syntax cdotTk ppHardSpace withPosition(sepBy1(colGe tactic, "", "; " <|> (colEq linebreak ppLine), allowTrailingSep)) : tactic

It looks like the ppLine is ignored, but I'm not quite sure why...

@gebner
Copy link
Member

gebner commented Nov 22, 2022

Mmh, this does look a bit complicated for a single tactic.

I had hoped that we could use this combinator in a lot of cases where we have to resort to ppLine now (and still have bugs!). See e.g. #1488 (comment) for the class of issues this would fix:

[let x := 42
 id x]

The above doesn't round-trip atm.

Top-level structure instances would also look a lot nicer, right now they're printed like this:

{
  a := 42
  b := 8 }

@gebner
Copy link
Member

gebner commented Nov 22, 2022

I've wanted such an align combinator for a long time btw: #369 (comment)

@Kha
Copy link
Member

Kha commented Nov 22, 2022

I hate top-level parentheses 😄 . It's so awkward that they introduce non-standard single-character indentation. But it's not like I have a better formatting in mind.

The hanging initial token looks like the core issue to me. If we only had top-level parentheses, I assume we could simply indent by 1 for them. Thus why I'm saying we shouldn't need any new tools for cdot, which never hangs. I'm also concerned about Mario's last example, I can't tell yet whether this align is still the way to go for that issue as well.

@Kha
Copy link
Member

Kha commented Nov 22, 2022

I should also mention that I'm working on integrating Lars' refactored Format.pretty mentioned in #1488 into Lean. The current implementation is unmaintainable to me, and we do need to extend it going forward.

@digama0
Copy link
Collaborator Author

digama0 commented Nov 22, 2022

The hanging initial token looks like the core issue to me. If we only had top-level parentheses, I assume we could simply indent by 1 for them. Thus why I'm saying we shouldn't need any new tools for cdot, which never hangs.

The issue is that you have to think about how many characters there will be in the previous token, as well as what the indent size is, to implement an equivalent of .align for things like the formatter implementation of · tac. And by tac. And (tac. And try tac. Maybe we shouldn't have made indent size an option if we were planning to hard-code it... And if the previous syntax has a newline then you have to scrap the whole approach and do something else. That's why .align is better, it just transparently does the right thing in all cases, and the Formatter can just focus on whether it wants to output aligned code or not.

@digama0
Copy link
Collaborator Author

digama0 commented Nov 22, 2022

I'm also concerned about Mario's last example, I can't tell yet whether this align is still the way to go for that issue as well.

I forgot to mention: there is a simple fix for that issue with this PR, namely to remove buggy newline reasoning and just always add an align at the start of a sepByIndent. The downside is that

try
  tac

will always be line-broken like this, as well as other one-line tactic combinators like case => etc, even though in the majority of cases they don't need it. For the immediate (mathport) concern I think that's a worthwhile tradeoff, especially considering that the formatter already does line wrapping incorrectly for mathlib, which has a 100 character limit not 120.

@Kha
Copy link
Member

Kha commented Nov 23, 2022

it just transparently does the right thing in all cases

That's great if you primarily care about correct output, and maybe we should have it just for that, but evidently it is not sufficient to get good output: https://github.com/leanprover/lean4/pull/1872/files#diff-6bde8d5882f415e03ae56f70a39d8d0bed774302649baa5934c1962f024fdd54R40.

My current thinking is:

  • For the rare cases like
. tac1
  tac2

and

(tac1
 tac2)

where we really want to "right-append" a block to a token, we need to indent by the token(+whitespace) length (which we can have a helper combinator do), which should be sufficient as long as we're not in a hanging context. And I really don't ever want to see something like

arbitrarilyLongPrefix (tac1
                       tac2)

If align (used in front of the token) is the most convenient way to avoid a hanging context, great, let's use it.

  • In the more common case like try where we want to insert a newline in between token and tactic block iff there is a newline in the formatted tactic block, "try" ppRealGroup(ppSpace tac1 "; " tac2) really should be sufficient, that's exactly what it's supposed to do. Am I missing something here?
  • Finally, the structure instances... I'm getting convinced that having two different formatting styles dependent on context is just weird, so I'm inclined to say we should just use
{
  f1
  f2
}

everywhere where a single line is not sufficient, even in non-hanging contexts. Like with trailing commas in other contexts, it has the great advantage that the line of the first and last field is not special.

@Kha
Copy link
Member

Kha commented Nov 23, 2022

Maybe we shouldn't have made indent size an option if we were planning to hard-code it

The indentation option is far older than any plans to use Format for actual code formatting. But I agree that we may as well remove it to make our lives easier.

especially considering that the formatter already does line wrapping incorrectly for mathlib, which has a 100 character limit not 120

Why is that, is the option getting lost somewhere?

@leodemoura leodemoura added the dev meeting It will be discussed at the (next) dev meeting label Nov 28, 2022
@leodemoura leodemoura merged commit 17ef0ce into leanprover:master Nov 28, 2022
@gebner
Copy link
Member

gebner commented Nov 30, 2022

This PR has some pretty ugly regressions on mathport:

-def lsmul : A →ₐ[R] Module.EndCat R M where
+def lsmul :
+    A →ₐ[R] Module.EndCat R
+        M where 

(Also note the trailing space on the last line now.)

@digama0
Copy link
Collaborator Author

digama0 commented Nov 30, 2022

The trailing where is I think a separate bug, it was already triggering before under different conditions. I was aware of the potential for trailing spaces but I didn't think it would be too important since most users are set up to strip trailing whitespace (we can even do that in mathport postprocessing if necessary).

I'm not sure what causes the crazy indentation in your example though; everything should be well below the line wrap threshold. What's the generated Format object?

@gebner
Copy link
Member

gebner commented Nov 30, 2022

The trailing spaces are because .align isn't guaranteed to insert a space. The crazy indentation is probably because the align combinator is not resetting the isUngrouped flag.

I think it would be best to make align behave more like the whitespace combinators. It always resets the lead word, and always results in some kind of whitespace (separate line or space).

@digama0
Copy link
Collaborator Author

digama0 commented Nov 30, 2022

Although I think the no-space version of align is useful for some things, it's not important for our immediate goals so I'm fine with replacing it with a version that always uses at least one space for now.

Testing the change is going to be really annoying though. Do you have an idea for how to turn that example into a MWE that can go in formatTerm.lean?

@gebner
Copy link
Member

gebner commented Nov 30, 2022

Here you go:

#eval fmt `(command|
def foo : a b c d e f g a b c d where
  h := 42
  i := 42
  j := 42
  k := 42)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev meeting It will be discussed at the (next) dev meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants