Add 'reflow' capability for configurable, black-like deterministic formatting #2436
Comments
So I'm going to take a go at this as my next biggie. Here's my rough plan of attack:
Step 1. Blunt reflow. First step of implementing this I think will just be on the API (i.e. not on the CLI yet until this is more stable). The "blunt reflow" would be to remove all whitespace and newlines, then step through the file, adding a newline (and appropriate indent) for every This will be implemented ideally as a method on This is totally deterministic, but ignores line lengths, delimiters and will look kind of silly. It demonstrates proof of concept for what we're trying to do though.
Step 2. Line break hint segments. To facilitate appropriate splitting of lists (e.g. Adding this capability into the blunt reflow just makes files even longer. Things still look silly, but this is the most extreme version of laying out a file. All still deterministic.
Step 3. Meta segment priority. For all NOTE: at this stage, the priority isn't actually used, but we can set it.
Step 4. Progressively pack. So I think this is where most of the "magic" happens:
I'm not sure whether we first "blunt reflow" and then work backwards, or whether we remove all newlines and work forwards (probably the latter). Assuming the latter, we progressively add high priority newlines (and work down the priority order), until a segment (and all its children) fits within the max line length. That would have the effect for example that we might have to reflow more aggressively in a
Alternatives/Options
Would love feedback on this rough plan before I dive in.
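A minimal sketch of the "work forwards" option from Step 4, written as standalone Python rather than against the real parser. Everything here is an assumption for illustration: the `Token` class, its `break_priority` field (0 means never break, 1 is the highest priority) and the `pack` function are hypothetical names, and the sketch applies priorities across the whole statement rather than per segment, with no indentation.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a code segment with a line-break hint before it."""
    text: str
    break_priority: int = 0  # 0 = never break before; 1 = break first, 2 = next, ...


def _render(tokens: List[Token], active: Set[int]) -> List[str]:
    """Lay tokens out, starting a new line before any token whose priority is active."""
    lines: List[List[str]] = [[]]
    for i, tok in enumerate(tokens):
        if i > 0 and tok.break_priority in active:
            lines.append([])
        lines[-1].append(tok.text)
    return [" ".join(line) for line in lines]


def pack(tokens: List[Token], max_len: int) -> str:
    """Start from a single line and enable break priorities one at a time
    (highest first) until every line fits, or no priorities are left."""
    priorities = sorted({t.break_priority for t in tokens if t.break_priority > 0})
    active: Set[int] = set()
    lines = _render(tokens, active)
    for priority in priorities:
        if all(len(line) <= max_len for line in lines):
            break
        active.add(priority)
        lines = _render(tokens, active)
    return "\n".join(lines)


query = [
    Token("SELECT"), Token("a,", 2), Token("b", 2),
    Token("FROM", 1), Token("tbl"),
    Token("WHERE", 1), Token("a > 1"),
]

print(pack(query, 80))  # fits on one line, so no breaks are added
print(pack(query, 20))  # only the priority-1 breaks (before FROM and WHERE) are needed
```

Because the output depends only on the tokens, their priorities and `max_len`, the same input always packs the same way, which is the property the plan is after.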
The plan sounds good overall. I had to skim it just now, but one thought I've had about this is that the depth of a segment in the parse tree should have some influence on how lines are broken. All else being equal, my intuition is that it's better to break lines higher than lower in the tree, e.g. each function parameter on its own line is preferred to splitting up a computation for a single parameter.
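One way to picture the "break higher in the tree first" intuition is a recursive layout that only splits a node when it does not fit, and only descends into a child if that child still does not fit on its own line. This is a rough sketch under stated assumptions: `Group`, `flat` and `layout` are hypothetical names, the tree is hand-built rather than produced by a parser, and a fixed 4-space indent is assumed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Group:
    """Hypothetical parse-tree node: optional opener, child items, optional closer."""
    children: List[Union["Group", str]]
    opener: str = ""
    closer: str = ""


def flat(node: Union[Group, str]) -> str:
    """Render a node on a single line."""
    if isinstance(node, str):
        return node
    return node.opener + " ".join(flat(c) for c in node.children) + node.closer


def layout(node: Union[Group, str], max_len: int, indent: int = 0) -> List[str]:
    """Keep a node on one line if it fits; otherwise break between its direct
    children, so shallow breaks are always tried before deeper ones."""
    pad = " " * indent
    one_line = pad + flat(node)
    if isinstance(node, str) or len(one_line) <= max_len:
        return [one_line]
    lines = [pad + node.opener] if node.opener else []
    for child in node.children:
        lines.extend(layout(child, max_len, indent + 4))
    if node.closer:
        lines.append(pad + node.closer)
    return lines


call = Group(
    opener="my_func(",
    closer=")",
    children=[Group(["alpha", "+", "beta,"]), Group(["gamma", "*", "delta"])],
)

print("\n".join(layout(call, 40)))  # whole call fits on one line
print("\n".join(layout(call, 20)))  # one parameter per line; expressions stay intact
```

With a limit of 20 the parameters are separated but neither `alpha + beta,` nor `gamma * delta` is split, because each fits once the shallower break has been made.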
Sounds great @alanmcruickshank. What are your thoughts on segment-level configurability of this, and do you think there are any considerations we should be wary of early on to facilitate that? Referring to this idea in the issue description:
Where these configs would then descend down to any segments which inherit from another. I'm then imagining a world where community members can codify their entire code style using this config, but perhaps I'm getting too far ahead.
For configuration it would leverage the existing configuration options for turning some meta segments on and off (a la I don't think "normal" indentation is that different from team to team (beyond the examples that we've already covered) - what kind of use cases did you have in mind for a much more flexible approach?
Agreed @barrywhart - I'm not yet sure how tree depth and priority might interact, but I definitely agree with the intent.
Feature
Same functionality as black, but totally configurable. The 'reflow' command would produce a deterministic output for the same 'code_only' input, regardless of the non-code elements (whitespace, capitalisation, indentation).
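To make the determinism claim concrete, here is a toy sketch, not the proposed implementation: it throws away all whitespace and capitalisation, then re-renders from the code tokens alone, so two differently formatted copies of the same query come out identical. The `reflow` function, the tiny keyword set and the casing choices (upper-case keywords, lower-case everything else) are assumptions for illustration only, and the regex tokeniser ignores quoted strings and comments.

```python
import re

# Illustrative subset only; a real dialect defines far more keywords.
KEYWORDS = {"select", "from", "where"}
# Assumed style choice: these keywords always start a new line.
BREAK_BEFORE = {"FROM", "WHERE"}


def reflow(sql: str) -> str:
    """Re-render SQL from its code tokens only, ignoring the original layout."""
    tokens = re.findall(r"\w+|[^\w\s]", sql)
    out = []
    for tok in tokens:
        # Normalise capitalisation: keywords upper, everything else lower.
        word = tok.upper() if tok.lower() in KEYWORDS else tok.lower()
        if word in BREAK_BEFORE and out:
            out.append("\n")
        elif out and word != ",":
            out.append(" ")
        out.append(word)
    return "".join(out)


messy = "SELECT   A ,\n  B\nFROM T\n WHERE a > 1"
terse = "select a,b from t where a>1"

print(reflow(messy))
print(reflow(messy) == reflow(terse))  # True: layout and case of the input do not matter
```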
Benefits:
Illustrative example
The user has a sql file with the following:
The user has configured the following:
And so the output is:
How
We're already embedding Indent and Dedent into the dialects; this feature builds upon that to add all configuration and functionality to render valid SQL from code-only elements.
We've talked for a while about the idea of separating rules by what they do. It seems we can clearly draw a line between:
This feature focuses strictly on the latter and would not add, remove or modify code segments beyond their capitalisation.
Implementation suggestions
pre_newline config at the top level segment.
SchemaReferenceSegment would use ObjectReferenceSegment's configuration if it did not have its own).
select_clause_element: pre_newline_when_multiple
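The inheritance idea above (a segment falls back to its parent segment's configuration when it has none of its own) could be sketched with a walk up the class hierarchy. The three classes below are empty stand-ins named after the segments mentioned here, not the real SQLFluff classes, and `resolve_option` plus the dictionary layout of `config` are hypothetical.

```python
class BaseSegment:
    """Stand-in for the top level segment."""


class ObjectReferenceSegment(BaseSegment):
    """Stand-in parent segment."""


class SchemaReferenceSegment(ObjectReferenceSegment):
    """Stand-in child segment with no configuration of its own."""


def resolve_option(segment_cls, option, config, default=None):
    """Return the first value found for `option`, checking the segment itself
    and then each ancestor in method resolution order."""
    for cls in segment_cls.__mro__:
        options = config.get(cls.__name__, {})
        if option in options:
            return options[option]
    return default


# Assumed config shape: segment name -> {option name -> value}.
config = {
    "BaseSegment": {"pre_newline": False},
    "ObjectReferenceSegment": {"pre_newline": True},
}

# SchemaReferenceSegment has no entry, so it picks up its parent's setting.
print(resolve_option(SchemaReferenceSegment, "pre_newline", config))
```

Adding a `"SchemaReferenceSegment"` entry to `config` would override the inherited value for that segment only.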
My current availability, combined with the fact that there are many much better developers than me around here, means I would really love to see someone/others take this on; I'm not precious about any of the above ideas!