Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indentation rework #1562

Merged
merged 22 commits into from
Mar 30, 2022
Merged

Indentation rework #1562

merged 22 commits into from
Mar 30, 2022

Conversation

Triton171
Copy link
Contributor

@Triton171 Triton171 commented Jan 22, 2022

I reworked the indentation system to remove some of it's limitations and make it easier to extend. This should close #114, #1523 and #1440.

Things I want to implement in followup PRs:

  • Specify nodes to be extended to blank lines (and a list of "return" nodes, that interrupt this behavior, as it's done in nvim-treesitter). This is required to correctly indent languages such as python where treesitter nodes may be too short
  • Specify nodes with a custom prefix (e.g. to automatically make a new line a comment)
  • Specify nodes inside which no treesitter indentation is used (e.g. for multiline string literals)
  • Specify nodes which should be aligned with each other
  • Add indentation tests for other languages

@sudormrfbin
Copy link
Member

I would recommend splitting all the TODOs to separate small PRs since it will be easier to review and keep track of.

Triton171 added 4 commits January 23, 2022 17:48
Add more options to ComplexNode and use them to improve C/C++ indentation.
@Triton171
Copy link
Contributor Author

Thanks @sudormrfbin, splitting it up is a good idea.
This PR now just implements the core rework and migration of all existing indent queries. I've added a few rules to the Rust, C and C++ queries that show what the new features can be used for. Additionally, I've added one missing rule to the Go indent query to fix #1523.

@Triton171 Triton171 marked this pull request as ready for review January 23, 2022 17:05
helix-core/src/syntax.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
Comment on lines 208 to 210
struct IndentResult {
indent: i32,
}
Copy link
Contributor

@pickfire pickfire Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment here saying that indentation is in terms of levels and not spaces? Also, I wonder how well this will work for python indention for this:

def hello(a,
          b,
          c):

given that the indentation depends on the previous line, if there are no items there. IIRC even vim have issues on this but there is a plugin that solves this I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding python, I have this on my radar, I'm planning to add something like an align specification. I still have to think about how to make it as intuitive and general as possible. It's kinda difficult to define what to make the "anchor" for the alignment. For simple cases (like aligning the parameters of a function) it should not be to complicated though, since treesitter does all the heavy lifting.

@Triton171
Copy link
Contributor Author

I added some more detailed documentation which should hopefully explain the general idea. @archseer let me know if anything is unclear.

helix-core/src/syntax.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
@pickfire
Copy link
Contributor

Also, please fix the lints.

helix-core/src/indent.rs Outdated Show resolved Hide resolved
@pickfire
Copy link
Contributor

My review ends here. I think it's better for @archseer to check since I don't haven't done indent much.

Comment on lines 2 to 6
all = [
{ parent_kind_in = ["assignment_expression", "compound_assignment_expr"], field_name_in = ["right"]},
{ parent_kind_in = ["let_declaration"], field_name_in = ["value"] }
]
tail = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read through both the examples the docs and the original issue, but I don't understand the need for all vs tail.

The point is to make:

let (
    a_long_variable_name_in_this_tuple,
    b_long_variable_name_in_this_tuple,
    c_long_variable_name_in_this_tuple,
    d_long_variable_name_in_this_tuple,
    e_long_variable_name_in_this_tuple,
): (usize, usize, usize, usize, usize) = //<- cursor here

<Enter> produce indentation level == 1 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, basically. indent.tail produces the behavior of the old indent option (indenting everything except for the first line of a node). indent.all indents the whole node including the first line, similar to how the old outdent behaves. Here we need it because the whole RHS of a let declaration should be indented, including the first line. Additionally, indent and outdent are symmetric in the new version, so we don't need any special cases for them and for everything else that the indent code might handle in the future (comments, alignment, etc.).

Your example doesn't work because of a limitation of tree-sitter: In that situation there isn't a value for the let_declaration yet, so we don't know that we should indent it. The best solution here is probably an optional auto-indent feature that updates the indent as soon as tree-sitter parses the code correctly.
Pressing <Enter> in this situation does produce the correct indent (1):

let (
    a_long_variable_name_in_this_tuple,
    b_long_variable_name_in_this_tuple,
    c_long_variable_name_in_this_tuple,
    d_long_variable_name_in_this_tuple,
    e_long_variable_name_in_this_tuple,
): (usize, usize, usize, usize, usize) = |(0,0,0,0,0);
// Cursor at | in the line above

@Triton171 Triton171 marked this pull request as draft February 10, 2022 14:33
@Triton171
Copy link
Contributor Author

I just had a closer look at the Treesitter query documentation and code and realized I misunderstood the API. Setting the byte range for a query doesn't just find all matches inside that range, it finds all matches that intersect the range. This means that we can query a very small range (usually a couple of bytes should be enough) and still get all the relevant matches for indentation.

Unless there's some disadvantage that I'm not seeing, I'll change this PR to use treesitter queries instead of the custom TOML format, as was suggested in #114. I still think we shouldn't just try to copy the indent queries from nvim-treesitter because they don't support stuff like custom prefixes (e.g. for comments or markdown lists). Maybe I'm wrong though, so suggestions/discussions are welcome.

@archseer
Copy link
Member

I explained this in #114 (comment), the problem is that we aren't querying the innermost node, but instead traversing the tree. So passing the range of a single line will match the let_statement on that line, but it won't give you matches for the enclosing function body that is the parent of the statement.

@Triton171
Copy link
Contributor Author

Triton171 commented Feb 17, 2022

I've improved the rust query a bit and implemented your suggestions. I also tried running the indentation test on helix-term/src/commands.rs. It now matches the indentation for the first ~3500 lines, then it fails because the indentation system currently can't align stuff (which is required when the parameter list of a closure spans multiple lines). I'll implement that in a follow-up PR, once we can handle it, commands.rs should be a good test for rust indentation.

EDIT: During testing, I also found & fixed 2 places where commands.rs wasn't correctly indented. It seems like cargo fmt is somehow struggling with a file that big.

@archseer
Copy link
Member

Looks good, I'm taking a deeper look at indent.rs and I'm test driving this branch for a few days.

@archseer
Copy link
Member

Regarding cargo fmt: it was probably for code inside a macro? It seems to avoid any formatting inside them

helix-core/src/indent.rs Outdated Show resolved Hide resolved
Comment on lines +246 to +247
if added.indent > 0 && added.outdent == 0 {
self.indent += 1;
} else if added.outdent > 0 && added.indent == 0 {
self.outdent += 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd to me, can't indent and outdent both be > 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're also not using the number from added and always use 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because of how the indentation that's added from a single line works: We can always have at most one indent or at most one outdent (because the indentation never increases by multiple levels on a single line). Here, added.indent and added.outdent will always be 1 or 0 because added is constructed using the function add_capture. I just check for >0 instead for ==1 to prevent weird errors in case this invariant no longer holds in the future.

If there are both indent and outdent nodes on the same line, we ignore all of them, that's why there is no third case for when added.indent > 0 && added.outdent > 0. This is required to fix #1523.

helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
}) {
continue;
}
for capture in m.captures {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reasonable guess for capacity on indent_captures we could use? Either ::with_capacity() or by calling reserve() inside this loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It corresponds to the number of times this node is captured with an IndentCapture (currently @indent and @outdent. This should be 1 in most cases (because we don't create the Vec if it's 0), I added a with_capacity to make that a bit more explicit.

Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very exciting! 😄

Comment on lines 19 to 21
"}"
"]"
")"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"}"
"]"
")"
"}"
"]"
")"

Sorry this is a bit of a nit. I see the single-space indent in some other indent files as well:

  • dart
  • go
  • llvm
  • nix
  • scala
  • zig

At some point I'd really like to write a formatter for tree-sitter queries to make this all moot 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A formatter would be pretty nice. I must've missed this while converting all the indent queries, thanks for pointing out :)

let mut indent_for_line = AddedIndent::new();
let mut indent_for_line_below = AddedIndent::new();
loop {
let is_first = *first_in_line.last().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment for that?

@@ -85,17 +85,17 @@ pub struct LanguageConfiguration {
// first_line_regex
//
#[serde(skip)]
pub(crate) highlight_config: OnceCell<Option<Arc<HighlightConfiguration>>>,
pub highlight_config: OnceCell<Option<Arc<HighlightConfiguration>>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these made pub because of usage in tests? Why not use the associated methods highlight_config() and indent_query()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's because LanguageConfiguration is constructed directly in the test. Let's instead use the loader with a custom languages.toml? Can reuse this one: https://github.com/helix-editor/helix/blob/ce8311727f32975dc1c1030c676ffc141f63b851/.github/workflows/languages.toml from #1659

node = parent
/// Computes for node and all ancestors whether they are the first node on their line.
/// The first entry in the return value represents the root node, the last one the node itself
fn get_first_in_line(mut node: Node, byte_pos: usize, new_line: bool) -> Vec<bool> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wouldn't get_first_in_line also match this first character?

@archseer
Copy link
Member

archseer commented Mar 4, 2022

Tested over the week and it works great! Once the last set of comments is resolved this is ready for merge 👍🏻

@Triton171
Copy link
Contributor Author

Nice to hear that! I probably won't have time to work on it in the next week or so but I definitely want to finish it (and work on same of the follow-up features) when I can.

@max-privatevoid
Copy link
Contributor

I wrote a basic test suite (currently specific to Nix) to test this PR. The following test cases currently fail:

TESTING: let-in-attr
*** work/let-in-attr.nix	2022-03-05 21:29:07.460718670 +0100
--- output/let-in-attr.nix	2022-03-05 21:28:24.453336376 +0100
***************
*** 1,7 ****
  let
    a = 3;
  in {
    b = [
!       /*DONE*/
    ];
  }
--- 1,7 ----
  let
    a = 3;
  in {
    b = [
!     /*DONE*/
    ];
  }
TESTING: let
*** work/let.nix	2022-03-05 21:29:08.416727170 +0100
--- output/let.nix	2022-03-05 19:31:23.883676561 +0100
***************
*** 1,2 ****
  let
! /*DONE*/
--- 1,2 ----
  let
!   /*DONE*/

The /*DONE*/ marker represents where the cursor would end up at when adding a new line from the line above.


In let-in-attr.nix, Helix seems to pick which indentation level to use in relation to the layout of the entire file rather than the context immediately around the current line. Sometimes, "incorrect" indentations may be preferred for stylistic reasons, such as:

let
  value = 1;
in {
  inherit value;
}

vs the variant Helix would want:

let
  value = 1;
in {
    inherit value;
  }

Nix formatters such as alejandra would transform the latter into the former.

This also happens with other languages, such as JSON. Here, the indentation before "a" was incorrectly removed. Helix places the cursor and closing square bracket on the correct level, while carrying over the incorrect indentation would look more consistent.

{
"a": [
    | <- cursor ends up here
  ]
}
{
"a": [
  | <- cursor should end up here
]
}

The case of let.nix seems to be a matter of dealing with indents while syntax errors are present due to incomplete statements. A let with a missing in results in no indentation on the next line. If the in is added here and a new line is then created after let, it gets indented by one level.

As for why a different behavior would be desirable: Writing let, then some declarations, then in is a more natural flow than writing let and in, then going back up to write the declarations inbetween the two keywords.

@Triton171
Copy link
Contributor Author

Thanks for bringing up these issues, even though some of them will probably be a bit complicated to fix.

The incorrect indentation for let ... in expressions should be fixable with the current system. The issue is that both the let as well as the parenthesized node produce a level of indentation. This could be fixed by replacing the @indent for let expressions with an @indent.all for any node whose parent is let and that is not parenthesized or in (you can use #not-kind-eq) for that. I don't know when/if I'll get to improving Nix indentation but I'm happy to help in case you want to open a PR (once this is merged).
Once it works, having Nix tests in helix-core/tests/indent.rs would also be great.

The fact that indentation is calculated globally can in my opinion be seen both as an advantage and a disadvantage: It manages to correctly compute indentation even when the current indentation is wrong in some places. It just becomes annoying when this "wrong" indentation is actually intentional. We could have an option where we don't apply indent from a tree-sitter node when the indent doesn't exist in the file. For me this isn't super high-priority though.

For the indentation of incomplete code, i can think of 2 possible solutions:

  • Explicitly handle common cases in indents.scm by writing query patterns that contain errors (I've seen this approach in nvim-treesitter). This should already be possible after this PR.
  • Add an (optional) auto-indent feature. This would run on every keystroke and update the code when the correct indent changes. This would need to be smart in order to prevent "correcting" the indent after the user has explicitly set it. Also, it should be enabled on a per-language basis for languages with well-tested indent queries. This would have 2 advantages over the previous approach but would take more effort initially:
    • It doesn't add complexity to the indent query for each languages, it only requires them to be reasonably accurate.
    • It also works in ambiguous situations. For example in C, the line after if (condition) should only be indented, if it isn't the beginning of a compound_statement (i.e. if the next character is not {).

Copy link
Member

@archseer archseer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, stellar work @Triton171! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants