Indentation rework #1562

Triton171 · 2022-01-22T17:30:46Z

I reworked the indentation system to remove some of it's limitations and make it easier to extend. This should close #114, #1523 and #1440.

Things I want to implement in followup PRs:

Specify nodes to be extended to blank lines (and a list of "return" nodes, that interrupt this behavior, as it's done in nvim-treesitter). This is required to correctly indent languages such as python where treesitter nodes may be too short
Specify nodes with a custom prefix (e.g. to automatically make a new line a comment)
Specify nodes inside which no treesitter indentation is used (e.g. for multiline string literals)
Specify nodes which should be aligned with each other
Add indentation tests for other languages

sudormrfbin · 2022-01-23T06:30:50Z

I would recommend splitting all the TODOs to separate small PRs since it will be easier to review and keep track of.

…oncept for assignment statements in rust)

…sue helix-editor#1523) Remove commented-out code

Add more options to ComplexNode and use them to improve C/C++ indentation.

Triton171 · 2022-01-23T17:05:50Z

Thanks @sudormrfbin, splitting it up is a good idea.
This PR now just implements the core rework and migration of all existing indent queries. I've added a few rules to the Rust, C and C++ queries that show what the new features can be used for. Additionally, I've added one missing rule to the Go indent query to fix #1523.

helix-core/src/syntax.rs

helix-core/src/indent.rs

pickfire · 2022-01-24T15:49:35Z

helix-core/src/indent.rs

+struct IndentResult {
+    indent: i32,
+}


Can you please add a comment here saying that indentation is in terms of levels and not spaces? Also, I wonder how well this will work for python indention for this:

def hello(a, b, c):

given that the indentation depends on the previous line, if there are no items there. IIRC even vim have issues on this but there is a plugin that solves this I think.

Regarding python, I have this on my radar, I'm planning to add something like an align specification. I still have to think about how to make it as intuitive and general as possible. It's kinda difficult to define what to make the "anchor" for the alignment. For simple cases (like aligning the parameters of a function) it should not be to complicated though, since treesitter does all the heavy lifting.

Triton171 · 2022-01-29T17:12:42Z

I added some more detailed documentation which should hopefully explain the general idea. @archseer let me know if anything is unclear.

helix-core/src/syntax.rs

helix-core/src/indent.rs

pickfire · 2022-01-30T00:34:36Z

Also, please fix the lints.

helix-core/src/indent.rs

pickfire · 2022-01-31T09:49:25Z

My review ends here. I think it's better for @archseer to check since I don't haven't done indent much.

archseer · 2022-02-08T02:44:19Z

runtime/queries/rust/indents.toml

+all = [
+ { parent_kind_in = ["assignment_expression", "compound_assignment_expr"], field_name_in = ["right"]},
+ { parent_kind_in = ["let_declaration"], field_name_in = ["value"] }
+]
+tail = [


I've read through both the examples the docs and the original issue, but I don't understand the need for all vs tail.

The point is to make:

let ( a_long_variable_name_in_this_tuple, b_long_variable_name_in_this_tuple, c_long_variable_name_in_this_tuple, d_long_variable_name_in_this_tuple, e_long_variable_name_in_this_tuple, ): (usize, usize, usize, usize, usize) = //<- cursor here

<Enter> produce indentation level == 1 here?

Yes, basically. indent.tail produces the behavior of the old indent option (indenting everything except for the first line of a node). indent.all indents the whole node including the first line, similar to how the old outdent behaves. Here we need it because the whole RHS of a let declaration should be indented, including the first line. Additionally, indent and outdent are symmetric in the new version, so we don't need any special cases for them and for everything else that the indent code might handle in the future (comments, alignment, etc.).

Your example doesn't work because of a limitation of tree-sitter: In that situation there isn't a value for the let_declaration yet, so we don't know that we should indent it. The best solution here is probably an optional auto-indent feature that updates the indent as soon as tree-sitter parses the code correctly.
Pressing <Enter> in this situation does produce the correct indent (1):

let ( a_long_variable_name_in_this_tuple, b_long_variable_name_in_this_tuple, c_long_variable_name_in_this_tuple, d_long_variable_name_in_this_tuple, e_long_variable_name_in_this_tuple, ): (usize, usize, usize, usize, usize) = |(0,0,0,0,0); // Cursor at | in the line above

Triton171 · 2022-02-10T14:48:32Z

I just had a closer look at the Treesitter query documentation and code and realized I misunderstood the API. Setting the byte range for a query doesn't just find all matches inside that range, it finds all matches that intersect the range. This means that we can query a very small range (usually a couple of bytes should be enough) and still get all the relevant matches for indentation.

Unless there's some disadvantage that I'm not seeing, I'll change this PR to use treesitter queries instead of the custom TOML format, as was suggested in #114. I still think we shouldn't just try to copy the indent queries from nvim-treesitter because they don't support stuff like custom prefixes (e.g. for comments or markdown lists). Maybe I'm wrong though, so suggestions/discussions are welcome.

archseer · 2022-02-10T15:01:54Z

I explained this in #114 (comment), the problem is that we aren't querying the innermost node, but instead traversing the tree. So passing the range of a single line will match the let_statement on that line, but it won't give you matches for the enclosing function body that is the parent of the statement.

Triton171 · 2022-02-17T18:57:15Z

I've improved the rust query a bit and implemented your suggestions. I also tried running the indentation test on helix-term/src/commands.rs. It now matches the indentation for the first ~3500 lines, then it fails because the indentation system currently can't align stuff (which is required when the parameter list of a closure spans multiple lines). I'll implement that in a follow-up PR, once we can handle it, commands.rs should be a good test for rust indentation.

EDIT: During testing, I also found & fixed 2 places where commands.rs wasn't correctly indented. It seems like cargo fmt is somehow struggling with a file that big.

archseer · 2022-02-23T12:12:09Z

Looks good, I'm taking a deeper look at indent.rs and I'm test driving this branch for a few days.

archseer · 2022-02-23T12:13:11Z

Regarding cargo fmt: it was probably for code inside a macro? It seems to avoid any formatting inside them

helix-core/src/indent.rs

archseer · 2022-02-25T08:42:58Z

helix-core/src/indent.rs

+        if added.indent > 0 && added.outdent == 0 {
+            self.indent += 1;
+        } else if added.outdent > 0 && added.indent == 0 {
+            self.outdent += 1;
+        }


This seems odd to me, can't indent and outdent both be > 0?

You're also not using the number from added and always use 1.

This is because of how the indentation that's added from a single line works: We can always have at most one indent or at most one outdent (because the indentation never increases by multiple levels on a single line). Here, added.indent and added.outdent will always be 1 or 0 because added is constructed using the function add_capture. I just check for >0 instead for ==1 to prevent weird errors in case this invariant no longer holds in the future.

If there are both indent and outdent nodes on the same line, we ignore all of them, that's why there is no third case for when added.indent > 0 && added.outdent > 0. This is required to fix #1523.

helix-core/src/indent.rs

archseer · 2022-02-25T08:46:56Z

helix-core/src/indent.rs

+        }) {
+            continue;
+        }
+        for capture in m.captures {


Is there a reasonable guess for capacity on indent_captures we could use? Either ::with_capacity() or by calling reserve() inside this loop

It corresponds to the number of times this node is captured with an IndentCapture (currently @indent and @outdent. This should be 1 in most cases (because we don't create the Vec if it's 0), I added a with_capacity to make that a bit more explicit.

Reuse tree-sitter cursors for indentation queries.

the-mikedavis

this is very exciting! 😄

the-mikedavis · 2022-02-28T03:31:32Z

runtime/queries/javascript/indents.scm

+ "}"
+ "]"
+ ")"


Suggested change

"}"

"]"

")"

"}"

"]"

")"

Sorry this is a bit of a nit. I see the single-space indent in some other indent files as well:

dart

go

llvm

nix

scala

zig

At some point I'd really like to write a formatter for tree-sitter queries to make this all moot 😄

A formatter would be pretty nice. I must've missed this while converting all the indent queries, thanks for pointing out :)

archseer · 2022-03-02T01:07:03Z

helix-core/src/indent.rs

+    let mut indent_for_line = AddedIndent::new();
+    let mut indent_for_line_below = AddedIndent::new();
+    loop {
+        let is_first = *first_in_line.last().unwrap();


Can you add a comment for that?

archseer · 2022-03-02T01:08:41Z

helix-core/src/syntax.rs

@@ -85,17 +85,17 @@ pub struct LanguageConfiguration {
    // first_line_regex
    //
    #[serde(skip)]
-    pub(crate) highlight_config: OnceCell<Option<Arc<HighlightConfiguration>>>,
+    pub highlight_config: OnceCell<Option<Arc<HighlightConfiguration>>>,


Were these made pub because of usage in tests? Why not use the associated methods highlight_config() and indent_query()?

Ah, it's because LanguageConfiguration is constructed directly in the test. Let's instead use the loader with a custom languages.toml? Can reuse this one: https://github.com/helix-editor/helix/blob/ce8311727f32975dc1c1030c676ffc141f63b851/.github/workflows/languages.toml from #1659

archseer · 2022-03-02T01:19:54Z

helix-core/src/indent.rs

-            node = parent
+/// Computes for node and all ancestors whether they are the first node on their line.
+/// The first entry in the return value represents the root node, the last one the node itself
+fn get_first_in_line(mut node: Node, byte_pos: usize, new_line: bool) -> Vec<bool> {


But wouldn't get_first_in_line also match this first character?

archseer · 2022-03-04T04:37:37Z

Tested over the week and it works great! Once the last set of comments is resolved this is ready for merge 👍🏻

Triton171 · 2022-03-04T12:17:54Z

Nice to hear that! I probably won't have time to work on it in the next week or so but I definitely want to finish it (and work on same of the follow-up features) when I can.

max-privatevoid · 2022-03-05T23:12:02Z

I wrote a basic test suite (currently specific to Nix) to test this PR. The following test cases currently fail:

TESTING: let-in-attr
*** work/let-in-attr.nix	2022-03-05 21:29:07.460718670 +0100
--- output/let-in-attr.nix	2022-03-05 21:28:24.453336376 +0100
***************
*** 1,7 ****
  let
    a = 3;
  in {
    b = [
!       /*DONE*/
    ];
  }
--- 1,7 ----
  let
    a = 3;
  in {
    b = [
!     /*DONE*/
    ];
  }
TESTING: let
*** work/let.nix	2022-03-05 21:29:08.416727170 +0100
--- output/let.nix	2022-03-05 19:31:23.883676561 +0100
***************
*** 1,2 ****
  let
! /*DONE*/
--- 1,2 ----
  let
!   /*DONE*/

The /*DONE*/ marker represents where the cursor would end up at when adding a new line from the line above.

In let-in-attr.nix, Helix seems to pick which indentation level to use in relation to the layout of the entire file rather than the context immediately around the current line. Sometimes, "incorrect" indentations may be preferred for stylistic reasons, such as:

let
  value = 1;
in {
  inherit value;
}

vs the variant Helix would want:

let
  value = 1;
in {
    inherit value;
  }

Nix formatters such as alejandra would transform the latter into the former.

This also happens with other languages, such as JSON. Here, the indentation before "a" was incorrectly removed. Helix places the cursor and closing square bracket on the correct level, while carrying over the incorrect indentation would look more consistent.

{
"a": [
    | <- cursor ends up here
  ]
}

{
"a": [
  | <- cursor should end up here
]
}

The case of let.nix seems to be a matter of dealing with indents while syntax errors are present due to incomplete statements. A let with a missing in results in no indentation on the next line. If the in is added here and a new line is then created after let, it gets indented by one level.

As for why a different behavior would be desirable: Writing let, then some declarations, then in is a more natural flow than writing let and in, then going back up to write the declarations inbetween the two keywords.

Triton171 · 2022-03-11T16:32:26Z

Thanks for bringing up these issues, even though some of them will probably be a bit complicated to fix.

The incorrect indentation for let ... in expressions should be fixable with the current system. The issue is that both the let as well as the parenthesized node produce a level of indentation. This could be fixed by replacing the @indent for let expressions with an @indent.all for any node whose parent is let and that is not parenthesized or in (you can use #not-kind-eq) for that. I don't know when/if I'll get to improving Nix indentation but I'm happy to help in case you want to open a PR (once this is merged).
Once it works, having Nix tests in helix-core/tests/indent.rs would also be great.

The fact that indentation is calculated globally can in my opinion be seen both as an advantage and a disadvantage: It manages to correctly compute indentation even when the current indentation is wrong in some places. It just becomes annoying when this "wrong" indentation is actually intentional. We could have an option where we don't apply indent from a tree-sitter node when the indent doesn't exist in the file. For me this isn't super high-priority though.

For the indentation of incomplete code, i can think of 2 possible solutions:

Explicitly handle common cases in indents.scm by writing query patterns that contain errors (I've seen this approach in nvim-treesitter). This should already be possible after this PR.
Add an (optional) auto-indent feature. This would run on every keystroke and update the code when the correct indent changes. This would need to be smart in order to prevent "correcting" the indent after the user has explicitly set it. Also, it should be enabled on a per-language basis for languages with well-tested indent queries. This would have 2 advantages over the previous approach but would take more effort initially:
- It doesn't add complexity to the indent query for each languages, it only requires them to be reasonably accurate.
- It also works in ambiguous situations. For example in C, the line after if (condition) should only be indented, if it isn't the beginning of a compound_statement (i.e. if the next character is not {).

archseer

Looks good to me now, stellar work @Triton171! 🎉

Triton171 added 4 commits January 23, 2022 17:48

WIP: Rework indentation system

76378a1

Add ComplexNode for context-aware indentation (including a proof of c…

3e39f7a

…oncept for assignment statements in rust)

Add switch statements to Go indents.toml (fixes the second half of is…

c417b9e

…sue helix-editor#1523) Remove commented-out code

Migrate all existing indentation queries.

eac73eb

Add more options to ComplexNode and use them to improve C/C++ indentation.

Triton171 force-pushed the indentation-rework branch from 7260514 to eac73eb Compare January 23, 2022 16:56

Triton171 marked this pull request as ready for review January 23, 2022 17:05

pickfire reviewed Jan 24, 2022

View reviewed changes

helix-core/src/syntax.rs Outdated Show resolved Hide resolved

pickfire reviewed Jan 24, 2022

View reviewed changes

helix-core/src/indent.rs Outdated Show resolved Hide resolved

pickfire reviewed Jan 24, 2022

View reviewed changes

Add comments & replace Option<Vec<_>> with Vec<_>

d4c4533

Triton171 force-pushed the indentation-rework branch from 06664bf to c4a256e Compare January 29, 2022 17:26

Add more detailed documentation for tree-sitter indentation

3f91662

pickfire reviewed Jan 30, 2022

View reviewed changes

helix-core/src/syntax.rs Outdated Show resolved Hide resolved

pickfire reviewed Jan 30, 2022

View reviewed changes

helix-core/src/indent.rs Outdated Show resolved Hide resolved

pickfire reviewed Jan 30, 2022

View reviewed changes

helix-core/src/indent.rs Outdated Show resolved Hide resolved

pickfire reviewed Jan 30, 2022

View reviewed changes

helix-core/src/indent.rs Outdated Show resolved Hide resolved

pickfire reviewed Jan 30, 2022

View reviewed changes

helix-core/src/indent.rs Show resolved Hide resolved

pickfire reviewed Jan 30, 2022

View reviewed changes

helix-core/src/indent.rs Outdated Show resolved Hide resolved

pickfire reviewed Jan 30, 2022

View reviewed changes

helix-core/src/indent.rs Outdated Show resolved Hide resolved

Improve code style in indent.rs

7666318

pickfire reviewed Jan 31, 2022

View reviewed changes

helix-core/src/indent.rs Outdated Show resolved Hide resolved

archseer reviewed Feb 8, 2022

View reviewed changes

Triton171 marked this pull request as draft February 10, 2022 14:33

archseer reviewed Feb 25, 2022

View reviewed changes

Triton171 added 3 commits February 25, 2022 16:25

Move indentation tests to integration test folder.

09d7d5d

Improve code style in indent.rs.

9740d41

Reuse tree-sitter cursors for indentation queries.

Merge branch 'master' into indentation-rework

91d0a75

Triton171 force-pushed the indentation-rework branch from 738ea7a to 91d0a75 Compare February 25, 2022 16:12

Migrate HCL indent query

923ee07

archseer mentioned this pull request Feb 27, 2022

Unexpected indentation when editing nix/lua files. #1712

Closed

the-mikedavis reviewed Feb 28, 2022

View reviewed changes

archseer reviewed Mar 2, 2022

View reviewed changes

Triton171 added 4 commits March 11, 2022 12:01

Merge branch 'master' into indentation-rework

d5f4368

Replace custom loading in indent tests with a designated languages.toml

1504393

Update indent query file name for --health command.

6321937

Fix single-space formatting in indent queries.

ae35bc4

Add explanation for unwrapping.

3055f63

This was linked to issues Mar 30, 2022

Golang indentation broken for type declarations and switch statements #1523

Closed

Improve treesitter indentation #1440

Closed

archseer approved these changes Mar 30, 2022

View reviewed changes

archseer merged commit 58758fe into helix-editor:master Mar 30, 2022

the-mikedavis mentioned this pull request Apr 6, 2022

gdscript support #1985

Merged

pokey mentioned this pull request May 20, 2022

Use tree-sitter queries for smart indentation cursorless-dev/cursorless#685

Open

lambdadog mentioned this pull request Jul 6, 2022

1.0 Checklist lambdadog/rescript-tree-sitter-mode#1

Open

10 tasks

the-mikedavis mentioned this pull request Aug 8, 2022

auto-indentation issue: extra tabs for some reason (unable to reproduce) #3358

Closed

kirawi mentioned this pull request Dec 22, 2022

Feature Request: Support stack-graphs to provide code nav in the absence of LSP, building on tree-sitter #1252

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Indentation rework #1562

Indentation rework #1562

Triton171 commented Jan 22, 2022 •

edited

Loading

sudormrfbin commented Jan 23, 2022

Triton171 commented Jan 23, 2022

pickfire Jan 24, 2022 •

edited

Loading

Triton171 Jan 24, 2022

Triton171 commented Jan 29, 2022

pickfire commented Jan 30, 2022

pickfire commented Jan 31, 2022

archseer Feb 8, 2022

Triton171 Feb 8, 2022

Triton171 commented Feb 10, 2022

archseer commented Feb 10, 2022

Triton171 commented Feb 17, 2022 •

edited

Loading

archseer commented Feb 23, 2022

archseer commented Feb 23, 2022

archseer Feb 25, 2022

archseer Feb 25, 2022

Triton171 Feb 25, 2022

archseer Feb 25, 2022

Triton171 Feb 25, 2022

the-mikedavis left a comment

the-mikedavis Feb 28, 2022

Triton171 Mar 11, 2022

archseer Mar 2, 2022

archseer Mar 2, 2022

archseer Mar 2, 2022

archseer Mar 2, 2022

archseer commented Mar 4, 2022

Triton171 commented Mar 4, 2022

max-privatevoid commented Mar 5, 2022

Triton171 commented Mar 11, 2022

archseer left a comment

- "}"
- "]"
- ")"
+  "}"
+  "]"
+  ")"

Indentation rework #1562

Indentation rework #1562

Conversation

Triton171 commented Jan 22, 2022 • edited Loading

sudormrfbin commented Jan 23, 2022

Triton171 commented Jan 23, 2022

pickfire Jan 24, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Triton171 commented Jan 29, 2022

pickfire commented Jan 30, 2022

pickfire commented Jan 31, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Triton171 commented Feb 10, 2022

archseer commented Feb 10, 2022

Triton171 commented Feb 17, 2022 • edited Loading

archseer commented Feb 23, 2022

archseer commented Feb 23, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

the-mikedavis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

archseer commented Mar 4, 2022

Triton171 commented Mar 4, 2022

max-privatevoid commented Mar 5, 2022

Triton171 commented Mar 11, 2022

archseer left a comment

Choose a reason for hiding this comment

Triton171 commented Jan 22, 2022 •

edited

Loading

pickfire Jan 24, 2022 •

edited

Loading

Triton171 commented Feb 17, 2022 •

edited

Loading