Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support selection range lsp feature #2565

Merged
merged 27 commits into from
Jan 24, 2022
Merged

Conversation

kokobd
Copy link
Collaborator

@kokobd kokobd commented Jan 5, 2022

this will fix #212

TODO

  • use the default structure of HieAST to generate selection ranges
  • point lsp-types version to master in both cabal and stack files
  • add tests
  • optimize the behavior for some special cases?
  • Fix CI (especially for GHC versions other than 8.10.7, which I used during development)
  • update lsp-types to a new version on hackage before this is merged
  • update docs to include this plugin

cabal.project Outdated Show resolved Hide resolved
@kokobd kokobd changed the title WIP: support selection range lsp feature support selection range lsp feature Jan 10, 2022
@kokobd
Copy link
Collaborator Author

kokobd commented Jan 10, 2022

Currently the selection range is generated directly from the structure of HieAST, without considering what the ast node actually is. There are some bad cases:

  1. When selecting a whole import statement, expanding the selection now won't select the whole area of all imports. Instead, the whole module (excluding the module header) is selected.
  2. When selecting everything from a function definition (without signature), expanding the selection now won't select the function's signature and it's definition. Instead, the whole module (excluding the module header) is selected.

For case 1, there isn't a single node in ast containing all import statements. For case 2, signatures and definitions are distinct ast nodes (and actually they are not necessarily be placed nearby in source file)

If needed, special logic can be added to handle cases like this, but will incur extra complexity and compatibility issues (the actual structure of AST in GHC changes between versions?). Not sure whether to leave it as is, or handle the cases specially.

@kokobd
Copy link
Collaborator Author

kokobd commented Jan 15, 2022

Recent progress:

  • move this plugin to a standalone package
  • add a basic test suite
  • apply PositionMapping
  • add the whole import block as a step in expanding/shrinking selection

Future work:

  • add the combination of function signature and its definition as a step in expanding/shrinking selection
  • fix CI

@michaelpj
Copy link
Collaborator

add the combination of function signature and its definition as a step in expanding/shrinking selection

FWIW I think it would be fine to merge a first version without this and then add it later. Up to you when you consider that this has enough features!

I'll try and do a lsp-types release this week so that's sorted...

@kokobd kokobd marked this pull request as ready for review January 20, 2022 03:41
@kokobd
Copy link
Collaborator Author

kokobd commented Jan 20, 2022

Currenly all the planned work is done, except for bumping lsp version. Marking it ready for review and waiting for GitHub actions result now.

@jneira
Copy link
Member

jneira commented Jan 20, 2022

great work, I guess a LSP release would be useful /cc @michaelpj

a small demo exhibiting the feature would be great too

@kokobd
Copy link
Collaborator Author

kokobd commented Jan 20, 2022

A screen record:

selection_range_demo.mov

Copy link
Collaborator

@michaelpj michaelpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks pretty good, could just do with some comments where there's cleverness happening. Might be good to get another look from @wz1000 for the various HIE AST manipulations.

go x (y:ys) = if x == y then y:ys else x:y:ys

findSelectionRangesByPositions :: [SelectionRange] -> [Position] -> [SelectionRange]
findSelectionRangesByPositions selectionRanges = fmap findByPosition
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd maybe just define the un-fmaped version and fmap it at the use site, but 🤷

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function filters the [SelectionRange], preserving all SelectionRanges containing at least one of the Positions. (will add to comment later)
So, at first glance, I think there might exist better algorithm for this, at least better than the currently implemented simple traversal. That's why I keep it as a standalone function, rather than moving fmap to the caller site.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem quite right (unless I'm reading it wrong). Rather, it goes through all the positions and finds (or creates) a selection range that contain it! So it's more like pickSelectionRangeForPosition, fmaped over positions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelpj Thanks for reading it so carefully. You got it right, the previous description is not accurate. I added more detailed explaination in the comment.

The performance concern makes sense somehow. If someone selects hundreds of positions in a batch edit, for a file containing tens of thousands of leaf nodes, it might become slow quickly. And will block the user's editing.

It is probably too early to optimize for things like that, for the very first version of this plugin. But at least for the possible optimizations I spotted during writing code, I would like to leave proper space for them. i.e. making the related code grouped together.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yes, I hadn't even thought about performance. In theory this is O(n*m) for the two input lists, but TBH I agree that they're almost certainly small and it doesn't matter.

@kokobd
Copy link
Collaborator Author

kokobd commented Jan 22, 2022

Recent update:

  • upgrade lsp-types to 1.4.0.1 (that version contains a bugfix)
  • add some comments to the code
  • return ResponseError incase of failures, instead of simply returning an empty list previously
  • list the plugin in Sphinx doc

Maybe it's ready for another review, to check if things are made right.

Copy link
Collaborator

@michaelpj michaelpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of nits, otherwise LGTM. Would still be nice to get a review from someone who knows about the HIE file stuff, but let's not block it on that.

go x (y:ys) = if x == y then y:ys else x:y:ys

findSelectionRangesByPositions :: [SelectionRange] -> [Position] -> [SelectionRange]
findSelectionRangesByPositions selectionRanges = fmap findByPosition
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yes, I hadn't even thought about performance. In theory this is O(n*m) for the two input lists, but TBH I agree that they're almost certainly small and it doesn't matter.

@kokobd
Copy link
Collaborator Author

kokobd commented Jan 22, 2022

From a new contributor's perspective, before reaching this point, most of the time was spent on figuring out how to use HieAST a. I implemented the ast manipulating functions through trial and error, so they may have many other bad cases not discovered. Hope it will have better docs in the future.

@jneira
Copy link
Member

jneira commented Jan 22, 2022

From a new contributor's perspective, before reaching this point, most of the time was spent on figuring out how to use HieAST a. I implemented the ast manipulating functions through trial and error, so they may have many other bad cases not discovered. Hope it will have better docs in the future.

Many thanks for the observation, that is a ghc type, right? i wonder what could we do here to improve the knowledge about until ghc get better docs

Copy link
Member

@jneira jneira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, many thanks for including anew one whole feature!

@kokobd
Copy link
Collaborator Author

kokobd commented Jan 23, 2022

Many thanks for the observation, that is a ghc type, right? i wonder what could we do here to improve the knowledge about until ghc get better docs

Oh, it lives in package ghc, but its module name HieTypes seems to suggest this is for the sake of haskell ide engine (the previous name of hls?), so I thought it is in the scope of haskell language server. (correct me if I'm wrong)

This is another topic though, but I think it would be helpful to have these information somewhere (or maybe it exists, but not found by me):

  • Difference between GetParsedModule and GetHieAST. Which one to pick up?
  • The possible values of those FastStrings in HieAST's annotations. (later I knew this string is generated from ghc's actual ast type, like LHsDecl)
  • The mysterious type 'a' in HieAST a. (Type & TypeIndex)

@kokobd
Copy link
Collaborator Author

kokobd commented Jan 23, 2022

CI is green now, maybe it's ready for a merge?

@pepeiborra
Copy link
Collaborator

pepeiborra commented Jan 23, 2022

Many thanks for the observation, that is a ghc type, right? i wonder what could we do here to improve the knowledge about until ghc get better docs

Oh, it lives in package ghc, but its module name HieTypes seems to suggest this is for the sake of haskell ide engine (the previous name of hls?), so I thought it is in the scope of haskell language server. (correct me if I'm wrong)

Ping @wz1000 for this question

  • Difference between GetParsedModule and GetHieAST. Which one to pick up?

That's an interesting question. Definitely prefer GetHieAST when possible:

  • HieAst is designed to replace the need for the parsed AST in Haskell tooling,
  • the HieAst type is less likely to change across GHC versions.
  • useE GetHieAst will return a value even before the GHC session has been setup at initialisation, whereas GetParsedModule needs to wait until the cradle has been setup and the GHC session is available
  • The possible values of those FastStrings in HieAST's annotations. (later I knew this string is generated from ghc's actual ast type, like LHsDecl)
  • The mysterious type 'a' in HieAST a. (Type & TypeIndex)

Ping @wz1000 for those questions.

Copy link
Collaborator

@pepeiborra pepeiborra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more comments

Comment on lines 131 to 135
Performance Tips:
Doing a linear search from the first selection range for each position is not optimal.
If it becomes too slow for a large file and many positions, you may optimize the implementation.
At least we don't need to search from the very beginning for each position, if the are sorted firstly.
Or maybe we could apply some techniques like binary search?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, do consider at least benchmarking this in a follow-up PR. We do care about performance! Unfortunately the benchmark suite doesn't currently support plugins (it's a benchmark suite for ghcide, not HLS).

I wonder if binary search is really the solution here. I suspect HieAst is isomorphic to a position indexed search tree, so the optimal approach would take advantage of this to find the solution without building a list of all selection ranges.

Copy link
Collaborator Author

@kokobd kokobd Jan 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's another optimization I didn't come up with. So if I get it correctly, there are two known optimizations now. (let n represents the number of selection ranges, m represents the number of positions)

  • For each position, we may treat HieAST as a position indexed tree to search it in O(log(n)).
  • For all positions, a searched position will narrow the search range for other positions. If we search the middle position first, then all positions before this position may only need to search in half of the tree (if both the tree and positions are balanced)

If this looks right, I will update the comment to reflect these thoughts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, at the moment (I think) you build all the possible selection ranges in the document, and then filter them by position, when you could fuse that into one pass and only build the paths which contain the target position or something.

Copy link
Collaborator Author

@kokobd kokobd Jan 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth noting that, as selection ranges have shared parents, building up the list is O(n), where n is the number of nodes in the original AST. astPathsLeafToRoot basically clone the tree and make every pointer reversed. And functions in ASTPreProcess need to traverse through the AST, they are also O(n). So maybe this is not the only performance bottleneck, and we will need furthur benchmarks to make a conclusion.

Copy link
Collaborator

@eddiemundo eddiemundo Jan 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it matters much but I think laziness saves a 1/2 constant factor on average for building the root to leaf paths, if we have few positions. I think this holds true for the preprocessing phase too even with the reverse foldl' but not certain.

So if the preprocessing phase is lazy enough, then I think if we needed to upgrade the algorithm applying it to only astPathsLeafToRoot might suffice.

@kokobd
Copy link
Collaborator Author

kokobd commented Jan 24, 2022

Added another issue #2632 to track the performance optimization

@pepeiborra pepeiborra added the merge me Label to trigger pull request merge label Jan 24, 2022
@kokobd
Copy link
Collaborator Author

kokobd commented Jan 24, 2022

It seems that some recent changes to nix build files in master branch break this. Looking into that.

@mergify mergify bot merged commit 310e6a4 into haskell:master Jan 24, 2022
@michaelpj
Copy link
Collaborator

Thanks again!

@kokobd kokobd deleted the feat/selection-range branch January 25, 2022 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge me Label to trigger pull request merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement selection range request lsp feature
6 participants