[WIP] Expanding Fixup Ranges by doing intelligent selection #1083

arafatkatze · 2023-09-16T22:15:05Z

Related Issues -> #585 and #223

Problem Summary

Currently, a user's selection does not always cover the entire range that Cody needs to change to satisfy a fixup request. Because we limit changes to the region the user selected, several problems arise:

1. Empty selections don't work (shown by @sqs).
2. Variable renaming within a function misses occurrences of the original variable name that fall outside the selection.
3. Bad selection boundaries cause indentation issues when the changes are applied, because the diff function doesn't work perfectly in some edge cases.
4. Refactors: if you select only the return value, any refactor misses other parts of the function where changes need to be applied.
5. Imperfect selection boundaries sometimes lead to repetition of code outside the selection range, and sometimes no output is generated at all.
6. Hard mode problem: it's hard to do a mega refactor where the change request is on a small selection but requires edits to multiple files and other places in the codebase so that everything works together.

My solution

I wanted to solve this problem and explored a few different ways of doing so.

Without LLMs, there are three options:

TreeSitter: TreeSitter has a different parser for each language, and you could probably find function start and end positions with it, but I feel it would be too inconvenient to write and maintain generic code for all these cases. It would also mean importing new libraries. I saw some tree-sitter code in issues but didn't explore it enough to consider this option seriously.
LanguageServerProtocol: I'm pretty sure a similar problem would persist with the Language Server Protocol, though I did not explore it deeply.
Codegraph tools from Sourcegraph: based on my searches, I could not find any functions that I could use to expand the range of my code selection. If something trivial and easy to use exists, feel free to let me know, but I couldn't find anything.

With LLMs:
The simplest solution was to expand the text selection with its prefix and suffix, then find the start and end line numbers of all the functions in that window (this can be extended to other cases such as classes and interfaces). For now I wanted to focus on finding a function within a ±50-line range. That's the algorithm I implemented.

I am attempting to solve problems 1-5 above by using an LLM to select the closest approximation to a full function for any selection. Sometimes the LLM results aren't perfect, and in that case the code automatically falls back to the original selection without any issues. This is a good first step toward automatic range selection.
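The window-and-fallback idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `LineRange`, `contextWindow`, `numberLines`, and `parseExpandedRange` are hypothetical, and the `<range>start-end</range>` reply format is an assumption about how the model might be asked to answer. The LLM call itself is left out.

```typescript
// Hypothetical types and helpers for illustration; none of these names come from the Cody codebase.
interface LineRange {
    start: number // zero-based line number, inclusive
    end: number // zero-based line number, inclusive
}

const CONTEXT_LINES = 50

/** Widen the selection by up to CONTEXT_LINES on each side, clamped to the document. */
function contextWindow(selection: LineRange, lineCount: number): LineRange {
    return {
        start: Math.max(0, selection.start - CONTEXT_LINES),
        end: Math.min(lineCount - 1, selection.end + CONTEXT_LINES),
    }
}

/** Prefix each line in the window with its line number so the model can answer in line numbers. */
function numberLines(lines: string[], window: LineRange): string {
    return lines
        .slice(window.start, window.end + 1)
        .map((text, i) => `${window.start + i}: ${text}`)
        .join('\n')
}

/**
 * Parse a reply such as "<range>12-27</range>" (an assumed reply format).
 * Falls back to the original selection whenever the reply is missing,
 * lies outside the window, or does not cover the selection.
 */
function parseExpandedRange(reply: string, selection: LineRange, window: LineRange): LineRange {
    const match = /<range>\s*(\d+)\s*-\s*(\d+)\s*<\/range>/.exec(reply)
    if (!match) {
        return selection
    }
    const start = Number(match[1])
    const end = Number(match[2])
    const insideWindow = start >= window.start && end <= window.end
    const coversSelection = start <= selection.start && end >= selection.end
    return insideWindow && coversSelection ? { start, end } : selection
}
```

In this sketch, the text from `numberLines` would go into the prompt, and the fixup would then run on whatever `parseExpandedRange` returns, so an unusable reply never produces a worse range than the user's own selection.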

Single Variable Selection

Before

Original.Jaccard.mp4

After

JaccardMatch.mp4

Partial Selection of Two functions

Before

original.partial.selection.mp4

After

MapSelect.mp4

NOTE: Some of this code is quite hacky and I am not fully happy with it, but I wanted to get something out to show people first. I am happy to clean up the whole thing into a proper PR (with feature flags too) if people are okay with my approach.

Potential Improvements

I could still do more prompt engineering to shorten the prompts while getting results that are just as good. This would reduce the token usage of the extra LLM call we are adding.
The extra LLM call adds latency. I could do some A/B testing between different LLMs to see which one is the fastest or cheapest, something like what @abeatrix did here.
The overwriting in the code structure looks like bad practice to me, and I am happy to switch to a cleaner route so that the code is easier to understand.
I only added this to the edit case of fixup, but I'm happy to explore other auto-selection scenarios where something like this could help. Right now I just wanted to show a bare-bones example.

Thanks to @abeatrix and @jdorfman, I was able to focus on an interesting problem to work on.

Test plan

Just try this locally and try some ideas that I mentioned in the PR description.

jdorfman · 2023-09-26T23:26:25Z

@umpox @abeatrix thoughts?

arafatkatze · 2023-09-26T23:34:01Z

@abeatrix I could add your latest changes into this to make a large-range fixup selection (including cases where two functions are selected, etc.). Happy to spin up a PR for it. Would you like that?

abeatrix · 2023-10-04T16:46:24Z

> @abeatrix I could add your latest changes into this to make a large-range fixup selection (including cases where two functions are selected, etc.). Happy to spin up a PR for it. Would you like that?

@arafatkatze Sorry about the wait! Yeah, that'd be great since it's now merged to main. I'll be happy to review your change once it's merged with the latest update, thank you!

arafatkatze · 2023-10-04T23:25:08Z

@abeatrix There you go -> #1317


Original PR -> #1083 (closed because @abeatrix added some really amazing folding range functions to resize the selection range)

Related Issues -> #585 and #223

## Problem Summary

Currently, a user's selection does not always cover the entire range that Cody needs to change to satisfy a fixup request. Because we limit changes to the region the user selected, several problems arise:

1. Empty selections don't work (shown by @sqs) <img width="814" alt="image" src="https://github.com/sourcegraph/cody/assets/11155207/115da45a-b142-4d40-94f5-baf91f2c8a64">
2. Variable renaming within a function misses occurrences of the original variable name that fall outside the selection.
3. Bad selection boundaries cause indentation issues when the changes are applied, because the diff function doesn't work perfectly in some edge cases.
4. Refactors: if you select only the return value, any refactor misses other parts of the function where changes need to be applied.
5. Imperfect selection boundaries sometimes lead to repetition of code outside the selection range, and sometimes no output is generated at all.
6. Hard mode problem: it's hard to do a mega refactor where the change request is on a small selection but requires edits to multiple files and other places in the codebase so that everything works together.

## Original solution

Originally I leveraged an LLM call to decide the folding range. That was a cool algorithm, but it had the latency of an extra LLM call. Now that @abeatrix has added some really cool folding range functions, I can just leverage them to get a better range for the selection.

## Video

### Before

https://github.com/sourcegraph/cody/assets/11155207/53d63ee7-8650-4c03-b935-dfcb64d38c01

https://github.com/sourcegraph/cody/assets/11155207/5ab30335-6c2d-4142-849d-47234d6f4100

### After

https://drive.google.com/file/d/1RVShpNGiWK4wHJW6N_tHEGOdkzSwxM6d/view?usp=sharing

## Test plan

Tested on my local machine a few times. Works perfectly on edge cases too.

---------

Co-authored-by: Dominic Cooney <dominic.cooney@gmail.com>
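To illustrate the folding-range approach mentioned under "Original solution", here is a rough sketch of how a selection might be widened without an LLM call. It is not the code merged in #1317: `FoldRange`, `LineSpan`, and `expandToFoldingRange` are hypothetical names, and picking the smallest enclosing fold is an assumption about one reasonable policy. In a VS Code extension the folds could plausibly come from the built-in `vscode.executeFoldingRangeProvider` command, whose results carry `start` and `end` line numbers; the function below takes them as plain data so that it stays self-contained.

```typescript
// Hypothetical shapes for illustration; `FoldRange` mirrors the start/end line pair of a folding range.
interface FoldRange {
    start: number // zero-based line where the fold begins
    end: number // zero-based line where the fold ends
}

interface LineSpan {
    start: number // zero-based, inclusive
    end: number // zero-based, inclusive
}

/**
 * Return the smallest folding range that fully encloses the selection.
 * If no fold encloses it, keep the selection unchanged.
 */
function expandToFoldingRange(selection: LineSpan, folds: FoldRange[]): LineSpan {
    let best: FoldRange | undefined
    for (const fold of folds) {
        const encloses = fold.start <= selection.start && fold.end >= selection.end
        if (!encloses) {
            continue
        }
        // Prefer the tightest enclosing fold so the edit stays as local as possible.
        if (!best || fold.end - fold.start < best.end - best.start) {
            best = fold
        }
    }
    return best ? { start: best.start, end: best.end } : selection
}
```

Choosing the tightest fold keeps the edit local, but it may stop at an inner block such as an `if` body. Preferring the outermost enclosing fold instead would be closer to "select the whole function"; which policy works better is a judgment call this sketch does not settle.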

arafatkatze added 8 commits September 13, 2023 15:37

Adding RangeExpander Function to do simple LLM calls

1f825f0

Added a newRange Function that makes up a new RangeUsing LLMs

9132b29

Adding changes to the fixup thing and making everything better

68a3c29

Final fixing to change the range for edits

b6b956e

Fixing comments

40633e2

Refactoring

4180a18

More Refactoring

52b3788

More Refactoring

b6c6820

abeatrix requested review from toolmantim, umpox and a team September 18, 2023 16:55

arafatkatze mentioned this pull request Sep 20, 2023

feat: new /doc command with smart selection #1116

Merged

3 tasks

jdorfman added the clients/vscode label Sep 20, 2023

This was referenced Sep 23, 2023

[WIP] Handling support for questions that can be answered by linux commands #1148

Closed

Modifying the chat prompts to respond with terminal commands if needed #1152

Closed

arafatkatze closed this Oct 4, 2023

arafatkatze mentioned this pull request Oct 4, 2023

Adding Smart Selection to FixupRecipe #1317

Merged

arafatkatze mentioned this pull request Oct 19, 2023

fix: use selectionRange in edits when available #1429

Merged
