Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Expanding Fixup Ranges by doing intelligent selection #1083

Closed
wants to merge 8 commits into from

Conversation

arafatkatze
Copy link
Contributor

@arafatkatze arafatkatze commented Sep 16, 2023

Related Issues -> #585 and #223

Problem Summary

Currently the selection made by users is not always fully inclusive of the entire range where changes might need to be made by Cody to satisfy the request of fixup by the user. Because we're limiting the changes to only the region that was selected by the user, that creates quite a few problems:

  1. Empty Selections don't work( shown by @sqs )
image
  1. Variable renaming within a function would miss some parts of the function with the original variable name that wasn't selected
  2. Bad selection boundaries cause issues with indentation with the changes are applied coz the diffs function doesn't work perfectly in some edge cases
  3. Refactors -> If you only selected the return value then any refactor would miss other parts of the function where changes need to be applied
  4. Imperfect selection boundaries sometimes lead to repetition of code outside the selection range and sometimes it doesn't generate outputs
  5. Hard mode Problem: Its hard to do a mega refactor where change request is on a small selection but that edits multiple files and other places in the codebase to make it all work together in synergy.

My solution

I wanted to solve this problem and explore quite a few different ways to be able to solve it.

If we were to not use LLMs then there's three options.

  1. TreeSitter: For TreeSitter, it has different parsers for different languages, and you can probably do the function start, function ending thing with TreeSitter, but I feel that it might be too inconvenient to write and handle a generic code for all these different cases. And then you would also have to import new libraries and stuff. I saw some tree sitter code in issues etc but didn't explore it enough to consider exploring this option.
  2. LanguageServerProtocol: even in the LanguageServerProtocol, I'm pretty sure a similar problem is going to persist. I did not explore it deeply enough.
  3. Using some special kind of codegraph tools from SourceGraph, and based on my searches, I was not able to find any special functions that I could potentially use to be able to expand the the range of my code selection. Probably there is something, and if there is something out there that’s trivial and relatively easier to use, feel free to let me know, but I couldn't find anything in my searches

If we were to use LLms:
The simplest solution was the expand the text selection with prefix suffix and then find all the line numbers of starting/ending of all the functions etc(We can expand this to other cases too like classes interface etc) but for now I wanted to just focus on finding a function within the +-50 lines range. That’s the algorithm I did right now

I am attempting to solve problems 1-5 shown above in the best way that I can by doing an LLM based selection of the closest approximation to a full function for any selection. Sometimes the llm results aren't perfect and in that case this automatically falls back to the original selection without any issues. This is a good first step to solve the problem of automatic range selection.

Single Variable Selection

Before

Original.Jaccard.mp4

After

JaccardMatch.mp4

Partial Selection of Two functions

Before

original.partial.selection.mp4

After

MapSelect.mp4

NOTE: Some code in this is quite hacky and I am not fully happy about it but I wanted to get something out to show something to people first and then I am happy to cleanup the whole thing to make a proper PR(with feature flags too) if people are okay with my approach.

Potential Improvements

  • I could still do some more prompt engineering to reduce the length of the prompts that I am using and still get results which are just as good as they are right now. This could reduce the token usage of the extra LLM call we are adding
  • The extra LLM calls adds a layer of latency I could do some A/B Testing between different LLMs to see which one would be the fastest/cheapest. Something like @abeatrix did here
  • The overwriting in the code structure looks like a bad practice to me and I am happy to fix that to use a much cleaner route so that this code is much more understandable.
  • I only added this to the edit case of fixup but happy to explore if there are perhaps other scenarios of auto selection where something like this can be helpful. Right now I just wanted to show a bare bones example.

Thanks to @abeatrix @jdorfman I was able to focus on an interesting problem for me to work on.

Test plan

Just try this locally and try some ideas that I mentioned in the PR description.

@jdorfman
Copy link
Member

@umpox @abeatrix thoughts?

@arafatkatze
Copy link
Contributor Author

arafatkatze commented Sep 26, 2023

@abeatrix i could add your latest changes into this to make a large range fix up selection(including cases where two functions are selected etc). Happy to spin up a pr for it. Would you like that?

@abeatrix
Copy link
Contributor

abeatrix commented Oct 4, 2023

@abeatrix i could add your latest changes into this to make a large range fix up selection(including cases where two functions are selected etc). Happy to spin up a pr for it. Would you like that?

@arafatkatze Sorry about the wait! Yea that'd be great since it's now merged to main, so I'll be happy to review your change once it's merged with the latest update, thank you!

@arafatkatze
Copy link
Contributor Author

@abeatrix There you go -> #1317

dominiccooney added a commit that referenced this pull request Oct 16, 2023
Original PR -> #1083 (Closed
because @abeatrix added some really amazing folding range functions to
get the resize the selection range)

Related Issues -> #585 and
#223

## Problem Summary
Currently the selection made by users is not always fully inclusive of
the entire range where changes might need to be made by Cody to satisfy
the request of fixup by the user. Because we're limiting the changes to
only the region that was selected by the user, that creates quite a few
problems:
1. Empty Selections don't work( shown by  @sqs ) 
<img width="814" alt="image"
src="https://github.com/sourcegraph/cody/assets/11155207/115da45a-b142-4d40-94f5-baf91f2c8a64">

2. Variable renaming within a function would miss some parts of the
function with the original variable name that wasn't selected
4. Bad selection boundaries cause issues with indentation with the
changes are applied coz the diffs function doesn't work perfectly in
some edge cases
5. Refactors -> If you only selected the return value then any refactor
would miss other parts of the function where changes need to be applied
6. Imperfect selection boundaries sometimes lead to repetition of code
outside the selection range and sometimes it doesn't generate outputs
7. Hard mode Problem: Its hard to do a mega refactor where change
request is on a small selection but that edits multiple files and other
places in the codebase to make it all work together in synergy.

## Original solution
Originally i leveraged an LLM call to decide the folding range and that
was a cool algorithm but it had the latency of an extra LLM call. Now
that @abeatrix added some really cool folding range functions I can just
leverage them to get a better range for the selection.

## Video

### Before


https://github.com/sourcegraph/cody/assets/11155207/53d63ee7-8650-4c03-b935-dfcb64d38c01


https://github.com/sourcegraph/cody/assets/11155207/5ab30335-6c2d-4142-849d-47234d6f4100


## After


https://drive.google.com/file/d/1RVShpNGiWK4wHJW6N_tHEGOdkzSwxM6d/view?usp=sharing


## Test plan

<!-- Required. See
https://docs.sourcegraph.com/dev/background-information/testing_principles.
-->
Tested in my local machine a few times. Works perfectly on edge cases
too.

---------

Co-authored-by: Dominic Cooney <dominic.cooney@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants