Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore whether 'smart code search' (AST, stack graphs, scope graphs) would improve performance beyond the current find/grep based approach. #38

Closed
0xdevalias opened this issue Apr 4, 2024 · 10 comments
Labels
✨ enhancement New feature or request ▼ prio: low Low priority

Comments

@0xdevalias
Copy link
Contributor

Looking at the current 'search' command that the agent has access to, it seems it just uses grep for searches within a file, and find for searches by filename:

# Use grep to directly get the desired formatted output
local matches=$(grep -nH "$search_term" "$file")
# Check if no matches were found
if [ -z "$matches" ]; then
echo "No matches found for \"$search_term\" in $file"
return
fi

local matches=$(find "$dir" -type f -name "$file_name")
# if no matches, return
if [ -z "$matches" ]; then
echo "No matches found for \"$file_name\" in $dir"
return
fi

Then the are the other tools that allow files to be opened/viewed/edited/etc:

https://github.com/princeton-nlp/SWE-agent/blob/main/config/commands/defaults.sh

https://github.com/princeton-nlp/SWE-agent/blob/main/config/commands/edit_linting.sh#L49-L68

It would be interesting to see whether giving some more powerful AST / 'smart' code based search functionality would improve the performance of the agent, particularly on larger/more complex codebases; or tasks that require more complex implementations.

One direction that might be interesting to explore with regards to this is 'stack graphs' / 'scope graphs'. I've included a bunch of references/resources I was exploring today below:

Stack Graphs (an evolution of Scope Graphs) sound like they could be really interesting/useful with regards to code navigation, symbol mapping, etc. Perhaps we could use them for module identification, or variable/function identifier naming stabilisation or similar?

  • https://github.blog/changelog/2024-03-14-precise-code-navigation-for-typescript-projects/
    • Precise code navigation is now available for all TypeScript repositories.
      Precise code navigation gives more accurate results by only considering the set of classes, functions, and imported definitions that are visible at a given point in your code.

      Precise code navigation is powered by the stack graphs framework.
      You can read about how we use stack graphs for code navigation and visit the stack graphs definition for TypeScript to learn more.

      • https://github.blog/2021-12-09-introducing-stack-graphs/
        • Introducing stack graphs

        • Precise code navigation is powered by stack graphs, a new open source framework we’ve created that lets you define the name binding rules for a programming language using a declarative, domain-specific language (DSL). With stack graphs, we can generate code navigation data for a repository without requiring any configuration from the repository owner, and without tapping into a build process or other CI job.

        • LOTS of interesting stuff in this post..
        • As part of developing stack graphs, we’ve added a new graph construction language to Tree-sitter, which lets you construct arbitrary graph structures (including but not limited to stack graphs) from parsed CSTs. You use stanzas to define the gadget of graph nodes and edges that should be created for each occurrence of a Tree-sitter query, and how the newly created nodes and edges should connect to graph content that you’ve already created elsewhere.

        • Why aren’t we using the Language Server Protocol (LSP) or Language Server Index Format (LSIF)?

          To dig even deeper and learn more, I encourage you to check out my Strange Loop talk and the stack-graphs crate: our open source Rust implementation of these ideas.

  • https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github
    • GitHub has developed two code navigation approaches based on the open source tree-sitter and stack-graphs library:

      • Search-based - searches all definitions and references across a repository to find entities with a given name
      • Precise - resolves definitions and references based on the set of classes, functions, and imported definitions at a given point in your code

      To learn more about these approaches, see "Precise and search-based navigation."

      • https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation
        • Precise and search-based navigation
          Certain languages supported by GitHub have access to precise code navigation, which uses an algorithm (based on the open source stack-graphs library) that resolves definitions and references based on the set of classes, functions, and imported definitions that are visible at any given point in your code. Other languages use search-based code navigation, which searches all definitions and references across a repository to find entities with a given name. Both strategies are effective at finding results and both make sure to avoid inappropriate results such as comments, but precise code navigation can give more accurate results, especially when a repository contains multiple methods or functions with the same name.

  • https://pl.ewi.tudelft.nl/research/projects/scope-graphs/
    • Scope Graphs | A Theory of Name Resolution

    • Scope graphs provide a new approach to defining the name binding rules of programming languages. A scope graph represents the name binding facts of a program using the basic concepts of declarations and reference associated with scopes that are connected by edges. Name resolution is defined by searching for paths from references to declarations in a scope graph. Scope graph diagrams provide an illuminating visual notation for explaining the bindings in programs.

Originally posted by @0xdevalias in 0xdevalias/chatgpt-source-watch#11

@josem7
Copy link

josem7 commented Apr 4, 2024

Hey, this could be a very interesting enhancement. At Blar we are trying to create a traversable graph for agents to use similar to what you suggested. I will be trying to integrate our solution in the following days and get results to see if there is any improvement :)

You can check our repo here we open-sourced the project and are currently looking for feedback and use cases.

Here's a small demo explaining what we are trying to achieve.

Sorry for the self-promotion :)

@0xdevalias
Copy link
Contributor Author

At Blar we are trying to create a traversable graph for agents to use similar to what you suggested. I will be trying to integrate our solution in the following days and get results to see if there is any improvement :)

@josem7 It would definitely be interesting to hear how the integration goes, and specifically if it does improve things.

I watched your demo video + had a quick skim through your repo, and I like the direction you're exploring; definitely sounds like it could be a valuable/useful one.

Some of the libs you use may be useful in this project as well, of particular note (off the top of my head):

@ofirpress
Copy link
Member

Due to the complexity of implementing and benchmarking this idea (to verify it actually works) this will not be a priority in the near-term, but if anyone implements and shows us numbers on the dev set we would be open to considering it.

@smith-co
Copy link

smith-co commented Apr 5, 2024

Does not look like its an important functionality, I would recommend to close it.

@code2graph
Copy link

This would just add complexity for no good reason. And in the broader perspective, would not be sufficiently helpful. I think this issue should be closed.

@0xdevalias
Copy link
Contributor Author

0xdevalias commented Apr 5, 2024

Due to the complexity of implementing and benchmarking this idea (to verify it actually works) this will not be a priority in the near-term

@ofirpress Totally valid/understandable :)

Does not look like its an important functionality

@smith-co Curious what your reasoning/basis for that conclusion is..?

This would just add complexity for no good reason. And in the broader perspective, would not be sufficiently helpful.

@code2graph Curious what your reasoning/basis for that conclusion is..?


This seems to be aligned to how some other agents have chosen to go, eg.

Where they saw it as an improvement on their older method:

I understand it not being a current priority; but to discount the concept entirely (particularly without reasoning beyond seemingly personal opinion) seems counterintuitive to getting the best agent/outcome here.

@josem7
Copy link

josem7 commented Apr 5, 2024

Hey, thanks for the feedback @0xdevalias!

Yesterday I played around with integrating a graph-based search into the agent (using Blar). The agent managed to use the tool in a good way easily jumping between code to get to the root of the problem. I still need to tweak it a bit so the tool closely resembles the other tools and doesn't affect the workflow as much. After these tweaks, I'll try to run a small-scale benchmark and see if the results look promising.

I'm also curious about your point of view @smith-co and @code2graph, it's true that it adds a bit more complexity but it's not much compared to the possible benefits. In my point of view, it's giving the agent the ability to jump between codes in a more precise manner, similar to using CTRL+click on Vscode to find the function being called.

For me the benefits are 2:

  1. Better precision when searching for surrounding code, maybe a function is called the same in different files, this way you make sure to access the correct function.
  2. Better context/cost management, instead of skimming through files and functions to find what you need you go directly to the function you are looking for

This comes at the cost of having to create the graph, our current approach is a bit inefficient as it's currently more in a proof-of-concept phase but we'll work on improving it with time. Also currently we use Neo4j as our "search and DB engine", in the future we are looking at ways to save this data in memory for a faster implementation. Let me know what you guys think.

I'm always open to receiving feedback and discussing :)

PD: What do you guys think of instead of jumping between individual functions, we provide all surrounding functions as context?

@0xdevalias
Copy link
Contributor Author

Yesterday I played around with integrating a graph-based search into the agent (using Blar). The agent managed to use the tool in a good way easily jumping between code to get to the root of the problem. I still need to tweak it a bit so the tool closely resembles the other tools and doesn't affect the workflow as much. After these tweaks, I'll try to run a small-scale benchmark and see if the results look promising.

@josem7 Nice! Keen to hear how the tweaked version goes on the benchmark!


In my point of view, it's giving the agent the ability to jump between codes in a more precise manner, similar to using CTRL+click on Vscode to find the function being called.

This aligns with my view as well. The way I see it, it brings the way the agent is accessing the files even closer to how a real dev would do it in a modern IDE/dev environment; like a more precise/contextually relevant version of string matching in files (and the edge case issues that can bring up)


What do you guys think of instead of jumping between individual functions, we provide all surrounding functions as context?

@josem7 The full code of the surrounding functions? Or just like a summary of the function signatures/similar? And are you defining 'surrounding functions' in terms of literal adjacency within the same code file, or in more of a 'related functions' way (eg. if the function I want is foo(), and that calls bar1()/bar2(), providing all 3 of those to the context)?

I don't have a specific answer here, but I guess my questions/potential concerns would be how much extra 'noise' that might add to the context; and also how that would align to/differ from the current 'scroll through file' approach.

@klieret klieret added the ▼ prio: low Low priority label Apr 10, 2024
@ofirpress
Copy link
Member

Closing this for now, if anyone works on this and has numbers on SWE-bench and it ends up improving performance, we would love to integrate this into main

@0xdevalias
Copy link
Contributor Author

0xdevalias commented May 30, 2024

This seems to be aligned to how some other agents have chosen to go, eg.

Where they saw it as an improvement on their older method:

I understand it not being a current priority; but to discount the concept entirely (particularly without reasoning beyond seemingly personal opinion) seems counterintuitive to getting the best agent/outcome here.

Further to this, aider just set a new SOTA and topped the SWE-bench lite leaderboard with 26.3%. While all of that performance gain can't be attributed to just their smart code search/repo map'; I would happily bet that it helped it achieve it:

It will be interesting to see if they end up exploring stack graphs directly, and if that improves their performance further again:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✨ enhancement New feature or request ▼ prio: low Low priority
Projects
None yet
Development

No branches or pull requests

6 participants