Explore whether 'smart code search' (AST, stack graphs, scope graphs) would improve performance beyond the current find/grep based approach. #38
Comments
Hey, this could be a very interesting enhancement. At Blar we are trying to create a traversable graph for agents to use, similar to what you suggested. I will try to integrate our solution over the next few days and get results to see if there is any improvement :) You can check our repo here; we open-sourced the project and are currently looking for feedback and use cases. Here's a small demo explaining what we are trying to achieve. Sorry for the self-promotion :) |
@josem7 It would definitely be interesting to hear how the integration goes, and specifically if it does improve things. I watched your demo video + had a quick skim through your repo, and I like the direction you're exploring; definitely sounds like it could be a valuable/useful one. Some of the libs you use may be useful in this project as well, of particular note (off the top of my head):
|
Due to the complexity of implementing and benchmarking this idea (to verify it actually works) this will not be a priority in the near-term, but if anyone implements and shows us numbers on the dev set we would be open to considering it. |
This does not look like important functionality; I would recommend closing it. |
This would just add complexity for no good reason. And in the broader perspective, would not be sufficiently helpful. I think this issue should be closed. |
@ofirpress Totally valid/understandable :)
@smith-co Curious what your reasoning/basis for that conclusion is..?
@code2graph Curious what your reasoning/basis for that conclusion is..? This seems to be aligned to how some other agents have chosen to go, eg.
Where they saw it as an improvement on their older method:
I understand it not being a current priority; but to discount the concept entirely (particularly without reasoning beyond seemingly personal opinion) seems counterintuitive to getting the best agent/outcome here. |
Hey, thanks for the feedback @0xdevalias! Yesterday I played around with integrating a graph-based search into the agent (using Blar). The agent managed to use the tool well, easily jumping between pieces of code to get to the root of the problem. I still need to tweak it a bit so the tool more closely resembles the other tools and doesn't affect the workflow as much. After these tweaks, I'll try to run a small-scale benchmark and see if the results look promising. I'm also curious about your point of view @smith-co and @code2graph. It's true that it adds a bit more complexity, but not much compared to the possible benefits. From my point of view, it gives the agent the ability to jump between pieces of code in a more precise manner, similar to using CTRL+click in VS Code to find the function being called. For me there are two benefits:
This comes at the cost of having to create the graph. Our current approach is a bit inefficient, as it's still in more of a proof-of-concept phase, but we'll work on improving it with time. Also, we currently use Neo4j as our "search and DB engine"; in the future we are looking at ways to keep this data in memory for a faster implementation. Let me know what you guys think. I'm always open to receiving feedback and discussing :) PS: What do you guys think of providing all surrounding functions as context, instead of jumping between individual functions? |
@josem7 Nice! Keen to hear how the tweaked version goes on the benchmark!
This aligns with my view as well. The way I see it, it brings the way the agent is accessing the files even closer to how a real dev would do it in a modern IDE/dev environment; like a more precise/contextually relevant version of string matching in files (and the edge case issues that can bring up)
@josem7 The full code of the surrounding functions? Or just like a summary of the function signatures/similar? And are you defining 'surrounding functions' in terms of literal adjacency within the same code file, or in more of a 'related functions' way (eg. if the function I want is foo(), and that calls bar1()/bar2(), providing all 3 of those to the context)? I don't have a specific answer here, but I guess my questions/potential concerns would be how much extra 'noise' that might add to the context; and also how that would align to/differ from the current 'scroll through file' approach. |
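To make the 'related functions' reading of that question concrete, here is a minimal sketch using only Python's standard `ast` module. It is not how Blar or SWE-agent does it; `related_functions` and the example module are hypothetical names made up for illustration, and it only follows direct calls to top-level functions in the same file.

```python
import ast


def related_functions(source: str, target: str) -> dict[str, str]:
    """Return the source of `target` plus the top-level functions it calls directly.

    Hypothetical helper: only same-file, top-level, direct-by-name calls are followed.
    """
    tree = ast.parse(source)
    # Index the top-level function definitions by name.
    defs = {
        node.name: node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    if target not in defs:
        return {}

    # Collect the names called inside the target that are also defined in this file.
    callees = set()
    for node in ast.walk(defs[target]):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in defs and node.func.id != target:
                callees.add(node.func.id)

    # Target first, then its callees in alphabetical order.
    names = [target, *sorted(callees)]
    return {name: ast.get_source_segment(source, defs[name]) for name in names}


EXAMPLE = '''
def bar1(x):
    return x + 1

def bar2(x):
    return x * 2

def unrelated():
    return 0

def foo(x):
    return bar1(x) + bar2(x)
'''

context = related_functions(EXAMPLE, "foo")
print(list(context))  # ['foo', 'bar1', 'bar2']
```

Here `unrelated()` is left out even though it sits right next to `foo()` in the file, which is the difference from a literal-adjacency definition. The noise concern still applies: returning full bodies for every callee grows the context quickly, so a signatures-only variant may be the safer default.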
Closing this for now. If anyone works on this, has numbers on SWE-bench, and it ends up improving performance, we would love to integrate it into main. |
Further to this,
It will be interesting to see if they end up exploring stack graphs directly, and if that improves their performance further again: |
Looking at the current 'search' command that the agent has access to, it seems it just uses grep for searches within a file, and find for searches by filename:
SWE-agent/config/commands/search.sh
Lines 89 to 95 in 7903883
SWE-agent/config/commands/search.sh
Lines 145 to 150 in 7903883
Then there are the other tools that allow files to be opened/viewed/edited/etc:
https://github.com/princeton-nlp/SWE-agent/blob/main/config/commands/defaults.sh
https://github.com/princeton-nlp/SWE-agent/blob/main/config/commands/edit_linting.sh#L49-L68
It would be interesting to see whether giving some more powerful AST / 'smart' code based search functionality would improve the performance of the agent, particularly on larger/more complex codebases; or tasks that require more complex implementations.
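As a rough illustration of the gap between the two approaches (not a proposal for the actual tool), here is a small sketch using Python's standard `ast` module. The function names and the sample source are hypothetical; a real implementation would need to cover more languages than Python, which is where tree-sitter or stack graphs would come in.

```python
import ast

# Hypothetical sample file; the leading newline makes `def parse_config` line 2.
SOURCE = '''
def parse_config(path):
    return open(path).read()

def main():
    # parse_config is mentioned in this comment
    cfg = parse_config("settings.ini")
    print("parse_config done")
    return cfg
'''


def grep_lines(source: str, needle: str) -> list[int]:
    """Plain substring search, roughly what grep does: 1-based matching line numbers."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def find_definitions(source: str, name: str) -> list[int]:
    """Line numbers where a function or class called `name` is defined."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, kinds) and node.name == name
    )


def find_calls(source: str, name: str) -> list[int]:
    """Line numbers where `name` is called, as `name(...)` or `obj.name(...)`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if called == name:
                hits.append(node.lineno)
    return sorted(hits)


print(grep_lines(SOURCE, "parse_config"))        # [2, 6, 7, 8]
print(find_definitions(SOURCE, "parse_config"))  # [2]
print(find_calls(SOURCE, "parse_config"))        # [7]
```

The substring search also returns the comment and the string literal, while the AST queries separate "where is this defined" from "where is this called". Whether that extra precision actually helps the agent is exactly what would need benchmarking.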
One direction that might be interesting to explore with regards to this is 'stack graphs' / 'scope graphs'. I've included a bunch of references/resources I was exploring today below: