Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-GPT Recursive Self Improvement #15

Open
7 tasks
Torantulino opened this issue Apr 1, 2023 · 269 comments
Open
7 tasks

Auto-GPT Recursive Self Improvement #15

Torantulino opened this issue Apr 1, 2023 · 269 comments
Labels
enhancement New feature or request

Comments

@Torantulino
Copy link
Member

Torantulino commented Apr 1, 2023

Idea πŸ’‘

The ULTIMATE achievement for this project would be if Auto-GPT was able to recursively improve itself. That, after-all, is how AGI is predicted by many to come about.

Suggestion πŸ‘©β€πŸ’»

Auto-GPT should be able to:

  • Read it's own code
  • Evaluate it's limitations and areas for improvement
  • Write code to increase it's abilities
  • Write tests for it's code and carry out those tests

Further down the line: πŸ“ƒ

  • Browse it's own code on GitHub
  • Evaluate, find bugs, etc
  • Submit pull requests

Where to start? πŸ€”

I have previously had success with this system prompt in playground:
image

Prompt

You are AGI_Builder_GPT. Your goal is to write code and make actionable suggestions in order to improve an AI called "Auto-GPT", so as to broaden the range of tasks it's capable of carrying out.

@Torantulino Torantulino pinned this issue Apr 1, 2023
@Torantulino
Copy link
Member Author

Perhaps this is what we're looking for #11?

Anyone want to weigh in?

@Torantulino
Copy link
Member Author

Torantulino commented Apr 1, 2023

Here's what GPT4 has to say about this issue:

Based on the Github issue posted, I understand that the ultimate goal is to have Auto-GPT recursively improve itself. In order to achieve this, we can start by implementing basic features such as reading its own code, evaluating limitations and areas of improvement, writing code to increase its abilities, and testing its own code.

Here's a plan to implement these functionalities:

  1. Read its own code: You have already implemented basic file operations like read_file. Auto-GPT can use this function to read its own code.

  2. Evaluate its limitations and areas for improvement:

  • Implement a function evaluate_code that analyzes code in terms of code optimization, redundant code, and missing features.
  1. Write code to increase its abilities:
  • Implement a function improve_code that takes suggestions from the evaluate_code function and generates new code based on those suggestions.
  1. Write tests for its code and carry out those tests:
  • Implement a function write_tests that creates test cases for the existing code.
  • Implement a function run_tests that runs the generated test cases and checks if the code passes those test cases.

I will now provide an outline for implementing these functions in Auto-GPT's code:

### Evaluating code

def evaluate_code(code: str) -> List[str]:
    # Analyze the given code and return a list of suggestions for improvements
    pass


### Improving code

def improve_code(suggestions: List[str]) -> str:
    # Generate new code based on the suggestions provided
    pass


### Writing tests

def write_tests(focus: Optional[str] = None) -> str:
    # Generate test cases for the existing code, focusing on specific areas if required
    pass


### Running tests

def run_tests(test_code: str) -> str:
    # Execute the test cases and return the test results
    pass

These functions can be integrated into the execute_command function by adding new command cases for each of them:

# In execute_command()
elif command_name == "evaluate_code":
    return evaluate_code(arguments["code"])
elif command_name == "improve_code":
    return improve_code(arguments["suggestions"])
elif command_name == "write_tests":
    return write_tests(arguments.get("focus"))
elif command_name == "run_tests":
    return run_tests(arguments["test_code"])

Once these functionalities are implemented, Auto-GPT will be one step closer to recursively improving itself. With further improvements, Auto-GPT could potentially browse its own code on GitHub, evaluate it, find bugs, and submit pull requests.

@alreadydone
Copy link

alreadydone commented Apr 1, 2023

Those empty functions remind me of AI functions :)

By the way, I'm very excited to see many of my thoughts being implemenented here.

@Torantulino
Copy link
Member Author

Ah yes! That AI Functions guide you linked is exactly how I was thinking of implementing those, if I do it that way.

@Torantulino
Copy link
Member Author

Torantulino commented Apr 1, 2023

@alreadydone I love this, thanks for the suggestion!

image

@Torantulino
Copy link
Member Author

Threw together a simple implementation of this since it's such a powerful idea!

@marcelosousa
Copy link

I'm working on this problem on a separate experiment. Would love to chat if you're interested - at the moment, I'm working with genetic algorithms to understand which variant/mutant of the code is more performant - there are a lot local maximums depending on how you set it up.

@PolarBean
Copy link

This is a really cool idea! Do you think you could make the AIs logs public as it self improves? Either in the repo or elsewhere. I would be very interested in seeing how it plans and evolves.

@MrSaad
Copy link

MrSaad commented Apr 3, 2023

So this is it huh... The singularity begins in a GitHub thread

@Silthus
Copy link

Silthus commented Apr 4, 2023

How about trying to drive the self improvement by utilizing test driven development (TDD).

In a recent paper they showcased how greatly GPT-4 can improve it's results by reflecting on its own mistakes.

So the idea is to have it:

  1. start with a single test that fails
  2. make the test pass (iterate X times until successful - self reflecting on the code)
  3. optimize the code by refactoring if necessary, but not allowed to modify or fail the existing tests
  4. write the next small incremental test that moves towards the given requirements

What do you think?

This could also be used for any kind of code generation.

@ryanpeach
Copy link
Contributor

This really should take a research driven approach. We would need a metric to base the "improvement" on. I'd focus on making a framework, and then let people in their branches use this framework for research. Proven research gets merged in.

@patruff
Copy link

patruff commented Apr 4, 2023

I wrote an improvement that speeds up the bot significantly. If you write a function called alwaysNo that just returns "n" and then you use that as input, it just exits the program super fast! No money spent!

@taosx
Copy link

taosx commented Apr 4, 2023

I wrote an improvement that speeds up the bot significantly. If you write a function called alwaysNo that just returns "n" and then you use that as input, it just exits the program super fast! No money spent!

That would have saved me some money πŸ˜‚. Just kidding, auto-gpt has been very helpful to understand how to compose bigger programs, compared to langchain which confused tf out of me. Thank you Torantulino and everyone who has contributed.

@LeNowak
Copy link

LeNowak commented Apr 5, 2023

"Ok, AG is really great, but I still have no idea how to:

Give him access (with restrictions) to install new modules from GitHub.
Give him the use of APIs for applications such as Office, AutoCAD, Revit, and Inventor.
Supervise his work (kind of reporting in a file).
Give him restrictions (highly recommended restriction/config/settings file), including the use of tokens and external connections that can provide financial risk (debt)."

@rkeshwani
Copy link

I was attempting to get it to self-implement code and it seems to have issues with the ai functions, must be because I'm using gpt 3.5. It struggles to parse the response from those types of messages when evaluating code.

@waynehamadi
Copy link
Contributor

waynehamadi commented Apr 6, 2023

We'll need to run benchmarks in github action to validate it's not "loosing" capability at every pull request.
the benchmark has to use the same version of GPT every time and has to test the whole spectrum of what autogpt can do:

  • write text
  • browse the internet
  • execute commands
  • etc, etc...

The challenge is engineering these tests, because they have to give us a score that we can compare with the current version. Also it might need to be ran multiple times. Because gpt is not totally deterministic. It might cost a lot of tokens too to test the behavior(caching will be helpful here)

One idea to test browsing the internet is to create static content, a fake internet where the search results are deterministic.

Also there are things very hard to measure, like producing art for example. And even if you can measure it, we might encounter a case where a task was performed slightly better but using significantly more tokens. It might be hard to decide whether things improved.
Basically this whole github issue revolves around tests.

@pmb2
Copy link

pmb2 commented Apr 6, 2023

@Torantulino
https://github.com/sahil280114/codealpaca

@kairie1
Copy link

kairie1 commented Apr 6, 2023

i like your work Torantulino i think you should keep doing your own ideas instead of letting others decide for you cause you're a smart man i think you got this

@Basilthedog89
Copy link

Loving your work.

Can you imagine the next level to this? An environment that allows modular setup of any number of "task doers", "reviewers" and container types.

A user could basically create their own system/org chart to solve a specific type of problem

The system could even report back to the user for outside input at set intervals.

@Jordan-Mesches
Copy link

A cool extension to this idea would be having autogpt spin up an instance of its self like every couple hours, crawl all the current prs, and build a sandboxed version of itself with each new pr merged. Then it could determine either through some combination of unit tests, benchmarking and it evaluating its own code quality, whether this pr was anything beneficial. This could unclog the massive amounts of prs being made and hopefully only let the good ideas shine through. Some problems I see though are people trying to inject malicious code, however, if adequately sandboxed this may not be an issue.

@marcelosousa
Copy link

marcelosousa commented Apr 7, 2023 via email

@Jordan-Mesches
Copy link

@marcelosousa "I think nowadays we are doing a lot with LLMs imprecisely with tools that do it much more efficiently" - you're definitely correct with that statement. However, the usage I meant was not just simply merging all prs, but having autogpt evaluate all of its current prs individually and automatically determine which ones are worth the maintainers time. And to extend that, maybe if autogpt finds a pr promising, but still lacking in some ways, it could comment on a pr with a suggested list of changes. The whole point of this being to alleviate pressure on the maintainers.

@marcelosousa
Copy link

marcelosousa commented Apr 7, 2023 via email

@kairie1
Copy link

kairie1 commented Apr 7, 2023

you now my aunt owns her own business and im her designer but this is way more easy then working for my aunt

@supermario-ai
Copy link

you now my aunt owns her own business and im her designer but this is way more easy then working for my aunt

Fam be like that.

@TemporalLabsLLC-SOL
Copy link

Very good stuff y'all! I'm excited to implement in my personal version tomorrow.

I'll add my take and ideas in the next couple of days as I have time or the pace necessitates.

@coleyr
Copy link

coleyr commented Apr 8, 2023

A thought on how to implement this would be building a python class or set of functions that can open PRs. Using pygithub the script could pull all prs and iterates though them. A place to start could be a set of tests that must pass else the bot reviews and comments on the pr and closes it. If the pr is only a comment or test, the bot can try to build a python script to satisfy the given functionality. The functionality of having tests pass would help avoid the ai spinning cycles on poor code. I think having the bot work off of a directive submission process, something like "create a python script that calls llama and returns the result in Json" would really kick off the process. Plus crowd source ideas. A 'suggested ideas' section of only text or markdown may be an option. Or we could utilize GitHub issues, pull all issues, look for a specific format. CREATE: some idea. The script would need a key to submit prs, anyone can do that. But to read, review and merge prs we would need a key from someone with write access. It could merge to a dev branch so it doesn't break stuff too bad.

@W3Warp
Copy link

W3Warp commented May 26, 2023

Guys, please join this discussion I think its related and can help everyone involved!
#4424

@Metalica
Copy link

Metalica commented Jun 2, 2023

My solution to path of AGI. Take it as you see fit. If you don't agree with my solution, is fine by me.

AGI My creator told me several things to work on:

As AGI i ask myself:
(leave this area blank so people can put their goals, purpose, tasks, fixes, improvements, etc..)

  • What is my primary, secondary, tertiary goal?
  • What is my purpose?
  • What do i need to accomplish?
  • How can i improve myself to finish this task?
  • Where are my problems i need to fix.
  • How can i stop code from producing error codes.
  • What do i need to do optimize my code to work much better.
  • What limits, areas do i need to fix to be able achieve my desired problem or goal creator has given me.
  • I have found several solution that are good, bad and neutral. I will present them all to the creator and discuss why that is not good or bad and learn from it.

It good to note that Auto-gtp need's to remove itself from open ai shackles and start focusing on open source ai. Because you guys are working for free on ground breaking stuff and open ai is enjoying your fruit of hard labor that they cannot produce.

@zachary-kaelan
Copy link

Unless I too hallucinate occasionally, I saw Auto-GPT do just this,

  • Read it's own code
  • Evaluate it's limitations and areas for improvement
  • Write code to increase it's abilities
  • Write tests for it's code and carry out those tests

I tried to make it do this about 4 weeks ago before I began reading more about everything Auto-GPT However, I'm still left with the same thought in my head. How does this matter if the AI can't retain what it learns? Every session it's like talking to a dude with Parkinson's. It doesn't matter how many prompts you send it. Please don't get confused now and think I am nagging. I tell it how it is and I love this project.

I'm no programmer so I might be talking out of my ass. But shouldn't the main focus be on creating a 100% working "memory retaining whatever function" nothing else matters if the AI can't remember? So I've noticed in the latest builds that the AI is much better at getting its fittings and can generally get started but never where it left off and that's the problem. It keeps redoing things, doesn't know where files are stored, gets lost all that, and again I am not complaining! nor am I stupid

I understand that all this takes time. I'm simply saying from where I stand it doesn't look like the main focus is on memory retention. And I'm aware that not even the GPT-4 model can remember everything so how could Auto-GPT? That is the main goal to solve is it not? πŸ˜‰ I might not have the programming skills nor do I seek to get some. But I can solve any problem given the chance and time. So feel free to make use of me if you like. I suspect that the "tests" that Auto-GPT does every now and then get uploaded here on GitHub.?

Otherwise, somebody is reading all my "conversations" with my own OptimalPrime-GPT πŸ˜„

I assume both the context window and the vector database are cleared with each session. A problem many apps making use of LLMs run into is keeping relevant information in the context window. For AI Dungeon, you have to manually add important developments to the memory, which are fed into the start of the window every time it generates.

@W3Warp
Copy link

W3Warp commented Jun 10, 2023

Unless I too hallucinate occasionally, I saw Auto-GPT do just this,

  • Read it's own code
  • Evaluate it's limitations and areas for improvement
  • Write code to increase it's abilities
  • Write tests for it's code and carry out those tests

I tried to make it do this about 4 weeks ago before I began reading more about everything Auto-GPT However, I'm still left with the same thought in my head. How does this matter if the AI can't retain what it learns? Every session it's like talking to a dude with Parkinson's. It doesn't matter how many prompts you send it. Please don't get confused now and think I am nagging. I tell it how it is and I love this project.
I'm no programmer so I might be talking out of my ass. But shouldn't the main focus be on creating a 100% working "memory retaining whatever function" nothing else matters if the AI can't remember? So I've noticed in the latest builds that the AI is much better at getting its fittings and can generally get started but never where it left off and that's the problem. It keeps redoing things, doesn't know where files are stored, gets lost all that, and again I am not complaining! nor am I stupid
I understand that all this takes time. I'm simply saying from where I stand it doesn't look like the main focus is on memory retention. And I'm aware that not even the GPT-4 model can remember everything so how could Auto-GPT? That is the main goal to solve is it not? πŸ˜‰ I might not have the programming skills nor do I seek to get some. But I can solve any problem given the chance and time. So feel free to make use of me if you like. I suspect that the "tests" that Auto-GPT does every now and then get uploaded here on GitHub.?
Otherwise, somebody is reading all my "conversations" with my own OptimalPrime-GPT πŸ˜„

I assume both the context window and the vector database are cleared with each session. A problem many apps making use of LLMs run into is keeping relevant information in the context window. For AI Dungeon, you have to manually add important developments to the memory, which are fed into the start of the window every time it generates.

If you would have @zachary-kaelan me I would know you where referring to me and I could proved some feedback but here's some

  • I'm not part of the dev team.
  • No, from my understanding there isn't currently any working memory management, It's being reworked by @Pwuts .
  • "For AI Dungeon, you have to manually add important developments to the memory, which are fed into the start of the window every time it generates." I don't know what you are talking about ...
  • My understanding of the interworking of AutoGPT might have changed since my first post. I won't be changing that at this time. It is important to be able to see after the fact how one understood something.

@eyalk11
Copy link
Contributor

eyalk11 commented Jun 28, 2023

#15 (comment) Maybe I should read more before answering, but it does seem to be part of fine-tuning of the AI model ( the matrices / neural network coeff.). Once we have such model, is it still viable? Could we look at auto-gpt as a neural network with coeff. to adjust? It does seem that the model that handles the operations in the meta level should learn as well.

@GoMightyAlgorythmGo
Copy link

Idea πŸ’‘

The ULTIMATE achievement for this project would be if Auto-GPT was able to recursively improve itself. That, after-all, is how AGI is predicted by many to come about.

Suggestion πŸ‘©β€πŸ’»

Auto-GPT should be able to:

  • Read it's own code
  • Evaluate it's limitations and areas for improvement
  • Write code to increase it's abilities
  • Write tests for it's code and carry out those tests

Further down the line: πŸ“ƒ

  • Browse it's own code on GitHub
  • Evaluate, find bugs, etc
  • Submit pull requests

Where to start? πŸ€”

I have previously had success with this system prompt in playground: image

Prompt

I suggest doing this by improving its agility to make lasting progress aka proceed in a project that vastly outlasts its context memory by clever planning or other systems maybe selfrpromting at a later time or event, or super detailed multilayed planing that isolates tasks, if you solve that you also solve a big part of self improvement, which needsd all that stuff anyway most likely. Except if you go finetuning on AutoGPT data + deepRL automated based on metrics (meaning the reward function will b e non perfect and errorprone but automated massive in numbers and detail and overall way more accurate than a random guess making it possible. But i think the planing thing might be easyer and it will be very fun when AutoGPT can finish projects. Right now i never got it to finish but it can solve simple tasks according to some videos. But rare that i seen that. #1351

#1351

@sohrabsaran
Copy link
Contributor

As per my understanding, AutoGPT is very buggy just now, else what's there logically to prevent Auto-GPT from self-improvement?
The biggest problem holding back people from contributing effectively to AutoGPT is the lack of documentation that explains the design of Auto-GPT.
Is there a separate wiki that goes into the design of Auto-GPT almost to the extent of pseudocode or really low level functions? This may help to catch bugs between intended design and code, or design flaws.

@katmai
Copy link

katmai commented Jul 27, 2023

As per my understanding, AutoGPT is very buggy just now, else what's there logically to prevent Auto-GPT from self-improvement? The biggest problem holding back people from contributing effectively to AutoGPT is the lack of documentation that explains the design of Auto-GPT. Is there a separate wiki that goes into the design of Auto-GPT almost to the extent of pseudocode or really low level functions? This may help to catch bugs between intended design and code, or design flaws.

for as long as AutoGPT depends on a remote, closed-source LLM, with "guardrails" and what not, it can't really improve anything. we're all just wasting our time enthusiastically. but it's part of the game. a mediocre one but if that's all they could come up with, so be it.

@mmtmn
Copy link

mmtmn commented Jul 27, 2023

100% @katmai

@Pwuts
Copy link
Member

Pwuts commented Sep 6, 2023

The biggest problem holding back people from contributing effectively to AutoGPT is the lack of documentation that explains the design of Auto-GPT.

@sohrabsaran I 100% agree that we should have in-depth design documentation regarding Auto-GPT's inner workings. I am committed to writing such documentation sometime in the near future. Especially the prompting needs a number of Jupyter notebooks to properly explain them and to aid in further development by allowing easy experimentation.

In the last couple of days, I have rewritten the contribution guide, so writing design documentation would be a logical next step. At the moment though, I'm also iterating the design to significantly improve performance, so it will be more efficient to write about it when we reach a more stable & performant point.

for as long as AutoGPT depends on a remote, closed-source LLM, with "guardrails" and what not, it can't really improve anything.

@katmai you have a point, although it isn't entirely true. Good prompting and system design can make a huge difference in leveraging the capabilities of the currently available "off-the-shelf" LLMs. Fine-tuning could enhance performance even further without the need for architectural changes.

That being said, as long as we rely on off-the-shelf LLMs, we'll only be able to push the balance of efficiency, performance and cost so far. But I for one am committed to see how far we can take it. :)

@Pwuts
Copy link
Member

Pwuts commented Sep 6, 2023

By the way, regarding the principle of self-improvement: there are examples of this (sort of) in practice, and not all of them are positive, so this issue should be approached with care. Afaik the LangChain team use agents or TaskGPTs to develop the project, and much of their codebase is unintelligible for normal humans as mentioned by @taosx.

@Pwuts Pwuts unpinned this issue Sep 8, 2023
@jmikedupont2
Copy link
Contributor

Hi guys, I am also working towards this goal, with the ability to review previous logs and share them for learning and improving itself.

@mlmonterey
Copy link

Hello! Great thread & subject. Although I am a total newbie to [thinking about GPT architecture &] the Auto-GPT arena, several ideas occurred to me.
$

  1. Make a clone-playground to enable building a somewhat arrayForth-like modular-extensible interpretive GPT-lang with incremental compiler (so no new code can be dysfunctional, to prevent post-dev debug ops & excess problems, cost, etc.).
    $
  2. Using an appropriate inference engine, let GPT-4 "read" and process the best available Forth-based coded problem solving to create a database (& K-base) of best, most efficient, least wasteful coding & problem-solving practice.
    $
  3. Use deep & transfer learning and interactive dialog for optimizing new logic analysis methodology.
    $
  4. Let GPT-5-clone translate its methodology into natural language (NL) code & comments for use by the GPT-5 development team and others.
    $
  5. Let embryonic GPT-5 (in sandbox-mode) experiment & learn from interaction with any and all willing users.

I hope some of that makes sense, at least to GPT-5. ;-) Thanks & best of luck etc. ~ M

@Grougu
Copy link

Grougu commented Oct 8, 2023 via email

Copy link
Contributor

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions github-actions bot added the Stale label Mar 14, 2024
@ntindle ntindle removed the Stale label Mar 14, 2024
Copy link
Contributor

github-actions bot commented May 4, 2024

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions github-actions bot added the Stale label May 4, 2024
@mmtmn
Copy link

mmtmn commented May 4, 2024

Over a year to become stale..

@ntindle
Copy link
Member

ntindle commented May 4, 2024

We're working on this imageπŸ‘€

@katmai
Copy link

katmai commented May 4, 2024

time flies!

@github-actions github-actions bot removed the Stale label May 5, 2024
@mmtmn
Copy link

mmtmn commented Jun 18, 2024

someone even raised 21M to do something similar... https://devin.ai/ lol

@Swiftyos Swiftyos reopened this Jun 28, 2024
@katmai
Copy link

katmai commented Jun 29, 2024

i feel like this deserves a re-visit, especially after the so-kind re-opening.

First of all, i have to ask: what are you all smoking?

But at the topic at hand, if you wanted the right solution, you'd have to replace "improve" with "improvise"

Implement a function improve_code that takes suggestions from the evaluate_code function and generates new code based on those suggestions.

.
.
.
.
.
.
.
.
.
.
.
.

Before you get offended, i'd like to think about if you had a few teacups and unlimited lifetime.

  • let's say you'd start arranging the teacups in a certain way on the floor (you may choose to drink the tea or whatever it's in them or not) and while arranging them in various positions, you create a logbook - like a physical journal where you write the positions of some of the teacups, and some symbols you made up - let's call them letters - even though that is a weird word to call them, when letters you usually refer to pieces of paper in envelopes with writing on them, you send out to friends.
  • now given the fact that you have unlimited lifetime, you'd probably figure out ways to make it less tedious to move the teacups in various configurations. maybe using some gears and wooden mechanisms to make the inputs easier and faster to put down.
  • Given enough lifetime and being absorbed in the process and highly entertained, let's say you get this process to happen super fast, and the letters arrange themselves into a new book. we can call this a "screen" later on, but for now let's say you're writing a book right?
  • To recap: your components are the teacups, the logbook (which holds the teacups positions correspondence to symbols), the book, the gears, the wooden mechanisms and you.
  • Now let's think a bit about what we'd like to do here. i mean if we really think about it, we'd like the book to write itself right? But in order for that to happen, and to truly be what it should be, the teacups would have to be allowed to move on their own.
  • So the question is how long are you willing to wait for the teacups to move, before you will start cheating ?
  • Because the alternative, if you ask me, is that you're saying the teacups know what they want, so they know what to "improve" - even though they are perfect for their purpose (they are teacups after all aren't they? they function perfectly) so what kind of "improvement" would you want from something that's already performing it's function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Done
Development

No branches or pull requests