Architectural thoughts/discussion #14
First of all, thanks for all the great pull requests and feedback! This is super helpful! Good article! This is also worth a read: https://huyenchip.com/2023/04/11/llm-engineering.html Here are some of my thoughts on the topic:
Overall it's still early and I'm trying to get the project into a good shape before OpenAI gives GPT-4 access to everyone.
I tend to agree with most points. What I found, however, is that the speed of GPT-3.5 Turbo makes a huge difference for me as a developer and prompt engineer because it allows much faster experimentation. So even though GPT-4 WILL be the default and fast standard eventually, right now it comes with a hefty speed penalty that makes debugging and iterating quite slow. This could have a big impact on us as developers of these things. Also, some things are fundamental and model-independent, e.g. how we can prompt the model and run tools to save steps (showing the available files when a file is not found is one example), which lets us iterate faster on the tooling and lets users get results more effectively.
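To illustrate the "save steps" idea: a minimal sketch of a file-read tool that, instead of just failing on a missing file, returns the directory listing so the model can correct the path in its very next step. The function name and error format are made up for illustration, not taken from the project.

```python
import os

def read_file_tool(path: str) -> str:
    """Hypothetical tool wrapper: return file contents, or, if the
    file is missing, include the directory listing so the model can
    fix the path in one step instead of guessing blindly."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        directory = os.path.dirname(path) or "."
        try:
            available = sorted(os.listdir(directory))
        except FileNotFoundError:
            available = []
        return (
            f"ERROR: '{path}' not found. "
            f"Files available in '{directory}': {available}"
        )
```

The model sees the listing in the tool result and can retry with a valid path immediately, turning a dead-end error into a single extra step.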
100% on the last part. And GPT-3.5 is supported (plus I'll add support for the OpenAI text models soon). Personally I have the same sense of rapid feedback and fast experimentation when I use GPT-3.5, but when I reflect on it, I tend to realize that even though it might feel fast and good, it probably is a case of walking faster in the wrong direction and solving soon-to-be-obsolete problems. Therefore I want to make GPT-4 the primary target and would like to avoid any GPT-3.5 specific optimizations.
I've added FlexibleJsonActionFormat, which should help a lot with GPT-3.5-turbo — agents running on it should work much better now.
I've been playing with this right now and we're definitely getting somewhere.
This blog post is a good summary, I'm guessing you know it: https://jina.ai/news/auto-gpt-unmasked-hype-hard-truths-production-pitfalls/
AutoGPT now allows subtasks to be executed with GPT-3.5 Turbo, which is a lot faster and cheaper. I believe that with a bit of prompt engineering the same is possible here, at least for some subtasks. (Prompt engineering in this case is model-specific, as 3.5 sometimes doesn't want to format responses nicely and machine-readably. I also believe AutoGPT has a "JSON autofix" step that adds missing braces/brackets before parsing.)
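The "JSON autofix" idea can be sketched roughly like this — not AutoGPT's actual implementation, just a minimal illustration of appending whatever closing braces/brackets the model left off before handing the text to the parser:

```python
import json

def autofix_json(text: str) -> dict:
    """Sketch of a JSON repair pass: find the object, count unclosed
    braces/brackets (ignoring ones inside strings), and append the
    missing closers before parsing."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    text = text[start:]
    stack = []          # closers we still owe, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    # Close whatever the model left open, innermost first.
    text += "".join(reversed(stack))
    return json.loads(text)

broken = '{"command": "read_file", "args": {"path": "main.py"'
autofix_json(broken)  # parses despite the two missing closing braces
```

This obviously can't fix every malformed response (e.g. truncated strings), but it covers the common case of a model stopping mid-structure.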
Apart from that, it would be interesting to explore how we could reduce the number of steps needed, so that small autonomous tasks complete more quickly and cost-effectively.