
Agent Roger (pre-alpha version)

An application for running AI tasks using a dynamic tree structure.

(Demo images: agent-roger-dem-1, agent-roger-dem-2)

This repo includes:

  • Dashboard (Vercel app)
  • Task Runner (Docker container)
  • Redis database (Docker container)
  • Weaviate database (Docker container)

🔍 What is Agent Roger?

Agent Roger is an application that allows you to "steer" AI by chaining tasks and sub-tasks in a tree structure.

The dashboard makes it easy to manage tasks that can branch out into sub-tasks and sub-tasks-of-sub-tasks, forming a tree structure. You can explore each tree visually, like a map, to see what steps the AI is taking and to change the parts you dislike without affecting the parts you do like.

Agent Roger's purpose is similar to that of other AI agents. However, there are key differences...

More Info

Serves as an application rather than a framework:

  • This repo contains code to launch a dashboard and task runner process(es) that require particular database setups, which are described below in the "Getting Started" section.

Uses a task tree instead of a queue:

  • AI can delegate to arbitrary sub-tasks, branching out into sub-tasks-of-sub-tasks-of-etc.
  • Independent sub-tasks are run concurrently.

Task tree is dynamic:

  • A single misstep in a sub-task does not necessarily ruin the overall task.
  • A failed sub-task will try to improve until it is successful or has exhausted all reasonable options.
  • User can provide feedback on a sub-task and restart a sub-tree while preserving the rest of the tree's independent logic.

Practically free to get started (except for the cost of inference tokens):

  • As of this writing, all default database vendors have unreasonably generous free tiers and offer reasonable pay-as-you-go pricing should you exceed the free-tier limits.

Visualization of Data Flow:

  • Interactive, zoomable task tree shows every thought and data point involved in your task.
  • Ability to pause/modify/rerun sub-tasks.

Uses Multi-Shot Prompting:

  • Before generating AI output, the task runner finds examples of similar input & output from previous prompts and injects them into the new prompt.
  • Multi-shot prompting enables you to "fine-tune" the system without updating the AI model.
  • When a sub-task fails, you can view all of its prompts, modify the responses to your liking, and add them to the injection list so they are used whenever the AI receives similar prompts in the future (see the sketch after this list).
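
A minimal sketch of the injection step, in TypeScript (all names here are illustrative, not the repo's actual API):

// Illustrative sketch of multi-shot prompt injection; every name is hypothetical.
type Example = { input: string; output: string };

// Stand-in for retrieval: the real system ranks past prompt/response pairs
// by embedding similarity using the vector database.
async function findSimilarExamples(input: string, k: number): Promise<Example[]> {
  return [
    { input: "List the files in ./src", output: '{"files": ["index.ts"]}' },
  ].slice(0, k);
}

async function buildPrompt(taskInput: string): Promise<string> {
  const shots = (await findSimilarExamples(taskInput, 3))
    .map((ex) => `Input: ${ex.input}\nOutput: ${ex.output}`)
    .join("\n\n");
  // Retrieved examples are injected ahead of the new input, steering the
  // model without updating its weights.
  return `${shots}\n\nInput: ${taskInput}\nOutput:`;
}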

Written in TypeScript:

  • Uses the zod library for schema validation and type checking, which enables better autocomplete, error handling, bug detection, etc.
  • Enables the developer, the dashboard user, and the AI to be confident that any JSON data has the fields it expects -- even including custom schema generated by the AI or user (see the sketch after this list).
  • NOTE: If you're not using an API like OpenAI's, then you will need to implement your own inference engine, likely using Python.
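
For instance, a minimal zod schema (a generic sketch, not one of the repo's actual schemas) validates AI-generated JSON at runtime:

import { z } from "zod";

// Hypothetical schema for a sub-task's output fields.
const subTaskOutput = z.object({
  summary: z.string(),
  filesChanged: z.array(z.string()).default([]),
});

// .parse() throws if the JSON is malformed or mistyped, so downstream
// code can trust the field types.
const output = subTaskOutput.parse(
  JSON.parse('{"summary": "Indexed 12 files.", "filesChanged": ["src/index.ts"]}')
);
console.log(output.summary, output.filesChanged.length);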

Runs orders of magnitude more inferences and logic to execute a single sub-task than do traditional systems:

  • Agent Roger is made for an age when inference is relatively cheap (think 200k tokens/second at $30 USD/hr for a 50B-parameter multi-modal model).
  • This repo provides a starting point for exploring the possibilities of using dynamic, concurrency-friendly task trees.
  • The problem of inference (really two problems: fine-tuning models and running inference on them quickly) is left to the intelligent and determined reader.

AI can switch its context between a global memory bank and local, task-specific memory banks:

  • Memory banks are vector databases that store JSON documents and their embeddings.
  • Currently we only store indexes of local files and summaries of previous tasks. Soon we will also store indexes of web content, information that the AI determines is commonly needed, and summaries of task trees.
  • By default, a new memory bank is created for each new root task (user input), and documents are stored to both the new local memory bank and the global memory bank.
    • To save time, the AI will use the global memory bank if you tell it to (in plain English) in the root task's inputFields. For example, inputFields: { "instructions": "Do some task. Use the global memory bank." } (see the example after this list).
    • Using the global memory bank is a trade-off: tasks that use it progress more quickly as you run more of them, because they remember how similar tasks were run and already have the filesystem indexed. However, this can lead to the system remembering outdated prompts and file contents, which can cause a task to fail.
    • For best results, do not tell the system to use the global memory bank.
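
For example, a root task input that opts into the global memory bank (the inputFields shape is from the docs above; the variable around it is just for illustration):

// The instructions are plain English; the memory-bank hint is read from them.
const rootTaskInput = {
  inputFields: {
    instructions: "Summarize every README in the repo. Use the global memory bank.",
  },
};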

⚙️ Getting Started

The easiest way to get started is to:

  1. Fork the repo.
  2. Duplicate .env.example to a fresh .env (only in your local environment!).
  3. Fill in the environment variables in .env, using the Setup Details (below) as a reference.

Setup Details

You will need the following (free) infra, each of which can be spun up using vendors' websites:

  • new Vercel app pointing at your forked GitHub repo (vercel.com)
  • new PlanetScale MySQL database (planetscale.com)
  • new Neo4J graph database (neo4j.com/auradb)
  • new Clerk authentication app (clerk.com)
    • Create a user, say, adminUser. Create an organization called admin and set its owner to the admin user.
    • Only members of the admin organization will be able to access the dashboard.

Set environment variables:

  • Use .env.example as a template; it lists the required environment variables.
  • For local development, set correct environment variables in your .env.
  • For deployment, set correct environment variables in the Vercel dashboard under Settings -> Environment Variables (you can copy/paste from your .env file).

🪄 Deploying

First, ensure your .env file is correct. Make sure Vercel's environment variables match your .env file.

Dashboard

Push to GitHub to trigger a new Vercel deployment of the dashboard.

To run the dashboard on your local computer:
# install external dependencies
yarn install

# build core packages
yarn run build:core

# START THE DASHBOARD
yarn run start:dashboard # or:  yarn run dev

Vector Database

To start a Weaviate vector database:

yarn run start:vector-db

NOTE: It may be advisable to use a managed vector database if persistence is important to you, or if you are dealing with many documents.

Redis Database

To start a Redis database:

yarn run start:redis

NOTE: The Redis database should be located as close to the task runner(s) as possible, as latency is somewhat important. The data does not need to persist.

Local Embeddings Service

To start the local embeddings service (used for embedding prompt injections):

# IMPORTANT: Match the docker container to your chip architecture below!

# if you have an Apple Silicon (M1 or M2) chip
yarn run start:local-embeddings-arm64

# if you have an Intel or other x86 chip
yarn run start:local-embeddings-amd64

NOTE: The embeddings service must be run on the same computer as the task runner.

Task Runner

To start a task runner:

# build a docker image for task runner
yarn run build:task-runner

# run docker container
yarn run start:task-runner

🛞 IDE Setup

NOTE: We use yarn workspaces to configure the monorepo. You might need a Yarn "Editor SDK" in order for your IDE to properly recognize imports (for example, with Yarn 2+ and VS Code: yarn dlx @yarnpkg/sdks vscode).

❓ Troubleshooting

The dashboard visualizer does not work with Brave browser's shields enabled:

  • Specifically, the "block fingerprinting" option disables click functionality for the dashboard's force graph.

If docker fails to build, you may need to change your docker engine settings. Try one of the following solutions:

  • Disable docker buildkit: Go to Docker Desktop -> Settings -> Docker Engine, and set "features": {"buildkit": false}.
  • Disable docker-compose V2: Go to Docker Desktop -> Settings -> General, and uncheck "Use Docker Compose V2".
  • If all else fails, try deleting your docker config: rm ~/.docker/config.json.

To fix file-permission issues that can arise for various reasons (abusing sudo, copying files from another computer, etc.):

# INSTRUCTIONS FOR MAC (the "staff" group and "chmod -N" below are macOS-specific;
# on Linux, use your primary group for chown and "setfacl -b" to clear ACLs)
# FIRST, decide what <DIRECTORY> to fix. Try your desktop ("~/Desktop") directory first.

# Get your <USER>
whoami
# Reset ownership of <DIRECTORY> to <USER>
sudo chown -R <USER>:staff <DIRECTORY>
# Remove ACL entries from <DIRECTORY> (macOS only)
sudo chmod -R -N <DIRECTORY>
# Reset file permissions of <DIRECTORY>
sudo chmod -R 755 <DIRECTORY>

Inference

The system requires a fine-tuned LLM to be made available for inference via the included inference engine (a simple Redis queue).
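
A minimal sketch of that queue pattern from the task runner's side, using ioredis (the key names and payload shape are assumptions, not the repo's actual protocol):

import { randomUUID } from "node:crypto";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Hypothetical key names; check the repo for the real ones.
async function requestInference(prompt: string): Promise<string> {
  const id = randomUUID();
  // Enqueue the request for the inference worker.
  await redis.lpush("inference:requests", JSON.stringify({ id, prompt }));
  // Block (up to 120s) until the worker pushes a response for this request.
  const reply = await redis.brpop(`inference:response:${id}`, 120);
  if (!reply) throw new Error("Inference request timed out");
  return (JSON.parse(reply[1]) as { text: string }).text;
}

The worker side would pop from the request queue, run the model, and push the reply to the per-request response key.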

For practicality, we use multiple versions of the same model:

  • Q3 quantized version when running locally (M1 chip with 32 GB memory)
  • Q4 quantized version when running in the cloud
  • Full version when fine-tuning the model

Getting Started

First, select a quantized model in .gguf format from HuggingFace. Then, you can use it to start the inference engine locally or in a docker container.

Local Inference Engine (Apple Silicon ONLY):

First, download the quantized model. It is recommended to use a version ending with _K_M.gguf.

# Make sure your version of Python is compatible with ARM64 (10x slower if not)
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
# select the python interpreter for the VS Code project (search: "Python: Select Interpreter". Choose miniforge3/bin/python)
# Make sure cmake is installed on your computer
brew install cmake
# Navigate to a directory where you want to install llama.cpp
cd ~/Desktop
mkdir inference
cd inference
# Install llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 
make -j
# Increase "wired memory" limit to 26 GB / 32 GB (on MacOS Sonoma)
sudo sysctl iogpu.wired_limit_mb=26624
# Test llm
cd ~/Desktop/inference/llama.cpp
./build/bin/main --model "/Users/maxilie/Desktop/dolphin-2.6-mixtral-8x7b.Q4_K_M.gguf" -p "<|im_start|>system\nYou are a helpful assistant who can answer any question succinctly but thoroughly and expertly<|im_end|>\n<|im_start|>user\nProve that the square root of 2 is irrational<|im_end|>\n<|im_start|>assistant" -n -2

Parameter "c" is context length. Paramet "n" as -2 means: generate as many tokens as needed until reaching end token or context limit ("c" parameter).

TODO: Make a python script that can be called with "yarn test:llm".

Cloud Inference Engine:

For simplicity, we recommend renting a GPU-accelerated machine from runpod.io.

TODO

🧰 Making it Yours

More Details

You can customize the following parts of the agent-roger-core package:

Prompts

  • Located in packages/agent-roger-core/constants/prompts.ts.
  • The most fruitful place to start modifying prompts is the SUGGESTED_APPROACHES variable, which tells the AI what fields to output under what scenarios.

Stage Functions

  • A stage function is a function that a task calls repeatedly until the stage is ended, at which point the task moves on to the next stage function.
  • Each stage function has access to variables saved by the stage functions before it (see the sketch after this list).
  • Located in packages/agent-roger-core/stage/....
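
An illustrative stage function (the helper object and its method names are assumptions; mirror the real signatures under packages/agent-roger-core/stage/):

// Hypothetical helper API for reading/writing cross-stage variables.
type StageHelpers = {
  get: (key: string) => unknown; // read a variable saved by an earlier stage
  set: (key: string, value: unknown) => void; // save a variable for later stages
  endStage: (error?: string) => void; // advance the task to its next stage function
};

async function summarizeTextStage(helpers: StageHelpers): Promise<void> {
  const text = helpers.get("rawText") as string | undefined;
  if (!text) {
    helpers.endStage("missing rawText from a previous stage");
    return;
  }
  helpers.set("summary", text.slice(0, 200));
  helpers.endStage(); // the task now moves on to the next stage function
}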

Task Presets

  • A task preset is just a name for a TaskDefinition, or a string key that maps to a TaskDefinition value.
  • A TaskDefinition defines an array of stage function names to run, in order, before passing the task's output to its parent task.
  • Adding an entry to TASK_PRESETS allows the AI to spawn sub-tasks that run your stage functions (see the sketch after this list).
  • Located in packages/agent-roger-core/stage/presets.ts.
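
Registering a preset might look roughly like this (the field names are illustrative; check presets.ts for the real TaskDefinition shape):

// Illustrative only; the actual TaskDefinition fields live in
// packages/agent-roger-core/stage/presets.ts.
type TaskDefinitionSketch = { isAbstract: boolean; stageFunctionNames: string[] };

const TASK_PRESETS_SKETCH: Record<string, TaskDefinitionSketch> = {
  summarizeFile: {
    isAbstract: false,
    // Stage function names run in order; the final stage's output is
    // passed back up to the parent task.
    stageFunctionNames: ["readFileStage", "summarizeTextStage", "saveOutputStage"],
  },
};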

JSON Input & Output

The AI can accept any arbitrary JSON fields you provide it, and return JSON values for the named outputFields you request.
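
For instance (these field names are made up for illustration):

// Arbitrary JSON fields handed to the AI...
const inputFields = {
  repoUrl: "https://github.com/maxilie/agent-roger",
  question: "Which package defines the task presets?",
};

// ...and the names of the JSON fields it is asked to return.
const outputFields = ["answerText", "relevantFilePaths"];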

Adding New Tools

To give the AI new functionality:

  • Create an index.ts file in a new folder: packages/agent-roger-core/src/stage/task-<custom-task-name>.
    • To keep it simple, you can perform all the task's logic in a single stage function.
    • Create your stage function by following the patterns of existing stage functions, like those in packages/agent-roger-core/src/stage/task-execute-shell/index.ts.
    • Follow the pattern for registering your new stage function(s) and task in the packages/agent-roger-core/stage/presets.ts file:
      • Modify the stageFunctionsToRegister array accordingly.
      • Modify the TASK_PRESETS map accordingly.
      • The "isAbstract" field should almost always be set to false. The system should only have one abstract task available to it (the task preset called "abstract" - see below file), which is responsible for breaking down an abstract task into simpler, more concrete sub-tasks.
  • Add a SuggestedApproach for your new task preset in the packages/agent-roger-core/constants/prompts.ts file, in the variable SUGGESTED_APPROACHES.generateSubTasks.
    • Adding a SuggestedApproach tells the AI that the task preset is available to it, and specifies the input fields it expects.
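
Such an entry might look roughly like this (the exact SuggestedApproach shape is defined in prompts.ts; this is only a sketch):

// Illustrative only; mirror the real entries in
// packages/agent-roger-core/constants/prompts.ts.
const suggestedApproachSketch = {
  scenario: "the task requires summarizing the contents of a single file",
  taskPreset: "summarizeFile",
  requiredInputFields: ["filePath"],
};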

Modifying the Databases

SQL Schema

To update SQL schema, run yarn workspace agent-roger-core db-push.

To be safe, you should make your changes in a separate git branch for the repo and a separate PlanetScale branch for the SQL database. To use a new PlanetScale branch:

  1. Log in to planetscale.com.
  2. Create a new branch of the main database and copy the credentials into your local .env file.
  3. Change /packages/agent-roger-core/src/db/sql-schema.ts and other files as necessary.
  4. Run yarn workspace agent-roger-core db-push to update the new PlanetScale branch.
  5. Make any changes you need in order for the rest of the code to work with the new schema.
  6. Once everything is working, go to the PlanetScale branch, create a new "deploy request", and approve it to update the main branch's schema.

Vector Database

Weaviate powers Agent Roger's context logic:

  • It stores documents as vector embeddings (lists of numbers) that represent semantic meaning.
  • Weaviate seems to be a good solution because it allows for both vector and traditional keyword search, and it can be self-hosted locally on a decent CPU or in the cloud.

Switching to a different vector database:

  • You will need to alter a few components: new environment variables, new Tasks using new Stages for retrieving and storing context, and possibly new embedding logic (depending on the vector length the database expects).
