AGI Agent

A Rust-based AGI agent system combining memory management, session tracking, reinforcement learning, and multi-agent collaboration.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         Python CLI (./agi)                                   │
│                    Simple client - calls Agent API                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         Agent API (port 8080)                                │
│                      Rust HTTP Server (Axum)                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐ │
│  │ Session Manager │  │  Memory Store   │  │      LLM Client            │ │
│  │ (State)        │  │  (SQLite)       │  │      → llama.cpp           │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────────────────┘ │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐ │
│  │ Reasoning       │  │ RAG Retrieval   │  │    Tool Registry           │ │
│  │ Engine          │  │ (Text Search)  │  │    (search, bash, etc)    │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                   llama.cpp Server (port 8081)                               │
│                     Qwen3.5-4B-Q8_0 Model (symlink)                         │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                    ┌─────────────────┴─────────────────┐
                    ▼                                   ▼
┌───────────────────────────────┐     ┌───────────────────────────────────────┐
│ Qwen3-Embedding-0.6B (8083)  │     │ Qwen3-Reranker-0.6B (8084)            │
│ Vector embeddings for memory   │     │ Optional reranking for search         │
└───────────────────────────────┘     └───────────────────────────────────────┘

Services (Systemd)

Service	Port	Description
`llama-qwen`	8081	Qwen3.5-4B inference (symlink: `~/qwen35-trained-latest.gguf`)
`llama-embedding`	8083	Qwen3-Embedding-0.6B for vector search
`llama-rerank`	8084	Qwen3-Reranker-0.6B for result reranking
`agi-agent`	8080	Main agent API

Quick Start

1. Start All Services

# Start all services
sudo systemctl start llama-qwen
sudo systemctl start llama-embedding
sudo systemctl start llama-rerank
sudo systemctl start agi-agent

# Check status
sudo systemctl status llama-qwen agi-agent

2. Chat with the Agent

./agi chat
# or
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "agent", "messages": [{"role": "user", "content": "What is my name?"}]}'

CLI Commands

./agi chat              # Interactive chat
./agi status            # Check Agent status and memory counts
./agi train             # Trigger RL training (collect → train → deploy)
./agi data [query]      # Query training data from SQLite
./agi research [topic]  # Trigger search learning for a topic
./agi models            # List available trained models

Configuration

Edit agent.toml:

[server]
host = "0.0.0.0"
port = 8080

[model]
base_url = "http://10.10.199.146:8081"  # llama.cpp endpoint
name = "qwen3.5-4b"
embedding_model = "qwen3-embedding"           # Qwen3 embedding model
embedding_base_url = "http://127.0.0.1:8083" # Embedding server
embedding_dim = 768
max_tokens = 4096
temperature = 0.7

[memory]
storage_path = "/home/administrator/.agi/memory"
retrieval_ttl_hours = 24
default_namespace = "retrieval"
min_similarity = 0.6
query_limit = 10

[training]
enabled = true
schedule = "0 2 * * *"  # 2 AM daily
model = "qwen3.5-4b"
output_path = ".agent/models"
epochs = 3
batch_size = 4
learning_rate = 1e-4
lora_rank = 16

[search]
instance = "https://search.butler.ooo"  # SearXNG
timeout = 30

[online_learning]
batch_size = 16
max_buffer_size = 1000
replay_ratio = 0.3
learning_rate = 1e-5

[curiosity]
enabled = true
max_gaps = 50
curiosity_threshold = 0.5
exploration_depth = 2

Memory System

Architecture

The memory system uses a three-tier approach:

┌─────────────────────────────────────────────────────────────┐
│                    Retrieval Namespace                        │
│  Purpose: Short-term context for conversations              │
│  Storage: SQLite                                           │
│  TTL: 24 hours (configurable)                              │
│  Eviction: Move to training namespace                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ On TTL expiration
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Training Namespace                        │
│  Purpose: Long-term knowledge for RL training               │
│  Storage: SQLite (persistent)                               │
│  TTL: Never (accumulates over time)                         │
│  Usage: Training examples + RAG context                      │
└─────────────────────────────────────────────────────────────┘

RAG (Retrieval-Augmented Generation)

RAG uses fast SQLite text search to retrieve relevant memories at query time:

// Query both namespaces for relevant memories
let memories = memory_store.search_by_text("training", "name", 5)?;
// Memories are injected into system prompt as context

Features:

Fast text search (no embedding generation at query time)
Searches both retrieval and training namespaces
Filters stop words from query
Returns most recent matching memories

Memory Eviction

When retrieval memories expire (TTL):

Facts → Move to training namespace (keeps knowledge for RL + RAG)
Concepts → Already in training namespace
Conversations → Deleted (no longer needed)

Training Pipeline

Flow

1. Session ends → Session review extracts facts/concepts
2. Memory eviction → Moves facts to training namespace  
3. Nightly training (cron) → Collects from training namespace
4. GRPO/LoRA training → Fine-tunes model
5. Export to GGUF → Merges adapters + exports
6. Deploy → Updates symlink + restarts llama-qwen.service

Manual Training

./agi train

This:

Collects training data from ~/.agi/memory/memories.db
Runs training with unsloth (Qwen3.5-4B)
Exports to GGUF format
Updates ~/qwen35-trained-latest.gguf symlink
Restarts llama-qwen.service

Training Data

Training data is stored in SQLite:

Location: ~/.agi/memory/memories.db
Table: memories
Namespace: training

Query directly:

./agi data jeremiah

API Endpoints

Chat Completions

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "agent",
    "messages": [
      {"role": "user", "content": "What is my name?"}
    ]
  }'

Memory

# List memories by namespace
curl "http://localhost:8080/memories?namespace=training&limit=10"

# Query memories with text search
curl -X POST http://localhost:8080/memories/query \
  -H "Content-Type: application/json" \
  -d '{"query": "jeremiah", "namespace": "training", "limit": 5}'

# Store memory
curl -X POST http://localhost:8080/memories \
  -H "Content-Type: application/json" \
  -d '{"content": "My name is Jeremiah", "tags": ["fact"], "memory_type": "fact"}'

Training

# Trigger training
curl -X POST http://localhost:8080/training/trigger

# Check status
curl http://localhost:8080/training/status

# Batch training
curl -X POST http://localhost:8080/training/batch/collect   # Collect from memory
curl -X POST http://localhost:8080/training/batch/run       # Run training
curl http://localhost:8080/training/batch/export            # Export JSONL
curl -X POST http://localhost:8080/training/batch/clear     # Clear examples

Scheduler

curl http://localhost:8080/scheduler/stats
curl -X POST http://localhost:8080/scheduler/trigger

Systemd Services

llama-qwen.service

Serves the Qwen3.5-4B model via llama.cpp. Uses symlink ~/qwen35-trained-latest.gguf.

llama-embedding.service

Serves Qwen3-Embedding-0.6B for vector embeddings on port 8083.

llama-rerank.service

Serves Qwen3-Reranker-0.6B for result reranking on port 8084.

agi-agent.service

Main agent API on port 8080.

Model Management

Models are stored in .agent/models/:

.agent/models/
├── qwen35-trained-1774719656-Q8_0.gguf  # Latest trained
├── qwen35-trained-1774718908-Q8_0.gguf  # Previous
└── ...

The symlink always points to the latest:

~/qwen35-trained-latest.gguf → .agent/models/qwen35-trained-XXX-Q8_0.gguf

After training, the symlink is updated and llama-qwen.service is restarted.

Directory Structure

/data/jbutler/mule/agent/
├── src/                    # Rust source code
│   ├── agent/              # Core agent logic
│   ├── memory/             # Memory store (SQLite)
│   ├── tools/              # Tool implementations
│   ├── services/           # Background services
│   └── training/           # Training pipeline
├── cli.py                  # Python CLI client
├── agent.toml              # Configuration
├── training_script.py      # Training script (unsloth)
└── .agent/
    └── models/             # Trained model files

~/.agi/
├── memory/memories.db      # SQLite memory database
└── training/examples.jsonl # Training data export

Troubleshooting

Agent not responding

sudo systemctl status agi-agent
curl http://localhost:8080/health

Model not responding

sudo systemctl status llama-qwen
curl http://10.10.199.146:8081/health

RAG not finding memories

# Check training namespace
./agi data jeremiah

# Or query directly
curl -X POST http://localhost:8080/memories/query \
  -d '{"query": "name", "namespace": "training", "limit": 5}'

Training stuck

# Check if training script is running
ps aux | grep training

# Check training examples count
./agi data --count

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
src		src
unsloth_compiled_cache		unsloth_compiled_cache
.gitignore		.gitignore
AGENTS.md		AGENTS.md
BUILD.md		BUILD.md
CLI.md		CLI.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
DESIGN.md		DESIGN.md
PLAN.md		PLAN.md
README.md		README.md
SPEC.md		SPEC.md
SUMMARY.md		SUMMARY.md
TRAINING.md		TRAINING.md
agent		agent
agent.toml		agent.toml
agi		agi
build.sh		build.sh
cli		cli
cli.py		cli.py
progress.md		progress.md
training_script.py		training_script.py

Folders and files

Latest commit

History

Repository files navigation

AGI Agent

Architecture

Services (Systemd)

Quick Start

1. Start All Services

2. Chat with the Agent

CLI Commands

Configuration

Memory System

Architecture

RAG (Retrieval-Augmented Generation)

Memory Eviction

Training Pipeline

Flow

Manual Training

Training Data

API Endpoints

Chat Completions

Memory

Training

Scheduler

Systemd Services

llama-qwen.service

llama-embedding.service

llama-rerank.service

agi-agent.service

Model Management

Directory Structure

Troubleshooting

Agent not responding

Model not responding

RAG not finding memories

Training stuck

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages