Skip to content

kenhuangus/ASG-SI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

4 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

ASG-SI: Audited Skill-Graph Self-Improvement

License: MIT

A minimal working prototype of an AI system that learns and improves itself through iterative skill acquisition and verification. This system demonstrates how an agent can bootstrap general capabilities from specific tool use, with rigorous verification ensuring learned skills are actually reliable.

๐ŸŽฏ Core Concept

ASG-SI implements Audited Skill-Graph Self-Improvement - a framework where an AI agent:

  1. Uses tools to solve tasks in a verifiable environment
  2. Learns reusable skills from successful task completions
  3. Verifies skills against held-out test cases
  4. Improves iteratively by reusing verified skills
  5. Maintains audit trails of all learning and verification steps

๐Ÿš€ Key Features

  • Self-Improving Agent: Learns from experience and gets better over time
  • Model Agnostic: Works with any LLM (perfect or imperfect)
  • Verified Learning: All skills are rigorously tested before reuse
  • Auditable: Complete trace of learning decisions and verifications
  • Robust: Graceful degradation when models fail

๐Ÿ“Š Demonstrated Results

The system shows real agent improvement regardless of model quality:

Performance Metrics:
   Success rate: 0.800 โ†’ 0.912 (+11.2% improvement)
   Average reward: 2.288 โ†’ 2.969 (+68.1% increase)
   Skill actions: 0 โ†’ 53 (+53 skill reuse increase)

Action Pattern Evolution:
   Iteration 0: 0/71/9 (skill/tool/direct) - Novice agent
   Iteration 5: 53/19/8 (skill/tool/direct) - Expert agent

๐Ÿ—๏ธ Architecture

Core Components

  1. Tool System: Pure functions (calc, reverse, concat) for verifiable operations
  2. Verifiable Environment: Ground-truth task generation with exact verification
  3. Agent Policy: Decision-making with skill reuse priority
  4. Skill Compiler: Extracts reusable skills from successful trajectories
  5. Verifier-Auditor: Tests skills against held-out cases with audit logging
  6. Memory System: Bounded storage of task outcomes for context

Learning Algorithm

For each iteration:
  1. Agent solves tasks using current skills + model guidance
  2. Compiler extracts successful patterns as skill candidates
  3. Verifier tests candidates against held-out tasks
  4. Verified skills added to skill graph for future reuse
  5. Agent improves by leveraging growing skill library

๐Ÿ’ป Installation & Setup

Prerequisites

  • Python 3.8+
  • Ollama for local LLM serving
  • pip install python-dotenv requests

Quick Start

  1. Clone the repository

    git clone https://github.com/kenhuangus/ASG-SI.git
    cd ASG-SI
  2. Set up environment

    cp .env.example .env
    # Edit .env with your Ollama server details
  3. Start Ollama and pull a model

    ollama serve
    ollama pull qwen2.5:7b  # or any supported model
  4. Run the agent

    python asg_si_demo.py --model

๐ŸŽฎ Usage

Basic Commands

# Run with local model (recommended)
python asg_si_demo.py --model

# Run with stochastic policy only (baseline)
python asg_si_demo.py

Configuration

Edit .env to configure your setup:

# Ollama server location
OLLAMA_BASE_URL=http://localhost:11434

# Model to use
OLLAMA_MODEL=qwen2.5:7b

๐Ÿ”ฌ How Agent Improvement Works

The Challenge

Most AI learning requires perfect teachers. ASG-SI shows that an agent can learn from imperfect guidance by:

  1. Model Imperfection: Agent uses a model with 30% error rate (configurable)
  2. Success Mining: Learns from successful actions despite model errors
  3. Skill Verification: Ensures learned skills are actually reliable
  4. Iterative Improvement: Compounds learning across iterations

Learning Dynamics

  • Iteration 0: Agent relies on imperfect model โ†’ mixed success
  • Skill Compilation: Successful actions become reusable skills
  • Verification: Skills tested against fresh held-out tasks
  • Iteration N: Agent prioritizes verified skills over model consultation
  • Result: Performance improves from experience, not just model quality

Key Insight

Experience compounds: Even with imperfect guidance, successful patterns become verified skills that outperform repeated model consultation.

๐Ÿ“ˆ Performance Analysis

Metrics Tracked

  • Success Rate: Percentage of tasks solved correctly
  • Average Reward: Quality-weighted performance score
  • Action Distribution: Skill reuse vs. direct tool usage vs. fallback
  • Skill Growth: Number of verified, reusable skills
  • Audit Coverage: Verification traces for transparency

Sample Results

ASG-SI Demo Results (Model: True)
============================================================

iter=0  success=0.800  avg_reward=2.288  actions(skill/tool/direct)=0/71/9
iter=1  success=0.887  avg_reward=3.044  actions(skill/tool/direct)=63/16/1
iter=2  success=0.912  avg_reward=3.056  actions(skill/tool/direct)=57/20/3
iter=3  success=0.950  avg_reward=3.206  actions(skill/tool/direct)=63/14/3
iter=4  success=0.938  avg_reward=3.225  actions(skill/tool/direct)=66/14/0
iter=5  success=0.912  avg_reward=2.969  actions(skill/tool/direct)=53/19/8

============================================================
AGENT IMPROVEMENT SUMMARY
============================================================

๐Ÿ“Š Performance Metrics:
   Success rate: 0.800 โ†’ 0.912 (+0.112)
   Average reward: 2.288 โ†’ 2.969 (+0.681)
   Skill actions: 0 โ†’ 53 (+53)
   Direct actions: 9 โ†’ 8

๐ŸŽฏ Action Pattern Evolution:
   Iteration 0: 0/71/9 (skill/tool/direct)
   Iteration 5: 53/19/8 (skill/tool/direct)
   Skill reuse increase: +53 actions

๐Ÿ” Understanding Action Patterns

The skill/tool/direct numbers show agent maturation:

  • skill: Reusing learned, verified skills (optimal)
  • tool: Direct tool calls (model choices + fallbacks)
  • direct: Heuristic attempts (usually due to errors)

Evolution: 0/71/9 โ†’ 53/19/8 demonstrates learning expert patterns.

๐Ÿ› ๏ธ Technical Details

Tool System

# Three verifiable tools
calc({"expr": "5 + 3"})    # โ†’ 8
reverse({"text": "hello"}) # โ†’ "olleh"
concat({"a": "foo", "b": "bar"}) # โ†’ "foobar"

Skill Representation

{
  "type": "TOOL_CALL",
  "tool": "calc",
  "argmap": {"expr": "expr"}  # Maps task inputs to tool args
}

Verification Process

  • Skills tested against 10+ held-out tasks
  • Must achieve 100% pass rate for verification
  • All attempts logged to audit files

๐ŸŽ“ Research Implications

ASG-SI demonstrates:

  1. Bootstrapping Intelligence: Learning general skills from specific tool use
  2. Model Agnostic Learning: Improvement regardless of model quality
  3. Verified Self-Improvement: Auditable learning without human supervision
  4. Robust Adaptation: Graceful handling of imperfect guidance

๐Ÿ“š Related Work

  • AlphaGo/AlphaZero: Self-play learning with perfect simulation
  • Curriculum Learning: Progressive difficulty but requires perfect teachers
  • Meta-Learning: Learning-to-learn but typically supervised
  • Program Synthesis: Generating programs but not self-improving agents

ASG-SI uniquely combines tool use, skill learning, and self-verification in an iterative framework.

๐Ÿค Contributing

This is a research prototype. Key areas for extension:

  • Multi-step skill composition
  • Cross-domain skill transfer
  • Human-in-the-loop verification
  • Distributed skill graphs
  • Advanced skill representations

๐Ÿ“„ License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments

Inspired by research in AI safety, meta-learning, and program synthesis. Special thanks to the Ollama project for enabling local LLM experimentation.


"The best way to predict the future is to implement it." - David Heinemeier Hansson

This system shows how AI can learn to improve itself, creating a path toward more capable and trustworthy artificial intelligence.

About

Audited Skill-Graph Self-Improvement AI System

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages