The pattern-based AI coding assistant that improves through experience.
Stack 2.9 is an open-source AI coding assistant powered by Qwen2.5-Coder-32B, enhanced with Pattern Memory — a system that learns from interactions by storing successful patterns and retrieving them for future tasks.
| Feature | Description |
|---|---|
| Pattern Memory | Stores and retrieves successful coding patterns, becoming more helpful over time |
| Multi-Provider | Works with Ollama, OpenAI, Anthropic, OpenRouter, Together AI |
| 46 Built-in Tools | File ops, git, shell, web search, memory, task planning |
| Voice Integration | Coqui XTTS for voice cloning, STT for voice input |
| 128K Context | Handles large codebases with ease |
| Self-Hosted | Full control, your data stays private |
| MCP Support | Integrates with any Model Context Protocol server |
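To make the Pattern Memory idea concrete, here is a minimal sketch of a store that saves solutions and recalls the most similar past task above a similarity threshold. This is a hypothetical illustration: the class name, method names, and the use of `difflib` similarity are assumptions, not the actual implementation.

```python
# Hypothetical Pattern Memory sketch; the real system may use embeddings
# and persistent storage instead of in-memory string similarity.
from difflib import SequenceMatcher


class PatternMemory:
    def __init__(self, similarity_threshold=0.75):
        self.threshold = similarity_threshold
        self.patterns = []  # each entry: {"task": ..., "solution": ...}

    def save(self, task, solution):
        # Record a successful interaction as a reusable pattern.
        self.patterns.append({"task": task, "solution": solution})

    def recall(self, task):
        # Return the most similar stored pattern, if above the threshold.
        best, best_score = None, 0.0
        for pattern in self.patterns:
            score = SequenceMatcher(None, task, pattern["task"]).ratio()
            if score > best_score:
                best, best_score = pattern, score
        return best if best_score >= self.threshold else None
```

The threshold keeps unrelated tasks from matching: a recall for a task the memory has never seen returns nothing rather than a poor match.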
```bash
git clone https://github.com/my-ai-stack/stack-2.9.git
cd stack-2.9
pip install -r requirements.txt
```

```bash
# Start interactive chat
python stack.py

# Single query
python stack.py -c "Write a Python function to reverse a string"

# Run evaluation (requires datasets)
python stack.py --eval humaneval --provider ollama
```

Set environment variables before running:
```bash
# For Ollama (local, recommended)
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:32b

# For OpenAI
export MODEL_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o

# For Together AI (recommended for Qwen)
export MODEL_PROVIDER=together
export TOGETHER_API_KEY=tog-...
export TOGETHER_MODEL=togethercomputer/qwen2.5-coder-32b-instruct
```

See Configuration for all options.
- Architecture: Qwen2.5-Coder-32B (32 billion parameters)
- Fine-tuning: LoRA (Low-Rank Adaptation)
- Context Length: 131,072 tokens
- Quantization: 4-bit AWQ optional for efficient deployment
Stack 2.9 is fine-tuned on a diverse dataset including:
- Pattern Memory Data (5K-10K examples): Successful interaction logs with feedback
- Synthetic Tool Examples (20K+): Generated scenarios covering all 46 tools
- Public Datasets:
  - OpenAssistant (coding conversations)
  - CodeAct (executable actions)
  - CodeContests (competition problems)
  - StarCoder Data (permissively licensed code)
All data undergoes:
- Deduplication
- License compatibility check
- Quality filtering (length, validity, success rate)
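The deduplication and length-filtering steps can be sketched as below. The field name `text` and the length thresholds are hypothetical; the actual pipeline's criteria may differ.

```python
# Hypothetical sketch of the dedup + quality-filter stages.
import hashlib


def filter_examples(rows, min_len=20, max_len=8000):
    seen, kept = set(), []
    for row in rows:
        text = row.get("text", "")
        # Quality filter: drop examples outside the length bounds.
        if not (min_len <= len(text) <= max_len):
            continue
        # Deduplication: keep only the first copy of identical content.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(row)
    return kept
```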
✅ Allowed:
- AI-assisted coding and code completion
- Code explanation and documentation
- Debugging and error analysis
- Tool-use automation
- Educational purposes
- Research on pattern-based AI
❌ Not Recommended:
- High-stakes production code without human review
- Security-critical applications
- Medical, legal, or financial decision-making
- Generating harmful or malicious code
- Large-scale redistribution without compliance checks
- Hallucinations: May generate incorrect code; always verify with tests
- Security: Can suggest vulnerable code; security review required for production
- Licensing: May reproduce copyrighted snippets; use license checks
- Tool Dependencies: Full functionality requires OpenClaw framework
- Pattern Freshness: Initial deployments have limited pattern library
- The HumanEval and MBPP implementations covered only 20 problems (1-4% of the full benchmarks)
- No inference logs exist to support the claimed numbers
- The Tool Use evaluation lacked a proper implementation
These scores were unverifiable and have been removed.
| Benchmark | Status | Notes |
|---|---|---|
| HumanEval | Evaluation in progress | Full 164-problem suite |
| MBPP | Evaluation in progress | Full 500-problem suite |
| Tool Use | Benchmark development | Custom tool-calling task |
| GSM8K | Not started | Math reasoning (optional) |
We are rebuilding evaluation infrastructure with proper methodology. See EVALUATION.md for the audit report and plan.
Expected baseline (based on Qwen2.5-Coder-32B):
- HumanEval: ~70-72% Pass@1
- MBPP: ~75-77% Pass@1
Actual fine-tuned results will be published after proper evaluation.
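Pass@1 above refers to the standard unbiased estimator introduced with HumanEval: for n sampled completions of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k).

```python
# Standard unbiased pass@k estimator (Chen et al., 2021).
from math import comb


def pass_at_k(n, c, k):
    # If every incorrect sample could fit in a draw of k, some draw
    # must contain a correct sample, so the probability is 1.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With one sample per problem (n = k = 1), this reduces to the plain fraction of problems solved.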
```bash
# Interactive chat mode
python stack.py

# Single query
python stack.py -c "Explain this code..."

# Run benchmarks
python stack.py --eval all --provider ollama

# Manage patterns
python stack.py --patterns list
python stack.py --patterns stats
```

```python
from stack_cli.agent import create_agent

# Create agent
agent = create_agent()

# Chat
response = agent.process("Write a hello world function")
print(response.content)

# Use tools
result = agent.process("List files in current directory")
```

Stack 2.9 includes 46 built-in tools for:
- File operations (read, write, edit, search, grep, copy, move, delete)
- Git operations (status, commit, push, pull, branch, log, diff)
- Code execution (run, test, lint, format, typecheck, server, install)
- Web (search, fetch, download, check_url, screenshot)
- Memory (recall, save, list, context_load, project_scan)
- Task planning (create_task, list_tasks, update_task, delete_task, create_plan, execute_plan)
See TOOLS.md for complete documentation with examples.
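A tool registry of this kind can be sketched as follows. This is a hypothetical illustration of the register/dispatch pattern; the agent's actual tool API is defined in TOOLS.md and may look different.

```python
# Hypothetical sketch of a tool registry with decorator-based registration.
TOOLS = {}


def tool(name):
    # Decorator: register a function under a tool name.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("read_file")
def read_file(path):
    # Minimal file-ops tool: return a file's contents.
    with open(path) as f:
        return f.read()


def dispatch(name, **kwargs):
    # Route a tool call by name to its registered implementation.
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```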
Stack 2.9's Pattern Memory can evolve automatically:
Mine your Git history for patterns:
```bash
python scripts/extract_patterns_from_git.py \
  --repo-path . \
  --output patterns.jsonl \
  --since-date "2024-01-01"
```

See docs/pattern-moat.md for details.
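A rough sketch of what git-history mining might look like is shown below. The log format and the fix/refactor heuristic are assumptions for illustration, not the actual logic of `extract_patterns_from_git.py`.

```python
# Hypothetical sketch of mining commit history for candidate patterns.
import subprocess


def git_log(repo=".", since="2024-01-01"):
    # One "<sha>\x00<subject>" record per line, merge commits excluded.
    return subprocess.run(
        ["git", "log", f"--since={since}", "--no-merges", "--pretty=%H%x00%s"],
        capture_output=True, text=True, cwd=repo, check=True,
    ).stdout


def parse_git_log(log_text):
    # Treat fix/refactor commits as candidate sources of successful patterns.
    patterns = []
    for line in log_text.splitlines():
        sha, _, subject = line.partition("\x00")
        if subject.lower().startswith(("fix", "refactor")):
            patterns.append({"commit": sha, "summary": subject})
    return patterns
```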
Multiple developers can share patterns via a central PostgreSQL + FastAPI service. Schema and API endpoints documented in docs/pattern-moat.md.
Merge LoRA adapters from multiple users with success-rate-weighted averaging:

```bash
python scripts/merge_lora_adapters.py \
  --adapters adapter_a.safetensors adapter_b.safetensors \
  --weights 0.7 0.3 \
  --output merged.safetensors
```

| Platform | Notebook | Description |
|---|---|---|
| Google Colab | `colab_train_stack29.ipynb` | Free T4 GPU, 3-5 hours |
| Kaggle | `kaggle_train_stack29.ipynb` | Free P100 GPU, 2-4 hours |
| Local Mac | `train_local.py` | MPS/Apple Silicon |
| Cloud GPUs | See below | RunPod, Vast.ai, etc. |
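The success-rate-weighted averaging used when merging adapters can be sketched in pure Python. Here each state dict maps parameter names to equally shaped lists of floats; the real `merge_lora_adapters.py` presumably operates on safetensors tensors.

```python
# Hypothetical sketch of success-rate-weighted LoRA adapter averaging.
def merge_adapters(state_dicts, success_rates):
    # Normalize success rates into convex weights.
    total = sum(success_rates)
    weights = [r / total for r in success_rates]
    merged = {}
    for key in state_dicts[0]:
        # Element-wise weighted average across all adapters.
        merged[key] = [
            sum(w * sd[key][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][key]))
        ]
    return merged
```

With weights 0.7 and 0.3 (as in the command above), the first adapter dominates the merged parameters.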
Use the provided notebook for quick prototyping:
```bash
# Open in Google Colab
colab_train_stack29.ipynb
```

Trains on a 5K-example mini dataset in 3-5 hours on a free T4 GPU.
```bash
# Prepare data (from your sources)
python scripts/create_mini_dataset.py --size 5000 --output data_mini/train.jsonl

# Train LoRA adapter
cd stack_2_9_training
python -m train_lora --config train_config.yaml

# Merge adapter with base model
python -m merge_adapter --base-model Qwen/Qwen2.5-Coder-32B
```

For production training on GPUs:
- RunPod: `runpod_deploy.sh` launches A100-80GB instances
- Vast.ai: `vastai_deploy.sh` finds the cheapest suitable instances
- Kubernetes: `k8s/deployment.yaml` deploys to your K8s cluster
- Docker: `docker-compose.cloud.yaml` for bare-metal GPU servers
See each script for usage instructions.
Extract tool patterns from your codebase to train the model:
```bash
# Extract tool patterns
python scripts/extract_rtmp_tools.py

# Create advanced examples
python scripts/extract_rtmp_tools_advanced.py
```

This creates `data/rtmp-tools/` with tool usage patterns that can be combined with the main training data.
Free GPU training on Kaggle (P100 16GB VRAM):
```bash
# Open in Kaggle
kaggle_train_stack29.ipynb
```

For Apple Silicon Macs without GPU cloud access:

```bash
python train_local.py
```

Extract training data from your RTMP codebase to teach the model your custom tools:
```bash
# Extract tool patterns
python scripts/extract_rtmp_tools.py
python scripts/extract_rtmp_tools_advanced.py

# Combined data includes 46+ tool patterns
data/rtmp-tools/combined_tools.jsonl
```

The combined training data includes:
- 41,807 code completion examples
- 59 RTMP tool usage patterns (BashTool, FileReadTool, Task tools, etc.)
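Combining JSONL sources like these into one training file can be sketched as below; this helper is hypothetical, not part of the repo's scripts.

```python
# Hypothetical sketch: concatenate JSONL files, skipping malformed lines.
import json


def combine_jsonl(paths, out_path):
    count = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        json.loads(line)  # validate before keeping
                    except json.JSONDecodeError:
                        continue
                    out.write(line + "\n")
                    count += 1
    return count
```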
```bash
cd stack-2.9-deploy
docker-compose up -d
```

| Platform | Use Case | Documentation |
|---|---|---|
| RunPod | Pay-as-you-go GPU | runpod_deploy.sh |
| Vast.ai | Spot instances (cheap) | vastai_deploy.sh |
| Kubernetes | Enterprise scale | k8s/ directory |
| HuggingFace Spaces | Free inference hosting | docs/free-deployment.md |
Hardware requirements:
- 7B model: RTX 3070 (8GB) minimum
- 32B model: A100-40GB recommended
- Quantized: 4-bit reduces VRAM by ~50%
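As a back-of-the-envelope check on these requirements, weights-only memory is roughly parameters x bits / 8; this estimate deliberately ignores KV cache and activation overhead, which add more on top.

```python
# Rough weights-only VRAM estimate in GiB (ignores KV cache / activations).
def weight_vram_gb(params_billion, bits):
    return params_billion * 1e9 * bits / 8 / 1024**3
```

For the 32B model this gives about 60 GiB at fp16 and about 15 GiB at 4-bit, which is why an A100-40GB is only comfortable with quantization.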
| Variable | Required | Description |
|---|---|---|
| `MODEL_PROVIDER` | Yes | `ollama`, `openai`, `anthropic`, `openrouter`, `together` |
| `OPENAI_API_KEY` | If OpenAI | Your OpenAI API key |
| `ANTHROPIC_API_KEY` | If Anthropic | Your Anthropic API key |
| `OPENROUTER_API_KEY` | If OpenRouter | Your OpenRouter API key |
| `TOGETHER_API_KEY` | If Together | Your Together AI API key |
| `OLLAMA_MODEL` | If Ollama | Model name (e.g., `qwen2.5-coder:32b`) |
Create `stack.yaml` in the project root:

```yaml
model:
  provider: ollama
  name: qwen2.5-coder:32b
  temperature: 0.7

training:
  lora_rank: 16
  learning_rate: 3e-4
  epochs: 3

pattern_memory:
  enabled: true
  max_patterns: 10000
  similarity_threshold: 0.75
```

```
stack-2.9/
├── stack_cli/                 # CLI interface & agent
│   ├── cli.py                 # Main entry point
│   ├── agent.py               # AI agent with tools
│   └── context.py             # Context management
│
├── stack_2_9_eval/            # Evaluation framework
│   ├── model_client.py        # Unified model API
│   └── benchmarks/            # Benchmark implementations
│
├── stack_2_9_training/        # Training scripts
│   ├── train_lora.py          # LoRA training
│   ├── merge_adapter.py       # Merge LoRA into base
│   └── prepare_data.py        # Data preparation
│
├── stack_2_9_deploy/          # Deployment configs
│   ├── docker-compose.yml
│   └── nginx.conf
│
├── scripts/                   # Utility scripts
│   ├── extract_patterns_from_git.py
│   ├── merge_lora_adapters.py
│   └── ...
│
├── docs/                      # Documentation
│   ├── pattern-moat.md        # Pattern memory evolution
│   └── ...
│
├── k8s/                       # Kubernetes configs
│   ├── deployment.yaml
│   ├── service.yaml
│   └── secret.yaml
│
├── TOOLS.md                   # Complete tool reference (46 tools)
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── stack.yaml                 # Config (create your own)
└── colab_train_stack29.ipynb  # Quick training notebook
```
Contributions are welcome! Please read CONTRIBUTING.md before submitting PRs.
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
Licensed under the MIT License. See LICENSE for full text.
- Base model: Qwen2.5-Coder-32B (Apache 2.0)
- Training code: HuggingFace Transformers, PEFT, bitsandbytes (Apache 2.0 / BSD)
- Your modifications: MIT
- Qwen for Qwen2.5-Coder base model
- Hugging Face for transformers & PEFT
- Ollama for local inference platform
- Together AI for cloud inference & fine-tuning
Built with ❤️ for developers who want an AI that grows with them