sugeerth/agentic-post-training

Agentic Post-Training Framework

Python 3.10+ · Apache 2.0

Orchestrating LLM alignment through autonomous agent collaboration.

A modular framework where specialized AI agents coordinate to execute post-training pipelines — from technique selection through training, optimization, and evaluation. Agents communicate via a structured message bus, making the entire process observable and debuggable.

Architecture

```
                    ┌─────────────────┐
                    │   Coordinator   │
                    │      Agent      │
                    └────────┬────────┘
                             │ Message Bus
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼────────┐ ┌───▼──────────┐ ┌─▼──────────────┐
     │  Training Agent │ │ Optimization │ │   Evaluation   │
     │                 │ │    Agent     │ │     Agent      │
     │ PPO, GRPO, DPO, │ │ Quantization,│ │ MMLU, MT-Bench,│
     │ SPO, RLHF, ...  │ │ Pruning,     │ │ HumanEval, ... │
     │                 │ │ Distillation │ │                │
     └─────────────────┘ └──────────────┘ └────────────────┘
```

Supported Techniques

| Priority | Technique | Description | Key Advantage |
|----------|-----------|-------------|---------------|
| 1 | PPO | Proximal Policy Optimization | Stable RL with clipped objectives |
| 1 | GRPO | Group Relative Policy Optimization | No value model needed (DeepSeek-R1) |
| 1 | DPO | Direct Preference Optimization | Simple, no reward model needed |
| 1 | SPO | Self-Play Optimization | Iterative self-improvement |
| 1 | RLHF | RL from Human Feedback | Full proven pipeline (InstructGPT) |
| 2 | KTO | Kahneman-Tversky Optimization | Works with binary feedback |
| 2 | ORPO | Odds Ratio Preference Optimization | Combined SFT + alignment |
| 2 | RLAIF | RL from AI Feedback | No human labelers needed |
| 2 | SPIN | Self-Play Fine-Tuning | Only needs SFT data |
| 2 | SimPO | Simple Preference Optimization | Reference-free, length-normalized |
| 3 | IPO | Identity Preference Optimization | Regularized DPO variant |
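
Several of these techniques reduce to a simple per-pair loss. As an illustration (not this framework's implementation), here is the DPO objective for a single preference pair: the policy is pushed to assign a larger log-probability margin to the chosen response than the frozen reference model does. `dpo_loss` and its arguments are hypothetical names for this sketch.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margins: how much more the policy favors each
    # response than the reference model does.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # -log(sigmoid(beta * (chosen_margin - rejected_margin)))
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the two margins are equal the loss is ln 2; it shrinks as the policy pulls ahead of the reference on the chosen response, which is exactly the "no reward model needed" property noted in the table.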

Demos

| Demo | Platform | Link |
|------|----------|------|
| Interactive Browser Demo | GitHub Pages | Launch Demo |
| Full Training on A100 | Google Colab | Open in Colab |
| Multi-GPU Training (2× T4) | Kaggle | Open in Kaggle |
| Agent Communication | Local terminal | `python3 examples/agent_demo.py` |

Quick Start

```bash
# Install
pip install -e .

# Run the agent demo (no GPU needed)
python3 examples/agent_demo.py

# Run the full pipeline
python3 examples/run_pipeline.py --technique grpo --model gpt2 --epochs 3

# Compare techniques
python3 examples/compare_techniques.py --techniques ppo dpo grpo
```

Python API

```python
import asyncio

from pipeline.config import PipelineConfig
from pipeline.pipeline import AgenticPipeline

config = PipelineConfig(
    technique="grpo",
    model_name="gpt2",
    epochs=3,
    quantization="gptq",
    benchmarks=["mmlu", "mt_bench", "humaneval"],
)

pipeline = AgenticPipeline(config)
results = asyncio.run(pipeline.run())
```

Pipeline Stages

  1. Data Preparation — Validate and preprocess training data
  2. Technique Selection — Choose optimal technique for the task
  3. Training — Execute post-training with the selected technique
  4. Optimization — Quantize (GPTQ/AWQ/GGUF), prune, or distill
  5. Evaluation — Benchmark on MMLU, MT-Bench, HumanEval, etc.
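
The five stages above run strictly in order, with each stage's output available to the ones after it. A minimal sketch of that sequencing (the `Stage` enum, `run_stages`, and handler signature are illustrative, not the framework's actual API):

```python
import asyncio
from enum import Enum


class Stage(Enum):
    DATA_PREP = "data_preparation"
    TECHNIQUE = "technique_selection"
    TRAINING = "training"
    OPTIMIZATION = "optimization"
    EVALUATION = "evaluation"


async def run_stages(handlers: dict) -> dict:
    """Run one async handler per stage in declaration order.

    A shared context dict is threaded through the pipeline so a later
    stage (e.g. evaluation) can read what an earlier one produced
    (e.g. the path to the trained model).
    """
    context: dict = {}
    for stage in Stage:
        context[stage.value] = await handlers[stage](context)
    return context
```

A handler is any `async def handler(context)` coroutine; registering a trivial one per stage and calling `asyncio.run(run_stages(handlers))` walks the whole sequence.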

Agent Communication

Agents communicate via a structured message bus. Watch them coordinate in real time:

```
[14:23:01] 🔗 Coordinator → broadcast: Starting stage: training
[14:23:01] 📊 Trainer → Coordinator: Loading model: gpt2
[14:23:02] 📊 Trainer → broadcast: Beginning GRPO training for 3 epochs
[14:23:02] 📊 Trainer → broadcast: Epoch 1/3 | Loss: 1.8432 | Reward: 0.5700
[14:23:03] 📊 Trainer → broadcast: Epoch 2/3 | Loss: 0.9821 | Reward: 0.7900
[14:23:03] ✅ Trainer → Coordinator: Training complete! Final loss: 0.5513
[14:23:04] ⚡ Optimizer → broadcast: Starting GPTQ quantization (4-bit)
[14:23:04] ✅ Optimizer → Coordinator: Quantization complete: 8.0x compression
[14:23:05] 📈 Evaluator → broadcast: Starting evaluation suite: mmlu, mt_bench
```
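
A transcript like the one above falls out naturally from an async publish/subscribe bus that logs every message it routes. A minimal sketch, assuming hypothetical `Message` and `MessageBus` types (the framework's real bus may differ):

```python
import asyncio
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Message:
    sender: str
    recipient: str              # a specific agent name, or "broadcast"
    content: str
    timestamp: datetime = field(default_factory=datetime.now)


class MessageBus:
    """Async pub/sub bus that logs all traffic, making it observable."""

    def __init__(self) -> None:
        self._queues: dict[str, asyncio.Queue] = {}
        self.log: list[Message] = []    # the debuggable transcript

    def register(self, name: str) -> asyncio.Queue:
        """Give an agent an inbox queue and return it."""
        self._queues[name] = asyncio.Queue()
        return self._queues[name]

    async def publish(self, msg: Message) -> None:
        self.log.append(msg)
        if msg.recipient == "broadcast":
            for name, queue in self._queues.items():
                if name != msg.sender:  # don't echo back to the sender
                    await queue.put(msg)
        else:
            await self._queues[msg.recipient].put(msg)
```

Because every `publish` appends to `bus.log` before routing, replaying a failed run is just iterating the log, which is what makes the pipeline debuggable.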

Configuration Presets

```python
from pipeline.config import PipelineConfig

# Quick DPO alignment
config = PipelineConfig.preset("quick_dpo")

# Full RLHF pipeline
config = PipelineConfig.preset("full_rlhf")

# GRPO with quantization
config = PipelineConfig.preset("efficient_grpo")

# Production deployment
config = PipelineConfig.preset("production")
```
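
Conceptually, a preset is just a named bundle of config defaults that the caller can override. A sketch of how such a registry could look — the preset contents and the `preset` helper here are invented for illustration, not `PipelineConfig`'s actual internals:

```python
# Hypothetical preset registry: each entry is a bundle of defaults.
_PRESETS: dict[str, dict] = {
    "quick_dpo": {"technique": "dpo", "epochs": 1},
    "full_rlhf": {"technique": "rlhf", "epochs": 3},
    "efficient_grpo": {"technique": "grpo", "quantization": "gptq"},
}


def preset(name: str, **overrides) -> dict:
    """Look up a named preset and layer caller overrides on top."""
    if name not in _PRESETS:
        raise KeyError(f"unknown preset: {name}")
    return {**_PRESETS[name], **overrides}
```

Keyword overrides win over preset defaults, so `preset("efficient_grpo", epochs=5)` keeps GRPO and GPTQ but trains for 5 epochs.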

Optimization

| Method | Compression | Quality Retained | Use Case |
|--------|-------------|------------------|----------|
| GPTQ | 8× | ~99% | Production inference |
| AWQ | 8× | ~99.5% | Best quality at 4-bit |
| GGUF | 8× | ~98% | llama.cpp deployment |
| NF4 | 8× | ~99% | QLoRA training |
| INT8 | 4× | ~99.9% | Minimal quality loss |
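
The ratios in the table follow from bit widths alone, assuming an FP32 baseline (as the 8× figures imply): 4-bit methods give 32/4 = 8×, INT8 gives 32/8 = 4×. A quick back-of-the-envelope for weight storage:

```python
def model_size_mb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight-storage footprint in megabytes."""
    return n_params * bits_per_param / 8 / 1e6


def compression_ratio(orig_bits: float, quant_bits: float) -> float:
    return orig_bits / quant_bits


# GPT-2 small has roughly 124M parameters:
fp32_mb = model_size_mb(124e6, 32)   # FP32 baseline, ~496 MB
int4_mb = model_size_mb(124e6, 4)    # 4-bit (GPTQ/AWQ/NF4), ~62 MB
```

Note these are weight-only figures; activations, the KV cache, and quantization metadata (scales, zero points) add overhead on top, so real-world compression lands slightly below the nominal ratio.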

Tests

```bash
python3 -m pytest tests/ -v
```

License

Apache 2.0 — see LICENSE.

