This repository implements a Cursor-style, three-plane stack composed of an FP8 MoE trainer, a Ray-based inference orchestrator, and Firecracker-isolated environment servers that expose a unified tool API.
- Environment Fleet: `envd/server.py` provides the gRPC tool surface (read/edit/search/lint/exec) and optional semantic search backed by Qdrant; the Firecracker launch scripts in `scripts/firecracker/` create snapshot-based microVMs.
- Inference: `inference/serve.py` bootstraps Ray actors (controller, samplers, env clients) to execute parallel tool plans with straggler mitigation and speculative rollouts, with pluggable samplers (stub or OpenAI-compatible vLLM backend) and rollout persistence (JSONL/S3/ClickHouse); a minimal actor sketch follows this list.
- Trainer: `trainer/` contains a PPO loop over a lightweight MoE transformer policy plus a DeepSpeed/TransformerEngine FP8 training stack for large-scale runs.
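To make the inference plane concrete, the snippet below sketches the controller/sampler fan-out pattern with plain Ray actors. The class and method names (`Sampler`, `run_episode`) are illustrative placeholders, not the actual actors defined in `inference/serve.py`.

```python
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class Sampler:
    """Toy stand-in for a sampler actor that executes one tool-use rollout."""

    def run_episode(self, task_id: int) -> dict:
        # A real sampler would query the policy backend and call the envd tool API here.
        return {"task": task_id, "actions": ["read", "edit"], "return": 1.0}

# Fan tasks out across a small pool of sampler actors and gather the rollouts.
pool = [Sampler.remote() for _ in range(4)]
rollouts = ray.get([pool[i % len(pool)].run_episode.remote(i) for i in range(16)])
print(f"collected {len(rollouts)} rollouts")
```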
- Install dependencies (Python ≥3.10):
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
- Generate gRPC bindings:
./scripts/gen_protos.sh
- Run the proto-environment server (inside a Firecracker VM or locally; a client connectivity check sketch follows these setup steps):
python -m envd
- Demo rollouts (requires Ray runtime):
python experiments/run_inference_demo.py
- Dry-run trainer using sample rollouts:
./experiments/run_training.sh
- DeepSpeed MoE trainer (requires NVIDIA GPUs with TransformerEngine/DeepSpeed; a minimal config sketch also follows these steps):
deepspeed --num_gpus=8 trainer/train_deepspeed.py --rollouts /data/rollouts.jsonl
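Before launching demo rollouts against a local `envd` instance, a quick way to confirm the server is up is a gRPC channel readiness probe. The port below is an assumption; the actual bind address is whatever `envd/server.py` is configured to listen on.

```python
import grpc

# Assumed default address; adjust to the port envd/server.py actually binds.
channel = grpc.insecure_channel("localhost:50051")
grpc.channel_ready_future(channel).result(timeout=5)  # raises grpc.FutureTimeoutError on failure
print("envd gRPC endpoint is reachable")
```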
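The DeepSpeed launcher reads its runtime settings from a JSON config. The dict below is a hedged, minimal example using standard DeepSpeed keys; the flags `trainer/train_deepspeed.py` actually expects are not assumed here, and the values are illustrative rather than tuned.

```python
import json

# Minimal, generic DeepSpeed runtime config (standard keys; illustrative values).
ds_config = {
    "train_batch_size": 256,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},          # FP8 compute itself comes from TransformerEngine modules
    "zero_optimization": {"stage": 1},
    "steps_per_print": 50,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```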
- Install lightweight test dependencies:
pip install -r requirements-test.txt
- Run the pytest suite:
pytest
- GitHub Actions workflow:
`.github/workflows/ci.yml` executes the same test suite on pushes and pull requests.
- Build the base image and import into Ignite:
./scripts/firecracker/build_base.sh
- Create a snapshot template per task family:
./scripts/firecracker/create_template.sh
- Launch disposable microVMs for batched rollouts:
COUNT=50 SEED_REPO=./seed_repo ./scripts/firecracker/launch_envs.sh
- `trainer/train.py` uses a configurable PPO loop with checkpoint emission every 50 steps.
- Swap the placeholder `PolicyModel` for a Megatron-Core/DeepSpeed MoE transformer initialized with TransformerEngine FP8 kernels to enable MXFP8 microscaling (see the FP8 autocast sketch below).
- Rollout logs (JSONL) should include `obs`, `actions`, `logprob`, `return`, `advantage`, and metric payloads; see `experiments/sample_rollouts.jsonl` for the expected schema and the example record below.
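As a hedged illustration of the FP8 path, the snippet below wraps a forward pass in TransformerEngine's `fp8_autocast` using the generic delayed-scaling recipe; the exact recipe for MXFP8 microscaling depends on the TransformerEngine version and is not assumed here. `policy` and `batch` are placeholders.

```python
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Generic FP8 recipe (delayed scaling); swap for an MXFP8 recipe where supported.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16,
                            amax_compute_algo="max")

def policy_forward(policy, batch):
    # TE modules inside `policy` (te.Linear, te.TransformerLayer, ...) execute
    # their GEMMs in FP8 within this context.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        return policy(batch)
```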
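A single rollout record, written one JSON object per line, might look like the sketch below. The field names match the list above; the values and the nested metrics keys are made up for illustration rather than copied from `experiments/sample_rollouts.jsonl`.

```python
import json

# Hypothetical rollout record in the trainer's JSONL format (illustrative values).
record = {
    "obs": "repo snapshot + task prompt",
    "actions": ["search('flaky test')", "edit('tests/test_api.py', ...)"],
    "logprob": -14.2,
    "return": 0.85,
    "advantage": 0.12,
    "metrics": {"tool_calls": 7, "wall_time_s": 41.3},
}

with open("rollouts.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```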
- Integrate semantic code search using a production-grade embedding model.
- Replace the stub sampler with a production vLLM deployment and wire in live checkpoints.
- Wire reward streaming to an external registry (e.g., ClickHouse + S3 checkpoint sync) for online PPO.