Cursor Triplane RL Stack

This repository implements a Cursor-style tri-plane composed of an FP8 MoE trainer, a Ray-based inference orchestrator, and Firecracker-isolated environment servers that expose a unified tool API.

Architecture Overview

  • Environment Fleet: envd/server.py provides the gRPC tool surface (read/edit/search/lint/exec) and optional semantic search backed by Qdrant; Firecracker launch scripts in scripts/firecracker/ create snapshot-based microVMs. A hedged client sketch follows this list.
  • Inference: inference/serve.py bootstraps Ray actors (controller, samplers, env clients) to execute parallel tool plans with straggler mitigation and speculative rollouts; samplers are pluggable (stub or an OpenAI-compatible vLLM backend) and rollouts persist to JSONL, S3, or ClickHouse.
  • Trainer: trainer/ contains a PPO loop over a lightweight MoE transformer policy plus a DeepSpeed/TransformerEngine FP8 training stack for large-scale runs.
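
The tool surface above is easiest to see from a client's perspective. Below is a minimal sketch of one round trip against envd over gRPC; the module, service, and message names (envd_pb2, EnvToolsStub, SearchRequest, and so on) are illustrative placeholders, since the real identifiers come from the bindings generated by scripts/gen_protos.sh.

    # Hypothetical envd client round trip; service and message names are assumed.
    import grpc

    from envd import envd_pb2, envd_pb2_grpc  # generated bindings (names assumed)

    def demo_search(address: str = "localhost:50051") -> None:
        with grpc.insecure_channel(address) as channel:
            stub = envd_pb2_grpc.EnvToolsStub(channel)
            # Ask the environment server to search the workspace for a symbol.
            reply = stub.Search(envd_pb2.SearchRequest(query="PolicyModel", max_results=5))
            for hit in reply.results:
                print(hit.path, hit.line, hit.snippet)

    if __name__ == "__main__":
        demo_search()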

Getting Started

  1. Install dependencies (Python ≥3.10):
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
  2. Generate gRPC bindings:
    ./scripts/gen_protos.sh
  3. Run the prototype environment server (inside a Firecracker VM or locally):
    python -m envd
  4. Run the demo rollouts (requires a Ray runtime; see the sketch after this list):
    python experiments/run_inference_demo.py
  5. Dry-run the trainer on the sample rollouts:
    ./experiments/run_training.sh
  6. Run the DeepSpeed MoE trainer (requires NVIDIA GPUs with TransformerEngine and DeepSpeed):
    deepspeed --num_gpus=8 trainer/train_deepspeed.py --rollouts /data/rollouts.jsonl
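
For reference, the demo in step 4 can also be driven programmatically. The sketch below shows the general shape of fanning rollouts out across Ray actors; RolloutWorker and run_episode are hypothetical names for illustration, not the actual classes in inference/serve.py or experiments/run_inference_demo.py.

    # Minimal Ray fan-out sketch; RolloutWorker and run_episode are illustrative only.
    import ray

    @ray.remote
    class RolloutWorker:
        def __init__(self, env_address: str):
            self.env_address = env_address  # gRPC address of an envd instance

        def run_episode(self, task_id: int) -> dict:
            # Execute a tool plan against the environment and return one rollout record.
            return {"task_id": task_id, "env": self.env_address, "return": 0.0}

    def main() -> None:
        ray.init()
        workers = [RolloutWorker.remote(f"localhost:{50051 + i}") for i in range(4)]
        futures = [worker.run_episode.remote(i) for i, worker in enumerate(workers)]
        for rollout in ray.get(futures):
            print(rollout)

    if __name__ == "__main__":
        main()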

Testing & Continuous Integration

  1. Install lightweight test dependencies:
    pip install -r requirements-test.txt
  2. Run the pytest suite:
    pytest
  3. GitHub Actions workflow: .github/workflows/ci.yml executes the same test suite on pushes and pull requests.

Firecracker Workflow

  1. Build the base image and import it into Ignite:
    ./scripts/firecracker/build_base.sh
  2. Create a snapshot template per task family:
    ./scripts/firecracker/create_template.sh
  3. Launch disposable microVMs for batched rollouts:
    COUNT=50 SEED_REPO=./seed_repo ./scripts/firecracker/launch_envs.sh

Training Notes

  • trainer/train.py uses a configurable PPO loop with checkpoint emission every 50 steps.
  • Replace the placeholder PolicyModel with a Megatron-Core/DeepSpeed MoE transformer initialized with TransformerEngine FP8 kernels to enable MXFP8 microscaling.
  • Rollout logs (JSONL) should include obs, actions, logprob, return, advantage, and metric payloads; see experiments/sample_rollouts.jsonl for the expected schema, and the sketch of one record after this list.
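
A hedged sketch of one rollout record, assuming the field names above map directly onto JSON keys (the metrics key name in particular is an assumption; experiments/sample_rollouts.jsonl remains the authoritative schema):

    # Illustrative rollout record; key names follow the fields listed above,
    # values are placeholders.
    import json

    record = {
        "obs": ["<file diff>", "<lint output>"],       # observations from tool calls
        "actions": ["edit", "lint"],                   # tool actions taken by the policy
        "logprob": -2.31,                              # log-probability of the sampled actions
        "return": 1.0,                                 # episode return
        "advantage": 0.42,                             # PPO advantage estimate
        "metrics": {"steps": 7, "wall_time_s": 12.8},  # auxiliary metric payload
    }

    # One JSON object per line, matching the JSONL layout of experiments/sample_rollouts.jsonl.
    with open("rollouts.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")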

Roadmap

  • Integrate semantic code search using a production-grade embedding model.
  • Replace the stub sampler with a production vLLM deployment and wire in live checkpoints.
  • Wire reward streaming to an external registry (e.g., ClickHouse + S3 checkpoint sync) for online PPO.
