Refinery

Iteratively reach consensus across multiple AI models

Getting Started

As a CLI

cargo install --path crates/refinery_cli

Requires Rust 1.85+.

As a Library

Add to your Cargo.toml

[dependencies]
refinery_core = "0.1"

Quick Start

Set up credentials for at least one provider (see Credentials)

refinery "What are the three most impactful breakthroughs in physics?" \
  --models claude,codex,gemini

Models propose, review, refine, and repeat until consensus.

CLI Usage

Models

Pass models as a comma-separated list

refinery "your prompt" --models claude,gemini

Available models:

Alias	Resolves to
`claude`	claude-opus-4-6, effort: high
`codex`	gpt-5.4, reasoning: xhigh
`gemini`	gemini-3.1-pro-preview

Any claude-* value (e.g., claude-sonnet) is passed through to Anthropic. Any codex-* value (e.g., codex-o3) overrides the underlying model.

Options

Set the convergence threshold

refinery "prompt" --models claude,codex --threshold 9.0

Limit the number of rounds

refinery "prompt" --models claude,codex --max-rounds 3

Set per-call timeout (seconds)

refinery "prompt" --models claude,codex --timeout 180

Limit concurrent API calls

refinery "prompt" --models claude,codex --max-concurrent 4

Output Formats

Output is plain text by default. Get JSON for programmatic use

refinery "prompt" --models claude,codex --output-format json

{
  "status": "converged",
  "winner": { "model_id": "claude-opus-4-6", "answer": "..." },
  "final_round": 2,
  "strategy": "vote-threshold",
  "all_answers": [{ "model_id": "...", "answer": "...", "mean_score": 9.5 }],
  "metadata": { "total_rounds": 2, "total_calls": 12, "elapsed_ms": 45000 }
}

Dry Run

Estimate API call count without running

refinery "prompt" --models claude,codex,gemini --dry-run

File Input

Pass one or more files with -f/--file (repeatable, 1 MB total)

# Files as the subject — no text prompt needed
refinery --file src/auth.rs --file src/crypto.rs --models claude,codex,gemini

# Files with an instruction prompt
refinery "review these for security issues" --file src/auth.rs --file src/lib.rs --models claude,codex

# Combine stdin instruction with a file
echo "what does this do?" | refinery - --file src/main.rs --models claude,gemini

File contents are wrapped in nonce-tagged blocks (<file-{nonce} path="...">) so models know which content came from where. Non-UTF-8 files and files exceeding the 1 MB budget are rejected with a clear error before any API calls are made.

Reading from Stdin

Pipe a prompt from another command (max 1 MB)

cat question.txt | refinery - --models claude,codex

Verbose and Debug

refinery "prompt" --models claude,codex --verbose  # per-round progress
refinery "prompt" --models claude,codex --debug    # raw CLI invocations

Exit Codes

Code	Meaning
0	Converged or single model
1	Error or cancellation
2	Max rounds exceeded
3	Insufficient models
4	Config or input error

Library Usage

Basic

Run the full consensus loop

use std::time::Duration;
use refinery_core::{Engine, EngineConfig, ModelId, VoteThreshold};

let config = EngineConfig::new(
    vec![ModelId::new("model-a"), ModelId::new("model-b")],
    5,    // max_rounds
    8.0,  // threshold
    2,    // stability_rounds
    Duration::from_secs(120),
    0,    // max_concurrent (0 = unlimited)
)?;

let providers = vec![/* your Arc<dyn ModelProvider> instances */];
let strategy = Box::new(VoteThreshold::new(8.0, 2));
let engine = Engine::new(providers, strategy, config);

let outcome = engine.run("What is the capital of France?").await?;
println!("{}: {}", outcome.winner, outcome.answer);

Round-by-Round

Step through rounds for fine-grained control

use refinery_core::ClosingDecision;

let mut session = engine.start("prompt").await?;

loop {
    let round = session.next_round().await?;
    println!("Round {}: {:?}", round.round, round.closing_decision);

    if matches!(round.closing_decision, ClosingDecision::Converged { .. }) {
        break;
    }
}

let outcome = session.finalize();

Inject overrides between rounds

use refinery_core::RoundOverrides;

let overrides = RoundOverrides {
    additional_context: Some("Focus on recent developments".into()),
    ..Default::default()
};
let round = session.next_round_with(overrides).await?;

Custom Providers

Implement the ModelProvider trait

use async_trait::async_trait;
use refinery_core::{ModelId, ModelProvider, Message, ProviderError};

#[derive(Debug)]
struct MyProvider { model_id: ModelId }

#[async_trait]
impl ModelProvider for MyProvider {
    async fn send_message(&self, messages: &[Message]) -> Result<String, ProviderError> {
        // Call your model here
        Ok("response".to_string())
    }

    fn model_id(&self) -> &ModelId {
        &self.model_id
    }
}

Cost Estimation

Estimate API calls without running

let estimate = Engine::estimate(&config);
println!("{} calls/round, {} total", estimate.calls_per_round, estimate.total_calls);

Credentials

Set credentials via environment variables. You need at least one provider.

Copy .env.example to .env and fill in your credentials:

cp .env.example .env

Anthropic (Claude)

API Key (pay-per-use) — set ANTHROPIC_API_KEY:

Create an account at console.anthropic.com
Go to Settings → API Keys
Click Create Key, give it a name, and copy the value

ANTHROPIC_API_KEY=sk-ant-api03-...

Subscription (Claude Pro/Max) — set CLAUDE_CODE_OAUTH_TOKEN:

Install the Claude CLI: npm install -g @anthropic-ai/claude-code
Run claude setup-token and follow the prompts — this generates a long-lived (~1 year) token
Copy the token it outputs

CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-...

OpenAI (Codex)

API Key (pay-per-use) — set OPENAI_API_KEY:

Create an account at platform.openai.com
Go to Dashboard → API Keys (direct link)
Click Create new secret key, give it a name, and copy the value

OPENAI_API_KEY=sk-...

Alternative — set CODEX_API_KEY:

The Codex CLI also accepts CODEX_API_KEY for non-interactive (codex exec) mode. Same key format as OPENAI_API_KEY.

CODEX_API_KEY=sk-...

Google (Gemini)

API Key (Google AI Studio) — set GEMINI_API_KEY:

Go to Google AI Studio
Sign in with your Google account
Click Create API Key, select a Google Cloud project (one will be created if needed), and copy the value

GEMINI_API_KEY=AI...

Alternative (Google Cloud) — set GOOGLE_API_KEY:

If you already have a Google Cloud API key with the Generative Language API enabled, you can use it directly.

GOOGLE_API_KEY=AI...

AWS Bedrock

Coming soon — for accessing Claude and other models via AWS Bedrock.

Google Cloud (Vertex AI)

Coming soon — for accessing Gemini via Vertex AI.

How It Works

ConVerge runs a 4-phase loop until convergence or max rounds:

Propose — each model independently answers the prompt
Evaluate — each model reviews and scores every other model's answer (1–10)
Refine — each model improves its answer based on peer feedback
Close — check if the top-scoring model meets the threshold and has been stable

Models are anonymized during evaluation (shuffled labels A, B, C…) to reduce bias. Self-scores are excluded.

Convergence Criterion

The default VoteThreshold strategy converges when:

The top model's mean score ≥ threshold (default 8.0), and
The same model has led for stability_rounds consecutive rounds (default 2)

Cost per Round

Each round makes N² + N API calls (e.g., 3 models = 12 calls). Use --dry-run to estimate.

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features

To get started with development:

git clone https://github.com/Ligthless-Labs/refinery.git
cd refinery
cargo test --workspace

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.github/workflows		.github/workflows
crates		crates
docs		docs
todos		todos
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Refinery

Getting Started

As a CLI

As a Library

Quick Start

CLI Usage

Models

Options

Output Formats

Dry Run

File Input

Reading from Stdin

Verbose and Debug

Exit Codes

Library Usage

Basic

Round-by-Round

Custom Providers

Cost Estimation

Credentials

Anthropic (Claude)

OpenAI (Codex)

Google (Gemini)

AWS Bedrock

Google Cloud (Vertex AI)

How It Works

Convergence Criterion

Cost per Round

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages