NightShift

The agent runtime that slashes token spend for autonomous AI research.

AI Scientist costs $15/paper. AI-Researcher burns 2-5M tokens per task. AutoResearch hits context limits by iteration 60.

NightShift fixes the engine, not the agent. Run any AI research pipeline overnight on your laptop. Pay in time, not tokens.

The Problem

Every autonomous research system has the same architecture:

[Read 100 papers] --> [Send everything to LLM] --> [Get output] --> [Repeat]

This is O(n) token cost where n = input size. We found five structural waste patterns across AI Scientist, AI-Researcher, and AutoResearch that account for 60-80% of all token spend:

#	Pattern	Example	Waste
1	History accumulation	Full conversation replayed every turn, never compressed	500K-1.5M tokens/session
2	Redundant context	Same code/templates/fewshot re-sent with every call	2M tokens across 50 ideas
3	Raw data in prompts	500K-2M JSON agent logs dumped into LLM input	95% unnecessary
4	Uniform model routing	Frontier model used for simple JSON queries	10-20x overspend
5	Zero cost awareness	No tracking, no budgets, no ROI measurement	Unoptimizable

NightShift is a runtime layer that sits between your agent and the LLM API, eliminating these patterns automatically.

How It Works

Without NightShift:
  Agent sends 10M tokens to API --> $15/task

With NightShift:
  Agent sends 10M tokens
    --> Progressive compression (heavy local reduction before the API)
    --> Confidence gate (skip API if local models handle it)
    --> Content deduplication (don't re-send what hasn't changed)
    --> Budget-optimal scheduling (spend $ where ROI is highest)
    --> API receives 1.5K tokens --> $1.50/task

Illustrative example; actual reduction depends on the workload. No benchmark numbers are published yet (see Benchmarks).

Core Components

1. Progressive Compression Pipeline

Local small models (sub-300M, run on any laptop) process raw input through stages:

10M tokens  -->  Parse (Granite-Docling 258M)
 2M tokens  -->  Extract (GLiNER 90M)
100K tokens -->  Embed + Cluster (Jina v5 Nano 239M)
 20K tokens -->  Summarize (T5-small 60M)
  4K tokens -->  Rank + Deduplicate (MiniLM 22M)
  1.5K tokens -> API

Each stage is free (local inference). The API only sees the final distillate.

2. Confidence-Gated Router

Not every decision needs the API. The router estimates local confidence before calling out:

High confidence locally (entity extraction, classification, dedup) --> skip API
Low confidence (synthesis, hypothesis generation, strategic decisions) --> call API
Ambiguous (novelty evaluation, quality judgment) --> call API with compressed context

Skips the API entirely for the calls local models can handle confidently.

3. Content Deduplication Layer

Tracks what content has already been sent to the API. Never re-sends:

Static templates, fewshot examples, code that hasn't changed
Conversation history that's already been summarized
Documents that have already been processed

Uses content hashing + diff-based updates.

4. Sliding Window History Manager

Replaces unbounded conversation history with:

Active window: Last N turns in full detail
Compressed archive: Older turns summarized periodically
Persistent knowledge: Facts extracted and stored in vector DB

Prevents the context accumulation that crashes AutoResearch at iteration 60.

5. Token Economics Engine

Every API call is tracked:

{
  "call_id": "synth_042",
  "cost_usd": 0.45,
  "input_tokens": 1847,
  "new_insights": 4,
  "cost_per_insight": 0.112,
  "confidence_before": 0.6,
  "confidence_after": 0.85
}

UCB1 bandit algorithm allocates budget optimally across:

Explore: New research directions (high variance)
Deepen: Expand promising findings (medium variance)
Synthesize: Connect and output (low variance)

6. Persistent Knowledge Graph

Research doesn't start from zero. Cross-session memory:

Facts, entities, and relations persist across runs
New research queries existing knowledge first
Repeated work reuses prior results instead of recomputing from scratch

Architecture

┌──────────────────────────────────────────────────┐
│                   YOUR AGENT                      │
│        (AI Scientist, custom, any agent)          │
└──────────────────────┬───────────────────────────┘
                       │ raw LLM calls
┌──────────────────────▼───────────────────────────┐
│                  NIGHTSHIFT RUNTIME               │
│                                                   │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────┐ │
│  │ Progressive  │  │  Confidence  │  │ Content │ │
│  │ Compression  │  │    Gate      │  │  Dedup  │ │
│  └──────┬──────┘  └──────┬───────┘  └────┬────┘ │
│         └────────────────┼────────────────┘      │
│                          │                        │
│  ┌───────────────────────▼──────────────────────┐│
│  │          Token Economics Engine               ││
│  │  Budget tracker | Bandit scheduler | ROI log  ││
│  └───────────────────────┬──────────────────────┘│
│                          │                        │
│  ┌───────────────────────▼──────────────────────┐│
│  │     Sliding Window History Manager            ││
│  │  Active window | Compressed archive | KG      ││
│  └───────────────────────┬──────────────────────┘│
│                          │                        │
│  ┌───────────────────────▼──────────────────────┐│
│  │     Local Model Pool                          ││
│  │  Load/unload on demand | ONNX/GGUF inference  ││
│  │  Granite-Docling | GLiNER | Jina | T5 | MiniLM││
│  └──────────────────────────────────────────────┘│
└──────────────────────┬───────────────────────────┘
                       │ optimized calls only
┌──────────────────────▼───────────────────────────┐
│                   LLM API                         │
│         (Claude, GPT-4, Gemini, local)            │
└──────────────────────────────────────────────────┘

Benchmarks

Benchmarks are not published yet. A harness under benchmarks/ is planned to compare NightShift against AI Scientist, AI-Researcher, and AutoResearch on tokens and cost per task; results will be added here once those runs are complete. Until then, treat the figures in this README as illustrative of the mechanism, not measured results.

Quick Start

pip install nightshift-runtime

# Wrap any LLM client
from nightshift import NightShift

engine = NightShift(
    local_models="auto",        # downloads sub-300M models on first run
    api_budget="$5.00",         # hard cap
    knowledge_db="./research",  # persistent across sessions
)

# Drop-in replacement for your LLM calls
response = engine.complete(
    messages=messages,
    model="claude-sonnet-4-20250514",
    compress=True,              # enable progressive compression
    gate=True,                  # enable confidence gating
)

# Or wrap an existing agent
engine.wrap(your_agent_function, budget="$10", duration="overnight")

Project Structure

nightshift/
├── nightshift/
│   ├── __init__.py
│   ├── engine.py              # Core runtime orchestrator
│   ├── compression/
│   │   ├── pipeline.py        # Progressive compression stages
│   │   ├── dedup.py           # Content deduplication
│   │   └── models.py          # Local model pool manager
│   ├── routing/
│   │   ├── confidence.py      # Confidence-gated router
│   │   ├── gate.py            # API vs local decision logic
│   │   └── model_select.py    # Task-appropriate model selection
│   ├── history/
│   │   ├── window.py          # Sliding window manager
│   │   ├── compressor.py      # History summarization
│   │   └── knowledge.py       # Persistent knowledge graph
│   ├── economics/
│   │   ├── tracker.py         # Token cost/value tracking
│   │   ├── bandit.py          # UCB1 budget allocation
│   │   └── reports.py         # ROI analysis and logging
│   └── agents/
│       ├── research.py        # Reference research agent
│       ├── pdf.py             # Reference PDF analysis agent
│       └── base.py            # Base agent class
├── tests/
├── benchmarks/
│   ├── vs_ai_scientist.py
│   ├── vs_ai_researcher.py
│   └── vs_autoresearch.py
├── docs/
│   ├── architecture.md
│   ├── waste-analysis.md      # The 5 patterns paper
│   └── token-economics.md     # Formal framework
├── pyproject.toml
├── LICENSE                    # MIT
└── README.md

Research Paper

This project is accompanied by a research paper:

"Token Economics: Cost-Optimal Autonomous AI Research via Progressive Compression and Confidence-Gated Routing"

Key contributions:

Formal framework for token economics in multi-model agent systems
Progressive compression pipeline for large local token reduction before the API
Confidence-gated routing that avoids the API calls local models can handle
UCB1-based budget-optimal exploration with convergence guarantees
A methodology for evaluating research quality per dollar across local and API models

Why "NightShift"

Because the best research happens while you sleep. Set your budget, point it at a problem, go to bed. Wake up to results.

Contributing

NightShift is MIT licensed and open to contributions. See CONTRIBUTING.md.

Built By

Santo -- I build the infrastructure most engineers import.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
assets		assets
docs		docs
nightshift		nightshift
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
demo.py		demo.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
research_demo.py		research_demo.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NightShift

The Problem

How It Works

Core Components

1. Progressive Compression Pipeline

2. Confidence-Gated Router

3. Content Deduplication Layer

4. Sliding Window History Manager

5. Token Economics Engine

6. Persistent Knowledge Graph

Architecture

Benchmarks

Quick Start

Project Structure

Research Paper

Why "NightShift"

Contributing

Built By

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NightShift

The Problem

How It Works

Core Components

1. Progressive Compression Pipeline

2. Confidence-Gated Router

3. Content Deduplication Layer

4. Sliding Window History Manager

5. Token Economics Engine

6. Persistent Knowledge Graph

Architecture

Benchmarks

Quick Start

Project Structure

Research Paper

Why "NightShift"

Contributing

Built By

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages