A four-layer in-context-learning data annotation pipeline that produces verifiable, on-chain attested labels — built as a single Next.js 15 deployment, ready for Vercel.
CascadeAnnote turns raw text into labeled, attested intelligence. Every annotation is the output of an explicit four-layer pipeline — retrieval, chain-of-thought, self-consistency vote, adaptive fallback — and every result produces a content-addressed receipt that can be anchored to decentralized storage and a public chain. The entire system is implemented in TypeScript and deploys as a single Next.js app.
- Why CascadeAnnote
- The four-layer cascade
- System architecture
- Quickstart (local)
- Deploying to Vercel
- Project structure
- API reference
- Configuration
- Decentralized stack integration
- Bringing your own data
- Inference providers
- Frontend pages
- Design system
- Roadmap
- License
The default workflow for AI annotation is to pipe text into a single model and trust the output. That approach is fragile — it provides no evidence, no calibration, and no recovery when the model is uncertain. CascadeAnnote treats every label as a falsifiable claim.
- Evidence-first. Every label is supported by retrieved exemplars and a structured reasoning trace.
- Calibrated. Confidence is computed from a self-consistency vote blended with retrieval similarity.
- Recoverable. Low-confidence runs trigger a deeper, cooler re-vote with a wider evidence window.
- Verifiable. Each result is hashed (sha-256), uploaded to decentralized storage, and tied to an agent identity — making third-party verification a side effect of how the system runs, not a feature you bolt on later.
- Provider-agnostic. Local heuristic, OpenAI, or 0G Compute — all behind a single inference interface.
```
INPUT TEXT
    │
    ▼
L1 ── Dynamic ICL Retrieval ─── TF-IDF + cosine over labeled corpus
    │   top-K labeled exemplars
    ▼
L2 ── Chain-of-Thought Prompt ── 5-step structured reasoning
    │   provider-agnostic prompt
    ▼
L3 ── Self-Consistency Vote ──── 5× inference at temperatures [0.3, 0.7]
    │   confidence ≥ threshold?
    ▼
L4 ── Adaptive Fallback ──────── widen window, cool temperatures, re-vote
    │
    ▼
{ label, confidence, examples, vote, trace, storage_receipt }
```
Each layer is independently auditable, replaceable, and emits a typed trace recorded in the response.
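The result object at the bottom of the diagram can be described with a TypeScript interface along these lines. This is an illustrative sketch reconstructed from the example response shown later in this README; the canonical definitions live in `lib/types.ts` and may differ in detail:

```typescript
// Illustrative shape of a cascade result (assumption: field names taken
// from the example /api/annotate response; lib/types.ts is authoritative).
interface RetrievedExample {
  id: string;
  text: string;
  label: string;
  similarity: number; // cosine similarity in [0, 1]
}

interface VoteSummary {
  label: string;
  confidence: number;
  votes: Record<string, number>; // label -> vote count
  runs: number;
}

interface AnnotationResult {
  id: string;
  text: string;
  label: string;
  confidence: number;    // blended vote + retrieval confidence
  fallbackUsed: boolean; // true when Layer 4 re-voted
  vote: VoteSummary;
  examples: RetrievedExample[];
  trace: unknown[];      // one typed entry per layer
  storage: {
    provider: string;
    rootHash: string;
    txHash: string;
    size: number;
  };
}

// A minimal value conforming to the interface, mirroring the example response.
const sample: AnnotationResult = {
  id: "abc123def456",
  text: "Best purchase I have made all year, highly recommend.",
  label: "positive",
  confidence: 0.93,
  fallbackUsed: false,
  vote: { label: "positive", confidence: 0.93, votes: { positive: 5 }, runs: 5 },
  examples: [{ id: "s2", text: "...", label: "positive", similarity: 0.71 }],
  trace: [],
  storage: { provider: "local", rootHash: "0x…", txHash: "0x…", size: 4128 },
};
```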
| Layer | Module | Responsibility |
|---|---|---|
| L1 | `lib/cascade/retriever.ts` | TF-IDF index over uni+bi-grams, cosine top-K retrieval |
| L2 | `lib/cascade/prompt.ts` | Builds a 5-step CoT prompt, exposes a 7-strategy label extractor |
| L3 | `lib/cascade/inference.ts` + `voter.ts` | Local ICL classifier, OpenAI, or 0G Compute; majority vote |
| L4 | `lib/cascade/fallback.ts` | Widens the example window and re-votes at colder temperatures |
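Layer 1 can be sketched in a few dozen lines. The following is a minimal, self-contained version assuming a plain TF-IDF bag of uni- and bi-grams with cosine scoring; the real retriever in `lib/cascade/retriever.ts` may tokenize and weight differently:

```typescript
type Doc = { id: string; text: string; label: string };

// Unigrams + bigrams, lowercased.
function grams(text: string): string[] {
  const toks = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return [...toks, ...toks.slice(0, -1).map((t, i) => `${t} ${toks[i + 1]}`)];
}

class TfIdfIndex {
  private df = new Map<string, number>();
  constructor(private docs: Doc[]) {
    // Document frequency per gram, counted once per document.
    for (const d of docs) {
      for (const g of new Set(grams(d.text))) {
        this.df.set(g, (this.df.get(g) ?? 0) + 1);
      }
    }
  }

  // TF-IDF vector for any text, using the corpus IDF (smoothed).
  private vec(text: string): Map<string, number> {
    const v = new Map<string, number>();
    for (const g of grams(text)) v.set(g, (v.get(g) ?? 0) + 1);
    const n = this.docs.length;
    for (const [g, tf] of v) {
      v.set(g, tf * Math.log((n + 1) / ((this.df.get(g) ?? 0) + 1)));
    }
    return v;
  }

  private static cosine(a: Map<string, number>, b: Map<string, number>): number {
    let dot = 0, na = 0, nb = 0;
    for (const [g, x] of a) { na += x * x; const y = b.get(g); if (y) dot += x * y; }
    for (const y of b.values()) nb += y * y;
    return dot ? dot / Math.sqrt(na * nb) : 0;
  }

  // Score the query against every labeled exemplar, return the top K.
  topK(query: string, k = 5): Array<Doc & { similarity: number }> {
    const q = this.vec(query);
    return this.docs
      .map((d) => ({ ...d, similarity: TfIdfIndex.cosine(q, this.vec(d.text)) }))
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, k);
  }
}
```

Rebuilding the index on every corpus change is cheap at this scale, which is what makes the "rebuilds in milliseconds" claim in the dataset section plausible.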
```
┌─────────────────────────────────────────────────────────────────────────┐
│                           Browser (React 19)                            │
│  /  /annotate  /pipeline  /datasets  /storage  /agent  /results  /docs  │
└──────────┬──────────────────────────────────────────────────────────────┘
           │ fetch
┌──────────▼──────────────────────────────────────────────────────────────┐
│                         Next.js 15 API routes                           │
│  /api/annotate   /api/annotate/batch   /api/datasets                    │
│  /api/pipeline/* /api/results  /api/storage  /api/agent                 │
└──────────┬──────────────────────────────────────────────────────────────┘
           │
┌──────────▼──────────────────────────────────────────────────────────────┐
│                      Cascade engine (TypeScript)                        │
│  Retriever ─► PromptBuilder ─► InferenceEngine ─► Voter ─► Fallback     │
└──────────┬──────────────────────────────────────────────────────────────┘
           │
┌──────────▼─────────────┐ ┌──────────────────────┐ ┌──────────────────┐
│ 0G Storage (receipts)  │ │ 0G Compute (opt.)    │ │ 0G Chain (DID)   │
└────────────────────────┘ └──────────────────────┘ └──────────────────┘
```
- Single deployment — no separate API server, no database. Stateless serverless functions plus a process-level cache.
- Pure TypeScript engine — runs without GPU, transformers, or any heavy ML dependency.
- Pluggable providers — swap inference and storage adapters via environment variables.
```bash
# 1. Install dependencies
npm install

# 2. Optionally copy the env template (everything is optional)
cp .env.example .env.local

# 3. Run the dev server
npm run dev
```

Then open http://localhost:3000.
The app boots with a 60-row seed corpus across four label families (sentiment, topic, intent, toxicity). You can immediately:
- run a single annotation at `/annotate` and watch the cascade trace fill in real time
- inspect storage receipts at `/storage`
- upload your own corpus at `/datasets`
```bash
git init
git add .
git commit -m "feat: cascadeannote v1"
git branch -M main
git remote add origin git@github.com:<your-handle>/cascadeannote.git
git push -u origin main
```

- Open https://vercel.com/new.
- Import the repository.
- The framework preset (Next.js) is auto-detected.
- Optionally add the env vars from `.env.example` in the project settings.
- Deploy.
`vercel.json` already pins `iad1` and disables caching on `/api/*` so receipts always reflect the latest run.
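A `vercel.json` along these lines produces that behavior; this is a sketch of the described configuration, and the repository's actual file may differ in detail:

```json
{
  "regions": ["iad1"],
  "headers": [
    {
      "source": "/api/(.*)",
      "headers": [{ "key": "Cache-Control", "value": "no-store" }]
    }
  ]
}
```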
```
.
├── app/                    Next.js App Router
│   ├── api/                API routes (serverless functions)
│   │   ├── annotate/       POST single + POST batch
│   │   ├── pipeline/       GET status, GET stats
│   │   ├── results/        GET, DELETE
│   │   ├── datasets/       GET, POST
│   │   ├── storage/        GET, POST
│   │   ├── agent/          GET
│   │   └── health/         GET
│   ├── annotate/           Studio page + client component
│   ├── pipeline/           Architecture deep-dive
│   ├── datasets/           CSV upload + activation
│   ├── storage/            Receipt explorer
│   ├── agent/              Agent identity + reputation
│   ├── results/            Aggregated telemetry
│   ├── docs/               Documentation
│   └── about/              Manifesto, principles, stack, roadmap
├── components/             Shared UI primitives
├── lib/
│   ├── cascade/            Pipeline engine (4 layers)
│   ├── og/                 0G Storage / Compute / Agent adapters
│   ├── data/seed.ts        60-example bootstrap corpus
│   ├── store.ts            Process-wide singleton + stats aggregation
│   ├── types.ts            Shared TS types
│   └── utils.ts            Formatting + helpers
├── public/                 Static assets
├── tailwind.config.ts      Design tokens
├── next.config.mjs         Security headers + bundle hints
├── vercel.json             Region pin + API caching policy
└── README.md               This file
```
All routes are JSON in / JSON out. No authentication is required by default.
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/annotate` | Annotate a single text. Body: `{ text, candidates? }`. |
| `POST` | `/api/annotate/batch` | Annotate up to 50 texts in one call. |
| `GET` | `/api/pipeline/status` | Returns corpus size, label set, and active config. |
| `GET` | `/api/pipeline/stats` | Aggregated metrics (avg confidence, latency, fallback rate). |
| `GET` | `/api/results?limit=25` | Recent annotation results, newest first. |
| `DELETE` | `/api/results` | Clears the in-memory result history. |
| `GET` | `/api/datasets` | List active and uploaded corpora. |
| `POST` | `/api/datasets` | Upload a corpus and (optionally) activate it. |
| `GET` | `/api/storage` | List storage receipts. |
| `POST` | `/api/storage` | Upload an arbitrary JSON payload to storage. |
| `GET` | `/api/agent` | Returns the deployment's agent DID, public key, reputation. |
| `GET` | `/api/health` | Health probe with version and 0G connectivity. |
```bash
curl -X POST http://localhost:3000/api/annotate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Best purchase I have made all year, highly recommend.",
    "candidates": ["positive", "neutral", "negative"]
  }'
```

Response (truncated):

```json
{
  "id": "abc123def456",
  "text": "Best purchase I have made all year, highly recommend.",
  "label": "positive",
  "confidence": 0.93,
  "fallbackUsed": false,
  "vote": { "label": "positive", "confidence": 0.93, "votes": { "positive": 5 }, "runs": 5 },
  "examples": [{ "id": "s2", "text": "...", "label": "positive", "similarity": 0.71 }],
  "trace": [/* 4 entries, one per layer */],
  "storage": { "provider": "local", "rootHash": "0x…", "txHash": "0x…", "size": 4128 }
}
```
Every environment variable is optional. With no env at all, the app runs in deterministic local mode and remains fully functional.
| Key | Purpose |
|---|---|
| `NEXT_PUBLIC_OG_NETWORK` | Network for explorer link (`mainnet` \| `testnet` \| `local`). |
| `OG_RPC_URL` | 0G Chain JSON-RPC endpoint. |
| `OG_INDEXER_URL` | 0G Storage indexer URL — enables real uploads. |
| `OG_PRIVATE_KEY` | Signer for storage uploads (optional). |
| `OG_CONTRACT_ADDRESS` | Annotation registry contract for receipts. |
| `OG_AGENT_ID` | Override the auto-derived agent DID. |
| `INFERENCE_PROVIDER` | `openai` or `og-compute` (empty = local classifier). |
| `OPENAI_API_KEY` | Used when `INFERENCE_PROVIDER=openai`. |
| `OPENAI_MODEL` | Defaults to `gpt-4o-mini`. |
| `ANNOTATION_TOP_K` | Number of retrieved exemplars (default 5). |
| `ANNOTATION_VOTE_RUNS` | Self-consistency runs (default 5). |
| `ANNOTATION_FALLBACK_THRESHOLD` | Confidence floor before L4 engages (default 0.6). |
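The last three knobs interact in Layers 3 and 4. A minimal sketch of that interaction, assuming a simple majority vote and an illustrative 0.7/0.3 blend of vote share and retrieval similarity (the actual weights and logic live in `lib/cascade/voter.ts` and `fallback.ts`):

```typescript
type RunResult = { label: string };

// Majority vote over the self-consistency runs (default: 5 runs).
// Ties resolve arbitrarily in this sketch.
function majorityVote(runs: RunResult[]): { label: string; share: number } {
  const counts = new Map<string, number>();
  for (const r of runs) counts.set(r.label, (counts.get(r.label) ?? 0) + 1);
  const [label, count] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
  return { label, share: count / runs.length };
}

// Blend the vote share with the mean similarity of the retrieved
// exemplars. The 0.7 / 0.3 split is an assumption for illustration.
function blendConfidence(voteShare: number, similarities: number[]): number {
  const meanSim = similarities.length
    ? similarities.reduce((a, b) => a + b, 0) / similarities.length
    : 0;
  return 0.7 * voteShare + 0.3 * meanSim;
}

// Layer 4 engages when confidence falls below the configured floor
// (ANNOTATION_FALLBACK_THRESHOLD, default 0.6).
function needsFallback(confidence: number, threshold = 0.6): boolean {
  return confidence < threshold;
}
```

For example, a 4-of-5 vote with mean exemplar similarity 0.7 blends to 0.7 × 0.8 + 0.3 × 0.7 = 0.77, comfortably above the 0.6 default, so no fallback fires.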
CascadeAnnote ships with first-class adapters for the 0G modular infrastructure:
- 0G Storage (`lib/og/storage.ts`) — every annotation is sha-256 content-addressed, then uploaded to the configured indexer (`OG_INDEXER_URL`). The response carries a verifiable `rootHash`, a `txHash`, and an explorer URL when available. Without keys, the adapter falls back to a deterministic local receipt so the UX is identical in development.
- 0G Compute (`lib/og/compute.ts`) — when `INFERENCE_PROVIDER=og-compute`, Layer-3 inference is routed through a 0G Compute endpoint instead of the local classifier or OpenAI.
- 0G Chain (`lib/og/agent.ts`) — derives a stable `did:0g:<hash>` agent identity per deployment. Receipts can be anchored under this DID via `OG_CONTRACT_ADDRESS`.
- Agent ID — the agent's reputation is computed from the rolling mean confidence of its annotations. Public via `GET /api/agent`.
The adapters are intentionally thin and pluggable: implement a different decentralized provider by swapping the four files in `lib/og/`.
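Content addressing and DID derivation can be sketched with Node's built-in crypto module. The hash construction below is illustrative (canonical JSON encoding and the 20-byte DID truncation are assumptions); the exact encoding lives in `lib/og/`:

```typescript
import { createHash } from "node:crypto";

// Content-address an annotation payload: identical payloads always
// produce the identical rootHash, so any third party can re-derive it.
function rootHash(payload: unknown): string {
  const canonical = JSON.stringify(payload); // assumption: plain JSON encoding
  return "0x" + createHash("sha256").update(canonical).digest("hex");
}

// Derive a stable did:0g:<hash> identity from deployment-specific
// material. Truncating the digest to 20 bytes is an assumption.
function agentDid(seed: string): string {
  const digest = createHash("sha256").update(seed).digest("hex");
  return `did:0g:${digest.slice(0, 40)}`;
}
```

Because both functions are pure, verification is just re-computation: hash the payload you received and compare against the receipt's `rootHash`.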
The retriever is deterministic and rebuilds in milliseconds. Upload a labeled CSV at `/datasets`:

```csv
text,label
"This product changed my workflow",positive
"Awful quality, would not recommend",negative
"It is okay, nothing special",neutral
```

Or call the API directly:
```bash
curl -X POST http://localhost:3000/api/datasets \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Reviews v1",
    "rows": [
      { "text": "loved it", "label": "positive" },
      { "text": "broke immediately", "label": "negative" }
    ],
    "activate": true
  }'
```

The studio at `/annotate` automatically picks up the new label set on the next request.
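If your labeled data starts as a CSV file, a small helper can turn it into the `rows` payload above. This is a hypothetical helper, not part of the repo, and it handles only the simple quoting shown in the example (not full RFC 4180):

```typescript
type Row = { text: string; label: string };

// Parse a two-column `text,label` CSV into upload rows. Supports
// double-quoted text fields with embedded commas; nothing fancier.
function csvToRows(csv: string): Row[] {
  return csv
    .trim()
    .split(/\r?\n/)
    .slice(1) // drop the `text,label` header line
    .map((line) => {
      const m =
        line.match(/^"(.*)",\s*(.+)$/) ?? // quoted text field
        line.match(/^(.*),\s*(.+)$/);     // bare text field
      if (!m) throw new Error(`unparsable line: ${line}`);
      return { text: m[1], label: m[2].trim() };
    });
}
```

The resulting array can be posted as the `rows` field of the `/api/datasets` body shown above.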
| Provider | When to use | Setup |
|---|---|---|
| `local` (default) | Demos, judges, no-secrets deployments. Always available. | None |
| `openai` | Production runs requiring strong reasoning. | `INFERENCE_PROVIDER=openai`, `OPENAI_API_KEY=…` |
| `og-compute` | Verifiable on-network inference. | `INFERENCE_PROVIDER=og-compute`, `OG_INDEXER_URL=…` |
The local provider is a temperature-perturbed cosine classifier: it scores each candidate label by the average similarity of its top supporting exemplars and softmax-samples a winner under the requested temperature. It is deterministic per (query, temperature) seed, runs in under a millisecond, and requires zero external services.
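The sampling step can be sketched as follows; the seeding scheme here (mulberry32 over a numeric seed) is an assumption for illustration, not the repo's exact one:

```typescript
// Scores -> probabilities under a temperature. Lower temperature
// sharpens the distribution toward the highest-scoring label.
function softmax(scores: number[], temperature: number): number[] {
  const scaled = scores.map((s) => s / Math.max(temperature, 1e-6));
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max)); // subtract max for stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Tiny deterministic PRNG (mulberry32) so a given (query, temperature)
// seed always reproduces the same draw.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Sample a label: scores are e.g. the mean cosine similarity of each
// candidate's top supporting exemplars.
function sampleLabel(
  labels: string[],
  scores: number[],
  temperature: number,
  seed: number
): string {
  const probs = softmax(scores, temperature);
  let r = mulberry32(seed)();
  for (let i = 0; i < labels.length; i++) {
    r -= probs[i];
    if (r <= 0) return labels[i];
  }
  return labels[labels.length - 1];
}
```

At the cascade's colder temperatures the softmax collapses toward the argmax, which is why the L4 fallback's cooler re-vote behaves more conservatively than the initial L3 vote.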
| Route | Purpose |
|---|---|
| `/` | Hero, pipeline diagram, capabilities, decentralized-stack integration. |
| `/annotate` | Live studio. Input box, full cascade trace, vote breakdown, receipt. |
| `/pipeline` | Per-layer architecture deep-dive with input/output contracts. |
| `/datasets` | Upload a corpus, activate it, browse uploaded corpora. |
| `/storage` | Receipt explorer with root hash, tx hash, explorer link. |
| `/agent` | Agent identity, public key, reputation, run count. |
| `/results` | Aggregated telemetry — confidence, latency, fallback rate, distribution. |
| `/docs` | Full API reference, configuration, deployment instructions. |
| `/about` | Manifesto, design principles, stack, roadmap. |
- Typography. Inter for UI, JetBrains Mono for code/labels, Instrument Serif for display headlines.
- Palette. Ink (`#05050A` → `#262633`), accent mint (`#00FFB2`), plum (`#B388FF`), amber (`#FFB547`), rose (`#FF4D6D`).
- Surfaces. Glass cards over a dot grid + radial gradient mesh. Sub-pixel hairline dividers.
- Motion. Framer Motion for trace fills and result transitions; Tailwind keyframes for shimmer and float.
- Iconography. Lucide.
Design tokens live in `tailwind.config.ts`, with global tokens in `app/globals.css`.
- Bring-your-own embeddings (OpenAI, Cohere, local).
- Active learning loop — surface low-confidence runs for next-to-label.
- Multi-agent voting across multiple DIDs.
- Sealed inference adapter for confidential annotation.
- Per-result diff viewer + corpus drift detection.
MIT. See LICENSE (TBA).