(Dashboard showcasing the live data annotation pipeline)
Annotating long-context documents with smaller LLMs (like Qwen3-4B) often fails due to the lost-in-the-middle effect and context dilution from static few-shot examples.
ContextWeaver solves this by reframing prompt construction as a retrieval problem—applying RAG to In-Context Learning itself.
Key Features:
- ⚡ Dynamic Retrieval: For each document chunk, ChromaDB retrieves the top-3 most semantically relevant examples.
- 🎯 Targeted Prompts: Reduces a 100k-token monolithic prompt into focused ~4,000-token prompts per chunk.
- 🎨 Visual Tracing: A real-time Next.js and React Flow dashboard that animates data flow and provides a 3-column Chunk Inspector for total transparency.
We built the frontend using Next.js 16, React 19, and Tailwind CSS v4, visualizing the pipeline with React Flow. The backend is powered by Python FastAPI, using ChromaDB and sentence-transformers for vector retrieval, and simulating the Qwen3-4B model for data annotation.
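The dynamic retrieval step can be sketched as follows. This is a simplified stand-in: in the real pipeline, sentence-transformers produces the embeddings and ChromaDB performs the nearest-neighbor search; here a toy in-memory cosine-similarity search illustrates the same top-3 selection. The function names, store layout, and 3-d vectors are all illustrative.

```python
# Toy stand-in for the ChromaDB retrieval step: cosine similarity
# over in-memory vectors instead of a real vector database.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_examples(chunk_embedding, example_store, top_k=3):
    """Return the top_k ICL examples most similar to the chunk."""
    scored = sorted(
        example_store,
        key=lambda ex: cosine(chunk_embedding, ex["embedding"]),
        reverse=True,
    )
    return scored[:top_k]

# Toy store of annotated examples with 3-d "embeddings".
store = [
    {"text": "Example A", "embedding": [1.0, 0.0, 0.0]},
    {"text": "Example B", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Example C", "embedding": [0.0, 1.0, 0.0]},
    {"text": "Example D", "embedding": [0.0, 0.0, 1.0]},
]
top = retrieve_examples([1.0, 0.05, 0.0], store, top_k=3)
```

Because only the three nearest examples are injected per chunk, the prompt stays focused regardless of how large the example pool grows.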
```mermaid
graph TB
    subgraph "Next.js Dashboard"
        A["📄 Document Upload"]
        B["🔀 React Flow Pipeline"]
        C["📊 Confidence Heatmap"]
        D["📋 Results Explorer"]
    end
    subgraph "FastAPI Backend"
        E["📑 Semantic Chunker"]
        F["🔍 ICL Example Retriever"]
        G["🧩 Prompt Builder"]
        H["🤖 Qwen3-4B Annotator"]
        I["🔗 Result Merger"]
    end
    subgraph "Data Layer"
        J["ChromaDB — ICL Examples"]
        K["Local FS — Documents"]
        L["JSON — Final Dataset"]
    end
    A -->|"Upload PDF/TXT"| E
    E -->|"Chunk[]"| F
    F -->|"top-3 examples/chunk"| J
    J -->|"similar examples"| G
    G -->|"4k-token prompt"| H
    H -->|"Qwen3-4B API"| I
    I -->|"merged annotations"| L
    E -.->|"progress events"| B
    F -.->|"retrieval events"| B
    H -.->|"annotation events"| B
    I -.->|"confidence data"| C
    L -.->|"final results"| D
```
```mermaid
sequenceDiagram
    participant UI as Next.js Dashboard
    participant API as FastAPI
    participant Chunk as Chunker
    participant VDB as ChromaDB
    participant PB as Prompt Builder
    participant LLM as Qwen3-4B
    participant Merge as Merger
    UI->>API: POST /annotate {document, config}
    API->>Chunk: split(document, chunk_size=2048, overlap=0.15)
    Chunk-->>API: Chunk[] (n chunks)
    loop For each chunk
        API->>VDB: query(chunk.embedding, top_k=3)
        VDB-->>API: ICL examples[]
        API->>PB: build_prompt(chunk, examples, schema)
        PB-->>API: formatted prompt (~4k tokens)
        API->>LLM: generate(prompt, json_mode=true)
        LLM-->>API: {annotations, confidence, reasoning}
        API-->>UI: SSE: chunk_completed {id, status, confidence}
    end
    API->>Merge: merge(all_chunk_results)
    Merge-->>API: unified_dataset.json
    API-->>UI: SSE: pipeline_complete {accuracy_estimate, stats}
```
- FlagOS Open Computing Global Challenge: Submitted to Track 3 — Automatic Data Annotation with Large Models in Long-Context Scenarios. ContextWeaver targets this track directly by making efficient use of Qwen3-4B's context window with focused per-chunk prompts.
Jointly hosted by:
- DoraHacks
- FlagOS Community
- Beijing Academy of Artificial Intelligence (BAAI)
- CCF Open Source Development Technology Committee (ODTC)
We have made running the project as frictionless as possible. You can run it via Docker or locally using Make:
With Docker:
- Clone the repo:
  ```bash
  git clone https://github.com/edycutjong/contextweaver.git
  ```
- Navigate to the directory:
  ```bash
  cd contextweaver
  ```
- Run with Docker Compose:
  ```bash
  docker-compose up --build
  ```
  (The backend runs on port 8000 and the frontend on port 3000.)

Locally with Make:
- Clone the repo:
  ```bash
  git clone https://github.com/edycutjong/contextweaver.git
  ```
- Navigate to the directory:
  ```bash
  cd contextweaver
  ```
- Install dependencies:
  ```bash
  make install
  ```
- Run the app:
  ```bash
  make dev
  ```
Note for Judges: Both options start the FastAPI backend and Next.js frontend concurrently. Simply navigate to http://localhost:3000 in your browser and click "Run Document Annotation" to see the live simulation!
This project is licensed under the MIT License - see the LICENSE file for details.
