
ContextWeaver Banner

ContextWeaver 🚀

Dynamic In-Context Learning Router for Intelligent Data Annotation

Live Demo Pitch Video
Next.js React Tailwind CSS FastAPI Python Qwen Docker GNU Make License: MIT


📸 See it in Action

App Demo (dashboard showcasing the live data annotation pipeline)

💡 The Problem & Solution

Annotating long-context documents with smaller LLMs (like Qwen3-4B) fails due to the lost-in-the-middle phenomenon and context dilution from static few-shot examples.

ContextWeaver solves this by reframing prompt construction as a retrieval problem—applying RAG to In-Context Learning itself.

Key Features:

  • Dynamic Retrieval: For each document chunk, ChromaDB retrieves the top-3 most semantically relevant examples.
  • 🎯 Targeted Prompts: Reduces a 100k-token monolithic prompt into focused ~4,000-token prompts per chunk.
  • 🎨 Visual Tracing: A real-time Next.js and React Flow dashboard that animates data flow and provides a 3-column Chunk Inspector for total transparency.
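The dynamic retrieval step above can be sketched in a few lines. In ContextWeaver the embeddings come from sentence-transformers and the top-3 lookup is a ChromaDB query; the sketch below substitutes a toy bag-of-words embedding and plain cosine similarity so the ranking idea is self-contained — `embed` and `retrieve_top_k` are illustrative names, not the project's actual API.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real pipeline uses sentence-transformers vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(chunk: str, examples: list[str], k: int = 3) -> list[str]:
    # Rank stored ICL examples by similarity to the current chunk
    # (the role ChromaDB plays in ContextWeaver) and keep the top k.
    q = embed(chunk)
    ranked = sorted(examples, key=lambda ex: cosine(q, embed(ex)), reverse=True)
    return ranked[:k]
```

The point is that example selection happens per chunk: each chunk gets the few examples most similar to it, instead of every chunk sharing one bloated static prompt.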

🏗️ Architecture & Tech Stack

We built the frontend using Next.js 16, React 19, and Tailwind CSS v4, visualizing the pipeline with React Flow. The backend is powered by Python FastAPI, using ChromaDB and sentence-transformers for vector retrieval, and simulating the Qwen3-4B model for data annotation.

System Architecture

```mermaid
graph TB
    subgraph "Next.js Dashboard"
        A["📄 Document Upload"]
        B["🔀 React Flow Pipeline"]
        C["📊 Confidence Heatmap"]
        D["📋 Results Explorer"]
    end

    subgraph "FastAPI Backend"
        E["📑 Semantic Chunker"]
        F["🔍 ICL Example Retriever"]
        G["🧩 Prompt Builder"]
        H["🤖 Qwen3-4B Annotator"]
        I["🔗 Result Merger"]
    end

    subgraph "Data Layer"
        J["ChromaDB — ICL Examples"]
        K["Local FS — Documents"]
        L["JSON — Final Dataset"]
    end

    A -->|"Upload PDF/TXT"| E
    E -->|"Chunk[]"| F
    F -->|"top-3 examples/chunk"| J
    J -->|"similar examples"| G
    G -->|"4k-token prompt"| H
    H -->|"Qwen3-4B API"| I
    I -->|"merged annotations"| L

    E -.->|"progress events"| B
    F -.->|"retrieval events"| B
    H -.->|"annotation events"| B
    I -.->|"confidence data"| C
    L -.->|"final results"| D
```
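The Semantic Chunker at the head of the backend pipeline can be approximated as a sliding token window with the pipeline's defaults (chunk_size=2048, overlap=0.15). This is a minimal sketch: the actual chunker is "semantic" and would also respect sentence and section boundaries, which is not shown here, and `chunk_document` is an illustrative name rather than the project's real function.

```python
def chunk_document(tokens: list[str], chunk_size: int = 2048, overlap: float = 0.15) -> list[list[str]]:
    # Sliding-window chunker: each chunk shares roughly `overlap` of its
    # tokens with the previous chunk, so entities spanning a boundary
    # appear in at least one chunk in full.
    step = max(1, int(chunk_size * (1 - overlap)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

The overlap is what keeps per-chunk prompts cheap without dropping information at chunk seams.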

Data Flow — Annotation Pipeline

```mermaid
sequenceDiagram
    participant UI as Next.js Dashboard
    participant API as FastAPI
    participant Chunk as Chunker
    participant VDB as ChromaDB
    participant PB as Prompt Builder
    participant LLM as Qwen3-4B
    participant Merge as Merger

    UI->>API: POST /annotate {document, config}
    API->>Chunk: split(document, chunk_size=2048, overlap=0.15)
    Chunk-->>API: Chunk[] (n chunks)

    loop For each chunk
        API->>VDB: query(chunk.embedding, top_k=3)
        VDB-->>API: ICL examples[]
        API->>PB: build_prompt(chunk, examples, schema)
        PB-->>API: formatted prompt (~4k tokens)
        API->>LLM: generate(prompt, json_mode=true)
        LLM-->>API: {annotations, confidence, reasoning}
        API-->>UI: SSE: chunk_completed {id, status, confidence}
    end

    API->>Merge: merge(all_chunk_results)
    Merge-->>API: unified_dataset.json
    API-->>UI: SSE: pipeline_complete {accuracy_estimate, stats}
```
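The dashboard's live updates arrive as Server-Sent Events, as the sequence diagram shows. The sketch below only formats the `chunk_completed` and `pipeline_complete` frames as `text/event-stream` text; in the running app a FastAPI streaming response would serve them, and the helper names and payload fields beyond those in the diagram are assumptions for illustration.

```python
import json

def sse_frame(event: str, data: dict) -> str:
    # One Server-Sent Events frame: an `event:` line, a `data:` line,
    # and a blank line terminating the frame.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def annotation_events(chunk_results: list[dict]):
    # Yield the per-chunk and final events the Next.js dashboard listens for.
    for r in chunk_results:
        yield sse_frame("chunk_completed",
                        {"id": r["id"], "status": "done", "confidence": r["confidence"]})
    avg = sum(r["confidence"] for r in chunk_results) / len(chunk_results)
    yield sse_frame("pipeline_complete",
                    {"accuracy_estimate": round(avg, 3),
                     "stats": {"chunks": len(chunk_results)}})
```

Streaming per-chunk events (rather than waiting for the merge) is what lets React Flow animate progress in real time.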

🏆 Sponsor Tracks Targeted

  • FlagOS Open Computing Global Challenge: Submitted to Track 3 — Automatic Data Annotation with Large Models in Long-Context Scenarios, where ContextWeaver makes efficient use of Qwen3-4B's limited context window.

Jointly hosted by:

  • DoraHacks
  • FlagOS Community
  • Beijing Academy of Artificial Intelligence (BAAI)
  • CCF Open Source Development Technology Committee (ODTC)

🚀 Run it Locally (For Judges)

We have made running the project as frictionless as possible. You can run it via Docker or locally using Make:

Option A: Docker (Recommended)

  1. Clone the repo: git clone https://github.com/edycutjong/contextweaver.git
  2. Navigate to directory: cd contextweaver
  3. Run with Docker Compose: docker-compose up --build (the backend runs on port 8000, the frontend on port 3000)

Option B: Local Make

  1. Clone the repo: git clone https://github.com/edycutjong/contextweaver.git
  2. Navigate to directory: cd contextweaver
  3. Install dependencies: make install
  4. Run the app: make dev

Note for Judges: Both options start the FastAPI backend and Next.js frontend concurrently. Simply navigate to http://localhost:3000 in your browser and click "Run Document Annotation" to see the live simulation!

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
