kylemaa/distributed-semantic-cache

Distributed Semantic Cache

Open-source semantic caching for LLM applications. Reduce API costs by 50-80% while improving response times.

License: MIT TypeScript Node.js Tests

🎯 Why Distributed Semantic Cache?

| Challenge | Solution |
| --- | --- |
| High LLM API costs | Semantic caching reduces calls by 50-80% |
| Slow response times | Cache hits in under 5 ms vs 1-3 s API calls |
| Exact match limitations | Semantic similarity catches paraphrased queries |
| Data privacy concerns | With local embeddings, your data never leaves your infrastructure |
| Production scalability | Kubernetes-ready with HNSW indexing for 100K+ vectors |

📦 SDK - The Developer Experience

npm install @distributed-semantic-cache/sdk

Drop-in LLM Integration

import { createOpenAIMiddleware, SemanticCache } from '@distributed-semantic-cache/sdk';
import OpenAI from 'openai';

// OpenAI client (reads OPENAI_API_KEY from the environment by default)
const openai = new OpenAI();

// Setup cache
const cache = new SemanticCache({
  baseUrl: process.env.CACHE_URL,
  apiKey: process.env.CACHE_API_KEY,
});

// Create middleware
const middleware = createOpenAIMiddleware({ cache, threshold: 0.85 });

// Define the request once so the cache lookup and the API call use the same input
const request = {
  model: 'gpt-4',
  messages: [{ role: 'user' as const, content: 'Explain quantum computing' }],
};

// Wrap your OpenAI calls - that's it!
const result = await middleware.chat(request, () =>
  openai.chat.completions.create(request)
);

if (result.cached) {
  console.log(`💰 Saved API call! Similarity: ${result.similarity}`);
}

Also Supports

  • Anthropic Claude - createAnthropicMiddleware()
  • Custom LLMs - createGenericLLMMiddleware()
  • React Apps - createSemanticCacheHooks(React)
  • Fluent Config - buildCache().withPreset('production').build()

📚 Full SDK Documentation

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Your Application                        │
├─────────────────────────────────────────────────────────────────┤
│                         SDK Middleware                          │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│    │   OpenAI    │    │  Anthropic  │    │   Custom    │        │
│    │  Middleware │    │  Middleware │    │     LLM     │        │
│    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘        │
└───────────┼──────────────────┼──────────────────┼───────────────┘
            │                  │                  │
            ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Semantic Cache API                           │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────┐         │
│  │ L1: Exact   │ →  │L2: Normalized│ →  │L3: Semantic │         │
│  │   Match     │    │    Match     │    │   Search    │         │
│  │   O(1)      │    │    O(1)      │    │  O(log n)   │         │
│  └─────────────┘    └──────────────┘    └─────────────┘         │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   HNSW      │    │  Matryoshka │    │  Predictive │          │
│  │   Index     │    │   Cascade   │    │   Warming   │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└─────────────────────────────────────────────────────────────────┘
            │                  │                  │
            ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Storage & Embeddings                         │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   SQLite    │    │    Local    │    │   OpenAI    │          │
│  │   Storage   │    │  Embeddings │    │  Embeddings │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└─────────────────────────────────────────────────────────────────┘
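The L1 → L2 → L3 lookup order in the diagram can be sketched in a few lines. This is a minimal in-memory illustration, not the project's implementation: `LayeredCache`, `normalize`, and `cosine` are hypothetical names, the normalization rules are an assumption, embeddings are passed in as plain arrays, and a linear scan stands in for the HNSW index.

```typescript
// Hypothetical sketch of the Exact -> Normalized -> Semantic lookup order.
type CacheEntry = { query: string; response: string; embedding: number[] };
type CacheHit = { layer: 'exact' | 'normalized' | 'semantic'; response: string; similarity: number };

// Assumed normalization: lowercase, strip punctuation, collapse whitespace.
function normalize(query: string): string {
  return query.toLowerCase().replace(/[^a-z0-9\s]/g, '').replace(/\s+/g, ' ').trim();
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class LayeredCache {
  private exact = new Map<string, CacheEntry>();      // L1: raw query string
  private normalized = new Map<string, CacheEntry>(); // L2: normalized query string
  private entries: CacheEntry[] = [];                 // L3: vectors to search

  store(entry: CacheEntry): void {
    this.exact.set(entry.query, entry);
    this.normalized.set(normalize(entry.query), entry);
    this.entries.push(entry);
  }

  lookup(query: string, embedding: number[], threshold = 0.85): CacheHit | null {
    // L1: O(1) exact match
    const l1 = this.exact.get(query);
    if (l1) return { layer: 'exact', response: l1.response, similarity: 1 };

    // L2: O(1) match after normalization
    const l2 = this.normalized.get(normalize(query));
    if (l2) return { layer: 'normalized', response: l2.response, similarity: 1 };

    // L3: nearest neighbour by cosine similarity (linear scan here, not HNSW)
    let best: CacheEntry | null = null;
    let bestSimilarity = -1;
    for (const entry of this.entries) {
      const similarity = cosine(embedding, entry.embedding);
      if (similarity > bestSimilarity) {
        best = entry;
        bestSimilarity = similarity;
      }
    }
    return best !== null && bestSimilarity >= threshold
      ? { layer: 'semantic', response: best.response, similarity: bestSimilarity }
      : null;
  }
}
```

The cheap layers run first so that the embedding comparison is only paid for when both string lookups miss.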

🚀 Features

  • 3-Layer Cache Architecture - Exact → Normalized → Semantic matching
  • Local Embeddings - 100% free, privacy-first (MiniLM, mpnet, e5)
  • Query Normalization - Case, punctuation, contraction handling
  • Confidence Scoring - Multi-factor cache hit confidence
  • SQLite Storage - Lightweight, file-based, zero-config
  • Full REST API - Query, store, stats, admin endpoints
  • React Chat UI - Interactive demo and testing interface
  • Multi-Tenancy - Complete data isolation, per-tenant quotas
  • Analytics - Cost tracking, ROI dashboards, time-series metrics
  • Predictive Cache Warming - Pattern-based pre-population
  • HNSW Indexing - O(log n) search for 100K+ vectors
  • Matryoshka Cascade - Adaptive dimension search (4-8x faster)
  • Production Ready - Docker, Kubernetes, Terraform templates
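To illustrate the idea behind the Matryoshka Cascade item, the sketch below runs a two-pass search: a cheap pass over the first few embedding dimensions builds a shortlist, then a full-dimension pass re-ranks it. This is an assumption about how such a cascade can work, not the project's code; `cosinePrefix`, `cascadeSearch`, `coarseDims`, and `shortlistSize` are hypothetical names, and the approach presumes embeddings whose leading dimensions carry most of the signal.

```typescript
// Cosine similarity over the first `dims` components only.
function cosinePrefix(a: number[], b: number[], dims: number): number {
  const n = Math.min(dims, a.length, b.length);
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

interface CascadeMatch {
  index: number;      // position of the best vector in `vectors`
  similarity: number; // full-dimension cosine similarity
}

function cascadeSearch(
  query: number[],
  vectors: number[][],
  coarseDims: number,
  shortlistSize: number,
): CascadeMatch | null {
  // Pass 1: score every vector on a truncated prefix and keep the top candidates.
  const shortlist = vectors
    .map((vector, index) => ({ index, score: cosinePrefix(query, vector, coarseDims) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, shortlistSize);

  // Pass 2: re-rank only the shortlist using all dimensions.
  let best: CascadeMatch | null = null;
  for (const { index } of shortlist) {
    const similarity = cosinePrefix(query, vectors[index], query.length);
    if (best === null || similarity > best.similarity) {
      best = { index, similarity };
    }
  }
  return best;
}
```

The saving comes from pass 1 touching only `coarseDims` values per vector; the full-width comparison is limited to `shortlistSize` candidates.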

📊 Performance

| Metric | Value |
| --- | --- |
| Cache Hit Latency | < 5ms |
| L1 (Exact) Lookup | O(1) |
| L3 (Semantic) Search | O(log n) with HNSW |
| Vector Capacity | 100K+ entries |
| Storage Reduction | 75% with quantization |
| API Cost Savings | 50-80% typical |
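A 75% storage reduction is what you get when 32-bit floats (4 bytes per dimension) are stored as 8-bit integers (1 byte per dimension). The sketch below shows one common way to do that, symmetric scalar quantization. Whether the project uses this exact scheme is an assumption, and `quantize` and `dequantize` are hypothetical names.

```typescript
interface QuantizedVector {
  values: Int8Array; // one byte per dimension
  scale: number;     // multiply by this to recover approximate floats
}

// Map floats into the int8 range [-127, 127] using a per-vector scale.
function quantize(vector: number[]): QuantizedVector {
  const maxAbs = vector.reduce((max, v) => Math.max(max, Math.abs(v)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = Int8Array.from(vector, (v) => Math.round(v / scale));
  return { values, scale };
}

// Recover an approximation of the original vector.
function dequantize(quantized: QuantizedVector): Float32Array {
  return Float32Array.from(quantized.values, (v) => v * quantized.scale);
}

// Size comparison for a 384-dimension embedding (the output size of all-MiniLM-L6-v2).
const float32Bytes = new Float32Array(384).byteLength;                // 384 * 4 bytes
const int8Bytes = quantize(new Array(384).fill(0.1)).values.byteLength; // 384 * 1 byte
const reduction = 1 - int8Bytes / float32Bytes;                       // 0.75
```

The trade-off is a small rounding error per component, bounded by half of `scale`, which slightly perturbs similarity scores near the threshold.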

🛠️ Quick Start

Prerequisites

  • Node.js 18+
  • pnpm 8+

Installation

# Clone the repository
git clone https://github.com/kylemaa/distributed-semantic-cache.git
cd distributed-semantic-cache

# Install dependencies
pnpm install

# Configure environment
cp .env.example .env

Configuration

Option A: Local Embeddings (Free, Privacy-First) ⭐ Recommended

EMBEDDING_PROVIDER=local
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2

Option B: OpenAI Embeddings (Higher Quality)

EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key

Run

# Development mode (all packages)
pnpm dev

# Or individually (run each from the repository root, in its own terminal)
cd packages/api && pnpm dev   # API: http://localhost:3000
cd packages/web && pnpm dev   # Web: http://localhost:5173

📡 API Reference

Query Cache

curl -X POST http://localhost:3000/api/cache/query \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"query": "What is TypeScript?", "threshold": 0.85}'

Store Response

curl -X POST http://localhost:3000/api/cache/store \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"query": "What is TypeScript?", "response": "TypeScript is..."}'

Get Statistics

curl http://localhost:3000/api/cache/stats \
  -H "x-api-key: YOUR_API_KEY"
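The same three endpoints can be called from TypeScript without the SDK. The wrapper below is a minimal sketch: the paths, the `x-api-key` header, and the request bodies come from the curl examples above, while `createCacheClient`, `FetchLike`, and the other type names are hypothetical. Response shapes are not documented here, so results are typed as `unknown`.

```typescript
// Minimal structural types so the client works with any fetch-compatible function.
interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body?: string;
}
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init: RequestInitLike) => Promise<ResponseLike>;

function createCacheClient(baseUrl: string, apiKey: string, fetchImpl: FetchLike) {
  async function call(method: 'GET' | 'POST', path: string, body?: unknown): Promise<unknown> {
    const init: RequestInitLike = { method, headers: { 'x-api-key': apiKey } };
    if (body !== undefined) {
      init.headers['Content-Type'] = 'application/json';
      init.body = JSON.stringify(body);
    }
    const res = await fetchImpl(`${baseUrl}${path}`, init);
    if (!res.ok) throw new Error(`Cache API returned HTTP ${res.status}`);
    return res.json();
  }

  return {
    // POST /api/cache/query
    query: (query: string, threshold = 0.85) =>
      call('POST', '/api/cache/query', { query, threshold }),
    // POST /api/cache/store
    store: (query: string, response: string) =>
      call('POST', '/api/cache/store', { query, response }),
    // GET /api/cache/stats
    stats: () => call('GET', '/api/cache/stats'),
  };
}
```

On Node.js 18+ the built-in `fetch` can be passed as `fetchImpl`; injecting it also makes the client easy to test with a stub.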

📖 Full API Documentation

🐳 Production Deployment

Docker

docker-compose up -d

Kubernetes

kubectl apply -f deploy/kubernetes/

Terraform (AWS)

cd deploy/terraform/aws
terraform init && terraform apply

🚀 Deployment Guide

📁 Project Structure

distributed-semantic-cache/
├── packages/
│   ├── api/           # Fastify REST API server
│   ├── sdk/           # TypeScript SDK for developers
│   ├── web/           # React demo application
│   └── shared/        # Shared types and utilities
├── deploy/
│   ├── kubernetes/    # K8s manifests
│   ├── terraform/     # Infrastructure as code
│   └── nginx/         # Reverse proxy config
└── docs/
    ├── architecture/  # System design docs
    ├── guides/        # User guides
    └── business/      # Strategy docs

📄 License

This project is licensed under the MIT License - see LICENSE for details.

Free to use, modify, and distribute for any purpose.

📚 Documentation

| Document | Description |
| --- | --- |
| SDK Documentation | TypeScript SDK reference |
| Quick Start Guide | Get running in 5 minutes |
| Architecture | System design overview |
| Security Guide | Production hardening |
| Examples | Integration patterns |

🧪 Testing

# Run all tests
pnpm test

# Run SDK tests (subshell keeps you in the repository root)
(cd packages/sdk && pnpm test)

# Run API tests
(cd packages/api && pnpm test)

220+ tests passing across all packages.

🤝 Contributing

See CONTRIBUTING.md for development guidelines.

📞 Support

Have questions or need help?

  • 📝 Open an Issue for bugs or feature requests
  • 💬 Discussions for questions and ideas
  • ⭐ Star this repo if you find it useful!

Reduce LLM costs. Improve performance. Ship faster.

Built with ❤️ for the AI community
