Open-source semantic caching for LLM applications. Reduce API costs by 50-80% while improving response times.
| Challenge | Solution |
|---|---|
| High LLM API costs | Semantic caching reduces calls by 50-80% |
| Slow response times | Cache hits in under 5 ms vs 1-3 s API calls |
| Exact match limitations | Semantic similarity catches paraphrased queries |
| Data privacy concerns | 100% local embeddings; your data never leaves your infrastructure |
| Production scalability | Kubernetes-ready with HNSW indexing for 100K+ vectors |
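As a rough illustration of where the 50-80% figure in the table above comes from: savings scale directly with the cache hit rate. A quick back-of-the-envelope calculation (all numbers hypothetical):

```typescript
// Illustrative cost arithmetic (all numbers hypothetical).
// If a fraction `hitRate` of requests is served from cache,
// API spend drops by roughly that same fraction.
function monthlySavingsCents(
  requestsPerMonth: number,
  costCentsPerRequest: number,
  hitRate: number, // fraction of requests answered from cache
): number {
  return requestsPerMonth * costCentsPerRequest * hitRate;
}

// e.g. 1M requests/month at 2¢ each with a 75% hit rate:
// 1,000,000 × 2¢ × 0.75 = 1,500,000¢ = $15,000 saved per month
console.log(monthlySavingsCents(1_000_000, 2, 0.75)); // 1500000
```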
Install the SDK:

```bash
npm install @distributed-semantic-cache/sdk
```

```typescript
import { createOpenAIMiddleware, SemanticCache } from '@distributed-semantic-cache/sdk';
import OpenAI from 'openai';

const openai = new OpenAI();

// Set up the cache
const cache = new SemanticCache({
  baseUrl: process.env.CACHE_URL,
  apiKey: process.env.CACHE_API_KEY,
});

// Create middleware
const middleware = createOpenAIMiddleware({ cache, threshold: 0.85 });

// Wrap your OpenAI calls - that's it!
const messages = [{ role: 'user', content: 'Explain quantum computing' }];
const result = await middleware.chat(
  { model: 'gpt-4', messages },
  () => openai.chat.completions.create({ model: 'gpt-4', messages })
);

if (result.cached) {
  console.log(`💰 Saved API call! Similarity: ${result.similarity}`);
}
```

Other integrations:

- Anthropic Claude - `createAnthropicMiddleware()`
- Custom LLMs - `createGenericLLMMiddleware()`
- React Apps - `createSemanticCacheHooks(React)`
- Fluent Config - `buildCache().withPreset('production').build()`
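Conceptually, every middleware variant above does the same thing: check the cache before calling the model, and store the response on a miss. A minimal, self-contained sketch of that cache-aside pattern (illustrative only, not the SDK's actual implementation; `lookup` and `store` stand in for the real cache client):

```typescript
// Minimal cache-aside wrapper, illustrating what the middleware does
// internally. `lookup`/`store` are placeholders for the cache client.
type CacheResult = { cached: boolean; response: string };

async function withCache(
  query: string,
  lookup: (q: string) => Promise<string | undefined>,
  store: (q: string, r: string) => Promise<void>,
  callModel: () => Promise<string>,
): Promise<CacheResult> {
  const hit = await lookup(query);
  if (hit !== undefined) return { cached: true, response: hit };
  const response = await callModel(); // only pay for the API call on a miss
  await store(query, response);
  return { cached: false, response };
}

// Usage with an in-memory stub cache:
(async () => {
  const mem = new Map<string, string>();
  const lookup = async (q: string) => mem.get(q);
  const store = async (q: string, r: string) => { mem.set(q, r); };

  const first = await withCache('hi', lookup, store, async () => 'hello!');
  const second = await withCache('hi', lookup, store, async () => 'hello!');
  console.log(first.cached, second.cached); // false true
})();
```

The real middleware adds semantic (not just exact) lookup, but the control flow is the same.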
```
┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
├─────────────────────────────────────────────────────────────┤
│                       SDK Middleware                        │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
│  │   OpenAI    │   │  Anthropic  │   │   Custom    │        │
│  │ Middleware  │   │ Middleware  │   │     LLM     │        │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘        │
└─────────┼─────────────────┼─────────────────┼───────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────┐
│                     Semantic Cache API                      │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────┐       │
│  │  L1: Exact  │ → │L2: Normalized│ → │L3: Semantic │       │
│  │    Match    │   │    Match     │   │   Search    │       │
│  │    O(1)     │   │     O(1)     │   │  O(log n)   │       │
│  └─────────────┘   └──────────────┘   └─────────────┘       │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
│  │    HNSW     │   │ Matryoshka  │   │ Predictive  │        │
│  │    Index    │   │   Cascade   │   │   Warming   │        │
│  └─────────────┘   └─────────────┘   └─────────────┘        │
└─────────────────────────────────────────────────────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────┐
│                    Storage & Embeddings                     │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
│  │   SQLite    │   │    Local    │   │   OpenAI    │        │
│  │   Storage   │   │ Embeddings  │   │ Embeddings  │        │
│  └─────────────┘   └─────────────┘   └─────────────┘        │
└─────────────────────────────────────────────────────────────┘
```
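The L1 → L2 → L3 flow in the diagram can be sketched as a lookup cascade. This is illustrative, not the service's actual code; the semantic layer is stubbed with a callback standing in for the HNSW-backed vector search:

```typescript
// Three-layer lookup cascade (illustrative; the real L3 is a vector search).
type SemanticSearch = (query: string) => string | undefined;

function normalizeKey(query: string): string {
  // L2 key: lowercase, trim, strip punctuation, collapse whitespace
  return query.toLowerCase().trim().replace(/[^\w\s]/g, '').replace(/\s+/g, ' ');
}

function cascadeLookup(
  query: string,
  exact: Map<string, string>,      // L1: exact-match table, O(1)
  normalized: Map<string, string>, // L2: normalized-key table, O(1)
  semantic: SemanticSearch,        // L3: stand-in for HNSW search, O(log n)
): { layer: 'L1' | 'L2' | 'L3' | 'miss'; response?: string } {
  const l1 = exact.get(query);
  if (l1 !== undefined) return { layer: 'L1', response: l1 };
  const l2 = normalized.get(normalizeKey(query));
  if (l2 !== undefined) return { layer: 'L2', response: l2 };
  const l3 = semantic(query);
  if (l3 !== undefined) return { layer: 'L3', response: l3 };
  return { layer: 'miss' };
}

// Example: an L2 hit despite casing and punctuation differences
const exact = new Map([['What is TypeScript?', 'TypeScript is...']]);
const normTable = new Map([[normalizeKey('What is TypeScript?'), 'TypeScript is...']]);
console.log(cascadeLookup('what is typescript!!', exact, normTable, () => undefined).layer); // L2
```

Each layer is strictly cheaper than the next, so most hits never reach the vector index.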
- 3-Layer Cache Architecture - Exact → Normalized → Semantic matching
- Local Embeddings - 100% free, privacy-first (MiniLM, mpnet, e5)
- Query Normalization - Case, punctuation, contraction handling
- Confidence Scoring - Multi-factor cache hit confidence
- SQLite Storage - Lightweight, file-based, zero-config
- Full REST API - Query, store, stats, admin endpoints
- React Chat UI - Interactive demo and testing interface
- Multi-Tenancy - Complete data isolation, per-tenant quotas
- Analytics - Cost tracking, ROI dashboards, time-series metrics
- Predictive Cache Warming - Pattern-based pre-population
- HNSW Indexing - O(log n) search for 100K+ vectors
- Matryoshka Cascade - Adaptive dimension search (4-8x faster)
- Production Ready - Docker, Kubernetes, Terraform templates
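As an illustration of the query-normalization feature listed above, here is a simplified normalizer covering the three cases mentioned (casing, punctuation, contractions). The real implementation's rules may differ:

```typescript
// Simplified query normalizer (illustrative; actual rules may differ).
const CONTRACTIONS: Record<string, string> = {
  "what's": 'what is',
  "don't": 'do not',
  "can't": 'cannot',
  "it's": 'it is',
};

function normalizeQuery(query: string): string {
  let q = query.toLowerCase().trim();
  // Expand contractions before stripping punctuation, since the
  // apostrophe is needed to recognize them.
  for (const [short, long] of Object.entries(CONTRACTIONS)) {
    q = q.split(short).join(long);
  }
  return q.replace(/[^\w\s]/g, '').replace(/\s+/g, ' ').trim();
}

console.log(normalizeQuery("What's   TypeScript?!")); // "what is typescript"
```

Normalization lets the L2 layer serve trivially rephrased queries without ever computing an embedding.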
| Metric | Value |
|---|---|
| Cache Hit Latency | < 5ms |
| L1 (Exact) Lookup | O(1) |
| L3 (Semantic) Search | O(log n) with HNSW |
| Vector Capacity | 100K+ entries |
| Storage Reduction | 75% with quantization |
| API Cost Savings | 50-80% typical |
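The semantic layer's hit/miss decision reduces to comparing embedding similarity against the configured threshold (0.85 in the examples above). A self-contained sketch of that comparison:

```typescript
// Cosine similarity between two embedding vectors, and the
// threshold check used to decide a semantic cache hit.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function isSemanticHit(a: number[], b: number[], threshold = 0.85): boolean {
  return cosineSimilarity(a, b) >= threshold;
}

// Identical vectors score 1.0; orthogonal vectors score 0.0.
console.log(isSemanticHit([1, 0], [1, 0])); // true
console.log(isSemanticHit([1, 0], [0, 1])); // false
```

Raising the threshold trades hit rate for answer fidelity; lowering it does the opposite.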
- Node.js 18+
- pnpm 8+
```bash
# Clone the repository
git clone https://github.com/your-org/distributed-semantic-cache.git
cd distributed-semantic-cache

# Install dependencies
pnpm install

# Configure environment
cp .env.example .env
```

Option A: Local Embeddings (Free, Privacy-First) ⭐ Recommended

```bash
EMBEDDING_PROVIDER=local
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
```

Option B: OpenAI Embeddings (Higher Quality)

```bash
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key
```

```bash
# Development mode (all packages)
pnpm dev

# Or run packages individually
cd packages/api && pnpm dev   # API: http://localhost:3000
cd packages/web && pnpm dev   # Web: http://localhost:5173
```

Query the cache:

```bash
curl -X POST http://localhost:3000/api/cache/query \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"query": "What is TypeScript?", "threshold": 0.85}'
```

Store a response:

```bash
curl -X POST http://localhost:3000/api/cache/store \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"query": "What is TypeScript?", "response": "TypeScript is..."}'
```

Get cache statistics:

```bash
curl http://localhost:3000/api/cache/stats \
  -H "x-api-key: YOUR_API_KEY"
```

Deploy with Docker Compose:

```bash
docker-compose up -d
```

Deploy to Kubernetes:

```bash
kubectl apply -f deploy/kubernetes/
```

Provision with Terraform:

```bash
cd deploy/terraform/aws
terraform init && terraform apply
```

```
distributed-semantic-cache/
├── packages/
│   ├── api/          # Fastify REST API server
│   ├── sdk/          # TypeScript SDK for developers
│   ├── web/          # React demo application
│   └── shared/       # Shared types and utilities
├── deploy/
│   ├── kubernetes/   # K8s manifests
│   ├── terraform/    # Infrastructure as code
│   └── nginx/        # Reverse proxy config
└── docs/
    ├── architecture/ # System design docs
    ├── guides/       # User guides
    └── business/     # Strategy docs
```
This project is licensed under the MIT License - see LICENSE for details.
Free to use, modify, and distribute for any purpose.
| Document | Description |
|---|---|
| SDK Documentation | TypeScript SDK reference |
| Quick Start Guide | Get running in 5 minutes |
| Architecture | System design overview |
| Security Guide | Production hardening |
| Examples | Integration patterns |
```bash
# Run all tests
pnpm test

# Run SDK tests
cd packages/sdk && pnpm test

# Run API tests
cd packages/api && pnpm test
```

220+ tests passing across all packages.
See CONTRIBUTING.md for development guidelines.
Have questions or need help?
- 📝 Open an Issue for bugs or feature requests
- 💬 Discussions for questions and ideas
- ⭐ Star this repo if you find it useful!
Reduce LLM costs. Improve performance. Ship faster.
Built with ❤️ for the AI community