dyncontext

Dynamic Context Management for LLMs

dyncontext is a Python package that provides intelligent, dynamic context window management for Large Language Models. Instead of static conversation history, dyncontext allows you to store various types of knowledge and automatically inject relevant context based on the current query.

Features

Dynamic Context Injection: Automatically retrieve and inject relevant context based on the query
Multiple Retrieval Strategies: Semantic (embedding-based), keyword, recency, and hybrid retrieval
Flexible Storage: Store facts, documents, code snippets, and custom content types
Document Chunking: Automatically chunk large documents for optimal retrieval
Token Management: Smart token budget management for context windows
Multi-Provider Support: Works with OpenAI, Anthropic, and 100+ providers via LiteLLM
Persistence: Save and load context stores for reuse
Vector Store Backends: Support for in-memory, ChromaDB, and custom vector stores
Reranking: Improve retrieval quality with cross-encoder and ensemble rerankers
Caching: Cache embeddings, query results, and LLM responses for efficiency
Telemetry: Built-in observability with metrics and event tracking
Context Compression: Smart compression strategies when context exceeds limits

Installation

# Basic installation
pip install dyncontext

# With OpenAI support
pip install dyncontext[openai]

# With Anthropic support
pip install dyncontext[anthropic]

# With all providers
pip install dyncontext[all]

Quick Start

from dyncontext import DynContext

# Initialize with OpenAI
ctx = DynContext(provider="openai", model="gpt-4")

# Add knowledge to the context store
ctx.add("Python was created by Guido van Rossum in 1991", tags=["python"])
ctx.add("FastAPI is a modern Python web framework", tags=["python", "web"])
ctx.add("React is a JavaScript library for building UIs", tags=["javascript"])

# Query - relevant context is automatically injected
response = ctx.complete("Who created Python?")
print(response.content)
# Output: Python was created by Guido van Rossum in 1991...

How It Works

Unlike traditional RAG systems or static context management, dyncontext:

Stores knowledge in a flexible context store with metadata, tags, and embeddings
Retrieves relevant context using hybrid search (semantic + keyword)
Injects context dynamically into each LLM request based on the query
Manages token budgets to fit within context window limits

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   User Query    │───▶│    Retrieval    │───▶│  Context Block  │
└─────────────────┘    │   (Semantic +   │    │   (Injected)    │
                       │    Keyword)     │    └────────┬────────┘
                       └─────────────────┘             │
                                                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Response     │◀───│      LLM        │◀───│  System + Query │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Examples

Basic Usage

from dyncontext import DynContext, ContextType

ctx = DynContext(
    provider="openai",
    model="gpt-4",
    max_context_tokens=2000,
)

# Add different types of content
ctx.add("Company was founded in 2020", context_type=ContextType.FACT)
ctx.add("def hello(): print('world')", context_type=ContextType.CODE)

# Query with automatic context injection
response = ctx.complete("When was the company founded?")

Document RAG

# Add a document with automatic chunking
with open("manual.txt") as f:
    ctx.add_document(
        f.read(),
        chunk_size=500,
        chunk_overlap=50,
        tags=["manual", "product"],
    )

# Query the document
response = ctx.complete("How do I set up the device?")

Filtering Context

# Only retrieve context with specific tags
response = ctx.complete(
    "What web frameworks are available?",
    tags=["web"],  # Only items tagged with 'web'
)

# Only retrieve specific types
response = ctx.complete(
    "Show me code examples",
    context_types=[ContextType.CODE],
)

Streaming

for chunk in ctx.stream("Explain Python decorators"):
    print(chunk, end="", flush=True)

Manual Retrieval

# See what context would be retrieved
results = ctx.retrieve("Python web framework", top_k=5)
for result in results:
    print(f"Score: {result.score:.3f}")
    print(f"Content: {result.item.content[:100]}...")

Persistence

# Save context store
ctx.save("knowledge_base.json")

# Load later
ctx.load("knowledge_base.json")

Vector Store Backends

from dyncontext import DynContext
from dyncontext.stores import InMemoryVectorStore, ChromaVectorStore

# Use in-memory store (default)
ctx = DynContext(
    provider="openai",
    vector_store=InMemoryVectorStore(),
)

# Use ChromaDB for persistent vector storage
ctx = DynContext(
    provider="openai",
    vector_store=ChromaVectorStore(
        collection_name="my_knowledge",
        persist_directory="./chroma_db",
    ),
)

# Use FAISS for high-performance similarity search
from dyncontext.stores import FAISSVectorStore

ctx = DynContext(
    provider="openai",
    vector_store=FAISSVectorStore(
        dimension=1536,  # OpenAI embedding dimension
        index_type="flat",  # "flat", "ivf", "hnsw", or "pq"
        persist_directory="./faiss_index",
        metric="cosine",  # "cosine", "l2", or "ip"
    ),
)

# FAISS with approximate search for large datasets
ctx = DynContext(
    provider="openai",
    vector_store=FAISSVectorStore(
        dimension=1536,
        index_type="ivf",  # Inverted file index for faster search
        nlist=100,  # Number of clusters
        nprobe=10,  # Number of clusters to search
    ),
)

# Custom vector store - implement BaseVectorStore interface
from dyncontext.stores import BaseVectorStore

class MyVectorStore(BaseVectorStore):
    def add(self, documents): ...
    def search(self, query_embedding, top_k): ...
    def delete(self, ids): ...

Reranking

Improve retrieval quality by reranking initial results with a more accurate model.

from dyncontext import DynContext
from dyncontext.reranker import CrossEncoderReranker, CohereReranker, EnsembleReranker

# Use cross-encoder reranking (local model)
ctx = DynContext(
    provider="openai",
    reranker=CrossEncoderReranker(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"),
)

# Use Cohere reranking API
ctx = DynContext(
    provider="openai",
    reranker=CohereReranker(api_key="your-cohere-key"),
)

# Ensemble multiple rerankers
ctx = DynContext(
    provider="openai",
    reranker=EnsembleReranker([
        (CrossEncoderReranker(), 0.6),
        (CohereReranker(), 0.4),
    ]),
)

Caching

Cache embeddings, query results, and LLM responses for improved performance.

from dyncontext import DynContext

# Enable all caching
ctx = DynContext(
    provider="openai",
    enable_embedding_cache=True,
    enable_query_cache=True,
    enable_response_cache=True,
    cache_ttl=3600,  # Cache TTL in seconds
    cache_max_size=1000,  # Max items per cache
)

# Check cache statistics
stats = ctx.get_cache_stats()
print(f"Embedding cache hits: {stats['embedding']['hits']}")
print(f"Query cache hits: {stats['query']['hits']}")
print(f"Response cache hits: {stats['response']['hits']}")

# Clear caches when needed
ctx.clear_caches()

Telemetry

Built-in observability for monitoring and debugging.

from dyncontext import DynContext

# Enable telemetry
ctx = DynContext(
    provider="openai",
    enable_telemetry=True,
)

# Use context normally
ctx.add("Some knowledge")
response = ctx.complete("Query")

# Get metrics
metrics = ctx.get_telemetry_metrics()
print(f"Total retrievals: {metrics['retrievals']}")
print(f"Total completions: {metrics['completions']}")
print(f"Average retrieval time: {metrics['avg_retrieval_time']:.3f}s")
print(f"Total tokens used: {metrics['total_tokens']}")

Context Compression

Automatically compress context when it exceeds token limits.

from dyncontext import DynContext
from dyncontext.compression import (
    TruncationCompressor,
    SentenceCompressor,
    LLMCompressor,
    HierarchicalCompressor,
    AdaptiveCompressor,
)

# Simple truncation (default)
ctx = DynContext(
    provider="openai",
    compressor=TruncationCompressor(max_tokens=2000),
)

# Sentence-level compression (preserves complete sentences)
ctx = DynContext(
    provider="openai",
    compressor=SentenceCompressor(max_tokens=2000),
)

# LLM-based summarization compression
ctx = DynContext(
    provider="openai",
    compressor=LLMCompressor(
        model="gpt-3.5-turbo",
        target_ratio=0.5,  # Compress to 50% of original
    ),
)

# Hierarchical compression (preserves structure)
ctx = DynContext(
    provider="openai",
    compressor=HierarchicalCompressor(
        levels=["paragraph", "sentence"],
        max_tokens=2000,
    ),
)

# Adaptive compression (automatically chooses strategy)
ctx = DynContext(
    provider="openai",
    compressor=AdaptiveCompressor(
        max_tokens=2000,
        prefer_quality=True,
    ),
)

Using DynContextConfig

For cleaner configuration, use the DynContextConfig dataclass.

from dyncontext import DynContext, DynContextConfig
from dyncontext.stores import ChromaVectorStore
from dyncontext.reranker import CrossEncoderReranker
from dyncontext.compression import AdaptiveCompressor

# Create configuration
config = DynContextConfig(
    provider="openai",
    model="gpt-4",
    max_context_tokens=4000,
    vector_store=ChromaVectorStore(collection_name="docs"),
    reranker=CrossEncoderReranker(),
    enable_embedding_cache=True,
    enable_query_cache=True,
    enable_response_cache=True,
    cache_ttl=3600,
    enable_telemetry=True,
    compressor=AdaptiveCompressor(max_tokens=4000),
)

# Create context manager from config
ctx = DynContext.from_config(config)

API Reference

DynContext

Main class for dynamic context management.

DynContext(
    provider="openai",           # LLM provider: "openai", "anthropic", "litellm"
    model="gpt-4",               # Model name
    embedder="openai",           # Embedding provider (or None to disable semantic search)
    max_context_tokens=4000,     # Max tokens for injected context
    system_prompt=None,          # Default system prompt
    enable_semantic=True,        # Enable semantic retrieval
    enable_keyword=True,         # Enable keyword retrieval
    semantic_weight=0.7,         # Weight for semantic in hybrid search
    keyword_weight=0.3,          # Weight for keyword in hybrid search
    # New parameters
    vector_store=None,           # Custom vector store backend
    reranker=None,               # Reranker for improving retrieval quality
    compressor=None,             # Context compressor
    enable_embedding_cache=False,# Cache embeddings
    enable_query_cache=False,    # Cache query results
    enable_response_cache=False, # Cache LLM responses
    cache_ttl=3600,              # Cache time-to-live in seconds
    cache_max_size=1000,         # Maximum items per cache
    enable_telemetry=False,      # Enable telemetry tracking
)

Methods

add(content, context_type, tags, metadata, priority) - Add content to store
add_many(contents, ...) - Add multiple items
add_document(content, chunk_size, chunk_overlap, tags) - Add chunked document
complete(prompt, system, top_k, tags, ...) - Generate completion with context
acomplete(...) - Async completion
stream(...) - Streaming completion
retrieve(query, top_k, tags, ...) - Manual context retrieval
save(path) / load(path) - Persistence
clear_history() - Clear conversation history
get_cache_stats() - Get cache hit/miss statistics
clear_caches() - Clear all caches
get_telemetry_metrics() - Get telemetry metrics
from_config(config) - Create instance from DynContextConfig

Properties

vector_store - Access the vector store backend
reranker - Access the reranker
compressor - Access the context compressor

ContextItem

ContextItem(
    content="...",                    # The text content
    context_type=ContextType.FACT,    # Type of content
    tags=["tag1", "tag2"],            # Tags for filtering
    metadata={"key": "value"},        # Custom metadata
    priority=1.0,                     # Priority for ranking
)

ContextType

DOCUMENT - Long-form documents
CHUNK - Document chunks
FACT - Short factual statements
CONVERSATION - Past conversation turns
INSTRUCTION - System instructions
CODE - Code snippets
CUSTOM - User-defined

Retrievers

SemanticRetriever - Embedding-based similarity search
KeywordRetriever - BM25-style keyword matching
HybridRetriever - Combines multiple retrievers
RecencyRetriever - Time-based retrieval
TagRetriever - Tag-based filtering

Vector Stores

from dyncontext.stores import InMemoryVectorStore, ChromaVectorStore, FAISSVectorStore

# In-memory (default, fast, non-persistent)
InMemoryVectorStore()

# ChromaDB (persistent, scalable)
ChromaVectorStore(
    collection_name="my_collection",
    persist_directory="./chroma_db",
    embedding_function=None,  # Uses default if not specified
)

# FAISS (high-performance, supports GPU)
FAISSVectorStore(
    dimension=1536,  # Vector dimension
    index_type="flat",  # "flat", "ivf", "hnsw", "pq"
    persist_directory="./faiss_index",
    use_gpu=False,  # True for GPU acceleration
    metric="cosine",  # "cosine", "l2", "ip"
    nlist=100,  # Clusters for IVF
    nprobe=10,  # Search clusters for IVF
)

All vector stores implement the BaseVectorStore interface:

add(documents: List[VectorDocument]) - Add documents
search(query_embedding, top_k) - Search by embedding
delete(ids: List[str]) - Delete by IDs
get(ids: List[str]) - Retrieve by IDs
count() - Get document count
clear() - Clear all documents
persist() - Save to disk

Rerankers

from dyncontext.reranker import CrossEncoderReranker, CohereReranker, EnsembleReranker

# Cross-encoder (local, no API calls)
CrossEncoderReranker(
    model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_k=10,
)

# Cohere API
CohereReranker(
    api_key="...",
    model="rerank-english-v2.0",
    top_k=10,
)

# Ensemble (combine multiple rerankers)
EnsembleReranker(
    rerankers=[(reranker1, weight1), (reranker2, weight2)],
    fusion_method="rrf",  # or "weighted"
)

Caches

from dyncontext.cache import EmbeddingCache, QueryCache, ResponseCache

# Embedding cache (caches vector embeddings)
EmbeddingCache(max_size=10000, ttl=3600)

# Query cache (caches retrieval results)
QueryCache(max_size=1000, ttl=3600)

# Response cache (caches LLM responses)
ResponseCache(max_size=500, ttl=3600)

Compressors

from dyncontext.compression import (
    TruncationCompressor,
    SentenceCompressor,
    LLMCompressor,
    HierarchicalCompressor,
    AdaptiveCompressor,
)

# Truncation (simple, fast)
TruncationCompressor(max_tokens=2000, truncate_from="end")

# Sentence-level (preserves sentence boundaries)
SentenceCompressor(max_tokens=2000)

# LLM-based summarization
LLMCompressor(model="gpt-3.5-turbo", target_ratio=0.5)

# Hierarchical (preserves document structure)
HierarchicalCompressor(levels=["paragraph", "sentence"], max_tokens=2000)

# Adaptive (auto-selects best strategy)
AdaptiveCompressor(max_tokens=2000, prefer_quality=True)

Telemetry

from dyncontext.telemetry import TelemetryManager

# Get metrics programmatically
telemetry = TelemetryManager()
telemetry.record_event("retrieval", {"query": "...", "results": 5})
metrics = telemetry.get_metrics()

Configuration

Environment Variables

OPENAI_API_KEY - OpenAI API key
ANTHROPIC_API_KEY - Anthropic API key

Custom Embeddings

from dyncontext import DynContext, SentenceTransformerEmbedding

# Use local embeddings (no API calls)
embedder = SentenceTransformerEmbedding(model_name="all-MiniLM-L6-v2")
ctx = DynContext(provider="openai", embedder=embedder)

Custom Retriever

from dyncontext import DynContext, HybridRetriever, SemanticRetriever, KeywordRetriever

retriever = HybridRetriever([
    (SemanticRetriever(embedding_fn=my_embed), 0.8),
    (KeywordRetriever(), 0.2),
], fusion_method="rrf")

ctx = DynContext(provider="openai", retriever=retriever)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
dyncontext		dyncontext
examples		examples
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

License

ltphen/dyncontext

Folders and files

Latest commit

History

Repository files navigation

dyncontext

Features

Installation

Quick Start

How It Works

Examples

Basic Usage

Document RAG

Filtering Context

Streaming

Manual Retrieval

Persistence

Vector Store Backends

Reranking

Caching

Telemetry

Context Compression

Using DynContextConfig

API Reference

DynContext

Methods

Properties

ContextItem

ContextType

Retrievers

Vector Stores

Rerankers

Caches

Compressors

Telemetry

Configuration

Environment Variables

Custom Embeddings

Custom Retriever

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages