v2.16.0
What's New
Ollama Context Window Configuration
Added num_ctx support for Ollama provider with a sensible default of 128,000 tokens. This fixes an issue where Ollama's default context window (2,048 tokens) was causing context truncation with large documents.
# Uses default of 128,000 tokens
model = AIFactory.create_language("ollama", "llama3.1")
# Or customize as needed
model = AIFactory.create_language("ollama", "llama3.1", config={"num_ctx": 32768})Ollama Keep Alive Configuration
Added keep_alive support to control how long models stay loaded in memory. No default is set to avoid forcing memory usage on users.
# Keep model loaded for 10 minutes
model = AIFactory.create_language("ollama", "llama3.1", config={"keep_alive": "10m"})
# Unload immediately after use
model = AIFactory.create_language("ollama", "llama3.1", config={"keep_alive": "0"})Both parameters work with LangChain integration via to_langchain().
Full Changelog
See CHANGELOG.md for details.
Full Changelog: v2.15.0...v2.16.0