Skip to content

v2.16.0

Choose a tag to compare

@lfnovo lfnovo released this 22 Jan 02:07
· 116 commits to main since this release
685d384

What's New

Ollama Context Window Configuration

Added num_ctx support for Ollama provider with a sensible default of 128,000 tokens. This fixes an issue where Ollama's default context window (2,048 tokens) was causing context truncation with large documents.

# Uses default of 128,000 tokens
model = AIFactory.create_language("ollama", "llama3.1")

# Or customize as needed
model = AIFactory.create_language("ollama", "llama3.1", config={"num_ctx": 32768})

Ollama Keep Alive Configuration

Added keep_alive support to control how long models stay loaded in memory. No default is set to avoid forcing memory usage on users.

# Keep model loaded for 10 minutes
model = AIFactory.create_language("ollama", "llama3.1", config={"keep_alive": "10m"})

# Unload immediately after use
model = AIFactory.create_language("ollama", "llama3.1", config={"keep_alive": "0"})

Both parameters work with LangChain integration via to_langchain().

Full Changelog

See CHANGELOG.md for details.

Full Changelog: v2.15.0...v2.16.0