Release v2.16.0 · lfnovo/esperanto

What's New

Ollama Context Window Configuration

Added num_ctx support for Ollama provider with a sensible default of 128,000 tokens. This fixes an issue where Ollama's default context window (2,048 tokens) was causing context truncation with large documents.

# Uses default of 128,000 tokens
model = AIFactory.create_language("ollama", "llama3.1")

# Or customize as needed
model = AIFactory.create_language("ollama", "llama3.1", config={"num_ctx": 32768})

Ollama Keep Alive Configuration

Added keep_alive support to control how long models stay loaded in memory. No default is set to avoid forcing memory usage on users.

# Keep model loaded for 10 minutes
model = AIFactory.create_language("ollama", "llama3.1", config={"keep_alive": "10m"})

# Unload immediately after use
model = AIFactory.create_language("ollama", "llama3.1", config={"keep_alive": "0"})

Both parameters work with LangChain integration via to_langchain().

Full Changelog

See CHANGELOG.md for details.

Full Changelog: v2.15.0...v2.16.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2.16.0

Choose a tag to compare

Sorry, something went wrong.