The unified Python toolkit for accessing any LLM through Ollama.
Zero API keys. Zero signup. Completely free.
AI Cortex gives you a single, clean Python interface to hundreds of language models (Llama, Mistral, Gemma, DeepSeek, Qwen, and more), all served through Ollama. No accounts. No credit cards. No rate limits.

```python
from aicortex import chat
response = chat("Explain neural networks like I'm five.")
print(response)
```

| Feature | What it means for you |
|---|---|
| 100% Free | No API keys, no billing, no subscriptions, ever |
| Any Model | Llama, Mistral, Gemma, DeepSeek, Qwen, and more |
| Any Server | Local Ollama, remote servers, or community endpoints |
| Streaming | Real-time token streaming for responsive UIs |
| OpenAI-Compatible | Drop-in replacement for OpenAI client apps |
| Type-Safe | Full type hints, stubs, and IDE autocomplete |
| Production Ready | Automatic failover, multi-server routing, error handling (see the sketch below) |
| Lightweight | One dependency (ollama) for the core package |
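Failover is automatic inside `chat()`; the only code you write is a guard for the case where no endpoint responds at all. A minimal sketch, assuming total failure surfaces as an ordinary Python exception (the exact exception types are an assumption, not documented here):

```python
from aicortex import chat

# chat() routes across known servers on its own; this guard only covers the
# case where every endpoint is unreachable (exception type is an assumption).
try:
    print(chat("Ping?"))
except Exception as exc:
    print(f"All endpoints failed: {exc}")
```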
```bash
# Core package
pip install aicortex-core

# With OpenAI-compatible server support
pip install aicortex-core[server]
```

```python
from aicortex import chat

# Simple response
response = chat("What is the speed of light?")
print(response)
# Custom model and parameters
response = chat(
    "Write a Python function to reverse a string.",
    model="llama3.2:3b",
    temperature=0.2,
    max_tokens=200,
)
print(response)
```
```python
from aicortex import chat

stream = chat("Write a haiku about AI.", stream=True)
for event in stream:
    if event.type == "token":
        print(event.content, end="", flush=True)
```
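If you want the full reply as well as live output, one pattern is to collect tokens while printing them. This sketch relies only on the `event.type` and `event.content` fields shown above:

```python
from aicortex import chat

# Print tokens as they arrive while accumulating the complete reply.
chunks = []
for event in chat("Write a haiku about AI.", stream=True):
    if event.type == "token":
        print(event.content, end="", flush=True)
        chunks.append(event.content)
full_reply = "".join(chunks)
```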
```python
from aicortex import families, models, get_model_info

# Available families
print(families()) # ['llama', 'mistral', 'gemma', 'deepseek', 'qwen']
# Models in a family
print(models("mistral"))
# Full metadata for a model
info = get_model_info("llama3.2:3b")
print(info['parameter_size'], info['quantization_level'])
```
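Discovery pairs naturally with `chat()`. A small sketch, assuming `models()` returns plain model-name strings (as the output above suggests):

```python
from aicortex import chat, models

# Query the first Mistral variant the registry knows about.
candidates = models("mistral")
if candidates:
    print(chat("Summarize the Iliad in one sentence.", model=candidates[0]))
```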
```python
from aicortex import list_model_servers, get_server_info, get_llm_params

# All Ollama servers hosting a model
servers = list_model_servers("llama3.2:3b")
for s in servers:
    print(f"{s['url']} → {s['location']['city']}, {s['location']['country']}")
# Ready-to-use params for LangChain's OllamaLLM
params = get_llm_params("mistral:7b")
# → {'model': 'mistral:7b', 'base_url': 'http://...'}
```
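Those params can be splatted straight into LangChain. A minimal sketch, assuming the separately installed `langchain-ollama` package (`OllamaLLM` is LangChain's class, not part of AI Cortex):

```python
from aicortex import get_llm_params
from langchain_ollama import OllamaLLM  # pip install langchain-ollama

# get_llm_params returns the keys OllamaLLM accepts: model and base_url.
llm = OllamaLLM(**get_llm_params("mistral:7b"))
print(llm.invoke("Say hello in French."))
```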
Run a local proxy that speaks OpenAI's API, drop-in compatible with any OpenAI client:

```python
from aicortex.tools import run_server

run_server(host="127.0.0.1", port=8000, default_model="llama3.2:3b")
```

```bash
# Use with curl
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'
```
```python
# Use with the openai Python SDK: just change the base_url
from openai import OpenAI
client = OpenAI(api_key="none", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
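Streaming should work through the proxy as well, since it is part of the Chat Completions surface OpenAI clients expect; whether the aicortex server implements it is an assumption here. Standard `openai` SDK streaming looks like this:

```python
from openai import OpenAI

client = OpenAI(api_key="none", base_url="http://localhost:8000/v1")

# Stream tokens as they arrive (assumes the proxy supports stream=True).
stream = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```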
Keep the bundled model database fresh with the four-step pipeline:

```python
from pathlib import Path
from aicortex.tools import (
    find_valid_endpoints,  # Step 1: ping all known IPs
    fetch_models,          # Step 2: pull model lists
    resolve_models,        # Step 3: merge with IP metadata
    apply_valid_models,    # Step 4: write into family JSONs
)

json_dir = Path("aicortex/models")
valid_urls = find_valid_endpoints(json_dir) # Step 1
fetch_models(Path("valid.txt"), Path("fetched.json")) # Step 2
resolve_models(Path("fetched.json"), json_dir, Path("resolved.json")) # Step 3
apply_valid_models(Path("resolved.json"), json_dir, backup=True)  # Step 4
```

Contributions are welcome! See CONTRIBUTING.md and the Development Guide.
GNU Lesser General Public License v3.0: free for open-source and commercial use.