Supercharge your AI assistant with local LLM access
A Python implementation of an MCP (Model Context Protocol) server that exposes Ollama SDK functionality as MCP tools, enabling seamless integration between your local LLM models and MCP-compatible applications such as Windsurf and VS Code.
This is a Python port of the TypeScript ollama-mcp project.
Features • Installation • Available Tools • Configuration • Windsurf Integration • Usage • Development
Type in the chat window:
- MCP Tool: ollama / ollama_chat. Use model llava and tell me a bedtime story
- MCP Tool: ollama / ollama_chat. Use model gpt-oss and tell me a bedtime story
- Ollama Cloud Support - Full integration with Ollama's cloud platform
- 8 Comprehensive Tools - Full access to Ollama's SDK functionality
- Hot-Swap Architecture - Automatic tool discovery with zero-config
- Type-Safe - Built with Pydantic models and type hints
- High Test Coverage - Comprehensive test suite (planned)
- Minimal Dependencies - Lightweight and fast
- Drop-in Integration - Works with Windsurf, VS Code, and other MCP clients
- Web Search & Fetch - Real-time web search and content extraction via Ollama Cloud (planned)
- Hybrid Mode - Use local and cloud models seamlessly in one server
This Python implementation provides the same functionality as the TypeScript version but with:
- Python Native: No Node.js dependencies required
- Poetry Package Management: Modern Python dependency management
- Async/Await: Native Python async support
- Pydantic Models: Robust data validation and type safety
- Poetry Scripts: Easy installation and execution
- Python 3.10+
- Poetry (for development)
- Ollama running locally
# Clone the repository
git clone <repository-url>
cd mcp-ollama-python
# Install dependencies
py -m poetry install
# Run the server (only needed for testing from the command line; otherwise Windsurf or VS Code integration will launch it)
py -m poetry run mcp-ollama-python

# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -
# Clone and install
git clone <repository-url>
cd mcp-ollama-python
poetry install
# Run the server (only needed for testing from the command line; otherwise Windsurf or VS Code integration will launch it)
poetry run mcp-ollama-python

Generate a Windows executable only if you specifically need it; otherwise, this step can be skipped:

poetry run pyinstaller mcp-ollama-python.spec --clean --distpath bin
| Tool | Description |
|---|---|
| `ollama_list` | List all available local models |
| `ollama_show` | Get detailed information about a specific model |
| `ollama_pull` | Download models from the Ollama library |
| `ollama_delete` | Remove models from local storage |
| Tool | Description |
|---|---|
| `ollama_ps` | List currently running models |
| `ollama_generate` | Generate text completions |
| `ollama_chat` | Interactive chat with models (supports tools/functions) |
| `ollama_embed` | Generate embeddings for text |
| Tool | Description |
|---|---|
| `ollama_web_search` | Search the web with customizable result limits |
| `ollama_web_fetch` | Fetch and parse web page content |
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `http://127.0.0.1:11434` | Ollama server endpoint |
| `OLLAMA_API_KEY` | - | API key for Ollama Cloud (when implemented) |
export OLLAMA_HOST="http://localhost:11434"
py -m poetry run mcp-ollama-python

export OLLAMA_HOST="https://ollama.com"
export OLLAMA_API_KEY="your-ollama-cloud-api-key"
py -m poetry run mcp-ollama-python

The server exposes local Ollama models through MCP. Configure available models in `mcp.json`:
mcp-ollama-python/mcp.json
{
"capabilities": {
"models": [
{
"name": "gpt-oss",
"provider": "ollama",
"description": "Local Ollama GPT-OSS model served through MCP",
"maxTokens": 4096
}
]
}
}

Model Configuration Options:
- `name`: Model identifier used by MCP clients
- `provider`: Always "ollama" for this server
- `description`: Human-readable model description
- `maxTokens`: Maximum context window size
You can add multiple models to expose different Ollama models through MCP:
{
"capabilities": {
"models": [
{
"name": "gpt-oss",
"provider": "ollama",
"description": "Local Ollama GPT-OSS model",
"maxTokens": 4096
},
{
"name": "llama3.2",
"provider": "ollama",
"description": "Llama 3.2 model for general tasks",
"maxTokens": 8192
},
{
"name": "codellama",
"provider": "ollama",
"description": "Code Llama for programming tasks",
"maxTokens": 16384
}
]
}
}

Windsurf is an AI-powered code editor that supports MCP servers. This section provides complete setup instructions for integrating the Ollama MCP server with Windsurf.
Add the Ollama MCP server to your Windsurf MCP configuration:
%USERPROFILE%\.codeium\windsurf\mcp_config.json (Windows)
~/.codeium/windsurf/mcp_config.json (macOS/Linux)
{
"mcpServers": {
"ollama": {
"command": "py",
"args": ["-m", "mcp_ollama_python"],
"disabled": false,
"env": {}
},
"git": {
"command": "py",
"args": ["-m", "mcp_server_git"],
"disabled": true,
"env": {}
}
}
}

Windsurf Tools setup file: `.windsurf\workflows\tools.md`
---
description: Quick reference for Windsurf MCP tools (mcp-ollama)
auto_execution_mode: 2
---
# MCP Tools (mcp-ollama)
Available tools exposed by the local `mcp-ollama-python` server:
- **ollama_chat** - Interactive chat with models (multi-turn, tool-calling, structured outputs)
- **ollama_list** - List installed models
- **ollama_show** - Show details for a specific model
- **ollama_generate** - Single-prompt text generation
- **ollama_pull** - Pull a model from a registry
- **ollama_delete** - Delete a local model
- **ollama_ps** - List running models
- **ollama_embed** - Create embeddings for input text
- **ollama_execute** - Execute a system command via the server (utility/test)
## How to list tools in Windsurf
1) Open the command palette and run: `MCP: List Tools`
2) Or run the MCP tool via the chat with: `/tools`
## Notes
- Server: local Ollama via `mcp-ollama-python`
- Formats: most tools accept `format` = `json` (default) or `markdown`
Configuration Options:
- `command`: Python interpreter command (`py`, `python`, or `python3`)
- `args`: Module execution arguments
- `disabled`: Set to `false` to enable the server
- `env`: Environment variables (e.g., `OLLAMA_HOST`)
Alternative Configuration (with Poetry):
{
"mcpServers": {
"ollama": {
"command": "py",
"args": ["-m", "poetry", "run", "mcp-ollama-python"],
"cwd": "d:/path/to/mcp-ollama-python",
"disabled": false,
"env": {}
}
}
}

Set Windsurf to prefer your local MCP server over cloud models:
%USERPROFILE%\.codeium\windsurf\settings.json (Windows)
~/.codeium/windsurf/settings.json (macOS/Linux)
{
"defaultModelBehavior": "prefer-mcp",
"preferredMcpModel": {
"server": "ollama",
"model": "gpt-oss"
}
}

Settings Explanation:
- `defaultModelBehavior`: Set to `"prefer-mcp"` to prioritize MCP models
- `preferredMcpModel.server`: Name of the MCP server (must match `mcp_config.json`)
- `preferredMcpModel.model`: Model name from your `mcp.json` configuration
Create custom instructions to ensure Windsurf uses your local model:
%USERPROFILE%\.codeium\windsurf\instructions.md (Windows)
~/.codeium/windsurf/instructions.md (macOS/Linux)
Always use my local MCP server named "ollama" with the model "gpt-oss" for all reasoning, coding, and problem-solving tasks unless I explicitly request another model.
Prefer the MCP server over any cloud or paid model.

- Restart Windsurf to load the new configuration (Ctrl+Shift+P, search for "Developer: Reload Window", then press Enter)
- Check MCP Status: Look for the Ollama MCP server in Windsurf's MCP panel
- Test Connection: Try a simple query to verify the model responds
- Check Logs: Review Windsurf logs if connection issues occur
Server Not Appearing:
- Verify `mcp_config.json` syntax is valid JSON
- Ensure `disabled` is set to `false`
- Check that Python and the module are in your PATH
- Restart Windsurf completely
Model Not Available:
- Verify the model name in `settings.json` matches `mcp.json`
- Ensure Ollama is running (`ollama serve`)
- Check that the model is pulled (`ollama pull gpt-oss`)
Connection Errors:
- Verify the `OLLAMA_HOST` environment variable if using a custom host
- Check Ollama server logs for errors
- Ensure no firewall is blocking localhost connections (a quick connectivity check is sketched below)
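If you want to rule out basic connectivity problems before digging into MCP configuration, a standard-library check along these lines can confirm the Ollama API is reachable. This is a sketch assuming the default `OLLAMA_HOST`; `/api/tags` is the Ollama endpoint that lists locally installed models.

```python
import json
import os
import urllib.request

# Quick sanity check: can we reach the Ollama HTTP API and list local models?
host = os.environ.get("OLLAMA_HOST", "http://127.0.0.1:11434")

try:
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    names = [m["name"] for m in data.get("models", [])]
    print(f"Ollama reachable at {host}; local models: {names or 'none installed'}")
except OSError as exc:
    print(f"Could not reach Ollama at {host}: {exc}")
```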
Add to your VS Code MCP settings:
{
"mcpServers": {
"ollama": {
"command": "py",
"args": ["-m", "mcp_ollama_python"],
"disabled": false
}
}
}

# MCP clients can invoke:
{
"tool": "ollama_chat",
"arguments": {
"model": "llama3.2:latest",
"messages": [
{ "role": "user", "content": "Explain quantum computing" }
]
}
}

{
"tool": "ollama_embed",
"arguments": {
"model": "nomic-embed-text",
"input": ["Hello world", "Embeddings are great"]
}
}

This server uses a hot-swap autoloader pattern:
src/
├── main.py                 # Entry point (82 lines)
├── server.py               # MCP server creation
├── autoloader.py           # Dynamic tool discovery
├── ollama_client.py        # Ollama HTTP client
├── types.py                # Type definitions
├── response_formatter.py   # Response formatting
└── tools/                  # Tool implementations
    ├── chat.py             # Each exports tool_definition
    ├── generate.py
    └── ...
Key Benefits:
- Add new tools by dropping files in `src/tools/`
- Zero server code changes required
- Each tool is independently testable
- 100% function coverage on all tools (planned)
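For readers curious how the hot-swap pattern can work in general, here is a minimal sketch (illustrative only, not the project's actual `autoloader.py`): it walks the tools package with `pkgutil`, imports each module, and registers anything that exports a `tool_definition`.

```python
import importlib
import pkgutil
from types import ModuleType

def discover_tools(package: ModuleType) -> dict:
    """Import every module in the tools package and collect tool definitions.

    Illustrative sketch only; assumes each tool module exports `tool_definition`
    with a `name` attribute, as in the example later in this README.
    """
    registry = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        definition = getattr(module, "tool_definition", None)
        if definition is not None:
            registry[definition.name] = module
    return registry
```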
# Clone repository
git clone <repository-url>
cd mcp-ollama-python
# Install dependencies
py -m poetry install
# Run in development mode
py -m poetry run mcp-ollama-python
# Run tests (when implemented)
py -m poetry run pytest

- Create `src/tools/your_tool.py`:
from typing import Dict, Any
from ..types import ToolDefinition, ResponseFormat, OllamaClient
from ..response_formatter import format_response
async def your_tool_handler(
ollama: OllamaClient, args: Dict[str, Any], format: ResponseFormat
) -> str:
"""Your tool implementation"""
# Implementation here
return format_response({"result": "success"}, format)
# Tool definition
tool_definition = ToolDefinition(
name="ollama_your_tool",
description="Your tool description",
input_schema={
"type": "object",
"properties": {
"param": {"type": "string"}
},
"required": ["param"]
},
)

- Done! The autoloader discovers it automatically.
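Since each tool is independently testable, a unit test for the example above could look roughly like this. This is a hedged sketch: it assumes `pytest-asyncio` is installed, that the package is importable as `src.tools.your_tool`, and that a plain stub object is acceptable in place of a real `OllamaClient`.

```python
import pytest

from src.tools.your_tool import tool_definition, your_tool_handler

class FakeOllamaClient:
    """Stub standing in for OllamaClient; the example handler never uses it."""

@pytest.mark.asyncio
async def test_handler_returns_formatted_success():
    result = await your_tool_handler(FakeOllamaClient(), {"param": "value"}, "json")
    assert "success" in result

def test_tool_definition_is_exported():
    assert tool_definition.name == "ollama_your_tool"
```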
Ask Codeium to use the local MCP Ollama server with the gpt-oss model for LLM tasks; it will create a memory of this preference, which looks like this:
Always use local MCP Ollama with gpt-oss model for LLM tasks #llm_preference #mcp_ollama #gpt-oss #workflow #user_preference
When the user requests LLM-based tasks (explanations, code generation, analysis, etc.), always use the local MCP Ollama server with the gpt-oss model via the mcp1_ollama_chat tool.
Important guidelines:
- Use the mcp1_ollama_chat tool with the model parameter set to "gpt-oss"
- Format responses in markdown for better readability (format: "markdown")
- Communicate with the model in English unless the user explicitly requests another language
- Do NOT create separate Python scripts to interact with Ollama - use the MCP tools directly
- The local Ollama server runs at http://127.0.0.1:11434 (default OLLAMA_HOST)

Example usage:

mcp1_ollama_chat(
    model="gpt-oss",
    messages=[{"role": "user", "content": "Your prompt here"}],
    format="markdown"
)

This applies to tasks like:
- Code explanations
- Documentation generation
- Technical analysis
- Q&A about code or concepts
- Any other LLM-powered assistance

Project context: mcp-ollama-python (d:\Pdj\gitHub\mcp-ollama-python)
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Write tests - We maintain comprehensive test coverage
- Commit with clear messages (`git commit -m 'Add amazing feature'`)
- Push to your branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- All new tools must export `tool_definition`
- Maintain comprehensive test coverage
- Follow existing Python patterns
- Use Pydantic schemas for input validation
This project is licensed under the MIT License.
See LICENSE for details.
- ollama-mcp (TypeScript) - Original TypeScript implementation
- Ollama - Get up and running with large language models locally
- Model Context Protocol - Open standard for AI assistant integration
- Windsurf - AI-powered code editor with MCP support
- Cline - VS Code AI assistant
Made with ❤️ using Python, Poetry, and Ollama