
πŸ¦™ Ollama MCP Server (Python)

Supercharge your AI assistant with local LLM access


A Python implementation of the MCP (Model Context Protocol) server that exposes Ollama SDK functionality as MCP tools, enabling seamless integration between your local LLM models and MCP-compatible applications like Windsurf and VS Code.

This is a Python port of the TypeScript ollama-mcp project.

Features • Installation • Available Tools • Configuration • Windsurf Integration • Usage • Development


Example of usage

Type in the chat window:

  • MCP Tool: ollama / ollama_chat. Use model llava and tell me a bed time story

  • MCP Tool: ollama / ollama_chat. Use model gpt-oss and tell me a bed time story


✨ Features

  • ☁️ Ollama Cloud Support - Full integration with Ollama's cloud platform
  • πŸ”§ 8 Comprehensive Tools - Full access to Ollama's SDK functionality
  • πŸ”„ Hot-Swap Architecture - Automatic tool discovery with zero-config
  • 🎯 Type-Safe - Built with Pydantic models and type hints
  • πŸ“Š High Test Coverage - Comprehensive test suite (planned)
  • πŸš€ Minimal Dependencies - Lightweight and fast
  • πŸ”Œ Drop-in Integration - Works with Windsurf, VS Code, and other MCP clients
  • 🌐 Web Search & Fetch - Real-time web search and content extraction via Ollama Cloud (planned)
  • πŸ”€ Hybrid Mode - Use local and cloud models seamlessly in one server

πŸ’‘ Why Python?

This Python implementation provides the same functionality as the TypeScript version but with:

  • Python Native: No Node.js dependencies required
  • Poetry Package Management: Modern Python dependency management
  • Async/Await: Native Python async support
  • Pydantic Models: Robust data validation and type safety
  • Poetry Scripts: Easy installation and execution

πŸ“¦ Installation

Prerequisites

  • Python 3.10+
  • Poetry (for development)
  • Ollama running locally

Quick Install with Poetry

# Clone the repository
git clone <repository-url>
cd mcp-ollama-python

# Install dependencies
py -m poetry install

# Run the server (only needed when testing from the command line; the Windsurf or VS Code integration starts it automatically)
py -m poetry run mcp-ollama-python

Manual Installation

# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -

# Clone and install
git clone <repository-url>
cd mcp-ollama-python
poetry install

# Run the server (only needed when testing from the command line; the Windsurf or VS Code integration starts it automatically)
poetry run mcp-ollama-python

πŸ› οΈ Generate a Windows executable if you specifically need it; otherwise, this step can be skipped.

poetry run pyinstaller mcp-ollama-python.spec --clean --distpath bin

πŸ› οΈ Available Tools

Model Management

| Tool | Description |
|------|-------------|
| ollama_list | List all available local models |
| ollama_show | Get detailed information about a specific model |
| ollama_pull | Download models from the Ollama library |
| ollama_delete | Remove models from local storage |

Model Operations

| Tool | Description |
|------|-------------|
| ollama_ps | List currently running models |
| ollama_generate | Generate text completions |
| ollama_chat | Interactive chat with models (supports tools/functions) |
| ollama_embed | Generate embeddings for text |

Web Tools (Ollama Cloud - Planned)

| Tool | Description |
|------|-------------|
| ollama_web_search | Search the web with customizable result limits |
| ollama_web_fetch | Fetch and parse web page content |

βš™οΈ Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_HOST | http://127.0.0.1:11434 | Ollama server endpoint |
| OLLAMA_API_KEY | - | API key for Ollama Cloud (when implemented) |

Custom Ollama Host

export OLLAMA_HOST="http://localhost:11434"
py -m poetry run mcp-ollama-python

Ollama Cloud Configuration (Planned)

export OLLAMA_HOST="https://ollama.com"
export OLLAMA_API_KEY="your-ollama-cloud-api-key"
py -m poetry run mcp-ollama-python

MCP Model Configuration

The server exposes local Ollama models through MCP. Configure available models in mcp.json:

mcp-ollama-python/mcp.json

{
  "capabilities": {
    "models": [
      {
        "name": "gpt-oss",
        "provider": "ollama",
        "description": "Local Ollama GPT-OSS model served through MCP",
        "maxTokens": 4096
      }
    ]
  }
}

Model Configuration Options:

  • name: Model identifier used by MCP clients
  • provider: Always "ollama" for this server
  • description: Human-readable model description
  • maxTokens: Maximum context window size

You can add multiple models to expose different Ollama models through MCP:

{
  "capabilities": {
    "models": [
      {
        "name": "gpt-oss",
        "provider": "ollama",
        "description": "Local Ollama GPT-OSS model",
        "maxTokens": 4096
      },
      {
        "name": "llama3.2",
        "provider": "ollama",
        "description": "Llama 3.2 model for general tasks",
        "maxTokens": 8192
      },
      {
        "name": "codellama",
        "provider": "ollama",
        "description": "Code Llama for programming tasks",
        "maxTokens": 16384
      }
    ]
  }
}

🌊 Windsurf Integration

Windsurf is an AI-powered code editor that supports MCP servers. This section provides complete setup instructions for integrating the Ollama MCP server with Windsurf.

Step 1: Configure MCP Server

Add the Ollama MCP server to your Windsurf MCP configuration:

%USERPROFILE%\.codeium\windsurf\mcp_config.json (Windows)
~/.codeium/windsurf/mcp_config.json (macOS/Linux)

{
  "mcpServers": {
    "ollama": {
      "command": "py",
      "args": ["-m", "mcp_ollama_python"],
      "disabled": false,
      "env": {}
    },
    "git": {
      "command": "py",
      "args": ["-m", "mcp_server_git"],
      "disabled": true,
      "env": {}
    }
  }
}

Windsurf Tools setup file: .windsurf\workflows\tools.md

---
description: Quick reference for Windsurf MCP tools (mcp-ollama)
auto_execution_mode: 2
---

# MCP Tools (mcp-ollama)

Available tools exposed by the local `mcp-ollama-python` server:

- **ollama_chat** – Interactive chat with models (multi-turn, tool-calling, structured outputs)
- **ollama_list** – List installed models
- **ollama_show** – Show details for a specific model
- **ollama_generate** – Single-prompt text generation
- **ollama_pull** – Pull a model from a registry
- **ollama_delete** – Delete a local model
- **ollama_ps** – List running models
- **ollama_embed** – Create embeddings for input text
- **ollama_execute** – Execute a system command via the server (utility/test)

## How to list tools in Windsurf
1) Open the command palette and run: `MCP: List Tools`
2) Or run the MCP tool via the chat with: `/tools`

## Notes
- Server: local Ollama via `mcp-ollama-python`
- Formats: most tools accept `format` = `json` (default) or `markdown`

Configuration Options:

  • command: Python interpreter command (py, python, or python3)
  • args: Module execution arguments
  • disabled: Set to false to enable the server
  • env: Environment variables (e.g., OLLAMA_HOST)

Alternative Configuration (with Poetry):

{
  "mcpServers": {
    "ollama": {
      "command": "py",
      "args": ["-m", "poetry", "run", "mcp-ollama-python"],
      "cwd": "d:/path/to/mcp-ollama-python",
      "disabled": false,
      "env": {}
    }
  }
}

Step 2: Configure Default Model Behavior

Set Windsurf to prefer your local MCP server over cloud models:

%USERPROFILE%\.codeium\windsurf\settings.json (Windows)
~/.codeium/windsurf/settings.json (macOS/Linux)

{
  "defaultModelBehavior": "prefer-mcp",
  "preferredMcpModel": {
    "server": "ollama",
    "model": "gpt-oss"
  }
}

Settings Explanation:

  • defaultModelBehavior: Set to "prefer-mcp" to prioritize MCP models
  • preferredMcpModel.server: Name of the MCP server (must match mcp_config.json)
  • preferredMcpModel.model: Model name from your mcp.json configuration

Step 3: Create Windsurf Instructions

Create custom instructions to ensure Windsurf uses your local model:

%USERPROFILE%\.codeium\windsurf\instructions.md (Windows)
~/.codeium/windsurf/instructions.md (macOS/Linux)

Always use my local MCP server named "ollama" with the model "gpt-oss" for all reasoning, coding, and problem-solving tasks unless I explicitly request another model.

Prefer the MCP server over any cloud or paid model.

Step 4: Verify Installation

  1. Restart Windsurf to load the new configuration (press Ctrl+Shift+P, search for "Developer: Reload Window", then press Enter)
  2. Check MCP Status: Look for the Ollama MCP server in Windsurf's MCP panel
  3. Test Connection: Try a simple query to verify the model responds
  4. Check Logs: Review Windsurf logs if connection issues occur

Troubleshooting

Server Not Appearing:

  • Verify mcp_config.json syntax is valid JSON
  • Ensure disabled is set to false
  • Check that Python and the module are in your PATH
  • Restart Windsurf completely

Model Not Available:

  • Verify the model name in settings.json matches mcp.json
  • Ensure Ollama is running (ollama serve)
  • Check that the model is pulled (ollama pull gpt-oss)

Connection Errors:

  • Verify OLLAMA_HOST environment variable if using custom host
  • Check Ollama server logs for errors
  • Ensure no firewall blocking localhost connections

🎯 Usage Examples

VS Code Integration

Add to your VS Code MCP settings:

{
  "mcpServers": {
    "ollama": {
      "command": "py",
      "args": ["-m", "mcp_ollama_python"],
      "disabled": false
    }
  }
}

Chat with a Model

# MCP clients can invoke:
{
  "tool": "ollama_chat",
  "arguments": {
    "model": "llama3.2:latest",
    "messages": [
      { "role": "user", "content": "Explain quantum computing" }
    ]
  }
}
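
The request above is what an MCP client sends to the server. For comparison, here is a minimal sketch of the same chat made directly against a local Ollama server with the official ollama Python package (pip install ollama); it bypasses the MCP layer and is shown only as an illustration:

import ollama

# Direct call to the local Ollama HTTP API; mirrors the MCP example above
response = ollama.chat(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)

# Dict-style access to the assistant's reply (attribute access also works in recent versions)
print(response["message"]["content"])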

Generate Embeddings

{
  "tool": "ollama_embed",
  "arguments": {
    "model": "nomic-embed-text",
    "input": ["Hello world", "Embeddings are great"]
  }
}

πŸ—οΈ Architecture

This server uses a hot-swap autoloader pattern:

src/
├── main.py               # Entry point (82 lines)
├── server.py             # MCP server creation
├── autoloader.py         # Dynamic tool discovery
├── ollama_client.py      # Ollama HTTP client
├── types.py              # Type definitions
├── response_formatter.py # Response formatting
└── tools/                # Tool implementations
    ├── chat.py           # Each exports tool_definition
    ├── generate.py
    └── ...

Key Benefits:

  • Add new tools by dropping files in src/tools/
  • Zero server code changes required
  • Each tool is independently testable
  • 100% function coverage on all tools (planned)

πŸ§ͺ Development

Setup

# Clone repository
git clone <repository-url>
cd mcp-ollama-python

# Install dependencies
py -m poetry install

# Run in development mode
py -m poetry run mcp-ollama-python

# Run tests (when implemented)
py -m poetry run pytest

Adding a New Tool

  1. Create src/tools/your_tool.py:
from typing import Dict, Any
from ..types import ToolDefinition, ResponseFormat, OllamaClient
from ..response_formatter import format_response

async def your_tool_handler(
    ollama: OllamaClient, args: Dict[str, Any], format: ResponseFormat
) -> str:
    """Your tool implementation"""
    # Implementation here
    return format_response({"result": "success"}, format)

# Tool definition
tool_definition = ToolDefinition(
    name="ollama_your_tool",
    description="Your tool description",
    input_schema={
        "type": "object",
        "properties": {
            "param": {"type": "string"}
        },
        "required": ["param"]
    },
)
  2. Done! The autoloader discovers it automatically.
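
Because each tool is a self-contained module, it can be unit-tested in isolation. A minimal test sketch, assuming pytest and pytest-asyncio are installed; the stub client, the "json" format value, and the import path are illustrative and may need adjusting to the project's packaging:

import pytest

from src.tools.your_tool import tool_definition, your_tool_handler


class StubOllamaClient:
    """Stand-in for the real Ollama client; the sketch handler above never calls it."""


def test_tool_definition_is_exported():
    assert tool_definition.name == "ollama_your_tool"


@pytest.mark.asyncio
async def test_handler_returns_formatted_response():
    result = await your_tool_handler(StubOllamaClient(), {"param": "value"}, "json")
    assert "success" in result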

Note on Windsurf IDE

Ask Codeium to use the local MCP Ollama server with the gpt-oss model for LLM tasks; it will create a memory of this preference, which looks like this:

Always use local MCP Ollama with gpt-oss model for LLM tasks #llm_preference #mcp_ollama #gpt-oss #workflow #user_preference

When the user requests LLM-based tasks (explanations, code generation, analysis, etc.), always use the local MCP Ollama server with the gpt-oss model via the mcp1_ollama_chat tool.

Important guidelines:

  • Use the mcp1_ollama_chat tool with the model parameter set to "gpt-oss"
  • Format responses in markdown for better readability (format: "markdown")
  • Communicate with the model in English unless the user explicitly requests another language
  • Do NOT create separate Python scripts to interact with Ollama - use the MCP tools directly
  • The local Ollama server runs at http://127.0.0.1:11434 (default OLLAMA_HOST)

Example usage:

mcp1_ollama_chat(
    model="gpt-oss",
    messages=[{"role": "user", "content": "Your prompt here"}],
    format="markdown"
)

This applies to tasks like:

  • Code explanations
  • Documentation generation
  • Technical analysis
  • Q&A about code or concepts
  • Any other LLM-powered assistance

Project context: mcp-ollama-python (d:\Pdj\gitHub\mcp-ollama-python)

🀝 Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests - We maintain comprehensive test coverage
  4. Commit with clear messages (git commit -m 'Add amazing feature')
  5. Push to your branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

Code Quality Standards

  • All new tools must export tool_definition
  • Maintain comprehensive test coverage
  • Follow existing Python patterns
  • Use Pydantic schemas for input validation

πŸ“„ License

This project is licensed under the MIT License.

See LICENSE for details.

πŸ”— Related Projects

  • ollama-mcp - the original TypeScript implementation that this project is ported from

⬆ back to top

Made with ❀️ using Python, Poetry, and Ollama
