An MCP server implementation in Python providing image recognition capabilities using various LLM providers (Gemini, OpenAI, Qwen/Tongyi, Doubao, etc.).
- Image Recognition: Describe images or answer questions about them.
- Multi-Model Support: Dynamically switch between Gemini, GPT-4o, Qwen-VL, Doubao, etc.
- Flexible: Accepts image URLs or Base64 data.
We provide automated scripts to set up the environment and dependencies in one click.
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
./setup.sh
On Windows:
- Clone or download this repository.
- Double-click setup.bat.
After the script finishes, simply edit the .env file with your API keys.
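Once .env is edited, a quick check like the following can confirm that at least one key has been filled in. This is a minimal sketch and not part of the repository: the helpers `parse_env` and `configured_keys` are hypothetical, the variable names are the ones used in the configuration examples in this README, and the placeholder test (values starting with `your_`) is an assumption based on those examples.

```python
from pathlib import Path

# Variable names taken from the configuration examples in this README.
KEY_NAMES = ("GEMINI_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_keys(env: dict[str, str]) -> list[str]:
    """Return key variables that are non-empty and not a 'your_...' placeholder."""
    return [
        name for name in KEY_NAMES
        if env.get(name) and not env[name].startswith("your_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    found = configured_keys(env)
    print("Configured keys:", ", ".join(found) or "none")
    print("DEFAULT_MODEL:", env.get("DEFAULT_MODEL", "(not set)"))
```

Run it from the project root. It only reads the file and prints variable names, never the key values.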
If you prefer manual installation or want to use uv:
- Python 3.10 or higher
- An API Key for your preferred model provider (Google Gemini, OpenAI, Aliyun DashScope, etc.)
uv is an extremely fast Python package manager.
You don't need to manually create a virtual environment.
# Clone the repo
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
# Create .env file with your API keys
cp .env.example .env
# Edit .env with your keys
# Run the server
uv run server.py
If you want to run it without cloning the repo explicitly (experimental support via git):
# Note: You still need to provide environment variables.
# It's easier to clone and use 'uv run' for persistent config via .env
uvx --from git+https://github.com/glasses666/mcp-image-recognition-py mcp-image-recognition-
On macOS/Linux:
- Clone and Setup:
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Configure:
cp .env.example .env
# Edit .env and add your API keys
- Run:
python server.py
On Windows:
- Clone and Setup:
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
- Configure:
copy .env.example .env
# Edit .env and add your API keys
- Run:
python server.py
Create a .env file in the project root based on .env.example:
Get an API key from Google AI Studio.
GEMINI_API_KEY=your_google_api_key
DEFAULT_MODEL=gemini-1.5-flash
Get an API key from Aliyun DashScope.
OPENAI_API_KEY=your_dashscope_api_key
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DEFAULT_MODEL=qwen-vl-max
Get an API key from Volcengine Ark.
OPENAI_API_KEY=your_volcengine_api_key
OPENAI_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
DEFAULT_MODEL=doubao-pro-32k
To use this server with an MCP client (such as Claude Desktop), add it to the client's configuration file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json (if available)
Option A: Using uv (Easiest)
If you have uv installed, you can let it handle the environment.
{
  "mcpServers": {
    "image-recognition": {
      "command": "/path/to/uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/mcp-image-recognition-py",
        "server.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_key_here",
        "OPENAI_API_KEY": "your_openai_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "DEFAULT_MODEL": "gemini-1.5-flash"
      }
    }
  }
}
Option B: Standard Python Venv
Make sure "command" is the absolute path to the Python executable inside your virtual environment.
{
  "mcpServers": {
    "image-recognition": {
      "command": "/absolute/path/to/mcp-image-recognition-py/venv/bin/python",
      "args": [
        "/absolute/path/to/mcp-image-recognition-py/server.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_key_here",
        "OPENAI_API_KEY": "your_openai_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "DEFAULT_MODEL": "gemini-1.5-flash"
      }
    }
  }
}
Windows Note: For paths, use double backslashes \\ (e.g., C:\\Users\\Name\\...).
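If escaping Windows paths by hand is error-prone, the Option B entry can be generated with Python's standard `json` module, which doubles the backslashes automatically. This is a sketch only: `build_server_entry` is a hypothetical helper, and the install location `C:\Users\Name\mcp-image-recognition-py` is an assumed example path, not something the project prescribes.

```python
import json
from pathlib import PureWindowsPath


def build_server_entry(python_exe: str, server_script: str, env: dict[str, str]) -> dict:
    """Build a config dict in the same shape as the Option B example."""
    return {
        "mcpServers": {
            "image-recognition": {
                "command": python_exe,
                "args": [server_script],
                "env": env,
            }
        }
    }


# Assumed Windows install location, for illustration only.
project = PureWindowsPath(r"C:\Users\Name\mcp-image-recognition-py")
config = build_server_entry(
    python_exe=str(project / "venv" / "Scripts" / "python.exe"),
    server_script=str(project / "server.py"),
    env={"GEMINI_API_KEY": "your_gemini_key_here", "DEFAULT_MODEL": "gemini-1.5-flash"},
)

# json.dumps writes each backslash as \\ so the result is valid JSON.
text = json.dumps(config, indent=2)
print(text)
```

The printed JSON can be merged into claude_desktop_config.json. If the file already lists other servers, add only the "image-recognition" entry under the existing "mcpServers" object.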
Analyzes an image and returns a text description.
Parameters:
- image (string, required): The image to analyze. Supports:
  - HTTP/HTTPS URLs (e.g., https://example.com/cat.jpg)
  - Base64 encoded strings (with or without the data:image/...;base64, prefix)
- prompt (string, optional): Specific instruction. Default: "Describe this image".
- model (string, optional): Override the default model for this specific request.
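The three accepted forms of the image parameter can be told apart with a few string checks. The sketch below illustrates one way to do it and is not the server's actual implementation: `classify_image_input` is a hypothetical helper, and strict Base64 validation is an assumption about how invalid input might be rejected.

```python
import base64
import binascii


def classify_image_input(image: str) -> tuple[str, str]:
    """Return (kind, value), where kind is "url" or "base64".

    For Base64 input, value is the bare payload with any
    data:image/...;base64, prefix removed.
    """
    image = image.strip()
    if not image:
        raise ValueError("image must not be empty")
    if image.startswith(("http://", "https://")):
        return "url", image
    if image.startswith("data:image/"):
        # Split off the "data:image/<type>;base64," header.
        _header, sep, payload = image.partition(";base64,")
        if not sep:
            raise ValueError("data URI is not Base64-encoded")
        image = payload
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(image, validate=True)
    except binascii.Error as exc:
        raise ValueError("input is neither a URL nor valid Base64") from exc
    return "base64", image
```

For example, `classify_image_input("https://example.com/cat.jpg")` returns `("url", "https://example.com/cat.jpg")`, while a data URI and its bare payload both come back as `("base64", payload)`.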
MIT