Open Google Image Generator MCP

This project is a Model Context Protocol (MCP) server that exposes Google Cloud Vertex AI and Google GenAI SDK capabilities (Imagen, Gemini, Veo, Lyria, and Chirp models) to MCP-compatible clients. It is built with the FastMCP framework.

Current version: 3.0.0 — Full GenAI SDK integration (embed, speech, video analysis, live generation), WebP/AVIF format support, multi-tier model selection, parallel batch generation, sequential pipeline engine, and comprehensive video tools.

Features & Tools

Image Tools

Tool	Description	Backend
`tool_generate_image`	Text-to-image generation. Supports aspect ratio, negative prompt, seed, watermark, GCS output, and WebP/AVIF output	Imagen 4 (`imagen-4.0-fast-generate-001`)
`tool_edit_image`	Mask-based inpaint/outpaint, background swap, product image, and prompt-driven edit. See Edit modes below	Imagen 3 Capability (`imagen-3.0-capability-001`)
`tool_transform_image`	Free-form `image + text → image` transformation: style transfer, scene rewriting, multi-reference composition	Gemini multimodal (`gemini-2.5-flash-image`)
`tool_analyze_image`	Multimodal image understanding and Q&A. Supports `thinking_level` (MINIMAL/LOW/MEDIUM/HIGH) and `media_resolution` (LOW/MEDIUM/HIGH/ULTRA_HIGH)	Gemini Vision (`gemini-2.5-flash`)
`tool_upscale_image`	Upscale low-resolution images	Imagen
`tool_remove_background`	Remove background via `EDIT_MODE_BGSWAP`	Imagen
`tool_batch_generate`	Parallel batch text-to-image generation (up to 10 prompts, max 4 concurrent). `balanced` tier not supported for batch	Imagen
`tool_run_pipeline`	Sequential multi-step image processing pipeline (generate → edit → transform → …)	Mixed
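
The limits quoted for `tool_batch_generate` (up to 10 prompts, at most 4 in flight) can be pictured with a bounded-concurrency sketch. This is an illustration only, not the server's actual code: `generate_one` is a hypothetical stand-in for a single image request, and the constant and function names are assumptions.

```python
import asyncio

MAX_PROMPTS = 10      # batch size limit quoted in the table above
MAX_CONCURRENT = 4    # concurrency limit quoted in the table above


async def generate_one(prompt: str) -> str:
    """Hypothetical stand-in for one text-to-image request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"image for: {prompt}"


async def batch_generate(prompts: list[str]) -> list[str]:
    """Run all prompts with at most MAX_CONCURRENT requests in flight."""
    if not 1 <= len(prompts) <= MAX_PROMPTS:
        raise ValueError(f"expected 1 to {MAX_PROMPTS} prompts, got {len(prompts)}")
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(prompt: str) -> str:
        # The semaphore blocks here once MAX_CONCURRENT requests are running.
        async with semaphore:
            return await generate_one(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))
```

Calling `asyncio.run(batch_generate(["a red bicycle", "a blue kettle"]))` returns one result per prompt, in input order.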

Video Tools

Tool	Description	Backend
`tool_generate_video`	Text-to-video generation. Supports `audio_enabled` for Veo 3+	Veo (`veo-3.1-fast-generate-001`)
`tool_image_to_video`	Animate a still image into video. Supports optional `last_frame_path` for first+last frame mode	Veo
`tool_extend_video`	Extend an existing video clip by 4, 6, or 8 seconds	Veo
`tool_video_object_edit`	Insert or remove an object in a video via `operation` (`insert`/`remove`) and `prompt`	Veo
`tool_analyze_video`	Video understanding and Q&A (max 20MB; mp4, mov, avi, webm, mkv)	Gemini GenAI SDK
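
Before calling `tool_analyze_video`, a client could pre-check a file against the stated limits. The sketch below is a hypothetical client-side helper, not part of the server; it assumes "20MB" means 20 MiB, which the README does not specify.

```python
from pathlib import Path

MAX_VIDEO_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB
ALLOWED_VIDEO_SUFFIXES = {".mp4", ".mov", ".avi", ".webm", ".mkv"}


def check_video(path: str, size_bytes: int) -> None:
    """Raise ValueError if the file falls outside the documented limits."""
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_VIDEO_SUFFIXES:
        raise ValueError(f"unsupported video format: {suffix or '(none)'}")
    if size_bytes > MAX_VIDEO_BYTES:
        raise ValueError(f"video is {size_bytes} bytes, limit is {MAX_VIDEO_BYTES}")
```

For a file on disk, the size can be taken from `Path(path).stat().st_size`.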

Audio Tools

Tool	Description	Backend
`tool_generate_speech`	Text-to-speech with voice selection (Aoede, Charon, Fenrir, Kore, Puck). Supports `model_tier` (fast/quality). Outputs WAV	Gemini TTS (`gemini-2.5-flash-preview-tts` / `gemini-2.5-pro-preview-tts`)
`tool_generate_music`	Music generation from a text prompt	Lyria 2 / Lyria 3 (GenAI SDK)

GenAI SDK Tools

Tool	Description	Backend
`tool_embed`	Text embeddings as float vectors	Gemini Embedding (`text-embedding-004` on Vertex AI, `gemini-embedding-2` on Gemini API)
`tool_live_generate`	Streaming text generation — response is accumulated and returned in full	Gemini Live (`gemini-2.5-flash` / `gemini-3.1-pro`)

Utility Tools

Tool	Description
`tool_list_available_models`	Live-probes every candidate model in the configured project/location and returns only those that respond (200/400 = reachable, 404 = excluded). Cached for the server process lifetime; pass `force_refresh=true` to rescan. Also reports available update versions.
`tool_upload_file`	Register a local file for use as a reference image in subsequent tool calls (e.g. `tool_transform_image`). Returns a `file_uri`.
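
The reachability rule used by `tool_list_available_models` (200/400 = reachable, 404 = excluded) can be written as a small classifier. This is a sketch under assumptions: the function names are hypothetical, and the README does not say how other status codes are treated, so they are reported here as undetermined. A 400 presumably counts as reachable because the model exists and merely rejected the probe payload.

```python
from typing import Optional


def classify_probe(status_code: int) -> Optional[bool]:
    """True = reachable, False = excluded, None = not covered by the README."""
    if status_code in (200, 400):
        return True
    if status_code == 404:
        return False
    return None


def reachable_models(probe_results: dict[str, int]) -> list[str]:
    """Keep only the model IDs whose probe status marks them as reachable."""
    return sorted(
        model for model, code in probe_results.items()
        if classify_probe(code) is True
    )
```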

Edit modes (`tool_edit_image`)

`edit_mode`	What it does	Mask required?
`EDIT_MODE_DEFAULT` (default)	Prompt-driven full-image edit, no mask	No
`EDIT_MODE_INPAINT_INSERTION`	Add an object into the masked region	Yes
`EDIT_MODE_INPAINT_REMOVAL`	Remove content in the masked region	Yes
`EDIT_MODE_OUTPAINT`	Extend the image beyond its original bounds	Yes
`EDIT_MODE_BGSWAP`	Swap the background	No
`EDIT_MODE_PRODUCT_IMAGE`	Product reference styling	No

Use `imagen-3.0-capability-001` (default) for all of the above. The legacy `imagen-3.0-generate-002` only supports `EDIT_MODE_DEFAULT` and does not accept a mask.
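
The mask rules above lend themselves to a pre-flight check of the kind that would produce a `VALIDATION` error. The following is a minimal sketch built from the table, not the server's implementation; `validate_edit_request` and its parameters are hypothetical names.

```python
# Mask requirements copied from the edit-mode table above.
MASK_REQUIRED = {
    "EDIT_MODE_DEFAULT": False,
    "EDIT_MODE_INPAINT_INSERTION": True,
    "EDIT_MODE_INPAINT_REMOVAL": True,
    "EDIT_MODE_OUTPAINT": True,
    "EDIT_MODE_BGSWAP": False,
    "EDIT_MODE_PRODUCT_IMAGE": False,
}
DEFAULT_MODEL = "imagen-3.0-capability-001"
LEGACY_MODEL = "imagen-3.0-generate-002"


def validate_edit_request(edit_mode: str = "EDIT_MODE_DEFAULT",
                          has_mask: bool = False,
                          model: str = DEFAULT_MODEL) -> None:
    """Raise ValueError when the mode, mask, and model do not fit together."""
    if edit_mode not in MASK_REQUIRED:
        raise ValueError(f"unknown edit_mode: {edit_mode}")
    # The legacy model only handles the default mode and takes no mask.
    if model == LEGACY_MODEL and (edit_mode != "EDIT_MODE_DEFAULT" or has_mask):
        raise ValueError(f"{LEGACY_MODEL} supports only EDIT_MODE_DEFAULT without a mask")
    if MASK_REQUIRED[edit_mode] and not has_mask:
        raise ValueError(f"{edit_mode} requires a mask")
```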

When to use which "image + text → image" tool

Need	Use
Mask-based inpaint/outpaint/BG-swap with pixel precision	`tool_edit_image` (Imagen Capability)
"Make it look like X" / style transfer / scene rewriting / multi-reference compositions	`tool_transform_image` (Gemini multimodal)

Model tiers

Most tools accept a `model_tier` parameter:

Tier	Description
`fast` (default)	Lowest latency, lowest cost
`balanced`	Quality / speed trade-off; routes to Gemini for image generation. Not supported for `tool_batch_generate`
`quality`	Higher quality, moderate latency
`ultra`	Maximum quality (Imagen 4 Ultra / Veo quality models)

Model resolution by tier and tool

Tier	`tool_generate_image`	`tool_transform_image`	`tool_generate_video`
`fast`	`imagen-4.0-fast-generate-001`	`gemini-2.5-flash-image`	`veo-3.1-fast-generate-001`
`balanced`	`gemini-2.5-flash-image`	`gemini-2.5-flash-image`	`veo-3.1-fast-generate-001`
`quality`	`imagen-4.0-generate-001`	`gemini-2.5-pro-image`	`veo-3.1-generate-001`
`ultra`	`imagen-4.0-ultra-generate-001`	`gemini-2.5-pro-image`	`veo-3.1-generate-001`
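
The tier table amounts to a two-level lookup. The sketch below copies the model IDs from the table; the `resolve_model` helper is a hypothetical name and may not match how `model_registry.py` is organized.

```python
# Model IDs copied from the tier table above.
MODEL_BY_TIER = {
    "tool_generate_image": {
        "fast": "imagen-4.0-fast-generate-001",
        "balanced": "gemini-2.5-flash-image",
        "quality": "imagen-4.0-generate-001",
        "ultra": "imagen-4.0-ultra-generate-001",
    },
    "tool_transform_image": {
        "fast": "gemini-2.5-flash-image",
        "balanced": "gemini-2.5-flash-image",
        "quality": "gemini-2.5-pro-image",
        "ultra": "gemini-2.5-pro-image",
    },
    "tool_generate_video": {
        "fast": "veo-3.1-fast-generate-001",
        "balanced": "veo-3.1-fast-generate-001",
        "quality": "veo-3.1-generate-001",
        "ultra": "veo-3.1-generate-001",
    },
}


def resolve_model(tool: str, model_tier: str = "fast") -> str:
    """Return the model ID for a tool and tier, defaulting to the fast tier."""
    try:
        return MODEL_BY_TIER[tool][model_tier]
    except KeyError:
        raise ValueError(f"no model for tool={tool!r}, tier={model_tier!r}") from None
```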

Output formats

`tool_generate_image`, `tool_edit_image`, `tool_transform_image`, and `tool_upscale_image` accept a `save_format` / `output_format` parameter:

Format	Notes
`PNG` (default)	Lossless
`JPEG`	Smaller files, lossy. `compression_quality` (0-100, default 85) applies only to JPEG
`WEBP`	Modern lossless/lossy, wide browser support
`AVIF`	Best compression, requires `Pillow>=10`
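
The rule that `compression_quality` applies only to JPEG can be made concrete with a small option builder. This is a hypothetical helper, not the server's code; the returned keys happen to mirror the `format` and `quality` keyword names of Pillow's `Image.save`, but whether the server saves images this way is an assumption.

```python
from typing import Any

SUPPORTED_FORMATS = {"PNG", "JPEG", "WEBP", "AVIF"}


def build_save_options(output_format: str = "PNG",
                       compression_quality: int = 85) -> dict[str, Any]:
    """Return save options; quality is included only for JPEG."""
    fmt = output_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    options: dict[str, Any] = {"format": fmt}
    if fmt == "JPEG":
        if not 0 <= compression_quality <= 100:
            raise ValueError("compression_quality must be between 0 and 100")
        options["quality"] = compression_quality
    return options
```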

Error handling

All tools return a uniform error shape:

{
  "success": false,
  "error": {
    "code": 404,
    "model": "gemini-9.9-nonexistent",
    "endpoint": ":generateContent",
    "message": "Publisher Model `...` is not found.",
    "hint": "Model '...' not found in project '...' / location '...'. Try: gemini-2.5-flash-image.",
    "docs_url": "https://docs.cloud.google.com/...",
    "log_path": ".../logs/vertex_ai_mcp.log",
    "duration_s": 0.42
  }
}

HTTP code	What you'll see in `error.hint`
400	Vertex's parameter-validation message verbatim
401	"Run `gcloud auth application-default login` and retry."
403	IAM role hint (`roles/aiplatform.user`) + Vertex AI API enablement check
404	Live alternatives from the probe cache (`tool_list_available_models`)
429	`Retry after N` (from `Retry-After` header) + quota-increase pointer
500/502/503/504	"Safe to retry once"
`TIMEOUT`	After 90s — suggests a `-fast-` variant
`VALIDATION`	Client-side validation failure (mask missing, file not found, etc.); no HTTP call is made

Full request/response logs are written to `logs/vertex_ai_mcp.log`.
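
A client can branch on `error.code` to decide what to do next. The sketch below maps the codes in the table to coarse actions. It rests on assumptions: the action labels are invented for illustration, a successful result is assumed to carry `"success": true`, and `TIMEOUT` and `VALIDATION` are assumed to appear as strings in `error.code`.

```python
from typing import Any

RETRY_ONCE_CODES = {500, 502, 503, 504}


def next_action(result: dict[str, Any]) -> str:
    """Map a tool result to a coarse follow-up action (hypothetical labels)."""
    if result.get("success", False):
        return "ok"
    code = result.get("error", {}).get("code")
    if code in (400, "VALIDATION"):
        return "fix_request"        # the hint carries the validation message
    if code in (401, 403):
        return "fix_auth"           # re-run gcloud login or check IAM roles
    if code == 404:
        return "pick_other_model"   # the hint lists live alternatives
    if code == 429:
        return "wait_and_retry"     # the hint carries the Retry-After value
    if code in RETRY_ONCE_CODES:
        return "retry_once"
    if code == "TIMEOUT":
        return "try_fast_variant"
    return "inspect_log"            # anything else: consult error.log_path
```

Applied to the sample error above (code 404), this returns `"pick_other_model"`.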

Resources & Prompts

Local Resources (local://outputs/{filename}): Generated and processed media files are exposed as MCP resources for seamless display in MCP clients (Claude Desktop, Cursor, etc.).
Pre-built Prompts: Includes specialized prompt templates for character_design, logo_concept, and UI_UX_mockup.

Prerequisites & Resources

Python 3.9 or newer
Google Cloud Account with an active project
Vertex AI API enabled in your project
Google Cloud CLI (gcloud) installed and configured

For GenAI SDK tools (`tool_embed`, `tool_analyze_video`, `tool_generate_speech`, `tool_live_generate`, `tool_generate_music`), you additionally need either:

A Gemini API key (`GOOGLE_GENAI_API_KEY`), or
Vertex AI ADC credentials with `GOOGLE_GENAI_BACKEND=vertexai`

Installation & Setup

Option A: Install from PyPI

pip install open-google-image-generator-mcp

Option B: Clone the Repository

git clone https://github.com/miracorhan/OpenGoogleImageGeneratorMCP.git
cd OpenGoogleImageGeneratorMCP
pip install -r requirements.txt

Authentication (Critical Step)

The server uses Google Cloud Application Default Credentials (ADC):

gcloud auth application-default login

This opens a browser for login. Use an account with access to your Google Cloud project.

Environment Configuration

Create a .env file in the project root:

# Required
GOOGLE_CLOUD_PROJECT=your-google-cloud-project-id
GOOGLE_CLOUD_LOCATION=us-central1

# Output directory for generated media
DEFAULT_OUTPUT_DIR=./outputs

# --- GenAI SDK (for embed, speech, live, music, video-analysis tools) ---
# Option A: Gemini API key (free tier available)
GOOGLE_GENAI_API_KEY=AIza...

# Option B: Use Vertex AI backend (uses ADC above, no separate key needed)
GOOGLE_GENAI_BACKEND=vertexai

# --- Advanced Vertex AI Authentication (Optional) ---
# Direct OAuth 2.0 Access Token
# GOOGLE_ACCESS_TOKEN=ya29.a0AfB_by...

# Service Account Impersonation
# IMPERSONATE_SERVICE_ACCOUNT=your-service-account@your-project.iam.gserviceaccount.com
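
To illustrate how the two GenAI SDK options relate, here is a sketch of backend selection from environment variables. It is not taken from `config.py`: the function name, the returned labels, and the precedence (an explicit `GOOGLE_GENAI_BACKEND=vertexai` wins over an API key) are all assumptions.

```python
import os
from typing import Mapping


def resolve_genai_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick a GenAI backend from the environment (assumed precedence)."""
    if env.get("GOOGLE_GENAI_BACKEND", "").lower() == "vertexai":
        # Option B relies on ADC plus the Vertex AI project setting.
        if not env.get("GOOGLE_CLOUD_PROJECT"):
            raise RuntimeError("GOOGLE_CLOUD_PROJECT is required for the Vertex AI backend")
        return "vertexai"
    if env.get("GOOGLE_GENAI_API_KEY"):
        return "gemini_api"  # Option A: Gemini API key
    raise RuntimeError("set GOOGLE_GENAI_API_KEY or GOOGLE_GENAI_BACKEND=vertexai")
```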

Usage

Running as a Standalone Script

python mcp_server.py

Integrating with MCP Clients

For Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "OpenGoogleImageGenerator": {
      "command": "python",
      "args": ["/absolute/path/to/OpenGoogleImageGeneratorMCP/mcp_server.py"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-google-cloud-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GOOGLE_GENAI_API_KEY": "AIza..."
      }
    }
  }
}

Replace the path in `args` with the actual location of `mcp_server.py`, and set `command` to the correct Python executable if you are using a virtual environment.
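
One way to avoid path and interpreter mistakes is to generate the entry from inside the environment where the server's dependencies are installed. The script below is a sketch, not a tool shipped with the project; `claude_desktop_entry` is a hypothetical helper, and it uses `sys.executable` so the config points at the currently active interpreter.

```python
import json
import sys
from pathlib import Path


def claude_desktop_entry(server_script: str, project: str,
                         location: str = "us-central1") -> dict:
    """Build an mcpServers entry using the current Python interpreter."""
    return {
        "mcpServers": {
            "OpenGoogleImageGenerator": {
                "command": sys.executable,  # interpreter of the active environment
                "args": [str(Path(server_script).resolve())],  # absolute path
                "env": {
                    "GOOGLE_CLOUD_PROJECT": project,
                    "GOOGLE_CLOUD_LOCATION": location,
                },
            }
        }
    }


if __name__ == "__main__":
    entry = claude_desktop_entry("mcp_server.py", "your-google-cloud-project-id")
    print(json.dumps(entry, indent=2))
```

Run it from the repository root and merge the printed JSON into claude_desktop_config.json, adding `GOOGLE_GENAI_API_KEY` to `env` if you use the Gemini API key option.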

Example prompts

"Generate an image of a futuristic city at sunset."
"Edit this banner — add a glowing cyan halo around the logo." (tool_edit_image, EDIT_MODE_DEFAULT)
"Transform this photo into a hand-drawn pencil sketch." (tool_transform_image)
"Remove the background from the image I just generated."
"Analyze this image and tell me what objects are present."
"Generate 8 product shots in parallel with different backgrounds." (tool_batch_generate)
"Run a pipeline: generate → remove background → upscale." (tool_run_pipeline)
"Convert this text to speech using the Kore voice." (tool_generate_speech)
"Generate a 30-second ambient music track." (tool_generate_music)
"Embed this sentence for semantic search." (tool_embed)
"Animate this product photo into a 5-second video." (tool_image_to_video)
"Generate a video of a sunset with audio." (tool_generate_video, audio_enabled=true)

Author & License

Developer: Mirac Orhan (mirac.orhan@gmail.com)
License: MIT License (Open Source — Free for everyone to use, modify, and distribute)
