Unified image generation skill that dispatches to the right Cloubic-hosted model — OpenAI gpt-image-2 or Gemini gemini-3-pro-image-preview / gemini-3.1-flash-image-preview — through one parameterised script. Handles text-to-image and image-to-image. Auto-resizes oversized reference images so Cloudflare doesn't 524 on you.
Built as the image-generation backend for skill ecosystems (e.g. lovart-skills) — callers produce a Brief, image-gen turns it into a file on disk.
git clone https://github.com/motiful/image-gen.git
cd image-gen
bash scripts/setup.sh
setup.sh checks for python3 and pip, installs requests, Pillow, and playwright (plus Chromium), and reminds you to set the API key. Idempotent: safe to re-run.
Pick one — CLOUBIC_API_KEY env var wins if both exist:
export CLOUBIC_API_KEY='sk-xxxx'
# or
echo 'sk-xxxx' > ~/.cloubic_api_key && chmod 600 ~/.cloubic_api_key
The key never reaches logs or stdout; it is only sent in the Authorization header.
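The lookup order above can be sketched in Python. This is a minimal illustration, not generate.py's actual code: the helper name resolve_api_key and its key_file parameter are hypothetical. Only the precedence rule (CLOUBIC_API_KEY first, then ~/.cloubic_api_key) comes from this README.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(key_file: Optional[Path] = None) -> str:
    """Hypothetical helper: env var wins, then the key file, else fail loudly."""
    # 1. CLOUBIC_API_KEY environment variable takes precedence.
    env_key = os.environ.get("CLOUBIC_API_KEY", "").strip()
    if env_key:
        return env_key
    # 2. Fall back to ~/.cloubic_api_key (overridable here for testing).
    path = key_file if key_file is not None else Path.home() / ".cloubic_api_key"
    if path.is_file():
        file_key = path.read_text(encoding="utf-8").strip()
        if file_key:
            return file_key
    # Error message names the sources but never echoes any key material.
    raise RuntimeError("No API key: set CLOUBIC_API_KEY or create ~/.cloubic_api_key")
```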
# text-to-image — defaults to gemini-3-pro-image-preview
python3 scripts/generate.py --prompt "a samurai cat in neon Tokyo" --aspect-ratio 16:9
# image-to-image with multiple references
python3 scripts/generate.py \
  --model gpt-image-2 \
  --prompt "put this product on a marble countertop, soft window light" \
  --reference-image ./bottle.png \
  --output ./hero.png
On success the script prints the output path along with elapsed time, file size, and token usage. Defaults: model gemini-3-pro-image-preview; output goes to an auto-generated, timestamped filename in the current directory.
Drop image-gen/ next to your other skills. When another skill needs an image, it invokes scripts/generate.py with the prompt it built. The contract is generate_image(prompt, model, [aspect_ratio], [reference_images], [output_path]) → file_path — see SKILL.md for the full Engagement Principles and Execution Procedure.
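A caller-side wrapper for that contract might look like the sketch below. Everything here is hypothetical: the wrapper itself, the install path image-gen/scripts/generate.py, the default filename pattern, and passing several references by repeating --reference-image are assumptions. The flag names --model, --prompt, --aspect-ratio, --reference-image and --output are the ones used in the examples above.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Sequence

# Assumed install location, relative to the calling skill's working directory.
SCRIPT = Path("image-gen/scripts/generate.py")


def build_command(prompt: str, model: str, aspect_ratio: Optional[str] = None,
                  reference_images: Sequence[str] = (),
                  output_path: Optional[str] = None) -> List[str]:
    """Translate generate_image(...) arguments into a generate.py argv list."""
    cmd = [sys.executable, str(SCRIPT), "--model", model, "--prompt", prompt]
    if aspect_ratio:
        cmd += ["--aspect-ratio", aspect_ratio]
    # ASSUMPTION: multiple references are passed by repeating the flag.
    for ref in reference_images:
        cmd += ["--reference-image", ref]
    if output_path:
        cmd += ["--output", output_path]
    return cmd


def generate_image(prompt: str, model: str, aspect_ratio: Optional[str] = None,
                   reference_images: Sequence[str] = (),
                   output_path: Optional[str] = None) -> str:
    """Run generate.py and return the path of the written file."""
    if output_path is None:
        # Hypothetical naming pattern; always passing --output means the
        # caller knows the path without parsing generate.py's stdout.
        output_path = f"image-{datetime.now():%Y%m%d-%H%M%S}.png"
    cmd = build_command(prompt, model, aspect_ratio, reference_images, output_path)
    # generate.py exits non-zero on failure, which check=True turns into
    # CalledProcessError; any retry policy belongs to the caller.
    subprocess.run(cmd, check=True)
    return output_path
```

Passing an explicit output path is a deliberate choice here: this README does not specify the exact stdout format, so parsing it would be guesswork.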
SKILL.md                          # skill contract; read this first
references/
  model-selection.md              # model family routing + which one to pick
  parameter-spec.md               # argument schema + caller defaults
  reference-image-handling.md     # auto-resize rules (the 524 fix)
  error-handling.md               # HTTP/response error mapping
scripts/
  generate.py                     # the parameterised entry point
  setup.sh                        # dependency installer
  url_screenshot.py               # auxiliary: capture a URL → PNG for i2i input
- Auto-detect model family. The caller passes --model; the skill resolves whether it needs the OpenAI or the Gemini endpoint shape. No "which API is this" questions.
- Reference images are auto-resized to ≤1024px and ≤1MB. A 2.5MB PNG, inflated further by base64 encoding and then processed by Gemini, reliably triggers HTTP 524 at Cloudflare's ~100s edge timeout.
- Negative constraints are appended to the prompt as "Avoid: …", not sent as a separate parameter. No model on Cloubic exposes a true negative_prompt.
- Fail loud. A timeout, malformed response, or missing image data exits non-zero with a diagnostic. The caller decides the retry strategy.
- Cost is logged on every call: gpt-image-2 ≈ ¥0.35/image, gemini-3-pro-image-preview ≈ ¥1/image, gemini-3.1-flash-image-preview ≈ ¥0.20/image.
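Three of the behaviours in the list above (family routing, folding negative constraints into the prompt, and the per-call cost figure) can be sketched together. Only the model names and prices come from this README; the prefix-based routing rule, the exact "Avoid:" formatting, and all function names are assumptions for illustration, not generate.py's logic.

```python
from typing import Optional, Sequence

# Approximate per-image prices in CNY, copied from the list above.
COST_PER_IMAGE_CNY = {
    "gpt-image-2": 0.35,
    "gemini-3-pro-image-preview": 1.00,
    "gemini-3.1-flash-image-preview": 0.20,
}


def model_family(model: str) -> str:
    """Map a --model value to an endpoint shape (ASSUMED prefix-based rule)."""
    if model.startswith("gpt-image"):
        return "openai"
    if model.startswith("gemini-"):
        return "gemini"
    raise ValueError(f"unknown model family for {model!r}")


def compose_prompt(prompt: str, avoid: Sequence[str] = ()) -> str:
    """Fold negative constraints into the prompt text, since no model takes negative_prompt."""
    if not avoid:
        return prompt
    # ASSUMED formatting: a trailing "Avoid: a, b." paragraph.
    return f"{prompt}\n\nAvoid: {', '.join(avoid)}."


def estimated_cost(model: str, n_images: int = 1) -> Optional[float]:
    """Rough cost in CNY for logging; None for models not in the table."""
    per_image = COST_PER_IMAGE_CNY.get(model)
    return None if per_image is None else per_image * n_images
```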
See SKILL.md §Engagement Principles for the complete list.
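The reference-image limit from the list above can be sketched with Pillow. The 1024px and 1MB thresholds come from this README; applying 1024px to the longest edge, reading 1MB as 1,048,576 bytes, and re-encoding oversized images as JPEG are assumptions, so treat references/reference-image-handling.md as authoritative. base64_len shows why the limit matters: base64 turns every 3 bytes into 4 characters, so a 2.5MB PNG becomes roughly 3.3MB of request body before the model starts processing it.

```python
import io

from PIL import Image

MAX_EDGE_PX = 1024          # from the README; applied to the longest edge (assumption)
MAX_BYTES = 1024 * 1024     # "1MB" read as 2**20 bytes (assumption)


def base64_len(n_bytes: int) -> int:
    """Length of padded base64 output: every 3 input bytes become 4 characters."""
    return 4 * ((n_bytes + 2) // 3)


def shrink_reference(data: bytes, max_edge: int = MAX_EDGE_PX,
                     max_bytes: int = MAX_BYTES) -> bytes:
    """Return image bytes within both limits, re-encoding as JPEG only when needed."""
    img = Image.open(io.BytesIO(data))
    if max(img.size) <= max_edge and len(data) <= max_bytes:
        return data  # already within limits: send the original bytes untouched
    # JPEG has no alpha channel, so convert to RGB (any transparency is dropped).
    img = img.convert("RGB")
    img.thumbnail((max_edge, max_edge))  # in place; keeps aspect ratio, never upscales
    quality = 90
    while True:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= max_bytes:
            return buf.getvalue()
        if quality > 50:
            quality -= 10  # first trade quality for size
        else:
            # Still too large at quality 50: shrink dimensions by 25% and retry.
            img = img.resize((max(1, img.width * 3 // 4), max(1, img.height * 3 // 4)))
```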
Scaffolded and audited via skill-forge; it clears the 5 must-fix structure findings (bidirectional EP contract, frontmatter on references, repo-root README + LICENSE).
MIT — see LICENSE.