motiful/image-gen

image-gen

Unified image generation skill that dispatches to the right Cloubic-hosted model — OpenAI gpt-image-2 or Gemini gemini-3-pro-image-preview / gemini-3.1-flash-image-preview — through one parameterised script. Handles text-to-image and image-to-image. Auto-resizes oversized reference images so Cloudflare doesn't 524 on you.

Built as the image-generation backend for skill ecosystems (e.g. lovart-skills) — callers produce a Brief, image-gen turns it into a file on disk.

Install

git clone https://github.com/motiful/image-gen.git
cd image-gen
bash scripts/setup.sh

setup.sh checks python3 + pip, installs requests / Pillow / playwright (+ chromium), and reminds you to set the API key. Idempotent.

API key

Pick one — CLOUBIC_API_KEY env var wins if both exist:

export CLOUBIC_API_KEY='sk-xxxx'
# or
echo 'sk-xxxx' > ~/.cloubic_api_key && chmod 600 ~/.cloubic_api_key

The key is never written to logs or stdout; it is sent only in the Authorization header.
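The precedence rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code in scripts/generate.py: the function names `resolve_api_key` and `auth_headers` are hypothetical, and the `Bearer` scheme in the header is an assumption, since the README only says the key goes in the Authorization header.

```python
import os
from pathlib import Path


def resolve_api_key(env=None, key_file=None):
    """Return the API key, preferring CLOUBIC_API_KEY over the key file.

    Hypothetical helper: `env` and `key_file` are injectable only so the
    precedence rule can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    key_file = Path.home() / ".cloubic_api_key" if key_file is None else Path(key_file)

    key = env.get("CLOUBIC_API_KEY", "").strip()
    if key:
        return key  # the env var wins when both sources exist

    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key

    raise RuntimeError("No API key: set CLOUBIC_API_KEY or create ~/.cloubic_api_key")


def auth_headers(key):
    # Assumption: a Bearer token scheme. The key appears here and nowhere else.
    return {"Authorization": f"Bearer {key}"}
```

Stripping whitespace matters for the file variant, because `echo` writes a trailing newline into ~/.cloubic_api_key.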

Use it (CLI)

# text-to-image — defaults to gemini-3-pro-image-preview
python3 scripts/generate.py --prompt "a samurai cat in neon Tokyo" --aspect-ratio 16:9

# image-to-image with a reference image
python3 scripts/generate.py \
    --model gpt-image-2 \
    --prompt "put this product on a marble countertop, soft window light" \
    --reference-image ./bottle.png \
    --output ./hero.png

Output path is printed on success along with elapsed time, file size, and token usage. Defaults: gemini-3-pro-image-preview, auto-generated timestamped filename in the current directory.

Use it (as a skill)

Drop image-gen/ next to your other skills. When another skill needs an image, it invokes scripts/generate.py with the prompt it built. The contract is generate_image(prompt, model, [aspect_ratio], [reference_images], [output_path]) → file_path — see SKILL.md for the full Engagement Principles and Execution Procedure.
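A calling skill could map the contract onto the CLI roughly as follows. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, the flag names are the ones shown in the CLI examples above, and repeating `--reference-image` once per file is a guess at how multiple references are passed (check references/parameter-spec.md for the actual schema).

```python
def build_command(prompt, model=None, aspect_ratio=None,
                  reference_images=(), output_path=None,
                  script="scripts/generate.py"):
    """Translate generate_image(...) arguments into an argv list."""
    cmd = ["python3", script, "--prompt", prompt]
    if model:
        cmd += ["--model", model]  # omitted: the script's default model applies
    if aspect_ratio:
        cmd += ["--aspect-ratio", aspect_ratio]
    for ref in reference_images:
        # Assumption: the flag is repeated once per reference image.
        cmd += ["--reference-image", str(ref)]
    if output_path:
        cmd += ["--output", str(output_path)]
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True)` raises `CalledProcessError` on a non-zero exit, which lines up with the script's fail-loud behaviour and leaves the retry decision with the caller.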

What's inside

SKILL.md                       # skill contract — read this first
references/
  model-selection.md           # model family routing + which one to pick
  parameter-spec.md            # argument schema + caller defaults
  reference-image-handling.md  # auto-resize rules (the 524 fix)
  error-handling.md            # HTTP/response error mapping
scripts/
  generate.py                  # the parameterised entry point
  setup.sh                     # dependency installer
  url_screenshot.py            # auxiliary: capture a URL → PNG for i2i input

Design notes

  • Auto-detect model family. Caller passes --model; skill resolves OpenAI vs Gemini endpoint shape. No "which API is this" questions.
  • Reference images get auto-resized to ≤1024px / ≤1MB. A 2.5MB PNG, once base64-expanded and processed by Gemini, reliably runs past Cloudflare's ~100s edge timeout and returns HTTP 524.
  • Negative constraints are appended as "Avoid: …", not sent as a separate parameter. No model on Cloubic exposes a true negative_prompt.
  • Fail loud. A timeout, malformed response, or missing image data exits non-zero with a diagnostic. Caller decides retry strategy.
  • Cost is logged on every call: gpt-image-2 ≈ ¥0.35/image, gemini-3-pro-image-preview ≈ ¥1/image, gemini-3.1-flash-image-preview ≈ ¥0.20/image.
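Two of the notes above, model-family detection and "Avoid: …" constraints, lend themselves to a short sketch. The function names are hypothetical, and both the prefix-matching rule and the exact separator before "Avoid:" are assumptions; the real routing rules live in references/model-selection.md.

```python
def model_family(model):
    """Guess the endpoint shape from the model name (prefix rule is an assumption)."""
    name = model.lower()
    if name.startswith("gpt-image"):
        return "openai"
    if name.startswith("gemini-"):
        return "gemini"
    # Fail loud rather than silently picking a default endpoint.
    raise ValueError(f"unknown model family: {model!r}")


def with_negative_constraints(prompt, avoid=()):
    """Fold negative constraints into the prompt text as an 'Avoid: ...' suffix."""
    items = [a.strip() for a in avoid if a and a.strip()]
    if not items:
        return prompt
    return f"{prompt.rstrip()}\n\nAvoid: {', '.join(items)}"
```

For example, `with_negative_constraints("a samurai cat", ["text", "watermark"])` yields the prompt followed by a blank line and `Avoid: text, watermark`, so no separate negative-prompt parameter is needed.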

See SKILL.md §Engagement Principles for the complete list.
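The arithmetic behind the auto-resize note is worth making concrete. The sketch below is illustrative only: it assumes "≤1024px" means the longest side, takes 1MB as 10**6 bytes, and uses hypothetical helper names. The actual rules are in references/reference-image-handling.md.

```python
import math

MAX_SIDE = 1024        # assumption: limit applies to the longest side, in pixels
MAX_BYTES = 1_000_000  # assumption: "1MB" taken as 10**6 bytes


def fit_within(width, height, max_side=MAX_SIDE):
    """Scale (width, height) down so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def base64_size(n_bytes):
    """Size of the padded base64 encoding: 4 output bytes per 3 input bytes."""
    return 4 * math.ceil(n_bytes / 3)


def needs_resize(width, height, n_bytes):
    return max(width, height) > MAX_SIDE or n_bytes > MAX_BYTES


print(fit_within(4000, 3000))   # (1024, 768)
print(base64_size(2_500_000))   # 3333336
```

So a 2.5MB PNG becomes roughly 3.3MB of request body before the model touches it. In practice the downscale itself would be done with Pillow, whose `Image.thumbnail` performs a comparable aspect-preserving resize in place.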

Provenance

Scaffolded and audited via skill-forge. It passes the 5 must-fix structure findings, including the bidirectional EP contract, frontmatter on references, and a repo-root README + LICENSE.

License

MIT — see LICENSE.

About

Unified image generation skill — dispatches to OpenAI gpt-image-2 / Gemini Nano Banana Pro via Cloubic. One parameterised script for text-to-image + image-to-image, with auto-resize to avoid Cloudflare 524s. Backend for lovart-skills.
