Skip to content

lidge-jun/ima2-gen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

849 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ima2-gen

ima2-gen logo

npm version Node.js License: MIT

🌐 Live site: lidge-jun.github.io/ima2-gen · 한국어

📖 Developer docs: Documentation site · 한국어

Read in other languages: 한국어 · 日本語 · 简体中文

ima2-gen is a local image generation studio for people who want the ChatGPT/Codex image workflow in a small desktop-like web app.

Install globally, sign in with ChatGPT OAuth or Grok OAuth, and start generating images and videos. Iterate with history, references, node branches, multimode batches, Canvas Mode cleanup, and Grok Video generation. No API key required — free ChatGPT OAuth and SuperGrok subscription cover everything.

ima2-gen video playback with gallery sidebar showing generated images and videos.

Quick Start

npm install -g ima2-gen
ima2 setup
ima2 serve

Then open http://localhost:3333.

To generate a video from the CLI:

ima2 video "a cat playing piano" --duration 5 --resolution 720p
ima2 video "animate this scene" --ref photo.png --duration 10

If 3333 is already occupied, ima2-gen binds the next available port and writes the actual URL to ~/.ima2/server.json. Use ima2 open or the URL printed in the terminal instead of assuming the port.

Using npx? See docs/NPX_QUICKSTART.md for the npx ima2-gen serve workflow.

One-Click Install (no npm required)

Don't have Node.js or npm? Use the platform install script — it detects your environment, installs Node LTS if needed, then installs ima2-gen.

macOS:

curl -fsSL https://lidge-jun.github.io/ima2-gen/install-mac.sh | bash

Windows (PowerShell):

irm https://lidge-jun.github.io/ima2-gen/install-windows.ps1 | iex

Linux / WSL:

curl -fsSL https://lidge-jun.github.io/ima2-gen/install-linux.sh | bash

Each script checks for nvm/fnm/brew/winget, installs Node LTS through the best available method, and handles stale process cleanup automatically.

Setup

ima2 setup offers four authentication choices:

  1. GPT OAuth — login with ChatGPT account (free, images only)
  2. Grok OAuth — login with xAI/Grok account (images + video)
  3. Both — GPT OAuth + Grok OAuth (full feature access)
  4. Web setup — configure everything in the web UI

Video generation requires Grok OAuth (option 2 or 3). Run ima2 grok login separately if you already have GPT OAuth configured and want to add video support; it defaults to the manual-paste flow.

Updating

Stop the running server with Ctrl+C, then:

npm install -g ima2-gen@latest

Ctrl+C now performs a clean shutdown — closing the database, stopping child processes, and releasing file locks. On older versions (< 1.1.22) or if you see EBUSY on Windows, use the install script which handles stale process cleanup automatically.

What It Does

  • Classic mode: generate, edit, reuse the current image, paste references, and continue from history.
  • Node mode: branch a good image into multiple directions without losing the original.
  • Multimode batches: launch several Classic outputs from one prompt, watch slot-by-slot progress, and continue from the best result.
  • Video generation: create short videos from text, a single image, or multiple reference images via Grok video models. SSE streaming shows planning → submitted → progress % → done. Video frame copy buttons (First/Mid/Last) let you extract and copy keyframes from generated videos.
  • Storyboard mode: toggle storyboard mode in the composer to maintain character and scene continuity across sequential frames. Works with both image and video generation — image keyframes are composed for video production, and video clips inherit character/environment lock rules.
  • Canvas Mode: zoom, pan, annotate, erase, clean backgrounds, keep transparent previews, and export either alpha or matte-backed versions.
  • Local gallery: keep generated assets on your machine with session-aware history. By default the gallery shows the current session and an All Images toggle reveals the full history; the default scope is sticky across sessions. Each image records its generation time and reasoning effort in the result metadata, so they persist across reloads.
  • Reference images: drag, drop, paste, and attach up to 5 references (images) or up to 7 references (video); large images are compressed before upload.
  • Prompt library imports: import local prompt packs, GitHub folders, and curated GPT-image prompt hints into the built-in prompt library.
  • Mobile shell: use the app bar, compose sheet, and compact settings toggle on smaller screens.
  • Observable jobs: active and recent jobs are tracked with safe logs and request IDs.

SSE Multiplexing

The web UI uses a single GET /api/events Server-Sent Events connection for all generation progress. Multimode, node, and video requests are submitted as async POST (202 { requestId }) and progress events are multiplexed through a shared event bus. This eliminates the browser 6-connection limit that previously caused gallery hangs during concurrent generation. CLI clients that do not send async: true still receive per-request SSE streams for backward compatibility.

Provider Paths

Image generation can run through the local Codex/ChatGPT OAuth path, a configured OpenAI API key, the bundled Grok provider, or the Gemini provider via Antigravity CLI.

  • provider: "oauth" uses the local Codex OAuth proxy.
  • provider: "api" calls the OpenAI Responses API with the hosted image_generation tool.
  • provider: "grok" starts bundled progrok on 127.0.0.1:18645, runs mandatory xAI Web Search plus a planner pass (default: grok-4.3, configurable in settings or via --planner-model), then calls xAI Images API through the local proxy.
  • provider: "agy" spawns the Antigravity CLI (agy -p) to generate images via Google Gemini's default_api:generate_image tool (model: nano-banana-2). Output is fixed at 1024×1024 JPEG, max 3 reference images. No web search, quality, or size controls.
  • provider: "gemini-api" calls the Google Generative Language API directly. Supports two models: nano-banana-2 (Gemini 3.1 Flash Image) and nano-banana-pro (Gemini 3 Pro Image). Auth is via GEMINI_API_KEY env var, web UI key management, or a Vertex AI service account JSON (VERTEX_SERVICE_ACCOUNT_JSON). When both an API key and Vertex credentials are configured, Vertex takes priority. Supports variable aspect ratios (1:1 through 21:9) and four resolution tiers (512px, 1K, 2K, 4K); these controls are only honored on the direct API path — the Vertex AI endpoint ignores aspect/size because it does not accept the response_format field. Per-model cost differs: nano-banana-2 (Flash): 512=$0.001, 1K=$0.003, 2K=$0.004, 4K=$0.006; nano-banana-pro: 1K=$0.007, 2K=$0.007, 4K=$0.013. No web search or mask controls.
  • API-key generation supports classic generate, edit, mask-guided edit, multimode, and node generation.
  • Grok generation supports Classic, Node, and Agent flows. If a Classic reference, Node parent image, or Agent current image is present, ima2 switches the final Grok call to xAI image edit so image-to-image context is preserved.

If no provider is specified, the app keeps the current GPT OAuth/default behavior. API-key generation defaults to gpt-5.4-mini, low reasoning, and 1024x1024 unless the request passes validated model, reasoning, size, or web-search options. Grok defaults to grok-imagine-image; quality: "high" promotes the final image call to grok-imagine-image-quality.

Grok image generation exposes a model picker (grok-imagine-image / grok-imagine-image-quality) and a size picker (aspect ratio + 1k/2k resolution). The Settings page shows a billing/quota bar with $used/$limit drawn from the Grok billing API, and a Switch Account button that starts a device-code OAuth flow (POST /api/auth/switch) for re-authenticating without leaving the app.

Grok video generation uses grok-imagine-video (default) or grok-imagine-video-1.5-preview. Three modes are auto-detected from reference count: text-to-video (0 refs), image-to-video (1 ref), and reference-to-video (2–7 refs, max 10s duration). grok-imagine-video-1.5-preview supports image-to-video but not reference_images Ref2V, so 2+ refs use grok-imagine-video as the effective model. Video edit and extension are also base-model only. Video controls include duration (1–15s), resolution (480p, 720p), and aspect ratio (1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto).

Settings workspace showing GPT OAuth active and API key provider available.

Model Guidance

The app defaults to gpt-5.4-mini for fast local iteration. Switch to gpt-5.4 when you want the safest balanced image workflow.

  • gpt-5.4 — recommended balanced choice.
  • gpt-5.4-mini — current default and faster draft model.
  • gpt-5.5 — strongest quality option when your Codex CLI/OAuth backend supports it. It may use more quota, expose different tool capabilities, or require updating Codex CLI before it works reliably.

The app also exposes quality (low, medium, high) and moderation (auto, low) controls.

Workflows

Classic Mode

Use Classic when you want one strong result quickly.

  1. Write a prompt.
  2. Attach or paste references if needed.
  3. Pick model, quality, size, format, and moderation.
  4. Generate one image, or enable multimode to fan out several candidate slots from the same prompt.
  5. Copy, download, continue from the result, or send it into Canvas Mode.

For a control-by-control guide to Prompt Studio, multimode recipes, Direct mode, reasoning effort, and gallery favorite behavior, see the Prompt Studio manual.

Multimode sequence with four candidate slots generating from one prompt and active job history in the sidebar.

Node Mode

Use Node mode when you want to explore branches.

Node mode with connected generated cards and compact per-node metadata.

Each node keeps its own prompt and result. Root nodes can attach local references; child nodes use the parent image as their source. Completed jobs are matched back to nodes by request ID, so reloads and graph version conflicts can recover finished results.

Canvas Mode

Use Canvas Mode when a generated image is close but needs targeted cleanup before the next prompt.

  • Separate viewport panning from selection so you can move around a zoomed image without accidentally changing annotations.
  • Use annotation, eraser, multiselect, grouping, undo/redo, and sticky notes while keeping the original gallery image available.
  • Pick background-cleanup seeds, preview the mask, and save the cleanup as a canvas version.
  • Detect transparent images and show a checkerboard preview; export with preserved alpha or with a chosen matte color.
  • Saved canvas versions stay hidden from Gallery and HistoryStrip, but Canvas Mode can reuse them and attach a canvas version as the next reference.

Canvas Mode with zoom controls, annotation marks, a sticky note, and the canvas toolbar.

Prompt Library And Imports

The prompt library can now be filled from local files, GitHub folders, curated sources, and GPT-image hint packs. Imported prompts are indexed locally so search and ranking work without re-importing the same source every session.

Prompt import dialog for bringing prompts into the library, showing GitHub folder controls, curated sources, and searched prompt candidates before import.

Experimental Card News Mode

Card News is still dev-only and experimental. It is hidden in the default published runtime unless explicitly enabled for development, and it should not be treated as a stable public feature yet.

Settings

The settings workspace keeps account, model, appearance, and language controls away from the generation sidebar.

Settings workspace with account navigation and generation model controls.

CLI Commands

Server

Command Description
ima2 serve [--dev] Start the local web server; --dev enables verbose server diagnostics
ima2 setup Reconfigure saved auth
ima2 status Show config and OAuth status
ima2 doctor Diagnose Node, package, config, and auth
ima2 doctor image-probe [--json] Run sanitized image probes for no-image diagnostics
ima2 open Open the web UI
ima2 reset Remove saved config

Client

These require a running ima2 serve. The CLI covers every server route. The most common ones are below — the full CLI reference lists everything (generation, history, sessions, prompt library, annotations, Card News, observability, config).

Command Description
ima2 gen <prompt> Generate from the CLI
ima2 edit <file> --prompt <text> Edit an existing image
ima2 multimode <prompt> Multi-image SSE generation
ima2 video <prompt> Video generation via Grok (SSE streaming with progress)
ima2 ls [--session <id>] [--favorites] List recent history
ima2 show <name> [--metadata] Reveal a generated asset
ima2 prompt ls -q <search> Search the prompt library
ima2 inflight ls [--terminal] List active and recent jobs (alias of ps)
ima2 config set <key> <value> Write to ~/.ima2/config.json
ima2 ping Health-check the running server

The server advertises its actual port at ~/.ima2/server.json. If 3333 is busy, the backend falls back to 3334+ and CLI commands follow the advertised URL. Override discovery with --server <url> or IMA2_SERVER=http://localhost:3333.

ima2 gen "poster" --model gpt-5.4 --reasoning-effort high
ima2 edit input.png --prompt "make it rainy" --web-search
ima2 multimode "two cats playing" -n 2
ima2 video "a cat playing piano" --duration 5 --resolution 720p
ima2 video "animate this" --ref photo.png --aspect-ratio 16:9
ima2 inflight ls --terminal
ima2 config set imageModels.reasoningEffort high

Full reference: docs/CLI.md.

Configuration

Config priority:

environment variables > ~/.ima2/config.json > built-in defaults
Variable Default Description
IMA2_PORT / PORT 3333 Web server port
IMA2_HOST 127.0.0.1 Web server bind host
IMA2_OAUTH_PROXY_PORT / OAUTH_PORT 10531 OAuth proxy port
IMA2_SERVER CLI target override
IMA2_CONFIG_DIR ~/.ima2 Config and SQLite location
IMA2_ADVERTISE_FILE ~/.ima2/server.json Runtime discovery file
IMA2_GENERATED_DIR ~/.ima2/generated Generated image directory
IMA2_IMAGE_MODEL_DEFAULT gpt-5.4-mini Server fallback image model
IMA2_REASONING_EFFORT medium Default reasoning effort for the default (GPT OAuth) path; one of none, low, medium, high, xhigh
IMA2_NO_OAUTH_PROXY Set 1 to disable the auto-started OAuth proxy
IMA2_LOG_LEVEL info Normal serve defaults to info; dev mode defaults to debug; supports debug, info, warn, error, or silent
IMA2_INFLIGHT_TERMINAL_TTL_MS 300000 Recent terminal job retention for debug views
OPENAI_API_KEY API key for the provider: "api" Responses API image path and auxiliary API-key features
IMA2_API_IMAGE_MODEL_DEFAULT gpt-5.4-mini Default image model for provider: "api"
IMA2_API_REASONING_EFFORT low Default reasoning effort for provider: "api"
IMA2_API_IMAGE_SIZE 1024x1024 Default size for provider: "api"
IMA2_API_ALLOW_WEB_SEARCH true Toggle web search for provider: "api"
IMA2_GROK_PROXY_HOST 127.0.0.1 Host for the bundled progrok proxy
IMA2_GROK_PROXY_PORT 18645 Port for the bundled progrok proxy
IMA2_NO_GROK_PROXY Set 1 to disable automatic progrok startup
IMA2_GROK_PLANNER_MODEL grok-4.3 Grok search/planner model (also configurable via settings UI or --planner-model CLI flag)
IMA2_GROK_PLANNER_TIMEOUT_MS 60000 Timeout for Grok search and planner calls
IMA2_GROK_IMAGE_MODEL_DEFAULT grok-imagine-image Default final Grok image model
IMA2_GROK_GENERATION_TIMEOUT_MS 120000 Timeout for the final Grok Images API call
IMA2_OAUTH_MASKED_EDIT_ENABLED false Opt-in feature flag for masked-edit requests on the OAuth path (#31, groundwork only)
GEMINI_API_KEY API key for provider: "gemini-api" direct Generative Language API path
VERTEX_SERVICE_ACCOUNT_JSON Google service account JSON for Vertex AI auth with provider: "gemini-api"; takes priority over GEMINI_API_KEY when both are set

Logging modes

ima2 serve keeps terminal output intentionally quiet: startup URLs, warnings, and errors stay visible, while request/node/OAuth structured logs are hidden by default.

Use ima2 serve --dev, npm run dev, or IMA2_LOG_LEVEL=debug ima2 serve when you need request IDs, node generation phases, OAuth stream diagnostics, or inflight state transitions. Explicit IMA2_LOG_LEVEL and ~/.ima2/config.json values still override the built-in defaults.

API Reference

The endpoint list moved to docs/API.md so this README can stay focused on first-run use.

Useful references:

Troubleshooting

ima2 ping says the server is unreachable Start ima2 serve, then check ~/.ima2/server.json. You can also run ima2 ping --server http://localhost:3333.

GPT OAuth login does not work Re-run ima2 setup (option 1), confirm ima2 status, then restart ima2 serve.

fetch failed repeats on a proxy/VPN network Check that the local OAuth proxy is reachable. On networks that require a proxy, enable your proxy client's TUN/TURN-style mode, then retry openai-oauth --port 10531. If it still fails, set HTTP_PROXY and HTTPS_PROXY in the same terminal that runs ima2 serve or openai-oauth. On Windows, also check for auto-start network interception tools, including DNS/fragmentation bypass tools such as SecretDNS, because they can break OAuth or streaming image responses even when the browser appears connected.

Images fail with API_KEY_REQUIRED Set OPENAI_API_KEY or configure an API key before using provider: "api". The default GPT OAuth path still works without an API key.

Image generation returns EMPTY_RESPONSE or no image data Run ima2 doctor image-probe --json > ima2-image-probe.json and attach the safe JSON when opening an issue. For GPT OAuth cases, also capture ima2 gen "고양이" --no-web-search --json and ima2 gen "고양이" --json while ima2 serve is running. Do not share ChatGPT cookies, OAuth token files, API keys, raw upstream responses, prompt history, or generated base64. See the FAQ support bundle.

A large reference image fails The app compresses large JPEG/PNG references before upload. If a file still fails, convert it to JPEG or PNG at a lower resolution and try again. HEIC/HEIF files are not supported by the browser path.

Old gallery images are missing after updating Recent versions moved generated images from the installed package folder to ~/.ima2/generated. Run ima2 doctor and see Recover old images.

gpt-5.5 fails but other models work Update Codex CLI first, then retry. If it still fails, your account or backend route may not expose the same image capability or quota for gpt-5.5 yet; use gpt-5.4 as the stable fallback.

The app opened on a different port If the requested server port is busy, ima2-gen falls back to the next available port and records it in ~/.ima2/server.json. If the port is unexpectedly 3457, your shell may also have inherited PORT=3457 from another local tool. Run unset PORT or start with IMA2_PORT=3333 ima2 serve.

Port 10531 is already used on Windows Some Windows security tools, including AnySign4PC.exe, may occupy the default OAuth proxy port. Current builds track the actual fallback OAuth port. If you still need a manual override, start with IMA2_OAUTH_PROXY_PORT=11531 ima2 serve and check ima2 doctor.

For more beginner-friendly answers, see the FAQ.

Development

git clone https://github.com/lidge-jun/ima2-gen.git
cd ima2-gen
npm install
npm run dev
npm run typecheck
npm test
npm run build

npm run dev builds the UI and starts the TypeScript server entry with --watch and verbose server diagnostics. npm run typecheck, npm run build:server, and npm run build:cli verify the TypeScript migration and package emit path. Node mode and Canvas Mode are part of the packaged UI by default.

Contributors

License

MIT

About

Minimal CLI + web UI for OpenAI GPT Image 2 generation. Dual auth: API Key (paid) or OAuth via ChatGPT (free). Text-to-image, image-to-image, parallel gen, custom sizes.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors