feat(clis/chatgptweb): add ChatGPT web image generation command #973

Merged
jackwener merged 3 commits into jackwener:main from asimons81:feat/chatgptweb-image-generation
Apr 12, 2026

@asimons81
Contributor

Summary

Add the opencli chatgptweb image command, which generates images with ChatGPT web (GPT-4o native image generation) and saves them to local disk.

This is the ChatGPT web equivalent of the existing opencli gemini image command.

Motivation

ChatGPT (GPT-4o) supports native image generation that is qualitatively different from DALL-E — it understands context better and produces more coherent results for complex scenes. Providing a CLI interface for this enables:

  • Scripted or automated image generation workflows
  • Integration with other CLI tools
  • Cross-platform support (Linux/macOS/Windows via OpenCLI browser automation)

Approach

The command uses OpenCLI's browser automation (CDP/Puppeteer) to:

  1. Navigate to chatgpt.com/new with full page reload to ensure clean state
  2. Close the sidebar if open (it can obscure the chat composer)
  3. Use Playwright's page.type() for reliable text input into the TipTap editor
  4. Click the "Send prompt" button
  5. Poll for response completion (handles thinking/throttling states)
  6. Extract generated image URLs from the DOM (backend-api/estuary/content)
  7. Download and save images as PNG/JPEG files
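
Steps 5 and 6 can be illustrated with a minimal sketch. It is not the code from clis/chatgptweb/utils.js: the function names, timeout values, and the `check` callback are hypothetical, and the only detail taken from the description above is that generated image URLs contain backend-api/estuary/content.

```javascript
// Hypothetical helper: keep only URLs that look like generated-image assets,
// dropping non-strings and duplicates while preserving order.
function filterGeneratedImageUrls(urls) {
  const seen = new Set();
  const result = [];
  for (const url of urls) {
    if (typeof url !== 'string') continue;
    if (!url.includes('backend-api/estuary/content')) continue;
    if (seen.has(url)) continue;
    seen.add(url);
    result.push(url);
  }
  return result;
}

// Hypothetical polling loop: `check` is any async function that returns the
// image URLs currently visible in the page (for example, collected from the DOM).
// Resolves with the matching URLs, or an empty array once the timeout elapses.
async function pollForImages(check, { timeoutMs = 120000, intervalMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const urls = filterGeneratedImageUrls(await check());
    if (urls.length > 0) return urls;
    // Nothing yet (the model may still be thinking or throttled): wait and retry.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return [];
}
```

Returning an empty array on timeout, rather than throwing, lets the caller decide whether a missing image is an error or just a text-only reply.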

Key implementation decisions:

  • chatgptweb site name: ChatGPT is also available as an Electron Desktop app. Using chatgptweb as the site name ensures the browser-based implementation is used instead of the Desktop app adapter.
  • Full page reload: chatgpt.com/new is loaded with a full reload, waiting for networkidle0, to clear React sidebar state that persists across client-side navigations.
  • Sidebar close: ChatGPT's sidebar covers the chat composer on some layouts — we close it before typing.
  • page.type() over execCommand: Playwright's native type method is more reliable for the TipTap contenteditable editor than DOM manipulation.
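
The decisions above can be sketched as one function. This assumes a Puppeteer-style page object (goto with waitUntil: 'networkidle0', $, type, click); OpenCLI's own page wrapper may expose a different API. All three selectors are hypothetical guesses, not values taken from the PR.

```javascript
// Hypothetical selectors; the real ones live in clis/chatgptweb/utils.js and may differ.
const SELECTORS = {
  closeSidebar: 'button[aria-label="Close sidebar"]',
  composer: 'div.ProseMirror[contenteditable="true"]', // TipTap is built on ProseMirror
  sendButton: 'button[aria-label="Send prompt"]',
};

// Sketch of the send flow, assuming `page` behaves like a Puppeteer Page.
async function sendPrompt(page, prompt, selectors = SELECTORS) {
  // Full reload so sidebar state from earlier client-side navigations is discarded.
  await page.goto('https://chatgpt.com/new', { waitUntil: 'networkidle0' });

  // Close the sidebar only if its close button is present.
  const closeButton = await page.$(selectors.closeSidebar);
  if (closeButton) await closeButton.click();

  // Type through the browser's input pipeline instead of mutating the DOM,
  // so the contenteditable editor receives real key events.
  await page.type(selectors.composer, prompt);
  await page.click(selectors.sendButton);
}
```

Passing the selectors as a parameter keeps the flow testable with a stub page and makes it easier to adjust when ChatGPT's markup changes.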

Files Changed

| File | Description |
| --- | --- |
| clis/chatgptweb/image.js | CLI command definition with args (prompt, --op, --sd) |
| clis/chatgptweb/utils.js | DOM helpers: sendChatGPTMessage, waitForChatGPTImages, getChatGPTVisibleImageUrls, getChatGPTImageAssets |

Usage

# Generate an image and save to default directory (~/Pictures/chatgpt)
opencli chatgptweb image "a cyberpunk city at night"

# Specify output directory
opencli chatgptweb image "a robot" --op ~/Downloads/ai-art

# Skip download, just get the ChatGPT link
opencli chatgptweb image "a cat" --sd

Testing

Tested successfully on Linux with the following prompts:

  • "a small orange cat on a windowsill" — 1024x1536 PNG, 1.9MB ✓
  • "a robot sitting at a desk coding" — 1024x1536 PNG, 2.2MB ✓
  • "a cyberpunk city at night" — 1024x1536 PNG ✓

Notes

  • Requires being logged into ChatGPT web (cookie-based auth via OpenCLI)
  • Output directory defaults to ~/Pictures/chatgpt
  • The --sd flag is useful for previewing the result before downloading
  • Images are saved with timestamp filenames to avoid overwrites
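
A minimal sketch of how the output path and filename could be derived, assuming Node.js. The function names and the exact filename pattern (chatgpt-<timestamp>-<index>.<ext>) are hypothetical; only the default directory, the PNG/JPEG formats, and the use of timestamps come from the notes above.

```javascript
const os = require('os');
const path = require('path');

// Resolve the --op value, falling back to the documented default ~/Pictures/chatgpt.
// A leading "~" is expanded manually because Node.js does not expand it.
function resolveOutputDir(op, home = os.homedir()) {
  const dir = op && op.trim() ? op.trim() : '~/Pictures/chatgpt';
  if (dir === '~') return home;
  if (dir.startsWith('~/')) return path.join(home, dir.slice(2));
  return path.resolve(dir);
}

// Pick a file extension from the file signature:
// PNG starts with 89 50 4E 47, JPEG starts with FF D8 FF.
function detectImageExtension(bytes) {
  if (bytes.length >= 4 && bytes[0] === 0x89 && bytes[1] === 0x50 &&
      bytes[2] === 0x4e && bytes[3] === 0x47) return 'png';
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 &&
      bytes[2] === 0xff) return 'jpg';
  return 'bin'; // unknown format: keep the data but do not claim a type
}

// Hypothetical naming scheme: a UTC timestamp plus an index, so several images
// from one response do not overwrite each other.
function timestampFilename(ext, index = 0, now = new Date()) {
  const stamp = now.toISOString().replace(/[-:.]/g, '');
  return `chatgpt-${stamp}-${index}.${ext}`;
}
```

For example, an image saved at 2026-04-11 11:43:05.123 UTC with index 0 would be named chatgpt-20260411T114305123Z-0.png under this scheme.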

Tony Simons and others added 3 commits April 11, 2026 11:43
Add `opencli chatgptweb image` command that generates images using
ChatGPT web (GPT-4o image generation) and saves them locally.

Features:
- Navigates to chatgpt.com/new with full page reload to ensure clean state
- Uses Playwright's page.type() for reliable text input in TipTap editor
- Closes sidebar if open (covers the chat composer on some layouts)
- Polls for response completion (handles thinking/throttling states)
- Extracts generated images from DOM (backend-api/estuary/content URLs)
- Downloads and saves as PNG/JPEG files to user-specified directory
- Supports --op for output directory and --sd to skip download

Files:
- clis/chatgptweb/image.js: CLI command definition
- clis/chatgptweb/utils.js: DOM helpers, send/wait/export functions

Works cross-platform (Linux/macOS/Windows) via OpenCLI browser automation.
@Astro-Han
Contributor

Nice addition. The flow makes sense, and I like that it follows the existing gemini image shape.

One follow-up I noticed, not blocking from my side: the new chatgptweb adapter has its own doc page, but it is not wired into the usual entry points yet. I could only find docs/adapters/browser/chatgptweb.md. It looks like docs/adapters/index.md, docs/.vitepress/config.mts, README.md, and README.zh-CN.md still need the matching entries so users can actually discover it.

If you are still touching this PR, I would add those here. Otherwise, a small follow-up PR would also work.

@jackwener jackwener merged commit 4d1fa8a into jackwener:main Apr 12, 2026