feat(chatgpt-app): add temporary chat and multi-modal image attachment support #1783

Merged
jackwener merged 2 commits into jackwener:main from pg-adm1n:feat/opencli-vision-temp-chat on May 31, 2026


pg-adm1n (contributor) commented on May 30, 2026

PR Title: feat(chatgpt-app): add temporary chat and multi-modal image attachment support

Description

This pull request adds two features to the OpenCLI adapter for the ChatGPT desktop app on macOS (chatgpt-app):

  1. Temporary Chat Mode (--temp): Starts a privacy-isolated temporary chat session in the desktop client.
  2. Local Image Uploads (--image): Adds vision input by programmatically attaching a local image file to the prompt.

Both features are built on native macOS facilities: Cocoa, the Accessibility APIs (driven from osascript and Swift), NSPasteboard, and Quartz CGEvent keyboard simulation. They include safeguards that preserve the user's clipboard contents, deliver simulated key events only to the ChatGPT process, avoid blocking delays, and handle differences between localized (English and Chinese) UI labels.


Usage Examples & Console Outputs

1. Launch a New Privacy-Protected Temporary Chat

opencli chatgpt-app new --temp

Console Output:

┌─────────┐
│ Status  │
├─────────┤
│ Success │
└─────────┘

(Result: The native ChatGPT macOS client is brought to the foreground and a new temporary chat is started, whether the system UI language is English or Chinese. Conversations in this mode are not saved to your chat history.)

2. Send a Local Image File for Vision Analysis

opencli chatgpt-app ask "Describe the main colors in this chart" --image ~/Desktop/data_report.png

Console Output:

┌───────────┬──────────────────────────────────────────────────────────────────┐
│ Role      │ Text                                                             │
├───────────┼──────────────────────────────────────────────────────────────────┤
│ User      │ Describe the main colors in this chart                           │
│ Assistant │ The chart is a visual bar graph featuring three primary colors:  │
│           │ a deep vibrant blue representing revenue, a bright emerald green │
│           │ for operating costs, and a soft light-grey for background grid.  │
└───────────┴──────────────────────────────────────────────────────────────────┘

(Result: The local image file is loaded, your current clipboard contents are backed up, the image is pasted into ChatGPT's input area, the clipboard is restored to its original state, and ChatGPT answers the prompt using the attached image.)


Key Enhancements & Technical Design

1. Temporary Chat Mode (new.js)

  • Added a --temp flag to opencli chatgpt-app new.
  • Drives the app's menu bar through AppleScript UI scripting, using the menu path that matches the current UI language.
  • Added menu-path selectors for English (File / New Temporary Chat), Simplified Chinese (文件 / 新的临时聊天), and Traditional Chinese (檔案 / 新的臨時聊天).
  • Removed silent fallbacks: if a temporary chat session cannot be started, the command fails fast with an error instead of silently falling back to a standard chat that is saved to history.
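The localized lookup with fail-fast behavior can be sketched in JavaScript as below. This is a hypothetical illustration, not the code in new.js: the process name "ChatGPT", the assumption that the item sits in the File menu for every locale, and the injected clickMenuItem callback (which would actually run the generated AppleScript, for example through osascript) are all assumptions.

```javascript
// Hypothetical sketch: try each localized menu path for "New Temporary Chat"
// and throw if none works, instead of falling back to a normal chat.
// Labels come from the PR description; everything else is assumed.
const TEMP_CHAT_MENU_PATHS = [
  { locale: "en", menu: "File", item: "New Temporary Chat" },
  { locale: "zh-Hans", menu: "文件", item: "新的临时聊天" },
  { locale: "zh-Hant", menu: "檔案", item: "新的臨時聊天" },
];

// Escape backslashes and double quotes for an AppleScript string literal.
const escapeAS = (s) => s.replace(/\\/g, "\\\\").replace(/"/g, '\\"');

// Build an AppleScript that clicks menu > item in the (assumed) "ChatGPT" process.
function buildMenuClickScript(menu, item) {
  const m = escapeAS(menu);
  const i = escapeAS(item);
  return [
    'tell application "System Events"',
    '  tell process "ChatGPT"',
    `    click menu item "${i}" of menu "${m}" of menu bar item "${m}" of menu bar 1`,
    "  end tell",
    "end tell",
  ].join("\n");
}

// clickMenuItem(script) is an injected, hypothetical runner that resolves to
// true when the click succeeded and false when the menu path does not exist.
async function startTemporaryChat(clickMenuItem) {
  const tried = [];
  for (const { locale, menu, item } of TEMP_CHAT_MENU_PATHS) {
    if (await clickMenuItem(buildMenuClickScript(menu, item))) return locale;
    tried.push(`${menu} / ${item}`);
  }
  // Fail fast: never degrade to a persistent chat.
  throw new Error(`Could not start a temporary chat; tried: ${tried.join(", ")}`);
}
```

Injecting the runner keeps the locale-selection and fail-fast logic testable without a Mac or a running ChatGPT client.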

2. Multi-Modal Local Image Attachments (ask.js & ax.js)

  • Added an optional --image flag that takes a local file path, e.g. opencli chatgpt-app ask "Describe" --image /path/to/image.png.
  • Clipboard Preservation: the Swift helper script (AX_SEND_SCRIPT) backs up the items currently on the general pasteboard before writing the image to it, and restores them immediately after pasting so the user's clipboard contents are not lost.
  • Process-Targeted Pasting: replaced global HID event tap posts (.cghidEventTap) with PID-targeted events (.postToPid(app.processIdentifier)), so the simulated paste shortcut cannot reach another application if the user focuses a different window.
  • Rich Text State Handling: fixed false mismatches caused by the composer's placeholder text by polling the input field and verifying it against the value captured right before sending (valueBeforeSend).
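The PR implements clipboard preservation in Swift with NSPasteboard; the JavaScript sketch below models only the ordering guarantee (back up, write the image, paste, always restore). FakePasteboard, readItems, writeItems, and the paste callback are hypothetical stand-ins, not NSPasteboard APIs or the adapter's real functions.

```javascript
// Hypothetical in-memory stand-in for the general pasteboard. Each item is a
// Map from a type identifier (e.g. "public.png") to its data.
class FakePasteboard {
  constructor(items = []) {
    this.items = items;
  }
  // Return copies of each item's Map so a backup is not affected by later writes.
  readItems() {
    return this.items.map((item) => new Map(item));
  }
  writeItems(items) {
    this.items = items.map((item) => new Map(item));
  }
}

// Back up every pasteboard item, put the image on the pasteboard, run the
// paste action, and restore the backup even if the paste throws.
async function pasteImagePreservingClipboard(pasteboard, imageBytes, paste) {
  const backup = pasteboard.readItems();
  pasteboard.writeItems([new Map([["public.png", imageBytes]])]);
  try {
    await paste(); // e.g. a Cmd+V posted to the ChatGPT process only
  } finally {
    pasteboard.writeItems(backup);
  }
}
```

The try/finally is the important design choice: the restore runs on both the success and the failure path.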

3. Safety, Robustness, and Synchronization

  • Zero Thread Blocking: replaced synchronous shell sleeps in Node (execSync("sleep ...")) with non-blocking Promise/setTimeout delays.
  • Safe Swift Casts: replaced all forced downcasts (as!) with conditional casts (as?) plus explicit nil checks, preventing runtime SIGABRT crashes.
  • Dynamic Element Polling: replaced fixed sleeps in Swift with a lightweight 50 ms polling helper (waitForElement), which noticeably speeds up popover automation.
  • Fallback Window Finder: when no chat window is focused, the adapter scans the application's window list to target an open chat window in the background.
  • Extended Localizations: added Simplified and Traditional Chinese label mappings for popover options ("经典模型" / "經典模型", literally "classic models") and the stop-generating button ("停止產生", "stop generating"; "停止傳送", "stop sending").
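The non-blocking delay and the polling idea can be sketched on the Node side as follows. The PR's waitForElement lives in Swift; this is only a JavaScript analogue, and the names sleep and waitFor as well as the 2000 ms default timeout are assumptions (the 50 ms interval comes from the PR).

```javascript
// Non-blocking replacement for execSync("sleep ..."): resolves after ms
// milliseconds without blocking Node's event loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call probe() every intervalMs until it returns a non-null value or the
// timeout elapses. Resolves to that value, or to null on timeout.
async function waitFor(probe, { timeoutMs = 2000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value != null) return value;
    if (Date.now() >= deadline) return null;
    await sleep(intervalMs);
  }
}
```

Compared with a fixed sleep, polling returns as soon as the element appears and only waits the full timeout in the failure case.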

Unit & Integration Test Logs

Added tests in clis/chatgpt-app/ax.test.js covering pasteboard preservation, process-targeted event delivery, dynamic polling, safe casts, and the Chinese label mappings.

All 3,450 unit and integration tests in OpenCLI (364 test files) pass:

RUN  v4.1.4 /Users/pierre/.gemini/antigravity/scratch/OpenCLI

 ✓  adapter  clis/chatgpt-app/ax.test.js (13 tests) 4ms
 ✓  adapter  clis/chatgpt-app/commands.test.js (3 tests) 2ms

 Test Files  364 passed (364)
      Tests  3450 passed (3450)
   Start at  16:02:23
   Duration  13.23s (transform 6.64s, setup 0ms, import 23.67s, tests 21.80s, environment 23ms)

pg-adm1n force-pushed the feat/opencli-vision-temp-chat branch 3 times, most recently from 51b1f9c to ae2c873 (May 30, 2026 08:08)

pg-adm1n (contributor, author) commented:

Hi @jackwener,

I have created this PR to add two key capabilities to the native macOS chatgpt-app adapter:

  1. Temporary Chat Support (--temp): Launches a privacy-isolated session. Fully supports English, Simplified Chinese, and Traditional Chinese system menus.
  2. Vision Multi-Modal Image Attachment (--image): Securely attaches local images to the chat composer.

Key Technical Safeguards & Enhancements:

  • Clipboard Preservation: Swift serializes and backs up active pasteboard contents (NSPasteboard) before pasting, and fully restores them immediately after to prevent user clipboard data loss.
  • Process Targeting: Key events are posted directly to the ChatGPT application's PID (postToPid), so simulated keystrokes are not delivered to other foreground applications.
  • Dynamic Polling: Uses a Swift waitForElement 50ms polling loop to replace rigid thread sleeps.
  • Safe Downcasting: Replaced all forced casts (as!) with conditional casts (as?) to remove crashes caused by failed casts.

All 3,450 tests pass, including the new adapter and command tests. Please take a look when you have a moment!

pg-adm1n force-pushed the feat/opencli-vision-temp-chat branch from ae2c873 to 66dd80f (May 30, 2026 08:17)
…t support with clipboard and process protection
pg-adm1n force-pushed the feat/opencli-vision-temp-chat branch from 66dd80f to 07f2858 (May 30, 2026 08:18)
jackwener force-pushed the feat/opencli-vision-temp-chat branch from d3f982f to 0c945c4 (May 31, 2026 13:05)
jackwener merged commit 76fcc28 into jackwener:main on May 31, 2026
2 participants