Refactor shellgenius output flow and default to gpt-5.4-mini #4

Merged
sderev merged 6 commits into main from ux-openai-refactor
Mar 20, 2026
Owner

@sderev commented Mar 20, 2026

What changed

  • Migrate the OpenAI integration to the current openai SDK and the Responses API, while keeping the n fallback path and rejecting unsupported stop values for GPT-5.4-family models before any API call.
  • Add pytest and scriv scaffolding, plus mocked and opt-in real smoke tests for the GPT-5.4 family.
  • Parse model output through shellgenius.response_parser instead of an inline regex.
  • Refactor the CLI for TTY-aware output and execution, add --model, --no-stream, --plain, --command-only, --execute, and --yes, and make --execute honor the parsed shell fence instead of silently switching shells.
  • Default the CLI to gpt-5.4-mini, update the README and changelog fragments, and make .ci/gate run each Python version in an isolated environment.
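
To illustrate the parser item above, here is a minimal sketch of splitting a model reply into command, explanation, raw text, and fence language. It is not the code in `shellgenius.response_parser`: the names `ParsedResponse` and `parse_response` are hypothetical, and the choice to treat the last bare fence line as the closing fence (so that fence lines embedded in a heredoc stay inside the command) is an assumption about one way to get that behavior.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Opening fence: three backticks plus an optional language tag such as "bash".
_OPEN_FENCE = re.compile(r"^" + re.escape(FENCE) + r"([\w+-]*)\s*$")


@dataclass(frozen=True)
class ParsedResponse:
    """Hypothetical result type; the field names are illustrative."""

    command: str
    explanation: str
    raw_text: str
    fence_language: str | None


def parse_response(raw_text: str) -> ParsedResponse:
    lines = raw_text.splitlines()

    # Find the first opening fence and remember its language tag.
    start = None
    language = None
    for index, line in enumerate(lines):
        match = _OPEN_FENCE.match(line)
        if match:
            start = index
            language = match.group(1) or None
            break
    if start is None:
        raise ValueError("no fenced command found in the model output")

    # Use the LAST bare fence line as the closing fence, so fence lines that
    # appear inside the command (for example in a heredoc) are preserved.
    end = None
    for index in range(len(lines) - 1, start, -1):
        if lines[index].strip() == FENCE:
            end = index
            break
    if end is None:
        raise ValueError("the command fence is never closed")

    command = "\n".join(lines[start + 1 : end]).strip()
    explanation = "\n".join(lines[end + 1 :]).strip()
    return ParsedResponse(command, explanation, raw_text, language)
```

One trade-off of this sketch: a fenced block inside the explanation would be absorbed into the command, so a real parser likely needs stricter rules than "last fence wins".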

Why

  • Non-interactive use should print output and exit, not block on a prompt.
  • The project should use the current OpenAI SDK and a current default model.
  • The CLI needed tests and a clear output contract before changing model and TTY behavior.
  • --execute should not accept one shell fence and run the command in a different shell.
  • gate should be a reliable multi-version preflight.
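
The first point (print and exit instead of blocking on a prompt) can be sketched as a confirmation helper that refuses to prompt when stdin is not a TTY. This is an illustration under assumptions, not the CLI's actual code: `confirm_execution`, `ConfirmationRequired`, and the error wording are hypothetical names.

```python
import sys


class ConfirmationRequired(RuntimeError):
    """Hypothetical error: a prompt is needed but stdin is not interactive."""


def confirm_execution(command, *, assume_yes=False, stdin=None, ask=input):
    """Return True if the command may run.

    assume_yes models the --yes flag; stdin and ask are injectable so the
    function can be exercised without a real terminal.
    """
    stdin = sys.stdin if stdin is None else stdin
    if assume_yes:
        return True
    if not stdin.isatty():
        # Fail fast with guidance instead of hanging on input().
        raise ConfirmationRequired(
            "stdin is not a TTY; pass --yes to execute without a prompt"
        )
    answer = ask(f"Execute `{command}`? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on anything other than `y` or `yes` keeps an accidental Enter from running a generated command.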

How to test

  • gate
  • uv run --group dev pytest -q tests/test_cli.py tests/test_response_parser.py
  • uv run --group dev pytest -q tests/test_openai_real_smoke.py --run-live -m real
  • Manual checks:
    • uv run shellgenius "list files in the current directory"
    • uv run shellgenius --plain "list files in the current directory"
    • uv run shellgenius --command-only "list files in the current directory"
    • uv run shellgenius --execute "print ok" and answer n
    • uv run shellgenius --execute --yes "print ok"
    • uv run shellgenius --execute "list files" </dev/null and confirm it fails with --yes guidance instead of hanging
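
For readers unfamiliar with opt-in live tests, the `--run-live -m real` invocation above typically relies on a `conftest.py` along these lines. This is a sketch under assumptions, not the repository's file: the helper `should_skip_live` is hypothetical, and the project may register the `real` marker in `pyproject.toml` rather than in a hook.

```python
# conftest.py (illustrative sketch)


def should_skip_live(run_live, is_real_test):
    """Hypothetical helper: skip tests marked `real` unless --run-live is set."""
    return is_real_test and not run_live


def pytest_addoption(parser):
    # Adds the opt-in flag used in the commands above.
    parser.addoption(
        "--run-live",
        action="store_true",
        default=False,
        help="run tests that call the real API (requires credentials)",
    )


def pytest_configure(config):
    # Assumption: the marker is registered here; it could live in pyproject.toml.
    config.addinivalue_line("markers", "real: test calls the real OpenAI API")


def pytest_collection_modifyitems(config, items):
    import pytest  # imported here so the helper above is usable without pytest

    run_live = config.getoption("--run-live")
    skip_live = pytest.mark.skip(reason="needs --run-live to call the real API")
    for item in items:
        is_real = item.get_closest_marker("real") is not None
        if should_skip_live(run_live, is_real):
            item.add_marker(skip_live)
```

With this shape, a plain `pytest` run skips live tests even if someone selects `-m real` by mistake, which keeps credentials out of the default path.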

Risk/compatibility notes

  • The default flow no longer asks whether to execute the generated command. Execution now requires --execute.
  • --execute now follows the parsed fence language and rejects incompatible shell fences on the current platform.
  • The default model is now gpt-5.4-mini, so users need access to that model or must override --model.
  • GPT-5.4-family models now raise a clear error when callers pass stop, because that parameter is not supported on the current SDK path for those models.
  • Real smoke tests remain opt-in because they require credentials and model access.
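
The `stop` rejection can be pictured as a validation step that runs before any network call. The sketch below is an assumption about the shape of that check: `is_gpt_5_4_family` and `validate_stop` are hypothetical names, and matching the family by model-name prefix is a guess, not the project's confirmed rule.

```python
def is_gpt_5_4_family(model):
    """Assumption: the family is identified by its model-name prefix."""
    return model == "gpt-5.4" or model.startswith("gpt-5.4-")


def validate_stop(model, stop):
    """Raise before any API call if stop is used with a GPT-5.4-family model."""
    if stop and is_gpt_5_4_family(model):
        raise ValueError(
            f"model {model!r} does not support the 'stop' parameter; "
            "remove it or choose a different model"
        )
```

Validating up front means the caller sees a local, descriptive error rather than an API failure after a request has already been sent.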

Changelog fragment: yes (CLI behavior and default model changed)

@sderev force-pushed the ux-openai-refactor branch 4 times, most recently from 10fe08f to fe2d7c7 on March 20, 2026 01:06
* add a `tests/` scaffold with opt-in live test support
* configure `pytest` and `scriv` in `pyproject.toml`
* add `CHANGELOG.md` and a Markdown fragment template

Co-authored-by: AI <ai@sderev.com>
@sderev force-pushed the ux-openai-refactor branch 3 times, most recently from fb8478f to 50c8d98 on March 20, 2026 01:59
sderev and others added 5 commits March 20, 2026 03:12
* Add `OpenAIResponsesBackend` and prompt adaptation for the Responses API.
* Call `responses.create` for single-response requests without `stop`, and keep `chat.completions.create` as the fallback.
* Add mocked tests for non-streaming, streaming, callback, and rate-limit paths.

Co-authored-by: AI <ai@sderev.com>
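
As a rough illustration of the routing described in this commit, the sketch below picks an endpoint and adapts chat-style messages for the Responses API without making any network call. The function `build_request` is hypothetical, and mapping system messages to `instructions` is one plausible form of "prompt adaptation", not necessarily what `OpenAIResponsesBackend` does.

```python
from __future__ import annotations

from typing import Any


def build_request(
    messages: list[dict[str, str]],
    *,
    n: int = 1,
    stop: list[str] | None = None,
) -> tuple[str, dict[str, Any]]:
    """Return the endpoint name and keyword arguments for a request."""
    if n == 1 and not stop:
        # Single response without stop: use the Responses API.
        system_parts = [m["content"] for m in messages if m["role"] == "system"]
        turns = [m for m in messages if m["role"] != "system"]
        kwargs: dict[str, Any] = {"input": turns}
        if system_parts:
            kwargs["instructions"] = "\n\n".join(system_parts)
        return "responses.create", kwargs

    # Multiple choices or stop sequences: fall back to Chat Completions.
    kwargs = {"messages": messages, "n": n}
    if stop:
        kwargs["stop"] = stop
    return "chat.completions.create", kwargs
```

Returning the endpoint name alongside the arguments keeps the decision easy to cover with mocked tests.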
* Parse fenced command output into command, explanation, raw text, and fence language.
* Preserve embedded fence lines inside heredoc-style commands and accept blank-line plain-text explanations.
* Add parser and CLI regressions for malformed and non-shell fenced output.

Co-authored-by: AI <ai@sderev.com>
* add TTY-aware output modes and explicit execution flags
* execute generated commands with the parsed shell fence and reject incompatible fences
* update CLI tests, README usage, and the changelog fragment for the new behavior

Co-authored-by: AI <ai@sderev.com>
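
One way to picture "execute with the parsed shell fence and reject incompatible fences" is a lookup from fence language to an interpreter invocation, checked against the current platform. Everything here is an assumption for illustration: the function `shell_argv`, the table of supported fences, and which shells count as compatible with which platform are not taken from the project.

```python
import os

# Assumed mapping from fence language to the argv prefix that runs a command.
_SHELLS = {
    "posix": {
        "sh": ["sh", "-c"],
        "bash": ["bash", "-c"],
        "zsh": ["zsh", "-c"],
    },
    "nt": {
        "powershell": ["powershell", "-NoProfile", "-Command"],
        "cmd": ["cmd", "/c"],
    },
}


def shell_argv(fence_language, command, os_name=os.name):
    """Build the argv for running command in the shell named by the fence.

    Raises ValueError instead of silently switching to a different shell.
    """
    supported = _SHELLS.get(os_name, {})
    key = (fence_language or "").lower()
    if key not in supported:
        raise ValueError(
            f"fence language {fence_language!r} cannot be executed on "
            f"platform {os_name!r}; supported: {sorted(supported)}"
        )
    return [*supported[key], command]
```

The resulting list could then be passed to `subprocess.run`, which avoids `shell=True` and makes the chosen interpreter explicit.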
* move the project to `openai>=2,<3` and refresh `uv.lock`
* default ShellGenius to `gpt-5.4-mini`
* add opt-in `real` smoke tests for default and GPT-5.4-family requests
* document the default model and live-test opt-in path in `README.md`

OpenAI docs describe `gpt-5.4-mini` as the strongest mini model for coding, and the docs for `gpt-5-mini` recommend starting with `gpt-5.4 mini` for most new low-latency, high-volume workloads.

Co-authored-by: AI <ai@sderev.com>
* update `README.md` to match the current Python requirement, default model, and execution flow
* drop a stale inline comment in `shellgenius/gpt_integration.py`

Co-authored-by: AI <ai@sderev.com>
@sderev force-pushed the ux-openai-refactor branch from 50c8d98 to e3d42ff on March 20, 2026 02:16
@sderev merged commit e3d42ff into main Mar 20, 2026
@sderev deleted the ux-openai-refactor branch March 20, 2026 02:30
sderev added a commit that referenced this pull request Apr 8, 2026
* pygments <2.20: ReDoS via inefficient GUID regex (alert #4)
* requests <2.33: insecure temp file reuse in extract_zipped_paths() (alert #3)
* Both are transitive deps (via rich/tiktoken); pin minimums to force patched versions

Co-authored-by: AI <ai@sderev.com>