Refactor `shellgenius` output flow and default to `gpt-5.4-mini` by sderev · Pull Request #4 · sderev/shellgenius

sderev · 2026-03-20T00:14:44Z

What changed

Migrate the OpenAI integration to the current openai SDK and the Responses API, while keeping the n fallback path and rejecting unsupported stop values for GPT-5.4-family models before any API call.
Add pytest and scriv scaffolding, plus mocked and opt-in real smoke tests for the GPT-5.4 family.
Parse model output through shellgenius.response_parser instead of an inline regex.
Refactor the CLI for TTY-aware output and execution, add --model, --no-stream, --plain, --command-only, --execute, and --yes, and make --execute honor the parsed shell fence instead of silently switching shells.
Default the CLI to gpt-5.4-mini, update the README and changelog fragments, and make .ci/gate run each Python version in an isolated environment.
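
The TTY-aware output selection described above could look roughly like the sketch below. It is illustrative only, not the code from this PR: `resolve_output_mode` and the mode names are hypothetical, and the fallback to plain text when stdout is not a terminal is an assumption inferred from the description.

```python
import sys


def resolve_output_mode(plain: bool, command_only: bool, stdout_is_tty: bool) -> str:
    """Pick how to render the model's answer (hypothetical helper).

    Assumed precedence: --command-only wins, then --plain, then TTY detection.
    """
    if command_only:
        return "command-only"
    if plain or not stdout_is_tty:
        # Assumption: piped or redirected output gets no rich formatting.
        return "plain"
    return "rich"


if __name__ == "__main__":
    # Report the mode that would be used for the current stdout.
    print(resolve_output_mode(False, False, sys.stdout.isatty()))
```

Passing the TTY state in as an argument, rather than reading it inside the function, keeps the decision easy to cover with mocked CLI tests.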

Why

Non-interactive use should print output and exit, not block on a prompt.
The project should use the current OpenAI SDK and a current default model.
The CLI needed tests and a clear output contract before changing model and TTY behavior.
--execute should not accept one shell fence and run the command in a different shell.
gate should be a reliable multi-version preflight.

How to test

gate
uv run --group dev pytest -q tests/test_cli.py tests/test_response_parser.py
uv run --group dev pytest -q tests/test_openai_real_smoke.py --run-live -m real
Manual checks:
- uv run shellgenius "list files in the current directory"
- uv run shellgenius --plain "list files in the current directory"
- uv run shellgenius --command-only "list files in the current directory"
- uv run shellgenius --execute "print ok" and answer n
- uv run shellgenius --execute --yes "print ok"
- uv run shellgenius --execute "list files" </dev/null and confirm it fails with --yes guidance instead of hanging
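
The last manual check (failing with `--yes` guidance instead of hanging on `</dev/null`) suggests a guard along these lines. This is a minimal sketch under assumptions: `confirm_execution`, `ConfirmationUnavailable`, the prompt text, and the error wording are hypothetical and not taken from the PR.

```python
import sys
from typing import Callable, TextIO


class ConfirmationUnavailable(RuntimeError):
    """Raised when a prompt is needed but stdin is not interactive."""


def confirm_execution(
    assume_yes: bool,
    stdin: TextIO = sys.stdin,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True when the generated command may be executed."""
    if assume_yes:
        # --yes skips the prompt entirely.
        return True
    if not stdin.isatty():
        # Never block on a prompt that nobody can answer.
        raise ConfirmationUnavailable(
            "stdin is not a TTY; pass --yes to execute without a prompt"
        )
    answer = ask("Execute this command? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```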

Risk/comp notes

The default flow no longer asks whether to execute the generated command. Execution now requires --execute.
--execute now follows the parsed fence language and rejects incompatible shell fences on the current platform.
The default model is now gpt-5.4-mini, so users need access to that model or must override --model.
GPT-5.4-family models now raise a clear error when callers pass stop, because that parameter is not supported on the current SDK path for those models.
Real smoke tests remain opt-in because they require credentials and model access.
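
To illustrate the `stop` rejection noted above, here is a hedged sketch of a pre-flight check that runs before any API call. `validate_stop` and the prefix-based family test are assumptions made for illustration; the PR may detect the model family and phrase the error differently.

```python
from typing import Optional, Sequence, Union

# Assumption: the GPT-5.4 family can be recognised by a model-name prefix.
GPT_5_4_PREFIX = "gpt-5.4"


def validate_stop(model: str, stop: Optional[Union[str, Sequence[str]]]) -> None:
    """Raise before any network call if `stop` is passed for a GPT-5.4-family model."""
    if stop is not None and model.startswith(GPT_5_4_PREFIX):
        raise ValueError(
            f"`stop` is not supported for {model}; "
            "remove it or choose a different model"
        )
```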

Changelog fragment: yes (CLI behavior and default model changed)

* add a `tests/` scaffold with opt-in live test support
* configure `pytest` and `scriv` in `pyproject.toml`
* add `CHANGELOG.md` and a Markdown fragment template

Co-authored-by: AI <ai@sderev.com>

* Add `OpenAIResponsesBackend` and prompt adaptation for the Responses API.
* Call `responses.create` for single-response requests without `stop`, and keep `chat.completions.create` as the fallback.
* Add mocked tests for non-streaming, streaming, callback, and rate-limit paths.

Co-authored-by: AI <ai@sderev.com>
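
The routing rule in this commit message can be expressed as a small decision function. The sketch below only returns the name of the endpoint and makes no SDK calls; `choose_endpoint` is a hypothetical helper, not a function from the PR.

```python
from typing import Optional, Sequence


def choose_endpoint(n: int = 1, stop: Optional[Sequence[str]] = None) -> str:
    """Name the endpoint a request would use (hypothetical helper).

    Single-response requests without `stop` go to the Responses API;
    everything else falls back to chat completions.
    """
    if n == 1 and stop is None:
        return "responses.create"
    return "chat.completions.create"
```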

* Parse fenced command output into command, explanation, raw text, and fence language.
* Preserve embedded fence lines inside heredoc-style commands and accept blank-line plain-text explanations.
* Add parser and CLI regressions for malformed and non-shell fenced output.

Co-authored-by: AI <ai@sderev.com>
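
A simplified, line-based parser that yields the four fields named in this commit message might look like the following. It is a sketch, not `shellgenius.response_parser`: `ParsedResponse` and `parse_response` are hypothetical names, and this version closes at the first bare fence line, so it does not handle the heredoc case with embedded fence lines that the real parser preserves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedResponse:
    command: Optional[str]         # text inside the first fence, if any
    explanation: str               # plain text after the closing fence
    raw: str                       # the untouched model output
    fence_language: Optional[str]  # e.g. "bash"; None when untagged or absent


def parse_response(raw: str) -> ParsedResponse:
    lines = raw.splitlines()
    # Find the first line that opens a fence.
    start = next((i for i, line in enumerate(lines) if line.startswith("```")), None)
    if start is None:
        return ParsedResponse(None, raw.strip(), raw, None)
    # Find the first bare closing fence after the opening line.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].strip() == "```"),
        None,
    )
    if end is None:
        # Malformed output: the fence never closes, so report no command.
        return ParsedResponse(None, raw.strip(), raw, None)
    language = lines[start][3:].strip() or None
    command = "\n".join(lines[start + 1:end]).strip()
    explanation = "\n".join(lines[end + 1:]).strip()
    return ParsedResponse(command, explanation, raw, language)
```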

* add TTY-aware output modes and explicit execution flags
* execute generated commands with the parsed shell fence and reject incompatible fences
* update CLI tests, README usage, and the changelog fragment for the new behavior

Co-authored-by: AI <ai@sderev.com>
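
One way to honor the parsed fence and reject incompatible ones is a per-platform lookup, sketched below. The tables, the `shell_for_fence` name, and the decision to reject untagged fences are all assumptions for illustration; the PR's actual compatibility rules are not shown on this page.

```python
import sys
from typing import Optional

# Hypothetical fence-language tables; the real project may accept other tags.
POSIX_SHELLS = {"sh": "sh", "bash": "bash", "zsh": "zsh", "shell": "sh"}
WINDOWS_SHELLS = {"powershell": "powershell", "pwsh": "pwsh", "cmd": "cmd", "bat": "cmd"}


def shell_for_fence(fence_language: Optional[str], platform: str = sys.platform) -> str:
    """Return the shell executable for a fence tag, or raise if incompatible."""
    table = WINDOWS_SHELLS if platform.startswith("win") else POSIX_SHELLS
    key = (fence_language or "").lower()
    if key not in table:
        # Refuse rather than silently running the command in another shell.
        raise ValueError(
            f"fence language {fence_language!r} cannot be executed on {platform}"
        )
    return table[key]
```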

* move the project to `openai>=2,<3` and refresh `uv.lock`
* default ShellGenius to `gpt-5.4-mini`
* add opt-in `real` smoke tests for default and GPT-5.4-family requests
* document the default model and live-test opt-in path in `README.md`

OpenAI docs describe `gpt-5.4-mini` as the strongest mini model for coding, and the `gpt-5-mini` docs recommend starting with `gpt-5.4 mini` for most new low-latency, high-volume workloads.

Co-authored-by: AI <ai@sderev.com>

* update `README.md` to match the current Python requirement, default model, and execution flow
* drop a stale inline comment in `shellgenius/gpt_integration.py`

Co-authored-by: AI <ai@sderev.com>

* pygments <2.20: ReDoS via inefficient GUID regex (alert #4)
* requests <2.33: insecure temp file reuse in extract_zipped_paths() (alert #3)
* Both are transitive deps (via rich/tiktoken); pin minimums to force patched versions

Co-authored-by: AI <ai@sderev.com>

sderev force-pushed the ux-openai-refactor branch 4 times, most recently from 10fe08f to fe2d7c7, on March 20, 2026 at 01:06

Add pytest and scriv scaffolding

cdb4130


sderev force-pushed the ux-openai-refactor branch 3 times, most recently from fb8478f to 50c8d98, on March 20, 2026 at 01:59

sderev and others added 5 commits March 20, 2026 03:12

Align README.md with current CLI output

e3d42ff


sderev force-pushed the ux-openai-refactor branch from 50c8d98 to e3d42ff on March 20, 2026 at 02:16

sderev merged commit e3d42ff into main Mar 20, 2026

sderev deleted the ux-openai-refactor branch March 20, 2026 02:30

sderev merged 6 commits into main from ux-openai-refactor
