
feat: add OAuth provider flow for Codex#58

Open
iBreaker wants to merge 24 commits into nyroway:master from iBreaker:OAUTH

Conversation

@iBreaker
Contributor

Summary

  • add provider/channel auth mode metadata and Codex OAuth preset config
  • add OpenAI/Codex OAuth session, callback, runtime binding, and token refresh flow
  • add server, tauri, and webui wiring for provider OAuth connect/status actions

Validation

  • cargo check -p nyro-core -p nyro-server
  • cd webui && corepack pnpm build

Closes #55

Contributor Author

@iBreaker iBreaker left a comment


Code Review: feat: add OAuth provider flow for Codex

Overview

This PR introduces a complete OAuth PKCE authentication flow for Codex (OpenAI's Codex service), covering auth session lifecycle management, token exchange/refresh, provider binding, background token refresh, and WebUI integration. The feature scope is comprehensive and the overall architecture is clean.


🔴 Must Fix

1. Silent fallback with expired token (Bug)

In resolve_provider_runtime (crates/nyro-core/src/admin/mod.rs), when the access token is expired and no refresh token is available, the function silently returns the expired token, causing downstream API calls to fail with 401:

```rust
// Happy path only when the token is not expired
if !access_token.is_empty() && !is_expired_at(...) {
    return Ok(...);
}

// When refresh_token is empty:
if refresh_token.is_empty() {
    if !access_token.is_empty() {
        // ← returns the expired token!
        return Ok(ResolvedProviderRuntime { access_token, ... });
    }
}
```

Suggestion: When the token is expired and cannot be refreshed, return an explicit error instead of silently falling back.
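A minimal sketch of the suggested guard, with the expired/no-refresh case failing fast instead of returning the stale token. The function name and `u64` timestamps are illustrative stand-ins for the PR's `resolve_provider_runtime` internals:

```rust
// Hypothetical simplification of the token-resolution check.
// Field names mirror the PR; the clock is passed in for clarity.
fn resolve_access_token(
    access_token: &str,
    refresh_token: &str,
    expires_at: u64,
    now: u64,
) -> Result<String, String> {
    let expired = expires_at != 0 && now >= expires_at;
    if !access_token.is_empty() && !expired {
        return Ok(access_token.to_string());
    }
    if refresh_token.is_empty() {
        // Old behavior returned the stale token here; fail fast instead
        // so the caller sees the real problem rather than a downstream 401.
        return Err("oauth access token expired and no refresh token available".into());
    }
    // ... refresh path would go here ...
    Err("refresh not implemented in this sketch".into())
}
```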

2. OAuth tokens included in provider exports

ExportProvider includes access_token, refresh_token, and expires_at, which means exported config files contain live OAuth credentials. This is a potential credential leakage risk. Confirm whether this is intentional; at minimum document it clearly, or strip these fields during export.


🟡 Suggestions

3. Auth sessions / bindings stored in the settings KV store

Auth sessions and provider bindings are stored via settings.set("oauth.session.{id}", ...) rather than dedicated DB tables:

  • No atomic transaction guarantees
  • Orphaned sessions have no TTL-based cleanup (relies entirely on explicit cancel/complete)
  • "Deletion" is implemented by writing an empty string — semantically ambiguous
  • No index on the settings table; performance will degrade as data grows

Consider adding dedicated auth_sessions and provider_auth_bindings tables in a follow-up.

4. resolve_preset_channel_auth_mode is duplicated

There are two separate implementations in crates/nyro-core/src/db/models.rs and crates/nyro-core/src/admin/mod.rs with identical logic but different code. Consolidate into a single shared function.

5. PROVIDER_PRESETS_SNAPSHOT is parsed in three places

The same JSON is include_str! + serde_json::from_str parsed independently in openai.rs, db/models.rs, and admin/mod.rs. Consider a shared lazy-initialized structure or a single parse entry point.
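One way to get a single parse entry point is `std::sync::OnceLock`. This sketch uses a stand-in `PresetTable` and inline string in place of the real `serde_json::from_str(include_str!(...))` parse:

```rust
use std::sync::OnceLock;

// Stand-in for the parsed preset data; the real code deserializes
// PROVIDER_PRESETS_SNAPSHOT into typed structs with serde_json.
#[derive(Debug)]
struct PresetTable {
    raw: &'static str,
}

static PRESETS: OnceLock<PresetTable> = OnceLock::new();

// Every call site shares one lazily initialized value instead of
// re-parsing the embedded JSON in openai.rs, db/models.rs, and admin/mod.rs.
fn provider_presets() -> &'static PresetTable {
    PRESETS.get_or_init(|| PresetTable {
        // In the real code: serde_json::from_str(include_str!("providers.json"))
        raw: r#"{"openai": {"channels": ["codex"]}}"#,
    })
}
```

`get_or_init` guarantees the closure runs at most once even under concurrent first access, so the three call sites collapse into one parse.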

6. Gemini headers built twice

In test_provider_models and get_provider_models, when protocol == "gemini", the first headers build is immediately discarded and the same logic runs again. Replace the build-then-override pattern with a proper if/else branch.
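The build-then-override pattern can be replaced by branching once. The header builders below are hypothetical simplifications of the real functions:

```rust
use std::collections::HashMap;

// Illustrative header builders; names and header keys are assumptions
// standing in for the logic in test_provider_models / get_provider_models.
fn gemini_headers(key: &str) -> HashMap<String, String> {
    HashMap::from([("x-goog-api-key".into(), key.into())])
}

fn default_headers(key: &str) -> HashMap<String, String> {
    HashMap::from([("authorization".into(), format!("Bearer {key}"))])
}

// Branch once instead of building default headers and discarding them.
fn build_headers(protocol: &str, key: &str) -> HashMap<String, String> {
    if protocol == "gemini" {
        gemini_headers(key)
    } else {
        default_headers(key)
    }
}
```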

7. resolve_provider_credential may return expired tokens in OAuth mode

query_http_capability and detect_embedding_dimensions use the synchronous resolve_provider_credential, which does not trigger token refresh. If the token is expired, those calls will silently use stale credentials. Consider routing these through resolve_provider_runtime, or document this known limitation explicitly.

8. Clearing last_error via Some(String::new())

```rust
last_error: Some(String::new()), // empty string used to mean "no error"
```

Using Some("") instead of None to represent "no error" is semantically confusing and forces every caller to check both None and empty-string cases.
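A small sketch of the difference, with `None` as the single "no error" state. Helper names are hypothetical:

```rust
// With Option<String>, None means "no error" and Some holds a message.
// Some(String::new()) introduces a third, ambiguous state that every
// caller must remember to check for.
fn has_error(last_error: &Option<String>) -> bool {
    // Defensive double check, needed only while empty strings are stored.
    matches!(last_error, Some(s) if !s.is_empty())
}

fn clear_error(last_error: &mut Option<String>) {
    *last_error = None; // unambiguous: no error
}
```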

9. extract_models_from_response behavioral change is outside OAuth scope

The protocol parameter was renamed to _protocol (unused), the Gemini-gated model field extraction was broadened to all providers, and slug / id field fallbacks were added. This is a real behavioral change with broader impact — it should either be in a separate PR or explicitly called out in this one.


Test Coverage

The PR validation only includes cargo check and pnpm build. Suggest adding tests for:

  • Expired token + valid refresh token → successful refresh and binding update
  • Expired token + no refresh token → explicit error (currently a bug, see item 1)
  • Callback state mismatch → exchange request rejected
  • resolve_preset_channel_auth_mode preset lookup logic

Summary

| Category | Count |
| --- | --- |
| 🔴 Must Fix (Bug / Security) | 2 |
| 🟡 Suggestions | 7 |

The core OAuth flow is functionally complete. The PKCE implementation, driver abstraction, and RuntimeBinding design are clean. The expired token silent fallback bug (item 1) should be fixed before merge. The remaining items can be prioritized for follow-up.

Contributor Author

@iBreaker iBreaker left a comment


Additional Review Notes

[High] auth_mode has no effect at runtime — api_key providers can be hijacked by stale OAuth bindings

resolve_provider_runtime (crates/nyro-core/src/admin/mod.rs) does not branch on provider.effective_auth_mode(). Whenever a binding exists for the provider's driver key, the function unconditionally takes the OAuth path — resolving the access token, triggering a refresh if needed, and constructing a RuntimeBinding with base_url_override and extra_headers.

The proxy main path now fully depends on this resolved runtime for token, base_url_override, and extra_headers (crates/nyro-core/src/proxy/handler.rs).

Consequence: if a user switches a provider back to an api_key channel but the old OAuth binding is still present in storage, traffic will continue to carry the OAuth token and route to the OAuth base URL. The auth_mode field is effectively ignored at runtime.

Suggested fix — guard at the entry of resolve_provider_runtime:

```rust
pub(crate) async fn resolve_provider_runtime(&self, provider: &Provider) -> anyhow::Result<ResolvedProviderRuntime> {
    // Respect the configured auth mode first
    if provider.effective_auth_mode().trim() != "oauth" {
        let api_key = provider.api_key.trim().to_string();
        if api_key.is_empty() {
            anyhow::bail!("provider api key is empty");
        }
        return Ok(ResolvedProviderRuntime {
            access_token: api_key,
            binding: RuntimeBinding::default(),
        });
    }
    // ... existing OAuth logic below
}
```

[Medium] Disconnecting OAuth leaves the provider in an unrecoverable state

logout_provider_oauth deletes the binding and then clears api_key, access_token, refresh_token, and expires_at on the provider record (crates/nyro-core/src/admin/mod.rs).

reconnect_provider_oauth, however, requires an existing binding with a non-empty refresh token to proceed. Once the binding is deleted by logout, reconnect will always fail with "provider oauth binding not found".

There is no API path to re-initiate an OAuth authorization flow and re-attach it to an existing provider. The user's only recovery option is to delete and recreate the provider entirely.

Suggested fix — pick one:

  1. Soft disconnect: logout_provider_oauth marks the binding as disconnected (status only) without deleting it or clearing the provider's credentials. The user can then reconnect using the existing refresh token if still valid.

  2. Re-authorize existing provider: Add a new endpoint (e.g. POST /providers/{id}/oauth/authorize) that initiates a fresh OAuth session scoped to an existing provider ID, and on completion re-binds the new credentials to that provider without requiring a delete-and-recreate flow.

Option 2 is the more complete fix and also unblocks scenarios where the refresh token has genuinely expired.

Contributor Author

@iBreaker iBreaker left a comment


  • High: auth_mode does not appear to be enforced as the runtime source of truth. A provider switched back to api_key can still be affected by stale OAuth bindings/runtime credentials, which means requests may follow the wrong auth path. Runtime resolution should hard-gate on provider.effective_auth_mode() first, and completely ignore OAuth binding/refresh logic for api_key providers. Please review the runtime resolution path around crates/nyro-core/src/admin/mod.rs:1796.

  • Medium: Disconnecting OAuth appears to leave the provider in a partially broken or inconsistent state instead of a cleanly recoverable one. After logout, the provider can still retain OAuth-oriented config/state without a complete reconnect/reset flow, which risks inconsistent behavior across testing, model discovery, and later edits. The logout path should produce a single well-defined post-disconnect state and keep frontend/backend behavior aligned, especially across crates/nyro-core/src/admin/mod.rs and webui/src/pages/providers.tsx.

…r existing providers

- resolve_provider_runtime now checks effective_auth_mode() at entry;
  non-oauth providers skip binding lookup entirely, preventing stale
  oauth bindings from hijacking api_key mode traffic
- logout_provider_oauth now soft-deletes the binding (status=disconnected)
  instead of hard-deleting, preserving the refresh token for re-bind
- add bind_provider_with_oauth_session to attach a completed oauth session
  to an existing provider, enabling recovery after disconnect
- expose bind via POST /providers/:id/oauth/bind (server) and
  bind_provider_oauth tauri command
- remove now-unused delete_provider_auth_binding_record

Closes nyroway#59, Closes nyroway#60

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@iBreaker
Contributor Author

Pushed a fix commit (7b67990) addressing the two issues raised in the review:

#59 — auth_mode ignored at runtime
resolve_provider_runtime now checks effective_auth_mode() at entry. Non-oauth providers return immediately with their api_key and a default RuntimeBinding, so stale oauth bindings can no longer hijack api_key mode traffic.

#60 — provider unrecoverable after OAuth disconnect
logout_provider_oauth now soft-deletes the binding (marks it disconnected) instead of hard-deleting, preserving the refresh token. Added bind_provider_with_oauth_session (POST /providers/:id/oauth/bind + Tauri command) to re-attach a completed OAuth session to an existing provider — no delete-and-recreate needed.

@iBreaker
Contributor Author

iBreaker commented Apr 18, 2026

Codex OAuth upstream routing: root cause and proposed fix

Symptom

With this PR's OAuth flow, a route pointing at an OpenAI provider configured with the Codex channel (auth_mode = oauth, base_url = https://chatgpt.com/backend-api/codex) fails on every upstream call:

  • POST /v1/responses (Codex CLI ingress) → 502 Bad Gateway: all route targets failed
  • POST /v1/messages (Anthropic ingress, e.g. Claude Code) → upstream 404 / 403
  • Gateway debug logs show the final upstream URL as https://chatgpt.com/backend-api/codex/chat/completions

OAuth token issuance and refresh work correctly — the failure is in request routing, not authentication.

Root cause

The gateway unconditionally encodes every OpenAI-protocol egress via OpenAIEncoder, which targets Chat Completions. The Codex OAuth backend at https://chatgpt.com/backend-api/codex only accepts the Responses API (POST /responses, streaming only). Relevant code:

  • OpenAIEncoder::egress_path() (crates/nyro-core/src/protocol/openai/encoder.rs:72-74) hardcodes /v1/chat/completions.
  • get_encoder(ResponsesAPI) (crates/nyro-core/src/protocol/mod.rs:143-148) falls back to OpenAIEncoder; there is no dedicated Responses egress encoder.
  • resolve_egress() (crates/nyro-core/src/protocol/mod.rs:230-234) actively maps ResponsesAPI → OpenAI at lookup time, so even an explicit openai_responses endpoint on the provider would be ignored.
  • OpenAICompatAdapter::build_url() (crates/nyro-core/src/proxy/adapter.rs:47-55) strips /v1 when the base URL has a non-root path, producing https://chatgpt.com/backend-api/codex/chat/completions, which the Codex backend does not serve.

As a result, regardless of ingress protocol, the request body is encoded as Chat Completions and sent to /chat/completions — guaranteed to fail against the Codex backend.

Why this PR's OAuth scaffolding does not address it

This PR correctly decouples authentication (API key vs OAuth) at the provider/channel level, but the gateway still couples authentication with request format through the preset. The openai preset's codex channel ships with:

```json
"baseUrls": { "openai": "https://chatgpt.com/backend-api/codex" },
"auth_mode": "oauth"
```

The baseUrls key is "openai", so the provider row ends up with protocol_endpoints = {"openai": ...}. The gateway therefore never considers Responses-format egress, even though the abstraction for it (Protocol::ResponsesAPI, per-protocol protocol_endpoints) already exists in the codebase. OAuth and request format are orthogonal concerns, but today they are entangled through this preset shape.

Proposed fix (clean, no vendor hardcoding)

Complete the per-protocol endpoint abstraction that is already half-built, and use it instead of introducing vendor-specific branches in the handler.

  1. Add ResponsesEncoder at crates/nyro-core/src/protocol/openai/responses/encoder.rs:

    • encode_request() emits a Responses-format body: instructions, input[] with message / function_call / function_call_output items, preserved tools / tool_choice.
    • Forces stream: true (required by the Codex backend).
    • egress_path() returns /v1/responses.
  2. Add ResponsesResponseParser / ResponsesStreamParser for upstream Responses-format SSE decoding. A working reference implementation exists in branch archive/feature-oauth-reference, commit 53a20ae (crates/nyro-core/src/protocol/openai/responses/parser.rs).

  3. Update protocol/mod.rs factories:

    • get_encoder(Protocol::ResponsesAPI) → new ResponsesEncoder.
    • get_response_parser(Protocol::ResponsesAPI) / get_stream_parser(Protocol::ResponsesAPI) → new Responses parsers.
    • Remove the ResponsesAPI → OpenAI lookup remapping in resolve_egress() so providers that declare an openai_responses endpoint actually route there.
  4. Update the Codex channel preset (crates/nyro-core/assets/providers.json):

    "baseUrls": { "openai_responses": "https://chatgpt.com/backend-api/codex" }
  5. Aggregate upstream stream for non-streaming clients in proxy/handler.rs. The Codex backend requires stream=true; when the ingress request is non-streaming, the gateway must consume the upstream SSE and return a single response body.

  6. Migrate existing Codex OAuth providers at startup (or via a one-shot admin call) from protocol_endpoints["openai"] to protocol_endpoints["openai_responses"] so existing rows don't need manual edits.
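For step 5, the aggregation can be sketched as consuming the upstream SSE and concatenating deltas into one body. The event shape here (plain-text `data:` payloads with a `[DONE]` terminator) is a deliberate simplification; the real Responses stream carries JSON events:

```rust
// Minimal sketch of aggregating an upstream SSE transcript into a single
// non-streaming response body, for clients that did not request streaming
// while the Codex backend forces stream=true.
fn aggregate_sse(upstream: &str) -> String {
    let mut body = String::new();
    for line in upstream.lines() {
        if let Some(payload) = line.strip_prefix("data: ") {
            if payload == "[DONE]" {
                break;
            }
            body.push_str(payload);
        }
    }
    body
}
```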

No changes to the vendor field or authentication logic. With this fix, authentication (auth_mode) and request format (protocol_endpoints key) are independently configurable — a future API-key-auth Responses endpoint, or an OAuth-auth Chat-Completions endpoint, would work via configuration alone.

Alternative considered

Branch archive/feature-oauth-reference (commit 53a20ae Fix Codex responses routing) takes a pragmatic shortcut: a vendor check (is_codex_provider, i.e. normalize_driver_key(vendor) == "codex") inside handler.rs that bypasses the generic encoder for Codex providers, synthesises the Responses body inline, and pins the path to /responses. This fixes Codex specifically but couples request-format selection to the vendor string. Any other Responses-API-only upstream, or any API-key-authenticated Responses upstream, would require another vendor branch. The clean fix above subsumes that case without the coupling and is strictly additive to what this PR already ships.

Validation plan

  • codex-nyro-gpt54 exec --ephemeral 'Reply OK.' returns HTTP 200 with correct output.
  • claude-nyro-gpt54 -p 'Reply OK.' returns HTTP 200 with correct output.
  • Existing API-key-based OpenAI providers (pointing at api.openai.com/v1) continue to work without change.
  • OAuth token issuance, refresh, and re-bind paths introduced by this PR remain untouched.

Update 2026-04-18: this routing fix is tracked separately in issue #61 — it is orthogonal to OAuth (auth vs request format) and will land as its own PR against master after this OAuth PR merges. A prototype exists at iBreaker/nyro@b2f41b0 but is intentionally not included in this PR.

iBreaker added a commit to iBreaker/nyro that referenced this pull request Apr 18, 2026
Complete the Protocol::ResponsesAPI abstraction so auth_mode and request
format become orthogonal. Previously the Codex OAuth backend failed every
upstream call because all OpenAI-protocol egress was encoded as Chat
Completions and sent to /v1/chat/completions, which Codex does not serve.

- add ResponsesEncoder (targets /v1/responses, forces stream:true)
- add ResponsesResponseParser + ResponsesStreamParser for Responses SSE
- route Protocol::ResponsesAPI through dedicated encoder/parsers in factories
- drop ResponsesAPI → OpenAI remap in resolve_egress; ProviderProtocols
  falls back to any configured endpoint when declared default is missing
- add handle_non_stream_via_upstream_stream to aggregate SSE for clients
  that requested non-stream when upstream forces stream
- flip codex preset baseUrls key from "openai" to "openai_responses"
- add 7 unit tests covering encoder body shape, path, tool round-trip,
  max_output_tokens drop, and both parsers

Known limitation: max_output_tokens is not forwarded because the Codex
backend rejects it; callers needing a cap can pass it via req.extra.
Follow-ups tracked on PR nyroway#58: startup migration for existing rows,
preset defaultProtocol per-channel override, error-path verification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
iBreaker and others added 2 commits April 19, 2026 06:17
Normalize provider credentials by auth mode so OAuth providers store tokens in access_token while keeping runtime fallback compatibility for legacy api_key data.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rename lingering OAuth binding references to provider status terminology and drop the outdated binding design doc so the branch matches the current implementation model.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor Author

@iBreaker iBreaker left a comment


Review Summary

Thanks for the large OAuth integration work here — the overall direction makes sense, and the architecture is moving in a good direction:

  • provider-level OAuth credential fields
  • auth driver abstraction
  • runtime credential resolution in proxy/admin
  • OAuth session lifecycle APIs
  • frontend/Tauri wiring for create/bind/reconnect/logout

That said, I think there are two issues worth calling out, with one of them being a likely logic bug.

1. logout_provider_oauth changes the provider into api_key mode

I think this is the biggest issue in the PR.

In logout_provider_oauth, the code clears the OAuth credentials, but it also does:

  • auth_mode = "api_key"
  • clears api_key
  • clears access_token
  • clears refresh_token
  • clears expires_at

That effectively turns an OAuth-backed provider into an API-key provider with an empty key.

Why this is a problem

Elsewhere in the codebase, whether a provider participates in the OAuth lifecycle is largely determined by effective_auth_mode() == "oauth":

  • resolve_provider_runtime
  • refresh_oauth_providers
  • build_provider_oauth_status
  • frontend status/loading behavior

After logout, this provider is no longer clearly represented as “an OAuth provider that is currently disconnected”; instead it becomes “an API-key provider with no key”.

That causes a semantic mismatch:

  • users lose the clean distinction between “OAuth provider” and “API key provider”
  • reconnect flows become more fragile
  • behavior may now depend on preset/channel-derived auth mode to recover the intended semantics
  • non-preset / manually created OAuth providers are especially vulnerable to this

Suggested fix

I think logout should:

  • keep auth_mode = "oauth"
  • clear token-related fields
  • return a disconnected OAuth status

In other words: preserve the provider type, only remove the active binding.
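The suggested semantics can be sketched as follows; the `Provider` struct and field names are illustrative, following the discussion above:

```rust
// Illustrative provider record; field names follow the review discussion.
struct Provider {
    auth_mode: String,
    access_token: String,
    refresh_token: String,
    expires_at: u64,
}

// Suggested logout semantics: clear credential fields only, so the
// provider reads as "oauth, disconnected" rather than becoming an
// api_key provider with an empty key.
fn logout(provider: &mut Provider) {
    provider.access_token.clear();
    provider.refresh_token.clear();
    provider.expires_at = 0;
    // Deliberately do NOT touch auth_mode: the provider type is preserved.
}
```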

2. OAuth sessions are in-memory only

The current auth session lifecycle is backed by gw.auth_sessions, which appears to be purely in-memory.

Why this matters

This means the OAuth flow is not resilient to:

  • app restart
  • backend restart
  • Tauri reload/dev refresh
  • process crash

If any of those happen in the middle of auth, then:

  • session_id becomes invalid
  • get_oauth_session_status fails
  • complete_oauth_session fails
  • create_provider_with_oauth_session / bind flows cannot recover

That may be acceptable for an MVP, but if this is intended as production-ready OAuth support, I think this limitation should either:

  • be made explicit in UX/API behavior, or
  • be addressed by persisting sessions in storage

At minimum, I would strongly suggest documenting this behavior so failures after restart are understandable rather than looking like random OAuth breakage.
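If sessions were persisted, a TTL-based sweep would also address the orphan-cleanup concern. This is a sketch only; the session fields, TTL value, and in-memory map standing in for the real store are all assumptions:

```rust
use std::collections::HashMap;

// Hypothetical persisted session record.
#[derive(Clone)]
struct AuthSession {
    state: String,
    created_at: u64, // unix seconds
}

const SESSION_TTL_SECS: u64 = 600;

// Drop sessions older than the TTL so a restart-survivable store does not
// accumulate orphans (complementing explicit cancel/complete).
fn sweep_expired(sessions: &mut HashMap<String, AuthSession>, now: u64) {
    sessions.retain(|_, s| now.saturating_sub(s.created_at) < SESSION_TTL_SECS);
}
```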

Recommendation

I’d recommend addressing the logout semantics before merging.

The in-memory session model is a little more subjective, but I do think it should at least be explicitly acknowledged as a limitation in the current design.

iBreaker and others added 2 commits April 19, 2026 08:21
Keep logged out OAuth providers in oauth mode so they remain disconnected bindings instead of being rewritten as empty API key providers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stop treating api_key as an OAuth fallback credential, fail fast when refresh is required but unavailable, and document logout's legacy api_key cleanup behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@iBreaker
Contributor Author

Follow-up: I addressed the OAuth issues called out in review.

Fixed in:

  • 333e2b9 fix(oauth): preserve oauth mode on logout
  • 61366b2 fix(oauth): tighten runtime credential handling

Summary of changes:

  • logout_provider_oauth now preserves auth_mode = "oauth" and only clears credential fields, so logged-out providers remain disconnected OAuth bindings instead of turning into empty API key providers.
  • build_provider_oauth_status no longer treats api_key as an OAuth fallback signal.
  • resolve_provider_runtime no longer falls back to api_key for OAuth providers, and now fails fast when refresh is required but unavailable / unsuccessful.
  • Added a regression test covering logout semantics and runtime resolution after disconnect.

I’m OK with the in-memory auth session model for now, so I’m considering that point accepted as a current limitation rather than a merge blocker.

Successfully merging this pull request may close these issues.

Add OAuth 2.0 Support
