feat: add OAuth provider flow for Codex#58
Conversation
There was a problem hiding this comment.
Code Review: feat: add OAuth provider flow for Codex
Overview
This PR introduces a complete OAuth PKCE authentication flow for Codex (OpenAI's Codex service), covering auth session lifecycle management, token exchange/refresh, provider binding, background token refresh, and WebUI integration. The feature scope is comprehensive and the overall architecture is clean.
🔴 Must Fix
1. Silent fallback with expired token (Bug)
In resolve_provider_runtime (crates/nyro-core/src/admin/mod.rs), when the access token is expired and no refresh token is available, the function silently returns the expired token, causing downstream API calls to fail with 401:
```rust
// Happy path only when token is not expired
if !access_token.is_empty() && !is_expired_at(...) { return Ok(...) }
// When refresh_token is empty:
if refresh_token.is_empty() {
if !access_token.is_empty() {
return Ok(ResolvedProviderRuntime { access_token, ... }) // ← returns expired token!
}
}
```
Suggestion: When the token is expired and cannot be refreshed, return an explicit error instead of silently falling back.
2. OAuth tokens included in provider exports
ExportProvider includes access_token, refresh_token, and expires_at, which means exported config files contain live OAuth credentials. This is a potential credential leakage risk. Confirm whether this is intentional; at minimum document it clearly, or strip these fields during export.
🟡 Suggestions
3. Auth sessions / bindings stored in the settings KV store
Auth sessions and provider bindings are stored via settings.set("oauth.session.{id}", ...) rather than dedicated DB tables:
- No atomic transaction guarantees
- Orphaned sessions have no TTL-based cleanup (relies entirely on explicit cancel/complete)
- "Deletion" is implemented by writing an empty string — semantically ambiguous
- No index on the settings table; performance will degrade as data grows
Consider adding dedicated auth_sessions and provider_auth_bindings tables in a follow-up.
4. resolve_preset_channel_auth_mode is duplicated
There are two separate implementations in crates/nyro-core/src/db/models.rs and crates/nyro-core/src/admin/mod.rs with identical logic but different code. Consolidate into a single shared function.
5. PROVIDER_PRESETS_SNAPSHOT is parsed in three places
The same JSON is include_str! + serde_json::from_str parsed independently in openai.rs, db/models.rs, and admin/mod.rs. Consider a shared lazy-initialized structure or a single parse entry point.
6. Gemini headers built twice
In test_provider_models and get_provider_models, when protocol == "gemini", the first headers build is immediately discarded and the same logic runs again. Replace the build-then-override pattern with a proper if/else branch.
7. resolve_provider_credential may return expired tokens in OAuth mode
query_http_capability and detect_embedding_dimensions use the synchronous resolve_provider_credential, which does not trigger token refresh. If the token is expired, those calls will silently use stale credentials. Consider routing these through resolve_provider_runtime, or document this known limitation explicitly.
8. Clearing last_error via Some(String::new())
```rust
last_error: Some(String::new()), // empty string used to mean "no error"
```
Using Some("") instead of None to represent "no error" is semantically confusing and forces every caller to check both None and empty-string cases.
9. extract_models_from_response behavioral change is outside OAuth scope
The protocol parameter was renamed to _protocol (unused), the Gemini-gated model field extraction was broadened to all providers, and slug / id field fallbacks were added. This is a real behavioral change with broader impact — it should either be in a separate PR or explicitly called out in this one.
Test Coverage
The PR validation only includes cargo check and pnpm build. Suggest adding tests for:
- Expired token + valid refresh token → successful refresh and binding update
- Expired token + no refresh token → explicit error (currently a bug, see item 1)
- Callback state mismatch → exchange request rejected
resolve_preset_channel_auth_modepreset lookup logic
Summary
| Category | Count |
|---|---|
| 🔴 Must Fix (Bug / Security) | 2 |
| 🟡 Suggestions | 7 |
The core OAuth flow is functionally complete. The PKCE implementation, driver abstraction, and RuntimeBinding design are clean. The expired token silent fallback bug (item 1) should be fixed before merge. The remaining items can be prioritized for follow-up.
iBreaker
left a comment
There was a problem hiding this comment.
Additional Review Notes
[High] auth_mode has no effect at runtime — api_key providers can be hijacked by stale OAuth bindings
resolve_provider_runtime (crates/nyro-core/src/admin/mod.rs) does not branch on provider.effective_auth_mode(). Whenever a binding exists for the provider's driver key, the function unconditionally takes the OAuth path — resolving the access token, triggering a refresh if needed, and constructing a RuntimeBinding with base_url_override and extra_headers.
The proxy main path now fully depends on this resolved runtime for token, base_url_override, and extra_headers (crates/nyro-core/src/proxy/handler.rs).
Consequence: if a user switches a provider back to an api_key channel but the old OAuth binding is still present in storage, traffic will continue to carry the OAuth token and route to the OAuth base URL. The auth_mode field is effectively ignored at runtime.
Suggested fix — guard at the entry of resolve_provider_runtime:
pub(crate) async fn resolve_provider_runtime(&self, provider: &Provider) -> anyhow::Result<ResolvedProviderRuntime> {
// Respect the configured auth mode first
if provider.effective_auth_mode().trim() != "oauth" {
let api_key = provider.api_key.trim().to_string();
if api_key.is_empty() {
anyhow::bail!("provider api key is empty");
}
return Ok(ResolvedProviderRuntime {
access_token: api_key,
binding: RuntimeBinding::default(),
});
}
// ... existing OAuth logic below
}[Medium] Disconnecting OAuth leaves the provider in an unrecoverable state
logout_provider_oauth deletes the binding and then clears api_key, access_token, refresh_token, and expires_at on the provider record (crates/nyro-core/src/admin/mod.rs).
reconnect_provider_oauth, however, requires an existing binding with a non-empty refresh token to proceed. Once the binding is deleted by logout, reconnect will always fail with "provider oauth binding not found".
There is no API path to re-initiate an OAuth authorization flow and re-attach it to an existing provider. The user's only recovery option is to delete and recreate the provider entirely.
Suggested fix — pick one:
-
Soft disconnect:
logout_provider_oauthmarks the binding asdisconnected(status only) without deleting it or clearing the provider's credentials. The user can then reconnect using the existing refresh token if still valid. -
Re-authorize existing provider: Add a new endpoint (e.g.
POST /providers/{id}/oauth/authorize) that initiates a fresh OAuth session scoped to an existing provider ID, and on completion re-binds the new credentials to that provider without requiring a delete-and-recreate flow.
Option 2 is the more complete fix and also unblocks scenarios where the refresh token has genuinely expired.
iBreaker
left a comment
There was a problem hiding this comment.
-
High:
auth_modedoes not appear to be enforced as the runtime source of truth. A provider switched back toapi_keycan still be affected by stale OAuth bindings/runtime credentials, which means requests may follow the wrong auth path. Runtime resolution should hard-gate onprovider.effective_auth_mode()first, and completely ignore OAuth binding/refresh logic forapi_keyproviders. Please review the runtime resolution path aroundcrates/nyro-core/src/admin/mod.rs:1796. -
Medium: Disconnecting OAuth appears to leave the provider in a partially broken or inconsistent state instead of a cleanly recoverable one. After logout, the provider can still retain OAuth-oriented config/state without a complete reconnect/reset flow, which risks inconsistent behavior across testing, model discovery, and later edits. The logout path should produce a single well-defined post-disconnect state and keep frontend/backend behavior aligned, especially across
crates/nyro-core/src/admin/mod.rsandwebui/src/pages/providers.tsx.
…r existing providers - resolve_provider_runtime now checks effective_auth_mode() at entry; non-oauth providers skip binding lookup entirely, preventing stale oauth bindings from hijacking api_key mode traffic - logout_provider_oauth now soft-deletes the binding (status=disconnected) instead of hard-deleting, preserving the refresh token for re-bind - add bind_provider_with_oauth_session to attach a completed oauth session to an existing provider, enabling recovery after disconnect - expose bind via POST /providers/:id/oauth/bind (server) and bind_provider_oauth tauri command - remove now-unused delete_provider_auth_binding_record Closes nyroway#59, Closes nyroway#60 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Pushed a fix commit (7b67990) addressing the two issues raised in the review: #59 — auth_mode ignored at runtime #60 — provider unrecoverable after OAuth disconnect |
Codex OAuth upstream routing: root cause and proposed fixSymptomWith this PR's OAuth flow, a route pointing at an OpenAI provider configured with the Codex channel (
OAuth token issuance and refresh work correctly — the failure is in request routing, not authentication. Root causeThe gateway unconditionally encodes every OpenAI-protocol egress via
As a result, regardless of ingress protocol, the request body is encoded as Chat Completions and sent to Why this PR's OAuth scaffolding does not address itThis PR correctly decouples authentication (API key vs OAuth) at the provider/channel level, but the gateway still couples authentication with request format through the preset. The "baseUrls": { "openai": "https://chatgpt.com/backend-api/codex" },
"auth_mode": "oauth"The Proposed fix (clean, no vendor hardcoding)Complete the per-protocol endpoint abstraction that is already half-built, and use it instead of introducing vendor-specific branches in the handler.
No changes to the vendor field or authentication logic. With this fix, authentication ( Alternative consideredBranch Validation plan
|
Complete the Protocol::ResponsesAPI abstraction so auth_mode and request format become orthogonal. Previously the Codex OAuth backend failed every upstream call because all OpenAI-protocol egress was encoded as Chat Completions and sent to /v1/chat/completions, which Codex does not serve. - add ResponsesEncoder (targets /v1/responses, forces stream:true) - add ResponsesResponseParser + ResponsesStreamParser for Responses SSE - route Protocol::ResponsesAPI through dedicated encoder/parsers in factories - drop ResponsesAPI → OpenAI remap in resolve_egress; ProviderProtocols falls back to any configured endpoint when declared default is missing - add handle_non_stream_via_upstream_stream to aggregate SSE for clients that requested non-stream when upstream forces stream - flip codex preset baseUrls key from "openai" to "openai_responses" - add 7 unit tests covering encoder body shape, path, tool round-trip, max_output_tokens drop, and both parsers Known limitation: max_output_tokens is not forwarded because the Codex backend rejects it; callers needing a cap can pass it via req.extra. Follow-ups tracked on PR nyroway#58: startup migration for existing rows, preset defaultProtocol per-channel override, error-path verification. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Normalize provider credentials by auth mode so OAuth providers store tokens in access_token while keeping runtime fallback compatibility for legacy api_key data. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rename lingering OAuth binding references to provider status terminology and drop the outdated binding design doc so the branch matches the current implementation model. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
iBreaker
left a comment
There was a problem hiding this comment.
Review Summary
Thanks for the large OAuth integration work here — the overall direction makes sense, and the architecture is moving in a good direction:
- provider-level OAuth credential fields
- auth driver abstraction
- runtime credential resolution in proxy/admin
- OAuth session lifecycle APIs
- frontend/Tauri wiring for create/bind/reconnect/logout
That said, I think there are two issues worth calling out, with one of them being a likely logic bug.
1. logout_provider_oauth changes the provider into api_key mode
I think this is the biggest issue in the PR.
In logout_provider_oauth, the code clears the OAuth credentials, but it also does:
auth_mode = "api_key"- clears
api_key - clears
access_token - clears
refresh_token - clears
expires_at
That effectively turns an OAuth-backed provider into an API-key provider with an empty key.
Why this is a problem
Elsewhere in the codebase, whether a provider participates in the OAuth lifecycle is largely determined by effective_auth_mode() == "oauth":
resolve_provider_runtimerefresh_oauth_providersbuild_provider_oauth_status- frontend status/loading behavior
After logout, this provider is no longer clearly represented as “an OAuth provider that is currently disconnected”; instead it becomes “an API-key provider with no key”.
That causes a semantic mismatch:
- users lose the clean distinction between “OAuth provider” and “API key provider”
- reconnect flows become more fragile
- behavior may now depend on preset/channel-derived auth mode to recover the intended semantics
- non-preset / manually created OAuth providers are especially vulnerable to this
Suggested fix
I think logout should:
- keep
auth_mode = "oauth" - clear token-related fields
- return a disconnected OAuth status
In other words: preserve the provider type, only remove the active binding.
2. OAuth sessions are in-memory only
The current auth session lifecycle is backed by gw.auth_sessions, which appears to be purely in-memory.
Why this matters
This means the OAuth flow is not resilient to:
- app restart
- backend restart
- Tauri reload/dev refresh
- process crash
If any of those happen in the middle of auth, then:
session_idbecomes invalidget_oauth_session_statusfailscomplete_oauth_sessionfailscreate_provider_with_oauth_session/ bind flows cannot recover
That may be acceptable for an MVP, but if this is intended as production-ready OAuth support, I think this limitation should either:
- be made explicit in UX/API behavior, or
- be addressed by persisting sessions in storage
At minimum, I would strongly suggest documenting this behavior so failures after restart are understandable rather than looking like random OAuth breakage.
Recommendation
I’d recommend addressing the logout semantics before merging.
The in-memory session model is a little more subjective, but I do think it should at least be explicitly acknowledged as a limitation in the current design.
Keep logged out OAuth providers in oauth mode so they remain disconnected bindings instead of being rewritten as empty API key providers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stop treating api_key as an OAuth fallback credential, fail fast when refresh is required but unavailable, and document logout's legacy api_key cleanup behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Follow-up: I addressed the OAuth issues called out in review. Fixed in:
Summary of changes:
I’m OK with the in-memory auth session model for now, so I’m considering that point accepted as a current limitation rather than a merge blocker. |
Summary
Validation
Closes #55