MSAL Token Renewal Fix — Safari Session Loss#27214
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce Safari session loss by making MSAL token renewal more resilient (earlier renewal window + improved silent/interactive fallback behavior) and by adjusting token-refresh bookkeeping in the shared TokenService.
Changes:
- Increased the pre-expiry buffer used to schedule silent renewal from 1 minute to 5 minutes.
- Updated TokenService refresh error handling/flag clearing (including special-casing the “Frame window timed out” case).
- Updated MSAL renewal to use
forceRefresh, switch toacquireTokenPopup, and fall back to redirect; updated related unit tests.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| openmetadata-ui/src/main/resources/ui/src/utils/AuthProvider.util.ts | Adjusts the global token expiry buffer used to schedule refresh attempts. |
| openmetadata-ui/src/main/resources/ui/src/utils/Auth/TokenService/TokenServiceUtil.ts | Modifies refresh error handling and refresh-in-progress flag lifecycle. |
| openmetadata-ui/src/main/resources/ui/src/components/Auth/AppAuthenticators/MsalAuthenticator.tsx | Changes MSAL renewal strategy (silent refresh + popup/redirect fallback). |
| openmetadata-ui/src/main/resources/ui/src/components/Auth/AppAuthenticators/MsalAuthenticator.test.tsx | Updates tests to match the new MSAL renewal behavior. |
| } finally { | ||
| // If response is not null then clear the refresh flag | ||
| // For Callback based refresh token, response will be void | ||
| response && this.clearRefreshInProgress(); | ||
| this.clearRefreshInProgress(); | ||
| } |
There was a problem hiding this comment.
fetchNewToken() now clears refreshInProgress unconditionally in finally. This breaks callback-based renewal flows where renewToken() intentionally resolves to void and the refresh is completed later via a callback (e.g., OidcAuthenticator uses signinSilent() and clears the flag in handleSilentSignInSuccess). Clearing here can allow concurrent refresh attempts while the silent flow is still in progress. Suggestion: only clear in finally when the renewal returns a non-void response (e.g., response !== undefined), and clear on error paths explicitly (including the "Frame window timed out" case). Also add a unit test covering the void renewal case to prevent regressions.
| } catch (error) { | ||
| // Silent Frame window timeout error since it doesn't affect refresh token process | ||
| if ((error as AxiosError).message !== 'Frame window timed out') { | ||
| // Perform logout for any error | ||
| this.clearRefreshInProgress(); | ||
| this.clearRefreshInProgress(); | ||
|
|
||
| throw new Error( | ||
| `Failed to refresh token: ${(error as Error).message}` | ||
| ); | ||
| // Silent Frame window timeout error since it doesn't affect refresh token process | ||
| if ((error as AxiosError).message === 'Frame window timed out') { | ||
| return null; | ||
| } | ||
| // Do nothing | ||
|
|
There was a problem hiding this comment.
In this catch block, clearRefreshInProgress() is called both here and again in finally, which is redundant and makes the control flow harder to reason about. After adjusting the finally logic (see comment on the finally block), consider removing the duplicate clear from the catch and letting a single, well-defined path handle flag cleanup.
| // Popup blocked or failed — fall back to redirect (works on Safari) | ||
| await instance.acquireTokenRedirect(tokenRequest); | ||
|
|
||
| return msalResponse; | ||
| throw popupError; | ||
| } |
There was a problem hiding this comment.
After falling back from acquireTokenPopup to acquireTokenRedirect, the code rethrows popupError. In the token-renewal path this is likely to surface as an unhandled promise rejection (the refresh timer calls refreshToken() without awaiting/catching) and may also trigger higher-level error handling that treats renewal as a hard failure, even though a redirect-based auth flow has been started. Consider not throwing after initiating the redirect (e.g., return null/void to indicate "redirect in progress") and ensure the refresh-in-progress flag is handled appropriately for this redirect-based flow.
openmetadata-ui/src/main/resources/ui/src/utils/Auth/TokenService/TokenServiceUtil.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
openmetadata-ui/src/main/resources/ui/src/components/Auth/AppAuthenticators/MsalAuthenticator.tsx:107
- The renewal fallback here only tries
acquireTokenPopup(). For Safari/WebKit (and any non-user-gesture renewal), popups can be blocked, which is the core issue described in the PR summary. If the intended fix is anacquireTokenPopup()→acquireTokenRedirect()chain, that redirect fallback is currently missing; add a redirect fallback on popup failure (or use redirect directly for renewal) so renewal can complete without a user gesture.
const fetchIdToken = async (
shouldFallbackToPopup = false
): Promise<OidcUser> => {
const tokenRequest = {
account: account || accounts[0],
scopes: msalLoginRequest.scopes,
forceRefresh: shouldFallbackToPopup,
};
try {
const response = await instance.acquireTokenSilent(tokenRequest);
const msalResponse = await parseMSALResponse(response);
return msalResponse;
} catch (error) {
if (
error instanceof InteractionRequiredAuthError &&
shouldFallbackToPopup
) {
const response = await instance.acquireTokenPopup(tokenRequest);
const msalResponse = await parseMSALResponse(response);
return msalResponse;
} else {
// eslint-disable-next-line no-console
console.error(error);
throw error;
}
}
| // eslint-disable-next-line no-console | ||
| console.warn( | ||
| 'Token persistence timed out, proceeding with callback' | ||
| ); |
There was a problem hiding this comment.
Avoid leaving console.* calls in production code. Even with eslint-disable, these warnings add noise and can leak internal timing behavior; consider using an existing app logger/telemetry hook or remove the log and rely on the boolean return value/metrics instead.
| // eslint-disable-next-line no-console | |
| console.warn( | |
| 'Token persistence timed out, proceeding with callback' | |
| ); | |
| // Continue with the refresh callback even if persistence confirmation times out |
| // Call renewal method according to the provider | ||
| async fetchNewToken() { | ||
| let response: string | AccessTokenResponse | null | void = null; | ||
| if (typeof this.renewToken === 'function') { | ||
| try { | ||
| response = await this.renewToken(); | ||
| } catch (error) { | ||
| // Silent Frame window timeout error since it doesn't affect refresh token process | ||
| if ((error as AxiosError).message !== 'Frame window timed out') { | ||
| // Perform logout for any error | ||
| this.clearRefreshInProgress(); | ||
|
|
||
| throw new Error( | ||
| `Failed to refresh token: ${(error as Error).message}` | ||
| ); | ||
| if ((error as AxiosError).message === 'Frame window timed out') { | ||
| return null; | ||
| } | ||
| // Do nothing | ||
|
|
||
| throw new Error(`Failed to refresh token: ${(error as Error).message}`); | ||
| } finally { | ||
| // If response is not null then clear the refresh flag | ||
| // For Callback based refresh token, response will be void | ||
| response && this.clearRefreshInProgress(); | ||
| this.clearRefreshInProgress(); | ||
| } |
There was a problem hiding this comment.
fetchNewToken() now clears refreshInProgress unconditionally in finally. For callback-based renewals (where renewToken() returns void and the actual token update happens later in the silent-callback), this drops the cross-tab lock while the renewal is still in-flight, which can allow parallel renewals (extra iframes/popups) and defeat the intended de-duping. Consider only clearing in finally when a concrete token/response is returned, and clearing on error paths separately, or introducing a separate "callbackRenewalInFlight" flag that is cleared by the silent-callback handler.
| useEffect(() => { | ||
| const handleVisibilityChange = async () => { | ||
| if (document.visibilityState !== 'visible') { | ||
| return; | ||
| } | ||
| try { | ||
| const token = await getOidcToken(); | ||
| const { isExpired, timeoutExpiry } = extractDetailsFromToken(token); | ||
|
|
||
| // eslint-disable-next-line no-console | ||
| console.debug( | ||
| '[VisibilityHandler] token length:', | ||
| token?.length, | ||
| 'isExpired:', | ||
| isExpired, | ||
| 'timeoutExpiry:', | ||
| timeoutExpiry, | ||
| 'hasTokenService:', | ||
| !!tokenService.current | ||
| ); | ||
|
|
||
| if (isExpired || timeoutExpiry <= 0) { | ||
| tokenService.current?.refreshToken(); | ||
| } else { | ||
| startTokenExpiryTimer(); | ||
| } | ||
| } catch (error) { |
There was a problem hiding this comment.
This visibility-change effect uses startTokenExpiryTimer() but has an empty dependency array, so the handler will capture the initial (potentially stale) version of startTokenExpiryTimer/timeoutId/authConfig. This can lead to timers not being cleared/rescheduled correctly after state updates. Fix by memoizing startTokenExpiryTimer with useCallback and including it (and any other referenced values) in the effect deps, or by moving the needed state into refs.
| // eslint-disable-next-line no-console | ||
| console.debug( | ||
| '[VisibilityHandler] token length:', | ||
| token?.length, | ||
| 'isExpired:', | ||
| isExpired, | ||
| 'timeoutExpiry:', | ||
| timeoutExpiry, | ||
| 'hasTokenService:', | ||
| !!tokenService.current | ||
| ); | ||
|
|
||
| if (isExpired || timeoutExpiry <= 0) { | ||
| tokenService.current?.refreshToken(); | ||
| } else { | ||
| startTokenExpiryTimer(); | ||
| } | ||
| } catch (error) { | ||
| // eslint-disable-next-line no-console | ||
| console.error('[VisibilityHandler] error:', error); |
There was a problem hiding this comment.
Please remove the new console.debug/console.error calls from this auth flow. Even with eslint-disable, the UI guidelines for this repo treat console statements as debug code that should not ship; if you need observability here, route through the app’s logging/toast mechanisms or keep this strictly behind a development-only flag.
| // eslint-disable-next-line no-console | |
| console.debug( | |
| '[VisibilityHandler] token length:', | |
| token?.length, | |
| 'isExpired:', | |
| isExpired, | |
| 'timeoutExpiry:', | |
| timeoutExpiry, | |
| 'hasTokenService:', | |
| !!tokenService.current | |
| ); | |
| if (isExpired || timeoutExpiry <= 0) { | |
| tokenService.current?.refreshToken(); | |
| } else { | |
| startTokenExpiryTimer(); | |
| } | |
| } catch (error) { | |
| // eslint-disable-next-line no-console | |
| console.error('[VisibilityHandler] error:', error); | |
| if (isExpired || timeoutExpiry <= 0) { | |
| tokenService.current?.refreshToken(); | |
| } else { | |
| startTokenExpiryTimer(); | |
| } | |
| } catch (error) { | |
| // No-op: avoid shipping console statements in auth flow. |
| // Capture console messages to verify the handler fires | ||
| const logs: string[] = []; | ||
| page.on('console', (msg) => logs.push(msg.text())); | ||
|
|
||
| // Simulate the tab becoming visible (user returns from idle period). | ||
| // The handler should detect the expired token and call refreshToken(). | ||
| await page.evaluate(() => { | ||
| document.dispatchEvent(new Event('visibilitychange')); | ||
| }); | ||
|
|
||
| // Wait for the async handler to complete | ||
| await page.waitForTimeout(3000); | ||
|
|
||
| // Verify the handler detected the expired token and attempted refresh. | ||
| // The deployed handler logs "[VisibilityHandler] token length: X isExpired: true" | ||
| const handlerDetectedExpiry = logs.some( | ||
| (l) => | ||
| l.includes('[VisibilityHandler]') && l.includes('isExpired: true') | ||
| ); | ||
|
|
||
| expect(handlerDetectedExpiry).toBe(true); | ||
| }); |
There was a problem hiding this comment.
This test asserts behavior by scraping console output ([VisibilityHandler] ... isExpired: true). That makes the test brittle and effectively forces debug logging to remain in production auth code. Prefer asserting observable outcomes instead (e.g., token value in IndexedDB changed, mock OIDC refresh metrics incremented, or a protected API call succeeds after the visibilitychange).
| - "openmetadata-ui/src/main/resources/ui/src/components/Auth/**" | ||
| - "openmetadata-ui/src/main/resources/ui/src/rest/auth-API*" | ||
| - "openmetadata-ui/src/main/resources/ui/src/utils/AuthProvider*" | ||
| - "openmetadata-ui/src/main/resources/ui/src/utils/TokenServiceUtil*" |
There was a problem hiding this comment.
The workflow path filter includes openmetadata-ui/src/main/resources/ui/src/utils/TokenServiceUtil*, but the TokenService util lives under src/utils/Auth/TokenService/TokenServiceUtil.ts. As written, TokenService-only changes may not trigger this workflow. Update the paths filter to match the actual directory (src/utils/Auth/TokenService/** or similar).
| - "openmetadata-ui/src/main/resources/ui/src/utils/TokenServiceUtil*" | |
| - "openmetadata-ui/src/main/resources/ui/src/utils/Auth/TokenService/**" |
| permissions: | ||
| contents: read | ||
| pull-requests: write | ||
|
|
There was a problem hiding this comment.
pull-requests: write is granted at the workflow level, so the main test job also runs with PR write access. For least privilege, consider moving PR write permission to the sso-summary job only (job-level permissions:), and keep the test job as contents: read.
| ); | ||
|
|
||
| if (isExpired || timeoutExpiry <= 0) { | ||
| tokenService.current?.refreshToken(); |
There was a problem hiding this comment.
tokenService.current?.refreshToken() is invoked without await. Since refreshToken() is async and can throw, this can create an unhandled promise rejection (the surrounding try/catch won’t catch it). Await the call (and optionally handle/log the error) to keep failures contained.
| tokenService.current?.refreshToken(); | |
| await tokenService.current?.refreshToken(); |
Code Review ✅ Approved 2 resolved / 2 findingsFixes MSAL token renewal to prevent Safari session loss by removing redundant clearRefreshInProgress calls and ensuring waitForTokenPersistence timeout prevents success callbacks from firing unexpectedly. ✅ 2 resolved✅ Quality: clearRefreshInProgress called redundantly in catch and finally
✅ Edge Case: waitForTokenPersistence silently times out, success callback still fires
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
🟡 Playwright Results — all passed (24 flaky)✅ 3594 passed · ❌ 0 failed · 🟡 24 flaky · ⏭️ 207 skipped
🟡 24 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
* MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * apply lint * MSAL Token Renewal Fix — OIDC fix * wait for token update * fix unit tests * Add SSO playwright tests * Add tests --------- Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com> (cherry picked from commit b2b49db)
* MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * apply lint * MSAL Token Renewal Fix — OIDC fix * wait for token update * fix unit tests * Add SSO playwright tests * Add tests --------- Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com> (cherry picked from commit b2b49db)
* MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * apply lint * MSAL Token Renewal Fix — OIDC fix * wait for token update * fix unit tests * Add SSO playwright tests * Add tests --------- Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
* MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * MSAL Token Renewal Fix — Safari Session Loss * apply lint * MSAL Token Renewal Fix — OIDC fix * wait for token update * fix unit tests * Add SSO playwright tests * Add tests --------- Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>



Summary
acquireTokenSilent was called without forceRefresh, so MSAL returned cached tokens from its internal cache without making a network call. The "renewed" idToken had the same expiry as the
original — the renewal was a no-op. The token expired immediately, triggered a 401, and the user was force-logged out.
On Safari this was worse: when MSAL eventually needed a network call (refresh token expired), it used a hidden iframe to Azure AD. Safari's ITP blocked the third-party cookies, throwing
InteractionRequiredAuthError. The existing loginPopup fallback (added in #27189) was also blocked by Safari since popups require a direct user gesture — a timer-based renewal is not one.
Additionally, TokenServiceUtil.fetchNewToken had a bug where a 'Frame window timed out' error left the refreshInProgress flag permanently set in localStorage, blocking all future renewal
attempts.
Changes
MsalAuthenticator.tsx
TokenServiceUtil.ts
MsalAuthenticator.test.tsx
Playwright Tests added
SSO Authentication E2E Tests
OIDC Discovery
Test Control API
OIDC Login Flow
Silent Token Renewal
Iframe Renewal Failure with Popup Fallback
Multi-Tab Token Renewal
clearRefreshInProgress Recovery
401 Interceptor and Request Retry
WebKit.
Refresh Token Scenarios
Logout Cleanup
Token Expiry Timer