Skip to content

fix(core): restore Integration status to ENABLED on successful re-auth#574

Merged
d-klotz merged 2 commits intonextfrom
fix/core-restore-integration-status-on-reauth
Apr 16, 2026
Merged

fix(core): restore Integration status to ENABLED on successful re-auth#574
d-klotz merged 2 commits intonextfrom
fix/core-restore-integration-status-on-reauth

Conversation

@d-klotz
Copy link
Copy Markdown
Contributor

@d-klotz d-klotz commented Apr 15, 2026

Summary

When a user re-authorizes an integration whose parent Integration record is in a broken state (ERROR or DISABLED), the framework refreshes the credential correctly but leaves Integration.status untouched. The queue worker short-circuits on DISABLED (backend-utils.js:157,173), so webhooks are silently discarded forever even after the customer completes a new OAuth flow.

This PR walks from the re-authorized entity up to any parent integrations and flips those in {ERROR, DISABLED} back to ENABLED, so customer-driven reconnect actually recovers the integration.

This is Gap C of 3 from an RCA on Attio dead-token loops. Gap A (OAuth2Requester.refreshAuth unconditionally POSTs grant_type=refresh_token even when the stored refresh_token is null) and Gap B (no auto-flip to ERROR on DLGT_INVALID_AUTH) are separate upcoming PRs.

Changes

  • New interface method findIntegrationsByEntityId(entityId) on IntegrationRepositoryInterface.
  • Implementations:
    • Postgres: many-to-many via entities.some.id
    • Mongo: scalar-array via entityIds.has
    • DocumentDB: raw findMany with { entityIds: objectId }
  • Use case ProcessAuthorizationCallback:
    • Accepts optional integrationRepository (backward compatible — omitted injection is a silent no-op).
    • At the end of execute(), calls new restoreIntegrationsForEntity(entityId) helper which looks up integrations referencing the entity and flips any in STATUSES_RESET_ON_REAUTH = ['ERROR', 'DISABLED'] to ENABLED.
    • Wrapped in try/catch so a DB hiccup never fails an otherwise-successful re-auth; logs console.error on failure.
    • Logs [Frigg] Restoring integration X from Y to ENABLED after successful re-auth (entityId=Z) for operator visibility.
  • Wiring: integration-router.js passes the already-created integrationRepository to the ProcessAuthorizationCallback constructor.
  • Test double TestIntegrationRepository extended with matching in-memory findIntegrationsByEntityId.

Semantic decision

Both ERROR and DISABLED are reset on re-auth. Reasoning: the user clicking "Reconnect" is an implicit opt-in to re-enable — whether the integration went bad via a system error or a manual user pause, the action of re-authorizing expresses intent to use it again. PROCESSING (active-run state) and NEEDS_CONFIG (config-incomplete, auth alone doesn't fix) are intentionally excluded.

Test plan

Unit tests (all 5 pass):

  • ERROR integration flips to ENABLED
  • DISABLED integration flips to ENABLED
  • ENABLED integration is a no-op (no unnecessary updates)
  • Empty integrations list (first-time auth) does not throw
  • Mixed statuses: only broken ones flip, count is exact

Manual end-to-end verification on a local install:

  • Integration id=48 set to status=DISABLED + Credential.authIsValid=false
  • Completed full Attio OAuth re-auth flow
  • Confirmed via GET /api/integrations and direct DB query:
    • Integration[48].statusENABLED
    • Credential[48].authIsValidtrue
    • access_token advanced
    • Log line [Frigg] Restoring integration 48 from DISABLED to ENABLED after successful re-auth (entityId=60) present

🤖 Generated with Claude Code

When a user re-authorizes an integration whose parent Integration record
is in a broken state (ERROR or DISABLED), Frigg refreshes the credential
correctly but leaves Integration.status untouched. The queue worker
short-circuits on DISABLED, so webhooks are silently discarded forever
even after the customer completes a new OAuth flow.

This fix walks from the re-authorized entity up to any parent integrations
and flips those in {ERROR, DISABLED} back to ENABLED, so customer-driven
reconnect actually recovers the integration.

Changes:
- Add findIntegrationsByEntityId to IntegrationRepositoryInterface,
  implemented in Postgres (many-to-many via entities.some), Mongo
  (entityIds.has scalar array), and DocumentDB (raw findMany).
- Inject integrationRepository into ProcessAuthorizationCallback; run
  the restoration at the end of execute(), after credential + entity
  are persisted.
- Status reset is best-effort (try/catch + console.error on failure)
  so a DB hiccup never fails an otherwise-successful re-auth.
- Log each status flip for operator visibility.
- Extend TestIntegrationRepository double with matching in-memory method.
- 5 unit tests covering ERROR flip, DISABLED flip, ENABLED no-op, empty
  list (first-time auth), and mixed statuses.

This is Gap C of 3 from an RCA on Attio dead-token loops. Gaps A
(unconditional refresh POST when refresh_token is null) and B (no
auto-flip to ERROR on DLGT_INVALID_AUTH) are separate upcoming PRs.

Verified end-to-end against a live Attio integration: status flipped
DISABLED → ENABLED on re-auth with log line
"[Frigg] Restoring integration 48 from DISABLED to ENABLED after
successful re-auth (entityId=60)".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@d-klotz d-klotz added release Create a release when this pr is merged prerelease This change is available in a prerelease. labels Apr 15, 2026
The comment was referencing a Quo-internal RCA. Removed for upstream clarity
without changing semantics — the rule itself and its rationale remain in the
comment block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@d-klotz
Copy link
Copy Markdown
Contributor Author

d-klotz commented Apr 15, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
D Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

d-klotz added a commit that referenced this pull request Apr 15, 2026
…sible

For OAuth providers that never issue refresh tokens (canonical case: Attio),
Frigg's 401 retry path today makes a guaranteed HTTP 400 on every retry
attempt because the token endpoint explicitly rejects grant_type=refresh_token.
In production this produces the primary error-log volume on the Attio queue
worker.

The `isRefreshable` flag is already the gate for the 401 retry path at
requester.js:74 — when false, the retry short-circuits straight to
notify(DLGT_INVALID_AUTH) with no doomed POST. The bug is that the default
is unconditionally `true`, ignoring whether a refresh_token was ever issued.

This makes `isRefreshable` honest: `true` means "I can refresh," `false`
means "I can't — go straight to invalidation." Derivation is consistent at
every point where `refresh_token` state is set.

Changes:
- Constructor (oauth-2.js): narrow `isRefreshable` to false when the grant
  type is non-`client_credentials` and no `refresh_token` is present.
  Subclass opt-outs via `this.isRefreshable = false` after `super(params)`
  continue to work — the narrowing runs before the subclass body.
- setTokens (oauth-2.js): re-derive `isRefreshable` from the current state
  after any token-response update. This correctly WIDENS from false → true
  on first-time auth for refresh-supporting providers (Pipedrive, Zoho)
  whose initial auth-code response carries a refresh_token. Idempotent on
  subsequent refreshes.
- refreshAuth (oauth-2.js): defense-in-depth guard for direct callers.
  When `isRefreshable` is false, short-circuit to
  `notify(DLGT_INVALID_AUTH)` without invoking the token endpoint.

Unit tests (oauth-2.test.js):
- 5 tests for constructor narrowing across all grant types, including
  subclass-override-wins regression
- 4 tests for setTokens re-derivation covering widen / preserve / narrow /
  client_credentials-passthrough
- 1 test for refreshAuth guard
- 1 integration test through _get: OAuth2Requester with no refresh_token
  receives 401, verifies refreshAuth/refreshAccessToken are never called,
  fetch invoked exactly once, DLGT_INVALID_AUTH fired once
- Updated 1 pre-existing test to supply refresh_token explicitly, since
  "default isRefreshable=true" now requires a refresh_token

This is Gap A of three framework-level fixes for the Attio dead-token RCA.
Gap C (restore Integration.status on successful re-auth) landed in #574
— this PR reuses no Gap-C code but was stacked on that branch during
local testing. Gap B (auto-flip Integration.status to DISABLED on
DLGT_INVALID_AUTH) is a separate upcoming PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d-klotz added a commit that referenced this pull request Apr 16, 2026
When OAuth2Requester.refreshAuth fails and Module.markCredentialsInvalid
flips Credential.authIsValid to false, Integration.status was left as
ENABLED. The queue worker only short-circuits on DISABLED, so every
subsequent webhook kept getting processed, failed again, and re-entered
the loop — producing 94k+ error lines/day for Attio integrations with
dead tokens.

This fix uses Frigg's existing Delegate pattern to propagate the failure
upward:

1. Module fires a new CREDENTIAL_INVALIDATED event from
   markCredentialsInvalid (best-effort, try/catch so it never alters
   refreshAuth's documented `return false` contract).

2. IntegrationBase.receiveNotification catches the event and calls
   updateIntegrationStatus.execute(this.id, 'DISABLED').

3. The delegate wire (module.delegate = this) is installed in
   IntegrationBase._appendModules, which covers all 7 code paths that
   construct Integration instances (HTTP reads, queue workers,
   create/update/delete flows).

Recovery is handled by the Gap C fix (PR #574):
ProcessAuthorizationCallback.restoreIntegrationsForEntity flips
{ERROR, DISABLED} back to ENABLED on successful re-auth.

This is Gap B of 3 from the Attio dead-token RCA. Gap A (refresh
short-circuit when refresh_token is null) is PR #575.

6 unit tests covering: delegate propagation, null-delegate backward
compat, status flip, unknown-delegate no-op, missing-id guard, and
module-delegate wiring at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d-klotz added a commit that referenced this pull request Apr 16, 2026
…ker discard

Per Frigg team lead feedback: DISABLED is reserved for user-intent
("turn off integration, keep settings"). System-driven credential
failures should use ERROR to preserve the semantic distinction.

Changes:
- IntegrationBase.receiveNotification: DISABLED -> ERROR
- backend-utils.js queue worker: extend status discard check at
  both lines 157 and 173 from `=== 'DISABLED'` to
  `['DISABLED', 'ERROR'].includes(status)` so ERROR integrations
  are also short-circuited (no wasted webhook processing)
- Updated log messages to show actual status dynamically
- Added test for ERROR discard in integration-defined-workers

Gap C (PR #574) already resets both {ERROR, DISABLED} on re-auth
via STATUSES_RESET_ON_REAUTH — no change needed there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@d-klotz d-klotz merged commit 55a8027 into next Apr 16, 2026
5 of 8 checks passed
@d-klotz d-klotz deleted the fix/core-restore-integration-status-on-reauth branch April 16, 2026 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

prerelease This change is available in a prerelease. release Create a release when this pr is merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant