Skip to content

feat(kb): harden sync engine and add connector audit logging#3697

Merged
waleedlatif1 merged 5 commits intostagingfrom
waleedlatif1/kb-connector-audit-logs
Mar 21, 2026
Merged

feat(kb): harden sync engine and add connector audit logging#3697
waleedlatif1 merged 5 commits intostagingfrom
waleedlatif1/kb-connector-audit-logs

Conversation

@waleedlatif1
Copy link
Collaborator

Summary

  • Stuck sync recovery: Added finally block in executeSync to reset stale syncing status on crash, plus cron-based stale lock detection (2hr TTL) that recovers connectors stuck in syncing and schedules retry in 10min
  • Token refresh mid-sync: OAuth tokens are now refreshed between pagination pages and before deferred content hydration — prevents auth failures during long syncs
  • GitHub deferred content loading: listDocuments now returns lightweight stubs using Git blob SHA as contentHash (no blob fetches). Content is only fetched via getDocument for new/changed documents, eliminating thousands of unnecessary API calls for large repos
  • Network error retry: Added fetch failed, econnreset, econnrefused, etimedout, enetunreach, socket hang up, network error to isRetryableError
  • S3 key length fix: Extracted sanitizeStorageTitle helper that truncates titles to 200 chars, fixing "Your key is too long" errors
  • Audit logging: Added recordAudit calls for connector CRUD, manual sync triggers, document exclude/restore operations, and restoration paths (KB, tables, workflows, files)
  • Loading skeleton lint fixes: Tailwind class ordering fixes from linter

Type of Change

  • Bug fix
  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

- Fix stuck syncing status: added finally block in executeSync + stale lock recovery in cron scheduler (2hr TTL)
- Fix token expiry mid-sync: refresh OAuth token between pagination pages and before deferred content hydration
- GitHub deferred content loading: use Git blob SHA for change detection, only fetch content for new/changed docs
- Add network error keywords to isRetryableError (fetch failed, econnreset, etc.)
- Extract sanitizeStorageTitle helper to fix S3 key length limit issues
- Add audit logging for connector CRUD, sync triggers, document exclude/restore, and resource restoration paths
@cursor
Copy link

cursor bot commented Mar 21, 2026

PR Summary

Medium Risk
Medium risk: changes core connector sync behavior (deferred hydration, token refresh cadence, stale-lock recovery) and adds new DB updates that could affect sync scheduling/state if misconfigured.

Overview
Improves knowledge connector reliability and observability. The sync scheduler now recovers connectors stuck in syncing (marks as error, sets a retry nextSyncAt), and the sync engine adds a finally safety reset plus periodic OAuth token refresh during pagination and deferred hydration.

Reduces GitHub connector API load. listDocuments now returns stub documents with deferred content and uses Git blob SHA-based contentHash, fetching full content only via getDocument when needed; the sync engine adds support for contentDeferred hydration.

Adds audit coverage for KB operations. New AuditAction/AuditResourceType.CONNECTOR events are recorded for connector CRUD, manual sync triggers, connector document exclude/restore, and restore endpoints (knowledge base, table, workflow, workspace file), with tests updated to mock audit logging.

Also hardens storage key generation by sanitizing/truncating document titles, expands retryable network error detection, and includes small Tailwind class-order fixes in loading skeletons.

Written by Cursor Bugbot for commit e0649ad. Configure here.

@vercel
Copy link

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 21, 2026 4:32pm

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR significantly hardens the knowledge base sync engine with crash recovery, mid-sync token refresh, deferred GitHub content loading, network retry improvements, S3 key length fixes, and comprehensive audit logging across connector CRUD, document operations, and resource restore paths.

Key changes:

  • Sync crash recovery: A syncExitedCleanly flag + finally block prevents connectors from being stuck in syncing forever after unexpected termination. A cron-based stale lock detector (2hr TTL) provides a secondary recovery mechanism.
  • GitHub deferred loading: listDocuments now returns lightweight stubs using git tree SHA as contentHash, deferring blob fetches to getDocument only for new/changed files — eliminates thousands of unnecessary API calls for large repos.
  • Token refresh mid-sync: OAuth tokens are refreshed between pagination pages and before each deferred content hydration batch.
  • Network retry coverage: isRetryableError now covers DNS/connection/socket errors in addition to rate-limit responses.
  • S3 key fix: sanitizeStorageTitle truncates titles to 200 characters, fixing "key too long" upload errors.
  • Audit logging: recordAudit calls added for connector created/updated/deleted/synced, document excluded/restored, and all resource restore paths (KB, table, workflow, file).
  • Bug: The early-return guard that skips reconciliation when the source returns 0 documents (line 361–388 of sync-engine.ts) does not set syncExitedCleanly = true. The finally block then overwrites the correctly-written active status with error — connectors will appear broken after every skip-reconciliation sync.

Confidence Score: 4/5

  • Near-safe to merge; one targeted one-liner fix needed in sync-engine.ts before the stale-lock recovery logic is fully correct.
  • The PR addresses several genuine reliability problems and the changes are well-structured. The syncExitedCleanly flag approach resolves the previously-flagged race condition correctly in all paths except the early skip-reconciliation return, which is a simple missing assignment. That omission causes the connector status to be incorrectly reset to error after every successful skip-reconciliation sync — a visible regression in connector health reporting. All other changes (GitHub stub loading, token refresh, retry expansion, audit logging) are clean and correct. Score reflects one concrete fix needed before merge.
  • apps/sim/lib/knowledge/connectors/sync-engine.ts — missing syncExitedCleanly = true in the early-return skip-reconciliation path (line 388).

Important Files Changed

Filename Overview
apps/sim/lib/knowledge/connectors/sync-engine.ts Adds syncExitedCleanly flag for crash recovery, per-page token refresh, deferred content hydration batching, and S3 key length fix. Contains a bug: the early-return path that skips reconciliation (lines 361–388) does not set syncExitedCleanly = true, causing the finally block to corrupt the connector's status to error after a successful skip-reconciliation sync.
apps/sim/connectors/github/github.ts Replaces eager blob fetching with lightweight stubs using git tree SHA as contentHash; content is now fetched lazily via getDocument only for new/changed files. Significantly reduces API calls for large repos. Clean and correct.
apps/sim/app/api/knowledge/connectors/sync/route.ts Adds stale-lock recovery: connectors stuck in syncing for >2 hours are reset to error and rescheduled 10 minutes out. Straightforward and safe.
apps/sim/lib/audit/log.ts Adds new audit action constants for connector CRUD, connector document operations, and restore paths for KB, table, workflow, and file. No logic, additive-only change.
apps/sim/lib/knowledge/documents/utils.ts Extends isRetryableError with network-level keywords (econnreset, etimedout, fetch failed, etc.) and refactors to reuse lowercased message string. Clean improvement.
apps/sim/app/api/workspaces/[id]/files/[fileId]/restore/route.ts Adds FILE_RESTORED audit log. Uses fileId as resourceName (acceptable), but description is generic with no identifier unlike all other restore routes.
apps/sim/app/api/knowledge/[id]/connectors/[connectorId]/route.ts Adds audit logging for connector UPDATE and DELETE. Fetches connectorType for DELETE by extending the select. Well-structured.
apps/sim/connectors/types.ts Adds optional contentDeferred flag to ExternalDocument and updates contentHash doc comment to reflect connector-specific format. Clean type extension.

Sequence Diagram

sequenceDiagram
    participant Cron as Cron Job
    participant SyncRoute as /connectors/sync
    participant Engine as executeSync
    participant GitHub as GitHub API
    participant DB as Database
    participant S3 as S3 Storage

    Cron->>SyncRoute: GET (scheduled)
    SyncRoute->>DB: Reset stale syncing connectors (>2hr TTL)
    SyncRoute->>Engine: dispatchSync(connectorId)

    Engine->>DB: Acquire sync lock (set status=syncing)
    Note over Engine: syncExitedCleanly = false

    loop For each page
        alt OAuth connector
            Engine->>Engine: resolveAccessToken() — refresh token
        end
        Engine->>GitHub: listDocuments() — returns lightweight stubs (contentDeferred=true)
        GitHub-->>Engine: ExternalDocument stubs (contentHash = git-sha:{sha})
    end

    Engine->>DB: Fetch existing docs + excluded list
    Engine->>Engine: Diff stubs vs existing (hash compare)

    loop For each batch of pending ops
        alt Has deferred ops
            Engine->>Engine: resolveAccessToken() — refresh token
            Engine->>GitHub: getDocument() per deferred stub
            GitHub-->>Engine: Full content + blob SHA hash
        end
        Engine->>S3: uploadFile (sanitizeStorageTitle ≤200 chars)
        Engine->>DB: upsert document records
    end

    Engine->>DB: Set status=active, update lastSyncAt
    Note over Engine: syncExitedCleanly = true
    Engine-->>SyncRoute: SyncResult

    Note over Engine: finally block — no-op if syncExitedCleanly
Loading

Comments Outside Diff (1)

  1. apps/sim/lib/knowledge/connectors/sync-engine.ts, line 388 (link)

    P1 Early return doesn't set syncExitedCleanly

    The early-exit guard that skips reconciliation when the source returns 0 documents (lines 361–388) returns without setting syncExitedCleanly = true. The finally block then sees the flag as false and immediately overwrites the active status that was just written a few lines above with error and "Sync terminated unexpectedly".

    Concretely: the db.update at line 369–386 sets status: 'active', the function returns… then finally fires and sets status: 'error'. The connector ends up in an error state after what was actually a successful (skip-reconciliation) sync.

    Needs syncExitedCleanly = true before the return:

          syncExitedCleanly = true
          return result
        }

Last reviewed commit: "fix(kb): address PR ..."

…ion, resourceName

- Replace DB-read finally block with local syncExitedCleanly flag to avoid race condition
- Propagate fullDoc.contentHash during deferred content hydration
- Add resourceName to file restore audit record
@waleedlatif1
Copy link
Collaborator Author

@greptile

@waleedlatif1
Copy link
Collaborator Author

@cursor review

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@waleedlatif1 waleedlatif1 merged commit a64afac into staging Mar 21, 2026
11 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/kb-connector-audit-logs branch March 21, 2026 16:36
@waleedlatif1 waleedlatif1 restored the waleedlatif1/kb-connector-audit-logs branch March 21, 2026 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant