feat: pcc sync worker (CM-1086)#4006

Merged
themarolt merged 27 commits into main from
feat/pcc-sync-CM-1086-CM-1087-CM-1088-CM-1089
Apr 20, 2026

Conversation

@themarolt
Contributor

@themarolt themarolt commented Apr 7, 2026

Note

Medium Risk
Adds a new worker that performs production DB updates to segments/insightsProjects and introduces new job-claiming/filtering semantics in MetadataStore, which could affect Snowflake job processing and cleanup behavior if misconfigured.

Overview
Introduces a new pcc_sync_worker service that schedules daily Snowflake exports/cleanup via Temporal, streams exported Parquet files from S3, and syncs PCC project metadata into CDP by updating matching segments and creating/updating insightsProjects.

Adds DB support for the sync by introducing a nullable segments.maturity column and a new pcc_projects_sync_errors table with deduping indexes for tracking schema issues, hierarchy/slug mismatches, and name conflicts.

Refactors the shared @crowd/snowflake library to centralize MetadataStore, S3Service, and SnowflakeExporter (including buildS3FilenamePrefix), and updates snowflake_connectors to use these exports plus platform-filtered job claiming/cleanup (with a new releaseClaim flow and optional skipped-count cleanup gating).

Reviewed by Cursor Bugbot for commit 15bb77c.

Copilot AI review requested due to automatic review settings April 7, 2026 08:42
Contributor

@github-actions github-actions Bot left a comment

Conventional Commits FTW!

Contributor

Copilot AI left a comment

Pull request overview

Introduces a new Temporal worker to export PCC project hierarchy data from Snowflake to S3 (Parquet) and sync it into CDP (segments + insightsProjects), while refactoring shared Snowflake/S3/metadata components into @crowd/snowflake and updating the existing snowflake_connectors app to consume them.

Changes:

  • Add pcc_sync_worker app with Temporal schedules/workflows, export/cleanup activities, Parquet parsing, and a DB-sync consumer.
  • Move/centralize Snowflake export job metadata + S3/Parquet consumption logic into services/libs/snowflake and update snowflake_connectors to use it.
  • Add DB migration for PCC sync support (segments.maturity + pcc_projects_sync_errors table + dedup index), plus worker Docker/compose setup.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| services/libs/snowflake/src/snowflakeExporter.ts | Fix internal import to avoid self-package import/cycles. |
| services/libs/snowflake/src/s3Service.ts | New S3 download/delete + Parquet row iteration utility. |
| services/libs/snowflake/src/metadataStore.ts | Add platform filtering + named params for export job bookkeeping. |
| services/libs/snowflake/src/index.ts | Export new Snowflake lib surface (metadata store, S3 service, exporter). |
| services/libs/snowflake/package.json | Add S3 + Parquet deps and DB dependency for the library. |
| services/apps/snowflake_connectors/src/consumer/transformerConsumer.ts | Use @crowd/snowflake MetadataStore/S3Service; add enabled-platform filtering. |
| services/apps/snowflake_connectors/src/activities/exportActivity.ts | Switch imports to @crowd/snowflake. |
| services/apps/snowflake_connectors/src/activities/cleanupActivity.ts | Use shared MetadataStore/S3Service and pass enabled platforms to cleanup. |
| services/apps/snowflake_connectors/package.json | Remove direct S3/Parquet deps (now come from @crowd/snowflake). |
| services/apps/pcc_sync_worker/tsconfig.json | New TS config for PCC worker app. |
| services/apps/pcc_sync_worker/src/workflows/index.ts | Workflow exports. |
| services/apps/pcc_sync_worker/src/workflows/exportWorkflow.ts | Temporal workflow to run PCC export activity. |
| services/apps/pcc_sync_worker/src/workflows/cleanupWorkflow.ts | Temporal workflow to run PCC cleanup activity. |
| services/apps/pcc_sync_worker/src/scripts/triggerExport.ts | Manual script to start export workflow. |
| services/apps/pcc_sync_worker/src/scripts/triggerCleanup.ts | Manual script to start cleanup workflow. |
| services/apps/pcc_sync_worker/src/schedules/pccS3Export.ts | Temporal schedule registration for daily PCC export. |
| services/apps/pcc_sync_worker/src/schedules/pccS3Cleanup.ts | Temporal schedule registration for daily PCC cleanup. |
| services/apps/pcc_sync_worker/src/schedules/index.ts | Schedule exports. |
| services/apps/pcc_sync_worker/src/parser/types.ts | Parquet-row + parsed-project types. |
| services/apps/pcc_sync_worker/src/parser/rowParser.ts | Pure PCC row parsing + hierarchy mapping rules. |
| services/apps/pcc_sync_worker/src/parser/index.ts | Parser exports. |
| services/apps/pcc_sync_worker/src/main.ts | ServiceWorker archetype configuration. |
| services/apps/pcc_sync_worker/src/index.ts | Worker entrypoint: init + schedule + start consumer + start Temporal worker. |
| services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts | PCC job polling + Parquet processing + DB sync + error recording. |
| services/apps/pcc_sync_worker/src/config/settings.ts | Re-export Temporal config helpers. |
| services/apps/pcc_sync_worker/src/activities/index.ts | Activity exports. |
| services/apps/pcc_sync_worker/src/activities/exportActivity.ts | Snowflake recursive CTE export into S3 + metadata insert. |
| services/apps/pcc_sync_worker/src/activities/cleanupActivity.ts | Cleanup exported S3 files + mark jobs cleaned + Slack alerting on failures. |
| services/apps/pcc_sync_worker/package.json | PCC worker package manifest + scripts. |
| scripts/services/pcc-sync-worker.yaml | Compose service definitions for PCC worker (prod/dev). |
| scripts/services/docker/Dockerfile.pcc_sync_worker.dockerignore | Docker ignore file for PCC worker build context. |
| scripts/services/docker/Dockerfile.pcc_sync_worker | Multi-stage build for PCC worker. |
| backend/src/database/migrations/V1775312770__pcc-sync-worker-setup.sql | Add segments.maturity + PCC sync errors table + dedup index. |
| backend/src/database/migrations/U1775312770__pcc-sync-worker-setup.sql | Rollback for PCC sync DB changes. |

Comments suppressed due to low confidence (2)

services/libs/snowflake/src/metadataStore.ts:77

  • When platforms is provided as an empty array, this method falls back to no filter and will claim jobs for all platforms. That’s risky if CROWD_SNOWFLAKE_ENABLED_PLATFORMS is accidentally empty/misconfigured. Consider treating an explicit empty platforms list as “match nothing” (return null early, or inject an AND FALSE filter).

services/libs/snowflake/src/metadataStore.ts:125

  • platforms being an empty array currently results in no platform filter, so cleanup can target jobs for all platforms if the enabled-platforms list is empty/misconfigured. Consider returning [] early when platforms is provided but empty (or otherwise ensuring the filter matches nothing).
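
The empty-list concern raised in these two suppressed comments can be illustrated with a minimal in-memory sketch. The names `PlatformScope`, `resolvePlatformScope`, and `claimOldestJob`, as well as the `Job` shape, are hypothetical and are not the PR's `MetadataStore` API; the point is only the distinction between "no filter requested" (`undefined`) and "explicitly empty" (`[]`).

```typescript
// Hypothetical sketch; not the PR's MetadataStore implementation.
type PlatformScope =
  | { kind: 'all' } // caller passed no list: no platform filter
  | { kind: 'none' } // caller passed an empty list: match nothing
  | { kind: 'some'; platforms: string[] }

function resolvePlatformScope(platforms?: string[]): PlatformScope {
  if (platforms === undefined) return { kind: 'all' }
  if (platforms.length === 0) return { kind: 'none' }
  return { kind: 'some', platforms: [...platforms] }
}

interface Job {
  id: number
  platform: string
}

// In-memory stand-in for a claim query; `jobs` is assumed to be ordered oldest first.
function claimOldestJob(jobs: Job[], platforms?: string[]): Job | null {
  const scope = resolvePlatformScope(platforms)
  if (scope.kind === 'none') return null // early return instead of claiming everything
  if (scope.kind === 'all') return jobs[0] ?? null
  const allowed = new Set(scope.platforms)
  return jobs.find((job) => allowed.has(job.platform)) ?? null
}
```

With this shape, a misconfigured (empty) enabled-platforms list claims nothing, while omitting the argument keeps the unfiltered behavior.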

Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
Comment thread services/apps/snowflake_connectors/src/consumer/transformerConsumer.ts Outdated
Comment thread services/libs/snowflake/src/metadataStore.ts Outdated
Comment thread services/apps/pcc_sync_worker/src/parser/rowParser.ts
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
Comment thread services/apps/pcc_sync_worker/src/parser/rowParser.ts
@themarolt themarolt changed the title from "feat/pcc-sync (CM-1086, CM-1087, CM-1088, CM-1089)" to "feat/pcc sync worker (CM-1086, CM-1087, CM-1088, CM-1089)" Apr 14, 2026
@themarolt themarolt changed the title from "feat/pcc sync worker (CM-1086, CM-1087, CM-1088, CM-1089)" to "feat/pcc sync worker (CM-1086)" Apr 14, 2026
@themarolt themarolt changed the title from "feat/pcc sync worker (CM-1086)" to "feat: pcc sync worker (CM-1086)" Apr 14, 2026
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
Signed-off-by: Uroš Marolt <uros@marolt.me>
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
@themarolt themarolt requested review from joanagmaia and mbani01 April 14, 2026 12:07
Comment thread services/apps/pcc_sync_worker/src/activities/exportActivity.ts Outdated
Comment thread services/libs/snowflake/src/metadataStore.ts Outdated
Contributor

@mbani01 mbani01 left a comment

Well done @themarolt 💪 left a couple comments

Contributor

@joanagmaia joanagmaia left a comment

The main thing to update is the Snowflake query, which in turn affects how we detect hierarchy mismatches. The main changes are:

  • Use PROJECTS_SPINE so we can rely on its depth mapping
  • Do not hard-code depth levels up to 5; simply flatten the depth into multiple rows
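
As a rough illustration of the "flatten depth into multiple rows" suggestion, the sketch below groups flat rows back into one chain per project without assuming a maximum depth. The `SpineRow` shape, its field names, and the convention that depth 1 is the project itself are assumptions made for this example; the actual export columns may differ.

```typescript
// Hypothetical row shape: one row per (project, ancestor) pair.
interface SpineRow {
  projectId: string
  ancestorSlug: string
  depth: number // assumed: 1 = the project itself, larger = further up the tree
}

// Group flat rows by project and order each chain by depth.
// Nothing here caps the number of levels.
function groupByProject(rows: SpineRow[]): Map<string, SpineRow[]> {
  const chains = new Map<string, SpineRow[]>()
  for (const row of rows) {
    const chain = chains.get(row.projectId) ?? []
    chain.push(row)
    chains.set(row.projectId, chain)
  }
  chains.forEach((chain) => chain.sort((a, b) => a.depth - b.depth))
  return chains
}
```

Because each level arrives as its own row, a project nested six or more levels deep needs no change to the parser, only more rows.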

Comment thread services/apps/pcc_sync_worker/src/activities/cleanupActivity.ts
Comment thread services/apps/pcc_sync_worker/src/activities/exportActivity.ts
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
Comment thread services/apps/pcc_sync_worker/src/parser/rowParser.ts Outdated
Comment thread services/apps/pcc_sync_worker/src/parser/rowParser.ts Outdated
Comment thread services/apps/pcc_sync_worker/src/parser/rowParser.ts Outdated
Copilot AI review requested due to automatic review settings April 20, 2026 09:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 35 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread services/apps/pcc_sync_worker/src/index.ts Outdated
Comment thread services/apps/pcc_sync_worker/src/parser/rowParser.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
- index.ts: wrap startup IIFE in try/catch to surface init failures cleanly
- rowParser.ts: select leafSlug by hierarchy_level=1 instead of array position
- pccProjectConsumer.ts: write sync errors on the outer connection so they
  survive a tx rollback and preserve diagnostics for failed jobs

Signed-off-by: Uroš Marolt <uros@marolt.me>
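
The rowParser.ts change in the commit message above (selecting the leaf by `hierarchy_level=1` rather than by array position) might look roughly like the following. `HierarchyRow` and `pickLeafSlug` are hypothetical names used for illustration, not the PR's actual types.

```typescript
// Hypothetical row shape for illustration only.
interface HierarchyRow {
  slug: string
  hierarchy_level: number
}

// Pick the leaf by its level marker rather than by position,
// so the result does not depend on the order rows were read in.
function pickLeafSlug(rows: HierarchyRow[]): string | undefined {
  return rows.find((row) => row.hierarchy_level === 1)?.slug
}
```

Returning `undefined` when no level-1 row exists lets the caller decide how to report the missing leaf instead of silently using whichever row came first.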
Copilot AI review requested due to automatic review settings April 20, 2026 11:14
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 35 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)

services/libs/snowflake/src/metadataStore.ts:109

  • filter.clause is interpolated directly into the SQL string. Even though current callers use buildPlatformFilter, exporting PlatformFilter makes it easy for future call sites to accidentally pass unsanitized SQL and introduce injection risk. Consider changing the API to accept platforms: string[] (or filterPlatforms?: string[]) and build the clause internally, or keep PlatformFilter internal/unexported to ensure the SQL fragment can’t come from untrusted input.
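
One way to read this suggestion is sketched below: the caller passes only platform names, and the SQL fragment plus its bound values are produced internally. `buildPlatformClause`, `PlatformClause`, and the `platform` column name are hypothetical, and the `$(name)` placeholder syntax assumes a pg-promise style named-parameter formatter.

```typescript
// Hypothetical helper; the SQL text never contains caller-supplied values.
interface PlatformClause {
  clause: string
  params: Record<string, string>
}

function buildPlatformClause(platforms: string[]): PlatformClause {
  if (platforms.length === 0) {
    // An explicit empty list matches nothing.
    return { clause: 'AND FALSE', params: {} }
  }
  const params: Record<string, string> = {}
  const placeholders = platforms.map((platform, index) => {
    const key = `platform_${index}`
    params[key] = platform // the value travels as a bound parameter
    return `$(${key})` // only a generated placeholder name enters the SQL
  })
  return { clause: `AND platform IN (${placeholders.join(', ')})`, params }
}
```

Since the clause is assembled only from generated placeholder names, a hostile platform string ends up in `params` and never in the SQL text itself.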

Comment thread services/apps/pcc_sync_worker/src/index.ts
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
- index.ts: register shutdown handler before svc.init() so the consumer
  drains before the archetype tears down shared infra (DB, Temporal)
- consumer: make sleep abortable via AbortController so stop() interrupts
  the polling backoff immediately instead of waiting up to 30 min
- consumer: record a SCHEMA_MISMATCH sync error for Parquet rows with
  missing PROJECT_ID instead of dropping them silently

Signed-off-by: Uroš Marolt <uros@marolt.me>
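
The abortable sleep described in the commit message above can be sketched with the standard `AbortController` API. This is a minimal illustration, not the consumer's actual code: `abortableSleep` and `PollingLoop` are hypothetical names, and the real consumer's backoff logic is not shown in this conversation.

```typescript
// Resolves `true` after `ms`, or `false` as soon as the signal aborts.
function abortableSleep(ms: number, signal: AbortSignal): Promise<boolean> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve(false)
      return
    }
    const onAbort = () => {
      clearTimeout(timer) // cancel the pending wake-up
      resolve(false)
    }
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort)
      resolve(true)
    }, ms)
    signal.addEventListener('abort', onAbort, { once: true })
  })
}

// Hypothetical polling loop: stop() interrupts the backoff immediately.
class PollingLoop {
  private readonly controller = new AbortController()
  public iterations = 0

  async run(pollOnce: () => Promise<void>, backoffMs: number): Promise<void> {
    while (!this.controller.signal.aborted) {
      await pollOnce()
      this.iterations++
      await abortableSleep(backoffMs, this.controller.signal)
    }
  }

  stop(): void {
    this.controller.abort()
  }
}
```

Clearing the timer on abort also keeps a long backoff from holding the process open after shutdown has been requested.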
Copilot AI review requested due to automatic review settings April 20, 2026 11:28
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 35 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
joanagmaia
joanagmaia previously approved these changes Apr 20, 2026
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
Copilot AI review requested due to automatic review settings April 20, 2026 12:54
Signed-off-by: Uroš Marolt <uros@marolt.me>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 35 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread services/apps/pcc_sync_worker/src/activities/cleanupActivity.ts
Comment thread services/apps/snowflake_connectors/src/consumer/transformerConsumer.ts Outdated
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts
Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit c6b6350.

Comment thread services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
@themarolt themarolt merged commit 9a40daf into main Apr 20, 2026
15 checks passed
@themarolt themarolt deleted the feat/pcc-sync-CM-1086-CM-1087-CM-1088-CM-1089 branch April 20, 2026 18:05

4 participants