Signed-off-by: Uroš Marolt <uros@marolt.me>
Pull request overview
Introduces a new Temporal worker to export PCC project hierarchy data from Snowflake to S3 (Parquet) and sync it into CDP (segments + insightsProjects), while refactoring shared Snowflake/S3/metadata components into @crowd/snowflake and updating the existing snowflake_connectors app to consume them.
Changes:
- Add `pcc_sync_worker` app with Temporal schedules/workflows, export/cleanup activities, Parquet parsing, and a DB-sync consumer.
- Move/centralize Snowflake export job metadata + S3/Parquet consumption logic into `services/libs/snowflake` and update snowflake_connectors to use it.
- Add DB migration for PCC sync support (segments.maturity + `pcc_projects_sync_errors` table + dedup index), plus worker Docker/compose setup.
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| services/libs/snowflake/src/snowflakeExporter.ts | Fix internal import to avoid self-package import/cycles. |
| services/libs/snowflake/src/s3Service.ts | New S3 download/delete + Parquet row iteration utility. |
| services/libs/snowflake/src/metadataStore.ts | Add platform filtering + named params for export job bookkeeping. |
| services/libs/snowflake/src/index.ts | Export new Snowflake lib surface (metadata store, S3 service, exporter). |
| services/libs/snowflake/package.json | Add S3 + Parquet deps and DB dependency for the library. |
| services/apps/snowflake_connectors/src/consumer/transformerConsumer.ts | Use @crowd/snowflake MetadataStore/S3Service; add enabled-platform filtering. |
| services/apps/snowflake_connectors/src/activities/exportActivity.ts | Switch imports to @crowd/snowflake. |
| services/apps/snowflake_connectors/src/activities/cleanupActivity.ts | Use shared MetadataStore/S3Service and pass enabled platforms to cleanup. |
| services/apps/snowflake_connectors/package.json | Remove direct S3/Parquet deps (now come from @crowd/snowflake). |
| services/apps/pcc_sync_worker/tsconfig.json | New TS config for PCC worker app. |
| services/apps/pcc_sync_worker/src/workflows/index.ts | Workflow exports. |
| services/apps/pcc_sync_worker/src/workflows/exportWorkflow.ts | Temporal workflow to run PCC export activity. |
| services/apps/pcc_sync_worker/src/workflows/cleanupWorkflow.ts | Temporal workflow to run PCC cleanup activity. |
| services/apps/pcc_sync_worker/src/scripts/triggerExport.ts | Manual script to start export workflow. |
| services/apps/pcc_sync_worker/src/scripts/triggerCleanup.ts | Manual script to start cleanup workflow. |
| services/apps/pcc_sync_worker/src/schedules/pccS3Export.ts | Temporal schedule registration for daily PCC export. |
| services/apps/pcc_sync_worker/src/schedules/pccS3Cleanup.ts | Temporal schedule registration for daily PCC cleanup. |
| services/apps/pcc_sync_worker/src/schedules/index.ts | Schedule exports. |
| services/apps/pcc_sync_worker/src/parser/types.ts | Parquet-row + parsed-project types. |
| services/apps/pcc_sync_worker/src/parser/rowParser.ts | Pure PCC row parsing + hierarchy mapping rules. |
| services/apps/pcc_sync_worker/src/parser/index.ts | Parser exports. |
| services/apps/pcc_sync_worker/src/main.ts | ServiceWorker archetype configuration. |
| services/apps/pcc_sync_worker/src/index.ts | Worker entrypoint: init + schedule + start consumer + start Temporal worker. |
| services/apps/pcc_sync_worker/src/consumer/pccProjectConsumer.ts | PCC job polling + Parquet processing + DB sync + error recording. |
| services/apps/pcc_sync_worker/src/config/settings.ts | Re-export Temporal config helpers. |
| services/apps/pcc_sync_worker/src/activities/index.ts | Activity exports. |
| services/apps/pcc_sync_worker/src/activities/exportActivity.ts | Snowflake recursive CTE export into S3 + metadata insert. |
| services/apps/pcc_sync_worker/src/activities/cleanupActivity.ts | Cleanup exported S3 files + mark jobs cleaned + Slack alerting on failures. |
| services/apps/pcc_sync_worker/package.json | PCC worker package manifest + scripts. |
| scripts/services/pcc-sync-worker.yaml | Compose service definitions for PCC worker (prod/dev). |
| scripts/services/docker/Dockerfile.pcc_sync_worker.dockerignore | Docker ignore file for PCC worker build context. |
| scripts/services/docker/Dockerfile.pcc_sync_worker | Multi-stage build for PCC worker. |
| backend/src/database/migrations/V1775312770__pcc-sync-worker-setup.sql | Add segments.maturity + PCC sync errors table + dedup index. |
| backend/src/database/migrations/U1775312770__pcc-sync-worker-setup.sql | Rollback for PCC sync DB changes. |
Comments suppressed due to low confidence (2)
services/libs/snowflake/src/metadataStore.ts:77
- When `platforms` is provided as an empty array, this method falls back to no filter and will claim jobs for all platforms. That’s risky if `CROWD_SNOWFLAKE_ENABLED_PLATFORMS` is accidentally empty/misconfigured. Consider treating an explicit empty `platforms` list as “match nothing” (return null early, or inject an `AND FALSE` filter).
services/libs/snowflake/src/metadataStore.ts:125
- `platforms` being an empty array currently results in no platform filter, so cleanup can target jobs for all platforms if the enabled-platforms list is empty/misconfigured. Consider returning `[]` early when `platforms` is provided but empty (or otherwise ensuring the filter matches nothing).
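A minimal sketch of the three-way distinction these two comments ask for, where "no list given" and "empty list given" behave differently. The `PlatformFilter` shape, the `platform` column name, and the pg-promise-style `$(platforms)` named parameter are assumptions for illustration; the actual code in `metadataStore.ts` may look different.

```typescript
// Hypothetical shape; the real PlatformFilter in metadataStore.ts may differ.
interface PlatformFilter {
  clause: string
  params: Record<string, unknown>
}

// undefined -> caller did not ask for filtering: emit no clause
// []        -> caller asked for "no platforms": match nothing
// non-empty -> restrict to the listed platforms via a named parameter
function buildPlatformFilter(platforms?: string[]): PlatformFilter {
  if (platforms === undefined) {
    return { clause: '', params: {} }
  }
  if (platforms.length === 0) {
    // An explicitly empty list must never widen the query to all platforms.
    return { clause: 'AND FALSE', params: {} }
  }
  return {
    // Assumed column name and parameter syntax; values travel as parameters,
    // never as interpolated SQL text.
    clause: 'AND platform = ANY($(platforms)::text[])',
    params: { platforms },
  }
}
```

With this shape, a misconfigured (empty) enabled-platforms list claims or cleans up nothing instead of everything, and callers that need the fast path can still return early when `platforms.length === 0`.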
mbani01
left a comment
Well done @themarolt 💪 left a couple comments
joanagmaia
left a comment
The main thing to update is the query to Snowflake, which in turn will affect how we detect hierarchy mismatches. The main changes are:
- Use PROJECTS_SPINE to rely on depth mapping
- Do not hard-code the depth level up to 5; simply flatten depth into multiple rows
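To make the "flatten depth into multiple rows" suggestion concrete, here is a small in-memory sketch: each project yields one row per ancestor instead of a fixed set of level columns, so the depth is not capped. The `Project` and `SpineRow` types, and the convention that level 1 is the project itself, are assumptions for illustration and say nothing about the actual PROJECTS_SPINE schema.

```typescript
// Hypothetical input: each project points at its parent (null for a root).
interface Project {
  id: string
  parentId: string | null
}

// Hypothetical output: one row per (project, ancestor) pair.
interface SpineRow {
  projectId: string
  ancestorId: string
  hierarchyLevel: number // assumed convention: 1 = the project itself
}

function flattenHierarchy(projects: Project[]): SpineRow[] {
  const byId = new Map(projects.map((p) => [p.id, p] as const))
  const rows: SpineRow[] = []

  for (const project of projects) {
    const seen = new Set<string>() // guards against cycles in bad data
    let current: Project | undefined = project
    let level = 1

    // Walk up the parent chain, emitting one row per step, however deep it goes.
    while (current && !seen.has(current.id)) {
      seen.add(current.id)
      rows.push({ projectId: project.id, ancestorId: current.id, hierarchyLevel: level })
      current = current.parentId ? byId.get(current.parentId) : undefined
      level += 1
    }
  }
  return rows
}
```

A consumer can then group rows by `projectId` and compare the chain against the stored hierarchy without caring how many levels exist.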
Pull request overview
Copilot reviewed 33 out of 35 changed files in this pull request and generated 2 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
- index.ts: wrap startup IIFE in try/catch to surface init failures cleanly
- rowParser.ts: select leafSlug by hierarchy_level=1 instead of array position
- pccProjectConsumer.ts: write sync errors on the outer connection so they survive a tx rollback and preserve diagnostics for failed jobs

Signed-off-by: Uroš Marolt <uros@marolt.me>
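The `rowParser.ts` change in this commit can be pictured with a short sketch: the leaf is found by its level value, so the result does not depend on the order in which rows arrive. The `HierarchyRow` type and its field names are hypothetical; the real parser types may differ.

```typescript
// Hypothetical row shape for illustration only.
interface HierarchyRow {
  slug: string
  hierarchyLevel: number
}

// Select the leaf by hierarchy level instead of assuming it sits at index 0
// (or at the end) of the array.
function findLeafSlug(rows: HierarchyRow[]): string | undefined {
  return rows.find((row) => row.hierarchyLevel === 1)?.slug
}
```

Returning `undefined` when no level-1 row exists leaves it to the caller to decide whether that is a schema problem worth recording.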
Pull request overview
Copilot reviewed 33 out of 35 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
services/libs/snowflake/src/metadataStore.ts:109
`filter.clause` is interpolated directly into the SQL string. Even though current callers use `buildPlatformFilter`, exporting `PlatformFilter` makes it easy for future call sites to accidentally pass unsanitized SQL and introduce injection risk. Consider changing the API to accept `platforms: string[]` (or `filterPlatforms?: string[]`) and build the clause internally, or keep `PlatformFilter` internal/unexported to ensure the SQL fragment can’t come from untrusted input.
- index.ts: register shutdown handler before svc.init() so the consumer drains before the archetype tears down shared infra (DB, Temporal)
- consumer: make sleep abortable via AbortController so stop() interrupts the polling backoff immediately instead of waiting up to 30 min
- consumer: record a SCHEMA_MISMATCH sync error for Parquet rows with missing PROJECT_ID instead of dropping them silently

Signed-off-by: Uroš Marolt <uros@marolt.me>
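The abortable-sleep change described in this commit follows a common pattern, sketched below under stated assumptions: `abortableSleep` and `PollingConsumer` are illustrative names, not the ones used in the PR, and the polling callback stands in for the real job-claiming logic.

```typescript
// Resolves after `ms`, or as soon as the signal is aborted, whichever comes first.
function abortableSleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve()
      return
    }
    const onAbort = () => {
      clearTimeout(timer) // do not keep the process alive for the remaining delay
      resolve()
    }
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort)
      resolve()
    }, ms)
    signal.addEventListener('abort', onAbort, { once: true })
  })
}

// Hypothetical consumer loop showing how stop() interrupts the backoff.
class PollingConsumer {
  private readonly controller = new AbortController()
  private running = true

  async run(pollOnce: () => Promise<boolean>, backoffMs: number): Promise<void> {
    while (this.running) {
      const didWork = await pollOnce()
      if (!didWork) {
        await abortableSleep(backoffMs, this.controller.signal)
      }
    }
  }

  stop(): void {
    this.running = false
    this.controller.abort() // wakes up a pending abortableSleep immediately
  }
}
```

Resolving (rather than rejecting) on abort keeps the loop simple: the `while` condition sees `running === false` and exits without extra error handling.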
Pull request overview
Copilot reviewed 33 out of 35 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 33 out of 35 changed files in this pull request and generated 4 comments.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit c6b6350.

Note
Medium Risk
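Adds a new worker that performs production DB updates to `segments`/`insightsProjects` and introduces new job-claiming/filtering semantics in `MetadataStore`, which could affect Snowflake job processing and cleanup behavior if misconfigured.

Overview
Introduces a new `pcc_sync_worker` service that schedules daily Snowflake exports/cleanup via Temporal, streams exported Parquet files from S3, and syncs PCC project metadata into CDP by updating matching `segments` and creating/updating `insightsProjects`.

Adds DB support for the sync by introducing a nullable `segments.maturity` column and a new `pcc_projects_sync_errors` table with deduping indexes for tracking schema issues, hierarchy/slug mismatches, and name conflicts.

Refactors the shared `@crowd/snowflake` library to centralize `MetadataStore`, `S3Service`, and `SnowflakeExporter` (including `buildS3FilenamePrefix`), and updates `snowflake_connectors` to use these exports plus platform-filtered job claiming/cleanup (with a new `releaseClaim` flow and optional skipped-count cleanup gating).

Reviewed by Cursor Bugbot for commit 15bb77c. Bugbot is set up for automated code reviews on this repo.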
@crowd/snowflakelibrary to centralizeMetadataStore,S3Service, andSnowflakeExporter(includingbuildS3FilenamePrefix), and updatessnowflake_connectorsto use these exports plus platform-filtered job claiming/cleanup (with a newreleaseClaimflow and optional skipped-count cleanup gating).Reviewed by Cursor Bugbot for commit 15bb77c. Bugbot is set up for automated code reviews on this repo. Configure here.