feat: filter by snapshot id initial snapshot for member/organizations #4018
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title.
Pull request overview
Updates the Tinybird “initial snapshot” copy pipes for CDP member/organization segment aggregates to operate on a single deduplicated snapshot, aligning them with the lambda architecture convention of querying only the latest snapshotId.
Changes:
- Filter organization segment aggregate initial snapshot to `max(snapshotId)`.
- Filter member segment aggregate initial snapshot to `max(snapshotId)`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| services/libs/tinybird/pipes/cdp_organization_segment_aggregates_initial_snapshot.pipe | Restricts initial aggregate build to the latest snapshot to avoid cross-snapshot double counting. |
| services/libs/tinybird/pipes/cdp_member_segment_aggregates_initial_snapshot.pipe | Restricts initial aggregate build to the latest snapshot to avoid cross-snapshot double counting. |
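
The double counting described above can be illustrated with a minimal sketch. It is not the pipe's actual SQL: it uses Python's built-in `sqlite3` as a stand-in for Tinybird/ClickHouse, and the table name `activity_snapshots` and the columns `memberId`, `segmentId`, and `activityCount` are hypothetical. Only the `snapshotId = (SELECT max(snapshotId) ...)` filter is taken from this PR.

```python
import sqlite3

# In-memory database standing in for the snapshot datasource (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_snapshots ("
    "snapshotId INTEGER, memberId TEXT, segmentId TEXT, activityCount INTEGER)"
)

# Two snapshots of the same members: snapshot 2 is a full, newer copy of the
# data, not an increment on top of snapshot 1.
conn.executemany(
    "INSERT INTO activity_snapshots VALUES (?, ?, ?, ?)",
    [
        (1, "m1", "s1", 3),
        (1, "m2", "s1", 5),
        (2, "m1", "s1", 4),
        (2, "m2", "s1", 5),
    ],
)

# Without a snapshot filter, rows from both snapshots are summed together.
UNFILTERED = """
SELECT memberId, segmentId, sum(activityCount)
FROM activity_snapshots
GROUP BY memberId, segmentId
ORDER BY memberId
"""

# With the filter from this PR, only the latest snapshot contributes.
LATEST_ONLY = """
SELECT memberId, segmentId, sum(activityCount)
FROM activity_snapshots
WHERE snapshotId = (SELECT max(snapshotId) FROM activity_snapshots)
GROUP BY memberId, segmentId
ORDER BY memberId
"""

unfiltered = conn.execute(UNFILTERED).fetchall()
latest_only = conn.execute(LATEST_ONLY).fetchall()

print(unfiltered)   # totals mix snapshot 1 and snapshot 2
print(latest_only)  # totals come from snapshot 2 only
```

In this toy data the unfiltered totals are inflated because each member appears once per snapshot, while the filtered query returns one consistent view per member.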
Note
Medium Risk
Changes the source dataset selection for initial aggregate backfills by filtering to the max `snapshotId`, which can materially change computed results if snapshots are incomplete or delayed.
Overview
Initial backfill pipes for member and organization segment aggregates now only aggregate rows from the latest snapshot by adding `WHERE snapshotId = (SELECT max(snapshotId) ...)` to both `cdp_member_segment_aggregates_initial_snapshot.pipe` and `cdp_organization_segment_aggregates_initial_snapshot.pipe`.
This ensures the on-demand `COPY_MODE replace` outputs are built from a single consistent snapshot rather than mixing historical snapshot data.
Reviewed by Cursor Bugbot for commit 4ef0a6c. Bugbot is set up for automated code reviews on this repo.