feat(preprod): Add Datadog metrics for snapshot upload and diff lifecycle#111024

Merged
NicoHinderling merged 5 commits into master from nico/feat/snapshot-analytics-events on Mar 19, 2026

@NicoHinderling

Adds metrics.distribution and metrics.incr instrumentation to the snapshot upload endpoint and compare_snapshots task so the Preprod Health dashboard can track snapshot usage and build quality signals.

Metrics added

On upload (ProjectPreprodSnapshotEndpoint.post):

  • preprod.snapshots.upload.image_count — number of images per upload, tagged has_vcs to distinguish CI builds from standalone uploads
  • preprod.snapshots.upload.duplicate_image_file_names — count of image_file_name collisions within a single manifest (proxy for same screen uploaded under multiple hashes)
  • preprod.snapshots.upload.bundles_per_commit — how many snapshot bundles have been uploaded for the same commit (only emitted when has_vcs=True)
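
As a rough illustration of how the duplicate-name count described above could be derived, here is a minimal sketch. The manifest shape (a dict keyed by image hash whose values may carry an `image_file_name`) and the helper name `count_duplicate_file_names` are assumptions for illustration, not the endpoint's actual code.

```python
from collections import Counter


def count_duplicate_file_names(images: dict) -> int:
    """Count extra occurrences of each image_file_name in one manifest.

    `images` is assumed to map an image hash to a metadata dict.
    Entries without an image_file_name are ignored.
    """
    file_name_counts = Counter(
        v["image_file_name"] for v in images.values() if v.get("image_file_name")
    )
    # A name seen n times contributes n - 1 collisions.
    return sum(c - 1 for c in file_name_counts.values() if c > 1)
```

Under this definition, two hashes sharing one file name count as a single collision, and a manifest with all-unique names yields 0.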

On diff completion (compare_snapshots task):

  • preprod.snapshots.diff.duration_s — time from comparison record creation to diff task completion
  • preprod.snapshots.e2e_duration_s — time from artifact upload to diff completion, mirrors the existing preprod.size_analysis.results_e2e pattern
  • preprod.snapshots.image.avg_size_bytes — average byte size of images actually fetched for pixel diff (excludes added/removed)
  • preprod.snapshots.diff.zero_changes — incremented when a diff completes with no changed, added, or removed images
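
For the average image size, one plausible shape is sketched below. The function name `average_size_bytes` and the choice to return `None` when no images were fetched (so no data point is emitted) are assumptions, not something this PR states.

```python
from typing import Optional, Sequence


def average_size_bytes(fetched_sizes: Sequence[int]) -> Optional[float]:
    """Average byte size of the images fetched for pixel diffing.

    `fetched_sizes` is assumed to hold only images that were actually
    downloaded for comparison; added/removed images are excluded upstream.
    Returns None when nothing was fetched, avoiding a division by zero.
    """
    if not fetched_sizes:
        return None
    return sum(fetched_sizes) / len(fetched_sizes)
```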

All metrics use sample_rate=1.0. No Amplitude analytics events are included — this is Datadog/tracemetrics only, targeting the existing Preprod Health dashboard.

@github-actions bot added the Scope: Backend label Mar 18, 2026
@NicoHinderling marked this pull request as ready for review March 18, 2026 20:39
@NicoHinderling requested a review from a team as a code owner March 18, 2026 20:39
@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

NicoHinderling and others added 4 commits March 18, 2026 15:36
…ycle

Instruments the snapshot upload endpoint and compare_snapshots task with
distribution metrics to enable observability into upload patterns, image
volumes, diff durations, and build quality signals on the Preprod Health
dashboard.

Co-Authored-By: Claude <noreply@anthropic.com>

diff_duration_s was measured from comparison.date_added, which is set
on first attempt creation via get_or_create. On retries the same record
is reused, so the metric would include idle time between attempts.
Now measured from task_start_time captured at the top of the function.

zero_changes was not accounting for renamed_pairs, causing rename-only
diffs to incorrectly increment the zero_changes counter.

Co-Authored-By: Claude <noreply@anthropic.com>
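
To make the retry problem in the commit above concrete, here is a minimal sketch comparing the two reference points. The function name `diff_durations` and the example timestamps are hypothetical; only `date_added` and `task_start_time` come from the commit message.

```python
from datetime import datetime, timezone


def diff_durations(date_added: datetime, task_start_time: datetime, finished_at: datetime):
    """Return (seconds since record creation, seconds since this attempt began)."""
    since_record = (finished_at - date_added).total_seconds()
    since_task_start = (finished_at - task_start_time).total_seconds()
    return since_record, since_task_start


# Hypothetical retry: the record was created at 12:00:00, the retry
# attempt began at 12:05:00 and finished 20 seconds later.
created = datetime(2026, 3, 18, 12, 0, 0, tzinfo=timezone.utc)
retry_start = datetime(2026, 3, 18, 12, 5, 0, tzinfo=timezone.utc)
finished = datetime(2026, 3, 18, 12, 5, 20, tzinfo=timezone.utc)
print(diff_durations(created, retry_start, finished))  # (320.0, 20.0)
```

Measuring from the record's creation time would report 320 seconds here even though the attempt itself ran for 20.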
The PreprodArtifact count query for bundles_per_commit sat unguarded
in the critical path before downstream task dispatch. A DB timeout or
error would prevent create_preprod_snapshot_status_check_task from
ever being dispatched, orphaning the artifact. Wrap in try/except so
a metrics failure cannot block the upload completion flow.

Co-Authored-By: Claude <noreply@anthropic.com>
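
The guard described in the commit above could look roughly like the following. This is a sketch only: `count_bundles`, `emit`, and `dispatch` are hypothetical callables standing in for the PreprodArtifact count query, the metrics call, and the downstream task dispatch.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def emit_bundles_per_commit(
    count_bundles: Callable[[], int], emit: Callable[[str, int], None]
) -> None:
    """Emit the metric, swallowing any failure so callers are never blocked."""
    try:
        emit("preprod.snapshots.upload.bundles_per_commit", count_bundles())
    except Exception:
        # A metrics failure is logged but must not abort the upload flow.
        logger.exception("Failed to emit bundles_per_commit metric")


def complete_upload(
    count_bundles: Callable[[], int],
    emit: Callable[[str, int], None],
    dispatch: Callable[[], None],
) -> None:
    emit_bundles_per_commit(count_bundles, emit)
    # Dispatch always runs, even if the count query raised.
    dispatch()
```

The point of the structure is ordering: the best-effort metric sits inside the try block, and the dispatch sits outside it.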
When all eligible image pairs error (e.g., objectstore outage, oversized
images), changed_count stays 0 and added/removed/renamed are empty, causing
zero_changes to fire incorrectly. Add error_count check so the metric only
increments when the diff genuinely found no differences.

Co-Authored-By: Claude <noreply@anthropic.com>
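
Taken together, the two zero_changes fixes above amount to a predicate along these lines. The function name and parameter types are assumptions; the field names follow the commit messages.

```python
from typing import Sequence


def is_zero_change_diff(
    changed_count: int,
    added: Sequence,
    removed: Sequence,
    renamed_pairs: Sequence,
    error_count: int,
) -> bool:
    """True only when the diff completed cleanly and found no differences."""
    return (
        changed_count == 0
        and not added
        and not removed
        and not renamed_pairs  # rename-only diffs are still changes
        and error_count == 0  # errored pairs say nothing about equality
    )
```

With this check, a diff in which every pair errored, or one that only renamed images, would not increment the counter.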
v["image_file_name"] for v in images.values() if v.get("image_file_name")
)
duplicate_count = sum(c - 1 for c in file_name_counts.values() if c > 1)
metrics.distribution(

Member

How often would we actually get duplicate images? This seems like a strange thing to record metrics on.

Contributor Author

I think Max added that in my requests doc.

Contributor Author

I'll add a comment as a reminder to consider removing this.

Contributor Author

Actually, I'm just going to kill this for now.

@NicoHinderling merged commit 5bf5073 into master Mar 19, 2026
58 checks passed
@NicoHinderling deleted the nico/feat/snapshot-analytics-events branch March 19, 2026 16:24