fix: ignore weird transient packages #4176

Merged: themarolt merged 4 commits into main from fix/ignore-weird-transient-packages on Jun 8, 2026

themarolt commented Jun 5, 2026

Summary

Changes

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Performance improvement
  • Chore / dependency update
  • Documentation

JIRA ticket


Note

Low Risk
Query-only data filtering for ingestion; no auth or runtime logic, with a small chance of omitting rare valid names that contain >.

Overview
Filters deps.dev BigQuery ingest queries so rows with package Name containing > are excluded everywhere packages are resolved or exported (packages, versions, advisories, dependent counts, package repos). The same predicate is applied to purl_map joins and, for advisories, to unnested pkg.Name rows.

Incremental packages and versions watermark CTEs now also require Purl IS NOT NULL and the Name NOT LIKE '%>%' filter, matching the main snapshot queries.
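A minimal sketch of how the two predicates could be shared between the snapshot query and the watermark CTE so the two cannot drift apart. The table name, the `SnapshotAt` column, the selected columns, and the `EXCEPT DISTINCT` diff are assumptions for illustration; the actual SQL in `packagesSql.ts` and `versionsSql.ts` is not shown on this page.

```typescript
// The two predicates named in the PR overview; everything else is hypothetical.
const INGEST_PREDICATES: readonly string[] = [
  "Purl IS NOT NULL",
  "Name NOT LIKE '%>%'",
];

// Builds one filtered SELECT for a single snapshot date.
// NOTE: dates are interpolated for brevity; real code should use query parameters.
function snapshotSelect(table: string, snapshotDate: string): string {
  const where = ["DATE(SnapshotAt) = '" + snapshotDate + "'", ...INGEST_PREDICATES];
  return [
    "SELECT System, Name, Purl",
    "FROM `" + table + "`",
    "WHERE " + where.join("\n    AND "),
  ].join("\n  ");
}

// Incremental export: rows in today's snapshot that were absent at the watermark.
// Both CTEs go through snapshotSelect, so they always share the same filters.
function incrementalSql(table: string, today: string, watermark: string): string {
  return [
    "WITH today AS (",
    "  " + snapshotSelect(table, today),
    "), watermark AS (",
    "  " + snapshotSelect(table, watermark),
    ")",
    "SELECT * FROM today",
    "EXCEPT DISTINCT",
    "SELECT * FROM watermark",
  ].join("\n");
}
```

Routing both CTEs through one helper is a design choice of this sketch, not something the PR is known to do; the PR description only states that the watermark CTEs now carry the same two conditions.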

Risk: Legitimate packages whose deps.dev name includes > would no longer be ingested; that pattern is treated as transient/noise data.
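To make the risk concrete, the predicate can be mirrored in memory as below. This is only an illustration: the sample names are made up, and `passesNameFilter` is a hypothetical helper, not code from the PR.

```typescript
// Mirrors `Name NOT LIKE '%>%'` for in-memory rows.
// In SQL a NULL Name makes the predicate NULL, so WHERE drops the row;
// returning false for null reproduces that behaviour.
function passesNameFilter(name: string | null): boolean {
  return name !== null && !name.includes(">");
}

// Made-up sample names; the third one contains `>` and is excluded,
// whether or not it happens to be a legitimate package.
const sampleNames: Array<string | null> = [
  "example-lib",
  "@scope/example",
  "example>=1.0.0",
  null,
];

const keptNames = sampleNames.filter(passesNameFilter);
```

Any name containing `>` is dropped regardless of where the character appears, which is why a rare valid name of that shape would be lost.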

Reviewed by Cursor Bugbot for commit 8b60a76. Bugbot is set up for automated code reviews on this repo.

themarolt added 2 commits June 5, 2026 11:35
Signed-off-by: Uroš Marolt <uros@marolt.me>
Copilot AI review requested due to automatic review settings June 5, 2026 15:52
github-actions Bot commented Jun 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

Copilot AI left a comment

Pull request overview

This PR tightens the deps.dev BigQuery ingest in packages_worker to ignore “weird transient” package rows by filtering out package/version records whose Name contains > (and, for incremental diffs, ensuring watermark comparisons don’t reintroduce excluded rows).

Changes:

  • Added AND Name NOT LIKE '%>%' across deps.dev package/version export queries (full + incremental) and in the purl_map CTEs used by repo/count/advisory-related exports.
  • Added Purl IS NOT NULL to incremental watermark lookups for packages/versions to keep day-over-day diffs consistent with the new inclusion criteria.
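One way to see why both sides of a day-over-day comparison need the same inclusion criteria is the in-memory sketch below. It assumes a set-difference style diff keyed on PURL that tracks both additions and removals, which this page does not confirm; the row shape, the sample data, and the helper names are all hypothetical.

```typescript
type Row = { purl: string | null; name: string };

// Inclusion criteria from the PR: non-null Purl and no `>` in Name.
const included = (r: Row): boolean => r.purl !== null && !r.name.includes(">");

// PURLs present in `a` but not in `b`.
function minus(a: Row[], b: Row[]): string[] {
  const seen = new Set(b.map((r) => r.purl));
  return a.filter((r) => !seen.has(r.purl)).map((r) => String(r.purl));
}

// Made-up snapshots: the watermark day holds one normal row and one noise row.
const watermarkDay: Row[] = [
  { purl: "pkg:npm/example-a", name: "example-a" },
  { purl: "pkg:npm/odd%3Ename", name: "odd>name" },
];
const todayRows: Row[] = [
  ...watermarkDay,
  { purl: "pkg:npm/example-b", name: "example-b" },
];

// Filter applied to today only: the noise row looks like it was removed.
const skewedRemoved = minus(watermarkDay, todayRows.filter(included));

// Filter applied to both sides: no phantom removal, one real addition.
const removed = minus(watermarkDay.filter(included), todayRows.filter(included));
const added = minus(todayRows.filter(included), watermarkDay.filter(included));
```

With mismatched filters the diff reports a change that never happened in the source data; with matching filters only the new package shows up.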

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| services/apps/packages_worker/src/deps-dev/queries/versionsSql.ts | Filters out Name values containing > in both full and incremental (today + watermark) version exports. |
| services/apps/packages_worker/src/deps-dev/queries/packagesSql.ts | Filters out Name values containing > in both full and incremental (today + watermark) package exports. |
| services/apps/packages_worker/src/deps-dev/queries/packageReposSql.ts | Applies the same name filter in the purl_map used for package→repo mapping. |
| services/apps/packages_worker/src/deps-dev/queries/dependentCountsSql.ts | Applies the same name filter in the purl_map used for dependents counting. |
| services/apps/packages_worker/src/deps-dev/queries/advisoriesSql.ts | Applies the name filter in the purl_map used to map advisory packages to PURLs (see comment about still exporting rows via LEFT JOIN). |
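The LEFT JOIN remark on `advisoriesSql.ts` can be illustrated in memory: filtering only the `purl_map` side still lets the advisory row through, just with a null PURL, so the unnested package rows need the filter too. The types, sample data, and helper names below are hypothetical, and the join is a simplified stand-in for the real query.

```typescript
type AdvisoryPkg = { advisoryId: string; name: string };
type PurlRow = { name: string; purl: string };

const hasMarker = (name: string): boolean => name.includes(">");

// LEFT JOIN semantics: every left row survives; unmatched rows get purl = null.
function leftJoin(pkgs: AdvisoryPkg[], purlMap: PurlRow[]) {
  const byName = new Map(purlMap.map((p) => [p.name, p.purl] as const));
  return pkgs.map((p) => ({ ...p, purl: byName.get(p.name) ?? null }));
}

// Made-up rows standing in for unnested advisory packages and purl_map.
const advisoryPkgs: AdvisoryPkg[] = [
  { advisoryId: "ADV-1", name: "example-a" },
  { advisoryId: "ADV-2", name: "odd>name" },
];
const purlMap: PurlRow[] = [
  { name: "example-a", purl: "pkg:npm/example-a" },
  { name: "odd>name", purl: "pkg:npm/odd%3Ename" },
];
const filteredMap = purlMap.filter((p) => !hasMarker(p.name));

// Filter on purl_map only: the noise row is still exported, with purl = null.
const mapOnly = leftJoin(advisoryPkgs, filteredMap);

// Filter on both purl_map and the unnested package rows: the noise row is gone.
const bothSides = leftJoin(
  advisoryPkgs.filter((p) => !hasMarker(p.name)),
  filteredMap,
);
```

This matches the overview's statement that, for advisories, the predicate is also applied to the unnested `pkg.Name` rows.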

Comment thread services/apps/packages_worker/src/deps-dev/queries/advisoriesSql.ts
themarolt added 2 commits June 5, 2026 18:03
Signed-off-by: Uroš Marolt <uros@marolt.me>
themarolt merged commit b831d1c into main on Jun 8, 2026
15 checks passed
themarolt deleted the fix/ignore-weird-transient-packages branch on Jun 8, 2026 06:32

2 participants