Conversation
… that caused failures under load
|
@coderabbitai full review |
There was a problem hiding this comment.
Pull request overview
This PR addresses DB lock contention observed during the vulnerability cron’s SyncHostsSoftware cleanup by changing unused/orphaned software deletion from a single large DELETE to a batched cleanup loop.
Changes:
- Introduce
cleanupUnusedSoftwareto find and delete orphanedsoftwarerows in batches. - Add a
cleanupBatchSizetuning knob for the batched cleanup. - Add a changelog entry for the fix.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| server/datastore/mysql/software.go | Replaces single unbounded orphan-software DELETE with a batched cleanup helper invoked by SyncHostsSoftware. |
| changes/41374-unused-software | Adds release note for the lock contention fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| stmt, args, err := sqlx.In(`DELETE FROM software WHERE id IN (?)`, ids) | ||
| if err != nil { | ||
| return ctxerr.Wrap(ctx, err, "build delete unused software query") | ||
| } | ||
| if _, err := ds.writer(ctx).ExecContext(ctx, stmt, args...); err != nil { | ||
| return ctxerr.Wrap(ctx, err, "delete unused software batch") | ||
| } |
There was a problem hiding this comment.
cleanupUnusedSoftware is now a two-phase SELECT-then-DELETE. That introduces a race window where a concurrent ingestion transaction can insert a host_software row referencing one of the selected software IDs after the SELECT but before the DELETE executes; the DELETE would then remove that software row and cascade-delete the newly inserted host_software row, losing data. To keep the safety property the previous single-statement DELETE had, re-check the orphan conditions at DELETE time (e.g., include the NOT EXISTS/LEFT JOIN predicates in the DELETE statement, or use a batched DELETE with LIMIT in a single SQL statement/derived table) so rows that became referenced are not deleted.
There was a problem hiding this comment.
Not a huge concern. The chance of a host suddenly reporting that exact orphaned software in the millisecond window between SELECT and DELETE is negligible. If it did happen, the next hourly ingestion would simply re-add it. It's self-healing.
✅ Actions performedFull review triggered. |
WalkthroughThis PR addresses database lock contention in the vulnerability cleanup cron by replacing an unbounded DELETE statement with a batched cleanup approach. The Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/datastore/mysql/software.go`:
- Around line 2757-2794: The current cleanupUnusedSoftware function can delete
software rows that are mid-ingestion because preInsertSoftwareInventory writes
to software before linkSoftwareToHost inserts host_software; change
cleanupUnusedSoftware (and its findUnusedSoftwareStmt) to only select candidates
that are older than a safe threshold (e.g., created_at or last_seen is at least
N minutes/hours old) so newly inserted-but-not-yet-linked rows are skipped;
update the query to include a time cutoff (or a processed boolean column) and
ensure the threshold constant (cleanupBatchAge or similar) is used when calling
sqlx.SelectContext, leaving other logic (DELETE IN (?) using ids and
cleanupBatchSize) unchanged and referencing functions/objects:
cleanupUnusedSoftware, preInsertSoftwareInventory, linkSoftwareToHost,
host_software, software_host_counts, and cleanupBatchSize.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d23c7058-a2e5-40a9-a656-f8cf4c1a1701
📒 Files selected for processing (2)
changes/41374-unused-softwareserver/datastore/mysql/software.go
… that caused failures under load (#41375) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #41374 If some of the following don't apply, delete the relevant line. - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. - [x] QA'd all new/changed functionality manually For unreleased bug fixes in a release candidate, one of: - [x] Alerted the release DRI if additional load testing is needed <!-- This is an auto-generated comment: release notes by coderabbit.ai --> * **Bug Fixes** * Resolved database lock contention that occurred during software cleanup operations, which previously caused failures under heavy load. The cleanup process now uses an optimized batched approach for improved reliability and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
… that caused failures under load (#41375) (#41380) Cherry pick. <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #41374 If some of the following don't apply, delete the relevant line. - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. - [x] QA'd all new/changed functionality manually For unreleased bug fixes in a release candidate, one of: - [x] Alerted the release DRI if additional load testing is needed <!-- This is an auto-generated comment: release notes by coderabbit.ai --> * **Bug Fixes** * Resolved database lock contention that occurred during software cleanup operations, which previously caused failures under heavy load. The cleanup process now uses an optimized batched approach for improved reliability and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #41375 +/- ##
==========================================
+ Coverage 66.34% 66.36% +0.02%
==========================================
Files 2477 2477
Lines 198395 198542 +147
Branches 8854 8854
==========================================
+ Hits 131619 131762 +143
+ Misses 54875 54872 -3
- Partials 11901 11908 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Related issue: Resolves #41374
Checklist for submitter
If some of the following don't apply, delete the relevant line.
changes/,orbit/changes/oree/fleetd-chrome/changes.Testing
For unreleased bug fixes in a release candidate, one of:
Summary by CodeRabbit