Skip to content

Reduced database contention during the vulnerability cron#41667

Merged
getvictor merged 6 commits intomainfrom
victor-host-software
Mar 14, 2026
Merged

Reduced database contention during the vulnerability cron#41667
getvictor merged 6 commits intomainfrom
victor-host-software

Conversation

@getvictor
Copy link
Copy Markdown
Member

@getvictor getvictor commented Mar 13, 2026

Related issue: Resolves #41664

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.

Testing

  • QA'd all new/changed functionality manually

For unreleased bug fixes in a release candidate, one of:

  • Alerted the release DRI if additional load testing is needed

Summary by CodeRabbit

  • Chores
    • Optimized database performance for vulnerability processing to reduce contention during routine operations.
    • Improved query efficiency for software cleanup processes.

@getvictor
Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 13, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 13, 2026

Walkthrough

This PR reduces database contention during the vulnerability cron by introducing a database index on the host_software.software_id column and modifying the cleanupUnusedSoftware function to use the reader replica for its initial SELECT query. The changes include a new database migration that creates the index, updates to the schema and migration status, and a code modification that switches the first iteration of the cleanup loop to read from the replica instead of the writer, then reverts to the writer for subsequent iterations.

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR addresses only 1 of 3 required changes from #41664: using reader replica for cleanupUnusedSoftware. Missing implementation of 100ms batch sleep and SyncHostsSoftwareTitles ID range batching. Implement the remaining two fixes: add 100ms sleep between batch iterations in three functions and batch SyncHostsSoftwareTitles by ID ranges in 100K chunks.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The PR description is incomplete. It lacks detail on implementation specifics, testing coverage, and several required checklist items are unchecked or missing. Complete the description by detailing the implementation (which of the three objectives were addressed), confirming all relevant checklist items, and specifying what testing was performed to validate the changes.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly summarizes the main change: reducing database contention during vulnerability cron operations.
Out of Scope Changes check ✅ Passed All changes directly support the database contention reduction objective: changelog entry, database index migration, schema updates, and reader usage optimization.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch victor-host-software
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/datastore/mysql/software.go (1)

2800-2817: ⚠️ Potential issue | 🟠 Major

Add the 100ms pause between cleanup batches.

Line 2800 onward still iterates continuously. That misses the contention-relief requirement from issue #41664 and can keep pressure on concurrent host writes.

💡 Proposed patch
 func (ds *Datastore) cleanupUnusedSoftware(ctx context.Context) error {
@@
 	for range cleanupMaxIterations {
 		var ids []uint
 		if err := sqlx.SelectContext(ctx, db, &ids, findUnusedSoftwareStmt, cleanupBatchSize); err != nil {
 			return ctxerr.Wrap(ctx, err, "find unused software for cleanup")
 		}
@@
 		if _, err := ds.writer(ctx).ExecContext(ctx, stmt, args...); err != nil {
 			return ctxerr.Wrap(ctx, err, "delete unused software batch")
 		}
 		db = ds.writer(ctx)
+
+		// Brief pause between batches to reduce lock contention with concurrent writers.
+		select {
+		case <-ctx.Done():
+			return ctxerr.Wrap(ctx, ctx.Err(), "cleanup unused software canceled")
+		case <-time.After(100 * time.Millisecond):
+		}
 	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/datastore/mysql/software.go` around lines 2800 - 2817, The cleanup
loop over cleanupMaxIterations in the function that queries
findUnusedSoftwareStmt and deletes batches using cleanupBatchSize and
ds.writer(ctx) needs a 100ms pause between iterations to reduce contention;
after successfully deleting a non-empty batch (i.e., after the ExecContext call
and before the next loop iteration), insert a context-aware 100ms delay (use
time.After with select on ctx.Done or a context-aware sleep) so the code yields
to concurrent host writes and respects cancellation; ensure you add the time
import if missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@server/datastore/mysql/software.go`:
- Around line 2800-2817: The cleanup loop over cleanupMaxIterations in the
function that queries findUnusedSoftwareStmt and deletes batches using
cleanupBatchSize and ds.writer(ctx) needs a 100ms pause between iterations to
reduce contention; after successfully deleting a non-empty batch (i.e., after
the ExecContext call and before the next loop iteration), insert a context-aware
100ms delay (use time.After with select on ctx.Done or a context-aware sleep) so
the code yields to concurrent host writes and respects cancellation; ensure you
add the time import if missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 67d904d4-1d03-4894-a05a-dead5a510ca4

📥 Commits

Reviewing files that changed from the base of the PR and between 0a00d20 and d507e9e.

📒 Files selected for processing (4)
  • changes/41664-vulnerability-cron-db-contention
  • server/datastore/mysql/migrations/tables/20260313120000_AddHostSoftwareSoftwareIDIndex.go
  • server/datastore/mysql/schema.sql
  • server/datastore/mysql/software.go

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.42%. Comparing base (2733f8a) to head (3c39d38).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...s/20260314120000_AddHostSoftwareSoftwareIDIndex.go 50.00% 3 Missing and 1 partial ⚠️
server/datastore/mysql/software.go 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #41667      +/-   ##
==========================================
+ Coverage   66.38%   66.42%   +0.03%     
==========================================
  Files        2499     2495       -4     
  Lines      199767   199903     +136     
  Branches     9002     8899     -103     
==========================================
+ Hits       132618   132782     +164     
+ Misses      55172    55110      -62     
- Partials    11977    12011      +34     
Flag Coverage Δ
backend 68.21% <54.54%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@getvictor getvictor marked this pull request as ready for review March 14, 2026 12:54
@getvictor getvictor requested a review from a team as a code owner March 14, 2026 12:54
@getvictor getvictor requested a review from lukeheath March 14, 2026 12:54
@getvictor getvictor merged commit 8c81821 into main Mar 14, 2026
48 checks passed
@getvictor getvictor deleted the victor-host-software branch March 14, 2026 14:32
georgekarrv pushed a commit that referenced this pull request Mar 17, 2026
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #41664

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.

- [x] QA'd all new/changed functionality manually

For unreleased bug fixes in a release candidate, one of:

- [x] Alerted the release DRI if additional load testing is needed

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

* **Chores**
* Optimized database performance for vulnerability processing to reduce
contention during routine operations.
  * Improved query efficiency for software cleanup processes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Vulnerability cron causes DB overload due to contention with per-host software writes

4 participants