Optimize software/versions queries.#35670
Conversation
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
WalkthroughAdds a new MySQL migration to replace an index on Changes
Sequence DiagramsequenceDiagram
participant Caller
participant selectSoftwareSQL
participant canUseOptimizedListQuery
participant buildOptimizedListSoftwareSQL
participant countSoftwareDB
participant DB as Database
Caller->>selectSoftwareSQL: Request software list (filters, order, page)
selectSoftwareSQL->>canUseOptimizedListQuery: Check eligibility
alt optimized path
canUseOptimizedListQuery-->>selectSoftwareSQL: true
selectSoftwareSQL->>buildOptimizedListSoftwareSQL: Build inner/outer index-backed SQL
buildOptimizedListSoftwareSQL->>DB: Execute optimized query (software_host_counts index)
DB-->>buildOptimizedListSoftwareSQL: Rows (IDs, counts)
buildOptimizedListSoftwareSQL->>DB: Fetch full software rows by ID (join)
DB-->>buildOptimizedListSoftwareSQL: Full software rows
buildOptimizedListSoftwareSQL-->>selectSoftwareSQL: Paginated results
else fallback path
canUseOptimizedListQuery-->>selectSoftwareSQL: false
selectSoftwareSQL->>DB: Execute standard goqu-built query
DB-->>selectSoftwareSQL: Results
end
selectSoftwareSQL-->>Caller: Return list
Caller->>countSoftwareDB: Request count
alt use optimized count
countSoftwareDB->>DB: COUNT from software_host_counts (with filters)
else standard count
countSoftwareDB->>DB: Standard count query (possibly join)
end
DB-->>countSoftwareDB: Count
countSoftwareDB-->>Caller: Return count
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.go(1 hunks)server/datastore/mysql/schema.sql(2 hunks)server/datastore/mysql/software.go(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
⚙️ CodeRabbit configuration file
When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.
Files:
server/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.goserver/datastore/mysql/software.go
🧠 Learnings (3)
📚 Learning: 2025-07-08T16:13:39.114Z
Learnt from: getvictor
Repo: fleetdm/fleet PR: 30589
File: server/datastore/mysql/migrations/tables/20250707095725_HostIdentitySCEPCertificates.go:53-55
Timestamp: 2025-07-08T16:13:39.114Z
Learning: In the Fleet codebase, Down migration functions are intentionally left empty/no-op. The team does not implement rollback functionality for database migrations, so empty Down_* functions in migration files are correct and should not be flagged as issues.
Applied to files:
server/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.go
📚 Learning: 2025-08-01T15:08:16.858Z
Learnt from: sgress454
Repo: fleetdm/fleet PR: 31508
File: server/datastore/mysql/schema.sql:102-116
Timestamp: 2025-08-01T15:08:16.858Z
Learning: The schema.sql file in server/datastore/mysql/ is auto-generated from migrations for use with tests, so it cannot be manually edited. Any changes must be made through migrations.
Applied to files:
server/datastore/mysql/schema.sql
📚 Learning: 2025-09-12T13:04:23.777Z
Learnt from: getvictor
Repo: fleetdm/fleet PR: 32823
File: server/datastore/mysql/software.go:4457-4471
Timestamp: 2025-09-12T13:04:23.777Z
Learning: In the Fleet codebase, the `status` column in the `host_software_installs` table is defined as `GENERATED ALWAYS`, meaning it's automatically computed by the database based on other field values (like exit codes) and should not be explicitly included in INSERT statements.
Applied to files:
server/datastore/mysql/schema.sql
🔇 Additional comments (4)
server/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.go (3)
8-10: LGTM!Migration registration follows the standard pattern.
12-36: Well-structured index optimization!The migration properly optimizes the
software_host_countsindex by:
- Adding
global_statsto enable efficient filtering by team_id + global_stats combination- Using
DESConhosts_countto read rows in sorted order without additional sorting- Including
software_idat the end for uniquenessThe comments clearly explain the optimization benefits, and error handling is appropriate for both DROP and CREATE operations.
38-40: No-op Down migration is correct.Empty Down migration function is intentional and follows the team's policy of not implementing rollback functionality for database migrations. Based on learnings.
server/datastore/mysql/schema.sql (1)
1684-1686: Schema changes correctly reflect the migration.The schema.sql updates are consistent with the migration file:
- AUTO_INCREMENT properly incremented to 445
- New migration entry added for 20251112120000
- Index definition
idx_software_host_counts_team_global_hosts_descmatches the CREATE INDEX statement in the migrationNote: schema.sql is auto-generated from migrations, so these changes are expected. Based on learnings.
Also applies to: 2527-2527
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #35670 +/- ##
==========================================
+ Coverage 66.26% 66.28% +0.01%
==========================================
Files 2112 2113 +1
Lines 179579 179771 +192
Branches 7545 7546 +1
==========================================
+ Hits 119004 119167 +163
- Misses 49672 49698 +26
- Partials 10903 10906 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
server/datastore/mysql/software.go (1)
1337-1529: Well-scoped fast path; consider adding a deterministic tie-breakerThe fast-path eligibility (
canUseOptimizedListQuery) is nicely constrained to the intended “global list by hosts_count” cases: no HostID, no vulnerability filters, no search, and an effective order key ofhosts_countonly, with a guard against multi-column sorts. That aligns with the covering index onsoftware_host_countsand keeps more complex combinations on the original goqu path.The two-stage query in
buildOptimizedListSoftwareSQL(inner covering-index scan + outer joins) also looks correct and preserves team/global_stats scoping and optional CVE score joins for the selected page only.One optional improvement: pagination over large datasets is more robust if ties in
hosts_counthave a deterministic secondary ordering. Right now the SQL only orders by hosts_count, so rows with equal counts can be returned in an undefined order. You could make this stable (and still index-friendly) by ordering bysoftware_idas a secondary key in both inner and outer queries:- innerSQL += " ORDER BY shc.hosts_count " + direction + innerSQL += " ORDER BY shc.hosts_count " + direction + ", shc.software_id ASC" ... - outerSQL += " ORDER BY top.hosts_count " + direction + outerSQL += " ORDER BY top.hosts_count " + direction + ", top.software_id ASC"This keeps pagination consistent across pages when multiple software entries share the same
hosts_countwhile still leveraging the(team_id, global_stats, hosts_count DESC, software_id)index.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.go(1 hunks)server/datastore/mysql/schema.sql(2 hunks)server/datastore/mysql/software.go(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
⚙️ CodeRabbit configuration file
When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.
Files:
server/datastore/mysql/software.goserver/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.go
🧠 Learnings (3)
📚 Learning: 2025-07-08T16:13:39.114Z
Learnt from: getvictor
Repo: fleetdm/fleet PR: 30589
File: server/datastore/mysql/migrations/tables/20250707095725_HostIdentitySCEPCertificates.go:53-55
Timestamp: 2025-07-08T16:13:39.114Z
Learning: In the Fleet codebase, Down migration functions are intentionally left empty/no-op. The team does not implement rollback functionality for database migrations, so empty Down_* functions in migration files are correct and should not be flagged as issues.
Applied to files:
server/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.go
📚 Learning: 2025-08-01T15:08:16.858Z
Learnt from: sgress454
Repo: fleetdm/fleet PR: 31508
File: server/datastore/mysql/schema.sql:102-116
Timestamp: 2025-08-01T15:08:16.858Z
Learning: The schema.sql file in server/datastore/mysql/ is auto-generated from migrations for use with tests, so it cannot be manually edited. Any changes must be made through migrations.
Applied to files:
server/datastore/mysql/schema.sql
📚 Learning: 2025-09-12T13:04:23.777Z
Learnt from: getvictor
Repo: fleetdm/fleet PR: 32823
File: server/datastore/mysql/software.go:4457-4471
Timestamp: 2025-09-12T13:04:23.777Z
Learning: In the Fleet codebase, the `status` column in the `host_software_installs` table is defined as `GENERATED ALWAYS`, meaning it's automatically computed by the database based on other field values (like exit codes) and should not be explicitly included in INSERT statements.
Applied to files:
server/datastore/mysql/schema.sql
🔇 Additional comments (5)
server/datastore/mysql/schema.sql (2)
1678-1686: migration_status_tables entry and AUTO_INCREMENT look consistentAUTO_INCREMENT=445 and the appended row with id=444/version_id=20251112120000 match the new migration. Just ensure schema.sql was regenerated from migrations rather than edited directly. Based on learnings
2518-2528: New software_host_counts index matches migration and optimization goalThe added index
idx_software_host_counts_team_global_hosts_desc (team_id, global_stats, hosts_count DESC, software_id)aligns with the new migration and should support fast, ordered scans for the optimized software-versions queries. No issues spotted here.server/datastore/mysql/migrations/tables/20251112120000_OptimizeSoftwareHostCountsIndex.go (1)
8-40: Index migration is correct and aligned with schema/sqlUp drops the old
idx_software_host_counts_team_id_hosts_count_software_idindex and createsidx_software_host_counts_team_global_hosts_desc (team_id, global_stats, hosts_count DESC, software_id), matching the schema and the intended optimized query path. A no-op Down is consistent with Fleet’s migration pattern.server/datastore/mysql/software.go (2)
1532-1539: Fast-path routing from selectSoftwareSQL preserves fallback behaviorThe early check in
selectSoftwareSQLto delegate tobuildOptimizedListSoftwareSQLonly whencanUseOptimizedListQuery(opts)is true cleanly separates the optimized host-count listing from all other cases. BecausecanUseOptimizedListQueryrequiresHostID == nilandorderKey == "hosts_count"(plus no CVE filters or search), host-specific queries, non–hosts_count orderings, and CVE-driven scenarios all continue to use the original goqu-based builder unchanged. This keeps the public API behavior intact while enabling the optimized path where it’s safe.
1794-1897: Optimized countSoftwareDB correctly mirrors list filters while avoiding the heavy subqueryThe new
countSoftwareDBlogic forHostID == nillooks sound:
- It always scopes to
software_host_countswithshc.hosts_count > 0and the sameteam_id/global_statscombinations as the list query, so counts stay aligned with what the list returns.- Conditional joins to
software,software_cve, andcve_metabased onMatchQuery,VulnerableOnly, and CVE meta filters ensure you only pay for the joins when needed, whileCOUNT(DISTINCT shc.software_id)protects against duplication when those joins are present.- The host-specific path correctly falls back to the existing
selectSoftwareSQL-based subquery count, with pagination options stripped fromListOptionsso you always count the full result set.Overall this should significantly reduce COUNT query cost without changing the semantics of existing filters or scoping.
…-versions # Conflicts: # server/datastore/mysql/schema.sql
mostlikelee
left a comment
There was a problem hiding this comment.
Great optimization. I'm hoping we get to discuss this datastore API in the near future, the surface area seems too large.
…-versions # Conflicts: # server/datastore/mysql/schema.sql
<!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34677 and Resolves #35349 Loadtest results: ```text Description Average Worst Results ----------- ------- ----- ------- Page 0, DESC order 441ms 506ms 20 items Page 0, ASC order 1.099s 1.8s 20 items Page 1000, DESC order 484ms 641ms 20 items 100 per_page 426ms 450ms 100 items With CVE scores 467ms 630ms 20 items Order by name, page 0 7.589s 7.812s 20 items Order by name, page 1000 9.103s 9.656s 20 items Vulnerable only 6.098s 6.34s 20 items Search 'chrome' 14.305s 14.868s 20 items Known exploit filter 20.253s 21.238s 20 items Min CVSS score 7.0 33.743s 35.169s 20 items Max CVSS score 8.0 39.825s 41.83s 20 items CVSS range 7.0-9.0 42.556s 43.267s 20 items ``` Follow-up issue: #35799 # Checklist for submitter - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. ## Testing - [x] Added/updated automated tests - [x] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one hosts's records do not affect another) - [x] QA'd all new/changed functionality manually ## Database migrations - [x] Checked schema for all modified table for columns that will auto-update timestamps during migration. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Improved software listing and counting performance via database index and query optimizations, resulting in faster retrieval and reduced load times for software lists across team and global views. * **Chores** * Added a migration to apply the index changes and updated migration tracking. <!-- end of auto-generated comment: release notes by coderabbit.ai --> (cherry picked from commit 3724166)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34677 and Resolves #35349 Loadtest results: ```text Description Average Worst Results ----------- ------- ----- ------- Page 0, DESC order 441ms 506ms 20 items Page 0, ASC order 1.099s 1.8s 20 items Page 1000, DESC order 484ms 641ms 20 items 100 per_page 426ms 450ms 100 items With CVE scores 467ms 630ms 20 items Order by name, page 0 7.589s 7.812s 20 items Order by name, page 1000 9.103s 9.656s 20 items Vulnerable only 6.098s 6.34s 20 items Search 'chrome' 14.305s 14.868s 20 items Known exploit filter 20.253s 21.238s 20 items Min CVSS score 7.0 33.743s 35.169s 20 items Max CVSS score 8.0 39.825s 41.83s 20 items CVSS range 7.0-9.0 42.556s 43.267s 20 items ``` Follow-up issue: #35799 # Checklist for submitter - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. ## Testing - [x] Added/updated automated tests - [x] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one hosts's records do not affect another) - [x] QA'd all new/changed functionality manually ## Database migrations - [x] Checked schema for all modified table for columns that will auto-update timestamps during migration. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Improved software listing and counting performance via database index and query optimizations, resulting in faster retrieval and reduced load times for software lists across team and global views. * **Chores** * Added a migration to apply the index changes and updated migration tracking. <!-- end of auto-generated comment: release notes by coderabbit.ai --> (cherry picked from commit 3724166)
Related issue: Resolves #34677 and Resolves #35349
Loadtest results:
Follow-up issue: #35799
Checklist for submitter
changes/,orbit/changes/oree/fleetd-chrome/changes.See Changes files for more information.
Testing
Added/updated automated tests
Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one hosts's records do not affect another)
QA'd all new/changed functionality manually
Database migrations
Summary by CodeRabbit