Osquery perf split common software between hosts #33958

Merged
ksykulev merged 5 commits into main from software-host-split
Oct 8, 2025
Contributor

@ksykulev ksykulev commented Oct 7, 2025

If the total requested common software is 10,000 and there are 200 hosts, then instead of every host having all 10,000 pieces of software, each host gets a 50-item slice of the 10,000 total. This is controlled by common_software_count. If you want each host to have a certain amount of software of its own, use unique_software_count.
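
As a rough illustration of the split described above, here is a minimal Go sketch. The helper name `perHostCommonCount` is hypothetical and does not come from the PR, and plain integer division is an assumption about how any remainder is handled.

```go
package main

import "fmt"

// perHostCommonCount is a hypothetical helper (not taken from the PR).
// It divides the total requested common software evenly across all hosts.
// Assumption: integer division, so any remainder is simply dropped.
func perHostCommonCount(totalCommon, totalHosts int) int {
	if totalHosts <= 0 {
		return 0 // avoid dividing by zero when no hosts are configured
	}
	return totalCommon / totalHosts
}

func main() {
	// 10,000 common software entries shared by 200 hosts.
	fmt.Println(perHostCommonCount(10000, 200)) // prints 50
}
```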

Related issue: Resolves #33668

@ksykulev ksykulev requested a review from a team as a code owner October 7, 2025 21:08
codecov Bot commented Oct 7, 2025

Codecov Report

❌ Patch coverage is 0% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.18%. Comparing base (9e3cab6) to head (0a01584).
⚠️ Report is 17 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/osquery-perf/agent.go | 0.00% | 87 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #33958      +/-   ##
==========================================
- Coverage   64.18%   64.18%   -0.01%     
==========================================
  Files        2054     2055       +1     
  Lines      206397   206521     +124     
  Branches     6794     6794              
==========================================
+ Hits       132477   132550      +73     
- Misses      63518    63568      +50     
- Partials    10402    10403       +1     
| Flag | Coverage Δ |
|---|---|
| backend | 65.28% <0.00%> (-0.01%) ⬇️ |

Comment thread cmd/osquery-perf/agent.go
sharon-fdm
sharon-fdm previously approved these changes Oct 7, 2025
Collaborator

@sharon-fdm sharon-fdm left a comment

I will approve assuming the answer to Lucas' comment is positive.

@ksykulev, if not, please fix and Slack me to reapprove.

Comment thread cmd/osquery-perf/agent.go Outdated
Comment on lines +1661 to +1663
totalCommon := a.softwareCount.common
totalDuplicates := (a.softwareCount.common * a.softwareCount.duplicateBundleIdentifiersPercent) / 100
totalSoftware := totalCommon + totalDuplicates // total software to distribute
Member

To check the math, could you explain what the expected values of these flags are for this load test?

  • --common_software_count
  • --duplicate_bundle_identifiers_percent

Contributor Author

Here are the params I want to use.

  --host_count 500
  --total_host_count 50000
  --host_index_offset [varies per container: 0, 500, 1000, ..., 49500]
  --common_software_count 50000
  --duplicate_bundle_identifiers_percent 1000

This will generate 500 hosts per container. I want a total of 50,000 pieces of common software and 1000% duplicates. So:

  • Common software: 50,000
  • Duplicate software: 50,000 × 1000 / 100 = 500,000
  • Total: 550,000 software entries to distribute
  • perHostCount = 550,000 / 50,000 hosts = 11 software entries per host
  • Each bundle ID appears 11 times on average (1 common + 10 duplicates). So when the rename happens, there will be a total of 10 renames coming from hosts that have duplicate bundle IDs (hosts 5,000 - 50,000 will be doing the renaming). This is where the concurrency load test comes in, because all of these (approximately) 45k hosts will be trying to rename at fairly close to the same time.
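
The arithmetic in the list above can be sketched in Go as follows. The `softwarePlan` type and `planSoftware` function are illustrative names that are not in the PR. The duplicate formula mirrors the `totalDuplicates` line under review, and dividing the total by the overall host count follows the perHostCount bullet.

```go
package main

import "fmt"

// softwarePlan holds the derived totals. The type and its field names are
// illustrative only and are not taken from the PR.
type softwarePlan struct {
	Common     int
	Duplicates int
	Total      int
	PerHost    int
}

// planSoftware derives the totals from the flag values discussed in the thread.
func planSoftware(commonCount, duplicatePercent, totalHostCount int) softwarePlan {
	duplicates := commonCount * duplicatePercent / 100
	total := commonCount + duplicates
	perHost := 0
	if totalHostCount > 0 {
		perHost = total / totalHostCount // assumption: remainder is dropped
	}
	return softwarePlan{Common: commonCount, Duplicates: duplicates, Total: total, PerHost: perHost}
}

func main() {
	// --common_software_count 50000
	// --duplicate_bundle_identifiers_percent 1000
	// --total_host_count 50000
	p := planSoftware(50000, 1000, 50000)
	fmt.Printf("%+v\n", p)
	// prints {Common:50000 Duplicates:500000 Total:550000 PerHost:11}
}
```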

Container 0 (host_index_offset = 0):

Agent 1:
- globalAgentIndex = 0 + (1-1) = 0
- startIdx = 0 × 11 = 0
- endIdx = 0 + 11 = 11
- Gets software indices 0-10
...
Agent 500:
- globalAgentIndex = 0 + (500-1) = 499
- startIdx = 499 × 11 = 5,489
- endIdx = 5,489 + 11 = 5,500
- Gets software indices 5,489-5,499

Container 10 (host_index_offset = 5000):

Agent 1:
- globalAgentIndex = 5000 + (1-1) = 5,000
- startIdx = 5,000 × 11 = 55,000
- endIdx = 55,000 + 11 = 55,011
- Gets software indices 55,000-55,010

Container 99 (host_index_offset = 49500):

Agent 1:
- globalAgentIndex = 49500 + (1-1) = 49,500
- startIdx = 49,500 × 11 = 544,500
- endIdx = 544,500 + 11 = 544,511
- Gets software indices 544,500-544,510
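
The per-agent index math above can be reproduced with a short Go sketch. The function name `sliceBounds` is hypothetical. The formulas for globalAgentIndex, startIdx, and endIdx are taken from the worked examples in this comment, with endIdx treated as exclusive.

```go
package main

import "fmt"

// sliceBounds is a hypothetical helper (not the PR's code). It returns the
// half-open range [start, end) of software indices assigned to one agent.
// agentIndex is 1-based within its container, as in the examples above.
func sliceBounds(hostIndexOffset, agentIndex, perHostCount int) (start, end int) {
	globalAgentIndex := hostIndexOffset + (agentIndex - 1)
	start = globalAgentIndex * perHostCount
	end = start + perHostCount
	return start, end
}

func main() {
	// Container 0, agent 500.
	start, end := sliceBounds(0, 500, 11)
	fmt.Println(start, end) // prints 5489 5500

	// Container 99, agent 1.
	start, end = sliceBounds(49500, 1, 11)
	fmt.Println(start, end) // prints 544500 544511
}
```

Because the range depends only on the offset, the agent index, and the per-host count, the same flags always yield the same slice for a given agent.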

Member

Agents with software indices >= 50_000 will trigger the heavy renaming, right?
Sorry, I'm trying to understand the intent.

Overall looks good.

Contributor Author

Exactly! So in round one we run osquery perf. It generates a bunch of common software and software with duplicate bundle IDs:
"com.fleetdm.osquery-perf.common_0"
"com.fleetdm.osquery-perf.common_1"
...
"com.fleetdm.osquery-perf.common_N"

We need to run osquery perf again with the softwareRenaming flag. This takes all the software with duplicate bundle IDs and forces a rename. This is precisely why we need stable bundle ID generation, and why we need a globalAgentIndex: between osquery perf runs we need to generate exactly the same bundle IDs. (Otherwise we could just use a hashing function to generate unique bundle IDs and avoid all this complexity.)
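
To make the stability requirement concrete, here is a minimal Go sketch of deterministic bundle ID generation. The function `bundleIDForIndex` is hypothetical, and wrapping indices at or above the common count with a modulo is an assumption made for illustration; the thread does not show how the PR maps duplicate entries onto bundle IDs. Only the "com.fleetdm.osquery-perf.common_" prefix comes from the comment above.

```go
package main

import "fmt"

// bundleIDForIndex is a hypothetical helper (not the PR's code). It maps a
// global software index to a bundle ID using no randomness and no state, so
// two separate runs with the same flags produce identical IDs.
// Assumption: indices >= totalCommon wrap around with a modulo, which is how
// this sketch models duplicates sharing a bundle ID with a common entry.
func bundleIDForIndex(softwareIndex, totalCommon int) string {
	if totalCommon <= 0 {
		return ""
	}
	return fmt.Sprintf("com.fleetdm.osquery-perf.common_%d", softwareIndex%totalCommon)
}

func main() {
	fmt.Println(bundleIDForIndex(0, 50000))     // prints com.fleetdm.osquery-perf.common_0
	fmt.Println(bundleIDForIndex(50000, 50000)) // prints com.fleetdm.osquery-perf.common_0
	fmt.Println(bundleIDForIndex(55010, 50000)) // prints com.fleetdm.osquery-perf.common_5010
}
```

Under this assumption, index 50,000 collides with index 0 on purpose. A hash of random input would avoid the collision but would not reproduce the same IDs on the second run.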

@lucasmrod lucasmrod self-assigned this Oct 8, 2025
Member

@lucasmrod lucasmrod left a comment

LGTM. Left one final question.

@ksykulev ksykulev merged commit b906b1b into main Oct 8, 2025
39 checks passed
@ksykulev ksykulev deleted the software-host-split branch October 8, 2025 16:22
ksykulev added a commit that referenced this pull request Oct 8, 2025
… during renames (#33993)

cherry-pick #33958
no original issue

Co-authored-by: Victor Lyuboslavsky <2685025+getvictor@users.noreply.github.com>
ksykulev added a commit that referenced this pull request Oct 8, 2025
Development

Successfully merging this pull request may close these issues.

Increased DB Load and http 5xx/4xx after update to v4.73.3

3 participants