Find hosts with the most issues #18115

noahtalerman · 2024-04-08T14:11:42Z

Goal

User story
As a security leader,
I want to sort hosts by issues (# failing policies + # critical vulns, where critical means CVSS > 8.9)
so that I can ask for the owners of these hosts to focus on fixing the hosts w/ the most issues.
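A minimal sketch of the count and sort described in this story. The `Host` record and its field names are hypothetical and used only for illustration; they are not Fleet's data model or API. Only the formula (failing policies plus vulns with CVSS > 8.9) comes from the story itself.

```python
from dataclasses import dataclass, field

CRITICAL_CVSS_THRESHOLD = 8.9  # from the user story: critical means CVSS > 8.9


@dataclass
class Host:
    # Hypothetical record for illustration; not Fleet's schema.
    name: str
    failing_policies: int = 0
    cvss_scores: list[float] = field(default_factory=list)


def issues_count(host: Host) -> int:
    """Failing policies plus vulnerabilities with CVSS above the threshold."""
    critical = sum(1 for s in host.cvss_scores if s > CRITICAL_CVSS_THRESHOLD)
    return host.failing_policies + critical


def sort_by_issues(hosts: list[Host]) -> list[Host]:
    """Return hosts ordered so the ones with the most issues come first."""
    return sorted(hosts, key=issues_count, reverse=True)


hosts = [
    Host("laptop-a", failing_policies=1, cvss_scores=[9.8, 7.5]),
    Host("laptop-b", failing_policies=0, cvss_scores=[8.9]),  # 8.9 is not > 8.9
    Host("server-c", failing_policies=3, cvss_scores=[9.1, 9.0]),
]
ranked = [(h.name, issues_count(h)) for h in sort_by_issues(hosts)]
```

Note that the comparison is strict, so a vuln scored exactly 8.9 does not count as critical in this sketch.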

Context

Product designer: @noahtalerman

Changes

Product

UI changes: Figma wireframes
REST API changes: API design PR
Permissions changes: All roles can access the number of issues
Outdated documentation changes: If documented, update the definition of issues in Fleet. UPDATE: No mention of these "issues" found (noahtalerman 2024-07-01)
Changes to paid features or tiers: Failing policies count and Issues count are available in Fleet Free and Fleet Premium. Critical vulns count is only available in Fleet Premium.

Engineering

Database schema migrations: TODO
Load testing: TODO

ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

Requires load testing: Yes
Risk level: High
Risk description: We added an additional DB write for every policy result processing from the host.

Load testing plan

For the below scenarios, monitor latency and DB performance.

Start with 100K hosts failing a policy. Modify the SQL of that policy.
Start with 100K hosts failing a policy. Modify the platforms of that policy (for example, uncheck "windows").
Start with 100K hosts failing a policy. Transfer them to a different team.
Start with 100K hosts failing a policy. Delete that policy.
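One way to summarize the latency observed while running the scenarios above, sketched under the assumption that request durations (in seconds) have already been collected into a list. The helper names are hypothetical and the sample values are made up for illustration; this is not part of Fleet's load testing tooling.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Collect the figures typically compared between load test runs."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "max": max(samples),
    }


# Illustrative durations in seconds, not measured data.
summary = summarize([0.1, 0.2, 0.3, 0.4, 3.7])
```

Comparing these figures before and after a policy change makes a regression easier to spot than reading individual log lines.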

Testing notes

Confirmation

Engineer (@____): Added comment to user story confirming successful completion of QA.
QA (@____): Added comment to user story confirming successful completion of QA.

noahtalerman · 2024-04-11T19:51:31Z

Add sort to the "Issues" column on the Hosts page. Update issues count = # failing critical policies + # of vulnerabilities w/ known exploits (CISA KEV)

noahtalerman · 2024-04-11T20:18:42Z

Hey @cjwalton this story covers the next iteration of Fleet's version of a host "risk score."

Our understanding is that y'all are looking for a way to prioritize hosts that need fixing/updating/patching.

The plan is to allow y'all to sort hosts by "issues" in Fleet: # critical policies failed + # vulns w/ known exploits (from CISA KEV)

Jason: EPSS takes CISA KEV into account. Maybe let's start w/ EPSS > 70% and/or CVSS > 8

We want to start simple so that we move quickly for y'all while leaving the door open for future iterations.

I recorded a Loom video that walks through the improvement in more detail here: https://www.loom.com/share/d594151980ec47298efafb159f0e91b1?sid=c4465470-cc23-47d9-9d86-b2542898774f

What do you think?

noahtalerman · 2024-05-22T13:58:14Z

Jason: EPSS takes CISA KEV into account. Maybe let's start w/ EPSS > 70% and/or CVSS > 8

Hey @cjwalton, based on your feedback (above) we tweaked the "Issues" count to include critical vulns (CVEs w/ CVSS score > 8.9):

The plan is to start with this.

In future iterations we can add the ability to customize the "Issues" count. For example:

Only include critical vulns (no failing policies)
Include vulns w/ EPSS > 70%. With or w/o critical vulns. With or w/o failing policies
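The customization options listed above could be modeled roughly as follows. This is a sketch under stated assumptions: `Vuln`, `IssuesRule`, and every field name are hypothetical, and no such setting is described as existing in Fleet. Only the thresholds (CVSS > 8.9, EPSS > 70%) come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vuln:
    cvss: float  # CVSS base score, 0.0 to 10.0
    epss: float  # EPSS probability, 0.0 to 1.0


@dataclass(frozen=True)
class IssuesRule:
    # Hypothetical settings; the defaults mirror the first iteration
    # (failing policies + critical vulns).
    include_failing_policies: bool = True
    include_critical_vulns: bool = True
    include_high_epss: bool = False
    cvss_threshold: float = 8.9
    epss_threshold: float = 0.70


def issues_count(failing_policies: int, vulns: list[Vuln], rule: IssuesRule) -> int:
    """Count issues for one host according to the selected rule."""
    count = failing_policies if rule.include_failing_policies else 0
    for v in vulns:
        critical = rule.include_critical_vulns and v.cvss > rule.cvss_threshold
        high_epss = rule.include_high_epss and v.epss > rule.epss_threshold
        if critical or high_epss:
            count += 1  # a vuln counts once even if it matches both conditions
    return count


vulns = [Vuln(cvss=9.8, epss=0.9), Vuln(cvss=6.0, epss=0.8), Vuln(cvss=5.0, epss=0.1)]
default_count = issues_count(2, vulns, IssuesRule())
vulns_only = issues_count(2, vulns, IssuesRule(include_failing_policies=False))
with_epss = issues_count(2, vulns, IssuesRule(include_high_epss=True))
```

Counting each vuln at most once is a design choice of this sketch; the thread does not say how overlapping conditions would be handled.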

Does that work for you?

sharon-fdm · 2024-05-29T18:38:32Z

BE 5
FE 5

RachelElysia · 2024-05-29T18:59:55Z

@sharon-fdm

@jacobshandling mentioned the scope of this for FE is probably larger than anticipated.

TLDR: Looks like the device user page and host details page use the same code, the HostSummary card, to render issues. Make that into a reusable component for ManageHostsPage, then add sort and empty state... and basic tests for all?

To be thorough, this might be a 5.

## Issue
Unreleased fix for #18115

## Description
- BE shows `0` count for empty state so FE needs to account for `0` instead of `undefined`

## Screenshot of fix
<img width="1219" alt="Screenshot 2024-06-18 at 5 00 04 PM" src="https://github.com/fleetdm/fleet/assets/71795832/cd6ec944-ce99-4f8e-a630-9bf037abd0b9">

# Checklist for submitter
If some of the following don't apply, delete the relevant line.

- [x] Manual QA for all new/changed functionality
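The empty-state handling described in this fix can be sketched as below. The function name and the `---` placeholder are assumptions for illustration (the placeholder is borrowed from the title of a related UI fix in this thread); the actual frontend component is not shown here.

```python
from typing import Optional

EMPTY_CELL = "---"  # assumed placeholder for a host with no issues


def render_issues_cell(count: Optional[int]) -> str:
    """Treat both a missing count (None) and a zero count as the empty state."""
    if not count:  # covers None and 0
        return EMPTY_CELL
    return str(count)


cells = [render_issues_cell(c) for c in (None, 0, 7)]
```

Checking only for a missing value would render a literal `0` once the backend starts returning zero counts, which is the case this fix accounts for.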

getvictor · 2024-06-21T11:56:18Z

@xpkoala The regular QA was done by @RachelElysia

I added a few test scenarios for load testing in the description, and moved issue back to Awaiting QA.

xpkoala · 2024-06-21T15:16:38Z

Thanks @getvictor!

xpkoala · 2024-06-24T16:04:48Z

Found an issue when modifying a policy that affects 50k+ hosts with 100k+ hosts enrolled.

level=error ts=2024-06-24T15:33:47.706507287Z component=http user=tomas@fleetdm.com method=PATCH uri=/api/latest/fleet/policies/1 took=3.712123448s name="Q1 (1)" sql="SELECT * FROM osquery_info" err="saving policy: update failing policies in host issues: Error 1436 (HY000): Thread stack overrun: 242191 bytes used of a 262144 byte stack, and 20000 bytes needed. Use 'mysqld --thread_stack=#' to specify a bigger stack."

Reproduce:

With a policy that affects a large number of hosts (50k in this scenario)
Edit the affected OSes for the policy (I removed 'mac', which would have set the # of hosts affected by the policy to 0)
Error banner "Something went wrong" appears

An HTTP 422 error is recorded in the web console.
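A common way to keep a single statement from growing with the number of affected hosts is to process host IDs in fixed-size batches. The sketch below shows that general technique only; it is an assumption for illustration and not a description of how the fix for this error was implemented. `run_update` is a hypothetical callback standing in for whatever executes the per-batch update.

```python
from typing import Callable, Iterator, Sequence


def batches(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of ids, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def update_in_batches(
    host_ids: Sequence[int],
    size: int,
    run_update: Callable[[Sequence[int]], None],
) -> int:
    """Call run_update once per batch and return the number of batches processed."""
    processed = 0
    for chunk in batches(host_ids, size):
        run_update(chunk)  # hypothetical: one bounded statement per batch
        processed += 1
    return processed


seen: list = []
batch_count = update_in_batches(list(range(10)), 4, seen.append)
```

With a bounded batch size, the cost of each statement stays roughly constant whether the policy affects 500 hosts or 50k.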

sharon-fdm · 2024-06-24T16:40:24Z

Remaining work reset to 1 point.

xpkoala · 2024-06-24T16:44:12Z

Docker image being used to test this fix is 4530loadtestA

#18115 Fixing unreleased bug found when load testing host issues update.

#18115 Fixing issue seen in load test:
```
level=error ts=2024-06-25T17:09:08.230514976Z cron=vulnerabilities schedule=vulnerabilities instanceID="5boTc/PamsSp8Jsh4kiEOpECmPu+bmOAJaVX4XV7ZOG4vgO4U6peHyxH8mFQhBXYJt+roRpwNuGmUoEI8n/otg==" err="running job" details="get critical vulnerabilities count: Error 1114 (HY000): The table '/rdsdbdata/tmp/#sql127_6b4b_ad107' is full" jobID=update_host_issues_vulnerabilities_counts
```

#18115 Fixing unreleased bug found when load testing host issues update. (cherry picked from commit 246c6d1)

#18115 Fixing issue seen in load test:
```
level=error ts=2024-06-25T17:09:08.230514976Z cron=vulnerabilities schedule=vulnerabilities instanceID="5boTc/PamsSp8Jsh4kiEOpECmPu+bmOAJaVX4XV7ZOG4vgO4U6peHyxH8mFQhBXYJt+roRpwNuGmUoEI8n/otg==" err="running job" details="get critical vulnerabilities count: Error 1114 (HY000): The table '/rdsdbdata/tmp/#sql127_6b4b_ad107' is full" jobID=update_host_issues_vulnerabilities_counts
```
(cherry picked from commit 918773b)

marko-lisica · 2024-06-27T14:50:47Z

Hey @pintomi1989 this story has shipped.

@noahtalerman There are TODOs in issue description to solve before moving to closed.

API changes for the "Find hosts with the most issues" story - #18115

noahtalerman · 2024-07-01T20:06:53Z

There are TODOs in issue description to solve before moving to closed.

Docs are merged!

fleet-release · 2024-07-01T20:06:56Z

Sorting hosts by flaws,
A beacon in the cloud haze,
Security evolves.

noahtalerman added story A user story defining an entire feature customer-schur :product Product Design department (shows up on 🦢 Drafting board) labels Apr 8, 2024

noahtalerman self-assigned this Apr 8, 2024

noahtalerman added ~feature fest Will be reviewed at next Feature Fest and removed :product Product Design department (shows up on 🦢 Drafting board) labels Apr 18, 2024

noahtalerman removed their assignment Apr 19, 2024

noahtalerman removed the ~feature fest Will be reviewed at next Feature Fest label Apr 19, 2024

noahtalerman added ~feature fest Will be reviewed at next Feature Fest #g-endpoint-ops Endpoint ops product group and removed ~feature fest Will be reviewed at next Feature Fest labels May 9, 2024

noahtalerman self-assigned this May 10, 2024

noahtalerman added the :product Product Design department (shows up on 🦢 Drafting board) label May 10, 2024

noahtalerman mentioned this issue May 22, 2024

API design: Find hosts with the most issues #19198

Closed

noahtalerman assigned sharon-fdm and unassigned noahtalerman May 23, 2024

sharon-fdm assigned getvictor and RachelElysia and unassigned sharon-fdm Jun 3, 2024

sharon-fdm added :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. and removed :product Product Design department (shows up on 🦢 Drafting board) labels Jun 3, 2024

sharon-fdm added this to the 4.52.0-tentative milestone Jun 3, 2024

spokanemac mentioned this issue Jun 3, 2024

Release article: v4.53.0 #19469

Closed

1 task

RachelElysia mentioned this issue Jun 6, 2024

Fleet UI: Add critical vuln count to host issues count #19574

Merged

2 tasks

lukeheath removed this from the 4.52.0-tentative milestone Jun 7, 2024

RachelElysia mentioned this issue Jun 18, 2024

Fleet UI: Account for 0 as empty state instead of undefined #19854

Merged

1 task

RachelElysia mentioned this issue Jun 20, 2024

[unreleased. bug] Fleet UI: Grey empty cell --- for host issues #19900

Merged

1 task

getvictor mentioned this issue Jun 24, 2024

Fixing MySQL thread stack error during host issues update. #19956

Merged

lukeheath modified the milestones: 4.53.0, 4.54.0-tentative Jun 24, 2024

sharon-fdm unassigned RachelElysia Jun 24, 2024

getvictor added a commit that referenced this issue Jun 25, 2024

Fixing MySQL thread stack error during host issues update. (#19956)

246c6d1

#18115 Fixing unreleased bug found when load testing host issues update.

getvictor mentioned this issue Jun 25, 2024

Fixing DB issue with vuln counts. #19989

Merged

xpkoala modified the milestones: 4.54.0-tentative, 4.53.0 Jun 25, 2024

getvictor added a commit that referenced this issue Jun 25, 2024

Fixing MySQL thread stack error during host issues update. (#19956)

91c7479

#18115 Fixing unreleased bug found when load testing host issues update. (cherry picked from commit 246c6d1)

getvictor added a commit that referenced this issue Jun 25, 2024

2 unreleased bug fixes for #18115 from main (#19992)

8412696

lukeheath added :product Product Design department (shows up on 🦢 Drafting board) and removed :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. labels Jun 26, 2024

noahtalerman assigned noahtalerman and unassigned getvictor Jun 28, 2024

noahtalerman mentioned this issue Jul 1, 2024

API design: Find hosts with the most issues #20122

Merged

noahtalerman added a commit that referenced this issue Jul 1, 2024

API design: Find hosts with the most issues (#20122)

93eb610

API changes for the "Find hosts with the most issues" story - #18115

noahtalerman closed this as completed Jul 1, 2024
