Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new low_disk_space URL parameter to the GET /hosts endpoint #7592

Closed
1 task
lukeheath opened this issue Sep 6, 2022 · 15 comments
Closed
1 task

Add new low_disk_space URL parameter to the GET /hosts endpoint #7592

lukeheath opened this issue Sep 6, 2022 · 15 comments
Assignees
Labels
~backend Backend-related issue. ~legacy-interface-product-group Associated with the legacy "interface" product group. (No longer exists)
Milestone

Comments

@lukeheath
Copy link
Member

lukeheath commented Sep 6, 2022

Problem

I'm an engineer managing thousands of hosts and I'm overwhelmed with tracking the following goal:

  • Know how many hosts, and which hosts, have low disk space.

This makes it hard to know which hosts I need to take action on.

Goal

  1. Know which hosts have low disk space

Requirements

  • Fleet Premium users can navigate to a list of hosts filtered by hosts that have less than 32 GB (the API will support any amount between 1-100) of their disk space remaining. The user can navigate to this list via the Fleet UI or by pasting a URL in their browsers URL bar.
  • The Fleet API supports adding more granularity in the future. For example, later, we may want to add the ability to see how many hosts have less than 16 GB of their disk space remaining.

Related

Figma

https://www.figma.com/file/hdALBDsrti77QuDNSzLdkx/%F0%9F%9A%A7-Fleet-EE-(dev-ready%2C-scratchpad)?node-id=9223%3A304723

API Specs

Example response

GET /hosts?low_disk_space=32

{
  "hosts": [
    {
      "created_at": "2020-11-05T05:09:44Z",
      "updated_at": "2020-11-05T06:03:39Z",
      "id": 1,
      "detail_updated_at": "2020-11-05T05:09:45Z",
      "label_updated_at": "2020-11-05T05:14:51Z",
      "seen_time": "2020-11-05T06:03:39Z",
      "hostname": "2ceca32fe484",
      "uuid": "392547dc-0000-0000-a87a-d701ff75bc65",
      "platform": "centos",
      "osquery_version": "2.7.0",
      "os_version": "CentOS Linux 7",
      "build": "",
      "platform_like": "rhel fedora",
      "code_name": "",
      "uptime": 8305000000000,
      "memory": 2084032512,
      "cpu_type": "6",
      "cpu_subtype": "142",
      "cpu_brand": "Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz",
      "cpu_physical_cores": 4,
      "cpu_logical_cores": 4,
      "hardware_vendor": "",
      "hardware_model": "",
      "hardware_version": "",
      "hardware_serial": "",
      "computer_name": "2ceca32fe484",
      "public_ip": "",
      "primary_ip": "",
      "primary_mac": "",
      "distributed_interval": 10,
      "config_tls_refresh": 10,
      "logger_tls_period": 8,
      "additional": {},
      "status": "offline",
      "display_text": "2ceca32fe484",
      "team_id": null,
      "team_name": null,
      "pack_stats": null,
      "issues": {
        "failing_policies_count": 2,
        "total_issues_count": 2
      },
      "geolocation": {
        "country_iso": "US",
        "city_name": "New York",
        "geometry": {
          "type": "point",
          "coordinates": [40.6799, -74.0028]
        }
      }
    },
    ...
  ]
}

Tasks

1

  • Add new low_disk_space integer query parameter to the GET /hosts endpoint.
  • This number is calculated as hosts that have less than that number of GB of available disk space.
  • This filter should also apply when used in addition to other query parameters like team_id.
  • Only numbers between 1-100 inclusively should be allowed.
  • NOTE: as always when adding filters to GET /hosts, it also affects GET /hosts/count, which uses the same filters to get the matching counts.
@lukeheath lukeheath added ~legacy-interface-product-group Associated with the legacy "interface" product group. (No longer exists) ~backend Backend-related issue. labels Sep 6, 2022
@lukeheath lukeheath changed the title Add new low_disk_space query parameter to the GET /hosts endpoint Add new low_disk_space URL parameter to the GET /hosts endpoint Sep 6, 2022
@lukeheath
Copy link
Member Author

@mna Another API update. Would you please review the specs and let me know if you have any questions or concerns?

@mna
Copy link
Member

mna commented Sep 7, 2022

@lukeheath the concern here is performance - this information is in the hosts.gigs_disk_space_available field, which would need to be indexed to avoid a full scan. A new index impacts data ingestion, @chiiph any thoughts about that?

@chiiph
Copy link
Contributor

chiiph commented Sep 7, 2022

We need to have a bigger conversation before we keep adding more and more options to this. If, for some reason, we cannot have that conversation before this ticket, the best I can think of is: let's move that column out of the host table, and index that. Which is going to be an annoying migration to run.

@lukeheath lukeheath added the 5 label Sep 7, 2022
@noahtalerman
Copy link
Member

move that column out of the host table, and index that. Which is going to be an annoying migration to run.

@mna can you help me understand the costs of doing this? Will this increase the time it takes for migrations to run when users upgrade? Will this force us the change the response of the GET /hosts API route? Something else...

That being said, having a conversation about filtering and performance would be valuable. Goal being Noah understands the costs for adding more and more filters.

@mna
Copy link
Member

mna commented Sep 7, 2022

Will this increase the time it takes for migrations to run when users upgrade?

@noahtalerman @chiiph

Potentially, but what we could do is a "lazy migration" in that we create the new table (so the gigs_disk_space_available column lives outside the main hosts table, and has its index in that new table), but leave the old field around and leave the new table empty. Then, as hosts report back to fleet, store that disk space information into the new table instead of the old one.

So the migration is very fast, at the cost of not back-filling it with already existing information - instead we wait for new information to come in to fill it. So it would take ~1h with default settings for all hosts to report back the information in the new table, and until then we would have incomplete information when querying about disk space (even though that information exists in the old table, because we now ignore that old table's field).

And then in a future version we remove that column, which is slow-ish but not as slow as moving all the data for all hosts to the new table and then removing that column. Note that removing could be done right now, but I'd wait for the new table to fill up before doing this just in case e.g. users have to rollback the upgrade or something and end up with no data for this column.

Will this force us the change the response of the GET /hosts API route?

I think not, it should be an implementation detail that doesn't leak to the payload.

@chiiph
Copy link
Contributor

chiiph commented Sep 7, 2022

So the migration is very fast, at the cost of not back-filling it with already existing information - instead we wait for new information to come in to fill it.

+1 to this.

So it would take ~1h with default settings for all hosts to report back the information in the new table

12hrs in some customer's setups, fwiw. And that is assuming all devices are online.

@mna
Copy link
Member

mna commented Sep 14, 2022

@noahtalerman Can you confirm that the "lazy" migration process mentioned in the above 2 comments is acceptable from your point of view? Basically, it would be as if the gigs_disk_space_available field was a new host info that we just added, and we would only see it as we receive that information from the hosts. As Tomas mentions, this could take up to 12h for some users, depending on their configuration (and of course the offline hosts would not have that information until they did come back online).

@mna
Copy link
Member

mna commented Sep 14, 2022

Given the requirements of the disk encryption ticket (#3906 (comment)), I'll go ahead and create a new host_disks table, which can accommodate both information.

@noahtalerman
Copy link
Member

confirm that the "lazy" migration process mentioned in the above 2 comments is acceptable

@mna this sounds good to me 👍

@mna
Copy link
Member

mna commented Sep 19, 2022

A bit surprised that it's not already there, but osquery-perf does not return this disk space information, I'll have to add it.

@mna
Copy link
Member

mna commented Sep 19, 2022

Tested with data (1000 hosts) that it does use the new index on host_disks.gigs_disk_space_available when searching hosts with less than a few GB (I updated the WHERE clause to use 10GB instead of 32 as there were too many matches at 32 to be worth using an index):

mysql> explain SELECT     h.id,     h.osquery_host_id,     h.created_at,     h.updated_at,     h.detail_updated_at,     h.node_key,     h.hostname,     h.uuid,     h.platform,     h.osquery_version,     h.os_version,     h.build,     h.platform_like,     h.code_name,     h.uptime,     h.memory,     h.cpu_type,
h.cpu_subtype,     h.cpu_brand,     h.cpu_physical_cores,     h.cpu_logical_cores,     h.hardware_vendor,     h.hardware_model,     h.hardware_version,     h.hardware_serial,     h.computer_name,     h.primary_ip_id,     h.distributed_interval,     h.logger_tls_period,     h.config_tls_refresh,     h.primary_ip,
  h.primary_mac,     h.label_updated_at,     h.last_enrolled_at,     h.refetch_requested,     h.team_id,     h.policy_updated_at,     h.public_ip,     COALESCE(hd.gigs_disk_space_available, 0) as gigs_disk_space_available,     COALESCE(hd.percent_disk_space_available, 0) as percent_disk_space_available, COALESCE(hst.seen_time, h.created_at) AS seen_time, t.name AS team_name, coalesce(failing_policies.count, 0) as failing_policies_count, coalesce(failing_policies.count,
0) as total_issues_count FROM hosts h LEFT JOIN host_seen_times hst ON (h.id = hst.host_id) LEFT JOIN teams t ON (h.team_id = t.id)     LEFT JOIN host_disks hd ON hd.host_id = h.id LEFT JOIN (     SELECT host_id, count(*) as count FROM policy_membership WHERE passes = 0     GROUP BY host_id ) as failing_policies ON (h.id=failing_policies.host_id) WHERE TRUE AND TRUE AND TRUE AND TRUE AND hd.gigs_disk_space_available < 10      LIMIT 1000000;
+----+-------------+-------------------+------------+--------+-----------------------------------------------------------------------------------------------------------+------------------------------------------+---------+------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table             | partitions | type   | possible_keys                                                                                             | key                                      | key_len | ref              | rows | filtered | Extra                                                     |
+----+-------------+-------------------+------------+--------+-----------------------------------------------------------------------------------------------------------+------------------------------------------+---------+------------------+------+----------+-----------------------------------------------------------+
|  1 | PRIMARY     | hd                | NULL       | range  | PRIMARY,idx_host_disks_gigs_disk_space_available                                                          | idx_host_disks_gigs_disk_space_available | 4       | NULL             |  156 |   100.00 | Using index condition                                     |
|  1 | PRIMARY     | h                 | NULL       | eq_ref | PRIMARY                                                                                                   | PRIMARY                                  | 4       | fleet.hd.host_id |    1 |   100.00 | NULL                                                      |
|  1 | PRIMARY     | hst               | NULL       | eq_ref | PRIMARY                                                                                                   | PRIMARY                                  | 4       | fleet.hd.host_id |    1 |   100.00 | NULL                                                      |
|  1 | PRIMARY     | t                 | NULL       | eq_ref | PRIMARY                                                                                                   | PRIMARY                                  | 4       | fleet.h.team_id  |    1 |   100.00 | NULL                                                      |
|  1 | PRIMARY     | <derived2>        | NULL       | ALL    | NULL                                                                                                      | NULL                                     | NULL    | NULL             |    2 |   100.00 | Using where; Using join buffer (Block Nested Loop)        |
|  2 | DERIVED     | policy_membership | NULL       | ref    | PRIMARY,idx_policy_membership_passes,idx_policy_membership_policy_id,idx_policy_membership_host_id_passes | idx_policy_membership_passes             | 2       | const            |    1 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+------------+--------+-----------------------------------------------------------------------------------------------------------+------------------------------------------+---------+------------------+------+----------+-----------------------------------------------------------+

(updated with the actual query executed when listing hosts by low_disk_space)

@mna
Copy link
Member

mna commented Sep 20, 2022

@noahtalerman @lukeheath Two things I'd like to clarify:

  1. The low_disk_space=true criteria is a boolean, but do we want to return only hosts that are not low on disk space when/if it is set to false? Or is it just a meaningful criteria when it is set to true?
  2. The requirements clearly state that this is for Fleet Premium users, so for free uses, this criteria would be ignored (as if it wasn't provided), correct?

@lukeheath
Copy link
Member Author

@mna

  1. I believe this only matters if set to true. As of now, there is not a way in the UI to set the filter to false.
  2. Correct, for free users it would be ignored.

One item worth considering is that in the future it's likely there will be additional ranges, for example "show all hosts with less than 64 GB". Is it better for us to have separate boolean query params for this, or should we consider setting a number now? For example, disk_space_gb_less_than=32. It seems like an awkwardly long query param name, but maybe more scaleable? For performance purposes, does it matter if we're passing a number or a boolean?

@mna
Copy link
Member

mna commented Sep 20, 2022

@lukeheath This is an interesting point - I think a number would better convey that it only works for filtering hosts with low disk space (otherwise it could be expected for API users that low_disk_space=false returns those with sufficient disk space only, even though this wouldn't be used in the frontend).

Of course we'd have to validate that number so we don't accept extreme values that could cause full scans, but otherwise it might be best to allow that flexibility? What do you think of simply using that query parameter as-is, but as a number instead of boolean, e.g. low_disk_space=32 and then we document that this number represents the threshold in gigs, and we accept say anything between 1-100 for now?

For performance this does not matter as long as we don't blindly accept any number - internally the boolean is converted to a number in the query anyway, i.e. we don't pre-compute the low-disk hosts or something like that.

@lukeheath
Copy link
Member Author

@mna Got it. Good idea on the query param. Yes, let's go ahead with setting the query param to low_disk_space=32 as that will set us up nicely for future filters. Would you please update the specs on this ticket and notify the frontend of the change? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
~backend Backend-related issue. ~legacy-interface-product-group Associated with the legacy "interface" product group. (No longer exists)
Projects
No open projects
Archived in project
Development

No branches or pull requests

4 participants