You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to periodically clean old refetch entries from nano_commands, nano_enrollment_queue and nano_command_results
so that I can trust that querying those tables stays performant.
Roadmap item
Original requests
Related to #34797 but no specific customer request
Resources
Just some back of the envelope math:
A deployment with iOS/iPadOS devices will generate:
3 nano_commands entries every 10 minutes for REFETCH-* commands which will be sent to any devices that haven't had their details updated in 1 hour and don't have outstanding REFETCH commands.
Which works out to 3 entries in nano_enrollment_queue and nano_command_results every hour for online and responding devices.
Assuming devices are online enough to respond for 12 hours a day(optimistically low perhaps) we get:
3 commands * 500 devices * 12 hours = 18000 entries per day = 6,570,000 entries per year. And we never delete them which leads to significant table bloat. For customer-fairbank 95% of their entries in nano_enrollment_queue and nano_command_results were REFETCH-* entries and this data is of low use(it all gets pumped into things like host details and softwre inventory anyways) but slows down queries like the "which devices have active commands" query pretty badly.
Current thinking is that unless a user disables it we could delete all previous(or perhaps some previous - maybe 3 at a time) REFETCH-* nano_enrollment_queue and nano_command_results entries of a given type for a device. This should spread out the deletes enough not to impact performance too badly while also shrinking the table quite a bit over time
Changes
Nano commands deletion: Whenever a REFETCH command is successfully acknowledged by a host, delete up to 3(ths was originally 100 but changed to 3 for performance/load reasons) nano_command_results and nano_enrollment_queue entries, of the same type of commands, of commands sent and acknowledged(or errored) at least 1 month prior. The current command should never be deleted, only old ones, but they need not be the oldest. Data ingestion based on the REFETCH command will proceed as expected with no changes.
As a technical note, you should be absolutely sure that corresponding nano_enrollment_queue and nano_command_results entries are deleted in the same transaction so as not to risk re-sending or any other issues
Add a cron job that runs hourly, SELECTs up to 100 REFETCH- prefixed commands older than 30 days from nano_commands. Delete any which have no references in nano_enrollment_queue and nano_command_results.
The only user visible change, listed below, will be that admins will over time see less historic entries for iOS/iPadOS host refetch commands
Product
UI changes: None
CLI (fleetctl) usage changes: None
YAML changes: None
REST API changes: None
Fleet's agent (fleetd) changes: None
GitOps mode UI changes:None
GitOps generation changes: None
Activity changes: None
Permissions changes: None
Changes to paid features or tiers: None
My device and fleetdm.com/better changes: None
Usage statistics: None
Other reference documentation changes: We likely need to call out that we will no longer be listing old results for the regularly-scheduled ListInstalledApplication, ListCertificates and DeviceInformation
Load testing: Build out a test with 1500 simulated iOS/iPadOS devices an an instrumented that sends refetch queries every 5 minutes to iOS/iPadOS host. Let this run for at least a few days. Run a query so that all of those commands are now > 30 days in the past in terms of created_at timestamps, then upgrade to this build and verify that there is not significantly increased load during deletion(and also that the number of commands in the table shrinks)
This branch can be used to load up the DB prior to deploying the fix branch: DO NOT MERGE/FOR QA ONLY: Make refetches happen way faster #42644 . If you run it for a couple of days you will have very large nano tables. Note that if the DB is already under heavy load/at 99% you may still see some contention since the tables are already under contention but if the DB is not at that point yet you should not see heavy load associated with deletes
Load testing/osquery-perf improvements: None
This is a premium only feature: No
ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".
QA
Risk assessment
Requires load testing: Yes, see above
Risk level: Low
Test plan
Make sure to go through the list and consider all events that might be related to this story, so we catch edge cases earlier.
Verify that iOS/iPhone refetching operates as expected(e.g. hourly cadence) including software and certificate lists updating for all enrollment types(ADE, profile-based manual enrollment, personal enrollment)
Verify that the mdm command list in the UI returns at least the result of the most recent REFETCH- commands
Verify that nano_enrollment_queue and nano_command_results DB tables decrease in size when a system is "Steady state"(i.e. no new profiles/VPP apps being pushed) for several days and all of the refetch commands are set to > 30 days old
Verify that if a load test is done as described in the Engineering section above the size of nano_enrollment_queue and nano_command_results is reduced
Send a manual InstalledApplicationList, CertificateList and DeviceInformation query via the commands API(e.g. not in the refetch flow). Use DB queries to update their nano_commands, nano_enrollment_queue and nano_command_results table creatd_at and updated_at values to be greater than 30 days in the past. Verify that they do not get deleted, even if the device has less than 100 prior commands in the list
Testing notes
It may help for engineering to provide an instrumented build of a prior release which runs the refetcher much more often than usual to help fill up these tables so that the impact of the deletion can be observed. Engineering can also provide the queries discussed above
Confirmation
Engineer: Added comment to user story confirming successful completion of test plan.
QA: Added comment to user story confirming successful completion of test plan.
Goal
Roadmap item
Original requests
Related to #34797 but no specific customer request
Resources
Just some back of the envelope math:
A deployment with iOS/iPadOS devices will generate:
3 nano_commands entries every 10 minutes for REFETCH-* commands which will be sent to any devices that haven't had their details updated in 1 hour and don't have outstanding REFETCH commands.
Which works out to 3 entries in nano_enrollment_queue and nano_command_results every hour for online and responding devices.
Assuming devices are online enough to respond for 12 hours a day(optimistically low perhaps) we get:
3 commands * 500 devices * 12 hours = 18000 entries per day = 6,570,000 entries per year. And we never delete them which leads to significant table bloat. For
customer-fairbank95% of their entries in nano_enrollment_queue and nano_command_results were REFETCH-* entries and this data is of low use(it all gets pumped into things like host details and softwre inventory anyways) but slows down queries like the "which devices have active commands" query pretty badly.Current thinking is that unless a user disables it we could delete all previous(or perhaps some previous - maybe 3 at a time) REFETCH-* nano_enrollment_queue and nano_command_results entries of a given type for a device. This should spread out the deletes enough not to impact performance too badly while also shrinking the table quite a bit over time
Changes
Nano commands deletion: Whenever a REFETCH command is successfully acknowledged by a host, delete up to 3(ths was originally 100 but changed to 3 for performance/load reasons) nano_command_results and nano_enrollment_queue entries, of the same type of commands, of commands sent and acknowledged(or errored) at least 1 month prior. The current command should never be deleted, only old ones, but they need not be the oldest. Data ingestion based on the REFETCH command will proceed as expected with no changes.
As a technical note, you should be absolutely sure that corresponding nano_enrollment_queue and nano_command_results entries are deleted in the same transaction so as not to risk re-sending or any other issues
Add a cron job that runs hourly, SELECTs up to 100 REFETCH- prefixed commands older than 30 days from nano_commands. Delete any which have no references in nano_enrollment_queue and nano_command_results.
The only user visible change, listed below, will be that admins will over time see less historic entries for iOS/iPadOS host refetch commands
Product
Engineering
This branch can be used to load up the DB prior to deploying the fix branch: DO NOT MERGE/FOR QA ONLY: Make refetches happen way faster #42644 . If you run it for a couple of days you will have very large nano tables. Note that if the DB is already under heavy load/at 99% you may still see some contention since the tables are already under contention but if the DB is not at that point yet you should not see heavy load associated with deletes
QA
Risk assessment
Test plan
Testing notes
It may help for engineering to provide an instrumented build of a prior release which runs the refetcher much more often than usual to help fill up these tables so that the impact of the deletion can be observed. Engineering can also provide the queries discussed above
Confirmation