fix issues with batch script summary#32516
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## rc-minor-fleet-v4.73.0 #32516 +/- ##
=========================================================
Coverage ? 62.06%
=========================================================
Files ? 1986
Lines ? 194301
Branches ? 6458
=========================================================
Hits ? 120586
Misses ? 64134
Partials ? 9581
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| case fleet.BatchScriptExecutionRan: | ||
| batchScriptExecutionIDFilter += ` AND hsr.exit_code = 0` | ||
| batchScriptExecutionIDFilter += ` AND hsr.exit_code = 0 AND hsr.canceled = 0` | ||
| case fleet.BatchScriptExecutionPending: | ||
| // Pending can mean "waiting for execution" or "waiting for results". | ||
| batchScriptExecutionJoin += ` LEFT JOIN upcoming_activities ua ON ua.execution_id = bsehr.host_execution_id` | ||
| batchScriptExecutionIDFilter += ` AND ((ua.execution_id IS NOT NULL) OR (hsr.host_id is NOT NULL AND hsr.exit_code IS NULL AND hsr.canceled = 0 AND bsehr.error IS NULL))` | ||
| // hsr.exit_code IS NULL <- this means the script has not reported back | ||
| // (hsr.canceled IS NULL OR hsr.canceled = 0) <- this can mean the script is running, or that it hasn't been activated yet, | ||
| // but either way we haven't canceled it. | ||
| // bsehr.error IS NULL <- this means the batch script framework didn't mark this host as incompatible | ||
| // with this script run. | ||
| batchScriptExecutionIDFilter += ` AND (hsr.exit_code IS NULL AND (hsr.canceled IS NULL OR hsr.canceled = 0) AND bsehr.error IS NULL)` | ||
| case fleet.BatchScriptExecutionErrored: | ||
| // TODO - remove exit code condition when we split up "errored" and "failed" | ||
| batchScriptExecutionIDFilter += ` AND hsr.exit_code > 0` | ||
| batchScriptExecutionIDFilter += ` AND hsr.exit_code > 0 AND hsr.canceled = 0` | ||
| case fleet.BatchScriptExecutionIncompatible: | ||
| batchScriptExecutionIDFilter += ` AND bsehr.error IS NOT NULL` | ||
| case fleet.BatchScriptExecutionCanceled: |
There was a problem hiding this comment.
This code has been moved to a separate method on main, but is otherwise the same. It fixes issues around mistakenly counting canceled hosts in other states, and fixes the definition of "pending".
| COUNT(*) as num_targeted, | ||
| COUNT(bsehr.host_id) as num_targeted, | ||
| COUNT(bsehr.error) as num_did_not_run, | ||
| COUNT(CASE WHEN hsr.exit_code = 0 THEN 1 END) as num_succeeded, | ||
| COUNT(CASE WHEN hsr.exit_code > 0 THEN 1 END) as num_failed, | ||
| COUNT(CASE WHEN hsr.canceled = 1 AND hsr.exit_code IS NULL THEN 1 END) as num_cancelled | ||
| FROM | ||
| batch_activity_host_results bsehr | ||
| batch_activities ba | ||
| LEFT JOIN batch_activity_host_results bsehr | ||
| ON ba.execution_id = bsehr.batch_execution_id | ||
| LEFT JOIN | ||
| host_script_results hsr | ||
| ON bsehr.host_execution_id = hsr.execution_id | ||
| WHERE | ||
| bsehr.batch_execution_id = ?` | ||
| ba.execution_id = ?` |
There was a problem hiding this comment.
This logic is no longer executed on main since we don't use this modal anymore, but is the roughly the same as this logic in the newer ListBatchScriptExecutions method. It uses batch_activities as the base table and left joins everything else, since the row representing the batch script run will still remain even if all the hosts in the run (and their related batch_activity_host_results and host_script_results records) are deleted. This lets us accurately report on the status of batches with deleted hosts.
| COUNT(*) AS num_targeted, | ||
| COUNT(bahr.host_id) AS num_targeted, | ||
| COUNT(bahr.error) AS num_incompatible, | ||
| COUNT(IF(hsr.exit_code = 0, 1, NULL)) AS num_ran, | ||
| COUNT(IF(hsr.exit_code > 0, 1, NULL)) AS num_errored, | ||
| COUNT(IF(hsr.canceled = 1 AND hsr.exit_code IS NULL, 1, NULL)) AS num_canceled | ||
| FROM batch_activity_host_results AS bahr | ||
| FROM batch_activities AS ba2 | ||
| LEFT JOIN batch_activity_host_results AS bahr | ||
| ON ba2.execution_id = bahr.batch_execution_id | ||
| LEFT JOIN host_script_results AS hsr | ||
| ON bahr.host_execution_id = hsr.execution_id | ||
| JOIN batch_activities AS ba2 | ||
| ON ba2.execution_id = bahr.batch_execution_id | ||
| ON bahr.host_execution_id = hsr.execution_id |
There was a problem hiding this comment.
Similar to above, this ensures that we accurately mark a batch as completed if it has deleted hosts. This will be applied to main separately.
# Details Applying the patches from #32516 (comment) and #32563 onto `main`. This fixes: * Batches where all remaining hosts are deleted will be correctly marked as "finished" * Batches scheduled for the future, and then canceled, will have al hosts marked as "canceled" rather than pending # Checklist for submitter If some of the following don't apply, delete the relevant line. - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) - [X] If paths of existing endpoints are modified without backwards compatibility, checked the frontend/CLI for any necessary changes ## Testing - [X] Added/updated automated tests - [X] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one hosts's records do not affect another) - [X] QA'd all new/changed functionality manually - [X] Started a new batch script run with one host, deleted that host, and triggered the batch_activity_completion_checker schedule, and verified that the batch was moved to "finished". - [X] Scheduled script - Created a new batch script run scheduled for a future date - Canceled that batch run - Clicked on the batch run in the "finished" tab of the batch scripts list - Verified that the number of canceled hosts = the number of targeted hosts for that batch, and all other numbers were 0. - Verified that clicking the canceled hosts number navigated to the correct list of canceled hosts - [X] "Run now" script - Created a new batch script run with "run now" - Waited for at least one host to run. - Canceled that batch - Clicked on the batch run in the "finished" tab of the batch scripts list - Verified that the number of canceled hosts = the number that were still pending when I canceled the script, and that the # of pending hosts was 0 - Verified that clicking the canceled hosts number navigated to the correct list of canceled hosts - [X] Multiple batches with the same hosts don't bleed into each other - Created another batch script with the same hosts, scheduled for the future - Verified that the "pending" host list is correct
for #32468
for #32466
Details
This PR fixes the following issues:
The first three issues are already fixed on the main branch, hence the PR directly to the 4.73 release candidate. The last issue will be patched on main in a separate PR (with tests) immediately following this once.
Checklist for submitter
If some of the following don't apply, delete the relevant line.
SELECT *is avoided, SQL injection is prevented (using placeholders for values in statements)Testing
Added/updated automated tests
Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one hosts's records do not affect another)
QA'd all new/changed functionality manually
batch_activity_completion_checkerschedule, and verified that the batch was moved to "finished".For unreleased bug fixes in a release candidate, one of: