fix issues with batch script summary by sgress454 · Pull Request #32516 · fleetdm/fleet

sgress454 · 2025-09-02T19:28:31Z

Details

This PR fixes the following issues:

When viewing the summary for a batch script scheduled for the future, clicking the "Pending" hosts number now shows the correct hosts targeted by the batch, rather than an empty list.
When viewing the summary for a batch script that has been canceled, the number of canceled hosts is now correctly displayed in the "canceled" row rather than the "pending" row.
When viewing the summary for any batch script, the table now splits up Error and Incompatible states and clicking on either goes to the correct list of hosts in that state.
When a batch script run has all of its targeted hosts deleted, it will now be cleaned up and marked as "finished".

The first three issues are already fixed on the main branch, hence the PR directly to the 4.73 release candidate. The last issue will be patched on main in a separate PR (with tests) immediately following this one.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)

Testing

Added/updated automated tests
Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one host's records do not affect another)
QA'd all new/changed functionality manually
- Started a new batch script to run "now" using osquery-perf, and quit the agent halfway through the run so I could test the summary modal with hosts in different states. Verified that the host list had the correct # of hosts for each state.
- Canceled that batch and verified that the # of hosts formerly shown as "pending" in the summary modal now appeared as "canceled", and verified that clicking that number showed the correct list of hosts.
- Started a new batch script to run in the future, and verified that clicking the "pending" hosts number showed the correct hosts in the list
- Started a new batch script run with one host, deleted that host, and triggered the batch_activity_completion_checker schedule, and verified that the batch was moved to "finished".

For unreleased bug fixes in a release candidate, one of:

Confirmed that the fix is not expected to adversely impact load test results

codecov · 2025-09-02T19:31:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (rc-minor-fleet-v4.73.0@937e970). Learn more about missing BASE report.

Additional details and impacted files

@@                    Coverage Diff                    @@
##             rc-minor-fleet-v4.73.0   #32516   +/-   ##
=========================================================
  Coverage                          ?   62.06%           
=========================================================
  Files                             ?     1986           
  Lines                             ?   194301           
  Branches                          ?     6458           
=========================================================
  Hits                              ?   120586           
  Misses                            ?    64134           
  Partials                          ?     9581

Flag	Coverage Δ
backend	`63.17% <100.00%> (?)`
frontend	`51.46% <100.00%> (?)`


sgress454 · 2025-09-02T19:29:36Z

 			case fleet.BatchScriptExecutionRan:
-				batchScriptExecutionIDFilter += ` AND hsr.exit_code = 0`
+				batchScriptExecutionIDFilter += ` AND hsr.exit_code = 0 AND hsr.canceled = 0`
 			case fleet.BatchScriptExecutionPending:
 				// Pending can mean "waiting for execution" or "waiting for results".
-				batchScriptExecutionJoin += ` LEFT JOIN upcoming_activities ua ON ua.execution_id = bsehr.host_execution_id`
-				batchScriptExecutionIDFilter += ` AND ((ua.execution_id IS NOT NULL) OR (hsr.host_id is NOT NULL AND hsr.exit_code IS NULL AND hsr.canceled = 0 AND bsehr.error IS NULL))`
+				// hsr.exit_code IS NULL <- this means the script has not reported back
+				// (hsr.canceled IS NULL OR hsr.canceled = 0) <- this can mean the script is running, or that it hasn't been activated yet,
+				//                      but either way we haven't canceled it.
+				// bsehr.error IS NULL <- this means the batch script framework didn't mark this host as incompatible
+				//                        with this script run.
+				batchScriptExecutionIDFilter += ` AND (hsr.exit_code IS NULL AND (hsr.canceled IS NULL OR hsr.canceled = 0) AND bsehr.error IS NULL)`
 			case fleet.BatchScriptExecutionErrored:
-				// TODO - remove exit code condition when we split up "errored" and "failed"
-				batchScriptExecutionIDFilter += ` AND hsr.exit_code > 0`
+				batchScriptExecutionIDFilter += ` AND hsr.exit_code > 0 AND hsr.canceled = 0`
 			case fleet.BatchScriptExecutionIncompatible:
 				batchScriptExecutionIDFilter += ` AND bsehr.error IS NOT NULL`
 			case fleet.BatchScriptExecutionCanceled:


This code has been moved to a separate method on main, but is otherwise the same. It fixes issues around mistakenly counting canceled hosts in other states, and fixes the definition of "pending".
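To make the new predicates easier to reason about, here is a minimal Go sketch that evaluates the same conditions in memory. It is not code from this PR: `hostRow`, `ptr`, and the `is*` helpers are hypothetical names, and a nil pointer stands in for SQL NULL. The canceled case is left out because its filter is not shown in the hunk above.

```go
package main

import "fmt"

// hostRow is a hypothetical in-memory stand-in for one joined row.
// A nil pointer models SQL NULL.
type hostRow struct {
	exitCode *int    // hsr.exit_code
	canceled *bool   // hsr.canceled
	batchErr *string // bsehr.error
}

// isPending mirrors:
// hsr.exit_code IS NULL AND (hsr.canceled IS NULL OR hsr.canceled = 0) AND bsehr.error IS NULL
func isPending(r hostRow) bool {
	notCanceled := r.canceled == nil || !*r.canceled
	return r.exitCode == nil && notCanceled && r.batchErr == nil
}

// isRan mirrors: hsr.exit_code = 0 AND hsr.canceled = 0
// (a NULL canceled value makes the SQL comparison fail, hence the nil check).
func isRan(r hostRow) bool {
	return r.exitCode != nil && *r.exitCode == 0 && r.canceled != nil && !*r.canceled
}

// isErrored mirrors: hsr.exit_code > 0 AND hsr.canceled = 0
func isErrored(r hostRow) bool {
	return r.exitCode != nil && *r.exitCode > 0 && r.canceled != nil && !*r.canceled
}

// isIncompatible mirrors: bsehr.error IS NOT NULL
func isIncompatible(r hostRow) bool {
	return r.batchErr != nil
}

// ptr returns a pointer to v; a small helper for building example rows.
func ptr[T any](v T) *T { return &v }

func main() {
	notActivated := hostRow{}                                   // no host_script_results values yet
	canceled := hostRow{canceled: ptr(true)}                    // canceled before reporting back
	succeeded := hostRow{exitCode: ptr(0), canceled: ptr(false)} // reported exit code 0

	fmt.Println("not activated pending:", isPending(notActivated))
	fmt.Println("canceled pending:", isPending(canceled))
	fmt.Println("succeeded ran:", isRan(succeeded))
}
```

The point of the sketch is that a host with no script result values at all (scheduled but not yet activated) counts as pending, while a canceled host never does.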

sgress454 · 2025-09-02T19:33:32Z

-	COUNT(*) as num_targeted,
+	COUNT(bsehr.host_id) as num_targeted,
 	COUNT(bsehr.error) as num_did_not_run,
 	COUNT(CASE WHEN hsr.exit_code = 0 THEN 1 END) as num_succeeded,
 	COUNT(CASE WHEN hsr.exit_code > 0 THEN 1 END) as num_failed,
 	COUNT(CASE WHEN hsr.canceled = 1 AND hsr.exit_code IS NULL THEN 1 END) as num_cancelled
 FROM
-	batch_activity_host_results bsehr
+  batch_activities ba
+LEFT JOIN batch_activity_host_results bsehr
+		ON ba.execution_id = bsehr.batch_execution_id
 LEFT JOIN
 	host_script_results hsr
 		ON bsehr.host_execution_id = hsr.execution_id
 WHERE
-	bsehr.batch_execution_id = ?`
+	ba.execution_id = ?`


This logic is no longer executed on main since we don't use this modal anymore, but it is roughly the same as the logic in the newer ListBatchScriptExecutions method. It uses batch_activities as the base table and left joins everything else, since the row representing the batch script run will still remain even if all the hosts in the run (and their related batch_activity_host_results and host_script_results records) are deleted. This lets us accurately report on the status of batches with deleted hosts.
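The switch from `COUNT(*)` to `COUNT(bsehr.host_id)` matters once batch_activities is the base table: a batch with no remaining host rows still yields one joined row, filled with NULLs. A minimal Go sketch of that counting difference, with no database involved; `joinedRow`, `countStar`, and `countHostID` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// joinedRow models one row produced by
// batch_activities LEFT JOIN batch_activity_host_results.
// hostID is nil when the LEFT JOIN found no matching host result row.
type joinedRow struct {
	hostID *int // bsehr.host_id
}

// countStar behaves like COUNT(*): every row counts, NULLs included.
func countStar(rows []joinedRow) int {
	return len(rows)
}

// countHostID behaves like COUNT(bsehr.host_id): only non-NULL values count.
func countHostID(rows []joinedRow) int {
	n := 0
	for _, r := range rows {
		if r.hostID != nil {
			n++
		}
	}
	return n
}

func main() {
	// A batch whose hosts were all deleted: the LEFT JOIN still
	// returns one row for the batch, with NULL host columns.
	rows := []joinedRow{{hostID: nil}}

	fmt.Println("COUNT(*):", countStar(rows))
	fmt.Println("COUNT(bsehr.host_id):", countHostID(rows))
}
```

With `COUNT(*)` such a batch would report one targeted host that does not exist; counting the nullable column reports zero.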

sgress454 · 2025-09-02T19:34:30Z

-    COUNT(*)                                                   AS num_targeted,
+    COUNT(bahr.host_id)                                        AS num_targeted,
    COUNT(bahr.error)                                          AS num_incompatible,
    COUNT(IF(hsr.exit_code = 0, 1, NULL))                      AS num_ran,
    COUNT(IF(hsr.exit_code > 0, 1, NULL))                      AS num_errored,
    COUNT(IF(hsr.canceled = 1 AND hsr.exit_code IS NULL, 1, NULL)) AS num_canceled
-  FROM batch_activity_host_results AS bahr
+  FROM batch_activities AS ba2
+  LEFT JOIN batch_activity_host_results AS bahr
+	  ON ba2.execution_id = bahr.batch_execution_id
  LEFT JOIN host_script_results AS hsr
-         ON bahr.host_execution_id = hsr.execution_id
-  JOIN batch_activities AS ba2
-         ON ba2.execution_id = bahr.batch_execution_id
+	  ON bahr.host_execution_id = hsr.execution_id


Similar to above, this ensures that we accurately mark a batch as completed if it has deleted hosts. This will be applied to main separately.
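As a rough illustration of why the zero-host case now resolves, here is a Go sketch of a completion check over the aggregated counts. The completion rule itself is an assumption (a batch is treated as finished when every targeted host is accounted for); the actual condition used by the completion checker is not shown in this PR. `batchCounts` and `isFinished` are hypothetical names.

```go
package main

import "fmt"

// batchCounts holds the per-batch aggregates selected by the query above.
type batchCounts struct {
	targeted     int // num_targeted
	incompatible int // num_incompatible
	ran          int // num_ran
	errored      int // num_errored
	canceled     int // num_canceled
}

// isFinished is an assumed completion rule: no targeted host is left
// without a terminal state. This is a sketch, not the rule from the PR.
func isFinished(c batchCounts) bool {
	done := c.incompatible + c.ran + c.errored + c.canceled
	return done >= c.targeted
}

func main() {
	// All hosts deleted: with batch_activities as the base table the batch
	// still produces a row, and every count is zero.
	fmt.Println("all hosts deleted:", isFinished(batchCounts{}))

	// Three hosts targeted, one has reported back.
	fmt.Println("one of three done:", isFinished(batchCounts{targeted: 3, ran: 1}))
}
```

Under the old query, which started from batch_activity_host_results, a batch with no host rows produced no aggregate row at all, so a check like this would never see it.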

jacobshandling

FE lgtm

# Details

Applying the patches from #32516 (comment) and #32563 onto `main`. This fixes:

* Batches where all remaining hosts are deleted will be correctly marked as "finished"
* Batches scheduled for the future, and then canceled, will have all hosts marked as "canceled" rather than pending

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements)
- [X] If paths of existing endpoints are modified without backwards compatibility, checked the frontend/CLI for any necessary changes

## Testing

- [X] Added/updated automated tests
- [X] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one host's records do not affect another)
- [X] QA'd all new/changed functionality manually
  - [X] Started a new batch script run with one host, deleted that host, and triggered the batch_activity_completion_checker schedule, and verified that the batch was moved to "finished".
  - [X] Scheduled script
    - Created a new batch script run scheduled for a future date
    - Canceled that batch run
    - Clicked on the batch run in the "finished" tab of the batch scripts list
    - Verified that the number of canceled hosts = the number of targeted hosts for that batch, and all other numbers were 0.
    - Verified that clicking the canceled hosts number navigated to the correct list of canceled hosts
  - [X] "Run now" script
    - Created a new batch script run with "run now"
    - Waited for at least one host to run.
    - Canceled that batch
    - Clicked on the batch run in the "finished" tab of the batch scripts list
    - Verified that the number of canceled hosts = the number that were still pending when I canceled the script, and that the # of pending hosts was 0
    - Verified that clicking the canceled hosts number navigated to the correct list of canceled hosts
  - [X] Multiple batches with the same hosts don't bleed into each other
    - Created another batch script with the same hosts, scheduled for the future
    - Verified that the "pending" host list is correct

fix issues with batch script summary

c6af1ba

sgress454 requested review from a team as code owners September 2, 2025 19:28

sgress454 temporarily deployed to Docker Hub September 2, 2025 19:28 — with GitHub Actions Inactive

sgress454 commented Sep 2, 2025

sgress454 assigned dantecatalfamo and jacobshandling Sep 2, 2025

update test

54a2dad

sgress454 temporarily deployed to Docker Hub September 2, 2025 19:42 — with GitHub Actions Inactive

sgress454 mentioned this pull request Sep 2, 2025

Fix reporting of hosts in batch script runs #32517

Merged

9 tasks

jacobshandling approved these changes Sep 2, 2025

dantecatalfamo approved these changes Sep 3, 2025

sgress454 merged commit 92fa2c3 into rc-minor-fleet-v4.73.0 Sep 3, 2025
56 of 63 checks passed

sgress454 deleted the sgress454/scheduled-batch-fixes branch September 3, 2025 14:30

sgress454 merged 2 commits into rc-minor-fleet-v4.73.0 from sgress454/scheduled-batch-fixes
