Skip to content

fix issues with batch script summary#32516

Merged
sgress454 merged 2 commits intorc-minor-fleet-v4.73.0from
sgress454/scheduled-batch-fixes
Sep 3, 2025
Merged

fix issues with batch script summary#32516
sgress454 merged 2 commits intorc-minor-fleet-v4.73.0from
sgress454/scheduled-batch-fixes

Conversation

@sgress454
Copy link
Copy Markdown
Contributor

for #32468
for #32466

Details

This PR fixes the following issues:

  1. When viewing the summary for a batch script scheduled for the future, clicking the "Pending" hosts number now shows the correct hosts targeted by the batch, rather than an empty list.
  2. When viewing the summary for a batch script that has been canceled, the number of canceled hosts will correctly be displayed in the "canceled" row rather than the "pending row".
  3. When viewing the summary for any batch script, the table now splits up Error and Incompatible states and clicking on either goes to the correct list of hosts in that state.
  4. When a batch script run has all of its targeted hosts deleted, it will now be cleaned up and marked as "finished".

The first three issues are already fixed on the main branch, hence the PR directly to the 4.73 release candidate. The last issue will be patched on main in a separate PR (with tests) immediately following this once.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)

Testing

  • Added/updated automated tests

  • Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one hosts's records do not affect another)

  • QA'd all new/changed functionality manually

    • Started a new batch script to run "now" using osquery-perf, and quit the agent halfway through the run so I could test the summary modal with hosts in different states. Verified that the host list had the correct # of hosts for each state.
    • Canceled that batch and verified that the # of hosts formerly shown as "pending" in the summary modal now appeared as "canceled", and verified that clicking that number showed the correct list of hosts.
    • Started a new batch script to run in the future, and verified that clicking the "pending" hosts number showed the correct hosts in the list
    • Started a new batch script run with one host, deleted that host, and triggered the batch_activity_completion_checker schedule, and verified that the batch was moved to "finished".

For unreleased bug fixes in a release candidate, one of:

  • Confirmed that the fix is not expected to adversely impact load test results

@codecov
Copy link
Copy Markdown

codecov Bot commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (rc-minor-fleet-v4.73.0@937e970). Learn more about missing BASE report.

Additional details and impacted files
@@                    Coverage Diff                    @@
##             rc-minor-fleet-v4.73.0   #32516   +/-   ##
=========================================================
  Coverage                          ?   62.06%           
=========================================================
  Files                             ?     1986           
  Lines                             ?   194301           
  Branches                          ?     6458           
=========================================================
  Hits                              ?   120586           
  Misses                            ?    64134           
  Partials                          ?     9581           
Flag Coverage Δ
backend 63.17% <100.00%> (?)
frontend 51.46% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines 1217 to 1231
case fleet.BatchScriptExecutionRan:
batchScriptExecutionIDFilter += ` AND hsr.exit_code = 0`
batchScriptExecutionIDFilter += ` AND hsr.exit_code = 0 AND hsr.canceled = 0`
case fleet.BatchScriptExecutionPending:
// Pending can mean "waiting for execution" or "waiting for results".
batchScriptExecutionJoin += ` LEFT JOIN upcoming_activities ua ON ua.execution_id = bsehr.host_execution_id`
batchScriptExecutionIDFilter += ` AND ((ua.execution_id IS NOT NULL) OR (hsr.host_id is NOT NULL AND hsr.exit_code IS NULL AND hsr.canceled = 0 AND bsehr.error IS NULL))`
// hsr.exit_code IS NULL <- this means the script has not reported back
// (hsr.canceled IS NULL OR hsr.canceled = 0) <- this can mean the script is running, or that it hasn't been activated yet,
// but either way we haven't canceled it.
// bsehr.error IS NULL <- this means the batch script framework didn't mark this host as incompatible
// with this script run.
batchScriptExecutionIDFilter += ` AND (hsr.exit_code IS NULL AND (hsr.canceled IS NULL OR hsr.canceled = 0) AND bsehr.error IS NULL)`
case fleet.BatchScriptExecutionErrored:
// TODO - remove exit code condition when we split up "errored" and "failed"
batchScriptExecutionIDFilter += ` AND hsr.exit_code > 0`
batchScriptExecutionIDFilter += ` AND hsr.exit_code > 0 AND hsr.canceled = 0`
case fleet.BatchScriptExecutionIncompatible:
batchScriptExecutionIDFilter += ` AND bsehr.error IS NOT NULL`
case fleet.BatchScriptExecutionCanceled:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has been moved to a separate method on main, but is otherwise the same. It fixes issues around mistakenly counting canceled hosts in other states, and fixes the definition of "pending".

Comment on lines -2218 to +2231
COUNT(*) as num_targeted,
COUNT(bsehr.host_id) as num_targeted,
COUNT(bsehr.error) as num_did_not_run,
COUNT(CASE WHEN hsr.exit_code = 0 THEN 1 END) as num_succeeded,
COUNT(CASE WHEN hsr.exit_code > 0 THEN 1 END) as num_failed,
COUNT(CASE WHEN hsr.canceled = 1 AND hsr.exit_code IS NULL THEN 1 END) as num_cancelled
FROM
batch_activity_host_results bsehr
batch_activities ba
LEFT JOIN batch_activity_host_results bsehr
ON ba.execution_id = bsehr.batch_execution_id
LEFT JOIN
host_script_results hsr
ON bsehr.host_execution_id = hsr.execution_id
WHERE
bsehr.batch_execution_id = ?`
ba.execution_id = ?`
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is no longer executed on main since we don't use this modal anymore, but is the roughly the same as this logic in the newer ListBatchScriptExecutions method. It uses batch_activities as the base table and left joins everything else, since the row representing the batch script run will still remain even if all the hosts in the run (and their related batch_activity_host_results and host_script_results records) are deleted. This lets us accurately report on the status of batches with deleted hosts.

Comment on lines -2432 to +2444
COUNT(*) AS num_targeted,
COUNT(bahr.host_id) AS num_targeted,
COUNT(bahr.error) AS num_incompatible,
COUNT(IF(hsr.exit_code = 0, 1, NULL)) AS num_ran,
COUNT(IF(hsr.exit_code > 0, 1, NULL)) AS num_errored,
COUNT(IF(hsr.canceled = 1 AND hsr.exit_code IS NULL, 1, NULL)) AS num_canceled
FROM batch_activity_host_results AS bahr
FROM batch_activities AS ba2
LEFT JOIN batch_activity_host_results AS bahr
ON ba2.execution_id = bahr.batch_execution_id
LEFT JOIN host_script_results AS hsr
ON bahr.host_execution_id = hsr.execution_id
JOIN batch_activities AS ba2
ON ba2.execution_id = bahr.batch_execution_id
ON bahr.host_execution_id = hsr.execution_id
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, this ensures that we accurately mark a batch as completed if it has deleted hosts. This will be applied to main separately.

Copy link
Copy Markdown
Contributor

@jacobshandling jacobshandling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE lgtm

@sgress454 sgress454 merged commit 92fa2c3 into rc-minor-fleet-v4.73.0 Sep 3, 2025
56 of 63 checks passed
@sgress454 sgress454 deleted the sgress454/scheduled-batch-fixes branch September 3, 2025 14:30
sgress454 added a commit that referenced this pull request Sep 4, 2025
# Details

Applying the patches from 
#32516 (comment) and
#32563
onto `main`.

This fixes:
* Batches where all remaining hosts are deleted will be correctly marked
as "finished"
* Batches scheduled for the future, and then canceled, will have al
hosts marked as "canceled" rather than pending

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [X] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- [X] If paths of existing endpoints are modified without backwards
compatibility, checked the frontend/CLI for any necessary changes

## Testing

- [X] Added/updated automated tests
- [X] Where appropriate, [automated tests simulate multiple hosts and
test for host
isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing)
(updates to one hosts's records do not affect another)

- [X] QA'd all new/changed functionality manually
- [X] Started a new batch script run with one host, deleted that host,
and triggered the batch_activity_completion_checker schedule, and
verified that the batch was moved to "finished".
  - [X] Scheduled script
    - Created a new batch script run scheduled for a future date
    - Canceled that batch run
- Clicked on the batch run in the "finished" tab of the batch scripts
list
- Verified that the number of canceled hosts = the number of targeted
hosts for that batch, and all other numbers were 0.
- Verified that clicking the canceled hosts number navigated to the
correct list of canceled hosts
  - [X] "Run now" script
    - Created a new batch script run with "run now"
    - Waited for at least one host to run.
    - Canceled that batch
- Clicked on the batch run in the "finished" tab of the batch scripts
list
- Verified that the number of canceled hosts = the number that were
still pending when I canceled the script, and that the # of pending
hosts was 0
- Verified that clicking the canceled hosts number navigated to the
correct list of canceled hosts
  - [X] Multiple batches with the same hosts don't bleed into each other
- Created another batch script with the same hosts, scheduled for the
future
    - Verified that the "pending" host list is correct
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants