Skip to content

Add retries for software installs#39827

Merged
cdcme merged 10 commits intomainfrom
feat-34068-software-install-retries
Feb 23, 2026
Merged

Add retries for software installs#39827
cdcme merged 10 commits intomainfrom
feat-34068-software-install-retries

Conversation

@cdcme
Copy link
Copy Markdown
Member

@cdcme cdcme commented Feb 13, 2026

Fixes #34068

Summary

Adds automatic retries (up to 3 attempts) for failed software installs from host details, self-service, and setup experience across all installer types.

Details

Non-policy software installs previously had no retry mechanism. A single failure required manual re-triggering. This extends the existing policy-automation retry pattern to all install paths.

Key changes:

  • New WithRetries flag on HostSoftwareInstallOptions enables attempt tracking at activation time
  • On failure, the service layer re-queues the install up to MaxSoftwareInstallAttempts (3 times)
  • Manual re-installs reset old attempt counters and cancel pending retries for a fresh sequence
  • Setup experience status updates are deferred during intermediate retry attempts
  • VPP app install retries are broadened from license-not-found errors only (VPP app install retry on 9610 #38008) to any MDM error (verification timeouts still mark the install as failed without retrying)

Expected Behavior

  • Failed installs from host details, self-service, and setup experience retry up to 3 times automatically
  • VPP app installs retry on any MDM command error (up to 3 times); verification timeouts don't trigger retries
  • Manual re-triggers clear old retry state and start fresh
  • Policy automation retries are unchanged
  • Activity feed entries are created for each attempt

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 73.45133% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.32%. Comparing base (6f52646) to head (40bfec5).
⚠️ Report is 281 commits behind head on main.

Files with missing lines Patch % Lines
server/service/orbit.go 66.66% 9 Missing and 5 partials ⚠️
server/datastore/mysql/software_installers.go 79.41% 4 Missing and 3 partials ⚠️
ee/server/service/software_installers.go 16.66% 4 Missing and 1 partial ⚠️
server/service/apple_mdm.go 60.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #39827      +/-   ##
==========================================
+ Coverage   66.26%   66.32%   +0.06%     
==========================================
  Files        2438     2447       +9     
  Lines      195266   196546    +1280     
  Branches     8643     8643              
==========================================
+ Hits       129386   130366     +980     
- Misses      54167    54362     +195     
- Partials    11713    11818     +105     
Flag Coverage Δ
backend 68.12% <73.45%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cdcme cdcme marked this pull request as ready for review February 13, 2026 23:12
@cdcme cdcme requested a review from a team as a code owner February 13, 2026 23:12
@mna
Copy link
Copy Markdown
Member

mna commented Feb 16, 2026

Arrrrrrghhhh I had a somewhat completed review and lost all comments when I switched back to conversation and back again to files changed...

@mna
Copy link
Copy Markdown
Member

mna commented Feb 16, 2026

Oh no wait, I can go back to it, it just started a new review but the pending one is still available... New UX I think?

Copy link
Copy Markdown
Member

@mna mna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a first look without reviewing the tests, left some comments. My main concern is with the unified queue, to ensure we don't introduce scenarios where the queue gets stuck.

I'll take a full look (including tests) on my next review.

Comment thread ee/server/service/software_installers.go Outdated
Comment thread ee/server/service/software_installers.go Outdated
Comment thread server/datastore/mysql/software_installers.go Outdated
Comment thread server/service/apple_mdm_cmd_results.go Outdated
Comment thread server/service/orbit.go
Comment thread server/datastore/mysql/activities.go
Comment thread server/datastore/mysql/software.go Outdated
Comment thread server/datastore/mysql/software_installers.go Outdated
Comment thread server/datastore/mysql/software_installers.go Outdated
@cdcme cdcme marked this pull request as draft February 17, 2026 14:12
@cdcme cdcme marked this pull request as ready for review February 18, 2026 15:08
@cdcme
Copy link
Copy Markdown
Member Author

cdcme commented Feb 18, 2026

@mna I think I've addressed your feedback either with changes or by adding comments. Please lmk if anything else jumps out at you, and ty!

@cdcme cdcme requested a review from mna February 18, 2026 15:08
@mna
Copy link
Copy Markdown
Member

mna commented Feb 18, 2026

@cdcme ty for the ping, I'll take another look this PM!

Copy link
Copy Markdown
Member

@mna mna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a minor thing and the reuse of the cancel activity logic. Let me know if you'd prefer to address this in a subsequent PR, if so I'll approve this one.

Comment thread server/datastore/mysql/software_installers.go Outdated
Comment thread server/service/orbit.go Outdated
Comment thread changes/34068-software-install-retries Outdated

_, err := svc.ds.InsertSoftwareInstallRequest(ctx, host.ID, installer.InstallerID, fleet.HostSoftwareInstallOptions{
SelfService: false,
WithRetries: true,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at other usages here, it looks like we manage retries three different ways:

  1. This (manually initiated installs)
  2. Setup experience software retries (already existed prior)
  3. Policy failure software install retries (already existed prior)

The latter two don't use the retry mechanism implemented here because they perform retries at a higher level of abstraction, right?

I'm pretty sure I saw the setup experience retries ticket come across a bit ago but don't recall when policy failure based retries happened. Can you link me the ticket for that one?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Yes, broadly speaking, I think that's right. Are you thinking of PR #38018?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's what I'm thinking of for policies. Setup experience was done earlier.

@iansltx
Copy link
Copy Markdown
Contributor

iansltx commented Feb 20, 2026

Deferring back to @mna for the last review pass since he's starting at 90% context here rather than 20%, so this'll get merged quicker (on Monday) that way.

Copy link
Copy Markdown
Member

@mna mna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cdcme cdcme merged commit e070072 into main Feb 23, 2026
48 checks passed
@cdcme cdcme deleted the feat-34068-software-install-retries branch February 23, 2026 13:48
iansltx pushed a commit that referenced this pull request Feb 24, 2026
Fixes #34068 Adds automatic retries (up to 3 attempts) for failed software installs from host details, self-service, and setup experience across all installer types.
iansltx added a commit that referenced this pull request Feb 24, 2026
Merged into `main` in #39827.

---------

Co-authored-by: Carlo <1778532+cdcme@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Host details and Self-service: Add retries to software installs

3 participants