Skip to content

Detect Ubuntu 24.04+ and disable apt-daily auto-updates at VC startup#698

Open
AlexWFMS wants to merge 2 commits intomainfrom
users/alexwill/apt-daily-mask-followup
Open

Detect Ubuntu 24.04+ and disable apt-daily auto-updates at VC startup#698
AlexWFMS wants to merge 2 commits intomainfrom
users/alexwill/apt-daily-mask-followup

Conversation

@AlexWFMS
Copy link
Copy Markdown
Contributor

@AlexWFMS AlexWFMS commented Apr 24, 2026

Summary

Defends Virtual Client against mid-run SIGKILL caused by apt-daily-upgrade.timer + needrestart on Ubuntu 24.04+. The mitigation is applied at VC startup for every profile automatically, superseding the per-profile MaskAptDailyTimers step added in #694.

Background

On Ubuntu 24.04+, needrestart (installed by default) is configured to auto-restart services whose shared libraries are updated by unattended upgrades. Combined with apt-daily-upgrade.timer firing between 06:00-07:00 UTC (RandomizedDelaySec=60min), any long-running Virtual Client process gets SIGKILL'd mid-run.

CRC telemetry showed this affecting ~29% of Ubuntu 24.04 VMs, with the restart distribution concentrated at hour 6 UTC and uniform across the 0-59 minute window — the exact apt timer signature. The issue is profile-agnostic (observed on SPEC CPU AND FIO workloads) which is why a per-profile mitigation is insufficient.

Distro matrix (from documented defaults):

Distro needrestart default Auto-restart on unattended-upgrade apt-daily-upgrade.timer
Ubuntu 24.04+ yes yes (default — the regression) enabled
Ubuntu 22.04 / 22.10 / 23.04 yes (server) no (list-only) enabled
Debian 11 / 12 no n/a enabled

Only Ubuntu 24.04+ is strictly affected, so the gate is scoped to that.

What changed

ExecuteProfileCommand.cs

  • New DisableLinuxAutoUpdatesAsync hook called once per VC invocation, right after Platform.Initialize.
  • Skipped on Windows and on any non-Ubuntu Linux.
  • On Ubuntu, parses the major version from PRETTY_NAME (via the new TryGetUbuntuMajorVersion helper) and only runs when >= 24.
  • Masks and stops the four apt-daily units with a bash -c "..." one-liner. Uses double-quote grouping.NET Process tokenization follows CommandLineToArgvW rules where single quotes pass through literally (this is the same bug the CRC team reported against Mask apt-daily timers in SPEC CPU profiles to prevent VC restarts on Ubuntu 24.04 #694 — the quoting is now fixed structurally and covered by a code-level helper).
  • Best-effort: every exception is caught and logged. VC startup is never blocked by this mitigation.
  • Emits telemetry: DisabledLinuxAutoUpdates (success) / DisableLinuxAutoUpdatesFailed (warning).

Profile cleanup
Removed the now-redundant per-profile MaskAptDailyTimers step from:

  • PERF-SPECCPU-FPRATE.json
  • PERF-SPECCPU-FPSPEED.json
  • PERF-SPECCPU-INTRATE.json
  • PERF-SPECCPU-INTSPEED.json
  • PERF-IO-FIO.json
  • PERF-IO-FIO-OLTP.json

Version
Bumped VERSION to 3.1.3.

Testing

  • Added 9 parameterized TryGetUbuntuMajorVersion cases covering Ubuntu 24.04.1 LTS, 22.04.3 LTS, 25.10, lowercase variants, Debian names, codename-only Ubuntu (development), empty, and null.
  • dotnet test VirtualClient.UnitTests --filter ExecuteProfileCommandTests: 24/24 pass.
  • Full build: dotnet build VirtualClient.Main -c Release succeeds with 0 warnings / 0 errors.

Risk

Low. The mitigation is gated on Ubuntu ≥ 24.04 and wrapped in a single try/catch that swallows everything. Worst case on an unexpected system: a single warning-level telemetry event, then normal startup proceeds.

Follow-ups (not in this PR)

  • If telemetry shows the helper firing cleanly across the fleet, we could also extend coverage to Ubuntu 22.04 as a defense-in-depth measure, but current field data does not justify it.
  • Could be surfaced as an --apt-daily-mask=false opt-out flag if any user objects; deliberately omitted for now since the default-on is the desired behavior.

Alex Williams-Ferreira and others added 2 commits April 24, 2026 11:54
Follow-up to #694.

Two issues reported by the CRC QoS team after the initial change merged:

1. Quoting bug: the Command parameter used single quotes to group the bash
   subcommand:
       bash -c 'systemctl mask ...'
   VC's ExecuteCommand splits the first whitespace-delimited token (bash)
   as the executable and passes the remainder to .NET Process.Start as the
   Arguments string. .NET's argument tokenizer (on both Linux and Windows)
   treats double quotes as the grouping character and does NOT strip single
   quotes. As a result bash received argv[2] = 'systemctl literally, which
   caused it to fail with a command not found style error. Replaced with
   escaped double quotes:
       bash -c "systemctl mask ..."

   The CRC team validated this form locally and has a Juno experiment running
   against a profile with this corrected syntax.

2. Scope: the mid-run VC restart on Ubuntu 24.04 is not SPEC-specific. The
   team explicitly reported reproducing it with FIO in experiment
   SYSAUTO-CHPI_MWH25PrdApp17_Fio_PV2_all_sizes_ga_version_Standard_M176bs_v3-
   20260421135821. Added the same MaskAptDailyTimers dependency step to
   PERF-IO-FIO.json and PERF-IO-FIO-OLTP.json.

Broader coverage across all Linux perf profiles (or a code-level startup hook
in ExecuteProfileCommand so no profile opt-in is needed) can follow after
further review discussion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
-------
On Ubuntu 24.04+, the default installation of `needrestart` combined with the
`apt-daily-upgrade.timer` (fires daily 06:00-07:00 UTC with
RandomizedDelaySec=60min) automatically restarts any service whose shared
libraries are updated by unattended upgrades. For long-running Virtual
Client workloads this manifests as VC being SIGKILL'd mid-run - observed
at ~29% of VMs in CRC SYSAUTO experiments, concentrated at hour 6 UTC
(uniform 0-59 minute distribution, matching the apt timer signature).

Web / distro research confirmed this behavior is specific to Ubuntu 24.04+:
- Ubuntu 24.04+: needrestart installed AND auto-restart-on-unattended-upgrade
  is the default (this is the regression).
- Ubuntu 22.04/22.10/23.04: needrestart installed but list-only in
  non-interactive mode; not known to cause the issue in the field.
- Debian 11/12: needrestart not installed by default.

Fix
---
Add a best-effort startup hook in ExecuteProfileCommand that:
- runs exactly once per VC invocation, immediately after Platform.Initialize
- is a no-op on non-Unix platforms
- parses the Ubuntu major version out of PRETTY_NAME and only runs on >=24
- masks + stops the four apt-daily units via `bash -c "..."` (double-quoted
  because .NET Process argument tokenization follows Windows
  CommandLineToArgvW rules - single quotes do not group)
- swallows any exception so VC startup is never blocked by this mitigation
- logs telemetry (DisabledLinuxAutoUpdates / DisableLinuxAutoUpdatesFailed)

Because the mitigation now runs unconditionally for every profile on the
affected OS, the per-profile MaskAptDailyTimers step added to the SPEC CPU
and FIO profiles in PR #694 is redundant and has been removed.

Bumped VERSION to 3.1.3.

Tests
-----
Added parameterized coverage for TryGetUbuntuMajorVersion (9 cases, all
passing). ExecuteProfileCommandTests: 24/24 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant