
use redis to block double profile work for apple devices setting up #42421

Merged

MagnusHJensen merged 12 commits into main from 34433-add-redis-to-block-double-profile-work on Mar 30, 2026


@MagnusHJensen (Member) commented Mar 25, 2026

Related issue: Resolves #34433 Part 2

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information. (Added by the first PR.)

  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements), JS inline code is prevented especially for url redirects, and untrusted data interpolated into shell scripts/commands is validated against shell metacharacters.

  • If paths of existing endpoints are modified without backwards compatibility, checked the frontend/CLI for any necessary changes

Testing

  • Added/updated automated tests
  • QA'd all new/changed functionality manually

Summary by CodeRabbit

  • New Features

    • Profiles now install during device enrollment setup
  • Bug Fixes

    • Enhanced Apple MDM profile synchronization to handle concurrent processing scenarios
    • Improved profile reconciliation to prevent conflicts when multiple workers process the same device simultaneously

@codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 72.00000% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.70%. Comparing base (8a70c44) to head (b2878fa).
⚠️ Report is 136 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/service/redis_key_value/redis_key_value.go | 65.51% | 5 Missing and 5 partials ⚠️ |
| server/service/apple_mdm.go | 82.92% | 4 Missing and 3 partials ⚠️ |
| server/worker/apple_mdm.go | 33.33% | 1 Missing and 1 partial ⚠️ |
| cmd/fleet/cron.go | 0.00% | 1 Missing ⚠️ |
| cmd/fleet/serve.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #42421      +/-   ##
==========================================
+ Coverage   66.51%   66.70%   +0.19%     
==========================================
  Files        2528     2532       +4     
  Lines      202790   203448     +658     
  Branches     9025     9025              
==========================================
+ Hits       134878   135714     +836     
+ Misses      55728    55436     -292     
- Partials    12184    12298     +114     
| Flag | Coverage Δ |
|---|---|
| backend | 68.53% <72.00%> (+0.22%) ⬆️ |

Flags with carried forward coverage won't be shown.


@MagnusHJensen MagnusHJensen marked this pull request as ready for review March 26, 2026 18:03
@MagnusHJensen MagnusHJensen requested a review from a team as a code owner March 26, 2026 18:03
@claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.


@MagnusHJensen (Member, Author)

@coderabbitai review

@coderabbitai (Contributor) Bot commented Mar 26, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Contributor) Bot commented Mar 26, 2026

Walkthrough

This PR introduces Redis-backed processing markers for Apple MDM profile installation. A new AdvancedKeyValueStore interface is added to extend key-value store capabilities with batch retrieval (MGet) and deletion operations. Two new constants define a Redis key prefix and TTL (1 minute) for marking hosts currently undergoing profile reconciliation. The ReconcileAppleProfiles service method now accepts this key-value store as a parameter and skips profile reconciliation for any host with an active processing marker. During MDM checkin and token updates, the corresponding Redis keys are set and deleted to coordinate between the post-enrollment worker and the reconciliation scheduler. Dependency wiring is updated across server initialization, scheduled jobs, and service constructors. Test coverage includes mock implementations and integration scenarios validating the skip behavior.

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: using Redis to prevent duplicate profile work for Apple devices during setup. |
| Description check | ✅ Passed | The description references the related issue (#34433 Part 2) and includes most required checklist items, though it lacks a changes file entry (noted as added by first PR) and omits the database schema migration and QA sections. |
| Linked Issues check | ✅ Passed | The PR comprehensively implements the objectives from issue #34433: using Redis as a lock mechanism to prevent duplicate profile work, calling ReconcileAppleProfiles with redisKeyValue, setting profile processing keys during enrollment, skipping reconciliation for hosts being processed, and adding extensive test coverage including the new TestReconcileAppleProfilesSkipsHostBeingProcessed. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the PR objectives: adding Redis-backed key-value storage for blocking double profile work, implementing profile processing markers, updating reconciler logic to skip processing hosts under lock, and adding comprehensive tests. |


@coderabbitai (Contributor) Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (4)
server/service/integration_mdm_test.go (1)

12168-12178: Use the host UUID when clearing the processing key here too.

The Redis marker is host-scoped, and the other new tests in this file delete it with host.UUID. This case uses macDevice.UUID, which makes the test depend on createHostThenEnrollMDM returning identical host/device UUIDs. If those ever diverge, this delete becomes a no-op and the test stops exercising the unlocked reconciler path.

♻️ Suggested cleanup
-	_, macDevice := createHostThenEnrollMDM(s.ds, s.server.URL, t)
+	macHost, macDevice := createHostThenEnrollMDM(s.ds, s.server.URL, t)
@@
-	err := kv.Delete(ctx, fleet.MDMProfileProcessingKeyPrefix+":"+macDevice.UUID)
+	err := kv.Delete(ctx, fleet.MDMProfileProcessingKeyPrefix+":"+macHost.UUID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/service/integration_mdm_test.go` around lines 12168 - 12178, The test
currently discards the host returned by createHostThenEnrollMDM and deletes the
Redis marker using macDevice.UUID, which can be incorrect; change the call to
capture the host (host, macDevice := createHostThenEnrollMDM(...)) and use
host.UUID when building the key passed to kv.Delete with
fleet.MDMProfileProcessingKeyPrefix so the host-scoped processing marker is
cleared reliably.
server/service/apple_mdm_test.go (2)

3915-3920: Snapshot the first upsert payload before asserting on it.

upsertedProfiles = payload keeps the live slice from the mock call. Because this test inspects it after ReconcileAppleProfiles returns, any later payload reuse/mutation inside the reconciler can change what the test observes.

♻️ Suggested change
	ds.BulkUpsertMDMAppleHostProfilesFunc = func(ctx context.Context, payload []*fleet.MDMAppleBulkUpsertHostProfilePayload) error {
		bulkUpsertCallCount++
		if bulkUpsertCallCount == 1 {
-			upsertedProfiles = payload
+			upsertedProfiles = make([]*fleet.MDMAppleBulkUpsertHostProfilePayload, len(payload))
+			for i, p := range payload {
+				cp := *p
+				upsertedProfiles[i] = &cp
+			}
		}
		return nil
	}
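Why the snapshot matters can be shown in a few lines of Go: a slice assigned directly still points at the same structs the reconciler may keep mutating, while a per-element copy does not. `payload` and `snapshot` are hypothetical stand-ins for `fleet.MDMAppleBulkUpsertHostProfilePayload` and the copy loop in the suggestion.

```go
package main

// payload is a hypothetical stand-in for fleet.MDMAppleBulkUpsertHostProfilePayload.
type payload struct {
	HostUUID string
	Status   string
}

// snapshot copies each element so later mutations by the caller are not observed.
func snapshot(in []*payload) []*payload {
	out := make([]*payload, len(in))
	for i, p := range in {
		cp := *p // shallow copy: pointer fields, if any, would still be shared
		out[i] = &cp
	}
	return out
}
```

Like the suggested change, `snapshot` copies each struct shallowly; if the real payload has pointer fields that the reconciler writes through, those would still be shared and would need a deeper copy.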
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/service/apple_mdm_test.go` around lines 3915 - 3920, The test
currently assigns the live slice from the mock into upsertedProfiles in
BulkUpsertMDMAppleHostProfilesFunc which allows later mutations in
ReconcileAppleProfiles to change the observed value; change the assignment in
the mock (BulkUpsertMDMAppleHostProfilesFunc) to snapshot/copy the payload
(e.g., create a new slice and copy elements into it or deep-copy each payload)
so upsertedProfiles holds an immutable snapshot for assertions after
ReconcileAppleProfiles returns.

3855-3859: Exercise the remove path in this Redis-lock regression test.

This case only queues installs. ReconcileAppleProfiles now applies the processing-marker check across the reconciliation pass, so a regression that still emits RemoveProfile work for a blocked host would pass here unnoticed. Add at least one removal for blockedHostUUID and assert it stays suppressed until the key is cleared.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/service/apple_mdm_test.go` around lines 3855 - 3859, The test
currently only returns install-only profiles from the mocked
ListMDMAppleProfilesToInstallAndRemoveFunc; modify that mock to include at least
one removal payload for the blockedHostUUID (i.e., return a slice in the
"remove" position that contains a fleet.MDMAppleProfilePayload with HostUUID:
blockedHostUUID and same ProfileUUID/Identifier), then update the test to assert
that ReconcileAppleProfiles (and the worker queue output) does not emit a
RemoveProfile work item for that blockedHostUUID while the processing-marker key
is present and only emits it after the key is cleared; reference the mocked
function ListMDMAppleProfilesToInstallAndRemoveFunc, the ReconcileAppleProfiles
flow, and the RemoveProfile work item and blockedHostUUID when adding the
removal case and assertions.
server/service/integration_mdm_profiles_test.go (1)

37-37: Reuse the suite KV store instead of creating another Redis wrapper.

The rest of this file already goes through s.keyValueStore. Reusing it here keeps the test aligned with the suite’s actual wiring and avoids a second code path to maintain.

♻️ Suggested simplification
-	"github.com/fleetdm/fleet/v4/server/service/redis_key_value"
@@
-	kv := redis_key_value.New(s.redisPool)
@@
-	require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, kv, s.logger, 0))
+	require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, s.keyValueStore, s.logger, 0))
@@
-	require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, kv, s.logger, 0))
+	require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, s.keyValueStore, s.logger, 0))

Also applies to: 5236-5236, 5278-5278, 5358-5358

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/service/integration_mdm_profiles_test.go` at line 37, Replace the
test-local Redis wrapper usage with the suite's existing key-value store: remove
the direct creation of a redis_key_value client (e.g., calls to
redis_key_value.NewRedisKeyValueStore / new Redis wrapper variables) and use
s.keyValueStore everywhere in this test file (and at the other referenced spots)
so the test exercises the suite wiring; ensure any helper calls or assertions
that expected the local wrapper now reference s.keyValueStore methods and adjust
imports to drop redis_key_value if unused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/service/apple_mdm.go`:
- Around line 5256-5291: The current loop that checks redis for in-progress
setups mutates installTargets and hostProfiles before you finish aggregating
targets, which causes dropping install commands for other hosts that share a
ProfileUUID and fails to remove skipped hosts from removeTargets; instead, first
build a skip set of host UUIDs (using the same redisKeyValue.MGet logic and
fleet.MDMProfileProcessingKeyPrefix) and log/skipped UUIDs, then when you later
populate installTargets and removeTargets (and
hostProfilesToInstallMap/hostProfilesToRemoveMap if present) simply exclude any
host whose HostUUID is in that skip set; ensure you still clear
hp.Status/CommandUUID and add to hostProfilesToInstallMap only when you actually
intend to skip enqueueing, and do not delete from installTargets/removeTargets
during the redis-check step.

In `@server/service/integration_mdm_profiles_test.go`:
- Around line 190-192: The TokenUpdate/post-enrollment path skips synchronous
profile creation because ensureFleetProfiles is only run by the scheduled
ReconcileAppleProfiles; update the TokenUpdate flow (mdmLifecycle.Do →
turnOnApple or right after enqueueing the post-enrollment worker) to call
ensureFleetProfiles (or invoke the same reconciliation handler used by
ReconcileAppleProfiles) synchronously so new enrollments receive profiles
immediately; update/remove the awaitTriggerProfileSchedule() test workaround and
ensure tests assert that ensureFleetProfiles/its handler was invoked during the
TokenUpdate flow.

In `@server/service/integration_mdm_test.go`:
- Around line 398-403: The test's shared callback onAppleMDMWorkerScheduleDone
can be invoked by a prior schedule run and cause the waiter to spuriously
release or drive a WaitGroup negative; replace this fragile callback-based
coordination by using atomic counters (e.g. add appleMDMWorkerStarted and
appleMDMWorkerFinished as atomic.Int64 on integrationMDMTestSuite), remove
onAppleMDMWorkerScheduleDone, and change awaitRunAppleMDMWorkerSchedule to
snapshot the current started/finished counts, call Trigger() (or start the run)
and then wait for the finished counter to increase past the snapshot (or for
started->finished delta) instead of relying on the shared callback or WaitGroup
from ProcessJobs; ensure ProcessJobs still increments the started/finished
atomics so the waiter only observes state changes from the new run and cannot be
released by prior retries.

In `@server/service/redis_key_value/redis_key_value.go`:
- Around line 60-83: In RedisKeyValue.MGet replace redigo.Strings with
redigo.Values to preserve nil bulk replies, early-return an empty map when keys
is empty, call conn.Do("MGET", redisKeys...) and cast the result via
redigo.Values, then iterate the returned []interface{}: if value == nil set
result[key] = nil, else convert the []byte to string (or use redigo.String if
preferred) and take its address into result[key]; keep the existing ctxerr.Wrap
on errors and ensure you still build redisKeys using r.testPrefix + prefix +
key.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 660d882f-8002-4d54-8283-200bf3439bcf

📥 Commits

Reviewing files that changed from the base of the PR and between c6538bd and 5820800.

📒 Files selected for processing (13)
  • cmd/fleet/cron.go
  • cmd/fleet/cron_test.go
  • cmd/fleet/serve.go
  • server/fleet/mdm.go
  • server/fleet/service.go
  • server/mock/redis_advanced/advanced_key_value_store.go
  • server/mock/service.go
  • server/service/apple_mdm.go
  • server/service/apple_mdm_test.go
  • server/service/integration_mdm_profiles_test.go
  • server/service/integration_mdm_test.go
  • server/service/redis_key_value/redis_key_value.go
  • server/worker/apple_mdm.go

Comment thread server/service/apple_mdm.go
Comment thread server/service/integration_mdm_profiles_test.go
Comment thread server/service/integration_mdm_test.go
Comment thread server/service/redis_key_value/redis_key_value.go
Comment thread server/service/integration_mdm_lifecycle_test.go Outdated
@mna (Member) left a comment

I did a review focused on the Redis usage (not a full review as Sarah already did that). The main thing is that this key pattern will result in always using the same node in cluster mode, instead of balancing the load over all nodes:

key_value_{mdm_profile_processing}:<hostUUID>

Basically, the part of the key that gets hashed is what is inside the {}. Usually we put a dynamic part there (e.g. the host ID) so that the load is distributed across the cluster, but with a static part used as the hash tag, this key pattern is effectively a single Redis hash (in the sense of the map data structure) with host UUIDs as fields.
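To see why the static hash tag pins every marker to one node, here is a small Go sketch of the Redis Cluster key-to-slot rule: CRC16 (XMODEM variant) of the key, or of the `{...}` hash tag if one is present, modulo 16384. It follows the Redis Cluster specification rather than any Fleet or redigo code; `crc16` and `hashSlot` are illustrative names.

```go
package main

import "strings"

// crc16 implements CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot applies the hash tag rule: if there is at least one character between
// the first "{" and the next "}", only that substring is hashed.
func hashSlot(key string) uint16 {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return crc16([]byte(key)) & 16383
}
```

Every `key_value_{mdm_profile_processing}:<hostUUID>` key lands in the same slot as the bare string `mdm_profile_processing`. That is also what makes a single MGET over them legal in cluster mode, since multi-key commands spanning slots fail with a CROSSSLOT error.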

I understand why: in ReconcileAppleProfiles you want to batch-load a bunch of those keys, so they need to live on the same node. There are ways to alleviate this, e.g. a key pattern that distributes the hosts across, say, 5 or 10 different hashes; when reading them, you'd then need "intelligent batches" instead of just 1000 random hosts at a time. But that adds complexity, and it's not clear it's even needed (maybe if thousands of hosts go through the setup experience at the same time?). It also would not guarantee that the computed slots actually end up on different nodes!

So I think it's fine as-is, with that caveat in mind. If we do find that the single redis node is struggling in load tests, we can do the more complex load distribution.

Comment thread server/fleet/mdm.go Outdated
Co-authored-by: Martin Angers <martin.n.angers@gmail.com>
@MagnusHJensen (Member, Author)

> I did a review focused on the Redis usage (not a full review as Sarah already did that). The main thing is that this key pattern will result in always using the same node in cluster mode, instead of balancing the load over all nodes:

Thanks @mna, this is definitely the piece of code that needs the most review here. I think we are okay with this approach, but to make sure, I updated the original ticket to call out load testing with 10k Apple MDM hosts and verifying the Redis node load, to ensure it's not too bad.

I opted for the simple approach of hashing everything to the same node, for simplicity's sake. If it turns out to cause issues, we can look into a more intelligent approach for distributing the keys.

@MagnusHJensen MagnusHJensen merged commit 16d62da into main Mar 30, 2026
51 checks passed
@MagnusHJensen MagnusHJensen deleted the 34433-add-redis-to-block-double-profile-work branch March 30, 2026 21:37

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Speed up macOS profile delivery during automated enrollments

3 participants