use redis to block double profile work for apple devices setting up #42421
MagnusHJensen merged 12 commits into main
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #42421 +/- ##
==========================================
+ Coverage 66.51% 66.70% +0.19%
==========================================
Files 2528 2532 +4
Lines 202790 203448 +658
Branches 9025 9025
==========================================
+ Hits 134878 135714 +836
+ Misses 55728 55436 -292
- Partials 12184 12298 +114
Flags with carried forward coverage won't be shown.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review.
Tip: disable this comment in your organization's Code Review settings.
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough
This PR introduces Redis-backed processing markers for Apple MDM profile installation.
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🧹 Nitpick comments (4)
server/service/integration_mdm_test.go (1)
12168-12178: Use the host UUID when clearing the processing key here too.The Redis marker is host-scoped, and the other new tests in this file delete it with
host.UUID. This case usesmacDevice.UUID, which makes the test depend oncreateHostThenEnrollMDMreturning identical host/device UUIDs. If those ever diverge, this delete becomes a no-op and the test stops exercising the unlocked reconciler path.♻️ Suggested cleanup
- _, macDevice := createHostThenEnrollMDM(s.ds, s.server.URL, t) + macHost, macDevice := createHostThenEnrollMDM(s.ds, s.server.URL, t) @@ - err := kv.Delete(ctx, fleet.MDMProfileProcessingKeyPrefix+":"+macDevice.UUID) + err := kv.Delete(ctx, fleet.MDMProfileProcessingKeyPrefix+":"+macHost.UUID)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/service/integration_mdm_test.go` around lines 12168 - 12178: The test currently discards the host returned by createHostThenEnrollMDM and deletes the Redis marker using macDevice.UUID, which can be incorrect; change the call to capture the host (host, macDevice := createHostThenEnrollMDM(...)) and use host.UUID when building the key passed to kv.Delete with fleet.MDMProfileProcessingKeyPrefix so the host-scoped processing marker is cleared reliably.
server/service/apple_mdm_test.go (2)
3915-3920: Snapshot the first upsert payload before asserting on it.
`upsertedProfiles = payload` keeps the live slice from the mock call. Because this test inspects it after `ReconcileAppleProfiles` returns, any later payload reuse/mutation inside the reconciler can change what the test observes.
♻️ Suggested change
ds.BulkUpsertMDMAppleHostProfilesFunc = func(ctx context.Context, payload []*fleet.MDMAppleBulkUpsertHostProfilePayload) error {
    bulkUpsertCallCount++
    if bulkUpsertCallCount == 1 {
-       upsertedProfiles = payload
+       upsertedProfiles = make([]*fleet.MDMAppleBulkUpsertHostProfilePayload, len(payload))
+       for i, p := range payload {
+           cp := *p
+           upsertedProfiles[i] = &cp
+       }
    }
    return nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/service/apple_mdm_test.go` around lines 3915 - 3920, The test currently assigns the live slice from the mock into upsertedProfiles in BulkUpsertMDMAppleHostProfilesFunc which allows later mutations in ReconcileAppleProfiles to change the observed value; change the assignment in the mock (BulkUpsertMDMAppleHostProfilesFunc) to snapshot/copy the payload (e.g., create a new slice and copy elements into it or deep-copy each payload) so upsertedProfiles holds an immutable snapshot for assertions after ReconcileAppleProfiles returns.
3855-3859: Exercise the remove path in this Redis-lock regression test.
This case only queues installs. `ReconcileAppleProfiles` now applies the processing-marker check across the reconciliation pass, so a regression that still emits `RemoveProfile` work for a blocked host would pass here unnoticed. Add at least one removal for `blockedHostUUID` and assert it stays suppressed until the key is cleared.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/service/apple_mdm_test.go` around lines 3855 - 3859: The test currently only returns install-only profiles from the mocked ListMDMAppleProfilesToInstallAndRemoveFunc; modify that mock to include at least one removal payload for the blockedHostUUID (i.e., return a slice in the "remove" position that contains a fleet.MDMAppleProfilePayload with HostUUID: blockedHostUUID and same ProfileUUID/Identifier), then update the test to assert that ReconcileAppleProfiles (and the worker queue output) does not emit a RemoveProfile work item for that blockedHostUUID while the processing-marker key is present and only emits it after the key is cleared; reference the mocked function ListMDMAppleProfilesToInstallAndRemoveFunc, the ReconcileAppleProfiles flow, and the RemoveProfile work item and blockedHostUUID when adding the removal case and assertions.
server/service/integration_mdm_profiles_test.go (1)
37-37: Reuse the suite KV store instead of creating another Redis wrapper.
The rest of this file already goes through `s.keyValueStore`. Reusing it here keeps the test aligned with the suite's actual wiring and avoids a second code path to maintain.
♻️ Suggested simplification
- "github.com/fleetdm/fleet/v4/server/service/redis_key_value"
@@
- kv := redis_key_value.New(s.redisPool)
@@
- require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, kv, s.logger, 0))
+ require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, s.keyValueStore, s.logger, 0))
@@
- require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, kv, s.logger, 0))
+ require.NoError(t, ReconcileAppleProfiles(ctx, s.ds, s.mdmCommander, s.keyValueStore, s.logger, 0))
Also applies to: 5236-5236, 5278-5278, 5358-5358
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/service/integration_mdm_profiles_test.go` at line 37, Replace the test-local Redis wrapper usage with the suite's existing key-value store: remove the direct creation of a redis_key_value client (e.g., calls to redis_key_value.NewRedisKeyValueStore / new Redis wrapper variables) and use s.keyValueStore everywhere in this test file (and at the other referenced spots) so the test exercises the suite wiring; ensure any helper calls or assertions that expected the local wrapper now reference s.keyValueStore methods and adjust imports to drop redis_key_value if unused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/service/apple_mdm.go`:
- Around line 5256-5291: The current loop that checks redis for in-progress
setups mutates installTargets and hostProfiles before you finish aggregating
targets, which causes dropping install commands for other hosts that share a
ProfileUUID and fails to remove skipped hosts from removeTargets; instead, first
build a skip set of host UUIDs (using the same redisKeyValue.MGet logic and
fleet.MDMProfileProcessingKeyPrefix) and log/skipped UUIDs, then when you later
populate installTargets and removeTargets (and
hostProfilesToInstallMap/hostProfilesToRemoveMap if present) simply exclude any
host whose HostUUID is in that skip set; ensure you still clear
hp.Status/CommandUUID and add to hostProfilesToInstallMap only when you actually
intend to skip enqueueing, and do not delete from installTargets/removeTargets
during the redis-check step.
In `@server/service/integration_mdm_profiles_test.go`:
- Around line 190-192: The TokenUpdate/post-enrollment path skips synchronous
profile creation because ensureFleetProfiles is only run by the scheduled
ReconcileAppleProfiles; update the TokenUpdate flow (mdmLifecycle.Do →
turnOnApple or right after enqueueing the post-enrollment worker) to call
ensureFleetProfiles (or invoke the same reconciliation handler used by
ReconcileAppleProfiles) synchronously so new enrollments receive profiles
immediately; update/remove the awaitTriggerProfileSchedule() test workaround and
ensure tests assert that ensureFleetProfiles/its handler was invoked during the
TokenUpdate flow.
In `@server/service/integration_mdm_test.go`:
- Around line 398-403: The test's shared callback onAppleMDMWorkerScheduleDone
can be invoked by a prior schedule run and cause the waiter to spuriously
release or drive a WaitGroup negative; replace this fragile callback-based
coordination by using atomic counters (e.g. add appleMDMWorkerStarted and
appleMDMWorkerFinished as atomic.Int64 on integrationMDMTestSuite), remove
onAppleMDMWorkerScheduleDone, and change awaitRunAppleMDMWorkerSchedule to
snapshot the current started/finished counts, call Trigger() (or start the run)
and then wait for the finished counter to increase past the snapshot (or for
started->finished delta) instead of relying on the shared callback or WaitGroup
from ProcessJobs; ensure ProcessJobs still increments the started/finished
atomics so the waiter only observes state changes from the new run and cannot be
released by prior retries.
In `@server/service/redis_key_value/redis_key_value.go`:
- Around line 60-83: In RedisKeyValue.MGet replace redigo.Strings with
redigo.Values to preserve nil bulk replies, early-return an empty map when keys
is empty, call conn.Do("MGET", redisKeys...) and cast the result via
redigo.Values, then iterate the returned []interface{}: if value == nil set
result[key] = nil, else convert the []byte to string (or use redigo.String if
preferred) and take its address into result[key]; keep the existing ctxerr.Wrap
on errors and ensure you still build redisKeys using r.testPrefix + prefix +
key.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 660d882f-8002-4d54-8283-200bf3439bcf
📒 Files selected for processing (13)
cmd/fleet/cron.go, cmd/fleet/cron_test.go, cmd/fleet/serve.go, server/fleet/mdm.go, server/fleet/service.go, server/mock/redis_advanced/advanced_key_value_store.go, server/mock/service.go, server/service/apple_mdm.go, server/service/apple_mdm_test.go, server/service/integration_mdm_profiles_test.go, server/service/integration_mdm_test.go, server/service/redis_key_value/redis_key_value.go, server/worker/apple_mdm.go
I did a review focused on the Redis usage (not a full review as Sarah already did that). The main thing is that this key pattern will result in always using the same node in cluster mode, instead of balancing the load over all nodes:
key_value_{mdm_profile_processing}:<hostUUID>
Basically, the hashing part of the key is what is inside the {}, and usually we use a dynamic part (e.g. the host ID) so that it can distribute the load in the cluster, but with a static part being used as the hash, this key pattern is basically a Redis hash key (in the sense of the map data structure) with host UUIDs as fields.
I understand why - in ReconcileAppleProfiles you want to batch-load a bunch of those keys so they need to live in the same node. There would be some ways to alleviate this, by using a key pattern that distributes the hosts in, say 5 or 10 different hashes and then when you read them you'd have to have "intelligent batches" instead of just 1000 random hosts at a time, but it's more complexity and it's not clear if it's even needed (maybe if thousands of hosts go through setup exp. at the same time?). It also would not guarantee that the computed slots would actually be on different nodes!
So I think it's fine as-is, with that caveat in mind. If we do find that the single redis node is struggling in load tests, we can do the more complex load distribution.
Co-authored-by: Martin Angers <martin.n.angers@gmail.com>
Thanks @mna, this is definitely the code piece that needs the most review here. I think we are okay with this approach, but to make sure, I updated the original ticket to call out load testing 10k Apple MDM hosts and verifying the Redis node load, to ensure it's not too bad. I opted for the simple approach of hashing to the same node for simplicity's sake; if it turns out to cause issues, we can look into a more intelligent approach for distributed keys.
Related issue: Resolves #34433 Part 2
Checklist for submitter
If some of the following don't apply, delete the relevant line.
Changes file added for user-visible changes in changes/, orbit/changes/, or ee/fleetd-chrome/changes. See Changes files for more information. Added by first PR
Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements), JS inline code is prevented especially for url redirects, and untrusted data interpolated into shell scripts/commands is validated against shell metacharacters.
If paths of existing endpoints are modified without backwards compatibility, checked the frontend/CLI for any necessary changes
Testing
Summary by CodeRabbit
New Features
Bug Fixes