
Allow disabling chart datasets: backend (#44769)

Merged
sgress454 merged 23 commits into main from sgress454/44077-chart-disabling-backend
May 7, 2026

Conversation

Contributor

@sgress454 sgress454 commented May 5, 2026

Related issue: #44077

Details

This PR implements enforcement of the "disable dataset" feature.

When a dataset is disabled globally, we:

  • Stop collecting all data for that dataset (the Collect method for that dataset is not called in the cron job)
  • Remove all previously-collected data for the dataset via an asynchronous job

When a dataset is disabled for one or more fleets, we:

  • Provide the list of disabled fleets as an argument to each dataset's Collect method. Each dataset is responsible for filtering out hosts in the most efficient way possible
  • Scrub the data for the relevant datasets using a bitmask, so that all hosts from the disabled fleets are removed from the data. This is done via an asynchronous job.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
    n/a, unreleased

  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements), JS inline code is prevented especially for url redirects, and untrusted data interpolated into shell scripts/commands is validated against shell metacharacters.

  • Timeouts are implemented and retries are limited to avoid infinite loops

Testing

  • Added/updated automated tests

  • Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one host's records do not affect another)

  • QA'd all new/changed functionality manually

    Prerequisites / Test Setup

    • Fleet running with at least 3 teams (call them T1, T2, T3) and ≥3 hosts in each, plus ≥2 hosts with no team
    • At least one host on each team has reported recent uptime (within the bucket window)
    • At least one host in each team is affected by a tracked CVE (so host_scd_data for dataset='cve' will have non-empty bitmaps)
    • AppConfig: both features.historical_data.uptime and features.historical_data.vulnerabilities start as true; same for every team
    • Let the collection cron run at least one full tick to populate baseline rows in host_scd_data for both uptime and cve
    • Note the current row count per dataset: SELECT dataset, COUNT(*) FROM host_scd_data GROUP BY dataset;

    1. Cron Skips Globally-Disabled Datasets

    1.1 Global disable of uptime

    • Disable globally: PATCH /api/v1/fleet/config with features.historical_data.uptime = false
    • Verify activity feed shows disabled_historical_dataset for uptime (existing behavior)
    • Wait for next collection tick (or trigger it via fleetctl debug if available)
    • Confirm no new rows appear for dataset='uptime':
      SELECT MAX(valid_from) FROM host_scd_data WHERE dataset='uptime';
      should not advance after the disable
    • Confirm cron still writes cve rows on the same tick (per-dataset isolation)
    • Re-enable: PATCH historical_data.uptime = true
    • Verify next tick resumes writing uptime rows

    1.2 Global disable of vulnerabilities

    • Repeat 1.1 with features.historical_data.vulnerabilities
    • Confirm cve writes stop, uptime continues

    1.3 Both disabled globally

    • Disable both globally
    • Confirm cron tick produces zero new rows for either dataset
    • Confirm cron does not error or get stuck
    • Re-enable both

    2. Per-Fleet Disable — Cron Filters at SQL

    2.1 Single team disabled for one dataset

    • Disable uptime for T1 only: PATCH team T1 with
      features.historical_data.uptime = false
    • Verify scoped disabled_historical_dataset activity emitted for T1
    • Wait for next cron tick / trigger cron
    • Pick a host known to be in T1 (call it H_T1); confirm its bit is NOT set in any uptime row written after the disable by filtering the chart to that host
    • Pick a host in T2 (H_T2); confirm its bit IS still set in the same rows (T2 is not disabled)
    • Pick a no-team host (H_none); confirm its bit IS still set (no-team hosts follow the global value)
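The per-host bit checks above can be scripted. A sketch of the spot check, assuming a hypothetical bitmap layout of one bit per host ID (byte `h/8`, bit `h%8`) — the real `host_bitmap` encoding may differ, so this only illustrates the idea:

```go
package main

import "fmt"

// hostBitSet reports whether host ID h's bit is set in bitmap.
// Layout assumption (hypothetical): one bit per host ID, stored
// at byte h/8, bit h%8.
func hostBitSet(bitmap []byte, h uint) bool {
	idx := int(h / 8)
	if idx >= len(bitmap) {
		return false // host ID beyond the bitmap: bit not set
	}
	return bitmap[idx]&(1<<(h%8)) != 0
}

func main() {
	// bits set for host IDs 1 and 10
	bitmap := []byte{0b0000_0010, 0b0000_0100}
	fmt.Println(hostBitSet(bitmap, 1), hostBitSet(bitmap, 10), hostBitSet(bitmap, 3))
	// prints: true true false
}
```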

    2.2 Same fleet, different dataset

    • With T1's uptime disabled, confirm T1's hosts ARE still written into cve rows on subsequent ticks (per-dataset isolation)

    2.3 All teams disabled, global on, no-team hosts

    • Disable uptime on every team (T1, T2, T3)
    • Confirm next tick still writes a row containing only no-team hosts' bits (global is on, no-team hosts always count)
    • Re-enable uptime on all teams

    3. Global Scrub — DELETE

    3.1 Successful global scrub

    • Note baseline:
      SELECT COUNT(*) FROM host_scd_data WHERE dataset='uptime';
      (should be > 5000 to exercise the loop; if not, manually
      insert filler rows or run multiple cron ticks)
    • Disable uptime globally via the API
    • Wait for the worker to pick up the scrub / trigger the job
    • Confirm the count drops to 0:
      SELECT COUNT(*) FROM host_scd_data WHERE dataset='uptime';
    • Confirm rows for other datasets are untouched
    • Test again but disable via GitOps

    4. Per-Fleet Scrub — ANDNOT

    4.1 Single-fleet scrub clears bits

    • Identify hosts in T1 and record their IDs (call this set S)
    • Pre-disable, confirm at least one host_scd_data row for dataset='uptime' has bits set at positions in S by filtering the chart to those hosts
    • Disable uptime on T1 only, via the API
    • Wait for the scrub to run / trigger it
    • Confirm: every existing row for dataset='uptime' now has NO bits set at any position in S. Spot-check by filtering the chart to those hosts
    • Confirm rows for dataset='cve' (different dataset) are untouched
    • Confirm bits for hosts in T2/T3 (not disabled) are still set
    • Run test again but disable via GitOps

    4.2 Multi-fleet scrub via GitOps batch

    • Apply a GitOps spec that flips cve to false on T1 and T3 in a single apply
    • Wait for scrub(s) to complete
    • Confirm bits for the union of T1∪T3 hosts are cleared from every row of dataset='cve'
    • Confirm T2 hosts' bits remain set

    5. Activity Feed Cross-Check

    • Each global flip emits exactly one disabled_historical_dataset
      activity (existing behavior, unchanged)
    • Each per-team flip emits one scoped activity with the team's
      ID and name
    • PATCH submitting unchanged values emits no activity and
      causes no scrub (no host_scd_data data change observed
      after the cron tick)
    • No new "scrub completed" or "scrub started" activity is
      emitted (out of scope for v1)
    • Re-enable flips emit enabled_historical_dataset activities
      and do NOT emit any scrub-related activity

    6. Regression Spot Checks

    • With everything enabled (default), the chart UI renders the
      same data as before this change (no behavior change in the
      "all on" case)
    • AppConfig YAML round-trip (fleetctl apply) is benign:
      applying the unchanged config produces no scrub jobs and no
      activities
    • GitOps apply with historical_data omitted from team specs
      defaults to true (per the gitops-api change) and does not
      trigger spurious scrubs
    • After a full disable+scrub of cve, the host_scd_data table
      has no dataset='cve' rows; the chart UI for "vulnerable
      hosts over time" shows an empty/zero state without errors

Summary by CodeRabbit

  • New Features
    • Chart collection now supports per-dataset scoping and honors team-level disables; new scrub jobs are registered and worker handlers added.
    • New dataset scrub operations: global and fleet-scoped scrubs; scrubs can be enqueued and are deduplicated to avoid duplicate pending jobs. Historical-data changes enqueue scrubs after save (errors logged, non-blocking).
  • Tests
    • Added unit tests for scope resolution, scrub enqueue/dedup behavior, scrub workers, scrub application, and low-level blob scrub logic.
  • Documentation
    • Added OpenSpec metadata for the chart scrub change.

sgress454 added 6 commits May 5, 2026 11:19
Adopt the same pattern BlobAND uses: reslice into bounded variables,
modernize the loop with `for i := range n`, and add the gosec nolint
that the linter requires even with the slicing in place.
@sgress454
Contributor Author

@coderabbitai review

Contributor

coderabbitai Bot commented May 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

coderabbitai Bot commented May 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds per-dataset scoping to chart data collection and integrates scrub operations. The cron now builds a CollectScopeFn from AppConfig and team HistoricalData flags and calls chartSvc.CollectDatasets with that resolver. Dataset and datastore interfaces were extended to accept disabledFleetIDs. New datastore methods support batched deletion and masked scrubbing. Worker jobs ChartScrubGlobal and ChartScrubFleet, job-enqueue/dedup checks, and enqueueing from app/team historical_data flips were added; scrub invocations are logged and non-fatal on error.
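The scope resolution described in the walkthrough — global flag gates the dataset, per-team flags produce fleet exclusions — can be sketched as follows. Types and names here are hypothetical simplifications of the PR's `CollectScopeFn`:

```go
package main

import "fmt"

// datasetScope is what a resolver might return for one dataset:
// either skip collection entirely, or collect while excluding the
// listed fleet IDs. (Hypothetical shape of CollectScopeFn's result.)
type datasetScope struct {
	Skip             bool
	DisabledFleetIDs []uint
}

// resolveScope combines the global flag with per-team flags:
// global off => skip the dataset entirely; otherwise exclude each
// team whose flag is off. No-team hosts follow the global value,
// so they are never in the exclusion list.
func resolveScope(globalEnabled bool, teamEnabled map[uint]bool) datasetScope {
	if !globalEnabled {
		return datasetScope{Skip: true}
	}
	var disabled []uint
	for teamID, enabled := range teamEnabled {
		if !enabled {
			disabled = append(disabled, teamID)
		}
	}
	return datasetScope{DisabledFleetIDs: disabled}
}

func main() {
	scope := resolveScope(true, map[uint]bool{1: false, 2: true, 3: true})
	fmt.Printf("skip=%v disabled=%v\n", scope.Skip, scope.DisabledFleetIDs)
}
```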

Possibly related PRs

  • fleetdm/fleet#44488: Integrates historical_data feature and activity/enqueue mechanisms that this change uses to enqueue chart scrub jobs.
  • fleetdm/fleet#43910: Introduces the chart bounded-context and CollectDatasets surface that this change extends with per-dataset/fleet scoping and scrub APIs.
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 27.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4)

  • Title check — the title 'Allow disabling chart datasets: backend' accurately and specifically describes the main change: implementing backend enforcement for disabling historical chart datasets.
  • Linked Issues check — skipped; no linked issues were found for this pull request.
  • Out of Scope Changes check — skipped; no linked issues were found for this pull request.
  • Description check — the description is comprehensive and well-structured, covering implementation details, testing procedures, and validation of security measures.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
server/fleet/historical_data_test.go (1)

84-126: ⚡ Quick win

Add a fleet test for the vulnerabilities disable transition.

Current fleet tests only assert the uptime path. Please also cover Vulnerabilities: true -> false so the per-fleet enqueue contract is verified for both supported datasets in this mode.

Proposed test addition
 func TestEnqueueHistoricalDataScrubs_Fleet(t *testing.T) {
+	t.Run("flips true→false on vulnerabilities: enqueues fleet scrub", func(t *testing.T) {
+		enq := &fakeJobEnqueuer{}
+		teamID := uint(7)
+		err := EnqueueHistoricalDataScrubs(t.Context(), enq,
+			HistoricalDataSettings{Uptime: true, Vulnerabilities: true},
+			HistoricalDataSettings{Uptime: true, Vulnerabilities: false},
+			&teamID,
+		)
+		require.NoError(t, err)
+		require.Len(t, enq.jobs, 1)
+		assert.Equal(t, "chart_scrub_dataset_fleet", enq.jobs[0].Name)
+		var args chartScrubFleetArgs
+		require.NoError(t, json.Unmarshal(*enq.jobs[0].Args, &args))
+		assert.Equal(t, "vulnerabilities", args.Dataset)
+		assert.Equal(t, []uint{7}, args.FleetIDs)
+	})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/fleet/historical_data_test.go` around lines 84 - 126, Add a new
subtest inside TestEnqueueHistoricalDataScrubs_Fleet that mirrors the existing
uptime "flips true→false" case but flips Vulnerabilities from true→false: create
a fakeJobEnqueuer, set teamID := uint(7), call EnqueueHistoricalDataScrubs with
HistoricalDataSettings{Vulnerabilities:true,...} → {Vulnerabilities:false,...}
and require one job enqueued; unmarshal into chartScrubFleetArgs and assert
args.Dataset is "vulnerabilities" and args.FleetIDs equals []uint{7}. Use the
same helper types (fakeJobEnqueuer, chartScrubFleetArgs) and test patterns as
the uptime case so the per-fleet enqueue contract is covered for
vulnerabilities.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/fleet/cron.go`:
- Around line 1576-1588: When ds.AppConfig(ctx) or ds.ListTeams(ctx, ...) fail,
do not fall back to unscoped collection by calling chartSvc.CollectDatasets(ctx,
time.Now(), nil); instead propagate the error (or skip the tick) so we avoid
unscoped dataset collection. Change the two error branches that currently call
chartSvc.CollectDatasets(...) to log the error as they do and then return the
error (e.g., return nil, err or return err depending on the surrounding function
signature) rather than calling chartSvc.CollectDatasets; update callers if
necessary to handle the propagated error. Ensure you modify the branches that
reference ds.AppConfig, ds.ListTeams and chartSvc.CollectDatasets accordingly.

In `@ee/server/service/teams.go`:
- Around line 425-436: Currently the code calls
fleet.EnqueueHistoricalDataScrubs after SaveTeam and returns an error on
failure, causing a partial-success; instead persist the "scrub intent"
atomically with the team save or fall back to durable write-on-failure so the
operation never surface-fails after commit. Modify the flow around SaveTeam and
EnqueueHistoricalDataScrubs: either (A) add/use a datastore method (e.g.,
SaveTeamWithScrubIntent or PersistHistoricalScrubIntent) that writes the updated
team and a scrub-intent/outbox row in one DB transaction so the worker can pick
it up, or (B) if enqueue fails, do not return an error — write a durable intent
record via svc.ds (e.g., a new CreateHistoricalScrubIntent or
UpsertTeamScrubIntent using team.ID and team.Config.Features.HistoricalData) and
log the enqueue failure, then continue executing the remaining post-save side
effects so state is consistent and retryable by a background worker.

In `@server/chart/internal/mysql/data.go`:
- Around line 506-512: The SELECT used for paging scrub rows is using
ds.reader(ctx) (reads from replica) which can miss recent primary rows; change
the DB handle to the writer connection (use ds.writer(ctx) in place of
ds.reader(ctx)) in the SelectContext call that fetches id, host_bitmap for the
scrub loop so the query reads from primary and avoids replica lag leaving behind
disabled historical data.
- Around line 455-456: The scrub path is reading fleet membership from a replica
via sqlx.SelectContext using ds.reader(ctx), which can return stale data; change
the DB handle to the primary by calling ds.writer(ctx) in the SelectContext call
(replace ds.reader(ctx) with ds.writer(ctx) in the sqlx.SelectContext invocation
that selects host IDs in fleets) so the scrub job reads current team membership.

In `@server/fleet/historical_data.go`:
- Around line 121-145: The job payload currently uses the config key strings
like "vulnerabilities" (from the `changes` loop over `dataset`), but the scrub
worker expects the internal chart dataset ID ("cve"); update the code that
builds `argsJSON` in the `for _, c := range changes` loop (where
`chartScrubDatasetGlobalJobName`/`chartScrubDatasetFleetJobName` and
`chartScrubGlobalArgs`/`chartScrubFleetArgs` are used) to translate config keys
to the internal dataset ID (e.g., map "vulnerabilities" -> "cve") and pass that
internal ID into the `Dataset` field (or use the existing chart dataset constant
if available) so scrub jobs target the correct dataset.

In `@server/service/appconfig.go`:
- Around line 958-970: After SaveAppConfig commits, do not abort the request on
EnqueueHistoricalDataScrubs failure; change the behavior in the caller around
SaveAppConfig/EnqueueHistoricalDataScrubs so that errors from
fleet.EnqueueHistoricalDataScrubs(ctx, svc.ds,
oldAppConfig.Features.HistoricalData, appConfig.Features.HistoricalData, nil)
are not returned as a hard error—either persist scrub intents together with the
config (transactional outbox) or catch the error, log it with context (including
oldAppConfig and appConfig IDs) and enqueue a durable retry (e.g., write a
persistent "pending scrub" record via svc.ds or schedule a background retry
task) so downstream post-save work still runs and the enqueue is retried
server-side until successful.

In `@server/worker/chart_scrub.go`:
- Around line 81-87: The handler currently treats an empty args.FleetIDs as
success by returning nil; change this to return a non-nil error so malformed
scrub jobs fail and are observable. In the len(args.FleetIDs) == 0 branch (the
check for empty fleet IDs), replace the silent return nil with returning an
error (e.g., errors.New or fmt.Errorf) that includes the dataset and that
fleet_ids was empty, and keep the existing c.Log.WarnContext call (or change to
c.Log.ErrorContext) so the error is both logged and returned from the function.

---

Nitpick comments:
In `@server/fleet/historical_data_test.go`:
- Around line 84-126: Add a new subtest inside
TestEnqueueHistoricalDataScrubs_Fleet that mirrors the existing uptime "flips
true→false" case but flips Vulnerabilities from true→false: create a
fakeJobEnqueuer, set teamID := uint(7), call EnqueueHistoricalDataScrubs with
HistoricalDataSettings{Vulnerabilities:true,...} → {Vulnerabilities:false,...}
and require one job enqueued; unmarshal into chartScrubFleetArgs and assert
args.Dataset is "vulnerabilities" and args.FleetIDs equals []uint{7}. Use the
same helper types (fakeJobEnqueuer, chartScrubFleetArgs) and test patterns as
the uptime case so the per-fleet enqueue contract is covered for
vulnerabilities.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 75fb660b-ee25-4940-bc54-cef984039d7e

📥 Commits

Reviewing files that changed from the base of the PR and between e102904 and ffa5c76.

⛔ Files ignored due to path filters (4)
  • openspec/changes/chart-disabling-collection-scrub/design.md is excluded by !**/*.md
  • openspec/changes/chart-disabling-collection-scrub/proposal.md is excluded by !**/*.md
  • openspec/changes/chart-disabling-collection-scrub/specs/chart-historical-data-collection/spec.md is excluded by !**/*.md
  • openspec/changes/chart-disabling-collection-scrub/tasks.md is excluded by !**/*.md
📒 Files selected for processing (22)
  • cmd/fleet/cron.go
  • cmd/fleet/cron_test.go
  • cmd/fleet/serve.go
  • ee/server/service/teams.go
  • openspec/changes/chart-disabling-collection-scrub/.openspec.yaml
  • server/chart/api/chart.go
  • server/chart/api/service.go
  • server/chart/blob.go
  • server/chart/blob_test.go
  • server/chart/datasets.go
  • server/chart/internal/mysql/charts.go
  • server/chart/internal/mysql/data.go
  • server/chart/internal/mysql/data_test.go
  • server/chart/internal/service/service.go
  • server/chart/internal/service/service_test.go
  • server/chart/internal/testutils/testutils.go
  • server/chart/internal/types/chart.go
  • server/fleet/historical_data.go
  • server/fleet/historical_data_test.go
  • server/service/appconfig.go
  • server/worker/chart_scrub.go
  • server/worker/chart_scrub_test.go

Comment thread cmd/fleet/cron.go
Comment thread ee/server/service/teams.go Outdated
Comment thread server/chart/internal/mysql/data.go
Comment thread server/chart/internal/mysql/data.go
Comment thread server/fleet/historical_data.go Outdated
Comment thread server/service/appconfig.go Outdated
Comment thread server/worker/chart_scrub.go
Comment thread cmd/fleet/cron.go
Comment on lines +1089 to +1097
chartScrubGlobal := &worker.ChartScrubGlobal{
ChartService: chartSvc,
Log: logger,
}
chartScrubFleet := &worker.ChartScrubFleet{
ChartService: chartSvc,
Log: logger,
}
w.Register(jira, zendesk, macosSetupAsst, dbMigrate, vppVerify, softwareWorker, chartScrubGlobal, chartScrubFleet)
Contributor Author


Adding these to the worker schedule since they use the jobs table and run only when needed.


codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 65.24390% with 114 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.72%. Comparing base (f2b2e23) to head (91107ef).
⚠️ Report is 63 commits behind head on main.

Files with missing lines Patch % Lines
server/chart/internal/mysql/data.go 68.42% 28 Missing and 2 partials ⚠️
server/chart/internal/mysql/charts.go 21.62% 27 Missing and 2 partials ⚠️
cmd/fleet/cron.go 38.23% 20 Missing and 1 partial ⚠️
server/worker/chart_scrub.go 56.25% 8 Missing and 6 partials ⚠️
server/chart/internal/service/service.go 70.83% 3 Missing and 4 partials ⚠️
ee/server/service/teams.go 25.00% 6 Missing ⚠️
server/service/appconfig.go 25.00% 3 Missing ⚠️
server/fleet/historical_data.go 95.12% 1 Missing and 1 partial ⚠️
cmd/fleet/serve.go 0.00% 1 Missing ⚠️
server/datastore/mysql/jobs.go 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #44769      +/-   ##
==========================================
+ Coverage   66.68%   66.72%   +0.03%     
==========================================
  Files        2664     2668       +4     
  Lines      214605   215921    +1316     
  Branches     9876     9876              
==========================================
+ Hits       143106   144069     +963     
- Misses      58478    58735     +257     
- Partials    13021    13117      +96     
Flag Coverage Δ
backend 68.59% <65.24%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Comment thread ee/server/service/teams.go Outdated
@sgress454
Contributor Author

@coderabbitai review

Contributor

coderabbitai Bot commented May 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
server/chart/internal/mysql/data.go (1)

421-440: 💤 Low value

Mirror CleanupSCDData's loop pattern: explicit ctx check + earlier termination.

Two small consistency gaps with the sibling CleanupSCDData (lines 390–416) on the same table:

  1. No ctx.Err() check at the top of the loop. ExecContext will surface cancellation eventually, but the explicit check makes cancellation responsive between batches and matches the pattern in the same file.
  2. Termination on n == 0 costs an extra empty DELETE round-trip whenever the final batch happened to be exactly batchSize. CleanupSCDData uses n < int64(scdCleanupBatch) for the same loop shape.
♻️ Proposed alignment with CleanupSCDData
 func (ds *Datastore) DeleteAllForDataset(ctx context.Context, dataset string, batchSize int) error {
 	if batchSize <= 0 {
 		batchSize = 5000
 	}
 	for {
+		if err := ctx.Err(); err != nil {
+			return ctxerr.Wrap(ctx, err, "delete SCD rows for dataset")
+		}
 		res, err := ds.writer(ctx).ExecContext(ctx,
 			`DELETE FROM host_scd_data WHERE dataset = ? LIMIT ?`,
 			dataset, batchSize)
 		if err != nil {
 			return ctxerr.Wrap(ctx, err, "delete SCD rows for dataset")
 		}
 		n, err := res.RowsAffected()
 		if err != nil {
 			return ctxerr.Wrap(ctx, err, "rows affected for dataset delete")
 		}
-		if n == 0 {
+		if n < int64(batchSize) {
 			return nil
 		}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/chart/internal/mysql/data.go` around lines 421 - 440,
DeleteAllForDataset's loop should mirror CleanupSCDData: add an explicit ctx
cancellation check at the top of the for loop (if ctx.Err() != nil { return
ctx.Err() }) so cancellation is responsive between batches, and change the loop
termination condition from if n == 0 to if n < int64(batchSize) (compare the
int64 result from res.RowsAffected() against the batchSize) to avoid an extra
empty DELETE when the last batch exactly equals batchSize; keep existing error
wrapping logic around ExecContext and RowsAffected.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@server/chart/internal/mysql/data.go`:
- Around line 421-440: DeleteAllForDataset's loop should mirror CleanupSCDData:
add an explicit ctx cancellation check at the top of the for loop (if ctx.Err()
!= nil { return ctx.Err() }) so cancellation is responsive between batches, and
change the loop termination condition from if n == 0 to if n < int64(batchSize)
(compare the int64 result from res.RowsAffected() against the batchSize) to
avoid an extra empty DELETE when the last batch exactly equals batchSize; keep
existing error wrapping logic around ExecContext and RowsAffected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3bdaed7b-bf0e-4fd0-81b3-9b8b5451d9f5

📥 Commits

Reviewing files that changed from the base of the PR and between ffa5c76 and 2d4546e.

⛔ Files ignored due to path filters (3)
  • openspec/changes/chart-disabling-collection-scrub/design.md is excluded by !**/*.md
  • openspec/changes/chart-disabling-collection-scrub/specs/chart-historical-data-collection/spec.md is excluded by !**/*.md
  • openspec/changes/chart-disabling-collection-scrub/tasks.md is excluded by !**/*.md
📒 Files selected for processing (9)
  • cmd/fleet/cron.go
  • ee/server/service/teams.go
  • server/chart/internal/mysql/data.go
  • server/datastore/mysql/jobs.go
  • server/fleet/datastore.go
  • server/fleet/historical_data.go
  • server/fleet/historical_data_test.go
  • server/mock/datastore_mock.go
  • server/service/appconfig.go
👮 Files not reviewed due to content moderation or server errors (5)
  • server/datastore/mysql/jobs.go
  • server/fleet/historical_data.go
  • server/service/appconfig.go
  • ee/server/service/teams.go
  • cmd/fleet/cron.go

@sgress454
Contributor Author

@claude review once

Comment thread server/fleet/historical_data.go Outdated
Comment thread ee/server/service/teams.go Outdated
Comment thread server/datastore/mysql/jobs.go
@sgress454 sgress454 marked this pull request as ready for review May 6, 2026 03:36
Contributor

Copilot AI left a comment


Pull request overview

Implements backend enforcement for disabling chart datasets by (1) scoping/halting collection during the cron tick based on global + per-team features.historical_data settings and (2) enqueueing asynchronous scrub jobs to delete or bit-scrub already-collected host_scd_data when a dataset is disabled.

Changes:

  • Extend chart collection plumbing to support per-dataset scoping (skip + disabledFleetIDs) and push fleet exclusions down into the dataset SQL queries.
  • Add chart scrub job types (global delete + fleet-scoped bitmap ANDNOT) and register them with the worker schedule.
  • Enqueue scrub jobs after config/team commits when historical data settings flip from enabled → disabled, with queued-job dedup by (name,args).
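The queued-job dedup described above — skip enqueueing when an identical (name, args) job is already pending — can be sketched with an in-memory stand-in. The interfaces below are hypothetical simplifications of the PR's `HasQueuedJobWithArgs` check (the real implementation compares JSON args in the jobs table):

```go
package main

import "fmt"

// store is a toy stand-in for the jobs datastore.
type store struct{ queued map[string]bool }

func (s *store) HasQueuedJobWithArgs(name, args string) bool {
	return s.queued[name+"|"+args]
}

func (s *store) Enqueue(name, args string) {
	s.queued[name+"|"+args] = true
}

// enqueueScrubIfNew enqueues a scrub job unless an identical
// (name, args) job is already pending, mirroring the dedup gate
// described in the review summary. Returns whether a job was
// actually enqueued.
func enqueueScrubIfNew(s *store, name, args string) bool {
	if s.HasQueuedJobWithArgs(name, args) {
		return false // duplicate pending job; skip
	}
	s.Enqueue(name, args)
	return true
}

func main() {
	s := &store{queued: map[string]bool{}}
	// job name and args are illustrative, not the PR's exact payloads
	fmt.Println(enqueueScrubIfNew(s, "chart_scrub_dataset_fleet", `{"dataset":"uptime"}`))
	fmt.Println(enqueueScrubIfNew(s, "chart_scrub_dataset_fleet", `{"dataset":"uptime"}`))
	// prints: true then false
}
```

Note this check-then-insert is racy without a transaction or unique constraint; a duplicate scrub job is merely wasteful here, since scrubs are idempotent.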

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| `server/worker/chart_scrub.go` | Adds global/fleet scrub worker job handlers + enqueue helpers. |
| `server/worker/chart_scrub_test.go` | Unit tests for scrub job handlers and wire-contract constants/JSON. |
| `server/service/appconfig.go` | Enqueues global scrub jobs after app config disables a dataset. |
| `server/mock/datastore_mock.go` | Adds mock hook for `HasQueuedJobWithArgs`. |
| `server/fleet/historical_data.go` | Adds scrub enqueue helper + dedup gate via `HasQueuedJobWithArgs`. |
| `server/fleet/historical_data_test.go` | Tests scrub enqueue behavior (flip detection, dedup, error propagation). |
| `server/fleet/datastore.go` | Extends datastore interface with `HasQueuedJobWithArgs`. |
| `server/datastore/mysql/jobs.go` | Implements `HasQueuedJobWithArgs` using JSON equality + primary reads. |
| `server/chart/internal/types/chart.go` | Extends chart datastore interface for scrubbing + scoped queries. |
| `server/chart/internal/testutils/testutils.go` | Adds helpers to insert/read `host_scd_data` rows with explicit bitmaps. |
| `server/chart/internal/service/service.go` | Wires scope resolver into `CollectDatasets` and adds scrub service methods. |
| `server/chart/internal/service/service_test.go` | Updates mocks/tests for new collection + scrub APIs and scope forwarding. |
| `server/chart/internal/mysql/data.go` | Implements dataset deletes + fleet scrub bitmap application (paged + chunked `UPDATE`). |
| `server/chart/internal/mysql/data_test.go` | Integration-ish tests for bitmap scrubbing behavior. |
| `server/chart/internal/mysql/charts.go` | Pushes disabled-fleet exclusions down into uptime/CVE collection SQL. |
| `server/chart/datasets.go` | Updates dataset `Collect` methods to accept/forward `disabledFleetIDs`. |
| `server/chart/blob.go` | Adds `BlobANDNOT` bitmap helper. |
| `server/chart/blob_test.go` | Unit tests for `BlobANDNOT`. |
| `server/chart/api/service.go` | Extends chart service API for scoped collection + scrub methods. |
| `server/chart/api/chart.go` | Extends `Dataset`/`DatasetStore` interfaces to support scoped collection. |
| `openspec/changes/chart-disabling-collection-scrub/tasks.md` | Tracks implementation tasks for this change. |
| `openspec/changes/chart-disabling-collection-scrub/specs/chart-historical-data-collection/spec.md` | Adds detailed requirements/spec for collection gating + scrub behavior. |
| `openspec/changes/chart-disabling-collection-scrub/proposal.md` | Proposal describing motivation/approach. |
| `openspec/changes/chart-disabling-collection-scrub/design.md` | Design document for scoping + scrubbing. |
| `openspec/changes/chart-disabling-collection-scrub/.openspec.yaml` | OpenSpec metadata for the change. |
| `ee/server/service/teams.go` | Enqueues per-team scrub jobs after team disables a dataset (PATCH + GitOps path). |
| `cmd/fleetctl/fleetctl/testing_utils/testing_utils.go` | Stubs `HasQueuedJobWithArgs` for GitOps test server setup. |
| `cmd/fleet/serve.go` | Passes `chartSvc` into worker integration schedule registration. |
| `cmd/fleet/cron.go` | Registers scrub workers; builds chart scope resolver each tick before collection. |
| `cmd/fleet/cron_test.go` | Unit tests for `buildChartScopeResolver`. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


Inline comments:
In `@ee/server/service/teams.go`:
- Around line 425-439: The scrub enqueue must run immediately after SaveTeam
commits to avoid missing jobs if later post-commit handlers fail; move the
EnqueueHistoricalDataScrubs(...) call so it executes right after the successful
SaveTeam call (before calling ModifyTeam, ApplyEnrollSecrets,
OnHistoricalDataChanged, or other post-save side effects). Update the code paths
in ModifyTeam and editTeamFromSpec to remove or not duplicate the enqueue there,
and keep the same error-handling (log-and-continue) behavior around
EnqueueHistoricalDataScrubs to avoid interrupting subsequent post-save logic.

Reviewing files that changed from the base of the PR and between 2d4546e and 31bd566.


@sgress454
Contributor Author

Local testing complete!

@sgress454 sgress454 merged commit 684beca into main May 7, 2026
59 checks passed
@sgress454 sgress454 deleted the sgress454/44077-chart-disabling-backend branch May 7, 2026 13:52
sgress454 added a commit that referenced this pull request May 7, 2026
