Skip to content

USHIFT-6902 USHIFT-6903: Custom DNS configuration#6592

Open
pacevedom wants to merge 6 commits intoopenshift:mainfrom
pacevedom:USHIFT-6902
Open

USHIFT-6902 USHIFT-6903: Custom DNS configuration#6592
pacevedom wants to merge 6 commits intoopenshift:mainfrom
pacevedom:USHIFT-6902

Conversation

@pacevedom
Copy link
Copy Markdown
Contributor

@pacevedom pacevedom commented Apr 27, 2026

Summary by CodeRabbit

  • New Features

    • Support providing a custom CoreDNS Corefile via configuration; cluster will use that file instead of the default rendered Corefile.
    • A runtime watcher syncs the provided Corefile into the DNS ConfigMap and updates it on file changes without restarting the service.
    • The watcher is started/stopped as part of the MicroShift service lifecycle.
  • Validation

    • Disallow host-based overrides when a custom Corefile is set; require absolute path, existence, regular file, non-empty and ≤1MiB, and readable.
  • Tests

    • New unit tests and an end-to-end suite validating config, live updates, and DNS resolution.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pacevedom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 27, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds support for a user-supplied CoreDNS Corefile, validates the config, updates DNS controller startup to accept a custom Corefile, introduces a DNSConfigurationWatcherManager that watches the Corefile on disk and keeps the openshift-dns/dns-default ConfigMap synchronized, and adds tests and templates to validate behavior.

Changes

Cohort / File(s) Summary
Service Registration
pkg/cmd/run.go
Registers the DNS configuration watcher manager in MicroShift startup service list.
DNS Configuration & Validation
pkg/config/dns.go, pkg/config/dns_test.go
Adds ConfigFile field to DNS config; validates absolute path, existence, regular file, non-empty, ≤1MiB, readable; enforces mutual exclusion with hosts; adds unit tests covering success/failure cases and size limit.
DNS Controller Startup
pkg/components/controllers.go
Refactors startDNSController to apply a user-provided Corefile into the DNS ConfigMap when configured or render/apply default ConfigMap(s) otherwise; computes hostsEnabled once and injects into template parameters.
DNS Watcher Manager
pkg/controllers/dnsconfigurationwatcher.go
Adds DNSConfigurationWatcherManager type, constructor, and manager methods; on Run it writes/updates openshift-dns/dns-default ConfigMap with Corefile contents, labels/annotates it, watches the Corefile and parent directory via fsnotify, computes SHA-256 to avoid redundant updates, and handles file events to sync changes.
Test Assets & Suites
test/assets/dns/Corefile.template, test/suites/standard1/dns-custom-config.robot
Adds Corefile template and Robot Framework suite to validate custom Corefile usage, ConfigMap sync, runtime updates without restart, DNS resolution, and cleanup.
Config Merge
pkg/config/config.go
Extends user-settings merge to copy u.DNS.ConfigFile into effective config when set.

Sequence Diagram(s)

sequenceDiagram
    participant Admin as User/Admin
    participant FS as Filesystem
    participant Watcher as DNSConfigurationWatcherManager
    participant K8s as Kubernetes API
    participant CM as ConfigMap (openshift-dns/dns-default)
    participant CoreDNS as CoreDNS Pod

    Admin->>FS: Write or update Corefile
    FS-->>Watcher: fsnotify event (Create/Write/Rename/Remove)
    Watcher->>FS: Read Corefile contents
    Watcher->>Watcher: Compute SHA-256 of contents
    alt hash changed
        Watcher->>K8s: Create/Update ConfigMap with Corefile, labels, annotations
        K8s->>CM: Apply ConfigMap
        CM->>CoreDNS: ConfigMap change triggers reload
        CoreDNS->>CoreDNS: Reload Corefile
    else hash unchanged
        Watcher->>Watcher: Skip update (no-op)
    end
    Note right of Watcher: Loop until context cancellation
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The pull request contains no Ginkgo tests, only standard Go testing framework tests with static, descriptive names and a Robot Framework test suite with static case names.
Test Structure And Quality ✅ Passed PR adds standard Go unit tests using testing package and testify assertions, not Ginkgo test code. Check not applicable to this PR.
Microshift Test Compatibility ✅ Passed This pull request does not add any Ginkgo e2e tests. The only new test file is a Robot Framework test, outside the scope of this check.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The pull request does not add any Ginkgo e2e tests. New test files are standard Go unit tests and Robot Framework integration tests, neither using Ginkgo syntax.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces DNSConfigurationWatcherManager that synchronizes DNS config to a ConfigMap without creating workloads or specifying scheduling constraints.
Ote Binary Stdout Contract ✅ Passed PR introduces no stdout writes in process-level code; all logging uses klog (stderr by default) with no main(), init(), TestMain(), or Ginkgo suite setup functions.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR does not add Ginkgo e2e tests; it adds Go unit tests and a Robot Framework integration test, so the check is not applicable.
Title check ✅ Passed The title directly references the main feature: custom DNS configuration support added across config, controller, and watcher components.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pacevedom
Copy link
Copy Markdown
Contributor Author

/test ?

@pacevedom
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (9)
test/suites/standard1/dns-custom-config.robot (1)

134-137: Remove Fake Hosts Entry is a no-op.

The keyword only logs. The runtime-reload test calls it for ${updated_hostname} in teardown, but the actual cleanup happens via Teardown Custom Corefile. Either inline the comment in the test teardown or remove the keyword to avoid future confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/suites/standard1/dns-custom-config.robot` around lines 134 - 137, The
keyword Remove Fake Hosts Entry is a no-op that only logs and duplicates cleanup
already performed by Teardown Custom Corefile; remove the Remove Fake Hosts
Entry keyword from the suite (or replace its usage) and update the test teardown
that calls Remove Fake Hosts Entry for ${updated_hostname} to either call
Teardown Custom Corefile directly or inline a short comment in the test teardown
explaining the cleanup is handled by Teardown Custom Corefile so there is no
separate hostname removal step.
pkg/controllers/dnsconfigurationwatcher.go (3)

41-48: Disabled-mode shutdown semantics.

When s.file == "" you close(ready) and return ctx.Err(). ctx.Err() is nil until cancellation, so Run will block indefinitely without a <-ctx.Done() here. Looks intentional (so the service stays "running"), but worth a <-ctx.Done() to make the wait explicit and to avoid return nil being mistaken for "completed immediately" by future readers.

♻️ Suggested clarification
 	if s.file == "" {
 		klog.Infof("%s is disabled (not configured)", s.Name())
 		close(ready)
-		return ctx.Err()
+		<-ctx.Done()
+		return ctx.Err()
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 41 - 48, When s.file
== "" in DNSConfigurationWatcherManager.Run you currently close(ready) and
return ctx.Err() immediately, but ctx.Err() is nil until cancellation; change
the disabled-path to explicitly wait for context cancellation by performing a
<-ctx.Done() (or select on ctx.Done()) after close(ready) and before returning,
then return ctx.Err(); this makes the disabled-mode shutdown semantics explicit
and avoids a misleading immediate return value.

175-182: No conflict-retry on Update.

assets.ApplyConfigMapWithData (used by pkg/components/controllers.go) wraps writes in retry.RetryOnConflict. The watcher writes the same object without retry, so a benign 409 just fails this iteration. Wrapping the Get+Update in retry.RetryOnConflict(retry.DefaultBackoff, …) would match the existing convention in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 175 - 182, The
updateConfigMap flow can fail on benign 409 conflicts because the Get+Update
path in createOrUpdateConfigMap isn't wrapped in a conflict retry; wrap the
update logic used by updateConfigMap/createOrUpdateConfigMap in
retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) (from
k8s.io/apimachinery/pkg/util/retry) so transient resourceVersion conflicts are
retried and the final error is returned; locate the Get+Update sequence in
createOrUpdateConfigMap and wrap it with RetryOnConflict, returning its error
from updateConfigMap.

109-111: Double watcher.Close() on shutdown.

defer at lines 66–70 already closes the watcher; returning watcher.Close() here invokes it again. fsnotify tolerates it (returns nil on the second call), but return nil is clearer and avoids logging a misleading close error if behavior ever changes upstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 109 - 111, The
shutdown branch in the select that triggers on ctx.Done() calls watcher.Close()
again causing a double close; replace the current return watcher.Close() with
return nil in the ctx.Done() case so the deferred close at the top of the
function handles cleanup and the method simply returns after logging via
klog.Infof("%s stopping", s.Name()); keep references to the watcher variable,
the select case listening on ctx.Done(), and s.Name() so you modify the correct
block.
pkg/config/dns.go (3)

99-105: Use a named constant (or 1<<20) for the 1 MiB size limit.

1048576 is repeated in both validateConfigFile and validateHosts. A single const dnsConfigMapMaxSize = 1 << 20 would document intent and avoid drift if the limit ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/dns.go` around lines 99 - 105, Replace the repeated literal
1048576 with a named constant to document intent and prevent drift: declare a
const dnsConfigMapMaxSize = 1 << 20 (or equivalent) near the top of the file and
use it in the size checks inside validateConfigFile and validateHosts where the
current comparisons use 1048576; update the error messages to reference the
constant value if needed so both functions consistently use dnsConfigMapMaxSize.

87-90: Use cleanPath consistently in errors.

You compute cleanPath and validate against it, but the error messages echo back the original t.ConfigFile. If the user supplies /etc/microshift/../microshift/dns/Corefile, the message will keep the unclean form. Minor — pick one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/dns.go` around lines 87 - 90, The validation builds cleanPath with
filepath.Clean but the error returns the original t.ConfigFile; update the error
(and any related messages) to reference cleanPath instead so the normalized path
is shown to the user—look for the filepath.Clean(t.ConfigFile) usage and the
subsequent IsAbs check in pkg/config/dns.go and change occurrences of
t.ConfigFile in the error string to cleanPath.

107-111: Open-then-close as a readability probe.

Returning file.Close() directly means a transient close error (rare) becomes a startup-blocking validation error. Functionally fine, but os.OpenFile(cleanPath, os.O_RDONLY, 0) followed by an explicit deferred close + return nil would be a touch cleaner. Optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/dns.go` around lines 107 - 111, Change the one-shot
open-and-return-close pattern to open the file for read-only with
os.OpenFile(cleanPath, os.O_RDONLY, 0), check the error as currently done, then
defer file.Close() and return nil so a rare transient Close() error won't be
treated as a validation failure; update the block that uses variables file,
cleanPath and t.ConfigFile accordingly (open, error check, defer close, return
nil).
pkg/config/dns_test.go (2)

12-110: Missing test for the readability branch.

validateConfigFile has a dedicated os.Open step (lines 107–111) for permission errors, but no test exercises it. A quick os.Chmod(tmpFile, 0o000) case (skipped when running as root) would close that gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/dns_test.go` around lines 12 - 110, Add a test that exercises the
permission-readability branch of validateConfigFile by creating a temp Corefile
(use createTempCorefile or tmp file), os.Chmod the file to 0o000 to remove read
permissions, skip the test when running as root, instantiate DNS with ConfigFile
set to that path and Hosts.Status disabled, call dns.validate() and assert it
returns an error containing a permission/readability message; ensure you clean
up/reset file mode after the test. Reference the validateConfigFile path
exercised by DNS.validate / validateConfigFile so the test hits the os.Open
permission branch.

59-68: Hardcoded /tmp path is a very minor flake risk.

/tmp/nonexistent-corefile-test-12345 could (in theory) exist on a shared host. Prefer filepath.Join(t.TempDir(), "nonexistent") so the test is hermetic.

♻️ Suggested change
 func TestDNS_ValidateConfigFile_NonExistentFile(t *testing.T) {
 	dns := DNS{
-		ConfigFile: "/tmp/nonexistent-corefile-test-12345",
+		ConfigFile: filepath.Join(t.TempDir(), "nonexistent-corefile"),
 		Hosts: HostsConfig{
 			Status: HostsStatusDisabled,
 		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/dns_test.go` around lines 59 - 68,
TestDNS_ValidateConfigFile_NonExistentFile uses a hardcoded "/tmp/..." path
which can flake; update the test to create a unique temp directory with
t.TempDir() and set DNS.ConfigFile to filepath.Join(t.TempDir(),
"nonexistent-corefile-test-12345") (or similar) before calling dns.validate() so
the file is guaranteed not to exist and the test is hermetic; keep assertions
the same and import "path/filepath" if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/components/controllers.go`:
- Around line 337-346: Before calling os.ReadFile on cfg.DNS.ConfigFile, stat
the file and re-validate its size against the ConfigMap 1MiB limit (1048576
bytes) to guard against growth since dns.validate(); if the stat shows the file
is too large, return a clear error instead of reading (and avoid passing
oversized data to assets.ApplyConfigMapWithData), and consider factoring this
check into a shared helper used by both the watcher and the code path that reads
cfg.DNS.ConfigFile; reference cfg.DNS.ConfigFile, dns.validate(), os.ReadFile,
and assets.ApplyConfigMapWithData when locating where to add the stat+size check
and error return.
- Around line 337-352: The startup apply in controllers.go (the branch using
assets.ApplyConfigMapWithData / cm) and the DNSConfigurationWatcherManager's
createOrUpdateConfigMap are clobbering each other's annotations/labels; fix by
consolidating ownership: either skip the initial apply when cfg.DNS.ConfigFile
!= "" and let DNSConfigurationWatcherManager own the openshift-dns/dns-default
ConfigMap, or (preferred minimal change) change createOrUpdateConfigMap to merge
labels and annotations into the existing ConfigMap instead of doing
existing.Annotations = configMap.Annotations / replacing Labels — i.e., read
existing := get, set existing.Labels/Annotations = merge(existing, configMap)
(preserve keys from both, allow watcher to add/overwrite its keys) and then
update; reference functions: ApplyConfigMapWithData, assets.ApplyConfigMaps,
DNSConfigurationWatcherManager, createOrUpdateConfigMap, and cfg.DNS.ConfigFile
when making the change.

In `@pkg/controllers/dnsconfigurationwatcher.go`:
- Around line 56-59: The TOCTOU arises because s.updateConfigMap runs before
hashing, so a file change between updateConfigMap and getFileHash can make
lastHash reflect new content while the ConfigMap contains old content; fix by
calling s.getFileHash(ctx) first and assign s.lastHash (or record it
appropriately), then call s.updateConfigMap(ctx, kubeClient), then
s.setupWatches(...)/close ready — i.e., reorder to getFileHash -> set lastHash
-> updateConfigMap -> setupWatches (or alternatively ensure the first watch
event always forces a reconcile if reordering is undesirable).
- Around line 137-142: The current isRelevantEvent (in
DNSConfigurationWatcherManager) only accepts Write/Create so it loses the
per-file watch after editors do atomic rename/remove; update isRelevantEvent to
also treat fsnotify.Rename and fsnotify.Remove as relevant, and in the eventLoop
handler for events on s.file, when you observe Rename or Remove call
watcher.Remove if needed and mark the per-file watcher as stale, and on
subsequent Create (or immediately after Rename) call watcher.Add(s.file) to
re-add the file watcher before reconciling; reference the
DNSConfigurationWatcherManager.isRelevantEvent and the eventLoop code where
watcher.Add(s.file) / watcher.Remove should be invoked.
- Around line 204-210: The current code treats any error from
configMapsClient.Get as "not found" and always calls configMapsClient.Create;
instead check the error using apierrors.IsNotFound(err) inside the block that
handles Get errors: if apierrors.IsNotFound(err) then attempt
configMapsClient.Create(ctx, configMap, metav1.CreateOptions{}) and log using
s.Name(), dnsConfigMapNamespace, dnsConfigMapName on success, otherwise return
the original err immediately (do not fall through to Create) so
RBAC/throttling/API errors are propagated; keep handling of create errors (wrap
with fmt.Errorf) as before.

In `@test/suites/standard1/dns-custom-config.robot`:
- Around line 40-50: The test "Runtime Reload Without MicroShift Restart" is
flaky due to slow propagation (file → fsnotify → ConfigMap → kubelet sync →
CoreDNS reload); add a fast intermediate assertion to verify the Corefile
ConfigMap has been updated before calling Resolve Host From Pod. Specifically,
after Update Corefile With New Host and before Resolve Host From Pod use a
keyword to fetch and assert the ConfigMap content (the ConfigMap updated by
Setup Custom Corefile With Hosts Entry / Update Corefile With New Host) or poll
the ConfigMap until it contains the new hostname, or alternatively lower the
kubelet projected-volume sync period for the test run so the kubelet sync
(default ~60s) won’t cause flakiness.
- Around line 113-117: In the "Router Should Resolve Hostname" test, the Should
Contain line splits the expected substring into two positional args so
${hostname} becomes the failure message; update the assertion to pass the
expected substring as a single argument (e.g., combine "Name:" and ${hostname}
into one item) so the call reads something like Should Contain    ${output}   
Name: ${hostname}; edit the Should Contain line in that test (near the Oc Exec
call) to ensure the item is a single parameter rather than two separate tokens.

---

Nitpick comments:
In `@pkg/config/dns_test.go`:
- Around line 12-110: Add a test that exercises the permission-readability
branch of validateConfigFile by creating a temp Corefile (use createTempCorefile
or tmp file), os.Chmod the file to 0o000 to remove read permissions, skip the
test when running as root, instantiate DNS with ConfigFile set to that path and
Hosts.Status disabled, call dns.validate() and assert it returns an error
containing a permission/readability message; ensure you clean up/reset file mode
after the test. Reference the validateConfigFile path exercised by DNS.validate
/ validateConfigFile so the test hits the os.Open permission branch.
- Around line 59-68: TestDNS_ValidateConfigFile_NonExistentFile uses a hardcoded
"/tmp/..." path which can flake; update the test to create a unique temp
directory with t.TempDir() and set DNS.ConfigFile to filepath.Join(t.TempDir(),
"nonexistent-corefile-test-12345") (or similar) before calling dns.validate() so
the file is guaranteed not to exist and the test is hermetic; keep assertions
the same and import "path/filepath" if needed.

In `@pkg/config/dns.go`:
- Around line 99-105: Replace the repeated literal 1048576 with a named constant
to document intent and prevent drift: declare a const dnsConfigMapMaxSize = 1 <<
20 (or equivalent) near the top of the file and use it in the size checks inside
validateConfigFile and validateHosts where the current comparisons use 1048576;
update the error messages to reference the constant value if needed so both
functions consistently use dnsConfigMapMaxSize.
- Around line 87-90: The validation builds cleanPath with filepath.Clean but the
error returns the original t.ConfigFile; update the error (and any related
messages) to reference cleanPath instead so the normalized path is shown to the
user—look for the filepath.Clean(t.ConfigFile) usage and the subsequent IsAbs
check in pkg/config/dns.go and change occurrences of t.ConfigFile in the error
string to cleanPath.
- Around line 107-111: Change the one-shot open-and-return-close pattern to open
the file for read-only with os.OpenFile(cleanPath, os.O_RDONLY, 0), check the
error as currently done, then defer file.Close() and return nil so a rare
transient Close() error won't be treated as a validation failure; update the
block that uses variables file, cleanPath and t.ConfigFile accordingly (open,
error check, defer close, return nil).

In `@pkg/controllers/dnsconfigurationwatcher.go`:
- Around line 41-48: When s.file == "" in DNSConfigurationWatcherManager.Run you
currently close(ready) and return ctx.Err() immediately, but ctx.Err() is nil
until cancellation; change the disabled-path to explicitly wait for context
cancellation by performing a <-ctx.Done() (or select on ctx.Done()) after
close(ready) and before returning, then return ctx.Err(); this makes the
disabled-mode shutdown semantics explicit and avoids a misleading immediate
return value.
- Around line 175-182: The updateConfigMap flow can fail on benign 409 conflicts
because the Get+Update path in createOrUpdateConfigMap isn't wrapped in a
conflict retry; wrap the update logic used by
updateConfigMap/createOrUpdateConfigMap in
retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) (from
k8s.io/apimachinery/pkg/util/retry) so transient resourceVersion conflicts are
retried and the final error is returned; locate the Get+Update sequence in
createOrUpdateConfigMap and wrap it with RetryOnConflict, returning its error
from updateConfigMap.
- Around line 109-111: The shutdown branch in the select that triggers on
ctx.Done() calls watcher.Close() again causing a double close; replace the
current return watcher.Close() with return nil in the ctx.Done() case so the
deferred close at the top of the function handles cleanup and the method simply
returns after logging via klog.Infof("%s stopping", s.Name()); keep references
to the watcher variable, the select case listening on ctx.Done(), and s.Name()
so you modify the correct block.

In `@test/suites/standard1/dns-custom-config.robot`:
- Around line 134-137: The keyword Remove Fake Hosts Entry is a no-op that only
logs and duplicates cleanup already performed by Teardown Custom Corefile;
remove the Remove Fake Hosts Entry keyword from the suite (or replace its usage)
and update the test teardown that calls Remove Fake Hosts Entry for
${updated_hostname} to either call Teardown Custom Corefile directly or inline a
short comment in the test teardown explaining the cleanup is handled by Teardown
Custom Corefile so there is no separate hostname removal step.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9a68241a-4489-42f2-8ce4-1c42a7a1bf76

📥 Commits

Reviewing files that changed from the base of the PR and between 9350a63 and e755177.

📒 Files selected for processing (7)
  • pkg/cmd/run.go
  • pkg/components/controllers.go
  • pkg/config/dns.go
  • pkg/config/dns_test.go
  • pkg/controllers/dnsconfigurationwatcher.go
  • test/assets/dns/Corefile.template
  • test/suites/standard1/dns-custom-config.robot

Comment thread pkg/components/controllers.go
Comment thread pkg/components/controllers.go
Comment thread pkg/controllers/dnsconfigurationwatcher.go
Comment thread pkg/controllers/dnsconfigurationwatcher.go
Comment thread pkg/controllers/dnsconfigurationwatcher.go
Comment thread test/suites/standard1/dns-custom-config.robot Outdated
Comment thread test/suites/standard1/dns-custom-config.robot Outdated
@pacevedom
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/suites/standard1/dns-custom-config.robot (1)

144-149: rm -f rc check is effectively a no-op.

rm -f swallows missing-file errors and returns 0 in virtually all real cases, so the assertion adds noise without catching anything. Either drop it or switch to rm (without -f) so a genuine permission/path issue actually surfaces.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/suites/standard1/dns-custom-config.robot` around lines 144 - 149, The
test step "Remove Custom Corefile" uses "rm -f ${CUSTOM_COREFILE_PATH}" which
makes the subsequent rc assertion meaningless; either remove the rc/assertion
entirely from that test or change the command to "rm ${CUSTOM_COREFILE_PATH}" so
failures bubble up; update the Execute Command invocation (the Robot keyword
call that sets ${stdout} ${stderr} ${rc}= Execute Command ...) and the Should Be
Equal As Integers assertion accordingly (referencing ${rc}), and keep
${CUSTOM_COREFILE_PATH} as the target variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/suites/standard1/dns-custom-config.robot`:
- Line 41: The test "Verify ConfigMap Uses Custom Corefile" runs immediately
after "Restart MicroShift" and can read a stale/default Corefile because
DNSConfigurationWatcherManager reconciles the dns-default ConfigMap
asynchronously; update the test to wrap the ConfigMap assertion in Robot's "Wait
Until Keyword Succeeds" (same pattern used by "Resolve Host From Pod") so it
will retry until the custom Corefile appears, referencing the "Verify ConfigMap
Uses Custom Corefile" keyword and using the retry/interval values consistent
with other waits in the suite.

---

Nitpick comments:
In `@test/suites/standard1/dns-custom-config.robot`:
- Around line 144-149: The test step "Remove Custom Corefile" uses "rm -f
${CUSTOM_COREFILE_PATH}" which makes the subsequent rc assertion meaningless;
either remove the rc/assertion entirely from that test or change the command to
"rm ${CUSTOM_COREFILE_PATH}" so failures bubble up; update the Execute Command
invocation (the Robot keyword call that sets ${stdout} ${stderr} ${rc}= Execute
Command ...) and the Should Be Equal As Integers assertion accordingly
(referencing ${rc}), and keep ${CUSTOM_COREFILE_PATH} as the target variable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 681fff44-1e28-4b73-a2d9-f7e7f8b020e0

📥 Commits

Reviewing files that changed from the base of the PR and between e755177 and fdc7791.

📒 Files selected for processing (2)
  • test/assets/dns/Corefile.template
  • test/suites/standard1/dns-custom-config.robot
✅ Files skipped from review due to trivial changes (1)
  • test/assets/dns/Corefile.template

Comment thread test/suites/standard1/dns-custom-config.robot
@pacevedom
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-tests

@pacevedom
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/suites/standard1/dns-custom-config.robot`:
- Line 20: The Corefile template path is working-directory dependent; update the
assignment of ${COREFILE_TEMPLATE} to use the suite-relative path
"test/assets/dns/Corefile.template" instead of "./assets/dns/Corefile.template"
so the test consistently finds the template (locate where ${COREFILE_TEMPLATE}
is set in dns-custom-config.robot and replace the path).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1c7b7cd2-f47c-4400-b6f7-d69ac30925c6

📥 Commits

Reviewing files that changed from the base of the PR and between fdc7791 and f3a2982.

📒 Files selected for processing (2)
  • test/assets/dns/Corefile.template
  • test/suites/standard1/dns-custom-config.robot
✅ Files skipped from review due to trivial changes (1)
  • test/assets/dns/Corefile.template

Comment thread test/suites/standard1/dns-custom-config.robot Outdated
@pacevedom
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (9)
test/suites/standard1/dns-custom-config.robot (4)

44-45: ⚠️ Potential issue | 🟡 Minor

Assert the ConfigMap update before the nslookup retry.

Right after Update Corefile With New Host, failures are ambiguous: the watcher may not have synced yet, or CoreDNS may not have reloaded yet. Polling dns-default.data.Corefile for ${updated_hostname} here makes this test much less flaky.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/suites/standard1/dns-custom-config.robot` around lines 44 - 45, After
the "Update Corefile With New Host" step, add a poll/assert that the ConfigMap
key dns-default.data.Corefile contains ${updated_hostname} before running
"Resolve Host From Pod"; implement this by polling the ConfigMap (e.g., a retry
loop or Robot's Wait Until Keyword Succeeds) until the Corefile text includes
${updated_hostname} and fail if the timeout is reached so the test only proceeds
once the watcher/CoreDNS reload has actually observed the change.

87-92: ⚠️ Potential issue | 🟡 Minor

Make the ConfigMap check retryable.

This keyword reads dns-default once, but the watcher updates it asynchronously after Restart MicroShift. Wrapping the assertion in Wait Until Keyword Succeeds will avoid reading the stale/default Corefile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/suites/standard1/dns-custom-config.robot` around lines 87 - 92, The test
reads the dns-default ConfigMap once with the keyword Oc Get JsonPath and
asserts via Should Contain, but this can race with asynchronous updates after
Restart MicroShift; update the "Verify ConfigMap Uses Custom Corefile" case to
retry the read/assert by wrapping the Oc Get JsonPath + Should Contain sequence
inside a Wait Until Keyword Succeeds (or equivalent retry) block so it re-reads
${corefile_data} until the ${HOSTNAME} appears, keeping the existing keywords
names (Oc Get JsonPath, Should Contain, Wait Until Keyword Succeeds) and the
test name intact.

103-104: ⚠️ Potential issue | 🟡 Minor

Fix the Should Contain item argument.

${hostname} is currently the failure message, not part of the expected substring, so this passes for any successful lookup.

Proposed fix
-    Should Contain    ${output}    Name:    ${hostname}
+    Should Contain    ${output}    Name: ${hostname}
What is the Robot Framework BuiltIn `Should Contain` keyword signature, and how are positional arguments interpreted?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/suites/standard1/dns-custom-config.robot` around lines 103 - 104, The
test calls BuiltIn's Should Contain with ${output}, "Name:" and ${hostname}
which makes ${hostname} the custom failure message instead of part of the
expected substring; update the Should Contain invocation (after the Oc Exec that
sets ${output}) to pass the full expected substring as the second positional
argument — e.g. combine the tokens into one expected string like "Name:
${hostname}" (so call Should Contain    ${output}    Name: ${hostname}) so Robot
treats the whole "Name: <hostname>" as the substring to look for.

20-20: ⚠️ Potential issue | 🟠 Major

Use a suite-relative Corefile template path.

./assets/dns/Corefile.template is working-directory dependent. This suite lives under test/suites/standard1, so the current path will miss test/assets/dns/Corefile.template in many runners.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/suites/standard1/dns-custom-config.robot` at line 20, The path assigned
to ${COREFILE_TEMPLATE} is relative to the working directory and should be
suite-relative; update the value in dns-custom-config.robot (the
${COREFILE_TEMPLATE} assignment) from ./assets/dns/Corefile.template to the
suite-relative path ../../assets/dns/Corefile.template (or use a suite-dir
variable if preferred) so it resolves to test/assets/dns/Corefile.template when
the suite runs.
pkg/components/controllers.go (1)

337-343: ⚠️ Potential issue | 🟡 Minor

Re-check the Corefile size right before reading it.

dns.validate() runs earlier, but the file can grow before this os.ReadFile. If it exceeds the 1 MiB ConfigMap limit here, the failure moves to the API write path and becomes much less clear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/components/controllers.go` around lines 337 - 343, Before reading the DNS
Corefile, re-check its size to ensure it does not exceed the 1 MiB ConfigMap
limit; specifically, in the branch handling cfg.DNS.ConfigFile (the block that
currently calls os.ReadFile and then assets.ApplyConfigMapWithData with cm and
kubeconfigPath), stat the file (os.Stat) and return a clear error if the size is
> 1*1024*1024 so the check catches growth that occurred after dns.validate() and
prevents a confusing API write failure when calling
assets.ApplyConfigMapWithData.
pkg/controllers/dnsconfigurationwatcher.go (4)

56-59: ⚠️ Potential issue | 🟡 Minor

Hash/write ordering still has a missed-update window.

If the file changes after the initial updateConfigMap() but before getFileHash(), lastHash can reflect new content while the ConfigMap still has old content, and the next event gets skipped as “unchanged”.

Also applies to: 77-80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 56 - 59, The race
comes from calling updateConfigMap before computing/setting the hash, allowing
the file to change between update and getFileHash and causing missed updates;
fix by computing the file hash with getFileHash() first, use that hash to decide
whether to update, pass or apply that same computed hash when calling
updateConfigMap (or store it as the ConfigMap annotation) and then set lastHash
= computedHash only after a successful update; apply the same change in both
places that currently update the ConfigMap (the initial creation path using
updateConfigMap(...) and the subsequent update path around the existing
getFileHash/lastHash checks).

204-210: ⚠️ Potential issue | 🔴 Critical

Don’t treat every Get failure as “not found”.

Transient API/RBAC/throttling errors will fall into Create() here and mask the real problem. This needs an apierrors.IsNotFound(err) split, with all other Get errors returned directly.

In Kubernetes client-go, should errors from `ConfigMaps().Get(...)` be distinguished with `apierrors.IsNotFound(err)` instead of treating any error as a create path?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 204 - 210, The Get
error must not be assumed to mean "not found"; update the logic around
configMapsClient.Get(ctx, dnsConfigMapName, metav1.GetOptions{}) to check
apierrors.IsNotFound(err) and only call configMapsClient.Create(...) in that
branch; for any other error from Get (e.g. RBAC/throttling), return the error
immediately (wrap with fmt.Errorf as appropriate). Ensure you reference the
existing variables and functions (configMapsClient, dnsConfigMapName, configMap,
s.Name()) and keep the success log (klog.Infof) only after a successful Create.

212-214: ⚠️ Potential issue | 🟠 Major

Merge annotations instead of replacing them.

existing.Annotations = configMap.Annotations drops any metadata already present on openshift-dns/dns-default, including annotations coming from the startup asset apply path. That leaves the final shape dependent on whichever writer ran last.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 212 - 214, The code
currently replaces annotations via existing.Annotations = configMap.Annotations
which drops preexisting metadata; instead merge configMap.Annotations into
existing.Annotations before calling configMapsClient.Update: ensure
existing.Annotations is non-nil (create map if nil), iterate over
configMap.Annotations and copy/overwrite keys into existing.Annotations, and
only then call configMapsClient.Update(ctx, existing, metav1.UpdateOptions{});
handle nil configMap.Annotations by leaving existing annotations untouched.

137-142: ⚠️ Potential issue | 🟠 Major

Handle atomic replace events and re-add the file watch.

A lot of editors/config tools update files via temp-file + rename. With only Write|Create, the watcher can miss future changes after a Rename/Remove on the original inode.

With fsnotify on Linux, do atomic file replacements emit `Rename`/`Remove` on the old watched file, and does the new inode require re-adding the file watch?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 137 - 142, The
watcher currently only treats Write/Create as relevant; change isRelevantEvent
to also treat fsnotify.Rename and fsnotify.Remove as relevant, and ensure the
watcher logic (where events from isRelevantEvent are handled) attempts to remove
and re-add the watch for s.file when a Rename or Remove is seen so the new inode
is watched; update the handling around DNSConfigurationWatcherManager methods
(isRelevantEvent and the event processing path that manages the fsnotify
watcher) to detect Rename/Remove, call the code that removes the old watch and
then re-adds a watch for s.file (or call a helper like addWatch/reAddWatch if
present) so atomic replace (temp-file + rename) doesn’t lose future events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/config/dns.go`:
- Around line 92-111: The validateConfigFile() path checks (os.Stat on cleanPath
producing fi) currently allow non-regular files (directories, FIFOs, device
nodes); update the validation after calling os.Stat to reject any non-regular
file by checking fi.Mode().IsRegular() and returning a clear error referencing
t.ConfigFile if it is not regular, before proceeding to size checks and os.Open;
keep existing size and readability checks (fi.Size(), os.Open) intact so only
regular files are accepted upfront.

---

Duplicate comments:
In `@pkg/components/controllers.go`:
- Around line 337-343: Before reading the DNS Corefile, re-check its size to
ensure it does not exceed the 1 MiB ConfigMap limit; specifically, in the branch
handling cfg.DNS.ConfigFile (the block that currently calls os.ReadFile and then
assets.ApplyConfigMapWithData with cm and kubeconfigPath), stat the file
(os.Stat) and return a clear error if the size is > 1*1024*1024 so the check
catches growth that occurred after dns.validate() and prevents a confusing API
write failure when calling assets.ApplyConfigMapWithData.

In `@pkg/controllers/dnsconfigurationwatcher.go`:
- Around line 56-59: The race comes from calling updateConfigMap before
computing/setting the hash, allowing the file to change between update and
getFileHash and causing missed updates; fix by computing the file hash with
getFileHash() first, use that hash to decide whether to update, pass or apply
that same computed hash when calling updateConfigMap (or store it as the
ConfigMap annotation) and then set lastHash = computedHash only after a
successful update; apply the same change in both places that currently update
the ConfigMap (the initial creation path using updateConfigMap(...) and the
subsequent update path around the existing getFileHash/lastHash checks).
- Around line 204-210: The Get error must not be assumed to mean "not found";
update the logic around configMapsClient.Get(ctx, dnsConfigMapName,
metav1.GetOptions{}) to check apierrors.IsNotFound(err) and only call
configMapsClient.Create(...) in that branch; for any other error from Get (e.g.
RBAC/throttling), return the error immediately (wrap with fmt.Errorf as
appropriate). Ensure you reference the existing variables and functions
(configMapsClient, dnsConfigMapName, configMap, s.Name()) and keep the success
log (klog.Infof) only after a successful Create.
- Around line 212-214: The code currently replaces annotations via
existing.Annotations = configMap.Annotations which drops preexisting metadata;
instead merge configMap.Annotations into existing.Annotations before calling
configMapsClient.Update: ensure existing.Annotations is non-nil (create map if
nil), iterate over configMap.Annotations and copy/overwrite keys into
existing.Annotations, and only then call configMapsClient.Update(ctx, existing,
metav1.UpdateOptions{}); handle nil configMap.Annotations by leaving existing
annotations untouched.
- Around line 137-142: The watcher currently only treats Write/Create as
relevant; change isRelevantEvent to also treat fsnotify.Rename and
fsnotify.Remove as relevant, and ensure the watcher logic (where events from
isRelevantEvent are handled) attempts to remove and re-add the watch for s.file
when a Rename or Remove is seen so the new inode is watched; update the handling
around DNSConfigurationWatcherManager methods (isRelevantEvent and the event
processing path that manages the fsnotify watcher) to detect Rename/Remove, call
the code that removes the old watch and then re-adds a watch for s.file (or call
a helper like addWatch/reAddWatch if present) so atomic replace (temp-file +
rename) doesn’t lose future events.

In `@test/suites/standard1/dns-custom-config.robot`:
- Around line 44-45: After the "Update Corefile With New Host" step, add a
poll/assert that the ConfigMap key dns-default.data.Corefile contains
${updated_hostname} before running "Resolve Host From Pod"; implement this by
polling the ConfigMap (e.g., a retry loop or Robot's Wait Until Keyword
Succeeds) until the Corefile text includes ${updated_hostname} and fail if the
timeout is reached so the test only proceeds once the watcher/CoreDNS reload has
actually observed the change.
- Around line 87-92: The test reads the dns-default ConfigMap once with the
keyword Oc Get JsonPath and asserts via Should Contain, but this can race with
asynchronous updates after Restart MicroShift; update the "Verify ConfigMap Uses
Custom Corefile" case to retry the read/assert by wrapping the Oc Get JsonPath +
Should Contain sequence inside a Wait Until Keyword Succeeds (or equivalent
retry) block so it re-reads ${corefile_data} until the ${HOSTNAME} appears,
keeping the existing keywords names (Oc Get JsonPath, Should Contain, Wait Until
Keyword Succeeds) and the test name intact.
- Around line 103-104: The test calls BuiltIn's Should Contain with ${output},
"Name:" and ${hostname} which makes ${hostname} the custom failure message
instead of part of the expected substring; update the Should Contain invocation
(after the Oc Exec that sets ${output}) to pass the full expected substring as
the second positional argument — e.g. combine the tokens into one expected
string like "Name: ${hostname}" (so call Should Contain    ${output}    Name:
${hostname}) so Robot treats the whole "Name: <hostname>" as the substring to
look for.
- Line 20: The path assigned to ${COREFILE_TEMPLATE} is relative to the working
directory and should be suite-relative; update the value in
dns-custom-config.robot (the ${COREFILE_TEMPLATE} assignment) from
./assets/dns/Corefile.template to the suite-relative path
../../assets/dns/Corefile.template (or use a suite-dir variable if preferred) so
it resolves to test/assets/dns/Corefile.template when the suite runs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 13e9a4b4-69a8-44a2-a75c-4b33d2dee538

📥 Commits

Reviewing files that changed from the base of the PR and between f3a2982 and 1503c77.

📒 Files selected for processing (8)
  • pkg/cmd/run.go
  • pkg/components/controllers.go
  • pkg/config/config.go
  • pkg/config/dns.go
  • pkg/config/dns_test.go
  • pkg/controllers/dnsconfigurationwatcher.go
  • test/assets/dns/Corefile.template
  • test/suites/standard1/dns-custom-config.robot
✅ Files skipped from review due to trivial changes (2)
  • pkg/config/config.go
  • test/assets/dns/Corefile.template
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cmd/run.go
  • pkg/config/dns_test.go

Comment thread pkg/config/dns.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/config/dns.go (1)

124-139: Redundant os.Stat calls.

os.Stat(cleanPath) is called twice (lines 126 and 134). The second call is unnecessary since the result from line 126 can be reused.

♻️ Proposed fix
 	cleanPath := filepath.Clean(t.Hosts.File)

 	fi, err := os.Stat(cleanPath)
-	if err == nil && fi.Size() > 1048576 {
-		return fmt.Errorf("hosts file %s exceeds 1MiB ConfigMap (and internal buffer) size limit (got %d bytes)", t.Hosts.File, fi.Size())
-	}
-	if !filepath.IsAbs(cleanPath) {
-		return fmt.Errorf("hosts file path must be absolute: got %s", t.Hosts.File)
-	}
-
-	_, err = os.Stat(cleanPath)
 	if os.IsNotExist(err) {
 		return fmt.Errorf("hosts file %s does not exist", t.Hosts.File)
 	} else if err != nil {
 		return fmt.Errorf("error checking hosts file %s: %v", t.Hosts.File, err)
 	}
+	if !filepath.IsAbs(cleanPath) {
+		return fmt.Errorf("hosts file path must be absolute: got %s", t.Hosts.File)
+	}
+	if fi.Size() > 1048576 {
+		return fmt.Errorf("hosts file %s exceeds 1MiB ConfigMap (and internal buffer) size limit (got %d bytes)", t.Hosts.File, fi.Size())
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/dns.go` around lines 124 - 139, The code redundantly calls
os.Stat(cleanPath) twice; reuse the initial os.Stat result (fi, err) instead of
calling os.Stat again: after cleaning path (cleanPath) call os.Stat once, check
err for IsNotExist and other errors, then if err == nil use fi.Size() to enforce
the 1MiB limit and also validate filepath.IsAbs(cleanPath) and return errors
referencing t.Hosts.File as currently done; update the logic around the existing
fi and err variables to remove the second os.Stat call and preserve the same
error messages.
pkg/controllers/dnsconfigurationwatcher.go (1)

110-112: Double watcher.Close() on context cancellation.

watcher.Close() is called here on ctx.Done(), but it's also deferred at line 67. Calling Close twice on fsnotify.Watcher is safe (returns nil on second call), but returning watcher.Close() error here is misleading since the deferred close will also run.

♻️ Proposed fix
 		case <-ctx.Done():
 			klog.Infof("%s stopping", s.Name())
-			return watcher.Close()
+			return nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 110 - 112, The
ctx.Done() branch logs stopping with s.Name() then calls and returns
watcher.Close(), but watcher.Close() is already deferred (defer
watcher.Close()), so return watcher.Close() is redundant/misleading; change the
ctx.Done() case to just log and return (or return nil) so the deferred
watcher.Close() handles closing and no Close() error is propagated from this
branch—locate the ctx.Done() case that references s.Name() and watcher.Close()
in the same scope as the deferred watcher.Close() and replace the return
watcher.Close() with a plain return (or return nil).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controllers/dnsconfigurationwatcher.go`:
- Around line 160-162: When s.updateConfigMap(ctx, kubeClient) fails, the code
currently returns currentHash which can cause future identical-file changes to
be skipped; change the error return to return an empty/zero hash (e.g. "" )
instead of currentHash so the watcher will not treat the failed update as
already-applied; update the error branch in the function containing
s.updateConfigMap to return false, "" , err (or another sentinel “no-hash” value
used elsewhere) so the next change with the same computed hash will trigger an
update.

---

Nitpick comments:
In `@pkg/config/dns.go`:
- Around line 124-139: The code redundantly calls os.Stat(cleanPath) twice;
reuse the initial os.Stat result (fi, err) instead of calling os.Stat again:
after cleaning path (cleanPath) call os.Stat once, check err for IsNotExist and
other errors, then if err == nil use fi.Size() to enforce the 1MiB limit and
also validate filepath.IsAbs(cleanPath) and return errors referencing
t.Hosts.File as currently done; update the logic around the existing fi and err
variables to remove the second os.Stat call and preserve the same error
messages.

In `@pkg/controllers/dnsconfigurationwatcher.go`:
- Around line 110-112: The ctx.Done() branch logs stopping with s.Name() then
calls and returns watcher.Close(), but watcher.Close() is already deferred
(defer watcher.Close()), so return watcher.Close() is redundant/misleading;
change the ctx.Done() case to just log and return (or return nil) so the
deferred watcher.Close() handles closing and no Close() error is propagated from
this branch—locate the ctx.Done() case that references s.Name() and
watcher.Close() in the same scope as the deferred watcher.Close() and replace
the return watcher.Close() with a plain return (or return nil).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ecc76575-50bf-4dfc-9c46-c9beec447f22

📥 Commits

Reviewing files that changed from the base of the PR and between 1503c77 and 6564a9a.

📒 Files selected for processing (5)
  • pkg/config/dns.go
  • pkg/config/dns_test.go
  • pkg/controllers/dnsconfigurationwatcher.go
  • test/assets/dns/Corefile.template
  • test/suites/standard1/dns-custom-config.robot
✅ Files skipped from review due to trivial changes (1)
  • test/assets/dns/Corefile.template
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/suites/standard1/dns-custom-config.robot

Comment on lines +160 to +162
if err := s.updateConfigMap(ctx, kubeClient); err != nil {
klog.Errorf("%s failed to update ConfigMap: %v", s.Name(), err)
return false, currentHash, err
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Hash returned on error may cause missed updates.

When updateConfigMap fails, currentHash is returned. If the next file change produces the same hash, the update will be skipped even though the ConfigMap was never successfully updated.

🛡️ Proposed fix
 	if err := s.updateConfigMap(ctx, kubeClient); err != nil {
 		klog.Errorf("%s failed to update ConfigMap: %v", s.Name(), err)
-		return false, currentHash, err
+		return false, lastHash, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err := s.updateConfigMap(ctx, kubeClient); err != nil {
klog.Errorf("%s failed to update ConfigMap: %v", s.Name(), err)
return false, currentHash, err
if err := s.updateConfigMap(ctx, kubeClient); err != nil {
klog.Errorf("%s failed to update ConfigMap: %v", s.Name(), err)
return false, lastHash, err
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/dnsconfigurationwatcher.go` around lines 160 - 162, When
s.updateConfigMap(ctx, kubeClient) fails, the code currently returns currentHash
which can cause future identical-file changes to be skipped; change the error
return to return an empty/zero hash (e.g. "" ) instead of currentHash so the
watcher will not treat the failed update as already-applied; update the error
branch in the function containing s.updateConfigMap to return false, "" , err
(or another sentinel “no-hash” value used elsewhere) so the next change with the
same computed hash will trigger an update.

@pacevedom pacevedom marked this pull request as ready for review April 28, 2026 12:57
@openshift-ci openshift-ci Bot requested review from agullon and copejon April 28, 2026 12:59
Add a new ConfigFile field to the DNS config struct that allows users
to specify a custom CoreDNS Corefile path. Validation ensures mutual
exclusivity with dns.hosts, and checks that the file is absolute,
exists, is readable, non-empty, and within the 1MiB ConfigMap limit.
When dns.configFile is set, read the file and apply its contents
directly as the Corefile key in the dns-default ConfigMap, bypassing
the default template rendering. When unset, existing behavior is
preserved.
Add DNSConfigurationWatcherManager that watches the custom Corefile
specified by dns.configFile for changes at runtime, updating the
dns-default ConfigMap when the file is modified. Follows the same
fsnotify + SHA256 hashing pattern as the existing hosts watcher.
Cover mutual exclusivity with dns.hosts, non-absolute path, missing
file, empty file, file exceeding 1MiB limit, valid config, and
default behavior when configFile is unset.
@pacevedom pacevedom changed the title WIP: Custom DNS configuration USHIFT-6902 USHIFT-6903: Custom DNS configuration Apr 28, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 28, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 28, 2026

@pacevedom: This pull request references USHIFT-6902 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

This pull request references USHIFT-6903 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary by CodeRabbit

  • New Features

  • Support providing a custom CoreDNS Corefile via configuration; cluster will use that file instead of the default rendered Corefile.

  • A runtime watcher syncs the provided Corefile into the DNS ConfigMap and updates it on file changes without restarting the service.

  • The watcher is started/stopped as part of the MicroShift service lifecycle.

  • Validation

  • Disallow host-based overrides when a custom Corefile is set; require absolute path, existence, regular file, non-empty and ≤1MiB, and readable.

  • Tests

  • New unit tests and an end-to-end suite validating config, live updates, and DNS resolution.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

RobotFramework tests covering:
- Custom Corefile is used by CoreDNS instead of the default
- Runtime reload propagates changes without MicroShift restart
- Cluster-local resolution works with custom Corefile
- Use apierrors.IsNotFound for ConfigMap Get errors instead of treating
  any error as not-found
- Merge annotations into existing ConfigMap instead of replacing them to
  avoid clobbering annotations set by the asset apply
- Handle atomic-rename file updates (Rename/Remove events) and re-add
  the file watch on Create so the watcher survives vim/sed -i edits
- Reject non-regular files (directories, FIFOs) in validateConfigFile
- Fix Should Contain assertion in robot test that treated the hostname
  as a failure message instead of a second assertion
- Add retry wrapper to ConfigMap verification after restart
@pacevedom
Copy link
Copy Markdown
Contributor Author

/retest

2 similar comments
@pacevedom
Copy link
Copy Markdown
Contributor Author

/retest

@pacevedom
Copy link
Copy Markdown
Contributor Author

/retest

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 29, 2026

@pacevedom: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants