
OPNET-679: Block external access to coredns#1968

Open
emy wants to merge 3 commits into openshift:master from emy:block-external-access-to-coredns

Conversation

@emy
Member

@emy emy commented Apr 8, 2026

This enhancement proposes to block external access to CoreDNS instances running on bare metal OpenShift cluster nodes.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 8, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 8, 2026

@emy: This pull request references OPNET-679 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set.


In response to this:

This enhancement proposes to block external access to CoreDNS instances running on bare metal OpenShift cluster nodes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from abhat and danwinship April 8, 2026 15:37
@openshift-ci
Contributor

openshift-ci bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dougbtv for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Apr 8, 2026

@emy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 8a0e400 | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +81 to +82
* As a **cluster administrator**, I want to deploy proof-of-concept (POC)
clusters without sitewide DNS changes.
Contributor

Not clear what this means, or how it relates to the rest of the enhancement.

Member Author

It was more of a use case for why it would be useful to have a toggle to enable/disable external access. If it's just confusing, I can remove it.

Member

In some environments it can take a long time to get sitewide DNS changes made. To enable timely deployments for those prospective customers, we have allowed them to deploy with one of the "internal" DNS servers as their "public" resolver so clients have those addresses available. We do not support that for production use, but we definitely have people taking advantage of it for test clusters.

Without this use case there is no need to maintain accessibility by making this configurable. That's why I think it's important to include it here.

Member

I guess there's also the monitoring use case where external access might be required. We could use that instead if it would be more clear.

Comment on lines +84 to +87
* As a **security engineer**, I want to ensure that DNS queries to the
on-prem CoreDNS instances can only originate from within the cluster
nodes or internal networks so that I can prevent DNS-based reconnaissance
and amplification attacks targeting my infrastructure.
Contributor

This is basically the same story as the first one, just with a different persona wanting the same thing...

Member Author

I included it twice because both personas might want the same thing. If the redundancy isn't needed, I'm fine with removing one of them.


Use node-level firewall mechanisms (nftables) to block external
access to the CoreDNS service port (typically UDP/TCP 53) on each bare
metal node. This could be implemented via:
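As a rough illustration of what such node-level rules could look like (a sketch only; the table name and the 192.0.2.0/24 source network are placeholders, not values from this enhancement):

```
table inet coredns_filter {
  chain input {
    type filter hook input priority 0; policy accept;
    # Always allow local queries to the node's own CoreDNS.
    iif "lo" meta l4proto { tcp, udp } th dport 53 accept
    # Placeholder machine-network CIDR; only needed if cross-node queries are kept.
    ip saddr 192.0.2.0/24 meta l4proto { tcp, udp } th dport 53 accept
    # Drop DNS from everywhere else.
    meta l4proto { tcp, udp } th dport 53 drop
  }
}
```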
Contributor

I believe nftables rules won't work for ovn-kubernetes in shared gateway mode, because in that case the external interface is on the OVS bridge and so the traffic passes directly into OVS without going through nftables rules first. (This was part of the theory for using eBPF in ingress-node-firewall.)

Member Author

Thanks for the clarification. Should the CoreDNS ACL approach be considered as the primary solution then?
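For context, CoreDNS ships an `acl` plugin that can express this kind of restriction in the Corefile. A minimal sketch, with placeholder subnets:

```
. {
    acl {
        # Placeholder networks: loopback plus an example internal CIDR.
        allow net 127.0.0.0/8 192.0.2.0/24
        # Refuse everything else ("drop" would discard queries silently instead).
        block
    }
    forward . /etc/resolv.conf
}
```

One caveat: with `block` the plugin answers unwanted clients with REFUSED, so the port itself remains reachable from outside, unlike a firewall-level drop.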

Member

I'm confused by this. I thought shared gateway mode had to do with how egress traffic is routed. We don't care about traffic coming from pods because they already have access to this DNS server (albeit indirectly) because it's configured as the "upstream" of the DNS operator pods.

Also, we're using host firewall rules to redirect API traffic coming into the host and AFAIK it works on shared gateway mode. If it didn't I'm pretty sure we'd be buried in API disruption bugs.

Comment on lines +191 to +192
- Allow DNS queries from other cluster nodes (node IPs within cluster
CIDR)
Contributor

Why do we need to allow queries from other nodes? Doesn't every node have its own resolver? (Or is it just that each resolver is only canonical for certain resources so they need to query each other sometimes?)

(This question also applies to the Risks section; depending on how important these cross-node DNS queries are, the risk of the ACLs being wrong/out-of-date is larger or smaller.)

Member Author

I assumed it would be beneficial for the nodes to be able to query each other, in case dedicated per-node resources are a requirement somewhere I'm not aware of.

If this is something we don't do anywhere, I'm fine with not allowing cross-node queries.

Contributor

I'm not sure what the use cases are for the on-prem CoreDNS instance... you'd need to answer that on your side.

If you can avoid needing to allow cross-node requests, then obviously this all becomes much simpler, because you can just restrict it to localhost only.

Member

I'm not aware of any normal use cases where one node needs to query another's host coredns instance. They should all be identical.

CoreDNS on bare metal nodes (e.g., for external monitoring systems or
specific network requirements):

**Option 1: Disable the feature gate** (if external access is needed
Contributor

Feature gates are for enabling/disabling non-GA behavior only. Once the feature is GA, the feature gate will go away. So this isn't a long-term solution.

Member Author

Makes sense. I'll remove this option.


If a cluster administrator needs to allow external access to on-prem
CoreDNS on bare metal nodes (e.g., for external monitoring systems or
specific network requirements):
Contributor

How common will this be? If it's rare/unimportant, you could just force the admin to roll their own access method. (eg, ssh to a node to do the DNS query there)

- MachineConfig deployment may cause node reboots, impacting workload
availability during initial feature enablement

## Alternatives
Contributor

Given that we control both the clients and servers here (right?) is there some way we could avoid making nodes query other nodes' DNS via their public IPs?

In the ovn-k case, we could have the node-to-node traffic go over the overlay to a private node IP rather than over the underlying node network to a public node IP. But that doesn't easily generalize to third-party network plugins. But is there anything else we could do?

Member

@cybertron cybertron left a comment

A few more thoughts.



- Performance impact and resource overhead
- Compatibility with RHCOS and different node operating systems

2. How to make this configurable if administrators need to allow external
Member

My first inclination is a field in the Infrastructure object. That's where most of the other host networking configuration lives.
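If that route were taken, the shape might be something like the sketch below. To be clear, this field is entirely hypothetical; its name and location would come out of API review:

```yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    baremetal:
      # Hypothetical field, not part of the current API.
      dnsExternalAccess: Blocked
```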



#### Hypershift / Hosted Control Planes

**Not Applicable**: Hypershift deployments typically run on cloud
Member

I am hearing that baremetal hypershift is becoming more of a thing. That could throw a wrench in any MCO-based implementation since MCO doesn't run in hosted clusters. All we would be able to do is deploy a pod via ignition that manages the firewall rules. We couldn't rely on MCO to handle rolling out updates if changes are made on day 2.

We might be able to get around that by reading API fields ourselves, but it is something we'll need to consider.

NodeFirewallConfiguration) is deployed that configures firewall rules
on bare metal nodes to block external access to on-prem CoreDNS.

3. The MCO (or Node Firewall Operator) deploys the firewall configuration
Member

Implementation detail: MCO is not going to directly manage these firewall rules. What I would propose is adding the ability to manage firewall rules to the coredns-monitor container. We already have similar functionality in haproxy-monitor that we can use as an example.
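A minimal sketch of the rule-management piece such a monitor could grow, assuming it generates an nftables ruleset from a list of allowed source CIDRs and applies it with `nft -f`. The function and names are illustrative, not the real coredns-monitor code (which is Go; Python here for brevity):

```python
def render_rules(allowed_cidrs):
    """Build an nftables ruleset accepting DNS (port 53) from loopback and
    the given CIDRs, and dropping it from everywhere else."""
    lines = [
        "table inet coredns_filter {",
        "  chain input {",
        "    type filter hook input priority 0; policy accept;",
        '    iif "lo" meta l4proto { tcp, udp } th dport 53 accept',
    ]
    for cidr in allowed_cidrs:
        lines.append(
            f"    ip saddr {cidr} meta l4proto {{ tcp, udp }} th dport 53 accept"
        )
    lines += [
        "    meta l4proto { tcp, udp } th dport 53 drop",
        "  }",
        "}",
    ]
    return "\n".join(lines)

if __name__ == "__main__":
    # The monitor would write this to a file and run `nft -f` on it;
    # here we just print the generated ruleset.
    print(render_rules(["192.0.2.0/24"]))
```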
