Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDN-4458: Do per-pod MCS/metadata blocking with nftables rather than iptables #1946

Merged
merged 3 commits into from Apr 22, 2024

Conversation

danwinship
Copy link
Contributor

Along with warning about customers using iptables in pods (https://issues.redhat.com/browse/SDN-4114) we need to make sure that we don't use iptables in pods.

This pulls in one of the commits from ovn-org/ovn-kubernetes#3709 to get the knftables library. Other than a no-op cleanup call inherited from there, it doesn't modify ovnkube-node to use nftables in the host network namespace.

I changed it to just always create the IPv4-specific rules whether or not we support IPv4, because it was easy, and I wanted to keep the code in sync between ovn-k and openshift-sdn (openshift/sdn#581).

@danwinship danwinship requested a review from dcbw as a code owner October 26, 2023 14:44
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2023
@danwinship
Copy link
Contributor Author

/retest

@dcbw
Copy link
Contributor

dcbw commented Nov 21, 2023

I think we should just try to upstream our MCS/metadata blocking stuff and hide it behind a config option that NVIDIA doesn't have to enable... otherwise looks pretty good to me.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 29, 2024
@danwinship danwinship changed the title Do per-pod MCS/metadata blocking with nftables rather than iptables SDN-4458: Do per-pod MCS/metadata blocking with nftables rather than iptables Feb 1, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 1, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Feb 1, 2024

@danwinship: This pull request references SDN-4458 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set.

In response to this:

Along with warning about customers using iptables in pods (https://issues.redhat.com/browse/SDN-4114) we need to make sure that we don't use iptables in pods.

This pulls in one of the commits from ovn-org/ovn-kubernetes#3709 to get the knftables library. Other than a no-op cleanup call inherited from there, it doesn't modify ovnkube-node to use nftables in the host network namespace.

I changed it to just always create the IPv4-specific rules whether or not we support IPv4, because it was easy, and I wanted to keep the code in sync between ovn-k and openshift-sdn (openshift/sdn#581).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@danwinship
Copy link
Contributor Author

I think we should just try to upstream our MCS/metadata blocking stuff and hide it behind a config option that NVIDIA doesn't have to enable... otherwise looks pretty good to me.

I could do that... but this is a terrible hack. We had a plan for a better fix long ago but it never happened.

I guess the upstreamable version would be to allow the conf file to contain a list of ports and a list of IPs to block...

@danwinship
Copy link
Contributor Author

/assign @npinaeva @tssurya

@danwinship
Copy link
Contributor Author

I think we should just try to upstream our MCS/metadata blocking stuff and hide it behind a config option that NVIDIA doesn't have to enable... otherwise looks pretty good to me.

So it seems like we're right on the verge of being able to replace this hack with an AdminNetworkPolicy... (Deny egress to nodes: { matchLabels: {"node-role.kubernetes.io/master": ""} } on ports 22623 and 22624, and deny egress to networks: ["169.254.0.0/16"] except on port 53.) So let's leave it as an OCP-specific hack until then.

@danwinship
Copy link
Contributor Author

/retest

@npinaeva
Copy link
Member

the code looks fine, same question as for openshift/sdn#581: do we have any tests to confirm it works properly?

Copy link
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main nftables logic looks simple and good to me based on how the older iptables logic were.
I have some silly questions that are mostly for my own understanding.

@@ -27,7 +27,7 @@ RUN INSTALL_PKGS=" \
python3-pip python3-pyyaml bind-utils procps-ng openssl numactl-libs firewalld-filesystem \
libpcap hostname kubernetes-client util-linux \
ovn ovn-central ovn-host python3-openvswitch tcpdump openvswitch-test python3-pyOpenSSL \
iptables iproute iputils strace socat koji \
iptables nftables iproute iputils strace socat koji \
Copy link
Contributor

@tssurya tssurya Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to go in first into ovn-org/ovnk; I don't think we want to diverge too much from upstream right?
if this is a OCP ONLY commit the commit should be updated to OCP CARRY prefix and leave a comment here, I bet the next d/s merge of something will conflict with this change if we are going to diverge it might be good to have the OCP specific commit/comment

UPDATE: This was discussed in team meeting; eventually when we convert stuff into nftables this will also go upstream until then let's just rename these commits to prefix OCPHACK or something so that we know we need to carry them specially during conflicts if any

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the middle commit here originally got pulled out of ovn-org/ovn-kubernetes#3709... maybe I can tweak it some to be less conflict-y since that got deferred to 4.17...

go-controller/pkg/node/nftables/helpers.go Outdated Show resolved Hide resolved
go-controller/pkg/node/nftables/helpers.go Outdated Show resolved Hide resolved
go-controller/pkg/node/nftables/helpers.go Outdated Show resolved Hide resolved
COMMIT
`
func doNFTablesRules(platformType string) error {
nft, err := knftables.New(knftables.InetFamily, "openshift-block-output")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not using GetNFTablesHelper if that is returning an interface? this part seems to be exactly doing what that func does right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, yeah, sorry, none of that code from upstream gets used here because it's all for ovnkube-node, not for the CNI shim

Copy link
Contributor

@tssurya tssurya Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you prefer to keep it that way? as in you don't want to use the utils for cni pkg at the moment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed. The stuff that was in pkg/node/nftables (which is no longer there in the current version of the commit) was just to make the nftables API look more like what the existing gateway init, etc, code was expecting based on the old iptables API, so I didn't have to rewrite as much stuff. But the CNI shim code wasn't making the same assumptions (and was much smaller anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack sg, thanks I can see all that is gone in the latest push

}

tx := nft.NewTransaction()
tx.Add(&knftables.Table{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for any names for the table? or is there some magic piece I am missing as to how the table name you pass is used to create this? for some reason I thought https://github.com/kubernetes-sigs/knftables/blob/b9d68958acd80a16a2b1d6a1ce650f38b3673b0f/nftables.go#L109 newInternal would actually create the table for you based on the provided value but its really just setting the context right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the API being weird... in the actual nft commands you have to say stuff like

add table inet openshift-ovn-kubernetes
add chain inet openshift-ovn-kubernetes mychain
add rule inet openshift-ovn-kubernetes mychain blah blah blah

but I didn't want to force people to have to include the family (inet) and table name (openshift-ovn-kubernetes) in every knftables Object, so I said you specify that at knftables.New() time and the values there get substituted in everywhere automatically.

Which means that the knftables equivalent of add table inet openshift-ovn-kubernetes is just tx.Add(&knftables.Table{}) because the family and table name are implied...

But yeah, this keeps being confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack thanks for the nice explanation I get it now!

go-controller/pkg/node/nftables/helpers.go Outdated Show resolved Hide resolved
go-controller/pkg/node/nftables/helpers.go Outdated Show resolved Hide resolved
@tssurya
Copy link
Contributor

tssurya commented Apr 18, 2024

so once we rename the commits to OCPHACK starter title and add some comments around (the node/pkg/nftables) utils is not used currently ; a skeleton has been added to allow us to go mod vendor that library so that it can be used in the pkg/cni etc we can merge this.
thanks Dan!

@danwinship
Copy link
Contributor Author

so once we rename the commits to OCPHACK starter title

We don't seem to do that anywhere else? We only use OCPHACK inside commits (which I've done now).

I modified the middle commit to remove the upstream bits we aren't using here, and noted in the commit message that it will probably eventually go away. (I guess we can try to prioritize getting at least one upstream nftables commit merged soon to minimize the number of rebases until then; in particular, sigs.k8s.io/knftables ends up right next to sigs.k8s.io/network-policy-api in the module/vendor files so there are going to be merge conflicts there...)

@tssurya
Copy link
Contributor

tssurya commented Apr 18, 2024

so once we rename the commits to OCPHACK starter title

We don't seem to do that anywhere else? We only use OCPHACK inside commits (which I've done now).

yea you are right I only see us doing it inside the commit message or code comments: bfe85db#diff-82f2da3587dee22da9d90bc8820bdd877c2f1e793e205f58c6c25030f58ed278R318
ack so let it be as is.

I modified the middle commit to remove the upstream bits we aren't using here, and noted in the commit message that it will probably eventually go away. (I guess we can try to prioritize getting at least one upstream nftables commit merged soon to minimize the number of rebases until then; in particular, sigs.k8s.io/knftables ends up right next to sigs.k8s.io/network-policy-api in the module/vendor files so there are going to be merge conflicts there...)

AH haha true I do plan to bring in ovn-org/ovn-kubernetes#4245 (#2132) which will conflict, but let's perhaps work together to control the ordering.

side-note something seems broken here images is not building?

@tssurya
Copy link
Contributor

tssurya commented Apr 18, 2024

CI is borked everywhere not just this PR

Copy link
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

I want to wait for a green CI before I slap the official label so that we don't merge with broken CI.


"github.com/containernetworking/plugins/pkg/ns"
configv1 "github.com/openshift/api/config/v1"
"github.com/ovn-org/ovn-kubernetes/go-controller/pkg/config"

utilnet "k8s.io/utils/net"
"sigs.k8s.io/knftables"
Copy link
Contributor

@tssurya tssurya Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome so since OCP_HACKS is also under main go-controller binary this bit was enough to get our library vendored in I see deeming other things as not needed anymore.

@tssurya
Copy link
Contributor

tssurya commented Apr 18, 2024

/test images

@danwinship
Copy link
Contributor Author

/retest

1 similar comment
@tssurya
Copy link
Contributor

tssurya commented Apr 19, 2024

/retest

@danwinship
Copy link
Contributor Author

/retest-required

@tssurya
Copy link
Contributor

tssurya commented Apr 19, 2024

/test e2e-aws-live-migration-sdn-ovn-rollback
/test e2e-vsphere-ovn

Copy link
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 19, 2024
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2aad40a and 2 for PR HEAD 8e69dc1 in total

@tssurya
Copy link
Contributor

tssurya commented Apr 19, 2024

sh-5.1# nsenter -n -t 7138 nft list ruleset
table inet openshift-block-output {
        chain block {
                tcp dport { 22623, 22624 } tcp flags syn reject
                ip daddr 169.254.169.254 udp dport != 53 reject with icmp port-unreachable
                ip daddr 169.254.169.254 tcp dport != 53 reject with icmp port-unreachable
        }

        chain output {
                type filter hook output priority filter; policy accept;
                goto block
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                goto block
        }
}
sh-5.1# 

self-reference

@tssurya
Copy link
Contributor

tssurya commented Apr 20, 2024

@danwinship could you rebase this one? the other conflicting PR got in...

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 20, 2024
(This will presumably eventually be replaced by a similar upstream
commit.)

Signed-off-by: Dan Winship <danwinship@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 20, 2024
Copy link
Contributor

openshift-ci bot commented Apr 21, 2024

@danwinship: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-windows c3413ff link true /test e2e-vsphere-windows
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway 8e69dc1 link false /test e2e-metal-ipi-ovn-dualstack-local-gateway
ci/prow/e2e-azure-ovn 90696a9 link false /test e2e-azure-ovn
ci/prow/security 90696a9 link false /test security
ci/prow/okd-e2e-gcp-ovn 90696a9 link false /test okd-e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@tssurya
Copy link
Contributor

tssurya commented Apr 22, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 22, 2024
@tssurya
Copy link
Contributor

tssurya commented Apr 22, 2024

/test e2e-aws-ovn-windows

Copy link
Contributor

openshift-ci bot commented Apr 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit a7f0909 into openshift:master Apr 22, 2024
30 of 33 checks passed
@danwinship danwinship deleted the nftables-cni branch April 22, 2024 13:11
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-ovn-kubernetes-base-container-v4.16.0-202404221110.p0.ga7f0909.assembly.stream.el9 for distgit ovn-kubernetes-base.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants