New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDN-4458: Do per-pod MCS/metadata blocking with nftables rather than iptables #1946
SDN-4458: Do per-pod MCS/metadata blocking with nftables rather than iptables #1946
Conversation
6723010
to
2f71f50
Compare
/retest |
I think we should just try to upstream our MCS/metadata blocking stuff and hide it behind a config option that NVIDIA doesn't have to enable... otherwise looks pretty good to me. |
2f71f50
to
c3413ff
Compare
@danwinship: This pull request references SDN-4458 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
I could do that... but this is a terrible hack. We had a plan for a better fix long ago but it never happened. I guess the upstreamable version would be to allow the conf file to contain a list of ports and a list of IPs to block... |
So it seems like we're right on the verge of being able to replace this hack with an AdminNetworkPolicy... (Deny egress to |
/retest |
the code looks fine, same question as for openshift/sdn#581: do we have any tests to confirm it works properly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main nftables logic looks simple and good to me based on how the older iptables logic were.
I have some silly questions that are mostly for my own understanding.
dist/images/Dockerfile.fedora
Outdated
@@ -27,7 +27,7 @@ RUN INSTALL_PKGS=" \ | |||
python3-pip python3-pyyaml bind-utils procps-ng openssl numactl-libs firewalld-filesystem \ | |||
libpcap hostname kubernetes-client util-linux \ | |||
ovn ovn-central ovn-host python3-openvswitch tcpdump openvswitch-test python3-pyOpenSSL \ | |||
iptables iproute iputils strace socat koji \ | |||
iptables nftables iproute iputils strace socat koji \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to go in first into ovn-org/ovnk; I don't think we want to diverge too much from upstream right?
if this is a OCP ONLY commit the commit should be updated to OCP CARRY prefix and leave a comment here, I bet the next d/s merge of something will conflict with this change if we are going to diverge it might be good to have the OCP specific commit/comment
UPDATE: This was discussed in team meeting; eventually when we convert stuff into nftables this will also go upstream until then let's just rename these commits to prefix OCPHACK or something so that we know we need to carry them specially during conflicts if any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, the middle commit here originally got pulled out of ovn-org/ovn-kubernetes#3709... maybe I can tweak it some to be less conflict-y since that got deferred to 4.17...
COMMIT | ||
` | ||
func doNFTablesRules(platformType string) error { | ||
nft, err := knftables.New(knftables.InetFamily, "openshift-block-output") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not using GetNFTablesHelper
if that is returning an interface? this part seems to be exactly doing what that func does right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, yeah, sorry, none of that code from upstream gets used here because it's all for ovnkube-node
, not for the CNI shim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you prefer to keep it that way? as in you don't want to use the utils for cni pkg at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not needed. The stuff that was in pkg/node/nftables
(which is no longer there in the current version of the commit) was just to make the nftables API look more like what the existing gateway init, etc, code was expecting based on the old iptables API, so I didn't have to rewrite as much stuff. But the CNI shim code wasn't making the same assumptions (and was much smaller anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack sg, thanks I can see all that is gone in the latest push
} | ||
|
||
tx := nft.NewTransaction() | ||
tx.Add(&knftables.Table{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for any names for the table? or is there some magic piece I am missing as to how the table name you pass is used to create this? for some reason I thought https://github.com/kubernetes-sigs/knftables/blob/b9d68958acd80a16a2b1d6a1ce650f38b3673b0f/nftables.go#L109 newInternal would actually create the table for you based on the provided value but its really just setting the context right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the API being weird... in the actual nft
commands you have to say stuff like
add table inet openshift-ovn-kubernetes
add chain inet openshift-ovn-kubernetes mychain
add rule inet openshift-ovn-kubernetes mychain blah blah blah
but I didn't want to force people to have to include the family (inet
) and table name (openshift-ovn-kubernetes
) in every knftables Object, so I said you specify that at knftables.New()
time and the values there get substituted in everywhere automatically.
Which means that the knftables equivalent of add table inet openshift-ovn-kubernetes
is just tx.Add(&knftables.Table{})
because the family and table name are implied...
But yeah, this keeps being confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack thanks for the nice explanation I get it now!
so once we rename the commits to OCPHACK starter title and add some comments around (the node/pkg/nftables) utils is not used currently ; a skeleton has been added to allow us to go mod vendor that library so that it can be used in the pkg/cni etc we can merge this. |
d6702d3
to
8e69dc1
Compare
We don't seem to do that anywhere else? We only use OCPHACK inside commits (which I've done now). I modified the middle commit to remove the upstream bits we aren't using here, and noted in the commit message that it will probably eventually go away. (I guess we can try to prioritize getting at least one upstream nftables commit merged soon to minimize the number of rebases until then; in particular, |
yea you are right I only see us doing it inside the commit message or code comments: bfe85db#diff-82f2da3587dee22da9d90bc8820bdd877c2f1e793e205f58c6c25030f58ed278R318
AH haha true I do plan to bring in ovn-org/ovn-kubernetes#4245 (#2132) which will conflict, but let's perhaps work together to control the ordering. side-note something seems broken here images is not building? |
CI is borked everywhere not just this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I want to wait for a green CI before I slap the official label so that we don't merge with broken CI.
|
||
"github.com/containernetworking/plugins/pkg/ns" | ||
configv1 "github.com/openshift/api/config/v1" | ||
"github.com/ovn-org/ovn-kubernetes/go-controller/pkg/config" | ||
|
||
utilnet "k8s.io/utils/net" | ||
"sigs.k8s.io/knftables" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome so since OCP_HACKS is also under main go-controller binary this bit was enough to get our library vendored in I see deeming other things as not needed anymore.
/test images |
/retest |
1 similar comment
/retest |
/retest-required |
/test e2e-aws-live-migration-sdn-ovn-rollback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
self-reference |
@danwinship could you rebase this one? the other conflicting PR got in... |
(This will presumably eventually be replaced by a similar upstream commit.) Signed-off-by: Dan Winship <danwinship@redhat.com>
8e69dc1
to
90696a9
Compare
@danwinship: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
/test e2e-aws-ovn-windows |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, tssurya The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
a7f0909
into
openshift:master
[ART PR BUILD NOTIFIER] This PR has been included in build ose-ovn-kubernetes-base-container-v4.16.0-202404221110.p0.ga7f0909.assembly.stream.el9 for distgit ovn-kubernetes-base. |
Along with warning about customers using iptables in pods (https://issues.redhat.com/browse/SDN-4114) we need to make sure that we don't use iptables in pods.
This pulls in one of the commits from ovn-org/ovn-kubernetes#3709 to get the knftables library. Other than a no-op cleanup call inherited from there, it doesn't modify ovnkube-node to use nftables in the host network namespace.
I changed it to just always create the IPv4-specific rules whether or not we support IPv4, because it was easy, and I wanted to keep the code in sync between ovn-k and openshift-sdn (openshift/sdn#581).