Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow unknown protocols #49

Closed
wants to merge 3 commits into from
Closed

Conversation

aojea
Copy link
Contributor

@aojea aojea commented Jul 2, 2024

Unknown protocols are not subject to Kubernetes policies, hence they must be allowed

If the receiving pod is local, then ipvs sends packet via the output hook (not forward). So both are needed

Fixes #46

Co-Authored-By: Lars Ekman uablrek@gmail.com @uablrek

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 2, 2024
@k8s-ci-robot k8s-ci-robot requested a review from thockin July 2, 2024 10:10
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 2, 2024
@aojea
Copy link
Contributor Author

aojea commented Jul 2, 2024

Also check if the IPv6 failure in #47 are related to the output chain is something environmental

cc: @uablrek

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 2, 2024
@aojea
Copy link
Contributor Author

aojea commented Jul 2, 2024

ok, the test passes here, so is the output rule

/hold

I don't have clear what is the expected behavior kubernetes-sigs/network-policy-api#187 (comment) regarding protocol

/assign @danwinship

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 2, 2024
Comment on lines +662 to +666
tx.Add(&knftables.Rule{
Chain: chainName,
Rule: knftables.Concat(
"oif", "lo", "accept"),
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uablrek can you check in your PR this solves the problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this update, but the address matches should never hit anything with "lo" since such packet would be a "martian"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked! My world is crumbling! My reasoning is that since the set's only contains pod-addresses, packets matching whose sets can't be routed to/from "lo", since that would make them martians.

Unless... a SatefulSet pod actually have a loopback address as pod-address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astoycos @tssurya do you have an explanation for this failurs on the conformance suite only with ipv6?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modified chain for reference:

        chain output {
                type filter hook output priority filter - 5; policy accept;
                ct state established,related accept
                oif "lo" accept
                ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip6 saddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
                ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
        }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule definitely seems wrong.

I think sometimes packets are considered by the kernel to be traversing lo if they have the same local source and destination IP, even if that IP isn't 127.0.0.1/::1?

Anyway, the failure here is probably caused by the fact that in ipvs, packets traverse the network stack twice (from source to ipvs0 and then from ipvs0 to destination), and the DNAT and SNAT don't happen in exactly the same place as with iptables/nftables kube-proxy. (The DNAT happens inside ipvs0... I forget if the SNAT happens in the first or second traversal.)

Anyway, point is, the iptables/nftables logic is going to end up doing the wrong thing for some packets in ipvs mode. The fix might be to ignore packets with oif = "ipvs0" so you only look at them on the second traversal, but I'm really not sure about that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Calico needs to explicitly know whether you're using iptables or ipvs, so it's possible we actually need different logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this rule fixes the IPv6 jobs, ipvs is unrelated here

Copy link
Contributor Author

@aojea aojea Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sometimes packets are considered by the kernel to be traversing lo if they have the same local source and destination IP, even if that IP isn't 127.0.0.1/::1?

So, what I want to express is "we only need to process in OUTPUT local generated packets that are sent outside (not localhost dest)"

image

I had this issue in the past with kubelet probes and we solved it using the lo rule, I think it is correct but I can not explain the science behind it, just the evidence it works as expected, any packet destined to localhost does not have to be processed by network policies anyway, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this rule fixes the IPv6 jobs, ipvs is unrelated here

oh... so... the issue is that if we add processing on the output hook, then IPv6 (but not IPv4) breaks unless we also add the ! lo rule?

So, what I want to express is "we only need to process in OUTPUT local generated packets that are sent outside (not localhost dest)"

Why should it matter though? We shouldn't be blocking any packets that are destined for localhost. With the podips changes, we shouldn't even intercept them...

I can not explain the science behind it

boo!

any packet destined to localhost does not have to be processed by network policies anyway, right?

A packet from a pod-network pod addressed to 127.0.0.0/8 or ::1 would not leave the pod netns, and it's not possible (by validation) to have an EndpointSlice that points to 127.0.0.0/8 or ::1, so you can't end up with that via Service DNAT either. But you can add non-localhost IPs to lo I think?

klog.Infof("Can not process packet %d applying default policy (failOpen: %v): %v", *a.PacketID, c.config.FailOpen, err)
c.nfq.SetVerdict(*a.PacketID, verdict) //nolint:errcheck
klog.Infof("Can not process packet %d, allowing ... : %v", *a.PacketID, err)
c.nfq.SetVerdict(*a.PacketID, nfqueue.NfAccept) //nolint:errcheck

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is really correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means this packet is not TCP or UDP or SCTP

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it means "a parsing error occurred". I'm not sure we want to say all parse errors should be "accepts".

(I'm not sure we don't want that either...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it can only happens if is not ip or ipv6 or one of those protocols ... but indeed is confusing the way it is now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All packets with unknown protocol (not TCP or UDP or SCTP) must be accepted. Unfortunately there is no v1.ProtocolUnknown constant, but we can type-cast:

	default:
		//return t, fmt.Errorf("unknown protocol %d", protocol)
		t.proto = v1.Protocol("unknown")

but that feels a bit hackish...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok to just store the proto as an int in struct packet, and let the caller match on "syscall.IPPROTO_*". IMO that's the most flexible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that it doesn't seem right to say that every possible type of error from parsePacket should result in the packet being accepted. IMO it would be better to have parsePacket return both an error and a verdict, even if that verdict happens to be "accept" for all of the errors it currently catches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some investigation, I agree: all unrecognized packets shall not be accepted (#51 (comment)). However, I think parsePacket() should not set a verdict. It should parse the packet and let the caller decide, but then is can't return TCP,UDP,SCTP or an error for everything else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok to just store the proto as an int in struct packet, and let the caller match on "syscall.IPPROTO_*". IMO that's the most flexible.

that sounds like the best idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have experimented some with this, and I think the best is to keep packet.proto as v1.Protocol, and add a new "ipproto int" in the packet structure.

An IPPROTO_ROUTING extension header, or extension header out-of-order will return a

var ErrorExtensionHeader = fmt.Errorf("unsupported extension header")

@@ -73,7 +73,7 @@ func parsePacket(b []byte) (packet, error) {
t.proto = v1.ProtocolSCTP
dataOffset = hdrlen + 8
default:
return t, fmt.Errorf("unknown protocol %d", protocol)
return t, fmt.Errorf("IP family %s unknown transport protocol %d", t.family, protocol)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want to log an error for every ICMPv6 message. You should add a case 58: for that.
Probably parsePacket needs another return value for "not an error, but just accept the packet".

(It seems that your callback only gets called for IP and IPv6 protocol packets, right? And, in particular, not ARP and ICMP packets, which are separate L3 protocols, unlike ICMPv6/NDP, which are subprotocols of IPv6.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the next commit, only TCP , UDP and SCTP packets are sent to the userspace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment I think all fragments will also have an unknown protocol. But I am about to fix that (#15)

Copy link
Contributor

@uablrek uablrek Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the next commit, only TCP , UDP and SCTP packets are sent to the userspace

That may fix the fragment problem. The kernel will reassemble packets before sending them to user-space (perhaps it already does, in which case #15 isn't a kind/bug after all).

aojea and others added 3 commits July 2, 2024 14:57
Since network policies are only defined for L4 protocols and is not
well defined what happens with the protocols not supported for
Kubernetes, use the less surprised mode for users and allow any unknown
protocol for Kubernetes

Co-Authored-By: Lars Ekman <uablrek@gmail.com>
avoid to send to user spaces the protocols that we are going accept
anyway.

Co-Authored-By: Lars Ekman <uablrek@gmail.com>
Accept the packets destined to the loopback interface
in the output hook, because the job for IPv6 and admin network
policies fails, still not clear why, however, there seems
to be no reason to process a packet in OUTPUT to this interface.

Co-Authored-By: Lars Ekman <uablrek@gmail.com>
@aojea
Copy link
Contributor Author

aojea commented Jul 2, 2024

/hold cancel

@uablrek I've added you as co author and incorporated your commits with some amendments

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 2, 2024
tx.Add(&knftables.Rule{
Chain: chainName,
Rule: knftables.Concat(
"meta", "l4proto", "!=", "{ tcp, udp, sctp }", "accept"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There's no real reason to use knftables.Concat here. You can just write it as a single string.)

@aojea aojea added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 3, 2024
@aojea
Copy link
Contributor Author

aojea commented Jul 3, 2024

So, thinking as an user, I think the best behavior will be that if I don't define any protocol in a network policy I apply the existing policy rules ... we can do this just not returning an error during packet processing , it will return an empty string for the protocol field and zero for src and destination port, and it will fail to match if there is a rule specifying a protocol

@aojea
Copy link
Contributor Author

aojea commented Jul 9, 2024

So, it seems there are two AI:

  1. figure out why ipv6 fails allow unknown protocols #49 (comment)
  2. define the behavior on unrecognized packets allow unknown protocols #49 (comment)

@danwinship
Copy link

is there any part of this or #47 that can be merged as-is?

@uablrek
Copy link
Contributor

uablrek commented Jul 9, 2024

is there any part of this or #47 that can be merged as-is?

IMO #47 can wait. We know the consequences (proxy-mode=ipvs doesn't work). I plan to return the IPPROTO number and reject some protocols, like packets with IPv6 routing headers, so I am unsure about this PR also.

@uablrek
Copy link
Contributor

uablrek commented Jul 9, 2024

I think accepting too much (all unknown) may be a security issue in the IPv6 extension header case. For instance if we miss some combination of EH's that are valid TCP, we may accept TCP traffic that should be blocked.

Better be restrictive, and for instance reject packets with EH's in a non-recommended order.

@uablrek
Copy link
Contributor

uablrek commented Jul 10, 2024

Scratch #49 (comment).

Please merge this PR as-is an close #47. I'll rebase #51 if needed, then that PR should be next in line IMO. I have prepared a commit on top of #51 to handle EH's out-of-order, and routing-EH (#49 (comment))

@danwinship
Copy link

OK, so thinking about this some more, if a pod is unprivileged and does not have CAP_NET_ADMIN or CAP_NET_RAW, then it can't send anything except IP. Given that, and the fact that we want to avoid accidentally blocking ICMP reply packets (eg destination unreachable), I think we should not make any effort to ever block ICMP.

So, I'd say, we should keep the meta l4proto != { tcp, udp, sctp } accept rule and just not worry about anything else.

So that solves the "unrecognized protocol" problem and leaves:

  • fragments, which I think we think isn't a problem if we have the meta l4proto rule? We should figure out how to test this...
  • parse errors, which probably shouldn't happen, but I think we should treat as "deny", and log errors about, and see if they ever actually happen
  • IPv6 extensions, which I believe we are intentionally punting to Parse IPv6 extension headers #51 and not worrying about in this patch

Right?

@aojea
Copy link
Contributor Author

aojea commented Jul 17, 2024

OK, so thinking about this some more, if a pod is unprivileged and does not have CAP_NET_ADMIN or CAP_NET_RAW, then it can't send anything except IP. Given that, and the fact that we want to avoid accidentally blocking ICMP reply packets (eg destination unreachable), I think we should not make any effort to ever block ICMP.

So, I'd say, we should keep the meta l4proto != { tcp, udp, sctp } accept rule and just not worry about anything else.

So that solves the "unrecognized protocol" problem and leaves:

  • fragments, which I think we think isn't a problem if we have the meta l4proto rule? We should figure out how to test this...
  • parse errors, which probably shouldn't happen, but I think we should treat as "deny", and log errors about, and see if they ever actually happen
  • IPv6 extensions, which I believe we are intentionally punting to Parse IPv6 extension headers #51 and not worrying about in this patch

Right?

ICMP packets can be evaluated by network policies if no Port fields are set #51 (comment)

I'm going to close this to avoid more noise, once we have the parser done, I want to debug the OUTPUT chain problem separetly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Doesn't work with proxy-mode=ipvs
4 participants