
reopen the egress policy debate :) #28

Closed
jayunit100 opened this issue Feb 17, 2022 · 19 comments
Assignees
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@jayunit100
Contributor

Per sig network - the idea of mesh like features for choke points / egress policies, seems to have gotten a second wind.

maybe we should reopen #9

@rikatz :)

@rikatz

rikatz commented Feb 17, 2022

just clarifying, I won't write a KEP until 2025 :P

@astoycos
Member

/assign @astoycos

@astoycos
Member

astoycos commented Mar 28, 2022

This came up in the sig-network-policy API meeting on Monday, March 28th, 2022 (I recommend watching the recording if you were not able to make it).

I will give a better summary of the discussion here in a bit to try and get the conversation going.

@astoycos
Member

Originally, the design for Admin Network Policy did include N/S use cases such as the one described in #9. This was removed because of some of the same issues NetworkPolicy faced with ipBlock selectors (i.e., the pre/post-SNAT ambiguity), because the proposal was already extremely complicated, and because we believed the N/S use cases would be better served by a separate, new object (something like an Egress Firewall object). Even now, the Admin Network Policy API PR still explicitly leaves the selection of N/S traffic out of the API.
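
As a concrete illustration of the ipBlock ambiguity mentioned above, consider a rule like the following minimal sketch (the namespace, policy name, and CIDR are invented for illustration). Whether an external client in the allowed range actually matches can depend on whether the plugin evaluates the client's original source IP or a rewritten post-SNAT node IP.

```yaml
# Minimal illustrative sketch; namespace, name, and CIDR are invented.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-corp-range
  namespace: demo
spec:
  podSelector: {}            # applies to every pod in the "demo" namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 203.0.113.0/24
      # Whether a given external client matches this rule can depend on whether
      # the plugin sees the client's original source IP or a rewritten (post-SNAT)
      # node IP; that is the pre/post-SNAT ambiguity referenced above.
```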

However, there is some rationale for re-opening this conversation while the implementation of the API is still active:

  1. Implementing rules that explicitly reference only E/W traffic may be inefficient, or even impossible, for some CNIs.
    • Our existing rationale for dealing with this issue was to push forward with v1alpha1, see whether many CNIs have trouble implementing a functionally correct/scalable solution, and if so, revisit the N/S question in v1alpha2.
  2. Based on the functionality of the existing NetworkPolicy resource (which includes the ipBlock selector), we believe it will be a bit "strange" for users not to be able to express similar intent at the cluster level with ANP. Specifically, we foresee many users wanting to completely block workloads from ALL traffic (N/S/E/W) while also allowing restricted egress access to specific destinations (see the NetworkPolicy sketch after this list).
  3. Creation of a separate object (such as EGFW) could take quite a bit of time, leaving users unable to satisfy all their use cases in-tree... and if 1. actually occurs, we would end up having to address N/S traffic selection in later versions of the ANP API anyway.
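
As referenced in point 2, here is a hedged sketch of the per-namespace pattern users can already express with NetworkPolicy today (the names, CIDR, and port are invented): a default deny combined with a narrow egress allow to an external destination. There is no equivalent cluster-scoped way to express this with ANP while N/S peers are left out.

```yaml
# Hypothetical per-namespace example; names, CIDR, and port are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-with-external-egress
  namespace: demo
spec:
  podSelector: {}            # isolates every pod in the "demo" namespace
  policyTypes:
    - Ingress                # no ingress rules listed, so all ingress is denied
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 198.51.100.0/24   # e.g. one external service the workloads may reach
      ports:
        - protocol: TCP
          port: 443
```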

The way I see it, we have a few options moving forward:

  1. Do nothing, continue as planned, and leave N/S traffic selection out of the ANP API, with the possibility of having to address it in later API versions.
  2. Embrace a total isolation model in the ANP API, meaning that if a workload is affected by a general Deny rule, the rule applies to ALL (N/S/E/W) traffic, and add an egress traffic selector so rules can allow the workload to send egress traffic to specific remote destinations (sketched after this list). This explicitly leaves out a method of selecting ingress traffic, since in-cluster workloads are generally not exposed directly to remote sources, and other mechanisms, such as the Gateway API, can be used to control ingress traffic to workloads. This is also in line with many of the actual customer use cases that exist today.
  3. Leave the ANP API be for now and push hard to start work on a new object that satisfies the additional N/S use cases.
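
To make option 2 concrete, here is a purely hypothetical sketch of what such a rule could look like. The `networks`-style peer, the names, and the CIDRs are invented for illustration; nothing like this exists in the API today.

```yaml
# Purely hypothetical sketch of option 2; the "networks" peer shown here is
# not part of the current API and is only meant to illustrate the idea.
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: isolate-sensitive-workloads
spec:
  priority: 10
  subject:
    namespaces:
      matchLabels:
        tier: sensitive
  egress:
    - name: allow-backup-range        # narrow N/S allow to a specific destination
      action: Allow
      to:
        - networks:                   # invented peer type, for this sketch only
            - 198.51.100.0/24
    - name: deny-everything-else      # "total isolation": deny remaining N/S and E/W egress
      action: Deny
      to:
        - networks:
            - 0.0.0.0/0
```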

Please let me know what you all think!

@astoycos
Member

/assign @danwinship @thockin @caseydavenport

@danwinship
Contributor

Implementing rules that explicitly reference only E/W traffic may be inefficient, or even impossible, for some CNIs

This was brought up in the ANP KEP, but NetworkPolicy already requires plugins to be able to distinguish E/W from N/S traffic (namespaceSelector: {} selects all E/W traffic and no N/S traffic), so either (a) this is not a real issue, or (b) some plugins are already unable to implement NetworkPolicy correctly/efficiently.
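
For reference, the convention being described is the empty namespaceSelector; a minimal sketch with invented names:

```yaml
# An empty namespaceSelector matches every namespace, i.e. all pod-to-pod (E/W)
# destinations, but never node or external (N/S) IPs.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-all-pods
  namespace: demo
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}
```

Any plugin that implements this rule correctly must already be able to distinguish E/W from N/S traffic, which is the point being made above.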

I brought up in the meeting (or tried to bring up; I was having trouble describing it) that if this really is a problem, then one way to deal with it might be to say that plugins are allowed to get confused about hosts "near the cluster", but they aren't allowed to get confused about hosts "far away from the cluster". So, like, if your plugin allocates pod IPs from the same subnet as node IPs, then a rule saying "allow egress to all pods" might end up allowing egress to nodes too, but it could never allow egress to api.example.com. This is not ideal, but if it is a problem for you, then don't use a plugin that has this problem!

(Although that's also a bad example because one of the things I brought up in my old ClusterEgressFirewall proto-KEP that you linked to is that admins really want a way to reliably refer specifically to "all node IPs".)

@thockin

thockin commented Apr 30, 2022

As discussed today, I suspected we would need to reopen this, but let's get v0 committed and then iterate on this.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 29, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 28, 2022
@astoycos
Member

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label Aug 29, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Dec 27, 2022
@astoycos
Member

astoycos commented Jan 2, 2023

/remove-lifecycle stale

@astoycos
Member

astoycos commented Jan 2, 2023

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label Jan 2, 2023
@tssurya
Contributor

tssurya commented Mar 26, 2023

/assign @tssurya

I'd like to own and push this forward, as there are a lot of users/customers wanting this support in ANP. What better time to start than while the API is fairly new and in development? I also have an additional downstream investment angle here, as we are starting to implement this API in the OVN-Kubernetes plugin.

I have opened #86; let's start with the user stories and then go from there. But for starters, I'd rather have this support in the same API object (ANP/BANP) than create a new object.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 24, 2023
@tssurya
Contributor

tssurya commented Jul 18, 2023

Closing this in favour of #126

@tssurya
Contributor

tssurya commented Jul 18, 2023

/close

@k8s-ci-robot
Contributor

@tssurya: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
