Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove IP Family from API types and cluster class builtin variables #7521

Open
killianmuldoon opened this issue Nov 8, 2022 · 11 comments
Open
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@killianmuldoon
Copy link
Contributor

This issue is for discussion on whether an "IPFamily" concept and restriction is useful and concretely definable in a way that works across Cluster API providers.

Since the introduction of dual stack networking in Cluster API - in particular its use in tests in CAPD - Cluster API has a definition of a valid "IPFamily" for a Cluster. This is an indication of whether the Pod and Service CIDR blocks are compatible with one another.

This concept is used in the Cluster Topology controller as "IPFamily" is available as a variable. It can be one of IPv4, IPv6 or Dualstack.

Cluster API calculates this value here:

func (c *Cluster) GetIPFamily() (ClusterIPFamily, error) {

It comes down to:

  1. If both pods and service CIDR blocks are IPv4, the Cluster IPFamily is IPv4
  2. If both pods and service CIDR blocks are IPv6, the Cluster IPFamily is IPv6
  3. If both pods CIDR entry is Dualstack, the Cluster IPFamily is Dualstack
  4. Otherwise if the pods and service CIDR blocks are not identical the IPFamily is invalid.

Case 4. causes the Cluster Topology controller to throw a reconciliation error today.

As far as I understand this check was introduced because this is the validation that the DockerLoadBalancer does today, but I'm not sure this is the correct concept for a Cluster IPFamily, or if such a concept is definable across infrastructure providers.

Kubernetes does not AFAIK have any such concept today, and you can create clusters with Kubeadm that violate the rules for Cluster API.

Some additional context and conversation around this can be found in: #7420. This change also (at time of writing) adds strict validation that the Cluster IP family must be valid in line with Cluster API's definition.

Is it necessary to define an "IPFamily" for CAPI Clusters? Is it useful to expose it e.g. as a variable for ClusterClass-based clusters?

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 8, 2022
@sbueringer
Copy link
Member

sbueringer commented Nov 9, 2022

It comes down to:

I think the summary is not correct. I think it is:

  1. If no CIDRS are set, the Cluster IPFamily is IPv4
  2. If only pod CIDR is set, the Cluster IPFamily is the one from the pod CIDR
  3. If only service CIDR is set, the Cluster IPFamily is the one from the service CIDR
  4. If pod CIDR is DualStack, the Cluster IPFamily is DualStack
  5. If both pods and service CIDR blocks are IPv4, the Cluster IPFamily is IPv4
  6. If both pods and service CIDR blocks are IPv6, the Cluster IPFamily is IPv6
  7. Otherwise if the pods and service CIDR blocks are not identical the IPFamily is invalid.

(could be that there are small mistakes there as well)

The following should be correct (but please double check):

Pod Service IPFamily
IPv4 - IPv4
IPv6 - IPv6
DualStack - DualStack
- - IPv4
- IPv4 IPv4
- IPv6 IPv6
- DualStack DualStack
IPv4 IPv4 IPv4
IPv6 IPv6 IPv6
DualStack DualStack DualStack
DualStack IPv4 DualStack
DualStack IPv6 DualStack
IPv4 DualStack Invalid
IPv6 DualStack Invalid
IPv6 IPv4 Invalid
IPv4 IPv6 Invalid

My opinion. If Kubernetes doesn't have a concept of Cluster IP family, we shouldn't have one. We should probably ask around a bit if there is such a concept.

@mcwumbly If you're around, do you know where we got this concept from or was it only for CAPD?

@neolit123
Copy link
Member

you could open a thread in #sig-network and link to this issue for additional feedback

@mcwumbly
Copy link
Contributor

My opinion. If Kubernetes doesn't have a concept of Cluster IP family, we shouldn't have one. We should probably ask around a bit if there is such a concept.

@mcwumbly If you're around, do you know where we got this concept from or was it only for CAPD?

I think most of the background for this decision is covered in this comment thread: #4558 (comment)

Toward the end of that thread, I linked to this sig-networking thread: https://groups.google.com/g/kubernetes-sig-network/c/S_64nw4BqoM/m/w64pw2ejAwAJ?pli=1

In summary, I think you're correct that this concept doesn't exist in Kubernetes itself, but that it was helpful to introduce to make a simpler user experience for creating clusters via ClusterAPI.

If there are needs to create or manage clusters via ClusterAPI that violate the assumptions made here, I could see a few ways forward (from ripping it out and doing something completely different, to introducing some "break the glass" ip family type, to just making small enhancements that keep the same constrained design).

@sbueringer
Copy link
Member

sbueringer commented Nov 21, 2022

Thx @mcwumbly. That's really helpful context, especially: #4558 (comment).

Essentially IPFamily is used in the following places:

  1. CAPD: to detect which IPFamily a cluster has to then setup listener addresses, sysctls, ... correctly
  2. Additionally we took the value of IPFamily and provided it as a builtin variable in ClusterClass

As we never had concrete demand for 2. and IPFamily at this point is only a CAPD-internal construct I would suggest to:

  • deprecate the Cluster.GetIPFamily func
  • deprecate the IPFamily builtin variable

After deprecation:

  • move the GetIPFamily func to a CAPD internal util

I think the way IPFamily is calculated definitely sounds reasonable, but in the absence of that concept in Kubernetes I would like to avoid exposing it. If we get clear requirements to expose something like that in the future we can re-introduce it based on those requirements.

The mailing list thread (https://groups.google.com/g/kubernetes-sig-network/c/S_64nw4BqoM/m/w64pw2ejAwAJ?pli=1) shows to me the complexity of the situation and thus I don't think it should be Cluster API that defines what the IPFamily of a Kubernetes cluster is. We only accidentally got into this situation because we took the IPFamily we added for CAPD and "promoted" it to a builtin variable.

@fabriziopandini WDYT?

@fabriziopandini
Copy link
Member

+1 to move this concept to something internal to CAPD.
Let's just make an inventory of what the flags influence today + advertise this change as usual

@sbueringer
Copy link
Member

sbueringer commented Nov 23, 2022

Let's just make an inventory of what the flags influence today

Outside of CAPD: only the builtin variable and we have no knowledge of who is using that

In CAPD: A lot of things and the information is propagated everywhere to implement IPv4/IPv6 correctly:

image

advertise this change as usual

I would deprecate by:

  • Bringing it up in Slack / the office hours
  • Adding Deprecated on the Cluster.GetIPFamily func as usual
  • Adding Deprecated to the IPFamily field in the builtin variable struct (which is just internal)
    • Additionally add the deprecation to our docs (book + proposal (?))

The question is then, can we drop the func & builtin variable one minor CAPI release or one apiVersion later?

@fabriziopandini
Copy link
Member

/triage accepted
/help

@k8s-ci-robot
Copy link
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 30, 2022
@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 19, 2024
@fabriziopandini
Copy link
Member

/kind cleanup
/priority backlog

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Apr 11, 2024
@fabriziopandini
Copy link
Member

/assign
To re asses

@fabriziopandini fabriziopandini changed the title Does Cluster API define an IPFamily concept? Remove IP Family from API types and cluster class builtin variables May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
Development

No branches or pull requests

7 participants