Calculate IPv6 subnet CIDR based on cluster CIDR #11523

hakman · 2021-05-19T06:24:35Z

Inspired by #11163.

This PR makes it possible to set subnet CIDR based on VPC CIDR on cluster creation similarly to:

There is some feedback from @johngmyers in #11442 (comment), which suggests using a cidrsubnet(8,1) syntax instead of a cidrsubnet-8-1 syntax for configuring a subnet.
The reason I chose the cidrsubnet-8-1 is to maybe allow using this as a flag on create cluster and that it seems similar to other places, like https://kops.sigs.k8s.io/cluster_spec/#egress.

/cc @justinsb @johngmyers @rifelpet

upup/pkg/fi/cloudup/awstasks/subnet.go

upup/pkg/fi/utils/cidr.go

hakman · 2021-05-19T14:40:46Z

/hold for more feedback on approach

johngmyers · 2021-05-19T15:38:20Z

egress takes AWS resource IDs, whereas this is a function. I would like something that looks more like a function than an identifier.

hakman · 2021-05-19T15:44:12Z

egress takes AWS resource IDs, whereas this is a function. I would like something that looks more like a function than an identifier.

True. Still, exposing this as a function would be something totally new.
I am not opposed to this change, just think it would be best to agree which way to go during office hours.

upup/pkg/fi/cloudup/awstasks/subnet.go

hakman · 2021-05-20T05:04:49Z

One other idea coming from @rifelpet would be to use a struct field to describe this:

type IPv6CIDRSubnet struct {
    NewBits int
    NetNum int
}

justinsb · 2021-05-23T17:12:09Z

Could we use a syntax like ipv6CIDR: /64 on the subnet - for when the user wants a /64 but doesn't much care which one.

I figure if we have use cases for more precise allocation in future we can add those.

I did a prototype #11579 and the experience feels intuitive. (My prototype uses "::/64" so it's a valid CIDR, but that's not a great experience because kops edit keeps removing the (required) quotes around that).

We'd have to figure out how we assign to terraform consistently. We could do something like

0:0:0:1::/64 through 0:0:0:ff::/64; that indicates the bits we care about.

I think it's reasonable to require this specification for terraform. It would be nice if kops create cluster auto assigned those to subnets in ipv6 mode. We could then also require that for direct mode, though I'm not wild about the idea of making kOps less user-friendly because of terraform limitations.

justinsb · 2021-05-23T17:24:14Z

Another possible syntax for absolute positioning: /64#3 meaning "subnet of size 64, number 3". That's close to what users care about - I think - and in keeping with the easier syntax /64.

(I don't know if # is ever meaningful in IPv6 and we should choose another character. % is scope so I think we should avoid that.)

hakman · 2021-05-23T18:22:04Z

I like the /64#3 or ::/64#3 notations.
Regarding auto-assignment, I would be happy to do it in create cluster only, where this would be done based on simple subnet number. I would prefer not to add a mechanism for auto-assignment for existing clusters. On those, people can decide on these values themselves.

johngmyers · 2021-05-23T19:51:55Z

I don't see any particular reason not to support auto-assignment for existing clusters. Why would would we require admins of existing clusters to manually do this themselves?

What's the problem with Terraform? Can't we query the existing state of the subnets in order to specify that state in the configuration when it exists?

hakman · 2021-05-23T20:53:38Z

The reason I see not to support auto-assignment after cluster creation in the complications in logic for something that is not needed to make IPv6 work. It is nice to have, but would gladly live without it for now.
We already have differences between how things are done in Direct, CF and TF because of how settings are implemented of not there.

hakman · 2021-05-30T08:55:14Z

/hold cancel

hakman · 2021-06-01T04:35:15Z

Final implementations uses the /64#3 notation for IPv6 subnet, where 64 is a decimal value and 3 is a hex value (as discussed during Office Hours.

pkg/apis/kops/validation/validation.go

upup/pkg/fi/utils/cidr.go

johngmyers · 2021-06-02T04:12:38Z

upup/pkg/fi/utils/cidr.go

+
+// ParseCidrSubnet parses a string in the format "/<newSize>#<netNum>"
+// and returns the values of <newSize> and <netNum>.
+func ParseCidrSubnet(subnet string) (int, int64, error) {


While cidrsubnet() is the Terraform name of a related function, I think a name more like ParseGeneratedCIDR() would be more appropriate. The Terraform function takes as input a CIDR and subnets it, whereas this thing takes the implicit generated block and carves out a subnet of it. The term "subnet" is problematic because it could be confused with the Subnet object that this is being used to fill the IPv6CIDR field in.

Well, "generated" is not ideal either. It is just a notation.
When using networking as the context, "subnet" has a broader meaning.
I would see it more as a ParseSubnetNotation() name.

Yeah, it's not necessarily generated. It's cut out from the VPC's singular CIDR.

(except VPCs, especially shared ones, can technically have multiple CIDRs.)

Does ParseCIDRNotation() sound better?

pkg/apis/kops/validation/validation.go

johngmyers · 2021-06-02T04:41:52Z

upup/pkg/fi/cloudup/awstasks/subnet.go

@@ -98,6 +99,14 @@ func (e *Subnet) Find(c *fi.Context) (*Subnet, error) {
 		}

 		actual.IPv6CIDR = association.Ipv6CidrBlock
+		if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {
+			subnetIPv6CIDR, err := calculateSubnetCIDR(e.VPC.IPv6CIDR, e.IPv6CIDR)


It looks like there's a corner case that isn't handled: on the first run after a kOps upgrade, when the VPC and Subnets already exist, but there is not yet an associated Amazon-provided IPv6 association.

I don't see VPCAmazonIPv6CIDRBlock.RenderAWS() writing to its e.VPC.IPv6CIDR and I don't see any dependency of the Subnet task on the VPCAmazonIPv6CIDRBlock task.

(I do find it surprising that Find() methods write to e. And if they depend on values set by the Task.Run() methods of their dependencies, how is that going to work with TargetDryRun?

VPCAmazonIPv6CIDRBlock.RenderAWS() cannot write the CIDR, because the request doesn't actually return which block it associates. Because of that, there is no dependency and Subnet.RenderAWS() waits for the association to be done, using fi.NewTryAgainLaterError().

I agree that Find() writing to e is awkward, but it was that or writing the e.IPv6CIDR notation to a.IPv6CIDR.
Also makes it easier later in FindDeletions(). I could change that to set a.IPv6CIDR = e.IPv6CIDR, if it makes things less awkward.

I think Find() writing to e is okay, I'm just having to figure out what the rules and model are. It seems Find() can refine the expected based on existing state, including state produced by tasks listed as dependencies. But it has to deal with those dependencies not running due to TargetDryRun.

There's an opportunity here to improve the developer documentation.

This could be used in the future to support a /64 notation where the particular sub-block doesn't need to be specified.

But there is a dependency on VPCAmazonIPv6CIDRBlock as the association won't happen unless that task is run. And there shouldn't be two blocks of code when one would do, especially not if the two blocks of code implement different semantics.

upup/pkg/fi/cloudup/awstasks/subnet.go

johngmyers · 2021-06-02T05:08:21Z

upup/pkg/fi/cloudup/awstasks/subnet.go

@@ -196,6 +205,22 @@ func (_ *Subnet) RenderAWS(t *awsup.AWSAPITarget, a, e, changes *Subnet) error {
 		}
 	}

+	if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {


A comment would be helpful to indicate this handles the case where the VPC's IPv6CIDR was not known at the time e.Find() is run.

And actually, e.Find() should NewTryAgainLaterError until it can get the association. Otherwise the code path is timing-dependent.

This correct here, as it handles the case of TargetDryRun. We only care about the exact CIDR at render time, when we know that the VPC will eventually get and IPv6 CIDR.

TargetDryRun will never call RenderAWS(), so that case will never happen.

Find() has nothing to do with VPC.IPv6CIDR. It just needs to find the current state of the subnet.
If by the time it gets to RenderAWS() is still missing the VPC.IPv6CIDR, NewTryAgainLaterError waits for it.

Find() is using VPC.IPv6CIDR to get the block to make allocations from.

Find() and RenderAWS() are using different rules for making the allocation and which one gets run depends on timing.

johngmyers · 2021-06-02T05:09:22Z

upup/pkg/fi/cloudup/awstasks/subnet.go

@@ -196,6 +205,22 @@ func (_ *Subnet) RenderAWS(t *awsup.AWSAPITarget, a, e, changes *Subnet) error {
 		}
 	}

+	if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {
+		vpcIPv6CIDR, err := findVPCIPv6CIDR(t.Cloud, e.VPC.ID)


Why limit this to Amazon-provided IPv6 associations? Why not just use e.VPC.IPv6CIDR?

I cannot test with customer owned CIDRs. The AWS docs regarding IPv6 are not exactly clear either.
So, for now, this is meant to only support AWS owned CIDRs, where I know that can be exactly 1 per subnet with a /64 size and from exactly 1 bloc of /56 size.

If there's a customer-specified VPC.IPv6CIDR, the intended behavior is obvious: take the slice off of the specified VPC IPv6CIDR.

Also, it's obvious that it means extra work, that is unrelated. For now, this is experimental, with limitations.

You could test using a shared VPC with a site-local IPv6 CIDR association.

It make sense to support only AWS-provided CIDRs in managed VPCs, especially for egress. But we shouldn't design out the shared VPC case. There's already code that pulls from VPC.IPv6CIDR; we should be consistent about using that when it's available.

johngmyers · 2021-06-02T05:10:37Z

upup/pkg/fi/cloudup/awstasks/subnet.go

@@ -196,6 +205,22 @@ func (_ *Subnet) RenderAWS(t *awsup.AWSAPITarget, a, e, changes *Subnet) error {
 		}
 	}

+	if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {
+		vpcIPv6CIDR, err := findVPCIPv6CIDR(t.Cloud, e.VPC.ID)


There's no dependency on the VPCAmazonIPv6CIDRBlock task, so RunTasks might schedule the VPCAmazonIPv6CIDRBlock task in a later slot.

There is no value in the dependency, as VPCAmazonIPv6CIDRBlock will not return the associated block.
This is the reason for using NewTryAgainLaterError in Subnet.RenderAWS.

But unless VPCAmazonIPv6CIDRBlock.Run() is called, the IPv6 association won't ever arrive. If there's no dependency, RunTasks might schedule VPCAmazonIPv6CIDRBlock to a later tranche than Subnet.

The IPv6 association won't ever arriveThere is a dependency in VPC, so it will arrive.

There is a dependency in VPC, so it will arrive, sooner or later.

RunTasks might schedule VPCAmazonIPv6CIDRBlock to a later tranche than Subnet.

This is why NewTryAgainLaterError waits for it to appear.

It won't appear due to how RunTasks schedules tasks. If tasks b and c depend on a, d depends on b, and e depends on c, the scheduler effectively puts false dependencies of d depending on c and e depending on b.

johngmyers · 2021-06-02T05:32:16Z

We should support this syntax on the Subnet.CIDR field, but that can be a separate PR.

johngmyers · 2021-06-02T06:35:22Z

upup/pkg/fi/cloudup/awstasks/subnet.go

+			return err
+		}
+		if vpcIPv6CIDR == nil {
+			return fi.NewTryAgainLaterError("waiting for the VPC IPv6 CIDR to be assigned")


It's also possible it's a shared VPC with no IPv6 allocation forthcoming.

In this case, there will be an error from VPCAmazonIPv6CIDRBlock that says: "IPv6 CIDR block provided by Amazon not found".

johngmyers · 2021-06-02T16:00:25Z

I think we might need to make significant changes to subnet_test.go to exercise all of this.

johngmyers · 2021-06-06T00:16:36Z

I believe there are two outstanding issues:

Missing dependency from Subnet to VPCAmazonIPv6CIDRBlock.
Looking up the VPC's IPv6 association in one place (Find()) or two (Find() and RenderAWS()).

For issue (1):

RunTasks works by scheduling Tasks into tranches. The first tranche is tasks that have no dependencies. The second tranche is tasks that depend only on tasks in the first tranche. The third tranche is tasks that depend only on tasks in either the first or second tranche. And so on.

It then runs all the tasks in a given tranche in parallel. It waits for all tasks in a tranche to complete (without retries) before running any tasks in the next tranche.

Subnet depends only on VPC. So it will be in the tranche immediately following the one that VPC is in. VPCAmazonIPv6CIDRBlock currently depends only on VPC, so currently is scheduled in the same tranche as Subnet. So the two tasks race.

If there were a subsequent change that required VPCAmazonIPv6CIDRBlock to depend on an intermediate task that in turn depended on VPC, then it would then be scheduled in a tranche after Subnet. That means it would never run until all Subnet tasks completed. So if Subnet were to wait for an association that it is expecting VPCAmazonIPv6CIDRBlock to initiate, it would wait forever.

There also might be a race. If Subnet.RenderAWS() calls findVPCIPv6CIDR() before VPCAmazonIPv6CIDRBlock.RenderAWS() runs, it won't find an IPv6 association in progress, so might return a permanent error instead of a TryAgainLaterError.

I'll write up issue (2) later.

hakman · 2021-06-09T06:09:43Z

@johngmyers I addressed (1).

hakman · 2021-06-10T04:08:55Z

At the moment, task order due to dependencies is the following:

VPC (no dependency)
VPCAmazonIPv6CIDR (depends on VPC)
Subnet (depends on VPC and VPCAmazonIPv6CIDR)

In the subnet Find() function, we have the following case with regards to IPv6:

VPC doesn't exists -> No subnet will exist so no CIDR calculation is needed
VPC exists but doesn't have IPv6 CIDR -> No subnet will have a IPv6 CIDR so no calculation is needed
VPC exists and has IPv6 CIDR -> VPC.IPv6CIDR will be available for calculation.

In the subnet RenderAWS() function we have the following case with regards to IPv6:

VPC doesn't exists -> It is a dependency, VPC will always exist when it runs
VPC exists but doesn't have IPv6 CIDR -> it will retry until the CIDR allocated by VPCAmazonIPv6CIDR is available.
VPC exists and has IPv6 CIDR -> VPC.IPv6CIDR will be available for calculation and `findVPCIPv6CIDR() will not be needed.

@johngmyers Do you see any possible race condition anymore?

johngmyers · 2021-06-10T04:33:05Z

The initial look seems quite promising. It will likely be 24 hours before I can give it a proper review.

hakman · 2021-06-10T04:41:36Z

The initial look seems quite promising. It will likely be 24 hours before I can give it a proper review.

Thanks, no rush.

johngmyers · 2021-06-11T03:14:35Z

upup/pkg/fi/cloudup/awstasks/subnet_test.go

+		checkNoChanges(t, cloud, allTasks)
+	}
+}
+


Missing a test for the case where we have to wait for VPCAmazonIPv6CIDRBlock

k8s-ci-robot · 2021-06-11T03:16:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [johngmyers]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

johngmyers · 2021-06-11T04:00:29Z

/retest

k8s-ci-robot requested review from johngmyers, justinsb and rifelpet May 19, 2021 06:24

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/api area/provider/aws Issues or PRs related to aws provider labels May 19, 2021

hakman requested a review from olemarkus May 19, 2021 06:27

hakman mentioned this pull request May 19, 2021

IPv6 support #8432

Closed

johngmyers requested changes May 19, 2021

View reviewed changes

upup/pkg/fi/cloudup/awstasks/subnet.go Outdated Show resolved Hide resolved

upup/pkg/fi/cloudup/awstasks/subnet.go Outdated Show resolved Hide resolved

upup/pkg/fi/utils/cidr.go Outdated Show resolved Hide resolved

k8s-ci-robot assigned johngmyers May 19, 2021

hakman force-pushed the ipv6_cidr_subnet branch from 2e4acc3 to 9bcf828 Compare May 19, 2021 12:58

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 19, 2021

johngmyers added the kind/office-hours label May 19, 2021

rifelpet reviewed May 19, 2021

View reviewed changes

upup/pkg/fi/cloudup/awstasks/subnet.go Show resolved Hide resolved

hakman force-pushed the ipv6_cidr_subnet branch from 9bcf828 to 0d5faf5 Compare May 30, 2021 08:48

hakman mentioned this pull request May 30, 2021

Add --ipv6 experimental cli flag #11629

Merged

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2021

johngmyers requested changes Jun 2, 2021

View reviewed changes

johngmyers reviewed Jun 2, 2021

View reviewed changes

hakman force-pushed the ipv6_cidr_subnet branch from 2b40628 to c27c449 Compare June 9, 2021 06:34

hakman added 2 commits June 9, 2021 09:57

Calculate IPv6 subnet CIDR based on cluster CIDR

13104ec

Add Subnet dependency on VPCAmazonIPv6CIDRBlock

9926869

hakman force-pushed the ipv6_cidr_subnet branch from c27c449 to 9926869 Compare June 9, 2021 06:58

Check that VPC IPv6 CIDR exist before calculating the subnet CIDR

4fc11db

hakman force-pushed the ipv6_cidr_subnet branch from e08e681 to 4fc11db Compare June 10, 2021 04:07

Rename ParseCidrSubnet() to ParseCIDRNotation()

9cf7039

johngmyers self-requested a review June 10, 2021 04:29

Calculate expected IPv6 CIDR if possible and in the "/64#N" format

eca0491

johngmyers approved these changes Jun 11, 2021

View reviewed changes

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2021

k8s-ci-robot merged commit 92af7b8 into kubernetes:master Jun 11, 2021

k8s-ci-robot added this to the v1.22 milestone Jun 11, 2021

johngmyers mentioned this pull request Jun 11, 2021

WIP: auto assign cidrs #11579

Closed

hakman deleted the ipv6_cidr_subnet branch October 22, 2021 07:13

Calculate IPv6 subnet CIDR based on cluster CIDR #11523

Calculate IPv6 subnet CIDR based on cluster CIDR #11523

Conversation

hakman commented May 19, 2021

hakman commented May 19, 2021

johngmyers commented May 19, 2021

hakman commented May 19, 2021

hakman commented May 20, 2021

justinsb commented May 23, 2021

justinsb commented May 23, 2021

hakman commented May 23, 2021

johngmyers commented May 23, 2021

hakman commented May 23, 2021

hakman commented May 30, 2021

hakman commented Jun 1, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johngmyers commented Jun 2, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johngmyers commented Jun 2, 2021

johngmyers commented Jun 6, 2021

hakman commented Jun 9, 2021

hakman commented Jun 10, 2021

johngmyers commented Jun 10, 2021

hakman commented Jun 10, 2021

Choose a reason for hiding this comment

k8s-ci-robot commented Jun 11, 2021

johngmyers commented Jun 11, 2021