Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate IPv6 subnet CIDR based on cluster CIDR #11523

Merged
merged 5 commits into from
Jun 11, 2021

Conversation

hakman
Copy link
Member

@hakman hakman commented May 19, 2021

Ref: #8432 (comment)

Inspired by #11163.

This PR makes it possible to set subnet CIDR based on VPC CIDR on cluster creation similarly to:

There is some feedback from @johngmyers in #11442 (comment), which suggests using a cidrsubnet(8,1) syntax instead of a cidrsubnet-8-1 syntax for configuring a subnet.
The reason I chose the cidrsubnet-8-1 is to maybe allow using this as a flag on create cluster and that it seems similar to other places, like https://kops.sigs.k8s.io/cluster_spec/#egress.

/cc @justinsb @johngmyers @rifelpet

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/api area/provider/aws Issues or PRs related to aws provider labels May 19, 2021
@hakman hakman requested a review from olemarkus May 19, 2021 06:27
@hakman hakman mentioned this pull request May 19, 2021
upup/pkg/fi/cloudup/awstasks/subnet.go Outdated Show resolved Hide resolved
upup/pkg/fi/cloudup/awstasks/subnet.go Outdated Show resolved Hide resolved
upup/pkg/fi/utils/cidr.go Outdated Show resolved Hide resolved
@hakman
Copy link
Member Author

hakman commented May 19, 2021

/hold for more feedback on approach

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 19, 2021
@johngmyers
Copy link
Member

egress takes AWS resource IDs, whereas this is a function. I would like something that looks more like a function than an identifier.

@hakman
Copy link
Member Author

hakman commented May 19, 2021

egress takes AWS resource IDs, whereas this is a function. I would like something that looks more like a function than an identifier.

True. Still, exposing this as a function would be something totally new.
I am not opposed to this change, just think it would be best to agree which way to go during office hours.

@hakman
Copy link
Member Author

hakman commented May 20, 2021

One other idea coming from @rifelpet would be to use a struct field to describe this:

type IPv6CIDRSubnet struct {
    NewBits int
    NetNum int
}

@justinsb
Copy link
Member

Could we use a syntax like ipv6CIDR: /64 on the subnet - for when the user wants a /64 but doesn't much care which one.

I figure if we have use cases for more precise allocation in future we can add those.

I did a prototype #11579 and the experience feels intuitive. (My prototype uses "::/64" so it's a valid CIDR, but that's not a great experience because kops edit keeps removing the (required) quotes around that).

We'd have to figure out how we assign to terraform consistently. We could do something like

0:0:0:1::/64 through 0:0:0:ff::/64; that indicates the bits we care about.

I think it's reasonable to require this specification for terraform. It would be nice if kops create cluster auto assigned those to subnets in ipv6 mode. We could then also require that for direct mode, though I'm not wild about the idea of making kOps less user-friendly because of terraform limitations.

@justinsb
Copy link
Member

Another possible syntax for absolute positioning: /64#3 meaning "subnet of size 64, number 3". That's close to what users care about - I think - and in keeping with the easier syntax /64.

(I don't know if # is ever meaningful in IPv6 and we should choose another character. % is scope so I think we should avoid that.)

@hakman
Copy link
Member Author

hakman commented May 23, 2021

I like the /64#3 or ::/64#3 notations.
Regarding auto-assignment, I would be happy to do it in create cluster only, where this would be done based on simple subnet number. I would prefer not to add a mechanism for auto-assignment for existing clusters. On those, people can decide on these values themselves.

@johngmyers
Copy link
Member

I don't see any particular reason not to support auto-assignment for existing clusters. Why would would we require admins of existing clusters to manually do this themselves?

What's the problem with Terraform? Can't we query the existing state of the subnets in order to specify that state in the configuration when it exists?

@hakman
Copy link
Member Author

hakman commented May 23, 2021

The reason I see not to support auto-assignment after cluster creation in the complications in logic for something that is not needed to make IPv6 work. It is nice to have, but would gladly live without it for now.
We already have differences between how things are done in Direct, CF and TF because of how settings are implemented of not there.

@hakman
Copy link
Member Author

hakman commented May 30, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2021
@hakman
Copy link
Member Author

hakman commented Jun 1, 2021

Final implementations uses the /64#3 notation for IPv6 subnet, where 64 is a decimal value and 3 is a hex value (as discussed during Office Hours.

pkg/apis/kops/validation/validation.go Show resolved Hide resolved
upup/pkg/fi/utils/cidr.go Outdated Show resolved Hide resolved

// ParseCidrSubnet parses a string in the format "/<newSize>#<netNum>"
// and returns the values of <newSize> and <netNum>.
func ParseCidrSubnet(subnet string) (int, int64, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While cidrsubnet() is the Terraform name of a related function, I think a name more like ParseGeneratedCIDR() would be more appropriate. The Terraform function takes as input a CIDR and subnets it, whereas this thing takes the implicit generated block and carves out a subnet of it. The term "subnet" is problematic because it could be confused with the Subnet object that this is being used to fill the IPv6CIDR field in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, "generated" is not ideal either. It is just a notation.
When using networking as the context, "subnet" has a broader meaning.
I would see it more as a ParseSubnetNotation() name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not necessarily generated. It's cut out from the VPC's singular CIDR.

(except VPCs, especially shared ones, can technically have multiple CIDRs.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ParseCIDRNotation() sound better?

pkg/apis/kops/validation/validation.go Show resolved Hide resolved
@@ -98,6 +99,14 @@ func (e *Subnet) Find(c *fi.Context) (*Subnet, error) {
}

actual.IPv6CIDR = association.Ipv6CidrBlock
if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {
subnetIPv6CIDR, err := calculateSubnetCIDR(e.VPC.IPv6CIDR, e.IPv6CIDR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's a corner case that isn't handled: on the first run after a kOps upgrade, when the VPC and Subnets already exist, but there is not yet an associated Amazon-provided IPv6 association.

I don't see VPCAmazonIPv6CIDRBlock.RenderAWS() writing to its e.VPC.IPv6CIDR and I don't see any dependency of the Subnet task on the VPCAmazonIPv6CIDRBlock task.

(I do find it surprising that Find() methods write to e. And if they depend on values set by the Task.Run() methods of their dependencies, how is that going to work with TargetDryRun?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPCAmazonIPv6CIDRBlock.RenderAWS() cannot write the CIDR, because the request doesn't actually return which block it associates. Because of that, there is no dependency and Subnet.RenderAWS() waits for the association to be done, using fi.NewTryAgainLaterError().

I agree that Find() writing to e is awkward, but it was that or writing the e.IPv6CIDR notation to a.IPv6CIDR.
Also makes it easier later in FindDeletions(). I could change that to set a.IPv6CIDR = e.IPv6CIDR, if it makes things less awkward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Find() writing to e is okay, I'm just having to figure out what the rules and model are. It seems Find() can refine the expected based on existing state, including state produced by tasks listed as dependencies. But it has to deal with those dependencies not running due to TargetDryRun.

There's an opportunity here to improve the developer documentation.

This could be used in the future to support a /64 notation where the particular sub-block doesn't need to be specified.

But there is a dependency on VPCAmazonIPv6CIDRBlock as the association won't happen unless that task is run. And there shouldn't be two blocks of code when one would do, especially not if the two blocks of code implement different semantics.

upup/pkg/fi/cloudup/awstasks/subnet.go Outdated Show resolved Hide resolved
@@ -196,6 +205,22 @@ func (_ *Subnet) RenderAWS(t *awsup.AWSAPITarget, a, e, changes *Subnet) error {
}
}

if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment would be helpful to indicate this handles the case where the VPC's IPv6CIDR was not known at the time e.Find() is run.

And actually, e.Find() should NewTryAgainLaterError until it can get the association. Otherwise the code path is timing-dependent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This correct here, as it handles the case of TargetDryRun. We only care about the exact CIDR at render time, when we know that the VPC will eventually get and IPv6 CIDR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TargetDryRun will never call RenderAWS(), so that case will never happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find() has nothing to do with VPC.IPv6CIDR. It just needs to find the current state of the subnet.
If by the time it gets to RenderAWS() is still missing the VPC.IPv6CIDR, NewTryAgainLaterError waits for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find() is using VPC.IPv6CIDR to get the block to make allocations from.

Find() and RenderAWS() are using different rules for making the allocation and which one gets run depends on timing.

@@ -196,6 +205,22 @@ func (_ *Subnet) RenderAWS(t *awsup.AWSAPITarget, a, e, changes *Subnet) error {
}
}

if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {
vpcIPv6CIDR, err := findVPCIPv6CIDR(t.Cloud, e.VPC.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why limit this to Amazon-provided IPv6 associations? Why not just use e.VPC.IPv6CIDR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot test with customer owned CIDRs. The AWS docs regarding IPv6 are not exactly clear either.
So, for now, this is meant to only support AWS owned CIDRs, where I know that can be exactly 1 per subnet with a /64 size and from exactly 1 bloc of /56 size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a customer-specified VPC.IPv6CIDR, the intended behavior is obvious: take the slice off of the specified VPC IPv6CIDR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's obvious that it means extra work, that is unrelated. For now, this is experimental, with limitations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could test using a shared VPC with a site-local IPv6 CIDR association.

It make sense to support only AWS-provided CIDRs in managed VPCs, especially for egress. But we shouldn't design out the shared VPC case. There's already code that pulls from VPC.IPv6CIDR; we should be consistent about using that when it's available.

@@ -196,6 +205,22 @@ func (_ *Subnet) RenderAWS(t *awsup.AWSAPITarget, a, e, changes *Subnet) error {
}
}

if e.IPv6CIDR != nil && strings.HasPrefix(aws.StringValue(e.IPv6CIDR), "/") {
vpcIPv6CIDR, err := findVPCIPv6CIDR(t.Cloud, e.VPC.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no dependency on the VPCAmazonIPv6CIDRBlock task, so RunTasks might schedule the VPCAmazonIPv6CIDRBlock task in a later slot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no value in the dependency, as VPCAmazonIPv6CIDRBlock will not return the associated block.
This is the reason for using NewTryAgainLaterError in Subnet.RenderAWS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But unless VPCAmazonIPv6CIDRBlock.Run() is called, the IPv6 association won't ever arrive. If there's no dependency, RunTasks might schedule VPCAmazonIPv6CIDRBlock to a later tranche than Subnet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IPv6 association won't ever arriveThere is a dependency in VPC, so it will arrive.

There is a dependency in VPC, so it will arrive, sooner or later.

RunTasks might schedule VPCAmazonIPv6CIDRBlock to a later tranche than Subnet.

This is why NewTryAgainLaterError waits for it to appear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't appear due to how RunTasks schedules tasks. If tasks b and c depend on a, d depends on b, and e depends on c, the scheduler effectively puts false dependencies of d depending on c and e depending on b.

@johngmyers
Copy link
Member

We should support this syntax on the Subnet.CIDR field, but that can be a separate PR.

return err
}
if vpcIPv6CIDR == nil {
return fi.NewTryAgainLaterError("waiting for the VPC IPv6 CIDR to be assigned")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible it's a shared VPC with no IPv6 allocation forthcoming.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, there will be an error from VPCAmazonIPv6CIDRBlock that says: "IPv6 CIDR block provided by Amazon not found".

@johngmyers
Copy link
Member

I think we might need to make significant changes to subnet_test.go to exercise all of this.

@johngmyers
Copy link
Member

I believe there are two outstanding issues:

  1. Missing dependency from Subnet to VPCAmazonIPv6CIDRBlock.
  2. Looking up the VPC's IPv6 association in one place (Find()) or two (Find() and RenderAWS()).

For issue (1):

RunTasks works by scheduling Tasks into tranches. The first tranche is tasks that have no dependencies. The second tranche is tasks that depend only on tasks in the first tranche. The third tranche is tasks that depend only on tasks in either the first or second tranche. And so on.

It then runs all the tasks in a given tranche in parallel. It waits for all tasks in a tranche to complete (without retries) before running any tasks in the next tranche.

Subnet depends only on VPC. So it will be in the tranche immediately following the one that VPC is in. VPCAmazonIPv6CIDRBlock currently depends only on VPC, so currently is scheduled in the same tranche as Subnet. So the two tasks race.

If there were a subsequent change that required VPCAmazonIPv6CIDRBlock to depend on an intermediate task that in turn depended on VPC, then it would then be scheduled in a tranche after Subnet. That means it would never run until all Subnet tasks completed. So if Subnet were to wait for an association that it is expecting VPCAmazonIPv6CIDRBlock to initiate, it would wait forever.

There also might be a race. If Subnet.RenderAWS() calls findVPCIPv6CIDR() before VPCAmazonIPv6CIDRBlock.RenderAWS() runs, it won't find an IPv6 association in progress, so might return a permanent error instead of a TryAgainLaterError.

I'll write up issue (2) later.

@hakman
Copy link
Member Author

hakman commented Jun 9, 2021

@johngmyers I addressed (1).

@hakman
Copy link
Member Author

hakman commented Jun 10, 2021

At the moment, task order due to dependencies is the following:

  • VPC (no dependency)
  • VPCAmazonIPv6CIDR (depends on VPC)
  • Subnet (depends on VPC and VPCAmazonIPv6CIDR)

In the subnet Find() function, we have the following case with regards to IPv6:

  • VPC doesn't exists -> No subnet will exist so no CIDR calculation is needed
  • VPC exists but doesn't have IPv6 CIDR -> No subnet will have a IPv6 CIDR so no calculation is needed
  • VPC exists and has IPv6 CIDR -> VPC.IPv6CIDR will be available for calculation.

In the subnet RenderAWS() function we have the following case with regards to IPv6:

  • VPC doesn't exists -> It is a dependency, VPC will always exist when it runs
  • VPC exists but doesn't have IPv6 CIDR -> it will retry until the CIDR allocated by VPCAmazonIPv6CIDR is available.
  • VPC exists and has IPv6 CIDR -> VPC.IPv6CIDR will be available for calculation and `findVPCIPv6CIDR() will not be needed.

@johngmyers Do you see any possible race condition anymore?

@johngmyers johngmyers self-requested a review June 10, 2021 04:29
@johngmyers
Copy link
Member

The initial look seems quite promising. It will likely be 24 hours before I can give it a proper review.

@hakman
Copy link
Member Author

hakman commented Jun 10, 2021

The initial look seems quite promising. It will likely be 24 hours before I can give it a proper review.

Thanks, no rush.

checkNoChanges(t, cloud, allTasks)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a test for the case where we have to wait for VPCAmazonIPv6CIDRBlock

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2021
@johngmyers
Copy link
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit 92af7b8 into kubernetes:master Jun 11, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 11, 2021
@hakman hakman deleted the ipv6_cidr_subnet branch October 22, 2021 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/office-hours lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants