Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

馃尡 optimize ip allocation #696

Merged
merged 1 commit into from Oct 18, 2022

Conversation

schrej
Copy link
Member

@schrej schrej commented Aug 25, 2022

What this PR does / why we need it:
It splits the allocation of IP addresses from IP pools into two steps: creating the claim, and attempting to fetch the address.
This way we can give the IPAM providers a little bit of time to fulfil the claim.

@metal3-io-bot metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 25, 2022
@metal3-io-bot
Copy link
Contributor

Hi @schrej. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 25, 2022
@lentzi90
Copy link
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 25, 2022
@kashifest kashifest requested a review from maelk August 26, 2022 07:16
@kashifest
Copy link
Member

What this PR does / why we need it: It splits the allocation of IP addresses from IP pools into two steps: creating the claim, and attempting to fetch the address. This way we can give the IPAM providers a little bit of time to fulfil the claim. To avoid unnecessary re-queues we wait for 200ms if a claim should be fetched again before attempting to fetch the address.

In an ideal case, one or more claims would be created because they do not exist. We then wait 200ms, fetch the claims again, hoping that they were updated, and then fetch the IP address the claim was fulfilled with.

The delay of 200ms was chosen arbitrarily. Not sure what a good value would be here.

A bit unsure about the motivation here. Whats the harm if the reconciliation loop is requeing ? If you are concerned about too many reques, perhaps introducing something like this would make it much cleaner ?

@schrej
Copy link
Member Author

schrej commented Aug 26, 2022

A bit unsure about the motivation here. Whats the harm if the reconciliation loop is requeing ? If you are concerned about too many reques, perhaps introducing something like this would make it much cleaner ?

That's how it's working right now. If any of the IPAddresses is missing, it will requeue after 30s. Therefore a M3M takes at least two reconciliation cycles with 30s delay after initial creation. The goal is to handle an entire machine with a single reconciliation after it was first created, at least if the IP gets allocated quickly enough (e.g. within 200ms). I've initially suggested it in Slack, and just went for an implementation as it's not that complicated.

The ideal solution would be registering watches on IPClaims so we can react as soon as something changes, but with the CAPI contract that would be quite difficult to do.

@metal3-io-bot metal3-io-bot added the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Sep 13, 2022
All claims are created first. If a new claim was added, wait briefly and
fetch the claim again before fetching the address.
@metal3-io-bot metal3-io-bot removed the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Oct 4, 2022
Copy link
Member

@lentzi90 lentzi90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good cleanup to me. But I think the 200ms wait that is mentioned in the description is now gone?

@lentzi90
Copy link
Member

lentzi90 commented Oct 5, 2022

/test-centos-e2e-main

@schrej
Copy link
Member Author

schrej commented Oct 5, 2022

Yeah. We discussed this during the community call and it was decided that we don't wait after creation. Iirc the main reason was that we don't want to block the reconciliation queue unnecessarily. I'll update the description.

This is also missing a test for claim creation. I had one before, but the complexity increased a lot since preallocations were introduced in the meantime, which requires to mock m3dt, m3m, ma and bmh in addition to m3d.

@furkatgofurov7
Copy link
Member

/test-ubuntu-integration-main

@lentzi90
Copy link
Member

/test-centos-e2e-parallel-main

Copy link
Member

@lentzi90 lentzi90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2022
Copy link
Member

@furkatgofurov7 furkatgofurov7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @schrej

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: furkatgofurov7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2022
@furkatgofurov7
Copy link
Member

/test golangci-lint

@metal3-io-bot metal3-io-bot merged commit 360dba0 into metal3-io:main Oct 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants