Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag to disable the checksum workaround for Flannel VXLAN #9614

Merged

Conversation

hakman
Copy link
Member

@hakman hakman commented Jul 23, 2020

Last attempt of making newer k8s versions 1.18 play nice with Flannel VXLAN.
Ref: #9543 (comment)

/cc @johngmyers @rifelpet

@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jul 23, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/nodeup labels Jul 23, 2020
@hakman
Copy link
Member Author

hakman commented Jul 23, 2020

/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2020
@hakman
Copy link
Member Author

hakman commented Jul 23, 2020

@jhohertz We discussed internally if to disable or not the workaround so late in the release cycle for Kops 1.18. It would be acceptable as long a you could test the build from this PR. If that's acceptable for you, I can provide the steps needed to use this build.

@jhohertz
Copy link
Contributor

Would be happy to! Not run a pre-merged build before, will scan the docs and let you know if I need some pointers.

@hakman
Copy link
Member Author

hakman commented Jul 23, 2020

@jhohertz It is very easy:

$ wget https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws-1-18/pull-42cf9dc24b/linux/amd64/kops
$ export KOPS_BASE_URL=https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws-1-18/pull-42cf9dc24b
$ ./kops create ...

@jhohertz
Copy link
Contributor

Thanks, working on it now, just need to tweak my tooling a bit, should have something later this evening.

@jhohertz
Copy link
Contributor

So, launching a 1.18.6 cluster with that build, the issues what would indicate a checksum issue do NOT present. 🎉

Are there any other scenarios you'd like me to try? Was wondering about a .5 -> .6 transition, but I'd need to find a long running connectivity test to run... not 100% sure why it would be an issue as in either case (w/ workaround or w/ patch), the checksum should be good on the vxlan packets sent around.

I'd like to help clear this up as it might be useful to 1.17 (1.16?) if we can deem it to be safe.

@johngmyers
Copy link
Member

johngmyers commented Jul 24, 2020

As I recall, one of the concerns @justinsb raised at Office Hours was that kops checking against a patch level of Kubernetes would be unprecedented. I cannot represent that position, though. I am neutral on this point as I don't think this particular case requires adding to our test matrix.

My concerns are around the proximity to the 1.18.0 release and the risk/benefit tradeoff. The risk seems to have been managed as well as it can be, but, as was mentioned during Office Hours, the benefit seems negligible.

@hakman
Copy link
Member Author

hakman commented Jul 24, 2020

As I recall, one of the concerns @justinsb raised at Office Hours was that kops checking against a patch level of Kubernetes would be unprecedented.

Hmm, I didn't recall this part when I proposed the change. Thanks for the reminder.

Considering that it is confirmed that Kubernetes 1.18.6 doesn't have the issue and was further tested by @jhohertz, seems pretty low risk to skip the workaround for 1.18.6+, but leave the final decision to @justinsb .
I am mostly worried that the workaround may cause some unexpected issues.

/assign @justinsb

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/api and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 1, 2020
@hakman hakman changed the title Skip the checksum workaround for Flannel VXLAN for Kubernetes 1.18.6+ Add flag to disable the checksum workaround for Flannel VXLAN Aug 1, 2020
@hakman
Copy link
Member Author

hakman commented Aug 1, 2020

Reworked as flag per discussion during office hours.

@hakman
Copy link
Member Author

hakman commented Aug 14, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 14, 2020
@johngmyers
Copy link
Member

I think we might need to add this flag (as a no-op) to master branch to prevent upgrade problems.

@hakman
Copy link
Member Author

hakman commented Aug 14, 2020

@johngmyers Aren't unknown flags just ignored by Kops (which is actually a problem that pops up from time to time) ?

@johngmyers
Copy link
Member

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
# Found fields that are not recognized
# ...
#     networkID: vpc-0a35c793b8cb0fbe7
#     networking:
# +     canal:
# +       disableTxChecksumOffloading: false
# -     canal: {}
#     nonMasqueradeCIDR: 100.64.0.0/10
#     sshAccess:
# ...
#
#

@hakman
Copy link
Member Author

hakman commented Aug 20, 2020

@johngmyers Done: #9782.

@rifelpet
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2020
@k8s-ci-robot k8s-ci-robot merged commit 57695d8 into kubernetes:release-1.18 Aug 21, 2020
@hakman hakman deleted the skip-flannel-workaround branch August 21, 2020 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api area/nodeup cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants