Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for calico kubernetes backend #3127

Closed

Conversation

blakebarnett
Copy link

@blakebarnett blakebarnett commented Aug 2, 2017

This will enable locking down access to etcd from the nodes as well as support for the new NetworkPolicy API in k8s 1.7

  • Update to Calico 2.4
  • Disable opening the etcd port

ref: #2639

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 2, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @blakebarnett. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: blakebarnett
We suggest the following additional approver: justinsb

Assign the PR to them by writing /assign @justinsb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@blakebarnett
Copy link
Author

/assign @justinsb

@@ -1577,6 +1569,27 @@ func GetOpenAPIDefinitions(ref openapi.ReferenceCallback) map[string]openapi.Ope
},
Dependencies: []string{},
},
"k8s.io/kops/pkg/apis/kops/v1alpha1.Assets": {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the apimachinery generated this bit...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure, but my guess is that you're just picking up that it wasn't run before... Don't worry about it anyway, although it highlights that we need to get this under CI testing lest people forget.

@blakebarnett
Copy link
Author

blakebarnett commented Aug 2, 2017

There were a few suprises when I ran make apimachinery with the new changes in master. Did I do something wrong there?

@blakebarnett
Copy link
Author

@justinsb looks like switching the backend during a rolling-update isn't going to work automatically. What's the best way to determine if this is a first time cluster deploy? I'll make it refuse to change backends if it's not a new cluster for now.

@justinsb
Copy link
Member

justinsb commented Aug 6, 2017

What's the best way to determine if this is a first time cluster deploy?

This came up in office hours with respect to etcd also (enabling TLS & maybe thinking about allowing etcd3). So I did #3147 / #3148 - that adds a ValidateClusterUpdate validation, where you could prevent this change. It infers the status from the cloudprovider, which is nice for initial population, but we'll probably eventually put the Status into the Cluster object (like other k8s objects).

Because we know we can do this, the fact that switching won't work needn't be a blocker to getting this PR merged. I'll take a pass at the code now.

Copy link
Member

@justinsb justinsb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generallly looks great. We don't want a separate 1.7 manifest unless it is actually needed though...

channels/alpha Outdated
@@ -17,6 +17,9 @@ spec:
networking:
kubenet: {}
kubernetesVersions:
- range: ">=1.7.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this relate to this PR (and if so, why :-) )? I made similar changes to the alpha channel but I don't think we've promoted it to stable yet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to leave it out, but there's the issue of having addons not change unless the channel version changes. It's probably not a problem for this PR though, for now we can't make an upgrade work anyway. I can remove.

@@ -196,7 +196,7 @@ func (b *FirewallModelBuilder) applyNodeToMasterBlockSpecificPorts(c *fi.ModelBu
udpRanges := []portRange{{From: 1, To: 65535}}
protocols := []Protocol{}

if b.Cluster.Spec.Networking.Calico != nil {
if b.Cluster.Spec.Networking.Calico != nil && b.Cluster.Spec.Networking.Calico.KubernetesDataStore != true {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: !x is probably more idiomatic than x != true, but I actually find your version easier to read. So either works :-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've always found that idiom in go to be super easy to miss

@@ -1577,6 +1569,27 @@ func GetOpenAPIDefinitions(ref openapi.ReferenceCallback) map[string]openapi.Ope
},
Dependencies: []string{},
},
"k8s.io/kops/pkg/apis/kops/v1alpha1.Assets": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure, but my guess is that you're just picking up that it wasn't run before... Don't worry about it anyway, although it highlights that we need to get this under CI testing lest people forget.

namespace: kube-system
data:
{{- if not (.Networking.Calico.KubernetesDataStore)}}
# The calico-etcd PetSet service IP:port
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're supposed to call them StatefulSets now :-) But NBD.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's remnants, not something I added :) I'll change it

}

{
location := key + "/k8s-1.7.yaml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we only need a new manifest if the manifests are actually different. 1.6 is really >= 1.6, and pre-1.6 is really < 1.6. The change being that 1.6 changed the node taints/tolerations (both the values and it moved from an annotation to a field), and that 1.6 introduced RBAC.

I'd prefer not to have a duplicate version if 1.7 & 1.6 are actually the same. But of course, if they are different, then they have to be.

Are they different (and if so why :-) ) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference for calico is that there are some changes that only work with 1.7 (the NetworkPolicy changes) AFAIK. I don't think it's incompatible with 1.6 at all though, so it's probably safe to leave out the 1.7 manifest.

At some point I would really like to extract the version requirements for bundled addons like this into the addon manifests, so changes to the channel like this would never be necessary.

@justinsb
Copy link
Member

justinsb commented Aug 6, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 6, 2017
```
networking:
calico:
kubernetesDataStore: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest switching this to something like:

datastore: kubernetes | etcd

Just in case someone wants a third option there some day.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

Blake added 6 commits August 7, 2017 11:03
- Update to Calico 2.4
- Disable opening the etcd port

This will enable locking down access to etcd from the nodes as well as support for the new NetworkPolicy API in k8s 1.7
 - Also use RollingUpdate for updateStrategy for the calico-node DaemonSet, which seems safer for updates.
 - Adds a separate serviceAccount for calico-node and calico-policy-controller, calico-node has many more required privileges.
 - Add the path to the manifest visitor error output to make it easier to track down weird manifest errors
@blakebarnett
Copy link
Author

After speaking to @caseydavenport it's probably best to hold off on merging this until Calico 2.5 comes out and the move from TPR -> CRD is done. In the meantime I'll work on a migration script to try switching this live on a cluster.

- Switch from a bool kubernetesDataStore option to `datastore` and set the default to `etcd`
- Remove the 1.7 version manifest
@justinsb
Copy link
Member

Do let me know if / when this is ready for review @blakebarnett

@k8s-github-robot
Copy link

@blakebarnett PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 18, 2017
@blakebarnett
Copy link
Author

Looks like 2.5.0 is out and they have a migration process documented. I'll see if it's feasible to automate and try updating this next week: https://github.com/projectcalico/calico/blob/master/upgrade/v2.5/README.md#1-before-you-begin

@justinsb justinsb added this to the 1.8.0 milestone Sep 25, 2017
@justinsb
Copy link
Member

Hi @blakebarnett sorry this one fell off my radar a bit - where are we on calico versions now?

@blakebarnett
Copy link
Author

No worries, it fell off of mine also. I probably need to start fresh on this one, probably won't get it done for 1.8

@chrislovecnm
Copy link
Contributor

2.5 just came out a couple of weeks ago. We need it for 1.8 network policies ;)

@blakebarnett
Copy link
Author

You mean 2.6.x :)

Yeah, we should update the version separate from this also.

@justinsb justinsb added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. and removed WIP: Work In Progress labels Nov 4, 2017
@justinsb justinsb modified the milestones: 1.8.0, 1.8.1 Nov 22, 2017
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2018
@justinsb justinsb modified the milestones: 1.8.1, 1.9 Feb 21, 2018
@justinsb
Copy link
Member

What's the state of this one @blakebarnett ?

@blakebarnett
Copy link
Author

Will probably be a month or so before I can come back do it, we can close if someone else wants to take a stab.

@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 28, 2018
@blakebarnett
Copy link
Author

I don't have time to work on this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants