New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Container Runtime Interface to use enumerated namespace modes #58973

Merged
merged 3 commits into from Feb 7, 2018

Conversation

@verb
Contributor

verb commented Jan 29, 2018

What this PR does / why we need it: This updates the CRI as described in the Shared PID Namespace proposal. This change to the alpha API is not backwards compatible: implementations of the CRI will need to update to the new API version.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
WIP #1615

Special notes for your reviewer:
/assign @yujuhong

Release note:

[action-required] The Container Runtime Interface (CRI) version has increased from v1alpha1 to v1alpha2. Runtimes implementing the CRI will need to update to the new version, which configures container namespaces using an enumeration rather than booleans.
@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Jan 29, 2018

@resouer

This comment has been minimized.

Member

resouer commented Jan 29, 2018

Since this is a break change (most CRI shims should be impacted), looks like we are missing rollout plan and target releases? (Or has been mentioned somewhere else?)

POD = 0;
CONTAINER = 1;
NODE = 2;
}

This comment has been minimized.

@resouer

resouer Jan 29, 2018

Member

These terms are a little bit different from host_xxx pattern we used before. So mind to add some comments for them? Like: what is NamespaceMode=NODE means? (Seems to be host_network=true)

This comment has been minimized.

@yujuhong

yujuhong Jan 29, 2018

Contributor

+1 for adding more comments.

This comment has been minimized.

@feiskyer

feiskyer Jan 30, 2018

Member

+1 of adding comments

maybe rename NODE to HOST? which I think is consistent with hostNetwork mode

This comment has been minimized.

@verb

verb Jan 30, 2018

Contributor

Yup, you're all absolutely right, the comments were insufficient. I've updated them now, please take another look.

@feiskyer I chose NODE because it's well defined by Kubernetes and I thought we should use the Kubernetes terminology where possible. It does create an inconsistency with v1.Pod (which itself uses host & node inconsistently) but I expect that a v2.Pod would replace the namespace booleans with an enum matching this one.

@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Jan 29, 2018

Since this is a break change (most CRI shims should be impacted), looks like we are missing rollout plan and target releases? (Or has been mentioned somewhere else?)

@resouer the plan is to target this for v1.10, and the proposal has been reviewed and approved by sig-node.
This is indeed a breaking change, but I think it'd be better to make this change (to make the API cleaner) since CRI is not completely stable yet. We also have other API breaking changes going out this release (e.g., reopen the log file), so bundling them together is also good.

I understand that this adds more burden to the CRI shim maintainers. As a community we will be working on stabilizing and versioning the API soon. Before that, I think we can be more tolerant to breaking changes and impose that the CRI versions used by kubelet and the shim have to match exactly :-)

Thanks for reviewing!

// Network namespace for this container/sandbox.
// Note: There is currently no way to set CONTAINER scoped network in the Kubernetes API.
// Namespaces currently set by the kubelet: POD, NODE
NamespaceMode network = 1;

This comment has been minimized.

@verb

verb Jan 30, 2018

Contributor

@yujuhong I don't know enough about proto wire format to know if changing these types will cause an error on deserialize (which is what I want) or garbage bools. Any ideas?

This comment has been minimized.

@yujuhong

yujuhong Jan 31, 2018

Contributor

I think it might not break, but it'd be good to verify the behavior either way.

This comment has been minimized.

@feiskyer

feiskyer Jan 31, 2018

Member

It breaks since the filed name/type are changed with the same id. If you add those with new ids (e.g. 4,5,6), then it won't break.

This comment has been minimized.

@verb

verb Jan 31, 2018

Contributor

Researched a bit here and tested using the remote dockershim. It turns out that proto3 fields are compatible across types, including bools & enums, so we can't just change the type. The behavior appears to be that the enum will be cast to a bool resulting in lots of containers getting host namespaces.

Instead we can renumber the fields. The old bools will get default false values. The effect is that host network/pid/IPC will stop working until the runtimes update their proto.

This comment has been minimized.

@yujuhong

yujuhong Feb 1, 2018

Contributor

I don't now if there is a cleaner way to resolve this short of kubelet performing a version check (which will come later).
Defaulting to POD namespace mode does sound safer.

/cc @Random-Liu @mrunalp

This comment has been minimized.

@verb

verb Feb 4, 2018

Contributor

@yujuhong, we could also do something like this:

message NamespaceOption {                                                                                                                                                        
    // Prior to using NamespaceMode, NamespaceOption used bools to specify namespace.                                                                                            
    // This unused field is type-incomptabile with the previous bool in order to force                                                                                           
    // an error when a runtime implements the outdated interface.                                                                                                                
    float sentinel = 1;                                                                                                                                                          
    reserved 2 to 3;   
...

Then the kubelet would always set sentinel to a non-zero which will cause a grpc error and no containers will start:

I0204 22:14:02.194405   11572 kuberuntime_manager.go:579] SyncPod received new pod "debugtest_default(49c4f606-09f0-11e8-b5dc-42010af00002)", will create a sandbox for it       
I0204 22:14:02.194427   11572 kuberuntime_manager.go:588] Stopping PodSandbox for "debugtest_default(49c4f606-09f0-11e8-b5dc-42010af00002)", will start new one                  
I0204 22:14:02.194449   11572 kuberuntime_manager.go:640] Creating sandbox for pod "debugtest_default(49c4f606-09f0-11e8-b5dc-42010af00002)"                                     
E0204 22:14:02.195188   11572 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Internal desc = grpc: error unmarshalling request: proto: wrong 
wireType = 5 for field HostNetwork                                                                                                                                               
E0204 22:14:02.195262   11572 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "debugtest_default(49c4f606-09f0-11e8-b5dc-42010af00002)" failed: rpc error: code = Internal de
sc = grpc: error unmarshalling request: proto: wrong wireType = 5 for field HostNetwork                                                                                          
E0204 22:14:02.195281   11572 kuberuntime_manager.go:646] createPodSandbox for pod "debugtest_default(49c4f606-09f0-11e8-b5dc-42010af00002)" failed: rpc error: code = Internal d
esc = grpc: error unmarshalling request: proto: wrong wireType = 5 for field HostNetwork                                                                                         
E0204 22:14:02.195340   11572 pod_workers.go:186] Error syncing pod 49c4f606-09f0-11e8-b5dc-42010af00002 ("debugtest_default(49c4f606-09f0-11e8-b5dc-42010af00002)"), skipping: f
ailed to "CreatePodSandbox" for "debugtest_default(49c4f606-09f0-11e8-b5dc-42010af00002)" with CreatePodSandboxError: "CreatePodSandbox for pod \"debugtest_default(49c4f606-09f0
-11e8-b5dc-42010af00002)\" failed: rpc error: code = Internal desc = grpc: error unmarshalling request: proto: wrong wireType = 5 for field HostNetwork"                         
I0204 22:14:02.195449   11572 server.go:423] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"debugtest", UID:"49c4f606-09f0-11e8-b5dc-42010af00002", APIVersion:"
v1", ResourceVersion:"271", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed create pod sandbox: rpc error: code = Internal desc = grpc: error unmarshalli
ng request: proto: wrong wireType = 5 for field HostNetwork 
@tallclair

This comment has been minimized.

Member

tallclair commented Feb 5, 2018

Shouldn't a breaking change increment the API version? I.e. v1alpha2? Or is that not needed since the wire format is backwards compatible?

@verb

This comment has been minimized.

Contributor

verb commented Feb 5, 2018

@tallclair That's a good point, but I wasn't sure since CRI doesn't use any of the api-machinery and I don't think the version numbers are referenced in the protocol, so I'm not sure we'll gain anything. There's no protocol negotiation, the kubelet will just use the one proto it knows, and we don't want to support the old format going forward.

@yujuhong has plans for versioning and compatibility requirements once the the API stabilizes.

@verb

This comment has been minimized.

Contributor

verb commented Feb 6, 2018

/retest

@verb verb referenced this pull request Feb 6, 2018

Closed

Support shared pid namespace #597

@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Feb 6, 2018

@yujuhong has plans for versioning and compatibility requirements once the the API stabilizes.

This is mentioned in the sig-node meeting today. I think it's okay to bump the CRI version to v1alpha2 in a separate PR first.

@verb

This comment has been minimized.

Contributor

verb commented Feb 6, 2018

@tallclair you were totally right and bumping the version worked much better. Thanks for chiming in!

@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Feb 6, 2018

@verb you'd need to change pkg/kubelet/apis/cri/services.go as well.

And also change the release note.

@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Feb 7, 2018

/cc @Random-Liu @mrunalp @resouer

This PR bumps the CRI version from v1alpha1 to v1alpha2 since there are breaking changes. Please comment if you have concerns.

@k8s-ci-robot k8s-ci-robot requested a review from resouer Feb 7, 2018

@feiskyer

This comment has been minimized.

Member

feiskyer commented Feb 7, 2018

This PR bumps the CRI version from v1alpha1 to v1alpha2 since there are breaking changes. Please comment if you have concerns.

LGTM

@verb

This comment has been minimized.

Contributor

verb commented Feb 7, 2018

@verb you'd need to change pkg/kubelet/apis/cri/services.go as well.

@yujuhong currently apis/cri/services.go isn't versioned with the runtime and is instead referred to in code as "internalapi". Would you like it to be? It would simplify things.

verb added some commits Jan 25, 2018

Update kubelet for enumerated CRI namespaces
This adds support to both the Generic Runtime Manager and the
dockershim for the CRI's enumerated namespaces.
Increment CRI version from v1alpha1 to v1alpha2
This also incorporates the version string into the package name so
that incompatibile versions will fail to connect.

Arbitrary choices:
- The proto3 package name is runtime.v1alpha2. The proto compiler
  normally translates this to a go package of "runtime_v1alpha2", but
  I renamed it to "v1alpha2" for consistency with existing packages.
- kubelet/apis/cri is used as "internalapi". I left it alone and put the
  public "runtimeapi" in kubelet/apis/cri/runtime.
@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Feb 7, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 7, 2018

@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Feb 7, 2018

/assign @dchen1107
to approve /hack changes

@dchen1107

This comment has been minimized.

Member

dchen1107 commented Feb 7, 2018

/lgtm

/approve

Only reviewed the commit: Update kubelet for enumerated CRI namespaces ...

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Feb 7, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, verb, yujuhong

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Feb 7, 2018

Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit cf7073a into kubernetes:master Feb 7, 2018

13 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation verb authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-unit Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@Random-Liu Random-Liu referenced this pull request Feb 8, 2018

Closed

Upgrade to v1alpha2 #236

k8s-merge-robot added a commit that referenced this pull request Feb 9, 2018

Merge pull request #59475 from Random-Liu/use-mountpoint
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add mountpoint as CRI image filesystem storage identifier.

Fixes #57356.

This PR changes CRI to use mountpoint as storage identifier. See #57356 (comment).

Note that:
1) This doesn't work with devicemapper for now. Please feel free to propose change for device mapper, we can discuss more about this after this first version is merged. @mrunalp @runcom
2) `mountpoint` is added as new field in `StorageIdentifier` now. After #58973 is merged, we can remove the UUID field in `v1alpha2`. 

/cc @yujuhong @feiskyer @yguo0905 @dashpole @mikebrow @abhi @kubernetes/sig-node-api-reviews 

**Release note**:

```release-note
CRI starts using moutpoint as image filesystem identifier instead of UUID.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment