Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/cri: switch internals to CRI v1, add ingress/egress v1/v1alpha2 CRI conversion. #781

Merged
merged 9 commits into from Nov 23, 2022

Conversation

klihub
Copy link
Contributor

@klihub klihub commented Mar 1, 2022

Switch the native/internal representation of pods and containers everywhere to CRI v1.
Implement ingress and egress protocol conversion between v1alpha2 and v1 to support
both pre-v1 clients and runtimes.

@codecov-commenter
Copy link

codecov-commenter commented Mar 1, 2022

Codecov Report

Merging #781 (a3c57f5) into master (6b458ca) will not change coverage.
The diff coverage is 22.22%.

@@           Coverage Diff           @@
##           master     #781   +/-   ##
=======================================
  Coverage   34.87%   34.87%           
=======================================
  Files          60       60           
  Lines        8874     8874           
=======================================
  Hits         3095     3095           
  Misses       5477     5477           
  Partials      302      302           
Impacted Files Coverage Δ
pkg/cri/resource-manager/cache/container.go 17.33% <4.76%> (ø)
pkg/cri/resource-manager/cache/pod.go 18.15% <33.33%> (ø)
pkg/cri/resource-manager/cache/cache.go 22.01% <45.45%> (ø)
pkg/cri/resource-manager/cache/utils.go 34.69% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@klihub klihub requested a review from kad March 2, 2022 11:50
kad
kad previously approved these changes Mar 2, 2022
Copy link
Member

@kad kad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'm ok with all of this. One separate worry that I have is about "unsafe.Pointer()" cast for practically 100% of data structures: does match in all cases of we have something that might be problematic?

jukkar
jukkar previously approved these changes Mar 2, 2022
Copy link
Contributor

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Right now the option is enabled by default, is the plan to disable it by default or remove it, in the future?

@klihub
Copy link
Contributor Author

klihub commented Mar 2, 2022

In general, I'm ok with all of this. One separate worry that I have is about "unsafe.Pointer()" cast for practically 100% of data structures: does match in all cases of we have something that might be problematic?

The CRI-O guys landed a few commits in the v1alpha2 CRI implementation to bring it struct-wise identical to v1 with the explicit goal of being able to do this type of near-zero-cost protocol bridging. The CRI-O implementation is essentially doing identical thing to what we do here, and that is the only reason why I dared to go with that instead of a convert-by-copying approach.

@klihub
Copy link
Contributor Author

klihub commented Mar 2, 2022

LGTM. Right now the option is enabled by default, is the plan to disable it by default or remove it, in the future?

Yes, that's exactly what I had in mind: first we have it as opt-out, once we're sure we can expect mostly everybody to talk v1 we make it opt-in, and eventually we remove it altogether.

Copy link
Contributor

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on debian-sid with

  • kubelet v1.19.0 (CRI v1alpha2) + containerd v1.6.0-41-gc523102b5
  • kubelet v1.23.4 (CRI v1) + containerd v1.6.0-41-gc523102b5
    ...and it works fine.

Also tried out with containerd v1.4.0, that talks CRI v1alpha2.
As promised in the commit message, this is not supposed to work. cri-resmgr output:

F0303 07:32:12.850610    4447 main.go:89] failed to start resource manager: resource-manager: cache synchronization pod query failed: rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService

My only question would be related to this. Merging the PR will prevent using CRI-RM master branch with older container runtime versions, which will make it harder to integrate CRI-RM to certain environments. On the other hand, without this PR, CRI-RM works with old and new kubelet and runtimes as our primary backend runtimes implement compatibility layers.

What do you think, should we add support for v1alpha2 towards CRI runtimes, too? (In this PR or in a separate PR?)

@klihub
Copy link
Contributor Author

klihub commented Mar 3, 2022

Tested on debian-sid with

  • kubelet v1.19.0 (CRI v1alpha2) + containerd v1.6.0-41-gc523102b5
  • kubelet v1.23.4 (CRI v1) + containerd v1.6.0-41-gc523102b5
    ...and it works fine.

Also tried out with containerd v1.4.0, that talks CRI v1alpha2. As promised in the commit message, this is not supposed to work. cri-resmgr output:

F0303 07:32:12.850610    4447 main.go:89] failed to start resource manager: resource-manager: cache synchronization pod query failed: rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService

My only question would be related to this. Merging the PR will prevent using CRI-RM master branch with older container runtime versions, which will make it harder to integrate CRI-RM to certain environments. On the other hand, without this PR, CRI-RM works with old and new kubelet and runtimes as our primary backend runtimes implement compatibility layers.

What do you think, should we add support for v1alpha2 towards CRI runtimes, too? (In this PR or in a separate PR?)

The initial idea was to support bridging in both directions. Then I quite quickly changed my mind regarding bridging from new clients to old runtimes. I think that is neither worth the effort nor necessarily very safe: I'm not sure if clients talking the new protocol version would expect talking to that old a runtime. Also looking at the official EOL chart, I think dropping support for 1.19/older should not be a problem...

If we consider keeping v1alpha2 support around for a longer time, then I think using a dedicated branch/per-release branches/binary/binaries for it could/would probably be the way to go. That branch/those branches would simply have one additional commit swapping v1 for v1alpha2.

WDYT?

askervin
askervin previously approved these changes Mar 3, 2022
Copy link
Contributor

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klihub! This sounds reasonable.

LGTM.

@klihub klihub requested a review from kad March 7, 2022 09:16
@klihub klihub dismissed stale reviews from askervin, jukkar, and kad via 99e0237 June 14, 2022 10:07
@klihub
Copy link
Contributor Author

klihub commented Jun 14, 2022

@askervin @jukkar @kad @marquiz PTAL, I had to rebase this already approved one to resolve a conflict.

jukkar
jukkar previously approved these changes Jun 14, 2022
@klihub
Copy link
Contributor Author

klihub commented Jun 15, 2022

@kad, @marquiz, @askervin, @jukkar I consider merging this for the 0.7.0 release... WDYT ?

@klihub
Copy link
Contributor Author

klihub commented Jun 16, 2022

@kad, @marquiz, @askervin, @jukkar I consider merging this for the 0.7.0 release... WDYT ?

Okay, after testing that idea against our current e2e test set, it's better to keep this one out for now. Many of our tests pull in old runtimes without support for the v1 protocol.

@klihub
Copy link
Contributor Author

klihub commented Sep 8, 2022

@kad @marquiz @askervin @jukkar @fmuyassarov Once we are 100 % sure that we don't want/need to support any containerd or cri-o version which does not speak CRI v1, this should be good to go in.

Copy link
Member

@fmuyassarov fmuyassarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not well aware of the internals of CRI-resource-manager yet, but these changes look good to me from the code perspective.

@askervin
Copy link
Contributor

askervin commented Sep 8, 2022

In our e2e test point of view... I didn't run full tests here, but checked that cri-resmgr can talk to distro's on-the-stock containerd:

distro=debian-sid: ok

distro=ubuntu-22.04: ok

distro=ubuntu-20.04: fail (containerd-1.5.9)

root@vm> cri-resmgr -relay-socket /var/run/cri-resmgr/cri-resmgr.sock -runtime-socket /var/run/containerd/containerd.sock -image-socket /var/run/containerd/containerd.sock -force-config cri-resmgr.cfg   >cri-resmgr.output.txt 2>&1 &
cri-resmgr last output line:
F: [   cri-resmgr   ] failed to start resource manager: resource-manager: failed to query runtime version: rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService

distro=centos-8: ok

distro=fedora: ok

ubuntu-20.04 end-of-life is Aug/2025. What do you think?

@klihub klihub changed the title pkg/cri: switch to CRI v1 API, add bridging for pre-v1 clients. pkg/cri: switch internals to CRI v1, add ingress/egress v1/v1alpha2 CRI conversion. Nov 22, 2022
@klihub klihub marked this pull request as ready for review November 22, 2022 15:08
@klihub klihub requested review from fmuyassarov, jukkar and askervin and removed request for fmuyassarov November 22, 2022 15:08
Copy link
Contributor

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps some explanation could be added to the commented out code, now it is unclear why it is done like that.

pkg/cri/client/v1alpha2/client.go Outdated Show resolved Hide resolved
pkg/cri/client/client.go Outdated Show resolved Hide resolved
Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klihub. Looks basically really good to me. Just a few notes that I leave to your consideration

pkg/cri/client/client.go Outdated Show resolved Hide resolved
pkg/cri/client/v1/client.go Outdated Show resolved Hide resolved
pkg/cri/client/v1/client.go Outdated Show resolved Hide resolved
pkg/cri/client/v1alpha2/client.go Outdated Show resolved Hide resolved
pkg/cri/server/v1alpha2/server.go Outdated Show resolved Hide resolved
pkg/cri/client/v1alpha2/client.go Outdated Show resolved Hide resolved
@klihub klihub force-pushed the devel/cri-api/v1 branch 5 times, most recently from 360fd60 to 1ef334b Compare November 23, 2022 12:33
Implement v1/v1alpha2 CRI protocol conversion on the
ingress where we act as a CRI server for a client.
Do the protocol conversion by marshalling v1alpha2
then unmarshalling it into v1 for requests and the
other way around for responses.

Although this is slower than relying on the versions
to be struct-compatible and doing an unsafe cast, it
is much safer should the versions divert.
Add CRI v1 and v2 client backends. v1 simply forwards all
request and responses to and from the runtime unmodified.
v1alpha2 uses the v1alpha1 protocol towards the runtime.
It receives all requests and returns all responses as v1,
the native format used within cri-resmgr, converting all
requests and responses as necessary.

Conversion is done by marshalling v1 requests then
unmarshalling them into v1alpha2, and the other way
around for responses. While this is slower than relying
on the versions to be kept struct-compatible and doing
an unsafe cast, it is much safer should the versions
divert.

Implement v1/v1alpha2 CRI protocol conversion on the
egress. During connection establishment probe for the
v1 protocol then for v1alpha2 and use the first one
available.
@klihub klihub requested a review from marquiz November 23, 2022 14:30
Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klihub for this enhancement! Provides nice support accross different k8s and runtime versions.

LGTM

@marquiz marquiz merged commit b3ea4b7 into intel:master Nov 23, 2022
@klihub klihub deleted the devel/cri-api/v1 branch November 23, 2022 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants