Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TLS option for grpc probes #119093

Open
TheImpressiveDonut opened this issue Jul 5, 2023 · 14 comments · May be fixed by #124522
Open

Add TLS option for grpc probes #119093

TheImpressiveDonut opened this issue Jul 5, 2023 · 14 comments · May be fixed by #124522
Assignees
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@TheImpressiveDonut
Copy link

TheImpressiveDonut commented Jul 5, 2023

What would you like to be added?

gRPC built-in probes should be able to communicate with a server that is setup to use a secured connection.

Why is this needed?

Setting up a gRPC server to use plaintext is not really acceptable in production, so built-in probes are kind of unusable.

How?

The same way as http probe does it, with setting the field InsecureSkipVerify to true, we could do:

opts := []grpc.DialOption{}
tlsConfig := &tls.Config{InsecureSkipVerify: true}
opts.append(opts, grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)))

Maybe there is a specific reason this has not been done already ?
I am willing to contribute if needed.

/sig node

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 5, 2023
@TheImpressiveDonut
Copy link
Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 5, 2023
@AryanSharma9917
Copy link

/assign @AryanSharma9917

@aojea
Copy link
Member

aojea commented Jul 5, 2023

This is the KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2727-grpc-probe but I can't see any reference to TLS, however, the external project that is meant to replace allows to use TLS so I think we should too

@SergeyKanzhelev
Copy link
Member

This is the KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2727-grpc-probe but I can't see any reference to TLS, however, the external project that is meant to replace allows to use TLS so I think we should too

I mention transition here: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe: "Built-in probes do not support any authentication parameters (like -tls).". The reality is that it is hard to pass the client certificate for kubelet to use. So we decided not to implement it and described it in KEP.

KEP also saying: "As spec'd, the kubelet probe will not allow use of client certificates nor verify the certificate on the container. ".

I asked at gRPC meetup and the feedback was that the WithInsecure (https://github.com/kubernetes/kubernetes/blame/7e3c98fd303359cb9f79dfc691da733a6ca2a6e3/pkg/probe/grpc/grpc.go#L58) is probably enough for many services. This PR changes the implementation to the new transport security: #109778

I wonder if the replacement is not adequate and your proposed SkipVerify is a proper fix. Let me look into it, not a gRPC expert. If you know for sure - please comment which option is the best

@SergeyKanzhelev
Copy link
Member

Found this: grpc/grpc-go#5609

Yes, it looks like the PR: #109778 broke the feature

@SergeyKanzhelev
Copy link
Member

@TheImpressiveDonut if you can send a PR and if you have a suggestion how to implement the test (unit or e2e) to validate this feature - it will be great

@SergeyKanzhelev
Copy link
Member

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 19, 2023
@AryanSharma9917
Copy link

AryanSharma9917 commented Jul 20, 2023

Hey @SergeyKanzhelev, I'll send the PR by today I have already made some changes already, after they are done I'll send the PR.
And ya suggestions are welcomed..!!

@TheImpressiveDonut
Copy link
Author

Hey @AryanSharma9917, I do not think that the PR #119516 you've made fix this issue. I am not really familiar with flexvolume but I believe the point I made here is not linked to those. Tell me if I am wrong.

I have a rough idea on how to fix this, but I have a two questions on how it should be implemented w.r.t. kubernetes:

  1. From the security perspective of kubernetes, is it okay to use the InsecureSkipVerify: true (the same way http probe works) ? I do not think there is an another option or at least I don't think it is necessary to validate the server certificate (not really the role of the probe).
  2. It is only possible to initiate the grpc connection in plaintext xor TLS, should we only allow grpc probes to work in TLS or should we add a new property (e.g. useTLS) to api (would require more changes) ?

@SergeyKanzhelev do you have some thoughts on those ? Thanks !

@TheImpressiveDonut
Copy link
Author

/assign @TheImpressiveDonut

@AryanSharma9917
Copy link

Hey @TheImpressiveDonut, Here's a revised version of the implementation that ensures secure communication by validating the server certificate:

func (p *flexVolumeProber) initGRPCClient() {
	tlsConfig := &tls.Config{
		InsecureSkipVerify: false, // Enable certificate verification
		// Additional TLS configuration options can be provided here
	}

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)),
	}

	p.grpcOpts = opts
}

By providing a valid tls.Config with certificate verification, we ensure that the gRPC client communicates securely with the server, making the probe implementation more secure.

@aojea
Copy link
Member

aojea commented Aug 1, 2023

for any PR please add an e2e test demonstrating it works

@k8s-triage-robot
Copy link

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 19, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2024
@SergeyKanzhelev SergeyKanzhelev linked a pull request Apr 25, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
6 participants