Allow to use own certificate for metrics endpoint #10504

Closed
wants to merge 5 commits into from

legionus

As an admin, I want to give the monitoring system access only to metrics, without access to the rest of the API.

@codecov-io

codecov-io commented Feb 27, 2019

Codecov Report

Merging #10504 into master will decrease coverage by 0.04%.
The diff coverage is 65.21%.

@@            Coverage Diff             @@
##           master   #10504      +/-   ##
==========================================
- Coverage   71.56%   71.52%   -0.05%     
==========================================
  Files         393      393              
  Lines       36503    36525      +22     
==========================================
  Hits        26123    26123              
- Misses       8546     8567      +21     
- Partials     1834     1835       +1
Impacted Files               Coverage Δ
embed/config.go              56.37% <ø> (ø) ⬆️
etcdmain/config.go           83.03% <100%> (+0.54%) ⬆️
etcdmain/grpc_proxy.go       62.18% <44.44%> (-0.43%) ⬇️
embed/etcd.go                72.76% <57.14%> (+1.75%) ⬆️
proxy/grpcproxy/register.go  69.44% <0%> (-13.89%) ⬇️
pkg/adt/interval_tree.go     83.78% <0%> (-7.51%) ⬇️
pkg/netutil/netutil.go       63.11% <0%> (-6.56%) ⬇️
etcdserver/v2_server.go      80.76% <0%> (-3.85%) ⬇️
pkg/testutil/recorder.go     77.77% <0%> (-3.71%) ⬇️
clientv3/leasing/kv.go       89.03% <0%> (-1.33%) ⬇️
... and 16 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17de9bd...1da4d75. Read the comment docs.

@pweil-

pweil- commented Feb 27, 2019

/cc @brancz @jim-minter

@brancz
Contributor

brancz commented Feb 27, 2019

I lack expertise in the etcd code base to review this, but 👍 on the feature in general!

Contributor

@jingyih jingyih left a comment

Thanks for your PR!

@@ -57,3 +61,44 @@ func metricsTest(cx ctlCtx) {
		cx.t.Fatalf("failed get with curl (%v)", err)
	}
}

func metricsTestCertAuth(cx ctlCtx) {
Contributor

Is it possible to unify this function with metricsTest?

Author

@jingyih Everything is possible. Does the test look better now?

@@ -763,6 +763,9 @@ func (e *Etcd) serveMetrics() (err error) {

		for _, murl := range e.cfg.ListenMetricsUrls {
			tlsInfo := &e.cfg.ClientTLSInfo
			if !e.cfg.MetricsTLSInfo.Empty() {
				tlsInfo = &e.cfg.MetricsTLSInfo
Contributor

If we override tlsInfo, should we add a warning to log?

When MetricsTLSInfo is provided, will a client that has access to the other APIs be unable to access '/metrics' and '/health'?

Author

When MetricsTLSInfo is provided, will a client that has access to the other APIs be unable to access '/metrics' and '/health'?

Yes. Theoretically, we can make both work (add them to tls.Config.RootCAs and to tls.Config.ClientCAs), but that would be a much more difficult solution because it requires extending transport.TLSInfo.

If we override tlsInfo, should we add a warning to log?

This is not hidden behavior (the admin must pass the options explicitly). I am not sure what message I should write to the log.
Something like "You specified other certificates for metrics. Access by client certificates will not be possible"?
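
For illustration only, here is a minimal sketch of the "make both work" idea mentioned above. It is not code from this PR. It assumes both CAs are placed into one pool that serves as tls.Config.ClientCAs, so a client certificate issued by either CA verifies. The helpers newCert, combinedPool, and acceptedBy are hypothetical, and the in-memory throwaway CAs exist only to keep the example runnable. Extending transport.TLSInfo to carry two CAs, which is the difficult part, is not shown.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCert creates a throwaway certificate (hypothetical helper). With a nil
// parent it returns a self-signed CA; otherwise it returns a client
// certificate signed by parent.
func newCert(name string, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: name},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	if parent == nil {
		// Self-signed CA: the template acts as its own parent.
		tmpl.IsCA = true
		tmpl.BasicConstraintsValid = true
		tmpl.KeyUsage = x509.KeyUsageCertSign
		parent, parentKey = tmpl, key
	} else {
		tmpl.KeyUsage = x509.KeyUsageDigitalSignature
		tmpl.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert, key
}

// combinedPool returns one pool that trusts every given CA.
func combinedPool(cas ...*x509.Certificate) *x509.CertPool {
	pool := x509.NewCertPool()
	for _, ca := range cas {
		pool.AddCert(ca)
	}
	return pool
}

// acceptedBy reports whether cert verifies as a client certificate against pool.
func acceptedBy(pool *x509.CertPool, cert *x509.Certificate) bool {
	_, err := cert.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err == nil
}

func main() {
	clientCA, clientCAKey := newCert("client-ca", nil, nil)
	metricsCA, metricsCAKey := newCert("metrics-ca", nil, nil)

	// One listener config that trusts both CAs for client authentication.
	cfg := &tls.Config{
		ClientCAs:  combinedPool(clientCA, metricsCA),
		ClientAuth: tls.RequireAndVerifyClientCert,
	}

	apiClient, _ := newCert("api-client", clientCA, clientCAKey)
	scraper, _ := newCert("metrics-scraper", metricsCA, metricsCAKey)
	fmt.Println("api client accepted:", acceptedBy(cfg.ClientCAs, apiClient))
	fmt.Println("metrics scraper accepted:", acceptedBy(cfg.ClientCAs, scraper))
}
```

Note that a combined pool also widens trust: every endpoint served with this config would accept certificates from both CAs, which works against the goal of restricting the monitoring system to metrics unless the listeners stay separate.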

Contributor

This is not hidden behavior (the admin must pass the options explicitly). I am not sure what message I should write to the log.

I was thinking about the scenario where etcd clients with clientTLS (not metricsTLS) might try to access the metrics and health endpoint. Adding something like 'ignoring client certificates since metrics certificates given' would help them understand why the connection is rejected. I am not an expert in this area, please feel free to let me know if you think this is reasonable.

Author

@jingyih That's reasonable. I added the message.

Contributor

For better or worse, warning messages are often noted by administrators and automated systems which analyze log output for anomalies. The proposed metrics TLS configuration when used in conjunction with client TLS configuration is a valid state. So in that regard it's not clear to me a preemptive warning makes sense. It seems possible that when using both TLS configurations (again, a valid state), this warning might be interpreted as a false positive alert of some kind.

So, I would suggest we consider changing this to INFO level or omitting it entirely.
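
As a rough sketch of the override discussed in this thread (not the PR's actual code), the selection can be expressed as a small function that reports whether the client settings were replaced, leaving the log level to the caller. The tlsInfo type here is a hypothetical, reduced stand-in for transport.TLSInfo, and selectMetricsTLS and the message text are assumptions for illustration.

```go
package main

import "fmt"

// tlsInfo is a hypothetical, trimmed-down stand-in for transport.TLSInfo.
type tlsInfo struct {
	CertFile, KeyFile, TrustedCAFile string
}

// Empty reports whether no certificate/key pair was configured.
func (t tlsInfo) Empty() bool { return t.CertFile == "" && t.KeyFile == "" }

// selectMetricsTLS returns the TLS settings the metrics listener should use
// and a flag telling the caller whether the client settings were overridden.
func selectMetricsTLS(client, metrics tlsInfo) (tlsInfo, bool) {
	if !metrics.Empty() {
		return metrics, true
	}
	return client, false
}

func main() {
	client := tlsInfo{CertFile: "client.crt", KeyFile: "client.key"}
	metrics := tlsInfo{CertFile: "metrics.crt", KeyFile: "metrics.key"}

	chosen, overridden := selectMetricsTLS(client, metrics)
	if overridden {
		// Informational only: using both configurations is a valid state.
		fmt.Println("info: metrics endpoints use the dedicated metrics certificates; client certificates are ignored there")
	}
	fmt.Println("serving metrics with", chosen.CertFile)
}
```

Returning the flag instead of logging inside the function keeps the choice between INFO and no message at all in one place.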

@jingyih
Contributor

jingyih commented Feb 28, 2019

/cc @wenjiaswe

@legionus
Author

legionus commented Mar 4, 2019

@jingyih Does it look good now?

@jingyih
Contributor

jingyih commented Mar 6, 2019

@legionus Sorry for the delayed response. I will take a look later this week (hopefully on Wednesday).

@jingyih jingyih self-assigned this Mar 6, 2019
@@ -155,6 +156,8 @@ func cURLPrefixArgs(clus *etcdProcessCluster, method string, req cURLReq) []stri
				cmdArgs = append(cmdArgs, "--cacert", caPath, "--cert", certPath3, "--key", privateKeyPath3)
			}
		}
	} else if req.useCertAuth {
Contributor

I am having a hard time following the logic on how the command is generated. Does the if (line 145) make sense to you? For example, I tried to test TestV3MetricsSecure, but the actual curl command in the test was curl -L http://localhost:20000/metrics

Contributor

This particular test I mentioned above was not affected by the changes in your PR. But I want to make sure the tests are correct and reader-friendly.

Author

I am having a hard time following the logic on how the command is generated.

@jingyih Me too. I had to add one more block to be able to use another certificate. It seems wrong to me that the certificates are hardcoded in this function. On the other hand, they are pregenerated and their purpose is predefined, so maybe this makes sense.

Does the if (line 145) make sense to you?

No, it does not make sense to me. This code seems broken. It looks like someone tried to add the logic for secure connections, but failed.

For example, I tried to test TestV3MetricsSecure, but the actual curl command in the test was curl -L http://localhost:20000/metrics.

This is because this test does not test a secure connection :) TestV3MetricsSecure and TestV3MetricsInsecure each create a custom cfg but never use it:

func TestV3MetricsSecure(t *testing.T) {
	cfg := configTLS
	cfg.clusterSize = 1
	cfg.metricsURLScheme = "https"
	testCtl(t, metricsTest)
}
func TestV3MetricsInsecure(t *testing.T) {
	cfg := configTLS
	cfg.clusterSize = 1
	cfg.metricsURLScheme = "http"
	testCtl(t, metricsTest)
}

To use a custom config, they must pass it via withCfg(). As written, these tests run with the default config:

func testCtl(t *testing.T, testFunc func(ctlCtx), opts ...ctlOption) {
	defer testutil.AfterTest(t)
	ret := ctlCtx{
		t:           t,
		cfg:         configAutoTLS,
		dialTimeout: 7 * time.Second,
	}
	ret.applyOpts(opts)

As you can see there is no metricsURLScheme definition here:

configAutoTLS = etcdProcessClusterConfig{
	clusterSize:   3,
	isPeerTLS:     true,
	isPeerAutoTLS: true,
	initialToken:  "new",
}

This means cx.cfg.metricsURLScheme will be empty as well:

if err := cURLGet(cx.epc, cURLReq{endpoint: "/metrics", expected: `etcd_debugging_mvcc_keys_total 1`, metricsURLScheme: cx.cfg.metricsURLScheme}); err != nil {
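
The effect described above can be reproduced in isolation. The following is a simplified sketch, not the e2e test code: clusterConfig, defaultCfg, and the reduced testCtl are hypothetical stand-ins, and the signature of withCfg is assumed from how the comment describes it. It shows that a locally modified cfg only reaches the test body when it is passed as an option.

```go
package main

import "fmt"

// clusterConfig is a hypothetical, reduced version of etcdProcessClusterConfig.
type clusterConfig struct {
	clusterSize      int
	metricsURLScheme string
}

// ctlCtx mirrors the shape of the test context, with only the config kept.
type ctlCtx struct {
	cfg clusterConfig
}

// ctlOption mutates the context before the test body runs.
type ctlOption func(*ctlCtx)

// withCfg replaces the default config; without it the default stays in place.
func withCfg(cfg clusterConfig) ctlOption {
	return func(cx *ctlCtx) { cx.cfg = cfg }
}

// defaultCfg plays the role of configAutoTLS: it sets no metricsURLScheme.
var defaultCfg = clusterConfig{clusterSize: 3}

// testCtl builds the context from the default config, applies the options,
// and hands the result to testFunc.
func testCtl(testFunc func(ctlCtx), opts ...ctlOption) {
	cx := ctlCtx{cfg: defaultCfg}
	for _, opt := range opts {
		opt(&cx)
	}
	testFunc(cx)
}

func main() {
	cfg := defaultCfg
	cfg.clusterSize = 1
	cfg.metricsURLScheme = "https"

	show := func(cx ctlCtx) {
		fmt.Printf("size=%d scheme=%q\n", cx.cfg.clusterSize, cx.cfg.metricsURLScheme)
	}

	testCtl(show)               // prints: size=3 scheme=""
	testCtl(show, withCfg(cfg)) // prints: size=1 scheme="https"
}
```

Under these assumptions, the fix for the two tests would be to pass the local cfg through withCfg(cfg) in the testCtl call.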

@jingyih
Contributor

jingyih commented Mar 15, 2019

Your PR looks good to me. One option is that we get it merged first, and fix the tests in a separate PR.

@hexfusion Any comment?

@legionus
Author

legionus commented Apr 8, 2019

@jingyih @hexfusion ping

@hexfusion
Contributor

@legionus thanks for your patience. I will try to dig into this soon.

@xiang90
Contributor

xiang90 commented May 1, 2019

/cc @hexfusion can you please take a look?

@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@brancz
Contributor

brancz commented Apr 7, 2020

I think this is still something that’s very desirable to have. Anyone continuing the work? :)

@stale stale bot removed the stale label Apr 7, 2020
@stale

stale bot commented Jul 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 6, 2020
@brancz
Contributor

brancz commented Jul 8, 2020

This would continue to be a valuable addition I think.

@ironcladlou
Contributor

@legionus do you intend to complete this?

@legionus
Author

@legionus do you intend to complete this?

@ironcladlou I have already changed the scope of work, but I can complete this PR if someone finds it useful.
What do I need to do? Rebase and resolve conflicts?

@ironcladlou
Contributor

@legionus

@ironcladlou I have already changed the scope of work, but I can complete this PR if someone finds it useful.
What do I need to do? Rebase and resolve conflicts?

Can you explain what you mean re: the scope change?

I'm trying to decide how to proceed with #11993 and want to make sure my solution integrates with this work if the intent is to proceed as-is.

I can't say whether there's demand for this one, as I haven't been involved.

@legionus
Author

Can you explain what you mean re: the scope change?

@ironcladlou I no longer develop openshift and etcd. Now I am involved in the development of completely different projects.
But I still remember golang :)

@ironcladlou
Contributor

@legionus

@ironcladlou I no longer develop openshift and etcd. Now I am involved in the development of completely different projects.
But I still remember golang :)

Ah ha! Thanks!

Okay, so as far as I can tell from the history of this PR all that's left is rebasing, but I can't speak to whether the demand for the change still exists. @brancz seems to say yes, but I'm not personally aware of anyone with use cases for it, so I'm neutral.

Who can make the decision about whether this should proceed or not?

@tangcong
Contributor

I think this PR is useful, and @jingyih also agrees. @legionus, can you rebase and resolve the conflicts? Thanks.

Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
…ndpoints

Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
@legionus
Author

I think this PR is useful, and @jingyih also agrees. @legionus, can you rebase and resolve the conflicts? Thanks.

@tangcong, @ironcladlou Yes, I can. Done.

@@ -211,6 +223,12 @@ func startGRPCProxy(cmd *cobra.Command, args []string) {
	go func() { errc <- srvhttp.Serve(httpl) }()
	go func() { errc <- m.Serve() }()
	if len(grpcProxyMetricsListenAddr) > 0 {
		if grpcProxyMetricsListenCert != "" && grpcProxyMetricsListenKey != "" {
			tlsinfo = newTLS(grpcProxyMetricsListenCA, grpcProxyMetricsListenCert, grpcProxyMetricsListenKey)
Contributor

PR #12114 adds a health handler for the gRPC proxy itself. If grpcProxyMetricsListenAddr uses its own certificate for the metrics endpoint, we should also update the health check client certificates; otherwise, access to the '/proxy/health' handler will fail.

Contributor

@ironcladlou said in issue #11993: "For the health listener, honor the global TLS configuration by default (for compatibility with today's behavior), but provide a new flag to allow control over health endpoint client certificate auth (e.g. --health-client-cert-auth=false)."
I agree with it. Can we add a flag to allow access to /health and /proxy/health without cert auth?

Author

PR #12114 adds a health handler for the gRPC proxy itself. If grpcProxyMetricsListenAddr uses its own certificate for the metrics endpoint, we should also update the health check client certificates; otherwise, access to the '/proxy/health' handler will fail.

I mentioned this earlier. That is why a warning is displayed when separate certificates are used for metrics.

I agree with it. Can we add a flag to allow access to /health and /proxy/health without cert auth?

Do you want the server to serve some endpoints without certificates? Theoretically, this is possible, but will require rewriting the server. To do this, you will need to make authorization optional and do it not at the TLS level. This is out of scope of this proposal.
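
To make the "optional authorization above the TLS level" idea concrete, here is a minimal sketch under stated assumptions. It is not part of this PR and not how etcd is implemented. It assumes the listener uses tls.VerifyClientCertIfGiven so the handshake succeeds with or without a certificate, and that an HTTP middleware then decides per path. The names openPaths, requireClientCert, listenerTLS, and statusFor are hypothetical.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// openPaths lists endpoints served without a client certificate (illustrative choice).
var openPaths = map[string]bool{"/health": true, "/proxy/health": true}

// listenerTLS lets the handshake succeed with or without a client certificate,
// while still verifying any certificate that is presented against pool.
func listenerTLS(pool *x509.CertPool) *tls.Config {
	return &tls.Config{ClientCAs: pool, ClientAuth: tls.VerifyClientCertIfGiven}
}

// requireClientCert rejects requests that did not present a verified client
// certificate, except for the paths in openPaths.
func requireClientCert(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		verified := r.TLS != nil && len(r.TLS.VerifiedChains) > 0
		if !verified && !openPaths[r.URL.Path] {
			http.Error(w, "client certificate required", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor runs one request through the middleware and returns the HTTP status.
func statusFor(path string, withCert bool) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	req.TLS = &tls.ConnectionState{}
	if withCert {
		// Simulate a chain that the TLS layer has already verified.
		req.TLS.VerifiedChains = [][]*x509.Certificate{{&x509.Certificate{}}}
	}
	rec := httptest.NewRecorder()
	h := requireClientCert(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	_ = listenerTLS(x509.NewCertPool()) // would be passed to the HTTPS listener

	fmt.Println(statusFor("/health", false))  // 200
	fmt.Println(statusFor("/metrics", false)) // 401
	fmt.Println(statusFor("/metrics", true))  // 200
}
```

The trade-off is the one the comment points out: the access decision moves out of the TLS handshake and into every handler chain, which is a larger change than swapping certificates on a listener.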

Contributor

OK, thanks. I will try to add a --health-client-cert-auth flag in another PR.

Author

@tangcong Can this PR be merged?

Contributor

LGTM. Can you add the changelog and docs in another PR? @hexfusion, can you take a final look? Thanks.

Contributor

@tangcong Sorry for the delay. Yes, I will take a look.

@hexfusion
Contributor

/assign

@hexfusion hexfusion self-assigned this Aug 11, 2020
@legionus
Author

legionus commented Sep 8, 2020

@hexfusion @tangcong, @ironcladlou Hi! Is there any news?

@stale

stale bot commented Jan 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

9 participants