Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the Prometheus endpoint #1669

Merged
merged 20 commits into from Jan 26, 2024
Merged

Implementation of the Prometheus endpoint #1669

merged 20 commits into from Jan 26, 2024

Conversation

azazeal
Copy link
Contributor

@azazeal azazeal commented Jan 10, 2024

This PR implements the http://{metricsAddress}/metrics endpoint.

@github-actions github-actions bot added the needs triage Waiting for discussion / prioritization by team label Jan 10, 2024
@hslatman hslatman removed the needs triage Waiting for discussion / prioritization by team label Jan 11, 2024
@azazeal azazeal force-pushed the prometheus branch 5 times, most recently from 264caad to ddff1c9 Compare January 22, 2024 14:27
@azazeal azazeal force-pushed the prometheus branch 3 times, most recently from a76f424 to 48d59b5 Compare January 22, 2024 14:31
@azazeal azazeal changed the title Implementation of the /-/prometheus endpoint Implementation of the Prometheus endpoint Jan 22, 2024
authority/authority.go Outdated Show resolved Hide resolved
@azazeal azazeal force-pushed the prometheus branch 2 times, most recently from f90262d to e2a7304 Compare January 22, 2024 18:51
@azazeal azazeal marked this pull request as ready for review January 22, 2024 18:54
authority/meter.go Outdated Show resolved Hide resolved
authority/config/config.go Outdated Show resolved Hide resolved
internal/metrix/meter.go Outdated Show resolved Hide resolved
@azazeal azazeal force-pushed the prometheus branch 2 times, most recently from a963833 to ba25511 Compare January 22, 2024 20:59
Copy link
Contributor

@maraino maraino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few comments.

We should replace the current signing code with internal methods and do a minimal wrapper on those with the metrics. See an example in one of my comments.

For Sign and SignSSH we do not consider errors in authority.Authorize as errors.

Comment on lines 977 to 984
func (a *Authority) incrProvisionerCounter(prov *provisioner.Interface, err *error, count func(Meter, string, bool)) {
var name string
if p := *prov; p != nil {
name = p.GetName()
}

count(a.meter, name, *err == nil)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to this method in Authority.Sign and Authority.SignSSH does not consider all the error cases. A POST to /sign will call first authority.Authorize and then, if there's not error authority.Sign. The authority.Authorize can fail for multiple reasons, for example, it will fail if the token is not valid. The same will happen for /ssh/sign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These would be authorization errors and not errors that happen while signing the certificate. Happy to consider adding more instruments in the future or reviewing a PR that adds them.

authority/ssh.go Outdated
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (*ssh.Certificate, error) {
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (cert *ssh.Certificate, err error) {
var prov provisioner.Interface
defer a.incrProvisionerCounter(&prov, &err, Meter.SSHSigned)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change this and similar methods, it will be very easy to miss this and return a different error. A better way might be to pass a pointer to the certificate instead, the certificate is probably only set in one place, and we can report an error if cert == nil. We will need to different methods for x509 and ssh and we will have to put set it to nil if store*Certificate fails, but that's gonna be harder to miss. Most of the errors will happen way before we set the certificate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also clean this up by moving the signing code to internal methods and getting the error from the return values.

Copy link
Contributor Author

@azazeal azazeal Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already setting the error to nil when storeCertificate fails and the errors.Is(err, db.ErrNotImplemented).

If we base our decision to instrument based on whether the returned chain is nil, then we lose the ability (in other places, outside the scope of open source) to instrument in more detail (perhaps according to the underlying type of error). We can look into changing the relevant authority.Meter signatures to be (provisioner.Interface, error) instead of (string, bool) in that case. In any case, the error is more important that the value of the returned chain in this case, I think.

Copy link
Contributor Author

@azazeal azazeal Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether (*Authority).storeCertificate should return a nil error, where it now returns db.ErrNotImplemented (so that we can avoid checking for the particular exception), is something to consider, in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I've changed the signature of meter functions to directly accept the provisioner & the error.

authority/ssh.go Outdated
@@ -146,22 +146,24 @@ func (a *Authority) GetSSHBastion(ctx context.Context, user, hostname string) (*
}

// SignSSH creates a signed SSH certificate with the given public key and options.
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (*ssh.Certificate, error) {
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (cert *ssh.Certificate, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this package's dependents, I am willing to accept a breaking change that accepts a provisioner as a parameter or Sign and SignSSH. We can add the provisioner as a return value in authority.Authorize, and just pass it.

I'm not worried about breaking changes, but an option that won't cause less breaking changes would be to store the provisioner in the context, although we will need to add a context to Sign or just create the SignContext method.

In any case, I'm ok to clean these, and have always a context and a provisioner parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do something like this we can kill a few issues at once:

func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption (*ssg.Certificate, err error) {
   certs, prov, err = a.signSSH(...)
   a.incrProvisionerCounter(prov, err, Meter.SSHSigned) 
   return certs, err
}

We can keep the current interface, although, as I said, I'm not opposed to changing it. But especially, it will be harder to cause bugs in the future because a PR adds something like if err := foo(); err != nil { ... }, well and we won't need the pointer to an interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning up sounds nice and I hope that we will. There's quite a few things that could benefit from being worked on in many places.

Having said that, named returns and 'defer' statements aren't uncommon when dealing with instrumented code and I don't see how adding yet another level of indirection is helpful.

If we instrument SignSSH and not signSSH then a caller of signSSH would need to remember to instrument the call.

The pointer to a pointer (interface), could be solved by refactoring the defer statement to be:

defer func() { incrProvisionerCounter(prov, err, a.Meter.SSHSigned) }()

instead of

defer a.incrProvisionerCounter(prov, err, Meter.SSHSigned)

Other than verbosity, there's not a lot of difference to these signatures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maraino Adding a SignContext sounds good. In fact, I have that in #1542. I should have a look at that again, sometime soon 😅

Comment on lines +9 to +10
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using OpenTelemetry be more flexible and will allow us to add more things in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was discussed and decided against.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it without OT means we don't have to keep up with all the changes happening on that side. I can see us revisiting this sometime in the future when OT is more stabilized, though. It could also be a possibility to support both, if needed.

internal/metrix/meter.go Outdated Show resolved Hide resolved
@dopey
Copy link
Contributor

dopey commented Jan 25, 2024

Want to chime in and say that, while I agree with the sentiment of doing a deeper rewrite of certificates to enable easier instrumentation, I think it's outside the scope of this feature. Let's set aside some time in the future to plan and execute some deeper changes that will make the APIs easier and more intuitive.

dopey
dopey previously approved these changes Jan 25, 2024
@azazeal azazeal enabled auto-merge (squash) January 25, 2024 17:37
This PR split the main metered methods in two to make it harder to miss
metrics in the future.
@dopey dopey disabled auto-merge January 26, 2024 07:45
@dopey dopey merged commit dd1ff9c into master Jan 26, 2024
13 checks passed
@dopey dopey deleted the prometheus branch January 26, 2024 07:47
@hslatman hslatman added this to the v0.25.3 milestone Feb 20, 2024
@hslatman hslatman mentioned this pull request Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants