Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve path handling for Pushgateway #186

Closed
beorn7 opened this issue May 13, 2020 · 12 comments
Closed

Improve path handling for Pushgateway #186

beorn7 opened this issue May 13, 2020 · 12 comments
Assignees
Milestone

Comments

@beorn7
Copy link
Member

beorn7 commented May 13, 2020

Currently, the code in push.rb only handles pushes to /metrics/job/%s and '/metrics/job/%s/instance/%s, and even those have some problems:

  • If the job name or the instance name contain a /, the push will fail.
  • If the instance name is empty, the push will fail. (It will also fail with an empty job name, but that's intended. This case could be caught earlier in the code, though.)
  • No other grouping keys are possible.

Please check out https://github.com/prometheus/pushgateway/blob/master/README.md to learn about arbitrary label pairs as grouping key and how to handle base64 encoding for / and empty values (the latter to be merged as prometheus/pushgateway#346 ).

@Sinjo
Copy link
Member

Sinjo commented Mar 25, 2021

Hi @beorn7, I'm just running through our open issues and I wanted to ask your opinion on this one - particularly on handling special characters. I had a read through the README you linked and saw the option of encoding grouping key values as base64, which as far as I can tell deals with all possible encoding issues.

Would it be a reasonable approach to do that all the time rather than trying to detect if a particular value needs encoding? My rough plan, assuming that's a sensible approach, would be:

Alternatively, if that's overkill (I can see how it would make debugging more of a pain when using simple examples), I'm thinking:

  • Add a way of specifying additional grouping keys when using push.rb
  • Validate the names of those keys using the standard allowed characters for metric labels
  • If a value is empty or contains a /, suffix its key with @base64 and encode the value as RFC4648 base 64
  • Otherwise apply standard URI encoding to the value

@beorn7
Copy link
Member Author

beorn7 commented Mar 25, 2021

I guess you could just follow what I did in client_golang. Just always doing base64 is certainly a safe way to go, but I could see a loss of "quality of life" if you want to debug something, dump the URLs somehow, and then you have to decipher the base64 encoded parameters.

As far as I can see, past me made an effort to only use base64 encoding if required: https://github.com/prometheus/client_golang/blob/master/prometheus/push/push.go#L309-L319

@Sinjo Sinjo self-assigned this Mar 25, 2021
@Sinjo
Copy link
Member

Sinjo commented Mar 26, 2021

Yeah, that feels like the right way. Most of the time, the base64 encoding is just going to be a pain for debugging.

I've just noticed while reading through the client_golang push code that it returns an error if the job label is present in any metric in the registry being pushed, which isn't something we do in the client_ruby equivalent. I'm thinking we should probably also perform that check, along with adding a check for any arbitrary label pairs that collide with labels in the registry being pushed.

I'm not sure if this should also be applied for instance, which doesn't seem to get any special treatment in the client_golang code (unlike in client_ruby). Reading this part of the pushgateway docs suggests that while instance is really a well-known label with specific meaning, it can be treated the same way as any arbitrary user-provided labels as it is optional.

Unfortunately, perfoming that validation on job and instance would be a breaking change, but we have another of those in the pipeline in #209.

@dmagliola I'm thinking we should hold off on releasing 3.0 so we can make this breaking change at the same time. #209 is currently on master. I'm happy to leave it there for now, on the assumption that I can get this done in the next few days and that we don't need to cut a release before then. If we did want to cut a release in a hurry for any reason, I propose we temporarily revert that PR.

@Sinjo Sinjo modified the milestones: v0.8.0, v3.0.0 Mar 26, 2021
@Sinjo
Copy link
Member

Sinjo commented Mar 27, 2021

Just noted another thing we might need to change (pending a quick look at how the pushgateway handles it): in the Go library, they're using the QueryEscape method from net/url. We're using CGI.escape, which off the top of my head encodes spaces as + instead of %20, but is otherwise the same. Hopefully the pushgateway treats those things the same, but I want to make sure it does, and follow what client_golang does if it's treated differently.

I've created a draft PR at #220 for all this work.

@beorn7
Copy link
Member Author

beorn7 commented Mar 29, 2021

AFAIK it's not possible to encode a space as + in the path part of the URL (but it's OK in the query params after the ?, which the PGW API doesnt't use). I actually would expect CGI.escape to handle that properly. Perhaps you looked at a URL parameter and not at the path when you checked?

@beorn7
Copy link
Member Author

beorn7 commented Mar 29, 2021

About the instance label:

Yeah, there is a bit history behind the instance label. It used to be a fixed part of the URL path, like job, but then we realized that an instance label often doesn't make sense on metrics you would push to a PGW. See https://github.com/prometheus/pushgateway#about-the-job-and-instance-labels, especially the last paragraph thereof, for context.

So the instance label is still a bit special, but more on the side of exposing pushed metrics than during pushing a metric.

The special validation performed by client_golang has more to do with the grouping key in general (and the job label is always part of the grouping key). See https://pkg.go.dev/github.com/prometheus/client_golang@v1.10.0/prometheus/push#Pusher.Grouping

Any label that is part of the grouping key but also present in the metric to push will lead to an error in client_golang. That's just a safeguard against surprises. The PGW itself is fine with that case but will ruthlessly overwrite the labels present in the metric with the labels from the grouping key. So how to deal with the case in the client library is essentially a philosophical question. Personally, I'd say if someone is setting labels on the metric that are also present in the grouping key, they are doing something wrong, and probably without noticing. So I better let them know as early as possible. But if you want to put emphasis on making the code run, you could see that differently. So perhaps you would prefer to not validate but issue a warning somehow?

@Sinjo
Copy link
Member

Sinjo commented Apr 4, 2021

AFAIK it's not possible to encode a space as + in the path part of the URL (but it's OK in the query params after the ?, which the PGW API doesnt't use).

Oh, TIL! I had no idea that was only for query params.

I actually would expect CGI.escape to handle that properly. Perhaps you looked at a URL parameter and not at the path when you checked?

Ruby's URL/CGI escaping is a bit of a mess. This StackOverflow answer (which is way better than the accepted one on the question) gives a decent overview of how it ended up the way it is, but tl;dr we're gonna have to think it through and I'm gonna test it against a real pushgateway and see what happens.

So perhaps you would prefer to not validate but issue a warning somehow?

I think that's more in line with how we do things in this library. I think the approach you mentioned works really well in a language like Go, where people can use errcheck to make sure they're handling errors returned by the libraries they're using.

I'm torn for the reasoning you mention though. We do raise an exception on observations with invalid label sets, and this doesn't feel too dissimilar to that in principle. @dmagliola Thoughts?

@Sinjo
Copy link
Member

Sinjo commented Jun 9, 2021

Okay, after some local testing I can 100% confirm we're messing this up.

# HELP foo foo description
# TYPE foo gauge
foo{instance="with space",job="test-job"} 1
foo{instance="with+space",job="test-job"} 1
foo{instance="with-hyphen",job="test-job"} 1

I tried sending a metric with different instance names to a local pushgateway. When using CGI.escape to encode a label value with a space, you end up with a literal + in the value. I made a hacky change to send %20 instead, and I got the correct value in the label.

I'm going to play around and see if there's anything reasonable I can use from stdlib, particularly after the deprecation that led to #188. If not, we may have to pull addressable in as a dependency. I'd like to avoid that if possible, but correctness wins over that preference.

@Sinjo
Copy link
Member

Sinjo commented Jun 9, 2021

Okay, that bug is fixed and I've merged #220 as an intermediate step in this work. The remaining items are:

  • Stop treating instance as a special label in the grouping key (neither of the python or go client do)
  • Support passing arbitrary labels to be used in the grouping key
  • Validate labels
  • Use base64 encoding for label values containing /
  • Raise an error when grouping key labels conflict with metric labels
  • Raise errors for non-2xx responses (currently a 400 from sending a label with a / or its encoded form just fails silently)

@Sinjo
Copy link
Member

Sinjo commented Dec 24, 2021

I've just been working through this, and was looking at reusing LabelSetValidator on the grouping key. I noticed that it doesn't validate the character set used in the label name, which is specified here in the Prometheus docs.

Seeing as we're going to be shipping this as part of a major, breaking release, I'm going to add that validation to LabelSetValidator.

@Sinjo
Copy link
Member

Sinjo commented Jan 2, 2022

Guess a couple of quiet days over Christmas/New Years was just what I needed 😅

#234 is ready for review! As the changes to the push client are fairly drastic, I've done it as a series of small commits for ease of review.

@Sinjo
Copy link
Member

Sinjo commented Jan 9, 2022

With #234 merged this is done, and will be in our next release (3.0.0).

My plan is to get a PR in for #170 before we make that release, as the fix for it is also a breaking change.

@Sinjo Sinjo closed this as completed Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants