Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable configuring Net::HTTP timeouts #222

Merged
merged 2 commits into from May 12, 2021
Merged

Conversation

nickvanw
Copy link
Contributor

@nickvanw nickvanw commented Apr 5, 2021

馃憢 !

We recently had an issue in our infrastructure where a Push Gateway was getting overloaded due to too much metric cardinality in the time series data that we were sending. We remedied this issue and things are well, but part of the way we noticed was that these pushes were happening in our job workers, which we have a lot of timing instrumentation around.

Digging in a bit further, we noticed that a huge number jobs were taking 60 seconds to complete, when they are normally much (sub-second) faster. We realized that this was due to our calls out to the push gateway, which would eventually time out. We wanted to be able to set the push timeout to be significantly lower (1-5 seconds) so that we can fail fast and avoid eating into our job capacity if there is ever an issue with our gateways again.

Looking at the Ruby documentation for Net::HTTP, both read_timeout and open_timeout have a default timeout of 60 seconds:

If the HTTP object cannot read data in this many seconds, it raises a Net::ReadTimeout exception. The default value is 60 seconds.
If the HTTP object cannot open a connection in this many seconds, it raises a Net::OpenTimeout exception. The default value is 60 seconds.

This PR allows a user to pass through the read_timeout and open_timeout in the initialization of a new Push class, but retains the defaults otherwise. I am not in love with using **kwargs, and am open to other methods of configuring this value (and happy to implement them myself). I do believe that it's important to give the user control of these values however.

Thanks!

@coveralls
Copy link

coveralls commented Apr 6, 2021

Coverage Status

Coverage remained the same at 100.0% when pulling 546b535 on nickvanw:master into a32180a on prometheus:master.

Signed-off-by: Nick Van Wiggeren <nick@planetscale.com>
@Sinjo
Copy link
Member

Sinjo commented May 12, 2021

Hey @nickvanw, thanks for the PR!

I agree we should have a way to pass these through.

I'm actually in the middle of a rework of the pushgateway client over in #220 that switchers the initializer over to keyword arguments entirely (we missed updating this class to them in the big rework in #95) so I'm totally cool with using them.

The only change I'd ask for before merging is to add a test for these extra arguments. Just something like this that checks we're passing the parameters along.

Signed-off-by: Nick Van Wiggeren <nick@planetscale.com>
@nickvanw
Copy link
Contributor Author

nickvanw commented May 12, 2021

Thanks @Sinjo!

I updated the fixture to take these arguments, and all of the relevant tests to check for them. Let me know if you want a separate test specifically for this (or, even, a specific instance of the gateway to test this).

Copy link
Member

@Sinjo Sinjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm gonna refactor these tests so that the expectations aren't so duplicated (what with the use_ssl assertion already being repeated a bunch), but I'm not gonna hold up this PR with that.

Thanks for the contribution!

@Sinjo Sinjo merged commit f4ce120 into prometheus:master May 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants