Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

objstore/s3: make IdleConnTimeout in http.Transport configurable #567

Merged
merged 8 commits into from
Nov 16, 2018

Conversation

GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented Oct 10, 2018

Let's make the http.Transport used inside objstore/s3 configurable. It helps in some cases where a HAProxy in front of the S3 storage sometimes has smaller time-out values and then spurious messages are printed to the logs which are actually harmless. It seems that it also helps with invalid HTTP queries that Go's stdlib sometimes sends to check if a connection is still alive.

Changes

Added new fields to the Config struct and added the default values from the code before. No impact to the user at all if they have nothing specified in those config files.

The documentation should be updated as well but I am not sure what kind of format should I follow for the docs so I left it up to you :)

Verification

Added unit tests for checking the new parsing function which has default values.

minio/minio-go#1037 check this link for more information.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion -> remember you are defining here an "API" that we aim to be stable, so we need to be careful here

SignatureV2 bool `yaml:"signature-version2"`
SSEEncryption bool `yaml:"encrypt-sse"`
SecretKey string `yaml:"secret-key"`
IdleConnTimeout int `yaml:"idle-conn-timeout`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using model.Time here? It nicely marshals from 5s or 5m . I think it would better than guessing what is the unit of those (:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mean model.Duration? It seems like model.Time can only unmarshal strings of format "a.b" and then it converts it into seconds whereas model.Duration has a whole parser for different units like you said: https://github.com/prometheus/common/blob/master/model/time.go#L182

I agree with you there, though. Some software only recognizes and accepts seconds however we could be smarter here and re-use some of the Prometheus code to implement it properly :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant Duration, sorry (:

@jturolla
Copy link

What problem are you trying to solve? Do you have issues with the defaults?

We have not encountered issues with this, maybe if you're experiencing timeouts you should move to a bucket near your location?

@GiedriusS
Copy link
Member Author

GiedriusS commented Oct 18, 2018

Yes, I guess that they probably only make sense when using Google's or Amazon's S3 storage. In our setup, we have a HAProxy in front of a Ceph-based S3 storage. This is related to keepalive connection timeouts mismatches between the HAProxy and the http.Transport in s3.go in Thanos. I outlined the issue here: minio/minio-go#1037. I could provide you the exact timeout values but you get the idea: the mismatch causes that harmless message to be printed from time to time, and the implementation of http.Transport also prints a message that a keep-alive connection had been closed unexpectedly, when that 400 error happens, in stdout.
The bucket is certainly close to the location where Thanos is running, the latencies are around <1ms. :) That's not the problem.

@GiedriusS
Copy link
Member Author

Sorry for the mess with the commits and rebasing, should be all OK now :)

@GiedriusS GiedriusS changed the title objstore/s3: make s3 client configurable objstore/s3: make http.Transport configurable Oct 18, 2018
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK two things comes to my mind when I reviewed this PR.

A) Since config is part of our interface, adding additional (even optional) fields makes our interface wider,
which increases complexity.

Are you adding those just in case of you need to tweak particular option? If only particular one, can we just add that one?

Why we cannot have ONE defaults that works for all cases? Isn't that possible?

B) What if someone want to tweak let's say IdleConnTimeout but for GCS provider? We would end up doing same code there as well? Maybe doing this in common place makes sense?

Sorry for delay in review!

@@ -60,7 +68,16 @@ type Bucket struct {

// NewBucket returns a new Bucket using the provided s3 config values.
func NewBucket(logger log.Logger, conf []byte, reg prometheus.Registerer, component string) (*Bucket, error) {
var config Config
defIdleConnTimeout, _ := model.ParseDuration("90s")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Defaults could be done as package var and have model.MustParseDuration
    1a. Why even parsing? 90 * time.Second does not work here?
  2. This strictly relays on yaml Marshaling and how it deals if fields is not filled. We would feel more safe if we had a unit test for this?
  3. Let's maybe pack this into function? so we can nicely test it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here was to use model.ParseDuration so that we would less associate with the actual underlying type of model.Duration but I guess I will change it to just use 90 * time.Second if that is OK with you. I agree that it definitely makes things clearer. As for other points, I agree. I thought it was the default and only acceptable behaviour of unmarshal function but we can move it into a function - it would be much nicer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@GiedriusS
Copy link
Member Author

In our particular case only IdleConnTimeout is enough but what about the other options? Perhaps someone may want to change the other ones as well so I added these to this PR just in case. I don't quite agree with adding one single option for tweaking the idle connections timeout because e.g. other storage providers may not use at all the HTTP protocol underneath their calls.
I think that having one single defaults like this isn't really possible because someone might have e.g. different timeout settings in their haproxy instances and it might be impossible to change on them on the server itself if, for example, it is used intensively by other applications.
I'll make changes and update the PR. Thank you for your review.

@bwplotka
Copy link
Member

Perhaps someone may want to change the other ones as well so I added these to this PR just in case.

Yes, but it's not a problem to add this later if needed. Those options are quite low level and requires deep knowledge on HTTP client library trivia. I would vote for adding what you need and also pack it in some http-config nested structure, so it will look like:

type: S3
config:
  http_config:
    idle_conn_timeout: <val>

What do you think?

SignatureV2 bool `yaml:"signature-version2"`
SSEEncryption bool `yaml:"encrypt-sse"`
SecretKey string `yaml:"secret-key"`
IdleConnTimeout model.Duration `yaml:"idle-conn-timeout"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do yaml names with underscore? I don't know how we slipped the dash convention, but I would stay with the style Prometheus has: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E

@GiedriusS
Copy link
Member Author

@bwplotka could you PTAL again? I moved the HTTP config to a different struct and the parsing to a separate function, added a simple unit test for it. I still think there is value in being able to configure other values as well. Being able to configure idle connection timeouts works for us here but someone might run into other issues if they have other settings in their reverse proxy.

@GiedriusS
Copy link
Member Author

@bwplotka ping.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions, but overall LGTM

It's overall YAGNI rule - to not add something that no one wants right now, just because it will be maybe useful (: but not a blocker here.

HTTPConfig HTTPConfig `yaml:"http_config"`
}

// HTTPConfig stores the http.Transport configuration for the s3 minio client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing trailing period

@@ -48,10 +61,24 @@ type Bucket struct {
sse encrypt.ServerSide
}

// ParseConfig unmarshals a buffer into a Config with default HTTPConfig values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing trailing period

@@ -48,10 +61,24 @@ type Bucket struct {
sse encrypt.ServerSide
}

// ParseConfig unmarshals a buffer into a Config with default HTTPConfig values
func ParseConfig(conf []byte) (Config, error) {
defaultHTTPConfig := HTTPConfig{IdleConnTimeout: model.Duration(90 * time.Second),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's format it nicely. We need to put newline for each field as the number of elements is large.


// HTTPConfig stores the http.Transport configuration for the s3 minio client
type HTTPConfig struct {
IdleConnTimeout model.Duration `yaml:"idle_conn_timeout"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really suggest just doing the thing you need right now which is IdleConnTimeout explicitly for HA Proxy.

I feel we should stick to YAGNI here and not introduce something that is not useful right now: https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it

@GiedriusS
Copy link
Member Author

GiedriusS commented Nov 15, 2018

Updated the PR according to your comments. I don't see a problem with giving the users complete power in terms of configuration but I understand your concerns and I only left idle_conn_timeout for now since that is what is causing these problems in the first place. Please take a look when you have time.

@GiedriusS GiedriusS changed the title objstore/s3: make http.Transport configurable objstore/s3: make IdleConnTimeout in http.Transport configurable Nov 15, 2018
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bwplotka bwplotka merged commit 10657cb into thanos-io:master Nov 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants