Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HTTP Output Compression #4807

Merged
merged 3 commits into from
Oct 5, 2018

Conversation

mihaitodor
Copy link
Contributor

@mihaitodor mihaitodor commented Oct 4, 2018

Fixes #4472.

TODO:

  • Should there be some refactoring done to avoid duplicating compressWithGzip here, since it's already present in influxdb and in influxdb_v2?
  • Would you be interested in caching gzip writers similar to here for better performance?
  • Do you really want to support "deflate" compression besides "gzip"?
  • If "br" stands for "brotli", it's not supported natively yet: golang support google/brotli#182 Do you want to bring in a 3rd party package for this if support for it is required?

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@glinton glinton added hacktoberfest feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin labels Oct 4, 2018
@glinton
Copy link
Contributor

glinton commented Oct 4, 2018

  1. I think that's a great idea, though obviously a bit more involved
  2. Also likely worth implementing some form of writer caching
  3. Personally, I'm fine with just gzip
  4. I don't think it's worth adding, but I think having some generic compressor/writer pool that can be extended to add deflate/br sometime in the future wouldn't be a bad idea.

I'm curious to know, and would wait to find out, what @danielnelson has to say on the matter though.

@danielnelson
Copy link
Contributor

We could move the shared code under internal, it sounds like it would be a good idea.

I would rather avoid a pool for gzip.Writer. If it is a performance bottleneck, it may make sense to reuse the Writer between calls to Write, within a plugin instance, but I would want to see some benchmarks showing the gzip.Writer as a significant portion of write time first.

On brotli, I don't think it has enough adoption to warrant, I'm just hearing of it for the first time now.

@mihaitodor mihaitodor force-pushed the http-output-compression branch 2 times, most recently from 8954aff to a53e591 Compare October 5, 2018 00:34
@mihaitodor
Copy link
Contributor Author

mihaitodor commented Oct 5, 2018

@glinton @danielnelson Thanks for answering my questions! I have implemented 1 in a53e591.

Studying this CompressWithGzip function in more detail, I think its design is not safe. io.Copy(gzipWriter, data) is executed in the background, which means that the internal goroutine will be still running when CompressWithGzip returns, so the caller will never know if io.Copy(gzipWriter, data) returns an error eventually. Do you have any insight into why it was designed this way? I'd probably consider returning an error channel to the caller instead and have it wait for the channel to receive the error. WDYT?

Regarding 2, I'd rather leave the pool for gzip.Writer for a future PR, if you're happy with the current performance. I'm not too keen to spend time on coming up with benchmarks for this right now.

Regarding 3 and 4, I asked because #4472 asks explicitly for "gzip, deflate, br, etc", so wasn't sure what your expectations are. Glad to hear that you're OK with having only gzip. I think it should be enough for most situations.

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setting of the error in CompressWithGzip is definitely wrong, but I don't think an error channel would be the right approach here as we really want any reader errors passed though the Pipe, this way the user of the pipe (http package in this case) gets the error. This would give consistent behavior with other errors in the Reader.

I'm not immediately sure how to implement this, but we shouldn't return the error from CompressWithGzip since we can't detect an error at this point. Also, fixing the error handling is optional for this PR, since it wouldn't be a step backwards.

var reqBodyBuffer io.Reader = bytes.NewBuffer(reqBody)
var err error

if h.ContentEncoding == "gzip" && h.Method == http.MethodPost {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only for POST method?

Copy link
Contributor Author

@mihaitodor mihaitodor Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what was requested in #4472. It will add a bit of overhead to requests which don't have a body, but I wouldn't normally consider such small optimisations. Maybe @madisonleavo can offer additional insight. If not, I can delete that condition. Just let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the requests currently have a body in this plugin, and it is hard to imagine a new data format that wouldn't produce it, perhaps a null data format? At any rate we currently do POST and PUT and I think they should both use the encoding if it is requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Done in 26ee478.


## HTTP Content-Encoding for write request body, can be set to "gzip" to
## compress body or empty string to not apply any encoding.
# content_encoding = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is minor but can you document like the influxdb option with the identity setting as default, no code needs to change:

  ## HTTP Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "identity"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it out on purpose as requested in #4472. Changed it in c5502a0.

@mihaitodor
Copy link
Contributor Author

mihaitodor commented Oct 5, 2018

@danielnelson

The setting of the error in CompressWithGzip is definitely wrong, but I don't think an error channel would be the right approach here as we really want any reader errors passed though the Pipe, this way the user of the pipe (http package in this case) gets the error. This would give consistent behavior with other errors in the Reader.

Indeed, you are right! I got carried away thinking that it would be enough to return an error in the write method of the HTTP output plugin.

How about calling CloseWithError on the PipeWriter? I can simply pass into it the error returned by io.Copy, because:

subsequent reads from the read half of the pipe will return no bytes and the error err, or EOF if err is nil.

Also, fixing the error handling is optional for this PR, since it wouldn't be a step backwards.

I agree. Will open a new issue.

@danielnelson
Copy link
Contributor

That sounds perfect

@mihaitodor
Copy link
Contributor Author

@danielnelson OK, I believe I have addressed all the required changes in this PR. Please let me know if I missed anything.

@danielnelson
Copy link
Contributor

Did you do the CloseWithError idea, I'm not seeing it?

Propagate io.Copy errors through the io.Pipe
@mihaitodor
Copy link
Contributor Author

mihaitodor commented Oct 5, 2018

Oops, communication issue. I thought you preferred to have a new PR for it. Added it in 11cdf77.

@danielnelson danielnelson merged commit f3da717 into influxdata:master Oct 5, 2018
@danielnelson danielnelson added this to the 1.9.0 milestone Oct 5, 2018
@danielnelson
Copy link
Contributor

Thanks!

rgitzel pushed a commit to rgitzel/telegraf that referenced this pull request Oct 17, 2018
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
dupondje pushed a commit to dupondje/telegraf that referenced this pull request Apr 22, 2019
athoune pushed a commit to bearstech/telegraf that referenced this pull request Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants