Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proxy pre-compressed data as-is #358

Closed
aukaost opened this issue May 2, 2023 · 13 comments
Closed

Proxy pre-compressed data as-is #358

aukaost opened this issue May 2, 2023 · 13 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@aukaost
Copy link

aukaost commented May 2, 2023

Hi,
Thanks for this project!
If you take a file which is gzipped and has Content-Encoding set to gzip on the object's meta data it seems to be served uncompressed.
Is it possible to proxy those objects as-is?

Thanks!

@oxyno-zeta
Copy link
Owner

Hello @aukaost ,

Happy to see that this project is interesting for multiple people ;)

I'm afraid that this coming from your browser when you download the file. I've already seen this when I used the real S3 interface.
Did you try to manually change the content-type in S3 and test ? If this is working, it means that this is coming from your browser and you could use the header override feature to avoid this behavior. If not, we will need to check more in details ;)

Oxyno-zeta

@aukaost
Copy link
Author

aukaost commented May 2, 2023

I'll add some more detail :)
Our assets are all pre-compressed in the bucket and we do want the browser to do its thing via Accept-Encoding and transparently decompress the assets in the browser. This is all working as intended when running using curl, e.g. curl https://<bucket>.s3.us-west-1.amazonaws.com/object.txt -H "Accept-Encoding: gzip" will return a compressed object with a Content-Encoding: gzip header.
If I request the same asset through s3-proxy, the result is a decompressed object with no Content-Encoding header.
So it seems that the result from the request between s3-proxy and s3 gets decompressed and then served uncompressed.
We could enable compression on s3-proxy but that would mean decompressing and re-compressing without need.

@oxyno-zeta
Copy link
Owner

Interesting...

I will need a bit of time to debug this.
I wonder if this isn't coming from the AWS client itself.

But thanks for all those details !

@aukaost
Copy link
Author

aukaost commented May 3, 2023

I'll also have a look, just wanted to check if I was missing anything before I do!

@aukaost
Copy link
Author

aukaost commented May 3, 2023

aws/aws-sdk-go#1292 (comment)
https://github.com/oxyno-zeta/s3-proxy/blob/master/pkg/s3-proxy/s3client/s3Context.go#L226

	obj, err := s3ctx.svcClient.GetObjectWithContext(ctx, s3Input,
		func(r *request.Request) {
			// Comment out to have transport decode the object.
			r.HTTPRequest.Header.Add("Accept-Encoding", "gzip")
		})

Tested s3-proxy with this modification and that seems to do the job.
I suppose the proper way would be to forward the client request's Accept-Encoding to GetObjectWithContext.

➜  ~ curl http://127.0.0.1:8080/hello.txt -vs | gunzip
*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /hello.txt HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: no-cache, no-store, no-transform, must-revalidate, private, max-age=0
< Content-Encoding: gzip
< Content-Length: 35
< Content-Type: text/plain
< Etag: "2ab3bdddd9d55d30c05b22c08658c436"
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Last-Modified: Tue, 02 May 2023 17:22:07 GMT
< Pragma: no-cache
< X-Accel-Expires: 0
< Date: Wed, 03 May 2023 08:13:14 GMT
<
{ [35 bytes data]
* Connection #0 to host 127.0.0.1 left intact
hello%

@oxyno-zeta
Copy link
Owner

Thanks for your input and tests !

Now that we know how to patch it, I need to think about a way to add configuration to allow people sending whatever they want :) .
Thanks a lot for this new feature "spec" 😄 .

@oxyno-zeta
Copy link
Owner

oxyno-zeta commented May 3, 2023

@aukaost I spend few hours to develop the feature and I was on the test phase and can't reproduce your problem...

With this example:

echo "Fake" > file.txt
gzip file.txt
aws s3 cp file.txt.gz s3://TARGET/file.txt.gz

Making those requests:

curl http://localhost:8080/mm/file.txt.gz -vs | gunzip
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /mm/file.txt.gz HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: must-revalidate, max-age=0
< Content-Length: 34
< Content-Type: text/plain
< Etag: "6d1220d5af136cb2cc4fa8bc51dde518"
< Last-Modified: Wed, 03 May 2023 21:07:50 GMT
< Vary: Origin
< Date: Wed, 03 May 2023 21:13:47 GMT
< 
{ [34 bytes data]
* Connection #0 to host localhost left intact
Fake
 curl http://localhost:8080/mm/file.txt.gz -vs | cat 
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /mm/file-gz.txt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: must-revalidate, max-age=0
< Content-Length: 34
< Content-Type: text/plain
< Etag: "6d1220d5af136cb2cc4fa8bc51dde518"
< Last-Modified: Wed, 03 May 2023 21:17:33 GMT
< Vary: Origin
< Date: Wed, 03 May 2023 21:17:50 GMT
< 
{ [34 bytes data]
* Connection #0 to host localhost left intact
��Rdfile.txtsK�N����.% 

So... Seems that everything is intact.

Am I missing something ?

Can you give me a procedure, a configuration and the version you are using please ?

@oxyno-zeta oxyno-zeta self-assigned this May 3, 2023
@oxyno-zeta oxyno-zeta added enhancement New feature or request good first issue Good for newcomers question Further information is requested labels May 3, 2023
@aukaost
Copy link
Author

aukaost commented May 4, 2023

image
You need to update the s3 object's metadata to include Content-Encoding: gzip.
This way s3 will return this header and will make the s3client decompress the object transparently.

I've been testing with 4.8.0, the config is very basic:

targets:
  sourcemaps:
    mount:
      path:
        - /
    bucket:
      name: <bucket>
      region: us-west-1
server:
  listenAddr: ""
  port: 80

@oxyno-zeta
Copy link
Owner

Ahhh that explains why I wasn't able to reproduce it.

Now I am ! And I'm able to make it work as desired too.
But I'm not able to reproduce it in tests due to the fact that I'm using a fake S3 in-memory... I will check how difficult will it be to migrate to minio for tests.

In all cases, I will probably release early next week.

@aukaost
Copy link
Author

aukaost commented May 4, 2023

That's really awesome, thanks for that :)

oxyno-zeta added a commit that referenced this issue May 8, 2023
@oxyno-zeta
Copy link
Owner

I've tested to switch to Minio for tests but it will required more work than expected...

I've decided to release as is the code.

4.9.0 is released with your needed patch ! 🎉

Have a good rest of your day,

Oxyno-zeta

@oxyno-zeta oxyno-zeta removed the question Further information is requested label May 8, 2023
@aukaost
Copy link
Author

aukaost commented May 9, 2023

Works like a charm, thanks a lot!

@oxyno-zeta
Copy link
Owner

@aukaost I'm very happy to see that and you are welcome!

Do not hesitate if you see a bug or a new feature idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants