Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blob/gcsblob: Allow blank Content-Type to be set #3298

Closed
stanhu opened this issue Aug 7, 2023 · 8 comments · Fixed by #3371
Closed

blob/gcsblob: Allow blank Content-Type to be set #3298

stanhu opened this issue Aug 7, 2023 · 8 comments · Fixed by #3371
Labels
needs info Further discussion or clarification is necessary

Comments

@stanhu
Copy link
Contributor

stanhu commented Aug 7, 2023

Describe the bug

Google Cloud Storage allows a client to request a blob with a different Content-Type via the response-content-type query parameter only if the blob's Content-Type metadata has NOT already been set.

This differs from Amazon S3, which allows response-content-type to be requested regardless of the blob metadata.

With Go Cloud, It's currently not possible to keep the Content-Type value blank for GCS. https://github.com/google/go-cloud/blob/v0.30.0/internal/docs/design.md#portable-types-and-drivers says:

Consider the Bucket.NewWriter method, which infers the content type of the blob based on the first bytes written to it.

That seems to be happening here:

go-cloud/blob/blob.go

Lines 1109 to 1121 in d2d5bed

if opts.ContentType != "" {
t, p, err := mime.ParseMediaType(opts.ContentType)
if err != nil {
cancel()
return nil, err
}
ct := mime.FormatMediaType(t, p)
dw, err := b.b.NewTypedWriter(ctx, key, ct, dopts)
if err != nil {
cancel()
return nil, wrapError(b.b, err, key)
}
w.w = dw

To Reproduce

  1. Create an object via Bucket.NewWriter with Content-Type blank in WriterOptions (https://pkg.go.dev/github.com/google/go-cloud/blob#WriterOptions).
  2. Observe in the bucket that the file has a Content-Type already set.

Expected behavior

The metadata should remain blank so that clients can override the type on request.

Version

v0.30.0, but this has been true for a while.

@stanhu
Copy link
Contributor Author

stanhu commented Aug 7, 2023

@vangent What do you think about introducing a WriterOptions that allows disabling of Content-Type auto-detection? I suppose that might make NewTypedWriter confusing if Content-Type remains blank.

@vangent
Copy link
Contributor

vangent commented Aug 21, 2023

Yea, the driver interface says

        // contentType sets the MIME type of the object to be written. It must not be empty.                                                                                                

so I'm a bit leery of breaking that promise.

Google Cloud Storage allows a client to request a blob with a different Content-Type via the response-content-type query
parameter only if the blob's Content-Type metadata has NOT already been set.

Sorry, I don't quite follow this, can you explain? What are you trying to do? The documentation for this header says

The response-content-type command sets the values to allow in HTTP Content-Type headers. Use this command as many times as needed to create a list of HTTP Content-Type values to allow. If you do not specify a content type, all content types are allowed.

Based on that, it sounds to me like GCS's behavior is correct, no? I think you're implying that you're passing a response-content-type header to AWS and getting back blobs that have a different Content-Type, which sounds wrong.

@vangent vangent added the needs info Further discussion or clarification is necessary label Aug 22, 2023
@vangent vangent closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2023
@stanhu
Copy link
Contributor Author

stanhu commented Dec 14, 2023

Sorry, I don't quite follow this, can you explain? What are you trying to do?

@vagent Sorry I took so long to respond. We have a system where users upload files via Git LFS (Large File System), and they can browse their project. If they click on a LFS file, we want to be careful we set both Content-Disposition and Content-Type properly so that the browser does the right thing. For example, if there's a PDF, we want to send it as a browser attachment due to security concerns. For images, we allow them to be displayed inline by the browser.

When a LFS file is uploaded, we originally stored them as Content-Type: application/octet-stream since they are generally just binary blobs to us.

However, if the file is an image such as JPEG or PNG, we want to display the file to the user with the right Content-Type so they get rendered by the browser.

With GCS, if the object has been stored with application/octet-stream already, you can't create a presigned URL that returns a different Content-Type via Response-Content-Type. You will always get application/octet-stream.

With AWS, even if the object has been stored as application/octet-stream, you can create a presigned URL with a different Content-Type with response-content-type.

It's easier to tag the Content-Type on the way out based on the file extension because generally this is only needed in the case when a user clicks to view an image.

Based on that, it sounds to me like GCS's behavior is correct, no? I think you're implying that you're passing a response-content-type header to AWS and getting back blobs that have a different Content-Type, which sounds wrong.

That's an acceptable use case in AWS, though. Most of the time we don't want to inspect the blob to determine Content-Type since we have millions of these uploads a day. It's also easy to come up with scenarios where we incorrectly tag the Content-Type and users need to override the stored values with something more specific.

@vangent
Copy link
Contributor

vangent commented Dec 15, 2023

Where does Go CDK fit into the above? You suggested updating WriterOptions, but it doesn't sound like you want to change write behavior. Maybe you're using SignedURL?

@vangent vangent reopened this Dec 15, 2023
@stanhu
Copy link
Contributor Author

stanhu commented Dec 15, 2023

Where does Go CDK fit into the above? You suggested updating WriterOptions, but it doesn't sound like you want to change write behavior. Maybe you're using SignedURL?

We use Bucket.NewWriter, and it seems to sniff a Content-Type: https://github.com/google/go-cloud/blob/d2d5bedb50683e2a6b893b75aafc193eca2715db/blob/blob.go#L455C2-L459

We want to leave this Content-Type blank on write and disable this sniffing, since it's easy to get this value wrong. For example, Microsoft Office files are notorious for having a more specific Content-Type than application/zip.

@vangent
Copy link
Contributor

vangent commented Dec 21, 2023

See #3371.

Note that I can't really test this very well, because even though we write the blobs to the backend with an empty ContentType, when we read it back it's not blank. DIfferent backends return different things (e.g., s3 returns "application/octet-stream", while GCS seems to do its own content sniffing).

@dimameshcharakou
Copy link

Many thanks for the work @vangent!

I was testing the changes and disabled the content type detection, but anyhow it was still set. The type that appeared in GCS seems to be auto detected since it's not the real one:

Screenshot_2024-01-21_at_20 45 11

@stanhu
Copy link
Contributor Author

stanhu commented Feb 15, 2024

It looks like the Google Storage SDK also does Content-Type sniffing: googleapis/google-cloud-go#9430

stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 16, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 16, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 16, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 16, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 16, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 20, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 20, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 21, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 21, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 22, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
stanhu added a commit to stanhu/google-cloud-go that referenced this issue Feb 22, 2024
As described in
google/go-cloud#3298 (comment),
we want to disable automatic `Content-Type` detection when
inserting an object to Google Cloud Storage (GCS).

Previously it wasn't possible to disable this auto-detection, even
though `googleapi.MediaOptions` provides a `ForceEmptyContentType`
option (https://github.com/googleapis/google-api-go-client/blob/v0.165.0/googleapi/googleapi.go#L283).

We enable this by adding a `Writer` option to set this value.

Closes googleapis#9430
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs info Further discussion or clarification is necessary
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants