
feat: auto content-type on blob creation #338

Merged · 8 commits · Jul 15, 2020 · Changes from 5 commits
@@ -1809,7 +1809,8 @@ public static Builder newBuilder() {
  Bucket create(BucketInfo bucketInfo, BucketTargetOption... options);

  /**
-   * Creates a new blob with no content.
+   * Creates a new blob with no content. The content type is detected from the blob name if not
+   * explicitly set.
   *
   * <p>Example of creating a blob with no content.
   *
@@ -1830,9 +1831,10 @@ public static Builder newBuilder() {
   * Creates a new blob. Direct upload is used to upload {@code content}. For large content, {@link
   * #writer} is recommended as it uses resumable upload. MD5 and CRC32C hashes of {@code content}
   * are computed and used for validating transferred data. Accepts an optional userProject {@link
-   * BlobGetOption} option which defines the project id to assign operational costs.
+   * BlobGetOption} option which defines the project id to assign operational costs. The content
+   * type is detected from the blob name if not explicitly set.
   *
-   * <p>Example of creating a blob from a byte array.
+   * <p>Example of creating a blob from a byte array:
   *
   * <pre>{@code
   * String bucketName = "my-unique-bucket";
@@ -1853,9 +1855,9 @@ public static Builder newBuilder() {
   * {@code content}. For large content, {@link #writer} is recommended as it uses resumable upload.
   * MD5 and CRC32C hashes of {@code content} are computed and used for validating transferred data.
   * Accepts a userProject {@link BlobGetOption} option, which defines the project id to assign
-   * operational costs.
+   * operational costs. The content type is detected from the blob name if not explicitly set.
   *
-   * <p>Example of creating a blob from a byte array.
+   * <p>Example of creating a blob from a byte array:
   *
   * <pre>{@code
   * String bucketName = "my-unique-bucket";
@@ -1874,10 +1876,11 @@ Blob create(

  /**
   * Creates a new blob. Direct upload is used to upload {@code content}. For large content, {@link
-   * #writer} is recommended as it uses resumable upload. By default any md5 and crc32c values in
+   * #writer} is recommended as it uses resumable upload. By default any MD5 and CRC32C values in
   * the given {@code blobInfo} are ignored unless requested via the {@code
   * BlobWriteOption.md5Match} and {@code BlobWriteOption.crc32cMatch} options. The given input
-   * stream is closed upon success.
+   * stream is closed upon success. The content type is detected from the blob name if not
+   * explicitly set.
   *
   * <p>This method is marked as {@link Deprecated} because it cannot safely retry, given that it
   * accepts an {@link InputStream} which can only be consumed once.
@@ -2490,11 +2493,12 @@ Blob create(
  ReadChannel reader(BlobId blob, BlobSourceOption... options);

  /**
-   * Creates a blob and return a channel for writing its content. By default any md5 and crc32c
-   * values in the given {@code blobInfo} are ignored unless requested via the {@code
-   * BlobWriteOption.md5Match} and {@code BlobWriteOption.crc32cMatch} options.
+   * Creates a blob and returns a channel for writing its content. The content type is detected from
+   * the blob name if not explicitly set. By default any MD5 and CRC32C values in the given {@code
+   * blobInfo} are ignored unless requested via the {@code BlobWriteOption.md5Match} and {@code
+   * BlobWriteOption.crc32cMatch} options.
   *
-   * <p>Example of writing a blob's content through a writer.
+   * <p>Example of writing a blob's content through a writer:
   *
   * <pre>{@code
   * String bucketName = "my-unique-bucket";
@@ -77,9 +77,12 @@
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;
+import java.net.FileNameMap;
+import java.net.URLConnection;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
+import java.util.Locale;
import java.util.Map;

public class HttpStorageRpc implements StorageRpc {
@@ -98,6 +101,7 @@ public class HttpStorageRpc implements StorageRpc {
  private final HttpRequestInitializer batchRequestInitializer;

  private static final long MEGABYTE = 1024L * 1024L;
+  private static final FileNameMap FILE_NAME_MAP = URLConnection.getFileNameMap();

  public HttpStorageRpc(StorageOptions options) {
    HttpTransportOptions transportOptions = (HttpTransportOptions) options.getTransportOptions();
@@ -286,7 +290,7 @@ public StorageObject create(
        .insert(
            storageObject.getBucket(),
            storageObject,
-            new InputStreamContent(storageObject.getContentType(), content));
+            new InputStreamContent(detectContentType(storageObject), content));
    insert.getMediaHttpUploader().setDirectUploadEnabled(true);
    Boolean disableGzipContent = Option.IF_DISABLE_GZIP_CONTENT.getBoolean(options);
    if (disableGzipContent != null) {
@@ -372,6 +376,16 @@ public Tuple<String, Iterable<StorageObject>> list(final String bucket, Map<Opti
}
}

+  private static String detectContentType(StorageObject object) {
+    String contentType = object.getContentType();
+    if (contentType != null) {
+      return contentType;
+    }
+    return firstNonNull(
+        FILE_NAME_MAP.getContentTypeFor(object.getName().toLowerCase(Locale.ENGLISH)),
+        "application/octet-stream");
+  }

  private static Function<String, StorageObject> objectFromPrefix(final String bucket) {
    return new Function<String, StorageObject>() {
      @Override

Review thread on detectContentType:

Member: This should be an optional feature that's disabled by default.

Contributor (author): I was thinking about making this feature optional, but I didn't find an appropriate place to store this setting. It could be a property of a project/client/bucket/object.

I only found https://cloud.google.com/storage/docs/metadata#content-type, which states: "If the Content-Type is not specified by the uploader and cannot be determined, it is set to application/octet-stream or application/x-www-form-urlencoded, depending on how you uploaded the object."

If I read this correctly, Storage should attempt to determine the content type if not explicitly set, so this feature should be on by default.

BTW, right now the content type of blobs created with Storage.create() methods is null, but it should be application/octet-stream.

Member: Storage doesn't provide contentType detection in the API. If contentType is not set at creation, then application/octet-stream is used. Please make this optional; otherwise it's a behavior change not originally intended for uploads with the Java client.
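As a standalone illustration (not part of the PR; the class and method names below are mine), the same fallback chain can be reproduced with only the JDK: an explicit type wins, then the extension lookup from `URLConnection.getFileNameMap()` (lower-cased so names like `Pic.Jpg` still match), then `application/octet-stream`. The expected values assume the standard JDK content-type table.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.Locale;

public class DetectSketch {

  // The JDK's built-in extension-to-MIME-type table.
  private static final FileNameMap FILE_NAME_MAP = URLConnection.getFileNameMap();

  // Explicit type wins; otherwise guess from the (lower-cased) name; otherwise fall back.
  static String detect(String explicitType, String blobName) {
    if (explicitType != null) {
      return explicitType;
    }
    String guessed = FILE_NAME_MAP.getContentTypeFor(blobName.toLowerCase(Locale.ENGLISH));
    return guessed != null ? guessed : "application/octet-stream";
  }

  public static void main(String[] args) {
    System.out.println(detect(null, "file1.txt"));               // text/plain
    System.out.println(detect(null, "dir with spaces/Pic.Jpg")); // image/jpeg
    System.out.println(detect(null, "no_extension"));            // application/octet-stream
    System.out.println(detect("custom/type", "file1.txt"));      // custom/type
  }
}
```

Note that `FileNameMap` matches only on the extension after the last dot, so directory components and spaces in the name are irrelevant.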
@@ -811,9 +825,7 @@ public String open(StorageObject object, Map<Option, ?> options) {
    HttpRequest httpRequest =
        requestFactory.buildPostRequest(url, new JsonHttpContent(jsonFactory, object));
    HttpHeaders requestHeaders = httpRequest.getHeaders();
-    requestHeaders.set(
-        "X-Upload-Content-Type",
-        firstNonNull(object.getContentType(), "application/octet-stream"));
+    requestHeaders.set("X-Upload-Content-Type", detectContentType(object));
    String key = Option.CUSTOMER_SUPPLIED_KEY.getString(options);
    if (key != null) {
      BaseEncoding base64 = BaseEncoding.base64();
@@ -3273,4 +3273,45 @@ public void testBlobReload() throws Exception {
updated.delete();
assertNull(updated.reload());
}

+  private Blob createBlob(String method, BlobInfo blobInfo) throws IOException {
+    switch (method) {
+      case "create":
+        return storage.create(blobInfo);
+      case "writer":
+        {
+          storage.writer(blobInfo).close();
+          return storage.get(BlobId.of(blobInfo.getBucket(), blobInfo.getName()));
+        }
+      default:
+        throw new IllegalArgumentException("Unknown method " + method);
+    }
+  }

Contributor (reviewer), on the "writer" case: why the braces here?

+  private void testAutoContentType(String method) throws IOException {
+    String[] names = {"file1.txt", "dir with spaces/Pic.Jpg", "no_extension"};
+    String[] types = {"text/plain", "image/jpeg", "application/octet-stream"};

Member, on the extension-based approach: Is there a way to infer the type based on bytes rather than file extension? That's how Golang does it.

Contributor (author): It's not always possible. In case of resumable upload, a blob is created first and the content is available later:

    writer = storage.writer(blobInfo); // creates the blob and detects the content type; no content yet
    writer.write(content);
    writer.close();

I did an experiment: I created a file PDF.txt with PDF content, then uploaded it as PDF.jpg with various clients and checked the resulting content type. I got the following:

    gsutil: text/plain
    NodeJS: image/jpeg
    Go: application/pdf

To achieve consistent behavior across all the clients, detection should be implemented on the server side at the moment the upload is finished. Until that feature is implemented, I suggest detecting the type based on the file extension, as has been done recently in Node.js (googleapis/nodejs-storage#1190).
+    Blob blob = null;
+    for (int i = 0; i < names.length; i++) {
+      BlobId blobId = BlobId.of(BUCKET, names[i]);
+      BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
+      blob = createBlob(method, blobInfo);
+      assertEquals(types[i], blob.getContentType());
+    }

Contributor (reviewer), on the blob declaration: I think you can push this into the for loop.
+    String customType = "custom/type";
+    BlobId blobId = BlobId.of(BUCKET, names[0]);
+    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType(customType).build();
+    blob = createBlob(method, blobInfo);
+    assertEquals(customType, blob.getContentType());
+  }

Contributor (reviewer), on the reused variable: don't reuse the local variable. There are two separate blob objects here that coincidentally share a variable.

Contributor (author): I had a reason to reuse the variable, but after rewriting the test that reason is no longer relevant. Thanks for catching that.
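On the reviewer's question above about sniffing bytes instead of extensions: the JDK does ship a content sniffer, `URLConnection.guessContentTypeFromStream`, but as the author points out it needs the leading bytes, which a resumable upload doesn't have when the blob is created. A minimal sketch (the class name `SniffSketch` is hypothetical) of what byte-based detection looks like and where it gives up:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

public class SniffSketch {

  // Guess a MIME type from leading "magic bytes"; returns null when nothing matches.
  static String sniff(byte[] content) throws IOException {
    // guessContentTypeFromStream needs mark/reset support; ByteArrayInputStream has it.
    InputStream in = new ByteArrayInputStream(content);
    return URLConnection.guessContentTypeFromStream(in);
  }

  public static void main(String[] args) throws IOException {
    // Every PNG file starts with the 8-byte signature \x89PNG\r\n\x1a\n.
    byte[] png = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};
    System.out.println(sniff(png));                // image/png
    // Plain text has no signature, so the sniffer cannot decide.
    System.out.println(sniff("hello".getBytes())); // null
  }
}
```

This mirrors the trade-off discussed in the thread: sniffing works only once content is in hand, while the extension lookup works at blob-creation time.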

+  @Test
+  public void testAutoContentTypeCreate() throws IOException {
+    testAutoContentType("create");
+  }
+
+  @Test
+  public void testAutoContentTypeWriter() throws IOException {
+    testAutoContentType("writer");
+  }
}