-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: auto content-type on blob creation #338
Conversation
Codecov Report
@@ Coverage Diff @@
## master #338 +/- ##
============================================
- Coverage 63.27% 63.20% -0.07%
- Complexity 609 619 +10
============================================
Files 32 32
Lines 5113 5123 +10
Branches 487 488 +1
============================================
+ Hits 3235 3238 +3
- Misses 1713 1720 +7
Partials 165 165
Continue to review full report at Codecov.
|
@@ -81,6 +81,7 @@ | |||
import java.util.LinkedList; | |||
import java.util.List; | |||
import java.util.Map; | |||
import javax.activation.MimetypesFileTypeMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use com.google.common.net.MediaType so you don't introduce another dependency, and ping the guava team to get this out of beta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
com.google.common.net.MediaType doesn't provide mapping extension to the meida type. Its parse method recognizes strings like "text/plain".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Files.probeContentType(Path)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or URLConnection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both Files.probeContentType(Path) and URLConnection work! I will use it instead. Thanks for the hint.
google-cloud-storage/pom.xml
Outdated
@@ -88,6 +88,11 @@ | |||
<groupId>org.threeten</groupId> | |||
<artifactId>threetenbp</artifactId> | |||
</dependency> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this included in java 6 and later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works with jdk7,8. This dependency is required for java9+
return contentType; | ||
} | ||
return firstNonNull( | ||
FILE_NAME_MAP.getContentTypeFor(object.getName().toLowerCase()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toLowerCase always should have a locale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
case "create": | ||
return storage.create(blobInfo); | ||
case "writer": | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the braces here?
String customType = "custom/type"; | ||
BlobId blobId = BlobId.of(BUCKET, names[0]); | ||
BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType(customType).build(); | ||
blob = createBlob(method, blobInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't reuse local variable. There are two separate blob objects here that coincidentally share a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a reason to reuse the variable but after rewriting the test that reason is not actual anymore. Thanks for catching that.
private void testAutoContentType(String method) throws IOException { | ||
String[] names = {"file1.txt", "dir with spaces/Pic.Jpg", "no_extension"}; | ||
String[] types = {"text/plain", "image/jpeg", "application/octet-stream"}; | ||
Blob blob = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can push this into the for loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dmitry-fa, thanks for addressing the request. Have two requests.
@@ -372,6 +375,16 @@ public StorageObject create( | |||
} | |||
} | |||
|
|||
private static String detectContentType(StorageObject object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an optional feature that's disabled by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about making this feature optional, but I didn't find an appropriate place to store this setting. It could be a property of a project/client/bucket/object.
I only found: https://cloud.google.com/storage/docs/metadata#content-type
which states: If the Content-Type is not specified by the uploader and cannot be determined, it is set to application/octet-stream or application/x-www-form-urlencoded, depending on how you uploaded the object.
If I read this correctly, the Storage should attempt to determine the content type if not explicitly set, so this feature should be on by default.
BTW, right now the content type of blobs created with Storage.create() methods is null
, but should be application/octet-stream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storage doesn't provide contentType detection in the API.
If contentType is set at creation then application/octet-stream
is used.
Please make this optional otherwise it's a behavior change not originally intended on uploads with the Java client.
} | ||
|
||
private void testAutoContentType(String method) throws IOException { | ||
String[] names = {"file1.txt", "dir with spaces/Pic.Jpg", "no_extension"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to infer the type based on bytes rather than file extension? That's how Golang does it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not always possible. In case of resumeable upload a blob is created first and the content is available later:
writer = storage.writer(blobInfo); // creating a blob, detecting content-type, no contect
writer.write(content);
writer.close();
I did an experiment:
I created a file PDF.txt
with a pdf content. Then I uploaded it as PDF.jpg
with various clients and checked the resulting content type. I got the following:
gsutil: text/plain
NodeJS: image/jpeg
Go: application/pdf
To achieve consistent behavior across all the clients it should be implemented on the server-side at the moment when the upload is finished.
Until this feature is implemented I suggest type detecting based on file extension, as it has been done recently in nodejs (googleapis/nodejs-storage#1190)
@@ -372,6 +375,16 @@ public StorageObject create( | |||
} | |||
} | |||
|
|||
private static String detectContentType(StorageObject object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storage doesn't provide contentType detection in the API.
If contentType is set at creation then application/octet-stream
is used.
Please make this optional otherwise it's a behavior change not originally intended on uploads with the Java client.
For the time being, objects created with Storage.create() method have content-type set to null if not explicitly set, not Is it possible to store custom properties on the project level? |
Whoops I meant, If contentType is not set at creation then application/octet-stream is used by the service. Please introduce this as a new option in the request instead of a client level option for now. |
Not sure if it will be convenient to specify 'I want to detect the content type' each time when a new object is created. The content type could be provided explicitly instead. Okay, I'll think about that. If the user specifies explicitly that he wants to detect the content type, he can also select a strategy for that: 'by extension', 'by content' or 'by both'. |
Fixes #47
Proposed fix of the auto detection of the content type based on the blob name, like other clients do (php, nodejs, go).
The class javax.activation.MimetypesFileTypeMap from JDK is used to detect the content type. Using this class with Java 11+ requires the dependency on Java Activation Framework.