New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to ActiveStorage::Blob to set extract_content_type_from_io #32833

Merged
merged 1 commit into from May 8, 2018

Conversation

Projects
None yet
4 participants
@ryanwhocodes
Contributor

ryanwhocodes commented May 6, 2018

activestorage CHANGELOG.md addition

  • Use extract_content_type_from_io: false as an argument in
    ActiveStorage::Blob.create_after_upload to manually override the
    automatic extraction of content type from the io.

    This address the issue where the user defines a content type as an argument
    but the automatic MimeType inference from the io is not what is expected.
    It means a user avoids needing force the correct content type by
    patching ActiveStorage::Blob.extract_content_type.

    Ryan Davidson

Summary

This adds a boolean extract_content_type_from_io to ActiveStorage::Blob
methods #create_after_upload, #build_after_upload and #upload. It
allows a user to manually override the automatic MimeType extraction from
the io. This provides a patch for when Marcel::MimeType.for returns an
incorrect content type when #extract_content_type is called from the #upload method.

This patch aims to address:

  • #32632 Provide a way to deactivate ActiveStorage MIME type inference

It fixes bugs where a content_type is automatically extracted from the input and gives an inaccurate result. This is due to a bug in the Marcel::MimeType gem that is called in the extract_content_type method (which was called by default in the #upload method, which is in turn called by #build_after_upload and #create_after_upload)

2.5.0 Marcel::MimeType.for(StringIO.new("1895 1"), name: 'text.csv', declared_type: 'text/csv')
=> "text/csv"
2.5.0 Marcel::MimeType.for(StringIO.new("Article dates analysis"), name: 'text.csv', declared_type: 'text/csv')
=> "message/news"
2.5.0 Marcel::MimeType.for(StringIO.new("dates analysis"), name: 'text.csv', declared_type: 'text/csv')
=> "text/csv"
2.5.0 Marcel::MimeType.for(StringIO.new("Dates analysis"), name: 'text.csv', declared_type: 'text/csv')
=> "text/csv"
2.5.0 Marcel::MimeType.for(StringIO.new("article"), name: 'text.csv', declared_type: 'text/csv')
=> "text/csv"
2.5.0 Marcel::MimeType.for(StringIO.new("Article"), name: 'text.csv', declared_type: 'text/csv')
=> "message/news"

Other Information

Added document comments to activestorage/app/models/active_storage/blob.rb

Finally, if your pull request affects documentation or any non-code
changes, guidelines for those changes are available
here

Thanks for contributing to Rails!

@rails-bot

This comment has been minimized.

rails-bot commented May 6, 2018

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @schneems (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

activestorage/app/models/active_storage/blob.rb Outdated
def build_after_upload(io:, filename:, content_type: nil, metadata: nil)
# Set the +content_type+ and +extract_content_type_from_io+ to +false+ to manually
# specify the content type rather than using automatic content type inference.
def build_after_upload(io:, filename:, content_type: nil, metadata: nil, extract_content_type_from_io: true)

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

I’d like to name this argument identify here and in create_after_upload for consistency with the identify method.

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Renamed to identify in this commit 4f9defb42cf7c543906a7de72c7b6f91fa5058d4

activestorage/app/models/active_storage/blob.rb Outdated
#
# Normally, you do not have to call this method directly at all. Use the factory class methods of +build_after_upload+
# and +create_after_upload!+.
def upload(io)
def upload(io, extract_content_type_from_io = true)

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

Let’s use a keyword argument here. Name it identify to match build_after_upload and create_after_upload.

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Making it a keyword argument makes it much clearer. Amended here e77f3db21842d20a489853b41eed4233c5394666

activestorage/test/models/blob_test.rb Outdated
test "create after upload extracts content_type when extract_content_type_from_io: true" do
blob = create_csv_blob data: "Article,dates,analysis\n1, 2, 3", content_type: "text/csv", extract_content_type_from_io: true
assert_equal "message/news", blob.content_type
end

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

The new option defaults to true, so this test is superfluous. The preexisting tests above cover this behavior.

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Removed this test in this commit 539e880fe4731ef430a857a35ec9efc167edf263

activestorage/test/test_helper.rb Outdated
@@ -54,6 +54,10 @@ def create_blob(data: "Hello world!", filename: "hello.txt", content_type: "text
ActiveStorage::Blob.create_after_upload! io: StringIO.new(data), filename: filename, content_type: content_type
end
def create_csv_blob(data: "article,dates,analysis\n1, 2, 3", filename: "my_csv.csv", content_type: "text/csv", extract_content_type_from_io: true)
ActiveStorage::Blob.create_after_upload! io: StringIO.new(data), filename: filename, content_type: content_type, extract_content_type_from_io: extract_content_type_from_io
end

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

We don’t need a special helper for creating a CSV blob. Modify create_blob to take an identify argument, then call it from the new test:

test "create after upload uses content_type when extract_content_type_from_io: false" do
  blob = create_blob data: "Article,dates,analysis\n1,2,3", filename: "table.csv", content_type: "text/csv", identify: false
  assert_equal "text/csv", blob.content_type
end

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Done in this commit 539e880fe4731ef430a857a35ec9efc167edf263

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Just remembered to include the filename arg for this updated test d6e1197edd9e626172de6dbf4c0380088c6e28f7

activestorage/test/models/blob_test.rb Outdated
test "create after upload extracts content_type from io when no content_type given" do
blob = create_csv_blob content_type: nil
assert_equal "text/csv", blob.content_type
end

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

This test doesn’t seem terribly valuable. I’d omit it or modify it to cover the case where content_type is nil and identify is false.

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Modified for content_type nil and identify: false e34eec2c2c2133edf4512b1370455fe4c669c0ca

@ryanwhocodes

This comment has been minimized.

Contributor

ryanwhocodes commented May 8, 2018

@georgeclaghorn This PR is now updated with the requests in your review. I felt like I learnt a bit more about Rails architecture and how to write better Ruby from this activity!

activestorage/CHANGELOG.md Outdated
This address the issue where the user defines a content type as an argument
but the automatic MimeType inference from the io is not what is expected.
It means a user avoids needing force the correct content type by
patching `ActiveStorage::Blob.extract_content_type`.

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

It’s uncommon to call ActiveStorage::Blob#create_after_upload or #build_after_upload directly. Can we demonstrate passing identify: false to ActiveStorage::Attached::{One,Many}#attach instead?

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

I've added the update but not sure if the content is what you had in mind. It has a similar wording to the comments in blob.rb and I've added in an example based on what is in the RailsGuides.

commit 4152b834102b8dbd9f6cae9965d4d00bc1b3f512

activestorage/app/models/active_storage/blob.rb Outdated
self.checksum = compute_checksum_in_chunks(io)
self.content_type = extract_content_type(io)
self.content_type = extract_content_type(io) if self.content_type.nil? || identify

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

content_type doesn’t need an explicit receiver in the conditional. Replace self.content_type.nil? with content_type.nil?.

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Removed reference to content_type explicit receiver here 606f42457796258f9f94622678f5d4aac10b6e29

activestorage/app/models/active_storage/blob.rb Outdated
@@ -44,21 +44,25 @@ def find_signed(id)
end
# Returns a new, unsaved blob instance after the +io+ has been uploaded to the service.
def build_after_upload(io:, filename:, content_type: nil, metadata: nil)
# Set the +content_type+ and +identify+ to +false+ to manually

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

The first part of this sentence makes it sound like both content_type and identify should be nil. Here’s a clarifying edit:

When providing a content type, pass identify: false to bypass automatic content type inference.

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Clarified this sentence with the edit bad8fa0

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Added in the formatting of identify: false here 231ad55

activestorage/app/models/active_storage/blob.rb Outdated
@@ -142,13 +146,14 @@ def service_headers_for_direct_upload
#
# Prior to uploading, we compute the checksum, which is sent to the service for transit integrity validation. If the
# checksum does not match what the service receives, an exception will be raised. We also measure the size of the +io+
# and store that in +byte_size+ on the blob record.
# and store that in +byte_size+ on the blob record. The content type is automatically extracted from the +io+ unless
# the user has specified a +content_type+ and set the +identify+ as false.

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

Remove the extra “the” between “set” and “identify.”

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Removed extra 'the' here
c79c9e1

activestorage/app/models/active_storage/blob.rb Outdated
@@ -44,21 +44,24 @@ def find_signed(id)
end
# Returns a new, unsaved blob instance after the +io+ has been uploaded to the service.
def build_after_upload(io:, filename:, content_type: nil, metadata: nil)
# When providing a content type, pass +identify: false+ to bypass automatic content type inference.

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

+ only works around individual words. Use <tt></tt> instead:

pass <tt>identify: false</tt>

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Ah right, updated 209f021c15373832726496c157b86223ceaa291b

activestorage/app/models/active_storage/blob.rb Outdated
def create_after_upload!(io:, filename:, content_type: nil, metadata: nil)
build_after_upload(io: io, filename: filename, content_type: content_type, metadata: metadata).tap(&:save!)
# Set the +content_type+ and +identify+ to +false+ to manually
# specify the content type rather than using automatic content type inference.

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

This needs the same treatment as build_after_upload.

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

create_after_upload! now has the same comment as build_after_upload
e5f7519

activestorage/app/models/active_storage/blob.rb Outdated
@@ -142,13 +145,14 @@ def service_headers_for_direct_upload
#
# Prior to uploading, we compute the checksum, which is sent to the service for transit integrity validation. If the
# checksum does not match what the service receives, an exception will be raised. We also measure the size of the +io+
# and store that in +byte_size+ on the blob record.
# and store that in +byte_size+ on the blob record. The content type is automatically extracted from the +io+ unless
# the user has specified a +content_type+ and set +identify+ as false.

This comment has been minimized.

@georgeclaghorn

georgeclaghorn May 8, 2018

Member

Use “you specify” instead of “the user has specified” to match the adjacent paragraphs, which use the second person.

“Set identify as false” doesn’t read quite right. How about “pass identify: false”?

This comment has been minimized.

@ryanwhocodes

ryanwhocodes May 8, 2018

Contributor

Yes, that read more fluently and consistently with the rest of the test
190693c

@ryanwhocodes

This comment has been minimized.

Contributor

ryanwhocodes commented May 8, 2018

@georgeclaghorn

Looks like this change would affect the Rails Guides too:

http://edgeguides.rubyonrails.org/active_storage_overview.html#attaching-file-io-objects in the section 3.3 Attaching File/IO Objects

I've added this to the commit and squashed it in.

@georgeclaghorn

This comment has been minimized.

Member

georgeclaghorn commented May 8, 2018

Can you please squash your commits into one?

@ryanwhocodes

This comment has been minimized.

Contributor

ryanwhocodes commented May 8, 2018

Commits are now squashed and pushed up

Add option to ActiveStorage::Blob to set extract_content_type_from_io
This adds a boolean argument called identify to ActiveStorage::Blob
methods #create_after_upload, #build_after_upload and #upload. It
allows a user to bypass the automatic content_type inference from
the io.

@georgeclaghorn georgeclaghorn merged commit 38ea7bb into rails:master May 8, 2018

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ryanwhocodes ryanwhocodes deleted the ryanwhocodes:activestorage_blob_set_content_type branch May 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment