
Change open to use Down.open #8

Closed

renchap opened this issue Aug 31, 2016 · 6 comments

@renchap (Owner) commented Aug 31, 2016

From Janko:

Could #open be implemented in terms of Down.open? E.g. Down.open(url(id))? That would be useful for things like the restore_cached_data plugin, which extracts metadata from cached files (which in this case might be remote, directly uploaded to Google Cloud Storage), so that it doesn't need to download the whole file just to extract metadata.

For the restore_cached_data plugin, when a cached file is sent in a hidden field (e.g. with direct uploads or on validation errors), an attacker could technically modify the "metadata" attributes, thus bypassing file validations. The best way I found to solve this in a generic way was to give the ability to re-extract metadata on that assignment. This is also useful when doing direct uploads to services that don't extract metadata, so that metadata can be extracted manually.

Extracting metadata uses #open, which I previously implemented in storages to download the whole file, which wasn't ideal since that part is synchronous. However, I managed to come up with Down.open, which creates an IO representing the remote file that downloads only as much as is read. Yeah, that would require a signed URL. That's the way I implemented it with S3, relying on the fact that signed URLs will always work. Internally Down.open uses a generic Down::ChunkedIO, which might be usable if you don't have a URL (I used it in GridFS).
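
For context, a minimal sketch of the on-demand behaviour Down.open provides (the URL here is just a placeholder):

require "down"

io = Down.open("https://storage.googleapis.com/my-bucket/image.jpg")
io.size        # taken from the Content-Length response header
io.read(1024)  # downloads only about the first kilobyte, not the whole file
io.close       # closes the connection, aborting the rest of the download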

This will require #7 first

@renchap (Owner, Author) commented Mar 14, 2017

@janko-m I had a new look at this, and signed URLs on GCS require a service account key. You can't get one with a standard account, so it will fail in many cases.
Any other ideas on how to make it work? It is possible to retrieve the object with an HTTP GET, but that requires creating the correct HTTP auth headers, so it is no longer a simple GET.

@janko (Contributor) commented Mar 19, 2017

@renchap I just pushed a new version of Down which allows Down.open to accept request headers, so as long as you figure out which URL the Google client is hitting internally, and with which request headers, you should be able to make the same request with Down.open:

Down.open("http://example.com/image.jpg", {"Authorization" => "..."})

@janko (Contributor) commented May 24, 2017

I recently thought of a trick that would download on-demand, but still let you use get_object, thus keeping google-api-client's download resumability and not having to rely on a URL. Down.open internally uses Down::ChunkedIO, which takes any chunk generator and transforms it into an IO object, so you could use Down::ChunkedIO directly with this trick:

def open(id)
  read_pipe, write_pipe = IO.pipe

  # download in a background thread, writing into the pipe
  Thread.new do
    begin
      storage_api.get_object(@bucket, object_name(id), download_dest: write_pipe)
    ensure
      write_pipe.close
    end
  end

  # read from the pipe in 16KB chunks as the IO is consumed
  chunks = Enumerator.new { |y| y << read_pipe.readpartial(16*1024) until read_pipe.eof? }
  object = storage_api.get_object(@bucket, object_name(id)) # hopefully not providing :download_dest makes a HEAD request?

  Down::ChunkedIO.new(
    chunks: chunks,
    size: object.size,
    on_close: -> { read_pipe.close }, # will cause the download to abort with an Errno::EPIPE, and google-api-client will hopefully clean up
  )
end

I considered doing the same in the S3 storage, but since aws-sdk can stream chunks as they are downloaded if you pass a block to Aws::S3::Object#get, I managed to get away without IO.pipe:

def open(id)
  io = Down::ChunkedIO.new(chunks: (object = object(id)).enum_for(:get))
  io.size = object.content_length # Down::ChunkedIO retrieves the first chunk on initialization, so Aws::S3::Object#content_length is populated by now
  io
end
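
With either variant, a caller can read just a prefix of the file and only that much is downloaded (a hypothetical usage sketch, assuming a storage instance):

io = storage.open("path/to/image.jpg")
io.read(8 * 1024) # e.g. enough to sniff the MIME type or image dimensions
io.close          # aborts the remainder of the download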

@rosskevin (Contributor) commented

@renchap With my #16 PR, I tried:

def open(id)
  Down.open(url(id))
end

The linter tests broke with:

Minitest::UnexpectedError: Down::ClientError: 403 Forbidden
    /Users/kross/.rvm/gems/ruby-2.4.1@shrine-google_cloud_storage/gems/down-4.1.0/lib/down/net_http.rb:214:in `response_error!'
    /Users/kross/.rvm/gems/ruby-2.4.1@shrine-google_cloud_storage/gems/down-4.1.0/lib/down/net_http.rb:162:in `open'
    /Users/kross/.rvm/gems/ruby-2.4.1@shrine-google_cloud_storage/gems/down-4.1.0/lib/down/backend.rb:14:in `open'
    /Users/kross/.rvm/gems/ruby-2.4.1@shrine-google_cloud_storage/gems/down-4.1.0/lib/down.rb:14:in `open'
    /Users/kross/projects/shrine-google_cloud_storage/lib/shrine/storage/google_cloud_storage.rb:76:in `open'
    /Users/kross/.rvm/gems/ruby-2.4.1@shrine-google_cloud_storage/gems/shrine-2.8.0/lib/shrine/storage/linter.rb:68:in `lint_open'
    /Users/kross/.rvm/gems/ruby-2.4.1@shrine-google_cloud_storage/gems/shrine-2.8.0/lib/shrine/storage/linter.rb:40:in `call'
    /Users/kross/projects/shrine-google_cloud_storage/test/gcs_test.rb:51:in `block (2 levels) in <top (required)>'

I'm not sure of the intent, so I'll leave #16 as-is, but it seems like this one is not far off once that is merged.

@janko (Contributor) commented Oct 12, 2017

Down::ClientError: 403 Forbidden means that the request to the given URL responded with 403 Forbidden, so url(id) apparently didn't produce a correct URL. But it probably makes more sense to leave #16 as-is, since it preserves the current behaviour; we can figure out how to do this later.

Btw, I wouldn't use Down.open, because google-api-client has the ability to automatically retry failed/interrupted downloads (though I intend to add this ability to Down at some point). It seems that File#download forwards its first argument directly to google-api-client's #get_object, so it doesn't need to be a path; it can be any IO object. Then we could achieve streaming like this:

class ProcIO
  def initialize(&proc)
    @proc = proc
  end

  def write(data)
    @proc.call(data)
    data.bytesize # Ruby convention is to return the number of bytes written; methods like IO.copy_stream rely on it
  end
end

def open(id)
  file = get_file(id)

  chunks = Enumerator.new do |yielder|
    proc_io = ProcIO.new { |data| yielder << data }
    file.download(proc_io)
  end

  Down::ChunkedIO.new(
    chunks: chunks,
    size: file.size,
    data: { file: file } # so the user can access it if they want to
  )
end
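
What makes this stream lazily: Enumerator runs its block on a fiber, so file.download advances only when Down::ChunkedIO asks for the next chunk. A standalone illustration of that property (not the storage code):

chunks = Enumerator.new do |yielder|
  puts "producing 1"; yielder << "chunk 1"
  puts "producing 2"; yielder << "chunk 2"
end

chunks.next # prints only "producing 1"; "producing 2" runs on the following call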

@janko (Contributor) commented Oct 12, 2017

Note that the above should be possible because a while ago I expanded what :download_dest can be.

janko added a commit to janko/shrine-google_cloud_storage that referenced this issue Nov 22, 2017
This will make the storage work well with the restore_metadata Shrine plugin,
as it will allow extracting metadata by downloading only a portion of
the file content.

Closes renchap#8
janko added a commit to janko/shrine-google_cloud_storage that referenced this issue Nov 30, 2017
This will make the storage work well with the restore_metadata Shrine plugin,
as it will allow extracting metadata by downloading only a portion of
the file content.

Closes renchap#8
renchap pushed a commit that referenced this issue Dec 1, 2017
* Implement #open using Down::ChunkedIO

This will make the storage work well with the restore_metadata Shrine plugin,
as it will allow extracting metadata by downloading only a portion of
the file content.

Closes #8

* Add ProcIO#flush required by google-api-client

* Open test image file in binary mode

That way the retrieved content will be in binary encoding, which is generally
safer to use cross-platform. It also avoids comparing the same content
in binary and UTF-8 encoding in tests, which will differ even if the
bytes themselves are the same.

* Don't verify downloaded data in #open