
Storage: Downloads give incorrect not found error if object overwritten between calls #4499

@BrandonY


Downloads from GCS currently appear to be performed as a two-step process:

  1. Fetch the current object's metadata, including a generation number.
  2. Using the generation number returned in step 1, attempt to download the object.

Sample download demonstrating this:

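```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('yarbrough-test')  # GET bucket metadata
blob = bucket.get_blob('test.txt')            # GET object metadata (records the generation)
blob.download_to_filename('/tmp/test.txt')    # GET media, pinned to that generation
```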

This issues the following requests:

```
GET /storage/v1/b/yarbrough-test?projection=noAcl
GET /storage/v1/b/yarbrough-test/o/test.txt
GET /download/storage/v1/b/yarbrough-test/o/test.txt?generation=1511999426608623&alt=media
```
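The third request is the one that pins the download to a specific generation. The hypothetical helper below (not part of any client library) rebuilds that request path from its parts to make the difference explicit: with a `generation` query parameter the request targets exactly that generation, and without one it targets whichever generation is currently live. Encoding the object name as a single percent-encoded path segment is an assumption that is consistent with the log above.

```python
from urllib.parse import quote, urlencode


def media_download_path(bucket, name, generation=None):
    """Hypothetical helper: build the media-download path seen in the request log.

    With `generation`, the request is pinned to that exact generation, so on a
    non-versioned bucket it 404s once that generation has been overwritten.
    Without it, the request asks for whatever generation is live.
    """
    params = {}
    if generation is not None:
        params["generation"] = generation
    params["alt"] = "media"
    # Bucket and object names are percent-encoded as single path segments
    # (so a "/" inside an object name becomes %2F).
    return (
        f"/download/storage/v1/b/{quote(bucket, safe='')}"
        f"/o/{quote(name, safe='')}?{urlencode(params)}"
    )
```

For example, `media_download_path('yarbrough-test', 'test.txt', 1511999426608623)` returns the exact path of the third request in the log, while omitting the generation yields `...?alt=media`.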

The problem:

If the object is overwritten by a new generation between the metadata GET and the download GET, a "not found" error is reported to the user, even though an object with the specified name existed at every point. In other words, this happens:

Time 1: GET /storage/v1/b/bucket/o/object returns generation 12345.
Time 2: A user in another thread creates a new generation of "object", generation 23456. Because versioning is off, generation 12345 is deleted, and bucket "bucket", object "object" now refers to the new generation.
Time 3: GET /download/storage/v1/b/bucket/o/object?generation=12345 fails with a 404 because generation 12345 no longer exists.
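The timeline above can be replayed deterministically against a hypothetical in-memory model of a non-versioned bucket. `FakeNonVersionedBucket` and `NotFoundError` are illustrative stand-ins, not part of google-cloud-storage; the generation numbers are chosen to match the timeline.

```python
class NotFoundError(Exception):
    """Stand-in for the HTTP 404 returned by the download endpoint."""


class FakeNonVersionedBucket:
    """Hypothetical model of a bucket WITHOUT object versioning:
    each name maps to exactly one live (generation, data) pair."""

    def __init__(self):
        self._objects = {}
        self._next_generation = 12345

    def write(self, name, data):
        # Overwriting replaces (i.e. deletes) the previous generation.
        self._objects[name] = (self._next_generation, data)
        self._next_generation += 11111

    def get_metadata(self, name):
        # Step 1: metadata GET, returns the live generation.
        if name not in self._objects:
            raise NotFoundError(name)
        generation, _ = self._objects[name]
        return generation

    def download(self, name, generation):
        # Step 2: media GET pinned to `generation`.
        live = self._objects.get(name)
        if live is None or live[0] != generation:
            raise NotFoundError(f"{name}#{generation}")
        return live[1]


bucket = FakeNonVersionedBucket()
bucket.write("object", b"v1")          # generation 12345

gen = bucket.get_metadata("object")    # Time 1: metadata GET -> 12345
bucket.write("object", b"v2")          # Time 2: overwrite -> generation 23456
try:
    bucket.download("object", gen)     # Time 3: download pinned to 12345
    outcome = "ok"
except NotFoundError:
    outcome = "404"                    # yet "object" existed the whole time
```

After this runs, `outcome` is `"404"` even though `bucket.get_metadata("object")` succeeds both before and after the overwrite.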

Steps to reproduce:

  1. Create a GCS bucket. Do NOT enable versioning.
  2. Start a script that rapidly overwrites an object with new generations.
  3. Start a second script that repeatedly downloads the same object.

Eventually the downloading script is likely to receive a 404, even though an object with that name exists at all times.

How to fix:

I suggest fixing this by retrying the download step when it returns a 404 after the metadata was fetched successfully, unless the user explicitly asked for a particular generation number. Alternatively, the metadata fetch could be skipped entirely unless the user asks for some piece of metadata, which would also significantly reduce latency for small downloads.
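The first suggestion can be sketched as a small retry loop. This is a minimal illustration, not the library's implementation: `download_latest`, `NotFoundError`, and the two callables it takes (`fetch_generation` for the metadata GET, `download` for the pinned media GET) are hypothetical names, and the toy store in the usage section simulates a concurrent writer that overwrites the object right after the first metadata read.

```python
class NotFoundError(Exception):
    """Stand-in for an HTTP 404 from either request."""


def download_latest(fetch_generation, download, name, generation=None, max_attempts=3):
    """Hypothetical sketch of the proposed fix: retry the two-step download
    on a 404, unless the caller explicitly pinned a generation."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if generation is not None:
        # The user asked for this exact generation, so a 404 is genuine.
        return download(name, generation)
    for attempt in range(1, max_attempts + 1):
        # If the metadata GET itself 404s, the name truly does not exist: propagate.
        current = fetch_generation(name)
        try:
            return download(name, current)
        except NotFoundError:
            # The generation may have been replaced between the two calls;
            # give up only after max_attempts, otherwise refetch and retry.
            if attempt == max_attempts:
                raise


# Usage: a toy store whose object is overwritten right after the first metadata read.
store = {"object": (12345, b"v1")}
calls = []

def fake_fetch_generation(name):
    if name not in store:
        raise NotFoundError(name)
    generation = store[name][0]
    if not calls:                       # simulate a concurrent writer once
        store[name] = (23456, b"v2")
    calls.append(generation)
    return generation

def fake_download(name, generation):
    live_generation, data = store.get(name, (None, None))
    if live_generation != generation:
        raise NotFoundError(f"{name}#{generation}")
    return data

result = download_latest(fake_fetch_generation, fake_download, "object")
```

Here the first attempt fails on stale generation 12345 and the retry returns `b"v2"`. For the second suggestion, the real google-cloud-storage library appears to already allow it from user code: building the blob with `bucket.blob('test.txt')` instead of `bucket.get_blob('test.txt')` skips the metadata GET and leaves the generation unset, so the download is not pinned. This is worth verifying against the library version in use.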

Labels: api: storage, priority: p2, type: question
