-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Storage: Downloads give incorrect not found error if object overwritten between calls #4499
Description
Downloads from GCS currently seem to be accomplished as a two step process:
- Fetch the current object's metadata, including a generation number.
- Using the generation number returned in step 1, attempt to download the object.
Sample download showing an example of this:
client = storage.Client()
bucket = client.get_bucket('yarbrough-test')
blob = bucket.get_blob('test.txt')
blob.download_to_filename('/tmp/test.txt')
What this does:
GET /storage/v1/b/yarbrough-test?projection=noAcl
GET /storage/v1/b/yarbrough-test/o/test.txt
GET /download/storage/v1/b/yarbrough-test/o/test.txt?generation=1511999426608623&alt=media
The problem:
In the case where the object has been overwritten by a new generation between the metadata GET and the donwload GET, a "not found" error would be reported to the user even though at no point did an object with the specified name not exist. In other words, this happens:
Time 1: GET /storage/v1/b/bucket/o/object returns generation 12345.
Time 2: User in another thread creates a new generation of text.txt, generation 23456. Generation 12345 is deleted and now bucket "bucket" object "object" implies the new generation.
Time 3: GET /download/storage/v1/b/bucket/o/object?generation=12345 fails with a 404 because there is no longer a generation 12345.
Steps to reproduce:
- Create a GCS bucket. Do NOT enable versioning.
- Start a script which rapidly overwrites an object with a new generation.
- Start a second script which repeatedly downloads the object.
You are likely to eventually run into a "404", although an object exists at all times.
How to fix:
I suggest this be fixed by retrying the download step if it returns with a 404 after the metadata is successfully fetched, unless the user explicitly asked for a particular generation number. Alternately, I might suggest skipping the metadata fetch step entirely, unless the user asks for some piece of metadata, which would significantly reduce the latency for small downloads.