storage: expose resumable upload upload_id=xx #1224
Hello! Thanks for filing an issue. Please note that the client library already retries resumable upload failures, such as 5xx errors. So, just to clarify: is this feature request asking for an API addition that lets users get an upload ID and then later resume an upload by specifying that upload ID plus some data? If not, could you describe this request in more detail?
A little more about the request. Our setup is something like this: Frontend -> Backend -> GCS. We want to ensure that as long as data gets to the Backend service, we are able to save it in GCS. Consider the scenario where the Backend service goes down (crash, preemption, and so on): in the current implementation, the in-progress upload goes down with it and cannot be resumed. The implications are bad, because we could have been streaming for an hour when the service suddenly goes down. Another solution (although I'm not sure how possible it is in Cloud Storage) is to flush the file every few seconds; my understanding is that GCS does not provide that option either.
How would the backend service know which byte to resume uploading at? Do you imagine the backend service keeping a cursor locally, and this requested API taking the byte location and upload_id?
@jadekler Resumable uploads allow you to check the status of an upload session; this is documented in the Cloud Storage resumable-upload guide. In short: send a `PUT` request to the session URI with an empty body and a `Content-Range: bytes */*` header. If the upload is still incomplete, the response is HTTP `308 Resume Incomplete`, with a `Range` header reporting the bytes the server has persisted so far.
Ah, right on, thanks Frank!
I have another use case which relates to this. We're also streaming audio into GCS. Once all the audio is done, we need to go back to the beginning of the file to write the header of the audio file (duration and so on, which is not known until the whole file has been processed). I'm not sure resumable uploads allow me to overwrite a portion of a file, though?
@noseglid Not at the moment. Resumable uploads are sequential writes only and can't adjust the write cursor or metadata at the end. |
@frankyn Thanks for the response! So there's no support in GCS for streaming media? This is a very common thing to do, also for video. I realize this question is quite off-topic for this thread, and even this repo.
Apologies, maybe I misunderstood. When you say go back and write the header, does that mean you want to modify the byte data or does that mean you want to modify GCS object headers? |
The byte data. So maybe I need to modify byte offsets 100-250 of the file once I've written it all (which is normally 100+ MB).
Gotcha, what you could do instead is:
I don't really have that level of control. As per the example above, bytes 0-99 are written in the first round, and all zeros are written for bytes 100-250; these bytes need to be overwritten after a seek. I'm using libav (and I know it's similar with GStreamer and probably other media libs): I can essentially provide it with a Write function and a Seek function, and it will call those functions when it has data to write or wants to move the cursor. Typically it will Seek to somewhere near the beginning of the file when all audio is processed, and then write a few more bytes there before it's all done. What I'd really want (I'm using the Go libs) is a WriteSeeker; as of now, I can only get a Writer.
Thanks for the additional information @noseglid! I think this is a separate discussion that deserves its own issue. Could you restate the information and feature request in a new issue? I don't have any background in this area, but I'd be interested to learn more.
Thanks! I'll create a new issue for this! |
Does anyone know what the state of this issue is? My setup: a storage microservice uses this library, abstracting away the GCS implementation and providing access to only a certain part of GCS. The JS client should be able to upload large files to GCS. Besides enabling better progress reports, this also allows for less memory usage in the layers in between (only the chunk-size memory needs to be allocated, not the full size). Does that make sense, or is there something I'm missing where it would work with the existing exposed API of this library?
We are using Google Cloud Storage for uploading audio data in a streaming fashion. Since the streaming can last for quite a long time (up to an hour?), it's important that we have all mechanisms to save ourselves from any crashes that may happen within the services.
Essentially, we want to make sure that any data we have uploaded up to a point, but not yet Close()ed, can be closed at some point. I see that the JSON API provides a way to obtain the upload_id=xx to resume uploads in the future. Is there a way to expose this information from the Go API? If there is not one, is there any intermediate solution that might be useful?
Our other approach would be to roll our own simple client, but then we may have to re-invent the wheel w.r.t. retry logic and the other niceties of an API.
Guidance appreciated.
@suki-fredrik