
Support for chunked and resumable file uploads #5516

Merged — 63 commits merged into galaxyproject:dev on Mar 14, 2018

Conversation

@guerler (Contributor) commented Feb 13, 2018:

This augments the file uploader to allow resumable, chunked uploads without nginx. The client uses the File.slice operation to submit chunks of the target file to an API endpoint at api/uploads. If the server is unavailable, the client waits 5 seconds and then makes another attempt to submit the last 100 MB chunk. The chunk size can be set in the configuration with the chunk_upload_size option. In addition to the file chunk, the client submits a pseudo-unique Session-ID to identify the target file; the Session-ID consists of the user ID, a timestamp, and the file size. Once all chunks have been uploaded and concatenated, the client triggers the Upload tool, providing the file path and file name as parameters. A version of this PR using nginx and its upload module, which supports chunked uploads natively, is available as well.
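The server-side bookkeeping described above can be sketched roughly as follows (`session_id`, `append_chunk`, and `upload_dir` are illustrative names for this sketch, not the actual Galaxy API):

```python
import os


def session_id(user_id, timestamp, filesize):
    # Pseudo-unique identifier built from the user ID, a timestamp and
    # the file size, as described in the PR text.
    return "%s-%s-%s" % (user_id, timestamp, filesize)


def append_chunk(upload_dir, sid, session_start, chunk):
    # Append one chunk to the session's target file, after verifying that
    # the client's reported offset matches the bytes received so far.
    target_file = os.path.join(upload_dir, sid)
    target_size = os.path.getsize(target_file) if os.path.exists(target_file) else 0
    if session_start != target_size:
        raise ValueError("Incorrect session start.")
    with open(target_file, "ab") as f:
        f.write(chunk)
    return os.path.getsize(target_file)
```

Because the session ID is stable across requests, a client that lost its connection can ask how many bytes arrived and resume from that offset.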


raise exceptions.MessageException("Incorrect session start.")
source = payload.get("session_chunk")
with open(target_file, "a") as f:
    f.write(source.file.read())
Member:

Should this be read in chunks? I suppose this can fill up the memory.

@guerler (Contributor, author) commented Mar 9, 2018:

This is already the file chunk. I added additional checks to verify the chunk size.

Member:

Depending on the configured chunk size, this can still be quite big. You can replace this line with something like:

read_chunk_size = 2 ** 16
while True:
    read_chunk = source.file.read(read_chunk_size)
    if not read_chunk:
        break
    f.write(read_chunk)

@guerler (Contributor, author):

I see, ok cool.

@nsoranzo (Member) left a comment:

Thanks for the changes! Just a minor comment, sorry I missed it earlier.

    if not read_chunk:
        break
    f.write(read_chunk)
f.close()
Member:

You don't need to close f, it gets closed automatically as part of the with statement.
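A minimal illustration of that behavior — the context manager closes the handle when the block exits, so the explicit `close()` is redundant:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hello")
# Leaving the with block closes the handle; no explicit f.close() needed.
assert f.closed
assert open(path).read() == "hello"
```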

@nsoranzo (Member) left a comment:

The Python part LGTM, thanks @guerler!

Maybe we could add an integration test where chunk_upload_size is set to a value sufficiently smaller than a test input to exercise the new feature. I can help with that if you want.
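A hypothetical sketch of what such a test could check — all names here are illustrative, and the real Galaxy integration-test harness works differently:

```python
# Set a chunk size smaller than the test input so the upload is forced
# through the chunked code path.
CHUNK_UPLOAD_SIZE = 1024  # bytes, deliberately small for the test


def split_into_chunks(data, chunk_size):
    # Mirror the client-side File.slice behavior on raw bytes.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def test_chunked_upload_roundtrip():
    data = b"x" * (3 * CHUNK_UPLOAD_SIZE + 17)  # larger than one chunk
    chunks = split_into_chunks(data, CHUNK_UPLOAD_SIZE)
    assert len(chunks) == 4
    assert b"".join(chunks) == data
```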

    error_login: "Uploads require you to log in.",
    error_retry: "Waiting for server to resume...",
}, config);
console.debug(cnf);
Member:

Leftover?

@guerler (Contributor, author) commented Mar 10, 2018:

I left it in on purpose. AFAIK we'll add a feature to the client builder to disable it for production.

@guerler (Contributor, author) commented Mar 11, 2018:

Sounds good, thanks for the review. How about Selenium tests?

@guerler (Contributor, author) commented Mar 13, 2018:

@galaxybot test this

target_size = os.path.getsize(target_file)
if session_start != target_size:
    raise MessageException("Incorrect session start.")
chunk_size = os.fstat(session_chunk.file.fileno()).st_size / self.BYTES_PER_MEGABYTE
Member:

I think you need to add from __future__ import division at the top of this file, otherwise you get the floor of the division result in Python 2.
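The pitfall being flagged: without the future import, `/` between two integers floors the result in Python 2, so a sub-megabyte chunk would report a size of 0 MB. With the import (a no-op on Python 3), `/` is true division:

```python
from __future__ import division  # no-op on Python 3; fixes Python 2

BYTES_PER_MEGABYTE = 1024 * 1024

# Half a megabyte of chunk data.
chunk_bytes = 512 * 1024
chunk_size = chunk_bytes / BYTES_PER_MEGABYTE
# True division keeps the fractional megabyte; without the future
# import, Python 2 would floor this to 0.
assert chunk_size == 0.5
```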

@guerler (Contributor, author) commented Mar 13, 2018:

@dannon we allow fractions of MBs. This is also used in the test cases; otherwise we would have to use MB-sized test datasets.

@dannon (Member) commented Mar 13, 2018:

@guerler Got it, the change in 22dd4a8 was the sort of logic consolidation I was thinking about there, that's perfect.

    error_default: "Please make sure the file is available.",
    error_server: "Upload request failed.",
    error_login: "Uploads require you to log in."
    error_file: "File not provied.",
Member:

s/provied/provided/

@dannon dannon merged commit 7595114 into galaxyproject:dev Mar 14, 2018
@dannon dannon self-requested a review March 14, 2018 13:13
@dannon (Member) left a comment:

+1, looks good!
