
Resume: Allow large uploads to resume the upload instead of starting from scratch #141

Closed
binwiederhier opened this Issue Jun 15, 2014 · 14 comments

@binwiederhier
Member

binwiederhier commented Jun 15, 2014

In the current code base, if an upload (UpOperation) is interrupted, e.g. by Ctrl+C or an unrecoverable connection loss, the multichunks that were already uploaded are not reused when the next UpOperation is triggered.

Instead, the UpOperation basically starts from scratch, re-indexing everything and packaging new multichunks. The old multichunks are eventually removed by the CleanupOperation (removeLostMultiChunks).

For large uploads, it'd be great to remember the already uploaded multichunks (and the database version currently being processed) locally, and to resume when the connection breaks or the operation is interrupted.

This is just a feature request though. Nothing major.

@Gerry33

Gerry33 commented Jul 9, 2014

I would not say it is minor. If you have large files and the connection is unreliable, you run into trouble. A multichunk upload using the HTTP Content-Range header (HttpHeaders.CONTENT_RANGE) might also be a possible solution for resuming broken uploads.

I tried to get this running with the WebDAV Sardine library for a long time, but only partly succeeded. Tomcat works fine; Apache mod_dav does not like it.

If you like, I can share some simple demo code as inspiration.

Best regards and a big thanks for this nice project.
Gerd
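
As an illustration of the Content-Range approach described above, here is a minimal sketch using plain java.net.HttpURLConnection instead of Sardine (the URL, the offset bookkeeping, and a server that actually honors Content-Range on PUT are all assumptions; as noted above, Apache mod_dav may well reject this):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangedUploadSketch {
    /**
     * Re-uploads the remainder of 'file' starting at 'offset' via HTTP PUT,
     * announcing the range with a Content-Range header. Whether the server
     * resumes the upload or rejects the request is entirely server-dependent.
     */
    public static void resumeUpload(URL url, RandomAccessFile file, long offset) throws IOException {
        long total = file.length();

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Range", "bytes " + offset + "-" + (total - 1) + "/" + total);
        conn.setFixedLengthStreamingMode(total - offset);

        file.seek(offset);
        byte[] buffer = new byte[8192];
        int read;

        try (OutputStream out = conn.getOutputStream()) {
            while ((read = file.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }

        if (conn.getResponseCode() >= 300) {
            throw new IOException("Server rejected ranged PUT: " + conn.getResponseCode());
        }
    }
}
```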

@binwiederhier
Member

binwiederhier commented Jul 9, 2014

Hi there, and thanks for the comment.

The "unreliable connection" issue has already been addressed by retrying failed uploads a couple of times; as far as I can tell, that works quite well. This issue is targeted at a permanent connection loss (say, > 20 seconds) or at manual actions that stop an UpOperation (Ctrl+C).

Regarding the HTTP Range header: while this is the obvious solution when uploading large files via HTTP, it doesn't quite fit Syncany's use case, for two reasons:

  1. It's a plugin-specific solution. It'd only work for WebDAV, unfortunately.
  2. Syncany doesn't upload large files. In the current configuration, it only uploads files of at most 4 MB (multichunks).

So a more generic solution would be for Syncany to remember/persist that an UpOperation was started and, whenever a multichunk/file is uploaded, write it to an upload log. If the operation is interrupted and restarted, the log is consulted to see which multichunks still need to be uploaded (see the sketch below).

What do you think about that? Interested in implementing it?

Best, Philipp

PS: I'm still on vacation, so I don't respond as quickly as I normally would. Also: I wrote this on my phone and my fingers are beginning to hurt :D
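
A minimal sketch of that upload-log idea, with a hypothetical UploadLog class (not Syncany's actual API): each multichunk ID is appended to a local log file right after its upload succeeds, and anything already listed is skipped when the operation is restarted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical append-only log of multichunks that were uploaded successfully. */
public class UploadLog {
    private final Path logFile;
    private final Set<String> uploaded = new HashSet<>();

    public UploadLog(Path logFile) throws IOException {
        this.logFile = logFile;
        if (Files.exists(logFile)) {
            uploaded.addAll(Files.readAllLines(logFile, StandardCharsets.UTF_8));
        }
    }

    /** Called after a multichunk upload completes; written per line so it survives Ctrl+C. */
    public void markUploaded(String multiChunkId) throws IOException {
        Files.write(logFile, (multiChunkId + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        uploaded.add(multiChunkId);
    }

    /** Consulted on resume: true if this multichunk can be skipped. */
    public boolean isUploaded(String multiChunkId) {
        return uploaded.contains(multiChunkId);
    }

    /** Called once the whole UpOperation has committed; the log is no longer needed. */
    public void clear() throws IOException {
        Files.deleteIfExists(logFile);
        uploaded.clear();
    }
}
```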

@binwiederhier
Member

binwiederhier commented Jul 16, 2014

Ping. Back from my vacation.

@binwiederhier
Member

binwiederhier commented Sep 11, 2014

After we've merged #64 in #224, it's time to do this. Assigning myself.

@cr0
Member

cr0 commented Sep 18, 2014

Something I experienced: when Syncany starts to index a file, it's very ignorant: it even keeps indexing although the file has already been deleted.

@binwiederhier
Member

binwiederhier commented Sep 20, 2014

@cr0 Can you report that as a separate issue? It's only marginally related to this topic. Also, please provide a more detailed example. Thanks!

@pimotte
Member

pimotte commented Dec 12, 2014

Started on this at https://github.com/syncany/syncany/tree/feature/resume-transaction

TODOs for self:

  • Invert logic in serialization trap
  • Delete file if resume is successful
  • Write tests
  • Do user test

@binwiederhier
Member

binwiederhier commented Dec 14, 2014

As requested, here's my feedback, based on a dry review of https://github.com/syncany/syncany/compare/feature/resume-transaction?diff=split&name=feature%2Fresume-transaction&w=1

I have very few comments, and all of them are minor. You pretty much implemented what we discussed, in a very nice and clean manner. I'll try the code tonight. Dry review:

@binwiederhier
Member

binwiederhier commented Dec 14, 2014

I just did a little testing and I must say I am very happy with how well this works. I took away the Wi-Fi a couple of times and Ctrl+C'd more than 10 times, and it worked perfectly. I must say I am impressed.

  • g. I was a bit confused as to why it wouldn't automatically resume when triggering sy up. Do you think we should enable resume by default and implement sy up --no-resume (or sy up --restart) to prevent it? After talking to Lacriatch on IRC, I am convinced that resuming uploads should be the default option, so I went ahead and implemented it. Let me know if you don't think that's a good idea.
  • h. Any particular reason why the state/transaction* files are encrypted? Wouldn't it make more sense to leave them in plaintext?

@binwiederhier
Member

binwiederhier commented Dec 14, 2014

  • i. Resume assumes that the multichunk/database/whatever files are still in the cache/ folder. The cache is currently cleaned as part of finishOperation() in 'cleanup', 'up' and 'down', so missing files might not be an issue right now. However, since it is likely that either the user or a more intelligent future Cache class might delete things from the cache, we might want to react to that situation (see the sketch below) -- or do you think that's over the top? Right now, if I do sy up, Ctrl+C, rm -rf .syncany/cache/*, sy up, Syncany throws a nice java.io.FileNotFoundException.
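
One possible reaction, sketched with hypothetical names (the real Cache and transaction classes may look different): before resuming, verify that every file referenced by the persisted transaction still exists in the cache; otherwise discard the resume state and start a fresh UpOperation.

```java
import java.io.File;
import java.util.List;

/** Sketch: decide whether a persisted transaction can actually be resumed. */
public class ResumeCheck {
    /**
     * Returns true only if every local file referenced by the interrupted
     * transaction is still present in the cache. If anything is missing
     * (e.g. after 'rm -rf .syncany/cache/*'), the caller should delete the
     * persisted transaction state and fall back to a fresh UpOperation
     * instead of failing with a FileNotFoundException mid-upload.
     */
    public static boolean isResumable(List<File> cachedTransactionFiles) {
        for (File file : cachedTransactionFiles) {
            if (!file.exists()) {
                return false;
            }
        }
        return true;
    }
}
```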

@cr0
Member

cr0 commented Dec 15, 2014

I support your decision concerning (g). Why should I need to enable a feature that has only advantages and no disadvantages? I too think that --no-resume is the better term, because future uploads cannot be resumed, since they haven't been started yet.

Anyway: great work! I'm strongly looking forward to uploading some movies to Flickr. (Just kidding...)

@pimotte
Member

pimotte commented Dec 15, 2014

At a.: this should be fixed, but I forgot to write something about it here. In the current implementation, if either the database or the transaction file is missing, we delete the other one, if it exists (see the sketch below).

Edit: Merge-ready, IMHO.
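
Sketched with assumed file handles (not the actual implementation), that mutual consistency check could look as follows: if only one of the two state files survived, the pair is unusable, so the survivor is removed and the next sy up starts from scratch.

```java
import java.io.File;

/** Sketch: the persisted resume state is only usable if BOTH files survived. */
public class TransactionStateCheck {
    public static void dropIncompleteState(File databaseFile, File transactionFile) {
        // If exactly one of the two exists, the pair is inconsistent:
        // delete the survivor so the next 'sy up' starts cleanly.
        if (databaseFile.exists() != transactionFile.exists()) {
            databaseFile.delete();    // no-op if absent
            transactionFile.delete(); // no-op if absent
        }
    }
}
```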

@pimotte
Member

pimotte commented Dec 16, 2014

Closed by #308.

pimotte closed this Dec 16, 2014

cr0 removed the status:in-progress label Dec 16, 2014
