Incomplete Uploads (PseudoKV) #69

Closed
MeijeSibbel opened this issue Jun 16, 2020 · 2 comments

Comments

@MeijeSibbel

MeijeSibbel commented Jun 16, 2020

Currently, when a host fails while uploading to a contract set, the entire upload has to be restarted. This is especially problematic for files with large chunk sizes (e.g. 5 GB): if a single host fails, 145 GB has to be re-uploaded to 30 hosts. Not only does all this data have to be re-uploaded, but the sectors already stored by the failed upload become garbage that uses up valuable space in the contracts.

The solution suggested by the author of this repo:

Basically if an upload returns an error, it should save the incomplete metafile. Then you can use the existing migration functionality to copy the metafile into the new PseudoFS, and then resume uploading.

@lukechampine
Owner

To be clear, currently a failed upload should result in a maximum of 1 sector of "garbage" per host.

Say you have a file that's stored across 30 hosts, and requires 10 sectors per host. You start uploading it, and manage to upload 5 sectors to each host. While uploading the 6th sector, one of the hosts returns an error, so you cannot continue uploading.
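The reason a failure costs at most one sector per host can be sketched as follows. This is a simplified model, not the repo's actual uploader: `FakeHost`, `HostError`, and `upload` are hypothetical names, SHA-256 stands in for a real sector root, and hosts are visited sequentially rather than concurrently (the failing host is placed last so that sector 6 reaches the other 29 hosts, as in the example). Roots for sector *i* are recorded only after every host has accepted it, so a failure leaves a partial metafile with 5 roots per host.

```python
import hashlib


class HostError(Exception):
    """Raised by a simulated host when an upload fails."""


class FakeHost:
    """Hypothetical stand-in for a host; fails on its (fail_at+1)-th upload."""

    def __init__(self, name, fail_at=None):
        self.name = name
        self.fail_at = fail_at
        self.stored = []  # every sector this host accepted, recorded or not

    def upload_sector(self, data):
        if self.fail_at is not None and len(self.stored) == self.fail_at:
            raise HostError(f"{self.name} failed")
        self.stored.append(data)
        # SHA-256 is only a placeholder for a real sector (Merkle) root.
        return hashlib.sha256(data).hexdigest()


def upload(hosts, sectors_per_host):
    """Upload sector i to every host, then record all i-th roots at once.

    Returns (metafile, error), where metafile maps host name -> recorded roots.
    On error, the partial metafile is returned so it can be saved and migrated.
    """
    metafile = {h.name: [] for h in hosts}
    for i in range(sectors_per_host):
        pending = {}
        for h in hosts:
            try:
                pending[h.name] = h.upload_sector(f"{h.name}:{i}".encode())
            except HostError as err:
                # Roots in `pending` are dropped: at most one garbage sector per host.
                return metafile, err
        for name, root in pending.items():
            metafile[name].append(root)
    return metafile, None


# 30 hosts, 10 sectors each; the last host fails while uploading the 6th sector.
hosts = [FakeHost(f"h{j}") for j in range(29)] + [FakeHost("h29", fail_at=5)]
meta, err = upload(hosts, 10)
garbage = sum(len(h.stored) - len(meta[h.name]) for h in hosts)
print(err, {len(r) for r in meta.values()}, garbage)  # h29 failed {5} 29
```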

You now have a partially-complete metafile that lists 30 hosts, and 5 sector roots for each of those hosts. In order to continue uploading, you need to migrate this metafile to a new set of hosts; specifically, you need to replace the bad host with a good host. When you perform the migration, file shards will be downloaded from the old (good) hosts and used to compute the shard that goes on the new (good) host. None of the sectors on the old hosts will be re-uploaded. For example, if the redundancy was 10-of-30, you'll end up downloading 10 x 4MiB x 5 = 200MiB from old hosts and uploading 4MiB x 5 = 20MiB to the new host. More commonly, the original file will still be on the local disk, so you won't need to download anything from old hosts at all.
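The migration cost in the 10-of-30 example is simple arithmetic; here is a small sketch of it. `migration_cost` is a hypothetical helper, and it assumes 4 MiB sectors and that each of the already-uploaded stripes needs `min_shards` shards downloaded to rebuild the replacement shard (or nothing, if the original file is on local disk).

```python
MiB = 1 << 20
SECTOR_SIZE = 4 * MiB  # sector size assumed in the example


def migration_cost(min_shards, sectors_per_host, replaced_hosts=1, have_local_copy=False):
    """Return (bytes downloaded from old hosts, bytes uploaded to new hosts)."""
    # Rebuilding one stripe needs min_shards shards, unless the file is local.
    download = 0 if have_local_copy else min_shards * SECTOR_SIZE * sectors_per_host
    # Only the replaced hosts receive new sectors; good hosts keep theirs.
    upload = replaced_hosts * SECTOR_SIZE * sectors_per_host
    return download, upload


down, up = migration_cost(min_shards=10, sectors_per_host=5)
print(down // MiB, up // MiB)  # 200 20
```

With `have_local_copy=True`, the download term drops to zero, matching the common case described above.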

Once the migration is complete, you have a partially-complete metafile that lists the new set of 30 hosts, with 5 sector roots per host. The only difference is that the bad host (and its roots) have been replaced. You can now resume uploading the remaining 5 sectors (per host) of the original file.
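The replace-then-resume step can be sketched on a toy metafile modeled as an ordered list of `(host, roots)` pairs. `migrate`, `resume_index`, and the `rebuild_root` callback are hypothetical; the callback stands in for reconstructing a shard (from the local file or from enough good hosts), uploading it to the new host, and returning its root. The key property is that good hosts keep their roots untouched, and the new host takes over the bad host's slot so shard order is preserved.

```python
def migrate(metafile, bad_host, new_host, rebuild_root):
    """Return a copy of `metafile` with `bad_host` replaced by `new_host`.

    `rebuild_root(slot, i)` is a hypothetical callback that rebuilds shard
    `slot` of stripe `i`, uploads it to the new host, and returns its root.
    """
    out = []
    for slot, (host, roots) in enumerate(metafile):
        if host == bad_host:
            # Only the replaced slot gets new sectors; its old roots are discarded.
            roots = [rebuild_root(slot, i) for i in range(len(roots))]
            host = new_host
        out.append((host, list(roots)))
    return out


def resume_index(metafile):
    """Next stripe to upload: every host holds at least this many roots."""
    return min(len(roots) for _, roots in metafile)


meta = [(f"h{j}", [f"r{j}.{i}" for i in range(5)]) for j in range(30)]
new_meta = migrate(meta, "h29", "h30", lambda slot, i: f"new{slot}.{i}")
print(new_meta[29][0], resume_index(new_meta))  # h30 5
```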

In this example, we ended up generating 29 sectors' worth of garbage: we uploaded the 6th sector to 29 of the 30 hosts, but because the last host failed, we discarded those sectors and did not record them in the metafile. Later, when we resumed uploading, we had to reupload those sectors again, which is wasteful. It was not necessary, however, to reupload the entire file -- only the most recent sector.

So the desired functionality is to save the roots of those successful sectors somewhere (most likely in a separate section of the metafile), and then when resuming the upload, reuse those roots instead of reuploading the sectors. This is trickier than it sounds! If your main concern is avoiding reuploading 145 GB, well, that should already be avoidable today: your worst-case reupload with 30 hosts should be 240 MiB.
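A rough sketch of the proposed "reuse saved roots" behavior when resuming. Everything here is hypothetical: `pending` models the separate metafile section, mapping shard slot to the `(host, root)` saved from the failed attempt, and `upload_sector` is a placeholder callback. A saved root is reused only if the same host still occupies that slot. This also hints at why it is trickier than it sounds: the sketch assumes the stripe is re-encoded identically on resume, so a saved root still matches the shard that would have been uploaded.

```python
def upload_stripe(hosts, stripe, pending, upload_sector):
    """Upload one stripe, reusing roots saved from a failed attempt.

    hosts: host names in shard (slot) order.
    pending: slot -> (host, root) saved in a hypothetical metafile section.
    upload_sector(host, slot, stripe): hypothetical upload callback -> root.
    Returns (roots, number of sectors actually uploaded).
    """
    roots, uploaded = [], 0
    for slot, host in enumerate(hosts):
        saved = pending.get(slot)
        if saved is not None and saved[0] == host:
            roots.append(saved[1])  # host already stores this shard: reuse its root
        else:
            roots.append(upload_sector(host, slot, stripe))
            uploaded += 1
    return roots, uploaded


# After migrating h29 -> h30, 29 hosts already hold their shard of stripe 5.
hosts = [f"h{j}" for j in range(29)] + ["h30"]
pending = {j: (f"h{j}", f"p{j}") for j in range(29)}
roots, uploaded = upload_stripe(hosts, 5, pending, lambda host, slot, stripe: f"{host}:{stripe}")
print(uploaded)  # 1
```

Without the `pending` section, resuming would upload all 30 shards of stripe 5; with it, only the new host's shard is sent.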

This was referenced Jun 19, 2020
MeijeSibbel changed the title from "Incomplete Uploads" to "Incomplete Uploads (PseudoKV)" on Jul 10, 2020
@MeijeSibbel
Author

If I'm not mistaken, PseudoKV does this now, correct? If so, we can close this issue.
