Resumable uploads to the registry #4872

Open
SamSaffron opened this Issue Mar 26, 2014 · 16 comments

Ever since we moved to Docker as the official engine for deploying Discourse, I have noticed a few support issues where the registry hangs up on large transfers.

The gist of the issue is that some crazy VPNs and weird connections will terminate large image downloads midway. This leaves users needing to transfer the entire payload a second time.

Proposed solution

Chunked upload and download.

  • Every image on the registry will contain a manifest. The manifest file contains a list of SHA1 hashes of image chunks, one per 512KB. The manifest itself will also be hashed to ensure you get a kosher manifest.
  • When a client downloads an image, it will pull chunks and validate them against the hashes in the manifest. Each correctly downloaded chunk should be stored in a tmp folder that is restricted in size (say 1GB). If the tmp folder grows beyond that size, it should remove the oldest chunks.
  • When downloading chunks, the client should always check the local tmp folder first. That way, if you terminate a download halfway, you will be able to pick up midway.
  • Docker, out of the box, will download 3 chunks concurrently (configurable either globally or per session).
  • A similar algorithm should be implemented on the registry side so uploads can be chunked and resumed.
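
The manifest and resume steps above could be sketched roughly as follows. This is only an illustration in Python, not Docker's or the registry's actual API: `build_manifest`, `verify_manifest`, `download`, and the `fetch_chunk` callback are hypothetical names, a plain dict stands in for the size-restricted tmp folder (no eviction is shown), and hashing the newline-joined chunk hashes is just one assumed way to hash the manifest.

```python
import hashlib

CHUNK_SIZE = 512 * 1024  # 512KB per chunk, as in the proposal


def build_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Hash every chunk, then hash the list of chunk hashes (assumed scheme)."""
    chunks = [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    manifest_hash = hashlib.sha1("\n".join(chunks).encode()).hexdigest()
    return {"chunk_size": chunk_size, "chunks": chunks, "manifest_hash": manifest_hash}


def verify_manifest(manifest: dict) -> bool:
    """Recompute the manifest hash and compare it with the stored one."""
    expected = hashlib.sha1("\n".join(manifest["chunks"]).encode()).hexdigest()
    return expected == manifest["manifest_hash"]


def download(manifest: dict, fetch_chunk, cache: dict) -> bytes:
    """Reassemble an image, reusing chunks that are already cached.

    fetch_chunk(index) -> bytes is a hypothetical network call.
    cache maps chunk hash -> chunk bytes and stands in for the tmp folder.
    """
    if not verify_manifest(manifest):
        raise ValueError("manifest hash mismatch")
    parts = []
    for index, digest in enumerate(manifest["chunks"]):
        chunk = cache.get(digest)  # always check the local cache first
        if chunk is None:
            chunk = fetch_chunk(index)
            if hashlib.sha1(chunk).hexdigest() != digest:
                raise ValueError(f"chunk {index} failed verification")
            cache[digest] = chunk  # keep only chunks that validated
        parts.append(chunk)
    return b"".join(parts)
```

If `download` raises partway through (for example because the connection drops inside `fetch_chunk`), calling it again with the same `cache` fetches only the chunks that were not stored yet.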

I feel such a solution will ease adoption of Docker in areas where internet connectivity is not spectacular, and heavily reduce load on registries.

The chunked solution also allows you to easily round robin on the registry side to increase reliability. It also makes it easier to handle mirroring through CDNs using origin pull.

Thoughts?

damm commented Mar 26, 2014

I would propose we raise the image chunk size to around, say, 1-3 megabytes. Let's give the kernel a chance to slide the window far enough that it can actually push data as fast as possible.

Also, if we're going this way, we should likely thread the chunk pushing (3 seems small; some browsers do 4 up to 8, but we don't want to DDoS ourselves or slow ourselves down either).

1-3MB would be fine, but keep in mind that with TCP/HTTP keepalive the TCP window size will keep growing across multiple chunks.

Concurrency should be determined solely based on capacity. I guess the server should be allowed to override it and tell clients to back down.

damm commented Mar 26, 2014

Right, but capacity is hard; that's where flow control comes in.
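
For illustration, the concurrency idea discussed in this exchange (a client default of 3 that the server may lower) might look something like the Python sketch below. `effective_concurrency`, `fetch_all`, `fetch_chunk`, and `server_limit` are hypothetical names made up for this example; how a registry would actually signal a limit is not specified anywhere in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def effective_concurrency(client_setting: int, server_limit=None) -> int:
    """Use the client's setting unless the server asks for fewer connections."""
    if server_limit is None:
        return max(1, client_setting)
    return max(1, min(client_setting, server_limit))


def fetch_all(indices, fetch_chunk, concurrency: int = 3) -> list:
    """Fetch chunks with at most `concurrency` requests in flight.

    fetch_chunk(index) is a hypothetical network call; results are
    returned in the same order as `indices`.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch_chunk, indices))
```

A client configured for 8 connections that is told to back down to 2 would call `fetch_all(indices, fetch_chunk, effective_concurrency(8, server_limit=2))`.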

perpi commented Mar 28, 2014

Which app do you use for fetching images in Docker? Here, we don't have problems downloading large files using well-known download managers such as wget, aria2c, xdman, etc.

Contributor

unclejack commented Mar 28, 2014

Image pulling is being fixed as part of #2461.

perpi commented Mar 30, 2014

@unclejack
So why do I get this error:

WARNING: No swap limit support
Unable to find image 'samsaffron/discourse:0.1.2' locally
Pulling repository samsaffron/discourse
9dfbb44c55ff: Error pulling image (0.1.2) from samsaffron/discourse, read tcp 198.41.189.230:443: connection reset by peer 
8dbd9e392a96: Download complete 
21a54dd8e905: Download complete 
535e9f84ec37: Error downloading dependent layers 
2014/03/30 23:14:13 Could not find repository on any of the indexed registries.
Your Docker installation is not working correctly
See: https://meta.discourse.org/t/docker-error-on-bootstrap/13657/18?u=sam

?

d11wtq commented Mar 31, 2014

Yep, I'm still seeing this in 0.9.1. If you keep trying, it will download fine, but you chew through a bit of bandwidth trying repeatedly until it works.

vagrant@gentoo ~ $ docker pull d11wtq/redis
Pulling repository d11wtq/redis
ea76bcf23770: Error pulling image (latest) from d11wtq/redis, unexpected EOF
pected EOF 6: Download complete
fb65bcbb3dfd: Download complete
7181e4a9197f: Download complete
63c411d0656d: Download complete
c270a1a4f4db: Download complete
f5730325c9da: Download complete
ffd8bd48f3cf: Download complete
65277b5346cc: Error downloading dependent layers
2014/03/31 23:45:48 Could not find repository on any of the indexed registries.

d11wtq commented Mar 31, 2014

I like the proposed solution to this problem. It seems a lot like how BitTorrent downloads files in chunks.

Contributor

unclejack commented Apr 1, 2014

@diff- That problem is being worked on for issue #2461.

All pull related issues with errors which contain "EOF" in them should be discussed in #2461.
I'll change the title of this topic to make it clear this issue is going to be just for push.

@unclejack unclejack changed the title from Resumable downloads and uploads from registry to Resumable uploads to the registry Apr 1, 2014

scarolan commented Apr 4, 2014

Please implement this if possible. I've got a ~660MB image that fails around 50-100MB into the upload every time. Resumable uploads would be a huge help.

Proposed solution seems good to me. I'd love to see this make it into Docker soon.

@shin- shin- added the Distribution label Jul 1, 2014

+1

@icecrime icecrime removed the dist/registry label Jul 17, 2015

Member

runcom commented Apr 1, 2016

@aaronlehmann can we close this now that we have resumable upload/download?

Contributor

aaronlehmann commented Apr 1, 2016

@runcom: We only have resumable download at present, not resumable upload.

Member

runcom commented Apr 1, 2016

Right :)

I'm having lots of problems with this - it takes me several days to complete a docker push.

It would be a great fix if it was possible to just split the upload files.

I have a timeout every 19 minutes. If I could split the bigger files, they could complete before the timeout.
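
As a rough back-of-the-envelope sketch of that idea in Python: given an uplink speed and a connection timeout, the largest piece that can finish in time is roughly speed times timeout, scaled down by a safety margin. The function names, the 100 KB/s uplink, the 2GB file size, and the 0.5 safety factor below are assumptions for illustration only; only the 19 minute timeout comes from the comment above.

```python
import math


def max_chunk_bytes(upload_bytes_per_sec: float, timeout_sec: float, safety: float = 0.5) -> int:
    """Largest chunk expected to finish within the timeout, with headroom."""
    return int(upload_bytes_per_sec * timeout_sec * safety)


def chunk_count(total_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to cover the whole file."""
    return math.ceil(total_bytes / chunk_bytes)


timeout = 19 * 60                          # 19 minutes, from the comment above
chunk = max_chunk_bytes(100_000, timeout)  # assumed 100 KB/s uplink
pieces = chunk_count(2_000_000_000, chunk) # assumed 2GB file
print(chunk, pieces)
```

With these assumed numbers, each piece is about 57MB and takes roughly half the timeout window to send.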

@vikstrous vikstrous referenced this issue Nov 6, 2016

Closed

WIP: Cache partial downloads from docker pull #28106

7 of 10 tasks complete