Turn on runtime sharding by default #75

Closed
ericmjonas opened this issue Feb 28, 2017 · 4 comments

@ericmjonas ericmjonas commented Feb 28, 2017

  1. This should probably be on by default

  2. It's not clear if it should really be an argument to the executor or in the config or what.

@shivaram shivaram commented Mar 9, 2017

As part of this, investigate whether we can move the runtime tar.gz to CloudFront, and check whether that affects pricing.

@shivaram shivaram self-assigned this Mar 9, 2017

@shivaram shivaram commented Mar 9, 2017

As @ooq pointed out, we should move this flag to the config file so that new runtimes and the Python code don't have to worry about this decision.

@ericmjonas ericmjonas commented Mar 10, 2017

@shivaram I was thinking that, when we update the runtime meta information we publish, we could also include a list of URLs for the actual runtime: basically a mirrors key that contains the paths to the ~50 (or however many) actual S3 objects. The client would then pick one at random (or round-robin). That way, the default behavior would be to download the meta and then pick from within that list.
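
A minimal sketch of the client-side selection described above. The layout of the meta document, the `mirrors` key name as written here, the bucket names, and both helper functions are assumptions made for illustration, not the project's actual schema or API.

```python
import random
from itertools import cycle

# Hypothetical runtime meta document. The "mirrors" key and the URLs below
# are illustrative assumptions, not the real published metadata.
runtime_meta = {
    "runtime_name": "example_runtime",
    "mirrors": [
        "s3://example-bucket/shard-{}/runtime.tar.gz".format(i)
        for i in range(50)
    ],
}


def pick_mirror_random(meta, rng=random):
    """Return one mirror URL chosen uniformly at random."""
    mirrors = meta.get("mirrors")
    if not mirrors:
        raise ValueError("runtime meta contains no mirrors")
    return rng.choice(mirrors)


def mirror_round_robin(meta):
    """Return an endless iterator that walks the mirrors in order."""
    mirrors = meta.get("mirrors")
    if not mirrors:
        raise ValueError("runtime meta contains no mirrors")
    return cycle(mirrors)


if __name__ == "__main__":
    # Random choice: each worker picks independently.
    print(pick_mirror_random(runtime_meta))

    # Round-robin: successive calls step through the list and wrap around.
    rr = mirror_round_robin(runtime_meta)
    print(next(rr))
    print(next(rr))
```

Random choice needs no shared state between workers, which suits independent function invocations. Round-robin only spreads load evenly if a single client hands out the URLs.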

@ooq ooq closed this in #88 Mar 16, 2017

@ooq ooq commented Apr 1, 2017

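| workers | no-sharding | sharding | cloudfront | via s3.amazonaws.com |
|---------|-------------|----------|------------|----------------------|
| 100     | 12s         | 12.9s    | 18s        | 7s                   |
| 1000    | 44s         | 18.6s    | 101s       | 31s                  |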

Some totally non-scientific numbers for runtime download time that you might be interested in. @ericmjonas @shivaram
Done in 1 run, so no error bars. It's a totally unfair comparison, given that CloudFront is designed to scale with respect to "edges".
Right now S3 sharding gives the best performance.
