Turn on runtime sharding by default #75

Closed
ericmjonas opened this Issue Feb 28, 2017 · 4 comments

@ericmjonas
Collaborator

ericmjonas commented Feb 28, 2017

  1. This should probably be on by default

  2. It's not clear if it should really be an argument to the executor or in the config or what.

@shivaram

Collaborator

shivaram commented Mar 9, 2017

As part of this, investigate whether we can move the runtime tar.gz to CloudFront, and check whether that affects pricing.

@shivaram shivaram self-assigned this Mar 9, 2017

@shivaram

Collaborator

shivaram commented Mar 9, 2017

As @ooq pointed out, we should move this flag to the config file so that new runtimes and the Python code don't have to worry about this decision.
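A minimal sketch of what reading such a flag from the config could look like. The section name `runtime`, the key `sharding`, and the helper `runtime_sharding_enabled` are all hypothetical names made up for illustration, not the project's actual config schema; the only thing taken from this thread is that the flag lives in the config and defaults to on.

```python
# Hypothetical default: sharding is on unless the config says otherwise.
DEFAULT_RUNTIME_SHARDING = True


def runtime_sharding_enabled(config):
    """Return whether runtime sharding is enabled for this config dict.

    `config` is assumed to be an already-parsed config (a plain dict).
    The 'runtime' section and 'sharding' key are illustrative names only.
    """
    runtime_section = config.get("runtime") or {}
    return bool(runtime_section.get("sharding", DEFAULT_RUNTIME_SHARDING))


# A config that says nothing gets the default (on).
print(runtime_sharding_enabled({}))
# An explicit opt-out in the config turns it off.
print(runtime_sharding_enabled({"runtime": {"sharding": False}}))
```

With this shape, neither the executor's arguments nor the runtime itself has to carry the decision; callers only consult the config.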

@ericmjonas

Collaborator

ericmjonas commented Mar 10, 2017

@shivaram I was thinking that, when updating the runtime meta information that we put out there, we could also include a list of URLs for the actual runtime: basically a mirrors key that contains the paths to the ~50 (or however many) actual S3 objects, and have the client select one randomly (or round-robin). That way, the default behavior would be to download the meta and then pick from within that list.
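The client-side selection described above might be sketched roughly as follows. Only the `mirrors` key comes from the comment; the meta layout, the example bucket paths, and the function names are assumptions made up for illustration.

```python
import itertools
import random


def _mirrors(runtime_meta):
    """Return the list under the 'mirrors' key, or raise if it is missing or empty."""
    mirrors = runtime_meta.get("mirrors") or []
    if not mirrors:
        raise ValueError("runtime meta has no 'mirrors' entries")
    return list(mirrors)


def pick_random_mirror(runtime_meta, rng=random):
    """Pick one runtime URL uniformly at random from the mirrors list."""
    return rng.choice(_mirrors(runtime_meta))


def round_robin_mirrors(runtime_meta):
    """Return an endless iterator that cycles through the mirrors in order."""
    return itertools.cycle(_mirrors(runtime_meta))


# Hypothetical meta with three shards; a real list would hold ~50 entries.
example_meta = {
    "mirrors": [
        "s3://example-bucket/runtime-shard-{:02d}.tar.gz".format(i)
        for i in range(3)
    ]
}

print(pick_random_mirror(example_meta))
```

Random selection needs no shared state between workers, which is probably the simpler default; round-robin only spreads load evenly if something hands out the positions centrally.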

@ooq ooq closed this in #88 Mar 16, 2017

@ooq

Collaborator

ooq commented Apr 1, 2017

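| workers | no-sharding | sharding | cloudfront | via s3.amazonaws.com |
|---------|-------------|----------|------------|----------------------|
| 100     | 12s         | 12.9s    | 18s        | 7s                   |
| 1000    | 44s         | 18.6s    | 101s       | 31s                  |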

Some totally non-scientific numbers for runtime download time that you might be interested in. @ericmjonas @shivaram
Done in 1 run, so no error bars. It's a totally unfair comparison, given that CloudFront is designed to scale with respect to "edges".
Right now S3 sharding gives the best performance at 1000 workers.
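For anyone who wants to compare the options, here is a small sketch that works out which one is fastest at each worker count and how much each slows down going from 100 to 1000 workers. The numbers are copied from the single-run table above, so the same caveats apply; the function names are illustrative.

```python
# Download times in seconds, copied from the table above (one run each).
timings_s = {
    100: {"no-sharding": 12.0, "sharding": 12.9,
          "cloudfront": 18.0, "via s3.amazonaws.com": 7.0},
    1000: {"no-sharding": 44.0, "sharding": 18.6,
           "cloudfront": 101.0, "via s3.amazonaws.com": 31.0},
}


def fastest(workers):
    """Return the option with the lowest download time at this worker count."""
    row = timings_s[workers]
    return min(row, key=row.get)


def slowdown(option):
    """Ratio of the 1000-worker time to the 100-worker time for one option."""
    return timings_s[1000][option] / timings_s[100][option]


for option in timings_s[100]:
    print("{}: {:.2f}x slower at 1000 workers".format(option, slowdown(option)))
print("fastest at 100:", fastest(100))
print("fastest at 1000:", fastest(1000))
```

The slowdown ratio is the more telling figure here: sharding is not the fastest option at 100 workers, but its time grows the least as the worker count goes up.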
