New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move runtime sharding into config variable #88

Merged
merged 9 commits into from Mar 16, 2017

Conversation

Projects
None yet
3 participants
@shivaram
Collaborator

shivaram commented Mar 10, 2017

This closes #75

@shivaram

This comment has been minimized.

Collaborator

shivaram commented Mar 10, 2017

Couple of things that came to my mind while making this change

  • While we generate the default config, should we set num_shards to 50 ?
  • We need to document what all the valid config variables are and where they are used. I will do this as a part of #86
@ooq

This comment has been minimized.

Collaborator

ooq commented Mar 10, 2017

I actually meant that num_shards should completely be in [runtime_name].meta.json that is stored in S3 and the client code should be completely hidden from sharding.
This would also make switching between runtimes easier.

@shivaram

This comment has been minimized.

Collaborator

shivaram commented Mar 10, 2017

Ah I see. Ok - @ericmjonas Is that what you had in mind as well ? I can make that change but I think we need to be careful then to make sure runtimes generated before pywren 0.2 are still compatible etc.

@ericmjonas

This comment has been minimized.

Collaborator

ericmjonas commented Mar 10, 2017

@shivaram

This comment has been minimized.

Collaborator

shivaram commented Mar 10, 2017

Yep - I will try to do that. It'll require a change to the runtimes repo, but it should be doable.

@shivaram

This comment has been minimized.

Collaborator

shivaram commented Mar 12, 2017

@ericmjonas @ooq I updated this, but this depends on pywren/runtimes#2 -- So I'd suggest reviewing that first.

This is backwards compatible in that if the metadata doesn't have a urls field we just fall back to the base runtime url. The one catch though is that all the lambdas will need to access the metadata. I considered shipping that as a part of the job description, but I was worried if it would be too big for the 128KB limit ?

@ericmjonas

This comment has been minimized.

Collaborator

ericmjonas commented Mar 12, 2017

@shivaram I'm a little confused here why the lambdas need access to the metadata, can't we just pass them a runtime key as part of the invoker data?

@ericmjonas

This comment has been minimized.

Collaborator

ericmjonas commented Mar 12, 2017

If we have the executor, upon creation, read the meta.json from s3, then we will:

  1. handle serialization better in the future (As we will know at the executor level which packages are remote installed)
  2. be able to just pass the lambdas a straight-up s3 URL and not have them worry about randomly selecting which one to use, and thus they will never have to hit meta.json
@shivaram

This comment has been minimized.

Collaborator

shivaram commented Mar 12, 2017

@ericmjonas Your comment totally makes sense. Updated now

@shivaram

This comment has been minimized.

Collaborator

shivaram commented Mar 12, 2017

Also I'll add the module check in a separate PR. I still need to understand the cloudpickle code a bit more

@shivaram

Couple of minor comments. However we still need to wait to build a runtime with the new urls field.

@ooq Is it possible for you to build a runtime that has pywren/runtimes#2 ?

job_max_runtime, shard_runtime=shard_runtime)
job_max_runtime)
standalone_executor = remote_executor

This comment has been minimized.

@shivaram

shivaram Mar 15, 2017

Collaborator

is this line necessary ? just checking as it wasn't in the original PR if i'm right

This comment has been minimized.

@ooq

ooq Mar 15, 2017

Collaborator

I think I removed it during refactoring.
Piggybacking on your PR to add it back...(bad practice)

@@ -6,6 +6,9 @@
import pickle
from tblib import pickling_support
import logging
import botocore

This comment has been minimized.

@shivaram

shivaram Mar 15, 2017

Collaborator

are these imports related to this PR ?

This comment has been minimized.

@ooq

ooq Mar 15, 2017

Collaborator

No. Don't know how it slips in. Now almost every file in the project has unused imports. I'm planning to get rid off them in the PyLint PR, which is coming very soon.

@ooq ooq merged commit 2eeff66 into master Mar 16, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@Vaishaal Vaishaal deleted the sharding-config branch Nov 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment