
[Issue#108] Refactor Storage API #119

Merged
merged 22 commits into from Jul 2, 2017

3 participants
@ooq
Collaborator

ooq commented Apr 9, 2017

This PR:
1. Creates a Storage class as a middle layer that separates PyWren core from S3.
2. Creates S3Service, which wraps the boto3 S3 APIs. S3Service reuses the same S3 client, per #109.
3. Adds pywren.save_futures_to_string, pywren.load_futures_from_string, pywren.save_futures_to_file, and pywren.load_futures_from_file.
4. In wrenhandler.py, uses #23.
5. Removes s3util.py.
6. Configuration: each object/method should be exposed only to the configuration it needs. For example, the config used to create Storage should be generated by wrenconfig.extract_storage_config(config), and config['runtime'] rather than the whole config should be passed to runtime.get_runtime_info(config['runtime']).
7. Adds documentation for the new classes and methods.
8. Adds no new tests.
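A minimal sketch of the layering described in items 1, 2 and 6, under stated assumptions: the storage config is a dict whose keys (`storage_backend`, `storage_prefix`, `backend_config`) are hypothetical, and the method names `put_object`/`get_object`/`put_data`/`get_data` are illustrative rather than necessarily the PR's. Only `get_storage_config()` is taken from the diff below. The S3 client is injected so that a single client is reused across calls (the point of #109) and so the sketch can run against a fake client.

```python
class S3Service:
    """Thin wrapper over a boto3-style S3 client (hypothetical sketch).

    The same client instance is reused for every call instead of
    creating a new one per request.
    """

    def __init__(self, bucket, client):
        self.bucket = bucket
        self.client = client

    def put_object(self, key, data):
        # boto3's S3 client exposes put_object(Bucket=..., Key=..., Body=...)
        self.client.put_object(Bucket=self.bucket, Key=key, Body=data)

    def get_object(self, key):
        # get_object returns a dict whose "Body" is a stream with .read()
        return self.client.get_object(Bucket=self.bucket, Key=key)["Body"].read()


class Storage:
    """Backend-agnostic middle layer between PyWren core and a storage service."""

    def __init__(self, storage_config, client=None):
        self.storage_config = storage_config
        self.prefix = storage_config["storage_prefix"]
        backend = storage_config["storage_backend"]
        if backend == "s3":
            if client is None:
                import boto3  # only needed when no client is injected
                client = boto3.client("s3")
            bucket = storage_config["backend_config"]["bucket"]
            self.backend_handler = S3Service(bucket, client)
        else:
            raise NotImplementedError("unsupported storage backend: {}".format(backend))

    def get_storage_config(self):
        return self.storage_config

    def put_data(self, key, data):
        # Core code only talks to Storage; keys are namespaced under the prefix
        self.backend_handler.put_object(self.prefix + "/" + key, data)

    def get_data(self, key):
        return self.backend_handler.get_object(self.prefix + "/" + key)
```

To exercise it without AWS, pass any object with `put_object`/`get_object` methods as `client`. Adding another backend would mean adding another service class and another branch in `Storage.__init__`, without touching PyWren core.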

@shivaram @ericmjonas

@ooq

ooq Apr 12, 2017

Collaborator

Okay, I'll make changes based on yesterday's discussion with @ericmjonas.

For the record, below is a performance graph for #109:
[Image: performance graph for #109]

@ooq

ooq Apr 12, 2017

Collaborator

Done.
#119 should be resolved.

@shivaram

Thanks @ooq - I did one pass and had some inline comments.

@@ -103,10 +100,10 @@ def result(self, timeout=None, check_only=False, throw_except=True):
else:
return None
if storage_handler is None:

@shivaram

shivaram Apr 17, 2017

Collaborator

Did we reach consensus on which model we want to support? The other option was to pass in the storage_handler when constructing the future, which I think avoids some of the failure scenarios.

@ooq

ooq Apr 17, 2017

Collaborator

Yes, Eric and I agreed that futures should be serializable. Storing the storage handler as a field of the future makes the future unserializable; my previous solution was a save() method that first sets future.storage_handler = None so that the future can be serialized. We then agreed that making pickle.dump(futures) work is also important.
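A minimal sketch of the model described here, under stated assumptions: the future keeps only plain fields, and the storage handler is passed to `result()` (or created on demand) instead of being stored on the future. `FakeStorage`, `get_call_output`, the simplified `ResponseFuture` signature, and the base64-over-pickle encoding used for `save_futures_to_string`/`load_futures_from_string` are illustrative assumptions, not the PR's code. A `threading.Lock` stands in for a live client because, like many network clients, it cannot be pickled.

```python
import base64
import pickle
import threading


class FakeStorage:
    """Hypothetical storage handler; the lock mimics an unpicklable live client."""

    def __init__(self, results=None):
        self._lock = threading.Lock()
        self._results = results or {}

    def get_call_output(self, callset_id, call_id):
        with self._lock:
            return self._results.get((callset_id, call_id))


class ResponseFuture:
    """Simplified future holding only picklable state (no storage handler)."""

    def __init__(self, call_id, callset_id, storage_path):
        self.call_id = call_id
        self.callset_id = callset_id
        self.storage_path = storage_path

    def result(self, storage_handler=None):
        # The handler is supplied per call (or created on demand), never stored.
        if storage_handler is None:
            storage_handler = FakeStorage()
        return storage_handler.get_call_output(self.callset_id, self.call_id)


def save_futures_to_string(futures):
    # Assumed encoding: pickle, then base64 so the result is a plain str.
    return base64.b64encode(pickle.dumps(futures)).decode("ascii")


def load_futures_from_string(s):
    return pickle.loads(base64.b64decode(s))
```

Because nothing unpicklable lives on the future, a plain `pickle.dump(futures, f)` works as well. By contrast, attaching a handler (`fut.storage_handler = FakeStorage()`) makes `pickle.dumps(fut)` raise `TypeError`, which is the failure the save()-based workaround had to avoid.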

@@ -37,7 +36,17 @@
PROCESS_STDOUT_SLEEP_SECS = 2
def download_runtime_if_necessary(s3conn, runtime_s3_bucket, runtime_s3_key):
def get_key_size(s3client, bucket, key):

@shivaram

shivaram Apr 17, 2017

Collaborator

Just to clarify, this is S3-specific right now?

@ooq

ooq Apr 17, 2017

Collaborator

Yes.
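Since the helper is S3-specific, here is a hedged sketch of what `get_key_size(s3client, bucket, key)` might look like with a boto3 S3 client: `head_object` returns the object's metadata, including `ContentLength`, and a missing key surfaces as a botocore `ClientError` whose error code is `"404"`. Returning `None` for a missing key is an assumption about the intended behavior. The sketch inspects the exception's `.response` attribute so it runs without botocore installed; in real code, catching `botocore.exceptions.ClientError` is the more idiomatic choice.

```python
def get_key_size(s3client, bucket, key):
    """Return the size in bytes of s3://bucket/key, or None if it does not exist.

    Hypothetical sketch: s3client is assumed to be a boto3 S3 client.
    """
    try:
        meta = s3client.head_object(Bucket=bucket, Key=key)
    except Exception as e:
        # botocore's ClientError exposes the parsed error as e.response;
        # a HEAD request on a missing key reports the code "404".
        response = getattr(e, "response", None) or {}
        if response.get("Error", {}).get("Code") in ("404", "NoSuchKey"):
            return None
        raise  # permission errors, throttling, etc. still propagate
    return meta["ContentLength"]
```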

@ooq

ooq Apr 17, 2017

Collaborator

@shivaram @ericmjonas Comments addressed.

@shivaram

@ooq Apologies for the delay. I just had a couple of high-level comments that I think are worth thinking about.

@@ -46,7 +55,7 @@ def download_runtime_if_necessary(s3conn, runtime_s3_bucket, runtime_s3_key):
"""
# get runtime etag
runtime_meta = s3conn.meta.client.head_object(Bucket=runtime_s3_bucket,
runtime_meta = s3_client.head_object(Bucket=runtime_s3_bucket,

@shivaram

shivaram Apr 28, 2017

Collaborator

It would be good to think about which parts of wrenhandler cannot be handled by our existing storage API. I guess right now this includes head_object, download_file and upload_file?
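For reference, a hedged sketch of the ETag-based pattern that `download_runtime_if_necessary` follows (per the `# get runtime etag` step in the diff), showing where two of these S3-specific calls come in: `head_object` reads the runtime's current ETag, and `download_file` fetches the archive only when the cached copy is stale. The cache layout (an `.etag` file next to the archive) and the `cache_dir` parameter are assumptions for illustration; `s3_client` is assumed to be a boto3 S3 client, whose `download_file(Bucket, Key, Filename)` method does exist.

```python
import os


def download_runtime_if_necessary(s3_client, runtime_s3_bucket, runtime_s3_key,
                                  cache_dir="/tmp/runtime_cache"):
    """Download the runtime archive only if its ETag changed (hypothetical sketch).

    Returns True if a download happened, False if the cached copy was current.
    """
    # get runtime etag (an S3-specific call not covered by the Storage API)
    runtime_meta = s3_client.head_object(Bucket=runtime_s3_bucket, Key=runtime_s3_key)
    etag = runtime_meta["ETag"].strip('"')  # S3 returns the ETag in quotes

    os.makedirs(cache_dir, exist_ok=True)
    archive_path = os.path.join(cache_dir, "runtime.tar.gz")
    etag_path = os.path.join(cache_dir, "runtime.etag")

    if os.path.exists(archive_path) and os.path.exists(etag_path):
        with open(etag_path) as f:
            if f.read().strip() == etag:
                return False  # cached runtime is up to date

    # another S3-specific call: stream the object straight to a local file
    s3_client.download_file(runtime_s3_bucket, runtime_s3_key, archive_path)
    with open(etag_path, "w") as f:
        f.write(etag)
    return True
```

A storage API that absorbed this logic would need backend-neutral equivalents of "get object version/metadata" and "download to a local file".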

@shivaram

shivaram Jun 19, 2017

Collaborator

Did we reach any longer-term consensus on this? If not, we can talk about it this week.

@shivaram

shivaram Jun 19, 2017

Collaborator

@ooq Can you rebase on the latest fix?

@shivaram

Most of the changes look good. I think adding the storage path check and separating the storage backend API have resolved most of the big questions I had. I left some small comments; we can merge this once tests pass.

@@ -2,6 +2,9 @@
import boto3
import json
from pywren.storage.storage_utils import create_func_key

@shivaram

shivaram Jun 19, 2017

Collaborator

Sort this and put it at the bottom? Or we can do this in the style-check PR.

@ooq

ooq Jun 20, 2017

Collaborator

Let's do this in the style-check PR.

storage_config = wrenconfig.extract_storage_config(wrenconfig.default())
storage_handler = storage.Storage(storage_config)
storage_utils.check_storage_path(storage_handler.get_storage_config(), self.storage_path)
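The idea behind this check, as a minimal sketch: a future records the storage path it was created under, and before reading results the current config is compared against it so a mismatch fails early instead of looking for output in the wrong bucket or prefix. The config keys, the list form of the path, and `StorageConfigMismatchError` are hypothetical; only the function names `get_storage_path` and `check_storage_path` come from the diff.

```python
class StorageConfigMismatchError(Exception):
    """Raised when a future was created under a different storage location."""


def get_storage_path(storage_config):
    # A storage path identifies where job data lives: backend, bucket, prefix.
    if storage_config["storage_backend"] != "s3":
        raise NotImplementedError("only the s3 backend is sketched here")
    return [storage_config["storage_backend"],
            storage_config["backend_config"]["bucket"],
            storage_config["storage_prefix"]]


def check_storage_path(storage_config, prev_path):
    # Fail early rather than silently polling the wrong location for results.
    current_path = get_storage_path(storage_config)
    if current_path != prev_path:
        raise StorageConfigMismatchError(
            "current storage path {} does not match the future's {}".format(
                current_path, prev_path))
```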

@shivaram

shivaram Jun 19, 2017

Collaborator

Nice!

@ooq

ooq Jun 21, 2017

Collaborator

Thanks for the comments, @shivaram. The PR is updated.

@ooq

ooq Jun 21, 2017

Collaborator

@apengwin Can you also take a look at this?


if not runtime.runtime_key_valid(self.runtime_meta_info):
raise Exception("The indicated runtime: s3://{}/{} is not appropriate for this python version".format(runtime_s3_bucket, runtime_s3_key))
self.config = config
self.storage_config = wrenconfig.extract_storage_config(self.config)

@apengwin

apengwin Jun 21, 2017

Contributor

Can we formalize what config looks like? Also, it might be easier to have just one self.config instead of both self.storage_config and self.config.

@ooq

ooq Jun 21, 2017

Collaborator

I have wanted to formalize and document the config for quite some time. For now, it is structured and parsed so as to stay backward-compatible with .pywren_config. Let's formalize the config in a separate PR.
As for self.storage_config, the idea is to make the config more "hierarchical" and expose to each class only the parameters related to it.
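To illustrate the hierarchical idea, a sketch of `wrenconfig.extract_storage_config(config)` that derives a storage-only config from the full config, so that `Storage` never sees unrelated sections. The input layout (an `s3` section with `bucket` and `pywren_prefix`, alongside `account` and `runtime` sections) is an assumption modeled loosely on `.pywren_config`, and the output keys are likewise hypothetical.

```python
def extract_storage_config(config):
    """Return only the storage-related part of a full PyWren-style config (sketch)."""
    return {
        "storage_backend": "s3",  # the only backend at this point
        "storage_prefix": config["s3"]["pywren_prefix"],
        "backend_config": {
            "bucket": config["s3"]["bucket"],
        },
    }
```

In the same spirit, only `config['runtime']` would be handed to `runtime.get_runtime_info`, as item 6 of the PR description proposes.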

@apengwin

apengwin Jun 21, 2017

Contributor

This is kind of petty, but it's confusing to have multiple config variables that all differ. Should we rename this to something like storage_info?

@@ -17,6 +17,8 @@ class SimpleAsync(unittest.TestCase):
Test sqs dispatch but with local runner
"""
def setUp(self):

@apengwin

apengwin Jun 21, 2017

Contributor

Should this be set_up?

@ooq

ooq Jun 21, 2017

Collaborator

We do have some inconsistent formatting now that needs to be fixed.
For the purpose of this PR, can you just review the changes?
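On the `set_up` question specifically: `unittest.TestCase` only calls the hook named `setUp` (camelCase) before each test, so renaming it to `set_up` would silently stop the fixture from running, whatever a snake_case style check prefers. A small self-contained demonstration (the class names are illustrative):

```python
import unittest


class CamelCase(unittest.TestCase):
    def setUp(self):
        # Called automatically by unittest before each test method
        self.ready = True

    def test_ready(self):
        self.assertTrue(self.ready)


class SnakeCase(unittest.TestCase):
    def set_up(self):
        # Never called: unittest does not recognize this name as a fixture
        self.ready = True

    def test_ready(self):
        self.assertFalse(hasattr(self, "ready"))
```

Running both classes with `python -m unittest` passes, which confirms that `set_up` is never invoked.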

@shivaram

Thanks @ooq, and sorry for the delay in reviewing. I had a couple of small nits, but otherwise this looks good. Let's merge this and fix things like style in follow-ups.

fut = ResponseFuture(call_id, callset_id, host_job_meta,
self.s3_bucket, self.s3_prefix,
self.aws_region)
storage_path = storage_utils.get_storage_path(self.storage_config)

@shivaram

shivaram Jun 30, 2017

Collaborator

nit: This can probably be done at the beginning, where we construct storage_config?


ooq merged commit 1fe6c9b into master on Jul 2, 2017.

1 check was pending

continuous-integration/travis-ci/push The Travis CI build is in progress