Google Storage file access add-on for cohorts (a.k.a. minibucket) #219

armish · 2017-06-26T13:44:41Z

@jburos: This adds a really light-weight but functional module to the cohorts so that instead of dealing with all the NFS setups and remote machine admin work, we can directly make use of files on the bucket for relatively small tasks. I don't think this solution will scale up to BAMs, but VCFs should be fine to handle this way.

Here is a notebook that goes over the basic functionality really quick:
https://gist.github.com/armish/b4994d51775a390fb1057d80501683d3

@timodonnell @ryan-williams @iskandr: I know TensorFlow has its own shady integration that is pretty smooth on the App/GPU Engines but doesn't work on our locals and I think Scala/Spark world is enjoying the DataFlow (Apache Beam) utilities for such integration, but let me know if there is a better way of doing this.

…pport

coveralls · 2017-06-26T14:03:15Z

Coverage decreased (-2.5%) to 54.427% when pulling 64a26a2 on gcloud into 31545c4 on master.

Apparently the Abstract interface has just been introduced so it breaks earlier 3.x Python releases. Falling back to good old object for now.

jburos · 2017-06-26T14:21:40Z

Looks great @armish, thanks so much. This will be super helpful.

Only thing I noticed is that this will not work for python <= 3.6 -- the AbstractContextManager class isn't available for earlier versions. It's a bit of a shame since we're using 3.5 for reticulate compatibility.

At any rate, should we make this requirement explicit for Cohorts to avoid errors with earlier versions of python? Or, is there a way to introduce this functionality optionally? Open to suggestions.

armish · 2017-06-26T14:21:49Z

@jburos: oh, and I forgot to mention: cohorts still doesn't know about this new addition. This is on purpose since we should better do the integration slowly and carefully and make sure the additional support doesn't bite us for the upcoming wrap-ups, especially during these days/weeks.

So, consider this as being 1/N of minor refactorings coming in and definitely a non-urgent PR to CR. Completely fine with me to keep stacking them up until Tavi comes back and the projects start running a little bit slower.

jburos · 2017-06-26T14:23:09Z

@armish nvm see that you fixed the abstractManagerClass issue. What do you think - should we start working in a develop branch for the accumulation of these minor PRs?

coveralls · 2017-06-26T14:33:26Z

Coverage decreased (-2.5%) to 54.427% when pulling bf53812 on gcloud into 31545c4 on master.

jburos

LGTM, once you resolve the travis error on python 3.4 it's OK to merge.

coveralls · 2017-06-26T20:10:01Z

Coverage decreased (-2.5%) to 54.427% when pulling aa0fd4e on gcloud into 31545c4 on master.

tavinathanson · 2017-07-17T22:48:22Z

Great stuff, @armish!

add support for fake-but-efficient Google Bucket file up/down/edit su…

64a26a2

…pport

armish requested a review from jburos June 26, 2017 13:44

inherit from object instead of AbstractContextManager

bf53812

Apparently the Abstract interface has just been introduced so it breaks earlier 3.x Python releases. Falling back to good old object for now.

jburos approved these changes Jun 26, 2017

View reviewed changes

remove import of AbstractContextClass

aa0fd4e

armish mentioned this pull request Jul 9, 2017

Make cohorts more workflow-engine and cloud storage friendly #226

Open

3 tasks

armish merged commit f9d5fc7 into master Jul 9, 2017

armish deleted the gcloud branch July 9, 2017 09:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Google Storage file access add-on for cohorts (a.k.a. minibucket) #219

Google Storage file access add-on for cohorts (a.k.a. minibucket) #219

armish commented Jun 26, 2017 •

edited

Loading

coveralls commented Jun 26, 2017

jburos commented Jun 26, 2017

armish commented Jun 26, 2017

jburos commented Jun 26, 2017

coveralls commented Jun 26, 2017

jburos left a comment

coveralls commented Jun 26, 2017 •

edited

Loading

tavinathanson commented Jul 17, 2017

Google Storage file access add-on for cohorts (a.k.a. minibucket) #219

Google Storage file access add-on for cohorts (a.k.a. minibucket) #219

Conversation

armish commented Jun 26, 2017 • edited Loading

coveralls commented Jun 26, 2017

jburos commented Jun 26, 2017

armish commented Jun 26, 2017

jburos commented Jun 26, 2017

coveralls commented Jun 26, 2017

jburos left a comment

Choose a reason for hiding this comment

coveralls commented Jun 26, 2017 • edited Loading

tavinathanson commented Jul 17, 2017

armish commented Jun 26, 2017 •

edited

Loading

coveralls commented Jun 26, 2017 •

edited

Loading