Google Storage file access add-on for cohorts (a.k.a. minibucket) #219
Apparently `contextlib.AbstractContextManager` was only introduced in Python 3.6, so using it breaks earlier 3.x releases. Falling back to good old `object` for now.
Looks great @armish, thanks so much. This will be super helpful. The only thing I noticed is that this will not work for Python < 3.6: the `AbstractContextManager` class isn't available in earlier versions. It's a bit of a shame, since we're using 3.5 for reticulate compatibility. At any rate, should we make this requirement explicit for Cohorts to avoid errors with earlier versions of Python? Or is there a way to introduce this functionality optionally? Open to suggestions.
@jburos: Oh, and I forgot to mention: consider this 1/N of the minor refactorings coming in, and definitely a non-urgent PR to review. I'm completely fine with letting them stack up until Tavi comes back and the projects start running a little slower.
@armish nvm, I see that you fixed the `AbstractContextManager` issue. What do you think: should we start working in a
LGTM. Once you resolve the Travis error on Python 3.4, it's OK to merge.
Great stuff, @armish!
@jburos: This adds a really lightweight but functional module to Cohorts, so that instead of dealing with all the NFS setup and remote machine admin work, we can directly use files on the bucket for relatively small tasks. I don't think this solution will scale up to BAMs, but VCFs should be fine to handle this way.
Here is a notebook that quickly goes over the basic functionality:
https://gist.github.com/armish/b4994d51775a390fb1057d80501683d3
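For readers new to the idea, here is a minimal sketch of the general pattern (not the actual minibucket code): copy a `gs://` object into a temporary directory for the duration of a `with` block, then delete it. That is reasonable for VCF-sized files but not for BAMs. `parse_gs_url`, `gcs_download` and `LocalCopy` are hypothetical names; only the `google-cloud-storage` calls (`storage.Client`, `Client.bucket`, `Bucket.blob`, `Blob.download_to_filename`) are real library API. The downloader is a parameter so it can be swapped for a fake when no credentials are available.

```python
import os
import shutil
import tempfile


def parse_gs_url(url):
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError("not a gs:// URL: %r" % (url,))
    bucket, _, blob_path = url[len(prefix):].partition("/")
    if not bucket or not blob_path:
        raise ValueError("expected gs://bucket/object, got %r" % (url,))
    return bucket, blob_path


def gcs_download(bucket, blob_path, local_filename):
    """Download one object using the google-cloud-storage client library."""
    from google.cloud import storage  # imported lazily: optional dependency
    client = storage.Client()
    client.bucket(bucket).blob(blob_path).download_to_filename(local_filename)


class LocalCopy(object):
    """Hypothetical helper: expose a bucket object as a local file inside a
    with-block, then remove the temporary copy."""

    def __init__(self, url, download=gcs_download):
        self.bucket, self.blob_path = parse_gs_url(url)
        self.download = download  # injectable, e.g. a fake in tests
        self.tmpdir = None

    def __enter__(self):
        self.tmpdir = tempfile.mkdtemp()
        local_path = os.path.join(self.tmpdir, os.path.basename(self.blob_path))
        try:
            self.download(self.bucket, self.blob_path, local_path)
        except Exception:
            # __exit__ is not called when __enter__ raises, so clean up here.
            shutil.rmtree(self.tmpdir, ignore_errors=True)
            raise
        return local_path

    def __exit__(self, exc_type, exc_value, traceback):
        shutil.rmtree(self.tmpdir, ignore_errors=True)
        return False
```

Usage would look like `with LocalCopy("gs://my-bucket/sample.vcf") as path: ...`, where the bucket and object names are placeholders and `path` is an ordinary local file that any VCF reader can open.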
@timodonnell @ryan-williams @iskandr: I know TensorFlow has its own shady integration that is pretty smooth on the App/GPU Engines but doesn't work on our local machines, and I think the Scala/Spark world is enjoying the DataFlow (Apache Beam) utilities for this kind of integration, but let me know if there is a better way of doing this.