Improving pre-built DeepVariant binaries for conda packages #29
@chapmanb thanks for writing this up. I would prefer to get Thanks! |
Björn -- agreed. I tried to look into building CLIF but it was too intense (https://github.com/google/clif#building) and had to give up. Right now the pre-built version assumes unpacking into |
We understand that building CLIF is a tall order for our users, and we are putting effort into making it easier. |
-- Bazel goes from 0.8.1 to 0.9.0.
-- Both TensorFlows (custom whl and C++ library) go from 1.4.x to 1.5.0.
---- As a result, some tests update to absltest.main().
-- Numpy goes from 1.13 to 1.12 at the request of our users, to be more compatible with bioconda. (See discussion in #29.)
PiperOrigin-RevId: 182827446
Update (1) on the numpy version: it turns out getting back to numpy 1.12 is harder than I thought, because TensorFlow 1.4 requires numpy 1.14. When I try to revert the prereq script to numpy 1.12, I keep getting this message: which makes me uncomfortable pinning numpy back at 1.12. I did try changing the numpy version to 1.12 and building with build_release_binaries.sh. It seems to build, and the call_variants step (which is the main step that uses TensorFlow) seems to run at a similar speed. The hap.py numbers are exactly the same. |
Hi Pichuan and Brad, So it looks like Bioconda's requirement for numpy 1.12 originated from this issue: bioconda/bioconda-recipes#3961, which got merged into this PR: bioconda/bioconda-recipes#4888. But it seems the driver for the 1.12 version was CNVkit, which only requires >= 1.9 based on the setup here: https://github.com/etal/cnvkit/blob/master/setup.py#L19 Maybe updating Bioconda to 1.14 first would be a good start, to see if that PR passes Travis; otherwise, just update the DV scripts with a virtualenv instance so that their local environment remains pristine. The alternative is for folks to use it via the Docker image. What do you think? Hope it helps, |
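To make the pin reasoning above (CNVkit needs >= 1.9, bioconda pins 1.12, TensorFlow wants 1.14) easy to re-check, here is a tiny self-contained sketch; `satisfies` is a hypothetical helper written for this comment, not an API of pip, conda, or CNVkit:

```python
def parse(v):
    """Turn a version string like '1.12' into a comparable tuple (1, 12)."""
    return tuple(int(x) for x in v.split("."))

def satisfies(pinned, requirement):
    """Check a pinned version against a spec like '>=1.9' or '==1.14'."""
    op, wanted = requirement[:2], parse(requirement[2:])
    have = parse(pinned)
    if op == ">=":
        return have >= wanted
    if op == "==":
        return have == wanted
    raise ValueError("unsupported operator: %r" % op)

# CNVkit only needs numpy >= 1.9, so bumping bioconda's pin from 1.12
# to 1.14 would not break it, while satisfying TensorFlow's 1.14 need:
print(satisfies("1.14", ">=1.9"))   # True
print(satisfies("1.12", "==1.14"))  # False
```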
Pi-Chuan -- thanks so much for looking at this. Given that, and the issues with tensorflow pinning, I'd advise just sticking with 1.14 and we can work on the dependency issues in bioconda. Apologies, I hadn't meant to cause a lot of work and didn't realize the pinning prevented this. If we can get older-glibc-compatible binaries, that'll cover most of the issues, and we can work around the numpy problems by installing in an isolated conda environment for now. Thanks again for looking into the numpy and glibc work. |
Another update on the CLIF dependency: next week I'll push a 0.6.1 that has this under the tools/ directory. And I'll also see if I can figure out how to build it for CentOS6. |
@pichuan numpy should not be a problem. We can pin the recipe to 1.14 for this package if it's needed. |
Pi-Chuan -- thanks for this. We'd ideally build with CLIF directly in bioconda to avoid you needing to make these custom builds, but we will hold off on that until there is an easier-to-build/install CLIF dependency. Happy to test the new version with reduced glibc requirements when it's ready. Björn -- We do pin to 1.14 now in DeepVariant, with the downside that it's not compatible in a shared environment with other tools that pin to the bioconda 1.12 default. I can work around this for now by having DeepVariant in a separate environment, but would love to synchronize bioconda to 1.14 at some point. Thanks again for all this work and help. |
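The isolated-environment workaround mentioned above is short enough to sketch; the environment name here is made up, and the pins just follow the versions discussed in this thread:

```sh
# Hypothetical environment name; numpy 1.14 matches DeepVariant's pin
# while the rest of bioconda stays on its 1.12 default.
conda create -n deepvariant-env python=2.7 numpy=1.14
source activate deepvariant-env
```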
Hi Brad, |
Pi-Chuan -- sorry, that's right, we would want to build on CentOS6 to be compatible with bioconda. They have a restricted build environment for portability, so we'd need to have all the dependencies installable by bioconda (rather than as system packages). I had looked at this earlier, realized all the prerequisites involved, and got scared off tackling it. It's definitely a help to have that information, but I think it would still take a bit of work to port over. |
Update: I haven't looked more into the CentOS6 build. I'll send another update when I make progress on that. |
Hi @chapmanb , another update: I went through a lot of hacky steps and built CLIF. I'm not sure whether it's actually usable, so if you have a setup where you can quickly give it a try, that would be great. Here are the instructions on how to get
(I had to build with Python 2.7. I didn't figure out how to build with 2.6. Let me know if you actually need Python 2.6.) Once you do this, you can run
Please let me know once you have a chance to try it. |
And another thing I did was to build bazel 0.11.0 with the older GLIBC. On my CentOS 6 GCE instance:
I basically followed https://gist.github.com/truatpasteurdotfr/d541cd279b9f7bf38ce967aa3743dfcb , but used bazel version 0.11.0 instead. After this, I have a bazel 0.11.0:
I haven't tried building with it, though. |
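As an aside on checking glibc portability: the glibc a binary requires can be read from its versioned symbols (e.g. `objdump -T mybinary | grep GLIBC`). Below is a small sketch that parses such output; the sample lines are invented for illustration, not taken from an actual bazel binary. CentOS 6 ships glibc 2.12, so a binary referencing a higher symbol version will fail there:

```python
import re

def max_glibc_version(objdump_output):
    """Return the highest GLIBC_x.y symbol version referenced, as a tuple."""
    versions = set()
    for match in re.finditer(r"GLIBC_([0-9.]+)", objdump_output):
        versions.add(tuple(int(p) for p in match.group(1).split(".")))
    return max(versions) if versions else None

# Illustrative (invented) lines in the style of `objdump -T mybinary | grep GLIBC`:
sample = """
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 memcpy
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.14  memmove
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.17  clock_gettime
"""
print(max_glibc_version(sample))  # (2, 17) -> needs glibc >= 2.17, too new for CentOS 6
```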
Pichuan, when in doubt try building it statically - libc is not really that large a library. |
Pichuan, to increase ease of use and expand adoption within the bioinformatics community, it might not hurt to have a collection of customized build-and-test environments at Google that match the variety of environment configurations users have in place, or that common packages recommend out here. Sometimes folks will be curious to try out some new bioinformatics software package, and the faster they get it to a running state on their own machines, the happier the experience, enabling the community for that package to grow faster. Basically most people just want to use stuff - and want a turn-key solution - though some of us like tinkering with puzzles :) If their experience is good on something local - or even a cluster - then they'll see the obvious need to try it out in a Cloud environment.
I sort of did it from the other side. When I tested most of the GoogleGenomics tools in real-world scenarios, I usually ran them against a variety of configurations. That helped with having better error messages, control-flow decisions, documentation, and additional features. Basically you have developed great software - which is evolving - and now comes the service component of supporting it, which is just as important. Just a friendly recommendation, |
Pi-Chuan; For clif, this is great progress, thank you. I resuscitated my bioconda build script and gave it a try with this. It's making better progress, but unfortunately it needs to reconstitute the system-wide python install within the build environment, which we can't do in conda. Everything there is in an isolated work directory, so it won't have the system shared libraries it wants:
and the python libraries included symlink to the system wide ones you built against:
I'm not sure if it's possible to make this more relocatable with any python as part of the build process. Sorry, I know it's a lot more work to make it relocatable like this, but it will allow installs on all the systems we support where users don't have root privileges to rely on system libraries. Thanks again for helping with this. |
@chapmanb Thanks for giving it a try. Before I built, I did something like:
# Install Python 2.7
sudo yum install -y centos-release-SCL
sudo yum install -y python27
source /opt/rh/python27/enable
I think starting from there it just assumes python is in /opt/rh/python27/root/usr/bin/python. I'll take a look and see if I can make it recognize python at any path. Is there a convention that people use to build something so that they can point to other Python locations? |
This might be completely irrelevant, but CLIF's INSTALL.sh usage is "Usage: $0 [python interpreter]", i.e. it might take a Python of the user's choice.
--
Thanks,
--Mike
|
@mrovner Thanks Mike. That is at the time of building, correct? If I choose a python interpreter, will the user (Brad) need to also have python at the same location? I already built one here for CentOS6: gs://deepvariant/packages/oss_clif/oss_clif.centos-6.9.latest.tgz But it seems like @chapmanb is having trouble using it. Ideally we'll be able to specify the location at run time differently from the one at build time. Do you think that's possible? |
Correct - specifying the Python for INSTALL is the Python for building CLIF. It has _no_ connection to the user Python (they can even be Py2 and Py3 in any combination). When using CLIF, the default for generating Python extension module source code will be the same _version_ (2 or 3) as the build Python was, but even that is controlled with the (presence or absence of the) --py3 flag for the CLIF tool.
--
Thanks,
--Mike
|
Pi-Chuan and Mike;
Thanks for all this background and help. I'm trying to fit this into the conda recipe bazel build for DeepVariant but am not sure how to take advantage of using the local anaconda python in that context. The error I'm seeing is that bazel can't find pyclif_proto:
(17:56:01) INFO: Found 1 target...
(17:56:01) [0 / 7] [-----] BazelWorkspaceStatusAction stable-status.txt
(17:56:01) ERROR: missing input file ***@***.***//:clif/bin/pyclif_proto'
(17:56:01) ERROR: /opt/conda/conda-bld/deepvariant_1525283132666/work/deepvariant-0.6.1/third_party/nucleus/protos/BUILD:165:1: //third_party/nucleus/protos:variants_pyclif_clif_rule: missing input file ***@***.***//:clif/bin/pyclif_proto'
Target //deepvariant:binaries failed to build
(17:56:01) ERROR: /opt/conda/conda-bld/deepvariant_1525283132666/work/deepvariant-0.6.1/third_party/nucleus/protos/BUILD:165:1 1 input file(s) do not exist
which I thought was triggered by the difficulty running pyclif without having the local python installed. It could also be due to not installing it in /usr/local/bin since I have to remain sandboxed in the work directory, but I did adjust the PATH to include the download location. Sorry I'm stuck here due to my limited knowledge of bazel tweaking. Either understanding how to handle a root install of the pre-built pyclif or tweaking to use the local python would be helpful. Alternatively, if you can already build DeepVariant on a CentOS6 system yourself, I could use the pre-built binaries the way we're doing now, just with the build against an older glibc. Thanks again for the help with this. |
I'm guessing that that is an effect of Python PIP trying to be helpful. CLIF provides two Python programs/tools (pyclif and pyclif_proto) which are Python. PIP (with setup.py) creates tiny launchers for them for user convenience, but encodes the build Python path and e.g. the --py3 option into those launchers. When the user environment for CLIF use is different from the build environment, those launchers are no longer correct and need to be removed/regenerated or otherwise "fixed" to reflect the different conditions.
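To make the launcher issue concrete: a pip-generated console script begins with a shebang naming the build-time interpreter. Below is a minimal sketch of the kind of "fix" described above, rewriting that first line to be relocatable; the launcher contents are invented for illustration:

```python
import os
import tempfile

def fix_launcher_shebang(path, new_interpreter="/usr/bin/env python"):
    """Replace a hard-coded build-time shebang with a relocatable one."""
    with open(path) as f:
        lines = f.readlines()
    if lines and lines[0].startswith("#!"):
        lines[0] = "#!" + new_interpreter + "\n"
    with open(path, "w") as f:
        f.writelines(lines)

# A launcher as pip might have generated it against the SCL python
# (contents invented for illustration):
fd, launcher = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("#!/opt/rh/python27/root/usr/bin/python\n")
    f.write("from clif_launcher import main; main()\n")

fix_launcher_shebang(launcher)
print(open(launcher).readline().strip())  # -> #!/usr/bin/env python
```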
--
Thanks,
--Mike
|
@chapmanb I've been spending most of the last 2 days on this, and unfortunately I'm currently stuck where you are as well: I might have to call it a day today and look at it again tomorrow... |
OK. I noticed that my
And added that to my experimental build-prereq.sh. Now I'm seeing a different error:
|
After trying to install a few things that I had failed to install before, and linking a few paths, I got to this error that concerns me:
It's possible that TensorFlow itself requires a newer version of GLIBC than what's on CentOS 6. @chapmanb Is it possible at all to install this on a different OS? This is getting to the point where I'm worried I'm going down a path with no good ending in sight. |
Pi-Chuan; For tensorflow, it seems like it was built on a system with a more recent glibc than CentOS6's. It is a pain to have the full dependency tree be compatible with older glibc. This is part of what makes conda nice: you're guaranteed to have this (well, as long as the dependency exists). It looks from the thread you linked that the conda package for tensorflow is all good on CentOS6, if installing from there for your build is doable. Thanks again for helping tackle this; I look forward to working on actually fun things instead of compiling and porting. |
And I made sure the two files are there:
After this change, it seems to run past the part where it can't find clif! Basically the
So I'm currently blocked on that. Maybe you'll have better luck once you get past 1). Please let me know. |
Pi-Chuan; The issue I'm running into now is that the build setup assumes that the libraries and include files are available in standard locations ( I've tried hacking this include directory into the htslib copts:
but bazel is too smart and won't let us continue with non-bazel-defined references:
So at this point I'm stuck by my lack of knowledge of how to incorporate this into the bazel build instructions. I couldn't find any conda bazel builds that already do this to use as a template, and I'm not familiar enough with bazel to build one up on my own. Would it be possible to make the dependencies you're installing with apt explicit bazel targets, like clif? If so, then I could adjust paths to the conda |
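For reference, one common way to let bazel see libraries outside its sandbox is a `new_local_repository` in the WORKSPACE. The snippet below is only a sketch under assumptions: the conda prefix path, the repository name `conda_zlib`, and the exact file lists are hypothetical, not taken from the DeepVariant build:

```python
# WORKSPACE sketch (hypothetical paths): expose conda's zlib as a bazel target.
new_local_repository(
    name = "conda_zlib",
    # Assumed conda prefix; in a bioconda build this would be $PREFIX.
    path = "/opt/conda",
    build_file_content = """
cc_library(
    name = "zlib",
    srcs = glob(["lib/libz.so*"]),
    hdrs = ["include/zlib.h", "include/zconf.h"],
    includes = ["include"],
    visibility = ["//visibility:public"],
)
""",
)
```

With something like this in place, htslib's deps could reference `@conda_zlib//:zlib` instead of raw `-I` flags hacked into copts.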
@chapmanb And I'll try to see if I can find some bazel experts internally to look at your questions as well. Maybe this is a very trivial question for people who have seen it before... |
Pi-Chuan; I'm trying to build this inside of bioconda. If you want to get it set up, there are instructions here: https://bioconda.github.io/contributing.html and I could share the current recipe I'm working from. Although I don't want to make you wade into a new build system and get familiar with it if we can get more high-level bazel advice and sort through improving the DeepVariant build process to handle this case. |
@chapmanb In terms of finding zlib when building with bazel, I wonder if things like these are useful: I wonder if directly asking on the bazel GitHub issues is the best way: Do you mind asking on Bazel issues first? |
+1 for work being done. Thanks! I cannot use the binaries:
On CentOS Linux release 7.2.1511 (Core), HPC cluster if that makes a difference. |
FYI @bgruening - I don't see deepvariant on the galaxy repo. https://depot.galaxyproject.org/singularity/ <- WHERE HAS THIS BEEN ALL MY LIFE? |
@jerowe what do you mean with Galaxy repos? The Singularity image store? |
@bgruening , yes I mean the singularity stores. There is no deepvariant in there. ;-( I almost have clif built as a conda package, but it's kind of hacky. |
Jillian; Until we can get a native CentOS6 build, it unfortunately will have issues on CentOS because the pre-built binaries are compiled against a more recent glibc on Ubuntu. If you have any bazel expertise or want to dig into this, that would be much appreciated. |
@chapmanb , I might be able to help with the bazel builds, and if not I have some other talented folks around who could possibly be bribed. ;-) Do you have a start on it somewhere? |
Currently blocked on not detecting conda installed libraries (zlib and friends). See discussion in google/deepvariant#29
Jillian; https://github.com/chapmanb/bioconda-recipes/tree/deepvariant-compile/recipes/deepvariant Lots of hacking in there to reference the conda python with pyclif, but that works; it should then get stuck on not detecting zlib during the htslib compile. Let me know if you have any questions, and thanks again for helping with this. |
Hi all, |
Hi all;
Thanks for all the help getting an initial conda package in place for DeepVariant (#9) through bioconda.
I wanted to follow up with some suggestions that would help make the pre-built binaries more portable as part of this process, in order of helpfulness for portability:
- /usr/bin/python. Would it be possible to generalize this by using the python that the zip file gets called with (sys.executable)? I currently patch this in the conda build: https://github.com/bioconda/bioconda-recipes/blob/0a2d467d63d011015efeef4b644e985297b6b271/recipes/deepvariant/build.sh#L22
An alternative to points 1 and 3 is making it easier to build DeepVariant as part of the conda build process. The major blocker here is the clif dependency, which is difficult to build, and the pre-built binaries require unpacking into /usr. If we could make this relocatable and easier to install globally, we could build with portable binaries and adjustable numpy as part of the bioconda preparation process.
Thanks again for all the help.
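The sys.executable suggestion above might look something like this inside the zip's entry point; `run_bundled` and the throwaway demo script are hypothetical, just to show the mechanism:

```python
import os
import subprocess
import sys
import tempfile

def run_bundled(script_path, args=()):
    """Run a bundled script with the interpreter that launched us
    (sys.executable) instead of a hard-coded /usr/bin/python."""
    return subprocess.call([sys.executable, script_path] + list(args))

# Demonstrate with a throwaway script that exits with code 7:
fd, demo = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write("import sys; sys.exit(7)\n")

print(run_bundled(demo))  # -> 7, the child's exit code, run under sys.executable
```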