CUDA 9.2 has been released #2084

Closed
jchodera opened this issue May 24, 2018 · 42 comments

@jchodera
Member

jchodera commented May 24, 2018

https://developer.nvidia.com/cuda-toolkit/whatsnew

We should update our dev builds to 9.2 and think about what might go into a new OpenMM release.

I'm pretty sure we can restructure our build framework to use something similar to the pytorch system to allow the user to select their CUDA version by installing a stub feature package:

conda install --yes -c omnia -c conda-forge cuda92 openmm

pytorch appears to use a single docker image where multiple versions of CUDA are installed in different paths, and a symlink can then be used to select among paths. We may instead be able to use a different docker container for each CUDA version if that proves problematic.
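As a rough illustration of the symlink approach, the sketch below assumes that each toolkit lives in its own `<prefix>/cuda-X.Y` directory (the layout NVIDIA's Linux installers use by default under `/usr/local`) and switches a `<prefix>/cuda` symlink between them. The function name `select_cuda` is hypothetical, and this is not necessarily how the pytorch image is organized.

```shell
#!/bin/sh
# Hypothetical helper: point <prefix>/cuda at one of several side-by-side
# toolkit installs named <prefix>/cuda-X.Y (assumed layout).
select_cuda() {
    version="$1"
    prefix="${2:-/usr/local}"
    target="$prefix/cuda-$version"

    # Refuse to create a dangling link if that toolkit is not installed.
    if [ ! -d "$target" ]; then
        echo "select_cuda: no toolkit found at $target" >&2
        return 1
    fi

    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a plain file, so it is
    #     replaced instead of a new link being created inside the old target.
    ln -sfn "$target" "$prefix/cuda"
}
```

For example, `select_cuda 9.2` inside the image's build step would make anything that looks in `/usr/local/cuda` pick up CUDA 9.2; writing to `/usr/local` normally requires root.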

It will probably take us about a week to get this working.

@jchodera
Member Author

I think we'll be able to test out multi-CUDA builds this week, starting with the nightly dev-label builds of the development branch (which are currently tagged 7.3.0 in the dev channel).

@Lnaden has a docker image that includes texlive-2018 with CUDA 7.5, 8.0, 9.0, 9.1, and 9.2. We may need to split this up into different images due to image size, but we can start building variants that are paired to each of these CUDA versions.

How should we test these aside from the quick test that runs in the docker image? Does AWS have an easy way to spin up instances using AMIs with different CUDA versions installed? We have CUDA 9.0 and 7.5 clusters we can test on physically.

@peastman
Member

I haven't done anything with AWS in years. That sounds like the sort of thing that ought to be easy, but I'm not sure. You might also be able to use a single image with all the versions available. That's how a lot of clusters handle it: you put module load cuda/9.2 at the start of your script to tell it which version to use.

@jchodera
Member Author

Ideally, we'd test with the different driver versions that ship with each CUDA install.

@jchodera
Member Author

@peastman : What's the earliest version of CUDA that the current OpenMM code will work with?

@peastman
Member

I'm not sure. It's been quite a while since we intentionally dropped support for a version, but we also haven't tested with old versions in a long time. Going back to 7.5 is probably plenty early enough.

@jchodera
Member Author

OK with me. It should be easy to add earlier versions later if needed.

We've almost got all the docker images built. The next step is to automate the builds for the dev label.

@jchodera
Member Author

@peastman: Do we want to stick with clang-3.8.1, or upgrade to a more recent release?

The releases available to us without great pain are in the conda-forge channel.

@peastman
Member

This is probably a good time for upgrading. Let's try using the most recent release and see if any problems come up.

@jchodera
Member Author

Thanks! Moving the discussion of the test builds to here:
omnia-md/conda-dev-recipes#131

@jchodera
Member Author

CUDA 9.2 builds are available for testing: https://anaconda.org/omnia/openmm/files

conda install -c omnia/label/dev openmm==7.3.2

I'm still awaiting access to nodes upgraded to CUDA 9.2 (I should have those Monday) for local testing.

If this process seems to work, I'll proceed to automate all of the other CUDA builds.

@jchodera
Member Author

@peastman : The CUDA 9.2 build (openmm==7.3.2) checks out on our system.

Could you give it a try too? If it works OK for you, then we can automate the builds for CUDA 7.5-9.2 on conda-dev-recipes and be ready to do this for OpenMM 7.3.

@linusyukwong

May I ask how to compile from source for CUDA 9.2?

@jchodera
Member Author

Compilation instructions are here:

http://docs.openmm.org/latest/userguide/library.html#compiling-openmm-from-source-code

Just make sure you have the CUDA 9.2 toolkit installed!

@linusyukwong

Thank you for the instructions! However, I have been stuck at:
[ 67%] Linking CXX shared library ../../../libOpenMMCUDA.so
[ 67%] Built target OpenMMCUDA
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

@peastman
Member

There should be an error message somewhere earlier in the output describing what the problem was. Can you find it and post it? Or if that isn't possible, post a link to the full output.

@jchodera
Member Author

jchodera commented Jun 28, 2018

@peastman : Multi-CUDA dev builds are here!

Can you try out the following?

conda install --yes -c omnia -c omnia/label/dev cuda92 openmm==7.3.0
conda install --yes -c omnia -c omnia/label/dev cuda91 openmm==7.3.0
conda install --yes -c omnia -c omnia/label/dev cuda90 openmm==7.3.0
conda install --yes -c omnia -c omnia/label/dev cuda80 openmm==7.3.0
conda install --yes -c omnia -c omnia/label/dev cuda75 openmm==7.3.0

You may need to remove the packages in between with

conda remove --yes openmm cudaXY
conda clean -tipsy

where cudaXY is replaced with the CUDA feature package you are removing (e.g. cuda75).
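Rather than removing packages between installs, each variant could instead be tested in its own throwaway environment. The sketch below reuses the channels and the `openmm==7.3.0` pin from the commands above, assumes Python 3.6, a conda recent enough to provide `conda run`, and OpenMM's `simtk.testInstallation` module as the smoke test; the function and environment names are hypothetical. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Print commands instead of running them when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: install one cudaXY variant into a fresh environment,
# run OpenMM's installation self-test, then delete the environment.
test_cuda_variant() {
    xy="$1"                      # e.g. 92 for CUDA 9.2
    env_name="openmm-cuda$xy"    # hypothetical environment name
    run conda create --yes -n "$env_name" -c omnia -c omnia/label/dev \
        python=3.6 "cuda$xy" openmm==7.3.0 || return 1
    run conda run -n "$env_name" python -m simtk.testInstallation
    status=$?
    # Clean up even if the self-test failed, then report its result.
    run conda remove --yes -n "$env_name" --all
    return "$status"
}
```

A full sweep would then be `for xy in 92 91 90 80 75; do test_cuda_variant "$xy" || echo "cuda$xy failed"; done`.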

@linusyukwong

@peastman I am able to compile if I only build for CUDA. If I build the library with OpenCL or with PME for the CPU, make fails.

@jchodera
Member Author

@linusyukwong: We can't help you unless you post the error messages that show why the build is failing. Better yet, can you ZIP up and post the whole build output, along with your CMakeLists.txt file?

Also, any chance you can use the conda installs instead since you seem to be having trouble compiling OpenMM?

@peastman
Member

I'll give it a try. What happens if the user has multiple cudaXX packages installed?

@jchodera
Member Author

> What happens if the user has multiple cudaXX packages installed?

Right now, you can accidentally install multiple cudaXY feature packages, and only one of them will be used when installing OpenMM. I think we can figure out how to ensure that at most one feature package is installed once we upgrade to conda-build 3.

For now, the focus is on making sure that the package works as expected, since we upgraded to clang 6 and are auto-building the CUDA and OpenCL versions in a different way.

@jchodera
Member Author

It looks like anaconda has been distributing cudatoolkit versions matching CUDA releases, though they seem to be missing 9.1 and 9.2:
https://anaconda.org/anaconda/cudatoolkit/files

I wonder if we could instead add these as dependencies and dispense with the cudaXY stub features. That would allow (1) installing the latest OpenMM to automatically bring in a compatible CUDA toolkit, and (2) users with older drivers, or who want to experiment with earlier CUDA versions, to force-install earlier versions of cudatoolkit.

@peastman
Copy link
Member

Perhaps we can find out who maintains it and see if we can get them to add 9.1 and 9.2?

@jchodera
Member Author

I've asked here: ContinuumIO/anaconda-recipes#140

@jchodera
Member Author

jchodera commented Jul 1, 2018

If one doesn't specify a CUDA feature package (e.g. cuda92), should the one compiled for the most recent CUDA release be delivered? Or an OpenCL-only variant?

@peastman
Member

peastman commented Jul 1, 2018

It should install the CUDA 9.2 version. If CUDA 9.3 comes out and we then add a build for that, it should continue to install the CUDA 9.2 version by default.

I see this as being a purely optional feature that should only make installation easier, never harder. Suppose we weren't adding this feature, and we were building this release the same way we've done past releases. In that case we would compile it against CUDA 9.2, and we would document that it required that specific version. Users would then install it with a simple conda install openmm or update from an earlier release with conda update openmm. That should still work, and it shouldn't require typing a single extra character.

The one case where the old method failed was if someone needed to use a different CUDA version. In that case, they had to compile from source. So the goal is to make that case easier without making the normal case harder.

Another possibility we could consider is doing this with labels. If someone wanted the CUDA 8.0 build, they could install it with conda install -c omnia/label/cuda80 openmm.

@jchodera
Member Author

jchodera commented Jul 1, 2018

> Another possibility we could consider is doing this with labels. If someone wanted the CUDA 8.0 build, they could install it with conda install -c omnia/label/cuda80 openmm.

Unfortunately, I don't think this would work. When I tested this earlier, it was clear that a specific package build could have several labels (e.g. osx-64/openmm-7.2.2-py36_1.tar.bz2 has both main and rc labels), but I don't think we can have osx-64/openmm-7.3.0-py36_cuda92_0.tar.bz2 on the cuda92 label and osx-64/openmm-7.3.0-py36_cuda91_0.tar.bz2 on the cuda91 label. I can test this again, however, since that would be simplest!

@jchodera
Member Author

jchodera commented Jul 1, 2018

Surprisingly, this seems to be possible now! Check this out:
https://anaconda.org/jchodera/openmm/files

Let's give this a try and see if it satisfies your desiderata. For now, you'll need to add -c jchodera, only the py36 packages are uploaded, and you have to request openmm==7.3.0 explicitly; none of this will be necessary once we do this for omnia.

Can you try this?

Install the default CUDA 9.2 version from the channel:

conda install -c jchodera openmm==7.3.0

Install the specific CUDA 9.0 version:

conda install -c jchodera/label/cuda90 openmm==7.3.0

etc.

@peastman
Member

peastman commented Jul 2, 2018

The command

conda install -c jchodera/label/cuda90 openmm==7.3.0

gives this error:

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - openmm==7.3.0
  - *[track_features=cuda90]

Current channels:

  - https://conda.anaconda.org/jchodera/label/cuda90/linux-64
  - https://conda.anaconda.org/jchodera/label/cuda90/noarch
  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/linux-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/linux-64
  - https://repo.anaconda.com/pkgs/pro/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

@jchodera
Member Author

jchodera commented Jul 2, 2018

Sorry, I had assumed you already had omnia in your channel list.

Try

conda install -c omnia -c conda-forge -c jchodera/label/cuda90 openmm==7.3.0

@peastman
Member

peastman commented Jul 2, 2018

I still get the same error message.

@peastman
Member

peastman commented Jul 2, 2018

I also get the same error if I leave off the label, so that doesn't seem to be the problem. I'm not sure what's going on. Conda can see the packages on your channel:

$ conda search jchodera::*
Loading channels: done
# Name                  Version           Build  Channel             
ambermini                    13               0  jchodera            
ambermini                    14               0  jchodera            
ambermini                14.0.1               0  jchodera            
cuda80                      1.0               0  jchodera            
cuda90                      1.0               0  jchodera            
cuda91                      1.0               0  jchodera            
cuda92                      1.0               0  jchodera            
openmm                    7.3.0   py36_cuda92_0  jchodera            
yank            d342139d92d1ce893d89f5d60ef04d3b8ba98211          np18_0  jchodera            
yank-dev                  0.1.0          np18_0  jchodera

But for some reason it can't install them. Here's another strange behavior that might or might not be a clue.

$ conda install -c jchodera openmm=7.3.0
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - openmm=7.3.0
  - fftw3f
  - openmm=7.3.0
  - *[track_features=cuda92]

Why does it list openmm twice? If I add -c omnia, one of the two disappears and it only lists openmm once in the list of unavailable packages.

@jchodera
Member Author

jchodera commented Jul 3, 2018

This is super weird. I cleaned up some of those other unnecessary packages (like cuda80) that also appear in omnia.

Can you try temporarily moving your miniconda directory and ~/.condarc out of the way and reinstalling from scratch?

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
export PATH=$HOME/miniconda/bin:$PATH

Then you should be able to grab OpenMM with

conda install -c omnia -c conda-forge -c jchodera/label/cuda90 openmm==7.3.0

which should pick up only the package under the cuda90 label.

@jchodera
Member Author

jchodera commented Jul 3, 2018

If that doesn't work, I wonder if the order is important here?

conda install -c jchodera/label/cuda90 -c omnia -c conda-forge openmm==7.3.0

@jchodera
Member Author

jchodera commented Jul 3, 2018

Oh, I wonder if the issue is that I have the cudaXY feature enabled as a requirement. Let me try to rebuild without those and send the package to different channels.

@peastman
Member

peastman commented Jul 6, 2018

> Let me try to rebuild without those and send the package to different channels.

Have you done that? What channel should I try to install it from?

@jchodera
Member Author

jchodera commented Jul 6, 2018

Still sorting this out. So far, we're still using the cudaXY feature packages, but building master as 7.3.0 and pushing to dev.

@jchodera
Member Author

jchodera commented Jul 6, 2018

There's also a PR open to add the actual CUDA toolkits to conda-forge, which would enormously simplify our lives!

conda-forge/conda-forge.github.io#63 (comment)

That PR would just install the libraries that NVIDIA allows to be redistributed. This is presumably not sufficient for building OpenMM, but could be sufficient for distributing OpenMM without requiring a separate CUDA toolkit installation, if I understand correctly.

@jchodera
Member Author

jchodera commented Jul 8, 2018

@peastman : The conda-dev-recipes now pushes the OpenMM packages to different labels.

If you want a dev build for a specific version of CUDA (say, CUDA 7.5), you can use

conda install --yes -c omnia/label/cuda75 openmm

If you want the default dev build for the latest CUDA (CUDA 9.2), you can use

conda install --yes -c omnia/label/dev openmm

Later, when we do this for the OpenMM release, this default would just be

conda install --yes -c omnia openmm

and the dev builds for a specific CUDA version could then be installed with

conda install --yes -c omnia/label/cuda75-dev openmm
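The proposed post-release label scheme boils down to four cases: default release, default dev, specific-CUDA release, and specific-CUDA dev. The helper below is a hypothetical sketch that only assembles the channel argument for each case; it assumes the `cudaXY` and `cudaXY-dev` label names proposed here, which were not final at this point.

```shell
#!/bin/sh
# Hypothetical helper: print the omnia channel for an OpenMM build.
#   $1: CUDA version such as 7.5 or 9.2, or "default" for the latest CUDA build
#   $2: "dev" for development builds; anything else (or nothing) means release
openmm_channel() {
    version="$1"
    kind="${2:-release}"

    if [ "$version" = "default" ]; then
        if [ "$kind" = "dev" ]; then
            echo "omnia/label/dev"
        else
            echo "omnia"
        fi
        return 0
    fi

    # Drop the dot: 7.5 -> cuda75, 9.2 -> cuda92.
    label="cuda$(echo "$version" | tr -d '.')"
    if [ "$kind" = "dev" ]; then
        echo "omnia/label/$label-dev"
    else
        echo "omnia/label/$label"
    fi
}
```

Usage would look like `conda install --yes -c "$(openmm_channel 7.5 dev)" openmm`.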

Can you test this out?

@peastman
Member

peastman commented Jul 9, 2018

It works now!

@jchodera
Member Author

@peastman: Great! I think we should still do some benchmarking to make sure the CUDA and OpenCL performance is as expected, and that we haven't accidentally degraded CPU performance with the compiler toolchain we're using now.

If all looks good, I propose we use this approach for an OpenMM 7.3 release "soon-ish". We can push release (candidate) builds of 7.3.0 to different CUDA channels, test them, and then add the main label to the CUDA 9.2 version once everyone is happy it performs as expected.

@Lnaden is currently looking into upgrading our conda build infrastructure to the new conda-build 3, which supports things like build variants. We're not sure how long this will take, so I suggest we not wait on that infrastructure overhaul.

There are also interesting things afoot with conda-forge potentially building CUDA toolkit library packages that we could use as dependencies down the road.

@jchodera
Member Author

@peastman : We could simplify the conda installation process a great deal if we could use the Anaconda-supplied cudatoolkit package, which provides the CUDA Toolkit libraries needed at runtime.

Currently, only toolkit versions up to 9.0 are provided. This issue comment notes that this is awaiting a workable recipe for packaging 9.2, and a pull request to this repository that houses the packaging scripts and recipes would be welcomed.

For OpenMM 7.3, we could either contribute to this and encourage the release of updated cudatoolkit packages that we can use as dependencies to streamline toolkit installation, or we could continue to go with our current approach where we require the user to install the toolkit themselves.

@jchodera
Member Author

We now have a scheme for deploying different CUDA builds to different labels, and are monitoring the potential for using the cudatoolkit packages in the future to further streamline this process.
