
Create simple Google Colab demo #158

Merged: 5 commits merged into master from google_colab on Feb 4, 2022

Conversation

@MilesCranmer (Collaborator) commented Dec 14, 2021

Google Colab is a web-based Jupyter notebook environment that gives free access to P100 GPUs. I think it will make a great tool for trying out Bifrost without needing to do any configuration whatsoever; even less configuration than with Docker. (@jaycedowell and I discussed this in a call a month ago and I decided to get it working.)

This PR creates a Jupyter notebook that can be opened in Colab and will automatically configure and install Bifrost, with the GPU interface working(!), for users to try out.

The demo itself is pretty short, but it could grow into a full tutorial. The new README link references the live copy of the notebook on the master branch, so the Colab copy will mirror the GitHub version.

https://colab.research.google.com/github/ledatelescope/bifrost/blob/master/BifrostDemo.ipynb

This link won't work until this PR is merged, so until then you can use https://colab.research.google.com/drive/129ZH4VAnDPRMH3rR-OPiMr7pzr01ZSqf?usp=sharing.

For the most part the regular installation of Bifrost works (the %%shell Jupyter command can be used to install things in the virtual machine), but the one catch is that you need to update LD_LIBRARY_PATH from within Python. I also switched to the autoconf build from #157, but the old installation seems to work as well.
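
For reference, a Colab cell along these lines handles the library path (a sketch only; the /usr/local/lib prefix is an assumption and depends on where Bifrost actually gets installed):

import os

# Hypothetical prefix: prepend the Bifrost install location so the Python
# bindings can find libbifrost.so. This has to run before importing bifrost.
os.environ['LD_LIBRARY_PATH'] = '/usr/local/lib:' + os.environ.get('LD_LIBRARY_PATH', '')

import bifrost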

Cheers,
Miles

@coveralls commented Dec 14, 2021

Coverage Status

Coverage remained the same at 61.364% when pulling c186633 on google_colab into 1681fde on master.

@codecov-commenter commented Dec 14, 2021

Codecov Report

Merging #158 (c186633) into master (1681fde) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #158   +/-   ##
=======================================
  Coverage   58.46%   58.46%           
=======================================
  Files          65       65           
  Lines        5549     5549           
=======================================
  Hits         3244     3244           
  Misses       2305     2305           

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1681fde...c186633. Read the comment docs.

@league (Collaborator) commented Dec 14, 2021

Nice, I look forward to trying it later today. This has been on my task list since that call, but I ran into a snag right away that I haven't found time to solve. (I had not used Colab with GPU before.) I see you built it from the autoconf branch, so that's good. Thanks!

@league (Collaborator) commented Dec 14, 2021

Okay, I ran into an issue. It could be Colab, but it could also be an issue with ./configure related to CUDA arch detection.

I copied the notebook you linked on your Drive into my account. The blocks installing dependencies seemed to proceed okay. For the script that ran the Bifrost install, the configure summary looked like this:

configure: cuda: yes - 30 37
configure: numa: yes
configure: hwloc: yes
configure: libvma: no
configure: python bindings: yes
configure: memory alignment: 4096
configure: logging directory: /dev/shm/bifrost
configure: options: native

Bifrost is now ready to be compiled.  Please run 'make'

But then as soon as it started to run make, a failure was reported:

make -C src all
make[1]: Entering directory '/root/bifrost_repo/src'
nvcc fatal   : Unsupported gpu architecture 'compute_30'
Makefile:134: recipe for target 'fft_kernels.o' failed

I ran this in the same session, to see the archs that nvcc supports:

! nvcc --list-gpu-arch
compute_35
compute_37
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86

So configure reported that 30 and 37 would work, but 30 did not. I changed the install script to use

./configure --with-gpu-archs=37

and it seems to be doing better. Does this mean our auto-detection needs work?
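
As a quick cross-check from the same Colab session, a small (hypothetical) snippet like this shows which of the detected archs the installed nvcc actually accepts (the detected set 30/37 is just copied from the configure summary above):

import subprocess

# Architectures this nvcc build can actually target.
out = subprocess.run(['nvcc', '--list-gpu-arch'],
                     capture_output=True, text=True, check=True).stdout
supported = {line.strip().replace('compute_', '') for line in out.splitlines() if line.strip()}

# Archs reported by ./configure in this session (from the summary above).
detected = {'30', '37'}

print('supported by nvcc:', sorted(supported))
print('detected but unsupported:', sorted(detected - supported))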

@league (Collaborator) commented Dec 14, 2021

Follow-up: a potentially useful section of config.log from the run where it auto-detected.

configure:19313: checking for nvcc
configure:19337: found /usr/local/cuda/bin/nvcc
configure:19350: result: /usr/local/cuda/bin/nvcc
configure:19360: checking for nvprune
configure:19384: found /usr/local/cuda/bin/nvprune
configure:19397: result: /usr/local/cuda/bin/nvprune
configure:19407: checking for cuobjdump
configure:19431: found /usr/local/cuda/bin/cuobjdump
configure:19444: result: /usr/local/cuda/bin/cuobjdump
configure:19455: checking for a working CUDA installation
configure:19477: /usr/local/cuda/bin/nvcc -c  conftest.cpp >&5
configure:19477: $? = 0
configure:19505: /usr/local/cuda/bin/nvcc -o conftest  -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib  -lnuma -lhwloc -lcuda -lcudart conftest.cpp >&5
configure:19505: $? = 0
configure:19507: result: yes
configure:19560: checking which CUDA architectures to target
configure:19622: /usr/local/cuda/bin/nvcc -o conftest -O3 -Xcompiler "-Wall" -DBF_CUDA_ENABLED=1 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -lcuda -lcudart conftest.cpp >&5
configure:19622: $? = 0
configure:19622: ./conftest
configure:19622: $? = 0
configure:19626: result: 30 37
configure:19644: checking for valid CUDA architectures
configure:19651: result: yes
configure:19657: checking for Pascal-style CUDA managed memory
configure:19668: result: no
configure:19730: checking for /dev/shm
configure:19744: result: yes

@jaycedowell (Collaborator)

This was an attempt in autoconf to deal with #117, where it appeared that you needed to compile with GPU arch 50 in addition to 5X to have things work on Maxwell. I generalized this to all archs, but maybe it needs some work to prune out archs that don't exist in the current CUDA install.

@telegraphic (Collaborator)

@MilesCranmer very cool! Nice that there's a place with free GPUs.

@jaycedowell (Collaborator)

@league it looks like the "valid arch" test isn't working as expected in cuda.m4. It would be interesting to see what the values of ar_requested, ar_supported, ar_valid, and ar_found are on Colab.

@jaycedowell (Collaborator)

e45ac5d at least gets configure to recognize that 30 is a bad arch and fail. I'm not sure what the best thing to do here is, since the behavior I would want is situation-specific (see the sketch after this list):

  • This should be fatal if the user passed in the archs to build.
  • This should be only a warning (along with dropping the bad arch(s)) if the archs were auto-determined.
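
Roughly, the policy I have in mind looks like this (a Python sketch only; the real logic lives in cuda.m4, and the names here are illustrative, not the actual configure variables):

def filter_archs(requested, supported, user_specified):
    # Drop unsupported archs when they were auto-detected, but treat them
    # as a hard error when the user explicitly asked for them.
    bad = [a for a in requested if a not in supported]
    if bad and user_specified:
        raise SystemExit('configure: error: unsupported GPU arch(s): ' + ' '.join(bad))
    if bad:
        print('configure: WARNING: dropping unsupported GPU arch(s): ' + ' '.join(bad))
    return [a for a in requested if a in supported]

# Auto-detected case on Colab: 30 gets dropped with a warning, 37 is kept.
print(filter_archs(['30', '37'], {'35', '37', '50', '52', '60', '70', '75', '80'}, False))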

@MilesCranmer (Collaborator, Author)

Thanks!
@league good catch. While Colab has an identical VM for all instances, the GPU itself can differ: P100, T4, or K40 (depending on availability and whether you are on the free tier or not). The one that showed up in my instance was a P100, and the one that showed up for you is, I think, a K40. So yes, it definitely seems like the arch should be auto-detected at compile time.

I'll add --with-gpu-archs=37 for now. It works for the P100 too.

@jaycedowell (Collaborator)

@MilesCranmer c3450e4 should fix the automatic arch detection on Colab.

@jaycedowell (Collaborator)

A couple of things I noticed from today:

In file included from /usr/local/cuda/include/thrust/detail/config/config.h:27:0,
                 from /usr/local/cuda/include/thrust/detail/config.h:23,
                 from /usr/local/cuda/include/thrust/random.h:23,
                 from romein_kernels.cuh:6,
                 from romein.cu:37:
/usr/local/cuda/include/thrust/detail/config/cpp_dialect.h:104:13: warning: Thrust
   requires C++14. Please pass -std=c++14 to your compiler. Define 
   THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
   THRUST_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);

and

Building wheels for collected packages: bifrost
  Building wheel for bifrost (setup.py) ... done
  Created wheel for bifrost: filename=bifrost-..-py3-none-any.whl size=177871 sha256=91afb4db4da01046812a8e76775297012187b3f5570f9b4b8aca3b6e65b79847
  Stored in directory: /tmp/pip-ephem-wheel-cache-xxkw04si/wheels/5b/88/bb/4f07f6235f452a6ce297916eba9ef03b0e138f2a0e4cefb35f
  WARNING: Built wheel for bifrost is invalid: Metadata 1.2 mandates PEP 440 version, but '..' is not
Failed to build bifrost
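
For context, pip's complaint is just that '..' is not a valid PEP 440 version string; a quick check with the packaging library illustrates it (the '0.10.0' string below is only an arbitrary valid example, not necessarily Bifrost's real version):

from packaging.version import Version, InvalidVersion

for candidate in ('..', '0.10.0'):
    try:
        Version(candidate)
        print(candidate, 'is a valid PEP 440 version')
    except InvalidVersion:
        print(candidate, 'is NOT a valid PEP 440 version')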

@jaycedowell (Collaborator)

bb01d95 takes care of the C++14 stuff. The Python API still has a version of '..'.

@jaycedowell (Collaborator)

d1430c3 takes care of the Python version problem.

@MilesCranmer (Collaborator, Author)

Works for me! Ready to merge?

After the merge, the README.md link should be updated to https://colab.research.google.com/github/ledatelescope/bifrost/blob/master/BifrostDemo.ipynb

@jaycedowell (Collaborator)

Chris is also going to give this a try tomorrow. If that checks out as well, then yes, let's merge this.

@league (Collaborator) commented Feb 4, 2022

Hey guys, I was successful with the Colab demo. I built it from the latest commit on the autoconf branch (d1430c3), without any special arguments to ./configure this time. As far as I'm concerned, this and that look ready to merge. Nice work!
