
Create simple Google Colab demo #158

Merged: 5 commits merged into master from google_colab on Feb 4, 2022

Conversation

@MilesCranmer (Collaborator) commented Dec 14, 2021

Google Colab is a web-based Jupyter notebook environment that gives free access to P100 GPUs. I think it will make a great tool for trying out Bifrost without needing to do any configuration whatsoever; even less configuration than with Docker. (@jaycedowell and I discussed this in a call a month ago and I decided to get it working.)

This PR creates a Jupyter notebook that can be opened in Colab and will automatically configure and install Bifrost, with the GPU interface working(!), for users to try out.

The demo itself is pretty short, but it could grow into a full tutorial. The new README link references the live copy of the notebook on the master branch, so the Colab copy will mirror the GitHub version.

https://colab.research.google.com/github/ledatelescope/bifrost/blob/master/BifrostDemo.ipynb

This link won't work until this PR is merged, so until then you can use https://colab.research.google.com/drive/129ZH4VAnDPRMH3rR-OPiMr7pzr01ZSqf?usp=sharing.

For the most part the regular installation of Bifrost works (the %%shell Jupyter command can be used to install things in the virtual machine), but the one catch is that you need to update LD_LIBRARY_PATH from within Python. I also switched to the autoconf build from #157, but the old installation seems to work as well.
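
For reference, a Colab cell along these lines handles the library path (a sketch only; the /usr/local/lib prefix is an assumption and depends on where Bifrost actually gets installed):

import os

# Hypothetical prefix: prepend the Bifrost install location so the Python
# bindings can find libbifrost.so. This has to run before importing bifrost.
os.environ['LD_LIBRARY_PATH'] = '/usr/local/lib:' + os.environ.get('LD_LIBRARY_PATH', '')

import bifrost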

Cheers,
Miles

@coveralls commented Dec 14, 2021

Coverage Status

Coverage remained the same at 61.364% when pulling c186633 on google_colab into 1681fde on master.

@codecov-commenter commented Dec 14, 2021

Codecov Report

Merging #158 (c186633) into master (1681fde) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #158   +/-   ##
=======================================
  Coverage   58.46%   58.46%           
=======================================
  Files          65       65           
  Lines        5549     5549           
=======================================
  Hits         3244     3244           
  Misses       2305     2305           

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1681fde...c186633. Read the comment docs.

@league (Collaborator) commented Dec 14, 2021

Nice, I look forward to trying it later today. This has been on my task list since that call, but I ran into a snag right away that I haven't found time to solve. (I had not used Colab with GPU before.) I see you built it from the autoconf branch, so that's good. Thanks!

@league (Collaborator) commented Dec 14, 2021

Okay, I ran into an issue. It could be Colab, but it could also be an issue with ./configure related to CUDA arch detection.

I copied the notebook you linked on your Drive into my account. The blocks installing dependencies seemed to proceed okay. For the script that ran the Bifrost install, the configure summary looked like this:

configure: cuda: yes - 30 37
configure: numa: yes
configure: hwloc: yes
configure: libvma: no
configure: python bindings: yes
configure: memory alignment: 4096
configure: logging directory: /dev/shm/bifrost
configure: options: native

Bifrost is now ready to be compiled.  Please run 'make'

But then as soon as it started to run make, a failure was reported:

make -C src all
make[1]: Entering directory '/root/bifrost_repo/src'
nvcc fatal   : Unsupported gpu architecture 'compute_30'
Makefile:134: recipe for target 'fft_kernels.o' failed

I ran this in the same session, to see the archs that nvcc supports:

! nvcc --list-gpu-arch
compute_35
compute_37
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86

So configure reported that 30 and 37 would work, but 30 did not. I changed the install script to use

./configure --with-gpu-archs=37

and it seems to be doing better. Does this mean our auto-detection needs work?
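
As a quick cross-check from the same Colab session, a small (hypothetical) snippet like this shows which of the detected archs the installed nvcc actually accepts (the detected set 30/37 is just copied from the configure summary above):

import subprocess

# Architectures this nvcc build can actually target.
out = subprocess.run(['nvcc', '--list-gpu-arch'],
                     capture_output=True, text=True, check=True).stdout
supported = {line.strip().replace('compute_', '') for line in out.splitlines() if line.strip()}

# Archs reported by ./configure in this session (from the summary above).
detected = {'30', '37'}

print('supported by nvcc:', sorted(supported))
print('detected but unsupported:', sorted(detected - supported))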

@league (Collaborator) commented Dec 14, 2021

Follow-up: a potentially useful section of config.log from the run where it auto-detected.

configure:19313: checking for nvcc
configure:19337: found /usr/local/cuda/bin/nvcc
configure:19350: result: /usr/local/cuda/bin/nvcc
configure:19360: checking for nvprune
configure:19384: found /usr/local/cuda/bin/nvprune
configure:19397: result: /usr/local/cuda/bin/nvprune
configure:19407: checking for cuobjdump
configure:19431: found /usr/local/cuda/bin/cuobjdump
configure:19444: result: /usr/local/cuda/bin/cuobjdump
configure:19455: checking for a working CUDA installation
configure:19477: /usr/local/cuda/bin/nvcc -c  conftest.cpp >&5
configure:19477: $? = 0
configure:19505: /usr/local/cuda/bin/nvcc -o conftest  -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib  -lnuma -lhwloc -lcuda -lcudart conftest.cpp >&5
configure:19505: $? = 0
configure:19507: result: yes
configure:19560: checking which CUDA architectures to target
configure:19622: /usr/local/cuda/bin/nvcc -o conftest -O3 -Xcompiler "-Wall" -DBF_CUDA_ENABLED=1 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -lcuda -lcudart conftest.cpp >&5
configure:19622: $? = 0
configure:19622: ./conftest
configure:19622: $? = 0
configure:19626: result: 30 37
configure:19644: checking for valid CUDA architectures
configure:19651: result: yes
configure:19657: checking for Pascal-style CUDA managed memory
configure:19668: result: no
configure:19730: checking for /dev/shm
configure:19744: result: yes

@jaycedowell (Collaborator)

This was an attempt in autoconf to deal with #117, where it appeared that you needed to compile with GPU arch 50 in addition to 5X to have things work on Maxwell. I generalized this to all archs, but maybe it needs some work to prune out archs that don't exist in the current CUDA install.

@telegraphic (Collaborator)

@MilesCranmer very cool! Nice that there's a place with free GPUs.

@jaycedowell (Collaborator)

@league it looks like the "valid arch" test isn't working as expected in cuda.m4. It would be interesting to see what the values of ar_requested, ar_supported, ar_valid, and ar_found are on Colab.

@jaycedowell (Collaborator)

e45ac5d at least gets configure to recognize that 30 is a bad arch and fail. I'm not sure what the best thing to do here is, since the behavior I would want is situation-specific (see the sketch after this list):

  • This should be fatal if the user passed in the archs to build.
  • This should be only a warning (along with dropping the bad arch(s)) if the archs were auto-determined.
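
Roughly, the policy I have in mind looks like this (a Python sketch only; the real logic lives in cuda.m4, and the names here are illustrative, not the actual configure variables):

def filter_archs(requested, supported, user_specified):
    # Drop unsupported archs when they were auto-detected, but treat them
    # as a hard error when the user explicitly asked for them.
    bad = [a for a in requested if a not in supported]
    if bad and user_specified:
        raise SystemExit('configure: error: unsupported GPU arch(s): ' + ' '.join(bad))
    if bad:
        print('configure: WARNING: dropping unsupported GPU arch(s): ' + ' '.join(bad))
    return [a for a in requested if a in supported]

# Auto-detected case on Colab: 30 gets dropped with a warning, 37 is kept.
print(filter_archs(['30', '37'], {'35', '37', '50', '52', '60', '70', '75', '80'}, False))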

@MilesCranmer (Collaborator, Author)

Thanks!
@league good catch. While Colab has an identical VM for all instances, the GPU itself can differ: P100, T4, or K40 (depending on availability and whether you are on the free tier or not). The one that showed up in my instance was a P100, and the one that showed up for you is, I think, a K40. So yes, it definitely seems like the arch should be auto-detected at compile time.

I'll add --with-gpu-archs=37 for now. It works for the P100 too.

@jaycedowell (Collaborator)

@MilesCranmer c3450e4 should fix the automatic arch detection on Colab.

@jaycedowell (Collaborator)

A couple of things I noticed from today:

In file included from /usr/local/cuda/include/thrust/detail/config/config.h:27:0,
                 from /usr/local/cuda/include/thrust/detail/config.h:23,
                 from /usr/local/cuda/include/thrust/random.h:23,
                 from romein_kernels.cuh:6,
                 from romein.cu:37:
/usr/local/cuda/include/thrust/detail/config/cpp_dialect.h:104:13: warning: Thrust
   requires C++14. Please pass -std=c++14 to your compiler. Define 
   THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
   THRUST_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);

and

Building wheels for collected packages: bifrost
  Building wheel for bifrost (setup.py) ... done
  Created wheel for bifrost: filename=bifrost-..-py3-none-any.whl size=177871 sha256=91afb4db4da01046812a8e76775297012187b3f5570f9b4b8aca3b6e65b79847
  Stored in directory: /tmp/pip-ephem-wheel-cache-xxkw04si/wheels/5b/88/bb/4f07f6235f452a6ce297916eba9ef03b0e138f2a0e4cefb35f
  WARNING: Built wheel for bifrost is invalid: Metadata 1.2 mandates PEP 440 version, but '..' is not
Failed to build bifrost
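
For context, pip's complaint is just that '..' is not a valid PEP 440 version string; a quick check with the packaging library illustrates it (the '0.10.0' string below is only an arbitrary valid example, not necessarily Bifrost's real version):

from packaging.version import Version, InvalidVersion

for candidate in ('..', '0.10.0'):
    try:
        Version(candidate)
        print(candidate, 'is a valid PEP 440 version')
    except InvalidVersion:
        print(candidate, 'is NOT a valid PEP 440 version')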

@jaycedowell (Collaborator)

bb01d95 takes care of the C++14 stuff. The Python API still has a version of '..'.

@jaycedowell (Collaborator)

d1430c3 takes care of the Python version problem.

@MilesCranmer (Collaborator, Author)

Works for me! Ready to merge?

After the merge, the README.md link should be updated to https://colab.research.google.com/github/ledatelescope/bifrost/blob/master/BifrostDemo.ipynb

@jaycedowell (Collaborator)

Chris is also going to give this a try tomorrow. If that checks out as well, then yes, let's merge this.

@league (Collaborator) commented Feb 4, 2022

Hey guys, I was successful with the Colab demo. I built it from the latest commit on the autoconf branch (d1430c3), without any special arguments to ./configure this time. As far as I'm concerned, this and that look ready to merge. Nice work!
