
Python control of CUDA streams #167

Merged: 27 commits into master, Aug 5, 2022
Conversation

@jaycedowell (Collaborator) commented Mar 24, 2022

This PR exposes the bfStreamGet and bfStreamSet methods through the Python API. This can help with interfacing Bifrost with other CUDA-aware packages, such as cupy. For example:

import bifrost as bf
from bifrost.device import get_stream
import cupy as cp
import numpy as np

data = np.random.rand(100,1000)
data = data.astype(np.float32)
orig_data = data

stream = get_stream()
with cp.cuda.ExternalStream(stream):
    data = bf.ndarray(data, space='cuda')
    bf.map('a = a + 2', {'a': data})
    data = data.as_cupy()
    data *= 4
    data = cp.asnumpy(data)
print(data[0,:10])
print(orig_data[0,:10])
np.testing.assert_allclose(data, (orig_data+2)*4)

This PR also adds set_stream and get_stream methods to BifrostObjects that support them.
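
For illustration, a minimal sketch of driving the same pair of calls by hand so that Bifrost works on a stream created by another package. This assumes set_stream is importable from bifrost.device alongside get_stream and accepts the raw stream handle as an integer (only get_stream appears in the example above), so treat it as a sketch rather than the definitive API:

import numpy as np
import cupy as cp
import bifrost as bf
from bifrost.device import get_stream, set_stream

orig_stream = get_stream()        # remember Bifrost's current stream
cp_stream = cp.cuda.Stream()      # a stream owned (and eventually destroyed) by cupy
try:
    set_stream(cp_stream.ptr)     # point Bifrost at the cupy-owned stream
    data = bf.ndarray(np.ones((4, 4), dtype=np.float32), space='cuda')
    bf.map('a = a * 3', {'a': data})   # executes on the cupy-owned stream
finally:
    set_stream(orig_stream)       # restore so later code isn't left on a dead stream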

@codecov-commenter commented Mar 24, 2022

Codecov Report

Merging #167 (96d99ac) into master (f82ae97) will increase coverage by 0.24%.
The diff coverage is 68.75%.

@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
+ Coverage   66.76%   67.00%   +0.24%     
==========================================
  Files          69       69              
  Lines        7416     7464      +48     
==========================================
+ Hits         4951     5001      +50     
+ Misses       2465     2463       -2     
Impacted Files                   Coverage Δ
python/bifrost/libbifrost.py     71.11% <23.07%> (-8.11%) ⬇️
python/bifrost/device.py         81.63% <85.71%> (+10.20%) ⬆️
python/bifrost/ndarray.py        89.60% <0.00%> (+6.09%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f82ae97...96d99ac.

@jaycedowell mentioned this pull request Mar 25, 2022
@league (Collaborator) commented Mar 25, 2022

Interesting. Is it worthwhile to code something like the cupy example shown above as a test? Not that we need to add cupy as a test dependency… maybe we can mock something. Just so as not to cede the coverage score. :) I'll play a little.

@jaycedowell (Collaborator Author) commented:

I was thinking about adding some cupy tests yesterday. It would be good from a coverage perspective to test that we can make bifrost.ndarrays from cupy.ndarrays and vice versa. This bifrost/cupy interoperability would be an interesting test case since it would also serve as an example of how to do it. The biggest thing I see with the interoperability test is checking that the stream binding actually works; I was verifying that with nvprof/nvvp yesterday, but that is way too heavy to put into the test suite. The other thing to keep in mind is that our coverage should improve once we have the dedicated test machine and can run (and report) all of the CUDA-specific tests and paths.
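
A rough sketch of what such a coverage test could look like (the class and test names are hypothetical; as_cupy() and the bf.ndarray(..., space='cuda') calls follow the example in the PR description, and cupy is kept optional so the test skips cleanly without it):

import unittest
import numpy as np
import bifrost as bf
try:
    import cupy as cp
except ImportError:
    cp = None

@unittest.skipUnless(cp is not None, "cupy not available")
class CupyInteropTest(unittest.TestCase):
    def test_roundtrip(self):
        # host -> Bifrost GPU array -> cupy -> back to host
        host = np.random.rand(16, 16).astype(np.float32)
        bf_arr = bf.ndarray(host, space='cuda')
        cp_arr = bf_arr.as_cupy()
        cp_arr *= 2                      # operate on the data with cupy
        np.testing.assert_allclose(cp.asnumpy(cp_arr), host * 2)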

@jaycedowell (Collaborator Author) commented:

Sort of related to this: if you look in bifrost.ndarray, we also have support for pycuda in there. I used to use pycuda but have really shifted to cupy recently. It might be worth thinking about how important it is to provide interoperability with both, or whether one is better/easier/sufficient.

@jaycedowell (Collaborator Author) commented:

I did some reading on pycuda vs cupy today and I think the answer to my previous interoperability question is "yes". They are targeted at two different audiences: cupy is an easy way to move onto the GPU if you have a lot of numpy-based code, while pycuda is more about building things from scratch in a very CUDA way. In that regard, how you can interface pycuda and Bifrost is limited to moving data between the array formats, since I can't find an interface in pycuda that lets you use an external stream. cupy, on the other hand, does have that feature and I use it in the example.
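
As a rough illustration of that data-movement-only level of interop (a sketch, not anything from this PR: it stages through host memory with numpy, uses the standard pycuda.gpuarray.to_gpu()/get() calls, and assumes pycuda's context setup coexists with Bifrost's in the same process):

import numpy as np
import pycuda.autoinit              # set up a CUDA context for pycuda
import pycuda.gpuarray as gpuarray
import bifrost as bf

host = np.arange(16, dtype=np.float32)
pc_arr = gpuarray.to_gpu(host)                 # host -> pycuda
host_copy = pc_arr.get()                       # pycuda -> host
bf_arr = bf.ndarray(host_copy, space='cuda')   # host -> Bifrost GPU array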

Review comment on test/test_interpop.py (outdated, resolved)
@league (Collaborator) commented Mar 29, 2022

Assuming the tests pass, this all looks great to me. (I can probably try it later today.) And it makes a great case for interoperability with other frameworks. Will maybe want to tout that in release notes and documentation next time.

Could it be worthwhile to put BifrostStreamManager into the library, rather than just the test file? You're providing get_stream and set_stream which can be thought of as a lower-level interface, but in most cases of Python programming it seems like calling them within a context manager is The Right Thing.
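
Something along these lines, for example. This is a rough sketch built only on the get_stream/set_stream calls from this PR (assuming both are importable from bifrost.device, as get_stream is in the example); the name and exact behavior are placeholders, not the BifrostStreamManager that lives in the test file:

from contextlib import contextmanager
from bifrost.device import get_stream, set_stream

@contextmanager
def external_stream(stream_ptr):
    """Temporarily point Bifrost at an externally owned CUDA stream,
    restoring the original stream on exit even if an exception is raised."""
    orig = get_stream()
    set_stream(stream_ptr)
    try:
        yield
    finally:
        set_stream(orig)

That would keep callers from forgetting to save and restore the original stream, which is the failure mode described below for test_stream.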

@league (Collaborator) commented Mar 29, 2022

(Maybe if it's in the library, BifrostStreamManager wants a more precise name? Stream substituter? Stream replacer? StreamRedirect?)

@jaycedowell (Collaborator Author) commented:

orig_stream and BifrostStreamManager were reactions to problems I was having in test_stream. When another framework creates a stream, Bifrost can use it, but we are subject to whatever lifetime was chosen for that stream. When I wasn't careful and didn't save the original Bifrost stream, running this test would make all of the tests after it fail because the stream was gone.

Maybe the thing to do is use the terminology of cupy and call it something like ExternalStream or UseExternalStream?

@jaycedowell (Collaborator Author) commented:

I'm holding off on this to see if I can get a self-hosted runner going on the test machine. It sort of works but there are many mysteries.

@jaycedowell (Collaborator Author) commented:

Those really slow builds on self-hosted/Python3.[68] are all for this.

@jaycedowell (Collaborator Author) commented:

One thing we could do is run two self-hosted runners at once, one per GPU.

@league (Collaborator) commented Apr 23, 2022

Also maybe it's not necessary to include the CuPy tests for both 3.6 and 3.8?

@jaycedowell (Collaborator Author) commented:

The cupy-cuda112 wheel is helping. It dropped the "Software Install - Python, part 2" time down to ~1 min from ~12 min. If we keep using the wheel then we need to make sure we are using the one that matches whatever version of CUDA we have in the Dockerfile.

@jaycedowell (Collaborator Author) commented:

We should be running two runners now, one on each GPU. I should test that.

@jaycedowell (Collaborator Author) commented Apr 25, 2022

Two runners get it down to ~30 minutes. Another way to save time would be to drop the Py2.7 self-hosted test. That would mean giving up Py2.7 testing for the GPU-related code.

@jaycedowell (Collaborator Author) commented:

I think I'm happy with what is in here now that there is the "per-thread default stream" flag in configure. The only thing I'm tempted to keep working on is this seemingly random test_matmul_aa_ci8 failure that we sometimes get. The full error is:

ERROR: test_matmul_aa_ci8 (test_linalg.TestLinAlg)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/docker/actions-runner/_work/bifrost/bifrost/test/test_linalg.py", line 302, in test_matmul_aa_ci8
    self.run_test_matmul_aa_ci8_transpose(True)
  File "/home/docker/actions-runner/_work/bifrost/bifrost/test/test_linalg.py", line [29](https://github.com/ledatelescope/bifrost/runs/6833550337?check_suite_focus=true#step:9:30)2, in run_test_matmul_aa_ci8_transpose
    self.run_test_matmul_aa_ci8_shape((99+kp,3+kp),          transpose=transpose)
  File "/home/docker/actions-runner/_work/bifrost/bifrost/test/test_linalg.py", line 72, in run_test_matmul_aa_ci8_shape
    self.linalg.matmul(1, a, None, 0, c)
  File "/home/docker/actions-runner/_work/_tool/Python/2.7.18/x64/lib/python2.7/site-packages/bifrost/linalg.py", line 66, in matmul
    c_array))
  File "/home/docker/actions-runner/_work/_tool/Python/2.7.18/x64/lib/python2.7/site-packages/bifrost/libbifrost.py", line 139, in _check
    raise RuntimeError(status_str)
RuntimeError: BF_STATUS_UNSUPPORTED_SHAPE

@jaycedowell (Collaborator Author) commented:

I haven't been able to reproduce that failure. Oh well.

@jaycedowell (Collaborator Author) commented:

@league do you think this one is ready to merge? I can't think of anything else I'd like to see as part of it.

I hand-patched ./configure along with update to configure.ac
@league (Collaborator) commented Aug 5, 2022

With bbd4b67, I thought it might be useful for the config summary to show what happened with the stream-model setting, so it's not just buried in the scroll.

I thought about adding it to python -m bifrost.version --config too, but for the moment it doesn't seem that it's directly passed into Python. (It's not in config.h). But maybe that's fine.

Trying to test on cetus, I discovered another thing: permission denied errors on /dev/shm/bifrost because it's owned by you. The top level of it is globally writable so it may work for my process just to create new stuff underneath, but it gets stuck on the permissions before trying that, I think. I wonder if we should use a UUID under there rather than the same global name for every user?

This happens on every CUDA test:

test_serialize_with_name_no_ringlets (test_serialize.SerializeTest) ... proclog.cpp:176 internal error:
 filesystem error: cannot set permissions: Operation not permitted [/dev/shm/bifrost]                  
ERROR
▶ ls -ld /dev/shm/bifrost
drwxrwxrwx 3 jdowell jdowell 60 Aug  3 09:59 /dev/shm/bifrost/

@jaycedowell (Collaborator Author) commented:

This problem seems familiar (as in, something I've fixed in the past), but looking back through the history the only thing I see is f2691ff. That's a different problem of cleaning up stale contents, but it has similar "stuff from other users" roots. I'm kind of surprised that it is even trying to change the permissions, since world-writable is what we want. Maybe we should only try to make /dev/shm/bifrost world-writable if it is not already.

The UUID thing is interesting. What are you thinking with this? That there would be a per-user /dev/shm/bifrost-XXXXXX directory?

@league (Collaborator) commented Aug 5, 2022

Regarding /dev/shm/bifrost-UUID, yeah, I think so. At first I was thinking we could even reuse the telemetry ID if it's always loaded anyway. But I think any routines to load the telemetry state file are on the Python side, and this might be initiated from the C++ side.

What's the number right beneath /dev/shm/bifrost, is it a PID? What's left there now is 1329696. If that's a PID and it's meant to be cleaned up (except when it isn't), then it's probably best to just keep /dev/shm/bifrost world-writable (and not choke on chmod if it's already world-writable but owned by someone else), but beneath that continue to use user-owned PID subdirs?

@league (Collaborator) commented Aug 5, 2022

Ooh! Exciting security-hole opportunity, haha. proclog.cpp uses make_dir from fileutils.cpp so I reread my comment near the top of that file.

/* NOTE: For convenience on systems with compilers without C++17 support, these
   functions build a shell command and pass it to system(). The PATH argument is
   not shell-quoted or otherwise sanitized, so only use with program constants,
   not with data from command line or config files. */

Combine that with the fact that I was just proposing pulling an identifier out of a config file and using it in the filename passed to make_dir. 🧨 🔥

@jaycedowell (Collaborator Author) commented:

Yeah, that number is the PID. Having everything in one place makes it easier to at least attempt to clean up after a process that died before it could do so itself. I guess that would also be true of /dev/shm/bifrost-UUID, but you would be limited to only cleaning up your own dead processes.

@league (Collaborator) commented Aug 5, 2022

Looks like the permission denied issue is limited to HAVE_CXX_FILESYSTEM. On the command line, this succeeds:

▶ mkdir -p -m 777 /dev/shm/bifrost

but std::filesystem::permissions throws an exception.

@league (Collaborator) commented Aug 5, 2022

Trying this, which is probably closer to what mkdir -p -m does anyway.

  // Create the directory (and any missing parents); create_directories
  // returns true only if something was actually created.
  bool created = std::filesystem::create_directories(path);
  if(created) {
    // Only set permissions on a directory we just made; if it already
    // existed (possibly owned by another user), leave its mode alone.
    std::filesystem::permissions(path, (std::filesystem::perms) perms,
                                 std::filesystem::perm_options::replace);
  }

@league (Collaborator) commented Aug 5, 2022

Using sudo to nuke your /dev/shm/bifrost so I can make sure the fix creates it correctly for me.

Before, it led to permission denied exceptions when /dev/shm/bifrost had
been created by someone else. (We create it 0777 so we can just proceed to
use it even if the owner is different.)
@league merged commit 734d3f8 into master Aug 5, 2022
@jaycedowell deleted the python-streams branch August 5, 2022 18:48