Move external allocators into rmm.allocators module to defer imports (#1221)

RMM provides callbacks to configure third-party libraries to use RMM
for memory allocation.

Previously, these were defined in the top-level package, but that
requires (potentially expensive) import of the package we're providing
a hook for, since typically we must import that package to define the
callback. This makes importing RMM expensive. To avoid this, move the
callbacks into (not imported by default) sub-modules in
`rmm.allocators`. So, if we want to configure the CuPy allocator, we
now import `rmm_cupy_allocator` from `rmm.allocators.cupy` and don't
pay the price of importing PyTorch.

This change **deprecates** the use of the allocator callbacks in the
top-level `rmm` module in favour of explicit imports from the relevant
`rmm.allocators.XXX` sub-module.
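The deprecation shim behind this can be sketched in isolation. The sketch below uses a hypothetical module name `demo_pkg` (and a stand-in callable) so it runs without RMM installed; it shows the PEP 562 module-`__getattr__` pattern the commit relies on — old attribute access still works but emits a `FutureWarning` and defers any heavy import to that moment:

```python
import types
import warnings

# Names kept alive on the top-level package, mapped to the sub-module
# that now really owns them (mirrors the shape used in the commit).
_deprecated_names = {"rmm_cupy_allocator": "cupy"}

demo_pkg = types.ModuleType("demo_pkg")

def _module_getattr(name):
    if name in _deprecated_names:
        package = _deprecated_names[name]
        warnings.warn(
            f"Use of 'demo_pkg.{name}' is deprecated; import it from "
            f"'demo_pkg.allocators.{package}' instead.",
            FutureWarning,
        )
        # The real shim lazily imports the sub-module here; this sketch
        # returns a stand-in callable instead.
        return lambda nbytes: None
    raise AttributeError(f"Module 'demo_pkg' has no attribute '{name}'")

# PEP 562: attribute lookup on a module falls back to a module-level
# __getattr__ when the name is not in the module's dict.
demo_pkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    demo_pkg.rmm_cupy_allocator  # old spelling still resolves...
print(caught[0].category.__name__)  # ...but warns: FutureWarning
```

The cost of the real third-party import is thus only paid by callers who actually touch the deprecated name.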

Before these changes, a sampling trace of `import rmm` with
pyinstrument shows:
    
    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:19:56  Samples:  67
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.839     CPU time: 0.837
    /   _/                      v4.4.0

    Program: importrmm.py

    0.839 <module>  importrmm.py:1
    └─ 0.839 <module>  rmm/__init__.py:1
       ├─ 0.315 <module>  rmm/allocators/torch.py:1
       │  └─ 0.315 <module>  torch/__init__.py:1
       │        [96 frames hidden]  torch, <built-in>, enum, inspect, tok...
       ├─ 0.297 <module>  rmm/mr.py:1
       │  └─ 0.297 <module>  rmm/_lib/__init__.py:1
       │     ├─ 0.216 <module>  numba/__init__.py:1
       │     │     [140 frames hidden]  numba, abc, <built-in>, importlib, em...
       │     ├─ 0.040 <module>  numba/cuda/__init__.py:1
       │     │     [34 frames hidden]  numba, asyncio, ssl, <built-in>, re, ...
       │     ├─ 0.030 __new__  enum.py:180
       │     │     [5 frames hidden]  enum, <built-in>
       │     └─ 0.011 [self]  None
       └─ 0.227 <module>  rmm/allocators/cupy.py:1
          └─ 0.227 <module>  cupy/__init__.py:1
                [123 frames hidden]  cupy, pytest, _pytest, attr, <built-i...

That is, almost a full second to import things, most of which is spent
importing pytorch and cupy. These modules are not needed in normal
usage of RMM, so we can defer the imports. Numba is a little bit
trickier, but we can also defer up-front imports, with a final result
that after these changes the same `import rmm` call takes just a tenth
of a second:

    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:37:40  Samples:  9
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.099     CPU time: 0.099
    /   _/                      v4.4.0

    Program: importrmm.py

    0.099 <module>  importrmm.py:1
    └─ 0.099 <module>  rmm/__init__.py:1
       └─ 0.099 <module>  rmm/mr.py:1
          └─ 0.099 <module>  rmm/_lib/__init__.py:1
             ├─ 0.059 <module>  numpy/__init__.py:1
             │     [31 frames hidden]  numpy, re, sre_compile, <built-in>, s...
             ├─ 0.020 __new__  enum.py:180
             │     [2 frames hidden]  enum
             ├─ 0.010 <module>  ctypes/__init__.py:1
             │     [3 frames hidden]  ctypes, <built-in>
             └─ 0.010 _EnumDict.__setitem__  enum.py:89
                   [3 frames hidden]  enum

Closes #1211.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: #1221
wence- committed Feb 27, 2023
1 parent ef997db commit 1a75350
Showing 13 changed files with 311 additions and 201 deletions.
18 changes: 11 additions & 7 deletions README.md
@@ -691,16 +691,18 @@ resources
MemoryResources are highly configurable and can be composed together in different ways.
See `help(rmm.mr)` for more information.

## Using RMM with third-party libraries

### Using RMM with CuPy

You can configure [CuPy](https://cupy.dev/) to use RMM for memory
allocations by setting the CuPy CUDA allocator to
`rmm_cupy_allocator`:

```python
>>> import rmm
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm.rmm_cupy_allocator)
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)
```


@@ -718,15 +720,15 @@ This can be done in two ways:
1. Setting the environment variable `NUMBA_CUDA_MEMORY_MANAGER`:

```bash
$ NUMBA_CUDA_MEMORY_MANAGER=rmm python (args)
$ NUMBA_CUDA_MEMORY_MANAGER=rmm.allocators.numba python (args)
```

2. Using the `set_memory_manager()` function provided by Numba:

```python
>>> from numba import cuda
>>> import rmm
>>> cuda.set_memory_manager(rmm.RMMNumbaManager)
>>> from rmm.allocators.numba import RMMNumbaManager
>>> cuda.set_memory_manager(RMMNumbaManager)
```

**Note:** This only configures Numba to use the current RMM resource for allocations.
@@ -741,10 +743,11 @@ RMM-managed pool:

```python
import rmm
from rmm.allocators.torch import rmm_torch_allocator
import torch

rmm.reinitialize(pool_allocator=True)
torch.cuda.memory.change_current_allocator(rmm.rmm_torch_allocator)
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```

PyTorch and RMM will now share the same memory pool.
@@ -753,13 +756,14 @@ You can, of course, use a custom memory resource with PyTorch as well:

```python
import rmm
from rmm.allocators.torch import rmm_torch_allocator
import torch

# note that you can configure PyTorch to use RMM either before or
# after changing RMM's memory resource. PyTorch will use whatever
# memory resource is configured to be the "current" memory resource at
# the time of allocation.
torch.cuda.change_current_allocator(rmm.rmm_torch_allocator)
torch.cuda.change_current_allocator(rmm_torch_allocator)

# configure RMM to use a managed memory resource, wrapped with a
# statistics resource adaptor that can report information about the
18 changes: 18 additions & 0 deletions python/docs/api.rst
@@ -17,3 +17,21 @@ Memory Resources
:members:
:undoc-members:
:show-inheritance:

Memory Allocators
-----------------

.. automodule:: rmm.allocators.cupy
:members:
:undoc-members:
:show-inheritance:

.. automodule:: rmm.allocators.numba
:members:
:undoc-members:
:show-inheritance:

.. automodule:: rmm.allocators.torch
:members:
:undoc-members:
:show-inheritance:
38 changes: 31 additions & 7 deletions python/docs/basics.md
@@ -131,35 +131,59 @@ resources
MemoryResources are highly configurable and can be composed together in different ways.
See `help(rmm.mr)` for more information.

## Using RMM with third-party libraries

A number of libraries provide hooks to control their device
allocations. RMM provides implementations of these for
[CuPy](https://cupy.dev),
[numba](https://numba.readthedocs.io/en/stable/), and [PyTorch](https://pytorch.org) in the
`rmm.allocators` submodule. All these approaches configure the library
to use the _current_ RMM memory resource for device
allocations.

### Using RMM with CuPy

You can configure [CuPy](https://cupy.dev/) to use RMM for memory
allocations by setting the CuPy CUDA allocator to
`rmm_cupy_allocator`:
`rmm.allocators.cupy.rmm_cupy_allocator`:

```python
>>> import rmm
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm.rmm_cupy_allocator)
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)
```

### Using RMM with Numba

You can configure Numba to use RMM for memory allocations using the
You can configure [Numba](https://numba.readthedocs.io/en/stable/) to use RMM for memory allocations using the
Numba [EMM Plugin](https://numba.readthedocs.io/en/stable/cuda/external-memory.html#setting-emm-plugin).

This can be done in two ways:

1. Setting the environment variable `NUMBA_CUDA_MEMORY_MANAGER`:

```bash
$ NUMBA_CUDA_MEMORY_MANAGER=rmm python (args)
$ NUMBA_CUDA_MEMORY_MANAGER=rmm.allocators.numba python (args)
```

2. Using the `set_memory_manager()` function provided by Numba:

```python
>>> from numba import cuda
>>> import rmm
>>> cuda.set_memory_manager(rmm.RMMNumbaManager)
>>> from rmm.allocators.numba import RMMNumbaManager
>>> cuda.set_memory_manager(RMMNumbaManager)
```

### Using RMM with PyTorch

You can configure
[PyTorch](https://pytorch.org/docs/stable/notes/cuda.html) to use RMM
for memory allocations by configuring the current allocator.

```python
from rmm.allocators.torch import rmm_torch_allocator
import torch

torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```
34 changes: 28 additions & 6 deletions python/rmm/__init__.py
@@ -17,29 +17,51 @@
from rmm.mr import disable_logging, enable_logging, get_log_filenames
from rmm.rmm import (
RMMError,
RMMNumbaManager,
_numba_memory_manager,
is_initialized,
register_reinitialize_hook,
reinitialize,
rmm_cupy_allocator,
rmm_torch_allocator,
unregister_reinitialize_hook,
)

__all__ = [
"DeviceBuffer",
"RMMError",
"RMMNumbaManager",
"disable_logging",
"enable_logging",
"get_log_filenames",
"is_initialized",
"mr",
"register_reinitialize_hook",
"reinitialize",
"rmm_cupy_allocator",
"unregister_reinitialize_hook",
]

__version__ = "23.04.00"


_deprecated_names = {
"rmm_cupy_allocator": "cupy",
"rmm_torch_allocator": "torch",
"RMMNumbaManager": "numba",
"_numba_memory_manager": "numba",
}


def __getattr__(name):
if name in _deprecated_names:
import importlib
import warnings

package = _deprecated_names[name]
warnings.warn(
f"Use of 'rmm.{name}' is deprecated and will be removed. "
f"'{name}' now lives in the 'rmm.allocators.{package}' sub-module, "
"please update your imports.",
FutureWarning,
)
module = importlib.import_module(
f".allocators.{package}", package=__name__
)
return getattr(module, name)
else:
raise AttributeError(f"Module '{__name__}' has no attribute '{name}'")
3 changes: 2 additions & 1 deletion python/rmm/_cuda/gpu.py
@@ -1,6 +1,5 @@
# Copyright (c) 2020, NVIDIA CORPORATION.

import numba.cuda
from cuda import cuda, cudart


@@ -84,6 +83,8 @@ def runtimeGetVersion():
"""
# TODO: Replace this with `cuda.cudart.cudaRuntimeGetVersion()` when the
# limitation is fixed.
import numba.cuda

major, minor = numba.cuda.runtime.get_version()
return major * 1000 + minor * 10

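The pattern in this hunk — moving a heavy import from module scope into the body of the one function that needs it — generalizes to any expensive dependency. A stand-alone sketch, with the stdlib `json` module as a cheap stand-in for `numba.cuda`:

```python
def runtime_version_sketch():
    # Deferred import: the heavy dependency (numba.cuda in the real code,
    # stdlib json here as a stand-in) is only imported on first call, so
    # importing the enclosing module stays fast. Python caches the module
    # in sys.modules, so repeated calls pay almost nothing.
    import json
    return json.dumps({"major": 11, "minor": 8})

print(runtime_version_sketch())  # {"major": 11, "minor": 8}
```

The trade-off is that the first call absorbs the import latency, which is acceptable here because `runtimeGetVersion()` is not on a hot path.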
22 changes: 11 additions & 11 deletions python/rmm/_cuda/stream.pyx
@@ -16,19 +16,14 @@ from cuda.ccudart cimport cudaStream_t
from libc.stdint cimport uintptr_t
from libcpp cimport bool

from rmm._lib.cuda_stream cimport CudaStream
from rmm._lib.cuda_stream_view cimport (
cuda_stream_default,
cuda_stream_legacy,
cuda_stream_per_thread,
cuda_stream_view,
)

from numba import cuda

from rmm._lib.cuda_stream cimport CudaStream

from rmm._lib.cuda_stream import CudaStream


cdef class Stream:
def __init__(self, obj=None):
@@ -46,10 +41,11 @@ cdef class Stream:
self._init_with_new_cuda_stream()
elif isinstance(obj, Stream):
self._init_from_stream(obj)
elif isinstance(obj, cuda.cudadrv.driver.Stream):
self._init_from_numba_stream(obj)
else:
self._init_from_cupy_stream(obj)
try:
self._init_from_numba_stream(obj)
except TypeError:
self._init_from_cupy_stream(obj)

@staticmethod
cdef Stream _from_cudaStream_t(cudaStream_t s, object owner=None):
@@ -94,8 +90,12 @@ cdef class Stream:
return self.c_is_default()

def _init_from_numba_stream(self, obj):
self._cuda_stream = <cudaStream_t><uintptr_t>(int(obj))
self._owner = obj
from numba import cuda
if isinstance(obj, cuda.cudadrv.driver.Stream):
self._cuda_stream = <cudaStream_t><uintptr_t>(int(obj))
self._owner = obj
else:
raise TypeError(f"Cannot create stream from {type(obj)}")

def _init_from_cupy_stream(self, obj):
try:
Empty file.
44 changes: 44 additions & 0 deletions python/rmm/allocators/cupy.py
@@ -0,0 +1,44 @@
# Copyright (c) 2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from rmm import _lib as librmm
from rmm._cuda.stream import Stream

try:
import cupy
except ImportError:
cupy = None


def rmm_cupy_allocator(nbytes):
"""
A CuPy allocator that makes use of RMM.

Examples
--------
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)
"""
if cupy is None:
raise ModuleNotFoundError("No module named 'cupy'")

stream = Stream(obj=cupy.cuda.get_current_stream())
buf = librmm.device_buffer.DeviceBuffer(size=nbytes, stream=stream)
dev_id = -1 if buf.ptr else cupy.cuda.device.get_device_id()
mem = cupy.cuda.UnownedMemory(
ptr=buf.ptr, size=buf.size, owner=buf, device_id=dev_id
)
ptr = cupy.cuda.memory.MemoryPointer(mem, 0)

return ptr
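The `try: import cupy / except ImportError: cupy = None` guard above is the standard optional-dependency pattern: the import error is deferred from module load to first use, so `rmm.allocators.cupy` itself stays importable (e.g. for documentation builds) without CuPy. A self-contained sketch of the same pattern, using a deliberately nonexistent module name:

```python
try:
    import not_a_real_dependency  # stand-in for an optional heavy import
except ImportError:
    not_a_real_dependency = None


def allocate(nbytes):
    """Fail only when the optional feature is actually used, not at import."""
    if not_a_real_dependency is None:
        raise ModuleNotFoundError("No module named 'not_a_real_dependency'")
    return not_a_real_dependency.alloc(nbytes)  # hypothetical API


try:
    allocate(1024)
except ModuleNotFoundError as err:
    print(err)  # No module named 'not_a_real_dependency'
```

Raising `ModuleNotFoundError` at call time reproduces the error the user would have seen from a plain import, just at the moment the dependency is genuinely needed.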
