Move external allocators into rmm.allocators module to defer imports (#1221)

RMM provides callbacks to configure third-party libraries to use RMM
for memory allocation.

Previously, these were defined in the top-level package, but that
requires (potentially expensive) import of the package we're providing
a hook for, since typically we must import that package to define the
callback. This makes importing RMM expensive. To avoid this, move the
callbacks into (not imported by default) sub-modules in
`rmm.allocators`. So, if we want to configure the CuPy allocator, we
now import `rmm_cupy_allocator` from `rmm.allocators.cupy` and don't
pay the price of importing PyTorch.

This change **deprecates** the use of the allocator callbacks in the
top-level `rmm` module in favour of explicit imports from the relevant
`rmm.allocators.XXX` sub-module.
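The deprecation shim behind this can be sketched in isolation. The sketch below uses a hypothetical module name `demo_pkg` (and a stand-in callable) so it runs without RMM installed; it shows the PEP 562 module-`__getattr__` pattern the commit relies on — old attribute access still works but emits a `FutureWarning` and defers any heavy import to that moment:

```python
import types
import warnings

# Names kept alive on the top-level package, mapped to the sub-module
# that now really owns them (mirrors the shape used in the commit).
_deprecated_names = {"rmm_cupy_allocator": "cupy"}

demo_pkg = types.ModuleType("demo_pkg")

def _module_getattr(name):
    if name in _deprecated_names:
        package = _deprecated_names[name]
        warnings.warn(
            f"Use of 'demo_pkg.{name}' is deprecated; import it from "
            f"'demo_pkg.allocators.{package}' instead.",
            FutureWarning,
        )
        # The real shim lazily imports the sub-module here; this sketch
        # returns a stand-in callable instead.
        return lambda nbytes: None
    raise AttributeError(f"Module 'demo_pkg' has no attribute '{name}'")

# PEP 562: attribute lookup on a module falls back to a module-level
# __getattr__ when the name is not in the module's dict.
demo_pkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    demo_pkg.rmm_cupy_allocator  # old spelling still resolves...
print(caught[0].category.__name__)  # ...but warns: FutureWarning
```

The cost of the real third-party import is thus only paid by callers who actually touch the deprecated name.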

Before these changes, a sampling trace of `import rmm` with
pyinstrument shows:
    
    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:19:56  Samples:  67
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.839     CPU time: 0.837
    /   _/                      v4.4.0

    Program: importrmm.py

    0.839 <module>  importrmm.py:1
    └─ 0.839 <module>  rmm/__init__.py:1
       ├─ 0.315 <module>  rmm/allocators/torch.py:1
       │  └─ 0.315 <module>  torch/__init__.py:1
       │        [96 frames hidden]  torch, <built-in>, enum, inspect, tok...
       ├─ 0.297 <module>  rmm/mr.py:1
       │  └─ 0.297 <module>  rmm/_lib/__init__.py:1
       │     ├─ 0.216 <module>  numba/__init__.py:1
       │     │     [140 frames hidden]  numba, abc, <built-in>, importlib, em...
       │     ├─ 0.040 <module>  numba/cuda/__init__.py:1
       │     │     [34 frames hidden]  numba, asyncio, ssl, <built-in>, re, ...
       │     ├─ 0.030 __new__  enum.py:180
       │     │     [5 frames hidden]  enum, <built-in>
       │     └─ 0.011 [self]  None
       └─ 0.227 <module>  rmm/allocators/cupy.py:1
          └─ 0.227 <module>  cupy/__init__.py:1
                [123 frames hidden]  cupy, pytest, _pytest, attr, <built-i...

That is, almost a full second to import things, most of which is spent
importing pytorch and cupy. These modules are not needed in normal
usage of RMM, so we can defer the imports. Numba is a little bit
trickier, but we can also defer up-front imports, with a final result
that after these changes the same `import rmm` call takes just a tenth
of a second:

    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:37:40  Samples:  9
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.099     CPU time: 0.099
    /   _/                      v4.4.0

    Program: importrmm.py

    0.099 <module>  importrmm.py:1
    └─ 0.099 <module>  rmm/__init__.py:1
       └─ 0.099 <module>  rmm/mr.py:1
          └─ 0.099 <module>  rmm/_lib/__init__.py:1
             ├─ 0.059 <module>  numpy/__init__.py:1
             │     [31 frames hidden]  numpy, re, sre_compile, <built-in>, s...
             ├─ 0.020 __new__  enum.py:180
             │     [2 frames hidden]  enum
             ├─ 0.010 <module>  ctypes/__init__.py:1
             │     [3 frames hidden]  ctypes, <built-in>
             └─ 0.010 _EnumDict.__setitem__  enum.py:89
                   [3 frames hidden]  enum

Closes #1211.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: #1221
wence- committed Feb 27, 2023
1 parent ef997db commit 1a75350
Showing 13 changed files with 311 additions and 201 deletions.
18 changes: 11 additions & 7 deletions README.md
@@ -691,16 +691,18 @@ resources
MemoryResources are highly configurable and can be composed together in different ways.
See `help(rmm.mr)` for more information.

## Using RMM with third-party libraries

### Using RMM with CuPy

You can configure [CuPy](https://cupy.dev/) to use RMM for memory
allocations by setting the CuPy CUDA allocator to
`rmm_cupy_allocator`:

```python
>>> import rmm
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm.rmm_cupy_allocator)
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)
```


@@ -718,15 +720,15 @@ This can be done in two ways:
1. Setting the environment variable `NUMBA_CUDA_MEMORY_MANAGER`:

```bash
$ NUMBA_CUDA_MEMORY_MANAGER=rmm python (args)
$ NUMBA_CUDA_MEMORY_MANAGER=rmm.allocators.numba python (args)
```

2. Using the `set_memory_manager()` function provided by Numba:

```python
>>> from numba import cuda
>>> import rmm
>>> cuda.set_memory_manager(rmm.RMMNumbaManager)
>>> from rmm.allocators.numba import RMMNumbaManager
>>> cuda.set_memory_manager(RMMNumbaManager)
```

**Note:** This only configures Numba to use the current RMM resource for allocations.
@@ -741,10 +743,11 @@ RMM-managed pool:

```python
import rmm
from rmm.allocators.torch import rmm_torch_allocator
import torch

rmm.reinitialize(pool_allocator=True)
torch.cuda.memory.change_current_allocator(rmm.rmm_torch_allocator)
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```

PyTorch and RMM will now share the same memory pool.
@@ -753,13 +756,14 @@ You can, of course, use a custom memory resource with PyTorch as well:

```python
import rmm
from rmm.allocators.torch import rmm_torch_allocator
import torch

# note that you can configure PyTorch to use RMM either before or
# after changing RMM's memory resource. PyTorch will use whatever
# memory resource is configured to be the "current" memory resource at
# the time of allocation.
torch.cuda.change_current_allocator(rmm.rmm_torch_allocator)
torch.cuda.change_current_allocator(rmm_torch_allocator)

# configure RMM to use a managed memory resource, wrapped with a
# statistics resource adaptor that can report information about the
18 changes: 18 additions & 0 deletions python/docs/api.rst
@@ -17,3 +17,21 @@ Memory Resources
:members:
:undoc-members:
:show-inheritance:

Memory Allocators
-----------------

.. automodule:: rmm.allocators.cupy
:members:
:undoc-members:
:show-inheritance:

.. automodule:: rmm.allocators.numba
:members:
:undoc-members:
:show-inheritance:

.. automodule:: rmm.allocators.torch
:members:
:undoc-members:
:show-inheritance:
38 changes: 31 additions & 7 deletions python/docs/basics.md
@@ -131,35 +131,59 @@ resources
MemoryResources are highly configurable and can be composed together in different ways.
See `help(rmm.mr)` for more information.

## Using RMM with third-party libraries

A number of libraries provide hooks to control their device
allocations. RMM provides implementations of these for
[CuPy](https://cupy.dev),
[numba](https://numba.readthedocs.io/en/stable/), and [PyTorch](https://pytorch.org) in the
`rmm.allocators` submodule. All these approaches configure the library
to use the _current_ RMM memory resource for device
allocations.

### Using RMM with CuPy

You can configure [CuPy](https://cupy.dev/) to use RMM for memory
allocations by setting the CuPy CUDA allocator to
`rmm_cupy_allocator`:
`rmm.allocators.cupy.rmm_cupy_allocator`:

```python
>>> import rmm
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm.rmm_cupy_allocator)
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)
```

### Using RMM with Numba

You can configure Numba to use RMM for memory allocations using the
You can configure [Numba](https://numba.readthedocs.io/en/stable/) to use RMM for memory allocations using the
Numba [EMM Plugin](https://numba.readthedocs.io/en/stable/cuda/external-memory.html#setting-emm-plugin).

This can be done in two ways:

1. Setting the environment variable `NUMBA_CUDA_MEMORY_MANAGER`:

```bash
$ NUMBA_CUDA_MEMORY_MANAGER=rmm python (args)
$ NUMBA_CUDA_MEMORY_MANAGER=rmm.allocators.numba python (args)
```

2. Using the `set_memory_manager()` function provided by Numba:

```python
>>> from numba import cuda
>>> import rmm
>>> cuda.set_memory_manager(rmm.RMMNumbaManager)
>>> from rmm.allocators.numba import RMMNumbaManager
>>> cuda.set_memory_manager(RMMNumbaManager)
```

### Using RMM with PyTorch

You can configure
[PyTorch](https://pytorch.org/docs/stable/notes/cuda.html) to use RMM
for memory allocations by configuring the current allocator.

```python
from rmm.allocators.torch import rmm_torch_allocator
import torch

torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```
34 changes: 28 additions & 6 deletions python/rmm/__init__.py
@@ -17,29 +17,51 @@
from rmm.mr import disable_logging, enable_logging, get_log_filenames
from rmm.rmm import (
RMMError,
RMMNumbaManager,
_numba_memory_manager,
is_initialized,
register_reinitialize_hook,
reinitialize,
rmm_cupy_allocator,
rmm_torch_allocator,
unregister_reinitialize_hook,
)

__all__ = [
"DeviceBuffer",
"RMMError",
"RMMNumbaManager",
"disable_logging",
"enable_logging",
"get_log_filenames",
"is_initialized",
"mr",
"register_reinitialize_hook",
"reinitialize",
"rmm_cupy_allocator",
"unregister_reinitialize_hook",
]

__version__ = "23.04.00"


_deprecated_names = {
"rmm_cupy_allocator": "cupy",
"rmm_torch_allocator": "torch",
"RMMNumbaManager": "numba",
"_numba_memory_manager": "numba",
}


def __getattr__(name):
if name in _deprecated_names:
import importlib
import warnings

package = _deprecated_names[name]
warnings.warn(
f"Use of 'rmm.{name}' is deprecated and will be removed. "
f"'{name}' now lives in the 'rmm.allocators.{package}' sub-module, "
"please update your imports.",
FutureWarning,
)
module = importlib.import_module(
f".allocators.{package}", package=__name__
)
return getattr(module, name)
else:
raise AttributeError(f"Module '{__name__}' has no attribute '{name}'")
3 changes: 2 additions & 1 deletion python/rmm/_cuda/gpu.py
@@ -1,6 +1,5 @@
# Copyright (c) 2020, NVIDIA CORPORATION.

import numba.cuda
from cuda import cuda, cudart


@@ -84,6 +83,8 @@ def runtimeGetVersion():
"""
# TODO: Replace this with `cuda.cudart.cudaRuntimeGetVersion()` when the
# limitation is fixed.
import numba.cuda

major, minor = numba.cuda.runtime.get_version()
return major * 1000 + minor * 10

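The pattern in this hunk — moving a heavy import from module scope into the body of the one function that needs it — generalizes to any expensive dependency. A stand-alone sketch, with the stdlib `json` module as a cheap stand-in for `numba.cuda`:

```python
def runtime_version_sketch():
    # Deferred import: the heavy dependency (numba.cuda in the real code,
    # stdlib json here as a stand-in) is only imported on first call, so
    # importing the enclosing module stays fast. Python caches the module
    # in sys.modules, so repeated calls pay almost nothing.
    import json
    return json.dumps({"major": 11, "minor": 8})

print(runtime_version_sketch())  # {"major": 11, "minor": 8}
```

The trade-off is that the first call absorbs the import latency, which is acceptable here because `runtimeGetVersion()` is not on a hot path.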
22 changes: 11 additions & 11 deletions python/rmm/_cuda/stream.pyx
@@ -16,19 +16,14 @@ from cuda.ccudart cimport cudaStream_t
from libc.stdint cimport uintptr_t
from libcpp cimport bool

from rmm._lib.cuda_stream cimport CudaStream
from rmm._lib.cuda_stream_view cimport (
cuda_stream_default,
cuda_stream_legacy,
cuda_stream_per_thread,
cuda_stream_view,
)

from numba import cuda

from rmm._lib.cuda_stream cimport CudaStream

from rmm._lib.cuda_stream import CudaStream


cdef class Stream:
def __init__(self, obj=None):
@@ -46,10 +41,11 @@ cdef class Stream:
self._init_with_new_cuda_stream()
elif isinstance(obj, Stream):
self._init_from_stream(obj)
elif isinstance(obj, cuda.cudadrv.driver.Stream):
self._init_from_numba_stream(obj)
else:
self._init_from_cupy_stream(obj)
try:
self._init_from_numba_stream(obj)
except TypeError:
self._init_from_cupy_stream(obj)

@staticmethod
cdef Stream _from_cudaStream_t(cudaStream_t s, object owner=None):
@@ -94,8 +90,12 @@ cdef class Stream:
return self.c_is_default()

def _init_from_numba_stream(self, obj):
self._cuda_stream = <cudaStream_t><uintptr_t>(int(obj))
self._owner = obj
from numba import cuda
if isinstance(obj, cuda.cudadrv.driver.Stream):
self._cuda_stream = <cudaStream_t><uintptr_t>(int(obj))
self._owner = obj
else:
raise TypeError(f"Cannot create stream from {type(obj)}")

def _init_from_cupy_stream(self, obj):
try:
Empty file.
44 changes: 44 additions & 0 deletions python/rmm/allocators/cupy.py
@@ -0,0 +1,44 @@
# Copyright (c) 2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from rmm import _lib as librmm
from rmm._cuda.stream import Stream

try:
import cupy
except ImportError:
cupy = None


def rmm_cupy_allocator(nbytes):
"""
A CuPy allocator that makes use of RMM.

Examples
--------
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)
"""
if cupy is None:
raise ModuleNotFoundError("No module named 'cupy'")

stream = Stream(obj=cupy.cuda.get_current_stream())
buf = librmm.device_buffer.DeviceBuffer(size=nbytes, stream=stream)
dev_id = -1 if buf.ptr else cupy.cuda.device.get_device_id()
mem = cupy.cuda.UnownedMemory(
ptr=buf.ptr, size=buf.size, owner=buf, device_id=dev_id
)
ptr = cupy.cuda.memory.MemoryPointer(mem, 0)

return ptr
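The `try: import cupy / except ImportError: cupy = None` guard above is the standard optional-dependency pattern: the import error is deferred from module load to first use, so `rmm.allocators.cupy` itself stays importable (e.g. for documentation builds) without CuPy. A self-contained sketch of the same pattern, using a deliberately nonexistent module name:

```python
try:
    import not_a_real_dependency  # stand-in for an optional heavy import
except ImportError:
    not_a_real_dependency = None


def allocate(nbytes):
    """Fail only when the optional feature is actually used, not at import."""
    if not_a_real_dependency is None:
        raise ModuleNotFoundError("No module named 'not_a_real_dependency'")
    return not_a_real_dependency.alloc(nbytes)  # hypothetical API


try:
    allocate(1024)
except ModuleNotFoundError as err:
    print(err)  # No module named 'not_a_real_dependency'
```

Raising `ModuleNotFoundError` at call time reproduces the error the user would have seen from a plain import, just at the moment the dependency is genuinely needed.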
