
Move external allocators into rmm.allocators module to defer imports #1221

Merged: 14 commits merged into rapidsai:branch-23.04 from wence/fix/lazy-imports on Feb 27, 2023

Conversation

@wence- (Contributor) commented Feb 23, 2023

Description

RMM provides callbacks to configure third-party libraries to use RMM
for memory allocation.

Previously, these were defined in the top-level package, but that
requires (potentially expensive) import of the package we're providing
a hook for, since typically we must import that package to define the
callback. This makes importing RMM expensive. To avoid this, move the
callbacks into sub-modules of `rmm.allocators` that are not imported
by default. So, if we want to configure the CuPy allocator, we now
import `rmm_cupy_allocator` from `rmm.allocators.cupy`, and don't pay
the price of importing pytorch.

This change deprecates use of the allocator callbacks in the
top-level `rmm` module in favour of explicit imports from the relevant
`rmm.allocators.XXX` sub-module.

Before these changes, a sampling trace of `import rmm` with
pyinstrument shows:

$ pyinstrument -i 0.01 importrmm.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:19:56  Samples:  67
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.839     CPU time: 0.837
/   _/                      v4.4.0

Program: importrmm.py

0.839 <module>  importrmm.py:1
└─ 0.839 <module>  rmm/__init__.py:1
   ├─ 0.315 <module>  rmm/allocators/torch.py:1
   │  └─ 0.315 <module>  torch/__init__.py:1
   │        [96 frames hidden]  torch, <built-in>, enum, inspect, tok...
   ├─ 0.297 <module>  rmm/mr.py:1
   │  └─ 0.297 <module>  rmm/_lib/__init__.py:1
   │     ├─ 0.216 <module>  numba/__init__.py:1
   │     │     [140 frames hidden]  numba, abc, <built-in>, importlib, em...
   │     ├─ 0.040 <module>  numba/cuda/__init__.py:1
   │     │     [34 frames hidden]  numba, asyncio, ssl, <built-in>, re, ...
   │     ├─ 0.030 __new__  enum.py:180
   │     │     [5 frames hidden]  enum, <built-in>
   │     └─ 0.011 [self]  None
   └─ 0.227 <module>  rmm/allocators/cupy.py:1
      └─ 0.227 <module>  cupy/__init__.py:1
            [123 frames hidden]  cupy, pytest, _pytest, attr, <built-i...

That is, almost a full second to import things, most of which is spent
importing pytorch and cupy. These modules are not needed in normal
usage of RMM, so we can defer the imports. Numba is a little bit
trickier, but we can also defer up-front imports, with a final result
that after these changes the same `import rmm` call takes just a tenth
of a second:

$ pyinstrument -i 0.01 importrmm.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:37:40  Samples:  9
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.099     CPU time: 0.099
/   _/                      v4.4.0

Program: importrmm.py

0.099 <module>  importrmm.py:1
└─ 0.099 <module>  rmm/__init__.py:1
   └─ 0.099 <module>  rmm/mr.py:1
      └─ 0.099 <module>  rmm/_lib/__init__.py:1
         ├─ 0.059 <module>  numpy/__init__.py:1
         │     [31 frames hidden]  numpy, re, sre_compile, <built-in>, s...
         ├─ 0.020 __new__  enum.py:180
         │     [2 frames hidden]  enum
         ├─ 0.010 <module>  ctypes/__init__.py:1
         │     [3 frames hidden]  ctypes, <built-in>
         └─ 0.010 _EnumDict.__setitem__  enum.py:89
               [3 frames hidden]  enum

Closes #1211.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Preparation for removing by-default import of non-essential packages.
This removes an indirection through rmm.rmm, to simplify removal of
the top-level imports by default.
This will make importing RMM faster if you don't want the hooks for
cupy, numba, or pytorch.

After deferring the cupy and pytorch imports (numba still imported up front):

    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:20:10  Samples:  28
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.297     CPU time: 0.297
    /   _/                      v4.4.0

    Program: importrmm.py

    0.296 <module>  importrmm.py:1
    └─ 0.296 <module>  rmm/__init__.py:1
       └─ 0.296 <module>  rmm/mr.py:1
          └─ 0.296 <module>  rmm/_lib/__init__.py:1
             ├─ 0.216 <module>  numba/__init__.py:1
             │     [141 frames hidden]  numba, <built-in>, importlib, email, ...
             ├─ 0.040 <module>  numba/cuda/__init__.py:1
             │     [19 frames hidden]  numba, asyncio, ssl, <built-in>, unit...
             ├─ 0.031 [self]  None
             └─ 0.010 __new__  enum.py:180
                   [4 frames hidden]  enum, <built-in>
Prior to this, when we had expunged the cupy and pytorch imports from
the top level, but not yet numba, an import took 0.3s rather than
0.1s.
@wence- wence- requested a review from a team as a code owner February 23, 2023 12:38
@github-actions github-actions bot added the Python Related to RMM Python API label Feb 23, 2023
@wence- (Contributor, Author) commented Feb 23, 2023

This is best reviewed commit-by-commit, 172bbb5 just moves code, but doesn't change the public API, e20d071 is preparation for removal of "by-default" imports, 6f204e0 is cleanup (the full rmm package is no longer required in rmm.rmm). 359fcc0 switches off by-default import of allocators, and 7fd885b expunges final top-level imports of numba.

@wence- (Contributor, Author) commented Feb 23, 2023

One thing I noticed when updating the docs is that there's significant duplication between README.md and the Python docs page, such that I had to repeat the changes. This has the potential to get out of date. Is the duplication a historical legacy (README first, then docs page)?

@shwina (Contributor) commented Feb 23, 2023

At a high level, should we consider solving the problem more directly, using (1) exclusively function-local imports, or (2) using some kind of lazy import solution, e.g., 1 or 2?

To be clear, my concern isn't about the API change/breakage (although flat is better than nested*). Rather, it's that the subpackage solution solves the problem somewhat implicitly: the user (or developer) doesn't know that importing rmm.allocators.cupy needs CuPy installed. As a user I might (perhaps unreasonably) expect to do this:

from rmm.allocators.cupy import rmm_cupy_allocator

cupy.cuda.set_allocator(rmm_cupy_allocator)  # I want to comment this in or out depending on whether I use cupy

Please feel free to push back here; I would also agree that the above is really just a minor inconvenience.

*but also namespaces are a honking good idea.

@wence- (Contributor, Author) commented Feb 23, 2023

At a high level, should we consider solving the problem more directly, using (1) exclusively function-local imports, or (2) using some kind of lazy import solution, e.g., 1 or 2?

I don't really like these solutions because they break type-checkers which can't see through this kind of dynamic loading. I agree that some kind of automagic approach that meant one doesn't have to think too hard about where to import things would be preferable, but I'm not fully sure that it's necessary.

To be clear, my concern isn't about the API change/breakage (although flat is better than nested*). Rather, it's that the subpackage solution solves the problem somewhat implicitly. The user (or developers) don't know that importing rmm.allocators.cupy needs CuPy installed. As a user I might (perhaps unreasonably) expect to do this:

The import will still work even if cupy is not installed; rmm_cupy_allocator will only fail when it is called.

from rmm.allocators.cupy import rmm_cupy_allocator

cupy.cuda.set_allocator(rmm_cupy_allocator)  # I want to comment this in or out depending on whether I use cupy

So this example will work fine, I think. In the same way that

from rmm import rmm_cupy_allocator
import cupy
cupy.cuda.set_allocator(rmm_cupy_allocator)

Works today.

Moving the imports to be inside the function is doable, but makes everything marginally slower. A trivial test:

import numba.cuda
def outside():
    return numba.cuda.runtime.get_version()

def inside():
    import numba.cuda
    return numba.cuda.runtime.get_version()

Suggests that the latter costs an extra 200ns or so.

@wence- (Contributor, Author) commented Feb 23, 2023

In some senses, perhaps a better way to solve this problem would be to move the definition of the allocators into the respective libraries and invert the dependency. So that if you have cupy and rmm installed, you can ask cupy to use rmm as an allocator with

import cupy
cupy.cuda.set_allocator(cupy.cuda.rmm_allocator)

Although that has the disadvantage that you might then want to have all the RMM configuration of memory resources coming through cupy, which sounds like a bad idea. So perhaps scratch that.

@wence- (Contributor, Author) commented Feb 23, 2023

Note that this will need matching PRs in (at least) cudf, but I'll wait on those until we have decided on an approach here.

@shwina (Contributor) commented Feb 23, 2023

Thanks for illustrating your points! I'm convinced that we should go ahead with your approach here :)

Review thread on python/rmm/rmm.py (outdated, resolved)
@vyasr (Contributor) commented Feb 23, 2023

I have a couple of fairly cosmetic comments but I love the overall direction of this change and approve the approach.

@bdice (Contributor) left a review:

Minor comments, but I agree that the overall approach is great. It also makes it clearer in the docs that RMM is a fundamental tool for many parts of the Python CUDA ecosystem!

Review threads (resolved): README.md, python/docs/basics.md, python/rmm/__init__.py, python/rmm/allocators/numba.py
@wence- wence- changed the title Defer importing packages for third-party allocation hooks Move external allocators into rmm.allocators module to defer imports Feb 24, 2023
@wence- (Contributor, Author) commented Feb 24, 2023

Opened #1225 to track deprecation removal (can someone assign to me and add to 23.06 milestone?).

@wence- (Contributor, Author) commented Feb 24, 2023

No longer a breaking change, since it just introduces deprecation warnings if you use the allocator callbacks from the top-level package.

rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Feb 24, 2023
The allocator callbacks now live in their own submodules (so that RMM
does not, for example, import pytorch unless required) and so must be
explicitly imported.
@vyasr (Contributor) commented Feb 24, 2023

Opened #1225 to track deprecation removal (can someone assign to me and add to 23.06 milestone?).

Done

@vyasr vyasr added 3 - Ready for review Ready for review by team non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Feb 24, 2023
@wence- (Contributor, Author) commented Feb 27, 2023

/merge

@wence- (Contributor, Author) commented Feb 27, 2023

I don't think I have powers to invoke merging here.

@bdice (Contributor) left a review:

One small suggestion. @wence- if you are okay with it, I can commit this change and merge.

Review thread on python/docs/basics.md (outdated, resolved)
@bdice (Contributor) commented Feb 27, 2023

/merge

@rapids-bot rapids-bot bot merged commit 1a75350 into rapidsai:branch-23.04 Feb 27, 2023
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Feb 28, 2023
Following rapidsai/rmm#1221, RMM allocators now
live under the `rmm.allocators` submodule and using old paths causes
warnings to be raised. This change updates to the new submodule and
prevents such warnings.
wence- added a commit to rapidsai/dask-cuda-benchmarks that referenced this pull request Feb 28, 2023
The allocator callbacks now live in their own submodules (so that RMM
does not, for example, import pytorch unless required) and so must be
explicitly imported.
rapids-bot bot pushed a commit to rapidsai/ucx-py that referenced this pull request Feb 28, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #932
pentschev pushed a commit to rapidsai/ucx-py that referenced this pull request Feb 28, 2023
…backs

The allocator callbacks now live in their own submodules (so that RMM
does not, for example, import pytorch unless required) and so must be
explicitly imported.
pentschev added a commit to rapidsai/ucx-py that referenced this pull request Feb 28, 2023
Adapt to github.com/rapidsai/rmm/pull/1221 which moves allocator callbacks

See merge request pentschev/ucxx!5
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Feb 28, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1129
@wence- wence- deleted the wence/fix/lazy-imports branch February 28, 2023 17:19
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Mar 6, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Carl Simon Adorf (https://github.com/csadorf)
  - William Hicks (https://github.com/wphicks)

URL: #5249
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Mar 9, 2023
Follow up to PR ( #1129 ) and PR ( rapidsai/rmm#1221 )

Uses `rmm_cupy_allocator` from `rmm.allocators.cupy` where it has been moved to recently.

cc @wence-

Authors:
  - https://github.com/jakirkham

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1138
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Apr 5, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #3300
wence- added a commit to wence-/rmm that referenced this pull request May 18, 2023
wence- added a commit to wence-/rmm that referenced this pull request May 31, 2023
Successfully merging this pull request may close these issues.

[BUG] rmm unconditionally imports both torch and cupy if they are in the environment