Shared `~/.cache/torch_extensions` needs to be pytorch version aware. #68905

stas00 · 2021-11-25T02:18:38Z

Feature Request

There is an issue with using ~/.cache/torch_extensions when a developer has multiple virtual python environments.

I noticed that recent pytorch (1.10?) started to add a py38_cu113/ prefix, e.g. ~/.cache/torch_extensions/py38_cu113/, which is a great improvement for sharing ~/.cache/torch_extensions/ between different python envs.

But it's not enough. The pt version should also be part of this prefix, e.g. py38_pt110_cu113/

The problem is that if I have several environments on the same machine and I have to test software with multiple pytorch versions, I have to manually wipe out ~/.cache/torch_extensions/ after each env switch. Otherwise pytorch tries to load an invalid torch extension that was built against a different pytorch version.
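To make the suggested prefix concrete, here is a minimal sketch of how such a directory name could be composed. The helper `versioned_prefix` is hypothetical and not part of pytorch; it takes the versions as plain arguments so it runs without torch installed. In a real environment the inputs would presumably come from `sys.version_info`, `torch.__version__` and `torch.version.cuda`.

```python
import re


def versioned_prefix(py_version, torch_version, cuda_version):
    """Build a cache sub-directory name such as 'py38_pt110_cu113'.

    Hypothetical helper for illustration only, not a pytorch API.
    """
    # e.g. (3, 8) -> "py38"
    py_tag = f"py{py_version[0]}{py_version[1]}"

    # Keep only major.minor, dropping the patch number and any local
    # suffix such as "+cu113": "1.10.0+cu113" -> "pt110"
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unexpected torch version: {torch_version!r}")
    pt_tag = f"pt{match.group(1)}{match.group(2)}"

    # CPU-only builds have no CUDA version; "11.3" -> "cu113"
    cu_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")

    return f"{py_tag}_{pt_tag}_{cu_tag}"


print(versioned_prefix((3, 8), "1.10.0+cu113", "11.3"))  # py38_pt110_cu113
```

Note that concatenating major and minor without a separator is ambiguous in principle (1.10 and a hypothetical 11.0 would both give pt110), so a real implementation might prefer a separator.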

For example, here is the kind of error one gets when the wrong shared object is loaded:

ImportError: /github/home/.cache/torch_extensions/py38_cu111/cpu_adam/cpu_adam.so: undefined symbol: curandCreateGenerator

As I flagged some months back in #55267, ideally the cuda extensions shouldn't be installed into a single shared dir, but into the virtual environment tree of the python that's being used, i.e. alongside the torch files.

That removes the need to add py38. And since we can assume the user will use a fixed python virtual env for a given pytorch+cuda combination, a unique prefix like py38_pt110_cu113 is not needed in that case.

I'm aware that not all users have write access to the python env, but ~/.cache/torch_extensions/ could be used as a fallback rather than as the default.
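The "env tree first, shared cache as fallback" idea could look roughly like the sketch below. The function `choose_extensions_dir` and the `torch_extensions` sub-directory name under the env root are assumptions made for illustration; pytorch does not currently do this.

```python
import os
import sys
from pathlib import Path


def choose_extensions_dir(env_root=None, fallback=None):
    """Prefer a dir inside the running python env, else the shared cache.

    Hypothetical helper: the 'torch_extensions' folder under the env root
    is an assumed layout, not something pytorch defines.
    """
    env_root = Path(sys.prefix if env_root is None else env_root)
    if fallback is None:
        fallback = Path("~/.cache/torch_extensions").expanduser()
    else:
        fallback = Path(fallback)

    # Use the env tree only if it exists and we may create entries in it
    if env_root.is_dir() and os.access(env_root, os.W_OK | os.X_OK):
        return env_root / "torch_extensions"
    return fallback


print(choose_extensions_dir())
```

As far as I know, torch.utils.cpp_extension reads the TORCH_EXTENSIONS_DIR environment variable to override its default build root, so until something like this is built in, exporting a per-env value of that variable is one way to get a similar effect.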

This is a constant issue for me since I use DeepSpeed a lot, which builds multiple extensions, and I have to constantly rebuild them and/or remember to wipe out the shared ~/.cache/torch_extensions/ when I switch python envs to test things. It's not smooth sailing.

Proposal

1. At the very least, embed the pytorch version into the shared location, e.g.
   - before: ~/.cache/torch_extensions/py38_cu113/
   - after: ~/.cache/torch_extensions/py38_pt110_cu113/
2. Ideally, install the extensions into the running python's tree if it's writable, instead of (1).

Thank you!

cc @malfet @zou3519

stas00 mentioned this issue Nov 25, 2021

[CI] clear ~/.cache/torch_extensions between builds huggingface/transformers#14520

Merged

samdow added the enhancement, module: cpp-extensions, and triaged labels Nov 30, 2021

santhnm2 mentioned this issue May 10, 2022

Undefined symbol error when compiling and loading C++ extension #77184

Open
