Shared ~/.cache/torch_extensions needs to be pytorch version aware. #68905

Open
stas00 opened this issue Nov 25, 2021 · 0 comments
Labels
  • enhancement: Not as big of a feature, but technically not a bug. Should be easy to fix
  • module: cpp-extensions: Related to torch.utils.cpp_extension
  • triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@stas00
Contributor

stas00 commented Nov 25, 2021

Feature Request

There is a problem with using ~/.cache/torch_extensions when a developer has multiple Python virtual environments.

I noticed that recent PyTorch (1.10?) started adding a py38_cu113/ prefix, e.g. ~/.cache/torch_extensions/py38_cu113/, which is a great improvement for sharing ~/.cache/torch_extensions/ between different Python environments.

But that's not enough. The PyTorch version should also be part of this prefix, e.g. py38_pt110_cu113/.
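The two naming schemes can be sketched as follows. This is a minimal illustration, not the code in torch.utils.cpp_extension. It assumes that the existing folder name combines the Python major/minor version with the CUDA version (dots removed, or "cpu" for CPU-only builds), and that the proposed pt110 tag is the torch major/minor version with the dot removed. Both function names are hypothetical.

```python
def _py_tag(py_version):
    # e.g. (3, 8) -> "py38"; accepts sys.version_info too
    return f"py{py_version[0]}{py_version[1]}"


def _cuda_tag(cuda_version):
    # CPU-only torch builds report no CUDA version
    return "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")


def current_build_folder(py_version, cuda_version):
    """Hypothetical helper: folder name in the existing scheme, e.g. py38_cu113."""
    return f"{_py_tag(py_version)}_{_cuda_tag(cuda_version)}"


def proposed_build_folder(py_version, torch_version, cuda_version):
    """Hypothetical helper: proposed name that also encodes PyTorch, e.g. py38_pt110_cu113."""
    # Drop a local version suffix such as "+cu113", then keep only major.minor
    major, minor = torch_version.split("+")[0].split(".")[:2]
    return f"{_py_tag(py_version)}_pt{major}{minor}_{_cuda_tag(cuda_version)}"


if __name__ == "__main__":
    print(current_build_folder((3, 8), "11.3"))                   # py38_cu113
    print(proposed_build_folder((3, 8), "1.10.0+cu113", "11.3"))  # py38_pt110_cu113
```

In a real environment, the inputs would be `sys.version_info`, `torch.__version__` and `torch.version.cuda`. With the PyTorch tag included, two environments that share Python and CUDA versions but have different torch versions no longer resolve to the same folder.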

The problem is that when I have several environments on the same machine and need to test software against multiple PyTorch versions, I have to manually wipe out ~/.cache/torch_extensions/ after each environment switch. Otherwise PyTorch tries to load a stale extension that was built against a different PyTorch version.

For example, here is the kind of error one gets when the wrong shared object is loaded:

ImportError: /github/home/.cache/torch_extensions/py38_cu111/cpu_adam/cpu_adam.so: undefined symbol: curandCreateGenerator
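Until the folder name includes the PyTorch version, one possible workaround for this kind of stale-build error is to record which torch version populated a cache directory and to wipe the directory on a mismatch. This is a minimal sketch. The stamp file name `.torch_version` and the helper name are hypothetical, and PyTorch itself writes no such marker. In practice `current_version` would be `torch.__version__`, and `cache_dir` should be the per-build subfolder (e.g. .../py38_cu113), not the whole ~/.cache/torch_extensions.

```python
import os
import shutil

STAMP_NAME = ".torch_version"  # hypothetical marker file, not something PyTorch creates


def ensure_fresh_cache(cache_dir, current_version):
    """Wipe cache_dir if it was built by a different torch version.

    Returns True if the directory was wiped, False otherwise.
    A directory without a stamp is treated as stale, because its origin is unknown.
    """
    stamp_path = os.path.join(cache_dir, STAMP_NAME)
    wiped = False
    if os.path.isdir(cache_dir):
        recorded = None
        if os.path.exists(stamp_path):
            with open(stamp_path) as f:
                recorded = f.read().strip()
        if recorded != current_version:
            # Stale builds can fail at import time with "undefined symbol" errors
            shutil.rmtree(cache_dir)
            wiped = True
    os.makedirs(cache_dir, exist_ok=True)
    with open(stamp_path, "w") as f:
        f.write(current_version)
    return wiped
```

Call it once per session, before any extension is loaded. It trades an occasional full rebuild for never loading a mismatched .so.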

As I flagged a few months back in #55267, ideally CUDA extensions shouldn't be installed into a single shared directory at all. They should instead be installed into the virtual environment tree of the Python interpreter being used, i.e. alongside the torch files.

That removes the need for the py38 part. And since we can assume the user keeps a fixed Python virtual env for a given PyTorch+CUDA combination, a unique prefix like py38_pt110_cu113 isn't needed there either.

I'm aware that not all users have write access to their Python environment, but ~/.cache/torch_extensions/ could be used as a fallback rather than as the default.
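The "environment first, shared cache as fallback" rule described above can be sketched as follows. This is a minimal illustration of the suggested behavior, not how torch.utils.cpp_extension currently chooses its build root. The function name and the `torch_extensions` subdirectory name are hypothetical, and so is the choice of which directory counts as "the virtual environment tree".

```python
import os


def pick_extensions_root(env_dir, cache_dir):
    """Hypothetical helper: prefer the running environment's tree, else the shared cache.

    env_dir: a directory inside the active Python environment (assumption: e.g.
             sys.prefix, or the directory containing the installed torch package).
    cache_dir: the shared fallback, e.g. ~/.cache/torch_extensions.
    """
    # os.access reports whether the calling user may write to env_dir
    if os.path.isdir(env_dir) and os.access(env_dir, os.W_OK):
        return os.path.join(env_dir, "torch_extensions")
    return cache_dir
```

For example, `pick_extensions_root(os.path.dirname(torch.__file__), os.path.expanduser("~/.cache/torch_extensions"))` would keep builds next to the torch files in a user-owned virtual env. On a read-only system install, it would fall back to the shared cache.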

This is a constant issue for me since I use DeepSpeed a lot. It builds multiple extensions, and I constantly have to rebuild it and/or remember to wipe out the shared ~/.cache/torch_extensions/ whenever I switch Python envs to test things. It's not smooth sailing.
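As a stopgap, each environment can be given its own build root through the documented TORCH_EXTENSIONS_DIR environment variable, which replaces the default torch_extensions location used by torch.utils.cpp_extension. This is a minimal sketch. It assumes that the extensions (DeepSpeed ops included) are JIT-built through torch.utils.cpp_extension and that each environment can be identified by its `sys.prefix`. The helper name and the hash-based subdirectory are hypothetical. The function must run before the first extension is built or loaded.

```python
import hashlib
import os
import sys


def use_per_env_extensions_dir(base_dir, prefix=None):
    """Hypothetical helper: point TORCH_EXTENSIONS_DIR at a folder unique to this env."""
    prefix = sys.prefix if prefix is None else prefix
    # Short, stable tag derived from the environment's install prefix
    tag = hashlib.sha1(prefix.encode("utf-8")).hexdigest()[:10]
    path = os.path.join(os.path.expanduser(base_dir), tag)
    os.environ["TORCH_EXTENSIONS_DIR"] = path
    return path
```

For example, calling `use_per_env_extensions_dir("~/.cache/torch_extensions")` at the top of a script means that switching virtual envs switches build directories as well, so nothing has to be wiped by hand.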

Proposal

  1. at the very least, embed the PyTorch version in the shared location, e.g.
    • before: ~/.cache/torch_extensions/py38_cu113/
    • after: ~/.cache/torch_extensions/py38_pt110_cu113/
  2. ideally, instead of (1), install the extensions into the running Python's tree if it's writable

Thank you!

cc @malfet @zou3519

@samdow added the enhancement, module: cpp-extensions, and triaged labels Nov 30, 2021