[1.9.1] [collect_env] collect_env does not collect actual runtime-loaded cudnn version #78489

Open
vadimkantorov opened this issue May 30, 2022 · 6 comments
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: collect_env.py Related to collect_env.py, which collects system information about users module: cudnn Related to torch.backends.cudnn, and CuDNN support triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@vadimkantorov
Contributor

vadimkantorov commented May 30, 2022

🐛 Describe the bug

In contrast, torch.backends.cudnn.version() does report it correctly.
Originally reported by @grazder in a comment: #78475 (comment)

This occurred on an old-ish version, 1.9.1; not sure if it still happens on 1.11.0.

Versions

1.9.1

cc @csarofeen @ptrblck @xwang233

@malfet malfet added module: cudnn Related to torch.backends.cudnn, and CuDNN support module: collect_env.py Related to collect_env.py, which collects system information about users enhancement Not as big of a feature, but technically not a bug. Should be easy to fix labels May 30, 2022
@malfet
Contributor

malfet commented May 30, 2022

collect_env collects the version of cudnn installed on the system, rather than the one PyTorch was compiled against (similar to the "CUDA used to build PyTorch:" vs "CUDA runtime version:" fields).

@ptrblck
Collaborator

ptrblck commented May 30, 2022

Just wanted to post the same :P
Reference can be seen here.

@vadimkantorov
Contributor Author

vadimkantorov commented May 30, 2022

I guess that is what's happening. But for debugging it makes sense to also report the version linked in / discovered by PyTorch.

A lazy version (if the PyTorch installation is workable enough to be imported) could just report torch.backends.cudnn.version(). A less demanding version, one that does not need a working import, could probably scan the binaries if cudnn is linked in statically, or even copy the contents of some build-time-generated versions.py distributed with PyTorch that lists linked library versions / compilation / linking config.

As is, it's quite confusing, since one has to find the linked-in versions manually even though they are quite relevant.

Bonus points for making the printed message clearer about what exactly collect_env is trying to discover/report (version at build time, version at linking time, dynamically loaded version, etc.).

Reporting both the env version AND the one PyTorch actually found is important for debugging linking/loading issues where PyTorch loads a version different from the one it was compiled against.
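
A minimal sketch of the lazy variant, assuming PyTorch may or may not be importable. The helper names runtime_cudnn_version and format_cudnn_report and the two printed labels are hypothetical and are not part of collect_env; the only PyTorch call used is torch.backends.cudnn.version().

```python
import importlib


def runtime_cudnn_version():
    """Return the cudnn version PyTorch itself reports, or None if unavailable."""
    try:
        torch = importlib.import_module("torch")
    except Exception:
        # PyTorch is missing or too broken to import.
        return None
    try:
        # Returns an integer such as 8200, or None when cudnn is not available.
        return torch.backends.cudnn.version()
    except Exception:
        return None


def format_cudnn_report(system_version, runtime_version):
    """Format both values with explicit labels (label wording is hypothetical)."""
    system = system_version if system_version is not None else "N/A"
    runtime = runtime_version if runtime_version is not None else "N/A"
    return "\n".join([
        f"cuDNN version (found on system): {system}",
        f"cuDNN version (loaded by PyTorch): {runtime}",
    ])


if __name__ == "__main__":
    # "8.0.5" stands in for whatever collect_env found on the system.
    print(format_cudnn_report("8.0.5", runtime_cudnn_version()))
```

Printing the two values side by side with distinct labels would make a mismatch between the system-wide cudnn and the one PyTorch actually uses visible at a glance.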

@ejguan ejguan added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 31, 2022
@vadimkantorov vadimkantorov changed the title [1.9.1] [collect_env] collect_env does not collect correctly cudnn version [1.9.1] [collect_env] collect_env does not collect runtime-bound cudnn version Jun 11, 2022
@vadimkantorov vadimkantorov changed the title [1.9.1] [collect_env] collect_env does not collect runtime-bound cudnn version [1.9.1] [collect_env] collect_env does not collect actual runtime-loaded cudnn version Jun 11, 2022
@vadimkantorov
Contributor Author

Related: #80637

@vadimkantorov
Contributor Author

So this can probably be done by reading f"/proc/{os.getpid()}/maps" (on Unix machines, and if this file exists) and filtering for the .so files of cudnn/blas/other actually loaded dependencies.
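
A rough sketch of that idea, assuming a Linux-style /proc/<pid>/maps where each line ends with an optional pathname. The function names and the default keywords ("cudnn", "blas") are hypothetical choices for illustration, not anything collect_env currently does.

```python
import os


def libraries_from_maps(maps_text, keywords=("cudnn", "blas")):
    """Return sorted unique paths of mapped .so files matching any keyword."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; the pathname
        # is optional and may itself contain spaces.
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no backing file
        path = parts[5].strip()
        name = os.path.basename(path).lower()
        if ".so" in name and any(k in name for k in keywords):
            found.add(path)
    return sorted(found)


def loaded_libraries_of_current_process(keywords=("cudnn", "blas")):
    """Read this process's maps file; return [] where it does not exist."""
    maps_path = f"/proc/{os.getpid()}/maps"
    try:
        with open(maps_path) as f:
            return libraries_from_maps(f.read(), keywords)
    except OSError:
        return []


if __name__ == "__main__":
    for path in loaded_libraries_of_current_process():
        print(path)
```

This only lists libraries that are already mapped, so it would have to run after PyTorch has been imported and has actually loaded them. The file name often carries a version suffix (for example libcudnn.so.8.2.0), but that is not guaranteed, and statically linked libraries will not show up at all.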

@vadimkantorov
Contributor Author

Also, there is some improved collect_env in detectron2: https://github.com/facebookresearch/detectron2/blob/main/detectron2/utils/collect_env.py#L55
