[1.9.1] [collect_env] collect_env does not collect actual runtime-loaded cudnn version #78489

vadimkantorov · 2022-05-30T15:21:46Z

🐛 Describe the bug

In contrast, torch.backends.cudnn.version() does report it correctly.
Originally reported by @grazder in a comment: #78475 (comment)

This occurred on an old-ish version, 1.9.1; not sure if it still happens on 1.11.0.

Versions

1.9.1

cc @csarofeen @ptrblck @xwang233


malfet · 2022-05-30T19:23:15Z

collect_env collects the version of cudnn installed on the system, rather than the one PyTorch was compiled against (similarly to CUDA used to build PyTorch: vs. CUDA runtime version:)

ptrblck · 2022-05-30T19:24:43Z

Just wanted to post the same :P
Reference can be seen here.

vadimkantorov · 2022-05-30T19:26:11Z

I guess that is what's happening. But for debugging, it makes sense to also report the version linked in / discovered by PyTorch.

A lazy version (if the PyTorch installation is workable enough to be imported) could just report torch.backends.cudnn.version(). A less demanding version could probably scan the binaries if cudnn is linked in statically, or even copy the contents of some build-time-generated versions.py distributed with PyTorch that lists the linked library versions and the compilation / linking config.

As is, it's quite confusing, since one has to manually find the linked-in versions, even though they are quite relevant.

Bonus for making the printed message clearer about what exactly collect_env is trying to discover/report (version at build time, version at linking time, dynamically loaded version, etc.)

Reporting both the env version AND the one PyTorch actually found is important for debugging linking/loading issues where PyTorch loads a version different from the one it was compiled against.
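For illustration, the "lazy version" could look roughly like the sketch below. It only relies on torch.backends.cudnn.version(), which is mentioned above; the helper name runtime_cudnn_version is hypothetical and is not part of collect_env.

```python
def runtime_cudnn_version():
    """Return the cuDNN version PyTorch reports at runtime, or None.

    Hypothetical helper for illustration; not part of collect_env.
    """
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or too broken to be imported.
        return None
    # version() returns an integer, or None when cuDNN is unavailable.
    return torch.backends.cudnn.version()


if __name__ == "__main__":
    print("cuDNN version reported by PyTorch:", runtime_cudnn_version())
```

Printing this next to the system-installed version that collect_env already reports would make a mismatch between the two visible at a glance.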

vadimkantorov · 2022-06-30T17:39:28Z

Related: #80637

vadimkantorov · 2022-07-07T14:35:17Z

So this can probably be done by reading f"/proc/{os.getpid()}/maps" (on Unix machines, and if this file exists) and filtering for the cudnn/blas/other dependency .so files that are actually loaded.
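A minimal sketch of that idea, assuming the usual Linux maps layout (address, perms, offset, dev, inode, pathname). The function names and the default keyword list are hypothetical, chosen only for illustration:

```python
import os


def filter_maps_lines(lines, keywords=("cudnn", "blas")):
    """Return sorted unique .so paths in maps lines whose file name matches a keyword."""
    paths = set()
    for line in lines:
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        name = os.path.basename(path).lower()
        if ".so" in name and any(k in name for k in keywords):
            paths.add(path)
    return sorted(paths)


def loaded_shared_libraries(keywords=("cudnn", "blas")):
    """List matching shared libraries mapped into the current process."""
    maps_path = f"/proc/{os.getpid()}/maps"
    if not os.path.exists(maps_path):
        return []  # not a Linux-style /proc filesystem
    with open(maps_path) as f:
        return filter_maps_lines(f, keywords)


if __name__ == "__main__":
    for path in loaded_shared_libraries():
        print(path)
```

This only sees libraries that are already mapped, so it would have to run after importing torch, and possibly after the first call that actually uses cudnn if the library is loaded lazily. It also cannot detect a statically linked cudnn, since no separate .so file is mapped in that case.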

vadimkantorov · 2023-01-09T20:42:42Z

Also, there is some improved collect_env in detectron2: https://github.com/facebookresearch/detectron2/blob/main/detectron2/utils/collect_env.py#L55

malfet added the module: cudnn, module: collect_env.py, and enhancement labels May 30, 2022

ejguan added the triaged label May 31, 2022

vadimkantorov changed the title ~~[1.9.1] [collect_env] collect_env does not collect correctly cudnn version~~ [1.9.1] [collect_env] collect_env does not collect runtime-bound cudnn version Jun 11, 2022

vadimkantorov changed the title ~~[1.9.1] [collect_env] collect_env does not collect runtime-bound cudnn version~~ [1.9.1] [collect_env] collect_env does not collect actual runtime-loaded cudnn version Jun 11, 2022

This was referenced Jun 30, 2022

PyTorch 1.12 cu113 wheels cudnn discoverability issue #80637

Closed

Change cudnn incompatibility message wording #80877

Closed

vadimkantorov mentioned this issue Jul 25, 2022

[feature request] Discover actually loaded shared libraries at runtime #82098

Open
