Description
I'm trying to run the profiler. On multiple machines now, when I run TensorBoard after installing the profiler from PyPI, the machine crashes because an excessive number of Python processes spawn. If I run TensorBoard before installing the profiler it runs fine, so I'm fairly sure the problem is with my installation of the profiler.
Machine 1: Docker
Base Docker image: nvidia/cuda:11.6.2-devel-ubuntu20.04 (CUDA 11.6.2, Ubuntu 20.04)
Python 3.8
PyTorch 1.12
This Docker image handles training and inference with PyTorch and CUDA fine, so I'm confident in the image and the underlying system.
Machine 2: Desktop PC
Arch Linux
Anaconda environment created just to test this
CUDA 11.7
Python 3.10
PyTorch 1.12
On both of these machines I can run TensorBoard fine, but if I run pip install torch_tb_profiler, the next time I load TensorBoard it crashes the system. Running 'top' in another terminal shows a flood of "python" tasks that use all of the available CPU resources.
If I press Ctrl-C on the tensorboard command, the machine eventually kills all of these processes and recovers, but it's unusable.
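To put a number on the "flood" seen in 'top', a small script along these lines could count the running Python processes before and after TensorBoard starts. This is only a sketch: the helper names `count_python_processes` and `snapshot_process_names` are made up for illustration, and it assumes a Linux system where `ps -eo comm=` prints one command name per line.

```python
import subprocess


def count_python_processes(comm_lines):
    """Count command names that start with 'python' (case-insensitive)."""
    return sum(
        1 for line in comm_lines
        if line.strip().lower().startswith("python")
    )


def snapshot_process_names():
    """Return the command name of every running process.

    Assumes a Linux 'ps' that supports '-eo comm=' (no header line).
    """
    result = subprocess.run(
        ["ps", "-eo", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def report():
    """Print the current number of python processes."""
    names = snapshot_process_names()
    print("python processes:", count_python_processes(names))
```

Calling `report()` once before launching TensorBoard and again a few seconds after would show how quickly the count grows.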
I can't find any other reports of this on the web.
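Since the crash only appears after the pip install, it may help to confirm which plugins TensorBoard will try to load in each environment. TensorBoard discovers dynamically installed plugins through the `tensorboard_plugins` entry-point group, so the sketch below lists whatever is registered there. The function name `list_tensorboard_plugins` is hypothetical, and the exact entry name that torch_tb_profiler registers is not assumed here; the script just prints what it finds.

```python
from importlib import metadata


def list_tensorboard_plugins(group="tensorboard_plugins"):
    """Return sorted (name, target) pairs for entry points in the given group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable entry-point collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


def print_plugins():
    """Print each registered plugin, or a note if none are found."""
    plugins = list_tensorboard_plugins()
    if not plugins:
        print("no entry points registered under tensorboard_plugins")
    for name, target in plugins:
        print(name, "->", target)
```

Running `print_plugins()` before and after installing the profiler, on both machines, would show whether the install added exactly one plugin entry or something unexpected.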
