cusolver error: 7 #21

Closed
LyWangPX opened this issue Nov 22, 2020 · 3 comments

@LyWangPX

LyWangPX commented Nov 22, 2020

What I run

python train.py ../train/ --img_size 128 --batch 8

SPEC

2 x RTX 2080 Ti
The error is the same whether I use one card or two.

CODE

Latest.

ENVIRONMENT

# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
async-generator           1.10                     pypi_0    pypi
attrs                     20.3.0                   pypi_0    pypi
backcall                  0.2.0                      py_0  
bash-kernel               0.7.2                    pypi_0    pypi
beautifulsoup4            4.9.3              pyhb0f4dca_0  
blas                      1.0                         mkl  
bleach                    3.2.1                    pypi_0    pypi
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2020.10.14                    0  
certifi                   2020.6.20          pyhd3eb1b0_3  
cffi                      1.14.0           py38he30daa8_1  
chardet                   3.0.4                 py38_1003  
conda                     4.9.2            py38h06a4308_0  
conda-build               3.20.5                   py38_1  
conda-package-handling    1.6.1            py38h7b6447c_0  
cryptography              2.9.2            py38h1ba5d50_0  
cudatoolkit               11.0.221             h6bb024c_0  
dataclasses               0.6                      pypi_0    pypi
decorator                 4.4.2                      py_0  
defusedxml                0.6.0                    pypi_0    pypi
dnspython                 2.0.0                    pypi_0    pypi
entrypoints               0.3                      pypi_0    pypi
filelock                  3.0.12                     py_0  
freetype                  2.10.4               h5ab3b9f_0  
future                    0.18.2                   pypi_0    pypi
gdown                     3.12.2                   pypi_0    pypi
glob2                     0.7                        py_0  
icu                       58.2                 he6710b0_3  
idna                      2.9                        py_1  
intel-openmp              2020.2                      254  
ipykernel                 5.3.4                    pypi_0    pypi
ipython                   7.19.0                   pypi_0    pypi
ipython_genutils          0.2.0                    py38_0  
ipywidgets                7.5.1                    pypi_0    pypi
jedi                      0.17.2                   py38_0  
jinja2                    2.11.2                     py_0  
jpeg                      9b                   h024ee3a_2  
json5                     0.9.5                    pypi_0    pypi
jsonschema                3.2.0                    pypi_0    pypi
jupyter                   1.0.0                    pypi_0    pypi
jupyter-client            6.1.7                    pypi_0    pypi
jupyter-console           6.2.0                    pypi_0    pypi
jupyter-core              4.7.0                    pypi_0    pypi
jupyterlab                2.2.9                    pypi_0    pypi
jupyterlab-pygments       0.1.2                    pypi_0    pypi
jupyterlab-server         1.2.0                    pypi_0    pypi
lcms2                     2.11                 h396b838_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libarchive                3.4.2                h62408e4_0  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.3                  he6710b0_1  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
liblief                   0.10.1               he6710b0_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_1  
libuv                     1.40.0               h7b6447c_0  
libxml2                   2.9.10               hb55368b_3  
lz4-c                     1.9.2                heb0550a_3  
markupsafe                1.1.1            py38h7b6447c_0  
mistune                   0.8.4                    pypi_0    pypi
mkl                       2020.2                      256  
mkl-service               2.3.0            py38he904b0f_0  
mkl_fft                   1.2.0            py38h23d657b_0  
mkl_random                1.1.1            py38h0573a6f_0  
nbclient                  0.5.1                    pypi_0    pypi
nbconvert                 6.0.7                    pypi_0    pypi
nbformat                  5.0.8                    pypi_0    pypi
ncurses                   6.2                  he6710b0_1  
nest-asyncio              1.4.3                    pypi_0    pypi
ninja                     1.10.1           py38hfd86e86_0  
notebook                  5.7.5                    pypi_0    pypi
numpy                     1.19.2           py38h54aff64_0  
numpy-base                1.19.2           py38hfa32c7d_0  
olefile                   0.46                       py_0  
openssl                   1.1.1h               h7b6447c_0  
packaging                 20.4                     pypi_0    pypi
pandocfilters             1.4.3                    pypi_0    pypi
parso                     0.7.0                      py_0  
patchelf                  0.12                 he6710b0_0  
pexpect                   4.8.0                    py38_0  
pickleshare               0.7.5                 py38_1000  
pillow                    8.0.0            py38h9a89aac_0  
pip                       20.0.2                   py38_3  
pkginfo                   1.6.0                    py38_0  
prometheus-client         0.9.0                    pypi_0    pypi
prompt-toolkit            3.0.8                      py_0  
psutil                    5.7.2            py38h7b6447c_0  
ptyprocess                0.6.0                    py38_0  
py-lief                   0.10.1           py38h403a769_0  
pycosat                   0.6.3            py38h7b6447c_1  
pycparser                 2.20                       py_0  
pygments                  2.7.1                      py_0  
pyopenssl                 19.1.0                   py38_0  
pyparsing                 2.4.7                    pypi_0    pypi
pyrsistent                0.17.3                   pypi_0    pypi
pysocks                   1.7.1                    py38_0  
python                    3.8.3                hcff3b4d_0  
python-dateutil           2.8.1                    pypi_0    pypi
python-etcd               0.4.5                    pypi_0    pypi
python-libarchive-c       2.9                        py_0  
pytorch                   1.7.0           py3.8_cuda11.0.221_cudnn8.0.3_0    pytorch
pytz                      2020.1                     py_0  
pyyaml                    5.3.1            py38h7b6447c_0  
pyzmq                     20.0.0                   pypi_0    pypi
qtconsole                 4.7.7                    pypi_0    pypi
qtpy                      1.9.0                    pypi_0    pypi
readline                  8.0                  h7b6447c_0  
requests                  2.23.0                   py38_0  
ripgrep                   12.1.1                        0  
ruamel_yaml               0.15.87          py38h7b6447c_0  
scipy                     1.5.2            py38h0b6359f_0  
send2trash                1.5.0                    pypi_0    pypi
setuptools                46.4.0                   py38_0  
six                       1.14.0                   py38_0  
soupsieve                 2.0.1                      py_0  
sqlite                    3.31.1               h62c20be_1  
terminado                 0.9.1                    pypi_0    pypi
testpath                  0.4.4                    pypi_0    pypi
tk                        8.6.8                hbc83047_0  
torchelastic              0.2.1                    pypi_0    pypi
torchvision               0.8.0                py38_cu110    pytorch
tornado                   5.1.1                    pypi_0    pypi
tqdm                      4.46.0                     py_0  
traitlets                 5.0.5                      py_0  
typing_extensions         3.7.4.3                    py_0  
urllib3                   1.25.8                   py38_0  
wcwidth                   0.2.5                      py_0  
webencodings              0.5.1                    pypi_0    pypi
wheel                     0.34.2                   py38_0  
widgetsnbextension        3.5.1                    pypi_0    pypi
xz                        5.2.5                h7b6447c_0  
yaml                      0.1.7                had09818_2  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.4.5                h9ceee32_0 

ERROR MESSAGE:

Namespace(affine=False, batch=8, img_size=128, iter=200000, lr=0.0001, n_bits=5, n_block=4, n_flow=32, n_sample=20, no_lu=False, path='../train/', temp=0.7)
/workspace/glow-pytorch/model.py:102: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/utils/tensor_numpy.cpp:141.)
  w_s = torch.from_numpy(w_s)
Loss: 2.15042; logP: -2.13823; logdet: 4.98781; lr: 0.0001000:   0%| | 1/200000 
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    train(args, model, optimizer)
  File "train.py", line 148, in train
    model_single.reverse(z_sample).cpu().data,
  File "/workspace/glow-pytorch/model.py", line 367, in reverse
    input = block.reverse(z_list[-1], z_list[-1], reconstruct=reconstruct)
  File "/workspace/glow-pytorch/model.py", line 322, in reverse
    input = flow.reverse(input)
  File "/workspace/glow-pytorch/model.py", line 239, in reverse
    input = self.invconv.reverse(input)
  File "/workspace/glow-pytorch/model.py", line 136, in reverse
    return F.conv2d(output, weight.squeeze().inverse().unsqueeze(2).unsqueeze(3))
RuntimeError: cusolver error: 7, when calling `cusolverDnCreate(handle)`

EXTRA:

The error occurs during the reverse calculation, which runs when i % 100 == 0. I changed the condition to i == 1 to reproduce the bug faster.

Also, changing w_s = torch.from_numpy(w_s) to w_s = torch.from_numpy(w_s.copy()) silences the warning above, but the error still occurs.
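A minimal sketch of why the `.copy()` removes the warning. It assumes `w_s` in model.py is produced by `np.diag` on a 2-D array, which is not shown in this thread; the array `w_u` below is a made-up stand-in. In NumPy 1.9 and later, extracting a diagonal returns a read-only view, and `torch.from_numpy` warns about non-writeable arrays.

```python
import numpy as np

# Stand-in for an upper-triangular factor (assumption, values are arbitrary).
w_u = np.triu(np.arange(1.0, 10.0).reshape(3, 3))

# Extracting the diagonal of a 2-D array gives a read-only view.
w_s_view = np.diag(w_u)

# Copying produces an independent, writable array that can be passed
# to torch.from_numpy without triggering the warning.
w_s_copy = np.diag(w_u).copy()

print("view writeable:", w_s_view.flags.writeable)
print("copy writeable:", w_s_copy.flags.writeable)
```

This only addresses the UserWarning. It is unrelated to the cusolver error itself.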

@LyWangPX
Author

FIX FOUND:

Even if there is enough GPU memory for training, the reverse pass requires ADDITIONAL GPU memory. The cusolver error appears when that extra memory is not available.

I do not know if there is a workaround in the code to improve this behavior. For now, we have to check memory usage by eye.
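As an alternative to watching memory by eye, here is a rough sketch that reads per-GPU usage from `nvidia-smi`. The function names `parse_gpu_memory` and `query_gpu_memory` are hypothetical helpers, not part of this repository, and the sketch assumes `nvidia-smi` reports plain integers in MiB for every field.

```python
import shutil
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows into a list of dicts (MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({"index": index, "used_mib": used, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Return per-GPU memory usage, or an empty list if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return []
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` right before the reverse step and logging `free_mib` gives a quick indication of how much headroom is left on each card.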

@rosinality
Owner

rosinality commented Nov 22, 2020

You can lower the --n_sample argument to reduce the number of samples, or run the reverse steps on the CPU.
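A related variation, not suggested in this thread, is to keep the total number of samples but run the reverse step a few samples at a time, so that each call needs less memory. The sketch below is framework-agnostic. `reverse_in_chunks` is a hypothetical helper, and it assumes `z_list` holds one batch-first sequence per block, as the `z_list[-1]` indexing in the traceback suggests.

```python
def reverse_in_chunks(reverse_fn, z_list, chunk_size):
    """Call reverse_fn on slices of at most chunk_size samples.

    z_list holds one batch-first sequence per block, all with the same
    number of samples. Returns the list of per-chunk outputs in order.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    n_sample = len(z_list[0])
    outputs = []
    for start in range(0, n_sample, chunk_size):
        # Take the same slice of samples from every block.
        chunk = [z[start:start + chunk_size] for z in z_list]
        outputs.append(reverse_fn(chunk))
    return outputs
```

With PyTorch tensors the slicing works the same way, so the outputs of `reverse_in_chunks(model_single.reverse, z_sample, 5)` could be moved to the CPU one by one and joined with `torch.cat`, ideally inside `torch.no_grad()`. Whether this is enough to avoid the error depends on how much memory is free after training.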

@LyWangPX
Author

Thanks. I will try that :)
