
Reduce the amount of TLS used in pytorch #2083

Closed
lxj0276 opened this issue Jul 13, 2017 · 31 comments
@lxj0276

lxj0276 commented Jul 13, 2017

When I run the code from https://github.com/bear63/sceneReco
it shows: ImportError: dlopen: cannot load any more object with static TLS

@soumith
Member

soumith commented Jul 13, 2017

this is a known issue when loading multiple libraries (torch, cv2) that use a lot of TLS. I'm not sure how to fix it.

@soumith soumith changed the title ImportError:dlopen:cannot load any more object with static TLS Reduce the amount of TLS used in pytorch Jul 13, 2017
@lxj0276
Author

lxj0276 commented Jul 13, 2017

I have fixed it. Anyway, thanks.

@lxj0276 lxj0276 closed this as completed Jul 13, 2017
@soumith
Member

soumith commented Jul 13, 2017

How did you fix it?

@lxj0276
Author

lxj0276 commented Jul 13, 2017

The order of imports. import cv2, torch, caffe works, but importing caffe before torch does not. Anyway, the conflict is between torch and caffe.
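The workaround above boils down to import order. A minimal sketch (torch, cv2, and caffe may not be installed in a given environment, so failed imports are tolerated here; only the order in which the interpreter attempts them matters):

```python
# Attempt the TLS-heavy libraries first and record the order tried.
# Missing libraries are skipped so the sketch runs anywhere.
attempted = []
for name in ("torch", "cv2", "caffe"):  # torch before caffe
    attempted.append(name)
    try:
        __import__(name)
    except Exception:
        pass  # library not installed (or broken) in this environment
print(attempted)  # -> ['torch', 'cv2', 'caffe']
```

The point is that whichever library reserves the most static TLS should be dlopen'd while slots are still free, which is why moving it to the top of the file helps.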

@LiangXu123

LiangXu123 commented Sep 30, 2017

Very odd case. I solved it by adding one line at the very beginning of whichever *.py file hits this error:

import torch, cv2

@lionel3

lionel3 commented Nov 7, 2017

Just like @cc786537662, I solved the problem by putting the line "import torch" at the very beginning of my .py file.

@jph00

jph00 commented Nov 15, 2017

There are some options proposed here that claim to fix it: https://stackoverflow.com/questions/19268293/matlab-error-cannot-open-with-static-tls . Not sure if any of these might help?

FYI I (and our students) can reliably replicate this bug by running lesson1 of the new fastai on Paperspace. It's not a high priority since we can just use AWS instead, but I figured you might be interested in a reliable replication environment.

@soumith
Member

soumith commented Nov 16, 2017

@jph00 I am building binaries for v0.3.0 right now, so I am looking to fix this once and for all for the new binaries.

I got a paperspace instance and tried to replicate this error. I ran a few notebooks that use pytorch under deeplearning2 but had no luck reproducing so far.

Which notebook is reproducing this error for you?
You said lesson1, but lesson1 doesn't even use pytorch: https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson1.ipynb

Am I looking at the wrong location?

@soumith
Member

soumith commented Nov 16, 2017

Never mind, I see that you're talking about https://github.com/fastai/fastai
I'm working on reproducing that.

@htwaijry
Contributor

@soumith I was having a similar issue; I added the compile flags -fPIC -ftls-model=global-dynamic in setup.py and recompiled pytorch, and that seems to have resolved my issue.

@soumith
Member

soumith commented Nov 17, 2017

@htwaijry from what I know the source installs don't have this problem (you don't even need the compile flags). Only the binary installs have this issue. Can you confirm that the problem doesn't exist if you recompile pytorch from source without your flags?

@htwaijry
Contributor

@soumith You are right, compiling from source without the flags also works. I just went ahead and added those flags when I ran into the problem, without first trying to compile from source without them.

@xiyaofu

xiyaofu commented Nov 21, 2017

It is kind of hilarious. When I put cv2 before pytorch I get this error, and when I swap them I get the same error as #926. What can I do?

@soumith
Member

soumith commented Nov 21, 2017

@frankenstan the workaround is to do this on your machine:

rm $(dirname $(which python))/../lib/*/site-packages/torch/lib/libgomp.so*
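For anyone who, like the next commenter, can't find the file: the one-liner above expands roughly as follows (a sketch, not a guaranteed layout; the `python3` fallback and the glob pattern are assumptions, and the glob simply matches nothing on installs without a bundled libgomp):

```shell
# Resolve the interpreter's prefix, then glob for torch's bundled
# libgomp underneath it, printing whatever is (or isn't) found.
PYBIN=$(dirname "$(command -v python || command -v python3)")
found=0
for f in "$PYBIN"/../lib/*/site-packages/torch/lib/libgomp.so*; do
    if [ -e "$f" ]; then echo "bundled: $f"; found=1; fi
done
if [ "$found" -eq 0 ]; then echo "no bundled libgomp under $PYBIN/.."; fi
```

Inspecting first, rather than deleting blindly, also makes it obvious when a conda layout puts site-packages somewhere the glob doesn't cover.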

@xiyaofu

xiyaofu commented Nov 21, 2017

@soumith thanks for your reply. Sadly I didn't find it in that directory.
I am on anaconda2 and ubuntu 14.04. Is there anything else I can do?

@liruilong940607

Building pytorch from source avoids this issue.

@chenxi-ge

I solved this issue by importing torch separately, early in my program. (It could be a cv2 issue, since fastai/imports.py later imports cv2 for me.)
However, I have also seen people say we should import cv2 first.

Nevertheless, worth a try.

@PhotonChow

Use this command to update your pytorch to 0.3.0; it should solve the problem:
conda install pytorch torchvision -c pytorch

@teichert
Contributor

Fwiw, note that the temporary workaround of putting import torch first in source files that use torch may be insufficient if functions from those files are imported elsewhere.

For example, if a.py imports torch and b.py doesn't import torch but imports functions defined in a.py, then other imports in b.py may happen before the torch import.

In my case, I saw that this was happening by noticing that some source files in the stack trace for the "ImportError:dlopen:cannot load any more object with static TLS" didn't even import torch at all. Adding import torch on the first line of source files lower on the stack trace "fixed" the problem for me.

(btw, I had this error and work-around even after the update to 0.3.0.)
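The transitive-import pitfall described above can be simulated without the real libraries (the module names and the fake loader below are hypothetical, for illustration only):

```python
# Simulate b.py, which never imports torch itself but pulls in a.py.
# The effective load order is b.py's, so cv2 lands before torch even
# though a.py dutifully puts `import torch` on its first line.
import sys
import types

load_order = []

def fake_import(name):
    """Stand-in for `import name` that records native-library load order."""
    if name not in sys.modules:
        sys.modules[name] = types.ModuleType(name)
        load_order.append(name)

# Top of b.py:
fake_import("cv2")      # b.py's own import runs first...
# from a import helper  -> executes a.py, which starts with:
fake_import("torch")    # ...so torch still loads second, after cv2
print(load_order)       # -> ['cv2', 'torch']
```

This is why files lower in the stack trace, which seem unrelated to torch, can be the ones that need the `import torch` line added.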

@PkuRainBow

@cc786537662 Thanks for your solution!

@ayushidalmia

ayushidalmia commented Feb 27, 2018

Do we have any resolution for this yet?

I am using the pip install method (I have to use /usr/bin/python) to install torch.

I am trying to serve a pytorch application from a docker container using gunicorn. The docker image has cv2 installed, so when I import torch I run into this issue.

Now, when I change the order of imports, gunicorn goes into an infinite reboot loop. Has anyone experienced this?

@zhangpiu

zhangpiu commented Mar 2, 2018

Similar issue when I import torch in Julia using PyCall.
(screenshot of the error)

skyreflectedinmirrors pushed a commit to skyreflectedinmirrors/spyJac that referenced this issue Mar 19, 2018
@jianchao-li

jianchao-li commented Sep 4, 2018

I just came across this problem when I did

import tensorflow as tf
import mxnet as mx
import torch
import torchvision

Actually it looked a bit scary, since a segmentation fault occurred after executing import torchvision. Anyway, the error disappeared when I changed the order a little bit.

import torch
import torchvision
import tensorflow as tf
import mxnet as mx

matthewfeickert added a commit to scikit-hep/pyhf that referenced this issue Dec 31, 2018
In requiring the latest version of PyTorch (v1.0), loading it
after other imports in tests/conftest.py causes an error
during setup of the pytest fixtures:

ImportError while loading conftest
'/home/travis/build/diana-hep/pyhf/tests/conftest.py'.
tests/conftest.py:45: in <module>
    (pyhf.tensor.pytorch_backend(), None),
pyhf/tensor/__init__.py:24: in __getattr__
    e,
E   pyhf.exceptions.ImportBackendError: ('There was a problem importing
PyTorch. The pytorch backend cannot be used.', ImportError('dlopen:
cannot load any more object with static TLS',))

TLS appears to stand for thread local storage

* https://stackoverflow.com/questions/19268293/matlab-error-cannot-open-with-static-tls

and seems to be a recurring issue with PyTorch. The advice for how to
deal with it from PyTorch Issues 2083 and 2575 seems to be to simply
change the import order so that

import torch

is always the first import in the files that are causing the issue. The
rationale for this fix is not clear to me, other than that it makes
sure that libraries that (presumably) require a large amount of static
TLS are dealt with first.

* pytorch/pytorch#2083
* pytorch/pytorch#2575
houseroad added a commit to houseroad/pytorch that referenced this issue Jun 20, 2019
…feb6b4

Summary:
Previous import was dd599b05f424eb161a31f3e059566a33310dbe5e

Included changes:
- **[355a4954](onnx/onnx@355a4954)**: Update codeowners to have community folder changes assigned to steering committee (pytorch#2104) <Prasanth Pulavarthi>
- **[ceaa5da7](onnx/onnx@ceaa5da7)**: Fix Resize/Upsample Shape inference function (pytorch#2085) <Raymond Yang>
- **[4de8dc0d](onnx/onnx@4de8dc0d)**: Clarify shape inference requirements for new operators (pytorch#2088) <Hariharan Seshadri>
- **[52aa1fad](onnx/onnx@52aa1fad)**: Fix NN defs file (pytorch#2083) <Hariharan Seshadri>

Differential Revision: D15924221

fbshipit-source-id: 7b3cb6d8398e1f3dcd95aaf6293dd9cfc3a8f5b6
facebook-github-bot pushed a commit that referenced this issue Jun 20, 2019
…feb6b4 (#22030)

iotamudelta pushed a commit to ROCm/pytorch that referenced this issue Jun 21, 2019
…feb6b4 (pytorch#22030)

@ezyang
Contributor

ezyang commented Jul 26, 2019

With a more recent version of scikit-learn, our gcc 5.4 build is affected by this:

Jul 26 05:19:59 Running test_cuda ... [2019-07-26 05:19:59.821119]
Jul 26 05:20:01 Traceback (most recent call last):
Jul 26 05:20:01   File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 44, in <module>
Jul 26 05:20:01     from ._check_build import check_build  # noqa
Jul 26 05:20:01 ImportError: dlopen: cannot load any more object with static TLS
Jul 26 05:20:01 
Jul 26 05:20:01 During handling of the above exception, another exception occurred:
Jul 26 05:20:01 
Jul 26 05:20:01 Traceback (most recent call last):
Jul 26 05:20:01   File "test_cuda.py", line 20, in <module>
Jul 26 05:20:01     from test_torch import _TestTorchMixin
Jul 26 05:20:01   File "/var/lib/jenkins/workspace/test/test_torch.py", line 46, in <module>
Jul 26 05:20:01     import librosa
Jul 26 05:20:01   File "/opt/conda/lib/python3.6/site-packages/librosa/__init__.py", line 15, in <module>
Jul 26 05:20:01     from . import decompose
Jul 26 05:20:01   File "/opt/conda/lib/python3.6/site-packages/librosa/decompose.py", line 19, in <module>
Jul 26 05:20:01     import sklearn.decomposition
Jul 26 05:20:01   File "/opt/conda/lib/python3.6/site-packages/sklearn/__init__.py", line 75, in <module>
Jul 26 05:20:01     from . import __check_build
Jul 26 05:20:01   File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 46, in <module>
Jul 26 05:20:01     raise_build_error(e)
Jul 26 05:20:01   File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 41, in raise_build_error
Jul 26 05:20:01     %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
Jul 26 05:20:01 ImportError: dlopen: cannot load any more object with static TLS
Jul 26 05:20:01 ___________________________________________________________________________
Jul 26 05:20:01 Contents of /opt/conda/lib/python3.6/site-packages/sklearn/__check_build:
Jul 26 05:20:01 __pycache__               setup.py                  _check_build.cpython-36m-x86_64-linux-gnu.so
Jul 26 05:20:01 __init__.py
Jul 26 05:20:01 ___________________________________________________________________________
Jul 26 05:20:01 It seems that scikit-learn has not been built correctly.
Jul 26 05:20:01 
Jul 26 05:20:01 If you have installed scikit-learn from source, please do not forget
Jul 26 05:20:01 to build the package before using it: run `python setup.py install` or
Jul 26 05:20:01 `make` in the source directory.
Jul 26 05:20:01 
Jul 26 05:20:01 If you have used an installer, please check that it is suited for your
Jul 26 05:20:01 Python version, your operating system and your platform.
Jul 26 05:20:01 Traceback (most recent call last):
Jul 26 05:20:01   File "test/run_test.py", line 408, in <module>
Jul 26 05:20:01     main()
Jul 26 05:20:01   File "test/run_test.py", line 400, in main
Jul 26 05:20:01     raise RuntimeError(message)
Jul 26 05:20:01 RuntimeError: test_cuda failed!

https://circleci.com/gh/pytorch/pytorch/2292470?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

@Arctanxy

Arctanxy commented Aug 9, 2019

This error appeared when I upgraded torch from 1.1 to 1.2.

@ezyang
Contributor

ezyang commented Aug 16, 2019

To consolidate, I'm closing this as a dupe of #2575

@ezyang ezyang closed this as completed Aug 16, 2019
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 24, 2020
Merging pytorch (6aa1f4e) caused
> OSError: dlopen: cannot load any more object with static TLS

on Debian 8 and Ubuntu Trusty.

This is a known issue with pytorch, see e.g.
pytorch/pytorch#2083
scikit-learn/scikit-learn#14485
pytorch/pytorch#2575
https://stackoverflow.com/questions/14892101/cannot-load-any-more-object-with-static-tls

Apparently there might be a workaround involving reordering
import statements, but the root issue is that pytorch doesn't get testing
on "obscure" platforms. Including pytorch means we can't reliably support
them.
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 24, 2020
@yiyuzhuang

This error occurred after I upgraded GCC with some incorrect settings.
I fixed it by compiling GCC's isl dependency:

wget https://gcc.gnu.org/pub/gcc/infrastructure/isl-0.18.tar.bz2
tar xvf isl-0.18.tar.bz2
cd isl-0.18
./configure --prefix=/usr/local/isl --with-gmp-prefix=/usr/local/gmp/
make && make install

Btw, I don't know the reason, and hope for further discussion.

jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this issue Oct 18, 2022