Unable to use GPU, regardless of TrainerConfig(gpus=) setting #16

Closed
Sunishchal opened this issue May 19, 2021 · 10 comments

@Sunishchal

I am unable to use the GPU whether I set 0 or 1 for the gpus parameter. I think the issue may lie in an internal call to the distributed.py script; as the warning states, the --gpus flag does not seem to be passed through.

C:\Users\sunis\miniconda3\envs\numerai\lib\site-packages\pytorch_lightning\utilities\distributed.py:45: UserWarning: GPU available but not used. Set the --gpus flag when calling the script.
  warnings.warn(*args, **kwargs)
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
@Sunishchal
Author

Sunishchal commented May 19, 2021

Additional details:
OS = Windows 10 Home 10.0.19041 Build 19041
GPU 0 = Intel(R) UHD Graphics
GPU 1 = NVIDIA GeForce RTX 2060
Local environment = miniconda virtual environment
Name Version Build Channel
absl-py 0.12.0 pypi_0 pypi
anyio 2.2.0 py38haa95532_2
argon2-cffi 20.1.0 py38h2bbff1b_1
async_generator 1.10 pyhd3eb1b0_0
attrs 21.2.0 pyhd3eb1b0_0
babel 2.9.1 pyhd3eb1b0_0
backcall 0.2.0 pyhd3eb1b0_0
blas 1.0 mkl
bleach 3.3.0 pyhd3eb1b0_0
brotlipy 0.7.0 py38h2bbff1b_1003
ca-certificates 2020.12.5 h5b45459_0 conda-forge
cachetools 4.2.2 pypi_0 pypi
category-encoders 2.2.2 pypi_0 pypi
certifi 2020.12.5 py38haa244fe_1 conda-forge
cffi 1.14.5 py38hcd4344a_0
chardet 4.0.0 py38haa95532_1003
click 8.0.0 pypi_0 pypi
colorama 0.4.4 pyhd3eb1b0_0
configparser 5.0.2 pypi_0 pypi
cryptography 3.4.7 py38h71e12ea_0
cudatoolkit 10.2.89 h74a9793_1
cycler 0.10.0 pypi_0 pypi
decorator 5.0.9 pypi_0 pypi
defusedxml 0.7.1 pyhd3eb1b0_0
docker-pycreds 0.4.0 pypi_0 pypi
entrypoints 0.3 pypi_0 pypi
freetype 2.10.4 hd328e21_0
fsspec 2021.5.0 pypi_0 pypi
future 0.18.2 pypi_0 pypi
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.17 pypi_0 pypi
google-auth 1.30.0 pypi_0 pypi
google-auth-oauthlib 0.4.4 pypi_0 pypi
grpcio 1.37.1 pypi_0 pypi
idna 2.10 pyhd3eb1b0_0
importlib-metadata 3.10.0 py38haa95532_0
importlib_metadata 3.10.0 hd3eb1b0_0
intel-openmp 2021.2.0 haa95532_616
ipykernel 5.5.5 pypi_0 pypi
ipython 7.23.1 pypi_0 pypi
ipython_genutils 0.2.0 pyhd3eb1b0_1
ipywidgets 7.6.3 pypi_0 pypi
jedi 0.18.0 pypi_0 pypi
jinja2 3.0.0 pypi_0 pypi
joblib 1.0.1 pypi_0 pypi
jpeg 9b hb83a4c4_2
json5 0.9.5 py_0
jsonschema 3.2.0 py_2
jupyter-packaging 0.7.12 pyhd3eb1b0_0
jupyter_client 6.1.12 pyhd3eb1b0_0
jupyter_core 4.7.1 py38haa95532_0
jupyter_server 1.4.1 py38haa95532_0
jupyterlab 3.0.14 pyhd3eb1b0_1
jupyterlab-widgets 1.0.0 pypi_0 pypi
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 2.4.0 pyhd3eb1b0_0
kiwisolver 1.3.1 pypi_0 pypi
libpng 1.6.37 h2a8f88b_0
libsodium 1.0.18 h62dcd97_0
libtiff 4.2.0 hd0e1b90_0
libuv 1.40.0 he774522_0
lz4-c 1.9.3 h2bbff1b_0
m2w64-gcc-libgfortran 5.3.0 6
m2w64-gcc-libs 5.3.0 7
m2w64-gcc-libs-core 5.3.0 7
m2w64-gmp 6.1.0 2
m2w64-libwinpthread-git 5.0.0.4634.697f757 2
markdown 3.3.4 pypi_0 pypi
markupsafe 2.0.0 pypi_0 pypi
matplotlib 3.4.2 pypi_0 pypi
matplotlib-inline 0.1.2 pypi_0 pypi
mistune 0.8.4 py38he774522_1000
mkl 2021.2.0 haa95532_296
mkl-service 2.3.0 py38h2bbff1b_1
mkl_fft 1.3.0 py38h277e83a_2
mkl_random 1.2.1 py38hf11a4ad_2
msys2-conda-epoch 20160418 1
nb_conda_kernels 2.3.1 py38haa244fe_0 conda-forge
nbclassic 0.2.6 pyhd3eb1b0_0
nbclient 0.5.3 pyhd3eb1b0_0
nbconvert 6.0.7 py38_0
nbformat 5.1.3 pyhd3eb1b0_0
nest-asyncio 1.5.1 pyhd3eb1b0_0
ninja 1.10.2 h6d14046_1
notebook 6.3.0 py38haa95532_0
numerapi 2.5.1 pypi_0 pypi
numpy 1.20.1 py38h34a8a5c_0
numpy-base 1.20.1 py38haf7ebc8_0
oauthlib 3.1.0 pypi_0 pypi
olefile 0.46 py_0
omegaconf 2.0.5 pypi_0 pypi
openssl 1.1.1k h8ffe710_0 conda-forge
packaging 20.9 pyhd3eb1b0_0
pandas 1.1.5 pypi_0 pypi
pandoc 2.12 haa95532_0
pandocfilters 1.4.3 py38haa95532_1
parso 0.8.2 pyhd3eb1b0_0
patsy 0.5.1 pypi_0 pypi
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 8.2.0 py38h4fa10fc_0
pip 21.0.1 py38haa95532_0
plotly 4.14.3 pypi_0 pypi
prometheus_client 0.10.1 pyhd3eb1b0_0
promise 2.3 pypi_0 pypi
prompt-toolkit 3.0.18 pypi_0 pypi
protobuf 3.17.0 pypi_0 pypi
psutil 5.8.0 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycparser 2.20 py_2
pygments 2.9.0 pypi_0 pypi
pyopenssl 20.0.1 pyhd3eb1b0_1
pyparsing 2.4.7 pyhd3eb1b0_0
pyrsistent 0.17.3 py38he774522_0
pysocks 1.7.1 py38haa95532_0
python 3.8.8 hdbf39b2_5
python-dateutil 2.8.1 pyhd3eb1b0_0
python_abi 3.8 1_cp38 conda-forge
pytorch 1.8.1 py3.8_cuda10.2_cudnn7_0 pytorch
pytorch-lightning 1.0.8 pypi_0 pypi
pytorch-tabnet 3.0.0 pypi_0 pypi
pytorch-tabular 0.5.0 pypi_0 pypi
pytz 2021.1 pyhd3eb1b0_0
pywin32 300 pypi_0 pypi
pywinpty 0.5.7 py38_0
pyyaml 5.4.1 pypi_0 pypi
pyzmq 22.0.3 pypi_0 pypi
requests 2.25.1 pyhd3eb1b0_0
requests-oauthlib 1.3.0 pypi_0 pypi
retrying 1.3.3 pypi_0 pypi
rsa 4.7.2 pypi_0 pypi
scikit-learn 0.23.2 pypi_0 pypi
scipy 1.6.3 pypi_0 pypi
send2trash 1.5.0 pyhd3eb1b0_1
sentry-sdk 1.1.0 pypi_0 pypi
setuptools 52.0.0 py38haa95532_0
shortuuid 1.0.1 pypi_0 pypi
six 1.15.0 py38haa95532_0
smmap 4.0.0 pypi_0 pypi
sniffio 1.2.0 py38haa95532_1
sqlite 3.35.4 h2bbff1b_0
statsmodels 0.12.2 pypi_0 pypi
subprocess32 3.5.4 pypi_0 pypi
tensorboard 2.5.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.0 pypi_0 pypi
terminado 0.9.5 pypi_0 pypi
testpath 0.4.4 pyhd3eb1b0_0
threadpoolctl 2.1.0 pypi_0 pypi
tk 8.6.10 he774522_0
torchaudio 0.8.1 py38 pytorch
torchvision 0.9.1 py38_cu102 pytorch
tornado 6.1 py38h2bbff1b_0
tqdm 4.60.0 pypi_0 pypi
traitlets 5.0.5 pyhd3eb1b0_0
typing_extensions 3.7.4.3 pyha847dfd_0
urllib3 1.26.4 pyhd3eb1b0_0
vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
wandb 0.10.11 pypi_0 pypi
watchdog 2.1.1 pypi_0 pypi
wcwidth 0.2.5 py_0
webencodings 0.5.1 pypi_0 pypi
werkzeug 2.0.0 pypi_0 pypi
wheel 0.36.2 pyhd3eb1b0_0
widgetsnbextension 3.5.1 pypi_0 pypi
win_inet_pton 1.1.0 py38haa95532_0
wincertstore 0.2 py38_0
winpty 0.4.3 4
xz 5.2.5 h62dcd97_0
zeromq 4.3.3 ha925a31_3
zipp 3.4.1 pyhd3eb1b0_0
zlib 1.2.11 h62dcd97_4
zstd 1.4.5 h04227a9_0

Model configs =

# imports assumed from pytorch_tabular's documented API
from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import TabNetModelConfig

data_config = DataConfig(
    target=['target'],  # target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    continuous_cols=feature_cols,
    categorical_cols=[],
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
    gpus=1,  # index of the GPU to use. 0 means CPU
)
optimizer_config = OptimizerConfig()

model_config = TabNetModelConfig(
    task="regression",
    learning_rate=1e-3,
    n_d=4,
    n_a=4,
    n_steps=3,
    virtual_batch_size=1024,
    mask_type='sparsemax',
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
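
For what it's worth, a quick way to double-check whether the model actually landed on the GPU after training; this is only a sketch, and it assumes tabular_model.fit(...) has already been called and that the fitted network is exposed as tabular_model.model, which may vary between pytorch_tabular versions:

import torch

# Hypothetical post-fit sanity checks; tabular_model.model is an assumption about
# pytorch_tabular internals and may differ between versions.
print(torch.cuda.is_available())                      # True if CUDA is usable at all
print(next(tabular_model.model.parameters()).device)  # e.g. "cuda:0" when the GPU was used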

@manujosephv
Owner

Can you run the code below and check whether your PyTorch installation can use the GPU?

import torch

torch.cuda.is_available()
>>> True

torch.cuda.current_device()
>>> 0

torch.cuda.device(0)
>>> <torch.cuda.device at 0x7efce0b03be0>

torch.cuda.device_count()
>>> 1

torch.cuda.get_device_name(0)
>>> 'GeForce GTX 950M'

@Sunishchal
Author

[screenshot: output of the torch.cuda checks, showing the GPU is visible to PyTorch]

@manujosephv
Owner

Okay, so PyTorch is seeing the GPU. Can you try giving [0] instead of 0 for the gpus parameter?
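
For reference, a minimal sketch of that suggestion applied to the TrainerConfig posted above:

trainer_config = TrainerConfig(
    auto_lr_find=True,
    batch_size=1024,
    max_epochs=100,
    gpus=[0],  # a list of GPU indices instead of the int 1, as suggested above
)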

@rafaljanwojcik

rafaljanwojcik commented Jun 2, 2021

Hello, I have a similar problem (I ran all the previous steps, and PyTorch definitely sees the GPU). I also tried giving [0] instead of 0, but the GPU remains unused. What is strange is that in the first messages the logger states that the GPU is unused, and then it finally says it is used:
[screenshot: trainer log reporting the GPU as unused in the first messages, then as used]

but actually, when I check the workload on my GPU, it isn't used:
[screenshot: GPU monitor showing low utilization, around 23% memory consumption]

and when I compare the speed of computation for the TabNet model with the gpus argument set to [0] vs None, it is exactly the same.

@manujosephv
Owner

@rafaljanwojcik
From the printed message (the second one; I get the first one, which says we are not using GPUs, on my machine as well) and the memory consumption (23%), I'd say your GPU is being used, although not fully. num_workers is set to zero by default. If you are running Linux, you can try increasing that to improve GPU utilization, as in the sketch below.
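
A minimal sketch of that suggestion; it assumes num_workers is exposed on DataConfig, as in recent pytorch_tabular releases:

data_config = DataConfig(
    target=['target'],
    continuous_cols=feature_cols,
    categorical_cols=[],
    num_workers=4,  # DataLoader worker processes (assumed parameter); keep at 0 on Windows, raise on Linux
)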

If both [0] and None end up using the GPU, I think something fishy is going on. Which version of PyTorch Lightning are you running?

I think I need to take another look at the GPU setting. PyTorch Lightning has had some changes and now has auto_select_gpus. Since this GPU config is creating a lot of confusion, I'll try to simplify it in the next release.

@rafaljanwojcik

@manujosephv thanks for responding! You're right, it is actually using the GPU, just not fully. Sorry for the fuss.

@manujosephv
Owner

@Sunishchal @rafaljanwojcik This should be fixed now in the develop branch. I have changed the way the gpus parameter is used.

None means CPU, -1 means all GPUs, int means number of GPUs to use.

If you turn off auto_select_gpus in TrainerConfig, you can even specify the indices of the GPUs to use as a list in gpus; see the sketch below.
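
A minimal sketch of the new behaviour described above, assuming the develop-branch TrainerConfig keeps these field names:

TrainerConfig(gpus=None)                         # CPU only
TrainerConfig(gpus=-1)                           # use all available GPUs
TrainerConfig(gpus=1)                            # use one GPU (auto-selected)
TrainerConfig(gpus=[1], auto_select_gpus=False)  # pin training to GPU index 1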

I'm not able to publish to PyPI because my CI/CD pipeline is currently on travis-ci.org; I need to migrate to travis-ci.com or GitHub Actions.

@manujosephv
Owner

Fixed in v0.6, now on PyPI. Can you check and report back?

@manujosephv
Owner

Closing the issue. Feel free to reopen if it still persists.
