GPUassert: invalid device symbol #5
Comments
How many GPUs do you have on your machine, and which one is GPU 0 (the one you are using, since you are passing ('cuda', 0) as the device)?
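One way to answer this question is to list the GPUs together with their indices. The sketch below is an illustration, not part of CuLE: it shells out to nvidia-smi and parses its CSV output, and the helper names parse_gpu_list and list_gpus are hypothetical. Keep in mind that nvidia-smi orders devices by PCI bus ID, while the CUDA runtime defaults to a fastest-first order unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, so the two index spaces can differ on machines with mixed GPUs.

```python
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'index, name' CSV lines into a list of (int, str) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, name = line.split(",", 1)
        gpus.append((int(index), name.strip()))
    return gpus


def list_gpus():
    """Ask nvidia-smi for the index and name of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_list(out)
```

Calling list_gpus() on the affected machine and printing each (index, name) pair shows which physical card index 0 refers to from nvidia-smi's point of view.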
I got the exact same error
here is my nvidia-smi
I have the same GPU configuration: two GTX 1080 Ti cards.
On Tue, 13 Aug 2019 at 5:07, KyunghyunLee <notifications@github.com> wrote:
I got the exact same error
python ./examples/ppo/ppo_main.py --use-cuda-env --use-openai-test-env --gpu 0
{'ale_start_steps': 400,
'alpha': 0.99,
'batch_size': 256,
'clip_epsilon': 0.1,
'conf_file': None,
'entropy_coef': 0.01,
'env_name': 'PongNoFrameskip-v4',
'episodic_life': False,
'eps': 1e-05,
'evaluation_episodes': 10,
'evaluation_interval': 1000000,
'gamma': 0.99,
'gpu': 0,
'local_rank': 0,
'log_dir': 'runs',
'loss_scale': None,
'lr': 0.00065,
'lr_scale': False,
'max_episode_length': 18000,
'max_grad_norm': 0.5,
'multiprocessing_distributed': False,
'no_cuda_train': False,
'normalize': False,
'num_ales': 16,
'num_gpus_per_node': -1,
'num_stack': 4,
'num_steps': 5,
'opt_level': 'O0',
'output_filename': None,
'plot': False,
'ppo_epoch': 3,
'profile': False,
'save_interval': 0,
'seed': 1565661279,
't_max': 50000000,
'tau': 1.0,
'use_adam': False,
'use_cuda_env': True,
'use_gae': False,
'use_openai': False,
'use_openai_test_env': True,
'value_loss_coef': 0.5,
'verbose': False}
PyTorch : 1.1.0
CUDA : 10.0.130
CUDNN : 7501
APEX : 0.1.0
GPUassert: invalid device symbol /home/lkh/Codes/cule/cule/atari/cuda/tables.hpp 43
here is my nvidia-smi
Tue Aug 13 11:06:38 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 33%   57C    P0    65W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:04:00.0  On |                  N/A |
|  0%   48C    P8    16W / 250W |    501MiB / 11177MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
--
wbr, Max Lapan
I figured out the issue. I dug into tables.hpp and found that the error is related to the GPU architecture the extension was built for. I cleaned the 'build' and 'dist' folders, then rebuilt torchcule.
Thanks. We are modifying the code to support multiple architectures, although this may require a longer compilation time. Will close when done.
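To illustrate what supporting multiple architectures usually involves, here is a minimal sketch that turns a list of compute capabilities into nvcc -gencode flags. It assumes the build passes such flags to nvcc; how CuLE's build script actually does this is not shown in the thread, and the helper name gencode_flags is hypothetical. Each additional architecture adds another device-code compilation pass, which is why compile time grows.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags for (major, minor) compute capabilities."""
    flags = []
    # Deduplicate and sort so the flag order is stable across runs.
    for major, minor in sorted(set(capabilities)):
        arch = f"{major}{minor}"
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags


# The GTX 1080 Ti has compute capability 6.1; 7.0 is added as a second target.
print(gencode_flags([(6, 1), (7, 0)]))
# ['-gencode=arch=compute_61,code=sm_61', '-gencode=arch=compute_70,code=sm_70']
```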
Hi!
I am trying to get CuLE working, but after setting it up, this script fails with the message:
Script:
I have CUDA 10.0, PyTorch 1.1.0, driver 410.79, and Python 3.7.