Cannot run IMPALA in conda environment #151

mxfengustc · 2019-09-12T12:24:22Z

I cannot run the IMPALA algorithm in my Docker container's conda environment.
My Docker container is built on nvidia/cuda:18.04 with Anaconda 5.3.0. I created an environment named dist-rl, installed python=3.7 paddlepaddle-gpu=1.5.2 cudatoolkit=10.0 via conda, and installed parl/gym[atari]/opencv-python via pip.
When I run python train.py after starting the CPU cluster with xparl start --port 8010 --cpu_num 5 (I also changed the number of CPUs in impala_config.py), the following errors occur:

The main error seems to be paddle.fluid.core_avx.EnforceNotMet: Invoke operator elementwise_mul error., but I don't know how to deal with it.
Thanks very much~


zenghsh3 · 2019-09-12T16:04:33Z

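We tested this and found that with paddle>=1.5.1 the problem occurs when training on GPU, while training on CPU works normally (export CUDA_VISIBLE_DEVICES=""). It is probably related to this Paddle issue:
PaddlePaddle/Paddle#19628

We will ask the Paddle developers about progress on that issue. In the meantime, you can try training on CPU.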
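
For reference, the CPU-only workaround can also be applied from inside Python instead of the shell. This is a minimal sketch, not code from PARL or Paddle: the helper name `force_cpu_only` is hypothetical, and it assumes the variable is set before the framework initializes CUDA (for example, before `import paddle`).

```python
import os


def force_cpu_only(env=None):
    """Hide all GPUs from CUDA by setting CUDA_VISIBLE_DEVICES to "".

    Hypothetical helper for illustration only. It has to run before the
    deep learning framework initializes CUDA; setting the variable after
    that point is assumed to have no effect on the running process.
    """
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env["CUDA_VISIBLE_DEVICES"]
```

Calling `force_cpu_only()` at the very top of the training script, before any framework import, should be equivalent to running `export CUDA_VISIBLE_DEVICES=""` in the shell first.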

TomorrowIsAnOtherDay · 2019-09-12T16:32:07Z

Thanks for raising the issue. Can you run the IMPALA algorithm in your environment with CUDA_VISIBLE_DEVICES="" set?

mxfengustc · 2019-09-15T05:38:54Z

@TomorrowIsAnOtherDay When I use only the CPU to run it, no errors occur.

zenghsh3 · 2019-09-16T02:38:57Z

@mxfengustc Hi, we found that the IMPALA example also runs normally when training on a single GPU. You can try setting export CUDA_VISIBLE_DEVICES=0.
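
If the training script is launched from another Python process, the same single-GPU restriction can be passed through the child environment. The sketch below is an illustration under assumptions: `single_gpu_command` is a hypothetical helper, and `train.py` is taken to be the script from the original report, run from the current directory.

```python
import os
import sys


def single_gpu_command(script="train.py", gpu_index=0, base_env=None):
    """Build the command and environment for a run that sees exactly one GPU.

    Hypothetical helper: it copies the base environment so the parent
    process is left untouched, then exposes only `gpu_index` to CUDA.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return [sys.executable, script], env
```

The returned pair can be handed to `subprocess.run(cmd, env=env, check=True)`, which mirrors running `export CUDA_VISIBLE_DEVICES=0` followed by `python train.py`.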

mxfengustc · 2019-09-16T03:08:57Z

@zenghsh3 Hi, thanks for the advice, now I can run the code~
You can close the issue. Maybe you could also work with the PaddlePaddle team to fix the errors that occur when using multiple GPUs.
I often use GPU nodes as actors to sample experiences. Will errors also occur in that setting (actors with GPUs)?

TomorrowIsAnOtherDay · 2019-09-16T03:24:59Z

Thanks @zenghsh3 for providing the temporary solution. I'm closing the issue because it is not an error in PARL. We can confirm that it is a bug in PaddlePaddle.
We will do the following:

assert that only a single GPU is being used when running the IMPALA algorithm.
locate the bug and report it to the Paddle team.
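
The first item above could be approximated with a check on the environment variable alone. This is only a sketch under assumptions, not the check PARL actually ships: `visible_gpu_count` and `check_single_gpu` are hypothetical names, the parsing simply counts comma-separated entries, and an unset variable is let through because the number of GPUs cannot be known without querying the framework or driver.

```python
import os


def visible_gpu_count(value):
    """Count device entries in a CUDA_VISIBLE_DEVICES value.

    Returns None when the variable is unset (value is None): in that case
    every GPU on the machine is visible and the count is unknown here.
    """
    if value is None:
        return None
    return len([part for part in value.split(",") if part.strip()])


def check_single_gpu(env=None):
    """Raise RuntimeError if more than one GPU is listed as visible."""
    env = os.environ if env is None else env
    count = visible_gpu_count(env.get("CUDA_VISIBLE_DEVICES"))
    # Limitation of this sketch: an unset variable (count is None) passes,
    # even though a multi-GPU machine would then expose all of its GPUs.
    if count is not None and count > 1:
        raise RuntimeError(
            "IMPALA training expects at most one visible GPU, got %d; "
            "try `export CUDA_VISIBLE_DEVICES=0`." % count
        )
    return count
```

A stricter variant would ask the framework for its device count, which also covers the unset case.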

Thanks @mxfengustc for reporting the issue and @yobobobo for the suggestion.

zenghsh3 mentioned this issue Sep 12, 2019

Models report errors on GPU with version 1.5 and later PaddlePaddle/Paddle#19628

Closed

TomorrowIsAnOtherDay added the good first issue label Sep 12, 2019

TomorrowIsAnOtherDay closed this as completed Sep 16, 2019

TomorrowIsAnOtherDay mentioned this issue Sep 17, 2019

Limit impala to single GPU training #152

Merged
