Cannot run IMPALA in conda environment #151

Closed
mxfengustc opened this issue Sep 12, 2019 · 7 comments
Labels
good first issue Good for newcomers

Comments

@mxfengustc

mxfengustc commented Sep 12, 2019

I cannot run the IMPALA algorithm in the conda environment inside my Docker container.
The container is built on nvidia/cuda:18.04 with Anaconda 5.3.0. I created an environment named dist-rl, installing python=3.7, paddlepaddle-gpu=1.5.2, and cudatoolkit=10.0 via conda, and parl, gym[atari], and opencv-python via pip.
I started the CPU cluster with xparl start --port 8010 --cpu_num 5 (I also changed the number of CPUs in impala_config.py) and then ran python train.py, which produced the following errors:

[Three screenshots of the error output: 2019-09-12 20-12-06, 20-12-33, 20-12-57]

The main error seems to be paddle.fluid.core_avx.EnforceNotMet: Invoke operator elementwise_mul error., but I don't know how to deal with it.
Thanks very much~

@zenghsh3
Contributor

zenghsh3 commented Sep 12, 2019

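We tested this and found that with paddle>=1.5.1 the problem occurs when training on GPU, while training on CPU works normally (export CUDA_VISIBLE_DEVICES=""). It appears to be similar to this Paddle issue:
PaddlePaddle/Paddle#19628

We will ask the Paddle developers about progress on that issue. In the meantime, you can try training on CPU.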
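
For anyone who prefers not to export the variable in their shell, the CPU-only workaround can be sketched in Python by launching the training script in a child process whose environment hides every CUDA device. This is a minimal sketch, not part of PARL: the helper names cpu_only_env and run_training_on_cpu are hypothetical, and train.py is assumed to be the IMPALA example script in the current directory.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value hides all GPUs from CUDA, the same effect as
    # `export CUDA_VISIBLE_DEVICES=""` in the shell.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_training_on_cpu(script="train.py"):
    """Run the (assumed) training script in a child process that sees no GPUs."""
    return subprocess.run([sys.executable, script], env=cpu_only_env(), check=True)
```

Calling run_training_on_cpu() leaves the parent shell's environment untouched, so other jobs started from the same shell can still see the GPUs.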

@TomorrowIsAnOtherDay
Collaborator

Thanks for raising the issue. Can you run the IMPALA algorithm in your environment after setting CUDA_VISIBLE_DEVICES=""?

TomorrowIsAnOtherDay added the "good first issue" label Sep 12, 2019
@mxfengustc
Author

@TomorrowIsAnOtherDay When I run it on CPU only, no errors occur.

@zenghsh3
Contributor

@mxfengustc Hi, we found that the IMPALA example also runs normally when training on a single GPU. You can try setting export CUDA_VISIBLE_DEVICES=0.
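
The single-GPU setting can also be applied from inside the script. The following is a minimal sketch under the assumption that it runs at the very top of train.py, before paddle or parl is imported, since the variable generally has to be set before the CUDA runtime initializes. The helper name use_single_gpu is hypothetical.

```python
import os


def use_single_gpu(index=0):
    """Expose exactly one CUDA device to libraries imported afterwards."""
    if index < 0:
        raise ValueError("GPU index must be non-negative")
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Equivalent to `export CUDA_VISIBLE_DEVICES=0`; run this before importing
# paddle or parl so the restriction is in place when CUDA starts up.
use_single_gpu(0)
```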

@mxfengustc
Author

@zenghsh3 Hi, thanks for the advice, now I can run the code~
You can close the issue. Maybe you could also work with the PaddlePaddle team to fix the errors that occur when using multiple GPUs.
I often use GPU nodes as actors to sample experiences, and I would like to know whether errors will occur in that setting (actors with GPUs).

@TomorrowIsAnOtherDay
Collaborator

TomorrowIsAnOtherDay commented Sep 16, 2019

Thanks @zenghsh3 for providing the temporary solution. I'm closing the issue because it is not a PARL error; we can confirm that it is a bug in PaddlePaddle.
We will do the following:

  1. assert that only a single GPU is being used when running the IMPALA algorithm.
  2. locate the bug and report it to the Paddle team.
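
The thread does not show how the check in item 1 was implemented in PARL. As an illustration only, a single-GPU assertion could be sketched by inspecting CUDA_VISIBLE_DEVICES. The function names count_visible_gpus and assert_single_gpu are hypothetical, and treating an unset variable as an error is an assumption of this sketch (without the variable, all GPUs on the machine may be visible).

```python
import os


def count_visible_gpus(value):
    """Count the device entries in a CUDA_VISIBLE_DEVICES-style string."""
    return len([entry for entry in value.split(",") if entry.strip()])


def assert_single_gpu(environ=None):
    """Raise unless at most one GPU is exposed through CUDA_VISIBLE_DEVICES."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Assumption of this sketch: an unset variable is rejected because
        # every GPU on the machine could then be visible.
        raise RuntimeError(
            "CUDA_VISIBLE_DEVICES is not set; set it to a single device, e.g. 0"
        )
    count = count_visible_gpus(value)
    if count > 1:
        raise RuntimeError(
            "IMPALA expects at most one visible GPU, found %d: %r" % (count, value)
        )
    return count  # 0 means CPU-only, 1 means a single GPU
```

An empty value passes the check because it corresponds to the CPU-only run that was reported to work earlier in the thread.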

Thanks @mxfengustc for reporting the issue and @yobobobo for providing suggestions.

