parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]: #204

Closed
zienn opened this issue Feb 29, 2020 · 4 comments


zienn commented Feb 29, 2020

I am running the default IMPALA example with the number of actors set to 2.
Environment:

GPU: MX150
CUDA: 10.02.89
paddlepaddle-gpu (1.6.3.post107)
parl (1.2.1)

The error is as follows:

[02-29 17:50:42 MainThread @train.py:148] Waiting for 2 remote actors to connect.
[02-29 17:50:42 MainThread @train.py:152] Remote actor count: 1
[02-29 17:50:42 MainThread @train.py:152] Remote actor count: 2
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xtq/PARL/examples/IMPALA/train.py", line 163, in run_remote_sample
    remote_actor = Actor(self.config)
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 127, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `__init__`]:
No module named 'atari_model'
traceback:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/job.py", line 298, in wait_for_connection
    cls = cloudpickle.loads(message[1])
ModuleNotFoundError: No module named 'atari_model'


Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xtq/PARL/examples/IMPALA/train.py", line 163, in run_remote_sample
    remote_actor = Actor(self.config)
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 127, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `__init__`]:
No module named 'atari_model'
traceback:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/job.py", line 298, in wait_for_connection
    cls = cloudpickle.loads(message[1])
ModuleNotFoundError: No module named 'atari_model'
@zenghsh3
Contributor

Hi, did you run it following the steps in the documentation?

  1. pip install parl
  2. Start xparl: xparl start --port 8010 --cpu_num 2 (for 2 actors)
  3. Clone the PARL repository locally
  4. Go to the PARL/examples/IMPALA/ directory, set the number of actors in impala_config.py, and then run python train.py
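Steps 2 and 4 have to agree: as the example in step 2 suggests, each remote actor occupies one CPU of the xparl cluster, so `--cpu_num` should match the actor count set in impala_config.py (2 here). A minimal sketch, assuming a hypothetical helper `xparl_start_args` and a stand-in config dict with an `actor_num` key (the real key name in impala_config.py may differ), that derives the step 2 command from the actor count:

```python
def xparl_start_args(actor_num, port=8010):
    """Build the `xparl start` argument list for a given number of actors.

    Hypothetical helper, not part of PARL: it reserves one CPU per remote
    actor, mirroring step 2 (`--cpu_num 2` for 2 actors).
    """
    if actor_num < 1:
        raise ValueError("actor_num must be at least 1")
    return ["xparl", "start", "--port", str(port), "--cpu_num", str(actor_num)]


if __name__ == "__main__":
    config = {"actor_num": 2}  # hypothetical stand-in for impala_config.py
    print(" ".join(xparl_start_args(config["actor_num"])))
    # prints: xparl start --port 8010 --cpu_num 2
```

The returned list can be passed to `subprocess.run(..., check=True)` if you want to start the cluster from a script rather than typing the command by hand.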

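From the error message, it looks like the local atari_model.py was not sent to the job's runtime environment when train.py was run, although we have never hit this problem in our own tests. (Please also check whether atari_model.py is in the same directory as train.py.)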
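Both points can be checked before launching training. The sketch below is a hypothetical diagnostic, not part of PARL: `diagnose` reports whether atari_model.py sits next to train.py and whether the current working directory is that same directory. The second check matters because, as explained later in this thread, PARL ships the code in the working directory to the remote jobs.

```python
import os


def diagnose(script_path, module_file="atari_model.py"):
    """Describe where script_path lives relative to the working directory.

    Hypothetical helper for illustration only.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    cwd = os.getcwd()
    return {
        "script_dir": script_dir,
        "cwd": cwd,
        # Does the module the remote Actor imports exist next to the script?
        "module_present": os.path.isfile(os.path.join(script_dir, module_file)),
        # Is the process running from the script's own directory?
        "cwd_matches": os.path.samefile(script_dir, cwd),
    }
```

Calling `print(diagnose(__file__))` at the top of train.py would show `cwd_matches: False` when an editor launches the script from another directory, such as the workspace root.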

@zienn
Author

zienn commented Mar 1, 2020

I got this error when running it inside VS Code. When I ran it from the command line instead, it worked normally.

zienn closed this as completed Mar 1, 2020
@TomorrowIsAnOtherDay
Collaborator

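@Termset This happens because the editor's working directory differs from the directory containing the code. The root cause is that PARL is designed for multi-machine parallelism: it has to distribute the code in the current working directory to the different machines (and it uses the same logic on a single machine).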
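Given that explanation, one possible workaround (a sketch, not an official PARL fix) is to have train.py switch to its own directory before it connects to the xparl cluster, so that the working directory contains atari_model.py no matter where the editor starts the process. The helper name `chdir_to_script_dir` is hypothetical. In VS Code specifically, setting `"cwd": "${fileDirname}"` in the launch configuration should have a similar effect.

```python
import os
import sys


def chdir_to_script_dir(script_path):
    """Make the directory containing script_path the working directory.

    Hypothetical helper: call it at the very top of train.py, before
    connecting to the xparl cluster, so that the files PARL collects from
    the working directory include the script's siblings (e.g. atari_model.py).
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    os.chdir(script_dir)
    # Also let local imports such as `import atari_model` resolve here.
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)
    return script_dir
```

Usage at the top of train.py: `chdir_to_script_dir(__file__)`. This is equivalent to what the reporter did manually by running `python train.py` from a terminal opened in PARL/examples/IMPALA/.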


@zenghsh3 More and more people are using editors like VS Code, so our code really should support running under VS Code. Let's discuss it together next week :)
