Multiprocessing EOF error #2
Comments
Hi @JamesLuoau, the code works for us on Debian with Python 2.7 and Python 3.5. I'm not sure why there should be a multiprocessing error here though -- the code is not parallelized and TensorFlow only uses threads as far as I know. Maybe try commenting out the |
scripts/tasks.py
It seems this can work on macOS; however, I find "nan" in the printed log.
|
Yes, that's correct. The |
Hi, I can confirm that tensorflow 1.13 and tensorflow-probability 0.6.0 do not work: the script tools/test_overshooting.py does not pass and throws the exception "tensorflow attributeerror: 'template' object has no attribute 'updates'". However, if I downgrade to tensorflow 1.12.0 and tensorflow-probability 0.5.0, test_overshooting.py passes. Could you please provide a requirements.txt file for your environment? Thanks a lot. I'm now getting the error below and would appreciate your help.
|
Hi @JamesLuoau, these both sound like issues with other libraries. Please ask about the AttributeError on the TensorFlow Probability repo and about the multi-threaded rendering error on the dm_control repo. Neither of these happens for me under Python 3.5, TensorFlow 1.12.0, and TensorFlow Probability 0.5.0. If many people are experiencing this, please upvote the comment above this one. |
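To compare a local setup against the versions reported as working above (TensorFlow 1.12.0 and TensorFlow Probability 0.5.0), a small check along these lines may help. This is only a sketch and not part of the PlaNet code: the helper names `installed_version` and `compare_versions` are hypothetical, and it assumes each package exposes a `__version__` attribute, which both libraries do.

```python
import importlib

# Versions reported as working in this thread (not an official requirements file).
EXPECTED = {"tensorflow": "1.12.0", "tensorflow_probability": "0.5.0"}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(expected=EXPECTED):
    """Map each module name to a tuple (installed, expected, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (found, wanted, found == wanted)
    return report
```

Calling `compare_versions()` returns one entry per package, so a mismatch such as TensorFlow 1.13 against the expected 1.12.0 shows up as `matches == False`.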
@danijar, thanks to you and your team for such an interesting contribution to RL! I am planning to scale up this implementation to a multi-agent environment to see how well it performs. I am facing the same problem as @JamesLuoau though. Here's the excerpt from the logs:
It seems the visualizations cannot start. With the debug configuration, it runs until the 15th step and then crashes. I am running with --config debug, so this kicks in when the testing starts. Maybe it has something to do with workers? Or maybe it is the transition from train to test? Could you specify these things:
Thank you :) @JamesLuoau have you managed to fix this? |
Thanks for letting me know. I will look into this but it will take a couple of days before I get to it. For now, I think everything works using the previous version of TensorFlow and TensorFlow Probability. I mentioned the versions for this above. I'm using the |
Thanks @danijar 🥇. I use TensorFlow 1.12.0 and TF Prob 0.5.0 as well. I am starting to think this might be an OS configuration issue, who knows 🤷♂️. Please share your MuJoCo Pro, dm_control, and mujoco-py versions too, as it's stated on the mujoco-py repo that it needs MuJoCo Pro 1.5.0, yet dm_control depends on the 2.0.0 version. And knowing your NVIDIA driver's version would be nice as well, as |
Hi @astronautas, I haven't found a good way to run it yet. As you mentioned, I have to use MuJoCo 2.0.0 |
@astronautas and @JamesLuoau To debug this further, could you please confirm that you can create a dm_control environment and call render on it (outside of the PlaNet code)? I have both mjpro150 and mjpro200_linux installed on my machine but I think only the latter is used by dm_control. The PlaNet code is independent of the dm_control render option and should work with all of them as long as they support multi-threading -- I've used multiple options at some point. |
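The standalone check requested above could look roughly like the following. It is a minimal sketch, assuming dm_control and a licensed MuJoCo install are available; the helper name `check_render` and the choice of the cartpole swingup task are illustrative assumptions rather than anything PlaNet-specific.

```python
def check_render(domain="cartpole", task="swingup", height=64, width=64):
    """Try to load a dm_control task and render a single frame.

    Returns (True, image_shape) on success, or (False, error_message)
    if importing, loading, or rendering fails for any reason.
    """
    try:
        # Imported lazily so that a missing install is reported, not fatal.
        from dm_control import suite

        env = suite.load(domain_name=domain, task_name=task)
        env.reset()
        image = env.physics.render(height=height, width=width, camera_id=0)
        return True, image.shape
    except Exception as error:  # import, license, or OpenGL context failures
        return False, "{}: {}".format(type(error).__name__, error)
```

If `check_render()` returns `(False, ...)` outside of PlaNet, the message points at the dm_control or MuJoCo setup rather than at the PlaNet code.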
Hi @danijar. As for configuration, I installed mujoco-py after installing MuJoCo 150. Then I reinstalled MuJoCo 200 and installed dm_control. This is the only way I can think of. I don't know if there will be any problems. |
@lunar24 You can install multiple MuJoCo versions by placing them into |
I'll have some time this weekend for this. I'll post back the results. |
@danijar I have read a lot about this error but have not been able to solve it. I am also concerned that changing the code of the calling process may cause other problems, so I hope to get your advice. Thank you very much for your help. |
@danijar I can confirm these things at the moment:
Environment:
@danijar, could you please verify again that the code works both with the ExternalProcess and without using it for the environment? I suspect that launching the environment in a separate process alleviates the rendering problem because, based on the logs, the problem is that the current context is set in multiple threads in the same process. However, neither I nor @JamesLuoau can successfully launch the environment in a separate process. EDIT: correct me if I'm wrong, this is how I see the current implementation: there are 2 processes communicating with each other: training_process <-----> worker (environment). Problem: maybe there's something incorrect with how the external methods on the environment get called? I am not sure whether that's the case, but could you verify that the environment process always writes to its end of the pipe while the reinforcement learning process always writes to its own end of the pipe? |
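For context on where an EOF error can come from in a two-process setup like the one described above: with Python's `multiprocessing.Pipe`, `recv()` raises `EOFError` once the other end has been closed and nothing is left to read, which is what the training side would see if the worker exited or crashed. The sketch below shows this in a single process for simplicity; it is not PlaNet's ExternalProcess implementation, and the names `receive`, `demo`, `trainer_end`, and `worker_end` are made up for illustration.

```python
import multiprocessing


def receive(conn):
    """Return ('ok', message), or ('closed', None) if the peer closed its end."""
    try:
        return "ok", conn.recv()
    except EOFError:
        # Raised when the other end is closed and no data remains.
        return "closed", None


def demo():
    """Send one message, then close the sending end and read again."""
    trainer_end, worker_end = multiprocessing.Pipe()
    worker_end.send(("step_result", 1.0))
    first = receive(trainer_end)   # the message sent above
    worker_end.close()             # simulates the worker going away
    second = receive(trainer_end)  # the EOFError case
    trainer_end.close()
    return first, second
```

In a real worker setup, an EOF on the training side therefore usually means the environment process died first, so its own traceback or log is the place to look.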
@astronautas and @JamesLuoau Let's move this conversation over to #5 since the thread here got a bit confusing. I've responded to your questions there. @lunar24 Thanks for reporting this. To keep the threads focused, I started a new ticket for your issue: #6. Please provide the details I asked for there so we can try to resolve this. |
Hi, I'm trying to replicate your results; however, the code does not run well on my Python 3.5 + macOS setup. For example, I got an EOF error from multiprocessing. I have fixed several errors of this kind, but I'm not sure how many more I will run into, so knowing your tested environment would help me a lot. Thanks.