
Multiple instances of minerl running in same execution environment causes shutdowns #177

Open
jon-chuang opened this issue Jul 25, 2019 · 9 comments
Assignees
Labels
bug (Something isn't working), malmo bug (Something isn't working with base malmo)

Comments

@jon-chuang

jon-chuang commented Jul 25, 2019

I have experienced multiple shutdowns of minerl when (1) running multiple instances (e.g. 5) in parallel and (2) interrupting a Jupyter kernel that is communicating with minerl.

The first is more serious and the second is just additional info.

I believe this is a bug.

The error messages I get include the following:
~/.local/lib/python3.7/site-packages/gym/wrappers/time_limit.py in step(self, action)
     13     def step(self, action):
     14         assert self._elapsed_steps is not None, "Cannot call env.step() before calling reset()"
---> 15         observation, reward, done, info = self.env.step(action)
     16         self._elapsed_steps += 1
     17         if self._elapsed_steps >= self._max_episode_steps:

~/miniconda/envs/py37/lib/python3.7/site-packages/minerl/env/core.py in step(self, action)
    525             # Receive reward done and sent.
    526             reply = comms.recv_message(self.client_socket)
--> 527             reward, done, sent = struct.unpack('!dbb', reply)
    528
    529             # Receive info from the environment.

TypeError: a bytes-like object is required, not 'NoneType'

Failed to reset (socket error), trying again!
Cleaning connection! Something must have gone wrong.
Failed to reset (socket error), trying again!
Cleaning connection! Something must have gone wrong.
Connection with Minecraft client cleaned more than once; restarting.
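
The `TypeError` in the traceback above is what `struct.unpack` raises when it is handed `None` instead of bytes, so presumably `comms.recv_message` returned nothing because the socket had already gone away. Here is a minimal sketch of a guard that turns this into a clearer error. `unpack_step_reply` is a hypothetical helper for illustration, not part of minerl, and the format string is simply the one shown in the traceback.

```python
import struct

# Format taken from the traceback: network byte order, one double, two signed bytes.
STEP_REPLY_FORMAT = "!dbb"


def unpack_step_reply(reply):
    """Decode (reward, done, sent), failing loudly if nothing was received.

    Hypothetical helper for illustration; it is not part of minerl.
    """
    if reply is None:
        # This is the case that surfaces as the TypeError in the traceback.
        raise ConnectionError("no reply from the Minecraft client (socket closed?)")
    return struct.unpack(STEP_REPLY_FORMAT, reply)
```

A caller could catch `ConnectionError` and decide whether to reset the environment, rather than seeing an unrelated-looking `TypeError`.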

Recovering can require either re-running gym.make for the Minecraft environment, which is not a big issue, or restarting my Jupyter kernel, which is extremely disruptive to my running experiments.

There are other error messages which I will add to this issue once I encounter them again.

@MadcowD
Collaborator

MadcowD commented Jul 25, 2019 via email

@jon-chuang
Author

jon-chuang commented Jul 25, 2019

mc_6.log
mc_7.log

@jon-chuang
Author

I'm not sure which logs correspond to my errors. When I get the same issue, I will attach my logs.

@MadcowD
Collaborator

MadcowD commented Jul 25, 2019

Sweet! I have seen this error before :)))) I'm on it!

@jon-chuang
Author

Great thanks!

@jon-chuang
Author

I've noticed this error only occurs when I try starting multiple instances of minerl at the same time; consequently, a quick fix is to stagger the starts. It's annoying, but I haven't encountered errors since.
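
For anyone who wants to try the staggering workaround, here is a minimal sketch. `make_staggered` is a hypothetical helper, the 30 second default delay is a guess rather than a measured value, and the environment ID in the usage comment is only an example.

```python
import time


def make_staggered(factories, delay_seconds=30.0, sleep=time.sleep):
    """Call each zero-argument factory in turn, pausing between calls.

    Hypothetical helper; the default delay is an assumption, not a tuned value.
    """
    envs = []
    for index, factory in enumerate(factories):
        if index > 0:
            sleep(delay_seconds)  # wait before launching the next instance
        envs.append(factory())
    return envs


# Usage sketch (needs gym and minerl installed, so it is left commented out):
# import gym, minerl
# envs = make_staggered([lambda: gym.make("MineRLNavigateDense-v0")] * 5)
```

The `sleep` parameter is injectable only so the helper can be exercised without actually waiting. This does not address the underlying race; it just avoids launching instances simultaneously.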

@brandonhoughton
Member

brandonhoughton commented Jul 26, 2019

I have been able to reproduce this:
https://gist.github.com/brandonhoughton/69c2a85043471c0043f9c9a003d9bf91

@NotNANtoN

I have exactly the same issue when trying to train in 4 separate processes on a machine with 4 GPUs. The error messages from the different processes are interleaved:
uesr@basegpu1:/home/user/Deep-RL-Torch$ ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDOUT]: [ERROR] Video observation is null; please notify the developer.
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: java.lang.NullPointerException
ERROR:minerl.env.malmo.instance.54e7b2:Exception in thread "EnvServerSocketHandler" [04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.stepSync(MalmoEnvServer.java:507)
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.step(MalmoEnvServer.java:534)
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.access$400(MalmoEnvServer.java:51)
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer$1.run(MalmoEnvServer.java:154)
Traceback (most recent call last):
  File "train.py", line 227, in <module>
    trainer.run(600000, render=False, verbose=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 174, in run
    self.fill_replay_buffer(n_actions=self.n_initial_random_actions)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 103, in fill_replay_buffer
    explore=True, fully_random=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 120, in _act
    next_state, reward, done, _ = env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/core.py", line 285, in step
    return self.env.step(self.action(action))
  File "/informatik2/students/home/8wiehe/.local/lib/python3.6/site-packages/gym/core.py", line 261, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/minerl/$ nv/core.py", line 536, in step
    reward, done, sent = struct.unpack('!dbb', reply)
TypeError: a bytes-like object is required, not 'NoneType'
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Connection with Minecraft client cleaned more than once; resta$ ting.
ERROR:minerl.env.malmo:Attempted to send kill command to minecraft process and faile$ .
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [ST$ OUT]: [ERROR] Video observation is null; please notify the developer.
ERROR:minerl.env.malmo.instance.af6e12:Exception in thread "EnvServerSocketHandler" $ 04:09:26] [EnvServerSocketHandler/INFO]: [STDERR]: java.lang.NullPointerException
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [ST$ ERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.stepSync(MalmoEnvServer.java:50$ )
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [ST$ ERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.step(MalmoEnvServer.java:534)
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [ST$ ERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.access$400(MalmoEnvServer.java:$ 1)
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [ST$ ERR]: at com.microsoft.Malmo.Client.MalmoEnvServer$1.run(MalmoEnvServer.java:154)
Traceback (most recent call last):
    trainer.run(600000, render=False, verbose=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 174, in run
    self.fill_replay_buffer(n_actions=self.n_initial_random_actions)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 103, in fill_replay_buffer
    explore=True, fully_random=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 120, in _act
    next_state, reward, done, _ = env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/cor$ .py", line 285, in step
    return self.env.step(self.action(action))
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/cor$ .py", line 261, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/wra$ pers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/minerl/$ nv/core.py", line 536, in step
    reward, done, sent = struct.unpack('!dbb', reply)
TypeError: a bytes-like object is required, not 'NoneType'
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Connection with Minecraft client cleaned more than once; restarting.
  File "train.py", line 227, in <module>
    trainer.run(600000, render=False, verbose=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 222, in run
    self.policy.optimize()
  File "/srv/home/user/Deep-RL-Torch/policies.py", line 104, in optimize
    self.policy.optimize()
  File "/srv/home/user/Deep-RL-Torch/policies.py", line 313, in optimize
    transitions = self.get_transitions()
  File "/srv/home/user/Deep-RL-Torch/policies.py", line 347, in get_transitions
    importance_weights = torch.from_numpy(importance_weights).float()
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types $ re: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.
ERROR:minerl.env.malmo.instance.fb8a11:[08:26:06] [Client thread/INFO]: [STDOUT]: CL$ ENT request state: ERROR_TIMED_OUT_WAITING_FOR_EPISODE_PAUSE
ERROR:minerl.env.malmo.instance.fb8a11:[08:26:06] [Client thread/INFO]: [STDOUT]: CL$ ENT enter state: ERROR_TIMED_OUT_WAITING_FOR_EPISODE_PAUSE
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Connection with Minecraft client cleaned more than once; resta$ ting.

MadcowD self-assigned this Sep 30, 2019
MadcowD added the bug and malmo bug labels Sep 30, 2019
@shwang
Member

shwang commented May 7, 2020

We ended up fixing this in HumanCompatibleAI#5, which led to another parallelization error that was addressed by HumanCompatibleAI#6.

I'd be happy to merge this in the future if the maintainers are interested (though I'm a bit busy right now, so probably in a week or two).

5 participants