ConnectionResetError: [Errno 104] Connection reset by peer #4

Open
rl-2 opened this issue Nov 22, 2021 · 10 comments

@rl-2

rl-2 commented Nov 22, 2021

Hello,

I'm trying to train a PPO agent with Stable Baselines, following the instructions in Sec. 5.2.2. After running ./TrainAndTestOpenAIStableBaselines.sh within_template, I got the following error:

Traceback (most recent call last):
  File "OpenAI_StableBaseline_Train.py", line 231, in <module>
    range(c.num_worker)])
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 111, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

Did I miss a step needed to start the Science Birds application? Please let me know.

Thank you!

@Cheng-Xue
Collaborator

Cheng-Xue commented Nov 28, 2021

Hi Rodger, please try the new version and let me know if the issue persists. Thanks.

@rl-2
Author

rl-2 commented Nov 29, 2021

Hi Cheng, it seems the issue is still there. Here is a full log:

Error in client-server communication: [Errno 111] Connection refused
Process ForkServerProcess-20:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 24, in _worker
    env = env_fn_wrapper.var()
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Utils/utils.py", line 64, in _init
    max_attempts_per_level=max_attempts_per_level)
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/SBEnvironment/SBEnvironmentWrapperOpenAI.py", line 78, in __init__
    self.connect_agent_to_server()
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/SBEnvironment/SBEnvironmentWrapperOpenAI.py", line 88, in connect_agent_to_server
    self.ar.configure(self.env_id)
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Client/agent_client.py", line 171, in configure
    self.playing_mode.value
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Client/agent_client.py", line 131, in _send_command
    self.server_socket.sendall(msg)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "OpenAI_StableBaseline_Train.py", line 231, in <module>
    range(c.num_worker)])
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 111, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

@rl-2
Author

rl-2 commented Nov 30, 2021

To follow up on this issue, I started the game server before running the script and got a similar error:

021-11-30 00:57:35,012 - OpenAI stable baselines Training and Testing - INFO - training step: 0
Server started...
Error in client-server communication: [Errno 111] Connection refused

On the server side, it seems it has been killed automatically:

The Science Birds Server is waiting for the first agent to connect
Waiting for agent  
Killed

@Cheng-Xue
Collaborator

Hi Rodger,
the problem is most likely still that the game server is not being initialised successfully. Can you provide the exact environment you are using so that we can replicate the issue? Thanks.

@rl-2
Author

rl-2 commented Nov 30, 2021

Thanks, Cheng. Below is the environment info:

  • Ubuntu: 18.04.6 LTS
  • Python: 3.7.10
  • Numpy: 1.18.5
  • Torch: 1.10.0
  • Torchvision: 0.8.2
  • lxml: 4.6.3
  • tensorboard: 2.7.0
  • Java: 13.0.4
  • stable-baselines3: 1.3.0

And the steps I've taken are:

  1. Run java -jar ./game_playing_interface.jar and the terminal shows:
The Science Birds Server is waiting for the first agent to connect
Waiting for agent 
  2. Run ./TrainAndTestOpenAIStableBaselines.sh within_template. Then I got the errors shown in this thread.

@Cheng-Xue
Collaborator

Hi Luo, I have pushed an updated version. The new version opens a new terminal window to run the server. Please let me know if the problem still exists. Cheers.

@rl-2
Author

rl-2 commented Dec 6, 2021

Hi Cheng,

Thanks a ton for the update! I saw this error when I ran the code:

sh: 1: gnome-terminal: not found

Note that I'm running the code on an AWS instance. Could that be what prevents it from launching a new terminal window?
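
The error above means the shell could not find a gnome-terminal executable, which is common on servers without a desktop environment. A minimal sketch of checking for the emulator before choosing how to launch the server follows. The helper name pick_server_launch_mode and the mode strings are hypothetical and are not part of the project; only shutil.which is a standard-library call.

```python
import shutil


def pick_server_launch_mode(terminal="gnome-terminal"):
    """Return "terminal" if the emulator is on PATH, otherwise "headless".

    Hypothetical helper for illustration; the project's own launch logic
    may differ.
    """
    # shutil.which returns the executable's path, or None if it is not found.
    return "terminal" if shutil.which(terminal) else "headless"


if __name__ == "__main__":
    print(pick_server_launch_mode())
```

On a machine without a desktop environment this would be expected to select the headless path, so no terminal window is needed.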

@Cheng-Xue
Collaborator

Hi Rodger, running on AWS is a bit tricky. We ran our tests on AWS as well, but it only supports 'symbolic' mode at the moment. You can activate the initial version by setting self.headless_server = True at line 10 in Server.py.

Can you please verify whether the following command can successfully start the server?

bash -c "cd ../sciencebirdsgames/Linux && nohup java -jar ./game_playing_interface.jar --headless --dev > out 2>&1 &"
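
Since the command above runs the server in the background and redirects its output to a file named out, one way to verify the start is to poll that file for the ready message seen earlier in this thread. The sketch below is an illustration only: wait_for_ready is a hypothetical helper, and it assumes the headless server writes the same "waiting for the first agent" line to out, which the thread does not confirm.

```python
import time
from pathlib import Path

# Ready message as printed by the server earlier in this thread.
READY_LINE = "The Science Birds Server is waiting for the first agent to connect"


def wait_for_ready(log_path, timeout=30.0, interval=0.5):
    """Poll log_path until READY_LINE appears; return False on timeout."""
    log = Path(log_path)
    deadline = time.monotonic() + timeout
    while True:
        # The file may not exist yet right after the server is launched.
        if log.exists() and READY_LINE in log.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Path is an assumption based on the command above.
    ok = wait_for_ready("../sciencebirdsgames/Linux/out", timeout=10.0)
    print("server ready" if ok else "server did not report ready")
```

Finding the line only shows that the server started. It does not show that the process is still alive afterwards, so a "Killed" server would still need to be checked separately.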

@hawe66

hawe66 commented Feb 2, 2024

I also have a question regarding Server.py.
You use three conditions: self.if_head, self.headless_server, and self.state_repr_type.

  1. The --dev > out 2>&1 option is added in lines 22, 33, 43, and 52 (when self.headless_server==True).
    Doesn't this option correspond to self.state_repr_type?

  2. The --headless option is added in lines 22, 27, 43, and 47 (when self.if_head==False and self.state_repr_type=='symbolic', or when self.if_head=='headless').
    This looks like a bug, since the self.state_repr_type condition is not checked in the later branches (i.e. elif and else).
    Also, I don't understand why there are two similarly functioning conditions, self.if_head and self.headless_server.
    Could you explain this?

@Cheng-Xue
Collaborator

Hi Hawe,

Apologies for the delay in getting back to you.

Regarding your questions:

The addition of --dev > out 2>&1 corresponds to the use of symbolic states. But when the image representation is used, the agent will not read from the symbolic states, so adding --dev will not alter the result.

When self.state_repr_type == "symbolic", the agent requests symbolic state representation from the server. The presence of --dev ensures accurate information retrieval. Conversely, when self.state_repr_type != "symbolic", the agent doesn't engage with symbolic representation and requests only the images.

The presence of both self.headless_server and self.if_head is a leftover from our code refactoring. We are planning to integrate the Java server directly into Unity, so that it can be used without additional configuration. We're committed to addressing these concerns and improving code readability in our next release.

Please let me know if you have further questions or would like more clarification.

Cheers,
Cheng
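
To illustrate the explanation above, here is a minimal sketch of composing the server arguments from a single headless switch and the state representation type. It is not the project's Server.py: build_server_command and its parameters are hypothetical, and only the --headless and --dev flags and the jar name come from this thread.

```python
def build_server_command(headless, state_repr_type,
                         jar="./game_playing_interface.jar"):
    """Return the java argument list for launching the game server.

    Hypothetical helper: one boolean replaces the two overlapping
    settings (if_head and headless_server) discussed in this thread.
    """
    args = ["java", "-jar", jar]
    if headless:
        args.append("--headless")
    # Per the reply above, --dev matters for symbolic states and is
    # harmless otherwise; this sketch adds it only when it is needed.
    if state_repr_type == "symbolic":
        args.append("--dev")
    return args


if __name__ == "__main__":
    print(" ".join(build_server_command(True, "symbolic")))
```

Returning a list rather than a shell string keeps the output redirection (> out 2>&1) as a separate concern for whatever launches the process.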
