Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce error handling for broken service connections #37

Closed
matthias-mayr opened this issue Oct 26, 2020 · 1 comment
Closed

Introduce error handling for broken service connections #37

matthias-mayr opened this issue Oct 26, 2020 · 1 comment

Comments

@matthias-mayr
Copy link
Member

I am running a lot of tasks with SkiROS and received one of these recently:

1603701395.683933973 INFO /worker_manager [skiros_rl_client.py:215(SkirosRlClient.next_callback)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] 2: Ending task 65451
1603701395.701773881 INFO /worker_manager [skiros_rl_client.py:181(SkirosRlClient.next_callback)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] 4: Starting new task

1603701395.706203937 ERROR /worker_manager [tcpros_service.py:581(Service.error_handler)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] Error processing request: 32Broken pipe
['Traceback (most recent call last):\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 632, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/blackdrops/src/skiros_integration/skiros_rl_client.py", line 193, in next_callback\n    self.task_id = self.agent.tick_once(skill_list=[skill], param_list=params_des)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 113, in tick_once\n    return self.execute(execution_id, skill_list, srvs.SkillCommandRequest().TICK_ONCE, param_list)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 101, in execute\n    res = self.call(self._skill_exe_client, msg)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 199, in call\n    resp1 = service(msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 442, in __call__\n    return self.call(*args, **kwds)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 519, in call\n    transport.send_message(request, self.seq)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 673, in send_message\n    self.write_data(self.write_buff.getvalue())\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 698, in write_data\n    raise TransportTerminated(str(ioe_errno)+msg)\n', 'TransportTerminated: 32Broken pipe\n']

1603701395.707558000 ERROR /blackdrops_1603679732000701267 [/tmp/binarydeb/ros-melodic-roscpp-1.14.9/src/libros/service_server_link.cpp:378(ServiceServerLink::call)] [topics: /rosout] Service call failed: service [/skiros_worker_4] responded with an error: error processing request: 32Broken pipe

1603701395.708432912 ERROR /worker_manager [tcpros_service.py:581(Service.error_handler)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] Error processing request: 'NoneType' object has no attribute 'close'
['Traceback (most recent call last):\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 632, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/blackdrops/src/skiros_integration/skiros_rl_client.py", line 193, in next_callback\n    self.task_id = self.agent.tick_once(skill_list=[skill], param_list=params_des)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 113, in tick_once\n    return self.execute(execution_id, skill_list, srvs.SkillCommandRequest().TICK_ONCE, param_list)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 101, in execute\n    res = self.call(self._skill_exe_client, msg)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 199, in call\n    resp1 = service(msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 442, in __call__\n    return self.call(*args, **kwds)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 535, in call\n    transport.close()\n', '  File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 846, in close\n    self.socket.close()\n', "AttributeError: 'NoneType' object has no attribute 'close'\n"]
1603701395.709977804 ERROR /blackdrops_1603679732000701267 [/tmp/binarydeb/ros-melodic-roscpp-1.14.9/src/libros/service_server_link.cpp:378(ServiceServerLink::call)] [topics: /rosout] Service call failed: service [/skiros_worker_4] responded with an error: error processing request: 'NoneType' object has no attribute 'close'
1603701395.761034965 INFO /worker_manager [skiros_rl_client.py:181(SkirosRlClient.next_callback)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] 4: Starting new task

h_robot_1/tick_rate, /bh_robot_2/set_debug] 4: Starting new task
1603701395.771452903 ERROR /worker_manager [tcpros_service.py:581(Service.error_handler)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] Error processing request: 'NoneType' object has no attribute 'tell'
['Traceback (most recent call last):\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 632, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/blackdrops/src/skiros_integration/skiros_rl_client.py", line 193, in next_callback\n    self.task_id = self.agent.tick_once(skill_list=[skill], param_list=params_des)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 113, in tick_once\n    return self.execute(execution_id, skill_list, srvs.SkillCommandRequest().TICK_ONCE, param_list)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 101, in execute\n    res = self.call(self._skill_exe_client, msg)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 199, in call\n    resp1 = service(msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 442, in __call__\n    return self.call(*args, **kwds)\n', '  File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 519, in call\n    transport.send_message(request, self.seq)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 672, in send_message\n    serialize_message(self.write_buff, seq, msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/msg.py", line 139, in serialize_message\n    start = b.tell()\n', "AttributeError: 'NoneType' object has no attribute 'tell'\n"]

My current workaround is this patch:

--- a/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py                                                                                                                                      
+++ b/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py                                                                                                                                      
@@ -84,6 +84,16 @@ class SkillManagerInterface:                                                                                                                                                           
     def get_skill(self, name):                                                                                                                                                                           
         return self._skill_list[name]                                                                                                                                                                    
                                                                                                                                                                                                          
+    # Hack: This is a workaround because the connection breaks occasionally                                                                                                                              
+    def call_execute_client(self, msg):                                                                                                                                                                  
+        try:                                                                                                                                                                                             
+            resp1 = self._skill_exe_client(msg)                                                                                                                                                          
+            return resp1                                                                                                                                                                                 
+        except rospy.ServiceException as e:                                                                                                                                                              
+            log.error("[call]", "Service call failed: %s. Renewing connection."%e)                                                                                                                       
+            self._skill_exe_client = rospy.ServiceProxy(self._skill_mgr_name + '/command', srvs.SkillCommand)                                                                                            
+            return                                                                                                                                                                                       
+                                                                                                                                                                                                         
     def execute(self, execution_id=-1, skill_list=None, action=srvs.SkillCommandRequest().START, param_list=None):                                                                                       
         """                                                                                                                                                                                              
         @brief Execute a list of skills                                                                                                                                                                  
@@ -98,7 +108,7 @@ class SkillManagerInterface:                                                                                                                                                           
         if skill_list is not None:                                                                                                                                                                       
             for s in skill_list:                                                                                                                                                                         
                 msg.skills.append(s.toMsg())                                                                                                                                                             
-        res = self.call(self._skill_exe_client, msg)                                                                                                                                                     
+        res = self.call_execute_client(msg)                                                                                                                                                              
         if res is None:                                                                                                                                                                                  
             return -1                                                                                                                                                                                    
         if not res.ok:   

Since the call function does not easily have the information available to re-instantiate the connection.

@matthias-mayr
Copy link
Member Author

This issue does not appear for us anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant