[replicator_node] Dies if datacentre is not starting fast enough #52

Closed
cdondrup opened this Issue May 14, 2014 · 9 comments

cdondrup commented May 14, 2014

It seems the replicator node dies if the datacentre is not yet launched:

Traceback (most recent call last):
  File "/opt/strands/strands_hydro_ws/src/ros_datacentre/ros_datacentre/scripts/replicator_node.py", line 138, in <module>
    store = Replicator()
  File "/opt/strands/strands_hydro_ws/src/ros_datacentre/ros_datacentre/scripts/replicator_node.py", line 26, in __init__
    raise Exception("No master datacentre found using datacentre_host and datacentre_port")
Exception: No master datacentre found using datacentre_host and datacentre_port

If possible, it should wait for the datacentre to come up, I guess.
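A minimal sketch of what "wait" could look like, assuming the datacentre is reachable as a plain TCP port (the MongoDB server). `wait_for_tcp_port` is a hypothetical helper, not something that exists in ros_datacentre, and it is written for Python 3:

```python
import socket
import time


def wait_for_tcp_port(host, port, timeout=30.0, poll_interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of ros_datacentre.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
```

The node could call something like this before it decides to raise, so a slow datacentre start only delays the replicator instead of killing it.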

cdondrup added the bug label May 14, 2014

hawesie commented May 14, 2014

This was fixed in #51 so please update your datacentre.

cdondrup commented May 14, 2014

Did it today. Just tried again:

strands@linda:/opt/strands/strands_hydro_ws/src$ wstool update ros_datacentre
[ros_datacentre] Updating /opt/strands/strands_hydro_ws/src/ros_datacentre
[ros_datacentre] Done.

No updates.

hawesie commented May 14, 2014

cdondrup commented May 14, 2014

Just checked. In our version these lines are exactly the same. Don't know why it fails then.

hawesie commented May 14, 2014

This just happened to me once, but now I can’t recreate it! Odd.

hawesie commented May 14, 2014

Is there something wrong with the logic?

cdondrup commented May 16, 2014

Seems to happen very reliably after restarting the robot:

process[mongo_server-2]: started with pid [7408]
process[config_manager-3]: started with pid [7415]
process[message_store-4]: started with pid [7416]
process[replicator_node-5]: started with pid [7423]
[INFO] [WallTime: 1400238474.752977] Mongo server address: localhost:62345
[INFO] [WallTime: 1400238475.394581] Found MongoDB version 2.0.4
[INFO] [WallTime: 1400238475.428222] Fri May 16 12:07:55 [initandlisten] MongoDB starting : pid=7477 port=62345 dbpath=/opt/strands/ros_datacentre 64-bit host=linda
[INFO] [WallTime: 1400238475.429340] Fri May 16 12:07:55 [initandlisten] db version v2.0.4, pdfile version 4.5
[INFO] [WallTime: 1400238475.430047] Fri May 16 12:07:55 [initandlisten] git version: nogitversion
[INFO] [WallTime: 1400238475.430838] Fri May 16 12:07:55 [initandlisten] build info: Linux lamiak 2.6.42-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 BOOST_LIB_VERSION=1_46_1
[INFO] [WallTime: 1400238475.431534] Fri May 16 12:07:55 [initandlisten] options: { dbpath: "/opt/strands/ros_datacentre", port: 62345 }
[INFO] [WallTime: 1400238475.585136] Fri May 16 12:07:55 [initandlisten] journal dir=/opt/strands/ros_datacentre/journal
[INFO] [WallTime: 1400238475.599924] Fri May 16 12:07:55 [initandlisten] recover : no journal files present, no recovery needed
[WARN] [WallTime: 1400238475.699624] Could not connect to master datacentre at localhost:62345
Traceback (most recent call last):
  File "/opt/strands/strands_hydro_ws/src/ros_datacentre/ros_datacentre/scripts/replicator_node.py", line 138, in <module>
    store = Replicator()
  File "/opt/strands/strands_hydro_ws/src/ros_datacentre/ros_datacentre/scripts/replicator_node.py", line 26, in __init__
    raise Exception("No master datacentre found using datacentre_host and datacentre_port")
Exception: No master datacentre found using datacentre_host and datacentre_port
[INFO] [WallTime: 1400238475.806865] Fri May 16 12:07:55 [initandlisten] waiting for connections on port 62345
[INFO] [WallTime: 1400238475.807705] Fri May 16 12:07:55 [websvr] admin web console waiting for connections on port 63345
[INFO] [WallTime: 1400238475.850584] Fri May 16 12:07:55 [initandlisten] connection accepted from 127.0.0.1:47969 #1
[INFO] [WallTime: 1400238475.904445] Fri May 16 12:07:55 [initandlisten] connection accepted from 127.0.0.1:47972 #2
[INFO] [WallTime: 1400238475.907097] Querying content in a futher 0 datacentres
[INFO] [WallTime: 1400238475.927005] Found default parameter file twitter_params.yaml
[replicator_node-5] process has died [pid 7423, exit code 1, cmd /opt/strands/strands_hydro_ws/src/ros_datacentre/ros_datacentre/scripts/replicator_node.py __name:=replicator_node __log:=/localhome/strands/.ros/log/4a3df80a-dcea-11e3-bc0d-00032d225887/replicator_node-5.log].
log file: /localhome/strands/.ros/log/4a3df80a-dcea-11e3-bc0d-00032d225887/replicator_node-5*.log
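
For reference, the gap between the failed connect and mongod accepting connections can be read off the WallTime stamps in the output above. A small sketch that does the arithmetic on the two relevant lines (the stamps are when the lines were logged, so treat the result as approximate; `walltime` is a helper made up for this example):

```python
import re

WALLTIME = re.compile(r"\[WallTime: (\d+\.\d+)\]")


def walltime(line):
    """Extract the WallTime stamp (in seconds) from a rosout-style log line."""
    match = WALLTIME.search(line)
    if match is None:
        raise ValueError("no WallTime stamp in line")
    return float(match.group(1))


# The two lines are copied from the roslaunch output above.
warn = "[WARN] [WallTime: 1400238475.699624] Could not connect to master datacentre at localhost:62345"
ready = "[INFO] [WallTime: 1400238475.806865] Fri May 16 12:07:55 [initandlisten] waiting for connections on port 62345"

gap_ms = (walltime(ready) - walltime(warn)) * 1000.0
print("mongod was listening about %.0f ms after the failed connect" % gap_ms)
```

So the replicator gave up roughly 107 ms before the port was open, which fits a start-up race rather than a wrong host or port.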

cdondrup commented May 16, 2014

Here is the log if that helps:

cat replicator_node-5.log 
[rospy.client][INFO] 2014-05-16 12:07:54,668: init_node, name[/replicator_node], pid[7423]
[xmlrpc][INFO] 2014-05-16 12:07:54,669: XML-RPC server binding to 0.0.0.0:0
[xmlrpc][INFO] 2014-05-16 12:07:54,669: Started XML-RPC server [http://linda:48122/]
[rospy.init][INFO] 2014-05-16 12:07:54,670: ROS Slave URI: [http://linda:48122/]
[rospy.impl.masterslave][INFO] 2014-05-16 12:07:54,670: _ready: http://linda:48122/
[rospy.registration][INFO] 2014-05-16 12:07:54,671: Registering with master node http://linda:11311
[xmlrpc][INFO] 2014-05-16 12:07:54,671: xml rpc node: starting XML-RPC server
[rospy.init][INFO] 2014-05-16 12:07:54,771: registered with master
[rospy.rosout][INFO] 2014-05-16 12:07:54,771: initializing /rosout core topic
[rospy.rosout][INFO] 2014-05-16 12:07:54,775: connected to core topic /rosout
[rospy.simtime][INFO] 2014-05-16 12:07:54,776: /use_sim_time is not set, will not subscribe to simulated time [/clock] topic
[rospy.internal][INFO] 2014-05-16 12:07:54,978: topic[/rosout] adding connection to [/rosout], count 0
[rosout][WARNING] 2014-05-16 12:07:55,699: Could not connect to master datacentre at localhost:62345
[rospy.core][INFO] 2014-05-16 12:07:55,716: signal_shutdown [atexit]
[rospy.internal][INFO] 2014-05-16 12:07:55,720: topic[/rosout] removing connection to /rosout
[rospy.impl.masterslave][INFO] 2014-05-16 12:07:55,721: atexit

hawesie commented May 16, 2014

My guess is that the ROS params are not set in a blocking call, so they are not yet available on the parameter server by the time the topic we wait on comes up.
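If that guess is right, one possible workaround is to poll for the params before reading them. A rough sketch, assuming the two param names from the exception message; `wait_for_params` is a hypothetical helper, and the lookup function is passed in so it can be tried without a running ROS master (written for Python 3):

```python
import time


def wait_for_params(names, has_param, timeout=10.0, poll_interval=0.1):
    """Poll until has_param(name) is true for every name.

    Returns the list of names still missing when the timeout expires
    (an empty list means everything was found). Hypothetical helper,
    not part of ros_datacentre.
    """
    deadline = time.monotonic() + timeout
    missing = list(names)
    while True:
        # Keep only the names the lookup still cannot find.
        missing = [name for name in missing if not has_param(name)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll_interval)
```

In the node this might be called as `wait_for_params(["datacentre_host", "datacentre_port"], rospy.has_param)`, raising only if the returned list is non-empty.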
