Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in camnode.py while calibrating #30

Closed
sdvillal opened this issue Feb 9, 2016 · 12 comments
Closed

Crash in camnode.py while calibrating #30

sdvillal opened this issue Feb 9, 2016 · 12 comments

Comments

@sdvillal
Copy link
Contributor

sdvillal commented Feb 9, 2016

@etiennecampione writes: "Hereafter is the error message I almost always get (let's say 95% of the time) while calibrating a Flycube. Can you indicate me what is going on? It's a really annoying and time-consuming problem".

[strawlab@flycube8 flycube8](master=)$ roslaunch ./flycube_only_flydra_calib.launch --screen
... logging to /home/strawlab/.ros/log/7433d1b0-cf1a-11e5-b183-bcee7bdac428/roslaunch-flycube8-5300.log
[...]
Exception in thread LogMessageSender:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/dist-packages/flydra/camnode.py", line 1041, in run
    self._main_brain.log_message(*log_message)
  File "/usr/lib/python2.7/dist-packages/flydra/camnode.py", line 197, in log_message
    self._log_message(req)
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_service.py", line 430, in __call__
    return self.call(*args, **kwds)
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_service.py", line 506, in call
    transport.send_message(request, self.seq)
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_base.py", line 590, in send_message
    self.write_buff.truncate(0)
AttributeError: 'NoneType' object has no attribute 'truncate'
[INFO] [WallTime: 1455014842.446372] setting camera property expected_trigger_framerate=100.0
[FATAL] [WallTime: 1455014842.446997] ERROR: thread LogMessageSender died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.497085] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.537913] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.578685] ERROR: thread cam_processor.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.619367] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[INFO] [WallTime: 1455014842.628120] triggerbox_client: waiting for clockmodel estimate
[FATAL] [WallTime: 1455014842.629951] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630143] ERROR: thread cam_processor.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630313] ERROR: thread LogMessageSender died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630480] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630644] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630809] ERROR: thread cam_processor.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.631007] ERROR: thread LogMessageSender died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.631231] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.631436] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[INFO] [WallTime: 1455014843.128506] triggerbox_client: waiting for clockmodel estimate
[flydra_camera_node-4] process has died [pid 5341, exit code 1].
log files: /home/strawlab/.ros/log/7433d1b0-cf1a-11e5-b183-bcee7bdac428/flydra_camera_node-4*.log

How does the transport die? Is it possible that the logging service (and therefore the transport) is closed in one camera LogMessageThread thread while there are still messages to log in some other camera thread, and so we would need to add sync code? Or could it just be a network problem? Surely @astraw will know better about this.

I'm saving the full logs of that run here; they include a few other stacktraces.

  /mnt/strawscience/santi/strawlab/calibration-transport-not-open-logs
@nzjrs
Copy link
Contributor

nzjrs commented Feb 9, 2016

If you updated to 0.6.8 flydra then you will also need to update ros_flydra

@astraw
Copy link
Member

astraw commented Feb 9, 2016

@etiennecampione can you confirm this is new (did not happen before)?

Also: is this with the version of flydra which I just packaged and released a couple weeks ago? (What is the output of python -c "import flydra.version; print flydra.version.__version__"?) Is this on a specific flycube?

@etiennecampione
Copy link

The version 0.6.8 of flydra is installed (it is the output of "python -c "import flydra.version; print flydra.version.version"") and ros_flydra is updated ("git pull --ff-only") and re-built ("make clean", then "make").
I confirm that I did not have this problem before. I currently see it on both Flycubes 3 and 8 which would need a new calibration.

@astraw
Copy link
Member

astraw commented Feb 9, 2016

Yes, then update to git master of ros_flydra. Something like

roscd ros_flydra
git fetch
git merge origin/master

Please close this bug if that solves it for you.

@etiennecampione
Copy link

@astraw: I already was on the last commit of the ros_flydra repository, so these command lines did not change anything and alas could not solve the problem.
The last calibration I made was on the 17th December 2015, and I did not notice this problem at this time. So I fear it came with the new version of flydra.

@astraw
Copy link
Member

astraw commented Feb 9, 2016

Oh, yeah, you also have to make the latest ros_flydra

rosmake -s ros_flydra

Did you do that, too?

@nzjrs
Copy link
Contributor

nzjrs commented Feb 9, 2016

Relevant logs on the master side:

[rosout][INFO] 2016-02-09 11:47:22,196: received log message from Basler_21418139: excellent, grab thread running in maximum prioity mode
[rosout][INFO] 2016-02-09 11:47:22,202: received log message from Basler_21425980: using cam_iface driver: mega (wrapper: ctypes)
[rosout][INFO] 2016-02-09 11:47:22,210: received log message from Basler_21425975: using cam_iface driver: mega (wrapper: ctypes)
[rosout][INFO] 2016-02-09 11:47:22,211: received log message from Basler_21425975: excellent, grab thread running in maximum prioity mode
[rosout][INFO] 2016-02-09 11:47:22,214: received log message from Basler_21418139: using cam_iface driver: mega (wrapper: ctypes)
[rosout][INFO] 2016-02-09 11:47:22,421: received log message from Basler_21418139:   Basler_21418139 frames apparently skipped: 56 (168 vs 111)
[rosout][INFO] 2016-02-09 11:47:22,421: received log message from Basler_21425990:   Basler_21425990 frames apparently skipped: 98 (169 vs 70)
[rosout][INFO] 2016-02-09 11:47:22,422: received log message from Basler_21418139:   Basler_21418139 frames apparently skipped: 50 (111 vs 60)
[rosout][INFO] 2016-02-09 11:47:22,429: received log message from Basler_21425975:   Basler_21425975 frames apparently skipped: 48 (110 vs 61)
[rosout][INFO] 2016-02-09 11:47:22,627: triggerbox_client: waiting for clockmodel estimate
[rosout][INFO] 2016-02-09 11:47:23,128: triggerbox_client: waiting for clockmodel estimate
[rosout][INFO] 2016-02-09 11:47:23,628: triggerbox_client: got clockmodel estimate
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21425990 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21425985 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21425975 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21418139 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,630: Basler_21425980 (re)synchronized
[rosout][WARNING] 2016-02-09 11:48:23,213: unknown Exception receiving timestamp echo data: timed out
[rospy.core][INFO] 2016-02-09 11:48:23,653: signal_shutdown [coordinate processor thread died]
[rospy.topics][ERROR] 2016-02-09 11:48:23,664: Traceback (most recent call last):
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/topics.py", line 245, in close
    c.close()
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_base.py", line 761, in close
    self.socket.close()
AttributeError: 'NoneType' object has no attribute 'close'

[rospy.topics][ERROR] 2016-02-09 11:48:23,664: Traceback (most recent call last):
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/topics.py", line 245, in close
    c.close()
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_base.py", line 761, in close
    self.socket.close()
AttributeError: 'NoneType' object has no attribute 'close'

[rospy.impl.masterslave][INFO] 2016-02-09 11:48:23,664: coordinate processor thread died
[rosout][INFO] 2016-02-09 11:48:24,279: mainbrain finished
[rosout][WARNING] 2016-02-09 11:48:26,779: cameras failed to quit cleanly: ['Basler_21425980', 'Basler_21425975', 'Basler_21425990', 'Basler_21418139', 'Basler_21425985']
[rosout][WARNING] 2016-02-09 11:48:29,280: cameras failed to quit cleanly: ['Basler_21425980', 'Basler_21425975', 'Basler_21425990', 'Basler_21418139', 'Basler_21425985']
[rospy.core][INFO] 2016-02-09 11:48:29,280: signal_shutdown [atexit]

It receives some log messages, and then the coordprocessor thread dies.

At first I thought the logging code I added was responsible, but now that I see it revieves some log messages I am not sure. Thoughts?

@etiennecampione
Copy link

@astraw: I ran rosmake -s ros_flydra, this alas didn't fix the problem.

@astraw
Copy link
Member

astraw commented Feb 9, 2016

@sdvillal can you chmod -R g+rx /mnt/strawscience/santi/strawlab/calibration-transport-not-open-logs

@nzjrs
Copy link
Contributor

nzjrs commented Feb 9, 2016

done, FWIW you can do all permission fixups via strawcore

root@strawcore:~# ./fix-isilon-perms.sh /mnt/strawscience/santi/strawlab/calibration-transport-not-open-logs

@sdvillal
Copy link
Contributor Author

sdvillal commented Feb 9, 2016

Yeah, since some time already permissions in strawscience are sometimes broken even when creating files with permissions properly configured.

@nzjrs nzjrs changed the title Race in camnode.py while calibrating Crash in camnode.py while calibrating Feb 10, 2016
@nzjrs
Copy link
Contributor

nzjrs commented Feb 10, 2016

I just pushed 478e6ab which stops the crashes.

etienne is still unable to detect points, but I think that is unrelated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants