This repository has been archived by the owner on Nov 13, 2017. It is now read-only.

Potential fix for current robot state incorrect #407

Closed
wants to merge 3 commits

Conversation

sachinchitta
Contributor

Updates should always happen here (they are throttled in current_state_monitor anyway).

This should fix #405. The issue was the following:

(1) current_state_monitor does not pass updates on if none of the joints have changed values; the industrial simulator publishes updates at 10 Hz.
(2) planning_scene_monitor does not process a state update unless it arrives at least dt_state_update_ (default 0.1 s) after the previously processed one. As a result, the last state update never gets processed when running against the industrial robot simulator.
(3) Changing dt_state_update_ is not right either, since planning_scene_monitor can still miss some updates.

The solution is to not use dt_state_update_ but instead do an update in planning_scene_monitor every time current_state_monitor updates. Note that this could potentially cause issues if current_state updates come in too fast.
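The throttling interaction described above can be sketched in a few lines. This is a hypothetical sketch; `ThrottledMonitor`, `onStateUpdate`, and `simulateThrottle` are illustrative names, not the actual MoveIt code:

```cpp
#include <vector>

// Sketch of planning_scene_monitor's throttle: any state update arriving
// less than dt seconds after the previously *processed* one is dropped.
struct ThrottledMonitor {
  double dt;                       // corresponds to dt_state_update_
  double last_processed = -1e9;
  std::vector<double> processed;   // timestamps of updates that got through

  void onStateUpdate(double stamp) {
    if (stamp - last_processed < dt)
      return;                      // update silently dropped
    last_processed = stamp;
    processed.push_back(stamp);
  }
};

// Feed a sequence of update timestamps through the throttle.
inline std::vector<double> simulateThrottle(const std::vector<double>& stamps,
                                            double dt) {
  ThrottledMonitor m{dt};
  for (double s : stamps)
    m.onStateUpdate(s);
  return m.processed;
}
```

With updates at roughly 10 Hz and dt = 0.1 s, a little jitter makes some update arrive "too soon" and get dropped; if it happens to be the final update before the robot stops moving, no later update ever pushes the true final state through.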

@shaun-edwards

Fixes the issue on my system. +1 @sachinchitta , I owe you a beer.

@isucan
Contributor

isucan commented Jan 24, 2014

This change will unfortunately also trigger updating the state many times a second, depending on how often /joint_state comes in. This means that if move_group is publishing to the rviz plugin, you will also get diffs. If an octomap is being maintained, the planning scene diff will include the octomap. We do not support diffs for the octomap, so we will end up pushing it over the wire very often. This will probably cause issues, depending on computer & network speed. The nice thing would be to actually implement diffs for the octomap, but I think we should throttle state updates in any case. I will take a look at this soon.

@sachinchitta
Contributor Author

Updating the current state properly is important - what was happening before was not correct. Throttling it is fine, but not updating is not. The other option is to add a delay in current_state_monitor (maybe 1 s) before updates stop even when the state is not changing. That way we can be sure the last few state updates have gone through.

@gavanderhoorn

gavanderhoorn commented Jan 24, 2014

Fixes it for me as well (though I don't use octomaps).

@sachinchitta: just out of curiosity though: what is the difference between plan then execute, and just hitting plan & execute that causes the latter to 'work' without this change? Update of the robot state in the RViz window is much smoother in that case as well (with just execute I only see the second-to-last and the last state update).

@acornacorn

Another solution would be to create a timer callback which fires every dt_state_update_ sec (i.e. every 0.1 sec) and checks whether there is an unprocessed state update: any update that arrived more than dt_state_update_ sec earlier and has not been processed would be processed then. This would mean storing (a copy of) any state update that arrives and is not processed immediately.

Perhaps an even better way of doing it is to store any state update that arrives, and then in the timer callback process the most recent one (if any).
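The second variant could look something like this. A hypothetical sketch only; `PendingUpdateBuffer` and its methods are illustrative names, not MoveIt API:

```cpp
#include <optional>

// Keep only the most recent unprocessed state update; a periodic timer
// (firing every dt_state_update_ seconds) would call processPending().
template <typename State>
class PendingUpdateBuffer {
public:
  // Called from the joint-state callback: overwrite any older pending update.
  void store(const State& s) { pending_ = s; }

  // Called from the timer callback: process the most recent update, if any.
  // Returns true if an update was processed.
  template <typename Fn>
  bool processPending(Fn&& process) {
    if (!pending_)
      return false;
    process(*pending_);
    pending_.reset();
    return true;
  }

private:
  std::optional<State> pending_;
};

// Tiny self-check: of two rapid updates, only the latest is processed.
inline int demoLastValue() {
  PendingUpdateBuffer<int> buf;
  buf.store(1);
  buf.store(2);
  int got = 0;
  buf.processPending([&](int v) { got = v; });
  return got;
}
```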

@isucan
Contributor

isucan commented Jan 24, 2014

@acornacorn I agree. I actually started on a fix along these lines: I added an optional period to addUpdateCallback() (defaulting to 0). When we get updates, we just cache the shared ptr to the message, so copying is not really an issue. The problem is that while we have an unprocessed update, the thread that calls the callbacks busy-waits until the last updater (the one with the longest period) has received its state. If we have an updater that expects updates every second, this is not nice. You could wait the minimum duration instead, but if that is 0, we are back to a busy wait loop. What I think is the better approach is to create an updater thread for every requested update callback. The only thing to keep track of then is making sure all threads got the new joint_state; it seems we will need a counter for that. How does this sound to you? I can work on it this weekend unless somebody else fixes it :)

@acornacorn

@isucan I'm not following the reason we need to busy wait -- where is that needed?

@isucan
Contributor

isucan commented Jan 24, 2014

@acornacorn I am not sure I have the right design in mind yet :) Maybe we do not. Here is what I think the version with one update thread is:

  • when we get an update with a new state, we send it to the updating thread, which calls the callbacks. We use a condition variable to trigger this.
  • the update thread keeps track of the last update for each updater and the period at which the updater wants to be updated, and decides whether to call the updater. If it does not call all updaters (because one needs to wait longer), we have no guarantee our condition will be triggered again from the outside, and we cannot discard the update we now have, in case we do not get another. So we keep it around until it is time to update the slow callback. But we cannot just sleep, in case other updates arrive and there are updaters that want those updates at a higher rate. In the worst case (0 period for one updater, a high period for another) this essentially ends up being a busy wait. This is why I think we should have one thread per updater, to avoid this issue.

Maybe there is a cleaner solution?

@acornacorn

OK, so we are using a condition variable to trigger updates in the update thread. My suggestion is:

  • keep a variable "timeout" that is initially set to a large value (e.g. 1000 seconds)
  • the thread waits on the condition variable with "timeout" as its timeout
  • when the condition variable wakes up:
    • set timeout to 1000 sec
    • loop through all pending updates of all types
      • if a new update has arrived and it is time to execute it: do so. Discard any pending updates.
      • else if a new update has arrived but it is not time to execute it: save it and set timeout = min(timeout, time_until_this_update_should_be_executed)
      • else if an update is pending and it is time to execute it: do so. Discard it.
      • else if an update is pending and it is not time to execute it yet: save it and set timeout = min(timeout, time_until_this_update_should_be_executed)

With this scheme:

  • we never busy wait
  • we process new updates as soon as they arrive if they are infrequent
  • updates that arrive too frequently are processed at the desired rate
  • If an update arrives frequently, but then stops arriving, we do not "lose" the last one. We process it as soon as possible without processing it more frequently than we want to.

Does that make sense?
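One wake-up pass of this scheme can be sketched without real threading; the names here (`Pending`, `processPass`, the 1000 s sentinel) are illustrative only, and the real loop would wrap this in a `condition_variable::wait_for` using the returned timeout:

```cpp
#include <algorithm>
#include <vector>

// A state update waiting to be delivered to one updater.
struct Pending {
  double arrived;    // when the update arrived
  double period;     // the updater's minimum period (dt_state_update_)
  bool done = false; // set once the update has been executed
};

// Time remaining until pending update `p` becomes due, measured from `now`.
inline double timeUntilDue(const Pending& p, double now) {
  return (p.arrived + p.period) - now;
}

// One pass of the wake-up loop: execute due updates, and return the next
// wait_for timeout (a large sentinel, 1000 s, when nothing is pending).
inline double processPass(std::vector<Pending>& pending, double now) {
  double timeout = 1000.0;
  for (Pending& p : pending) {
    if (p.done)
      continue;
    double wait = timeUntilDue(p, now);
    if (wait <= 0.0)
      p.done = true;                      // due: execute and discard
    else
      timeout = std::min(timeout, wait);  // not due yet: shorten the wait
  }
  return timeout;
}

// Tiny self-check helper: next timeout for a single pending update.
inline double demoTimeout(double arrived, double period, double now) {
  std::vector<Pending> v{{arrived, period}};
  return processPass(v, now);
}
```

Because the timeout is recomputed as the minimum time-until-due across pending updates, the thread sleeps exactly as long as it is allowed to, which is what removes the busy wait.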

@acornacorn

Discussed with Sachin. A much simpler solution is to just use the ros callback queue like we are doing now. It would work like this:

  • update messages cause callbacks on the (single) callback thread.
  • create a periodic timer that fires at the update rate (once every dt_state_update_ sec)
    • this timer runs all the time
  • When an update arrives and it has been a long time, execute it immediately
  • When an update arrives too soon after the last update, save it
  • If a second update arrives, still too soon, combine it with the first update
  • When the timer callback is called, check for pending updates and process them

This means that the timer is running all the time. But the callback will be very quick if no updates are pending. And this is a much simpler change than creating a whole new mechanism with condition variables.
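The merge-and-flush behaviour sketched in the list above could look roughly like this (hypothetical names, plain timestamps instead of ROS time; "combine" here simply keeps the newest state):

```cpp
#include <optional>
#include <vector>

// Updates arriving too soon after the last processed one are merged into a
// single pending update; a periodic timer (period dt) eventually flushes it.
template <typename State>
class DeferredUpdater {
public:
  explicit DeferredUpdater(double dt) : dt_(dt) {}

  // Joint-state callback: process immediately if enough time has passed,
  // otherwise merge into the pending update (here: keep the newest state).
  void onUpdate(double now, const State& s) {
    if (now - last_ >= dt_)
      apply(now, s);
    else
      pending_ = s;
  }

  // Periodic timer callback (fires every dt): flush any pending update.
  void onTimer(double now) {
    if (pending_ && now - last_ >= dt_) {
      apply(now, *pending_);
      pending_.reset();
    }
  }

  const std::vector<State>& processed() const { return processed_; }

private:
  void apply(double now, const State& s) {
    last_ = now;
    processed_.push_back(s);
  }
  double dt_;
  double last_ = -1e9;
  std::optional<State> pending_;
  std::vector<State> processed_;
};

// Tiny self-check: two rapid updates merge; the timer flushes the newest.
inline std::vector<int> demoDeferred() {
  DeferredUpdater<int> u(0.1);
  u.onUpdate(0.00, 1);  // processed immediately
  u.onUpdate(0.03, 2);  // too soon: saved as pending
  u.onUpdate(0.06, 3);  // still too soon: merged (replaces 2)
  u.onTimer(0.10);      // timer flushes the pending update (3)
  return u.processed();
}
```

Note how the last update is never lost: even if no further messages arrive, the always-running timer delivers the pending state on its next tick.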

@isucan
Contributor

isucan commented Jan 24, 2014

Hey @acornacorn, I really like your first suggestion.
I am not sure how the timers would work. What if I have a callback that has an update rate of 0? That would make the thread created by the timer essentially a busy wait loop. Or am I missing something? We also have no guarantee there is a single callback thread. That is true of move_group, but if the user constructs a spinner with more threads, that assumption breaks.
Another consideration is that we use condition variables in many places, and for consistency reasons, I'd rather use that instead of timers.

@isucan
Contributor

isucan commented Jan 24, 2014

Hmm. That timeout 0 thing affects all suggestions so far I think :( I was looking at the timer execution thread:

http://docs.ros.org/hydro/api/roscpp/html/timer_8cpp_source.html
http://docs.ros.org/hydro/api/roscpp/html/timer__manager_8h_source.html

The main issue is that we do not want the timer to run at the minimum dt_update_time_, because that could be 0; we want to trigger logic only when a message comes in.

@acornacorn

My first suggested solution

should work fine with a dt_update_time_ of 0. When dt_update_time_ is 0, the incoming message will always trigger the execution right away and the "timeout" variable will never be set (it will always be 1000 sec or whatever we set it to at the start of the loop).

My second suggested solution

should also work fine. If dt_update_time_ is 0 then don't start the timer at all (or start it and set it to a very long period like 1000 sec). The updates will always be processed immediately when they arrive and the timer callback will never be needed. If there are several update messages and they all have different dt_update_time_ values then start the timer and set its period to the smallest nonzero dt_update_time_ value. Again, messages with a dt_update_time_ of 0 will always be processed when they are first seen and never need the timer callback.

The second solution seems to be closer to the way it is implemented now. Am I missing something? Where are the condition variables now?

Regarding multiple threads: either of these suggestions should work fine with multiple threads. With multiple threads each message type would need its own mutex to ensure 2 threads do not merge or execute the same type of update at the same time.

@isucan
Contributor

isucan commented Jan 25, 2014

@acornacorn that sounds good. I think I understood something incorrectly from your first comment. Any of these solutions is fine by me!

Currently we use condition variables in robot_interaction, the rviz plugin, planning_scene_monitor, trajectory_execution, octomap updaters and pick & place.

@shaun-edwards

I just tested Sachin's fix with a real robot that did not start at the home position (i.e. not all zeros, as MoveIt and the ROS-I simulator assume). The scene robot does not match the current robot state on startup, and pressing "Update start state to current" does not update the scene robot as one would expect. There is a workaround that fixes the issue:

  1. Under the Planning tab, press "Plan". The robot will plan a simple path (i.e. the goal and start state both start off at zero).
  2. Once the planning step is complete, press "Update start state to current". The start state jumps to the current robot position.

I'm not sure if this falls under the same bug or not (it could be an initialization bug).

@sachinchitta
Contributor Author

This seems related - it does look like an initialization error where initial updates are either not getting through or are being missed completely. I do think Acorn's proposed fix should handle this initialization issue as well - we will try to get it done quickly this week.

@sachinchitta
Contributor Author

@shaun-edwards - can you try my hacked update to see if the initialization issue also gets fixed on your robot? I must admit I thought this would not work for a robot that starts at all zeros but would work for one that does not start there - yet your problem seems to be the other way around.

@shaun-edwards

@sachinchitta - I tried your latest change and I'm still having the same issue. Just to be sure, I verified that your latest code change was there and added a log message (it appears when I run the robot). The "scene robot" and "start state" are still stuck at the zero position.

Perhaps this is caused by a different issue. What is odd is that the state does NOT update eventually, even after a few seconds.

I'm happy to test, but let me know if you want me to look into this further. I could certainly try to narrow down the problem.

@sachinchitta
Contributor Author

I can see where it would fail to update with the simulator - current_state_monitor initializes robot_state to default values and then only does an update if the values have changed. If the simulator publishes default (zero) values on startup, the initial update won't get through until the robot moves. Not sure why this would happen with a robot where the initial state is different from the default state though - that update should just go through. Will keep digging.

@sachinchitta
Contributor Author

Ok, I think I have both issues figured out and a potential fix attached to this pull request now - it's not the fix that @acornacorn describes above, but it at least highlights the issues. There are two issues here:

(1) Some updates were not getting through - this is what we discussed above; my solution of not throttling state updates will fix this, but it might be better to use Acorn's approach.
(2) rviz comes up after move_group but does not get a planning scene update from move_group - this is the source of the initialization bug. The (temporary) fix is to make the planning scene diff publisher latched, so that once rviz comes up it gets the last scene update (which contains the last state update). It only gets the last scene update, though, so any earlier updates are missed. A better solution would be for a newly started node to first ask for the full planning scene and then listen to diffs.

@shaun-edwards, @k-okada let me know if the two fixes I have in here at least temporarily fix your issues.

@shaun-edwards

It has fixed the problem on my robot.

@shaun-edwards

@sachinchitta, any ETA on when the patch will be released into debians? (I realize @acornacorn is working on a more permanent fix.)

@acornacorn

I have an alternative fix in pull request #408

This works as I described above #407 (comment)

Basically a timer ensures that state updates that arrive and are not processed right away are eventually (within at most dt_state_update_*2) processed (e.g. if no other state updates arrive for a long time).

Note: the fix does not include a fix for the "Rviz starts late and misses the initial planning scene message" problem. That will be fixed in a separate pull request...

@isucan
Contributor

isucan commented Feb 3, 2014

I propose we close this pull request without merging in favour of #408.

@acornacorn

I have an additional fix for the planning scene initial state problem.

Pull request #409 modifies PlanningSceneDisplay so that it queries the full state of the planning scene when Rviz starts (actually when the PlanningSceneDisplay is enabled and the robot model is loaded).

Pull request #409 includes the changes in #408 so if we merge #409 then we can abandon #407 and #408 .

I have confirmed that this fixes both problems on my system.

@shaun-edwards can you confirm that #408 fixes the problems you were seeing?

@sachinchitta
Contributor Author

Closing this as it's now handled by #409.
