
Filter out transactions older than the lag window #173

Merged 8 commits on Apr 25, 2020

Conversation

svwilliams (Contributor):

There is a bug where old sensor data is getting added to the fixed-lag smoother. This PR fixes that by correctly filtering out old sensor transactions before they are inserted into the graph. For thread-synchronization reasons, this is done inside the optimization thread rather than in the ROS callback thread.

Review comment on fuse_optimizers/src/fixed_lag_smoother.cpp (outdated):
@@ -338,7 +342,20 @@ void FixedLagSmoother::processQueue(fuse_core::Transaction& transaction)
  while (transaction_riter != pending_transactions_.rend())
  {
    auto& element = *transaction_riter;
    if (std::find(sensor_blacklist.begin(), sensor_blacklist.end(), element.sensor_name) != sensor_blacklist.end())
    auto min_stamp = element.stamp();
    if (!element.transaction->involvedStamps().empty() && *element.transaction->involvedStamps().begin() < min_stamp)
efernandez (Collaborator):

I wonder what the relation is between the transaction stamp() and the involvedStamps(). IMHO it's probably better to make the transaction stamp() return *involved_stamps_.begin(), or ros::Time(0, 0) if involved_stamps_.empty(). I'm not sure there's a valid case where stamp() could or should differ from either of those two. 🤔

Why not check only the involved stamps? Would it matter if there's no involved stamp? I believe that case is already handled in:

const auto& stamps = transaction.involvedStamps();
if (stamps.empty())
{
return;
}

efernandez (Collaborator):

Maybe add this method to Transaction:

const ros::Time& Transaction::oldestStamp() const
{
  if (involved_stamps_.empty())
  {
    return stamp_;
  }
  else
  {
    return std::min(*involved_stamps_.begin(), stamp_);
  }
}

and this to the TransactionQueueElement:

const ros::Time& oldestStamp() const { return transaction->oldestStamp(); }

So you can do this here: const auto& min_stamp = element.oldestStamp();

If we get rid of stamp_, though, that could be further simplified, but I'm still not sure about that part. The question is: should the transaction stamp always be the oldest involved stamp? Or zero if there are no involved stamps, which is the current behaviour? 🤔

svwilliams (Contributor, Author):

I like the concept of the transaction having an "oldestStamp()" method. That cleans up this code nicely, and insulates it from future changes.

As for the Transaction stamp vs. involved_stamps, I'm honestly not sure what the correct answer is. In my mind, I want the transactions applied in the order they were measured/computed/etc. So in the case of relative poses, the pose1->pose2 measurement will occur at the pose2 timestamp. So the transaction stamp == pose2 stamp, but the involved stamps will contain {pose1 stamp, pose2 stamp}. But I'm honestly not sure the application order matters at all; it just felt like the thing to do at the time.

I guess one small difference will be transaction timeout. If we always use the oldest stamp, a transaction could timeout instantly. We could add a newestStamp() method to the transaction as well, and use the newest stamp for timeout purposes.
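
A sketch of what newestStamp() might look like, mirroring the oldestStamp() suggestion above (this assumes involved_stamps_ is an ordered set, as that suggestion does; it is not code from this PR):

const ros::Time& Transaction::newestStamp() const
{
  if (involved_stamps_.empty())
  {
    return stamp_;
  }
  else
  {
    // involved_stamps_ is ordered, so rbegin() points at the newest involved stamp
    return std::max(*involved_stamps_.rbegin(), stamp_);
  }
}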

efernandez (Collaborator):

> I guess one small difference will be transaction timeout. If we always use the oldest stamp, a transaction could timeout instantly. We could add a newestStamp() method to the transaction as well, and use the newest stamp for timeout purposes.

Yes, I added newestStamp() in #176, but the transaction timeout logic I implemented actually compares against the oldest stamp of each transaction.

I think you're right. The timeout logic should use the newestStamp() of all transactions. That would require more changes, since we can't rely on the pending_transactions_ ordering if it's based on the oldestStamp(), as I did in #176.

efernandez (Collaborator):

This should be good for now, for this PR. I think this can be merged as is, and we can continue the discussion and fix the remaining issues in #176.

efernandez (Collaborator) commented Apr 13, 2020:

After some recent PRs got merged, I've noticed that sometimes, after setting a pose, this WARN message is printed:

[INFO]: Received a set_pose request (stamp: 173.570000000, x: 1.4, y: 0.525, yaw: -0.0350735)
[WARN ]: Unable to locate a state in this history with stamp <= 173.540000000. Variables will all be initialized to 0.

I just tried with #172 and #173, but I still see that message. Not sure if these PRs are supposed to get rid of that or not though. 🤔

I'm also getting:

[ERROR]: The queued transaction with timestamp 2589.220000000 from sensor laser_odometry is not an ignition sensor transaction. This transaction will not be processed individually.

I'm starting to suspect there's something wrong in the ignition logic I changed recently. 🤔

Update

@svwilliams The issue is that for differential/relative constraints, the transaction stamp() is that of the newest variable:

transaction->stamp(msg->header.stamp);

Since the pending_transactions_ are ordered by transaction stamp() instead of the minimum/oldest involvedStamps():

// Add the new transaction to the pending set
// The pending set is arranged "smallest stamp last" to make popping off the back more efficient
auto comparator = [](const ros::Time& value, const TransactionQueueElement& element)
{
  return value >= element.stamp();
};
auto position = std::upper_bound(
  pending_transactions_.begin(),
  pending_transactions_.end(),
  transaction->stamp(),
  comparator);
pending_transactions_.insert(position, {sensor_name, std::move(transaction)});  // NOLINT

When an ignition transaction is received, the transactions with differential/relative constraints don't get purged, because the minimum/oldest involvedStamps() is not considered in that check.

I'm still not sure if we should use the minimum/oldest involvedStamps() always, for the transaction stamp(). We probably want the newest stamp in the pending_transactions_, but not when purging. Thoughts? 🤔
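
For illustration, ordering the queue by the oldest stamp instead would only mean swapping the key in the insertion code above (oldestStamp() here is the helper proposed earlier in this thread, not an existing Transaction method):

auto comparator = [](const ros::Time& value, const TransactionQueueElement& element)
{
  // Order by the oldest involved stamp rather than the transaction stamp
  return value >= element.oldestStamp();
};
auto position = std::upper_bound(
  pending_transactions_.begin(),
  pending_transactions_.end(),
  transaction->oldestStamp(),
  comparator);
pending_transactions_.insert(position, {sensor_name, std::move(transaction)});  // NOLINT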

This is definitely an issue, but I believe it's not related to any recent changes. It's probably always been an issue, but no warn/error messages were being printed in relation to it.

I've created #176 to address these issues.

@@ -204,9 +207,10 @@ void FixedLagSmoother::optimizationLoop()
    // Optimization is complete. Notify all the things about the graph changes.
    notify(std::move(new_transaction), graph_->clone());
    // Compute a transaction that marginalizes out those variables.
    lag_expiration = computeLagExpirationTime();
efernandez (Collaborator):

Why is this done here, and not before processQueue?

It seems to me that the lag_expiration should be computed before, so it's updated after a reset:

timestamp_tracking_.clear();

Maybe lag_expiration needs to be an attribute. 🤔

svwilliams (Contributor, Author):

So.....

A more straightforward processing regime would be:

  • Compute lag expiration
  • Compute marginal (This transaction will remove variables before the lag expiration)
  • Add the marginal transaction to the graph (This actually removes those old variables)
  • Add new transactions to the graph (As long as they don't involve old variables)
  • Optimize the graph
  • Notify publishers

However, that puts additional marginalization processing at the top of the loop. In an attempt to reduce publication latency, I shifted the above sequence by half a loop.
On loop 1:

  • Add new transactions to the graph
  • Optimize the graph
  • Notify publishers
  • Compute lag expiration
  • Compute marginal (Note that we do not apply the marginal here)

Then on loop 2+:

  • Add the marginal transaction to the graph (removing the old variables computed during the previous iteration)
  • Add new transactions to the graph (as long as they don't involve those old variables)
  • Optimize the graph
  • Notify publishers
  • Compute lag expiration
  • Compute the next marginal

In the above scheme, the marginalization processing happens after the publishers are notified about the new graph values, under the theory that we have some spare time before the next optimization loop is triggered.
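
In code, the shifted loop body looks roughly like this (a condensed sketch of optimizationLoop(), not the verbatim source; synchronization, error handling, and the reset logic are omitted):

while (ros::ok() && optimization_running_)
{
  // ... wait until the transaction callback signals that an optimization cycle is due ...
  auto new_transaction = fuse_core::Transaction::make_shared();
  processQueue(*new_transaction);
  // Merge in the marginal transaction computed at the end of the previous cycle
  new_transaction->merge(marginal_transaction_);
  graph_->update(*new_transaction);
  graph_->optimize();
  // Notify the publishers first...
  notify(std::move(new_transaction), graph_->clone());
  // ...then use the spare time before the next cycle to prepare the next marginal
  lag_expiration = computeLagExpirationTime();
  marginal_transaction_ = fuse_constraints::marginalizeVariables(
    ros::this_node::getName(),
    computeVariablesToMarginalize(lag_expiration),
    *graph_);
}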

I could apply the marginal transaction at the end of the loop without optimizing the graph. That would clean up the code a bit, but it would leave the graph object in a different state than what was sent to the publishers, etc. I think that would be fine. But that does not change where in the loop we compute the lag expiration time.

The question you raise is, can we do this instead:

  • Compute lag expiration
  • Add new transactions to the graph
  • Optimize the graph
  • Notify publishers
  • Compute marginal
  • Add the marginal transaction to the graph

Existing sequence, iteration 1:

  • Add t1, t2, t3 to the graph
  • Optimize and publish (graph contains t1, t2, t3)
  • expiration = t1
  • Compute marginal for t1

Existing sequence, iteration 2:

  • Apply marginal for t1
  • Add new after t1 (t3, t4, t5)
  • Optimize and publish (graph contains t2, t3, t4, t5)
  • expiration = t3
  • Compute marginal for t2, t3

Existing sequence, iteration 3:

  • Apply marginal for t2, t3
  • Add new after t3 (t6, t7, t8)
  • Optimize and publish (graph contains t4, t5, t6, t7, t8)
  • expiration = t6
  • Compute marginal for t4, t5, t6

Proposed sequence, iteration 1:

  • expiration = t0 (The graph is empty)
  • Add t1, t2, t3 to the graph
  • Optimize and publish (graph contains t1, t2, t3)
  • Compute marginal for t0 (which is empty)
  • Apply marginal for t0 (which is empty)

Proposed sequence, iteration 2:

  • expiration = t1 (Graph contains t1, t2, t3)
  • Add new after t1 (t3, t4, t5)
  • Optimize and publish (graph contains t1, t2, t3, t4, t5)
  • Compute marginal for t1
  • Apply marginal for t1 (Graph contains t2, t3, t4, t5)

Proposed sequence, iteration 3:

  • expiration = t3 (Graph contains t2, t3, t4, t5)
  • Add new after t3 (t6, t7, t8)
  • Optimize and publish (graph contains t2, t3, t4, t5, t6, t7, t8)
  • Compute marginal for t2, t3
  • Apply marginal for t2, t3 (Graph contains t4, t5, t6, t7, t8)

Assuming I did the above correctly, the proposed sequence ends each iteration with the same graph state, but results in more variables than needed during each optimization call. Unless there is a compelling reason to move that computation, I feel like the current sequence is correct.

But I agree there is a bug related to the reset function.

efernandez (Collaborator):

> Assuming I did the above correctly, the proposed sequence ends each iteration with the same graph state, but results in more variables than needed during each optimization call. Unless there is a compelling reason to move that computation, I feel like the current sequence is correct.

Thanks for the detailed explanation. LGTM.

> But I agree there is a bug related to the reset function.

Actually, it looks like even if the lag_expiration is not reset, the logic should still work, since the transactions' timestamps would still be newer than it. At least, in my tests with #176 it looks like everything was already fine and no change was required in this regard.

svwilliams (Contributor, Author):

Something bad could probably still happen if you reset a bagfile or something and time moves backwards. I'll put in a fix regardless; it should be reset for consistency.
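
A minimal sketch of that fix, assuming lag_expiration is promoted to a member variable as suggested above (the onReset() name and the lag_expiration_ member are illustrative, not the actual fuse API):

void FixedLagSmoother::onReset()
{
  // Clear the motion model history and the cached expiration time, so a
  // backwards time jump (e.g. a restarted bagfile) cannot leave a stale
  // lag_expiration_ that is newer than every incoming transaction
  timestamp_tracking_.clear();
  lag_expiration_ = ros::Time(0, 0);
}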

efernandez (Collaborator):

Makes sense. 👍

svwilliams (Contributor, Author):

> [INFO]: Received a set_pose request (stamp: 173.570000000, x: 1.4, y: 0.525, yaw: -0.0350735)
> [WARN ]: Unable to locate a state in this history with stamp <= 173.540000000. Variables will all be initialized to 0.

I suspect what is going on is that the ignition transaction is added with stamp 173.57. That is now the current stamp, making the lag_expiration stamp = 173.57 - lag_duration, which let's say is 172.57. The older involved stamp of the relative pose constraint is 173.54, which is before the start_time but after the lag_expiration. The lag_expiration should not be allowed to be before the start_time. Let me go try to fix that.
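
Something along these lines should do it (a sketch of the clamped computation; getStartTime() and timestamp_tracking_.currentStamp() are assumed helpers here):

ros::Time FixedLagSmoother::computeLagExpirationTime() const
{
  // Find the most recent variable timestamp, but never let the lag window
  // extend to before the start time
  auto start_time = getStartTime();
  auto now = timestamp_tracking_.currentStamp();
  return (start_time + params_.lag_duration < now) ? now - params_.lag_duration : start_time;
}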

efernandez (Collaborator):

> The lag_expiration should not be allowed to be before the start_time. Let me go try to fix that.

That makes sense, and it should get rid of some corner cases.

However, it won't cover the cases where:

  • Ignition transaction was stamp 10.0
  • A transaction with a differential constraint has stamp 11.0 with involved stamps: 9.0 and 11.0

In this case, with the current implementation, the transaction with the differential constraint is still processed after the ignition transaction, because its stamp (11.0) is greater than the ignition transaction stamp (10.0). But the first/oldest involved stamp (9.0) would produce a WARN message like this:

[WARN ]: Unable to locate a state in this history with stamp <= 10.0. Variables will all be initialized to 0.

I think that transaction should have been purged.

Just saying: I think you're already aware of this other corner case, which is handled in #176.

efernandez (Collaborator) left a review:

LGTM after the last changes.


svwilliams (Contributor, Author):

> However, it won't cover the cases where:
>
>   • Ignition transaction was stamp 10.0
>   • A transaction with a differential constraint has stamp 11.0 with involved stamps: 9.0 and 11.0

With this PR, that differential constraint should get rejected inside the processQueue() method here:

if (min_stamp < lag_expiration)

When we iterate over the pending transactions, the min transaction time is checked against the lag_expiration time. So, once I get the lag_expiration time to respect the start time (testing that change now), I believe the case you describe should be handled. At least, it was my intent to handle the case you describe.
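
As a self-contained illustration of that check (a toy example, not the fuse code), using the stamps from your scenario:

#include <algorithm>
#include <vector>

struct QueuedTransaction
{
  double stamp;      // transaction stamp, in seconds
  double min_stamp;  // oldest involved stamp, in seconds
};

// Drop queued transactions whose oldest involved stamp falls before the lag expiration
void purgeExpired(std::vector<QueuedTransaction>& pending, double lag_expiration)
{
  pending.erase(
      std::remove_if(pending.begin(), pending.end(),
                     [lag_expiration](const QueuedTransaction& t)
                     { return t.min_stamp < lag_expiration; }),
      pending.end());
}

// With lag_expiration = 10.0 (the ignition stamp), a differential transaction with
// {stamp: 11.0, min_stamp: 9.0} is purged even though its stamp is newer than 10.0.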

Perhaps it is less than ideal to have transaction filtering in two different places (some in the transaction callback, and some in the processQueue method). Given the system's asynchronous nature, the processQueue method is the definitive place to filter messages. Perhaps all of the filtering should be moved there, with the transaction callback (a) watching for ignition transactions, and (b) keeping the queue size reasonable while waiting for an ignition transaction. I'll take a look at that... but in a different PR.

efernandez (Collaborator):

> When we iterate over the pending transactions, the min transaction time is checked against the lag_expiration time. So, once I get the lag_expiration time to respect the start time (testing that change now), I believe the case you describe should be handled. At least, it was my intent to handle the case you describe.

That makes sense. If that's the case, it'd simplify most of the code I did in #176, so probably most of it isn't needed. 😄

efernandez (Collaborator):

@svwilliams I wonder if you've made any progress on this recently. I'm not sure what the current status is, or whether I can help with this somehow. 😃

Indeed, I'm not sure if #176 is useful as is right now. Maybe I can split some functionality from it, or test this PR again. 🤔

svwilliams (Contributor, Author):

Yeah. Got side-tracked for a bit. I'll be testing this today and should get it merged in. And I think I'll steal the Transaction::minStamp() code from the other PR.

efernandez (Collaborator):

> I'll be testing this today and should get it merged in. And I think I'll steal the Transaction::minStamp() code from the other PR.

Sounds good. No problem, take whatever you need. 👍

svwilliams force-pushed the filter-transactions-on-lag-boundary branch from 519a183 to a86dd29 on April 24, 2020 at 15:18.
efernandez (Collaborator) left a review:

It's hard to tell if the usage of min and max stamp is doing the right thing, but it LGTM.

I can give this branch a try locally if it's ready for that. 🤔

(current_time - element.transaction->stamp()) << " seconds, which is greater " <<
"than the 'transaction_timeout' value of " << params_.transaction_timeout <<
". Ignoring this transaction.");
ROS_ERROR_STREAM("The queued transaction with timestamp " << element.stamp() << "and maximum "
efernandez (Collaborator):

A whitespace is missing before "and":

Suggested change:
- ROS_ERROR_STREAM("The queued transaction with timestamp " << element.stamp() << "and maximum "
+ ROS_ERROR_STREAM("The queued transaction with timestamp " << element.stamp() << " and maximum "

efernandez (Collaborator) commented Apr 24, 2020:

I just did a quick test using the branch from this PR and all looks fine. I also don't see any of the WARN or ERROR messages I mentioned in #176, so IMO this PR is good to merge, and #176 is good to close.

Of all the minor commits in #176, I think you brought over everything except this one: f6e71a3. Do you mind including it as well?
