Add guard check against infinite loop in detour. #373
This is a band-aid that does not fix an underlying problem (if there is one). In fact I think it makes things worse since now we are potentially hiding issues that may be there.
I have used R&D in projects where we did thousands of queries every second without running into these issues, so I think some complex interactions are necessary to trigger this issue. If we could get some way to reproduce it, that would of course help a lot.
These types of infinite loops in the past have always been about bad weights. I wonder if it would make sense to add Maybe known pitfalls like these could have an upper bound, like in the PR, but you would assert (with a comment) instead of breaking gracefully. That way you catch problems during development and there would be instructions on how to fix them.
Asserts would be good. I don't think
We don't set the costs, so they are set to the default 1.0f in dtQueryFilter().
No, the important part is that the heuristic remains admissible (i.e. it should not overestimate the cost to the target). The heuristic is simply distance * H_SCALE:
If the query filter costs are less than Since you do not modify the costs it would be very helpful if you could provide a way to reproduce this issue in the demo (or with a small example). That will help a lot in getting to the bottom of this issue.
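To make the admissibility point concrete, here is a minimal sketch of the check for a single straight segment. It assumes H_SCALE is 0.999f (the value in Detour's DetourNavMeshQuery.cpp) and models the true cost as distance times a uniform area cost. The helpers `dist3`, `heuristic`, `straightCost` and `isAdmissible` are hypothetical names for illustration, not Detour API.

```cpp
#include <cmath>

// Assumed to match Detour's search heuristic scale.
static const float H_SCALE = 0.999f;

// Straight-line distance between two 3D points.
inline float dist3(const float* a, const float* b)
{
    const float dx = b[0] - a[0];
    const float dy = b[1] - a[1];
    const float dz = b[2] - a[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Heuristic as described in the thread: distance * H_SCALE.
inline float heuristic(const float* pos, const float* target)
{
    return dist3(pos, target) * H_SCALE;
}

// Simplified cost model (an assumption): distance scaled by one area cost.
inline float straightCost(const float* a, const float* b, float areaCost)
{
    return dist3(a, b) * areaCost;
}

// The heuristic is admissible on this segment if it never exceeds the cost.
inline bool isAdmissible(const float* a, const float* b, float areaCost)
{
    return heuristic(a, b) <= straightCost(a, b, areaCost);
}
```

With the default cost of 1.0f the heuristic stays just below the true cost. In this simplified model, an area cost well below 1.0f makes the heuristic overestimate.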
Found some strange values while debugging a full memory dump when the issue happened: This suggests dtNavMeshQuery::closestPointOnPoly() somehow put a NaN in the Y. At the moment the steps to reproduce are "summon a pet (that follows you) and run around like crazy in a particular area", so it's a bit tricky to reproduce in RecastDemo. I can, however, apply any patch or get any info from a VS debug session.
If you could obtain the vertices of that polygon and the end point then that should give all the info necessary to reproduce that.
I added a few isfinite() checks in that function, here's the locals at
Running dtClosestHeightPointTriangle() again with the input values, this is what we get:
Condition
the "0" part gives -nan(ind) and 609 + nan is still nan
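The propagation described here can be checked in isolation. The sketch below assumes IEEE 754 float arithmetic (no fast-math flags) and that both the numerator and the denominator of the interpolation are zero, as the degenerate input suggests. `interpolateHeight` is a hypothetical stand-in for the final line of dtClosestHeightPointTriangle.

```cpp
#include <cmath>

// Mirrors the shape of `h = a[1] + (v0[1]*u + v1[1]*v) / denom`,
// with the numerator and denominator passed in directly.
inline float interpolateHeight(float ay, float numerator, float denom)
{
    // 0.0f / 0.0f is NaN under IEEE 754, and adding NaN to any
    // finite value yields NaN again.
    return ay + numerator / denom;
}
```

Calling `interpolateHeight(609.980835f, 0.0f, 0.0f)` yields a NaN, which `std::isnan` detects. A plain comparison such as `h > 0` would silently evaluate to false.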
I ran a test with the same input locally, the old code
Here's a small snippet that reproduces it (dtClosestHeightPointTriangle is from the latest commit):
#include "DetourCommon.h"

bool dtClosestHeightPointTriangle(const float* p, const float* a, const float* b, const float* c, float& h)
{
    float v0[3], v1[3], v2[3];
    dtVsub(v0, c, a);
    dtVsub(v1, b, a);
    dtVsub(v2, p, a);

    // Compute scaled barycentric coordinates
    float denom = v0[0] * v1[2] - v0[2] * v1[0];
    float u = v1[2] * v2[0] - v1[0] * v2[2];
    float v = v0[0] * v2[2] - v0[2] * v2[0];
    if (denom < 0) {
        denom = -denom;
        u = -u;
        v = -v;
    }

    // The (sloppy) epsilon is needed to allow to get height of points which
    // are interpolated along the edges of the triangles.
    float epsilon = -1e-4f * denom;

    // If point lies inside the triangle, return interpolated ycoord.
    // With the input below denom is 0, so this divides 0 by 0.
    if (u >= epsilon && v >= epsilon && (u + v) <= denom - epsilon) {
        h = a[1] + (v0[1] * u + v1[1] * v) / denom;
        return true;
    }
    return false;
}

int main(int /*argc*/, char** /*argv*/)
{
    // All four points share the same Y and Z, so the triangle is degenerate.
    float p[3] = { 642.133301, 609.980835, 5823.99951 };
    float a[3] = { 642.133301, 609.980835, 5823.99951 };
    float b[3] = { 641.066589, 609.980835, 5823.99951 };
    float c[3] = { 641.866638, 609.980835, 5823.99951 };
    float h;
    dtClosestHeightPointTriangle(p, a, b, c, h);
    return 0;
}
This is a degenerate triangle, so the underlying problem might be that Recast is generating bad data. Unfortunately this requires going even further back to see what data was passed to Recast. As a first step it would be nice to have the vertices of the polygon itself (that produces this bad triangulation), i.e. the verts iterated over in the loop in The reason it "worked" before is because floating point inaccuracies caused I think the most practical fix is to check if
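One way to make the height lookup robust against degenerate triangles is to bail out before dividing. The following is only a sketch under assumptions: the function name `closestHeightPointTriangleGuarded`, the local `vsub` helper, and the `EPS` threshold of 1e-6f are illustrative choices, not necessarily what the maintainers settled on.

```cpp
#include <cmath>

// Local stand-in for dtVsub so the sketch is self-contained.
inline void vsub(float* dest, const float* v1, const float* v2)
{
    dest[0] = v1[0] - v2[0];
    dest[1] = v1[1] - v2[1];
    dest[2] = v1[2] - v2[2];
}

bool closestHeightPointTriangleGuarded(const float* p, const float* a,
                                       const float* b, const float* c, float& h)
{
    const float EPS = 1e-6f; // assumed threshold for "zero area in XZ"
    float v0[3], v1[3], v2[3];
    vsub(v0, c, a);
    vsub(v1, b, a);
    vsub(v2, p, a);

    // Scaled barycentric coordinates, as in the snippet from the thread.
    float denom = v0[0] * v1[2] - v0[2] * v1[0];
    // Degenerate triangle: report "no height" instead of dividing by ~0.
    if (std::fabs(denom) < EPS)
        return false;

    float u = v1[2] * v2[0] - v1[0] * v2[2];
    float v = v0[0] * v2[2] - v0[2] * v2[0];
    if (denom < 0) {
        denom = -denom;
        u = -u;
        v = -v;
    }

    // Same sloppy epsilon as the original, to accept points on the edges.
    const float epsilon = -1e-4f * denom;
    if (u >= epsilon && v >= epsilon && (u + v) <= denom - epsilon) {
        h = a[1] + (v0[1] * u + v1[1] * v) / denom;
        return true;
    }
    return false;
}
```

With the collinear input from the reproduction snippet this returns false and leaves `h` untouched, so the caller has to handle the "no height found" case explicitly.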
I would suggest starting by reverting 7ccb72b, since what was supposed to be an optimization does not handle a float division by zero. After that we can investigate Recast further.
Reverting this fixes your problem but reintroduces the problem fixed in #364 (caused by floating point inaccuracy). This is not an optimization for speed; it is an optimization for accuracy. Definitely agree on the asserts. Also,
Validate input values, including that points are finite. This would have saved everyone some time in recastnavigation#343/recastnavigation#373.
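As an illustration of what such validation can look like, here is a minimal sketch. `isFinite3` and `validateQueryPoints` are hypothetical helpers written for this example; the actual commit may name and place its checks differently.

```cpp
#include <cmath>

// Hypothetical helper: true if all three components are finite
// (neither NaN nor +/-infinity).
inline bool isFinite3(const float* v)
{
    return std::isfinite(v[0]) && std::isfinite(v[1]) && std::isfinite(v[2]);
}

// Hypothetical up-front check a query entry point could run before
// touching the navmesh: reject null pointers and non-finite positions.
inline bool validateQueryPoints(const float* startPos, const float* endPos)
{
    if (!startPos || !endPos)
        return false;
    return isFinite3(startPos) && isFinite3(endPos);
}
```

Rejecting bad input at the API boundary turns a hard-to-trace infinite loop deep inside the search into an immediate, explicit failure at the call site.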
@jakobbotsch something strange indeed happened in Recast; this is how the result looks. I will investigate further to check how that happened and possibly set up a test case. It also makes it quite easy to spot which commit changed the behavior; I just need to try a few Recast commits. Closing this PR as you are going to add the checks anyway :)
Indeed I would expect the problem to be in Recast since Detour ends up with the degenerate triangle. But one question is whether Detour should be more robust to these scenarios. We can continue in #343 👍
Add a guard check against an infinite loop in Detour, which happens when there is a circular reference in m_nodePool, i.e. A -> B -> C -> A.
While this might be caused by another issue in the code, it's better to just fail the path instead of entering an infinite loop.
As every node has only one reference to another node, the maximum number of iterations we can do before we are sure we are in an infinite loop is m_nodePool->getMaxNodes(). This also ensures that we don't accidentally exit too early for huge m_nodePool sizes.
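The bound argument can be sketched on a simplified node pool. This is not the PR's diff: `Node`, its `parent` index, and `walkToRoot` are hypothetical stand-ins for dtNode and the parent-chain traversal, and the pool size plays the role of m_nodePool->getMaxNodes().

```cpp
#include <vector>

// Simplified node: each node refers to at most one other node.
// -1 means "no parent" (hypothetical convention for this sketch).
struct Node
{
    int parent;
};

// Follows parent links from `start` to the root.
// Returns the number of nodes visited, or -1 if more steps were taken
// than there are nodes in the pool, which can only happen on a cycle.
inline int walkToRoot(const std::vector<Node>& pool, int start)
{
    const int maxNodes = static_cast<int>(pool.size());
    int steps = 0;
    int cur = start;
    while (cur != -1)
    {
        // A cycle-free chain can visit each node at most once.
        if (++steps > maxNodes)
            return -1;
        cur = pool[cur].parent;
    }
    return steps;
}
```

Because the limit equals the pool size, a legitimate chain can never trip the guard, while a cycle such as A -> B -> C -> A is reported after at most one extra step.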
Tested this code in a project where we have been experiencing this issue after upgrading from 2c85309 to 14b2631
Ref #343