Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add guard check against infinite loop in detour. #373

Closed
wants to merge 1 commit into from

Conversation

jackpoz
Copy link
Contributor

@jackpoz jackpoz commented Feb 8, 2019

Add a guard check against infinite loop occurring in detour, happening when there is a circular reference in m_nodePool, i.e. A -> B -> C -> A.

While this might be caused by another issue in the code, it's better to just fail the path instead of entering an infinite loop.

As every node has only 1 reference to another node, the max number of iteration we can do before we are sure we are in an infinite loop is m_nodePool->getMaxNodes() . This also ensures that we don't accidentally exit too early for huge m_nodePool sizes.

Tested this code in a project where we have been experiencing this issue after upgrading from 2c85309 to 14b2631

Ref #343

@jakobbotsch
Copy link
Member

This is a band-aid that does not fix an underlying problem (if there is one). In fact I think it makes things worse since now we are potentially hiding issues that may be there.
Instead I would like to identify what the underlying issue that people are running into and then hopefully improve on that instead. Here are some common pitfalls that you should check:

  1. Bad threading
  2. Bad dtQueryFilter costs -- costs should not be less than 1

I have used R&D in projects where we did thousands of queries every second without running into these issues so I think there are some complex interactions necessary to run into this isssue. If we could get some kind of way to reproduce this issue it would of course make it much better.

@memononen
Copy link
Member

These type of infinite loops in the past have always been about bad weights.

I wonder if it would make sense to add dtMin(1.0f, m_query.filter->getCost(...)) on all uses of getCost()? Or maybe it's even a case for an assert? I suspect it will not affect performance, but needs testing of course.

Maybe known pitfals like these could have upper bound, like in the PR, but you would assert (with comment) instead of breaking gracefully. That way you catch problems during dev and there would be instructions how to fix it.

@jakobbotsch
Copy link
Member

Asserts would be good. I don't think dtMax would be a good idea, since that just hides the problem from the user again.
dtQueryFilter could also scale its costs automatically if the user adds costs that are less than 1, but this adds runtime overhead.

@jackpoz
Copy link
Contributor Author

jackpoz commented Feb 9, 2019

We don't set the costs so they are set to the default 1.0f in dtQueryFilter().
This doesn't mean that getCost() won't return a value < 1.0f as it's defined as https://github.com/recastnavigation/recastnavigation/blob/master/Detour/Source/DetourNavMeshQuery.cpp#L99 so for very close pa and pb it becomes very small (is that what you meant ?)

@jakobbotsch
Copy link
Member

No, the important part is that the heuristic remains admissible (i.e. it should not overestimate the cost to the target). The heuristic is simply distance * H_SCALE:

heuristic = dtVdist(neighbourNode->pos, endPos)*H_SCALE;

If the query filter costs are less than H_SCALE the cost of the shortest path can be smaller than the heuristic, but both calculations use the distance.

Since you do not modify the costs it would be very helpful if you could provide a way to reproduce this issue in the demo (or with a small example). That will help a lot in getting to the bottom of this issue.

@jackpoz
Copy link
Contributor Author

jackpoz commented Feb 9, 2019

Found some strange values while debugging a full memory dump when the issue happened:
image

This suggests dtNavMeshQuery::closestPointOnPoly() somehow put a nan in the Y .
As we upgraded from 2c85309 to 14b2631 (full list of changed code at TrinityCore/TrinityCore@5ff88ea#diff-e9e8feb77bdeca9109535294c4c1f8a3 ), I would say the commit that changed the behavior here is 7ccb72b

At the moment the how to reproduce steps are "summon a pet (that follows you) and run around like crazy in a particular area" so it's a bit tricky to repro that in RecastDemo. I can although apply any patch or get any info from a VS debug session.

@jakobbotsch
Copy link
Member

If you could obtain the vertices of that polygon and the end point then that should give all the info necessary to reproduce that.
Essentially the contents of verts and pos at this point:

@jackpoz
Copy link
Contributor Author

jackpoz commented Feb 9, 2019

I added a few isfinite() checks in that function, here's the locals at

closest[0] : 642.133301
closest[1] : 609.980835
closest[2] : 5823.99951

v : 0x000000cfed8fcd68 {0x000002b931c974a4 {642.133301}, 0x000002b931c97498 {641.066589}, 0x000002b931c9751c {...}}
[0x00000000]: 0x000002b931c974a4 {642.133301}
[0x00000001]: 0x000002b931c97498 {641.066589}
[0x00000002]: 0x000002b931c9751c {641.866638}

h : -nan(ind)

Running again dtClosestHeightPointTriangle() dtClosestHeightPointTriangle with the input values, this is what we get:

v0 : {-0.266662598, 0.000000000, 0.000000000}
v1 : {-1.06671143, 0.000000000, 0.000000000}
v2 : 0x000000cfed8fcb68 {0.000000000, 0.000000000, 0.000000000}

denom : 0.000000000
u : 0.000000000
v : -0.000000000

epsilon : -0.000000000

Condition

if (u >= epsilon && v >= epsilon && (u+v) <= denom - epsilon) {
is true as 0.0 > -0.0 is true so at
h = a[1] + (v0[1]*u + v1[1]*v) / denom;
there is

h = 609.980835 + (0 * 0 + 0 * -0) / 0

the "0" part gives -nan(ind) and 609 + nan is still nan

@jackpoz
Copy link
Contributor Author

jackpoz commented Feb 9, 2019

I ran a test with the same input locally, the old code

h = a[1] + v0[1]*u + v1[1]*v;
just set h to a[1] (609.980835) as everything else was 0

Here's a small snippet with how to reproduce steps (dtClosestHeightPointTriangle is from latest commit)

#include "DetourCommon.h"

bool dtClosestHeightPointTriangle(const float* p, const float* a, const float* b, const float* c, float& h)
{
	float v0[3], v1[3], v2[3];

	dtVsub(v0, c, a);
	dtVsub(v1, b, a);
	dtVsub(v2, p, a);

	// Compute scaled barycentric coordinates
	float denom = v0[0] * v1[2] - v0[2] * v1[0];
	float u = v1[2] * v2[0] - v1[0] * v2[2];
	float v = v0[0] * v2[2] - v0[2] * v2[0];

	if (denom < 0) {
		denom = -denom;
		u = -u;
		v = -v;
	}

	// The (sloppy) epsilon is needed to allow to get height of points which
	// are interpolated along the edges of the triangles.
	float epsilon = -1e-4f * denom;

	// If point lies inside the triangle, return interpolated ycoord.
	if (u >= epsilon && v >= epsilon && (u + v) <= denom - epsilon) {
		h = a[1] + (v0[1] * u + v1[1] * v) / denom;
		return true;
	}
	return false;
}

int main(int /*argc*/, char** /*argv*/)
{
	float p[3] = { 642.133301, 609.980835, 5823.99951 };
	float a[3] = { 642.133301, 609.980835, 5823.99951 };
	float b[3] = { 641.066589, 609.980835, 5823.99951 };
	float c[3] = { 641.866638, 609.980835, 5823.99951 };
	float h;
	dtClosestHeightPointTriangle(p, a, b, c, h);
	return 0;
}

@jakobbotsch
Copy link
Member

float a[3] = { 642.133301, 609.980835, 5823.99951 };
float b[3] = { 641.066589, 609.980835, 5823.99951 };
float c[3] = { 641.866638, 609.980835, 5823.99951 };

This is a degenerate triangle so the underlying problem might be that Recast is generating bad data. Unfortunately this requires going even further back to see what data was passed to Recast. As a first step it would be nice with the vertices of the polygon itself (that produces this bad triangulation), i.e. the verts iterated over in the loop in closestPointOnPoly.

The reason it "worked" before is because floating point inaccuracies caused invDenom to become a very large finite value (instead of infinity). Then u = 0 * invDenom is 0 instead of nan if invDenom was infinity.

I think the most practical fix is to check if denom is close to 0 and then just linearly interpolate the height along the line segment. @memononen what do you think? And is Detour supposed to be robust to such intricacies in the data (collinear segments in the polys)?

@jackpoz
Copy link
Contributor Author

jackpoz commented Feb 10, 2019

I would suggest to start reverting 7ccb72b since what was supposed to be an optimization is actually not handling a float division by zero case.
In addition to that I would add a dtAssert(std::isfinite(h)) call in dtClosestHeightPointTriangle() to ensure the issue gets spotted sooner without having to go through to what I had to go through.

After that we can investigate more into recast.

@jakobbotsch
Copy link
Member

I would suggest to start reverting 7ccb72b since what was supposed to be an optimization is actually not handling a float division by zero case.

Reverting this is fixing your problem but reintroducing the problem fixed in #364 (caused by floating point inaccuracy). This is not an optimization for speed -- it is an optimization for accuracy.
The previous version is not really more "correct" in this case, in that does not handle the 0 case either. You just have a degenerate triangle where the floating point inaccuracy makes it seem like it works. If you translate the triangle closer to 0 I expect you will get different results (even though it is the same triangle) on the old version.

Definitely agree on the asserts. Also, dtNavMeshQuery::findPath should validate that the input is a well-formed point, which would immediately have failed your path finds, even without assertions enabled. I will add these.

jakobbotsch added a commit to jakobbotsch/recastnavigation that referenced this pull request Feb 10, 2019
Validate input values, including that points are finite. This would have
saved everyone some time in recastnavigation#343/recastnavigation#373.
jakobbotsch added a commit to jakobbotsch/recastnavigation that referenced this pull request Feb 10, 2019
Validate input values, including that points are finite. This would have
saved everyone some time in recastnavigation#343/recastnavigation#373.
@jackpoz
Copy link
Contributor Author

jackpoz commented Feb 10, 2019

@jakobbotsch something strange happened indeed in Recast, this is how the result look
image

I will investigate further to check how that happened and possibly set up a test case. It also makes it quite easy to spot which commit changed the behavior, just need to try a few recast commits.

Closing this PR as you are going to add anyway the checks :)
Should we continue the discussion at #343 or somewhere else ?

@jakobbotsch
Copy link
Member

Indeed I would expect the problem to be in Recast since Detour ends up with the degenerate triangle. But one question is whether Detour should be more robust to these scenarios.

We can continue in #343 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants