"Navigation map synchronization error" caused internally (by recast?) #85548

Closed
tcoxon opened this issue Nov 30, 2023 · 17 comments · Fixed by #87959

Comments

@tcoxon
Contributor

tcoxon commented Nov 30, 2023

Godot version

4.1.3.stable.official

System information

Godot v4.1.3.stable - Arch Linux #1 SMP PREEMPT_DYNAMIC Tue, 10 Oct 2023 21:10:21 +0000 - X11 - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 970 (nvidia; 535.113.01) - AMD Ryzen 5 2600 Six-Core Processor (12 Threads)

Issue description

For context, I'm working on navmesh support for the Terrain3D plugin. To support navmeshes, the plugin generates 'source geometry data' for navigable portions of the terrain, and uses that as input to the engine's bake_from_source_geometry_data function.

I very frequently run into this error:

E 0:00:00:0665   sync: Navigation map synchronization error. Attempted to merge a navigation mesh polygon edge with another already-merged edge. This is usually caused by crossing edges, overlapping polygons, or a mismatch of the NavigationMesh / NavigationPolygon baked 'cell_size' and navigation map 'cell_size'.
  <C++ Source>   modules/navigation/nav_map.cpp:856 @ sync()

It's strange because I have verified that:

  • cell_size and cell_height on the navmesh match the navigation map values.
  • The mesh contains no crossing edges or overlapping polygons.
  • The vertices on the mesh are not especially dense. They're all spaced out wider than cell_size/cell_height.

I've created a minimal reproducer that reproduces this error. The reproducer creates a parabolic U-shaped mesh and bakes a navmesh from it:
image

I suspect that the issue is all these long-thin polygons from the edge of the mesh to the middle:
image

However, if I turn on wireframe, you can see that those long-thin polygons are not part of the source mesh, which is very regular:
image

Since these long polygons originate in the engine (or the recast library?) I think that this is a bug. If these long/dense edges are an error, then the engine shouldn't be creating them.

Steps to reproduce

  1. Download and open the attached project.
  2. Open the nav_bake_issue.tscn scene. A tool script will run on _ready to generate the mesh and bake it.
  3. Run the scene, and observe the logged error.

Minimal reproduction project

nav_bake_issue.zip

@smix8
Contributor

smix8 commented Nov 30, 2023

The error comes from the navigation map edge rasterization that is done to merge touching navigation mesh edges together.

This project geometry error should be fixed and not ignored, because it will cause pathfinding bugs ranging from subtle and barely noticeable to severe, everything-is-broken ones when the wrong navigation mesh edges are merged together and the excess edges are left unmerged and rejected.

The long, thin edges are not the problem. The mesh has a few faces with very small edges that all fall into the same rasterization cell of the navigation map, which triggers this error. This is confirmed by slightly increasing the scale of the mesh, e.g. to x1.1, which fixes the error immediately because it makes the small edges large enough to no longer occupy the same cells.
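The collision described above can be illustrated with a small Python sketch. This is not the engine's implementation: the helper names `point_key`, `edge_key` and `scaled`, the floor-based quantization, and the sample coordinates are all assumptions chosen only to show how two distinct short edges can end up with the same edge key, and how a small change in scale can separate them again.

```python
import math

def point_key(p, cell_size, cell_height):
    """Quantize a 3D point to integer grid cell coordinates (assumed scheme)."""
    x, y, z = p
    return (math.floor(x / cell_size),
            math.floor(y / cell_height),
            math.floor(z / cell_size))

def edge_key(a, b, cell_size, cell_height):
    """Order-independent key built from the cells of both end points."""
    ka = point_key(a, cell_size, cell_height)
    kb = point_key(b, cell_size, cell_height)
    return (ka, kb) if ka <= kb else (kb, ka)

def scaled(edge, factor):
    """Uniformly scale both end points of an edge."""
    return tuple(tuple(c * factor for c in p) for p in edge)

# Two different short edges (made-up coordinates).
edge_1 = ((0.30, 0.0, 0.10), (0.55, 0.0, 0.10))
edge_2 = ((0.45, 0.0, 0.20), (0.70, 0.0, 0.20))

CELL = 0.25
same_before = edge_key(*edge_1, CELL, CELL) == edge_key(*edge_2, CELL, CELL)
same_after = (edge_key(*scaled(edge_1, 1.1), CELL, CELL)
              == edge_key(*scaled(edge_2, 1.1), CELL, CELL))
print(same_before, same_after)
```

With this toy data the two edges share a key at the original scale and get different keys after scaling by 1.1. That only holds for these particular coordinates; a different mesh or transform can just as easily move other edges into a shared cell.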

Note that while having a matching cell_size and cell_height on both the navigation map and navigation mesh is important to avoid the most common errors, hence the warnings, it is never a guarantee that these kinds of errors cannot occur.

While both the navigation map and ReCast use a cell size for rasterization, ReCast uses it to rasterize the triangles of the mesh, while the navigation map uses it to rasterize the edge keys to merge polygons with touching edges. Mismatched cell sizes cause errors for users so often that we added the warning, but the warning does nothing more than compare the two properties.

The steeper the climbs and slopes in the source geometry, the more small regions and polygons ReCast is forced to create. The more small edges there are, the higher the chance that too many of them fall into the same rasterization cell under certain transforms or when combined with other meshes. We also allow user-created or imported navigation meshes not created by ReCast, so there are even more ways for users to open Pandora's box and trigger these kinds of errors.

The navigation mesh is not per se invalid, it is invalid on that specific navigation map in that specific context, like placement of the edge, transform, scale and what not. Depending on source geometry and bake settings ReCast will happily return a navigation mesh with small edges that could result in more of those errors in some context of a navigation map.

I think what we could do to mitigate this is to internally use a smaller subdivision of the cell size just for the rasterization. For ReCast, encouraging users to use a small cell size would kill performance because of the voxel count, but for the navigation map, at least at the moment, we can get away with a smaller cell size for just the rasterization. We do not use the cell size for anything else, e.g. an octree, where a very small cell size would again be a performance death.

@tcoxon
Contributor Author

tcoxon commented Nov 30, 2023

Is there anything a user can do to avoid this issue?

This project geometry error should be fixed and not ignored, because it will cause pathfinding bugs ranging from subtle and barely noticeable to severe, everything-is-broken ones when the wrong navigation mesh edges are merged together and the excess edges are left unmerged and rejected.

Would that cause this pathfinding issue @TokisanGames noted on my PR, where agents take indirect routes to the target?

navRecording.2023-12-01.173953.mp4

@smix8
Contributor

smix8 commented Dec 1, 2023

Would that cause this pathfinding issue @TokisanGames noted on my PR, where agents take indirect routes to the target?

Yes, absolutely. If the wrong navigation mesh polygon edges are merged together or not merged at all, the path search will travel nonsensical-looking polygon corridors. Technically the path search does everything correctly because that is how the pathfinding graph connections are built; it just makes no visual sense to the user who sees the result. The post-processing that funnels the path lines on the corridor corners will amplify this stupid path because it makes the path snap to the corners of the corridor polygons.

This is how the path search has to travel and build the polygon corridor, shown in numbered order. Also, when you zoom in on the video you can already spot the duplicated edge line that is too close and likely causes the issue.
path_bug

Is there anything a user can do to avoid this issue?

Make sure that the two vertices of an edge never fall into the same cell on the navigation map, and also that never more than 2 edges of the same length occupy the same two grid cells on the navigation map with their 2 vertices (edge key). You can try to tinker with certain NavigationMesh bake properties like max edge length to get different results, but that may or may not work for a specific geometry.
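The two conditions above could be checked on a baked mesh with something like the following Python sketch. It is an illustration only: `cell_of` and `check_polygons` are hypothetical helpers, the floor-based quantization is an assumption about how an edge key might be formed, and the "same length" qualifier is ignored for simplicity.

```python
import math
from collections import defaultdict

def cell_of(p, cell_size, cell_height):
    """Grid cell of a point (assumed quantization, not engine code)."""
    x, y, z = p
    return (math.floor(x / cell_size),
            math.floor(y / cell_height),
            math.floor(z / cell_size))

def check_polygons(vertices, polygons, cell_size, cell_height):
    """Return (collapsed_edges, overcrowded_keys) for index-based polygons."""
    collapsed = []             # edges whose two vertices share one cell
    by_key = defaultdict(list) # edge key -> edges that map to it
    for poly_index, poly in enumerate(polygons):
        for i in range(len(poly)):
            a, b = poly[i], poly[(i + 1) % len(poly)]
            ka = cell_of(vertices[a], cell_size, cell_height)
            kb = cell_of(vertices[b], cell_size, cell_height)
            if ka == kb:
                collapsed.append((poly_index, a, b))
                continue
            by_key[(min(ka, kb), max(ka, kb))].append((poly_index, a, b))
    # More than two edges per key means a third edge would be rejected.
    overcrowded = {k: v for k, v in by_key.items() if len(v) > 2}
    return collapsed, overcrowded

# Made-up data: a quad split into two triangles, plus one sliver triangle.
vertices = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1), (1.1, 0, 0.05)]
clean = [[0, 1, 2], [0, 2, 3]]
with_sliver = clean + [[1, 4, 2]]

print(check_polygons(vertices, clean, 0.25, 0.25))
print(check_polygons(vertices, with_sliver, 0.25, 0.25))
```

In the second call the sliver triangle has one edge whose end points share a cell, and its other two edges both land on the key already used by the edge between vertices 1 and 2, so that key ends up with three entries.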

As mentioned, what we can do on the development side is to use, by default, a far smaller cell size just for the edge rasterization. E.g. if a user sets 0.25 as cell size, we internally multiply by 0.5 and turn it into 0.125, or even less, as that can help avoid those smaller rasterization problems that are not immediately breaking bugs, just suboptimal mesh layouts.
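The arithmetic in that proposal can be sketched as follows. The names `merge_cell_size` and `cell_index` and the sample coordinates are assumptions for illustration; the snippet only shows why a finer rasterization grid can separate two points that share a cell on the coarser one.

```python
import math

def merge_cell_size(cell_size, scale):
    """Cell size used only for edge rasterization (hypothetical helper)."""
    return cell_size * scale

def cell_index(value, size):
    """1D grid cell index of a coordinate."""
    return math.floor(value / size)

map_cell_size = 0.25
fine_cell_size = merge_cell_size(map_cell_size, 0.5)  # 0.25 * 0.5 = 0.125

# Two made-up coordinates that are 0.15 apart.
x_a, x_b = 0.05, 0.20

coarse_collides = cell_index(x_a, map_cell_size) == cell_index(x_b, map_cell_size)
fine_collides = cell_index(x_a, fine_cell_size) == cell_index(x_b, fine_cell_size)
print(fine_cell_size, coarse_collides, fine_collides)
```

On the 0.25 grid both coordinates fall into cell 0, while on the 0.125 grid they fall into cells 0 and 1. The navigation mesh itself is unchanged; only the grid used to compare edges is finer.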

@tcoxon
Contributor Author

tcoxon commented Dec 1, 2023

Make sure that the two vertices of an edge never fall into the same cell on the navigation map, and also that never more than 2 edges of the same length occupy the same two grid cells on the navigation map with their 2 vertices (edge key). You can try to tinker with certain NavigationMesh bake properties like max edge length to get different results, but that may or may not work for a specific geometry.

Sorry, I think I wasn't very clear in my question - I wasn't asking about the navigation map, as much as nav mesh baking. Is there anything that a user of NavigationMeshGenerator.bake_from_source_geometry_data can do to keep the output from containing these tiny edges? The tiny edges exist in the output navmesh from that function, but not in the source geometry data input, so I'm struggling to see what I can do to prevent this from occurring.

@smix8
Contributor

smix8 commented Dec 1, 2023

You can increase the sample distance and also tinker with the sample max error and max edge length in the bake settings.

The way to really tackle this in a project is with the source geometry.

In general for baking you want to reduce the high frequency in the source geometry to really avoid those issues. When you throw mini voxel-based polygons at ReCast as source geometry you will get many of those mini edges along the rasterization edges.

This is a top view of your original demo, at the bottom with the default, and at the top with a static collision plane.

navmesh_jagged

The collision plane acts as a "cut-off" plane that basically beheads all the individual voxels along the plane. Notice how ReCast immediately turns the entire jagged problematic side into optimized 1-2 edges.

navmesh_jagged_opt

This is the same with those kind of plane blockers on both sides and a larger sample distance and error max. Notice how everything is instantly more optimized and simplified. Yes the navmesh will no longer hug very close to the source geometry but it is still "good enough" and far more performant and less error prone for the pathfinding.

In the case of Terrain3D I don't expect collision blockers like that to be added, but what could be done is to merge adjacent voxels where feasible before adding the triangle faces to the source geometry. That reduces the frequency and avoids a lot of the jaggedness and mini polygons.

@tcoxon
Contributor Author

tcoxon commented Dec 3, 2023

If the issue is that there are vertices in the navmesh closer than the navigation map's resolution allows (did I understand correctly?) then would another potential solution be to post-process the navmesh and merge vertices that are closer than cell_size/cell_height? Like by rounding every vertex to the nearest multiple of cell_size/cell_height?
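A minimal Python sketch of that post-processing idea is shown below. It is an assumption-laden illustration, not code from the Terrain3D plugin or the engine: `weld_and_clean` is a hypothetical helper, vertices are plain tuples, polygons are lists of vertex indices, and dropping polygons that end up with fewer than three distinct corners is an extra cleanup step that snapping makes necessary.

```python
def weld_and_clean(vertices, polygons, cell_size, cell_height):
    """Snap vertices to the grid, merge duplicates, drop degenerate polygons."""
    index_of = {}      # integer grid coordinate -> new vertex index
    new_vertices = []
    remap = []         # old vertex index -> new vertex index
    for x, y, z in vertices:
        # Integer grid coordinates avoid comparing floats for equality.
        # Note: Python's round() rounds exact halves to the nearest even number.
        key = (round(x / cell_size), round(y / cell_height), round(z / cell_size))
        if key not in index_of:
            index_of[key] = len(new_vertices)
            new_vertices.append((key[0] * cell_size,
                                 key[1] * cell_height,
                                 key[2] * cell_size))
        remap.append(index_of[key])

    new_polygons = []
    for poly in polygons:
        cleaned = []
        for old_index in poly:
            new_index = remap[old_index]
            # Skip consecutive duplicates created by merging.
            if not cleaned or cleaned[-1] != new_index:
                cleaned.append(new_index)
        if len(cleaned) > 1 and cleaned[0] == cleaned[-1]:
            cleaned.pop()
        if len(cleaned) >= 3:
            new_polygons.append(cleaned)
    return new_vertices, new_polygons

# Made-up data: vertex 2 sits almost on top of vertex 1.
vertices = [(0, 0, 0), (1, 0, 0), (1.02, 0, 0.03), (1, 0, 1)]
polygons = [[0, 1, 3], [1, 2, 3]]
welded_vertices, welded_polygons = weld_and_clean(vertices, polygons, 0.25, 0.25)
print(welded_vertices, welded_polygons)
```

Here the second triangle collapses to two corners after welding and is removed, leaving a single triangle over three vertices. Snapping moves vertices, so it can change the shape of the mesh slightly and is not guaranteed to remove every problematic edge.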

@smix8
Contributor

smix8 commented Dec 4, 2023

Sure, at its core it is a rasterization and resolution issue.

The two cells on the navigation map are already occupied by two merged edges and the third edge that tries to go into the same edge key space is rejected because it would make no logical sense for the navigation mesh to have a third edge at that place.

@tcoxon
Contributor Author

tcoxon commented Dec 4, 2023

I tried it out. Merging nearby vertices and deleting any polygons that are left with 2 or fewer edges solves most instances. It fixes the long strips at the edge of the mesh. However I still encounter the error in other cases. It seems like baking really can, sometimes, produce edges that are shared by more than 2 polygons:

image

I've modified the reproducer script to print out the coordinates of the edges that are shared by more than 2 polygons. It also generates colored meshes for those polygons so they can be viewed visually. Here are 5 polygons that share edges: red, yellow, green, cyan, blue. The print out makes it clear that the yellow polygon shares exact point coordinates with green and cyan, which it z-fights with:

Edge shared by more than 2 polygons:
	Polygon 1533 Edge: (94.01085, 0.5, 97.00508) -> (98.51146, 0.5, 101.5044)
	Polygon 1543 Edge: (98.51146, 0.5, 101.5044) -> (94.01085, 0.5, 97.00508)
	Polygon 1544 Edge: (98.51146, 0.5, 101.5044) -> (94.01085, 0.5, 97.00508)
Edge shared by more than 2 polygons:
	Polygon 1538 Edge: (94.01085, 0.5, 97.00508) -> (95.51038, 0.5, 92.51559)
	Polygon 1540 Edge: (95.51038, 0.5, 92.51559) -> (94.01085, 0.5, 97.00508)
	Polygon 1543 Edge: (94.01085, 0.5, 97.00508) -> (95.51038, 0.5, 92.51559)

This is with vertex-merging disabled, so it's definitely not an artifact of rounding vertices to the nearest cell_size/height. The edges of these polygons are longer than cell_size/height, and they're not near the edges of the mesh.
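A check like the one that produced the printout above could look roughly like this in Python. It is only a sketch and not the reproducer's actual script: `find_overshared_edges` is a hypothetical helper, vertices are compared by exact coordinates, and the sample mesh is made up.

```python
from collections import defaultdict

def find_overshared_edges(vertices, polygons):
    """Map each undirected edge to its owners; keep edges with more than 2."""
    owners = defaultdict(list)
    for poly_index, poly in enumerate(polygons):
        for i in range(len(poly)):
            a = vertices[poly[i]]
            b = vertices[poly[(i + 1) % len(poly)]]
            # Sort the end points so both winding directions share one key.
            key = (a, b) if a <= b else (b, a)
            owners[key].append((poly_index, a, b))
    return {k: v for k, v in owners.items() if len(v) > 2}

# Made-up mesh: three triangles that all use the edge between vertex 0 and 1.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0),
            (0.0, 0.0, -1.0), (1.0, 0.0, 1.0)]
polygons = [[0, 1, 2], [1, 0, 3], [0, 1, 4]]

overshared = find_overshared_edges(vertices, polygons)
for edge_owners in overshared.values():
    print("Edge shared by more than 2 polygons:")
    for poly_index, a, b in edge_owners:
        print(f"\tPolygon {poly_index} Edge: {a} -> {b}")
```

Because the key uses exact coordinates, this only finds edges that are bit-for-bit identical, which matches the case reported here where polygons share exact point coordinates.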

The updated reproducer project:
nav_bake_issue.zip

@slumberface

slumberface commented Dec 30, 2023

Contributing a little anecdotal experience:

I'm providing a terrain mesh here that I've been experimenting with. The distribution of vertices is a perfectly regular grid with plateaus. The goal with this mesh is to have a 3-unit-height plateau slope be walk-able, while a 6-unit-height plateau wall acts as a cliff as seen here:
image
Nav Mesh Gizmo:
image
Mesh data:
image

The mesh pictured has a grid size of 3 units. Each flat square represents a 3x3 area. No vertices are ever closer than 3 units.

I have tested with both Mesh and Static Collider building options, but for the sake of this example I'm using a static mesh collider for the nav mesh building.

Here are the mesh parameters that build without the synchronization error:
image

With this mesh I have discovered that the only way to build without a synchronization error is to increase the sample max error to the total height between the lowest and highest plateaus. In the example photos above, an AABB is used to only map from height 0 up, which makes the height difference 12 units, and requires a Sample Max Error of 12 to build without synchronization errors. If the AABB is not used, and the full height of the mesh is mapped, a Sample Max Error of 18 is required (highest plateau 12 units, lowest at -6 units). If this is a hard and fast rule, we could probably improve the API to explain why.

But is this operating correctly? A sample max error of the ENTIRE height of the map seems a bit odd, and doesn't seem correct compared to the default of 1 unit of sample max error.

Mesh File GLTF
terrain_Map_collision_debug.zip

EDIT FURTHER FINDINGS:
I think I'm observing that if I increase the cell size/height from 0.25 to 0.5, I can cut the Sample Max Error in half to 9. And if I increase the cell size/height to 1, I can again cut the Sample Max Error in half to 4.5. So it seems like there is a correct number for this value that could probably be the result of a formula? I myself am not sure why the cell size/height of 0.25 appears to have a 1:1 ratio with the height magnitude, and the cell size of 1 has a 1:4 ratio.
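The three data points reported above happen to fit a simple inverse relationship, which the following Python snippet spells out. This is only a restatement of the numbers in this comment (18, 9 and 4.5 for cell sizes 0.25, 0.5 and 1 on a mesh with an 18-unit height range); `fitted_sample_max_error` is a hypothetical helper, and nothing here confirms that the relationship holds for other meshes or settings.

```python
# Reported observations: cell size/height -> smallest Sample Max Error
# that baked without the synchronization error.
observed = {0.25: 18.0, 0.5: 9.0, 1.0: 4.5}

# The product cell_size * sample_max_error is the same for all three.
products = {cell: cell * error for cell, error in observed.items()}
print(products)

def fitted_sample_max_error(cell_size, constant=4.5):
    """Curve fitted to the three observations only (not an engine rule)."""
    return constant / cell_size

for cell, error in observed.items():
    print(cell, error, fitted_sample_max_error(cell))
```

Written in terms of the 18-unit height range, the same fit is height_range * 0.25 / cell_size, which gives the 1:1 ratio at a cell size of 0.25 and the 1:4 ratio at a cell size of 1.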

TIPS TO WEARY TRAVELERS:
If you find yourself here trying to build a similar nav mesh with discrete height levels, I found more success by setting my height value to the smallest height step used (3 units in my case for walkable slopes), AND THEN (IMPORTANT), you might need to shift your nav mesh's offset in the "Baking AABB offset" an arbitrary amount to get a clean bake. Again, in my example, when using a cell size of 0.5 and a height of 3, arbitrarily offsetting the Y value of the bake by 2.5 units all of a sudden gave clarity to my nav mesh slopes. As Smix8 has mentioned on many occasions, at the end of the day the nav mesh is a rasterization of points in a space, and sometimes your bake will line up in an orderly way and sometimes it will not.

Maybe something to this effect could be useful in the documentation. If a nav mesh build experiences this kind of synchronization error, is a possible solution to arbitrarily shift the offset and try again? This kind of error is most daunting when your levels are procedurally generated, but you could set up your baking script so that it attempts different offsets when it hits a synchronization error, and then holds onto the offset value that DID work for later nav mesh rebuilds (first attempting the rebake with that value, and falling back to arbitrary offsets again if it no longer works).
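The retry strategy suggested here could be structured along these lines. This Python sketch deliberately leaves out any engine API: `bake` is a hypothetical callable that bakes with a given Y offset and returns True when no synchronization error occurred, and how a script would actually detect that error is left open as an assumption.

```python
def bake_with_offset_retry(bake, candidate_offsets, last_good_offset=None):
    """Try offsets until `bake(offset)` succeeds; return that offset or None."""
    order = []
    if last_good_offset is not None:
        # Start with the offset that worked for the previous bake.
        order.append(last_good_offset)
    order.extend(o for o in candidate_offsets if o != last_good_offset)
    for offset in order:
        if bake(offset):
            return offset
    return None

# Stand-in for a real bake: pretend only an offset of 2.5 bakes cleanly.
attempts = []

def fake_bake(offset_y):
    attempts.append(offset_y)
    return offset_y == 2.5

candidates = [0.0, 0.5, 1.0, 2.5]
first = bake_with_offset_retry(fake_bake, candidates)
second = bake_with_offset_retry(fake_bake, candidates, last_good_offset=first)
print(first, second, attempts)
```

The first call walks through all four candidates before succeeding, while the second call succeeds on its first attempt because it starts from the remembered offset.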

@InfernalWAVE

I am also encountering this issue using rigidbodies to update the navigation mesh. Sometimes the boxes just happen to pile up or land in such a way that preventing edges from being too close is impossible without snapping the landing location of the rigidbody to a grid, which sort of ruins the physics. It would be really nice if there was some kind of internal fallback like slumberface mentioned for when these sync errors occur.

@Scony
Contributor

Scony commented Feb 4, 2024

I've drafted a fix for this issue, but I'd like to test with any of the demos mentioned in this issue. Unfortunately none of those work in Godot 4.3.

@tcoxon @slumberface or @InfernalWAVE could any of you upload an updated version of the demo project that reproduces the issue?

@Scony
Contributor

Scony commented Feb 4, 2024

Btw. This issue was discussed in the past: #56786

@akien-mga added this to the 4.3 milestone Feb 12, 2024

@Scony
Contributor

Scony commented Feb 12, 2024

Just in case someone pops into this issue again. If you're certain that:

  • cell_size and cell_height on the navmesh match the navigation map values.
  • The mesh contains no crossing edges or overlapping polygons.
  • The vertices on the mesh are not especially dense. They're all spaced out wider than cell_size/cell_height.

then change navigation/3d/merge_rasterizer_cell_scale to 0.001 - that will fix the problem. See #87959 for details.

@Zylann
Contributor

Zylann commented Mar 27, 2024

I'm getting a similar error for a mesh baked from code. cell_size and cell_height match, and I think the mesh does not contain overlapping polygons. It is a mesh produced by the Transvoxel algorithm (similar to marching cubes, with voxel size 1), procedurally at runtime, and it can be modified by players at will, so there are overhangs and I can't really "decide" for it to look a certain way or another. Triangles are evenly spaced, though they can inherently be quite thin and some vertices can get close, as per the nature of marching cubes, so I don't know how much of a problem the third point is. That said, with the way Transvoxel works, while vertices can get close, there can't be more than a certain number of them in the same area. All navigation settings are left at their defaults.
image
image

I dumped my mesh data to test into a minimal project, and I get the same error both from code and from manual setup (only the first time though, whichever is opened first?), and for some reason only the editor one shows preview polygons.
VoxelNavmesh.zip
I feel like this should just work without so much fuss, and I really need something that can run efficiently and continuously at runtime without randomly erroring (so I'm trying to avoid as many heavy processing steps as I can). At the moment I'm not sure what to do.
I'm on Godot 4.2 so haven't tried #87959 yet

@Scony
Contributor

Scony commented Mar 27, 2024

@Zylann for 4.2 you can only play a bit with what @smix8 mentioned here: #85548
For 4.3 go for #87959

@MGilleronFJ

MGilleronFJ commented Mar 27, 2024

I tried increasing cell size to 0.5:
Each "quad" in my mesh has length 1 so I thought 0.25 is probably way too small regarding performance, especially when baking a relatively large terrain at runtime, and 0.5 is still pretty small;
The error no longer occurs, however the navmesh is all wrong, only produces sparse disjoint pieces:
image
None of the polygons connect here even though areas should clearly be walkable
image
Something similar occurs if I try with an almost-flat mesh:
image
Higher sizes are worse.

Looks like increasing Max Climb to 1 fixes it. 0.25 seemed quite small, considering 1 would be the height of a Minecraft block?
Unfortunately the sync error is back again :(
image

Further increased cell size and height to 1, to no avail, the error keeps happening.
image

Also, every time I test I have to restart the editor, because sometimes the error is only logged the first time?

Tried to increase Sample Max Error to 2, and it seems that for this mesh, the error no longer happens. Not sure I understand what this property affects though. Will have to try on a larger scale.

I'd like to emphasize that terrain (and props that might go on it) can't be tweaked to "perfection" in the editor, it's runtime-generated content, which can range from flat ground to pretty wild player-edited terrain. So far the system feels way too sensitive to "errors" and tedious to adjust in my situation (I'll have to keep tinkering each time that error shows up in some random situation, so I'm not very confident regarding issues players might encounter), so I'm wondering how viable it will be as a result. Or maybe I'm missing some other options?

@Zylann
Contributor

Zylann commented Mar 28, 2024

I tested on a larger scale (view distance 128):
image

The error happened, but I still don't know where and how much.
So I modified nav_map.cpp in such a way it can report the position of the first point of the offending edges. Then I spawn a red cube on every position.

It turns out, on that whole scene, it only occurred 5 times, and always at these two points near the border:
image
Wireframe view:
image

There is indeed a navigation polygon that looks a bit dodgy, however I don't know how I should avoid this.

I tried generating the terrain from different locations, each time it seems the issue occurs at a handful of points at the border of an unloaded area:
image
image
image

So I thought, maybe there is a pattern then, and maybe there is a way to deal with this?
Turns out, my sample_max_error was still set to 3, which I think causes the navmesh to clip through the ground in some areas, resulting in lower quality. So I set it back to 1, and a lot more errors happened, maybe 50? Most of them were also near borders, but some started occurring within the map too:
image
image

I then tried setting sample_max_error to 5. Nope, still loads of errors. It's actually not reliable; with procedural terrain it's like rolling dice with Recast.

So it sounds like I'm at a dead-end trying to make this system error-free on marching cubes terrain...

I tested setting merge_rasterizer_cell_scale to 0.001 in Godot 4.3, and indeed the error went away, but it doesn't change the navmesh, so it only "silences" it rather than fixing anything.
Later on I may have to find ways to chunk the navmesh using a region per chunk, to allow cheaper modification of the terrain by players, which would require efficient connection between the regions. But then I hear that this workaround would prevent this?
