Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MapDataController::getMapDataWithGeometry returns incomplete Ways #4980

Closed
westnordost opened this issue May 1, 2023 · 16 comments · Fixed by #4988
Closed

MapDataController::getMapDataWithGeometry returns incomplete Ways #4980

westnordost opened this issue May 1, 2023 · 16 comments · Fixed by #4988
Labels

Comments

@westnordost
Copy link
Member

westnordost commented May 1, 2023

I think as part of @Helium314 's extensive caching PR or even a bit earlier than than, we made bounding box queries to local map data mimic the behavior of the OpenStreetMap API:

  1. get all nodes in bounding box
  2. get all ways that use any of these nodes
  3. get all relations that use any of these nodes or any of these ways

However, we have overlooked something: The OpenStreetMap API never returns incomplete ways. In other words, if a way is included in the result set, it always also includes all nodes of the way, even if some of these nodes are not in the given bounding box. Currently, MapDataController::getMapDataWithGeometry returns all ways that have nodes within the given bounding box, but omits nodes of these ways that are not within the given bounding box.

This inconsistency has not been deliberate, but an oversight (at least from my part). I am pretty sure this is the source of #4925, maybe others. It definitely affects #4976.

A fix would look like this in the ElementDao (plus tests)...

--- a/app/src/main/java/de/westnordost/streetcomplete/data/osm/mapdata/ElementDao.kt
+++ b/app/src/main/java/de/westnordost/streetcomplete/data/osm/mapdata/ElementDao.kt
@@ -45,12 +45,19 @@ class ElementDao(
 
     fun getAll(bbox: BoundingBox): List<Element> {
         val nodes = nodeDao.getAll(bbox)
-        val nodeIds = nodes.map { it.id }
+        val nodeIds = nodes.map { it.id }.toSet()
         val ways = wayDao.getAllForNodes(nodeIds)
         val wayIds = ways.map { it.id }
-        val relations = relationDao.getAllForElements(nodeIds = nodeIds, wayIds = wayIds)
-        val result = ArrayList<Element>(nodes.size + ways.size + relations.size)
+        val additionalWayNodeIds = ways
+            .asSequence()
+            .flatMap { it.nodeIds }
+            .filter { it !in nodeIds }
+            .toList()
+        val additionalNodes = nodeDao.getAll(additionalWayNodeIds)
+        val relations = relationDao.getAllForElements(nodeIds = additionalWayNodeIds + nodeIds, wayIds = wayIds)
+        val result = ArrayList<Element>(nodes.size + additionalNodes.size + ways.size + relations.size)
         result.addAll(nodes)
+        result.addAll(additionalNodes)
         result.addAll(ways)
         result.addAll(relations)
         return result

but I think the MapDataCache also would need to be changed somewhat. I looked a bit into that, and it'd be more than a few lines of code. @Helium314, what do you think? Do you think my assessment is correct? And if yes, would you be willing to take care of changing the MapDataCache to this effect?

@Helium314
Copy link
Collaborator

Helium314 commented May 1, 2023

Adjusting the cache would most likely need some more work... sure, I would do it.

Currently nodes are only stored in SpatialCache, but this won't be feasible if we also want to cache nodes that basically might be anywhere.
My idea is adding some cache (like wayCache and relationCache for those extra nodes, but it will be a bit tricky making this work nicely with adding/removing/trimming tiles from SpatialCache.

I think the MapDataCache should get the ability to fetch nodes from DB. This would remove the need to solve every edge case, as sometimes it might be necessary to loop over every node of every cached way (and this might be slower than fetching a few nodes from db).

[Edit: I overlooked the wayIdsByNodeIdCache that should make things less bad, but still I think simply re-fetching "outside nodes" from DB when necessary would likely simplify things]

@westnordost
Copy link
Member Author

Regarding nodeCache, I was suspecting something like this when reading the code. I don't follow though why the MapDataCache should have (direct??) access to the DB... indirectly, sure, this is how it currently works, isn't it.

If you are saying that fetching nodes outside of the bbox without caching, ... well, this basically would need to be done for every single bbox no? There will always be one way or another that is not completely contained within that bbox.

@Helium314
Copy link
Collaborator

If you are saying that fetching nodes outside of the bbox without caching, ... well, this basically would need to be done for every single bbox no?

In the initial bbox fetch for those ways, the nodes are included and thus will be in the nodeCache.
The DB access is meant for cases where some of the nodes that were in SpatialCache are removed.

Though maybe if SpatialCache.trim would return removed nodes, it should be possible to find necessary ones and add them to the nodeCache.
But when trimming because of memory pressure, responding with a (temporary and likely small) increase of memory use might be bad.

@Helium314
Copy link
Collaborator

Biggest potential issue when MapDataCache is not allowed to fetch nodes:
SpatialCache may remove tiles without explicitly calling trim. This can be when on replaceAllInBBox and get(bbox) call trim, or when replaceAllInBBox removes incomplete tiles.
But MapDataCache would need to know if anything is removed from SpatialCache, and put it to nodeCache if it's part of a cached way.

So even returning removed nodes on trim and putting some of them to nodeCache likely won't guarantee that all nodes of cached ways remain cached.

See master...Helium314:SCEE:complete_ways for more comments.

@westnordost
Copy link
Member Author

Disclaimer, the cache is your brainchild, which is why

  1. my understanding of it without deeply putting my head into it is limited and
  2. I guess fetching the nodes from DB will result in somewhat of a performance impact because it would have to be done on virtually every bbox query and I know that performance optimization is very important to you. So if you say that caching nodes outside their bbox is not feasible, I trust your assessment.

And, I guess, optimizations could always be done later, after the fix. If we have a working version first, we could also have tests for that first, so when finally coming to the performance optimizations, we can be more sure that everything is working as expected because it is covered by tests.
In any case, maybe this is a case where test-driven development would be useful. There should be tests anyway, might as well start with them.

private val nodeCache = HashMap<Long, Node>() // what capacity? some estimate would be useful...

initialCapacity

I've looked at the other todos but I can't be of any help without wrapping my head completely around the caching stuff again.

@Helium314
Copy link
Collaborator

Helium314 commented May 2, 2023

So I'll start with fetching nodes from DB and simply clearing nodeCache on trim, and do some performance logging.
Current version of moving some nodes from spatialCache to nodeCache on trim is ridiculously slow.

@Helium314
Copy link
Collaborator

After some testing I think fetching nodes from DB is fine. It's necessary only after trimming, which does not occur frequently, and those nodes are put to nodeCache anyway, so no need to fetch them again until the next trim.
Even if a fetch from node DB is necessary, it usually takes around 50% of the time spent in getMapDataWithGeometry. So no huge slowdown.

@Helium314
Copy link
Collaborator

I stumbled upon a weird issue that might be a bug in the cache. Looks like it's triggered by tilesRect.asBoundingBox(zoom).enclosingTilesRect(zoom) sometimes not being equal to the intial rect.

e.g. for TilesRect(71512, 45464, 71513, 45465) it's equal at zoom 18, but not at zoom 17.

This leads to incomplete tiles in SpatialCache.

The issue with the cache seems to be that these removed incomplete tiles are added to SpatialCache again as empty tiles when fetching newly added nodes form MapDatCache. On the next fetch, MapDataCache only sees that these tiles exist, but actually they are empty (and should not be in SpatialCache).

@westnordost
Copy link
Member Author

Oh wow, that's cool! I mean, that you maybe found some bug. Bugs in the cache can lead to all sorts of esoteric problems. I suggest you write a test for this and fix it in a separate branch though, it has nothing to do with this here, right?

@Helium314
Copy link
Collaborator

it has nothing to do with this here, right?

No, only that I noticed it as part of this work.
Fix for cache is already done, test not.

But the tilesRect.asBoundingBox(zoom).enclosingTilesRect(zoom) behavior is a bug too, right? At very least it should be consistent.
Here I have no idea how to fix it... I only remember there were similar issues with bbox -> tilesRect -> bbox which you fixed.

@westnordost
Copy link
Member Author

Hmm, not sure about that. Maybe we conversed about this already.

Maybe one of the two is inclusive on one edge and the other is exclusive on one edge. There is some lengthy comment in TilesRect.kt

@westnordost
Copy link
Member Author

But I think you are right... TilesRect.asBoundingBox should return a bounding box that is floating-point-imprecision smaller than the tilesrect. So if you call BoundingBox.enclosingTilesRect, it should return again the same TilesRect.

@Helium314
Copy link
Collaborator

I had a suspicion that tilesRect.asBoundingBox(zoom).enclosingTilesRect(zoom) means that TilesRectTest.convert bbox to tiles rect and back results in same bbox might sometimes fail too.
The test works for p(53.0, 9.0) and most other positions I tried, but fails sometimes, e.g. for p(85.049, -179.989) and p(48.179, 16.414).

@westnordost
Copy link
Member Author

Because of float imprecision I presume?

@Helium314
Copy link
Collaborator

I guess, but didn't investigate any further

westnordost added a commit that referenced this issue May 5, 2023
@Helium314
Copy link
Collaborator

@westnordost this bug may still happen in a more subtle way. When playing with an update to the insert node capabilities of SCEE, I had a crash because node -16 was not in the map data.

Looks like we completely overlooked MapDataWithEditsSource.modifyBBoxMapData...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants