-
Notifications
You must be signed in to change notification settings - Fork 342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TopologyPreservingSimplifier is non-stable for some MultiLineString inputs #667
Comments
Confirmed, still in PostGIS, but probably in GEOS? Will push this into a unit test to see.
|
Well, pulling that test back into GEOS, I'm seeing nothing. No instability over multiple calls. It's very odd indeed. Also no particularly strange usage patterns in PostGIS that might lead to bad memory handling or anything. I also re-wrote the call in PostGIS to push the work into liblwgeom, but that changed nothing either. The results are still quite unstable, and you can see them in things like the ST_Length() of the simplified output, the length(ST_AsText()) of the simplified output (!!) and of course the difference of the output against itself. It looks a lot like some old bugs where the functions scribbled into the memory space where the input was copied, but that is not the case here: the path into GEOS takes a full copy and the path out writes a fresh memory buffer. I'm stumped at this point. Oh, I also built GEOS with an address sanitizer but it didn't notice any places where memory was being written to out of bounds. I added that test to the main CI, so that's a net improvement, but still no closer to knowing what's going wrong here. |
Continuing with the investigation, on the https://github.com/pramsey/geos/tree/main-gh667 branch, there is now a test in TopologyPreservingSimplifierTest.cpp that generates differing outputs from the same input. for (std::size_t i = 0; i < 20; ++i) {
GeomPtr g(wktreader.read(wkt));
ensure_equals("length test tolerance", g->getLength(), 156783.87824769085, 0.00001);
GeomPtr simp = TopologyPreservingSimplifier::simplify(g.get(), 1222);
double simp_len = simp->getLength();
std::cout.precision(14);
std::cout << "got simp length " << simp_len << std::endl;
// ensure_equals(simp_len, 155774.1823328273);
} The implication is that the problem is in fact back in the C++ code, so we can eliminate the CAPI and PostGIS as factors. The fun behaviour in PgSQL comes from the fact that the PgSQL executor is inlining calls to ST_SimplifyPreserveTopology() from the subquery up to the calling query, resulting in multiple independent evaluations of the function which, as we now see in the unit tests, can result in different answers. All that said though, I'm putting this aside for now, as it's not super critical that the results of TopologyPreservingSimplifier::simplify() be stable, the results are very nearly the same, and I can detect neither memory leaks (via valgrind) nor bad reads/writes via -fsanitize. Not closing though, would still like to track this down. |
I noticed that
geos/src/simplify/TopologyPreservingSimplifier.cpp Lines 308 to 310 in c68119c
I'm not familiar with the TPS algorithm but I would expect that to get stability we would need to use Maybe this is somehow related to https://trac.osgeo.org/geos/ticket/1081 |
I think you may have nailed it. The whole core of that problem is not so much getting "wrong" results as getting "slightly different" results, and a stable answer would fix the whole thing. |
This is reflecting the JTS implementation exactly. A Are the addresses of the elements of the input Geometry stable over the duration of the execution of the TPS algorithm? If so, it seems like this approach should work identically to the JTS algorithm. And in fact it seems like things would go badly wrong if this is not the case. @dbaston thoughts? |
Oh wait, I get it... the problem is the unordered map, which produces a non-deterministic ordering of the elements in it. The Topology-Preserving simplification is dependent on the order the simplification of lines is carried out. This is a likely a problem in the JTS code as well, it has just never been noticed. I'll test this out. One way to handle this is to put the extracted line elements into an ordered list as well, and then simplify the list items in their (deterministic) order. The map is still needed to allow reconstructing the simplified output geometry. There may also be a simpler way to approach this, by simplifying the lines as they are scanned. But that's a bigger change. |
Great, thanks ! |
I have a question.
It seems ST_SIMPLIFYPRESERVETOPOLOGY (from postgis, which documentation says it is performed by the geos module) sends varying data with some MULTILINESTRING inputs
It also seems that the longer the MULTILINESTRING, the greater the chance of variation.
My question: is this a bug or something to be expected ?
example SQL : (run a few times, check that diff is sometimes not empty)
result (screen from pgAdmin)
Version:
POSTGIS="3.2.2 628da50" [EXTENSION] PGSQL="140" GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" GDAL="GDAL 3.2.2, released 2021/03/05" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER
(not sur if this should be here or on postgis side, so I opened both https://trac.osgeo.org/postgis/ticket/5225#ticket)
The text was updated successfully, but these errors were encountered: