Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failures with GEOS master / 3.10 #233

Closed
jorisvandenbossche opened this issue Nov 6, 2020 · 35 comments · Fixed by #410
Closed

Failures with GEOS master / 3.10 #233

jorisvandenbossche opened this issue Nov 6, 2020 · 35 comments · Fixed by #410
Milestone

Comments

@jorisvandenbossche
Copy link
Member

I was testing with GEOS master (in light of the new OverlayNG), and next to 2 failures related to the new overlay implementation (which I am fixing in #232), there are also 2 other failures:

_______________________________________ test_points_invalid_ndim ________________________________________

    def test_points_invalid_ndim():
        with pytest.raises(pygeos.GEOSException):
>           pygeos.points([0, 1, 2, 3])
E           Failed: DID NOT RAISE <class 'pygeos.GEOSException'>

pygeos/test/test_creation.py:41: Failed
_______________________________________ test_from_wkb_empty[POLYGON EMPTY] _____________________________

wkt = 'POLYGON EMPTY'

    @pytest.mark.parametrize(
        "wkt", ("POINT EMPTY", "LINESTRING EMPTY", "POLYGON EMPTY", "GEOMETRYCOLLECTION EMPTY")
    )
    def test_from_wkb_empty(wkt):
        wkb = pygeos.to_wkb(pygeos.Geometry(wkt))
        geom = pygeos.from_wkb(wkb)
        assert pygeos.is_geometry(geom).all()
        assert pygeos.is_empty(geom).all()
>       assert pygeos.to_wkb(geom) == wkb
E       AssertionError: assert b'\x01\x03\x0...0\x00\x00\x00' == b'\x01\x03\x0...0\x00\x00\x00'
E         At index 4 diff: b'\x80' != b'\x00'
E         Use -v to get the full diff

pygeos/test/test_io.py:121: AssertionError

The first case now seems to ignore the 4th dimension, and create a POINT Z:

(Pdb) pygeos.points([0, 1, 2, 3])
<pygeos.Geometry POINT Z (0 1 2)>

In the second case, the WKB of an empty polygon differs depending on whether the empty polygon was created from WKT or WKB:

(Pdb) geom1 = pygeos.Geometry("POLYGON EMPTY")
(Pdb) geom1
<pygeos.Geometry POLYGON EMPTY>
(Pdb) pygeos.to_wkb(geom1)
b'\x01\x03\x00\x00\x00\x00\x00\x00\x00'
(Pdb) geom2 = pygeos.from_wkb(pygeos.to_wkb(geom1))
(Pdb) geom2
<pygeos.Geometry POLYGON EMPTY>
(Pdb) pygeos.to_wkb(geom2)
b'\x01\x03\x00\x00\x80\x00\x00\x00\x00'
@jorisvandenbossche
Copy link
Member Author

So for the WKB issue, in my example above, geom1 gives a coordinate dimension of 2, and geom2 of 3 (using pygeos.get_coordinate_dimension. But pygeos.has_z gives False for both. That seems like something buggy in GEOS.

@jorisvandenbossche
Copy link
Member Author

The WKB issue is already fixed by @pramsey: libgeos/geos@fa24fb2

@mwtoews
Copy link
Contributor

mwtoews commented Nov 9, 2020

The issue was somewhat describe in GEOS Trac #1048. I haven't had a chance to investigate the new behaviour yet...

@jorisvandenbossche
Copy link
Member Author

Note that the second issue mentioned above (the dimensionality in WKB roundtrips) is already fixed, see the commit I linked to above (I reported it on the GEOS chat channel, and Paul promptly fixed it)

The first issue is

(Pdb) pygeos.points([0, 1, 2, 3])
<pygeos.Geometry POINT Z (0 1 2)>

while before this raised an error. This relates to this code in pygeos:

pygeos/src/ufuncs.c

Lines 1491 to 1503 in 2e65c2d

static void points_func(char** args, npy_intp* dimensions, npy_intp* steps, void* data) {
GEOS_INIT;
SINGLE_COREDIM_LOOP_OUTER {
CREATE_COORDSEQ(1, n_c1);
SINGLE_COREDIM_LOOP_INNER {
double coord = *(double*)cp1;
SET_COORD(0, i_c1);
}
GEOSGeometry* ret_ptr = GEOSGeom_createPoint_r(ctx, coord_seq);
CHECK_RET_PTR;
OUTPUT_Y;
}

and

pygeos/src/ufuncs.c

Lines 26 to 31 in 2e65c2d

#define SET_COORD(N, DIM) \
if (!GEOSCoordSeq_setOrdinate_r(ctx, coord_seq, N, DIM, coord)) { \
GEOSCoordSeq_destroy_r(ctx, coord_seq); \
errstate = PGERR_GEOS_EXCEPTION; \
goto finish; \
}

So before, this raised an error if you tried to create a Point with a 4 dimensional coord sequence, or when trying to create a 4 dimensional coord sequence (not directly sure which of the two raised the error)

So it might be related to the changes that were done to fix the WKB/WKT dimensionality issues, but it is not about WKB/WKT itself.

@jorisvandenbossche
Copy link
Member Author

So with GEOS 3.8, the error we currently get is:

In [19]: pygeos.points([0, 1, 2, 3])
...
GEOSException: IllegalArgumentException: Unknown ordinate index 0

This error comes from setOrdinate in GEOS: https://github.com/libgeos/geos/blob/26cd4782faf7451c7a3df6a33a462b2669ef5153/include/geos/geom/FixedSizeCoordinateSequence.h#L63-L81
But this code itself hasn't changed since GEOS 3.8.0, so not directly sure why we don't get this error anymore when using GEOS master

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Nov 9, 2020

Ah, got it. libgeos/geos@8e29f76 accidentally changed the error return value for GEOSCoordSeq_setOrdinate_r from 0 to 1, so we no longer catch the error.

Should be an easy fix, can do a PR in a few days unless someone already wants to give it a go (would ideally add a test for the GEOS C API as well):

--- a/capi/geos_ts_c.cpp
+++ b/capi/geos_ts_c.cpp
@@ -2164,7 +2164,7 @@ extern "C" {
     GEOSCoordSeq_setOrdinate_r(GEOSContextHandle_t extHandle, CoordinateSequence* cs,
                                unsigned int idx, unsigned int dim, double val)
     {
-        return execute(extHandle, 1, [&]() {
+        return execute(extHandle, 0, [&]() {
             cs->setOrdinate(idx, dim, val);
             return 1;
         });

@jorisvandenbossche
Copy link
Member Author

I made a small PR to fix the above explained issue: libgeos/geos#354

But, we have a new bunch of failures: a large part of the STRtree tests are failing with GEOS master (which is about to be released). See eg https://github.com/pygeos/pygeos/runs/1465460418#step:9:86

I mailed geos-devel about it: https://lists.osgeo.org/pipermail/geos-devel/2020-November/009870.html
I don't yet directly see what in our code could cause this ..

@caspervdw
Copy link
Member

@jorisvandenbossche This is solved right?

@jorisvandenbossche
Copy link
Member Author

We still have a few failures on GEOS master:

_____________________ [doctest] pygeos.constructive.buffer _____________________
075         Crops of 'mitre'-style joins if the point is displaced from the
076         buffered vertex by more than this limit.
077     single_sided : bool
078         Only buffer at one side of the geometry.
079 
080     Examples
081     --------
082     >>> buffer(Geometry("POINT (10 10)"), 2, quadsegs=1)
083     <pygeos.Geometry POLYGON ((12 10, 10 8, 8 10, 10 12, 12 10))>
084     >>> buffer(Geometry("POINT (10 10)"), 2, quadsegs=2)
Expected:
    <pygeos.Geometry POLYGON ((12 10, 11.4 8.59, 10 8, 8.59 8.59, 8 10, 8.59 11....>
Got:
    <pygeos.Geometry POLYGON ((12 10, 11.414 8.586, 10 8, 8.586 8.586, 8 10, 8.5...>

/home/runner/work/pygeos/pygeos/pygeos/constructive.py:84: DocTestFailure
_________________________________ test_eq_nan __________________________________

    def test_eq_nan():
>       assert point_nan != point_nan
E       assert <pygeos.Geometry POINT EMPTY> != <pygeos.Geometry POINT EMPTY>

pygeos/test/test_geometry.py:234: AssertionError
_________________________________ test_neq_nan _________________________________

    def test_neq_nan():
>       assert not (point_nan == point_nan)
E       assert not <pygeos.Geometry POINT EMPTY> == <pygeos.Geometry POINT EMPTY>

pygeos/test/test_geometry.py:238: AssertionError
_________________________________ test_set_nan _________________________________

    def test_set_nan():
        # As NaN != NaN, you can have multiple "NaN" points in a set
        # set([float("nan"), float("nan")]) also returns a set with 2 elements
        a = set(pygeos.points([[np.nan, np.nan]] * 10))
>       assert len(a) == 10  # different objects: NaN != NaN
E       assert 1 == 10
E        +  where 1 = len({<pygeos.Geometry POINT EMPTY>})

pygeos/test/test_geometry.py:245: AssertionError
______________________________ test_geos_version _______________________________

    def test_geos_version():
        expected = "{0:d}.{1:d}.{2:d}".format(*pygeos.geos_version)
>       assert pygeos.geos_version_string == expected
E       AssertionError: assert '3.10.0dev' == '3.10.0'
E         - 3.10.0
E         + 3.10.0dev
E         ?       +++

pygeos/test/test_misc.py:23: AssertionError
____________________________ test_geos_capi_version ____________________________

    @pytest.mark.skipif(
        sys.platform.startswith("win") and pygeos.geos_version[:2] == (3, 7),
        reason="GEOS_C_API_VERSION broken for GEOS 3.7.x on Windows",
    )
    def test_geos_capi_version():
        expected = "{0:d}.{1:d}.{2:d}-CAPI-{3:d}.{4:d}.{5:d}".format(
            *(pygeos.geos_version + pygeos.geos_capi_version)
        )
>       assert pygeos.geos_capi_version_string == expected
E       AssertionError: assert '3.10.0dev-CAPI-1.15.0' == '3.10.0-CAPI-1.15.0'
E         - 3.10.0-CAPI-1.15.0
E         + 3.10.0dev-CAPI-1.15.0
E         ?       +++

pygeos/test/test_misc.py:34: AssertionError

For the doctests, we could maybe limit ourselves to run the doctests on a single GEOS / Python combo, eg latest versions (printing precision changed in GEOS 3.10?).

For the geos version string failures, see also #262 (comment)

The empty point / nan failures is something to further look into

@pramsey
Copy link

pramsey commented Feb 9, 2021

assert <pygeos.Geometry POINT EMPTY> != <pygeos.Geometry POINT EMPTY>

This seems wrong to me. It confuses the WKB representation of a thing (two nans for empty) with the thing itself (POINT EMPTY). While yes, Nan != Nan, that's not what should be tested. POINT EMPTY == POINT EMPTY. Just like LINESTRING EMPTY == LINESTRING EMPTY.

@jorisvandenbossche
Copy link
Member Author

Indeed POINT EMPTY should equal POINT EMPTY. But I assume the test was supposed to check POINT(nan nan) != POINT(nan nan) (which I think is expected?), and something changed in the construction of POINT(nan nan).

@pramsey
Copy link

pramsey commented Feb 9, 2021

Yes, it's possible that WKB logic leaked into the constructor, I had to wrestle with POINT EMPTY WKB issues earlier.

@jorisvandenbossche
Copy link
Member Author

So with GEOS 3.8 (and I assume 3.9 as well), we have:

In [39]: pygeos.points(np.nan, np.nan)
Out[39]: <pygeos.Geometry POINT (nan nan)>

But with GEOS master the above results in empty points

Or possibly related with libgeos/geos@693c1aa? Which makes that POINT(nan nan) is regarded as empty

@pramsey
Copy link

pramsey commented Feb 9, 2021

The change came from a pygeos/shapely ticket... https://trac.osgeo.org/geos/ticket/1049

I dunno. It's a stupid thing to spend too much time on, as a Point(nan nan) is a fundamentally useless thing. Do we want to be able to represent it with fidelity? Do we really care how it behaves?

@caspervdw
Copy link
Member

I am comfortable with making POINT (nan nan) convert automatically to POINT EMPTY in the constructor. Completely agree with the fact that we should not spend to much time on this.

I will have a look at fixing our tests.

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Feb 26, 2021

Yeah, sorry for the slow reply, and to be clear @pramsey I didn't mean to say that you should spend more time on this topic ;). I was just trying to understand what changed.

@caspervdw
Copy link
Member

We have one failure left on GEOS master:

 _____________________ [doctest] pygeos.constructive.buffer _____________________
076         Crops of 'mitre'-style joins if the point is displaced from the
077         buffered vertex by more than this limit.
078     single_sided : bool
079         Only buffer at one side of the geometry.
080 
081     Examples
082     --------
083     >>> buffer(Geometry("POINT (10 10)"), 2, quadsegs=1)
084     <pygeos.Geometry POLYGON ((12 10, 10 8, 8 10, 10 12, 12 10))>
085     >>> buffer(Geometry("POINT (10 10)"), 2, quadsegs=2)
Expected:
    <pygeos.Geometry POLYGON ((12 10, 11.4 8.59, 10 8, 8.59 8.59, 8 10, 8.59 11....>
Got:
    <pygeos.Geometry POLYGON ((12 10, 11.414 8.586, 10 8, 8.586 8.586, 8 10, 8.5...>

Seems to be a rounding difference, maybe just in the representation.

@pramsey
Copy link

pramsey commented Mar 1, 2021

Hard to tell without seeing the whole object, but you might want to ensure the result and the expected are both normalized so you don't have failures just because of things like ordinate order.

@caspervdw caspervdw added this to the 0.11 milestone May 19, 2021
@jorisvandenbossche
Copy link
Member Author

We have a bunch of new failures with GEOS master related to the creation of LinearRings, especially with too few coordinates:

With GEOS 3.9.1, we get:

>>> pygeos.linearrings([(0.0, 0.0), (1.0, 1.0)])
...
GEOSException: IllegalArgumentException: Invalid number of points in LinearRing found 3 - must be 0 or >= 4

while with GEOS master this actually creates an (invalid?) LinearRing:

>>> pygeos.geos_version_string
'3.10.0dev'

>>> pygeos.linearrings([(0.0, 0.0), (1.0, 1.0)]) 
<pygeos.Geometry LINEARRING (0 0, 1 1, 0 0)>

Was this a deliberate change in GEOS? (in which case we probably can easily check the length ourselves and still raise an error if we want)

@jorisvandenbossche
Copy link
Member Author

@pramsey this changed in libgeos/geos@5156001#diff-41827735da42696d53e556926e2a31c49970245de846ac5b4169f506bcdc7f81R65 (the commit porting the new MakeValid implementation) because MINIMUM_VALID_SIZE was changed from 4 to 3.
Quoting the added comment:

A ring with 3 points is invalid, because it is collapsed and thus has a self-intersection. It is allowed to be constructed so that it can be represented, and repaired if needed.

That seems quite deliberate ;)
So if we want to keep disallowing creating such invalid LinearRing in the Python API, we can check the length ourselves?

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Oct 14, 2021

It seems we have a whole bunch of failures with GEOS dev branch (in latest CI run it was using the beta3 tag).
(we are testing against GEOS main, but since failures are not making the build fail, it's not very visible if there are new failures ..)

Groups of failures:

  • Constructing a LinearRing now accepts too few geometries (I thought we already handled that in [Done] Disallow linearrings with 3 coordinates in GEOS 3.10 #378? EDIT: this is probably because we only fixed this in the default code path, and not when specifying indices)
  • set_precision behaviour changes in how it collapses geometries
  • WKB -> geometry conversion now accepting invalid linearrings (this probably just needs a test update on our side)
  • STRtree query changes with touches/crosses predicates
  • Change in output for STRtree nearest_all
Pytest output:
================================== FAILURES ===================================
___________________ test_linearrings_invalid[coordinates0] ____________________

coordinates = [[1, 1], [2, 1], [1, 1]]

    @pytest.mark.parametrize(
        "coordinates",
        [
            ([[1, 1], [2, 1], [1, 1]]),  # too few coordinates
            ([[1, np.nan], [2, 1], [2, 2], [1, 1]]),  # starting with nan
        ],
    )
    def test_linearrings_invalid(coordinates):
        # attempt to construct linestrings with 1 coordinate
        with pytest.raises(pygeos.GEOSException):
>           pygeos.linearrings(coordinates, indices=np.zeros(len(coordinates)))
E           Failed: DID NOT RAISE <class 'pygeos.GEOSException'>

pygeos\tests\test_creation_indices.py:197: Failed
______________ test_set_precision_collapse[geometry0-expected0] _______________

geometry = <pygeos.Geometry LINESTRING (0 0, 0.1 0.1)>
expected = <pygeos.Geometry LINESTRING EMPTY>

    @pytest.mark.skipif(pygeos.geos_version < (3, 6, 0), reason="GEOS < 3.6")
    @pytest.mark.parametrize(
        "geometry,expected",
        [
            (
                pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)"),
                pygeos.Geometry("LINESTRING EMPTY"),
            ),
            (
                pygeos.Geometry("LINEARRING (0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0)"),
                pygeos.Geometry("LINEARRING EMPTY"),
            ),
            (
                pygeos.Geometry("POLYGON ((0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0))"),
                pygeos.Geometry("POLYGON EMPTY"),
            ),
        ],
    )
    def test_set_precision_collapse(geometry, expected):
        """Lines and polygons collapse to empty geometries if vertices are too close"""
>       assert pygeos.equals(pygeos.set_precision(geometry, 1), expected)
E       assert False
E        +  where False = <function equals at 0x00000240746D1790>(<pygeos.Geometry LINESTRING (0 0, 0 0)>, <pygeos.Geometry LINESTRING EMPTY>)
E        +    where <function equals at 0x00000240746D1790> = pygeos.equals
E        +    and   <pygeos.Geometry LINESTRING (0 0, 0 0)> = <function set_precision at 0x00000240746B25E0>(<pygeos.Geometry LINESTRING (0 0, 0.1 0.1)>, 1)
E        +      where <function set_precision at 0x00000240746B25E0> = pygeos.set_precision

pygeos\tests\test_geometry.py:548: AssertionError
______________ test_set_precision_collapse[geometry1-expected1] _______________

geometry = <pygeos.Geometry LINEARRING (0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0)>
expected = <pygeos.Geometry LINEARRING EMPTY>

    @pytest.mark.skipif(pygeos.geos_version < (3, 6, 0), reason="GEOS < 3.6")
    @pytest.mark.parametrize(
        "geometry,expected",
        [
            (
                pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)"),
                pygeos.Geometry("LINESTRING EMPTY"),
            ),
            (
                pygeos.Geometry("LINEARRING (0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0)"),
                pygeos.Geometry("LINEARRING EMPTY"),
            ),
            (
                pygeos.Geometry("POLYGON ((0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0))"),
                pygeos.Geometry("POLYGON EMPTY"),
            ),
        ],
    )
    def test_set_precision_collapse(geometry, expected):
        """Lines and polygons collapse to empty geometries if vertices are too close"""
>       assert pygeos.equals(pygeos.set_precision(geometry, 1), expected)
E       assert False
E        +  where False = <function equals at 0x00000240746D1790>(<pygeos.Geometry LINEARRING (0 0, 0 0, 0 0, 0 0, 0 0)>, <pygeos.Geometry LINEARRING EMPTY>)
E        +    where <function equals at 0x00000240746D1790> = pygeos.equals
E        +    and   <pygeos.Geometry LINEARRING (0 0, 0 0, 0 0, 0 0, 0 0)> = <function set_precision at 0x00000240746B25E0>(<pygeos.Geometry LINEARRING (0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0)>, 1)
E        +      where <function set_precision at 0x00000240746B25E0> = pygeos.set_precision

pygeos\tests\test_geometry.py:548: AssertionError
______________ test_set_precision_collapse[geometry2-expected2] _______________

geometry = <pygeos.Geometry POLYGON ((0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0))>
expected = <pygeos.Geometry POLYGON EMPTY>

    @pytest.mark.skipif(pygeos.geos_version < (3, 6, 0), reason="GEOS < 3.6")
    @pytest.mark.parametrize(
        "geometry,expected",
        [
            (
                pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)"),
                pygeos.Geometry("LINESTRING EMPTY"),
            ),
            (
                pygeos.Geometry("LINEARRING (0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0)"),
                pygeos.Geometry("LINEARRING EMPTY"),
            ),
            (
                pygeos.Geometry("POLYGON ((0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0))"),
                pygeos.Geometry("POLYGON EMPTY"),
            ),
        ],
    )
    def test_set_precision_collapse(geometry, expected):
        """Lines and polygons collapse to empty geometries if vertices are too close"""
>       assert pygeos.equals(pygeos.set_precision(geometry, 1), expected)
E       assert False
E        +  where False = <function equals at 0x00000240746D1790>(<pygeos.Geometry POLYGON ((0 0, 0 0, 0 0, 0 0, 0 0))>, <pygeos.Geometry POLYGON EMPTY>)
E        +    where <function equals at 0x00000240746D1790> = pygeos.equals
E        +    and   <pygeos.Geometry POLYGON ((0 0, 0 0, 0 0, 0 0, 0 0))> = <function set_precision at 0x00000240746B25E0>(<pygeos.Geometry POLYGON ((0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0))>, 1)
E        +      where <function set_precision at 0x00000240746B25E0> = pygeos.set_precision

pygeos\tests\test_geometry.py:548: AssertionError
__________________________ test_from_wkb_exceptions ___________________________

    def test_from_wkb_exceptions():
        with pytest.raises(TypeError, match="Expected bytes, got int"):
            pygeos.from_wkb(1)
    
        # invalid WKB
        with pytest.raises(pygeos.GEOSException, match="Unexpected EOF parsing WKB"):
            result = pygeos.from_wkb(b"\x01\x01\x00\x00\x00\x00")
            assert result is None
    
        # invalid ring in WKB
        with pytest.raises(
            pygeos.GEOSException,
            match="Invalid number of points in LinearRing found 3 - must be 0 or >= 4",
        ):
            result = pygeos.from_wkb(
                b"\x01\x03\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00P}\xae\xc6\x00\xb15A\x00\xde\x02I\x8e^=A0n\xa3!\xfc\xb05A\xa0\x11\xa5=\x90^=AP}\xae\xc6\x00\xb15A\x00\xde\x02I\x8e^=A"
            )
>           assert result is None
E           assert <pygeos.Geometry POLYGON ((1421568.776 1924750.285, 1421564.131 1924752.241,...> is None

pygeos\tests\test_io.py:174: AssertionError
_____________________ test_from_wkb_warn_on_invalid_warn ______________________

    def test_from_wkb_warn_on_invalid_warn():
        # invalid WKB
        with pytest.warns(Warning, match="Invalid WKB"):
            result = pygeos.from_wkb(b"\x01\x01\x00\x00\x00\x00", on_invalid="warn")
            assert result is None
    
        # invalid ring in WKB
        with pytest.warns(Warning, match="Invalid WKB"):
            result = pygeos.from_wkb(
                b"\x01\x03\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00P}\xae\xc6\x00\xb15A\x00\xde\x02I\x8e^=A0n\xa3!\xfc\xb05A\xa0\x11\xa5=\x90^=AP}\xae\xc6\x00\xb15A\x00\xde\x02I\x8e^=A",
                on_invalid="warn",
            )
>           assert result is None
E           assert <pygeos.Geometry POLYGON ((1421568.776 1924750.285, 1421564.131 1924752.241,...> is None

pygeos\tests\test_io.py:189: AssertionError
___________________ test_from_wkb_ignore_on_invalid_ignore ____________________

    def test_from_wkb_ignore_on_invalid_ignore():
        # invalid WKB
        with pytest.warns(None) as w:
            result = pygeos.from_wkb(b"\x01\x01\x00\x00\x00\x00", on_invalid="ignore")
            assert result is None
            assert len(w) == 0  # no warning
    
        # invalid ring in WKB
        with pytest.warns(None) as w:
            result = pygeos.from_wkb(
                b"\x01\x03\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00P}\xae\xc6\x00\xb15A\x00\xde\x02I\x8e^=A0n\xa3!\xfc\xb05A\xa0\x11\xa5=\x90^=AP}\xae\xc6\x00\xb15A\x00\xde\x02I\x8e^=A",
                on_invalid="ignore",
            )
>           assert result is None
E           assert <pygeos.Geometry POLYGON ((1421568.776 1924750.285, 1421564.131 1924752.241,...> is None

pygeos\tests\test_io.py:205: AssertionError

________________ test_query_crosses_lines[geometry3-expected3] ________________

line_tree = <pygeos.strtree.STRtree object at 0x000002407DF3D1F0>
geometry = <pygeos.Geometry POLYGON ((3 1, 2.981 0.805, 2.924 0.617, 2.831 0.444, 2.707...>
expected = [1]

    @pytest.mark.parametrize(
        "geometry,expected",
        [
            # point intersects first line but is completely in common with line
            (pygeos.points(0, 0), []),
            # box overlaps envelope of first 2 lines, contains first and crosses second
            (box(0, 0, 1.5, 1.5), [1]),
            # buffer intersects 2 lines
            (pygeos.buffer(pygeos.points(3, 3), 0.5), [2, 3]),
            # buffer crosses line
            (pygeos.buffer(pygeos.points(2, 1), 1), [1]),
            # envelope of points overlaps lines but intersects none
            (pygeos.multipoints([[5, 7], [7, 5]]), []),
            # only one point of multipoint intersects
            (pygeos.multipoints([[5, 7], [7, 7], [7, 8]]), []),
        ],
    )
    def test_query_crosses_lines(line_tree, geometry, expected):
>       assert_array_equal(line_tree.query(geometry, predicate="crosses"), expected)
E       AssertionError: 
E       Arrays are not equal
E       
E       (shapes (0,), (1,) mismatch)
E        x: array([], dtype=int64)
E        y: array([1])

pygeos\tests\test_strtree.py:588: AssertionError
________________ test_query_touches_lines[geometry5-expected5] ________________

line_tree = <pygeos.strtree.STRtree object at 0x000002407DF99D90>
geometry = <pygeos.Geometry POLYGON ((2.707 1, 2.694 0.862, 2.653 0.729, 2.588 0.607, 2...>
expected = []

    @pytest.mark.parametrize(
        "geometry,expected",
        [
            # point intersects first line
            (pygeos.points(0, 0), [0]),
            # point is within line
            (pygeos.points(0.5, 0.5), []),
            # point at shared vertex between 2 lines
            (pygeos.points(1, 1), [0, 1]),
            # box overlaps envelope of first 2 lines (touches edge of 1)
            (box(0, 0, 1, 1), [1]),
            # buffer intersects 2 lines but does not touch edges of either
            (pygeos.buffer(pygeos.points(3, 3), 0.5), []),
            # buffer intersects midpoint of line at tangent but there is a little overlap
            # due to precision issues
            (pygeos.buffer(pygeos.points(2, 1), HALF_UNIT_DIAG), []),
            # envelope of points overlaps lines but intersects none
            (pygeos.multipoints([[5, 7], [7, 5]]), []),
            # only one point of multipoint intersects at vertex between lines
            (pygeos.multipoints([[5, 7], [7, 7], [7, 8]]), [6, 7]),
        ],
    )
    def test_query_touches_lines(line_tree, geometry, expected):
>       assert_array_equal(line_tree.query(geometry, predicate="touches"), expected)
E       AssertionError: 
E       Arrays are not equal
E       
E       (shapes (1,), (0,) mismatch)
E        x: array([1], dtype=int64)
E        y: array([], dtype=float64)

pygeos\tests\test_strtree.py:654: AssertionError
________________ test_nearest_all_points[geometry9-expected9] _________________

tree = <pygeos.strtree.STRtree object at 0x000002407DF9A1C0>
geometry = <pygeos.Geometry POLYGON ((3.207 2.5, 3.194 2.362, 3.153 2.229, 3.088 2.107,...>
expected = [[0], [2]]

    @pytest.mark.skipif(pygeos.geos_version < (3, 6, 0), reason="GEOS < 3.6")
    @pytest.mark.parametrize(
        "geometry,expected",
        [
            (pygeos.points(0.25, 0.25), [[0], [0]]),
            (pygeos.points(0.75, 0.75), [[0], [1]]),
            (pygeos.points(1, 1), [[0], [1]]),
            # 2 equidistant points in tree
            (pygeos.points(0.5, 0.5), [[0, 0], [0, 1]]),
            ([pygeos.points(1, 1), pygeos.points(0, 0)], [[0, 1], [1, 0]]),
            ([pygeos.points(1, 1), pygeos.points(0.25, 1)], [[0, 1], [1, 1]]),
            ([pygeos.points(-10, -10), pygeos.points(100, 100)], [[0, 1], [0, 9]]),
            (box(0.5, 0.5, 0.75, 0.75), [[0], [1]]),
            # multiple points in box
            (box(0, 0, 3, 3), [[0, 0, 0, 0], [0, 1, 2, 3]]),
            (pygeos.buffer(pygeos.points(2.5, 2.5), HALF_UNIT_DIAG), [[0], [2]]),
            (pygeos.buffer(pygeos.points(3, 3), HALF_UNIT_DIAG), [[0], [3]]),
            (pygeos.multipoints([[5.5, 5], [7, 7]]), [[0], [7]]),
            (pygeos.multipoints([[5, 7], [7, 5]]), [[0], [6]]),
            # return nearest point in tree for each point in multipoint
            (pygeos.multipoints([[5, 5], [7, 7]]), [[0, 0], [5, 7]]),
            # 2 equidistant points per point in multipoint
            (pygeos.multipoints([[0.5, 0.5], [3.5, 3.5]]), [[0, 0, 0, 0], [0, 1, 3, 4]]),
        ],
    )
    def test_nearest_all_points(tree, geometry, expected):
>       assert_array_equal(tree.nearest_all(geometry), expected)
E       AssertionError: 
E       Arrays are not equal
E       
E       (shapes (2, 2), (2, 1) mismatch)
E        x: array([[0, 0],
E              [2, 3]], dtype=int64)
E        y: array([[0],
E              [2]])

pygeos\tests\test_strtree.py:1390: AssertionError
_________________ test_nearest_all_lines[geometry5-expected5] _________________

line_tree = <pygeos.strtree.STRtree object at 0x000002407E272940>
geometry = <pygeos.Geometry POLYGON ((3.207 2.5, 3.194 2.362, 3.153 2.229, 3.088 2.107,...>
expected = [[0, 0], [1, 2]]

    @pytest.mark.skipif(pygeos.geos_version < (3, 6, 0), reason="GEOS < 3.6")
    @pytest.mark.parametrize(
        "geometry,expected",
        [
            (pygeos.points(0.5, 0.5), [[0], [0]]),
            # at junction between 2 lines, will return both
            (pygeos.points(2, 2), [[0, 0], [1, 2]]),
            # contains one line, intersects with another
            (box(0, 0, 1, 1), [[0, 0], [0, 1]]),
            # overlaps 2 lines
            (box(0.5, 0.5, 1.5, 1.5), [[0, 0], [0, 1]]),
            # second box overlaps 2 lines and intersects endpoints of 2 more
            ([box(0, 0, 0.5, 0.5), box(3, 3, 5, 5)], [[0, 1, 1, 1, 1], [0, 2, 3, 4, 5]]),
            (pygeos.buffer(pygeos.points(2.5, 2.5), HALF_UNIT_DIAG), [[0, 0], [1, 2]]),
            (pygeos.buffer(pygeos.points(3, 3), HALF_UNIT_DIAG), [[0, 0], [2, 3]]),
            # multipoints at endpoints of 2 lines each
            (pygeos.multipoints([[5, 5], [7, 7]]), [[0, 0, 0, 0], [4, 5, 6, 7]]),
            # second point in multipoint at endpoints of 2 lines
            (pygeos.multipoints([[5.5, 5], [7, 7]]), [[0, 0], [6, 7]]),
            # multipoints are equidistant from 2 lines
            (pygeos.multipoints([[5, 7], [7, 5]]), [[0, 0], [5, 6]]),
        ],
    )
    def test_nearest_all_lines(line_tree, geometry, expected):
>       assert_array_equal(line_tree.nearest_all(geometry), expected)
E       AssertionError: 
E       Arrays are not equal
E       
E       (shapes (2, 3), (2, 2) mismatch)
E        x: array([[0, 0, 0],
E              [1, 2, 3]], dtype=int64)
E        y: array([[0, 0],
E              [1, 2]])

pygeos\tests\test_strtree.py:1417: AssertionError

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Oct 14, 2021

Making the changes between GEOS 3.9.1 and GEOS 3.10.0 a bit more visible (compared to the test output). Summary seems to be that most of them are precision issues in our tests, only the set_precision changes seems an actual change in GEOS (but maybe deliberate?)

GEOSGeom_setPrecision behaviour when "collapsing" geometries:

>>> pygeos.set_precision(pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)"), grid_size=1)
<pygeos.Geometry LINESTRING EMPTY>       # <-- with GEOS 3.9.1
<pygeos.Geometry LINESTRING (0 0, 0 0)>  # <-- with GEOS 3.10.0

>>> pygeos.set_precision(pygeos.Geometry("POLYGON ((0 0, 0.1 0, 0.1 0.1, 0 0.1, 0 0))"), grid_size=1)
<pygeos.Geometry POLYGON EMPTY>                        # <-- with GEOS 3.9.1 
<pygeos.Geometry POLYGON ((0 0, 0 0, 0 0, 0 0, 0 0))>  # <-- with GEOS 3.10.0

I don't really know if one or the other is preferred, but one consequence of the new result is that those returned geometries are no longer "valid" geometries.

Changes in "crosses" and "touches"

>>> line = pygeos.Geometry("LINESTRING (1 1, 2 2)")
>>> poly = pygeos.buffer(pygeos.Geometry("POINT (2 1)"), 1)
>>> pygeos.crosses(poly, line)
True   # <-- with GEOS 3.9.1
False  # <-- with GEOS 3.10.0

This could also be due to slight changes in the output of the buffer operation.

>>> import math 
>>> line = pygeos.Geometry("LINESTRING (1 1, 2 2)") 
>>> poly = pygeos.buffer(pygeos.Geometry("POINT (2 1)"), math.sqrt(2) / 2) 
>>> pygeos.touches(poly, line)
False  # <-- with GEOS 3.9.1
True   # <-- with GEOS 3.10.0

Similarly here (the test actually mentions that there is no overlap due to precision issues)

So I suppose those two were corner cases / non-robust test cases due to crossing or not touching only because of precision issues.

Changes in nearest_all

The same seems to be true for the nearest_all failures (being related to floating point issues):

>>> import math 
>>> poly = pygeos.buffer(pygeos.points(2.5, 2.5), math.sqrt(2) / 2) 
>>> points = pygeos.points(np.arange(10), np.arange(10))  
>>> points 
array([<pygeos.Geometry POINT (0 0)>, <pygeos.Geometry POINT (1 1)>,
       <pygeos.Geometry POINT (2 2)>, <pygeos.Geometry POINT (3 3)>,
       <pygeos.Geometry POINT (4 4)>, <pygeos.Geometry POINT (5 5)>,
       <pygeos.Geometry POINT (6 6)>, <pygeos.Geometry POINT (7 7)>,
       <pygeos.Geometry POINT (8 8)>, <pygeos.Geometry POINT (9 9)>],
      dtype=object)

>>> pygeos.distance(poly, points)  
array([2.82842712e+00, 1.41421356e+00, 0.00000000e+00, 3.69350335e-16,    # <-- with GEOS 3.9.1
       1.41421356e+00, 2.82842712e+00, 4.24264069e+00, 5.65685425e+00,
       7.07106781e+00, 8.48528137e+00])
array([2.82842712, 1.41421356, 0.        , 0.        , 1.41421356,            # <-- with GEOS 3.10.0
       2.82842712, 4.24264069, 5.65685425, 7.07106781, 8.48528137])

So the distance to POINT (3 3) is 3.69350335e-16 vs 0.0.

@pramsey
Copy link

pramsey commented Oct 14, 2021

How are you handling that second argument in

pygeos.set_precision(pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)"), 1)

It's the argument that determines if collapses will be kept or not. The enum in GEOS is this:

enum GEOSPrecisionRules {
    /** This option causes GEOSGeom_setPrecision_r() to not attempt at preserving the topology */
    GEOS_PREC_NO_TOPO = 1,
    /** This option causes GEOSGeom_setPrecision_r() to retain collapsed elements */
    GEOS_PREC_KEEP_COLLAPSED = 2
};

And the argument is the bitwise OR of those values together...

* \param flags The bitwise OR of members of the \ref GEOSPrecisionRules enum

If the python wrapper is exposing the CAPI directly, maybe that '1' in the test should be a '2'? Or maybe the wrapper has its own internal mapping?

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Oct 14, 2021

That second argument is actually the grid_size (I updated the code snippet to be explicit about that).

But I was also just checking how we deal with those flags. With GEOS 3.10.0 I see:

>>> pygeos.set_precision(pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)"), grid_size=1, preserve_topology=False) 
<pygeos.Geometry LINESTRING (0 0, 0 0)>

>>> pygeos.set_precision(pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)"), grid_size=1, preserve_topology=True)
<pygeos.Geometry LINESTRING Z EMPTY>

This preserve_topology keyword translates to setting the GEOS_PREC_NO_TOPO flag getting set or not.

We currently don't set the GEOS_PREC_KEEP_COLLAPSED flag, seeing this code comment:

pygeos/src/ufuncs.c

Lines 2048 to 2052 in 01e02cd

// flags:
// GEOS_PREC_NO_TOPO (1<<0): if set, do not try to preserve topology
// GEOS_PREC_KEEP_COLLAPSED (1<<1): Not used because uncollapsed geometries are
// invalid and will not be retained in GEOS >= 3.9 anyway.
int flags = in3 ? 0 : GEOS_PREC_NO_TOPO;

(I don't exactly remember the reasoning for this)

@pramsey
Copy link

pramsey commented Oct 14, 2021

The behaviour is tested in https://github.com/libgeos/geos/blob/main/tests/unit/capi/GEOSGeom_setPrecisionTest.cpp, and seems to be "according to plan".

@pramsey
Copy link

pramsey commented Oct 14, 2021

There's unfortunately a poor mapping between what the old and new routines do for collapses, and the behaviour you see is (a) on purpose but (b) not exactly as before but (c) a little closer to what the worlds "collapse" and "topology" might be construed to mean. I'd generally get rid of "preserve topology" as an option, because what it means in entirely unclear and go with just "preserve collapsed" since that's a good deal more obvious and as a bonus will neatly match behaviour.

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Oct 14, 2021

I patched pygeos for a moment to let the preserve_topology keyword pass through the exact value for the flag passed to GEOSGeom_setPrecision. For all possible values with GEOS 3.10.0:

In [2]: import pygeos 

In [3]: line = pygeos.Geometry("LINESTRING (0 0, 0.1 0.1)")  

In [4]: pygeos.set_precision(line, grid_size=1, preserve_topology=0) 
Out[4]: <pygeos.Geometry LINESTRING Z EMPTY>

In [5]: pygeos.set_precision(line, grid_size=1, preserve_topology=1)
Out[5]: <pygeos.Geometry LINESTRING (0 0, 0 0)>

In [6]: pygeos.set_precision(line, grid_size=1, preserve_topology=2)   
Out[6]: <pygeos.Geometry LINESTRING (0 0, 0 0)>

In [7]: pygeos.set_precision(line, grid_size=1, preserve_topology=3) 
Out[7]: <pygeos.Geometry LINESTRING (0 0, 0 0)>

and with GEOS 3.9.1:

In [3]: pygeos.set_precision(line, grid_size=1, preserve_topology=0)
Out[3]: <pygeos.Geometry LINESTRING Z EMPTY>

In [4]: pygeos.set_precision(line, grid_size=1, preserve_topology=1)
Out[4]: <pygeos.Geometry LINESTRING Z EMPTY>

In [5]: pygeos.set_precision(line, grid_size=1, preserve_topology=2)
Out[5]: <pygeos.Geometry LINESTRING (0 0, 0 0)>

In [6]: pygeos.set_precision(line, grid_size=1, preserve_topology=3)
Out[6]: <pygeos.Geometry LINESTRING (0 0, 0 0)>

So we were using preserve_topology=False -> flag=1 (i.e. setting GEOS_PREC_NO_TOPO) as the default, which is exactly the one case that changed behaviour.

I'd generally get rid of "preserve topology" as an option, because what it means in entirely unclear

Would you recommend to still set the GEOS_PREC_NO_TOPO as default as well? (EDIT: I suppose not, since that's the case that changed behaviour, and if we want the empty geometry as result, we should not set the flag)

@dr-jts
Copy link

dr-jts commented Oct 14, 2021

Would you recommend to still set the GEOS_PREC_NO_TOPO as default as well? (EDIT: I suppose not, since that's the case that changed behaviour, and if we want the empty geometry as result, we should not set the flag)

I don't recommend setting GEOS_PREC_NO_TOPO as the default, since it is likely to output invalid geometries. (Not just in this case, but in general - it reduces pointwise, which can cause polygons to self-intersect).

You can think of GEOS_PREC_NO_TOPO as GEOS_PREC_MAY_BE_INVALID, if that helps :)

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Oct 14, 2021

Thanks for the feedback!

Looking back at the original PR adding this to PyGEOS (#257), it seems we went with a default of preserve_topology=False (i.e. GEOS_PREC_NO_TOPO) was to match simplify, but that doesn't seem like a strong reason to me (cc @brendan-ward)

We should probably switch the default, and maybe even reconsider whether it is useful at all to expose the option to ask for GEOS_PREC_NO_TOPO?

@brendan-ward
Copy link
Contributor

Right, we went with those defaults to match simplify purely out of an attempt at consistency, and as you pointed out a while back we should reconsider this; we just haven't yet. Perhaps we should consider flipping the default on simplify too, for the same reasons?

@dr-jts
Copy link

dr-jts commented Oct 15, 2021

Here's some clarification of the GEOSGeom_setPrecision flag semantics.

  • Default (0) (which might be called GEOS_PREC_VALID_OUTPUT) - The output is always valid. Collapsed geometry elements (including both polygons and lines) are removed. Duplicate vertices are removed.
  • GEOS_PREC_NO_TOPO (1) - Precision reduction is performed pointwise. Output geometry may be invalid due to collapse or self-intersection. Duplicate vertices are not removed. (This might be better called GEOS_PREC_POINTWISE - the current name is historical.)
  • GEOS_PREC_KEEP_COLLAPSED (2) - Like the default mode, except that collapsed linear geometry elements are preserved. Collapsed polygonal input elements are removed. Duplicate vertices are removed.

Notes

  • In the Default and GEOS_PREC_KEEP_COLLAPSED modes invalid input may cause an error to occur, unless the invalidity is below the scale of the requested precision
  • There are only 3 modes. The GEOS_PREC_NO_TOPO mode takes precedence over GEOS_PREC_KEEP_COLLAPSED. So the combination GEOS_PREC_NO_TOPO || GEOS_PREC_KEEP_COLLAPSED has the same semantics as GEOS_PREC_NO_TOPO

@pramsey
Copy link

pramsey commented Oct 15, 2021

I've added these to the doxygen, which will be more visible online shortly.

@caspervdw
Copy link
Member

I propose exposing the 3 different modes directly to the user, with (as recommended) 0 as default option. This will be an API breakage, which is fine by me as we are still doing “experimental” releases in pygeos.

The tests should then be examined and possibly be made GEOS version dependent where the output of GEOS changed.

@caspervdw
Copy link
Member

@pramsey Do you have an estimate of the planned release date of GEOS 3.10?

@pramsey
Copy link

pramsey commented Oct 19, 2021

Today or maybe a couple days, depending on how bloody-minded I am with the demands of packagers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants