Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wkt reader #20

Merged
merged 4 commits into from
Apr 1, 2022
Merged

wkt reader #20

merged 4 commits into from
Apr 1, 2022

Conversation

michaelkirk
Copy link
Member

@michaelkirk michaelkirk commented Mar 30, 2022

Fixes #19

I have a couple things I was hoping to get input on before submitting for actual review... questions inline.
Update: OK! Ready for review!

Definitely some weirdness around handling POINT EMPTY, but I went with what felt to me like a reasonable compromise of trying to:

  • handle POINT EMPTY where possible. Currently this is just Wkt->Wkt which isn't very likely to be used, but I suppose if someone wrote a custom processor, they might want to have the info.
  • Ignore EMPTY in a MultiPoint.
  • Else return Err when encountering a POINT EMPTY. I couldn't find a way to "be lazy" and ignore a POINT EMPTY for a top level geometry - we have to return something. And having Wkt("POINT EMPTY).to_geo() -> point!(x: 0, y: 0) seems likely to be a bad surprise.

@michaelkirk michaelkirk force-pushed the mkirk/wkt-reader branch 2 times, most recently from 4a0814e to 18320e0 Compare March 30, 2022 20:20
use std::str::FromStr;
// PERF: it would be good to avoid copying data into this string when we already
// have a string as input. Maybe the wkt crate needs a from_reader implementation.
let mut wkt_string = String::new();
Copy link
Member Author

@michaelkirk michaelkirk Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this file was copied from the geojson_reader. Like there, we copy the input into a String, which seems wasteful, but I don't know of another way around it.

I don't think the wkt crate currently exposes anything other than a from_str constructor.

Does this seem reasonable for now?

edit: the eventual solution might be similar to the work proposed in https://github.com/georust/geozero/pull/16/files

if let Some(ref coord) = point.0 {
process_coord(coord, multi_dim, 0, processor)?;
} else {
return Err(GeozeroError::Geometry(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK WKT is the only format with an "empty point".

I think erroring here is the safest thing to do here given that there is no support for it in other formats, so it's not really defined how they should handle it.

Another option would be to add

pub trait GeomProcessor {
...
    fn empty_coordinate() -> Result {
         Err(GeozeroError::Geometry("The WKT Point was empty, but Points cannot be empty in other formats".to_string()))
    }
}

And then only override it in the WktReader/WktWriter, but I'm not sure if it would be useful for anything. WDYT @pka?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paul Ramsey from PostGIS calls ignoring empty geometries strategic laziness (https://gis.stackexchange.com/a/106942). In the meantime PostGIS supports empty geometries and so does GEOS AFAIK. I would continue with strategic laziness and choose the cheapest solution. What happens, if we emit no coordinate and don't return an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried being as lazy as possible with dc6a63a

But I didn't like the results.

    #[test]
     fn empty_point() {
         let wkt = WktStr("POINT EMPTY");
         let actual = wkt.to_geo().unwrap();

         // This is weird - but it's not obvious to me what we should do in this case.
         let expected: geo_types::Geometry<f64> = point!(x: 0.0, y: 0.0).into();

         assert_eq!(expected, actual);
     }

We can't just "skip" an empty point - since the return type is not optional. It strikes me as pretty bad to return (0, 0).

I've doubled back and reimplemented a "return error" approach when trying to process an empty point into anything other than that WKTWriter.

WDYT?

Copy link

@phayes phayes Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about representing empty points as Geometry<f64> = point!(x: f64::NaN, y: f64::NaN) ?

Copy link
Member

@urschrei urschrei Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geo algorithms assume the absence of NaN values in order to produce correct results, and they don't check for it (for performance reasons), so this would be a source of catastrophic downstream errors (what's the Euclidean distance between Point(0.0, 0.0) and Point(NaN, NaN)) and / or a catastrophic perf regression (e.g. checking 10e7 points for NaN values)

Comment on lines 110 to 115
if let Some(ref coord) = point.0 {
processor.point_begin(idx)?;
process_coord(coord, multi_dim, 0, processor)?;
processor.point_end(idx)
} else {
return Err(GeozeroError::Geometry(
"The WKT Point was empty, but Points cannot be empty in other formats".to_string(),
));
// skip processing of POINT EMPTY for now since no other formats support it
Copy link
Member

@pka pka Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is

    processor.point_begin(idx)?;
    if let Some(ref coord) = point.0 {
        process_coord(coord, multi_dim, 0, processor)?;
    } else {
        // POINT EMPTY
    }
    processor.point_end(idx)

causing problems? It looks like the correct representation of an empty geometry to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the spelling is POINT EMPTY not POINT ()

@michaelkirk michaelkirk force-pushed the mkirk/wkt-reader branch 2 times, most recently from 700bc8f to 0915431 Compare March 31, 2022 22:50
@michaelkirk michaelkirk marked this pull request as ready for review March 31, 2022 23:18
@pka
Copy link
Member

pka commented Apr 1, 2022

Looks like we can't avoid going into that rabbit hole. At least we're not alone (locationtech/jts#567 https://trac.osgeo.org/geos/ticket/1005). Looks like your solution with an explicit empty_point is the cleanest one. But in the end we would probably need this for all geometry types?

Anyway let's merge this and improve handling of empty geometries seperately. Thanks a lot!

@pka pka merged commit 035239a into georust:master Apr 1, 2022
@michaelkirk
Copy link
Member Author

michaelkirk commented Apr 1, 2022

in the end we would probably need this for all geometry types?

Luckily, I think POINT EMPTY is kind of an exception. It seems like we can more reasonably handle other empty geometries like MULTIPOINT EMPTY, LINESTRING EMPTY, etc. without exploding.

I've expounded on this here: #21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement WKT Reader
4 participants