Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement 3D and Measure support for geo-types only #797

Closed
wants to merge 9 commits into from

Conversation

nyurik
Copy link
Member

@nyurik nyurik commented Apr 6, 2022

This PR includes #772


This PR focuses on geo-types only. This is a part of #742.

This PR changes the underlying geo-type data structures to support 3D data and measurement values (M and Z values). The PR attempts to cause relatively minor disruptions to the existing users (see breaking changes below). My knowledge of the actual geo algorithms is limited, so please ping me for any specific algo change.

geo-types restructuring

All geo type structs have been renamed from Foo<T>(...) to Foo<T, Z=NoValue, M=NoValue>(...), and several type aliases were added:

// old
pub struct LineString<T: CoordNum>(pub Vec<Coordinate<T>>);

// new
pub struct LineString<T: CoordNum, Z: ZCoord=NoValue, M: Measure=NoValue>(pub Vec<Coordinate<T, Z, M>>);

pub type LineStringM<T, M=T> = LineString<T, NoValue, M>;
pub type LineString3D<T> = LineString<T, T, NoValue>;
pub type LineString3DM<T, M=T> = LineString<T, T, M>;

NoValue magic

NoValue is an empty struct that behaves just like a number. It supports all math and comparison operations. This means that a Z or M value can be manipulated without checking if it is actually there. This code works for Z and M being either a number or a NoValue:

pub struct NoValue;

impl<T: CoordNum, Z: ZCoord, M: Measure> Sub for Coordinate<T, Z, M> {
    type Output = Self;

    fn sub(self, rhs: Self) -> Self {
        coord! {
            x: self.x - rhs.x,
            y: self.y - rhs.y,
            z: self.z - rhs.z,
            m: self.m - rhs.m,
        }
    }
}

Variant algorithm implementations

Function implementations can keep just the original 2D <T> variant, or add support for 3D <Z> and/or the Measure <M>. The above example works for all combinations of objects. This function only works for 2D objects with and without the Measure.

impl<T: CoordNum, M: Measure> Line<T, NoValue, M> {
    /// Calculate the slope (Δy/Δx).
    pub fn slope(&self) -> T {
        self.dy() / self.dx()
    }
}

Breaking changes

  • It is no longer possible to create Coordinate with just x and y values using implicit tuple constructor. Use coord! instead.
  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

@nyurik nyurik requested a review from rmanoka April 6, 2022 17:00
@nyurik nyurik changed the title Implement 3D and Measure support for geo types Implement 3D and Measure support for geo types only Apr 6, 2022
@nyurik nyurik changed the title Implement 3D and Measure support for geo types only Implement 3D and Measure support for geo-types only Apr 6, 2022

pub trait CoordFloat: CoordNum + Float {}
impl<T: CoordNum + Float> CoordFloat for T {}

pub trait ZCoord: NumOps + Zero + Copy + PartialEq + PartialOrd + Debug {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add docs for the ZCoord and Measure traits

Also, how come the trait bounds here (NumOps + Zero + ...) differ from CoordNum?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how come the trait bounds here (NumOps + Zero + ...) differ from CoordNum?

Actually, is there any reason why we couldn't make ZCoord just extend CoordNum?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to support CoordNum in NoValue, we would need to implement ToPrimitive, NumCast, and Num traits. Possible implementation below. I am worried though that it will cause more harm than good -- i.e. Num::from_str_radix() shouldn't allow all possible values, etc. Also, having fewer traits ensures that algorithms that don't work with NoValues are not accidentally used in that way - e.g. someone wouldn't be able to create a CoordTZM<NoValue, f64, NoValue> (only Z value is given) -- we expect coordinate to always have real x,y values.

impl ToPrimitive for NoValue {
    fn to_i64(&self) -> Option<i64> {
        Some(0)
    }

    fn to_u64(&self) -> Option<u64> {
        Some(0)
    }
}

impl NumCast for NoValue {
    fn from<T: ToPrimitive>(_: T) -> Option<Self> {
        Some(Self::default())
    }
}

impl Num for NoValue {
    type FromStrRadixErr = ();

    fn from_str_radix(str: &str, radix: u32) -> Result<Self, Self::FromStrRadixErr> {
        Ok(Self::default())
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the response, I think I understand. I am going to check out your git branch in the next day or two and experiment with this so I have a better grasp of this

geo-types/src/novalue.rs Outdated Show resolved Hide resolved
Copy link
Member

@lnicola lnicola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anything obviously wrong here, but it's too complicated for me to properly think about it these days. I'm not convinced this is the best design, but I can't suggest a better one.

@nyurik
Copy link
Member Author

nyurik commented Apr 9, 2022

Question for the community: for the types with the measure value, should the M type be the same as T? E.g. which of these types is better. Note that the TZM variants still uses 3 different generic types.

LineStringZM<T> = LineStringTZM<T, T, T>

  • PROs: usage simplicity (single generic type), possibly easier mathematics in various algos
  • CONs: a bit less flexibility

LineStringZM<T, M> = LineStringTZM<T, T, M>

  • PROs: flexibility - measure could be any numeric value different from T, even a custom user type that implements the same traits.
  • CONs: always have to specify two generic types

@lnicola
Copy link
Member

lnicola commented Apr 9, 2022

I don't have much experience with these, but it feels that should be different types, even if in practice they might all be floats.

@nyurik
Copy link
Member Author

nyurik commented Apr 9, 2022

I don't have much experience with these, but it feels that should be different types, even if in practice they might all be floats.

@lnicola it would still be possible to create values with different M types even if we have a single generic type like LineStringZM<T>. If some user wants them different, they can always use the "real" TZM value -- LineStringTZM<T,Z,M>, with any combination of the generic types.

This question is more about convenience i guess -- having a single generic type makes the new types easier to declare (you don't need to write LineStringZM<f64,f64>) -- but this is only relevant if typically M value is OK to be the same type.

@lnicola
Copy link
Member

lnicola commented Apr 9, 2022

I know it's an alias and traditionally they're all floats anyway, but it's not uncommon to define newtypes for different measures (angles, distances etc.). It seems more clear to have different types and show the user that they can put anything there instead of having an easy path of f32 everywhere.

@nyurik
Copy link
Member Author

nyurik commented Apr 9, 2022

I looked at all the usages of M I can find - and it seems Postgis is the only major one, which uses identical type for M and others. I will simplify M for the most common usecase for now, unless we have evidence of a common real world use-case where M must be different. This will still allow any user to use TZM variant with any M values they want, even define their own struct that implements numeric operations.

@nyurik
Copy link
Member Author

nyurik commented Apr 9, 2022

Another thought- actually we can do both -- type aliases are cheap. Just need a good name, like PointMc for custom M

@lnicola
Copy link
Member

lnicola commented Apr 10, 2022

@nyurik why not pub type LineStringM<T, M = T> = LineStringTZM<T, NoValue, M>;?

@nyurik
Copy link
Member Author

nyurik commented Apr 10, 2022

@nyurik why not pub type LineStringM<T, M = T> = LineStringTZM<T, NoValue, M>;?

@lnicola this is a great idea! Moreover, I think we can get rid of the TZM types all together thanks to this approach -- instead of introducing a new LineStringTZM, we can simply define struct LineString<T, M=NoValue, Z=NoValue> {...}, plus introduce a few helper type aliases like type LineStringM<T, M=T> = LineString<T, NoValue, M>;

@nyurik
Copy link
Member Author

nyurik commented Apr 11, 2022

Success! It seems we can introduce this change with far fewer breaking changes than originally anticipated, all thanks to generic parameter defaults. We can keep the original geo type names as before, and just add two more generic type parameters with NoValue as default.

Also, it seems the moment the new proj version is released, we can even merge this without any geo-types unlinking/patching from the proj or wkt.

New structure (also updated in the description above):

// old
pub struct LineString<T: CoordNum>(pub Vec<Coordinate<T>>);

// new
pub struct LineString<T: CoordNum, Z: ZCoord=NoValue, M: Measure=NoValue>(pub Vec<Coordinate<T, Z, M>>);
pub type LineStringM<T, M=T> = LineString<T, NoValue, M>;
pub type LineStringZ<T> = LineString<T, T, NoValue>;
pub type LineStringZM<T, M=T> = LineString<T, T, M>;

geo-types/src/coordinate.rs Outdated Show resolved Hide resolved
}

impl<T: CoordNum, Z: ZCoord, M: Measure> Coordinate<T, Z, M> {
pub fn new(x: T, y: T, z: Z, m: M) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding docs here encouraging the user to utilize the coord! macro?

geo-types/src/coordinate.rs Show resolved Hide resolved
geo-types/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@frewsxcv frewsxcv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise, this is looking great @nyurik! Amazing work 👏🏻

One thing I'd like to verify before we merge is performance impact. Can you cargo bench your branch and main and share the results?

* Remove `set_lng()` -- use `set_x()` instead.
* Remove `lat()` -- use `y()` instead.
* Remove `set_lat()` -- use `set_y()` instead.
* BREAKING: Remove deprecated `Polygon<T>::is_convex()` -- use `geo::is_convex` on `poly.exterior()` instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is geo::is_convex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frewsxcv this is part of the deprecation PR #772 -- https://github.com/georust/geo/pull/772/files#diff-3efed880fa0562aa828523c8b8024b977ea2e9a3ea079be1755cc8def1f84108L416-L417 -- i simply copied the deprecation note to the CHANGES.md

@nyurik
Copy link
Member Author

nyurik commented Apr 16, 2022

One thing I'd like to verify before we merge is performance impact. Can you cargo bench your branch and main and share the results?

@frewsxcv theoretically there should be no perf impact, but more practically there are several issues:

  • I tried running cargo bench on my laptop 3 times just on the main branch itself, and my results were inconsistent up to 10.1%. That's without changing to any other branch, just rerunning on the main branch.
  • Running cargo bench on this PR makes no sense because the geo-types used by the bench code is not the code that changed (this is the case where patching crate's location actually does make sense, esp because neither wkt nor proj are used). I will try to run the benchmarks a few times to see what perf i get.

PS I created #809 to fix some annoying warnings during the cargo bench. Should be a noop.

@Komzpa
Copy link

Komzpa commented May 7, 2022

tuple, where m is 1 by default, and z is 0 by default.

er, no. M and Z can be present or absent per-geometry, and Z=0 is not same as absent Z. Box with 0 as Z will only intersect geometries with Z=0, box with absent Z will intersect geometries with any Z values.

@nyurik
Copy link
Member Author

nyurik commented Jun 8, 2022

@michaelkirk @frewsxcv et al, i'm a bit stuck now. This PR was approved and ready to merge, and now it is in a bit of a limbo. Since then, there has been a number of changes made to the repo that are fairly difficult to merge with this PR, and I really don't want to restart the merge process until there is an agreement with the direction, or else I'm risking spending many more hours on this only to see it not being merged again. What are your thoughts?

P.S. I have pushed a non-working merge attempt that ignores some of the breaking changes, just so that the more obvious changes don't have to be done again

@michaelkirk
Copy link
Member

My concern is that this is an intrusive change, and that it is going to be hard to back out if it proves to be the wrong approach.

Is there some real world use cases that you had in mind that we could evaluate it against?

tuple, where m is 1 by default, and z is 0 by default.

er, no. M and Z can be present or absent per-geometry, and Z=0 is not same as absent Z. Box with 0 as Z will only intersect geometries with Z=0, box with absent Z will intersect geometries with any Z values.

It seems like there is not a clear consensus on how this should behave. @nyurik - If you want to move it forward, I'd suggest seeing how these alternative approaches would affect each of our algorithms in practice and reporting back here. What do other libraries do in these cases? Does that approach seem to cause problems for those library users? What are the pitfalls inherent to the alternatives - etc.

Alternatively, if this work was done in a fork where it could be evaluated for a while, or somehow behind a feature flag, then I'd be much less concerned.

@frewsxcv
Copy link
Member

frewsxcv commented Jun 8, 2022

I know this doesn't address all the points that @michaelkirk brought up, but one way we can reduce scope and potentially unblock some of this is by removing M altogether and revisiting that in a follow-up discussion or pull request.

@urschrei
Copy link
Member

Something I've been turning over in my mind as a related issue: we currently have algorithms (the vast majority) that operate on Euclidean space, (or more precisely on coordinates in the real plane). There are also some that operate on spherical coordinates, and I don't want to talk about them for now.

In any case, if we were to merge the minimum viable changes to allow z coordinates, we would also have to make it clear that all our existing algorithms ignore z, and potentially add a level to our algorithm module that distinguishes between algorithms that operate on 2D coordinates (ignoring z values but maintaining them where they exist), and algorithms that operate on 3D coordinates (and possibly won't accept 2D coordinates), in a more robust way than e.g. our EuclideanDistance / VincentyDistance split.

I haven't thought at all about 3D algorithms and I frankly don't intend to – it's simply not an area I work in – but I'd like to ensure that even if we do manage to make this change in a technically sound way, that we also work out a way that keeps the distinction clear for users of geo, the vast majority of whom are unlikely to need z coordinates.

Co-authored-by: Corey Farwell <coreyf@rwell.org>
@urschrei
Copy link
Member

urschrei commented Aug 29, 2022

I have had a go at rebasing this. It is neither correct nor complete, but probably…almost?
FAO: @nyurik

Individual crate tests are all passing. Still seeing a top-level compilation failure due to (I think) a hard-wired geojson dep in bool-ops-benches. Still, progress.

https://github.com/georust/geo/tree/dim-types-rebased

@nyurik
Copy link
Member Author

nyurik commented Sep 2, 2022

Thanks @urschrei! I keep wondering if @michaelkirk idea might be better (not a hard change really) -- instead of having Foo<T,Z,M>, introduce a Foo<T, C=Coordinate<T>> -- where the Coordinate is still a struct with {pub x: T, y: T}. This way far less code change, still can use .x and .y without the parenthesis because all Foo will implement the Deref trait. This way we can very easily introduce Coordinate3D implemented as {pub x:T, y:T, z:T} or something with M, or even a custom implementation supplied by the user.

@nyurik
Copy link
Member Author

nyurik commented Sep 2, 2022

P.S. the only problem with the above is that all existing algorithms will continue to work in 2D space, so if the storage is Coordinate3D, which still has x and y, all algorithms will continue treating them as a 2D object, instead of refusing to compile. A separate set of functions would be needed that are explicitly 3D focused - i.e. they will all use .z in addition to x and y. Same goes for the functions that support measure param. My initial approach would force each algorithm implementer to explicitly declare which variant they want to handle - i.e. all (ignore z & m), just 3D, just 2D, etc., and if someone tries to use them incorrectly, it would fail to compile.

@urschrei
Copy link
Member

urschrei commented Sep 2, 2022

all existing algorithms will continue to work in 2D space

Depending on your point of view, this could actually be a good thing. 2D is the default and – in my opinion, and in keeping with the industry standard, i.e. JTS and GEOS specifically note this (see link) – privileged coordinate space in geo. We could have an entirely separate crate for 3D algorithms, while allowing Z and M values to "come along for the ride" in geo. That would eliminate a lot of confusion instantly IMO.

@nyurik
Copy link
Member Author

nyurik commented Sep 2, 2022

@urschrei I think this is slightly more complex. Think of the following cases to compute a distance between two points a & b using a.euclidean_distance(&b), for different types of a and b:

  • a and b are in 2D: all good, we have that now.
  • a and b are in 3D: silently drops z coordinate, and computes in 2D
  • if the type of a is different from b, even if both have x and y, it probably won't compile, even though technically they should because euclidean_distance is designed as a 2D only operation, so it doesn't matter what other values are there

Compare this with a more explicit euclidean_distance that works differently depending on the coordinate type, and which refuses to compile if the operation is undefined.

I think silent drop of an internal value could lead to surprising bugs

@urschrei
Copy link
Member

urschrei commented Sep 2, 2022

But this is my point: Z values can be supplied, but they aren't used. This is what JTS does today:

Coordinates are two-dimensional points, with an additional Z-ordinate. JTS does not support any operations on the Z-ordinate except the basic accessor functions. If a Z-ordinate value is not specified or not defined, constructed coordinates have a Z-ordinate of NaN (which is also the value of NULL_ORDINATE). The standard comparison functions ignore the Z-ordinate.

This is what I'm asking: what if we allowed Z values in Point and Coordinate, kept Z values where possible when processing geometries with algorithms, but didn't use them for anything in geo. That's what JTS and GEOS do, and that's what people are expecting. What if 3D algorithms were available in a separate crate that used geo-types, but whose algorithms were only type-compatible with 3D coordinates?

Are there use cases that are possible outside the Rust ecosystem today that would then not be feasible in geo?

@nyurik
Copy link
Member Author

nyurik commented Sep 2, 2022

Thanks for the link! That's exactly what I don't have a good answer to -- is the "silently ignore" a good approach in Rust? We usually try to make the system as fool proof as possible simply because we have a lot more compilation control, and are able to catch many mistakes at the compile time.

If a person tries to do an a.euclidean_distance(&b) on two 3D points, what is the expectation? If the expectation is that it is always in 2D with the z coordinate dropping, we should support the case of a being 2D and b being 3D, without casting. If the expectation is that it computes the real distance (whatever that is), then a mix of 2D and 3D shouldn't work, but using two 3D points should use a different algorithm internally than for two 2D points. I honestly don't know what the true expectation is for this and other algorithms we have.

@urschrei
Copy link
Member

urschrei commented Sep 2, 2022

If a person tries to do an a.euclidean_distance(&b) on two 3D points, what is the expectation? If the expectation is that it is always in 2D with the z coordinate dropping, we should support the case of a being 2D and b being 3D, without casting.

This is what I am currently in favour of in geo. I'm just trying to figure out whether this matches other peoples' expectations, and whether this would actually enable broader compatibility with other software, and eventual adoption of geo as a low-level library as an alternative to GEOS.

@frewsxcv
Copy link
Member

frewsxcv commented Sep 2, 2022

Just like how we have a # Units top-level heading for some of our algorithms (like our area algorithms), we can have something like # Dimensionality or # Dimensions that makes it very clear to readers what dimensions the algorithm operates in. In my opinion that seems like enough clarity to users what they should expect.

@nyurik
Copy link
Member Author

nyurik commented Sep 2, 2022

If I understand it correctly, given two 3D points and these two choices:

  • algorithm not designed to work with 3D silently treats it as 2D
  • algorithm not designed to work with 3D fails to compile

we want to use the first approach? This seems very anti-Rust design. I think instead we should require an explicit cast to 2D to ensure the user really understands the implications.

@frewsxcv
Copy link
Member

frewsxcv commented Sep 2, 2022

I don't think it'd be possible to cast a 3D to 2D– you'd have to drop all the 3D coordinate values. Can you go into more detail with what you're thinking with that cast?

@urschrei
Copy link
Member

urschrei commented Sep 2, 2022

I think this what I'm wondering too. Is there an inherent disadvantage to what JTS and GEOS do (treat as 2D)? How would [some other approach] be an improvement?

@nyurik
Copy link
Member Author

nyurik commented Sep 2, 2022

I don't think it'd be possible to cast a 3D to 2D– you'd have to drop all the 3D coordinate values. Can you go into more detail with what you're thinking with that cast?

That's exactly what I mean -- if an algorithm is only defined for 2D coordinates, and the object has 3D coordinates, I think there must be a way for the user to explicitly "downcast" it to 2D -- a way for the user to say "yes, I know I have 3D objects, but I want to ignore their z-coordinate when calling this algorithm which only understands 2D".

I am not certain of how we can achieve this -- I'm trying to state (and agree on) the ideal usage first. If not possible to implement, then sure, let's discuss next best thing.

So if we do want users to be explicit (IMO - a much safer approach), we could introduce a "2D object wrapper" that has the following properties:

  • works transparently on all 2D objects, without any additional cast, e.g. a.euclidean_distance(&b)
  • requires a special casting for 3D objects, e.g. a.as2d().euclidean_distance(b.as2d()) - where as2d() would be a readonly ref, and wouldn't actually allocate any new object. Essentially this must be a noop when executing.
  • The as_2d() should be available on all geometry types.

@nyurik
Copy link
Member Author

nyurik commented Sep 2, 2022

I think this what I'm wondering too. Is there an inherent disadvantage to what JTS and GEOS do (treat as 2D)? How would [some other approach] be an improvement?

The main disadvantage is a potential for bugs. We should not rely on users reading documentation if we can make compiler do the work. When working with 3D coordinates, dropping to 2D algos should be an explicit desire, not auto-magical effect just because user accidentally forgot to check if the algo is 2d or 3d. I suspect JTS/GEOS either didn't think it through, or maybe they did, but couldn't implement it in java/c++

@bjornharrtell
Copy link
Contributor

bjornharrtell commented Sep 15, 2022

Long time user of JTS and its C# port NTS here with a growing interest in Rust land. IMHO, Z definitely an afterthought in JTS and I have a long standing issue that they have actually broken it, from my point of view, some time ago by introducing Z interpolation in many of the (purely 2D algorithms). For my use case ignoring Z with new coords produced by 2D algorithms set Z to undefined was great, it allowed me to post process Z in different ways. So I'm still trying to affect them to fix this situation.

That said I do wish for this PR to go forward. Not having Z in the geo primitives makes it less useful as defacto geometry types in Rust fx. to be an intermediate for other geometry representations.

PS. Note that JTS and its C# port NTS is providing the defacto geometry types in those ecosystems with lots and lots of software benefiting from being able to agree on a common interface.

@urschrei
Copy link
Member

That said I do wish for this PR to go forward. Not having Z in the geo primitives makes it less useful as defacto geometry types in Rust fx. to be an intermediate for other geometry representations.

PS. Note that JTS and its C# port NTS is providing the defacto geometry types in those ecosystems with lots and lots of software benefiting from being able to agree on a common interface.

I think that having Z available in some form is a question of "when", not "if", for exactly the reasons you note: I (and I don't think I'm alone in this, though I don't want to speak for any other core contributors) want to see geo-types and geo used more widely as foundational libraries.

That having been said, let me reiterate my opinion that focusing on 3D is a distraction. I want the semantics of interactions between (x, y) and (x, y, z) nailed down and clearly documented for library consumers as they pertain to planar geometries and algorithms, and then I want to get on with my life.

I don't particularly want 3D algorithms in geo in its current form (They can exist in separate modules or a separate crate – let's have a separate conversation about it, the decision isn't up to me).

@frewsxcv
Copy link
Member

Closing for inactivity. Feel free to reopen if you resume work on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants