Convert CoordinateSequence from abstract to concrete class #674
Very exciting. I think the preparedgeometryxy stuff is going to be required anyways, to really crank up point performance on the classic p-i-p case. A little surprised the gains were this small, but I guess C++ compilers are just really really good. |
Yes, even with the
They're not huge but these changes would enable more stuff down the line. A better algorithm for duplicate-point removal (or just delegate to |
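As an illustration of the kind of simplification a concrete, contiguous sequence might allow, here is a minimal sketch of dropping consecutive repeated points with the standard library. The `Coordinate` struct, `sameXY`, and `dropConsecutiveDuplicates` are hypothetical stand-ins, not GEOS API, and exact XY equality is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GEOS coordinate; not the library's type.
struct Coordinate {
    double x;
    double y;
    double z;
};

// Exact comparison in X and Y only (an assumption made for this sketch).
inline bool sameXY(const Coordinate& a, const Coordinate& b) {
    return a.x == b.x && a.y == b.y;
}

// Remove consecutive repeated points in place; return how many were dropped.
inline std::size_t dropConsecutiveDuplicates(std::vector<Coordinate>& pts) {
    const std::size_t before = pts.size();
    // std::unique keeps the first element of each run of equal neighbours.
    pts.erase(std::unique(pts.begin(), pts.end(), sameXY), pts.end());
    assert(pts.size() <= before);
    return before - pts.size();
}
```

Only neighbouring duplicates are removed, so a ring that returns to its start point keeps both its first and last coordinate.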
a23f91d to 27546f5
One thing I'd like to cogitate on is the extent to which this does/doesn't foreclose on a higher dimension strategy that splits the XY coordinates from the Z and M values, so a CoordinateSequence in storage terms ends up like
I think this is doable, but I haven't spent enough time cataloguing the places that make use of Z and M coordinates (except, obviously, the readers and writers) and whether a move to having |
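A rough sketch of what such a split layout could look like, assuming a packed XY buffer plus optional side buffers for Z and M. The names `XY` and `SplitSequence` and every method here are hypothetical; this is not the layout from the comment above or from any GEOS branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D coordinate; not a GEOS type.
struct XY {
    double x;
    double y;
};

// Sketch: XY values stay packed together, Z and M live in optional side buffers.
class SplitSequence {
public:
    explicit SplitSequence(std::size_t n) : xy_(n, XY{0.0, 0.0}) {}

    std::size_t size() const { return xy_.size(); }
    bool hasZ() const { return hasZ_; }
    bool hasM() const { return hasM_; }

    // Adding a dimension fills one side buffer; the XY buffer is untouched.
    void addZ(double fill) { z_.assign(xy_.size(), fill); hasZ_ = true; }
    void addM(double fill) { m_.assign(xy_.size(), fill); hasM_ = true; }

    // Dropping a dimension discards the side buffer only.
    void dropZ() { z_.clear(); hasZ_ = false; }
    void dropM() { m_.clear(); hasM_ = false; }

    XY& xy(std::size_t i) { assert(i < xy_.size()); return xy_[i]; }
    double& z(std::size_t i) { assert(hasZ_ && i < z_.size()); return z_[i]; }
    double& m(std::size_t i) { assert(hasM_ && i < m_.size()); return m_[i]; }

    // Exposed so callers can check that the XY storage did not move.
    const XY* xyData() const { return xy_.data(); }

private:
    std::vector<XY> xy_;
    std::vector<double> z_;
    std::vector<double> m_;
    bool hasZ_ = false;
    bool hasM_ = false;
};
```

In this sketch, changing dimensionality never copies or moves the XY values, which is the property the split is meant to buy.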
I don't think the current implementation is the final implementation, but switching to a concrete class makes experimentation a bit easier, I think. Here is one way to handle multiple dimensions with a single class: https://gist.github.com/dbaston/3902b785ed3d841333d2b1c7f3b4ed25 I'm not sure if it's a good way or not. I don't like the potential for slicing (pass a |
Well that's slick! I guess the codebase would have to be changed to delegate creating new |
Implementation is taken from CoordinateArraySequence. FixedSizeCoordinateSequence is removed.
27546f5 to cc2282c
Backing |
In JTS there is a clear distinction between It seems to me that basing
So how about reintroducing this distinction in GEOS? |
My comment probably wasn't clear. I am working on a The distinction between extensible and non-extensible sequences seems like a different discussion. GEOS also has the |
I've put together additional implementations of
In the benchmarking I've done, the
|
Doesn't it have a bearing on whether CoordinateSequences are better off using a |
Does the (slightly) lower performance of the union-based |
For the record, I'm in favour of keeping the |
What would be the advantage of a fixed-size buffer over a vector? You can initialize a vector to whatever size you want; there is no overhead beyond dynamically allocating a fixed-size buffer.
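A small sketch of that point, assuming a hypothetical `Coordinate` struct and `makeSequence` helper (neither is GEOS API): a vector constructed at its final size requests its storage once, and writing to existing elements afterwards does not reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GEOS coordinate; not the library's type.
struct Coordinate {
    double x;
    double y;
    double z;
};

// Build a sequence of known size; storage is requested at construction.
inline std::vector<Coordinate> makeSequence(std::size_t n) {
    std::vector<Coordinate> seq(n, Coordinate{0.0, 0.0, 0.0});
    assert(seq.capacity() >= n);
    return seq;
}
```

Assigning through `seq[i]` leaves `seq.data()` unchanged; only growth beyond the capacity would trigger another allocation.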
I think it comes from
The tests are done with each of the underlying representations described here. Unless I'm misunderstanding the question.
I don't understand how this would work or why it would be simple. How would you get a |
|
I don't think there's anything particularly magical about fixing the size of a CoordinateSequence. I look forward to seeing what happens when higher dimensionality comes into play. If one pre-supposes that "most" GEOS usage is 2D, then a 2D main Coordinate buffer and optional Z and M double buffers might trade a little complexity of implementation for (maybe?) some performance gains in the "common" use case. But that's total finger-to-the-wind guessing. Didn't you have a templated branch packing higher dimensional coordinates into a different coordinate buffer as an optional thing? |
Doesn't the |
Via a method |
What is the benefit of introducing copy-on-access where we currently have none? |
Yes, that's the second implementation from left in the plots. https://github.com/dbaston/libgeos/tree/concrete-coordseq-buf |
Couldn't it allow return-by-reference of a |
Yes, you could avoid a copy for the 2D case. To back up, we are fundamentally talking about
This can return a reference for any Coordinate dimension, allows GEOS to retain a 3D default for now, and can be backed by an external buffer. vs
This returns a reference for 2D only, requires GEOS to universally switch to a 2D default or adopt copy-on-access by default, and can only be backed by an external buffer for the 2D case. What is the advantage? |
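The two layouts being compared are not visible in the text above, so the following is only an assumed reading of the trade-off, simplified to XYZ with no M and no external buffers. `XY`, `XYZ`, `InterleavedSeq`, and `SplitSeq` are hypothetical names. The interleaved version can hand out a reference to a full coordinate, while the split version can hand out a reference to the XY part only and has to assemble a copy for anything wider.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate types; not GEOS types.
struct XY  { double x; double y; };
struct XYZ { double x; double y; double z; };

// Sketch of the first option: one buffer holding full coordinates.
class InterleavedSeq {
public:
    explicit InterleavedSeq(std::size_t n) : pts_(n, XYZ{0.0, 0.0, 0.0}) {}

    // A reference into the buffer; no copy on access.
    XYZ& getAt(std::size_t i) { assert(i < pts_.size()); return pts_[i]; }
    const XYZ& getAt(std::size_t i) const { assert(i < pts_.size()); return pts_[i]; }

private:
    std::vector<XYZ> pts_;
};

// Sketch of the second option: XY buffer plus a separate Z buffer.
class SplitSeq {
public:
    explicit SplitSeq(std::size_t n) : xy_(n, XY{0.0, 0.0}), z_(n, 0.0) {}

    // The 2D part can still be returned by reference.
    XY& getXY(std::size_t i) { assert(i < xy_.size()); return xy_[i]; }
    double& getZ(std::size_t i) { assert(i < z_.size()); return z_[i]; }

    // A 3D coordinate does not exist contiguously, so it is built on access.
    XYZ getXYZ(std::size_t i) const {
        assert(i < xy_.size());
        return XYZ{xy_[i].x, xy_[i].y, z_[i]};
    }

private:
    std::vector<XY> xy_;
    std::vector<double> z_;
};
```

Modifying the value returned by `getXYZ` does not touch the sequence, which is the copy-on-access behaviour questioned above.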
I'm fine with the simple concept of a buffer of XY, XYZ/M, or XYZM. Can't recall the reasons for the split ZM concept - @pramsey may know? |
The basis for splitting off Z/M from X/Y is totally theoretical! Except it's also practically implemented in the Shape file. Also the OGR feature implementation. It packs the things we actually care about and iterate on and read closer together (we only access Z and M for relatively rare introduced intersection calculation). Locality! Why? Because locality! It makes Coordinate smaller, and we do shove Coordinate around a great deal. We can increase and decrease the dimensionality of a CoordinateSequence really cheaply (I don't think we care about this?). All in all, a grab bag of not a lot. It might have made more sense way back when Z was first introduced as a much less invasive way to bring it in. |
Yes, I think a future GEOS mostly uses I have a separate branch (#701) that adds the |
Superseded by #721 |
This PR removes the FixedSizeCoordinateSequence and CoordinateArraySequence classes and converts the CoordinateSequence class from an abstract to a concrete class, with the implementation cribbed from CoordinateArraySequence.
I compared the performance to the current main branch using the suite of benchmarks in @pramsey's geos-performance. This shows:
Point geometry ("Prepared geometry", which is a point-in-polygon operation). This could be mitigated by providing dedicated C API functions for common point operations (e.g., GEOSPreparedContainsXY) and/or by using a union or std::variant to stash a single Coordinate within the concrete CoordinateSequence.
This PR does not implement any special algorithms to take advantage of the known structure of a CoordinateSequence, as discussed here. Some further performance gains may be possible by replacing std::unique_ptr<CoordinateSequence> with CoordinateSequence (e.g., in LineString) but I think a more fruitful track is to explore switching to std::shared_ptr<CoordinateSequence> instead.
As discussed elsewhere, I think the benefits of the abstract CoordinateSequence are minimal, so this change is probably worthwhile even for a modest performance benefit. But I'm curious what others think.
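The union or std::variant idea mentioned in the description could be sketched roughly as follows, assuming C++17. `Coordinate` and `SmallSequence` are hypothetical names, and this is not the implementation from this PR or from any of the benchmarked branches. A single point is stored inline in the variant, so only multi-point sequences need a heap-allocated vector.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for a GEOS coordinate; not the library's type.
struct Coordinate {
    double x;
    double y;
    double z;
};

class SmallSequence {
public:
    // A single coordinate is held inline in the variant.
    explicit SmallSequence(const Coordinate& c) : data_(c) {}

    // Any other sequence is held in a vector.
    explicit SmallSequence(std::vector<Coordinate> pts) : data_(std::move(pts)) {}

    bool isInline() const { return std::holds_alternative<Coordinate>(data_); }

    std::size_t size() const {
        if (const auto* v = std::get_if<std::vector<Coordinate>>(&data_)) {
            return v->size();
        }
        return 1;
    }

    const Coordinate& getAt(std::size_t i) const {
        assert(i < size());
        if (const auto* c = std::get_if<Coordinate>(&data_)) {
            return *c;
        }
        return std::get<std::vector<Coordinate>>(data_)[i];
    }

private:
    std::variant<Coordinate, std::vector<Coordinate>> data_;
};
```

Every access pays for a check of the active alternative, so whether this helps overall would have to be measured against the plain vector-backed version.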