Multi-dimensional CoordinateSequence #721
I was never a huge fan of |
A follow-on is figuring out how to properly handle various coordinate types throughout GEOS. I think we have three types of situations:
This last one is tricky. For example, how should we best replace a block like this one, where we're performing an operation on two coordinates from two different sequences which may have different dimensionality? We don't want an
This particular case comes from a noding intersection finder. Those are virtual anyway, so I guess we could template the class on the coordinate types of the input geometries. If we didn't want to do that, the most flexible thing I've found so far is to move the block into a class with a templated method, like:
and then pass it to some dispatching functions defined elsewhere, so the original block gets replaced by something like
Clearly it's a loss for readability. Other ideas welcome!
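The original snippets did not survive in this copy of the thread, so what follows is only a rough sketch of the pattern described above: an operation wrapped in a class with a templated call operator, handed to a dispatching function that picks the coordinate types at runtime. Every name here (SketchXY, SketchXYZ, SketchSeq, MeanZ, dispatchPair) is hypothetical and is not part of GEOS, and the "mean Z" operation is just a placeholder for whatever the real block computes.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical coordinate types; stand-ins, not the GEOS classes.
struct SketchXY  { double x, y; };
struct SketchXYZ { double x, y, z; };

// Hypothetical sequence storing either XY or XYZ ordinates contiguously.
struct SketchSeq {
    bool hasZ;
    std::vector<double> data;
    std::size_t stride() const { return hasZ ? 3u : 2u; }
    SketchXY getXY(std::size_t i) const {
        const double* p = data.data() + i * stride();
        return {p[0], p[1]};
    }
    SketchXYZ getXYZ(std::size_t i) const {  // only meaningful when hasZ is true
        const double* p = data.data() + i * stride();
        return {p[0], p[1], p[2]};
    }
};

// Overloads let the templated operation ask for Z without knowing the type.
inline double zOf(const SketchXY&)    { return std::numeric_limits<double>::quiet_NaN(); }
inline double zOf(const SketchXYZ& c) { return c.z; }

// The "block moved into a class with a templated method".
struct MeanZ {
    double value = std::numeric_limits<double>::quiet_NaN();
    template <typename C1, typename C2>
    void operator()(const C1& p, const C2& q) {
        const double zp = zOf(p);
        const double zq = zOf(q);
        if (std::isnan(zp))      value = zq;   // use whichever Z exists
        else if (std::isnan(zq)) value = zp;
        else                     value = 0.5 * (zp + zq);
    }
};

// The dispatching function: one runtime branch per sequence selects the
// statically typed instantiation of the operation.
template <typename F>
void dispatchPair(const SketchSeq& a, std::size_t i,
                  const SketchSeq& b, std::size_t j, F& f) {
    if (a.hasZ) {
        if (b.hasZ) f(a.getXYZ(i), b.getXYZ(j));
        else        f(a.getXYZ(i), b.getXY(j));
    } else {
        if (b.hasZ) f(a.getXY(i), b.getXYZ(j));
        else        f(a.getXY(i), b.getXY(j));
    }
}
```

At the call site the original block collapses to constructing the operation and calling dispatchPair(seqA, i, seqB, j, op), which is where the readability cost shows up: the logic lives in the functor rather than inline.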
Could you just ditch the idea of fetching coordinates of some specific type altogether? So, rather than getAt<CoordinateXY>(i), you just have getAt(i) and you always return a reference to an XYZM. If the algorithm wants to treat it as XY or XYZ, it's free to do that. You'd have to pad your array with (up to) 2 doubles to avoid an accvio. But I probably don't understand. :)
The thinking is that most usage of GEOS does not involve Z or M, so it would be nice to avoid a 100% storage penalty for the common XY case.
He's not proposing a 4D underlying vector, he's proposing that the reads always be via a Coordinate4D, and that the caller check the dimensionality of the CoordinateSequence to determine whether the Z/M values are garbage.
Implementation is taken from CoordinateArraySequence. FixedSizeCoordinateSequence is removed.
Improves performance of copy-to-buffer by about 75%
I think this is ready to go. I've run an updated set of benchmarks against the current
Until the entire library is made to be more dimension-aware (if ever) we are still storing Z values in all cases to avoid a crash or incorrect result when accessing values by
This PR does not include the ability to read coordinates directly from an external buffer. I think the best way to do this would be to replace the
Any concerns with this one?
CoordinateSequence::CoordinateSequence(std::size_t sz, std::size_t dim) :
    m_vect(sz * 3),
    m_stride(3u),
    m_hasdim(dim > 0),
    m_hasz(dim == 3),
    m_hasm(false)
Running into another issue in Shapely.
If one calls GEOSCoordSeq_create(1, 4), which ends up in the CoordinateSequence(1, 4) constructor above, that creates a 3D coordseq with both hasz and hasm set to False.
I see that the documentation of both CoordinateSequence(std::size_t size, std::size_t dim = 0) and the GEOSCoordSeq_create C API says that this is only for creating XY or XYZ sequences (so dim can be 2 or 3). So it is a user error for me to pass a value of 4.
But just noting that before, this would raise an exception, while now you silently get wrong values if you subsequently set the ordinates with GEOSCoordSeq_setOrdinate_r / CoordinateSequence::setOrdinate (ordinate indices 2 and 3, z and m, will overwrite each other).
This is totally fixable on the Shapely side (we just need to verify the dimension on our side instead of just passing it to GEOS). But I do wonder if GEOSCoordSeq_create should be updated to allow passing dim=4?
Yes, see #753
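To make the concern concrete, here is a minimal sketch of checking the dim argument up front rather than letting an unexpected value fall through silently. It is not the change made in #753, whose contents are not shown here. SketchLayout and layoutForDim are hypothetical names, the stride of 3 for XY and XYZ mirrors the constructor quoted above, and treating dim == 4 as XYZM is an assumption for illustration.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical summary of how a sequence would be laid out for a given dim.
struct SketchLayout {
    std::size_t stride;  // doubles stored per coordinate
    bool hasDim;         // false when the dimension is left undetermined
    bool hasZ;
    bool hasM;
};

// Map the user-supplied dimension to a layout, rejecting anything unexpected.
inline SketchLayout layoutForDim(std::size_t dim) {
    switch (dim) {
        case 0: return {3u, false, false, false};  // dimension not specified
        case 2: return {3u, true,  false, false};  // XY, Z slot still stored
        case 3: return {3u, true,  true,  false};  // XYZ
        case 4: return {4u, true,  true,  true};   // assumption: 4 means XYZM
        default:
            throw std::invalid_argument("unsupported coordinate dimension");
    }
}
```

With a check of this kind, an out-of-range dim fails loudly at construction time, which is closer to the earlier behaviour the comment describes than silently producing a sequence whose z and m writes collide.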
for (std::size_t i = 0; i < size; i++) {
    coords[i] = { *buf, *(buf + 1) };
    coords->setAt(Coordinate{ *buf, *(buf + 1) }, i);
Another question: the above doesn't yet support creating an XYM coordinate sequence, right? (There would need to be another if (hasM).)
Thanks, this is fixed with #772
This PR builds on #674 to implement CoordinateSequence as a concrete class capable of storing XY, XYZ, XYM, or XYZM coordinates. Like the liblwgeom POINTARRAY, it provides direct access to coordinates of a dimensionality compatible with the stored values, and copy access to higher-dimensionality coordinates padded with NaNs. For example, if a CoordinateSequence stores XYM Coordinates, you can access XY and XYM Coordinates directly and can read an XYZM Coordinate with copying.
It still needs some polish but I wanted to give a chance for overall feedback.
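The direct-versus-copy access idea can be illustrated with a small sketch. This is not the CoordinateSequence implementation from the PR. SketchSequence and SketchXYZM are hypothetical names, and direct access is modelled here as a raw pointer into the stored ordinates rather than a typed coordinate reference.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a 4D coordinate; not the GEOS type.
struct SketchXYZM { double x, y, z, m; };

// Hypothetical sequence that stores only the ordinates it actually has.
class SketchSequence {
public:
    SketchSequence(std::size_t size, bool hasZ, bool hasM)
        : m_hasZ(hasZ), m_hasM(hasM),
          m_stride(2u + (hasZ ? 1u : 0u) + (hasM ? 1u : 0u)),
          m_data(size * m_stride, 0.0) {}

    std::size_t size() const { return m_data.size() / m_stride; }
    std::size_t stride() const { return m_stride; }

    // Direct access: a pointer into the stored values, no copy.
    // Only indices [0, stride()) are valid for the returned pointer.
    double* ordinates(std::size_t i) {
        assert(i < size());
        return m_data.data() + i * m_stride;
    }
    const double* ordinates(std::size_t i) const {
        assert(i < size());
        return m_data.data() + i * m_stride;
    }

    // Copy access: build a higher-dimensional coordinate, padding any
    // ordinate the sequence does not store with NaN.
    SketchXYZM getXYZM(std::size_t i) const {
        const double nan = std::numeric_limits<double>::quiet_NaN();
        const double* p = ordinates(i);
        SketchXYZM c{p[0], p[1], nan, nan};
        std::size_t next = 2;
        if (m_hasZ) c.z = p[next++];
        if (m_hasM) c.m = p[next];
        return c;
    }

private:
    bool m_hasZ;
    bool m_hasM;
    std::size_t m_stride;        // declared before m_data, which depends on it
    std::vector<double> m_data;
};
```

For an XYM sequence in this sketch, ordinates(i) exposes three doubles (x, y, m) in place, while getXYZM(i) returns a copy whose z is NaN and whose m is the stored value.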
I have been running a series of benchmarks comparing this implementation to the current main branch. These numbers represent a worst case because we have the overhead of a multi-dimensional CoordinateSequence but we are continuing to store XYZ Coordinates for 2D geometries because most of GEOS depends on direct access to XYZ Coordinates even when the XY values are not needed. As we transition to the CoordinateXY type, we should start to see benefits from not storing Z values.
Among these cases, the only degradation is in PIP tests, since we no longer have the stack-only FixedSizeCoordinateSequence. This is mostly but not completely mitigated by the new GEOSPreparedContainsXY signature.
I can think of two alternative implementations:
Coordinate a semi-abstract class with direct access to X and Y values and virtual method access to Z and M values. I did not go this route because it would have required each Coordinate to store a pointer to its vtable, essentially taking up the space used by the Z value and negating the locality benefits that we expect for X and Y.
CoordinateSequence. Relative to the implementation in this PR, this approach would give more locality benefits for XY algorithms only in the case where the user is storing XYZ Coordinates but does not want the Z values used. This seems uncommon.