-
Notifications
You must be signed in to change notification settings - Fork 347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for M values #45
Conversation
| @@ -79,7 +79,7 @@ namespace tut | |||
| ensure( 0 != col); | |||
|
|
|||
| const size_t size0 = 0; | |||
| CoordinateSequencePtr sequence = factory->create(col); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this the API breakage, the need to pass -1, -1 ? Should they be default values instead, to retain API compatibility ? Do they need to be integer rather than booleans ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default values: The previous signature was with just one integer, i.e.
CoordinateSequence *create(std::vector<Coordinate> *coords, std::size_t dims)
with dims 1, 2 or 3. The new signature is
CoordinateSequence *create(std::vector<Coordinate> *coords, int hasZ, int hasM)
which, if it provided default values, would be compatible with the old signature, but the user would end up potentially passing 3 to hasZ which is undesirable. Hence breaking the API forces the user to adapt the calls correctly.
Why int and not bool: the value -1 is needed if the dimensions are unknown (i.e. what before was dimension=0. If -1, the dimensions are autodetected as they were before, i.e. in
CoordinateArraySequence::getHasZ() const
CoordinateArraySequence::getHasM() const
I suppose a way to retain API compatibility would be something like
CoordinateSequence *create(std::vector<Coordinate> *coords, std::size_t dims, int dim3isM)
with dim3isM accepting -1 (autodetect), 0 (false) or 1 (true). dims would then accept 1, 2, 3 or 4 (and dim3isM relevant only if dims=3). Perhaps this would be a better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some history: the "dims" parameter was added by Frank Warmerdam around 2010 to preserve dimension info when reading from WKT (see 6449265 and before).
The only really expected values for that are 0 for "determine from first coordinate value", 2 and 3.
I'm not really sure what the usecase for "autodetect" was, nor how good that autodetection is [ it's been recently brougth to my attention that some OGC implementors use POINT(NaN NaN) to encode an empty point in WKB form ] so I would not build too much on that.
What if we keep dims=0 as a generic, backward compatible "autodetect" (no number of dims, no semantic, all autodetected) and we add a boolean dim2isM, defaulting to false ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you mean dim3IsM instead of dim2isM? In any event, yes this approach should work equally well and preserve compatibility.
Concerning the POING(NaN NaN) aspect, the main issue is probably whether the WKT and WKB parsers are able to correctly handle such values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Updated patch has backwards compatible signatures (and also includes the previous |
|
Squashed and rebased patch, all tests still passing. Side note: if whitespace-cleanup-only patch would be appreciated, I can work on one. Occasionally somewhat annoying when editors insist on cleaning up whitespace which end up polluting the patch. |
|
Better configure editors to avoid those cleanups, and manually do it About the patch, I'd like to see tests of how it affects memory usage. |
|
Uhm, the cases you listed issues of excessive ram consumption so not really suited for comparison since already the m-less version has problems. I'm not sure how I can best present some useful numbers. I could run some buffers on complexish layers and report peak memory usage for the two variants, perhaps though you have better benchmark metrics on your mind? |
|
I was just stressing out that there is a memory usage problem already and adding an M member to points is likely to make it worst. It's ok to show numbers from different cases, whatever you have handy. I would still love to see numbers with 2D-only coordinates (is it still possible to build such version of GEOS at compile-time via macros definitions ?) |
|
Okay, so... So in this case, there is a 18% increase in memory usage. As far as memory usage is concerned, I think it is pretty predictable that in the worst case, we are looking at a 33% increase (one double more in the coordinate class). Concerning the pre-existing memory usage issues: in the cases where the algorithms suffer from continuous memory increase, adding one coordinate will just make you hit the limit sooner, but won't change the core issue that is that the algorithms probably get stuck in some loop. (Concerning 2D only coordinates: no, it is not currently possible without adding many So much for memory usage. The runtime performance worries me more: Something is needing many more CPU cycles when buffering. Probably due to the added complexity in |
|
I'd prefer to have a compile time define and a ./configure switch to enable. |
|
Concerning the configure switch, I'm somewhat worried that this might lead to certain packages having it enabled while others not. Would you have it enabled by default? Trying to address the general concern that is memory consumption, which I take is the controversial issue here: I suppose the severity of the issue really depends on how people are using GEOS. Typically for huge data sets you'd have some spatial data structure partitioning the geometries, and only a small subset of those will actually be loaded as GEOS geometries at one particular time. So the memory consumption in those scenarios should not be a huge penalty. If you are loading all the geometries of a huge dataset in memory at once, you're kind of waiting for memory-blowups to happen, a system may as well have 2GB of memory as it might have 8 or any other amount. It is not that I don't see this as an unfortunate side effect, it is just that the only other alternative I see is using dynamic memory allocation, which adds a performance penalty throughout, whereas the memory increase caused by this approach typically should not really hit the users, unless they are storing loads of geometries in memory at once. |
|
I'm fine with having the M support enabled by default, as long as
it can be turned off with a configure switch.
|
|
Ok cool, I'll take care of than then. |
|
Ok this is done. Concerning the performance regression spotted when benchmarking, this actually is unrelated to this particular commit, but it is a regression which is already in trunk. With the above mentioned test case: This should probably be bisected. |
|
Please do not mix style and functional changes. This patch is full of tabs to spaces (or spaces to tab) which make it bigger than needed and harder to read. Could you please remove the indentation changes and squash-rebase against current trunk ? Will review after that. About the performance regression, please file a ticket on trac: https://trac.osgeo.org/geos |
|
Bummer, missed the whitespace reformatting, fixed now. I'll see if a quick git-bisect is able to pin-point the issue, then sure I'll file a ticket. |
|
Just a note concerning the performance regression issue: this looks more like an issue with a combination of compiler flags than with the actual code. That is to day, compiling the latest stable release with the flags I'm using now I notice the same performance issues. I'll still try to figure out which of these are responsible, but it is likely not an upstream issue. |
|
About performance. Dirty check: "time make check" returns in 36 seconds with your patch, 12 without. Same mantra for building: ./autogen.sh && ./configure && make |
|
Uhm, really? Versus Which test is taking so long for you? |
|
How did you get that nice output ?
Anyway, trying again with a more controlled environment seems to be
giving comparable times: ~17.22 secs with trunk, ~17.43 with m_values
(laptop running on batteries this time)
|
|
Any reason to keep GEOS_MVALUES out of include/config.h ?
|
|
|
|
The command output. I don't get that with "make check".
Maybe a different automake version or configuration ?
I'm using automake (GNU automake) 1.14.1
(will check config.h later or next week)
|
|
I have automake 1.15, so might well be the newer version. |
|
Ok so it is now defined in |
|
I think the windows one is not really processed but just copied
verbatime from .h.vc to .h, and only when not using cmake
(I don't build on windows either).
About robust: did you add a test for each code path you touched ?
|
|
Mh actually some more tests should indeed be extended to cover M (and also Z I suppose) coordinates, such as |
|
Regarding
|
|
@manisandro in addition to tests (which I understood you were taking a look to) could you please also file an enhancement ticket on https://trac.osgeo.org/geos (so to assign a milestone to it). Thanks |
|
@strk I had had a look at the issue, and there are many functionalities which are not covered by any test at all, where I simply lack the insight and time to understand what the expected result would be. In the interest of moving this forward, it might be helpful to list which tests are missing and then I'll look which tests I can implement given my current work load. |
|
I've squash-merged this into an NOTE: a trac ticket is still needed if not opened yet |
|
For a start, tests are failing: |
|
@manisandro Thanks for undertaking this critically important work. @strk is there any way to help move this forward? Supporting multiple dimensions is critical for many applications and PostGIS, for one, relies on this library. |
|
Showing some numbers about the RAM usage with big geometries
comparing M and M-less versions would be a start.
Comparing speed between virtual and non-virtual versions
of CoordinateSequence could be another (harder, but even
more long-sighted).
|
RFC
Overall remark: the patch is pretty invasive for the reason that it adds support for M-values with and without Z-values (the occasional reference to future M-value support in the original code seem to indicate that a possible 3rd dimension was always to be Z and a possible 4th dimension always M).
All tests still pass with this patch.
In short, this patch removes internal
dimensionvariables, using instead variables such ashasZandhasM(withdimensiongetters then returning2 + hasZ + hasM). It adjusts WKT and WKB readers to correctly handle combinations of XY, XYZ, XYM and XYZM geometries. It also introduces variousinterpolateMstyle functions, which are copies of the respectiveinterpolateZfunctions (with the exception ofElevationMatrixrelated methods which I suppose don't make sense for M values).A C++ API break was introduced with the
Coordinate{Array}Sequenceconstructors. For the rest, compatibility is preserved.Questions:
Does the old WKT style only handle possible 3D values (i.e.
POINT (1 2 3)), or also Z and M (i.e.POINT (1 2 3 4))?C-API: how to deal with XYM geometries? This needs to be considered in the following functions:
GEOSCoordSeq_create(handle, size, dims)
GEOSWKTWriter_setOutputDimension(handle, writer, dim)
GEOSWKBWriter_setOutputDimension(writer, newDimension)
Both these now accept
4as dimension for XYZM geometries, and3means XYZ as before. For XYM geometries, I suppose the only solution is to add new functions such as