Add support for M values #45

Closed
wants to merge 4 commits into from

manisandro
Contributor

RFC

Overall remark: the patch is pretty invasive because it adds support for M-values both with and without Z-values (the occasional references to future M-value support in the original code seem to indicate that a possible 3rd dimension was always meant to be Z and a possible 4th dimension always M).

All tests still pass with this patch.

In short, this patch removes internal dimension variables, using instead variables such as hasZ and hasM (with dimension getters then returning 2 + hasZ + hasM). It adjusts WKT and WKB readers to correctly handle combinations of XY, XYZ, XYM and XYZM geometries. It also introduces various interpolateM style functions, which are copies of the respective interpolateZ functions (with the exception of ElevationMatrix related methods which I suppose don't make sense for M values).
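As an illustration of the flag-based bookkeeping described above, here is a minimal sketch. The class name `SequenceDims` is hypothetical and is not the GEOS class; only the `2 + hasZ + hasM` rule and the `getHasZ`/`getHasM` getter names come from this description.

```cpp
#include <cstddef>

// Hypothetical stand-in for a coordinate sequence; NOT the actual GEOS class.
class SequenceDims {
public:
    SequenceDims(bool hasZ, bool hasM) : hasZ_(hasZ), hasM_(hasM) {}

    bool getHasZ() const { return hasZ_; }
    bool getHasM() const { return hasM_; }

    // X and Y are always present; Z and M each add one ordinate.
    std::size_t getDimension() const {
        return 2 + (hasZ_ ? 1 : 0) + (hasM_ ? 1 : 0);
    }

private:
    bool hasZ_;
    bool hasM_;
};
```

Note that XYZ and XYM both report a dimension of 3, which is why a bare dimension count cannot distinguish them and the flags have to be carried separately.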

A C++ API break was introduced with the Coordinate{Array}Sequence constructors. For the rest, compatibility is preserved.

Questions:

  • Does the old WKT style only handle possible 3D values (i.e. POINT (1 2 3)), or also Z and M (i.e. POINT (1 2 3 4))?

  • C-API: how to deal with XYM geometries? This needs to be considered in the following functions:

    GEOSCoordSeq_create(handle, size, dims)
    GEOSWKTWriter_setOutputDimension(handle, writer, dim)
    GEOSWKBWriter_setOutputDimension(writer, newDimension)

All of these now accept 4 as the dimension for XYZM geometries, and 3 means XYZ as before. For XYM geometries, I suppose the only solution is to add new functions such as

GEOSCoordSeq_create_zm(handle, size, hasZ, hasM)
GEOSWKTWriter_setOutputDimension_zm(handle, writer, hasZ, hasM)
GEOSWKBWriter_setOutputDimension_zm(writer, hasZ, hasM)

@@ -79,7 +79,7 @@ namespace tut
ensure( 0 != col);

const size_t size0 = 0;
CoordinateSequencePtr sequence = factory->create(col);
Member

Was this the API breakage, the need to pass -1, -1? Should they be default values instead, to retain API compatibility? Do they need to be integers rather than booleans?

Contributor Author

Default values: The previous signature was with just one integer, i.e.

CoordinateSequence *create(std::vector<Coordinate> *coords, std::size_t dims)

with dims 1, 2 or 3. The new signature is

CoordinateSequence *create(std::vector<Coordinate> *coords, int hasZ, int hasM)

which, if it provided default values, would be compatible with the old signature, but the user would end up potentially passing 3 to hasZ which is undesirable. Hence breaking the API forces the user to adapt the calls correctly.


Why int and not bool: the value -1 is needed if the dimensions are unknown (i.e. what before was dimension=0). If -1, the dimensions are autodetected as they were before, i.e. in

CoordinateArraySequence::getHasZ() const
CoordinateArraySequence::getHasM() const

I suppose a way to retain API compatibility would be something like

CoordinateSequence *create(std::vector<Coordinate> *coords, std::size_t dims, int dim3isM)

with dim3isM accepting -1 (autodetect), 0 (false) or 1 (true). dims would then accept 1, 2, 3 or 4 (and dim3isM relevant only if dims=3). Perhaps this would be a better approach?

Member

For some history: the "dims" parameter was added by Frank Warmerdam around 2010 to preserve dimension info when reading from WKT (see 6449265 and before).

The only really expected values for that are 0 for "determine from first coordinate value", 2 and 3.

I'm not really sure what the usecase for "autodetect" was, nor how good that autodetection is [ it's been recently brought to my attention that some OGC implementors use POINT(NaN NaN) to encode an empty point in WKB form ] so I would not build too much on that.

What if we keep dims=0 as a generic, backward compatible "autodetect" (no number of dims, no semantic, all autodetected) and we add a boolean dim2isM, defaulting to false ?
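To make the autodetection concern concrete, here is a rough sketch of a "determine from first coordinate value" rule. It assumes that an absent Z ordinate is stored as NaN; the `Coord` struct and `detectDimension` function are hypothetical and do not reproduce the GEOS implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical plain coordinate; an absent Z is assumed to be stored as NaN.
struct Coord {
    double x;
    double y;
    double z;
};

// Returns 0 for an empty sequence (nothing to inspect), 3 if the first
// coordinate carries a non-NaN Z, and 2 otherwise.
std::size_t detectDimension(const std::vector<Coord>& coords) {
    if (coords.empty()) {
        return 0;
    }
    return std::isnan(coords.front().z) ? 2 : 3;
}
```

Under this rule a sequence whose first coordinate has a NaN Z is reported as 2D even if later coordinates do carry a Z, and NaN ordinates used as an "empty" marker are indistinguishable from missing data, which is one reason not to rely heavily on autodetection.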

Contributor Author

I guess you mean dim3IsM instead of dim2isM? In any event, yes this approach should work equally well and preserve compatibility.

Concerning the POINT(NaN NaN) aspect, the main issue is probably whether the WKT and WKB parsers are able to correctly handle such values.

Member

@strk strk Apr 1, 2015 via email

@manisandro
Contributor Author

Updated patch has backwards compatible

CoordinateSequence *create(std::vector<Coordinate> *coords,
                           std::size_t dimension = 0,
                           bool dim3isM = false) const;

CoordinateSequence *create(std::size_t size, std::size_t dimension = 0,
                           bool dim3isM = false) const;

signatures (and also includes the previous 2DM -> 3DM fix).
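For readers following the signature discussion, this sketch shows one way the `(dimension, dim3isM)` pair could be mapped onto Z/M flags. `OrdinateFlags` and `resolveDimension` are hypothetical names used only for illustration; rejecting values other than 0, 2, 3 and 4 is an assumption, not something the patch is known to do.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical result type: which ordinates are present, or "autodetect".
struct OrdinateFlags {
    bool hasZ;
    bool hasM;
    bool autodetect;
};

// Maps the backwards compatible (dimension, dim3isM) pair to explicit flags.
// dim3isM only matters when dimension == 3.
OrdinateFlags resolveDimension(std::size_t dimension = 0, bool dim3isM = false) {
    switch (dimension) {
        case 0:  // legacy "determine from the data" behaviour
            return {false, false, true};
        case 2:  // XY
            return {false, false, false};
        case 3:  // XYZ by default, XYM when dim3isM is set
            return dim3isM ? OrdinateFlags{false, true, false}
                           : OrdinateFlags{true, false, false};
        case 4:  // XYZM
            return {true, true, false};
        default:  // assumption: anything else is rejected
            throw std::invalid_argument("unsupported dimension");
    }
}
```

Because both parameters have defaults, existing calls that pass only a dimension of 2 or 3 keep their old meaning, which is the compatibility property discussed above.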

@manisandro
Contributor Author

Squashed and rebased patch, all tests still passing.

Side note: if a whitespace-cleanup-only patch would be appreciated, I can work on one. It is occasionally somewhat annoying when editors insist on cleaning up whitespace, which ends up polluting the patch.

@strk
Member

strk commented Apr 21, 2015

Better configure editors to avoid those cleanups, and manually do it
whenever there's a need to touch a specific line.

About the patch, I'd like to see tests of how it affects memory usage.
See for example:

@manisandro
Contributor Author

Uhm, the cases you listed are issues of excessive RAM consumption, so they are not really suited for comparison since the M-less version already has problems. I'm not sure how I can best present some useful numbers. I could run some buffers on complexish layers and report peak memory usage for the two variants, though perhaps you have better benchmark metrics in mind?

@strk
Member

strk commented May 4, 2015

I was just stressing that there is a memory usage problem already and adding an M member to points is likely to make it worse. It's ok to show numbers from different cases, whatever you have handy.

I would still love to see numbers with 2D-only coordinates (is it still possible to build such version of GEOS at compile-time via macros definitions ?)

@manisandro
Contributor Author

Okay, so...

$ wget https://smani.fedorapeople.org/tmp/test.cpp
# Using new 4D coordinates
$ g++ -O3 -flto -g -o test test.cpp -lgeos
$ valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out ./test; cat massif.out | grep mem_heap_B | sed -e 's/mem_heap_B=\(.*\)/\1/' | sort -g | tail -n 1
[...]
31879168
# Same with old 3D coordinates
26820608

So in this case, there is a roughly 19% increase in memory usage (31879168 vs. 26820608 bytes at peak). As far as memory usage is concerned, I think it is pretty predictable that in the worst case, we are looking at a 33% increase (one more double in the coordinate class, four instead of three).
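The arithmetic behind these figures can be checked with a small sketch. `Coord3` and `Coord4` are hypothetical plain structs standing in for the old and new coordinate layouts, and the byte counts are the Massif peaks quoted above.

```cpp
#include <cstddef>

// Hypothetical layouts: three doubles before the patch, four after.
struct Coord3 { double x, y, z; };
struct Coord4 { double x, y, z, m; };

// Relative growth from `before` to `after`, e.g. 0.25 for a 25% increase.
constexpr double relativeIncrease(double before, double after) {
    return (after - before) / before;
}

// Structs of same-typed doubles carry no padding on common platforms.
static_assert(sizeof(Coord3) == 3 * sizeof(double), "unexpected padding");
static_assert(sizeof(Coord4) == 4 * sizeof(double), "unexpected padding");
```

With 8-byte doubles, `relativeIncrease(sizeof(Coord3), sizeof(Coord4))` is 8/24, about 33%, which is the worst case when memory is dominated by coordinates. `relativeIncrease(26820608.0, 31879168.0)` is about 0.189 for the measured run, lower because not all heap memory holds coordinates.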

Concerning the pre-existing memory usage issues: in the cases where the algorithms suffer from continuous memory increase, adding one coordinate will just make you hit the limit sooner, but won't change the core issue, which is that the algorithms probably get stuck in some loop.

(Concerning 2D only coordinates: no, it is not currently possible without adding many ifdefs)

So much for memory usage. The runtime performance worries me more:

# Using new 4D coordinates
$ time ./test
real    0m0.951s
user    0m0.946s
sys     0m0.006s
# Same with old 3D coordinates
$ time ./test
real    0m0.251s
user    0m0.246s
sys     0m0.003s

Something needs many more CPU cycles when buffering, probably due to the added complexity in OverlayOp. But before chasing that issue, I'd like to know your position on memory usage and whether you are okay with this approach per se. Personally the memory usage does not worry me; the only other alternatives I see are dynamic memory allocation for the coordinates or possibly inheritance for the Coordinate class (which would be very invasive indeed).

@strk
Member

strk commented May 22, 2015

I'd prefer to have a compile time define and a ./configure switch to enable.
You might also want to ask this on the mailing list to hear what other users think about it.

@manisandro
Contributor Author

Concerning the configure switch, I'm somewhat worried that this might lead to certain packages having it enabled while others not. Would you have it enabled by default?

Trying to address the general concern that is memory consumption, which I take is the controversial issue here: I suppose the severity of the issue really depends on how people are using GEOS. Typically for huge data sets you'd have some spatial data structure partitioning the geometries, and only a small subset of those will actually be loaded as GEOS geometries at one particular time. So the memory consumption in those scenarios should not be a huge penalty. If you are loading all the geometries of a huge dataset in memory at once, you're kind of waiting for memory-blowups to happen, a system may as well have 2GB of memory as it might have 8 or any other amount.

It is not that I don't see this as an unfortunate side effect, it is just that the only other alternative I see is using dynamic memory allocation, which adds a performance penalty throughout, whereas the memory increase caused by this approach typically should not really hit the users, unless they are storing loads of geometries in memory at once.

@strk
Member

strk commented May 26, 2015 via email

@manisandro
Contributor Author

Ok cool, I'll take care of that then.

@manisandro
Contributor Author

Ok this is done. Concerning the performance regression spotted when benchmarking, this actually is unrelated to this particular commit, but it is a regression which is already in trunk. With the above mentioned test case:

stable (3.4.2):
real    0m0.853s
user    0m0.848s
sys     0m0.005s

trunk:
real    0m2.472s
user    0m2.464s
sys     0m0.008s

mvals:
real    0m2.554s
user    0m2.545s
sys     0m0.006s

This should probably be bisected.

@strk
Member

strk commented May 27, 2015

Please do not mix style and functional changes. This patch is full of tabs to spaces (or spaces to tab) which make it bigger than needed and harder to read. Could you please remove the indentation changes and squash-rebase against current trunk ? Will review after that.

About the performance regression, please file a ticket on trac: https://trac.osgeo.org/geos

@manisandro
Contributor Author

Bummer, missed the whitespace reformatting, fixed now.

I'll see if a quick git-bisect is able to pin-point the issue, then sure I'll file a ticket.

@manisandro
Contributor Author

Just a note concerning the performance regression issue: this looks more like an issue with a combination of compiler flags than with the actual code. That is to say, compiling the latest stable release with the flags I'm using now, I notice the same performance issues. I'll still try to figure out which of these are responsible, but it is likely not an upstream issue.

@strk
Member

strk commented Jun 2, 2015

About performance. Dirty check: "time make check" returns in 36 seconds with your patch, 12 without. Same mantra for building: ./autogen.sh && ./configure && make

@manisandro
Contributor Author

Uhm, really?

$ git branch
* m_values
  svn-trunk

$ time make check
Scanning dependencies of target check
Test project /home/sandro/Documents/Devel/libgeos/build
    Start 1: geos_unit
1/4 Test #1: geos_unit ........................   Passed    0.19 sec
    Start 2: xmltester
2/4 Test #2: xmltester ........................   Passed    6.16 sec
    Start 3: bug234
3/4 Test #3: bug234 ...........................   Passed    0.01 sec
    Start 4: TestSweepLineSpeed
4/4 Test #4: TestSweepLineSpeed ...............   Passed    1.63 sec

100% tests passed, 0 tests failed out of 4

Total Test time (real) =   7.99 sec
Built target check

real    0m8.069s
user    0m7.854s
sys     0m0.187s

Versus

$ git branch
  m_values
* svn-trunk
$ time make check
Test project /home/sandro/Documents/Devel/libgeos/build
    Start 1: geos_unit
1/4 Test #1: geos_unit ........................   Passed    0.17 sec
    Start 2: xmltester
2/4 Test #2: xmltester ........................   Passed    6.00 sec
    Start 3: bug234
3/4 Test #3: bug234 ...........................   Passed    0.00 sec
    Start 4: TestSweepLineSpeed
4/4 Test #4: TestSweepLineSpeed ...............   Passed    1.52 sec

100% tests passed, 0 tests failed out of 4

Total Test time (real) =   7.69 sec
Built target check

real    0m7.757s
user    0m7.577s
sys     0m0.146s

Which test is taking so long for you?

@strk
Member

strk commented Jun 5, 2015 via email

@strk
Member

strk commented Jun 5, 2015 via email

@manisandro
Contributor Author

GEOS_MVALUES done. Nice output: you mean the markdown formatting (just indent by four spaces) or the command output itself (which is what I get by just running time make check)?

@strk
Member

strk commented Jun 6, 2015 via email

@manisandro
Contributor Author

I have automake 1.15, so might well be the newer version.

@manisandro
Contributor Author

Ok so it is now defined in platform.h (except when building on Windows - no idea how to do processing on platform.h.vc in a Windows "shell" compatible way). GEOS_MVALUES now seems to propagate sufficiently through the codebase, so adding -DGEOS_MVALUES to the project cflags should not be necessary anymore - though question is still whether this is really robust.
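To illustrate why the define has to reach every translation unit, here is a sketch of a compile-time switch. Only the macro name `GEOS_MVALUES` comes from this thread; the `Coord` struct and the `kHasMStorage`/`kOrdinates` constants are hypothetical and do not show how the patch actually lays out the Coordinate class.

```cpp
#include <cstddef>

// Hypothetical coordinate type; the M member exists only when GEOS_MVALUES
// is defined at compile time.
struct Coord {
    double x;
    double y;
    double z;
#ifdef GEOS_MVALUES
    double m;
#endif
};

// True when this translation unit was compiled with M storage enabled.
constexpr bool kHasMStorage =
#ifdef GEOS_MVALUES
    true;
#else
    false;
#endif

// Number of doubles stored per coordinate in this build: 3 or 4.
constexpr std::size_t kOrdinates = sizeof(Coord) / sizeof(double);
```

If a library were built with the macro and a client without it (or the reverse), the two sides would disagree on `sizeof(Coord)`. Putting the define in a generated header that every consumer includes, rather than in per-project cflags, is one way to avoid that mismatch.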

@strk
Member

strk commented Jun 10, 2015 via email

@manisandro
Contributor Author

Mh actually some more tests should indeed be extended to cover M (and also Z I suppose) coordinates, such as tests/unit/triangulate/*. Will take a look.

@mloskot
Contributor

mloskot commented Aug 5, 2015

Regarding platform.h.{in|vc|cmake},

  • platform.h.in is processed by Autotools
  • platform.h.cmake is processed by CMake
  • platform.h.vc is required by VS+NMAKE builds and it is copied verbatim by src/Makefile.vc

@strk
Member

strk commented Oct 6, 2015

@manisandro in addition to tests (which I understood you were taking a look at) could you please also file an enhancement ticket on https://trac.osgeo.org/geos (so as to assign a milestone to it). Thanks

@manisandro
Contributor Author

@strk I had had a look at the issue, and there are many functionalities which are not covered by any test at all, where I simply lack the insight and time to understand what the expected result would be. In the interest of moving this forward, it might be helpful to list which tests are missing and then I'll look which tests I can implement given my current work load.

strk added a commit that referenced this pull request Nov 8, 2016
This is a squashed merge of:

 - Added failing tests for WKBReader with Z, M, ZM geometries
   #39
   by Benjamin Morel

 - Add support for M values
   #45
   by Sandro Mani
@strk
Member

strk commented Nov 8, 2016

I've squash-merged this into an mvalue branch, together with the changes in #39
Please continue work there if still interested

NOTE: a trac ticket is still needed if not opened yet

@strk strk closed this Nov 8, 2016
@strk
Member

strk commented Nov 8, 2016

For a start, tests are failing:
https://travis-ci.org/libgeos/libgeos/builds/174321424

@sventech

@manisandro Thanks for undertaking this critically important work. @strk is there any way to help move this forward? Supporting multiple dimensions is critical for many applications and PostGIS, for one, relies on this library.

@strk
Member

strk commented Feb 13, 2018 via email

4 participants