Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExtendableCoordinateSequence #271

Closed

Conversation

FObermaier
Copy link
Contributor

@FObermaier FObermaier commented Mar 13, 2018

This is a wrapper around the CoordinateSequence interface.

Internally it has access to a CoordinatSequenceFactory that it uses to create a buffer sequence in which it stores the ordinate values. If the capacity of the sequence is used up, a new sequence with extended capacity is created and the present values are stored in it.

Extensions to the public interface

  • several add methods
  • several insertAt methods
  • getCapacity function
  • truncated function

This PR comes with a unit test and a performance comparing ExtendableCoordinateSequence to creating several 1-coordinate sequences and merging those into one.

This class is convenient wherever the final size of a CoordinateSequence is not known, e.g. when parsing WKT (WKTReader) or when building buffers. It might help with issue #188.

I am not sure if the wording is correct, maybe GrowableCoordinateSequence or BufferedCoordinateSequence is a better name.

Completed unit test
Added performance test

Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
Added add and insertAt functions to ExtendableCoordinateSequence
Completed unit test
Added performance test

Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
Added test to testConstructor, made other test use
  PackedCoordinateSequence.Float

Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
Added comparison to current behavior, ArrayList storing Coordinates and creating
  a sequence from the resulting Coordinate array.
Fixed performance number initialization, added overtime in percent
Sort methods in PerformanceTestRunner.findMethods
Another change to the ensureCapacity logic
Tweaked output of performance test
Changed PerformanceTestRunner to use System.nanoTime() instead
  of StopWatch

Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
* ensures that a sequence is large enough to store ordinate
* values.
*/
public class ExtendableCoordinateSequence implements CoordinateSequence, Serializable {
Copy link
Contributor

@jodygarnett jodygarnett Jun 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class name kind of conflicts with ExtendedCoordinateSequence.java which we have noticed is in common use by downstream projects.

Is there any other kind of naming convention we could use rather than "Extendable"? I know for collections I often use CopyOnWriteArrayList (not the best naming contention).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could use the word delegate since it is delegating storage to another csFactory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problem with a different name at all. Perhaps sth. that includes Builder (as in StringBuilder) or Buffer would describe it better but CoordinateSequenceBuilderSequence is weird. How about BufferedCoordinateSequence?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • DynamicResizingCoordinateSequence?
  • AutoResizingCoordinateSequence?
  • ResizingCoordinateSequence?
  • ExpandableCoordinateSequence?
  • ExpandingCoordinateSequence?
  • VariableSizeCoordinateSequence?

@jodygarnett
Copy link
Contributor

This is really interesting functionality, see comment for question about the naming of this class.

@FObermaier
Copy link
Contributor Author

Is there an interest in persuing this any further?

@jnh5y
Copy link
Contributor

jnh5y commented Aug 30, 2018

@FObermaier I think one of the philosophical questions at play is whether CoordinateSequences should be mutable or immutable. I know this has come up in conversations with @jodygarnett and @dr-jts, but my memory is faulty and don't remember which position they'd take or if we've reached a conclusion.

I do know that Jody worked with Martin to add the interface so that downstream users of the JTS could do things which make sense.

That said, I'd like to see the JTS library have some 'collections-like api' support for common operations with CoordinateSequences. For the case of immutable coordinatesequences, some obvious operations like concatenating and reversing sequences could be implemented.

I think the next step is Jody and Martin weighing in on mutability. If that goes, then I think could make sense.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 30, 2018

Generally speaking I think it is less error-prone to have Geometry be immutable. That said, it seems a bit draconian to enforce this 100%.

And there are other uses for mutable CoordinateSequences. A prime example being when Geometry is being constructed from a stream of incoming data (as occurs for example in some of the Readers).

Perhaps it might have been better to split the read and update accessors for CoordinateSequence into two separate interfaces, so that immutable usage was more obvious (and enforceable).

@FObermaier
Copy link
Contributor Author

As far as I recall, the CoordinateSequence interface always allowed modification of its values.

This PR only addresses that currently adding Coordinates is cumbersome. It comes in handy for streaming or WKT parsing.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 30, 2018

Is this really the most efficient way to support streaming construction of CoordinateSequences when reading? Would a purpose-built class be simpler or more efficient?

@airbreather
Copy link
Contributor

airbreather commented Aug 30, 2018

Is this really the most efficient way to support streaming construction of CoordinateSequences when reading? Would a purpose-built class be simpler or more efficient?

Before I noticed that Felix had started work on this, I had started work on an alternative way to build immutable (well, fixed-length) coordinate sequences using a stream of coordinate data. My solution looked like this (C#). I have a few reasons to believe that that solution is going to be more efficient, in the .NET version, than creating and expanding coordinate sequences using a factory, but I admittedly haven't run any numbers to compare the two ideas, and I don't know how much of my reasons would be equally valid on the JVM.

I don't really want to volunteer to test the Java side of things, but if it helps, I could try to port this to .NET and compare the performances of the two.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 30, 2018

@airbreather That seems along the lines of what I was thinking about.

Is it more efficient to create the separate ordinate lists, rather than a list of Coordinate objects? I guess it would be if they are optimized to store doubles as native values.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 30, 2018

The design that @airbreather referenced seems like a more efficient and safer way of creating an arbitrary CoordinateSequence from streamed data. It avoids any issues with trying to use an arbitrary CoordinateSequenceFactory to repeatedly create new sequences. After all, there is no guarantee that a CoordinateSequenceFactory is particularly efficient at creating sequences, or even that they are modifiable.

The only downside might be that the CoordinateBuffer implementation is limited to a max of 4 dimensions. But this should be adequate to support WKT and WKB reading, which are primary use cases.

@airbreather
Copy link
Contributor

Is it more efficient to create the separate ordinate lists, rather than a list of Coordinate objects?

The main reason I did it this way was so that users don't have to pay for Z or M values that aren't actually used. Here, if all you need is XY, then you only pay 16 bytes per coordinate (plus the extra room in the array list).

The only downside might be that the CoordinateBuffer implementation is limited to a max of 4 dimensions.

There's no fundamental reason why it couldn't contain an arbitrary number of dimensions, it just seemed like the extra flexibility wasn't worth the cost when I wrote this up.

* Added GrowableCoordinateSequence
* Tweaked ExtendableCoordinateSequence's add methods
  and the ensureCapacity functionality

Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
@FObermaier
Copy link
Contributor Author

I admit that GrowableCoordinateSequence of GeoServer performs better in most cases, especially when CoordinateArraySequenceFactory is being used to get a result geometry

You can read the figures of a comparison test I created in this file: ExpandableVsGrowable.txt
The perfomance comparison measures adding 1500x varying numbers of coordinates to either GrowableCoordinateSequences or ExtendableCoordinateSequences and returning a usable sequence using the observed CoordinateSequenceFactory.

Annotations:

  • Both sequences are started with an initial capacity of 50.
  • The above implementation of GrowableCoordinateSequence is limited in that it
    • only supports 2 dimensions.
    • saves ordinate values as floats

* newCapacity = oldCapacity + (oldCapacity >> 1)
* implement add(2d) and add(3d)

Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
@dr-jts
Copy link
Contributor

dr-jts commented Feb 25, 2019

@FObermaier what is your view of the status of this?

The @airbreather design of CoordinateBuffer seems like a better alternative - do you agree?

Would be nice to close this PR if no longer active.

@FObermaier
Copy link
Contributor Author

FObermaier commented Feb 26, 2019

IMHO there is no need for a CoordinateBuffer to parse WKT anymore since there are Coordinate classes for all possible ordinates in WKT (XY[Z|M|ZM].

Plain ArrayList or ArrayList<Coordinate> will suffice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants