Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read SQL Servers binary format directly #174

Closed
wants to merge 4 commits into from

Conversation

bakk
Copy link

@bakk bakk commented Apr 4, 2013

Read SQL Server binary directly into JTS Geometries, instead of going
via WKB. While geotools will use a little longer to decode (about 50% more
compared to WKB), SQL Server won't have to encode at all, resulting in a performance
improvement.

A simple benchmark showed a performance increase around 50% on a
layer with many small geometries (a grid), while another layer only showed
7% performance increase.

The binary reader only support v1 of the specification. V2 adds support for CircularString, CompoundCurve, CurvePolygon, FullGlobe (http://go.microsoft.com/fwlink/?LinkId=226407), and is not supported by today's WKB implementation either (as far as I can see).

It would be nice if the added classes (for parsing the sql server binary) could be licenced freely (eg. apache/bsd), but if you would like it to be LGPL like the rest, that's OK too.

It's important to review this thoroughly so that everything that today's implementation supports still works :)

Read SQL Server binary directly into JTS Geometries, instead of going
via WKB. While geotools will use a little longer to decode (than WKB),
SQL Server won't have to encode at all, resulting in a performance
improvement.
@aaime
Copy link
Member

aaime commented Apr 4, 2013

Everything that becomes part of GeoTools has to be licenced under LGPL. I'll have a look at the patches as spare time allows (or maybe Justin wants to?). Wondering, are V1 and V2 specs compatible when it comes to the same geometry type? Like, Point, Line and Polygon are encoded the same way?
Given this is new code, I'm wondering if having a configurable choice between WKB and native binary would be better

@bakk
Copy link
Author

bakk commented Apr 4, 2013

About the license: that's ok.
I forgot to include a link to the actual spec: http://msdn.microsoft.com/en-us/library/ee320529(v=sql.105).aspx
The differences between V1 and V2 are:

  • new geometry types, as mentioned
  • a new flag to indicate that a geography is larger than a hemisphere
  • a segments-structure (used for compound curves)
  • the definition of "figure attribute" has been changed

To support points, lines, polygons in V2, you would at least have to support the new definition of "figure attribute". Otherwise the formats are the same. However, I downloaded the latest SQL Server 2012 release, and when I created a new geometry, I got version 1. I've also tested with SQL Server 2008 R2. So the fact that I wasn't able to produce test-data with SQL Server stopped me from implementing partial support for V2.

If you could point me in the right direction of how to make the choice configurable, I can implement it. How would the user specify it - in the GUI?

@aaime
Copy link
Member

aaime commented Apr 4, 2013

Just add a new configuration parameter in the factory, that gets eventually set in the dialect, like the one for supporting native paging. The GUI in GeoServer is automatically generated from the parameter list (you'll have to also add it to the list of params advertised by the factory, and its jndi equivalent).

Added a parameter to switch between WKB and SQL Server binary as
serialization format
@bakk
Copy link
Author

bakk commented Apr 4, 2013

I've added a new configuration parameter. It defaults to using WKB. If the new code proves to be stable, I suggest switching default settings later.

@@ -0,0 +1,25 @@
package org.geotools.data.sqlserver.Reader;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to stick to the convention of keeping package names lowercase.

@jdeolive
Copy link
Contributor

jdeolive commented Apr 4, 2013

First off great work! This is something i have wanted to see for a while since as you note it can have a huge performance impact.

I think the patch looks great for the most part, except for the optional configuration parameter, which as implemented will still change the default for people upgrading. I think instead of explicitly having to enable wkb serialization we should have to explicitly enable native serializastion. So i would change the parameter to "NATIVE_SERIALIZATION" and have it by default set to false.

As you say if it shows to be stable we can make it the default in a future version.

Additionally the parameter is moved up, which groups it together with
two other boolean params in Geoserver's GUI. Corrected package name to
lowercase.
@bakk
Copy link
Author

bakk commented Apr 4, 2013

You are correct - changing default for people upgrading was unintended. I've corrected it, and I also took the liberty to move the parameter up by one position in the list of advertised parameters. I just thought it looked better in GeoServer. The casing of the parameters (as they appear in GeoServer) is inconsistent, so I'm not sure if there is a correct way.

Another thing: I believe all the tests now run with the default setting (WKB), so the native implementation is not tested (other than the unit tests which test serialization directly). Not sure if this is a problem.

@jdeolive
Copy link
Contributor

jdeolive commented Apr 4, 2013

Great, thanks for changing that. And yeah, the casing is inconsistent so i wouldn't worry about that.

Indeed it would be nice to have at least one of the tests run with the native serialization, ideally one of the tests that involves reading geometries, like SQLServerFeatureSourceTest. It should just be a matter of extending the existing class and overriding the setupDataStore method to set the native serialization. But I will leave that up to you, i don't see it as a blocker for this change since it already has a good test that exercise the changes.

So as far as i am concerned the patch is good but i'll give Andrea a chance to weigh in as well before merge.

@aaime aaime closed this Apr 6, 2013
@aaime
Copy link
Member

aaime commented Apr 6, 2013

Thanks a lot, looking good. I've manually merged it and added a few of changes:

  • there was a package named Reader instead of reader, as a result the code would not even compile on a case sensitive file system like Linux has, fixed it
  • made it clear that the serialization format is for the geometries ("use native geometry serialization format")
  • tested it with several geometry types with success, observed the speedup by the naked eye on large enough data sets
  • created a ticket in jira (http://jira.codehaus.org/browse/GEOT-4442), squashed all commits to one and reworded the commit message to refer to the jira ticket

The full result is here:
9d7f87d

Given that the behavior is optional and I've found no issues in the code or tests, I've also backported the patch to the stable series (9.x).

@bakk
Copy link
Author

bakk commented Apr 6, 2013

That's great! Thank you both for reviewing it so quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants