Fixes and updates related to WKT and WKB
dazuma committed Apr 25, 2011
1 parent 762926f commit bfd3e40
Showing 29 changed files with 582 additions and 280 deletions.
14 changes: 11 additions & 3 deletions History.rdoc
@@ -1,15 +1,23 @@
=== 0.2.9 / 2011-04-25

* INCOMPATIBLE CHANGE: mutator methods for the configurations of the WKRep parsers and generators have been removed. Create a new parser/generator if you need to change behavior.
* POSSIBLE INCOMPATIBLE CHANGE: The GEOS implementation now uses WKRep (by default) instead of the native GEOS WKB/WKT parsers and generators. This is because of some issues with the GEOS 3.2.2 implementation: namely, that the GEOS WKT generator suffers from some floating-point roundoff issues due to its "fixed point" output, and that the GEOS WKT parser fails to recognize names not in all caps, in violation of the version 1.2 update of the SFS. (Thanks to sharpone74 for report GH-4.)
* WKRep::WKTGenerator injects some more whitespace to make output more readable and more in line with the examples in the SFS.
* It is now possible to configure the WKT/WKB parsers/generators for each of the implementations, by passing the configuration hash to the factory constructor. In addition, it is also possible to configure the GEOS factory to use the native GEOS WKT/WKB implementation instead of RGeo::WKRep (that is, to restore the behavior of RGeo <= 0.2.8).
* The WKB parser auto-detects and interprets hex strings.
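The hex auto-detection mentioned in the last item can be pictured with a small pure-Ruby sketch. This is not the RGeo::WKRep implementation; <tt>normalize_wkb</tt> is a hypothetical helper written only for illustration. It relies on the fact that raw WKB begins with a byte-order byte of 0 or 1, neither of which is a printable hex digit, so a string made up entirely of hex digit pairs can be treated as hex.

```ruby
# Hypothetical helper, not part of RGeo: accept WKB either as raw bytes
# or as a hex string, and always return raw bytes.
def normalize_wkb(input)
  data = input.b  # work on a binary copy so the regex never sees invalid UTF-8
  if data.match?(/\A(?:\h\h)+\z/)
    [data].pack('H*')  # hex digit pairs -> raw bytes
  else
    data               # already raw WKB
  end
end

# Little-endian WKB for a 2D point at (3, 4), written as hex.
raw = normalize_wkb('010100000000000000000008400000000000001040')

# byte order, geometry type, x, y
byte_order, type_code, x, y = raw.unpack('CVEE')
```

Passing the already-decoded bytes through <tt>normalize_wkb</tt> a second time leaves them unchanged.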

=== 0.2.8 / 2011-04-11

* A .gemspec file is now available for gem building and bundler git integration.

=== 0.2.7 / 2011-04-09

* POSSIBLE INCOMPATIBLE CHANGE: GeometryCollection#geometry_n, Polygon#interior_ring_n, and LineString#point_n, in some implementations, allowed negative indexes (which counted backwards from the end of the collection as per Ruby arrays.) This was against the SFS interface, and so the behavior has been removed. However, GeometryCollection#[], because it is supposed to model Ruby arrays, now explicitly DOES allow negative indexes. This means GeometryCollection#[] is no longer exactly the same as GeometryCollection#geometry_n. These clarifications have also been made in the RDoc.
* POSSIBLE INCOMPATIBLE CHANGE: GeometryCollection#geometry_n, Polygon#interior_ring_n, and LineString#point_n, in some implementations, allowed negative indexes (which counted backwards from the end of the collection as per Ruby arrays). This was contrary to the SFS interface, and so the behavior has been removed. However, GeometryCollection#[], because it is supposed to model Ruby arrays, now explicitly DOES allow negative indexes. This means GeometryCollection#[] is no longer exactly the same as GeometryCollection#geometry_n. These clarifications have also been made in the RDoc.
* The GEOS implementations of GeometryCollection#geometry_n and Polygon#interior_ring_n segfaulted when given an index out of bounds. Bounds Check Fail fixed. (Reported by sharpone74.)

=== 0.2.6 / 2011-03-31

* Ring direction analysis crashed if any of the line segments were zero length. Fixed. (Reported by spara.)
* Ring direction analysis raised an exception if any of the line segments were zero length. Fixed. (Reported by spara.)

=== 0.2.5 / 2011-03-21

@@ -25,7 +33,7 @@

=== 0.2.3 / 2010-12-19

* The "simpler mercator" geographic type incorrectly reported EPSG 3857 instead of EPSG 3785 for the projection. Dyslexia fixed.
* The "simple mercator" geographic type incorrectly reported EPSG 3857 instead of EPSG 3785 for the projection. Dyslexia fixed.
* Geographic types couldn't have their coord_sys set. Fixed.
* You can now pass an :srs_database option when creating most factory types. This lets the factory look up its coordinate system using the given SRID.
* There are now explicit methods you can call to obtain FactoryGenerator objects; you should not need to call <tt>method</tt>.
16 changes: 8 additions & 8 deletions Spatial_Programming_With_RGeo.rdoc
@@ -242,12 +242,12 @@ Several size and distance calculations are available. You can compute the distan
The SFS defines two serialization schemes for geometric objects, known as the WKT (well-known text) and WKB (well-known binary) formats. The WKT is often used for textual display and transmission of a geometric object, while the WKB is sometimes used as an internal data format by spatial databases. Geometric objects in \RGeo define the <tt>as_text</tt> and <tt>as_binary</tt> methods to serialize the object into a data string, while \RGeo factories provide <tt>parse_wkt</tt> and <tt>parse_wkb</tt> methods to reconstruct geometric objects from their serialized form.

p00 = factory.point(0, 0)
p00.as_text # returns "Point(0 0)"
p00.as_text # returns "Point (0.0 0.0)"
p10 = factory.point(1, 0)
line = factory.line(p00, p10)
line.as_text # returns "LineString(0 0, 1 0)"
p = factory.parse_wkt('POINT(3 4)')
p.x # returns 3
line.as_text # returns "LineString (0.0 0.0, 1.0 0.0)"
p = factory.parse_wkt('POINT (3 4)')
p.x # returns 3.0

Note that there are several key shortcomings in the WKT and WKB formats as strictly defined by the SFS. In particular, neither format has official support for Z or M coordinates, and neither provides a way to specify the coordinate system (i.e. spatial reference ID) in which the object is represented. Because of this, variants of these formats have been developed. The most important to know are probably the EWKT and EWKB (or "extended" well-known formats) used by the PostGIS database, which supports Z and M as well as SRID. More recent versions of the SFS also have defined extensions to handle Z and M coordinates, but do not embed an SRID. \RGeo supports parsing and generating these variants through the RGeo::WKRep module.
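As a rough illustration of how EWKT differs from plain WKT, the sketch below separates the PostGIS-style <tt>SRID=n;</tt> prefix from the geometry text. The <tt>split_ewkt</tt> helper is hypothetical and handles only that prefix; RGeo::WKRep has its own parser, and its configuration options are not shown here.

```ruby
# Hypothetical helper for illustration only: split an EWKT string into
# its SRID (or nil if absent) and the remaining WKT text.
def split_ewkt(text)
  if (m = /\ASRID=(-?\d+);(.*)\z/m.match(text))
    [m[1].to_i, m[2].strip]
  else
    [nil, text.strip]
  end
end

srid, wkt = split_ewkt('SRID=4326;POINT (3 4)')
plain_srid, plain_wkt = split_ewkt('POINT (3 4)')
```

In the first call <tt>srid</tt> is 4326 and <tt>wkt</tt> is the bare point text; in the second, no prefix is present, so the SRID comes back as nil.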

@@ -291,11 +291,11 @@ Does this matter in your application? The answer is, it depends: on what kind of

This subsection covers some more advanced topics that most developers may not need to deal with directly, but I believe it is important to have at least a high-level understanding of them.

Simply put, there's more to a coordinate system than just the type: geocentric, geographic, or projected. For a geocentric coordinate system, we know it's centered at the center of the earth, but where _is_ the center of the earth? Which direction do the axes point? And do we measure the units in meters, miles, or light-years? For a geographic coordinate system, again, we need a center and orientation (i.e. where is the "zero longitude" line?), but we also need to define specifically _which_ "latitude". The latitude commonly used is the "geodetic latitude", which is the angle between the equator and what is normal (i.e. vertical) to the surface of the earth. This means it is dependent on one's model of the earth's surface, whether you use a sphere or a flattened ellipsoid, and how much flattening you choose. The same location on the earth's surface may have different latitudes depending on which system you use! As for projected systems, not only do we need to specify which projection to use (and there are hundreds defined), but we also need to know which geographic (latitude-longitude) system to start from. That is, a map projection is merely a function mapping latitude/longitude to flat coordinates, so we need to specify _which_ latitude/longitude.
Simply put, there's more to a coordinate system than just the type: geocentric, geographic, or projected. For a geocentric coordinate system, we know it's centered at the center of the earth, but where _is_ the center of the earth? Which direction do the axes point? And do we measure the units in meters, miles, or light-years? For a geographic coordinate system, again, we need a center and orientation (i.e. where is the "zero longitude" line?), but we also need to define specifically _which_ "latitude". The latitude commonly used is the "geodetic latitude", which is the angle between the equator and what is normal (i.e. vertical) to the surface of the earth. This means it is dependent on one's model of the earth's surface, whether you use a sphere or a flattened ellipsoid, and how much flattening you choose. The same location on the earth's surface may have different latitudes depending on which system you use! As for projected systems, not only do we need to specify which projection to use (and there are hundreds defined), but we also need to know which geographic (latitude-longitude) system to start from. That is, because a map projection is a function mapping latitude/longitude to flat coordinates, we need to specify _which_ latitude/longitude.

To completely specify a coordinate system, then, a number of parameters are involved. Below I briefly describe the major parameters and what they mean:

*Ellipsoid*: (Also called a *spheroid*) An ellipsoid is an approximation of the shape of the earth, defined by the length of the <b>semi-major axis</b>, or the radius at the equator (measured in meters) and the <b>inverse flattening</b> ratio, defined as the ratio between the semi-major axis, and the difference between the semi-major and semi-minor axes. Note that the earth is not a true ellipsoid, both because the gravitational and centrifugal bulging is not solved exactly by an ellipsoid, and because of local changes in gravity due to, for example, large mountain ranges. However, an ellipsoid is commonly used for cartographic applications. The ellipsoid matters because it defines how latitude is measured and what path a straight line will take.
*Ellipsoid*: (Also called a *spheroid*) An ellipsoid is an approximation of the shape of the earth, defined by the length of the <b>semi-major axis</b>, or the radius at the equator (measured in meters) and the <b>inverse flattening</b> ratio, defined as the ratio between the semi-major axis, and the difference between the semi-major and semi-minor axes. Note that the earth is not a true ellipsoid, both because the gravitational and centrifugal bulging is not solved exactly by an ellipsoid, and because of local changes in gravity due to, for example, large mountain ranges. However, an ellipsoid is commonly used for cartographic applications. The ellipsoid matters because it defines how latitude is measured and what path will be followed by a "straight" line across the earth's surface.
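To make the inverse flattening definition concrete, here is a small worked example in plain Ruby. It uses the published WGS 84 parameters (semi-major axis 6378137 m, inverse flattening 298.257223563); the method name <tt>semi_minor_axis</tt> is made up for this sketch and is not an RGeo API.

```ruby
# Inverse flattening is defined as a / (a - b), where a is the semi-major
# axis and b is the semi-minor axis. Solving for b gives b = a * (1 - 1/inv_f).
def semi_minor_axis(semi_major, inverse_flattening)
  semi_major * (1.0 - 1.0 / inverse_flattening)
end

wgs84_a = 6_378_137.0          # meters
wgs84_inv_f = 298.257223563

wgs84_b = semi_minor_axis(wgs84_a, wgs84_inv_f)

# Recover the inverse flattening from the two axes as a consistency check.
recovered_inv_f = wgs84_a / (wgs84_a - wgs84_b)
```

The resulting polar radius is roughly 21 km shorter than the equatorial radius, which is the amount of flattening this particular ellipsoid assumes.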

*Datum*: This is a reference location against which measurements are made. There are generally two types of datums: horizontal datums, which define horizontal (e.g. latitude-longitude) coordinate systems, and vertical datums, which define the "zero altitude" point against which altitude measurements are made.

@@ -347,11 +347,11 @@ As we have seen, there exist a variety of ways to serialize geometric objects, n

The OGC defines a {specification}[http://www.opengeospatial.org/standards/sfs], related to the SFS, describing SQL extensions for a spatial database. This specification includes a table for spatial reference systems (that is, coordinate systems) which can contain OGC and Proj4 representations, and a table of metadata for geometry columns which stores such information as type, dimension, and srid constraints. It also defines a suite of SQL functions that you can call in a query. For example, in a compliant database, to find all rows in "mytable" where the geometry-valued column "geom" contains data within 5 units of the coordinates (10, 20), you might be able to run a query similar to:

SELECT * FROM mytable WHERE ST_Distance(geom, ST_WKTToSQL('POINT(10 20)')) < 5;
SELECT * FROM mytable WHERE ST_Distance(geom, ST_WKTToSQL('POINT (10 20)')) < 5;

Like all database queries, however, when there are a large number of rows, such a query can be slow if it has to do a full table scan. This is especially true if it has to evaluate geometric functions like the above, which can be numerically complex and slow to execute. To speed up queries, it is necessary to index your spatial columns.

Spatial indexes are somewhat more complex than typical database indexes. A typical B-tree index relies on a global ordering of data: the fact that you can sort scalar values in a binary tree and hence perform logarithmic-time searches. However, there isn't an obvious global ordering for spatial data. Should POINT(0 1) come before or after POINT(1 0)? And how do each of those compare with LINESTRING(0 1, 1 0)? More concretely, spatial data exists in two dimensions rather than one, and can span finite ranges.
Spatial indexes are somewhat more complex than typical database indexes. A typical B-tree index relies on a global ordering of data: the fact that you can sort scalar values in a binary tree and hence perform logarithmic-time searches. However, there isn't an obvious global ordering for spatial data. Should <tt>POINT (0 1)</tt> come before or after <tt>POINT (1 0)</tt>? And how does each of those compare with <tt>LINESTRING (0 1, 1 0)</tt>? Because spatial data exists in two dimensions rather than one, and can span finite ranges in addition to infinitesimal points, the notion of a global ordering becomes ill-defined, and normal database indexes do not apply as well as we would like.
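A tiny plain-Ruby experiment shows why an arbitrary global ordering is unsatisfying. Here points are plain <tt>[x, y]</tt> arrays sorted lexicographically (x first, then y), which is just one ordering someone might pick; the <tt>distance</tt> helper is defined only for this sketch.

```ruby
# Euclidean distance between two [x, y] pairs.
def distance(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

points = [[0, 1], [1, 0], [0, 100]]

# Ruby compares arrays element by element, so this sorts by x, then by y.
sorted = points.sort

# Distance from the first point to its neighbor in sort order,
# and to the point that sorts after that neighbor.
to_sort_neighbor = distance(sorted[0], sorted[1])
to_true_neighbor = distance(sorted[0], sorted[2])
```

The point that sorts immediately after <tt>[0, 1]</tt> is <tt>[0, 100]</tt>, far away, while the genuinely nearby <tt>[1, 0]</tt> sorts last. An index built on this ordering therefore cannot answer "what is near here?" by looking at adjacent entries.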

Spatial databases handle the problem of indexing spatial data in various ways, but most techniques are variants on an indexing algorithm known as an R-tree. I won't go into the details of how an R-tree works here. For the interested, I recommend the text {"Spatial Databases With Application To GIS"}[http://www.amazon.com/dp/1558605886], which covers a wide variety of issues related to basic spatial database implementation. For our purposes, just note that for large datasets, it is necessary to index the geometry columns, and that the index creation process may be different from that of normal scalar columns. The next sections provide some information specific to some of the common spatial databases.

2 changes: 1 addition & 1 deletion Version
@@ -1 +1 @@
0.2.8
0.2.9
23 changes: 20 additions & 3 deletions ext/geos_c_impl/factory.c
@@ -99,6 +99,20 @@ static void destroy_geometry_func(RGeo_GeometryData* data)
}


// Mark function for factory data. This marks the wkt and wkb generator
// handles so they don't get collected.

static void mark_factory_func(RGeo_FactoryData* data)
{
if (!NIL_P(data->wkrep_wkt_generator)) {
rb_gc_mark(data->wkrep_wkt_generator);
}
if (!NIL_P(data->wkrep_wkb_generator)) {
rb_gc_mark(data->wkrep_wkb_generator);
}
}


// Mark function for geometry data. This marks the factory and klasses
// held by the geometry so those don't get collected.

@@ -193,7 +207,8 @@ static VALUE method_factory_parse_wkb(VALUE self, VALUE str)
}


static VALUE cmethod_factory_create(VALUE klass, VALUE flags, VALUE srid, VALUE buffer_resolution)
static VALUE cmethod_factory_create(VALUE klass, VALUE flags, VALUE srid, VALUE buffer_resolution,
VALUE wkt_generator, VALUE wkb_generator)
{
VALUE result = Qnil;
RGeo_FactoryData* data = ALLOC(RGeo_FactoryData);
@@ -210,7 +225,9 @@ static VALUE cmethod_factory_create(VALUE klass, VALUE flags, VALUE srid, VALUE
data->wkb_reader = NULL;
data->wkt_writer = NULL;
data->wkb_writer = NULL;
result = Data_Wrap_Struct(klass, NULL, destroy_factory_func, data);
data->wkrep_wkt_generator = wkt_generator;
data->wkrep_wkb_generator = wkb_generator;
result = Data_Wrap_Struct(klass, mark_factory_func, destroy_factory_func, data);
}
else {
free(data);
@@ -237,7 +254,7 @@ RGeo_Globals* rgeo_init_geos_factory()
rb_define_method(geos_factory_class, "_srid", method_factory_srid, 0);
rb_define_method(geos_factory_class, "_buffer_resolution", method_factory_buffer_resolution, 0);
rb_define_method(geos_factory_class, "_flags", method_factory_flags, 0);
rb_define_module_function(geos_factory_class, "_create", cmethod_factory_create, 3);
rb_define_module_function(geos_factory_class, "_create", cmethod_factory_create, 5);

// Wrap the globals in a Ruby object and store it off so we have access
// to it later. Each factory instance will reference it internally.
2 changes: 2 additions & 0 deletions ext/geos_c_impl/factory.h
@@ -89,6 +89,8 @@ typedef struct {
GEOSWKBReader* wkb_reader;
GEOSWKTWriter* wkt_writer;
GEOSWKBWriter* wkb_writer;
VALUE wkrep_wkt_generator;
VALUE wkrep_wkb_generator;
int flags;
int srid;
int buffer_resolution;
50 changes: 31 additions & 19 deletions ext/geos_c_impl/geometry.c
@@ -209,16 +209,22 @@ static VALUE method_geometry_as_text(VALUE self)
const GEOSGeometry* self_geom = self_data->geom;
if (self_geom) {
RGeo_FactoryData* factory_data = RGEO_FACTORY_DATA_PTR(self_data->factory);
GEOSWKTWriter* wkt_writer = factory_data->wkt_writer;
GEOSContextHandle_t geos_context = self_data->geos_context;
if (!wkt_writer) {
wkt_writer = GEOSWKTWriter_create_r(geos_context);
factory_data->wkt_writer = wkt_writer;
VALUE wkt_generator = factory_data->wkrep_wkt_generator;
if (!NIL_P(wkt_generator)) {
result = rb_funcall(wkt_generator, rb_intern("generate"), 1, self);
}
char* str = GEOSWKTWriter_write_r(geos_context, wkt_writer, self_geom);
if (str) {
result = rb_str_new2(str);
GEOSFree_r(geos_context, str);
else {
GEOSWKTWriter* wkt_writer = factory_data->wkt_writer;
GEOSContextHandle_t geos_context = self_data->geos_context;
if (!wkt_writer) {
wkt_writer = GEOSWKTWriter_create_r(geos_context);
factory_data->wkt_writer = wkt_writer;
}
char* str = GEOSWKTWriter_write_r(geos_context, wkt_writer, self_geom);
if (str) {
result = rb_str_new2(str);
GEOSFree_r(geos_context, str);
}
}
}
return result;
@@ -232,17 +238,23 @@ static VALUE method_geometry_as_binary(VALUE self)
const GEOSGeometry* self_geom = self_data->geom;
if (self_geom) {
RGeo_FactoryData* factory_data = RGEO_FACTORY_DATA_PTR(self_data->factory);
GEOSWKBWriter* wkb_writer = factory_data->wkb_writer;
GEOSContextHandle_t geos_context = self_data->geos_context;
if (!wkb_writer) {
wkb_writer = GEOSWKBWriter_create_r(geos_context);
factory_data->wkb_writer = wkb_writer;
VALUE wkb_generator = factory_data->wkrep_wkb_generator;
if (!NIL_P(wkb_generator)) {
result = rb_funcall(wkb_generator, rb_intern("generate"), 1, self);
}
size_t size;
char* str = (char*)GEOSWKBWriter_write_r(geos_context, wkb_writer, self_geom, &size);
if (str) {
result = rb_str_new(str, size);
GEOSFree_r(geos_context, str);
else {
GEOSWKBWriter* wkb_writer = factory_data->wkb_writer;
GEOSContextHandle_t geos_context = self_data->geos_context;
if (!wkb_writer) {
wkb_writer = GEOSWKBWriter_create_r(geos_context);
factory_data->wkb_writer = wkb_writer;
}
size_t size;
char* str = (char*)GEOSWKBWriter_write_r(geos_context, wkb_writer, self_geom, &size);
if (str) {
result = rb_str_new(str, size);
GEOSFree_r(geos_context, str);
}
}
}
return result;