Mirror of code.flickr.com: clustr
C++ C
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
Changes
LICENSE
Makefile
README
clustr.cpp
clustr.h
component.cpp
polygon.cpp
shapefile.cpp
shapefile.h

README

clustr - construct polygons from tagged points
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
written by Schuyler Erle <schuyler@nocat.net>
(c) 2007-2009 Yahoo! Inc.

Overview
--------
Clustr takes a text file containing longitude/latitude points, tagged with
a bit of text, and attempts to generate minimal polygons that "cover" those
points, using an "alpha" parameter to determine the notion of "coverage".
The polygons are written to an ESRI Shapefile, suitable for use in GIS
software.

Building clustr
---------------
You will need to have the GDAL library (http://www.gdal.org/) installed,
with OGR support enabled, for writing the Shapefiles. You will also need
the CGAL library (http://www.cgal.org/) installed.

Depending on your system configuration, you may need to edit the Makefile
to point to the location of the GDAL and CGAL libraries and C/C++ headers.

To build clustr, simply run 'make'. The 'clustr' executable is
self-contained (aside from shared libraries) and does not rely on any
supporting files. The code has been tested with GCC 3.4 and 4.2.

Using clustr
------------

$ clustr [-a <n>] [-p] [-v] <input> <output>

   -h, -?      this help message
   -v          be verbose (default: off)
   -a <n>      set alpha value (default: use optimal value)
   -p          output points to shapefile, instead of polygons

If <input> is missing or given as "-", stdin is used.

  * The input file should be formatted as: <tag> <lon> <lat>\n

  * The input file *must* be sorted. Piping it through sort(1) will
    suffice.

  * Tags must not contain spaces.

If <output> is missing, output is written to "clustr.shp". The output is
always an ESRI Shapefile.

The -p option simply converts the input file to a Shapefile containing the
input points. This can be useful for comparing the polygon output of clustr
to its input dataset in a GIS browser.

The -a option sets the value of "alpha", which in turn determines the
convexity of the output polygons. Larger values make more convex polygons,
smaller values make less convex polygons. Smaller values will also filter
out outlying points, which may be what you want.

Omitting -a or setting it to zero will cause clustr to use the CGAL
library's idea of "optimal alpha", which, if there are significant outliers
in the input file, may not actually be what you want. In general, we
recommend starting with a value of 0.001.

Bugs
----
Clustr does not use an exceptionally sophisticated algorithm for extracting
polygons from the generated alpha shape, with the result that on occasion
improper or self-intersecting polygons get generated. This can confuse GIS
programs that expect better behaved data sets, and ought to be fixed someday.

Also, the total count of points input is written to every shape record, which
will be wrong if there are multiple components for a given input tag. This
could be fixed by testing inclusion of each point after the polygons are
constructed, which would lengthen runtime. Cost/benefit? Not sure.

License
-------

Copyright (c) 2007-2009 Yahoo! Inc.

All rights reserved.  This code is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public License (GPL),
version 2 only.  This code is distributed WITHOUT ANY WARRANTY, whether
express or implied. See the GNU GPL for more details
(http://www.gnu.org/licenses/gpl.html)

AS A SPECIAL EXCEPTION, YOU HAVE PERMISSION TO LINK THIS PROGRAM WITH THE
CGAL LIBRARY AND DISTRIBUTE EXECUTABLES, AS LONG AS YOU FOLLOW THE REQUIREMENTS
OF THE GNU GPL VERSION 2 IN REGARD TO ALL OF THE SOFTWARE IN THE EXECUTABLE
ASIDE FROM CGAL. 

- Fin -