From 5fc430c7bff3818deeb8a6349e213e9934c18fa1 Mon Sep 17 00:00:00 2001
From: Schuyler Erle
Date: Fri, 5 Jun 2009 18:04:01 -0400
Subject: [PATCH] Substantially update README.txt.

---
 Makefile         |    3 ++
 README           |    3 --
 README.txt       |  129 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 TODO => TODO.txt |    0
 4 files changed, 117 insertions(+), 18 deletions(-)
 delete mode 100644 README
 rename TODO => TODO.txt (100%)

diff --git a/Makefile b/Makefile
index 4409823..42757d7 100644
--- a/Makefile
+++ b/Makefile
@@ -2,6 +2,9 @@ all:
 	make -C src install
 	gem build gemspec
 
+test: all
+	ruby -Ilib tests/run.rb
+
 install: all
 	gem install *.gem
 
diff --git a/README b/README
deleted file mode 100644
index 5939bc4..0000000
--- a/README
+++ /dev/null
@@ -1,3 +0,0 @@
-Implementation of address geocoder in Ruby, starting with US TIGER/Line data.
-
-
diff --git a/README.txt b/README.txt
index 9babf84..f501e2c 100644
--- a/README.txt
+++ b/README.txt
@@ -1,32 +1,131 @@
-= geocoder_us
+= Geocoder::US
 
-* FIX (url)
+Geocoder::US 2.0 is a software package designed to geocode US street
+addresses. Although it is primarily intended for use with the US Census
+Bureau's free TIGER/Line dataset, it uses an abstract US address data model
+that can be employed with other sources of US street address range data.
 
-== DESCRIPTION:
+Geocoder::US 2.0 implements a Ruby interface to parse US street addresses, and
+perform fuzzy lookup against an SQLite 3 database. Geocoder::US is designed to
+return the best matches found, with geographic coordinates interpolated from
+the street range dataset. Geocoder::US will fill in missing information, and
+it knows about standard and common non-standard postal abbreviations, ordinal
+versus cardinal numbers, and more.
 
-FIX (describe your package)
+Geocoder::US 2.0 is shipped with a free US ZIP code data set, compiled from
+public domain sources.
 
-== FEATURES/PROBLEMS:
+
+== Synopsis
 
-* FIX (list of features or problems)
+  >> require 'geocoder/us'
+  >> db = Geocoder::US::Database.new("/opt/tiger/geocoder.db")
+  >> p db.geocode("1600 Pennsylvania Av, Washington DC")
 
-== SYNOPSIS:
+  [{:pretyp=>"", :street=>"Pennsylvania", :sufdir=>"NW", :zip=>"20502",
+    :lon=>-77.037528, :number=>"1600", :fips_county=>"11001", :predir=>"",
+    :precision=>:range, :city=>"Washington", :lat=>38.898746, :suftyp=>"Ave",
+    :state=>"DC", :prequal=>"", :sufqual=>"", :score=>0.906, :prenum=>""}]
 
-  FIX (code sample of usage)
+== Prerequisites
 
-== REQUIREMENTS:
+To build Geocoder::US, you will need gcc/g++, make, and the SQLite 3
+executable and development files installed on your system.
 
-* FIX (list of requirements)
+To use the Ruby interface, you will need the 'text' gem installed from
+rubyforge.
 
-== INSTALL:
+Additionally, you will need a custom build of the 'sqlite3-ruby' gem that
+supports loading extension modules in SQLite. You can get a patched version of
+this gem from http://github.com/schuyler/sqlite3-ruby/. Until the sqlite3-ruby
+maintainers roll in the relevant patch, you will need *this* version.
 
-* FIX (sudo gem install, anything else)
+== Building Geocoder::US
 
-== LICENSE:
+Unpack the source and run 'make'. This will compile the SQLite 3 extension
+needed by Geocoder::US, the Shapefile import utility, and the Geocoder-US
+gem.
 
-(The MIT License)
+You can run 'make install' to install the gem systemwide.
 
-Copyright (c) 2009 FIX
+== Generating a Geocoder::US Database
+
+Build the package from source as described above. Generating the database
+involves three basic steps:
+
+* Import the Shapefile data into an SQLite database.
+* Build the database indexes.
+* Optionally, rebuild the database to cluster indexed rows.
+
+We will presume that you are building a Geocoder::US database from TIGER/Line,
+and that you have obtained the complete set of TIGER/Line ZIP files, and put
+the entire tree in /opt/tiger. Please adjust these instructions as needed.
+
+A full TIGER/Line database import takes 1-2 days to run on a normal Amazon EC2
+instance, and takes up a little over 5 gigabytes after all is said and done.
+You will need to have at least 12 gigabytes of free disk space *after*
+downloading the TIGER/Line dataset, if you are building the full database.
+
+=== Import TIGER/Line
+
+From inside the Geocoder::US source tree, run the following:
+
+  $ bin/tiger_import /opt/tiger/geocoder.db /opt/tiger
+
+This will unpack each TIGER/Line ZIP file to a temporary directory, and
+perform the extract/transform/load sequence to incrementally build the
+database. The process takes about a day on a normal Amazon EC2 instance. Note
+that not all TIGER/Line source files contain address range information, so you
+will see error messages for some counties, but this is normal.
+
+If you only want to import specific counties, you can pipe a list of
+TIGER/Line county directories to tiger_import on stdin.
+
+=== Build the indexes
+
+After the database import is complete, you will want to construct the database
+indexes:
+
+  $ bin/build_indexes /opt/tiger/geocoder.db
+
+This process will take a few hours, but it's a *lot* faster than building
+the indexes incrementally during the import process.
+
+=== Cluster the database tables (optional)
+
+As a final optional step, you can cluster the database tables according to
+their indexes, which will make the database smaller, and lookups faster. This
+process will take an hour or two, and may be a micro-optimization.
+
+  $ bin/rebuild_cluster /opt/tiger/geocoder.db
+
+You will need as much free disk space to run rebuild_cluster as the database
+takes up, because the process essentially reconstructs the database in a
+new file and then renames the new one over top of the old.
+
+== Running the unit tests
+
+From within the source tree, you can run the following.
+
+  $ ruby tests/run.rb
+
+This tests the libraries, except for the database routines. If you have a
+database built, you can run the test harness like so:
+
+  $ ruby tests/run.rb /opt/tiger/geocoder.db
+
+The full test suite may take 30 or so seconds to run completely.
+
+== License
+
+Geocoder::US 2.0 was based on earlier work by Schuyler Erle on
+a Perl module of the same name. You can find it at
+http://search.cpan.org/~sderle/.
+
+Geocoder::US 2.0 was written by Schuyler Erle, of Entropy Free LLC,
+with the gracious support of FortiusOne, Inc. Please send bug reports,
+patches, kudos, etc. to patches at geocoder.us.
+
+Copyright (c) 2009 FortiusOne, Inc.
 
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the
diff --git a/TODO b/TODO.txt
similarity index 100%
rename from TODO
rename to TODO.txt
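The new README says Geocoder::US returns "geographic coordinates interpolated from the street range dataset". This patch does not include the code that does that, so the Ruby sketch below only illustrates the general technique of linear address-range interpolation and is not the package's implementation. The method names `interpolate_address` and `segment_length`, the `[lon, lat]` point layout, and the assumption that house numbers are spread evenly along the segment are all hypothetical choices made for this example. Real TIGER/Line ranges also distinguish the left and right sides of a street and odd/even parity, which the sketch ignores.

```ruby
# Hypothetical sketch of linear address-range interpolation. This is NOT the
# Geocoder::US implementation; it only illustrates the general idea.
# A street segment is modeled as a polyline of [lon, lat] points whose
# house-number range runs from from_num (first point) to to_num (last point).

# Planar distance in degrees; a simplification that is tolerable over the
# length of a single street segment.
def segment_length(a, b)
  Math.hypot(b[0] - a[0], b[1] - a[1])
end

# Returns [lon, lat] for +number+ along +points+, assuming house numbers are
# spread evenly along the polyline between +from_num+ and +to_num+.
def interpolate_address(number, from_num, to_num, points)
  raise ArgumentError, "need at least two points" if points.size < 2

  lo, hi = [from_num, to_num].minmax
  unless number.between?(lo, hi)
    raise ArgumentError, "#{number} is outside the range #{lo}..#{hi}"
  end

  # Fraction of the way along the range; also works for descending ranges.
  fraction = from_num == to_num ? 0.0 : (number - from_num).to_f / (to_num - from_num)

  lengths = points.each_cons(2).map { |a, b| segment_length(a, b) }
  target = fraction * lengths.sum # distance to travel along the polyline

  points.each_cons(2).with_index do |(a, b), i|
    len = lengths[i]
    # Stop on the piece that contains the target distance, or on the last
    # piece (guards against floating-point overshoot).
    if target <= len || i == lengths.size - 1
      t = len.zero? ? 0.0 : [target / len, 1.0].min
      return [a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])]
    end
    target -= len
  end
end
```

For example, `interpolate_address(1650, 1600, 1700, [[-77.04, 38.898], [-77.035, 38.898]])` lands halfway along that two-point segment. Walking the polyline piece by piece, rather than interpolating between its endpoints, keeps the result on curved streets.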