Substantially update README.txt.

1 parent b246b59 commit 5fc430c7bff3818deeb8a6349e213e9934c18fa1 @schuyler schuyler committed Jun 5, 2009
Showing with 117 additions and 18 deletions.
  1. +3 −0 Makefile
  2. +0 −3 README
  3. +114 −15 README.txt
  4. 0 TODO → TODO.txt
3 Makefile
@@ -2,6 +2,9 @@ all:
 	make -C src install
 	gem build gemspec
+test: all
+	ruby -Ilib tests/run.rb
+
 install: all
 	gem install *.gem
3 README
@@ -1,3 +0,0 @@
-Implementation of address geocoder in Ruby, starting with US TIGER/Line data.
-
-
129 README.txt
@@ -1,32 +1,131 @@
-= geocoder_us
+= Geocoder::US
-* FIX (url)
+Geocoder::US 2.0 is a software package designed to geocode US street
+addresses. Although it is primarily intended for use with the US Census
+Bureau's free TIGER/Line dataset, it uses an abstract US address data model
+that can be employed with other sources of US street address range data.
-== DESCRIPTION:
+Geocoder::US 2.0 implements a Ruby interface to parse US street addresses and
+perform fuzzy lookups against an SQLite 3 database. Geocoder::US is designed to
+return the best matches found, with geographic coordinates interpolated from
+the street range dataset. Geocoder::US will fill in missing information, and
+it knows about standard and common non-standard postal abbreviations, ordinal
+versus cardinal numbers, and more.
-FIX (describe your package)
+Geocoder::US 2.0 is shipped with a free US ZIP code data set, compiled from
+public domain sources.
-== FEATURES/PROBLEMS:
+== Synopsis
-* FIX (list of features or problems)
+ >> require 'geocoder/us'
+ >> db = Geocoder::US::Database.new("/opt/tiger/geocoder.db")
+ >> p db.geocode("1600 Pennsylvania Av, Washington DC")
-== SYNOPSIS:
+ [{:pretyp=>"", :street=>"Pennsylvania", :sufdir=>"NW", :zip=>"20502",
+ :lon=>-77.037528, :number=>"1600", :fips_county=>"11001", :predir=>"",
+ :precision=>:range, :city=>"Washington", :lat=>38.898746, :suftyp=>"Ave",
+ :state=>"DC", :prequal=>"", :sufqual=>"", :score=>0.906, :prenum=>""}]
- FIX (code sample of usage)
+== Prerequisites
-== REQUIREMENTS:
+To build Geocoder::US, you will need gcc/g++, make, and the SQLite 3
+executable and development files installed on your system.
-* FIX (list of requirements)
+To use the Ruby interface, you will need the 'text' gem installed from
+RubyForge.
-== INSTALL:
+Additionally, you will need a custom build of the 'sqlite3-ruby' gem that
+supports loading extension modules in SQLite. You can get a patched version of
+this gem from http://github.com/schuyler/sqlite3-ruby/. Until the sqlite3-ruby
+maintainers roll in the relevant patch, you will need *this* version.
-* FIX (sudo gem install, anything else)
+== Building Geocoder::US
-== LICENSE:
+Unpack the source and run 'make'. This will compile the SQLite 3 extension
+needed by Geocoder::US, the Shapefile import utility, and the Geocoder-US
+gem.
-(The MIT License)
+You can run 'make install' to install the gem systemwide.
-Copyright (c) 2009 FIX
+== Generating a Geocoder::US Database
+
+Build the package from source as described above. Generating the database
+involves three basic steps:
+
+* Import the Shapefile data into an SQLite database.
+* Build the database indexes.
+* Optionally, rebuild the database to cluster indexed rows.
+
+We will presume that you are building a Geocoder::US database from TIGER/Line,
+and that you have obtained the complete set of TIGER/Line ZIP files, and put
+the entire tree in /opt/tiger. Please adjust these instructions as needed.
+
+A full TIGER/Line database import takes 1-2 days to run on a normal Amazon EC2
+instance, and takes up a little over 5 gigabytes after all is said and done.
+You will need to have at least 12 gigabytes of free disk space *after*
+downloading the TIGER/Line dataset, if you are building the full database.
+
+=== Import TIGER/Line
+
+From inside the Geocoder::US source tree, run the following:
+
+ $ bin/tiger_import /opt/tiger/geocoder.db /opt/tiger
+
+This will unpack each TIGER/Line ZIP file to a temporary directory, and
+perform the extract/transform/load sequence to incrementally build the
+database. The process takes about a day on a normal Amazon EC2 instance. Note
+that not all TIGER/Line source files contain address range information, so you
+will see error messages for some counties, but this is normal.
+
+If you only want to import specific counties, you can pipe a list of
+TIGER/Line county directories to tiger_import on stdin.
+
+=== Build the indexes
+
+After the database import is complete, you will want to construct the database
+indexes:
+
+ $ bin/build_indexes /opt/tiger/geocoder.db
+
+This process will take a few hours, but it's a *lot* faster than building
+the indexes incrementally during the import process.
+
+=== Cluster the database tables (optional)
+
+As a final optional step, you can cluster the database tables according to
+their indexes, which will make the database smaller, and lookups faster. This
+process will take an hour or two, and may be a micro-optimization.
+
+ $ bin/rebuild_cluster /opt/tiger/geocoder.db
+
+You will need as much free disk space to run rebuild_cluster as the database
+takes up, because the process essentially reconstructs the database in a
+new file and then renames the new file over the old one.
+
+== Running the unit tests
+
+From within the source tree, you can run the following:
+
+ $ ruby tests/run.rb
+
+This tests the libraries, except for the database routines. If you have a
+database built, you can run the test harness like so:
+
+ $ ruby tests/run.rb /opt/tiger/geocoder.db
+
+The full test suite may take 30 or so seconds to run completely.
+
+== License
+
+Geocoder::US 2.0 was based on earlier work by Schuyler Erle on
+a Perl module of the same name. You can find it at
+http://search.cpan.org/~sderle/.
+
+Geocoder::US 2.0 was written by Schuyler Erle, of Entropy Free LLC,
+with the gracious support of FortiusOne, Inc. Please send bug reports,
+patches, kudos, etc. to patches at geocoder.us.
+
+Copyright (c) 2009 FortiusOne, Inc.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
0 TODO → TODO.txt
File renamed without changes.
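
The Synopsis added to README.txt shows db.geocode returning an array of hashes with keys such as :lat, :lon, :score and :precision. Here is a minimal sketch of how a caller might consume that structure. It uses the sample result from the README as literal data, so it needs neither the gem nor a database. The helpers best_match and summarize are hypothetical names introduced for illustration and are not part of Geocoder::US. The README does not say whether results come back sorted, so the sketch picks the highest :score explicitly.

```ruby
# Sample result copied from the README synopsis (trimmed to the fields used here).
RESULTS = [
  { :number => "1600", :street => "Pennsylvania", :suftyp => "Ave",
    :sufdir => "NW", :city => "Washington", :state => "DC", :zip => "20502",
    :lat => 38.898746, :lon => -77.037528,
    :precision => :range, :score => 0.906 }
]

# Hypothetical helper (not part of Geocoder::US): return the highest-scoring
# match, or nil when the result list is empty.
def best_match(results)
  results.max_by { |r| r[:score] }
end

# Hypothetical helper: format a match as a one-line summary, skipping
# address components that are nil or empty.
def summarize(match)
  parts = [match[:number], match[:street], match[:suftyp], match[:sufdir]]
  address = parts.reject { |part| part.nil? || part.empty? }.join(" ")
  format("%s, %s %s %s (%.6f, %.6f) score=%.3f precision=%s",
         address, match[:city], match[:state], match[:zip],
         match[:lat], match[:lon], match[:score], match[:precision])
end

best = best_match(RESULTS)
puts summarize(best) if best
```

With a real database, RESULTS would presumably be replaced by the return value of db.geocode as shown in the Synopsis.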

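The new README.txt says that coordinates are "interpolated from the street range dataset", and the sample result reports :precision=>:range. As a rough illustration of the general idea only, the sketch below linearly interpolates a house number along a single straight segment. It is not taken from the Geocoder::US source, which may handle street sides, odd/even parity and multi-point geometries differently. The function name, its parameters and the sample coordinates are all assumptions made for this example.

```ruby
# Hypothetical sketch: linear interpolation of a house number along a
# straight street segment. Not the Geocoder::US implementation.
# Points are [lon, lat] pairs.
def interpolate(number, from_number, to_number, from_point, to_point)
  span = (to_number - from_number).to_f
  # A single-address range has no span; fall back to the start point.
  return from_point if span.zero?
  # Fraction of the way along the range, clamped to [0, 1].
  t = [[(number - from_number) / span, 0.0].max, 1.0].min
  lon = from_point[0] + t * (to_point[0] - from_point[0])
  lat = from_point[1] + t * (to_point[1] - from_point[1])
  [lon, lat]
end

# Made-up segment covering house numbers 1600 through 1698.
start_pt = [-77.0380, 38.8980]
end_pt   = [-77.0360, 38.8990]
p interpolate(1649, 1600, 1698, start_pt, end_pt)
```

Since 1649 is halfway through the made-up range, the result lands at roughly the midpoint of the segment. Numbers outside the range are clamped to the nearest endpoint rather than extrapolated.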