add info about 2000 SQL scripts to README (Thanks to Tom Meagher)

1 parent 091b7ab, commit a287df6843898c9248e0d1a7d8e733576ea13c4a. JoeGermuska committed Aug 12, 2011
Showing with 2 additions and 2 deletions.
  1. +2 −2 tools/README.rst
@@ -4,7 +4,7 @@ As an adjunct to the core census.ire.org web app, we are collecting scripts and
SQL
===
-The SQL directory contains create table scripts for working with SF1 data, and a general script for Census 2010 geoheaders. These are meant to be used to directly load the raw data files, with all columns.
+The SQL directory contains create table scripts for working with SF1 data, and a general script for Census 2010 geoheaders. These are meant to be used to directly load the raw data files, with all columns. It also contains create table scripts (in tools/sql/ire_export) for the bulk data exports downloadable from http://census.ire.org/data/bulkdata.html
For the geoheader files, which are fixed width, you may find useful the csvkit library, which has a tool called 'in2csv' which can be used to convert fixed-width files to CSV.
@@ -15,7 +15,7 @@ For the geoheader files, which are fixed width, you may find useful the csvkit l
NOTE: Take care with character encoding. Some place names (such as those with Spanish words) contain non-ASCII characters. The Census Bureau encodes the files using "latin-1" encoding.
The in2csv example above handles this correctly. in2csv always writes output files in UTF-8, so adjust your database load scripts accordingly.
-Thanks to Ron Campbell of the Orange County Register for contributing the basis of **geo_2010.sql** Thanks to Mike Stucka of the Telegraph of Macon for suggestions to clarify the SQL and make it more compatible.
+Thanks to Ron Campbell of the Orange County Register for contributing the basis of **geo_2010.sql** Thanks to Mike Stucka of the Telegraph of Macon for suggestions to clarify the SQL and make it more compatible. Thanks to Tom Meagher of the Star-Ledger for SQL scripts for the 2000 census.
Note that the column IDs do not exactly match the values printed in the SF1 technical documentation. Our method was to zero-pad digits to three positions, but we made no allowance for the occasional presence of letters qualifying race/ethnic variations on certain tables. Therefore, our column lengths vary in length somewhat compared to the SF1 versions of the labels. (If anyone is motivated to create an alternate SQL file or view which maintains tighter consistency with the SF1 technical documentation, please feel free to send a file or issue a pull request.)
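
For readers who would rather not depend on csvkit, the fixed-width conversion and the encoding note in the README text above can be sketched with the Python standard library alone. This is only an illustration, not the project's own tooling: the function name ``fixed_width_to_csv`` and the two-column ``EXAMPLE_SCHEMA`` are hypothetical, and the real geoheader column positions must be taken from the Census technical documentation. The sketch reads the source as latin-1 and writes UTF-8, matching the behavior the README describes for in2csv.

```python
import csv

# Hypothetical layout for illustration only: (column name, 0-based start, length).
# This is NOT the real geoheader layout; take real offsets from the Census docs.
EXAMPLE_SCHEMA = [
    ("code", 0, 2),
    ("name", 2, 12),
]


def fixed_width_to_csv(src_path, dst_path, schema, src_encoding="latin-1"):
    """Slice each fixed-width line into fields and write them as UTF-8 CSV."""
    with open(src_path, encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        # Header row taken from the schema's column names.
        writer.writerow([name for name, _start, _length in schema])
        for line in src:
            line = line.rstrip("\r\n")
            # Cut out each field by position and trim the padding spaces.
            writer.writerow(
                [line[start:start + length].strip()
                 for _name, start, length in schema]
            )
```

Because the output is UTF-8, whatever loads the resulting CSV into a database should be told to expect UTF-8 rather than latin-1.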