This project showcases a bash script that robustly interacts with the PostGIS TIGER extension to set up, load, and configure a geospatial database at various levels of depth. Before continuing, the project structure is as follows:
.
├── ./Logs : Directory for log files generated during script runs (a unique log is created for each distinct state loaded)
├── ./Resources : Directory containing various files utilized throughout the project
│ ├── ./Resources/Images : Images used
│ └── ./Resources/Scripts : Example scripts
│ └── ./Resources/mtfccs2022.csv : Raw MTFCC key table loaded into the database by the script
├── ./tiger_overview_with_ddl.md : Claude-generated markdown incorporating DDL from the database
├── ./overview.md : Granular examination of the Bash Script and the ELT process
├── ./README.md : General overview of project with high level steps
└── ./tigerload.sh : Consolidated bash script that allows the user to configure, load, and get detailed information about the output
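As a small aside on the Logs directory above, the per-state log naming can be sketched as follows (the helper name and timestamp format here are illustrative assumptions, not taken from tigerload.sh):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a per-state log path, mirroring the ./Logs
# convention of one distinct log per state loaded.
log_path_for() {
  local state="$1"
  # lowercase the state abbreviation and stamp the path with the load date
  printf 'Logs/%s_load_%s.log\n' \
    "$(echo "$state" | tr '[:upper:]' '[:lower:]')" \
    "$(date +%Y%m%d)"
}

log_path_for KS   # e.g. Logs/ks_load_20250101.log
```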
To run, clone the repository and run the following in the command line:
chmod +x tigerload.sh
./tigerload.sh

As of the initial commit, this project has only been optimized for Debian (Debian 12 Bookworm). Plans exist to showcase parallel implementations in PowerShell (.ps1) and Python (.py). With that being said, the first step is to install Postgres with PostGIS!
sudo apt-get update -y && sudo apt-get upgrade -y
sudo apt-get install -y wget #systemctl ships with systemd; no separate package is needed
sudo apt-get install -y gnupg2 && wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update -y
sudo apt-get install -y postgresql postgresql-contrib postgresql-client postgis

While a degree of familiarity with the command line is necessary to fully follow along, the next step after installing PostgreSQL is configuring a superuser profile with the same name as the local user (me):
#Create a superuser
sudo -u postgres psql -c "CREATE USER $(whoami) WITH PASSWORD 'PASSWORD_HERE' SUPERUSER;"
# psql -d postgres -c "CREATE DATABASE tigercensus;" (FYI: this step is done automatically in the script)

sequenceDiagram
participant Census as US Census Bureau
participant Staging as Staging Directory
participant TempDB as tiger_staging Schema
participant ProdDB as tiger_data Schema
participant Geocoder as PostGIS Geocoder
Note over Census, Geocoder: National Loader Process
Census->>Staging: Download TIGER/Line Files (ZIP)
Staging->>Staging: Extract ZIP files to temp directory
Staging->>TempDB: Load shapefiles using shp2pgsql
TempDB->>TempDB: Apply transformations & validations
TempDB->>ProdDB: Move processed data via loader_load_staged_data()
ProdDB->>Geocoder: Install and enable national geocoder support
Note over Census, Geocoder: State Loader Process
Census->>Staging: Download State-specific files
Staging->>TempDB: Load state data to staging
TempDB->>ProdDB: Transform & load state data
ProdDB->>Geocoder: Rebuild geocoder indexes for updated state data
The diagram above provides a generalized overview of how the data ELT process will run. Refer to the PostGIS documentation: 2.4. Installing, Upgrading Tiger Geocoder, and loading data.
To start off, you will need to create the following extensions in a database of your choosing (tigercensus from above).
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS address_standardizer;
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
CREATE EXTENSION IF NOT EXISTS postgis_tiger_geocoder;

This populates the database with geospatial functions and a defined structure (via inheritance) for the TIGER components. To view all tables, run the following command in the console:
psql -d tigercensus -c "SELECT table_catalog, table_schema, table_name, table_type, is_typed FROM information_schema.tables as x1 where table_schema != 'pg_catalog' and table_schema != 'information_schema';"
#Or, per the official documentation, you can test that everything is set up correctly via:
psql -d tigercensus -c "SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;"

At this point, you will create a 'loader' profile, which can be done rather simply, or a little more confidently:
INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep,
loader, environ_set_command, county_process_command)
SELECT 'debbie', declare_sect, pgbin, wget, unzip_command, psql, path_sep,
loader, environ_set_command, county_process_command
FROM tiger.loader_platform
WHERE os = 'sh';
UPDATE tiger.loader_variables
SET
tiger_year = '${TIGER_YEAR}',
website_root = '${TIGER_WEBSITE_ROOT}',
staging_fold = '${STAGING_FOLDER}',
data_schema = 'tiger_data',
staging_schema = 'tiger_staging';

Or, the shell version:
db_fnc_run_psql "${DB_NAME}" <<EOF
INSERT INTO tiger.loader_platform (
os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command
)
VALUES (
'debbie',
E'TMPDIR=${STAGING_FOLDER}/temp/\\nUNZIPTOOL=unzip\\nWGETTOOL="/usr/bin/wget"\\nexport PGBIN=/usr/lib/postgresql/15/bin\\nexport PGPORT=5432\\nexport PGHOST=localhost\\nexport PGUSER=${PG_USER}\\nexport PGPASSWORD=${PGPASSWORD}\\nexport PGDATABASE=${DB_NAME}\\nPSQL=\${PGBIN}/psql\\nSHP2PGSQL=shp2pgsql\\ncd ${STAGING_FOLDER}',
'/usr/lib/postgresql/15/bin',
'wget',
E'rm -f \${TMPDIR}/*.*\\n\${PSQL} -c "DROP SCHEMA IF EXISTS \${staging_schema} CASCADE;"\\n\${PSQL} -c "CREATE SCHEMA \${staging_schema};"\\nfor z in *.zip; do \$UNZIPTOOL -o -d \$TMPDIR \$z; done\\ncd \$TMPDIR;',
'\${PSQL}',
'/',
'\${SHP2PGSQL}',
'export',
E'for z in *\${table_name}*.dbf; do\\n\${loader} -D -s 4269 -g the_geom -W "latin1" \$z \${staging_schema}.\${state_abbrev}_\${table_name} | \${psql}\\n\${PSQL} -c "SELECT loader_load_staged_data(lower(\'\${state_abbrev}_\${table_name}\'), lower(\'\${state_abbrev}_\${lookup_name}\'));"\\ndone'
)
EOF
db_fnc_run_psql "${DB_NAME}" <<EOF
UPDATE tiger.loader_variables
SET
tiger_year = '${TIGER_YEAR}',
website_root = '${TIGER_WEBSITE_ROOT}',
staging_fold = '${STAGING_FOLDER}',
data_schema = 'tiger_data',
staging_schema = 'tiger_staging'
EOF
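Whichever version you use, a quick sanity check confirms the configuration landed (both tables ship with postgis_tiger_geocoder; this is just a read-only sketch):

```sql
-- Confirm the new 'debbie' profile exists alongside the stock entries
SELECT os FROM tiger.loader_platform;

-- Confirm the loader variables point at your year, mirror, and staging folder
SELECT tiger_year, website_root, staging_fold, data_schema, staging_schema
FROM tiger.loader_variables;
```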
Additional loading configuration includes:
--Impacts initial setup
UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta520';
--Impacts State specific setup
UPDATE tiger.loader_lookuptables SET load = true WHERE load = false AND lookup_name IN ('tract', 'bg', 'tabblock20');

The final touches are straightforward: just maintenance and generation of the nation-loading script.
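Before generating the scripts, it can help to review which tables are currently flagged to load (a read-only check against loader_lookuptables, which ships with the extension):

```sql
-- Tables the loader will process, in order; 'load' shows the flags set above
SELECT process_order, lookup_name, table_name, load
FROM tiger.loader_lookuptables
ORDER BY process_order;
```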
psql -d tigercensus -c "ALTER EXTENSION postgis UPDATE;"
psql -d tigercensus -c "ALTER EXTENSION postgis_tiger_geocoder UPDATE;"
nation_script_path="${STAGING_FOLDER}/nation_script_load.sh"
psql -d tigercensus -tA -c "SELECT Loader_Generate_Nation_Script('debbie');" > "${nation_script_path}";
#and finally to run it:
chmod +x "${nation_script_path}"
#if you want to run it with debug: bash -x "${nation_script_path}"
bash "${nation_script_path}"

At this point, individual state data can be loaded: boundaries (from the tabblock level and up), places, and features such as roads, utilities, rails, and water. This is done with the following:
STATE_TO_LOAD="KS"
state_script_path="${STAGING_FOLDER}/${STATE_TO_LOAD}_script_load.sh"
psql -d tigercensus -tA -c "SELECT Loader_Generate_Script(Array['${STATE_TO_LOAD}'],'debbie');" > "${state_script_path}"
chmod +x "${state_script_path}"
#if you want to run it with debug: bash -x "${state_script_path}"
bash "${state_script_path}"
#Alternative use: psql -c "SELECT Loader_Generate_Script(Array['TX','NM','SC','NC','TN'],'debbie')" -d testdb -tA > ~/GISData/batch_02.sh

flowchart TD
StartState([Start State Loading]) --> SelectState[Select State by Abbreviation<br/>e.g., 'ks', 'tx', 'ca']
SelectState --> GenStateScript[Generate State Script<br/>Loader_Generate_Script]
GenStateScript --> DownloadState[Download State-Specific Files<br/>from Census TIGER]
DownloadState --> ExtractState[Extract State ZIP Files]
ExtractState --> ProcessState{Process Each County/Feature}
ProcessState --> LoadCounty[Load County Shapefiles<br/>to tiger_staging]
LoadCounty --> TransformCounty[Transform County Data<br/>loader_load_staged_data]
TransformCounty --> ProcessState
ProcessState --> StateComplete([State Loading Complete])
subgraph "State Data Types"
Roads[Roads & Streets]
Boundaries[County Boundaries]
Census[Census Blocks]
Water[Water Features]
Landmarks[Landmarks]
end
LoadCounty --> Roads
LoadCounty --> Boundaries
LoadCounty --> Census
LoadCounty --> Water
LoadCounty --> Landmarks
style SelectState fill:#e8f5e8
style DownloadState fill:#e3f2fd
style LoadCounty fill:#fff3e0
style TransformCounty fill:#fce4ec
The diagram above (TO BE FIXED) showcases this process.
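Once a state has loaded, the geocoder can be exercised against it. A minimal check (the Topeka address here is purely illustrative; geocode() and pprint_addy() are part of postgis_tiger_geocoder):

```sql
-- Lower rating = better match; geomout is the matched point geometry
SELECT g.rating, pprint_addy(g.addy) AS normalized, ST_AsText(g.geomout) AS pt
FROM geocode('300 SW 10th Ave, Topeka, KS 66612', 1) AS g;
```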
This will by no means be a finalized script/product, but rather a way to continue developing skills and to help plan additional projects on my end. By capturing the components at play, this document will hopefully allow others to replicate, critique, or offer suggestions! Tentative goals for this 'framework' are:
- Manipulation of Geospatial data across years and various sources (ex. introducing multiple years of census data, loading rasterized data to supplement census data)
- Connection to other data sources (Census, FRED, BLS, USDA, USGS, CDC) to allow for robust analysis/modeling of economic data with integration of geospatial data
- QGIS/Plotly Loading via python (Interacting with this database to create projects that visualize the data)

