**References**

* [Legal/Statistical Area Description (LSAD)](https://www.census.gov/library/reference/code-lists/legal-status-codes.html)
* The cartographic boundary files used by this notebook are courtesy of the United States Census Bureau.
  * [Shapefiles](https://www2.census.gov/geo/tiger/)
  * So far, only the [2018 cartographic boundary files](https://www2.census.gov/geo/tiger/GENZ2018/) have been used.
* [Gazetteer Files](https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html)
* [Gazetteer File Record Layout](https://www.census.gov/programs-surveys/geography/technical-documentation/records-layout/gaz-record-layouts.html): Explanatory notes for the field names used across Census Bureau geographic files. Most of the names are standard U.S. geographic codes.


<br>

## Preliminaries

In [1]:
!rm -rf *.sh

<br>

### Packages

In [2]:
import subprocess

In [3]:
if 'google.colab' in str(get_ipython()):
    subprocess.run('wget -q https://raw.githubusercontent.com/miscellane/cartographs/develop/...sh', shell=True)
    subprocess.run('chmod u+x scripts.sh', shell=True)
    subprocess.run('./scripts.sh', shell=True)

<br>

### Paths

In [4]:
import os
import pathlib
import sys

In [5]:
if 'google.colab' not in str(get_ipython()):
    notebooks = os.getcwd()
    parent = str(pathlib.Path(notebooks).parent)
    sys.path.append(parent)
else:
    notebooks = os.getcwd()
    parent = notebooks

<br>

Warehouse

In [6]:
warehouse = os.path.join(parent, 'warehouse', 'gazetteer')

# Create the directory if it does not exist yet
os.makedirs(warehouse, exist_ok=True)

<br>

### Libraries

In [7]:
import logging
import collections

import pandas as pd
import numpy as np

import json

<br>

### Logging

In [8]:
logging.basicConfig(level=logging.INFO, format='%(message)s\n%(asctime)s.%(msecs)03d', datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger(__name__)
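
The format string above places the message on one line and the timestamp, with milliseconds, on the next. A minimal sketch of that layout, assuming a throwaway logger named `format_demo` (hypothetical, not used elsewhere in this notebook) that writes to an in-memory stream so the result can be inspected:

```python
import io
import logging

# In-memory stream so the formatted record can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(fmt='%(message)s\n%(asctime)s.%(msecs)03d',
                                       datefmt='%Y-%m-%d %H:%M:%S'))

# 'format_demo' is a hypothetical logger name used only for this illustration
demo = logging.getLogger('format_demo')
demo.setLevel(logging.INFO)
demo.propagate = False
demo.addHandler(handler)

demo.info('states loaded')

# First line: the message; second line: the timestamp with milliseconds
lines = stream.getvalue().splitlines()
print(lines)
```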

<br>

### Custom

In [9]:
import cartographs.boundaries.us.boundaries
import cartographs.boundaries.us.settings

In [10]:
settings = cartographs.boundaries.us.settings.Settings()
boundaries = cartographs.boundaries.us.boundaries.Boundaries(crs=settings.crs)

<br>
<br>

## Geographies

### States

In [11]:
states: pd.DataFrame = boundaries.states(year=settings.latest)
# info() prints its summary and returns None, so call it directly instead of logging its return value
states.info()


<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 10 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   STATEFP   56 non-null     object  
 1   STATENS   56 non-null     object  
 2   AFFGEOID  56 non-null     object  
 3   GEOID     56 non-null     object  
 4   STUSPS    56 non-null     object  
 5   NAME      56 non-null     object  
 6   LSAD      56 non-null     object  
 7   ALAND     56 non-null     int64   
 8   AWATER    56 non-null     int64   
 9   geometry  56 non-null     geometry
dtypes: geometry(1), int64(2), object(7)
memory usage: 4.5+ KB
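
`DataFrame.info()` prints its summary to standard output and returns `None`, so passing its return value to a logger records only `None`. A minimal sketch of one way to route the summary through a logger instead, using the `buf` parameter of `info()`. The helper name `info_text` and the toy frame are illustrative assumptions, not part of this notebook's packages:

```python
import io
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO, format='%(message)s')
demo_logger = logging.getLogger('info_demo')  # hypothetical logger name


def info_text(frame: pd.DataFrame) -> str:
    """Return the text that frame.info() would otherwise print."""
    buffer = io.StringIO()
    frame.info(buf=buffer)
    return buffer.getvalue()


# Toy stand-in for the states frame (two rows, two columns)
toy = pd.DataFrame({'STATEFP': ['28', '37'],
                    'ALAND': [121533519481, 125923656064]})

summary = info_text(toy)
demo_logger.info(summary)
```

The same helper should work for a `GeoDataFrame`, since it inherits `info()` from pandas.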


<br>

Names

In [12]:
states.rename(columns={'GEOID': 'STATEGEOID', 'AFFGEOID': 'STATEAFFGEOID', 'NAME': 'STATE', 
                       'ALAND': 'STATESQMLAND', 'AWATER': 'STATESQMWATER'}, inplace=True)

In [13]:
states.rename(str.lower, axis='columns', inplace=True)

<br>

Coördinates

In [14]:
states.loc[:, 'latitude'] = states.centroid.y
states.loc[:, 'longitude'] = states.centroid.x

<br>

Preview

In [15]:
logger.info(states.head())

  statefp   statens stateaffgeoid stategeoid stusps           state lsad  \
0      28  01779790   0400000US28         28     MS     Mississippi   00   
1      37  01027616   0400000US37         37     NC  North Carolina   00   
2      40  01102857   0400000US40         40     OK        Oklahoma   00   
3      51  01779803   0400000US51         51     VA        Virginia   00   
4      54  01779805   0400000US54         54     WV   West Virginia   00   

   statesqmland  statesqmwater  \
0  121533519481     3926919758   
1  125923656064    13466071395   
2  177662925723     3374587997   
3  102257717110     8528531774   
4   62266474513      489028543   

                                            geometry   latitude  longitude  
0  MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...  32.750861 -89.665208  
1  MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...  35.541527 -79.372377  
2  POLYGON ((-103.00257 36.52659, -103.00219 36.6...  35.583479 -97.508285  
3  MULTIPOLYGON (((-75.74241 3
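
The `statesqmland` and `statesqmwater` columns hold the Census Bureau's `ALAND` and `AWATER` values, which are areas in square metres. A small sketch of a conversion to square kilometres, using the Mississippi figures from the preview above; the helper name `to_square_km` is an assumption made for illustration:

```python
SQUARE_METRES_PER_SQUARE_KM = 1_000_000


def to_square_km(square_metres: int) -> float:
    """Convert an area in square metres to square kilometres."""
    return square_metres / SQUARE_METRES_PER_SQUARE_KM


# Mississippi, taken from the first row of the preview
mississippi_land = to_square_km(121533519481)
mississippi_water = to_square_km(3926919758)

print(round(mississippi_land, 1), round(mississippi_water, 1))
```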

<br>

### Counties

In [16]:
counties = boundaries.counties(year=settings.latest)
# info() prints its summary and returns None, so call it directly instead of logging its return value
counties.info()


<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 3233 entries, 0 to 3232
Data columns (total 10 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   STATEFP   3233 non-null   object  
 1   COUNTYFP  3233 non-null   object  
 2   COUNTYNS  3233 non-null   object  
 3   AFFGEOID  3233 non-null   object  
 4   GEOID     3233 non-null   object  
 5   NAME      3233 non-null   object  
 6   LSAD      3233 non-null   object  
 7   ALAND     3233 non-null   int64   
 8   AWATER    3233 non-null   int64   
 9   geometry  3233 non-null   geometry
dtypes: geometry(1), int64(2), object(7)
memory usage: 252.7+ KB


<br>

Names

In [17]:
counties.rename(columns={'AFFGEOID': 'COUNTYAFFGEOID', 'GEOID': 'COUNTYGEOID', 'NAME': 'COUNTY', 
                        'ALAND': 'COUNTYSQMLAND', 'AWATER': 'COUNTYSQMWATER'}, inplace=True)

In [18]:
counties.rename(str.lower, axis='columns', inplace=True)

<br>

Coördinates

In [19]:
counties.loc[:, 'latitude'] = counties.centroid.y
counties.loc[:, 'longitude'] = counties.centroid.x

<br>

Preview

In [20]:
logger.info(counties.head())

  statefp countyfp  countyns  countyaffgeoid countygeoid   county lsad  \
0      21      007  00516850  0500000US21007       21007  Ballard   06   
1      21      017  00516855  0500000US21017       21017  Bourbon   06   
2      21      031  00516862  0500000US21031       21031   Butler   06   
3      21      065  00516879  0500000US21065       21065   Estill   06   
4      21      069  00516881  0500000US21069       21069  Fleming   06   

   countysqmland  countysqmwater  \
0      639387454        69473325   
1      750439351         4829777   
2     1103571974        13943044   
3      655509930         6516335   
4      902727151         7182793   

                                            geometry   latitude  longitude  
0  POLYGON ((-89.18137 37.04630, -89.17938 37.053...  37.058482 -88.999256  
1  POLYGON ((-84.44266 38.28324, -84.44114 38.283...  38.206735 -84.217151  
2  POLYGON ((-86.94486 37.07341, -86.94346 37.074...  37.207285 -86.681623  
3  POLYGON ((-84.12662 37.6454
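
In the preview above, each `countygeoid` is the two-digit `statefp` followed by the three-digit `countyfp`, and each `countyaffgeoid` is the same code prefixed with `0500000US`. A brief sketch of a consistency check built on that observation; the function name `split_county_geoid` is hypothetical:

```python
from typing import Tuple


def split_county_geoid(geoid: str) -> Tuple[str, str]:
    """Split a five-digit county GEOID into (state FIPS, county FIPS)."""
    if len(geoid) != 5 or not geoid.isdigit():
        raise ValueError(f'expected five digits, got {geoid!r}')
    return geoid[:2], geoid[2:]


# Ballard County, Kentucky, from the first row of the preview
state_code, county_code = split_county_geoid('21007')

# The part of the AFFGEOID after 'US' should equal the GEOID
affgeoid = '0500000US21007'
matches = affgeoid.split('US')[1] == state_code + county_code
print(state_code, county_code, matches)
```

Keeping these codes as strings matters, because the leading zeros are part of the code.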

<br>
<br>

## Gazetteer

In [21]:
columns = ['statefp', 'countyfp', 'countygeoid', 'county', 'countysqmland', 'countysqmwater', 'latitude', 'longitude']
gazetteer = counties[columns].merge(states[['statefp', 'stusps', 'state']], how='left', on='statefp')

# info() prints its summary and returns None, so call it directly instead of logging its return value
gazetteer.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 3233 entries, 0 to 3232
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   statefp         3233 non-null   object 
 1   countyfp        3233 non-null   object 
 2   countygeoid     3233 non-null   object 
 3   county          3233 non-null   object 
 4   countysqmland   3233 non-null   int64  
 5   countysqmwater  3233 non-null   int64  
 6   latitude        3233 non-null   float64
 7   longitude       3233 non-null   float64
 8   stusps          3233 non-null   object 
 9   state           3233 non-null   object 
dtypes: float64(2), int64(2), object(6)
memory usage: 277.8+ KB
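
The left merge attaches the state abbreviation and name to every county row, so each county should match exactly one state. A hedged sketch of how that expectation could be checked with the `validate` and `indicator` options of `DataFrame.merge`; the toy frames below are stand-ins, not the notebook's data:

```python
import pandas as pd

# Toy stand-ins for the counties and states frames
counties_toy = pd.DataFrame({'statefp': ['21', '21', '28'],
                             'county': ['Ballard', 'Bourbon', 'Adams']})
states_toy = pd.DataFrame({'statefp': ['21', '28'],
                           'stusps': ['KY', 'MS'],
                           'state': ['Kentucky', 'Mississippi']})

# validate raises MergeError if 'statefp' is duplicated on the states side;
# indicator adds a '_merge' column that marks rows without a matching state
merged = counties_toy.merge(states_toy, how='left', on='statefp',
                            validate='many_to_one', indicator=True)

unmatched = merged.loc[merged['_merge'] != 'both']
print(len(merged), len(unmatched))
```

An empty `unmatched` frame is consistent with the non-null counts for `stusps` and `state` in the summary above.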


<br>

Save

In [22]:
gazetteer.to_csv(path_or_buf=os.path.join(warehouse, 'countylevel.csv'), index=False, encoding='UTF-8', header=True)
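
<br>

The FIPS code columns are strings with leading zeros, and a CSV file stores no type information, so a default `pd.read_csv` would parse them as integers. A minimal sketch of reading such a file back with explicit string types, using an in-memory buffer and toy rows rather than the saved `countylevel.csv`:

```python
import io

import pandas as pd

# Toy rows with leading zeros in the code columns
toy = pd.DataFrame({'statefp': ['01', '21'],
                    'countyfp': ['001', '007'],
                    'countygeoid': ['01001', '21007']})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default parsing infers integers and drops the leading zeros
buffer.seek(0)
naive = pd.read_csv(buffer)

# Declaring the code columns as strings preserves them
buffer.seek(0)
codes = ['statefp', 'countyfp', 'countygeoid']
kept = pd.read_csv(buffer, dtype={name: str for name in codes})

print(naive['statefp'].tolist(), kept['statefp'].tolist())
```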