<IMG SRC="https://github.com/jacquesroy/byte-size-data-science/raw/master/images/Banner.png" ALT="BSDS Banner" WIDTH=1195 HEIGHT=200>

<table align="left">
    <tr><td>
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a></td><td>This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.</td>
    </tr>
    <tr><td>Jacques Roy, Byte Size Data Science</td><td> </td></tr>
    </table>

# Tiger Files
**TIGER**: Topologically Integrated Geographic Encoding and Referencing

The TIGER/Line Shapefiles contain a standard geographic identifier for each entity that links to the geographic identifier in the data from censuses and surveys.

Technical documentation: https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2019/TGRSHP2019_TechDoc.pdf

https://www.census.gov/programs-surveys/geography/guidance/tiger-data-products-guide.html

| Product | Best for... |  
| :-- | :-- |
| <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html">TIGER/Line Shapefiles</a> | Most mapping projects--this is our most comprehensive dataset. Designed for use with GIS (geographic information systems). | 
| <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-geodatabase-file.html">TIGER Geodatabases</a> | Useful for users needing national datasets or all major boundaries by state. Designed for use in ArcGIS. Files are extremely large. |
| <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-data.html">TIGER/Line with Selected Demographic and Economic Data</a> | Data from selected attributes from the 2010 Census, 2006-2010 through 2012-2016 ACS 5-year estimates and County Business Patterns (CBP) for selected geographies. Designed for use with GIS. |
| <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html">Cartographic Boundary Shapefiles</a> | Small scale (limited detail) mapping projects clipped to shoreline. Designed for thematic mapping using GIS. |


### 048-Tiger Files
Execute the next cell to watch the `Byte Size Data Science` YouTube channel video.

In [None]:
from IPython.display import IFrame

IFrame(src="https://www.youtube.com/embed/d0-vTCWe0jk?rel=0&amp;controls=0&amp;showinfo=0", width=560, height=315)


## File format
- Naming convention: tl_yyyy_`extent`_`layer`.ext<br/>
    extent: nation (`us`), state, or county based
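
As a rough illustration of the naming convention, the sketch below assembles a zip file name and a download URL from a year, an extent, and a layer. The helper names are hypothetical, and the assumption that the directory name is the upper-cased layer name holds for the `COUNTY` and `PLACE` examples used in this notebook but may not hold for every layer.

```python
# Hypothetical helpers, not part of any Census Bureau tooling.
BASE_URL = "https://www2.census.gov/geo/tiger"

def tiger_zip_name(year, extent, layer):
    """Return the zip file name for a layer, e.g. tl_2019_us_county.zip."""
    return f"tl_{year}_{extent}_{layer}.zip"

def tiger_zip_url(year, extent, layer):
    """Build a download URL, ASSUMING the directory is the upper-cased layer."""
    name = tiger_zip_name(year, extent, layer)
    return f"{BASE_URL}/TIGER{year}/{layer.upper()}/{name}"

print(tiger_zip_url(2019, "us", "county"))
print(tiger_zip_url(2019, "17", "place"))
```

Both printed URLs match the ones downloaded later in this notebook.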

## Layer examples
There are 43 different layers available. They are listed on page 12 (page 2-5, table 1) of the <a href="https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2019/TGRSHP2019_TechDoc.pdf">technical documentation</a>.

Here are some examples:

| Layer | Nation | State | County | Name example | 
| -- | -- | -- | -- | -- | 
| Address Range-Feature | - | - | X | tl_2019_&lt;state-countyFIPS&gt;_addrfeat.shp |
| All Roads | - | - | X | tl_2019_&lt;state-countyFIPS&gt;_roads.shp |
|Coastline | X | - | - | tl_2019_us_coastline.shp |
| County and Equivalent | X | - | - | tl_2019_us_county.shp |
| Place | - | X | - | tl_2019_&lt;stateFIPS&gt;_place.shp |
| State and Equivalent | X | - | - | tl_2019_us_state.shp |
|ZIP Code Tabulation Area | X | - | - | tl_2019_us_zcta510.shp |
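
Going the other way, a file name can be split back into its parts. The sketch below is one possible way to do it; the regular expression and the `parse_tiger_name` helper are assumptions checked only against the example names in the table above. The extent is read as `us` (nation), a 2-digit state FIPS code, or a 5-digit state plus county FIPS code.

```python
import re

# Pattern for tl_<year>_<extent>_<layer>.<ext>; an illustrative assumption.
TIGER_NAME = re.compile(r"^tl_(\d{4})_(us|\d{5}|\d{2})_([a-z0-9]+)\.(\w+)$")

def parse_tiger_name(filename):
    """Split a TIGER/Line file name into year, level, extent, layer, ext."""
    match = TIGER_NAME.match(filename)
    if match is None:
        raise ValueError(f"Not a recognized TIGER/Line name: {filename}")
    year, extent, layer, ext = match.groups()
    if extent == "us":
        level = "nation"
    elif len(extent) == 2:
        level = "state"    # 2-digit state FIPS code
    else:
        level = "county"   # 5-digit state + county FIPS code
    return {"year": int(year), "level": level, "extent": extent,
            "layer": layer, "ext": ext}

print(parse_tiger_name("tl_2019_us_zcta510.shp"))
print(parse_tiger_name("tl_2019_17_place.shp"))
# 17031 is used here as an example state + county code
print(parse_tiger_name("tl_2019_17031_roads.shp"))
```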

In [None]:
!pip install geopandas >pipgeopandas.txt 2>&1
import geopandas as gp

In [None]:
import pandas as pd
import requests, zipfile, io

## Tiger subdirectories
Each layer is stored in a different subdirectory.<br/>
The directory names are shorter names for the layers. The documentation has a table on page 93 that lists the directory names and the layer names.

In [None]:
# List the directory names
from ftplib import FTP

addr = "ftp2.census.gov"
target_dir = 'geo/tiger/TIGER2019'

ftp = FTP(addr)
ret = ftp.login("anonymous", "ftplib-example-1")  # anonymous login, needed only once

ftp.cwd(target_dir)
data = []

# Collect each line of the directory listing
ftp.dir(data.append)
ftp.quit()

# The entry name is the ninth whitespace-separated field of each line
res = ""
for line in data:
    res=res + (line.split()[8]) + ", "
print(res)
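
The cell above takes the ninth whitespace-separated field of each listing line as the entry name. The sketch below shows that parsing step in isolation, assuming a Unix-style listing format (FTP does not standardize it). The sample lines are made up for illustration and are not real server output, and `listing_names` is a hypothetical helper.

```python
def listing_names(lines):
    """Extract the entry name from Unix-style directory listing lines."""
    names = []
    for line in lines:
        # Assumed fields: mode, links, owner, group, size, month, day,
        # time or year, name. maxsplit=8 keeps a name with spaces whole.
        fields = line.strip().split(maxsplit=8)
        if len(fields) == 9:
            names.append(fields[8])
    return names

# Made-up sample lines in the assumed format (not real server output)
sample = [
    "total 2",
    "drwxrwsr-x    2 ftp      ftp          4096 Jan 01  2020 COUNTY",
    "drwxrwsr-x    2 ftp      ftp         12288 Jan 01  2020 PLACE",
]
print(", ".join(listing_names(sample)))
```

If only the names are needed, `ftplib` also provides `FTP.nlst()`, which returns a list of names and avoids parsing the listing text, provided the server supports it.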

## Reference files
Some data files are divided by **state** and/or **county**.<br/>
There are some reference files that are useful as we see below.<br/>
They can be found in https://www2.census.gov/geo/docs/reference and its sub-directories.

The county file (`county_adjacent.txt`) is not in a format that is easy to read, so it makes more sense to simply read the county shape file instead. There is another file listing counties in the `codes` sub-directory (`national_county.txt`), but it does not contain all the information we need.

In [None]:
# States
url = 'http://www2.census.gov/geo/docs/reference/state.txt'
states_pd = pd.read_csv(url, sep='|')
print ("Number of states: " + str(states_pd['STATE'].count()))
states_pd.head()

In [None]:
# Counties: This file does not contain the codes we want
url = 'https://www2.census.gov/geo/docs/reference/codes/national_county.txt'
counties_pd = pd.read_csv(url, sep=',')
print ("Number of counties: " + str(counties_pd['State'].count()))

counties_pd.head()

## Counties
The county information is stored at the US level. This means that the subdirectory `COUNTY` has only one file, which follows the naming convention: `tl_2019_us_county`. Since shape files are actually multiple files, the file to download is a zip file: `tl_2019_us_county.zip`.
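
Because a shape file is really a set of files sharing one base name, it can be useful to check that an archive contains the pieces a reader needs. The sketch below assumes the three components the shapefile format treats as mandatory (`.shp`, `.shx`, `.dbf`) and runs against a small stand-in archive built in memory with empty placeholder members, not against the real download. `missing_components` is a hypothetical helper.

```python
import io
import zipfile

# Components assumed to be required next to the .shp geometry file
REQUIRED_EXTS = {".shp", ".shx", ".dbf"}

def missing_components(zip_bytes, stem):
    """Return the required shapefile extensions absent from a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
    return sorted(ext for ext in REQUIRED_EXTS if stem + ext not in names)

# Build a tiny stand-in archive in memory (empty placeholder members)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for ext in (".shp", ".dbf", ".prj"):
        archive.writestr("tl_2019_us_county" + ext, b"")

# The stand-in archive deliberately leaves out the .shx index file
print(missing_components(buffer.getvalue(), "tl_2019_us_county"))
```

With a real download, the same function could be called on `r.content` from `requests.get` before extracting.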

In [None]:
# Download and extract the US county shape file
zip_file='https://www2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip'
r = requests.get(zip_file)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()
!ls 

In [None]:
# Project file providing the reference system
!cat tl_2019_us_county.prj

In [None]:
gdf_counties = gp.read_file('tl_2019_us_county.dbf')
print("Number of records: " + str(gdf_counties['STATEFP'].count()))
gdf_counties.head()

## Place example
The `PLACE` layer is divided into files by state.<br/>
By using the state information above, we find that the state code for `Oregon` is 41, `Colorado` is 8 and `Illinois` is 17.
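
One detail to watch: pandas reads the `STATE` column as integers, so Colorado shows up as 8, while the file names use the two-digit form (`08`). A minimal sketch of the conversion follows; `state_fips` and `place_zip_name` are hypothetical helper names.

```python
def state_fips(code):
    """Format a state code as the two-digit string used in file names."""
    return f"{int(code):02d}"

def place_zip_name(code, year=2019):
    """Zip file name of the PLACE layer for one state (hypothetical helper)."""
    return f"tl_{year}_{state_fips(code)}_place.zip"

for name, code in [("Oregon", 41), ("Colorado", 8), ("Illinois", 17)]:
    print(name, place_zip_name(code))
```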

In [None]:
# Display all the available files
addr = "ftp2.census.gov"
target_dir = 'geo/tiger/TIGER2019/PLACE'

ftp = FTP(addr)
ret = ftp.login("anonymous", "ftplib-example-1")

ftp.cwd(target_dir)
data = []

ftp.dir(data.append)
ftp.quit()

res = ""
for line in data:
    res=res + (line.split()[8]) + ", "
print(res)

In [None]:
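# State codes from the reference file, to compare with the file list above
print(states_pd['STATE'].values)
# Look up the name of state code 74
missing = states_pd[states_pd['STATE']==74]['STATE_NAME'].values
print("Missing 'STATE': " + ", ".join(missing))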

In [None]:
# Places in Illinois
# Remove the files extracted earlier
!rm tl*
zip_file='https://www2.census.gov/geo/tiger/TIGER2019/PLACE/tl_2019_17_place.zip'
r = requests.get(zip_file)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()
!ls 

In [None]:
gdf_places = gp.read_file('tl_2019_17_place.shp')
print("Number of records: " + str(gdf_places['STATEFP'].count()))
gdf_places.head()

## CLASSFP and LSAD
- Class code list (CLASSFP): https://www.census.gov/library/reference/code-lists/class-codes.html
- Legal/Statistical Area Description (LSAD): https://www.census.gov/library/reference/code-lists/legal-status-codes.html

### CLASSFP
- C1: An active incorporated place that does not serve as a county subdivision equivalent
- C5: An active incorporated place that is independent of any county subdivision and serves as a county subdivision equivalent
- M2: A military or other defense installation entirely within a place
- U1: A census designated place with an official federally recognized name
- U2: A census designated place without an official federally recognized name

### LSAD
- 25: Consolidated City, County or Equivalent Feature, County Subdivision, Economic Census Place, Incorporated Place
- 46: County Subdivision
- 47: County Subdivision, Economic Census Place, Incorporated Place
- 57: Census Designated Place, Economic Census Place
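
To make the grouped counts easier to read, the codes can be translated into descriptions with a lookup table. The sketch below covers only the CLASSFP codes listed above, with shortened descriptions; the dictionary and the `describe_classfp` helper are illustrative, not an official mapping.

```python
# Shortened descriptions of the CLASSFP codes listed above (partial list)
CLASSFP_DESCRIPTIONS = {
    "C1": "Active incorporated place, not a county subdivision equivalent",
    "C5": "Active incorporated place serving as a county subdivision equivalent",
    "M2": "Military or other defense installation entirely within a place",
    "U1": "Census designated place with an official federally recognized name",
    "U2": "Census designated place without an official federally recognized name",
}

def describe_classfp(code):
    """Return a description for a class code, flagging codes not listed."""
    return CLASSFP_DESCRIPTIONS.get(code, f"Unknown class code: {code}")

for code in ["C1", "U1", "ZZ"]:
    print(code, "->", describe_classfp(code))
```

With the places loaded, something like `gdf_places['CLASSFP'].map(describe_classfp)` would produce a readable column.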

In [None]:
# Count of object types
print(gdf_places.groupby('CLASSFP')['PLACEFP'].nunique())
print(gdf_places.groupby('LSAD')['PLACEFP'].nunique())

In [None]:
# Convert the latitude/longitude columns from strings to floats
gdf_places['INTPTLAT'] = gdf_places['INTPTLAT'].astype(float)
gdf_places['INTPTLON'] = gdf_places['INTPTLON'].astype(float)
gdf_places.dtypes

In [None]:
# List the places that intersect the bounding box (envelope) around Chicago
geom = gdf_places[gdf_places['NAME']=='Chicago']['geometry'].reset_index().iloc[0]['geometry']
chi_gdf = gdf_places[gdf_places['geometry'].intersects(geom.envelope)]
print("Number of records: " + str(chi_gdf['STATEFP'].count()))
chi_gdf.head()
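
The envelope used above is the axis-aligned rectangle spanning a geometry's minimum and maximum coordinates. As a plain-Python illustration of the idea (not how shapely implements it), the sketch below computes such a rectangle from hypothetical coordinates and tests whether two rectangles overlap. This rectangle-to-rectangle test is coarser than `intersects`, which checks each place's actual geometry against the rectangle.

```python
def envelope(points):
    """Return (min_x, min_y, max_x, max_y) for a sequence of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) rectangles share any point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical (longitude, latitude) outlines, for illustration only
city = [(-87.9, 41.6), (-87.5, 41.7), (-87.6, 42.0)]
neighbor = [(-87.7, 41.9), (-87.6, 42.1), (-87.8, 42.1)]
far_away = [(-89.7, 39.7), (-89.6, 39.8), (-89.6, 39.7)]

city_box = envelope(city)
print(boxes_overlap(city_box, envelope(neighbor)))   # boxes share an area
print(boxes_overlap(city_box, envelope(far_away)))   # boxes are disjoint
```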

## Display objects on a map

In [None]:
!pip install folium >foliumpip.out 2>&1

import folium

In [None]:
# Center the map on the mean of the places' internal points
latlong = chi_gdf[['INTPTLAT', 'INTPTLON']].mean()
chi_map = folium.Map(location=[latlong['INTPTLAT'], latlong['INTPTLON']], zoom_start=10, width="80%", height="80%")

geom2 = chi_gdf[chi_gdf['NAME']=='Chicago'].reset_index()
folium.GeoJson(
        geom2.iloc[0]['geometry'].envelope,
        name='Bounding box',
        tooltip='Bounding box'
    ).add_to(chi_map)

folium.GeoJson(
        geom2.iloc[0]['geometry'],
        name=geom2.iloc[0]['NAME'],
        tooltip=geom2.iloc[0]['NAME']
    ).add_to(chi_map)


# Add the lat/long point for each place
for _, row in chi_gdf.iterrows():
    folium.CircleMarker(
        [float(row['INTPTLAT']), float(row['INTPTLON'])],
        radius=5, # size of the circle marker
        color='yellow',
        fill=True,
        fill_color='blue',
        tooltip=row['NAME'] + " ctr",
        fill_opacity=0.6
    ).add_to(chi_map)

folium.LayerControl().add_to(chi_map)
chi_map