<IMG SRC="https://github.com/jacquesroy/byte-size-data-science/raw/master/images/Banner.png" ALT="BSDS Banner" WIDTH=1195 HEIGHT=200>

<table align="left">
    <tr><td>
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a></td><td>This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.</td>
    </tr>
    <tr><td>Jacques Roy, Byte Size Data Science</td><td> </td></tr>
    </table>

# Accessing shape files


### 047-Shape files
Execute the next cell to watch the corresponding video from the `Byte Size Data Science` YouTube channel.

In [None]:
from IPython.display import IFrame

IFrame(src="https://www.youtube.com/embed/nL4XXB8QMMg?rel=0&amp;controls=0&amp;showinfo=0", width=560, height=315)


## Read the Census Bureau state data

In [None]:
# Redirect stdout and stderr to a file in case of problems
# (the file redirection must come before 2>&1 for stderr to be captured too)
!pip install geopandas >pipgeopandas.txt 2>&1

In [None]:
import pandas as pd
import requests, zipfile, io
import geopandas as gp

## Read Illinois place file

In [None]:
place_file='https://www2.census.gov/geo/tiger/TIGER2019/PLACE/tl_2019_17_place.zip'
r = requests.get(place_file)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

!ls -l
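
A shapefile is really a set of sibling files, so it can help to confirm that the archive holds all of them before extracting. The sketch below is only an illustration: the helper names `shapefile_parts` and `missing_parts` are hypothetical, and it runs against a small stand-in archive built in memory rather than the Census download.

```python
import io
import zipfile

# Components a reader needs (.shp geometry, .shx index, .dbf attributes),
# plus the optional projection file.
REQUIRED = {".shp", ".shx", ".dbf"}
OPTIONAL = {".prj"}


def shapefile_parts(zip_bytes):
    """Map each shapefile component extension to its member name in the archive."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            dot = name.rfind(".")
            ext = name[dot:].lower() if dot != -1 else ""
            if ext in REQUIRED | OPTIONAL:
                parts[ext] = name
    return parts


def missing_parts(zip_bytes):
    """Return the required extensions that the archive does not contain."""
    return REQUIRED - set(shapefile_parts(zip_bytes))


# Stand-in archive (placeholder contents) so the sketch runs without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for ext in (".shp", ".dbf", ".prj"):
        archive.writestr("tl_2019_17_place" + ext, b"placeholder")
demo_zip = buffer.getvalue()

print(shapefile_parts(demo_zip))
print(missing_parts(demo_zip))
```

With the real download, the same check could be applied to `r.content` before calling `z.extractall()`.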

## Place file definition
Place State-based shapefile record layout:

| Field | Length | Type | Description |
| -- | -- | -- | -- |
| STATEFP | 2 | String | Current state FIPS code |
| PLACEFP | 5 | String | Current place FIPS code |
| PLACENS | 8 | String | Current place GNIS code |
| GEOID | 7 | String | Place identifier; a concatenation of the current state FIPS code and place FIPS code |
| NAME | 100 | String | Current place name |
| NAMELSAD | 100 | String | Current name and the translated legal/statistical area description for place |
| LSAD | 2 | String | Current legal/statistical area description code for place |
| CLASSFP | 2 | String | Current FIPS class code |
| PCICBSA | 1 | String | Current metropolitan or micropolitan statistical area principal city indicator |
| PCINECTA | 1 | String | Current New England city and town area principal city indicator |
| MTFCC | 5 | String | G4110 (incorporated place) and G4210 (census designated place) |
| FUNCSTA | 1 | String | Current functional status |
| ALAND | 14 | Number | Current land area |
| AWATER | 14 | Number | Current water area |
| INTPTLAT | 11 | String | Current latitude of the internal point |
| INTPTLON | 12 | String | Current longitude of the internal point |
| geometry | n/a | geometry | Shape of the place; column added by GeoPandas when it reads the shapefile |
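
The GEOID rule in the table (state FIPS code followed by place FIPS code) can be sketched in a few lines. The function names `build_geoid` and `split_geoid` are hypothetical helpers, not part of any library; the point is that the codes must stay strings so leading zeros survive.

```python
def build_geoid(statefp, placefp):
    """Concatenate a 2-character state FIPS code and a 5-character place FIPS code."""
    state = str(statefp).zfill(2)
    place = str(placefp).zfill(5)
    if len(state) != 2 or len(place) != 5:
        raise ValueError("expected a 2-character state code and a 5-character place code")
    return state + place


def split_geoid(geoid):
    """Split a 7-character place GEOID back into (STATEFP, PLACEFP)."""
    if len(geoid) != 7:
        raise ValueError("a place GEOID has exactly 7 characters")
    return geoid[:2], geoid[2:]


# Illinois is state 17 and Chicago is place 14000.
print(build_geoid("17", "14000"))
print(split_geoid("1714000"))
# Integer inputs lose their leading zeros, so the helper pads them back.
print(build_geoid(1, 124))
```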


In [None]:
# Format: see tech doc page 3-56
gdf_places = gp.read_file('tl_2019_17_place.shp')
print("Number of records: " + str(gdf_places['STATEFP'].count()))
gdf_places.head()

In [None]:
gdf_places.dtypes

In [None]:
gdf_places["INTPTLAT"] = gdf_places["INTPTLAT"].astype('float64')
gdf_places["INTPTLON"] = gdf_places["INTPTLON"].astype('float64')
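
The internal point columns arrive as signed text, which is why the cast above is needed before any arithmetic. The plain-Python sketch below shows the same conversion on two illustrative values written in that signed, zero-padded style; `parse_internal_point` is a hypothetical helper and the range check is an added assumption, not something the notebook does.

```python
def parse_internal_point(lat_text, lon_text):
    """Convert signed coordinate strings to floats and sanity-check their ranges."""
    lat = float(lat_text.strip())
    lon = float(lon_text.strip())
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: " + lat_text)
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: " + lon_text)
    return lat, lon


# Illustrative values: an explicit '+' sign and a zero-padded longitude.
lat, lon = parse_internal_point("+41.8375511", "-087.6818441")
print(lat, lon)
```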

In [None]:
gdf_places.dtypes

In [None]:
gdf_places.head()

## Manipulate a "geometry"
See: https://shapely.readthedocs.io/en/stable/manual.html#general-attributes-and-methods

In [None]:
geo = gdf_places.iloc[1]["geometry"]
print("Area: " + str(geo.area))
print("Bounds: " + str(geo.bounds))
print("Length: " + str(geo.length))
print("Geometry type: " + str(geo.geom_type))
print("Python type: " + str(type(geo)))
print("WKT: " + geo.wkt)
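
To make the meaning of `area`, `length` and `bounds` concrete, here is a plain-Python sketch that computes the same three quantities for a single polygon ring using the shoelace formula. It is not how shapely is implemented, and `polygon_measures` is a hypothetical name. Because the TIGER/Line coordinates are longitude and latitude in degrees, the area comes out in square degrees and the length in degrees, so neither is a physical measurement until the data is reprojected.

```python
import math


def polygon_measures(ring):
    """Area, perimeter and bounding box of a simple ring of (x, y) vertices.

    The ring is given without repeating the first vertex at the end.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace term for this edge
        perimeter += math.hypot(x2 - x1, y2 - y1)
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return {
        "area": abs(twice_area) / 2.0,
        "length": perimeter,
        "bounds": (min(xs), min(ys), max(xs), max(ys)),  # (minx, miny, maxx, maxy)
    }


# Illustrative rectangle, half a degree wide and a quarter of a degree tall.
rect = [(-88.0, 41.0), (-87.5, 41.0), (-87.5, 41.25), (-88.0, 41.25)]
measures = polygon_measures(rect)
print(measures)
```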

In [None]:
print(geo)

In [None]:
geo

In [None]:
# !conda install -c conda-forge folium=0.5.0 --yes
# !pip install folium==0.5.0
# I'm installing the latest version: 0.10.0
!pip install folium >foliumpip.out 2>&1

import folium

In [None]:
xx = gdf_places[(gdf_places['NAME'] == 'Chicago') | (gdf_places['NAME'] == 'Naperville')].bounds
xmin = xx['minx'].min()
ymin = xx['miny'].min()
xmax = xx['maxx'].max()
ymax = xx['maxy'].max()
xx

In [None]:
# Select only the rows that are within a bounding box (Chicago area)
# see: http://geopandas.org/indexing.html

# xmin, ymin, xmax, ymax = (-87.92, 41.64, -87.52, 42.03)
# Get some boundaries for the cities I'll pick
xmin, ymin, xmax, ymax = \
    gdf_places[(gdf_places['NAME'] == 'Norridge') | (gdf_places['NAME'] == 'Woodridge')].\
    bounds.agg({ 'minx': 'min', 'miny': 'min', 'maxx':'max', 'maxy':'max'})

subset = gdf_places.cx[xmin:xmax, ymin:ymax].reset_index()

# Instead pick a few cities
subset = gdf_places.loc[gdf_places['NAME'].isin(['Chicago','Schaumburg','Naperville','La Grange','Lombard','West Chicago'])].reset_index()
print("Number of cities: " + str(subset["NAME"].count()))
# subset.head(2)
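
The bounding-box step above combines the bounds of two places and then keeps every place that falls in that box. The sketch below shows the idea in plain Python with made-up bounds. It compares boxes only, which is an approximation of what `.cx` does, since `.cx` tests the geometries themselves against the box. The names `union_bounds` and `bounds_overlap` are hypothetical.

```python
def union_bounds(bounds_list):
    """Smallest (minx, miny, maxx, maxy) box that covers every box in the list."""
    minxs, minys, maxxs, maxys = zip(*bounds_list)
    return (min(minxs), min(minys), max(maxxs), max(maxys))


def bounds_overlap(a, b):
    """True when two (minx, miny, maxx, maxy) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Made-up bounds for two anchor places (not the real Census values).
anchor_a = (-87.84, 41.95, -87.80, 41.98)
anchor_b = (-88.08, 41.71, -88.01, 41.78)
search_box = union_bounds([anchor_a, anchor_b])

inside = (-87.95, 41.80, -87.90, 41.85)   # lies within the search box
outside = (-89.70, 39.70, -89.60, 39.80)  # far to the south-west

print(search_box)
print(bounds_overlap(search_box, inside), bounds_overlap(search_box, outside))
```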

In [None]:
import shapely
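# Center the map on the mean of the selected cities' internal points
latlong = subset[['INTPTLAT', 'INTPTLON']].mean()

# Access the values by label: positional indexing on a labeled Series is deprecated in recent pandas
loc_map = folium.Map(location=[latlong['INTPTLAT'], latlong['INTPTLON']], crs='EPSG3857', zoom_start=10, width="80%", height="80%")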
geo_objects = folium.map.FeatureGroup()

# Adding the city borders
for ix in range(subset["NAME"].count()) :
    folium.GeoJson(
        subset.iloc[ix]['geometry'],
        name=subset.iloc[ix]['NAME'],
        tooltip=subset.iloc[ix]['NAME']
    ).add_to(loc_map)

# Add the lat/long internal point for each city
for ix in range(subset["NAME"].count()) :
    folium.CircleMarker(
        [subset.iloc[ix]['INTPTLAT'].item(), subset.iloc[ix]['INTPTLON'].item()],
        radius=5, # size of the circle marker
        color='yellow',
        fill=True,
        fill_color='blue',
        tooltip=subset.iloc[ix]['NAMELSAD'],
        fill_opacity=0.6
    ).add_to(loc_map)
    
folium.LayerControl().add_to(loc_map)
loc_map
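
One detail that is easy to trip over in the map cell: folium markers take `[latitude, longitude]`, while GeoJSON stores coordinates as `[longitude, latitude]`. The sketch below builds a small GeoJSON `FeatureCollection` by hand to show that ordering. The helper names and the coordinates are illustrative, and passing the resulting dict to `folium.GeoJson` is only mentioned in a comment, not executed.

```python
import json


def point_feature(name, lat, lon):
    """GeoJSON Feature for a point; note the [lon, lat] coordinate order."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }


def feature_collection(features):
    """Wrap a list of features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Illustrative internal points given as (name, latitude, longitude).
cities = [("Chicago", 41.84, -87.68), ("Naperville", 41.75, -88.16)]
collection = feature_collection(point_feature(*city) for city in cities)

print(json.dumps(collection, indent=2))
# A dict like this could be handed to folium.GeoJson(collection).add_to(loc_map).
```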

## PySAL library
See: https://pysal.org

PySAL is an open-source project designed to support spatial data science. It is released under the modified BSD license.

In [None]:
# Issue with pysal 2.1.0: it does not find the libspatialindex_c library file
# (the error is raised by `import spaghetti`)
# Doc: https://pysal.readthedocs.io/en/latest/users/index.html
# !pip install --upgrade -U pysal
# !pip uninstall -y pysal
!pip install pysal==2.0.0 >pippysal.txt 2>&1
import pysal

In [None]:
# List the file types supported
pysal.lib.io.fileio.FileIO.check()

In [None]:
# Read a .dbf file
db = pysal.lib.io.fileio.FileIO('./tl_2019_17_place.dbf','r')
db.header

In [None]:
db.field_spec

In [None]:
# Read the first 10 records, then show the first one
xxx=db.read(10)
xxx[0]
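
The header and field specification shown above come from the fixed-layout header at the start of the `.dbf` file. The sketch below parses that header with the standard library only, assuming the usual dBASE layout (a 32-byte file header followed by 32-byte field descriptors ending with a `0x0D` byte). It runs on a tiny header built in memory, not on the Census file, and `parse_dbf_fields` and the two builder helpers are hypothetical names.

```python
import struct


def parse_dbf_fields(data):
    """Return (record_count, [(name, type, length, decimals), ...]) from a DBF header."""
    record_count, header_len, record_len = struct.unpack("<IHH", data[4:12])
    fields = []
    offset = 32                          # field descriptors start after the file header
    while data[offset] != 0x0D:          # 0x0D terminates the descriptor list
        descriptor = data[offset:offset + 32]
        name = descriptor[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, chr(descriptor[11]), descriptor[16], descriptor[17]))
        offset += 32
    return record_count, fields


def make_descriptor(name, field_type, length, decimals=0):
    """Build one 32-byte field descriptor for the demo header."""
    return struct.pack("<11sc4xBB14x", name.encode("ascii"),
                       field_type.encode("ascii"), length, decimals)


def make_header(descriptors, record_count, record_len):
    """Build a 32-byte file header plus descriptors and the terminator byte."""
    header_len = 32 + 32 * len(descriptors) + 1
    head = struct.pack("<BBBBIHH20x", 3, 119, 1, 1, record_count, header_len, record_len)
    return head + b"".join(descriptors) + b"\x0d"


# Two fields from the record layout table: a 2-character string and a 14-digit number.
demo = make_header([make_descriptor("STATEFP", "C", 2), make_descriptor("ALAND", "N", 14)],
                   record_count=0, record_len=1 + 2 + 14)
print(parse_dbf_fields(demo))
```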

In [None]:
# Read a shp file
shp = pysal.lib.io.fileio.FileIO('./tl_2019_17_place.shp')
print(str(shp[0].vertices))
print("Shape FileIO type: " + str(type(shp)))
print("Record type: " + str(type(shp[0])))
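
For context on what a `.shp` reader sees first, the sketch below decodes the 100-byte main file header with the standard library: a big-endian file code of 9994, the file length in 16-bit words, then a little-endian version, shape type and bounding box. It is a minimal illustration run on a header built in memory with made-up bounds, and `parse_shp_header` is a hypothetical name rather than anything PySAL exposes.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header):
    """Decode the fixed 100-byte header of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (length_words,) = struct.unpack(">i", header[24:28])
    version, shape_type = struct.unpack("<ii", header[28:36])
    bounds = struct.unpack("<4d", header[36:68])      # xmin, ymin, xmax, ymax
    return {
        "length_bytes": length_words * 2,             # length is stored in 16-bit words
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, str(shape_type)),
        "bounds": bounds,
    }


# Demo header: file code, five unused ints, file length, then the little-endian part.
demo_header = (
    struct.pack(">i20xi", 9994, 50)
    + struct.pack("<ii4d", 1000, 5, -91.5, 37.0, -87.0, 42.5)
    + struct.pack("<4d", 0.0, 0.0, 0.0, 0.0)          # Z and M ranges, unused here
)
info = parse_shp_header(demo_header)
print(info)
```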