# Haystacks AI Project 4 Group 1: Quantitative Explainability Solution

## GeoJSON Test Run

The purpose of this notebook is to verify that the GeoJSON files from the following sources work for generating choropleth maps:

Counties Georgia 2010 (Princeton University Library): <br>https://maps.princeton.edu/catalog/tufts-gacounties10

UA Census Zip Code Tabulation Areas, 2000 - Georgia (ibid.): <br>https://maps.princeton.edu/catalog/harvard-tg00gazcta

<a id=toc></a>
## Table of Contents

<ul>
    <li><a href=#01-data-load>Initial Data Load</a>
    <li><a href=#02-geojson-map>GeoJSON Choropleth Map Generation</a>
        <ul>
            <li><a href=#02-a-install-load>Install (if Applicable) and Load Necessary Packages</a>
        </ul>
        <ul>
            <li><a href=#02-b-ga-counties>Georgia Counties</a>
        </ul>
        <ul>
            <li><a href=#02-c-ga-zipcodes>Georgia Zip Codes</a>
        </ul>
</ul>

<a id=01-data-load></a>
## Initial Data Load

Load the housing .csv into a **pandas** DataFrame, reading `county` and `zipcode` as strings, to bring in the following features:

<ol>
    <li><b>latitude</b></li>
    <li><b>longitude</b></li>
    <li><b>county</b></li>
    <li><b>zipcode</b></li>
</ol>

In [None]:
import pandas as pd
import numpy as np

# Read county and zipcode as strings so they can be matched against the GeoJSON feature properties
haystacks_ga_data = pd.read_csv('data/haystacks_ga_clean_new_format.csv', dtype={"county": str, "zipcode": str})
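The `dtype` argument matters because pandas infers column types on its own. The sketch below uses a hypothetical two-row CSV (not the real file) to show that, without `dtype`, `zipcode` is read as integers, while `dtype=str` keeps it as text, which is the form needed to compare it with identifiers stored as strings in a GeoJSON (an assumption about how these particular GeoJSONs store their IDs).

```python
import io

import pandas as pd

# Hypothetical CSV standing in for data/haystacks_ga_clean_new_format.csv
csv_text = "county,zipcode\nFulton,30303\nDeKalb,30030\n"

# Without a dtype, pandas infers zipcode as an integer column
inferred = pd.read_csv(io.StringIO(csv_text))
# With dtype=str, county and zipcode stay as text
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"county": str, "zipcode": str})

print(inferred["zipcode"].dtype)    # an integer dtype (int64 on most platforms)
print(as_text["zipcode"].tolist())  # ['30303', '30030']
```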

In [None]:
# Show up to 27 columns when displaying DataFrames
pd.set_option('display.max_columns', 27)

In [None]:
haystacks_ga_data.head()

<a href=#toc>Back to the top</a>

<a id=02-geojson-map></a>
## GeoJSON Choropleth Map Generation

The data is now ready for generating choropleth maps.

<a id=02-a-install-load></a>
### Install (if Applicable) and Load Necessary Packages

In [None]:
# Install these packages and dependencies:
# conda install -c plotly plotly-geo
# conda install -c conda-forge pyshp
# conda install -c conda-forge geopandas

from plotly.offline import init_notebook_mode, plot, iplot
import plotly.express as px
import geopandas as gpd
import json

init_notebook_mode(connected=True)

<a href=#toc>Back to the top</a>

<a id=02-b-ga-counties></a>
### Georgia Counties

First, as an example, create a DataFrame that counts the number of homes in each county.

In [None]:
# Count the number of houses in each county
# (rename_axis/reset_index(name=...) gives the same column names in pandas 1.x and 2.x)
ga_county_count = (haystacks_ga_data['county'].value_counts()
                   .rename_axis('county')
                   .reset_index(name='num_houses'))
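The same count is needed later for zip codes, so a small helper can build the two-column table in one place. This is a sketch, not part of the original notebook: `count_houses_by` is a hypothetical name, and the toy DataFrame stands in for `haystacks_ga_data`. Naming both columns explicitly avoids depending on what `value_counts()` calls its output, which changed between pandas 1.x (the column name) and 2.x (`count`).

```python
import pandas as pd


def count_houses_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a two-column DataFrame: the grouping column and num_houses."""
    return (
        df[column]
        .value_counts()
        .rename_axis(column)               # name the index after the grouping column
        .reset_index(name="num_houses")    # name the counts column explicitly
    )


# Toy data standing in for haystacks_ga_data
houses = pd.DataFrame({"county": ["Fulton", "Fulton", "Cobb"],
                       "zipcode": ["30303", "30303", "30060"]})
print(count_houses_by(houses, "county"))
```

With the notebook's data, `count_houses_by(haystacks_ga_data, 'zipcode')` would produce the zip code table in the same shape.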

Next, load in the GeoJSON for the counties of Georgia.

In [None]:
# Set the filepath and load the GeoJSON into a GeoDataFrame
# Source dataset:
# https://maps.princeton.edu/catalog/tufts-gacounties10
ga_counties = "data/geojson/tufts-gacounties10-geojson.json"
map_ga_counties = gpd.read_file(ga_counties)

In [None]:
# Check the coordinate reference system; plotly expects longitude/latitude (WGS84, EPSG:4326)
map_ga_counties.crs
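The `.crs` attribute reports what the file declares; a complementary check is to look at the coordinates themselves. geopandas provides this as `map_ga_counties.total_bounds`. The sketch below computes the same bounding box from a raw GeoJSON dict using only the standard library (`geojson_bounds` is a hypothetical helper, and the toy triangle is not real data). For Georgia, the result should fall roughly between longitudes -85 and -81 and latitudes 30 and 35; values far outside that range would suggest the data is not in longitude/latitude.

```python
def geojson_bounds(geojson: dict) -> tuple:
    """Return (min_lon, min_lat, max_lon, max_lat) over every vertex.

    GeoJSON positions are [longitude, latitude] (RFC 7946), so the first
    number of each position is longitude and the second is latitude.
    """
    xs, ys = [], []

    def walk(coords) -> None:
        # A position is a sequence of numbers; anything else is a nested sequence
        if coords and isinstance(coords[0], (int, float)):
            xs.append(coords[0])
            ys.append(coords[1])
        else:
            for part in coords:
                walk(part)

    for feature in geojson["features"]:
        geometry = feature.get("geometry")
        if geometry is None:  # features without geometry contribute no vertices
            continue
        walk(geometry["coordinates"])
    return min(xs), min(ys), max(xs), max(ys)


# Toy one-triangle FeatureCollection; with the notebook data one could pass
# map_ga_counties.__geo_interface__ instead
toy = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name10": "Toy"},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[-84.0, 33.0], [-83.0, 33.0],
                                      [-83.0, 34.0], [-84.0, 33.0]]]},
    }],
}
print(geojson_bounds(toy))  # (-84.0, 33.0, -83.0, 34.0)
```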

Then, create the choropleth map in **plotly express**.

In [None]:
# plotly and geopandas are both needed to produce these choropleths
# plotly express draws each region from GeoJSON geometry; the GeoDataFrame can be
# passed as geojson because geopandas exposes its geometry in GeoJSON form
# featureidkey names the GeoJSON property whose values must match the 'county' column
# More here: https://plotly.com/python/mapbox-county-choropleth/
# https://plotly.github.io/plotly.py-docs/generated/plotly.express.choropleth_mapbox.html
# https://stackoverflow.com/questions/67362742/geojson-issues-with-plotly-choropleth-mapbox
# https://community.plotly.com/t/choroplethmapbox-does-not-show/41229/6
ga_counties_fig = px.choropleth_mapbox(ga_county_count, geojson=map_ga_counties, locations='county', color='num_houses',
                                       color_continuous_scale="Viridis", #range_color=(0, 12),
                                       mapbox_style="carto-positron", zoom=5.5,
                                       # Geographic center of Georgia:
                                       # https://georgiahistory.com/ghmi_marker_updated/geographic-center-of-georgia/
                                       center={"lat": 32.6461, "lon": -83.4317},
                                       opacity=0.5, labels={'num_houses': 'Number of Houses'},
                                       featureidkey='properties.name10')
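A region is shaded only when a value in the `locations` column (`county` here) equals the value found at the `featureidkey` path (`properties.name10`) in some GeoJSON feature; plotly's default path is `id`. The plain-Python sketch below mimics that lookup to show which location values would find a shape. It illustrates the documented matching rule rather than reproducing plotly's internal code, and `feature_ids`, the toy FeatureCollection, and the `"Fulton County"` spelling are all hypothetical.

```python
def feature_ids(geojson: dict, featureidkey: str = "id") -> list:
    """Return the value at the dotted featureidkey path in each feature.

    Missing keys along the path yield None for that feature.
    """
    ids = []
    for feature in geojson["features"]:
        value = feature
        for key in featureidkey.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        ids.append(value)
    return ids


# Toy FeatureCollection with county names stored under properties.name10
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name10": "Fulton"}, "geometry": None},
        {"type": "Feature", "properties": {"name10": "Cobb"}, "geometry": None},
    ],
}
known = set(feature_ids(toy_geojson, "properties.name10"))
locations = ["Fulton", "Cobb", "Fulton County"]  # hypothetical 'county' values
print([loc for loc in locations if loc not in known])  # ['Fulton County']
```

Any value printed by the last line would have no shape to color, even though it has houses.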

Finally, plot the map.

In [None]:
iplot(ga_counties_fig)

# If iplot doesn't show a figure, uncomment and run the code below
# plot(ga_counties_fig)

Some counties are left blank on the map: either they are missing from the GeoJSON itself, or the dataset contains no houses in them.
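The two explanations can be told apart by comparing identifiers directly, and the comparison also catches a third possibility: names that exist on both sides but are spelled differently (for example, in case). The sketch below is a hypothetical helper, not part of the original notebook. With the real data it would be called as `compare_ids(haystacks_ga_data['county'].dropna(), map_ga_counties['name10'])`, assuming geopandas exposes the `name10` property as a column (it turns GeoJSON properties into columns on `read_file`).

```python
def compare_ids(data_ids, geojson_ids) -> dict:
    """Split identifiers into those the map cannot draw and those with no data."""
    # Drop None and NaN (NaN is the only value not equal to itself)
    data_set = {x for x in data_ids if x is not None and x == x}
    geo_set = {x for x in geojson_ids if x is not None and x == x}
    return {
        # Values in the data with no matching GeoJSON feature: never drawn
        "unmatched_in_data": sorted(data_set - geo_set),
        # GeoJSON features with no houses in the data: left unfilled on the map
        "no_houses": sorted(geo_set - data_set),
    }


# Toy example: "fulton" differs from "Fulton" only by case, and "Clay" has no houses
report = compare_ids(["Fulton", "Cobb", "fulton"], ["Fulton", "Cobb", "Clay"])
print(report)  # {'unmatched_in_data': ['fulton'], 'no_houses': ['Clay']}
```

A non-empty `unmatched_in_data` points to a naming or coverage mismatch; a non-empty `no_houses` points to counties with no sales.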

<a href=#toc>Back to the top</a>

<a id=02-c-ga-zipcodes></a>
### Georgia Zip Codes

First, as an example, create a DataFrame that counts the number of homes in each zip code.

In [None]:
# Count the number of houses in each zip code
# (rename_axis/reset_index(name=...) gives the same column names in pandas 1.x and 2.x)
ga_zipcode_count = (haystacks_ga_data['zipcode'].value_counts()
                    .rename_axis('zipcode')
                    .reset_index(name='num_houses'))

Next, load in the GeoJSON for the zip codes of Georgia.

In [None]:
# Set the filepath and load the GeoJSON into a GeoDataFrame
# Source dataset:
# https://maps.princeton.edu/catalog/harvard-tg00gazcta
ga_zipcodes = "data/geojson/harvard-tg00gazcta-geojson.json"
map_ga_zipcodes = gpd.read_file(ga_zipcodes)
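The choropleth below uses `featureidkey='properties.ZCTA'`, which assumes every zip code feature carries a `ZCTA` property. One way to confirm the exact property name, and whether its values are text or numbers, is to read the raw file with the standard `json` module (`map_ga_zipcodes.columns` gives a similar view through geopandas). The helper name and the toy FeatureCollection below are hypothetical.

```python
import io
import json


def first_feature_properties(fp) -> dict:
    """Return the properties of the first feature in a GeoJSON FeatureCollection."""
    collection = json.load(fp)
    features = collection.get("features", [])
    if not features:
        return {}
    return dict(features[0].get("properties") or {})


# Toy FeatureCollection standing in for the real file; with the notebook data:
#   with open(ga_zipcodes, encoding="utf-8") as f:
#       print(first_feature_properties(f))
toy_file = io.StringIO(json.dumps({
    "type": "FeatureCollection",
    "features": [{"type": "Feature", "properties": {"ZCTA": "30303"}, "geometry": None}],
}))
props = first_feature_properties(toy_file)
print(props)                        # {'ZCTA': '30303'}
print(type(props["ZCTA"]).__name__)  # str
```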

In [None]:
# Check the coordinate reference system; plotly expects longitude/latitude (WGS84, EPSG:4326)
map_ga_zipcodes.crs

Then, create the choropleth map in **plotly express**.

In [None]:
# plotly and geopandas are both needed to produce these choropleths
# plotly express draws each region from GeoJSON geometry; the GeoDataFrame can be
# passed as geojson because geopandas exposes its geometry in GeoJSON form
# featureidkey names the GeoJSON property whose values must match the 'zipcode' column
# More here: https://plotly.com/python/mapbox-county-choropleth/
# https://plotly.github.io/plotly.py-docs/generated/plotly.express.choropleth_mapbox.html
# https://stackoverflow.com/questions/67362742/geojson-issues-with-plotly-choropleth-mapbox
# https://community.plotly.com/t/choroplethmapbox-does-not-show/41229/6
ga_zipcodes_fig = px.choropleth_mapbox(ga_zipcode_count, geojson=map_ga_zipcodes, locations='zipcode', color='num_houses',
                                       color_continuous_scale="Viridis", #range_color=(0, 12),
                                       mapbox_style="carto-positron", zoom=5.5,
                                       # Geographic center of Georgia:
                                       # https://georgiahistory.com/ghmi_marker_updated/geographic-center-of-georgia/
                                       center={"lat": 32.6461, "lon": -83.4317},
                                       opacity=0.5, labels={'num_houses': 'Number of Houses'},
                                       featureidkey='properties.ZCTA')

Finally, plot the map.

In [None]:
iplot(ga_zipcodes_fig)

# If iplot doesn't show a figure, uncomment and run the code below
# plot(ga_zipcodes_fig)

The same problem persists: some zip codes are left blank. Possible causes are inconsistencies in the GeoJSON itself (its ZCTA boundaries date from the 2000 Census) or zip codes in which no houses were sold.
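If the blank areas are zip codes with no sales, one option is to list them explicitly with a count of zero so they are colored at the low end of the scale instead of left unfilled. This is a sketch under stated assumptions: `counts_with_zeros` is a hypothetical helper, the counts table is assumed to have a `num_houses` column, and the full list of identifiers is assumed to come from the GeoJSON (for example `map_ga_zipcodes['ZCTA']`, if geopandas exposes that property as a column).

```python
import pandas as pd


def counts_with_zeros(counts: pd.DataFrame, id_column: str, all_ids) -> pd.DataFrame:
    """Return counts for every id in all_ids, using 0 where no houses were sold.

    Ids present in `counts` but absent from `all_ids` are dropped, since the
    map has no shape for them anyway.
    """
    unique_ids = list(dict.fromkeys(all_ids))  # de-duplicate, keep order
    return (
        counts.set_index(id_column)["num_houses"]
        .reindex(pd.Index(unique_ids, name=id_column), fill_value=0)
        .reset_index()
    )


# Toy data; in the notebook one might pass ga_zipcode_count, 'zipcode',
# and the ZCTA identifiers from map_ga_zipcodes
counts = pd.DataFrame({"zipcode": ["30303", "30060"], "num_houses": [2, 1]})
print(counts_with_zeros(counts, "zipcode", ["30303", "30060", "31601"]))
```

Zero-filling only addresses the "no sales" explanation; zip codes missing from the GeoJSON stay blank regardless, so it is worth checking which explanation applies before relying on it.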

<a href=#toc>Back to the top</a>