# Mapping
This notebook demonstrates how to map US Census data in Python.

We will show the following:

- Merge DataFrame with GeoDataFrame
- Static map
- Static map with basemap
- Choropleth map
- Choropleth map with Graduated Symbology
- Interactive map
- Interactive map with toggle between two different maps
- Web map deployment

In [1]:
# import necessary packages
import pandas as pd
import numpy as np
import geopandas as gpd
import contextily as ctx
import folium as f
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

In [7]:
# load data
df_acs_2020=pd.read_pickle("./data/df_acs_2020_cleaned.pkl")
gdf=gpd.read_file("./data/shp/Modified Zip Code Tabulation Areas (MODZCTA)_20240418/geo_export_bdb2fc16-3964-47c7-a04d-4d106b707aaf.shp")
# format cols
gdf.columns=[col.lower() for col in gdf.columns]

## Shapefile joining
Let's write a function that will:
1) Merge df with the shapefile - get the Census data into the same DataFrame as the shapefile geometry.
2) Convert to GeoDataFrame - make the merged DataFrame mappable.
3) Transform crs to 2263 - EPSG:2263 (NAD83 / New York Long Island, US survey feet) is a projected coordinate reference system commonly used for NYC.
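Because a pandas merge is an inner join by default, rows whose key appears in only one table are silently dropped in step 1. The sketch below is one way to check key overlap before merging. It uses made-up toy tables (`census` and `shapes` are hypothetical names, and the ZIP values are invented); only the `zip` and `modzcta` column names come from this notebook.

```python
import pandas as pd

# Toy stand-ins (hypothetical data): a census table and the attribute
# table of a shapefile, keyed on 'zip' and 'modzcta' respectively.
census = pd.DataFrame(
    {"zip": ["10001", "10002", "99999"], "population": [100.0, 200.0, 300.0]}
)
shapes = pd.DataFrame({"modzcta": ["10001", "10002", "10003"]})

# An outer join with indicator=True adds a '_merge' column that says
# whether each key was found in the left table, the right table, or both.
check = census.merge(
    shapes, left_on="zip", right_on="modzcta", how="outer", indicator=True
)

# Keys that an inner join would drop from each side.
census_only = check.loc[check["_merge"] == "left_only", "zip"].tolist()
shapes_only = check.loc[check["_merge"] == "right_only", "modzcta"].tolist()
matched = int((check["_merge"] == "both").sum())

print("matched:", matched)
print("only in census table:", census_only)
print("only in shapefile:", shapes_only)
```

If either list is long, it may be worth checking the key columns for differing dtypes or formatting before mapping.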

In [4]:
def merge_and_transform(df, gdf, left_on='zip', right_on='modzcta', crs=2263):
    """
    Joins df (DataFrame) to gdf (GeoDataFrame) and reprojects the result to the given crs (coordinate reference system)

    Parameters:
    - df (DataFrame): DataFrame that will be left in the join
    - gdf (GeoDataFrame): GeoDataFrame that will be right in the join
    - left_on (str, Optional): The column the DataFrame will use to join, defaults to zip
    - right_on (str, Optional): The column the GeoDataFrame will use to join, defaults to modzcta
    - crs (int, Optional): EPSG code of the coordinate reference system to transform to, defaults to 2263

    Returns:
    Merged GeoDataFrame in the requested crs
    """
    # merge df with gdf (inner join: rows without a match on both sides are dropped)
    df_shp = df.merge(gdf, left_on=left_on, right_on=right_on)
    # convert back to a GeoDataFrame, keeping the shapefile's geometry column and crs
    gdf_merged = gpd.GeoDataFrame(df_shp, geometry=gdf.geometry.name, crs=gdf.crs)
    # reproject to the requested crs (previously hard-coded to 2263)
    gdf_merged = gdf_merged.to_crs(crs)

    return gdf_merged
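To see the three steps on something small, here is a minimal sketch that runs them inline on toy data. The two boxes, the ZIP values, and the names `toy_df`, `toy_gdf`, and `toy_merged` are all hypothetical; the real notebook uses the ACS table and the MODZCTA shapefile loaded above.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the shapefile: two small boxes in lon/lat (EPSG:4326).
toy_gdf = gpd.GeoDataFrame(
    {"modzcta": ["10001", "10002"]},
    geometry=[
        box(-74.00, 40.74, -73.99, 40.75),
        box(-73.99, 40.71, -73.98, 40.72),
    ],
    crs="EPSG:4326",
)

# Toy stand-in for the census table; '99999' has no matching polygon.
toy_df = pd.DataFrame(
    {"zip": ["10001", "10002", "99999"], "population": [100.0, 200.0, 300.0]}
)

# Step 1: inner-join the table to the geometries (drops '99999').
merged = toy_df.merge(toy_gdf, left_on="zip", right_on="modzcta")

# Step 2: DataFrame.merge returns a plain DataFrame, so wrap it again.
toy_merged = gpd.GeoDataFrame(merged, geometry="geometry", crs=toy_gdf.crs)

# Step 3: reproject from lon/lat to EPSG:2263.
toy_merged = toy_merged.to_crs(2263)

print(toy_merged.crs.to_epsg())
print(toy_merged[["zip", "population"]])
```

After reprojection the coordinates are no longer degrees, so lengths and areas computed from the geometries are in the units of the projected system.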

Apply the function to create a mappable GeoDataFrame.

In [10]:
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   modzcta   178 non-null    object  
 1   label     177 non-null    object  
 2   zcta      178 non-null    object  
 3   pop_est   178 non-null    float64 
 4   geometry  178 non-null    geometry
dtypes: float64(1), geometry(1), object(3)
memory usage: 7.1+ KB


In [11]:
df_acs_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   zcta                         214 non-null    object 
 1   population                   214 non-null    float64
 2   median_age                   214 non-null    float64
 3   median_household_income      214 non-null    float64
 4   poverty_level                214 non-null    float64
 5   total_households             214 non-null    float64
 6   total_households_no_vehicle  214 non-null    float64
 7   pop_25_older                 214 non-null    float64
 8   pop_25_older_hs_grad         214 non-null    float64
 9   pop_25_older_associates      214 non-null    float64
 10  pop_25_older_bachelors       214 non-null    float64
 11  pop_25_older_graduate        214 non-null    float64
 12  perc_poverty_level           185 non-null    float64
 13  perc_hh_w_vehicle   