# Investigating SNAP usage levels across states
**Authored by Stephanie Chang**

To start off our project, we wanted to examine SNAP usage in each state.

## SNAP Usage Data
I got my data on the income and benefits of SNAP recipients in each state from the US Census Bureau COVID-19 site at https://covid19.census.gov/datasets/56f051341b4a40aa9aa5e2c33f85547a_2/data?geometry=109.827%2C-16.868%2C-109.196%2C72.108.

### Import Data
I will import the libraries that I need and the geojson data file.

In [4]:
import geopandas as gpd
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd

In [6]:
snap = gpd.read_file('dataexploration/Income_and_Benefits_-_States.geojson')

### Cleaning up data
Because I don't need all the data and I want to rename my columns, I take the steps below.

In [7]:
list(snap)

['OBJECTID',
 'GEO_ID',
 'GEO_NAME',
 'FIPS_CODE',
 'B11001_001E',
 'B11001_001M',
 'DP03_0052E',
 'DP03_0052M',
 'DP03_0053E',
 'DP03_0053M',
 'DP03_0054E',
 'DP03_0054M',
 'DP03_0055E',
 'DP03_0055M',
 'DP03_0056E',
 'DP03_0056M',
 'DP03_0057E',
 'DP03_0057M',
 'DP03_0058E',
 'DP03_0058M',
 'DP03_0070E',
 'DP03_0070M',
 'DP03_0072E',
 'DP03_0072M',
 'DP03_0074E',
 'DP03_0074M',
 'HOUSELT75KP_CALC',
 'INCLT75E_CALC',
 'INCLT75M_CALC',
 'DP03_0074PE',
 'DP03_0074PM',
 'geometry']

Then, I choose the columns I want to keep.

In [9]:
columns_to_keep = ['GEO_ID','GEO_NAME','B11001_001E','DP03_0074E','DP03_0074PE','geometry']

In [10]:
snap = snap[columns_to_keep]

I check to see if I have all the columns I wanted to keep.

In [11]:
snap.head()

,GEO_ID,GEO_NAME,B11001_001E,DP03_0074E,DP03_0074PE,geometry
0,0400000US01,Alabama,1860269,269603,14.5,"MULTIPOLYGON (((-88.08682 30.25987, -88.07486 ..."
1,0400000US02,Alaska,253462,26868,10.6,"MULTIPOLYGON (((-179.09763 51.22613, -179.1268..."
2,0400000US04,Arizona,2524300,298375,11.8,"POLYGON ((-109.04518 36.99898, -109.04518 36.9..."
3,0400000US05,Arkansas,1152175,146798,12.7,"POLYGON ((-89.73310 36.00061, -89.73305 36.000..."
4,0400000US06,California,12965435,1184714,9.1,"MULTIPOLYGON (((-118.56454 33.01864, -118.5591..."
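As a quick sanity check on the columns kept above, the percentage column can be recomputed from the two count columns. This is a minimal sketch that assumes `DP03_0074PE` equals `DP03_0074E` divided by `B11001_001E`, times 100, rounded to one decimal place. The small DataFrame `sample` is a hypothetical stand-in built from the five rows shown in the output, not the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for `snap`, using the five rows shown by snap.head()
sample = pd.DataFrame({
    "GEO_NAME": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "B11001_001E": [1860269, 253462, 2524300, 1152175, 12965435],  # total households
    "DP03_0074E": [269603, 26868, 298375, 146798, 1184714],        # households with SNAP
    "DP03_0074PE": [14.5, 10.6, 11.8, 12.7, 9.1],                  # published percentage
})

# Assumption: the published percentage is households with SNAP / total households
sample["recomputed"] = (sample["DP03_0074E"] / sample["B11001_001E"] * 100).round(1)

# Compare with a small tolerance instead of exact float equality
sample["matches"] = (sample["recomputed"] - sample["DP03_0074PE"]).abs() < 0.05

print(sample[["GEO_NAME", "DP03_0074PE", "recomputed", "matches"]])
```

If every row matches, the two count columns and the percentage column are consistent with each other, at least for these rows.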


I list all my columns again in order to rename my columns.

In [12]:
list(snap)

['GEO_ID', 'GEO_NAME', 'B11001_001E', 'DP03_0074E', 'DP03_0074PE', 'geometry']

In [13]:
snap.columns = ['GeoID',
 'Name',
 'Total Households',
 'Total Households with SNAP',
 'Percentage of Households with SNAP',
 'geometry']
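Assigning a list to `snap.columns` only works if the columns stay in exactly this order. A minimal sketch of an alternative that renames by name with `DataFrame.rename`; the one-row DataFrame `demo` and the dictionary `new_names` are hypothetical stand-ins for illustration (the geometry column is left out to keep the example small).

```python
import pandas as pd

# Hypothetical one-row stand-in for `snap`, without the geometry column
demo = pd.DataFrame({
    "GEO_ID": ["0400000US01"],
    "GEO_NAME": ["Alabama"],
    "B11001_001E": [1860269],
    "DP03_0074E": [269603],
    "DP03_0074PE": [14.5],
})

# Map each original column name to its readable name
new_names = {
    "GEO_ID": "GeoID",
    "GEO_NAME": "Name",
    "B11001_001E": "Total Households",
    "DP03_0074E": "Total Households with SNAP",
    "DP03_0074PE": "Percentage of Households with SNAP",
}

# rename() matches by column name, so column order does not matter
demo = demo.rename(columns=new_names)
print(list(demo))
```

Columns that are not listed in the mapping, such as `geometry`, would keep their original names.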

I noticed that this dataset included Puerto Rico, which is not considered a state. I found that in other datasets, it wasn't part of the data, so I wanted to remove it from this dataset.

I first show all of my rows to figure out which row Puerto Rico is in.

In [14]:
pd.set_option("display.max_rows", None)  # full option name; the short form can be ambiguous in newer pandas
snap

,GeoID,Name,Total Households,Total Households with SNAP,Percentage of Households with SNAP,geometry
0,0400000US01,Alabama,1860269,269603,14.5,"MULTIPOLYGON (((-88.08682 30.25987, -88.07486 ..."
1,0400000US02,Alaska,253462,26868,10.6,"MULTIPOLYGON (((-179.09763 51.22613, -179.1268..."
2,0400000US04,Arizona,2524300,298375,11.8,"POLYGON ((-109.04518 36.99898, -109.04518 36.9..."
3,0400000US05,Arkansas,1152175,146798,12.7,"POLYGON ((-89.73310 36.00061, -89.73305 36.000..."
4,0400000US06,California,12965435,1184714,9.1,"MULTIPOLYGON (((-118.56454 33.01864, -118.5591..."
5,0400000US08,Colorado,2113387,168243,8.0,"POLYGON ((-104.05326 41.00141, -104.05154 41.0..."
6,0400000US09,Connecticut,1367374,167022,12.2,"MULTIPOLYGON (((-73.38574 41.05926, -73.42217 ..."
7,0400000US10,Delaware,357765,41634,11.6,"MULTIPOLYGON (((-75.54261 39.49658, -75.54269 ..."
8,0400000US11,District of Columbia,281322,39043,13.9,"POLYGON ((-77.03901 38.79165, -77.03899 38.792..."
9,0400000US12,Florida,7621760,1080766,14.2,"MULTIPOLYGON (((-82.10512 24.59115, -82.10215 ..."


Then, I drop the row with index 51, which is Puerto Rico.

In [15]:
snap = snap.drop([51])
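Dropping by index label works as long as Puerto Rico stays at index 51. A minimal sketch of an alternative that filters on the `Name` column instead, so it does not depend on row position; the small DataFrame `demo` is a hypothetical stand-in for `snap`.

```python
import pandas as pd

# Hypothetical stand-in for `snap`, with only the Name column
demo = pd.DataFrame(
    {"Name": ["West Virginia", "Wyoming", "Puerto Rico"]},
    index=[48, 50, 51],
)

# Keep every row whose name is not Puerto Rico
states_only = demo[demo["Name"] != "Puerto Rico"]
print(states_only)
```

The remaining rows keep their original index labels, just as they do after `drop`.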

I check again to see if Puerto Rico has been dropped.

In [16]:
pd.set_option("display.max_rows", None)  # full option name; the short form can be ambiguous in newer pandas
snap

,GeoID,Name,Total Households,Total Households with SNAP,Percentage of Households with SNAP,geometry
0,0400000US01,Alabama,1860269,269603,14.5,"MULTIPOLYGON (((-88.08682 30.25987, -88.07486 ..."
1,0400000US02,Alaska,253462,26868,10.6,"MULTIPOLYGON (((-179.09763 51.22613, -179.1268..."
2,0400000US04,Arizona,2524300,298375,11.8,"POLYGON ((-109.04518 36.99898, -109.04518 36.9..."
3,0400000US05,Arkansas,1152175,146798,12.7,"POLYGON ((-89.73310 36.00061, -89.73305 36.000..."
4,0400000US06,California,12965435,1184714,9.1,"MULTIPOLYGON (((-118.56454 33.01864, -118.5591..."
5,0400000US08,Colorado,2113387,168243,8.0,"POLYGON ((-104.05326 41.00141, -104.05154 41.0..."
6,0400000US09,Connecticut,1367374,167022,12.2,"MULTIPOLYGON (((-73.38574 41.05926, -73.42217 ..."
7,0400000US10,Delaware,357765,41634,11.6,"MULTIPOLYGON (((-75.54261 39.49658, -75.54269 ..."
8,0400000US11,District of Columbia,281322,39043,13.9,"POLYGON ((-77.03901 38.79165, -77.03899 38.792..."
9,0400000US12,Florida,7621760,1080766,14.2,"MULTIPOLYGON (((-82.10512 24.59115, -82.10215 ..."


### Sorting
To find out the states with the highest and lowest percentages of households with SNAP, I will sort my data.

In [17]:
snap_sorted = snap.sort_values(by='Percentage of Households with SNAP',ascending = False)

In [19]:
snap_sorted[['GeoID','Name','Percentage of Households with SNAP']].head(5)

,GeoID,Name,Percentage of Households with SNAP
31,0400000US35,New Mexico,16.9
37,0400000US41,Oregon,16.8
48,0400000US54,West Virginia,16.6
24,0400000US28,Mississippi,16.5
39,0400000US44,Rhode Island,16.0


The 5 states with the highest percentages of households with SNAP are New Mexico, Oregon, West Virginia, Mississippi, and Rhode Island.

In [20]:
snap_sorted_opp = snap.sort_values(by='Percentage of Households with SNAP',ascending = True)

In [21]:
snap_sorted_opp[['GeoID','Name','Percentage of Households with SNAP']].head(5)

,GeoID,Name,Percentage of Households with SNAP
50,0400000US56,Wyoming,5.7
34,0400000US38,North Dakota,6.9
44,0400000US49,Utah,7.2
29,0400000US33,New Hampshire,7.2
5,0400000US08,Colorado,8.0


The 5 states with the lowest percentages of households with SNAP are Wyoming, North Dakota, Utah, New Hampshire, and Colorado.
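The same two rankings can also be pulled without building two sorted copies, using `DataFrame.nlargest` and `DataFrame.nsmallest`. This is a minimal sketch; the DataFrame `demo` is a hypothetical stand-in that contains only the ten states listed in the two outputs above, and the shortened column name `Pct` is chosen here for brevity.

```python
import pandas as pd

# Hypothetical stand-in: only the ten states shown in the outputs above
demo = pd.DataFrame({
    "Name": ["New Mexico", "Oregon", "West Virginia", "Mississippi", "Rhode Island",
             "Wyoming", "North Dakota", "Utah", "New Hampshire", "Colorado"],
    "Pct": [16.9, 16.8, 16.6, 16.5, 16.0, 5.7, 6.9, 7.2, 7.2, 8.0],
})

top5 = demo.nlargest(5, "Pct")      # five highest percentages, highest first
bottom5 = demo.nsmallest(5, "Pct")  # five lowest percentages, lowest first

print(top5)
print(bottom5)
```

Utah and New Hampshire are tied at 7.2, so their relative order in the result depends on how ties are handled and should not be read as a ranking between the two.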

## Bar Chart
Using the sorted data, I can create a bar chart that plots every state in descending order. This makes it easy to compare states with higher levels of SNAP usage against states with lower levels.

In [31]:
px.bar(snap_sorted,
       x='Name',
       y='Percentage of Households with SNAP',
       title='SNAP Usage Levels Per State')

## Map Visualization
I can now make an interactive map showing the level of SNAP usage in each state.

Because my dataset did not have a column for state codes, I had to translate each full state name into its two-letter state code.

In [23]:
Name = {
    'District of Columbia': 'DC', 'Mississippi': 'MS', 'Oklahoma': 'OK', 
    'Delaware': 'DE', 'Minnesota': 'MN', 'Illinois': 'IL', 'Arkansas': 'AR', 
    'New Mexico': 'NM', 'Indiana': 'IN', 'Maryland': 'MD', 'Louisiana': 'LA', 
    'Idaho': 'ID', 'Wyoming': 'WY', 'Tennessee': 'TN', 'Arizona': 'AZ', 
    'Iowa': 'IA', 'Michigan': 'MI', 'Kansas': 'KS', 'Utah': 'UT', 
    'Virginia': 'VA', 'Oregon': 'OR', 'Connecticut': 'CT', 'Montana': 'MT', 
    'California': 'CA', 'Massachusetts': 'MA', 'West Virginia': 'WV', 
    'South Carolina': 'SC', 'New Hampshire': 'NH', 'Wisconsin': 'WI',
    'Vermont': 'VT', 'Georgia': 'GA', 'North Dakota': 'ND', 
    'Pennsylvania': 'PA', 'Florida': 'FL', 'Alaska': 'AK', 'Kentucky': 'KY', 
    'Hawaii': 'HI', 'Nebraska': 'NE', 'Missouri': 'MO', 'Ohio': 'OH', 
    'Alabama': 'AL', 'Rhode Island': 'RI', 'South Dakota': 'SD', 
    'Colorado': 'CO', 'New Jersey': 'NJ', 'Washington': 'WA', 
    'North Carolina': 'NC', 'New York': 'NY', 'Texas': 'TX', 
    'Nevada': 'NV', 'Maine': 'ME'}

snap['code'] = snap['Name'].apply(lambda x : Name[x])
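The dictionary lookup in the lambda raises a `KeyError` for any name that is missing from the dictionary, which is what would happen if a row such as Puerto Rico were still present. A minimal sketch of a check that can be run first; `state_codes` and `names_in_data` are hypothetical, shortened stand-ins for the `Name` dictionary and the `snap['Name']` column.

```python
# Hypothetical, shortened stand-ins for the Name dictionary and snap['Name']
state_codes = {"Alabama": "AL", "Alaska": "AK", "District of Columbia": "DC"}
names_in_data = ["Alabama", "Alaska", "District of Columbia", "Puerto Rico"]

# Names that appear in the data but have no entry in the dictionary
missing = sorted(set(names_in_data) - set(state_codes))
print(missing)
```

An empty list means every name in the data has a code, so the lookup will not fail.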

In [25]:
import plotly.graph_objects as go
fig = go.Figure(data=go.Choropleth(
    locations=snap['code'], # Two-letter state codes
    z = snap['Percentage of Households with SNAP'].astype(float), # Data to be color-coded
    locationmode = 'USA-states', # set of locations match entries in `locations`
    colorscale = 'Greens',
    colorbar_title = "Percentage of Households with SNAP",
))

fig.update_layout(
    title_text = 'SNAP Usage Levels per State',
    geo_scope='usa', # limit map scope to USA
)

fig.show()

This map visualization shows states with higher percentages of households with SNAP in darker green and states with lower percentages in lighter green.