https://eshanley.github.io/

https://github.com/eshanley

# Final Tutorial: Impact of Food Access on Overall Community Health

## Analysis by Erin Shanley 

### Data Sources

USDA Food Access Research Atlas: 
https://www.ers.usda.gov/data-products/food-access-research-atlas/download-the-data/

University of Wisconsin Population Health Institute: 2010 County Health Rankings National Data
https://www.countyhealthrankings.org/sites/default/files/2010%20County%20Health%20Rankings%20National%20Data_v2.xls


### Project Explanation and Plan

Since I am a graduate student, I will be working alone on this project to fulfill the extra requirement. The data sets I plan to work with come from the US Department of Agriculture and the University of Wisconsin Population Health Institute. Together they focus on food access across the US: they describe census tracts and counties in terms of their access to supermarkets and other sources of healthy, affordable food, along with overall mental health, physical health, and obesity rates.

While there are many ways to define which areas count as "food deserts" and many ways to measure food store access for individuals and neighborhoods, I have chosen a few initial parameters to begin the analysis: distance between housing units and grocery stores, density of grocery stores, density of fast food restaurants, income levels by county, access to transportation, and obesity rates by county. The USDA data gives a spatial overview of food access indicators for low-income and other census tracts, using 1-mile, 10-mile, and 20-mile distances to the nearest supermarket, along with vehicle availability for all census tracts.
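To make the distance demarcations concrete: as I understand the USDA's documentation, a tract counts as low access when at least 500 people, or at least 33 percent of the tract's population, live more than 1 mile (urban tracts) or 10 miles (rural tracts) from the nearest supermarket. The sketch below applies that rule; the thresholds are an assumption to check against the atlas documentation, `low_access_1_and_10` is a hypothetical helper, and the example rows are made up.

```python
import pandas as pd

# Thresholds as I understand the USDA low-access rule (an assumption to verify
# against the Food Access Research Atlas documentation)
MIN_PEOPLE = 500
MIN_SHARE = 0.33

def low_access_1_and_10(tracts: pd.DataFrame) -> pd.Series:
    """Flag tracts that are low access at 1 mile (urban) or 10 miles (rural).

    Expects the raw atlas columns Urban (1 = urban), POP2010, lapop1 and lapop10.
    """
    # Urban tracts use the 1-mile count, rural tracts the 10-mile count
    beyond = tracts["lapop1"].where(tracts["Urban"] == 1, tracts["lapop10"])
    share = beyond / tracts["POP2010"]
    return (beyond >= MIN_PEOPLE) | (share >= MIN_SHARE)

# Hypothetical tracts for illustration, not rows from the atlas
example = pd.DataFrame({
    "Urban":   [1,    1,    1,    0,   0],
    "POP2010": [1000, 4000, 1000, 800, 900],
    "lapop1":  [200,  600,  400,  800, 900],
    "lapop10": [0,    0,    0,    100, 450],
})
print(low_access_1_and_10(example).tolist())
# [False, True, True, False, True]
```

Running the helper on the raw file and comparing its output with the atlas's own low-access flags would be a quick way to confirm whether these thresholds match the USDA's.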

### Questions to Address 

1. Do the distance between housing units and supermarkets, and the number of supermarkets in a community, affect its general health, both mental and physical?
2. Does a county's average income affect the overall health of its community members?
3. Does access to transportation affect a community's general health, both mental and physical?
4. Do higher densities of fast food restaurants contribute to higher obesity rates in communities?
5. Are obesity rates higher or lower in areas with less access to healthy, affordable food options?




### Reading in Data
#### USDA Food Access Research Atlas Data
Note that this data refers to the 2010 census. 

```python
import numpy as np
import pandas as pd

df_food = pd.read_csv("USDA_data.csv")
df_food.head()
```

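| | CensusTract | State | County | Urban | POP2010 | OHU2010 | GroupQuartersFlag | NUMGQTRS | PCTGQTRS | LILATracts_1And10 | ... | TractSeniors | TractWhite | TractBlack | TractAsian | TractNHOPI | TractAIAN | TractOMultir | TractHispanic | TractHUNV | TractSNAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001020100 | Alabama | Autauga | 1 | 1912 | 693 | 0 | 0 | 0.0 | 0 | ... | 221 | 1622 | 217 | 14 | 0 | 14 | 45 | 44 | 26 | 112 |
| 1 | 1001020200 | Alabama | Autauga | 1 | 2170 | 743 | 0 | 181 | 0.08341 | 0 | ... | 214 | 888 | 1217 | 5 | 0 | 5 | 55 | 75 | 87 | 202 |
| 2 | 1001020300 | Alabama | Autauga | 1 | 3373 | 1256 | 0 | 0 | 0.0 | 0 | ... | 439 | 2576 | 647 | 17 | 5 | 11 | 117 | 87 | 108 | 120 |
| 3 | 1001020400 | Alabama | Autauga | 1 | 4386 | 1722 | 0 | 0 | 0.0 | 0 | ... | 904 | 4086 | 193 | 18 | 4 | 11 | 74 | 85 | 19 | 82 |
| 4 | 1001020500 | Alabama | Autauga | 1 | 10766 | 4082 | 0 | 181 | 0.016812 | 0 | ... | 1126 | 8666 | 1437 | 296 | 9 | 48 | 310 | 355 | 198 | 488 |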
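The `CensusTract` values above have 10 digits, while a census tract GEOID has 11 (2-digit state, 3-digit county, 6-digit tract). The leading zero of Alabama's state code `01` is missing, either because the file stores the IDs that way or because pandas parsed them as integers. If the IDs are needed as text, for example to pull out the county code, one option is to read the column as a string and pad it back to 11 digits. A minimal sketch, assuming the CSV excerpt below (two rows copied from the output above) stands in for `USDA_data.csv`; the `StateFIPS` and `CountyFIPS` column names are hypothetical.

```python
import io
import pandas as pd

# Two rows copied from the output above, standing in for USDA_data.csv
csv_text = """CensusTract,State,County,POP2010
1001020100,Alabama,Autauga,1912
1001020200,Alabama,Autauga,2170
"""

# Read the tract ID as text instead of letting pandas parse it as a number
tracts = pd.read_csv(io.StringIO(csv_text), dtype={"CensusTract": str})

# Pad back to the 11-digit GEOID: 2-digit state + 3-digit county + 6-digit tract
tracts["CensusTract"] = tracts["CensusTract"].str.zfill(11)
tracts["StateFIPS"] = tracts["CensusTract"].str[:2]
tracts["CountyFIPS"] = tracts["CensusTract"].str[:5]
print(tracts[["CensusTract", "StateFIPS", "CountyFIPS"]])
```

On the real file the same idea is `pd.read_csv("USDA_data.csv", dtype={"CensusTract": str})` followed by the padding step.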


```python
# Tidy the data: keep only the relevant columns and give some of them clearer names
# (OHU2010 counts occupied housing units; TractHUNV counts housing units without a vehicle)
df_food = df_food[["State", "County", "POP2010", "OHU2010", "PovertyRate",
                   "MedianFamilyIncome", "CensusTract",
                   "lapop1", "lapop10", "lapop20", "TractHUNV",
                   "LATracts1", "LATracts10", "LATracts20"]].rename(
    columns={"POP2010": "Population",
             "OHU2010": "TotalHouseUnits",
             "TractHUNV": "HouseUnitsNoVehicle"})
display(df_food.head())

# Store the low-access flags as object values rather than integers;
# astype returns a new DataFrame, so the result must be assigned back
df_food = df_food.astype({"LATracts1": "object", "LATracts10": "object", "LATracts20": "object"})
df_food.dtypes
```

| | State | County | Population | TotalHouseUnits | PovertyRate | MedianFamilyIncome | CensusTract | lapop1 | lapop10 | lapop20 | HouseUnitsNoVehicle | LATracts1 | LATracts10 | LATracts20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | Autauga | 1912 | 693 | 10.0 | 74750 | 1001020100 | 1357.48094 | 0.0 | 0.0 | 26 | 1 | 0 | 0 |
| 1 | Alabama | Autauga | 2170 | 743 | 18.2 | 51875 | 1001020200 | 483.429683 | 0.0 | 0.0 | 87 | 0 | 0 | 0 |
| 2 | Alabama | Autauga | 3373 | 1256 | 19.1 | 52905 | 1001020300 | 1417.874893 | 0.0 | 0.0 | 108 | 1 | 0 | 0 |
| 3 | Alabama | Autauga | 4386 | 1722 | 3.3 | 68079 | 1001020400 | 1363.466885 | 0.0 | 0.0 | 19 | 1 | 0 | 0 |
| 4 | Alabama | Autauga | 10766 | 4082 | 8.5 | 77819 | 1001020500 | 2643.095161 | 0.0 | 0.0 | 198 | 1 | 0 | 0 |


```
State                   object
County                  object
Population               int64
TotalHouseUnits          int64
PovertyRate            float64
MedianFamilyIncome       int64
CensusTract              int64
lapop1                 float64
lapop10                float64
lapop20                float64
HouseUnitsNoVehicle      int64
LATracts1               object
LATracts10              object
LATracts20              object
dtype: object
```

To start, and to keep the data manageable, I kept only a fraction of the columns available in this dataset. The ones I find most useful at this point are the census tract number, state, county, population count, count of occupied housing units, poverty rate, median family income, the low-access population at 1, 10, and 20 miles (`lapop1`, `lapop10`, `lapop20`), the low-access tract flags at those same distances (`LATracts1`, `LATracts10`, `LATracts20`), and the number of housing units without a vehicle. Each row is still one census tract: the rows are listed by state and county, but they have not yet been aggregated to the county level.
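Because the health data is reported per county, the tract rows will eventually need to be collapsed to one row per county. A minimal sketch with `groupby`, assuming the column names from the tidied `df_food` above; `to_county_level` and the derived column names are hypothetical. Counts can simply be summed, but median incomes cannot, so the sketch uses a population-weighted mean of tract medians, which is only an approximation of a county median. The example rows are the five Autauga tracts shown above, not the whole county.

```python
import pandas as pd

def to_county_level(tracts: pd.DataFrame) -> pd.DataFrame:
    """Collapse tract rows into one row per (State, County)."""
    has_income = tracts["MedianFamilyIncome"].notna()
    tracts = tracts.assign(
        # Helper columns for a population-weighted mean of tract median incomes,
        # using only tracts that report an income
        income_x_pop=(tracts["MedianFamilyIncome"] * tracts["Population"]).where(has_income, 0),
        income_pop=tracts["Population"].where(has_income, 0),
    )
    counties = tracts.groupby(["State", "County"], as_index=False)[
        ["Population", "TotalHouseUnits", "lapop1", "HouseUnitsNoVehicle",
         "income_x_pop", "income_pop"]
    ].sum()
    counties["ShareLowAccess1"] = counties["lapop1"] / counties["Population"]
    counties["ShareNoVehicle"] = counties["HouseUnitsNoVehicle"] / counties["TotalHouseUnits"]
    # Approximation only: a weighted mean of tract medians is not the county median
    counties["PopWeightedTractIncome"] = counties["income_x_pop"] / counties["income_pop"]
    return counties.drop(columns=["income_x_pop", "income_pop"])

# The five Autauga tracts from the output above (only part of the county)
autauga = pd.DataFrame({
    "State": ["Alabama"] * 5,
    "County": ["Autauga"] * 5,
    "Population": [1912, 2170, 3373, 4386, 10766],
    "TotalHouseUnits": [693, 743, 1256, 1722, 4082],
    "MedianFamilyIncome": [74750, 51875, 52905, 68079, 77819],
    "lapop1": [1357.48094, 483.429683, 1417.874893, 1363.466885, 2643.095161],
    "HouseUnitsNoVehicle": [26, 87, 108, 19, 198],
})
print(to_county_level(autauga))
```

On the real data this would be `to_county_level(df_food)`. Grouping on both state and county matters, since the same county name can occur in more than one state.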

#### County Health Data

```python
df_health = pd.read_csv("County_Health.csv", header=[0, 1])
# Give the unnamed top-level header above the county column a readable name
df_health.rename(columns={"Unnamed: 2_level_0": "County"}, inplace=True)

df_health.head()
```

|  |  |  | County | Premature death (Years of Potential Life Lost) |  |  |  |  |  |  | ... |  | Access to healthy foods |  |  |  |  | Liquor store density |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | **FIPS** | **State** | **County** | **Unreliable** | **Deaths** | **Aggregate Population** | **YPLL Rate** | **95% CI - Low** | **95% CI - High** | **Quartile** | ... | **Quartile** | **Unreliable** | **Zip Codes with Healthy Food** | **# Zip Codes** | **% Healthy Food** | **Quartile** | **Population** | **Liquor Stores** | **Liquor Store Rate** | **Quartile** |
| 0 | 1001.0 | Alabama | Autauga |  | 670.0 | 137881.0 | 9778.0 | 8786.0 | 10770.0 | 2 | ... | 1 |  | 2.0 | 8.0 | 25.0 | 4 | 49109.0 | 2.0 | 0.4 | 2 |
| 1 | 1003.0 | Alabama | Baldwin |  | 2148.0 | 449589.0 | 8222.0 | 7716.0 | 8727.0 | 1 | ... | 4 |  | 11.0 | 26.0 | 42.0 | 2 | 168233.0 | 13.0 | 0.8 | 3 |
| 2 | 1005.0 | Alabama | Barbour |  | 424.0 | 79450.0 | 10686.0 | 9263.0 | 12109.0 | 2 | ... | 1 |  | 2.0 | 5.0 | 40.0 | 2 | 28125.0 | 3.0 | 1.1 | 4 |
| 3 | 1007.0 | Alabama | Bibb |  | 373.0 | 60430.0 | 13070.0 | 11271.0 | 14868.0 | 4 | ... | 1 |  | 4.0 | 8.0 | 50.0 | 1 | 21341.0 | 1.0 | 0.5 | 2 |
| 4 | 1009.0 | Alabama | Blount |  | 787.0 | 155580.0 | 8930.0 | 8040.0 | 9819.0 | 1 | ... | 4 |  | 4.0 | 7.0 | 57.0 | 1 | 55811.0 |  | 0.0 | 1 |


I am having trouble working with the health data above because of its two-level (multi-index) header. I tried to merge the food access dataframe and the health dataframe into one, but the multi-level header in the health data caused an error. My next step is to organize the two into one cohesive dataframe.

Attempted:

```python
df_food.merge(df_health, how="inner", on="County")
```
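Two things likely stand in the way of that merge. First, `df_health` has a two-level column index while `df_food` has a single level, which is the error described above; flattening the health header into single strings first avoids it. Second, county names are not unique across states (Washington County, for example, exists in many states), so `on="County"` could pair tracts with the wrong county. A safer key is the county FIPS code, which the health data already has and which can be recovered from the leading digits of each census tract ID. The sketch below does both under stated assumptions: `flatten_columns` and `county_fips_from_tract` are hypothetical helpers, not pandas functions; a blank (`Unnamed: ...`) top-level header cell is assumed to belong to the nearest named group on its left, which is how merged spreadsheet header cells usually come through; and the small frames are stand-ins built from values in the outputs above.

```python
import pandas as pd

def flatten_columns(columns: pd.MultiIndex) -> list[str]:
    """Join a two-level header into single names such as 'Group - Measure'."""
    flat = []
    group = None  # most recent named top-level header, carried to the right
    for top, sub in columns:
        top, sub = str(top), str(sub)
        if not top.startswith("Unnamed:"):
            group = top
        if group is None or group == sub:
            flat.append(sub)
        else:
            flat.append(f"{group} - {sub}")
    return flat

def county_fips_from_tract(tract: pd.Series) -> pd.Series:
    # 11-digit tract GEOID = 2-digit state + 3-digit county + 6-digit tract, so
    # integer-dividing by 1,000,000 leaves the county FIPS (1001020100 -> 1001)
    return tract // 1_000_000

# A few of the health columns and rows from the output above
health_cols = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "FIPS"),
    ("Unnamed: 1_level_0", "State"),
    ("County", "County"),
    ("Premature death (Years of Potential Life Lost)", "Unreliable"),
    ("Unnamed: 4_level_0", "Deaths"),
])
health = pd.DataFrame(
    [[1001.0, "Alabama", "Autauga", None, 670.0],
     [1003.0, "Alabama", "Baldwin", None, 2148.0]],
    columns=health_cols,
)
health.columns = flatten_columns(health.columns)

# Two of the Autauga tracts from the food access output above
food = pd.DataFrame({
    "CensusTract": [1001020100, 1001020200],
    "State": ["Alabama", "Alabama"],
    "County": ["Autauga", "Autauga"],
    "Population": [1912, 2170],
})
food["FIPS"] = county_fips_from_tract(food["CensusTract"])

# FIPS was read as a float; drop rows without one and match the integer key
health = health.dropna(subset=["FIPS"]).astype({"FIPS": "int64"})
merged = food.merge(health.drop(columns=["State", "County"]), how="inner",
                    on="FIPS", validate="many_to_one")
print(merged)
```

On the real data, the equivalent steps would be `df_health.columns = flatten_columns(df_health.columns)` and `df_food["FIPS"] = county_fips_from_tract(df_food["CensusTract"])` before merging. Passing `validate="many_to_one"` makes pandas raise an error if a FIPS code appears more than once in the health table, which would otherwise silently duplicate tract rows.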