# Final Tutorial: Impact of Food Access on Overall Community Health

## Analysis by Erin Shanley 

### Data Source 

USDA Food Access Research Atlas: 
https://www.ers.usda.gov/data-products/food-access-research-atlas/download-the-data/

### Project Explanation and Plan

As a graduate student, I am working alone on this project to fulfill the extra requirement. The data set I plan to work with comes from the US Department of Agriculture (USDA). It focuses on food access across the US and contains information on counties and their access to supermarkets and other sources of healthy, affordable food. There are many ways to define which areas count as "food deserts" and many ways to measure food store access for individuals and neighborhoods, so I have chosen a few initial parameters to analyze: the distance between housing units and grocery stores, the frequency of grocery stores, the frequency of fast food restaurants, average income by county, access to transportation, and obesity rates by county. The data provides a spatial overview of food access indicators for low-income versus other census tracts, using 1-mile, 10-mile, and 20-mile distances to the nearest supermarket, along with vehicle availability for all census tracts.

### Questions to Address 

1. Do the distance between housing units and supermarkets and the frequency of supermarkets in a community affect general health, both mental and physical?
2. Does the average income of a county affect the overall health of the community's members?
3. Does access to transportation affect a community's general health, both mental and physical?
4. Do higher frequencies of fast food restaurants contribute to higher obesity rates in communities?
5. Are obesity rates higher or lower in areas with less access to healthy, affordable food options?




### Reading in Data
#### USDA Food Access Research Atlas Data

In [18]:
import numpy as np
import pandas as pd

df = pd.read_csv("USDA_data.csv")
df.head()

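| | CensusTract | State | County | Urban | POP2010 | OHU2010 | GroupQuartersFlag | NUMGQTRS | PCTGQTRS | LILATracts_1And10 | ... | TractSeniors | TractWhite | TractBlack | TractAsian | TractNHOPI | TractAIAN | TractOMultir | TractHispanic | TractHUNV | TractSNAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001020100 | Alabama | Autauga | 1 | 1912 | 693 | 0 | 0 | 0.0 | 0 | ... | 221 | 1622 | 217 | 14 | 0 | 14 | 45 | 44 | 26 | 112 |
| 1 | 1001020200 | Alabama | Autauga | 1 | 2170 | 743 | 0 | 181 | 0.08341 | 0 | ... | 214 | 888 | 1217 | 5 | 0 | 5 | 55 | 75 | 87 | 202 |
| 2 | 1001020300 | Alabama | Autauga | 1 | 3373 | 1256 | 0 | 0 | 0.0 | 0 | ... | 439 | 2576 | 647 | 17 | 5 | 11 | 117 | 87 | 108 | 120 |
| 3 | 1001020400 | Alabama | Autauga | 1 | 4386 | 1722 | 0 | 0 | 0.0 | 0 | ... | 904 | 4086 | 193 | 18 | 4 | 11 | 74 | 85 | 19 | 82 |
| 4 | 1001020500 | Alabama | Autauga | 1 | 10766 | 4082 | 0 | 181 | 0.016812 | 0 | ... | 1126 | 8666 | 1437 | 296 | 9 | 48 | 310 | 355 | 198 | 488 |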
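
One detail worth noting in this preview: the `CensusTract` values have 10 digits, whereas census tract identifiers are normally 11 digits (2-digit state code, 3-digit county code, 6-digit tract code). That suggests the leading zero for Alabama was dropped when the column was read as an integer. The sketch below shows one way to restore it, assuming the IDs should be treated as zero-padded strings. The names `tracts`, `GEOID`, `StateFIPS`, and `CountyFIPS` are hypothetical and not part of the USDA file, and the three rows are copied from the output above so the example runs on its own.

```python
import pandas as pd

# Three tract IDs copied from the preview above (stored as integers there)
tracts = pd.DataFrame({
    "CensusTract": [1001020100, 1001020200, 1001020300],
    "State": ["Alabama"] * 3,
    "County": ["Autauga"] * 3,
})

# Assumption: tract IDs are 11-character strings, so pad with leading zeros
tracts["GEOID"] = tracts["CensusTract"].astype(str).str.zfill(11)

# The first 2 characters identify the state, the first 5 the county
tracts["StateFIPS"] = tracts["GEOID"].str[:2]
tracts["CountyFIPS"] = tracts["GEOID"].str[:5]

print(tracts[["GEOID", "StateFIPS", "CountyFIPS"]])
```

A string county code like this could later serve as a join key if county-level data from another source is brought in.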


In [23]:
# Tidy data by creating new dataframe containing only the relevant parameters
df = df[[
    "State", "County", "POP2010", "OHU2010",
    "PovertyRate", "MedianFamilyIncome", "CensusTract",
    "lapop1", "lapop10", "lapop20", "TractHUNV",
    "LATracts1", "LATracts10", "LATracts20",
]]

df.head(20)


| | State | County | POP2010 | OHU2010 | PovertyRate | MedianFamilyIncome | CensusTract | lapop1 | lapop10 | lapop20 | TractHUNV | LATracts1 | LATracts10 | LATracts20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | Autauga | 1912 | 693 | 10.0 | 74750 | 1001020100 | 1357.48094 | 0.0 | 0.0 | 26 | 1 | 0 | 0 |
| 1 | Alabama | Autauga | 2170 | 743 | 18.2 | 51875 | 1001020200 | 483.429683 | 0.0 | 0.0 | 87 | 0 | 0 | 0 |
| 2 | Alabama | Autauga | 3373 | 1256 | 19.1 | 52905 | 1001020300 | 1417.874893 | 0.0 | 0.0 | 108 | 1 | 0 | 0 |
| 3 | Alabama | Autauga | 4386 | 1722 | 3.3 | 68079 | 1001020400 | 1363.466885 | 0.0 | 0.0 | 19 | 1 | 0 | 0 |
| 4 | Alabama | Autauga | 10766 | 4082 | 8.5 | 77819 | 1001020500 | 2643.095161 | 0.0 | 0.0 | 198 | 1 | 0 | 0 |
| 5 | Alabama | Autauga | 3668 | 1311 | 14.1 | 67218 | 1001020600 | 2568.5455 | 0.0 | 0.0 | 49 | 1 | 0 | 0 |
| 6 | Alabama | Autauga | 2891 | 1188 | 26.4 | 43646 | 1001020700 | 1230.979862 | 0.0 | 0.0 | 134 | 1 | 0 | 0 |
| 7 | Alabama | Autauga | 3081 | 1074 | 13.6 | 74284 | 1001020801 | 3080.999986 | 0.0 | 0.0 | 126 | 0 | 0 | 0 |
| 8 | Alabama | Autauga | 10435 | 3694 | 13.8 | 68713 | 1001020802 | 10434.99998 | 251.886914 | 0.0 | 82 | 0 | 0 | 0 |
| 9 | Alabama | Autauga | 5675 | 2067 | 12.8 | 52994 | 1001020900 | 5674.999998 | 2047.118504 | 0.0 | 32 | 0 | 1 | 0 |


To start, and to keep the data manageable, I decided to use only a fraction of the parameters available in this dataset. The parameters I find most useful at this point are census tract number, state, county, population count, count of housing units, poverty rate, median family income, low-access population at 1 mile, 10 miles, and 20 miles, and vehicle access. The dataframe is organized by state and county.
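
Since several of the questions are posed at the county level while each row here describes a single census tract, one possible next step is to aggregate tracts into counties. The following is a minimal sketch of that idea, not part of the analysis itself. It rebuilds three of the Autauga rows shown earlier so that it runs without the CSV, and the names `sample`, `county`, and `lapop1_share` are hypothetical.

```python
import pandas as pd

# Three tract rows copied from the selected-columns output above
sample = pd.DataFrame({
    "State": ["Alabama"] * 3,
    "County": ["Autauga"] * 3,
    "POP2010": [1912, 2170, 3373],
    "lapop1": [1357.48094, 483.429683, 1417.874893],
    "TractHUNV": [26, 87, 108],
})

# Sum tract-level counts within each (State, County) pair
county = (
    sample.groupby(["State", "County"], as_index=False)[
        ["POP2010", "lapop1", "TractHUNV"]
    ].sum()
)

# Share of the county population counted as low access at 1 mile
county["lapop1_share"] = county["lapop1"] / county["POP2010"]

print(county)
```

Summing is appropriate only for count columns such as these. Rates like `PovertyRate` or `MedianFamilyIncome` would need a different treatment (for example, a population-weighted average), which is left open here.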