# INFO 2950 - Phase II

## Research Question
Is there a relationship between food availability, both grocery stores and fast food, and county demographics across the United States?

Is there a relationship between grocery stores and fast food restaurants? 

## Data Cleaning

We are looking at three data sets from the USDA's Food Environment Atlas. The Atlas contains county-level data about grocery stores, demographics, taxes, health indicators, and restaurants for both 2011 and 2016. We are choosing to look specifically at the relationships between grocery stores, fast food restaurants, and demographics.

Link to data set: https://www.ers.usda.gov/data-products/food-environment-atlas/data-access-and-documentation-downloads/#Current%20Version

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

## Data Description

The original data 

In [3]:
stores = pd.read_csv('data/Store_Access.csv')
stores.head()

Unnamed: 0,FIPS,State,County,GROC11,GROC16,PCH_GROC_11_16,GROCPTH11,GROCPTH16,PCH_GROCPTH_11_16,SUPERC11,...,PCH_SNAPS_12_17,SNAPSPTH12,SNAPSPTH17,PCH_SNAPSPTH_12_17,WICS11,WICS16,PCH_WICS_11_16,WICSPTH11,WICSPTH16,PCH_WICSPTH_11_16
0,1001,AL,Autauga,5,3,-40.0,0.090581,0.054271,-40.085748,1,...,19.376392,0.674004,0.804747,19.3979,5.0,5.0,0.0,0.090567,0.090511,-0.061543
1,1003,AL,Baldwin,27,29,7.407407,0.144746,0.139753,-3.449328,6,...,36.927711,0.725055,0.890836,22.864524,26.0,28.0,7.692307,0.13938,0.134802,-3.284727
2,1005,AL,Barbour,6,4,-33.333333,0.21937,0.155195,-29.254287,0,...,3.349282,1.28059,1.424614,11.246689,7.0,6.0,-14.285714,0.255942,0.232387,-9.203081
3,1007,AL,Bibb,6,5,-16.666667,0.263794,0.220916,-16.254289,1,...,11.794872,0.719122,0.801423,11.444711,6.0,5.0,-16.666666,0.263771,0.221474,-16.035471
4,1009,AL,Blount,7,5,-28.571429,0.121608,0.086863,-28.571429,1,...,5.701754,0.657144,0.692374,5.361034,8.0,8.0,0.0,0.139,0.139089,0.064332


In [73]:
# Drop the percent-change, SNAP, and WIC columns plus FIPS; keep store counts and per-1000 rates.
stores = stores.drop(['PCH_GROCPTH_11_16', 'PCH_SUPERCPTH_11_16', 'PCH_CONVSPTH_11_16',
                      'PCH_SPECSPTH_11_16', 'SNAPS12', 'SNAPS17', 'PCH_SNAPS_12_17', 'SNAPSPTH12',
                      'SNAPSPTH17', 'PCH_SNAPSPTH_12_17', 'WICS11', 'WICS16', 'PCH_WICS_11_16', 'WICSPTH11',
                      'WICSPTH16', 'PCH_WICSPTH_11_16', 'FIPS', 'PCH_SUPERC_11_16', 'PCH_GROC_11_16',
                      'PCH_CONVS_11_16', 'PCH_SPECS_11_16'], axis=1)
stores.head()

Unnamed: 0,State,County,GROC11,GROC16,GROCPTH11,GROCPTH16,SUPERC11,SUPERC16,SUPERCPTH11,SUPERCPTH16,CONVS11,CONVS16,CONVSPTH11,CONVSPTH16,SPECS11,SPECS16,SPECSPTH11,SPECSPTH16
0,AL,Autauga,5,3,0.090581,0.054271,1,1,0.018116,0.01809,31,31,0.561604,0.560802,1,1,0.018116,0.01809
1,AL,Baldwin,27,29,0.144746,0.139753,6,7,0.032166,0.033733,107,118,0.573622,0.56865,20,27,0.107219,0.130115
2,AL,Barbour,6,4,0.21937,0.155195,0,1,0.0,0.038799,22,19,0.804358,0.737177,3,2,0.109685,0.077598
3,AL,Bibb,6,5,0.263794,0.220916,1,1,0.043966,0.044183,19,15,0.835348,0.662749,0,0,0.0,0.0
4,AL,Blount,7,5,0.121608,0.086863,1,1,0.017373,0.017373,30,27,0.521177,0.469059,1,0,0.017373,0.0


In [7]:
# Give the remaining columns readable names. Each label stays on one line so that
# no continuation whitespace ends up inside a column name.
stores = stores.rename(columns={'GROC11': 'Grocery 2011', 'GROC16': 'Grocery 2016',
                                'GROCPTH11': 'Grocery per 1000 2011',
                                'GROCPTH16': 'Grocery per 1000 2016',
                                'SUPERC11': 'Supercenter 2011',
                                'SUPERC16': 'Supercenter 2016',
                                'SUPERCPTH11': 'Supercenter per 1000 2011',
                                'SUPERCPTH16': 'Supercenter per 1000 2016',
                                'CONVS11': 'Convenience 2011', 'CONVS16': 'Convenience 2016',
                                'CONVSPTH11': 'Convenience per 1000 2011',
                                'CONVSPTH16': 'Convenience per 1000 2016',
                                'SPECS11': 'Specialized 2011', 'SPECS16': 'Specialized 2016',
                                'SPECSPTH11': 'Specialized per 1000 2011',
                                'SPECSPTH16': 'Specialized per 1000 2016'})
stores.head()

Unnamed: 0,FIPS,State,County,Grocery 2011,Grocery 2016,PCH_GROC_11_16,Grocery per 1000 2011,Grocery per 1000 2016,PCH_GROCPTH_11_16,Supercenter 2011,...,PCH_SNAPS_12_17,SNAPSPTH12,SNAPSPTH17,PCH_SNAPSPTH_12_17,WICS11,WICS16,PCH_WICS_11_16,WICSPTH11,WICSPTH16,PCH_WICSPTH_11_16
0,1001,AL,Autauga,5,3,-40.0,0.090581,0.054271,-40.085748,1,...,19.376392,0.674004,0.804747,19.3979,5.0,5.0,0.0,0.090567,0.090511,-0.061543
1,1003,AL,Baldwin,27,29,7.407407,0.144746,0.139753,-3.449328,6,...,36.927711,0.725055,0.890836,22.864524,26.0,28.0,7.692307,0.13938,0.134802,-3.284727
2,1005,AL,Barbour,6,4,-33.333333,0.21937,0.155195,-29.254287,0,...,3.349282,1.28059,1.424614,11.246689,7.0,6.0,-14.285714,0.255942,0.232387,-9.203081
3,1007,AL,Bibb,6,5,-16.666667,0.263794,0.220916,-16.254289,1,...,11.794872,0.719122,0.801423,11.444711,6.0,5.0,-16.666666,0.263771,0.221474,-16.035471
4,1009,AL,Blount,7,5,-28.571429,0.121608,0.086863,-28.571429,1,...,5.701754,0.657144,0.692374,5.361034,8.0,8.0,0.0,0.139,0.139089,0.064332
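
One thing to watch when wrapping a long rename mapping: a backslash line continuation placed inside a quoted string keeps the next line's indentation as part of the string, so the resulting column name contains a run of spaces. The sketch below shows the effect on a toy DataFrame; the frame and its values are made up for illustration and are not the Atlas data.

```python
import pandas as pd

# Toy frame standing in for one Atlas column; the values are illustrative only.
toy = pd.DataFrame({'SUPERCPTH16': [0.018, 0.034]})

# The backslash sits inside the quotes, so the string continues onto the
# next line and that line's leading spaces become part of the label.
broken_label = 'Supercenter per \
    1000 2016'

# Keeping the whole label on one line gives the intended name.
clean_label = 'Supercenter per 1000 2016'

renamed = toy.rename(columns={'SUPERCPTH16': clean_label})

print(repr(broken_label))     # shows the extra embedded spaces
print(list(renamed.columns))
```

Inside parentheses, brackets, or braces Python already continues lines implicitly, so the backslashes are not needed at all in a `rename(columns={...})` call.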


In [10]:
demographics = pd.read_csv('data/county_demographics.csv')
demographics.head()

Unnamed: 0,FIPS,State,County,PCT_NHWHITE10,PCT_NHBLACK10,PCT_HISP10,PCT_NHASIAN10,PCT_NHNA10,PCT_NHPI10,PCT_65OLDER10,PCT_18YOUNGER10,MEDHHINC15,POVRATE15,PERPOV10,CHILDPOVRATE15,PERCHLDPOV10,METRO13,POPLOSS10
0,1001,AL,Autauga,77.246156,17.582599,2.400542,0.855766,0.397647,0.040314,11.995382,26.777959,56580.0,12.7,0,18.8,0,1,0.0
1,1003,AL,Baldwin,83.504787,9.308425,4.384824,0.735193,0.628755,0.043343,16.771185,22.987408,52387.0,12.9,0,19.6,0,1,0.0
2,1005,AL,Barbour,46.753105,46.69119,5.051535,0.3897,0.218524,0.087409,14.236807,21.906982,31433.0,32.0,1,45.2,1,0,0.0
3,1007,AL,Bibb,75.020729,21.924504,1.771765,0.096007,0.279293,0.030548,12.68165,22.696923,40767.0,22.2,0,29.3,1,1,0.0
4,1009,AL,Blount,88.887338,1.26304,8.0702,0.200621,0.497191,0.031402,14.722096,24.608353,50487.0,14.7,0,22.2,0,1,0.0


In [11]:
# Drop the age, persistent-poverty, and population-loss columns plus FIPS.
demographics = demographics.drop(['PCT_65OLDER10', 'PCT_18YOUNGER10', 'PERPOV10', 'PERCHLDPOV10',
                                  'POPLOSS10', 'FIPS'], axis=1)
demographics.head()

Unnamed: 0,State,County,PCT_NHWHITE10,PCT_NHBLACK10,PCT_HISP10,PCT_NHASIAN10,PCT_NHNA10,PCT_NHPI10,MEDHHINC15,POVRATE15,CHILDPOVRATE15,METRO13
0,AL,Autauga,77.246156,17.582599,2.400542,0.855766,0.397647,0.040314,56580.0,12.7,18.8,1
1,AL,Baldwin,83.504787,9.308425,4.384824,0.735193,0.628755,0.043343,52387.0,12.9,19.6,1
2,AL,Barbour,46.753105,46.69119,5.051535,0.3897,0.218524,0.087409,31433.0,32.0,45.2,0
3,AL,Bibb,75.020729,21.924504,1.771765,0.096007,0.279293,0.030548,40767.0,22.2,29.3,1
4,AL,Blount,88.887338,1.26304,8.0702,0.200621,0.497191,0.031402,50487.0,14.7,22.2,1


In [12]:
# Give the remaining columns readable names. Each label stays on one line so that
# no continuation whitespace ends up inside a column name.
demographics = demographics.rename(columns={'PCT_NHWHITE10': '% White 2010', 'PCT_NHBLACK10': '% Black 2010',
                                            'PCT_HISP10': '% Hispanic 2010', 'PCT_NHASIAN10': '% Asian 2010',
                                            'PCT_NHNA10': '% Native American 2010',
                                            'PCT_NHPI10': '% Hawaiian 2010',
                                            'MEDHHINC15': 'Median household income 2015',
                                            'POVRATE15': 'Poverty Rate 2015',
                                            'CHILDPOVRATE15': 'Child poverty rate 2015', 'METRO13': 'Metro'})
demographics.head()

Unnamed: 0,State,County,% White 2010,% Black 2010,% Hispanic 2010,% Asian 2010,% Native American 2010,% Hawaiian 2010,Median household income 2015,Poverty Rate 2015,Child poverty rate 2015,Metro
0,AL,Autauga,77.246156,17.582599,2.400542,0.855766,0.397647,0.040314,56580.0,12.7,18.8,1
1,AL,Baldwin,83.504787,9.308425,4.384824,0.735193,0.628755,0.043343,52387.0,12.9,19.6,1
2,AL,Barbour,46.753105,46.69119,5.051535,0.3897,0.218524,0.087409,31433.0,32.0,45.2,0
3,AL,Bibb,75.020729,21.924504,1.771765,0.096007,0.279293,0.030548,40767.0,22.2,29.3,1
4,AL,Blount,88.887338,1.26304,8.0702,0.200621,0.497191,0.031402,50487.0,14.7,22.2,1


In [75]:
restaurants = pd.read_csv('data/Restaurants.csv')
restaurants.head()

Unnamed: 0,FIPS,State,County,FFR11,FFR16,PCH_FFR_11_16,FFRPTH11,FFRPTH16,PCH_FFRPTH_11_16,FSR11,FSR16,PCH_FSR_11_16,FSRPTH11,FSRPTH16,PCH_FSRPTH_11_16,PC_FFRSALES07,PC_FFRSALES12,PC_FSRSALES07,PC_FSRSALES12
0,1001,AL,Autauga,34,44,29.411765,0.615953,0.795977,29.226817,32,31,-3.125,0.579721,0.560802,-3.263448,649.511367,674.80272,484.381507,512.280987
1,1003,AL,Baldwin,121,156,28.92562,0.648675,0.751775,15.893824,216,236,9.259259,1.157966,1.1373,-1.784662,649.511367,674.80272,484.381507,512.280987
2,1005,AL,Barbour,19,23,21.052632,0.694673,0.892372,28.45932,17,14,-17.647059,0.621549,0.543183,-12.608237,649.511367,674.80272,484.381507,512.280987
3,1007,AL,Bibb,6,7,16.666667,0.263794,0.309283,17.243995,5,7,40.0,0.219829,0.309283,40.692794,649.511367,674.80272,484.381507,512.280987
4,1009,AL,Blount,20,23,15.0,0.347451,0.399569,15.0,15,12,-20.0,0.260589,0.208471,-20.0,649.511367,674.80272,484.381507,512.280987


In [76]:
# Drop the percent-change columns, the 2007 sales columns, and FIPS.
restaurants = restaurants.drop(['PCH_FFRPTH_11_16', 'PCH_FSRPTH_11_16', 'PC_FFRSALES07', 'PC_FSRSALES07',
                                'FIPS', 'PCH_FFR_11_16', 'PCH_FSR_11_16'], axis=1)
restaurants.head()

Unnamed: 0,State,County,FFR11,FFR16,FFRPTH11,FFRPTH16,FSR11,FSR16,FSRPTH11,FSRPTH16,PC_FFRSALES12,PC_FSRSALES12
0,AL,Autauga,34,44,0.615953,0.795977,32,31,0.579721,0.560802,674.80272,512.280987
1,AL,Baldwin,121,156,0.648675,0.751775,216,236,1.157966,1.1373,674.80272,512.280987
2,AL,Barbour,19,23,0.694673,0.892372,17,14,0.621549,0.543183,674.80272,512.280987
3,AL,Bibb,6,7,0.263794,0.309283,5,7,0.219829,0.309283,674.80272,512.280987
4,AL,Blount,20,23,0.347451,0.399569,15,12,0.260589,0.208471,674.80272,512.280987


In [77]:
restaurants = restaurants.rename(columns={'FFR11':'Fast food 2011', 'FFR16':'Fast food 2016', \
                                          'FFRPTH11':'Fast food per 1000 2011',\
                                          'FFRPTH16':'Fast food per 1000 2016', 'FSR11':'Full service 2011',\
                                          'FSR16':'Full service 2016',\
                                          'FSRPTH11':'Full service per 1000 2011', \
                                          'FSRPTH16':'Full service per 1000 2016',\
                                          'PC_FFRSALES12':'Fast food expenditures per capita 2012', \
                                          'PC_FSRSALES12':'Full service expenditures per capita 2012'})
restaurants.head()

Unnamed: 0,State,County,Fast food 2011,Fast food 2016,Fast food per 1000 2011,Fast food per 1000 2016,Full service 2011,Full service 2016,Full service per 1000 2011,Full service per 1000 2016,Fast food expenditures per capita 2012,Full service expenditures per capita 2012
0,AL,Autauga,34,44,0.615953,0.795977,32,31,0.579721,0.560802,674.80272,512.280987
1,AL,Baldwin,121,156,0.648675,0.751775,216,236,1.157966,1.1373,674.80272,512.280987
2,AL,Barbour,19,23,0.694673,0.892372,17,14,0.621549,0.543183,674.80272,512.280987
3,AL,Bibb,6,7,0.263794,0.309283,5,7,0.219829,0.309283,674.80272,512.280987
4,AL,Blount,20,23,0.347451,0.399569,15,12,0.260589,0.208471,674.80272,512.280987
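
Since FIPS is dropped from all three tables, any later comparison across them would have to line rows up on `State` and `County`. No merge is performed in this section; the following is only a minimal sketch of what that step could look like, using two toy frames (`stores_toy` and `restaurants_toy` are hypothetical names) that reuse the Autauga and Baldwin values shown above.

```python
import pandas as pd

# Toy stand-ins for the cleaned tables (two Alabama counties only).
stores_toy = pd.DataFrame({
    'State': ['AL', 'AL'],
    'County': ['Autauga', 'Baldwin'],
    'Grocery 2016': [3, 29],
})
restaurants_toy = pd.DataFrame({
    'State': ['AL', 'AL'],
    'County': ['Autauga', 'Baldwin'],
    'Fast food 2016': [44, 156],
})

# County names repeat across states, so both keys are needed.
# validate='one_to_one' raises an error if a (State, County) pair is duplicated.
merged = stores_toy.merge(restaurants_toy, on=['State', 'County'],
                          how='inner', validate='one_to_one')
print(merged)
```

An inner join silently drops counties that appear in only one table, so comparing `len(merged)` with the input lengths is a cheap sanity check.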


Next, we check each data set for columns that contain missing values.

In [78]:
print(stores.columns[stores.isnull().any()])
print(demographics.columns[demographics.isnull().any()])
print(restaurants.columns[restaurants.isnull().any()])

Index([], dtype='object')
Index(['Median household income 2015', 'Poverty Rate 2015',
       'Child poverty rate 2015'],
      dtype='object')
Index([], dtype='object')
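
The check above lists which columns have missing values but not how many or in which rows. A small sketch of how that could be counted, on a toy frame with made-up counties and values rather than the real demographics table:

```python
import numpy as np
import pandas as pd

# Toy frame with one incomplete row; the counties and numbers are illustrative.
toy = pd.DataFrame({
    'County': ['A', 'B', 'C'],
    'Median household income 2015': [56580.0, np.nan, 31433.0],
    'Poverty Rate 2015': [12.7, np.nan, 32.0],
})

# Number of missing entries per column, keeping only columns that have any.
missing_counts = toy.isnull().sum()
missing_counts = missing_counts[missing_counts > 0]

# The rows that contain at least one missing value.
rows_with_missing = toy[toy.isnull().any(axis=1)]

print(missing_counts)
print(rows_with_missing)
```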


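Only the demographics data has missing values, in the median household income, poverty rate, and child poverty rate columns. These entries need to be represented as NaN.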
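
How the NaN entries are treated afterwards is not settled in this section. As a hedged sketch of two common options in pandas, on a toy column with made-up values: leave the NaN in place and rely on reductions skipping it, or drop incomplete rows only for the analysis that uses the affected column.

```python
import numpy as np
import pandas as pd

# Toy column with one missing entry; the values are illustrative.
toy = pd.DataFrame({
    'County': ['A', 'B', 'C'],
    'Poverty Rate 2015': [12.7, np.nan, 32.0],
})

# Option 1: keep NaN; pandas reductions such as mean() skip NaN by default.
mean_skipping_nan = toy['Poverty Rate 2015'].mean()

# Option 2: drop only the rows missing the column a given analysis needs.
complete = toy.dropna(subset=['Poverty Rate 2015'])

print(mean_skipping_nan)
print(len(complete))
```

Dropping with `subset=` keeps counties that are missing some other, unrelated column, which avoids discarding more rows than necessary.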

## Exploration