# Capstone Project - Food Access in Los Angeles
### Riley Franks, May 2019
### [Applied Data Science Capstone Course by IBM/Coursera](https://www.coursera.org/learn/applied-data-science-capstone/home/info)
-----------------------------------------

## Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)
* [References](#references)



## Introduction: Business Problem <a name="introduction"></a>

Over the last decade, nutrition in the United States has come into focus as a national crisis. Millions of Americans lack access to an adequate supply of fresh, healthy food, and according to the CDC there appears to be a link between access to affordable, nutritious food and how much healthy food people include in their diets. Interest is growing in the idea of "food desert" communities. There is no standard definition, but these are generally neighborhoods that lack grocery stores and farmers' markets. The U.S. Department of Agriculture summarizes the problem well: "Some people and places, especially those with low income, may face greater barriers in accessing healthy and affordable food retailers, which may negatively affect diet and food security."


This project will focus on Los Angeles County in California. We will analyze geographic variation in food access and population demographics, and overlay that onto a map of grocery stores, convenience stores, and restaurants in the area. Community planners and local governments may find this more useful than data reports and tables in getting an intuitive understanding of the food landscape in their cities.

References:
- "Officials seek to attract grocery stores to ‘food deserts’" (https://www.apnews.com/a2b0356365db4e5dbdb9c4a077be315e, Caitlin Morris, April 2019)
- https://www.cdc.gov/healthcommunication/toolstemplates/entertainmented/tips/FoodDesert.html
- http://americannutritionassociation.org/newsletter/usda-defines-food-deserts
- https://www.ers.usda.gov/topics/food-choices-health/food-access/

## Data <a name="data"></a>

The U.S. Department of Agriculture's Economic Research Service publishes a Food Access Research Atlas [1] with census-tract level data on food access in the United States. The atlas contains many indicators; we will explore a few specific variables: the total population of each census tract, the population living more than 1/2 mile from a supermarket, and the low-income population living more than 1/2 mile from a supermarket.

The University of Southern California provides a crosswalk [2] to get latitude and longitude coordinates for a census tract. We will use this to connect the food access data to geospatial coordinates. 

Finally, we will access the Foursquare API [3] to look up the venues in each neighborhood. We will request information for each census tract using its latitude/longitude coordinates. The Foursquare data gives us the name, location, and category for each venue. We'll use the category variable to subset our results to food-related venues.


1. https://www.ers.usda.gov/data-products/food-access-research-atlas/
2. https://usc.data.socrata.com/widgets/atat-mmad
3. https://developer.foursquare.com/




## References <a name="references"></a>

- Course example of outstanding submission: https://cocl.us/coursera_capstone_notebook
- https://www.governing.com/gov-data/census/2018-county-migration-census-data.html
- https://www.census.gov/data/tables/2016/demo/geographic-mobility/county-to-county-migration-2012-2016.html
- https://www.ers.usda.gov/topics/food-choices-health/food-access/
- https://www.ers.usda.gov/data-products/food-access-research-atlas/
- Infographic: https://socialwork.tulane.edu/blog/food-deserts-in-america

----------------------------------------------------------------------------

# Work in Progress...

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

from bs4 import BeautifulSoup # library to scrape data from a webpage

print('Libraries imported.')

Libraries imported.


In [3]:
# @hidden_cell
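import os # read credentials from the environment instead of hardcoding them
# NOTE: the environment variable names below are an assumption; use whatever names you set
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret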
VERSION = '20180604' # Foursquare API version
LIMIT = 100

Use function set up in Week 3 assignment (based on the getNearbyVenues function from the Exploring NY lab)

In [97]:
# json downloaded from https://usc.data.socrata.com/resource/atat-mmad.json
# has latitude and longitude for all census tracts in Los Angeles


Data downloaded from: https://usc.data.socrata.com/widgets/atat-mmad


In [4]:
lat_lng_coords = pd.read_json('atat-mmad.json')
lat_lng_coords = lat_lng_coords[['geoid','latitude','longitude','neighborhood','tract_number']]
lat_lng_coords.head()

| | geoid | latitude | longitude | neighborhood | tract_number |
|---|---|---|---|---|---|
| 0 | 1400000US06037101110 | 34.259555 | -118.293602 | Tujunga | 101110 |
| 1 | 1400000US06037101122 | 34.267357 | -118.29024 | Tujunga | 101122 |
| 2 | 1400000US06037101210 | 34.251998 | -118.292687 | Tujunga | 101210 |
| 3 | 1400000US06037101220 | 34.25119 | -118.281014 | Tujunga | 101220 |
| 4 | 1400000US06037101300 | 34.245595 | -118.271731 | Tujunga | 101300 |


In [5]:
food_atlas = pd.read_excel(
    'Food_Access_Research_Atlas_2015_LOS_ANGELES.xlsx',
    sheet_name='Food Access Research Atlas')

food_atlas = food_atlas[['CensusTract','State','County','Urban','POP2010',
                         'lapophalf','lapophalfshare','lalowihalf','lalowihalfshare']]
food_atlas.head()

| | CensusTract | State | County | Urban | POP2010 | lapophalf | lapophalfshare | lalowihalf | lalowihalfshare |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6037101110 | California | Los Angeles | 1 | 4731 | 3461.636143 | 0.731692 | 1443.3378 | 0.305081 |
| 1 | 6037101122 | California | Los Angeles | 1 | 3664 | 3660.694039 | 0.999098 | 317.312226 | 0.086603 |
| 2 | 6037101210 | California | Los Angeles | 1 | 5990 | 5329.50475 | 0.889734 | 3035.663558 | 0.506789 |
| 3 | 6037101220 | California | Los Angeles | 1 | 3363 | 1874.230204 | 0.557309 | 681.286018 | 0.202583 |
| 4 | 6037101300 | California | Los Angeles | 1 | 4199 | 739.348889 | 0.176077 | 76.682345 | 0.018262 |
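
As a quick sanity check on the columns kept above, the share columns appear to be the corresponding counts divided by `POP2010`. The sketch below retypes the first two rows of the output by hand and recomputes the shares. The `recomputed_*` column names are illustrative and are not part of the atlas.

```python
import pandas as pd

# First two rows of the output above, typed in by hand for illustration
sample = pd.DataFrame({
    'CensusTract': [6037101110, 6037101122],
    'POP2010': [4731, 3664],
    'lapophalf': [3461.636143, 3660.694039],
    'lapophalfshare': [0.731692, 0.999098],
    'lalowihalf': [1443.3378, 317.312226],
    'lalowihalfshare': [0.305081, 0.086603],
})

# Recompute each share as count / total population (hypothetical column names)
sample['recomputed_share'] = sample['lapophalf'] / sample['POP2010']
sample['recomputed_lowi_share'] = sample['lalowihalf'] / sample['POP2010']

# Largest gap between published and recomputed shares; small if the assumption holds
max_gap = max(
    (sample['recomputed_share'] - sample['lapophalfshare']).abs().max(),
    (sample['recomputed_lowi_share'] - sample['lalowihalfshare']).abs().max(),
)
print(f'largest difference: {max_gap:.6f}')
```

If the gap stays at rounding level across the full file, the share columns can be treated as derived values rather than independent measurements.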


Combine food_atlas with lat_lng_coords

In [6]:
# Get last 6 digits of CensusTract to match tract_number in lat-long data
food_atlas['atlas_tract']=food_atlas['CensusTract'].astype(str).str[-6:]
# Put tract number in lat_lng_coords in string format
lat_lng_coords['tract_number']=lat_lng_coords['tract_number'].astype(str)

# Merge on tract number
food_and_coords = pd.merge(left=food_atlas,right=lat_lng_coords,how='inner',
              left_on='atlas_tract',right_on='tract_number',validate='1:1')
food_and_coords.drop(['atlas_tract','State','County','CensusTract','Urban','geoid'],axis=1,inplace=True)

food_and_coords.head()

| | POP2010 | lapophalf | lapophalfshare | lalowihalf | lalowihalfshare | latitude | longitude | neighborhood | tract_number |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 4731 | 3461.636143 | 0.731692 | 1443.3378 | 0.305081 | 34.259555 | -118.293602 | Tujunga | 101110 |
| 1 | 3664 | 3660.694039 | 0.999098 | 317.312226 | 0.086603 | 34.267357 | -118.29024 | Tujunga | 101122 |
| 2 | 5990 | 5329.50475 | 0.889734 | 3035.663558 | 0.506789 | 34.251998 | -118.292687 | Tujunga | 101210 |
| 3 | 3363 | 1874.230204 | 0.557309 | 681.286018 | 0.202583 | 34.25119 | -118.281014 | Tujunga | 101220 |
| 4 | 4199 | 739.348889 | 0.176077 | 76.682345 | 0.018262 | 34.245595 | -118.271731 | Tujunga | 101300 |
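
An inner merge silently drops tracts that appear in only one table, so it can be worth auditing the join before relying on it. The sketch below uses small made-up tables (not the real files) and the `indicator=True` option of `pd.merge` to count matched and unmatched rows. The `geoid11` column and the toy values are assumptions for illustration. Matching on the six-digit tract code alone is only safe here because every row belongs to the same state and county.

```python
import pandas as pd

# Toy stand-ins for the two tables (values are illustrative, not real data)
atlas = pd.DataFrame({'CensusTract': [6037101110, 6037101122, 6037999999],
                      'POP2010': [4731, 3664, 100]})
coords = pd.DataFrame({'tract_number': ['101110', '101122', '101210'],
                       'latitude': [34.259555, 34.267357, 34.251998]})

# Restore the leading zero lost when the ID was read as an integer, then take
# the trailing six digits (state = 2 digits, county = 3 digits, tract = 6 digits)
atlas['geoid11'] = atlas['CensusTract'].astype(str).str.zfill(11)
atlas['atlas_tract'] = atlas['geoid11'].str[-6:]

# An outer merge with indicator=True labels each row by where it was found
audit = pd.merge(atlas, coords, how='outer',
                 left_on='atlas_tract', right_on='tract_number',
                 validate='1:1', indicator=True)
match_counts = audit['_merge'].value_counts().to_dict()
print(match_counts)
```

Rows labeled `left_only` or `right_only` are the ones an inner merge would discard.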


## Connect to Foursquare Data

Define a function to get information for all of the food-related venues in a census tract. (Heavily based on the Week 2 Lab Notebook)
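
The function in the next cell bins each venue into a broad type by taking the second-to-last path segment of the category icon prefix. The sketch below isolates that step. Both example prefixes are assumed values written for illustration; the real strings come from the API response and may differ.

```python
def venue_type_from_icon_prefix(prefix):
    """Return the second-to-last path segment of a category icon prefix."""
    return prefix.split('/')[-2]

# Assumed example prefixes (not taken from an actual response)
restaurant_prefix = 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_'
grocery_prefix = 'https://ss3.4sqi.net/img/categories_v2/shops/food_grocery_'

restaurant_type = venue_type_from_icon_prefix(restaurant_prefix)
grocery_type = venue_type_from_icon_prefix(grocery_prefix)
print(restaurant_type, grocery_type)
```

If grocery stores are in fact filed under a different bin than restaurants, as the second assumed prefix suggests, a filter on `'food'` alone would drop them, so it may be worth inspecting the distinct `Venue Type` values before subsetting.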

In [7]:
def getNearbyFoodVenues(df, radius=500):
    """Loop through the input locations and request venue information from Foursquare.
    Output a single dataframe with all of the results."""
    names = df['neighborhood']+' - Census Tract ' + df['tract_number']
    latitudes = df['latitude']
    longitudes = df['longitude']
               
    venues_dict={}
    idx = 0
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(f'Getting venues near {name}...')
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
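        # make the GET request; raise a clear HTTP error on a non-2xx response
        # rather than failing later while decoding the body
        response = requests.get(url)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']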
        
        # return only relevant information for each nearby venue
        for venue in results:            
            venues_dict.update({idx :
                {'Neighborhood':name, 
                 'Neighborhood Latitude':lat, 
                 'Neighborhood Longitude':lng, 
                 'Venue':venue['venue']['name'], 
                 'Venue ID':venue['venue']['id'],
                 'Distance from Neighborhood':venue['venue']['location']['distance'],
                 'Venue Latitude':venue['venue']['location']['lat'], 
                 'Venue Longitude':venue['venue']['location']['lng'],  
                 'Venue Category':venue['venue']['categories'][0]['name'],
                 'Venue Type':venue['venue']['categories'][0]['icon']['prefix'].split('/')[-2] # get broader category bin from icon prefix
                }})
            idx = idx + 1

        print(f'    found {len(results):,} results')
    # Use helper method for creating a dataframe by rows
    nearby_venues = pd.DataFrame.from_dict(venues_dict, orient='index')
    
    print('---------------------------------------------------------')
    #De-duplicate, make sure each venue is only assigned to the closest neighborhood
    print(f'{nearby_venues.shape[0]:,} total venues collected.')
    nearby_venues = nearby_venues.loc[nearby_venues.groupby('Venue ID')['Distance from Neighborhood'].idxmin()]
    print(f'   ... {nearby_venues.shape[0]:,} remaining after de-duplication.')
    #Subset to food venues (copy so the new column below is added to an independent frame)
    nearby_food_venues = nearby_venues.loc[nearby_venues['Venue Type']=='food',
                        ['Neighborhood','Venue','Venue Latitude','Venue Longitude','Venue Category']].copy()
    print(f'   ... {nearby_food_venues.shape[0]:,} after subset to food venues.')
    
    # Add numeric code for venue category (1 = most common category)
    category_counts = nearby_food_venues['Venue Category'].value_counts()
    category_labels = {category: number for number, category in enumerate(category_counts.index, start=1)}
    nearby_food_venues['Category Number']=nearby_food_venues['Venue Category'].map(category_labels)
    print(category_labels)
    
    
    return nearby_food_venues
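
The de-duplication step in the function above keeps, for each venue, the row from the tract whose center is closest. A minimal sketch of that `groupby` plus `idxmin` pattern on made-up rows (the tract names, venue IDs, and distances are illustrative):

```python
import pandas as pd

# Toy results: venue 'v1' was returned for two different tracts
toy = pd.DataFrame({
    'Neighborhood': ['Tract A', 'Tract B', 'Tract B'],
    'Venue ID': ['v1', 'v1', 'v2'],
    'Distance from Neighborhood': [420, 130, 75],
})

# idxmin gives, per venue, the row label with the smallest distance
closest_rows = toy.groupby('Venue ID')['Distance from Neighborhood'].idxmin()
deduped = toy.loc[closest_rows]
print(deduped)
```

Venue `v1` ends up assigned to Tract B only, because 130 is the smaller of its two distances.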


Try it out...

In [8]:
venues = getNearbyFoodVenues(food_and_coords)

Getting venues near Tujunga - Census Tract 101110...
    found 3 results
Getting venues near Tujunga - Census Tract 101122...
    found 0 results
Getting venues near Tujunga - Census Tract 101210...
    found 11 results
Getting venues near Tujunga - Census Tract 101220...
    found 4 results
Getting venues near Tujunga - Census Tract 101300...
    found 18 results
Getting venues near Tujunga - Census Tract 101400...
    found 0 results
Getting venues near Shadow Hills - Census Tract 102103...
    found 2 results
Getting venues near Shadow Hills - Census Tract 102104...
    found 2 results
Getting venues near Sun Valley - Census Tract 102105...
    found 3 results
Getting venues near Shadow Hills - Census Tract 102107...
    found 0 results
Getting venues near Sunland - Census Tract 103101...
    found 2 results
Getting venues near Sunland - Census Tract 103102...
    found 8 results
Getting venues near Lake View Terrace - Census Tract 103200...
    found 3 results
Getting venues near S

    found 2 results
Getting venues near Northridge - Census Tract 115302...
    found 8 results
Getting venues near Northridge - Census Tract 115401...
    found 4 results
Getting venues near Northridge - Census Tract 115403...
    found 14 results
Getting venues near Northridge - Census Tract 115404...
    found 2 results
Getting venues near North Hills - Census Tract 117101...
    found 6 results
Getting venues near North Hills - Census Tract 117102...
    found 0 results
Getting venues near North Hills - Census Tract 117201...
    found 8 results
Getting venues near North Hills - Census Tract 117202...
    found 2 results
Getting venues near North Hills - Census Tract 117301...
    found 6 results
Getting venues near Northridge - Census Tract 117302...
    found 13 results
Getting venues near North Hills - Census Tract 117303...
    found 1 results
Getting venues near North Hills - Census Tract 117404...
    found 4 results
Getting venues near North Hills - Census Tract 117405...
  

    found 4 results
Getting venues near Van Nuys - Census Tract 127103...
    found 23 results
Getting venues near Van Nuys - Census Tract 127104...
    found 4 results
Getting venues near Van Nuys - Census Tract 127210...
    found 2 results
Getting venues near Van Nuys - Census Tract 127220...
    found 13 results
Getting venues near Van Nuys - Census Tract 127300...
    found 9 results
Getting venues near Van Nuys - Census Tract 127400...
    found 7 results
Getting venues near Van Nuys - Census Tract 127520...
    found 19 results
Getting venues near Van Nuys - Census Tract 127603...
    found 3 results
Getting venues near Van Nuys - Census Tract 127604...
    found 4 results
Getting venues near Van Nuys - Census Tract 127605...
    found 8 results
Getting venues near Van Nuys - Census Tract 127606...
    found 5 results
Getting venues near Van Nuys - Census Tract 127711...
    found 17 results
Getting venues near Van Nuys - Census Tract 127712...
    found 9 results
Getting venues

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
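
The run above stopped when a response body could not be decoded as JSON. The source does not say why; an empty body from a rate-limited or failed request is one possible cause. The sketch below shows one way to keep a long loop alive: retry a few times, then return an empty list for that tract. `fetch_items`, its parameters, and the injected `get_text` callable are all hypothetical names introduced for illustration, and the demo uses a fake getter so that it runs without network access.

```python
import json
import time

def fetch_items(get_text, url, max_attempts=3, wait_seconds=1.0):
    """Call get_text(url) until the body parses; return the list of venue items.

    Returns an empty list if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        body = get_text(url)
        try:
            payload = json.loads(body)
            return payload["response"]["groups"][0]["items"]
        except (json.JSONDecodeError, KeyError, IndexError, TypeError):
            if attempt < max_attempts:
                time.sleep(wait_seconds)  # pause before trying again
    return []

# Demo with a fake getter: the first call returns an empty body, the second succeeds
demo_calls = []
def fake_get_text(url):
    demo_calls.append(url)
    if len(demo_calls) == 1:
        return ''
    return '{"response": {"groups": [{"items": ["venue-1", "venue-2"]}]}}'

demo_items = fetch_items(fake_get_text, 'https://example.invalid', wait_seconds=0)
print(demo_items, len(demo_calls))
```

With the real API the getter could be something like `lambda u: requests.get(u).text`. Returning an empty list hides persistent failures, so logging the failed tract names would be a sensible addition.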

In [10]:
# Save results to .csv so we don't have to keep hitting Foursquare
food_and_coords.shape
#venues.to_csv('los_angeles_venues.csv',sep='|')

(999, 9)
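
Saving the venue results avoids repeating roughly a thousand API requests on every run. A minimal sketch of a load-or-fetch cache, assuming the same `|` separator as the commented-out `to_csv` call above; `load_or_fetch` and `fake_fetch` are hypothetical helpers, and the demo writes to a temporary directory rather than the project folder.

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(cache_path, fetch):
    """Read cache_path if it exists; otherwise call fetch() and save its result."""
    if os.path.exists(cache_path):
        return pd.read_csv(cache_path, sep='|', index_col=0)
    result = fetch()
    result.to_csv(cache_path, sep='|')
    return result

# Demo with a stand-in fetch function that records how often it is called
fetch_calls = []
def fake_fetch():
    fetch_calls.append(1)
    return pd.DataFrame({'Venue': ['Example Cafe'], 'Venue Category': ['Coffee Shop']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'los_angeles_venues.csv')
    first = load_or_fetch(path, fake_fetch)   # calls fake_fetch and writes the file
    second = load_or_fetch(path, fake_fetch)  # reads the file instead
print(len(fetch_calls))
```

In the notebook the fetch argument could be `lambda: getNearbyFoodVenues(food_and_coords)`, so the API is only queried when no saved file is present.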

## Create a map visualization

Define a function to plot markers for each venue

In [249]:
def mapVenues(df, venue_map):
    """Add one circle marker per venue to venue_map, colored by venue category."""
    # one color per category number; size the palette by the largest number so
    # subsets of the venue data (with gaps in the numbering) still index safely
    n_colors = int(df['Category Number'].max())
    colors_array = cm.rainbow(np.linspace(0, 1, n_colors))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map
    for lat, lon, poi, cluster, cluster_desc in zip(
        df['Venue Latitude'], 
        df['Venue Longitude'], 
        df['Neighborhood'], 
        df['Category Number'], 
        df['Venue Category']):
    
        label = folium.Popup(str(poi) + '\n' + str(cluster_desc), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.7).add_to(venue_map)
    return venue_map

In [267]:
# create map
map_la = folium.Map(location=[34.24,-118.28], zoom_start=12)
mapVenues(venues,map_la)
map_la


In [217]:
df.sort_values('lapophalfshare',ascending=False,inplace=True)

In [268]:
westwood_venues

| | Neighborhood | Venue | Venue Latitude | Venue Longitude | Venue Category | Category Number |
|---|---|---|---|---|---|---|
| 229 | Westwood - Census Tract 265520 | Sunnin Lebanese Cafe | 34.050521 | -118.43735 | Middle Eastern Restaurant | 14 |
| 172 | Westwood - Census Tract 265510 | Starbucks | 34.059425 | -118.444725 | Coffee Shop | 1 |
| 118 | Westwood - Census Tract 265305 | Diddy Riese | 34.062739 | -118.447033 | Ice Cream Shop | 5 |
| 126 | Westwood - Census Tract 265305 | Lamonica's New York Pizza | 34.060884 | -118.446794 | Pizza Place | 2 |
| 290 | Westwood - Census Tract 265700 | Clementine | 34.060115 | -118.420531 | Café | 18 |
| 234 | Westwood - Census Tract 265520 | Fresh Corn Grill | 34.053832 | -118.440452 | American Restaurant | 7 |
| 117 | Westwood - Census Tract 265305 | In-N-Out Burger | 34.06318 | -118.448187 | Burger Joint | 24 |
| 139 | Westwood - Census Tract 265305 | Native Foods | 34.059988 | -118.445983 | Vegetarian / Vegan Restaurant | 20 |
| 253 | Westwood - Census Tract 265520 | Ramayani | 34.050484 | -118.437727 | Indonesian Restaurant | 35 |
| 39 | Westwood - Census Tract 265202 | Jersey Mike's Subs | 34.061698 | -118.444084 | Sandwich Place | 4 |
