# Data Science Capstone

## Problem: 
The experience of people in Chicago, Illinois in the United States varies wildly from neighborhood to neighborhood. I'd like to explore the relationship between the most common venues in a neighborhood and its key economic and sociological indicators. 

Specifically

1. What venues are most common in the areas with the lowest per-capita income 
2. how many grocery stores are there in the highest and lowest income areas per-capita

## Data:
1. Foursquare API: foursquare.com

2. Chicago Community Boundaries: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6

This data is a JSON file that will set the boundaries that are fundamental to this analysis

3. Census Data - Selected socioeconomic indicators in Chicago, 2008 – 2012: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2

This data contains multiple measures of socioeconic health, including education, housing and income. I will be leveraging the income data primarily.
4. Spreadsheet: 2010 Census Data Summarized :https://datahub.cmap.illinois.gov/dataset/2010-census-data-summarized-to-chicago-community-areas/resource/b30b47bf-bb0d-46b6-853b-47270fb7f626?inner_span=True

this contains the total poulation of each of the community areas related to my analysis

This data will allows us to separate out locations and see which venues are most prevalent. I look forward to evaluating the data and gaining deeper understanidng of a premier global city. The venue data will come from the Foursquare API. The income information will be from the socioeconomic indicators and the population totals will come from the summarized census data.

In [439]:
import requests
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import pandas as pd
import numpy as np
import json 
import folium
from pandas.io.json import json_normalize

In [440]:
#read in data files
censusData = pd.read_excel('2010_chicago_census.xlsx',header = 1)
socioData = pd.read_csv('Census_Data_-_Selected_socioeconomic_indicators_in_Chicago__2008___2012.csv')
Commareas = pd.read_csv('Commareas.csv')


In [441]:
df1 = censusData.copy()
a=df1['Geog'].copy()
b=socioData['COMMUNITY AREA NAME'].copy()
a.sort_values()
b.sort_values()
a['Bee'] = b
len(a)

78

In [442]:

CLIENT_ID = 'VYCOUITIGXOONWV3KJ34HKMDN2IAXOLBLRKF0CRIJSXXR1SJ' #  Foursquare ID
CLIENT_SECRET = 'H0ATIVHEOUNIBDCHAVN1YOT2K5GLUT4QT1HNO3VLPU00EIKB' #  Foursquare Secret
VERSION = '20180605' # Foursquare API version
search_query = 'Grocery'
radius = 3218
LIMIT = 50
address = 'Lincoln Square ,Chicago IL'
Category= '4bf58dd8d48988d118951735'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}&CategoryID={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT,Category)


42.2669968 -71.7984322


In [443]:
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)


In [444]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,No Name Grocery,Convenience Store,,US,,United States,,1443,"[Massachusetts, United States]","[{'label': 'display', 'lat': 42.25611821887266...",42.256118,-71.788906,,Massachusetts,4e550b1efa761d3de843d8e0
1,canal grocery & deli,Grocery Store,155 Millbury St,US,Worcester,United States,,1739,"[155 Millbury St, Worcester, MA 01610, United ...","[{'label': 'display', 'lat': 42.251411, 'lng':...",42.251411,-71.79988,1610.0,MA,4e42ebb8814dfebd20e283a4
2,4 Seasons Grocery & Deli,Deli / Bodega,442 Park Ave,US,Worcester,United States,,2539,"[442 Park Ave, Worcester, MA 01610, United Sta...","[{'label': 'display', 'lat': 42.25547, 'lng': ...",42.25547,-71.825035,1610.0,MA,5678a7f1498e820bd558a71e
3,San Miguel Grocery,Grocery Store,Canterbury St,US,Worcester,United States,Grand St,2708,"[Canterbury St (Grand St), Worcester, MA 01603...","[{'label': 'display', 'lat': 42.24647080025062...",42.246471,-71.816085,1603.0,MA,51129fa8e4b0874ae8d1b794
4,G&M Grocery,Smoke Shop,504 Southbridge Street,US,Worcester,United States,,2800,"[504 Southbridge Street, Worcester, MA, United...","[{'label': 'display', 'lat': 42.243419, 'lng':...",42.243419,-71.81028,,MA,4d7d5c6f86cfa1436977dca0
5,"Bahnan's International Marketplace, Bakery And...",Mediterranean Restaurant,344 Pleasant St,US,Worcester,United States,,1319,"[344 Pleasant St, Worcester, MA 01609, United ...","[{'label': 'display', 'lat': 42.263784750315, ...",42.263785,-71.813855,1609.0,MA,4c375e751e06d13a68cc753e
6,Price Rite,Supermarket,542 Southbridge St,US,Worcester,United States,,2388,"[542 Southbridge St, Worcester, MA 01610, Unit...","[{'label': 'display', 'lat': 42.24721035559493...",42.24721,-71.809641,1610.0,MA,4b5b51f7f964a52059f428e3


In [445]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on Chicago


In [446]:
# add the Italian restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

In [450]:
col1=[]#name of neighborhood
col2=[]#number of grocery stores
col3=[]#loc
CLIENT_ID = 'VYCOUITIGXOONWV3KJ34HKMDN2IAXOLBLRKF0CRIJSXXR1SJ' #  Foursquare ID
CLIENT_SECRET = 'H0ATIVHEOUNIBDCHAVN1YOT2K5GLUT4QT1HNO3VLPU00EIKB' #  Foursquare Secret
VERSION = '20180605' # Foursquare API version
search_query = 'Grocery'
radius = 3218
LIMIT = 50


for i in a:
    address = str(i)+' ,Chicago IL'
    Category= '4bf58dd8d48988d118951735'

    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    #print(latitude, longitude)
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}&CategoryID={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT,Category)
    results = requests.get(url).json()
    # assign relevant part of JSON to venues
    venues = results['response']['venues']
    # tranform venues into a dataframe
    df_data = json_normalize(venues)
    nresult= len(df_data)
    col1.append(i)
    col2.append(nresult)
    col3.append(location)
    print(i,'  ',nresult)
print(col1,col2)
    
    

Rogers Park    23
West Ridge    24


GeocoderTimedOut: Service timed out

In [None]:
dl = {'Neighborhood': col1, '# Grocers': col2,'Location':col3}
DF = pd.DataFrame(data=dl)