<a href="https://colab.research.google.com/github/jimmy-io/Coursera_Capstone/blob/master/LA_food.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Science Capstone Project

## Jimmy J.

### Introduction
Frank Lloyd Wright said it best: "Tip the world over on its side and everything loose will land in Los Angeles." For decades, people from all over the world have come to this city in search of home, family, love, fame, and fortune. Los Angeles welcomes this multicultural migration with a suburban sprawl that encompasses over 88 cities and even more unincorporated neighborhoods. Each of these neighborhoods is characterized by geographical, economic, and cultural features that make it uniquely poised to cater to different demographics.
Neighborhoods like Downtown Los Angeles, Culver City, Long Beach, Century City, and West Hollywood provide a mixture of urban-style living and access to grocery stores, malls, public transportation, and entertainment venues within walking distance. On the other hand, suburbs like Baldwin Hills, Crenshaw, Echo Park, and Boyle Heights are quieter neighborhoods with single-family dwellings that are not easily accessible by public transportation.
Given the vast range of neighborhoods to choose from in LA, someone looking to move here might be overwhelmed. In this project I've attempted to characterize neighborhoods in LA by the nature of the venues in their immediate vicinity, using a clustering algorithm. Results from this analysis show that neighborhoods in LA fall into a few groups, defined by the nature of the venues closest to them. This is of most interest to rental search apps like Westside Rentals or Rentpad, to real estate agents, and, more generally, to people looking to move to LA. The results of this project can help them find the neighborhoods most aligned with what they are looking for in a place to live and, overall, provide a more satisfactory experience than choosing a neighborhood at random.

### Data
The list of neighborhoods in Los Angeles was scraped from Wikipedia using BeautifulSoup. The names of these neighborhoods were then fed into Nominatim to obtain their geographical coordinates.
The venues in the vicinity of these neighborhoods were retrieved using the Foursquare API. Venues of five different categories were chosen: Travel & Transport, Arts & Entertainment, Outdoors & Recreation, Nightlife Spot, and Food. The number of venues of each category for each neighborhood was counted and then normalized to give five parameters by which to cluster the neighborhoods.
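
The counting and normalizing step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes that "normalized" means dividing each category count by the neighborhood's total venue count (the project may have used a different scheme), and the function name `category_features` and the sample data are made up for the example.

```python
from collections import Counter

# The five Foursquare categories named in the Data section.
CATEGORIES = ["Travel & Transport", "Arts & Entertainment",
              "Outdoors & Recreation", "Nightlife Spot", "Food"]

def category_features(venues_by_neighborhood):
    """Return {neighborhood: [share of venues in each category]}.

    Assumption: normalization = count / total count for the neighborhood.
    """
    features = {}
    for name, venue_categories in venues_by_neighborhood.items():
        counts = Counter(venue_categories)
        total = sum(counts[c] for c in CATEGORIES)
        # Neighborhoods with no venues get an all-zero feature vector.
        features[name] = [counts[c] / total if total else 0.0 for c in CATEGORIES]
    return features

# Made-up example data, not results from the project.
sample = {
    "Neighborhood A": ["Food", "Food", "Nightlife Spot", "Travel & Transport"],
    "Neighborhood B": [],
}
features = category_features(sample)
print(features["Neighborhood A"])  # [0.25, 0.0, 0.0, 0.25, 0.5]
```

Each neighborhood ends up with a five-element vector, which is the shape a clustering algorithm such as k-means expects as input.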

In [0]:
### Importing libraries

import re
import json # library to handle JSON files
import csv
import collections
import urllib.request

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)
import requests # library to handle HTTP requests
import bs4 as bs # BeautifulSoup, used for web scraping
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means for the clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [3]:
#@hidden_cell

CLIENT_ID = 'JS0P2BHNS4GICN4OT1LRM03JV0OLTO4QWS0I5AEITRLVI3QU' # your Foursquare ID
CLIENT_SECRET = 'KGFP21SEFHLUXM2EPAI4HDLQOAI21MC1CY24RJ4AII4UX2Q3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials: hidden')

Your credentials: hidden
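
Credentials written directly into a notebook cell are visible to anyone who can read the file. One common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and a helper called `load_foursquare_credentials`; all three are hypothetical names chosen for this example.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending an empty key.
        raise RuntimeError(
            "Set the environment variable {} before running".format(missing.args[0])
        ) from None

# Usage in the notebook (requires the variables to be set beforehand):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```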


In [2]:
# Retrieving the coordinates for Los Angeles

address = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="LA_explorer")
location = geolocator.geocode(address)
LAlatitude = location.latitude
LAlongitude = location.longitude
print('The geographical coordinates of Los Angeles are {}, {}.'.format(LAlatitude, LAlongitude))

The geographical coordinates of Los Angeles are 34.0536909, -118.2427666.
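
The Data section says the neighborhood names were also passed through Nominatim. A minimal sketch of how that loop might look is given below. The function name `geocode_neighborhoods`, the `", Los Angeles, CA"` suffix, and the one-second pause are assumptions for illustration. The function takes the geocoding callable as an argument, so it can be tried without network access.

```python
import time

def geocode_neighborhoods(names, geocode, suffix=", Los Angeles, CA", pause=1.0):
    """Map each neighborhood name to (latitude, longitude), or None if not found.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing matches
    (this is how geopy's Nominatim.geocode behaves).
    """
    coords = {}
    for name in names:
        location = geocode(name + suffix)
        coords[name] = (location.latitude, location.longitude) if location else None
        # The public Nominatim service asks for at most one request per second.
        time.sleep(pause)
    return coords

# With geopy, reusing the geolocator from the cell above:
# coords = geocode_neighborhoods(["Echo Park", "Boyle Heights"], geolocator.geocode)
```

Names that Nominatim cannot resolve come back as `None`, so they can be inspected or dropped before the venue search.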


In [4]:
## Categories and IDs from the Foursquare API

catIds = {
    'Food': '4d4b7105d754a06374d81259',
}
catIds

{'Food': '4d4b7105d754a06374d81259'}

In [5]:
for key in catIds:
    print(key)
    print(catIds[key])

Food
4d4b7105d754a06374d81259


In [0]:
### Foursquare API
LIMIT = 10000 # maximum number of venues requested from the Foursquare API
radius = 10000 # search radius in meters
categoryId = '4d4b7105d754a06374d81259' # Foursquare category ID for Food
url = 'https://api.foursquare.com/v2/venues/explore'
params = {'client_id': CLIENT_ID,
          'client_secret': CLIENT_SECRET,
          'v': VERSION,
          'll': '{},{}'.format(LAlatitude, LAlongitude),
          'radius': radius,
          'categoryId': categoryId,
          'limit': LIMIT}

# make the GET request
results = requests.get(url, params=params).json()['response']['groups'][0]['items']

# keep only the relevant information for each nearby venue
venues_list = [(v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                v['venue']['categories'][0]['name']) for v in results]

nearby_venues = pd.DataFrame(venues_list,
                             columns=['Venue',
                                      'Venue Latitude',
                                      'Venue Longitude',
                                      'Venue Category'])
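
The cell above indexes straight into the JSON response, which raises an error if a venue has no category or the response has no groups. A more defensive parser might look like the sketch below. The function name `parse_explore_items` is hypothetical, and unlike the cell above it walks every group rather than only the first. The sample payload is hand-written in the same shape; its second venue is invented to show the missing-category case.

```python
def parse_explore_items(payload):
    """Flatten an explore-style JSON payload into (name, lat, lng, category) rows."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # Some venues may come back without a category list.
            categories = venue.get("categories") or [{}]
            rows.append((venue["name"],
                         venue["location"]["lat"],
                         venue["location"]["lng"],
                         categories[0].get("name")))
    return rows

# Hand-written payload for illustration; "Example Cafe" is made up.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Redbird",
                   "location": {"lat": 34.050666, "lng": -118.244068},
                   "categories": [{"name": "American Restaurant"}]}},
        {"venue": {"name": "Example Cafe",
                   "location": {"lat": 34.05, "lng": -118.24},
                   "categories": []}},
    ]}]}
}
rows = parse_explore_items(sample_payload)
print(rows[0])  # ('Redbird', 34.050666, -118.244068, 'American Restaurant')
```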
        



In [7]:
nearby_venues

,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Redbird,34.050666,-118.244068,American Restaurant
1,Marugame Monzo,34.049807,-118.240202,Udon Restaurant
2,KazuNori: The Original Hand Roll Bar,34.047716,-118.247452,Sushi Restaurant
3,Café Dulcé,34.048869,-118.240508,Bakery
4,Sari Sari Store LA,34.051065,-118.249390,Filipino Restaurant
...,...,...,...,...
95,Mh Zh,34.089486,-118.276920,Restaurant
96,Taqueria El Atacor #1,34.088716,-118.214553,Mexican Restaurant
97,Kang Ho Dong Baek Jeong,34.063828,-118.297364,Korean Restaurant
98,Ham Ji Park,34.063633,-118.295838,Korean Restaurant


In [0]:
## Saving the nearby venues data as a CSV file to avoid repeated API calls
nearby_venues.to_csv('/content/drive/My Drive/IBM Capstone/LA_food.csv')

In [0]:
## Reading the saved nearby venues data back from the CSV file
la_food  = pd.read_csv('/content/drive/My Drive/IBM Capstone/LA_food.csv', index_col=[0])
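
The save-then-reload pattern above can be wrapped so the API is only called when no cached file exists. The sketch below uses the standard-library `csv` module instead of pandas to stay self-contained. The helper name `load_or_fetch` and its `fetch` argument are hypothetical and stand in for whatever function performs the Foursquare request.

```python
import csv
from pathlib import Path

HEADER = ["Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]

def load_or_fetch(path, fetch, header=HEADER):
    """Return venue rows from `path` if it exists; otherwise call `fetch` and cache.

    Note: rows read back from the CSV contain strings, so numeric columns
    need converting before use.
    """
    path = Path(path)
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row
            return [tuple(row) for row in reader]
    rows = [tuple(r) for r in fetch()]
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return rows
```

With pandas, `pd.read_csv` infers the numeric types on reload, which is one reason the notebook's own approach is convenient.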

In [15]:
la_food.head()

,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Redbird,34.050666,-118.244068,American Restaurant
1,Marugame Monzo,34.049807,-118.240202,Udon Restaurant
2,KazuNori: The Original Hand Roll Bar,34.047716,-118.247452,Sushi Restaurant
3,Café Dulcé,34.048869,-118.240508,Bakery
4,Sari Sari Store LA,34.051065,-118.24939,Filipino Restaurant
