<h1> Capstone Project Week 1

<h2> Review Criteria

<h3>Part 1: A description of the problem and a discussion of the background. (15 marks)

Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

This submission will eventually become your Introduction/Business Problem section in your final report. So I recommend that you push the report (having your Introduction/Business Problem section only for now) to your GitHub repository and submit a link to it.

<h3> Part 2: A description of the data and how it will be used to solve the problem. (15 marks)

Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

This submission will eventually become your Data section in your final report. So I recommend that you push the report (having your Data section) to your GitHub repository and submit a link to it.

<h2> Part 1 (Problem)

The California State Capitol is located in Sacramento. Sacramento is a large city with a population of 508,519 in 2018, according to the US Census Bureau. The Capitol is in downtown Sacramento, which has a daytime population of more than 100,000 people and around 150 restaurants. <cityofsacramento.org>

Our client wants to open a new restaurant near the California State Capitol.  
They want to know: "Where in Downtown Sacramento would it be best to open a restaurant?"
Our client must decide whether to compete with many restaurants in an area of high market demand, or with fewer restaurants in an area of lower market demand. They have chosen the higher-demand option and would therefore like to see which areas are good candidates for a new restaurant.
Our client also wants the new restaurant to be of a popular type and to sit in an area that already has many restaurants nearby.

Questions to answer:
What are the most common types of restaurants?\
What areas have more restaurants?

<h2> Part 2 (Data)

The data will be gathered from the Foursquare API and from a table of ZIP codes with latitude and longitude for Sacramento. The analysis focuses on venues and their categories. For each ZIP code, the Foursquare explore endpoint is queried with the ZIP code's coordinates and the search term "restaurant", and it returns up to 100 recommended food venues within the search radius. From there I can clean, organize, and visualize the data.
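To make the shape of the data concrete, the sketch below flattens a hand-written stand-in for an explore response into flat rows. The nesting (`response`, then `groups`, then `items`, then `venue`) follows the request code used later in this notebook; the two venues and their values are invented for illustration, and `flatten_items` is a hypothetical helper, not part of the Foursquare API.

```python
# Hand-written stand-in for a Foursquare explore response (values are invented).
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Example Taqueria",
                               "location": {"lat": 38.5770, "lng": -121.4930},
                               "categories": [{"name": "Mexican Restaurant"}]}},
                    {"venue": {"name": "Example Cart",
                               "location": {"lat": 38.5781, "lng": -121.4902},
                               "categories": []}},  # a venue with no category
                ]
            }
        ]
    }
}


def flatten_items(zip_code, payload):
    """Hypothetical helper: turn one explore payload into flat row tuples."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category at all.
        category = categories[0]["name"] if categories else None
        rows.append((zip_code,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


rows = flatten_items("95814", sample_response)
for row in rows:
    print(row)
```

Each tuple corresponds to one venue row: the ZIP code that was searched, the venue name, its coordinates, and its first listed category.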

In [None]:
# importing libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle HTTP requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Plotting modules: seaborn and Matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Cleaning the ZIP code data to select the five Sacramento ZIP codes in the more populated central area (chosen with a latitude/longitude bounding box)

In [2]:
# Load the ZIP code table; read 'Zip' as a string so the codes stay text
df = pd.read_csv("us-zip-code-latitude-and-longitude.csv", sep=";", dtype={'Zip': 'str'})
# Drop unused columns and rows that repeat a latitude
df1 = df.drop(['Timezone', 'Daylight savings time flag', 'geopoint', 'State'], axis=1).drop_duplicates(subset='Latitude', keep='first')
# Keep the Sacramento rows and shorten the coordinate column names
df2 = df1[df1['City'] == 'Sacramento'].rename(columns={"Latitude": "lat", "Longitude": "lng"}).reset_index(drop=True)
print(df2.dtypes)
# Filter to the more populated areas with a latitude/longitude bounding box
df3 = df2[(df2.lat < 38.594205) & (df2.lat > 38.535795) & (df2.lng > -121.504660) & (df2.lng < -121.378090)].reset_index()
df3

Zip      object
City     object
lat     float64
lng     float64
dtype: object


   index    Zip        City        lat        lng
0      5  95818  Sacramento  38.556576 -121.49285
1      7  95819  Sacramento  38.568855 -121.44099
2     20  95816  Sacramento  38.571661 -121.46827
3     24  95814  Sacramento  38.580255 -121.49125
4     30  95817  Sacramento  38.551106 -121.45996


Using the geocoder to get the coordinates of Capitol Park in Sacramento

In [3]:
# Get latitude and longitude for Capitol Park, Sacramento
address = 'Capitol Park Sacramento, California'

geolocator = Nominatim(user_agent="usa_explorer")
location = geolocator.geocode(address)
saclatitude = location.latitude
saclongitude = location.longitude
print('The geographical coordinates of Sacramento are {}, {}.'.format(saclatitude, saclongitude))

The geographical coordinates of Sacramento are 38.5760675, -121.49144704301602.
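As a quick sanity check on the geography, the distance from the Capitol Park point to each ZIP code centroid can be estimated with the haversine formula. This is a sketch under stated assumptions: the coordinates are copied from the outputs above, `haversine_km` is an illustrative helper name, and the Earth is treated as a sphere with a mean radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates copied from the notebook outputs.
capitol_park = (38.5760675, -121.49144704301602)
zip_centroids = {
    "95818": (38.556576, -121.49285),
    "95819": (38.568855, -121.44099),
    "95816": (38.571661, -121.46827),
    "95814": (38.580255, -121.49125),
    "95817": (38.551106, -121.45996),
}

distances_km = {
    zip_code: haversine_km(*capitol_park, *centroid)
    for zip_code, centroid in zip_centroids.items()
}

# Print the ZIP codes from nearest to farthest.
for zip_code, dist in sorted(distances_km.items(), key=lambda kv: kv[1]):
    print(f"{zip_code}: {dist:.2f} km from Capitol Park")
```

Comparing these distances with the search radius gives a rough idea of how much of downtown each ZIP-centered query can cover.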


In [4]:
CLIENT_ID = 'Client_id' # your Foursquare ID
CLIENT_SECRET = 'Client_secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1900 # search radius in meters for the URL below; getNearbyVenues uses its own default
#categoryId = '4bf58dd8d48988d148941735'
search_query = 'restaurant'


# create URL with search query
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, saclatitude, saclongitude, VERSION, search_query, radius, LIMIT)
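The same URL can also be assembled from a parameter dictionary, which keeps the query string readable and leaves the escaping to the library. A minimal sketch using only the standard library; the credentials are placeholders, the coordinates are the Capitol Park values printed earlier, and no request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

params = {
    "client_id": "Client_id",          # placeholder credential
    "client_secret": "Client_secret",  # placeholder credential
    "ll": "38.5760675,-121.49144704301602",  # "lat,lng" as a single value
    "v": "20180605",
    "query": "restaurant",
    "radius": 1900,
    "limit": 100,
}

# safe="," keeps the comma in the ll value instead of escaping it as %2C.
explore_url = BASE_URL + "?" + urlencode(params, safe=",")
print(explore_url)
```

With the `requests` library, passing the same dictionary as `params=` to `requests.get` is an equivalent alternative to building the string by hand.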

In [5]:
def getNearbyVenues(names, latitudes, longitudes, radius=1200):
    """Query the Foursquare explore endpoint around each point; return one row per venue.

    The radius argument (default 1200 meters) is used here, not the global `radius`.
    """
    columns = ['Zip_codes', 'Zip_Latitude', 'Zip_Longitude',
               'Venue', 'Venue_Lat', 'Venue_Long', 'Venue_Category']

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            lat,
            lng,
            VERSION,
            search_query,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.extend([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # passing the column names here also works when no venues were returned
    return pd.DataFrame(venues_list, columns=columns)

In [6]:
Sac_Venues = getNearbyVenues(names=df3['Zip'],
                                   latitudes=df3['lat'],
                                   longitudes=df3['lng'])

95818
95819
95816
95814
95817


In [7]:
# Keep only venues whose category name contains 'Restaurant' (this excludes e.g. 'Pizza Place' and 'Café')

Sac_Venues_only_restaurant = Sac_Venues[Sac_Venues['Venue_Category']\
                                                          .str.contains('Restaurant')].reset_index(drop=True)
Sac_Venues_only_restaurant.index = np.arange(1, len(Sac_Venues_only_restaurant)+1)
print ("Shape of the Data-Frame with Venue Category only Restaurant: ", Sac_Venues_only_restaurant.shape)
#Sac_Venues_only_restaurant.head(3)

Shape of the Data-Frame with Venue Category only Restaurant:  (174, 7)
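The second question, which areas have more restaurants, comes down to counting rows per ZIP code. The sketch below does this on a small invented table that reuses the column names of `Sac_Venues_only_restaurant`; the rows are made up, so the real counts will differ.

```python
import pandas as pd

# Invented rows that mimic the columns produced by getNearbyVenues.
toy = pd.DataFrame(
    {
        "Zip_codes": ["95814", "95814", "95814", "95816", "95816", "95818"],
        "Venue": ["A", "B", "C", "D", "E", "F"],
        "Venue_Category": [
            "Mexican Restaurant", "American Restaurant", "Mexican Restaurant",
            "Thai Restaurant", "American Restaurant", "Chinese Restaurant",
        ],
    }
)

# Number of restaurants found around each ZIP code, largest first.
per_zip = toy.groupby("Zip_codes").size().sort_values(ascending=False)

# Most frequent restaurant category within each ZIP code.
top_category = (
    toy.groupby("Zip_codes")["Venue_Category"]
    .agg(lambda s: s.value_counts().idxmax())
)

print(per_zip)
print(top_category)
```

Because the 1,200 m search circles around neighboring ZIP code centroids can overlap, the same venue may appear under two ZIP codes; dropping duplicates on `Venue`, `Venue_Lat`, and `Venue_Long` before counting is one way to avoid double counting.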


In [8]:
## Show the restaurants on a map, colored by the ZIP code whose search returned them

map_restaurants = folium.Map(location=[saclatitude, saclongitude], zoom_start=13)

# one color per ZIP code, in the same order as Zip_codes
Zip_codes = ['95818', '95819', '95816', '95814', '95817']
rainbow = ['#00ff00', '#ff00ff', '#0000ff', '#ffa500', '#ff0000']

# add the ZIP code centroids as green circle markers with labels
for lat, lng, label in zip(df3.lat, df3.lng, df3.Zip):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill=True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(map_restaurants)

# add the restaurants to the map, colored by ZIP code
for lat, lon, poi, distr in zip(Sac_Venues_only_restaurant['Venue_Lat'],
                                Sac_Venues_only_restaurant['Venue_Long'],
                                Sac_Venues_only_restaurant['Venue_Category'],
                                Sac_Venues_only_restaurant['Zip_codes']):
    label = folium.Popup(str(poi) + ' ' + str(distr), parse_html=True)
    zip_color = rainbow[Zip_codes.index(distr)]  # same position in both lists
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=zip_color,
        fill=True,
        fill_color=zip_color,
        fill_opacity=0.3).add_to(map_restaurants)

map_restaurants

In [9]:
### Number of unique categories in the DataFrame and how often each occurs
count = Sac_Venues['Venue_Category'].value_counts()
print('There are {} unique categories.'.format(len(Sac_Venues['Venue_Category'].unique())))
print(count)

There are 55 unique categories.
American Restaurant                29
Mexican Restaurant                 25
Chinese Restaurant                 18
Café                               16
Pizza Place                        15
Restaurant                         14
Thai Restaurant                    13
BBQ Joint                           9
Breakfast Spot                      9
Japanese Restaurant                 8
Italian Restaurant                  8
Bakery                              8
Deli / Bodega                       8
Sandwich Place                      8
Sushi Restaurant                    8
Burger Joint                        7
New American Restaurant             6
Vietnamese Restaurant               6
French Restaurant                   5
Steakhouse                          5
Seafood Restaurant                  4
Food Truck                          4
Mediterranean Restaurant            4
Vegetarian / Vegan Restaurant       3
German Restaurant                   3
Fast Food Restaura