<h1 align='center'>Report - Capstone Project - The Battle of Neighborhoods (Week 4)</h1>

<h1 align='center'>(Data)</h1>

<b>We would be using two sets of data to solve our problem

1. World Cities Dataset

This dataset helps us locating the cities with their latitudes and longitudes. We use this to standerize our city names and focus on only ~26000 major cities in the world.We do this to ensure that our analysis does not corrupt due two cities having similar names.

2. Foursquare API Query

We use the explore query to get unique eatries nearby the city center within a 5 km radius. We then cluster them into groups and present them to the user wholly as a group to choose from.

A brief overview and examples of data are given below.
</b>

<b>1. Importing Libraries

In [7]:
import numpy as np
import pandas as pd

from bs4 import BeautifulSoup
import requests

from geopy.geocoders import Nominatim
import folium

from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

<b>1. Importing World cities dataset aquired from https://simplemaps.com/data/world-cities

In [8]:
cities=pd.read_csv(r'D:\Datasets\worldcities.csv')
cities.shape

(26569, 11)

In [20]:
cities.head()

Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id,new
0,Tokyo,Tokyo,35.6897,139.6922,Japan,JP,JPN,Tōkyō,primary,37977000.0,1392685764,TokyoTōkyō
1,Jakarta,Jakarta,-6.2146,106.8451,Indonesia,ID,IDN,Jakarta,primary,34540000.0,1360771077,JakartaJakarta
2,Delhi,Delhi,28.66,77.23,India,IN,IND,Delhi,admin,29617000.0,1356872604,DelhiDelhi
3,Mumbai,Mumbai,18.9667,72.8333,India,IN,IND,Mahārāshtra,admin,23355000.0,1356226629,MumbaiMahārāshtra
4,Manila,Manila,14.5958,120.9772,Philippines,PH,PHL,Manila,primary,23088000.0,1608618140,ManilaManila


<b>2. Taking the City Name as Input. Lets take Mumbai for example

In [5]:
city=input('Input ascii compatible city name with first letter in uppercase: ')
print('Searching for city {} '.format(city))

while (city not in cities['city_ascii'].to_list()):
    print(city)
    print('Name not found in database')
    print('Enter 1 to try again')
    print('Enter 0 to exit')
    aux=int(input())
    if aux==1:
        city=input('Input ascii compatible city name with first letter in uppercase: ')
        print('Searching for city {} '.format(city))
    if aux==0:
        raise KeyboardInterrupt

print('City found')

Input ascii compatible city name with first letter in uppercase: Mumbai
Searching for city Mumbai 
City found


In [6]:
idx=cities[cities['city_ascii']==city].index.values

city_latitude = cities.loc[idx[0], 'lat'] # city latitude value
city_longitude = cities.loc[idx[0], 'lng'] # city longitude value

city_name = cities.loc[idx[0], 'city_ascii'] # city name

print('Latitude and longitude values of {} are {}, {}.'.format(city_name, 
                                                               city_latitude, 
                                                               city_longitude))

Latitude and longitude values of Mumbai are 18.9667, 72.8333.


<b>2. Initializing Foursquare Attributes

In [24]:
CLIENT_ID = '...' # your Foursquare ID
CLIENT_SECRET = '...' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50 # A default Foursquare API limit value

<b>3. For Example lets take the city of Mumbai

In [26]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

<b>4. Getting eatries categorized as food

In [27]:
radius = 5000
section='food'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&section={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    radius, 
    LIMIT,
    section)

In [28]:
res = requests.get(url).json()

In [31]:
#res

In [32]:
venues = res['response']['groups'][0]['items']
    
nearby_venues_food = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_food =nearby_venues_food.loc[:, filtered_columns]

# filter the category for each row
nearby_venues_food['venue.categories'] = nearby_venues_food.apply(get_category_type, axis=1)

# clean columns
nearby_venues_food.columns = [col.split(".")[-1] for col in nearby_venues_food.columns]

nearby_venues_food.head()

Unnamed: 0,name,categories,lat,lng
0,Sarvi,Middle Eastern Restaurant,18.966853,72.829221
1,Jaffer Bhai's Delhi Darbar,Indian Restaurant,18.961417,72.823379
2,Al Rehmani,Indian Restaurant,18.961843,72.831818
3,Shalimar Restaurant,Indian Restaurant,18.95818,72.832367
4,Shree Thaker Bhojnalay,Indian Restaurant,18.951217,72.828326


In [33]:
print('{} venues were returned by Foursquare.'.format(nearby_venues_food.shape[0]))

50 venues were returned by Foursquare.


<b>5. Getting eatries categorized as cofee

In [34]:
radius = 5000
section='cofee'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&section={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    radius, 
    LIMIT,
    section)

In [35]:
res = requests.get(url).json()

In [36]:
venues = res['response']['groups'][0]['items']
    
nearby_venues_cofee = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_cofee =nearby_venues_cofee.loc[:, filtered_columns]

# filter the category for each row
nearby_venues_cofee['venue.categories'] = nearby_venues_cofee.apply(get_category_type, axis=1)

# clean columns
nearby_venues_cofee.columns = [col.split(".")[-1] for col in nearby_venues_cofee.columns]

nearby_venues_cofee.head()

Unnamed: 0,name,categories,lat,lng
0,Richardson and Cruddas,Music Venue,18.966491,72.832793
1,Sarvi,Middle Eastern Restaurant,18.966853,72.829221
2,Jaffer Bhai's Delhi Darbar,Indian Restaurant,18.961417,72.823379
3,Al Rehmani,Indian Restaurant,18.961843,72.831818
4,Shalimar Restaurant,Indian Restaurant,18.95818,72.832367


In [37]:
print('{} venues were returned by Foursquare.'.format(nearby_venues_cofee.shape[0]))

50 venues were returned by Foursquare.


<b>6. Getting eatries categorized as drinks

In [38]:
radius = 5000
section='drinks'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&section={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    radius, 
    LIMIT,
    section)

In [39]:
venues = res['response']['groups'][0]['items']
    
nearby_venues_drinks = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_drinks =nearby_venues_drinks.loc[:, filtered_columns]

# filter the category for each row
nearby_venues_drinks['venue.categories'] = nearby_venues_drinks.apply(get_category_type, axis=1)

# clean columns
nearby_venues_drinks.columns = [col.split(".")[-1] for col in nearby_venues_drinks.columns]

nearby_venues_drinks.head()

Unnamed: 0,name,categories,lat,lng
0,Richardson and Cruddas,Music Venue,18.966491,72.832793
1,Sarvi,Middle Eastern Restaurant,18.966853,72.829221
2,Jaffer Bhai's Delhi Darbar,Indian Restaurant,18.961417,72.823379
3,Al Rehmani,Indian Restaurant,18.961843,72.831818
4,Shalimar Restaurant,Indian Restaurant,18.95818,72.832367


In [40]:
res = requests.get(url).json()

In [41]:
print('{} venues were returned by Foursquare.'.format(nearby_venues_drinks.shape[0]))

50 venues were returned by Foursquare.


<b>7. Merging All three Datasets and removing duplicates

In [42]:
nearby_venues=nearby_venues_food.append(nearby_venues_cofee)
nearby_venues=nearby_venues.append(nearby_venues_drinks)

In [43]:
nearby_venues=nearby_venues.drop_duplicates()

In [44]:
nearby_venues[:]

Unnamed: 0,name,categories,lat,lng
0,Sarvi,Middle Eastern Restaurant,18.966853,72.829221
1,Jaffer Bhai's Delhi Darbar,Indian Restaurant,18.961417,72.823379
2,Al Rehmani,Indian Restaurant,18.961843,72.831818
3,Shalimar Restaurant,Indian Restaurant,18.958180,72.832367
4,Shree Thaker Bhojnalay,Indian Restaurant,18.951217,72.828326
...,...,...,...,...
42,Haji Ali Juice Centre,Juice Bar,18.978362,72.811248
43,Four Seasons,Hotel,18.994356,72.820319
46,Li Bai - St. Regis,Hotel Bar,18.994186,72.823795
47,Parsi Dairy Farm,Cheese Shop,18.946756,72.831183


We will use this data to cluster the shops and provide the categories of eatries to the customers