# IBM Applied Data Science Capstone Course by Coursera
## Week 5 Final Report
### Opening a Pizza Shop in Chennai, India

- Build a dataframe of neighborhoods in Chennai, India by scraping the data from a Wikipedia page
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a Pizza shop in Chennai

In [1]:
!pip install geocoder



In [2]:
!pip install folium



## 1. Import libraries

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


## 2. Scrape data from the Wikipedia page into a DataFrame

In [4]:
# send the GET request
response = requests.get("https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai").text

In [46]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(response, 'html.parser')

In [6]:
names = soup.find_all('td',class_ ='navbox-list navbox-odd hlist')[0].find_all('a')
names

[<a href="/wiki/Adyar,_Chennai" title="Adyar, Chennai">Adyar</a>,
 <a href="/wiki/Adambakkam" title="Adambakkam">Adambakkam</a>,
 <a href="/wiki/Alapakkam" title="Alapakkam">Alapakkam</a>,
 <a href="/wiki/Alandur" title="Alandur">Alandur</a>,
 <a href="/wiki/Alwarpet" title="Alwarpet">Alwarpet</a>,
 <a href="/wiki/Alwarthirunagar" title="Alwarthirunagar">Alwarthirunagar</a>,
 <a href="/wiki/Ambattur" title="Ambattur">Ambattur</a>,
 <a href="/wiki/Aminjikarai" title="Aminjikarai">Aminjikarai</a>,
 <a href="/wiki/Anna_Nagar" title="Anna Nagar">Anna Nagar</a>,
 <a href="/wiki/Anna_Nagar_West" title="Anna Nagar West">Anna Nagar West</a>,
 <a href="/wiki/Annanur" title="Annanur">Annanur</a>,
 <a class="mw-redirect" href="/wiki/Andarkuppam,_Chennai" title="Andarkuppam, Chennai">Andarkuppam</a>,
 <a href="/wiki/Arumbakkam" title="Arumbakkam">Arumbakkam</a>,
 <a href="/wiki/Ashok_Nagar,_Chennai" title="Ashok Nagar, Chennai">Ashok Nagar</a>,
 <a href="/wiki/Athipattu" title="Athipattu">Athipatt

In [7]:
# create a list to store neighborhood data
neighborhoodList = []
# append the data into the list
for title in names:
    neighborhoodList.append(title.text)

In [8]:

# create a new DataFrame from the list
chennai_df = pd.DataFrame({"Neighborhood": neighborhoodList})
chennai_df.head()

Unnamed: 0,Neighborhood
0,Adyar
1,Adambakkam
2,Alapakkam
3,Alandur
4,Alwarpet


In [9]:
# print the number of rows of the dataframe
chennai_df.shape

(172, 1)

## 3. Get the geographical coordinates

In [10]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Chennai, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [11]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in chennai_df["Neighborhood"].tolist() ]

In [48]:
coords

[[13.00305000000003, 80.25193000000007],
 [12.99192000000005, 80.20603000000006],
 [13.046100000000024, 80.16499000000005],
 [13.00013000000007, 80.20060000000007],
 [13.034710000000075, 80.25416000000007],
 [13.050550000000044, 80.18397000000004],
 [13.118820000000028, 80.15442000000007],
 [13.071390000000065, 80.22256000000004],
 [13.083590000000072, 80.21020000000004],
 [13.092720000000043, 80.20222000000007],
 [13.112120000000061, 80.12895000000003],
 [13.191590000000076, 80.27328000000006],
 [13.073080000000061, 80.20952000000005],
 [13.035390000000064, 80.21220000000005],
 [13.256700000000023, 80.29109000000005],
 [13.129090000000076, 80.10361000000006],
 [13.09883000000002, 80.23238000000003],
 [12.996850000000052, 80.26691000000005],
 [13.064400000000035, 80.28065000000004],
 [13.040270000000021, 80.06437000000005],
 [12.682240000000036, 79.98008000000004],
 [13.072880000000055, 80.24567000000008],
 [12.932770000000062, 80.14387000000005],
 [13.090240000000051, 80.2655700000000

In [13]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [14]:
# merge the coordinates into the original dataframe
chennai_df['Latitude'] = df_coords['Latitude']
chennai_df['Longitude'] = df_coords['Longitude']

In [49]:
# check the neighborhoods and the coordinates
print(chennai_df.shape)
chennai_df.head()

(172, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Adyar,13.00305,80.25193
1,Adambakkam,12.99192,80.20603
2,Alapakkam,13.0461,80.16499
3,Alandur,13.00013,80.2006
4,Alwarpet,13.03471,80.25416


In [16]:

# save the DataFrame as CSV file
chennai_df.to_csv("chennai_df.csv", index=False)

## 4. Create a map of Chennai with neighborhoods superimposed on top

In [17]:
# get the coordinates of chennai
address = 'Chennai, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chennai, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chennai, India are 13.0801721, 80.2838331.
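With the city centre known, the geocoded neighborhood coordinates can be sanity-checked by their distance from it. The sketch below uses the haversine formula on two coordinate pairs taken from the `coords` output above. The 40 km cut-off (`MAX_KM`) is a hypothetical threshold chosen only for illustration, not a value from the original analysis.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


centre = (13.0801721, 80.2838331)  # Chennai, as returned by Nominatim above
samples = {
    "Adyar": (13.00305, 80.25193),       # coords[0]
    "coords[20]": (12.68224, 79.98008),  # a row that looks far from the city
}

MAX_KM = 40.0  # hypothetical cut-off for flagging suspicious geocodes
distances = {name: haversine_km(*centre, *point) for name, point in samples.items()}
flagged = [name for name, d in distances.items() if d > MAX_KM]
print(distances)
print("Flagged:", flagged)
```

A flagged row is not necessarily wrong, since some listed neighborhoods are genuine outlying suburbs, but it is worth checking before its venues feed into the clustering.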


In [18]:
# create map of chennai using latitude and longitude values
map_chennai = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(chennai_df['Latitude'], chennai_df['Longitude'], chennai_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_chennai)  
    
map_chennai

## 5. Use the Foursquare API to explore the neighborhoods

In [24]:

# define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


### Now, let's get the top 100 venues within a radius of 2000 meters of each neighborhood

In [25]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(chennai_df['Latitude'], chennai_df['Longitude'], chennai_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
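The loop above indexes straight into the JSON, so it raises if a response has no `groups` entry (for example an error payload) or if a venue has an empty `categories` list. A more defensive parsing step might look like the sketch below. The helper name `extract_venues` and the second toy venue are hypothetical; the payload layout mirrors the keys used in the loop above.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Turn one explore-style response into rows, tolerating missing keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None instead of raising IndexError when no category is listed.
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


# Toy payload: the first venue is from the output in this report,
# the second is made up to exercise the empty-categories case.
toy_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "That Madras Place",
               "location": {"lat": 13.005848, "lng": 80.250726},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorised Venue",
               "location": {"lat": 13.0, "lng": 80.25},
               "categories": []}},
]}]}}

rows = extract_venues(toy_payload, "Adyar", 13.00305, 80.25193)
print(rows)
```

In the notebook this would replace the inner `for venue in results` block, with `payload = requests.get(url).json()`. Rows whose category is `None` can be dropped before one-hot encoding.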

In [50]:

# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(5249, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Adyar,13.00305,80.25193,That Madras Place,13.005848,80.250726,Café
1,Adyar,13.00305,80.25193,Anjappar,13.006757,80.250713,Indian Restaurant
2,Adyar,13.00305,80.25193,Bombay Brassiere,13.006961,80.256419,North Indian Restaurant
3,Adyar,13.00305,80.25193,ibaco,13.005864,80.251764,Ice Cream Shop
4,Adyar,13.00305,80.25193,Prems Graama Bhojanam,13.006345,80.253995,Vegetarian / Vegan Restaurant


### Let's check how many venues were returned for each neighborhood

In [29]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adambakkam,72,72,72,72,72,72
Adyar,85,85,85,85,85,85
Alandur,49,49,49,49,49,49
Alapakkam,24,24,24,24,24,24
Alwarpet,100,100,100,100,100,100
Alwarthirunagar,31,31,31,31,31,31
Ambattur,12,12,12,12,12,12
Aminjikarai,90,90,90,90,90,90
Andarkuppam,1,1,1,1,1,1
Anna Nagar,100,100,100,100,100,100


### Let's find out how many unique categories can be curated from all the returned venues

In [30]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 193 unique categories.


In [31]:

# print out the list of categories
venues_df['VenueCategory'].unique()[:]

array(['Café', 'Indian Restaurant', 'North Indian Restaurant',
       'Ice Cream Shop', 'Vegetarian / Vegan Restaurant', "Women's Store",
       'Bookstore', 'Middle Eastern Restaurant', 'Pizza Place',
       'Chinese Restaurant', 'Fast Food Restaurant', 'Juice Bar',
       'Clothing Store', 'Asian Restaurant', 'Dessert Shop', 'Rock Club',
       'Italian Restaurant', 'Bistro', 'Lounge',
       'Mediterranean Restaurant', 'Movie Theater', 'Department Store',
       'Coffee Shop', 'Grocery Store', 'Snack Place',
       'Fruit & Vegetable Store', 'College Cafeteria',
       'Performing Arts Venue', 'Sandwich Place', 'Gym',
       'Comfort Food Restaurant', 'Hotel', 'Office', 'Diner',
       'Breakfast Spot', 'Multiplex', 'Hotel Bar', 'Motorcycle Shop',
       'Light Rail Station', 'Market', 'Shopping Mall',
       'Rajasthani Restaurant', 'Donut Shop', "Men's Store", 'Bar',
       'Train Station', 'Restaurant', 'Punjabi Restaurant',
       'Frozen Yogurt Shop', 'BBQ Joint', 'Mexican Rest

In [32]:

# check if the results contain "pizza shops"
"Pizza Place" in venues_df['VenueCategory'].unique()

True

## 6. Analyze Each Neighborhood

In [33]:

# one hot encoding
chennai_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chennai_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chennai_onehot.columns[-1]] + list(chennai_onehot.columns[:-1])
chennai_onehot = chennai_onehot[fixed_columns]

print(chennai_onehot.shape)
chennai_onehot.head()

(5249, 194)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Beach,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Buffet,Burger Joint,Burmese Restaurant,Bus Line,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Library,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Cricket Ground,Daycare,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Historic Site,Hotel,Hotel Bar,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Music Store,Music Venue,National Park,New American Restaurant,Nightclub,North Indian Restaurant,Office,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Pool,Pool Hall,Pub,Punjabi Restaurant,Racetrack,Rajasthani Restaurant,Ramen Restaurant,Recreation Center,Resort,Rest Area,Restaurant,River,Road,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Surf Spot,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Trail,Train,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Whisky Bar,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Adyar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Adyar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Adyar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Adyar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Adyar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0


### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [51]:

chennai_grouped = chennai_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(chennai_grouped.shape)
chennai_grouped.head()

(170, 194)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Beach,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Buffet,Burger Joint,Burmese Restaurant,Bus Line,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Library,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Cricket Ground,Daycare,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Historic Site,Hotel,Hotel Bar,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Music Store,Music Venue,National Park,New American Restaurant,Nightclub,North Indian Restaurant,Office,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Pool,Pool Hall,Pub,Punjabi Restaurant,Racetrack,Rajasthani Restaurant,Ramen Restaurant,Recreation Center,Resort,Rest Area,Restaurant,River,Road,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Surf Spot,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Trail,Train,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Whisky Bar,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Adambakkam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.027778,0.013889,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.013889,0.055556,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.013889,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.013889,0.013889,0.125,0.0,0.0,0.0,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.013889,0.0,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.013889,0.0,0.013889,0.0,0.0,0.013889,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.055556,0.0,0.0,0.0,0.013889,0.0,0.0,0.0
1,Adyar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.011765,0.0,0.047059,0.011765,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.023529,0.011765,0.0,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.0,0.0,0.0,0.0,0.0,0.023529,0.011765,0.0,0.0,0.047059,0.188235,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.011765,0.0,0.011765,0.0,0.011765,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.023529,0.0,0.0,0.011765,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0,0.0
2,Alandur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.020408,0.0,0.020408,0.040816,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.122449,0.0,0.0,0.0,0.020408,0.204082,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Alapakkam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alwarpet,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.07,0.0,0.03,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.11,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
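To make the one-hot and group-mean steps concrete, here is a small sketch on made-up data (the neighborhoods "A" and "B" and their venues are hypothetical). Each value in the grouped frame is the share of a neighborhood's returned venues that fall in that category, so every row sums to 1.

```python
import pandas as pd

# Hypothetical venue list: four venues in A, two in B.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Café", "Pizza Place", "Café", "Hotel", "Hotel", "Café"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=int)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of 0/1 indicators is the category's share within each neighborhood.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Because the values are shares, a small neighborhood with one pizza place out of four venues (0.25) scores higher than a busy one with five out of a hundred (0.05). This matters when reading the `Pizza Place` column later.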


In [35]:
len(chennai_grouped[chennai_grouped["Pizza Place"] > 0])

92

### Create a new DataFrame for Pizza shop data only

In [36]:

chennai_pizzas_shop = chennai_grouped[["Neighborhoods","Pizza Place"]]
chennai_pizzas_shop.head()

Unnamed: 0,Neighborhoods,Pizza Place
0,Adambakkam,0.083333
1,Adyar,0.058824
2,Alandur,0.040816
3,Alapakkam,0.083333
4,Alwarpet,0.0


## 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Chennai into 3 clusters, using the Pizza Place frequency as the only feature.

In [37]:
# set number of clusters
kclusters = 3

chennai_clustering = chennai_pizzas_shop.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 1, 0, 0, 1, 1, 0], dtype=int32)
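The label numbers that k-means assigns are arbitrary: label 0 is not guaranteed to be the lowest-frequency cluster. One way to make them easier to read is to renumber clusters by ascending cluster centre. The sketch below does this with NumPy only; the function name `relabel_by_center` and the example centres are hypothetical.

```python
import numpy as np


def relabel_by_center(labels, centers):
    """Renumber cluster labels so that 0 has the smallest centre, 1 the next, etc."""
    centers = np.asarray(centers).ravel()  # shape (k, 1) -> (k,) for one feature
    order = np.argsort(centers)            # old label ids, lowest centre first
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))    # rank[old_label] = new_label
    return rank[np.asarray(labels)]


# Made-up example: old label 1 has the lowest centre, old label 2 the highest.
labels = np.array([0, 0, 1, 2, 1])
centers = np.array([[0.08], [0.01], [0.19]])
print(relabel_by_center(labels, centers))
```

In the notebook the call would be `relabel_by_center(kmeans.labels_, kmeans.cluster_centers_)`. This works because there is a single feature, so the centres have a natural order.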

In [52]:
# create a new dataframe that holds the Pizza Place frequency and the cluster label for each neighborhood
chennai_merged = chennai_pizzas_shop.copy()

# add clustering labels
chennai_merged["Cluster Labels"] = kmeans.labels_
chennai_merged.head()

Unnamed: 0,Neighborhoods,Pizza Place,Cluster Labels
0,Adambakkam,0.083333,0
1,Adyar,0.058824,0
2,Alandur,0.040816,0
3,Alapakkam,0.083333,0
4,Alwarpet,0.0,1


In [39]:
chennai_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
chennai_merged.head()

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels
0,Adambakkam,0.083333,0
1,Adyar,0.058824,0
2,Alandur,0.040816,0
3,Alapakkam,0.083333,0
4,Alwarpet,0.0,1


In [40]:
# join chennai_merged with chennai_df to add latitude/longitude for each neighborhood
chennai_merged = chennai_merged.join(chennai_df.set_index("Neighborhood"), on="Neighborhood")

print(chennai_merged.shape)
chennai_merged.head() # check the last columns!

(170, 5)


Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
0,Adambakkam,0.083333,0,12.99192,80.20603
1,Adyar,0.058824,0,13.00305,80.25193
2,Alandur,0.040816,0,13.00013,80.2006
3,Alapakkam,0.083333,0,13.0461,80.16499
4,Alwarpet,0.0,1,13.03471,80.25416
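The merged frame has 170 rows, while `chennai_df` has 172. A quick way to see which neighborhoods dropped out (for example because no venues were returned for them), and whether any names are duplicated, is a set difference. The sketch below uses short made-up name lists; "Hypothetical Nagar" is not a real entry.

```python
import pandas as pd

# Stand-ins for chennai_df["Neighborhood"] and chennai_merged["Neighborhood"].
all_names = pd.Series(["Adyar", "Adambakkam", "Alapakkam", "Hypothetical Nagar"])
grouped_names = pd.Series(["Adambakkam", "Adyar", "Alapakkam"])

# Names that were geocoded but never made it into the grouped venue data.
missing = sorted(set(all_names) - set(grouped_names))

# Names that appear more than once in the scraped list.
duplicates = all_names[all_names.duplicated()].tolist()

print("Missing from grouped data:", missing)
print("Duplicated names:", duplicates)
```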


In [53]:

# sort the results by Cluster Labels
print(chennai_merged.shape)
chennai_merged.sort_values(["Cluster Labels"], inplace=True)
chennai_merged.head()

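(170, 5)


Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
0,Adambakkam,0.083333,0,12.99192,80.20603
104,Perambur,0.083333,0,13.12247,80.23569
103,Peerkankaranai,0.111111,0,12.91224,80.09895
96,Pammal,0.086957,0,12.96814,80.13359
94,Pallavaram,0.111111,0,12.97444,80.14852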

### Finally, let's visualize the resulting clusters

In [42]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, one color per cluster label (labels are 0..kclusters-1)
for lat, lon, poi, cluster in zip(chennai_merged['Latitude'], chennai_merged['Longitude'], chennai_merged['Neighborhood'], chennai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 8. Examine Clusters

In [43]:
#Cluster 0
chennai_merged.loc[chennai_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
0,Adambakkam,0.083333,0,12.99192,80.20603
104,Perambur,0.083333,0,13.12247,80.23569
103,Peerkankaranai,0.111111,0,12.91224,80.09895
96,Pammal,0.086957,0,12.96814,80.13359
94,Pallavaram,0.111111,0,12.97444,80.14852
93,Palavanthangal,0.095238,0,12.98681,80.18673
91,Padi,0.045455,0,13.09756,80.18679
89,Nolambur,0.0625,0,13.07626,80.17169
88,Nesapakkam,0.090909,0,13.03521,80.19177
75,Meenambakkam,0.060606,0,12.98646,80.176


In [44]:
#Cluster 1
chennai_merged.loc[chennai_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
159,Vallalar Nagar,0.0,1,13.11731,80.05923
15,Avadi,0.0,1,13.12909,80.10361
126,Sholavaram,0.0,1,13.23577,80.16369
125,Shenoy Nagar,0.03,1,13.07732,80.22498
16,Ayanavaram,0.035714,1,13.09883,80.23238
123,Santhome,0.032787,1,13.02954,80.27762
122,Saidapet,0.032787,1,13.02026,80.22131
121,Sadayankuppam,0.0,1,13.19319,80.29102
120,Royapuram,0.0,1,13.11394,80.2942
119,Royapettah,0.02,1,13.0535,80.26826


In [45]:
#Cluster 2
chennai_merged.loc[chennai_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
167,Vyasarpadi,0.166667,2,13.11778,80.25168
106,Perumbakkam,0.25,2,12.90563,80.20907
57,Kovilambakkam,0.2,2,12.94047,80.18712
149,Tirusulam,0.153846,2,12.96906,80.17712
60,Madambakkam,0.2,2,12.90529,80.15352
22,Chitlapakkam,0.2,2,12.93277,80.14387
77,Mogappair,0.25,2,13.08053,80.16116
78,Moolakadai,0.142857,2,13.12838,80.24086
47,Kelambakkam,0.2,2,12.79341,80.2201
137,Tambaram,0.133333,2,12.92489,80.12818
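Rather than comparing the three tables by eye, the clusters can be summarized with a group-by. The sketch below uses six rows copied from the outputs above as a small sample; on the full data the same call would run on `chennai_merged`.

```python
import pandas as pd

# Two rows per cluster, copied from the cluster tables in this report.
sample = pd.DataFrame({
    "Neighborhood": ["Adambakkam", "Perambur", "Avadi",
                     "Shenoy Nagar", "Perumbakkam", "Mogappair"],
    "Pizza Place": [0.083333, 0.083333, 0.0, 0.03, 0.25, 0.25],
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
})

# Size and spread of the pizza-place frequency within each cluster.
summary = (sample.groupby("Cluster Labels")["Pizza Place"]
           .agg(["count", "mean", "min", "max"]))
print(summary)
```

The `min` and `max` columns show the frequency range each cluster covers, which makes it clear which label corresponds to low, medium, and high pizza-place density.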


# Observation

Pizza places make up the largest share of venues in Cluster 2 (about 13% to 25% in the rows shown above) and the smallest share in Cluster 1 (close to zero). Cluster 1 also includes promising areas such as Park Town (the city centre), Panagal Park (a top shopping area in Chennai), and Siruseri (an IT zone of Chennai). Because these neighborhoods have very few pizza shops among their nearby venues, Cluster 1 looks like a good place to open one. This project therefore recommends that the entrepreneur open a pizza shop in these locations, where there is little to no competition.