# IBM Applied Data Science Capstone Course by Coursera
### Week 5 Final Report

### AUTHOR: SRIRAM
**_Opening a New Shopping Mall in Chennai, Tamil Nadu, India_**
- Build a dataframe of neighborhoods in Chennai, Tamil Nadu, India by scraping the data from a Wikipedia page
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new shopping mall
***
### 1. Import libraries

In [1]:
!conda install -c conda-forge beautifulsoup4 --yes

!conda install -c conda-forge geopy --yes

!conda install -c conda-forge geocoder --yes

!conda install -c conda-forge folium=0.5.0 --yes

print('Libraries installed!')


In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


### 2. Scrape data from the Wikipedia page into a DataFrame

In [3]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Suburbs_of_Chennai").text

In [4]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [5]:
# create a list to store neighborhood data
neighborhoodList = []

In [6]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)
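
To make the extraction step above easier to follow, here is a minimal, dependency-free sketch of the same idea: collect the text of every `<li>` nested inside the `<div class="mw-category">` block and ignore list items elsewhere on the page. It uses the standard library's `html.parser` instead of BeautifulSoup. The `CategoryListParser` class and the `SAMPLE_HTML` snippet are made up for illustration and only approximate the real page.

```python
from html.parser import HTMLParser

class CategoryListParser(HTMLParser):
    """Collect the text of every <li> inside <div class="mw-category">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # greater than 0 while inside the target div
        self.in_item = False  # True while inside an <li> of the target div
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # start counting at the target div, then track nested divs
            if self.div_depth or "mw-category" in classes:
                self.div_depth += 1
        elif tag == "li" and self.div_depth:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data

# made-up snippet standing in for the downloaded page
SAMPLE_HTML = """
<div class="mw-category">
  <div class="mw-category-group"><h3>A</h3>
    <ul><li><a href="/wiki/Alandur">Alandur</a></li>
        <li><a href="/wiki/Anna_Nagar">Anna Nagar</a></li></ul>
  </div>
</div>
<ul><li>Unrelated footer link</li></ul>
"""

parser = CategoryListParser()
parser.feed(SAMPLE_HTML)
neighborhoods = [item.strip() for item in parser.items]
print(neighborhoods)
```

The footer item is skipped because it sits outside the category block, which is the same effect the notebook gets by selecting the first `mw-category` div before searching for `li` tags.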

In [7]:
# create a new DataFrame from the list
chennai_df = pd.DataFrame({"Neighborhood": neighborhoodList})

chennai_df.head()

Unnamed: 0,Neighborhood
0,Alandur
1,Anna Nagar
2,"Ashok Nagar, Chennai"
3,Assisi Nagar
4,Ayanavaram


In [8]:
# print the number of rows of the dataframe
chennai_df.shape

(61, 1)

### 3. Get the geographical coordinates

In [9]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Chennai, Tamil Nadu, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
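
The `while` loop in `get_latlng` keeps retrying for as long as the geocoder returns nothing, so one neighborhood that can never be resolved would hang the notebook. Below is a minimal sketch of a bounded retry. The `lookup` argument stands in for a call such as `geocoder.arcgis(query).latlng`; `max_retries`, `delay_seconds`, and `fake_lookup` are hypothetical names introduced only for this illustration.

```python
import time

def get_latlng_bounded(neighborhood, lookup, max_retries=3, delay_seconds=0.0):
    """Return [lat, lng] from `lookup`, or None after `max_retries` failed attempts."""
    query = '{}, Chennai, Tamil Nadu, India'.format(neighborhood)
    for _ in range(max_retries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # short pause before the next attempt
    return None

# stand-in for the real geocoder: fails twice, then succeeds
calls = {"count": 0}

def fake_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [13.0, 80.2]

result = get_latlng_bounded("Alandur", fake_lookup)
never = get_latlng_bounded("Nowhere", lambda query: None, max_retries=2)
print(result, never)
```

If a variant like this were used in the notebook, any neighborhood that comes back as `None` would need to be dropped or handled before the coordinates are loaded into a DataFrame.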

In [10]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in chennai_df["Neighborhood"].tolist() ]

In [11]:
coords

[[13.00013000000007, 80.20060000000007],
 [13.083590000000072, 80.21020000000004],
 [13.035390000000064, 80.21220000000005],
 [13.164570000000026, 80.23274000000004],
 [13.09883000000002, 80.23238000000003],
 [13.074621704974275, 80.24277657646144],
 [12.932770000000062, 80.14387000000005],
 [12.95234000000005, 80.14411000000007],
 [12.988610000000051, 80.15100000000007],
 [12.82725000000005, 80.22866000000005],
 [13.040920000000028, 80.13649000000004],
 [13.11035000000004, 80.21301000000005],
 [13.129720000000077, 80.18300000000005],
 [13.120580000000075, 80.06047000000007],
 [12.956150000000036, 80.17885000000007],
 [12.793410000000051, 80.22010000000006],
 [13.081980000000044, 80.24448000000007],
 [13.051520000000039, 80.22421000000008],
 [13.136630000000025, 80.24479000000008],
 [13.131830000000036, 80.19928000000004],
 [13.096050000000048, 80.05292000000009],
 [13.116800000000069, 80.27726000000007],
 [13.183260000000075, 80.24059000000005],
 [13.157520000000034, 80.24283000000008

In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [13]:
# merge the coordinates into the original dataframe
chennai_df['Latitude'] = df_coords['Latitude']
chennai_df['Longitude'] = df_coords['Longitude']

In [14]:
# check the neighborhoods and the coordinates
print(chennai_df.shape)
chennai_df

(61, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Alandur,13.00013,80.2006
1,Anna Nagar,13.08359,80.2102
2,"Ashok Nagar, Chennai",13.03539,80.2122
3,Assisi Nagar,13.16457,80.23274
4,Ayanavaram,13.09883,80.23238
5,Chennai city,13.074622,80.242777
6,Chitlapakkam,12.93277,80.14387
7,Chromepet,12.95234,80.14411
8,Cowl Bazaar,12.98861,80.151
9,Egattur (Kanchipuram District),12.82725,80.22866


In [15]:
# save the DataFrame as CSV file
chennai_df.to_csv("chennai_df.csv", index=False)

### 4. Create a map of Chennai with neighborhoods superimposed on top

In [16]:
# get the coordinates of Chennai, Tamil Nadu, India
address = 'Chennai, Tamil Nadu, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chennai, Tamil Nadu, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chennai, Tamil Nadu, India are 13.0801721, 80.2838331.


In [17]:
# create map of Chennai using latitude and longitude values
map_chennai = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(chennai_df['Latitude'], chennai_df['Longitude'], chennai_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_chennai)  
    
map_chennai

In [18]:
# save the map as HTML file
map_chennai.save('map_chennai.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [43]:
# HIDDEN - (ENTER YOUR API CREDENTIALS) -SRIRAM
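
The credentials cell is hidden, but the request loop in the next cell expects `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` to be defined. One way to provide them without writing secrets into the notebook is to read them from environment variables, sketched below. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helper `load_foursquare_credentials`, and the `VERSION` value are assumptions made for illustration, not the author's actual setup.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from environment variables instead of hard-coding them."""
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]

# placeholder values for demonstration only
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
VERSION = "20180605"  # assumed example value; the v2 API takes a date in YYYYMMDD form
```

In real use the function would be called with no argument, so that it reads from the process environment.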


**Now, let's get the top 100 venues that are within a radius of 2000 meters.**

In [20]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(chennai_df['Latitude'], chennai_df['Longitude'], chennai_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
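
The loop above indexes straight into the JSON response, so a response with no `groups`, or a venue with an empty `categories` list, would raise an exception partway through the run. The following sketch shows a more tolerant way to flatten one response into the same seven-field tuples. The helper name `extract_venues` and the `sample_payload` dictionary are made up for illustration; the payload only mirrors the keys the loop already relies on.

```python
def extract_venues(neighborhood, lat, lng, payload):
    """Flatten one explore response into
    (neighborhood, lat, lng, name, venue_lat, venue_lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # keep the venue even when it has no category
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# made-up response with one categorized and one uncategorized venue
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 13.01, "lng": 80.21},
               "categories": [{"name": "Shopping Mall"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 13.02, "lng": 80.22},
               "categories": []}},
]}]}}

rows = extract_venues("Alandur", 13.00013, 80.2006, sample_payload)
empty = extract_venues("Alandur", 13.00013, 80.2006, {"response": {}})
print(rows, empty)
```

Venues without a category come back with `None` in the last field, so they can be filtered out later rather than stopping the loop.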

In [21]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1216, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Alandur,13.00013,80.2006,Sukkkubai Beef Biryani Shop,12.998769,80.201381,Indian Restaurant
1,Alandur,13.00013,80.2006,Pizza Republic,12.990987,80.198613,Pizza Place
2,Alandur,13.00013,80.2006,Moon & Six Pence - The Irish Bar,13.007848,80.208152,Bar
3,Alandur,13.00013,80.2006,Q Bar,13.016606,80.204853,Restaurant
4,Alandur,13.00013,80.2006,Hilton,13.016621,80.204787,Hotel


**Let's check how many venues were returned for each neighborhood**

In [22]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alandur,42,42,42,42,42,42
Anna Nagar,100,100,100,100,100,100
"Ashok Nagar, Chennai",74,74,74,74,74,74
Assisi Nagar,3,3,3,3,3,3
Ayanavaram,27,27,27,27,27,27
Chennai city,100,100,100,100,100,100
Chitlapakkam,14,14,14,14,14,14
Chromepet,21,21,21,21,21,21
Cowl Bazaar,19,19,19,19,19,19
Egattur (Kanchipuram District),17,17,17,17,17,17


**Let's find out how many unique categories can be curated from all the returned venues**

In [23]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 154 unique categories.


In [24]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:50]

array(['Indian Restaurant', 'Pizza Place', 'Bar', 'Restaurant', 'Hotel',
       'Church', 'Train Station', 'Asian Restaurant', 'Ice Cream Shop',
       'Donut Shop', 'South Indian Restaurant', "Men's Store",
       'Breakfast Spot', 'Bakery', 'Italian Restaurant',
       'Multicuisine Indian Restaurant', 'Pool Hall',
       'Japanese Restaurant', 'Café', 'Juice Bar', 'Fast Food Restaurant',
       'Department Store', 'Metro Station', 'Cafeteria', 'Gym Pool',
       'Sandwich Place', 'Gym', 'Park', 'Chinese Restaurant',
       'Snack Place', 'Coffee Shop', 'Vegetarian / Vegan Restaurant',
       'Shoe Store', 'Indian Sweet Shop', 'American Restaurant',
       'Burger Joint', 'Middle Eastern Restaurant', 'Shopping Mall',
       'Market', 'BBQ Joint', 'New American Restaurant', 'Clothing Store',
       'Paper / Office Supplies Store', 'Jewelry Store', 'Multiplex',
       'Furniture / Home Store', 'Farmers Market', 'Bookstore', 'Bistro',
       'Electronics Store'], dtype=object)

In [25]:
# check whether "Neighborhood" appears among the venue categories (the list above already includes "Shopping Mall")
"Neighborhood" in venues_df['VenueCategory'].unique()

False

### 6. Analyze Each Neighborhood

In [26]:
# one hot encoding
chennai_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chennai_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chennai_onehot.columns[-1]] + list(chennai_onehot.columns[:-1])
chennai_onehot = chennai_onehot[fixed_columns]

print(chennai_onehot.shape)
chennai_onehot.head()

(1216, 155)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Andhra Restaurant,Antique Shop,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bar,Beach,Bed & Breakfast,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Buffet,Burger Joint,Bus Line,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Coffee Shop,College Cafeteria,Concert Hall,Convenience Store,Coworking Space,Daycare,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gas Station,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym Pool,Historic Site,Hookah Bar,Hospital,Hotel,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Lake,Light Rail Station,Lounge,Malay Restaurant,Market,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Motel,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Music Store,Music Venue,National Park,New American Restaurant,Nightclub,Paper / Office Supplies Store,Park,Pharmacy,Pizza Place,Platform,Playground,Pool,Pool Hall,Pub,Racetrack,Recreation Center,Resort,Rest Area,Restaurant,River,Road,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Club,Supermarket,Taxi Stand,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Train,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Video Store,Warehouse Store,Whisky Bar,Women's Store,Zoo,Zoo Exhibit
0,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [27]:
chennai_grouped = chennai_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(chennai_grouped.shape)
chennai_grouped

(61, 155)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Andhra Restaurant,Antique Shop,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bar,Beach,Bed & Breakfast,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Buffet,Burger Joint,Bus Line,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Coffee Shop,College Cafeteria,Concert Hall,Convenience Store,Coworking Space,Daycare,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gas Station,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym Pool,Historic Site,Hookah Bar,Hospital,Hotel,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Lake,Light Rail Station,Lounge,Malay Restaurant,Market,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Motel,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Music Store,Music Venue,National Park,New American Restaurant,Nightclub,Paper / Office Supplies Store,Park,Pharmacy,Pizza Place,Platform,Playground,Pool,Pool Hall,Pub,Racetrack,Recreation Center,Resort,Rest Area,Restaurant,River,Road,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Club,Supermarket,Taxi Stand,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Train,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Video Store,Warehouse Store,Whisky Bar,Women's Store,Zoo,Zoo Exhibit
0,Alandur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.02381,0.047619,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.142857,0.0,0.0,0.02381,0.166667,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Anna Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.04,0.0,0.03,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.12,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.13,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0
2,"Ashok Nagar, Chennai",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.013514,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.013514,0.0,0.0,0.040541,0.013514,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.013514,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.040541,0.175676,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.027027,0.0,0.013514,0.040541,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.067568,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.027027,0.0,0.040541,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.040541,0.0,0.0,0.0,0.0,0.0,0.0
3,Assisi Nagar,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Ayanavaram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0
5,Chennai city,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.08,0.0,0.04,0.01,0.0,0.01,0.03,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.01,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.05,0.19,0.0,0.0,0.0,0.05,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
6,Chitlapakkam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Chromepet,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.238095,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Cowl Bazaar,0.0,0.0,0.0,0.052632,0.105263,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Egattur (Kanchipuram District),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.352941,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
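
Taking the mean of a one-hot column within a neighborhood gives the share of that neighborhood's venues that fall in the category. This is why Assisi Nagar, with only 3 venues, shows 0.333333 for each of its three categories. The dependency-free sketch below reproduces the calculation with plain counting on made-up data; the `toy_venues` list and its category mix are invented for illustration.

```python
from collections import Counter, defaultdict

# made-up (neighborhood, category) pairs
toy_venues = [
    ("A", "Indian Restaurant"), ("A", "Indian Restaurant"),
    ("A", "Shopping Mall"), ("A", "Bakery"),
    ("B", "Bus Station"), ("B", "ATM"),
]

# count venues per category within each neighborhood
counts = defaultdict(Counter)
for neighborhood, category in toy_venues:
    counts[neighborhood][category] += 1

# frequency = category count / total venues in that neighborhood,
# which equals the mean of the corresponding one-hot column
frequencies = {
    neighborhood: {category: n / sum(counter.values()) for category, n in counter.items()}
    for neighborhood, counter in counts.items()
}
print(frequencies)
```

A side effect worth keeping in mind is that neighborhoods with very few returned venues produce coarse frequencies, so a single venue can account for a large share.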


In [30]:
len(chennai_grouped[chennai_grouped["Shopping Mall"] > 0])

6

**Create a new DataFrame for Shopping Mall data only**

In [31]:
chennai_mall = chennai_grouped[["Neighborhoods","Shopping Mall"]]

In [32]:
chennai_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Alandur,0.0
1,Anna Nagar,0.02
2,"Ashok Nagar, Chennai",0.013514
3,Assisi Nagar,0.0
4,Ayanavaram,0.0


### 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Chennai into 3 clusters.

In [33]:
# set number of clusters
kclusters = 3

chennai_clustering = chennai_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 2, 0, 0, 2, 0, 0, 0, 0], dtype=int32)
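
Because the clustering input is a single column (the shopping mall frequency), k-means here amounts to splitting one-dimensional values into three ranges. The sketch below is a plain-Python version of Lloyd's algorithm, not scikit-learn's implementation. It runs on the six non-zero frequencies reported in this notebook plus three of the zero values. The function `kmeans_1d` and the starting centers are assumptions chosen for illustration.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Plain Lloyd's algorithm on scalars; returns (labels, centers)."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # assignment step: index of the nearest center for every value
        labels = [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # update step: move each center to the mean of its members
        new_centers = []
        for j, old in enumerate(centers):
            members = [v for v, label in zip(values, labels) if label == j]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers

# three zero-frequency neighborhoods plus the six non-zero values from the notebook
frequencies = [0.0, 0.0, 0.0, 0.01, 0.013514, 0.015873, 0.018868, 0.02, 0.02]
labels, centers = kmeans_1d(frequencies, centers=[0.0, 0.01, 0.02])
print(labels, centers)
```

The grouping matches the notebook's result: the zeros form one cluster, 0.01 and 0.013514 form a second, and the four highest values form a third. The label numbers themselves are arbitrary. scikit-learn happened to assign label 1 to the highest group and label 2 to the middle one.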

In [34]:
# create a new dataframe that includes the cluster label alongside the shopping mall frequency for each neighborhood
chennai_merged = chennai_mall.copy()

# add clustering labels
chennai_merged["Cluster Labels"] = kmeans.labels_

In [35]:
chennai_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
chennai_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Alandur,0.0,0
1,Anna Nagar,0.02,1
2,"Ashok Nagar, Chennai",0.013514,2
3,Assisi Nagar,0.0,0
4,Ayanavaram,0.0,0


In [36]:
# join chennai_merged with chennai_df to add latitude/longitude for each neighborhood
chennai_merged = chennai_merged.join(chennai_df.set_index("Neighborhood"), on="Neighborhood")

print(chennai_merged.shape)
chennai_merged.head() # check the last columns!

(61, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Alandur,0.0,0,13.00013,80.2006
1,Anna Nagar,0.02,1,13.08359,80.2102
2,"Ashok Nagar, Chennai",0.013514,2,13.03539,80.2122
3,Assisi Nagar,0.0,0,13.16457,80.23274
4,Ayanavaram,0.0,0,13.09883,80.23238


In [37]:
# sort the results by Cluster Labels
print(chennai_merged.shape)
chennai_merged.sort_values(["Cluster Labels"], inplace=True)
chennai_merged

(61, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Alandur,0.0,0,13.00013,80.2006
31,Navalur,0.0,0,12.84584,80.22648
32,Nazarethpettai,0.0,0,13.0371,80.05755
33,Oragadam,0.0,0,13.13744,80.15383
34,Padappai,0.0,0,12.876997,80.048508
35,Pallavaram,0.0,0,12.97444,80.14852
36,Pallikaranai,0.0,0,12.95567,80.2208
37,Pammal,0.0,0,12.96814,80.13359
38,Panambakkam,0.0,0,13.07761,80.15583
39,Pattabiram,0.0,0,13.12333,80.05944


**Finally, let's visualize the resulting clusters**

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chennai_merged['Latitude'], chennai_merged['Longitude'], chennai_merged['Neighborhood'], chennai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
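
In the cell above, `rainbow[cluster-1]` still gives each cluster its own color, because Python's negative indexing maps cluster 0 to the last entry of the list. A more direct mapping indexes the palette by the cluster label itself. The sketch below builds such a palette with the standard library's `colorsys` module rather than matplotlib's `cm.rainbow`, so the colors will differ from the notebook's. `cluster_palette` is a hypothetical helper name.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return one hex color per cluster label, with hues spaced evenly around the color wheel."""
    palette = []
    for label in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(label / n_clusters, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(3)
color_for_cluster_2 = palette[2]  # index directly by the cluster label
print(palette)
```

Each hex string can be passed to the `color` and `fill_color` arguments of `folium.CircleMarker` in the same way the notebook passes entries of `rainbow`.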

In [39]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [40]:
chennai_merged.loc[chennai_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Alandur,0.0,0,13.00013,80.2006
31,Navalur,0.0,0,12.84584,80.22648
32,Nazarethpettai,0.0,0,13.0371,80.05755
33,Oragadam,0.0,0,13.13744,80.15383
34,Padappai,0.0,0,12.876997,80.048508
35,Pallavaram,0.0,0,12.97444,80.14852
36,Pallikaranai,0.0,0,12.95567,80.2208
37,Pammal,0.0,0,12.96814,80.13359
38,Panambakkam,0.0,0,13.07761,80.15583
39,Pattabiram,0.0,0,13.12333,80.05944


#### Cluster 1

In [41]:
chennai_merged.loc[chennai_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
17,Kodambakkam,0.02,1,13.05152,80.22421
1,Anna Nagar,0.02,1,13.08359,80.2102
57,Vadapalani,0.015873,1,13.05226,80.2112
58,Virugambakkam,0.018868,1,13.0559,80.19349


#### Cluster 2

In [42]:
chennai_merged.loc[chennai_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
5,Chennai city,0.01,2,13.074622,80.242777
2,"Ashok Nagar, Chennai",0.013514,2,13.03539,80.2122


### Final Observations:

Shopping malls are concentrated in the central area of Chennai. Only 6 of the 61 neighborhoods have any shopping mall among their returned venues. The four neighborhoods in cluster 1 (Anna Nagar, Kodambakkam, Vadapalani and Virugambakkam) have the highest mall frequency, at roughly 1.6% to 2% of venues, and the two in cluster 2 (Chennai city and Ashok Nagar) have a moderate frequency of roughly 1% to 1.4%. The remaining neighborhoods, all in cluster 0, have no shopping mall in the Foursquare results. These neighborhoods are potential areas for new shopping malls, since they face little to no competition from existing malls.
Meanwhile, shopping malls in cluster 1 likely face the most competition, given the higher concentration of malls there. The results also suggest that the supply of shopping malls is concentrated in the central area of the city, while the suburbs still have very few.
Therefore, this project recommends that property developers consider opening new shopping malls in cluster 0 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that stands out from the competition could also consider cluster 2 neighborhoods, where competition is moderate.
Lastly, property developers are advised to avoid cluster 1 neighborhoods, which already have the highest concentration of shopping malls and the most competition.

P.S. The shopping mall data is fairly old, so some recently built malls may not be counted. Even so, the clustering algorithm itself works as intended!

Thank You!!