# IBM Applied Data Science Capstone Course by Coursera
### Week 5 Final Report
**_Opening a New Shopping Mall in Visakhapatnam, India_**
- Build a dataframe of neighborhoods in Visakhapatnam, India by scraping the data from a Wikipedia page
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new shopping mall
***
### 1. Import libraries

In [55]:
%pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform a JSON file into a pandas dataframe (pandas.io.json.json_normalize was removed in pandas 2.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


### 2. Scrape data from a Wikipedia page into a DataFrame

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Visakhapatnam").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)
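The loop above depends on the category page wrapping each neighborhood link in an `<li>` inside `<div class="mw-category">`. The sketch below illustrates that structure offline: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` and runs on a small hypothetical HTML snippet, which is simplified and not the exact markup Wikipedia serves. The class name `CategoryListParser` is made up for this example.

```python
from html.parser import HTMLParser


class CategoryListParser(HTMLParser):
    """Hypothetical parser: collect <li> text nested inside <div class="mw-category">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside the target div (tracks nested divs)
        self.in_li = False
        self.current = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside the category block
            elif "mw-category" in classes:
                self.div_depth = 1           # entered the category block
        elif tag == "li" and self.div_depth > 0:
            self.in_li = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "li" and self.in_li:
            self.items.append("".join(self.current).strip())
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.current.append(data)


# Simplified, hypothetical markup; the last <li> is outside the category block.
html = """
<div id="mw-pages">
  <div class="mw-category mw-category-columns">
    <div class="mw-category-group"><h3>A</h3>
      <ul><li><a href="/wiki/Abidnagar">Abidnagar</a></li>
          <li><a href="/wiki/Adarsh_Nagar">Adarsh Nagar</a></li></ul>
    </div>
  </div>
</div>
<ul><li>Not a neighborhood</li></ul>
"""
parser = CategoryListParser()
parser.feed(html)
print(parser.items)  # ['Abidnagar', 'Adarsh Nagar']
```

BeautifulSoup's `class_="mw-category"` likewise matches any element whose class list contains `mw-category`, which is why the split-on-whitespace check is used here.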

In [6]:
# create a new DataFrame from the list
vizag = pd.DataFrame({"Neighborhood": neighborhoodList})

vizag.head()

| | Neighborhood |
|---|---|
| 0 | Abidnagar |
| 1 | Adarsh Nagar |
| 2 | Adavivaram |
| 3 | Aganampudi |
| 4 | Akkayyapalem |


In [7]:
# print the number of rows of the dataframe
vizag.shape

(124, 1)

### 3. Get the geographical coordinates

In [8]:
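# define a function to get coordinates
def get_latlng(neighborhood, max_tries=5):
    # the geocoder can return no result, so retry a bounded number of times
    lat_lng_coords = None
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Visakha Patnam, India'.format(neighborhood))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            break
    # return [None, None] on failure so the Latitude/Longitude columns still line up
    return lat_lng_coords if lat_lng_coords is not None else [None, None]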

In [10]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in vizag["Neighborhood"].tolist() ]

In [11]:
coords

[[17.73786000000007, 83.29888000000005],
 [17.763910000000067, 83.33169000000004],
 [17.785830000000033, 83.25242000000009],
 [17.689040000000034, 83.13988000000006],
 [17.734210000000076, 83.29713000000004],
 [17.708720000000028, 83.20904000000007],
 [17.720270000000028, 83.29758000000004],
 [17.68984000000006, 83.00175000000007],
 [17.877720000000068, 83.30459000000008],
 [17.596290000000067, 83.20243000000005],
 [17.768430000000023, 83.31107000000003],
 [17.72276000000005, 83.31078000000008],
 [17.565500000000043, 82.98174000000006],
 [17.743340000000046, 83.31052000000005],
 [17.812513400301874, 83.40788937588022],
 [17.889350000000036, 83.45037000000008],
 [17.70595000000003, 83.19796000000008],
 [17.72477000845818, 83.30994999190324],
 [17.693350000000066, 83.29211000000004],
 [17.681190000000072, 83.19786000000005],
 [17.719840000000033, 83.26278000000008],
 [17.726720000000057, 83.33061000000004],
 [17.80147000000005, 83.22367000000008],
 [17.873140000000035, 82.18573000000004]

In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [20]:
# merge the coordinates into the original dataframe
vizag['Latitude'] = df_coords['Latitude']
vizag['Longitude'] = df_coords['Longitude']

In [21]:
# check the neighborhoods and the coordinates
print(vizag.shape)
vizag

(124, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Abidnagar | 17.73786 | 83.29888 |
| 1 | Adarsh Nagar | 17.76391 | 83.33169 |
| 2 | Adavivaram | 17.78583 | 83.25242 |
| 3 | Aganampudi | 17.68904 | 83.13988 |
| 4 | Akkayyapalem | 17.73421 | 83.29713 |
| 5 | Akkireddypalem | 17.70872 | 83.20904 |
| 6 | Allipuram | 17.72027 | 83.29758 |
| 7 | Anakapalle | 17.68984 | 83.00175 |
| 8 | Anandapuram | 17.87772 | 83.30459 |
| 9 | Appikonda | 17.59629 | 83.20243 |


In [22]:
# save the DataFrame as CSV file
vizag.to_csv("vizag.csv", index=False)

### 4. Create a map of Visakhapatnam with neighborhoods superimposed on top

In [56]:
# get the coordinates of Visakhapatnam
address = 'VisakhaPatnam, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Visakhapatnam, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Visakhapatnam, India are 17.7231276, 83.3012842.


In [57]:
# create map of Visakhapatnam using latitude and longitude values
map_vizag = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(vizag['Latitude'], vizag['Longitude'], vizag['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_vizag)

map_vizag

In [58]:
# save the map as HTML file
map_vizag.save('map_vizag.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [59]:
# define Foursquare Credentials and Version
CLIENT_ID = '***********************' # your Foursquare ID
CLIENT_SECRET = '************************' # your Foursquare Secret (redacted here before publishing)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ***********************
CLIENT_SECRET: ************************


**Now, let's get the top 100 venues that are within a radius of 2000 meters.**

In [29]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(vizag['Latitude'], vizag['Longitude'], vizag['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
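The cell above builds the request URL with `str.format`, which does not escape parameter values. A minimal alternative, assuming the same parameter names as the URL above, is to let `urllib.parse.urlencode` do the escaping; passing a dict as `requests.get(url, params=...)` achieves the same thing. `build_explore_url` and the `MY_ID`/`MY_SECRET` placeholders are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint taken from the request URL used in the cell above.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=2000, limit=100):
    """Hypothetical helper: build the venues/explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 17.73786, 83.29888)
print(url)
# Decoding the query string recovers the original values.
print(parse_qs(urlparse(url).query)["ll"])  # ['17.73786,83.29888']
```

The comma in `ll` is percent-encoded as `%2C`, which a server decodes back to a comma.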

In [30]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2515, 7)


| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Abidnagar | 17.73786 | 83.29888 | Pizza Hut | 17.72665 | 83.305531 | Pizza Place |
| 1 | Abidnagar | 17.73786 | 83.29888 | Sai Ram Parlour | 17.726339 | 83.303465 | Indian Restaurant |
| 2 | Abidnagar | 17.73786 | 83.29888 | Sangam Sarat Theatre | 17.725508 | 83.302463 | Indie Movie Theater |
| 3 | Abidnagar | 17.73786 | 83.29888 | Shoppers Stop | 17.729061 | 83.314433 | Fabric Shop |
| 4 | Abidnagar | 17.73786 | 83.29888 | Deepak Punjabi Dhaba | 17.723782 | 83.309922 | Indian Restaurant |
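Each venue should lie within the requested `radius` (2000 m) of its neighborhood's coordinates. A quick check is sketched here with the haversine formula on a spherical Earth, an approximation that may differ slightly from however the API measures distance, using Abidnagar and its first returned venue from the table above. `haversine_m` is a helper written for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Abidnagar's geocoded point and the first venue returned for it (values from the table above)
d = haversine_m(17.73786, 83.29888, 17.72665, 83.305531)
print(f"{d:.0f} m")  # roughly 1.4 km, inside the 2000 m radius
```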


**Let's check how many venues were returned for each neighborhood**

In [31]:
venues_df.groupby(["Neighborhood"]).count()

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Abidnagar | 19 | 19 | 19 | 19 | 19 | 19 |
| Adarsh Nagar | 7 | 7 | 7 | 7 | 7 | 7 |
| Adavivaram | 1 | 1 | 1 | 1 | 1 | 1 |
| Aganampudi | 7 | 7 | 7 | 7 | 7 | 7 |
| Akkayyapalem | 19 | 19 | 19 | 19 | 19 | 19 |
| Akkireddypalem | 2 | 2 | 2 | 2 | 2 | 2 |
| Allipuram | 40 | 40 | 40 | 40 | 40 | 40 |
| Anakapalle | 5 | 5 | 5 | 5 | 5 | 5 |
| Anandapuram | 1 | 1 | 1 | 1 | 1 | 1 |
| Arilova | 3 | 3 | 3 | 3 | 3 | 3 |
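`groupby(...).count()` repeats the same tally in every column; `venues_df.groupby("Neighborhood").size()` returns it once per neighborhood. The tally matters for the frequencies computed later: a neighborhood with n venues can only have category frequencies in steps of 1/n, so neighborhoods with one or two venues get coarse, noisy values. The plain-Python sketch below flags such neighborhoods using the counts shown above; the `MIN_VENUES` threshold is a hypothetical choice, not part of the original analysis.

```python
# Venue counts per neighborhood, copied from the first rows of the table above.
venue_counts = {
    "Abidnagar": 19, "Adarsh Nagar": 7, "Adavivaram": 1, "Aganampudi": 7,
    "Akkayyapalem": 19, "Akkireddypalem": 2, "Allipuram": 40, "Anakapalle": 5,
    "Anandapuram": 1, "Arilova": 3,
}

MIN_VENUES = 5  # hypothetical threshold

# With n venues, a category's frequency can only be a multiple of 1/n.
sparse = sorted(name for name, n in venue_counts.items() if n < MIN_VENUES)
for name in sparse:
    step = 1 / venue_counts[name]
    print(f"{name}: {venue_counts[name]} venue(s), frequency step = {step:.3f}")
```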


**Let's find out how many unique categories can be curated from all the returned venues**

In [32]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 107 unique categories.


In [33]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Pizza Place', 'Indian Restaurant', 'Indie Movie Theater',
       'Fabric Shop', 'Café', 'Shopping Mall',
       'Vegetarian / Vegan Restaurant', 'Multiplex', 'Park', 'Hotel',
       'Cricket Ground', 'Platform', 'Stadium', 'Volleyball Court',
       'Bakery', 'History Museum', 'Moving Target', 'Bus Station',
       'Mountain', 'Historic Site', 'Beach', 'ATM', 'Train Station',
       'Bookstore', 'Dessert Shop', 'Drive-in Theater', 'Ice Cream Shop',
       'Italian Restaurant', 'Clothing Store', 'Fast Food Restaurant',
       'Restaurant', 'Mobile Phone Shop', 'Pet Store', 'Movie Theater',
       'Lake', 'Pharmacy', 'Golf Course', 'Snack Place',
       'Multicuisine Indian Restaurant', 'Breakfast Spot', 'Food Court',
       'Steakhouse', 'Coffee Shop', 'Juice Bar', 'Department Store',
       'Sandwich Place', 'Paper / Office Supplies Store', 'Spa',
       'Andhra Restaurant', 'Dhaba'], dtype=object)

In [60]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [38]:
# one hot encoding
vizag_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vizag_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vizag_onehot.columns[-1]] + list(vizag_onehot.columns[:-1])
vizag_onehot = vizag_onehot[fixed_columns]

print(vizag_onehot.shape)
vizag_onehot.head()

(2515, 108)


Unnamed: 0,Neighborhoods,ATM,Airport,American Restaurant,Andhra Restaurant,Antique Shop,Asian Restaurant,Bakery,Bar,Beach,Boat or Ferry,Bookstore,Breakfast Spot,Bus Station,Business Service,Cafeteria,Café,Campground,Candy Store,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Drive-in Theater,Electronics Store,Fabric Shop,Farmers Market,Fast Food Restaurant,Fish Market,Food,Food Court,Food Stand,Food Truck,Garden Center,Gastropub,Gift Shop,Go Kart Track,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Health Food Store,Historic Site,History Museum,Hockey Arena,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Inn,Italian Restaurant,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mattress Store,Mobile Phone Shop,Motel,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Service,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Port,Pub,Racetrack,Resort,Rest Area,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Shopping Mall,Smoke Shop,Snack Place,Spa,Stadium,Steakhouse,Supermarket,Tea Room,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Volleyball Court,Women's Store
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Next, let's group rows by neighborhood, taking the mean of each one-hot category column to get that category's frequency of occurrence**

In [39]:
vizag_grouped = vizag_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(vizag_grouped.shape)
vizag_grouped

(111, 108)


Unnamed: 0,Neighborhoods,ATM,Airport,American Restaurant,Andhra Restaurant,Antique Shop,Asian Restaurant,Bakery,Bar,Beach,Boat or Ferry,Bookstore,Breakfast Spot,Bus Station,Business Service,Cafeteria,Café,Campground,Candy Store,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Drive-in Theater,Electronics Store,Fabric Shop,Farmers Market,Fast Food Restaurant,Fish Market,Food,Food Court,Food Stand,Food Truck,Garden Center,Gastropub,Gift Shop,Go Kart Track,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Health Food Store,Historic Site,History Museum,Hockey Arena,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Inn,Italian Restaurant,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mattress Store,Mobile Phone Shop,Motel,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Service,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Port,Pub,Racetrack,Resort,Rest Area,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Shopping Mall,Smoke Shop,Snack Place,Spa,Stadium,Steakhouse,Supermarket,Tea Room,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Volleyball Court,Women's Store
0,Abidnagar,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.157895,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0
1,Adarsh Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Adavivaram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aganampudi,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.428571,0.0,0.0,0.0,0.0
4,Akkayyapalem,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.157895,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0
5,Akkireddypalem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Allipuram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.175,0.0,0.025,0.125,0.1,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.075,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.05,0.075,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.025,0.0,0.0,0.0
7,Anakapalle,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0
8,Anandapuram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Arilova,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
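`get_dummies` followed by `groupby(...).mean()` turns each neighborhood into the share of its returned venues that fall in each category. For example, Abidnagar has 19 venues, one of which is a shopping mall, hence 1/19 ≈ 0.052632 in its Shopping Mall column. The same computation is sketched below in plain Python on hypothetical rows (not the real Foursquare results), to make the one-hot and averaging steps explicit.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, category) rows standing in for venues_df.
rows = [
    ("Neighborhood A", "Shopping Mall"),
    ("Neighborhood A", "Café"),
    ("Neighborhood A", "Café"),
    ("Neighborhood A", "Bakery"),
    ("Neighborhood B", "Park"),
]

categories = sorted({category for _, category in rows})

# One-hot encoding: one 0/1 vector per venue row, one position per category.
one_hot = [(nbhd, [int(category == c) for c in categories]) for nbhd, category in rows]

# Group by neighborhood: sum the vectors, then divide by the number of venues.
totals = defaultdict(lambda: [0] * len(categories))
counts = Counter()
for nbhd, vec in one_hot:
    counts[nbhd] += 1
    totals[nbhd] = [t + v for t, v in zip(totals[nbhd], vec)]

frequencies = {
    nbhd: {c: t / counts[nbhd] for c, t in zip(categories, total)}
    for nbhd, total in totals.items()
}
print(frequencies["Neighborhood A"])
# {'Bakery': 0.25, 'Café': 0.5, 'Park': 0.0, 'Shopping Mall': 0.25}
```

Each neighborhood's frequencies sum to 1, which is a quick sanity check on the real `vizag_grouped` rows as well.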


In [40]:
len(vizag_grouped[vizag_grouped["Shopping Mall"] > 0])

41

**Create a new DataFrame for Shopping Mall data only**

In [41]:
vizag_mall = vizag_grouped[["Neighborhoods","Shopping Mall"]]

In [42]:
vizag_mall.head()

| | Neighborhoods | Shopping Mall |
|---|---|---|
| 0 | Abidnagar | 0.052632 |
| 1 | Adarsh Nagar | 0.0 |
| 2 | Adavivaram | 0.0 |
| 3 | Aganampudi | 0.0 |
| 4 | Akkayyapalem | 0.052632 |


### 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Visakhapatnam into 3 clusters based on their Shopping Mall frequency.

In [45]:
# set number of clusters
kclusters = 3

vizag_clustering = vizag_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(vizag_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 0, 1, 0, 1, 1, 1])
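Because `vizag_clustering` has a single column, the Shopping Mall frequency, k-means here simply partitions a number line into three groups. The pure-Python sketch of Lloyd's algorithm below makes that explicit on a handful of frequencies taken from the cluster tables in section 8. It is not scikit-learn's implementation: it uses a deterministic spread initialization rather than the k-means++ seeding `KMeans` uses by default, and its label numbers are arbitrary, so they need not match the notebook's cluster numbers. `kmeans_1d` is a hypothetical helper.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster one numeric feature into k groups with Lloyd's algorithm.

    Returns (labels, centroids). Initialization picks k evenly spaced values
    from the sorted distinct values, not k-means++.
    """
    distinct = sorted(set(values))
    if len(distinct) < k:
        raise ValueError("need at least k distinct values")
    step = (len(distinct) - 1) / max(k - 1, 1)
    centroids = [distinct[round(i * step)] for i in range(k)]

    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Shopping Mall frequencies taken from rows shown in the cluster tables (a small subset).
freqs = [0.0, 0.0, 0.018519, 0.035714, 0.052632, 0.073171, 0.166667, 0.2]
labels, centroids = kmeans_1d(freqs, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

The split mirrors the notebook's result: near-zero frequencies, moderate ones around 0.03 to 0.07, and the two high values.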

In [46]:
# copy the neighborhood / Shopping Mall frequency table so the cluster labels can be added to it
vizag_merged = vizag_mall.copy()

# add clustering labels
vizag_merged["Cluster Labels"] = kmeans.labels_

In [47]:
vizag_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
vizag_merged.head()

| | Neighborhood | Shopping Mall | Cluster Labels |
|---|---|---|---|
| 0 | Abidnagar | 0.052632 | 0 |
| 1 | Adarsh Nagar | 0.0 | 1 |
| 2 | Adavivaram | 0.0 | 1 |
| 3 | Aganampudi | 0.0 | 1 |
| 4 | Akkayyapalem | 0.052632 | 0 |


In [48]:
# join vizag_merged with vizag to add latitude/longitude for each neighborhood
vizag_merged = vizag_merged.join(vizag.set_index("Neighborhood"), on="Neighborhood")

print(vizag_merged.shape)
vizag_merged.head() # check the last columns!

(111, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abidnagar | 0.052632 | 0 | 17.73786 | 83.29888 |
| 1 | Adarsh Nagar | 0.0 | 1 | 17.76391 | 83.33169 |
| 2 | Adavivaram | 0.0 | 1 | 17.78583 | 83.25242 |
| 3 | Aganampudi | 0.0 | 1 | 17.68904 | 83.13988 |
| 4 | Akkayyapalem | 0.052632 | 0 | 17.73421 | 83.29713 |


In [49]:
# sort the results by Cluster Labels
print(vizag_merged.shape)
vizag_merged.sort_values(["Cluster Labels"], inplace=True)
vizag_merged

(111, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abidnagar | 0.052632 | 0 | 17.73786 | 83.29888 |
| 23 | Daspalla Hills | 0.035714 | 0 | 17.718191 | 83.317069 |
| 73 | Poorna Market | 0.029412 | 0 | 17.70683 | 83.29814 |
| 25 | Dondaparthy | 0.051282 | 0 | 17.72661 | 83.29744 |
| 92 | Siripuram, Visakhapatnam | 0.035294 | 0 | 17.72121 | 83.31686 |
| 27 | Dwaraka Nagar | 0.073171 | 0 | 17.73579 | 83.30378 |
| 91 | Shivaji Palem | 0.035714 | 0 | 17.73761 | 83.32577 |
| 89 | Seethammapeta | 0.056604 | 0 | 17.73429 | 83.31058 |
| 88 | Seethammadhara | 0.058824 | 0 | 17.74067 | 83.31072 |
| 79 | Ramnagar, Visakhapatnam | 0.05 | 0 | 17.72119 | 83.30907 |


**Finally, let's visualize the resulting clusters**

In [50]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels run from 0 to kclusters - 1)
for lat, lon, poi, cluster in zip(vizag_merged['Latitude'], vizag_merged['Longitude'], vizag_merged['Neighborhood'], vizag_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [51]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [52]:
vizag_merged.loc[vizag_merged['Cluster Labels'] == 0]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abidnagar | 0.052632 | 0 | 17.73786 | 83.29888 |
| 23 | Daspalla Hills | 0.035714 | 0 | 17.718191 | 83.317069 |
| 73 | Poorna Market | 0.029412 | 0 | 17.70683 | 83.29814 |
| 25 | Dondaparthy | 0.051282 | 0 | 17.72661 | 83.29744 |
| 92 | Siripuram, Visakhapatnam | 0.035294 | 0 | 17.72121 | 83.31686 |
| 27 | Dwaraka Nagar | 0.073171 | 0 | 17.73579 | 83.30378 |
| 91 | Shivaji Palem | 0.035714 | 0 | 17.73761 | 83.32577 |
| 89 | Seethammapeta | 0.056604 | 0 | 17.73429 | 83.31058 |
| 88 | Seethammadhara | 0.058824 | 0 | 17.74067 | 83.31072 |
| 79 | Ramnagar, Visakhapatnam | 0.05 | 0 | 17.72119 | 83.30907 |


#### Cluster 1

In [53]:
vizag_merged.loc[vizag_merged['Cluster Labels'] == 1]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 69 | Peda Waltair | 0.018519 | 1 | 17.731494 | 83.334313 |
| 67 | One Town (Visakhapatnam) | 0.0 | 1 | 17.71984 | 83.26278 |
| 74 | Pothinamallayya Palem | 0.0 | 1 | 17.80187 | 83.35312 |
| 70 | Pedagantyada | 0.0 | 1 | 17.6668 | 83.21039 |
| 66 | Nidigattu | 0.0 | 1 | 17.87796 | 83.37371 |
| 75 | Prahaladapuram | 0.0 | 1 | 17.76064 | 83.22227 |
| 65 | Nathayyapalem | 0.0 | 1 | 17.71099 | 83.20239 |
| 71 | Pendurthi | 0.0 | 1 | 17.8199 | 83.20574 |
| 87 | Scindia, Visakhapatnam | 0.0 | 1 | 17.69106 | 83.26918 |
| 80 | Ravindra Nagar | 0.0 | 1 | 17.7656 | 83.32455 |


#### Cluster 2

In [54]:
vizag_merged.loc[vizag_merged['Cluster Labels'] == 2]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 90 | Sheela Nagar | 0.166667 | 2 | 17.71927 | 83.19642 |
| 58 | Mulagada | 0.2 | 2 | 17.69859 | 83.22464 |
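The observations below compare clusters by their share of shopping malls. In pandas, `vizag_merged.groupby("Cluster Labels")["Shopping Mall"].agg(["count", "mean", "max"])` would give that summary over all 111 neighborhoods. The stdlib sketch below computes the same statistics over only the rows displayed in the three tables above, so its numbers are partial and illustrative.

```python
from statistics import mean

# Shopping Mall frequencies copied from the Cluster 0/1/2 tables above.
# Only the displayed rows are included, so the summary is partial.
displayed = {
    0: [0.052632, 0.035714, 0.029412, 0.051282, 0.035294,
        0.073171, 0.035714, 0.056604, 0.058824, 0.05],
    1: [0.018519] + [0.0] * 9,
    2: [0.166667, 0.2],
}

# (number of rows, mean frequency, max frequency) per cluster
summary = {label: (len(f), mean(f), max(f)) for label, f in displayed.items()}
for label, (n, avg, top) in sorted(summary.items()):
    print(f"Cluster {label}: n={n}, mean={avg:.4f}, max={top:.4f}")
```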


#### Observations:
Visakhapatnam is a fast-developing, growing city. Measured as the share of each neighborhood's returned venues that are shopping malls (a relative frequency, not an absolute count), cluster 2 (Sheela Nagar and Mulagada, about 0.17 to 0.20) has the highest concentration of shopping malls and cluster 0 a moderate one (about 0.03 to 0.07 in the rows shown). Cluster 0 covers central neighborhoods such as Dwaraka Nagar, Siripuram and Daspalla Hills, so most of the city's shopping malls sit in the central area. Cluster 1, on the other hand, has very few or no shopping malls in its neighborhoods, which include suburban areas such as Pendurthi and Pedagantyada.

This represents a great opportunity: cluster 1 neighborhoods are high-potential areas for new shopping malls, as there is little to no competition from existing ones. Meanwhile, shopping malls in cluster 2 are likely suffering from intense competition due to oversupply and a high concentration of malls. Seen from another angle, the supply of shopping malls is concentrated in the central part of the city, while the suburbs still have very few.

Therefore, this project recommends that property developers:
- open new shopping malls in cluster 1 neighborhoods, where there is little to no competition;
- consider cluster 0 neighborhoods, where competition is moderate, if they have a unique selling proposition that helps them stand out;
- avoid cluster 2 neighborhoods, which already have a high concentration of shopping malls and are likely suffering from intense competition.