# The Battle of the Neighborhoods: Visakhapatnam City, INDIA
## Capstone Project (Week-2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Importing Libraries](#import_libs)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

People who move to a new place usually explore it and gather as much information about it as they can: the neighborhood, the locality, the markets, property prices, and more. In effect they are running a search over features such as population, median house price, school ratings, crime rates, weather conditions, and recreational facilities. An application that compares neighborhoods on these factors side by side would make that search much easier.

This project gives the end user or stakeholder a set of recommended neighborhoods without a lengthy manual search, saving both time and money.

This project can be used when renting an apartment or buying a house in a locality, because it describes each neighborhood by the facilities available around it. As an example, the project would compare two randomly picked neighborhoods and analyse the 10 most common venue categories in each of them. The project also uses K-means clustering, an unsupervised machine learning algorithm, to cluster the venues by place category such as restaurants, parks, coffee shops, gyms, and clubs. The clusters show how the chosen neighborhoods resemble and differ from one another, which makes it easier to conclude which neighborhood wins over the others.

## Data <a name="data"></a>

Foursquare API:
The Foursquare API has a database of more than 105 million places, and this project uses it as its primary data source. Many organizations use it to geo-tag their photos with detailed information about a destination and to serve contextually relevant locations to people searching for a place to eat, drink, or explore. The API supports location search, location sharing, and business details. Photos, tips, and reviews from Foursquare users can also add value to the results.

Visakhapatnam City Data:
To get the borough, pincode, and neighbourhood data, we scrape a webpage ( https://www.indiatvnews.com/pincode/andhra-pradesh/visakhapatnam ) and build our own dataset from it.

## Methodology <a name="methodology"></a>

* HTTP requests are made to a geocoding API (TrueWay Geocoding on RapidAPI) using the pincodes and names of the Visakhapatnam neighborhoods to obtain each location's latitude and longitude.
* The Foursquare API explore endpoint is used to collect the places near each neighborhood.
* Because of HTTP request limits, the number of places per neighborhood is capped at 100 and the search radius is set to 500 meters.
* Folium, a Python visualization library, is used to plot the neighborhoods and their clusters on an interactive Leaflet map of Visakhapatnam.
* A comparative analysis of the neighborhoods is carried out with Python's scientific libraries Pandas, NumPy, and Scikit-learn to derive insights from the results.
* K-means clustering, an unsupervised machine learning algorithm, is applied to group the neighborhoods by the categories of places found in and around them. The resulting clusters are analyzed individually, collectively, and comparatively to draw the conclusions.

## Importing Libraries <a name="import_libs"></a>

In [1]:
# installing dependencies
!pip install beautifulsoup4
!pip install geopy
!pip install folium

# importing dependencies
import requests
from bs4 import BeautifulSoup
import pandas as pd
import folium
import numpy as np

# import k-means from clustering stage
from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors



## Analysis <a name="analysis"></a>

### Scraping Data from the Web Page


In [2]:
#scraping data required from webpage
page = requests.get("https://www.indiatvnews.com/pincode/andhra-pradesh/visakhapatnam")
soup = BeautifulSoup(page.content, 'html.parser')
table = soup.find('table', class_='alt')
table_rows = table.find_all('tr')


In [3]:
#Converting scraped data into dataframe
# this array will hold the table data
temp = []

# add one sub-array per table row
for tr in table_rows:
    td = tr.find_all('td')
    row = [d.text.strip() for d in td]
    
    if row and row[1] != "NA":
        temp.append(row)



__Converting array with stored data into dataframe__

In [4]:
# creating dataframe out of mentioned array
df = pd.DataFrame(data=temp, columns=['Neighbourhood', 'Borough', 'District', 'State', 'Pincode'])
df = df.drop(['District', 'State'], axis=1)
df = df.iloc[1:]
print(df.shape)
print(df)

(695, 3)
                   Neighbourhood                Borough Pincode
1                   A Kothapalle             Chodavaram  531022
2               A Veeranarayanam             Chodavaram  531027
3                        Adakula            Narsipatnam  531115
4                      Addumanda             Chodavaram  531077
5                         Adduru             Anakapalle  531035
6                   Aduguluputtu             Chodavaram  531040
7                       Alamanda             Chodavaram  531030
8                   Amal College             Anakapalle  531002
9                     Amalapuram            Narsipatnam  531117
10              Ameena Sahebpeta             Anakapalle  531055
11                   Ammulapalem             Anakapalle  531002
12                  Amruthpuram.             Anakapalle  531035
13                    Anakapalle             Anakapalle  531001
14                   Anandapuram             Chodavaram  531022
15                     Annavara

### Data Pre-Processing

In [5]:
# borough spellings that belong to the Visakhapatnam city area
listof_Borough = ['Visakhapatnam Rural','Visakhapatnam.','Anandapuram','Gajuwaka','Viskhapatnam','Pendurthi','Bheemunipatnam','Visakhapatnam (Rural)','Visakhapatnam','Visakhapatnam (Urban)','Pedagantyada']
# keep only those rows; copy so later assignments do not act on a view
df = df[df['Borough'].isin(listof_Borough)].copy()

In [6]:
df['Borough'] = df['Borough'].replace(['Visakhapatnam', 'Visakhapatnam.', 'Viskhapatnam', 'Visakhapatnam (Urban)'], 'Visakhapatnam Urban')

In [7]:
df['Borough'] = df['Borough'].replace(['Visakhapatnam Rural ', 'Visakhapatnam (Rural)'], 'Visakhapatnam Rural')

In [8]:
df['Borough'].value_counts()

Visakhapatnam Urban    114
Pedagantyada            16
Visakhapatnam Rural     11
Bheemunipatnam           7
Pendurthi                5
Gajuwaka                 1
Anandapuram              1
Name: Borough, dtype: int64

In [9]:
print("Shape of dataframe before removing duplicates")
print(df.shape)

print("Shape of dataframe after removing duplicates")
df = df.drop_duplicates(subset="Pincode")
print(df.shape)


Shape of dataframe before removing duplicates
(155, 3)
Shape of dataframe after removing duplicates
(42, 3)
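
The drop from 155 to 42 rows follows from `drop_duplicates(subset="Pincode")`, which by default keeps only the first row seen for each pincode. The dependency-free sketch below mirrors that keep-first behaviour on a toy list; the second `530003` row is a hypothetical example, not taken from the scraped data.

```python
def keep_first_per_pincode(rows):
    """Keep the first (neighbourhood, borough, pincode) row seen for each pincode."""
    seen = set()
    kept = []
    for neighbourhood, borough, pincode in rows:
        if pincode not in seen:
            seen.add(pincode)
            kept.append((neighbourhood, borough, pincode))
    return kept


rows = [
    ("A U Engg College", "Visakhapatnam Urban", "530003"),
    ("Andhra University", "Visakhapatnam Urban", "530003"),  # hypothetical duplicate pincode
    ("Aganampudi", "Gajuwaka", "530053"),
]
deduped = keep_first_per_pincode(rows)
print(deduped)
```

One consequence worth keeping in mind: every later result is labelled with whichever neighbourhood happened to come first for a pincode, so one name stands in for the whole pincode area.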


In [10]:
# setting pincode as index
df = df.set_index('Pincode')

print("The dataframe shown below will be used for further analysis")
print(df.head())

# add placeholder columns for latitude and longitude
df["Latitude"] = np.nan
df["Longitude"] = np.nan

df_location = pd.DataFrame()

The dataframe shown below will be used for further analysis
            Neighbourhood              Borough
Pincode                                       
530003   A U Engg College  Visakhapatnam Urban
530053         Aganampudi             Gajuwaka
530016       Akkayyapalem  Visakhapatnam Urban
531162             Amanam  Visakhapatnam Urban
531219       Ananthavaram  Visakhapatnam Urban


### Getting Geo-Coordinates from RapidAPI

__Latitude and Longitude values for all the Pincodes and Neighbourhoods__

In [11]:
import os

def fetchLatLng(postal_code, Neighbourhood, max_attempts=3):
    """Geocode one neighbourhood and store its coordinates in df."""
    url = "https://trueway-geocoding.p.rapidapi.com/Geocode"
    address = '{}, {}'.format(Neighbourhood, postal_code)

    querystring = {"language": "en", "country": "IN", "address": address}

    headers = {
        'x-rapidapi-host': "trueway-geocoding.p.rapidapi.com",
        # read the key from the environment instead of hard-coding it;
        # RAPIDAPI_KEY is an assumed variable name
        'x-rapidapi-key': os.environ["RAPIDAPI_KEY"]
    }

    # retry a bounded number of times; an empty result no longer raises IndexError
    for _ in range(max_attempts):
        response = requests.get(url, headers=headers, params=querystring)
        results = response.json().get('results', [])
        if results:
            location = results[0]['location']
            df.loc[postal_code, 'Latitude'] = location['lat']
            df.loc[postal_code, 'Longitude'] = location['lng']
            print('Latitude: {} & longitude: {}'.format(location['lat'], location['lng']))
            return

    print('No coordinates found for {}'.format(address))
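
The geocoding step relies on the response having the shape `result['results'][0]['location']` with `lat` and `lng` keys, as used in the cell above. A minimal sketch of reading that shape defensively, so that an address with no match returns `None` instead of raising, could look like this. The helper name `extract_lat_lng` and the sample dictionaries are illustrative assumptions; only the key names come from the notebook.

```python
def extract_lat_lng(result):
    """Return (lat, lng) from a geocoding response dict, or None if there is no match."""
    matches = result.get("results") or []
    if not matches:
        return None
    location = matches[0].get("location") or {}
    if "lat" not in location or "lng" not in location:
        return None
    return location["lat"], location["lng"]


# sample payloads shaped like the response used in the notebook
sample_hit = {"results": [{"location": {"lat": 17.728796, "lng": 83.32006}}]}
sample_miss = {"results": []}

print(extract_lat_lng(sample_hit))
print(extract_lat_lng(sample_miss))
```

Separating the parsing from the HTTP call also makes it possible to test the parsing without spending API quota.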

In [48]:
for index, row in df.iterrows():
    fetchLatLng(index, row['Neighbourhood'])

Latitude: 17.728796 & longitude: 83.32006
Latitude: 17.686458 & longitude: 83.137761
Latitude: 17.735264 & longitude: 83.299945
Latitude: 17.95882 & longitude: 83.461684
Latitude: 17.967477 & longitude: 83.293723
Latitude: 17.766662 & longitude: 83.313473
Latitude: 17.699139 & longitude: 83.19169
Latitude: 17.656005 & longitude: 83.197277
Latitude: 17.901088 & longitude: 83.410967
Latitude: 17.802383 & longitude: 83.204811
Latitude: 17.784847 & longitude: 83.199743
Latitude: 17.709179 & longitude: 83.308139
Latitude: 17.716129 & longitude: 83.300084
Latitude: 17.753749 & longitude: 83.346186
Latitude: 17.567015 & longitude: 83.116667
Latitude: 17.764484 & longitude: 83.218805
Latitude: 17.697024 & longitude: 83.146614
Latitude: 17.699766 & longitude: 83.296101
Latitude: 17.685077 & longitude: 83.210427
Latitude: 17.681529 & longitude: 83.261769
Latitude: 17.820534 & longitude: 83.342368
Latitude: 17.781622 & longitude: 83.377482
Latitude: 17.720747 & longitude: 83.284695
Latitude: 17.7

In [49]:
df.sort_index()

| Pincode | Neighbourhood | Borough | Latitude | Longitude |
|---|---|---|---|---|
| 530001 | Fortward | Visakhapatnam Urban | 17.6998 | 83.2961 |
| 530002 | D.C. Buildings | Visakhapatnam Urban | 17.7092 | 83.3081 |
| 530003 | A U Engg College | Visakhapatnam Urban | 17.7288 | 83.3201 |
| 530004 | Gnanapuram | Visakhapatnam Urban | 17.7207 | 83.2847 |
| 530005 | Gandhigram Visakhapatnam | Pedagantyada | 17.6815 | 83.2618 |
| 530007 | Industrial Estate Visakhapatnam | Visakhapatnam Urban | 17.7447 | 83.2686 |
| 530008 | IRSD Area | Visakhapatnam Urban | 17.7253 | 83.2637 |
| 530009 | Marripalem Vuda Colony | Visakhapatnam Urban | 17.7459 | 83.2446 |
| 530011 | Malkapuram | Visakhapatnam Urban | 17.6855 | 83.2433 |
| 530012 | Autonagar | Visakhapatnam Urban | 17.6991 | 83.1917 |


### Create a map of Visakhapatnam with neighborhoods superimposed on top

In [50]:
# map centre as numeric coordinates
vsp_latitude = 17.6868
vsp_longitude = 83.2185

map_vsp = folium.Map(location=[vsp_latitude, vsp_longitude], zoom_start=12)

# one circle marker per neighbourhood, labelled with its name
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
    ).add_to(map_vsp)

map_vsp

### Use the Foursquare API to explore the neighborhoods

In [51]:
import os

# define Foursquare credentials and version
# the credentials are read from environment variables (assumed names) so they stay out of the notebook
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


__Now, let's get the top 100 venues that are within a radius of 500 meters.__

In [52]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, neighborhood, borough in zip(df['Latitude'], df['Longitude'], df.index,df['Neighbourhood'], df['Borough']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()
    item = results["response"]['groups'][0]['items']
   
    
    for venue in item:
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
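
The loop above indexes `results["response"]['groups'][0]['items']` and `venue['venue']['categories'][0]` directly, so a response without groups or a venue without categories would raise. A hedged sketch of the same flattening step with those cases handled is shown below. The helper `flatten_venues`, the `"Uncategorized"` placeholder, and the second sample venue are assumptions for illustration; the nested key names are the ones the notebook already uses.

```python
def flatten_venues(response_json, pincode, borough, neighborhood, lat, lng):
    """Turn one explore-style response into rows shaped like the notebook's venue tuples."""
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # placeholder label (an assumption) for venues that carry no category
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((
            pincode, borough, neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venkatadri Vantillu",
               "location": {"lat": 17.725012, "lng": 83.320543},
               "categories": [{"name": "Andhra Restaurant"}]}},
    # hypothetical venue without a category
    {"venue": {"name": "Unnamed Venue",
               "location": {"lat": 17.7251, "lng": 83.3174},
               "categories": []}},
]}]}}

rows = flatten_venues(sample, "530003", "Visakhapatnam Urban", "A U Engg College", 17.728796, 83.32006)
for row in rows:
    print(row)
```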

In [53]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Pincode', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(77, 9)


|  | Pincode | Borough | Neighborhood | BoroughLatitude | BoroughLongitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 530003 | Visakhapatnam Urban | A U Engg College | 17.728796 | 83.32006 | Venkatadri Vantillu | 17.725012 | 83.320543 | Andhra Restaurant |
| 1 | 530003 | Visakhapatnam Urban | A U Engg College | 17.728796 | 83.32006 | Fresh Choice | 17.725168 | 83.317429 | Convenience Store |
| 2 | 530053 | Gajuwaka | Aganampudi | 17.686458 | 83.137761 | Domino's Pizza | 17.686542 | 83.139156 | Pizza Place |
| 3 | 530053 | Gajuwaka | Aganampudi | 17.686458 | 83.137761 | aganampudi panchayat office | 17.689123 | 83.135063 | Warehouse Store |
| 4 | 530016 | Visakhapatnam Urban | Akkayyapalem | 17.735264 | 83.299945 | Manikanda Bakery | 17.734526 | 83.301597 | Bakery |


__Let's check how many venues were returned for each Pincode__

In [54]:
venues_df.groupby(["Pincode", "Borough", "Neighborhood"]).count()

| Pincode | Borough | Neighborhood | BoroughLatitude | BoroughLongitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|---|
| 530001 | Visakhapatnam Urban | Fortward | 1 | 1 | 1 | 1 | 1 | 1 |
| 530002 | Visakhapatnam Urban | D.C. Buildings | 5 | 5 | 5 | 5 | 5 | 5 |
| 530003 | Visakhapatnam Urban | A U Engg College | 2 | 2 | 2 | 2 | 2 | 2 |
| 530004 | Visakhapatnam Urban | Gnanapuram | 1 | 1 | 1 | 1 | 1 | 1 |
| 530007 | Visakhapatnam Urban | Industrial Estate Visakhapatnam | 2 | 2 | 2 | 2 | 2 | 2 |
| 530008 | Visakhapatnam Urban | IRSD Area | 1 | 1 | 1 | 1 | 1 | 1 |
| 530009 | Visakhapatnam Urban | Marripalem Vuda Colony | 2 | 2 | 2 | 2 | 2 | 2 |
| 530011 | Visakhapatnam Urban | Malkapuram | 2 | 2 | 2 | 2 | 2 | 2 |
| 530012 | Visakhapatnam Urban | Autonagar | 1 | 1 | 1 | 1 | 1 | 1 |
| 530013 | Visakhapatnam Urban | P & T Colony VM | 2 | 2 | 2 | 2 | 2 | 2 |


__Let's find out how many unique categories can be curated from all the returned venues__

In [55]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 47 unique categories.


In [56]:
venues_df['VenueCategory'].unique()[:50]

array(['Andhra Restaurant', 'Convenience Store', 'Pizza Place',
       'Warehouse Store', 'Bakery', 'Indian Restaurant',
       'African Restaurant', 'Electronics Store', 'Pet Service',
       'Pharmacy', 'Business Service', 'Spa', 'Hotel',
       'Arts & Crafts Store', 'Seafood Restaurant', 'Indie Movie Theater',
       'Multiplex', 'Fast Food Restaurant', 'Clothing Store',
       'Coffee Shop', 'Movie Theater', 'Restaurant', 'Tea Room',
       'Dessert Shop', 'ATM', 'Harbor / Marina', 'Shopping Mall',
       'Sandwich Place', 'Scenic Lookout', 'Hot Dog Joint',
       'Ice Cream Shop', 'Beach', 'Tennis Court', 'Train Station', 'Park',
       'Café', 'Motel', 'Smoke Shop', 'Gift Shop', 'Cafeteria', 'Bar',
       'Outdoor Supply Store', 'Lounge', 'Department Store',
       'Sculpture Garden', 'Historic Site', 'Mobile Phone Shop'],
      dtype=object)

### Analyze Each Area

In [57]:
# one hot encoding
vsp_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
vsp_onehot['Pincode'] = venues_df['Pincode'] 
vsp_onehot['Borough'] = venues_df['Borough'] 
vsp_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(vsp_onehot.columns[-3:]) + list(vsp_onehot.columns[:-3])
vsp_onehot = vsp_onehot[fixed_columns]

print(vsp_onehot.shape)
vsp_onehot.head()

(77, 50)


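|  | Pincode | Borough | Neighborhoods | ATM | African Restaurant | Andhra Restaurant | Arts & Crafts Store | Bakery | Bar | Beach | ... | Scenic Lookout | Sculpture Garden | Seafood Restaurant | Shopping Mall | Smoke Shop | Spa | Tea Room | Tennis Court | Train Station | Warehouse Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 530003 | Visakhapatnam Urban | A U Engg College | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 530003 | Visakhapatnam Urban | A U Engg College | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 530053 | Gajuwaka | Aganampudi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 530053 | Gajuwaka | Aganampudi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 530016 | Visakhapatnam Urban | Akkayyapalem | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |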


__Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category.__

In [58]:
vsp_grouped = vsp_onehot.groupby(["Pincode", "Borough", "Neighborhoods"]).mean().reset_index()

print(vsp_grouped.shape)
vsp_grouped



(27, 50)


|  | Pincode | Borough | Neighborhoods | ATM | African Restaurant | Andhra Restaurant | Arts & Crafts Store | Bakery | Bar | Beach | ... | Scenic Lookout | Sculpture Garden | Seafood Restaurant | Shopping Mall | Smoke Shop | Spa | Tea Room | Tennis Court | Train Station | Warehouse Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 530001 | Visakhapatnam Urban | Fortward | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 530002 | Visakhapatnam Urban | D.C. Buildings | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 530003 | Visakhapatnam Urban | A U Engg College | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 530004 | Visakhapatnam Urban | Gnanapuram | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 530007 | Visakhapatnam Urban | Industrial Estate Visakhapatnam | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
| 5 | 530008 | Visakhapatnam Urban | IRSD Area | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 530009 | Visakhapatnam Urban | Marripalem Vuda Colony | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 530011 | Visakhapatnam Urban | Malkapuram | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 530012 | Visakhapatnam Urban | Autonagar | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 530013 | Visakhapatnam Urban | P & T Colony VM | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
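
Taking the mean of one-hot columns within a group is the same as computing each category's share of that group's venues. The small sketch below shows this without pandas, assuming `(pincode, category)` pairs as input; the helper name `category_frequencies` is illustrative. The three sample rows reuse venues listed earlier for pincodes 530003 and 530004.

```python
from collections import Counter, defaultdict


def category_frequencies(venue_rows):
    """Map each pincode to {category: share of that pincode's venues}."""
    counts = defaultdict(Counter)
    for pincode, category in venue_rows:
        counts[pincode][category] += 1
    return {
        pincode: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for pincode, counter in counts.items()
    }


venue_rows = [
    ("530003", "Andhra Restaurant"),
    ("530003", "Convenience Store"),
    ("530004", "Bakery"),
]
freqs = category_frequencies(venue_rows)
print(freqs["530003"]["Andhra Restaurant"])  # 0.5, one of two venues
print(freqs["530004"]["Bakery"])             # 1.0, the only venue
```

The values 0.5 and 1.0 match the `Andhra Restaurant` and `Bakery` entries in the grouped table for those two pincodes. Because each row is a share, a neighborhood with a single venue gets a frequency of 1.0 for that category, which weighs heavily in the clustering that follows.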


__Now let's create the new dataframe and display the top 10 venues for each Pincode.__

In [59]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['Pincode', 'Borough', 'Neighborhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the "th" suffix
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Pincode'] = vsp_grouped['Pincode']
neighborhoods_venues_sorted['Borough'] = vsp_grouped['Borough']
neighborhoods_venues_sorted['Neighborhoods'] = vsp_grouped['Neighborhoods']

for ind in np.arange(vsp_grouped.shape[0]):
    row_categories = vsp_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted

(27, 13)


|  | Pincode | Borough | Neighborhoods | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 530001 | Visakhapatnam Urban | Fortward | Harbor / Marina | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop | Department Store |
| 1 | 530002 | Visakhapatnam Urban | D.C. Buildings | Indian Restaurant | Spa | Arts & Crafts Store | Seafood Restaurant | Hotel | Coffee Shop | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 2 | 530003 | Visakhapatnam Urban | A U Engg College | Andhra Restaurant | Convenience Store | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |
| 3 | 530004 | Visakhapatnam Urban | Gnanapuram | Bakery | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 4 | 530007 | Visakhapatnam Urban | Industrial Estate Visakhapatnam | Tennis Court | Bakery | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |
| 5 | 530008 | Visakhapatnam Urban | IRSD Area | Park | Warehouse Store | Ice Cream Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 6 | 530009 | Visakhapatnam Urban | Marripalem Vuda Colony | Hotel | Outdoor Supply Store | Ice Cream Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 7 | 530011 | Visakhapatnam Urban | Malkapuram | Smoke Shop | Bar | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |
| 8 | 530012 | Visakhapatnam Urban | Autonagar | Business Service | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 9 | 530013 | Visakhapatnam Urban | P & T Colony VM | Department Store | Restaurant | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |
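
In the table above, neighborhoods with only one or two returned venues still list ten "most common" categories. Since the ranking sorts all category frequencies in descending order, the remaining slots appear to be filled by categories whose frequency is zero. A possible variant, sketched here as an assumption rather than as the notebook's method, keeps only categories that actually occur. The helper `top_venues` and the alphabetical tie-break are illustrative choices.

```python
def top_venues(frequencies, num_top=10):
    """Return up to num_top categories with non-zero frequency, most frequent first."""
    nonzero = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # sort by descending frequency, then alphabetically to make ties deterministic
    ranked = sorted(nonzero, key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in ranked[:num_top]]


# frequencies shaped like one row of the grouped table
row = {"Bakery": 0.5, "Tennis Court": 0.5, "Warehouse Store": 0.0, "Coffee Shop": 0.0}
print(top_venues(row))
```

With this variant a neighborhood with two venues would show two categories, which makes the sparsity of the data visible instead of hiding it behind filler entries.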


### Cluster Areas

Run k-means to cluster the Visakhapatnam areas into 5 clusters.

In [62]:
# set number of clusters
kclusters = 5

# keep only the numeric category-frequency columns for clustering
vsp_grouped_clustering = vsp_grouped.drop(["Pincode", "Borough", "Neighborhoods"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vsp_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 0, 0, 0, 0, 0, 0, 0, 2, 0], dtype=int32)
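
To make the clustering step less of a black box, here is a minimal pure-Python sketch of the basic k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is a teaching sketch with fixed starting centroids, not scikit-learn's implementation, which uses k-means++ initialisation by default among other refinements. The four toy frequency vectors over the categories (ATM, Pharmacy, Bakery, Café) are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Basic Lloyd iteration; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid for every point
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: centroid becomes the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# toy frequency vectors over (ATM, Pharmacy, Bakery, Café); values are illustrative
points = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.75, 0.25, 0.0, 0.0], [0.0, 0.0, 0.75, 0.25]]
```

The sketch also hints at why single-venue neighborhoods tend to end up in clusters of their own: a vector with one entry equal to 1.0 is far from every vector that does not share that category.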

In [64]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
vsp_merged = vsp_grouped.copy()
vsp_merged = vsp_merged.loc[:,["Pincode", "Borough", "Neighborhoods"]]
pncodes = vsp_grouped['Pincode'].values

# look up each pincode's coordinates in df, which is indexed by Pincode
vsp_merged["Latitude"] = df.loc[pncodes, "Latitude"].values
vsp_merged["Longitude"] = df.loc[pncodes, "Longitude"].values

# add clustering labels
vsp_merged["Cluster Labels"] = kmeans.labels_

# join the top 10 venue columns onto each pincode
vsp_merged = vsp_merged.join(neighborhoods_venues_sorted.drop(["Borough", "Neighborhoods"], axis=1).set_index("Pincode"), on="Pincode")

print(vsp_merged.shape)
vsp_merged.head() # check the last columns!



(27, 16)


|  | Pincode | Borough | Neighborhoods | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 530001 | Visakhapatnam Urban | Fortward | 17.699766 | 83.296101 | 3 | Harbor / Marina | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop | Department Store |
| 1 | 530002 | Visakhapatnam Urban | D.C. Buildings | 17.709179 | 83.308139 | 0 | Indian Restaurant | Spa | Arts & Crafts Store | Seafood Restaurant | Hotel | Coffee Shop | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 2 | 530003 | Visakhapatnam Urban | A U Engg College | 17.728796 | 83.32006 | 0 | Andhra Restaurant | Convenience Store | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |
| 3 | 530004 | Visakhapatnam Urban | Gnanapuram | 17.720747 | 83.284695 | 0 | Bakery | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 4 | 530007 | Visakhapatnam Urban | Industrial Estate Visakhapatnam | 17.744701 | 83.268643 | 0 | Tennis Court | Bakery | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |


In [65]:
print(vsp_merged.shape)
vsp_merged.sort_values(["Cluster Labels"], inplace=True)

(27, 16)


In [66]:
vsp_merged

|  | Pincode | Borough | Neighborhoods | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 530017 | Visakhapatnam Urban | L B Colony | 17.732859 | 83.336835 | 0 | Café | Gift Shop | Smoke Shop | Bakery | Motel | Cafeteria | Hotel | Dessert Shop | Warehouse Store | Department Store |
| 24 | 530045 | Visakhapatnam Rural | Gitam Engg. College | 17.781622 | 83.377482 | 0 | Ice Cream Shop | Tennis Court | Fast Food Restaurant | Beach | Scenic Lookout | Sandwich Place | Hot Dog Joint | Convenience Store | Coffee Shop | Historic Site |
| 23 | 530043 | Visakhapatnam Urban | Dayalnagar | 17.753749 | 83.346186 | 0 | Tea Room | Dessert Shop | Restaurant | Warehouse Store | Clothing Store | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |
| 20 | 530032 | Pedagantyada | Ukkunagaram | 17.650206 | 83.138085 | 0 | Mobile Phone Shop | Warehouse Store | Ice Cream Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 18 | 530028 | Visakhapatnam Rural | Simhachalam Hills | 17.769236 | 83.247113 | 0 | Historic Site | Warehouse Store | Coffee Shop | Hot Dog Joint | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop | Department Store |
| 17 | 530027 | Visakhapatnam Urban | Gopalapatnam | 17.747975 | 83.217924 | 0 | Indian Restaurant | Train Station | African Restaurant | Andhra Restaurant | Hotel | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 16 | 530026 | Visakhapatnam Urban | Gajuwaka | 17.685077 | 83.210427 | 0 | Shopping Mall | Movie Theater | Hotel | Pharmacy | Warehouse Store | Coffee Shop | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 15 | 530020 | Visakhapatnam Urban | Dabagardens | 17.716129 | 83.300084 | 0 | Indie Movie Theater | Indian Restaurant | Clothing Store | Fast Food Restaurant | Movie Theater | Multiplex | Hotel | Coffee Shop | Business Service | Bakery |
| 12 | 530016 | Visakhapatnam Urban | Akkayyapalem | 17.735264 | 83.299945 | 0 | Indian Restaurant | African Restaurant | Bakery | Electronics Store | Convenience Store | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 11 | 530015 | Visakhapatnam Urban | Zinc Smelter P | 17.689752 | 83.219599 | 0 | Ice Cream Shop | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |


### Create a map of Visakhapatnam with all Clusters superimposed on top

In [67]:
# map centre as numeric coordinates
vsp_latitude = 17.6868
vsp_longitude = 83.2185

# create map
map_clusters = folium.Map(location=[vsp_latitude, vsp_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster in zip(vsp_merged['Latitude'], vsp_merged['Longitude'], vsp_merged['Pincode'], vsp_merged['Borough'], vsp_merged['Neighborhoods'], vsp_merged['Cluster Labels']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
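
The map cell indexes the palette with `rainbow[cluster-1]`, so label 0 wraps around to the last colour through negative indexing. Each label still gets its own colour, but the mapping is easy to misread. A small sketch of a palette indexed directly by cluster label, using only the standard library, might look like this; the helper `cluster_palette` and the chosen saturation and brightness values are illustrative assumptions.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hex colours, one per cluster label."""
    palette = []
    for k in range(n_clusters):
        # spread the hues evenly around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
for label, colour in enumerate(palette):
    print(label, colour)
```

A marker for cluster label `c` would then use `palette[c]` for both `color` and `fill_color`.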

## Results and Discussion <a name="results"></a>

### Examine the Clusters

#### Cluster 1

In [68]:
vsp_merged.loc[vsp_merged['Cluster Labels'] == 0, vsp_merged.columns[[2] + list(range(5, vsp_merged.shape[1]))]]

|  | Neighborhoods | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | L B Colony | 0 | Café | Gift Shop | Smoke Shop | Bakery | Motel | Cafeteria | Hotel | Dessert Shop | Warehouse Store | Department Store |
| 24 | Gitam Engg. College | 0 | Ice Cream Shop | Tennis Court | Fast Food Restaurant | Beach | Scenic Lookout | Sandwich Place | Hot Dog Joint | Convenience Store | Coffee Shop | Historic Site |
| 23 | Dayalnagar | 0 | Tea Room | Dessert Shop | Restaurant | Warehouse Store | Clothing Store | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store |
| 20 | Ukkunagaram | 0 | Mobile Phone Shop | Warehouse Store | Ice Cream Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 18 | Simhachalam Hills | 0 | Historic Site | Warehouse Store | Coffee Shop | Hot Dog Joint | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop | Department Store |
| 17 | Gopalapatnam | 0 | Indian Restaurant | Train Station | African Restaurant | Andhra Restaurant | Hotel | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 16 | Gajuwaka | 0 | Shopping Mall | Movie Theater | Hotel | Pharmacy | Warehouse Store | Coffee Shop | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 15 | Dabagardens | 0 | Indie Movie Theater | Indian Restaurant | Clothing Store | Fast Food Restaurant | Movie Theater | Multiplex | Hotel | Coffee Shop | Business Service | Bakery |
| 12 | Akkayyapalem | 0 | Indian Restaurant | African Restaurant | Bakery | Electronics Store | Convenience Store | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant |
| 11 | Zinc Smelter P | 0 | Ice Cream Shop | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |


#### Cluster 2

In [69]:
vsp_merged.loc[vsp_merged['Cluster Labels'] == 1, vsp_merged.columns[[2] + list(range(5, vsp_merged.shape[1]))]]

|  | Neighborhoods | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | Pothinamallayapalem | 1 | Sculpture Garden | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |


#### Cluster 3

In [70]:
vsp_merged.loc[vsp_merged['Cluster Labels'] == 2, vsp_merged.columns[[2] + list(range(5, vsp_merged.shape[1]))]]

|  | Neighborhoods | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Autonagar | 2 | Business Service | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |


#### Cluster 4

In [71]:
vsp_merged.loc[vsp_merged['Cluster Labels'] == 3, vsp_merged.columns[[2] + list(range(5, vsp_merged.shape[1]))]]

|  | Neighborhoods | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fortward | 3 | Harbor / Marina | Warehouse Store | Coffee Shop | Hot Dog Joint | Historic Site | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop | Department Store |


#### Cluster 5

In [72]:
vsp_merged.loc[vsp_merged['Cluster Labels'] == 4, vsp_merged.columns[[2] + list(range(5, vsp_merged.shape[1]))]]

|  | Neighborhoods | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | Duvvada | 4 | ATM | Coffee Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop | Department Store |
| 14 | Marripalem | 4 | ATM | Pharmacy | Ice Cream Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 19 | Durganagar | 4 | ATM | Pharmacy | Ice Cream Shop | Hot Dog Joint | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |
| 21 | Arilova | 4 | Pet Service | Pharmacy | Warehouse Store | Hotel | Historic Site | Harbor / Marina | Gift Shop | Fast Food Restaurant | Electronics Store | Dessert Shop |


## Conclusion <a name="conclusion"></a> 

+ Most of the neighborhoods fall into Cluster 1, which consists mostly of business areas with cafés, restaurants, supermarkets, and similar venues.
+ The business areas in Cluster 5 are mostly ATMs, pet services, and pharmacies.
+ The only business area in Cluster 4 is a Harbor / Marina.
+ The business areas in Cluster 3 are mostly Business Service and Warehouse Store venues.
+ The only business area in Cluster 2 is a Sculpture Garden.

__From this we can conclude that people looking for a new apartment or home to rent or purchase can choose the neighborhoods in Cluster 1, as these have many facilities and business areas.__

