# Capstone Project - The Battle of Neighborhoods (Week 1-2)

## Introduction: Business Problem

Mumbai and Delhi are two of India's largest metropolitan cities. Both are densely populated and rank among the country's most popular tourist destinations. In this project, we explore the two cities and try to determine which one offers more venues to visit.

No one can know every venue in Mumbai and Delhi, so many of the venues and categories that can be found through the Foursquare API go unpromoted. We would like to provide information about each neighborhood to tourists and others who are unfamiliar with the area. We will also cluster neighborhoods with similar venues so that it is easy to see which venue categories characterize an area, for example which areas have good parks and cafeterias. This can help tourists who do not know these places, and it can be important for families deciding where to move or buy a new home.

## Data Section

Based on the definition of our problem, the following factors will help:
* All venues in each neighborhood
* Top venue categories in each neighborhood
* Overall character of each neighborhood, for example cafes and parks

The following data sources are needed to generate the required information:
* Post office data for India from [data.gov.in](https://data.gov.in/). I extracted the records for Mumbai and Delhi and used the post office names as neighborhood names.
* Coordinates of the neighborhoods, obtained with the geopy library after cleaning the data.
* Venues in each neighborhood, retrieved through the Foursquare API.

We will use the Foursquare explore endpoint to find the most common venue categories in each neighborhood of Mumbai and Delhi. We will also cluster the neighborhoods to give end users information about similar areas. Finally, we will assess which city is the better place to live from the end user's perspective.

### Methodology Section

The Methodology section describes the main components of our analysis and prediction system. It comprises four stages:

1. Collect Inspection Data
2. Explore and Understand Data
3. Data Preparation and Preprocessing
4. Modeling

#### 1. Collecting Inspection Data

#### Data is collected from [data.gov.in](https://data.gov.in/)

After collection, the data is imported using IBM Watson Studio's insert-to-code feature.

In [1]:
# The code was removed by Watson Studio for sharing.

#### 2. Explore and Understand Data

We read the dataset collected from the [data.gov.in](https://data.gov.in/) website into a pandas DataFrame and display its first five rows:

In [2]:
df_data.head()

Unnamed: 0,officename,pincode,officeType,Deliverystatus,divisionname,regionname,circlename,Taluk,Districtname,statename,Telephone,Related Suboffice,Related Headoffice,longitude,latitude
0,Achalapur,504273,,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Asifabad,Adilabad,TELANGANA,,Rechini,Mancherial,,
1,Ada,504293,,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Asifabad,Adilabad,TELANGANA,,Asifabad,Mancherial,,
2,Adegaon,504307,,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Boath,Adilabad,TELANGANA,,Echoda,Adilabad,,
3,Adilabad Collectorate,504001,,Non-Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,Adilabad,TELANGANA,08732-226703,,Adilabad,,
4,Adilabad,504001,,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,Adilabad,TELANGANA,08732-226738,,,,


#### 3. Data Preparation and Preprocessing

At this stage, we prepare our dataset for the modeling process, opting for the most suitable machine learning algorithm for our scope. 

#### Dropping the unnecessary columns

* Dropped every column except the office name, pincode and region name. The latitude and longitude columns were dropped as well because their values were missing.
* Renamed the 'officename' column to 'neighborhood' and 'regionname' to 'region'.

In [3]:
df_data = df_data.drop(columns = [ 'officeType', 'Deliverystatus', 'divisionname',
                        'circlename', 'Taluk', 'Districtname', 'statename',
                        'Telephone', 'Related Suboffice', 'Related Headoffice', 'longitude', 'latitude'])
df_data.columns = ['neighborhood', 'pincode','region']

df_data.head()

Unnamed: 0,neighborhood,pincode,region
0,Achalapur,504273,Hyderabad
1,Ada,504293,Hyderabad
2,Adegaon,504307,Hyderabad
3,Adilabad Collectorate,504001,Hyderabad
4,Adilabad,504001,Hyderabad


Find the unique values of the region column:

In [4]:
df_data['region'].unique().tolist()

['Hyderabad',
 'Hyderabad City',
 'Kurnool',
 'Vijayawada',
 'Visakhapatnam',
 'Dibrugarh',
 'Guwahati HQ',
 'Muzaffarpur',
 'Patna HQ',
 'Raipur',
 'Delhi',
 'Ahmedabad HQ',
 'Rajkot',
 'Vadodara',
 'Ambala  HQ',
 'Shimla HQ',
 'Srinagar HQ',
 'Ranchi',
 'Bangalore HQ',
 'North Karnataka',
 'South Karnataka',
 'Calicut',
 'Kochi',
 'Trivandrum  HQ',
 'Bhopal HQ',
 'Gwalior',
 'Indore',
 'Aurangabad',
 'Goa-Panaji',
 'Mumbai',
 'Nagpur',
 'Pune',
 'North Eastern',
 'Shillong HQ',
 'Berhampur',
 'Bhubaneswar HQ',
 'Sambalpur',
 'Chandigarh HQ',
 'Chandigarh Region',
 'Ajmer',
 'Jaipur HQ',
 'Jodhpur',
 'Chennai Region',
 'Coimbatore',
 'Madurai',
 'Tiruchy',
 'Agra',
 'Allahabad',
 'Bareilly',
 'Gorakhpur',
 'Kanpur',
 'Lucknow  HQ',
 'Dehradun',
 'Calcutta',
 'Calcutta HQ',
 'North Bengal And Sikkim',
 'South Bengal']

### Data Cleanup

#### Extracting the required data from the full dataset

Create a new dataframe that contains only two regions, Mumbai and Delhi:

In [5]:
df = df_data[df_data['region'].isin(['Mumbai','Delhi'])]

Group the table by pincode and region. Neighborhoods that share a pincode and region are combined in the 'neighborhood' column, separated by commas.

In [6]:
df = df.groupby(['pincode', 'region'])['neighborhood'].apply(', '.join).reset_index()
print(df.shape)
df.tail()

(335, 3)


Unnamed: 0,pincode,region,neighborhood
330,421506,Mumbai,Additional Ambernath
331,421601,Mumbai,"Aghai , Alyani , Ambarje , Andad , Asangaon , ..."
332,421602,Mumbai,"Kasara (Thane), Mokhawane , Shirol , Vashala ..."
333,421603,Mumbai,"Bhatsanagar , Birwadi"
334,421605,Mumbai,"Khadavali , Manda , Phalegaon , Titwala"
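The grouping step can be illustrated on a toy input. The sketch below uses plain Python rather than pandas and made-up rows (the names and pincodes are hypothetical, not taken from the dataset). It mirrors what grouping by pincode and region and joining the names with `', '` does.

```python
from collections import defaultdict

# Hypothetical sample rows: (neighborhood, pincode, region).
rows = [
    ("Alpha", 100001, "Delhi"),
    ("Beta", 100001, "Delhi"),
    ("Gamma", 400001, "Mumbai"),
]

def join_by_pincode(rows):
    """Collapse rows sharing (pincode, region) into one comma-separated entry."""
    groups = defaultdict(list)
    for neighborhood, pincode, region in rows:
        groups[(pincode, region)].append(neighborhood)
    # Sort the keys so the result is ordered like a sorted group-by.
    return [(pincode, region, ", ".join(names))
            for (pincode, region), names in sorted(groups.items())]

grouped = join_by_pincode(rows)
for record in grouped:
    print(record)
```

Each resulting record corresponds to one pincode, which is why the grouped dataframe has fewer rows than the filtered one.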


#### Collecting the coordinates (Latitude and Longitude)

Using the geopy library, I collected some of the coordinates. Because the service timed out, I used the Google Maps API (with an API key) to collect the rest. I then imported the resulting coordinates file.
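One common way to cope with geocoding timeouts is to retry each lookup a few times with a pause in between. The following is a minimal sketch of that idea, not the code used in this notebook. The helper name `geocode_with_retry`, the retry count and the delay are assumptions, and a stand-in geocoder replaces the real service so the example runs offline.

```python
import time

def geocode_with_retry(geocode, query, retries=3, delay=1.0):
    """Call geocode(query), retrying with a fixed pause when it raises.

    `geocode` is any callable that returns an object with .latitude and
    .longitude attributes, or None when nothing is found.
    """
    for attempt in range(retries):
        try:
            location = geocode(query)
        except Exception:  # e.g. a service timeout; narrow this in real code
            if attempt == retries - 1:
                return None
            time.sleep(delay)
            continue
        if location is None:
            return None
        return (location.latitude, location.longitude)
    return None

# Stand-in geocoder (hypothetical): fails twice, then succeeds.
class FakeLocation:
    latitude, longitude = 28.63, 77.21

calls = {"n": 0}

def flaky_geocode(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return FakeLocation()

coords = geocode_with_retry(flaky_geocode, "Sample Neighborhood, Delhi", delay=0)
print(coords)
```

With a real geocoder, the `geocode` method of a geopy geocoder object could be passed in place of `flaky_geocode`.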

In [7]:

body = client_426455649982490b8aabfc274ebc0143.get_object(Bucket='battleofneighborhoodscapstonefina-donotdelete-pr-8fvbqslwhxgphw',Key='coordinates.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# If you are reading an Excel file into a pandas DataFrame, replace `read_csv` by `read_excel` in the next statement.
coordinates = pd.read_csv(body)
coordinates.head()


Unnamed: 0,Latitude,Longitude
0,28.630485,77.215051
1,28.64625,77.265751
2,28.596234,77.223611
3,28.614348,77.19943
4,28.657456,77.191789


#### Final Cleaned up and preprocessed data

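The final dataset is created with the pandas `concat` function. With `axis=1` the two dataframes are joined side by side on their row index, so the coordinates must be in the same row order as `df`.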

In [8]:
df_final = pd.concat([df, coordinates], axis = 1)
df_final.to_csv('df_final.csv', index = False)
df_final.tail()

Unnamed: 0,pincode,region,neighborhood,Latitude,Longitude
330,421506,Mumbai,Additional Ambernath,18.543596,73.134556
331,421601,Mumbai,"Aghai , Alyani , Ambarje , Andad , Asangaon , ...",18.717285,73.296711
332,421602,Mumbai,"Kasara (Thane), Mokhawane , Shirol , Vashala ...",18.788433,73.289531
333,421603,Mumbai,"Bhatsanagar , Birwadi",19.262058,73.236008
334,421605,Mumbai,"Khadavali , Manda , Phalegaon , Titwala",19.211114,73.195034


### Importing the necessary Libraries

In [11]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    altair-4.0.1               |             py_0         575 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.0 MB

The following NEW packages will be 

#### Use geopy library to get the latitude and longitude values of Mumbai, India.

In [12]:
address = 'Mumbai, India'

geolocator = Nominatim(user_agent="mumbai_delhi_explorer") # user_agent is an arbitrary application name
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai city are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai city are 18.9387711, 72.8353355.


#### Create a map of Mumbai with neighborhoods superimposed on top.

In [13]:
# create map of Mumbai using latitude and longitude values
map_India = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, region, neighborhood in zip(df_final['Latitude'], df_final['Longitude'], df_final['region'], df_final['neighborhood']):
    label = '{}, {}'.format(neighborhood, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_India)  
    
map_India

For this task, I restrict the analysis to neighborhoods in Mumbai and Delhi. Let's first take the portion of the dataframe whose region contains the word Mumbai.

## Part 01 - Exploring and Clustering Mumbai Neighborhoods

In [14]:
df_mumbai = df_final[df_final['region'].str.contains("Mumbai")].reset_index(drop=True)
print(df_mumbai.shape)
df_mumbai.head()

(240, 5)


Unnamed: 0,pincode,region,neighborhood,Latitude,Longitude
0,400001,Mumbai,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683
1,400002,Mumbai,"Kalbadevi , Ramwadi , S. C. Court , Thakurdwar",18.955026,72.833389
2,400003,Mumbai,"B.P.Lane , Mandvi (Mumbai), Masjid , Null Bazar",18.954353,72.818101
3,400004,Mumbai,"Ambewadi (Mumbai), Charni Road , Chaupati , G...",18.907265,72.807068
4,400005,Mumbai,"Asvini , Colaba Bazar , Colaba , Holiday Camp ...",18.951834,72.801253


#### Re-create the map with new markers for Mumbai Neighborhoods.

In [15]:
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_mumbai['Latitude'], df_mumbai['Longitude'], df_mumbai['neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mumbai)  
    
map_mumbai

#### Utilizing the Foursquare API to explore and segment neighborhoods

In [16]:
CLIENT_ID = 'NUTE4SSXCN4MMYEDX2XSN2PFXJMWCV4PAZSSYHYGUURYNHPL' # your Foursquare ID
CLIENT_SECRET = 'TMVFNU45RMVOURKBQ5DYQOZRAQIMPM3SQNQDLD0HDH10VZYA' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: NUTE4SSXCN4MMYEDX2XSN2PFXJMWCV4PAZSSYHYGUURYNHPL
CLIENT_SECRET:TMVFNU45RMVOURKBQ5DYQOZRAQIMPM3SQNQDLD0HDH10VZYA


We can now proceed to the modeling phase. We will analyze the neighborhoods to recommend venues and places worth visiting. We will then recommend profitable venues based on the amenities and essential facilities surrounding them, such as cafes, restaurants, temples and other places to visit.

#### 4. Modeling

After exploring the dataset and gaining insights into it, we are ready to apply clustering. We will use the k-means clustering technique because it is fast, computationally inexpensive and flexible.
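To make the idea behind k-means concrete, here is a minimal plain-Python sketch of Lloyd's algorithm on four toy 2-D points. It is illustrative only. The notebook imports scikit-learn's `KMeans` for the clustering itself, and the points and starting centroids below are invented for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

In the neighborhood setting, each point would be a neighborhood's vector of venue-category frequencies rather than a 2-D coordinate.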

### 1.1 Exploring Neighborhoods in Mumbai

#### Exploring the first neighborhood of the dataset: 'Bazargate , Elephanta Caves Po , M.P.T. , Stock Exchange , Tajmahal , Town Hall  (Mumbai), Mumbai G.P.O.'

In [17]:
df_mumbai.loc[0, 'neighborhood']

'Bazargate , Elephanta Caves Po , M.P.T. , Stock Exchange , Tajmahal , Town Hall  (Mumbai), Mumbai G.P.O. '

In [18]:
neighborhood_latitude = df_mumbai.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_mumbai.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_mumbai.loc[0, 'neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are \n{}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bazargate , Elephanta Caves Po , M.P.T. , Stock Exchange , Tajmahal , Town Hall  (Mumbai), Mumbai G.P.O.  are 
19.0938785, 72.8626828.


In [19]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=NUTE4SSXCN4MMYEDX2XSN2PFXJMWCV4PAZSSYHYGUURYNHPL&client_secret=TMVFNU45RMVOURKBQ5DYQOZRAQIMPM3SQNQDLD0HDH10VZYA&v=20180605&ll=19.0938785,72.8626828&radius=500&limit=100'
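As an alternative to assembling the query string by hand, the same request URL can be built from a dictionary of parameters with the standard library. This is a sketch under assumptions: the helper name `build_explore_url` is hypothetical and the credentials are placeholders. The endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL with percent-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; substitute your own.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605", 19.0938785, 72.8626828)
print(url)
```

Note that `urlencode` writes the comma in `ll` as `%2C`, which is the standard percent-encoding of the same character.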

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e268ebb0f59680022e37f78'},
 'response': {'headerLocation': 'Airport',
  'headerFullLocation': 'Airport, Mumbai',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 5,
  'suggestedBounds': {'ne': {'lat': 19.098378504500005,
    'lng': 72.8674358996394},
   'sw': {'lat': 19.089378495499993, 'lng': 72.8579297003606}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '510944c6e4b04a3ebed4c7c0',
       'name': 'Tarmac',
       'location': {'address': 'Mumbai',
        'lat': 19.09203891664548,
        'lng': 72.85904524344362,
        'labeledLatLngs': [{'label': 'display',
          'lat': 19.09203891664548,
          'lng': 72.85904524344362}],
        'distance': 434,
        'cc': 'IN',
        'city': 'Mumbai',
        's

In [21]:
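def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened responses store the list under 'venue.categories'
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']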

In [22]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Tarmac,Airport,19.092039,72.859045
1,Chhatrapati Shivaji International Airport,Airport,19.090509,72.865148
2,"The Lounge, Dosmestic Terminal 1B",Café,19.093097,72.859074
3,Aviserv Arrival Lounge,Airport Lounge,19.092619,72.86632
4,International Cargo Complex,Airport,19.096785,72.865374


In [23]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

5 venues were returned by Foursquare.
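The `radius` parameter is expressed in meters, and each venue in the response carries a `distance` field (434 for 'Tarmac' in the response above). A rough way to sanity-check that value is the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, so the result should only be expected to land close to the reported figure.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters; an assumption of this sketch

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Query point and the 'Tarmac' venue coordinates from the response above.
distance = haversine_m(19.0938785, 72.8626828, 19.09203891664548, 72.85904524344362)
print(round(distance, 1))
```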


In [24]:
# The following function retrieves the venues for the given names and coordinates and stores them in a dataframe.

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
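`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue whose category list is empty, whereas the earlier `get_category_type` helper returns `None` in that case. A more tolerant row extraction might look like the sketch below. The function name `extract_venue_rows` and the sample items are hypothetical. The items are shaped like the response shown earlier.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Hypothetical items: one with a category, one without.
sample_items = [
    {"venue": {"name": "Tarmac", "location": {"lat": 19.092, "lng": 72.859},
               "categories": [{"name": "Airport"}]}},
    {"venue": {"name": "Unlabeled Spot", "location": {"lat": 19.09, "lng": 72.86},
               "categories": []}},
]
rows = extract_venue_rows("Sample", 19.0938, 72.8626, sample_items)
for row in rows:
    print(row)
```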

#### Retrieve all venues for the Mumbai neighborhoods

In [25]:
mumbai_neighborhoods = df_mumbai
mumbai_venues = getNearbyVenues(names=mumbai_neighborhoods['neighborhood'],
                                   latitudes=mumbai_neighborhoods['Latitude'],
                                   longitudes=mumbai_neighborhoods['Longitude']
                                  )

Bazargate , Elephanta Caves Po , M.P.T. , Stock Exchange , Tajmahal , Town Hall  (Mumbai), Mumbai G.P.O. 
Kalbadevi , Ramwadi , S. C. Court , Thakurdwar 
B.P.Lane , Mandvi  (Mumbai), Masjid , Null Bazar 
Ambewadi  (Mumbai), Charni Road , Chaupati , Girgaon , Madhavbaug , Opera House 
Asvini , Colaba Bazar , Colaba , Holiday Camp , V.W.T.C. 
Malabar Hill 
Bharat Nagar  (Mumbai), Grant Road , N.S.Patkar Marg , S V Marg , Tardeo 
Falkland Road , J.J.Hospital , Kamathipura , M A Marg , Mumbai Central 
Chinchbunder , Noor Baug , Princess Dock 
Dockyard Road , Mazgaon Dock , Mazgaon Road , Mazgaon , V K Bhavan 
Agripada , BPC  Jacob Circle , Chinchpokli , Haines Road , Jacob Circle 
BEST STaff Quarters , Chamarbaug , Haffkin Institute , Lal Baug , Parel Naka , Parel Rly Work Shop , Parel 
Delisle Road 
Dadar Colony , Dadar , Naigaon  (Mumbai)
Sewri 
Kapad Bazar , Mahim Bazar , Mahim East , Mahim , Mori Road 
Dharavi Road , Dharavi 
Worli Naka , Worli 
Matunga 
Central Building , Churchgate ,

#### Check size of resulting dataframe

In [26]:
print(mumbai_venues.shape)
mumbai_venues.head()

(1396, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683,Tarmac,19.092039,72.859045,Airport
1,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683,Chhatrapati Shivaji International Airport,19.090509,72.865148,Airport
2,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683,"The Lounge, Dosmestic Terminal 1B",19.093097,72.859074,Café
3,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683,Aviserv Arrival Lounge,19.092619,72.86632,Airport Lounge
4,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683,International Cargo Complex,19.096785,72.865374,Airport


#### Count of venues returned for each neighborhood

In [27]:
mumbai_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"A I Staff Colony , Santacruz P&t Colony",12,12,12,12,12,12
"Aareymilk Colony , Nagari Niwara , S R P F Camp",7,7,7,7,7,7
"Adai , Awre , Chawk , Chirner , Dighode , Gavan , Jasai , Jci Kamothe , Jui , Kelevane , Khanda Colony , Koproli , Kundevahal , Lodhivali , Morbe , Nere , Nhava , Panvel City , Panvel , Pargaon , Somatane , Ulwa , Vahal , Vaje , Vasheni , Wavandhal , Wavarle",4,4,4,4,4,4
Additional Ambernath,1,1,1,1,1,1
"Agarwadi , Dahisar tymanor , Datiware , Dhekale , Edwan , Navghar , Pargaon , Tandulwadi , Tembhikhodave , Umbarpada , Usarani , Virathan Budruk",2,2,2,2,2,2
"Agashi , Chikhal Dongre , Kophrad , Vatar",7,7,7,7,7,7
"Agrav , Bapale , Chaul , Chinchoti , Malyan , Ramraj , Sudkoli , Usar",12,12,12,12,12,12
"Agripada , BPC Jacob Circle , Chinchpokli , Haines Road , Jacob Circle",14,14,14,14,14,14
"Airport (Mumbai), International Airport , Sahar P & T Colony , Sahargaon",12,12,12,12,12,12
"Ajivali , Barapada , Chikhale , Ongc Complex,panvel , Palaspe , Poyanje , Sai , Shirdhon",1,1,1,1,1,1


#### How many unique categories are there among the returned venues?

In [28]:
print('There are {} unique categories.'.format(len(mumbai_venues['Venue Category'].unique())))

There are 182 unique categories.


### 1.2 Analyze Each Neighborhood

In [29]:
# one hot encoding
mumbai_onehot = pd.get_dummies(mumbai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Neighborhood'] = mumbai_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]

mumbai_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,Airport Lounge,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,Cafeteria,Café,Cheese Shop,Chinese Restaurant,City,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Creperie,Cricket Ground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Fish Market,Fishing Spot,Flea Market,Flower Shop,Food,Food Court,Food Service,Food Truck,Forest,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hockey Arena,Hot Dog Joint,Hotel,Hotel Pool,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lake,Light Rail Station,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Venue,Nightclub,Office,Opera House,Optical Shop,Other Great Outdoors,Park,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Racetrack,Recreation Center,Resort,Rest Area,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Supermarket,Tea Room,Tennis Court,Thai Restaurant,Theater,Toll Plaza,Tourist Information Center,Track,Trail,Train Station,Tree,Tunnel,Udupi Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Women's Store,Yoga Studio,Zoo
0,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### And let's find out the dataframe size

In [30]:
mumbai_onehot.shape

(1396, 183)
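The shape follows from the encoding: one row per returned venue (1396) and one column per category (182) plus the `Neighborhood` column. A plain-Python sketch of the same one-hot idea on three made-up records, rather than the pandas `get_dummies` call used above, could look like this:

```python
def one_hot(records):
    """records: list of (neighborhood, category). Returns (columns, rows)."""
    categories = sorted({category for _, category in records})
    columns = ["Neighborhood"] + categories
    rows = [[hood] + [1 if category == c else 0 for c in categories]
            for hood, category in records]
    return columns, rows

# Hypothetical venue records.
records = [("A", "Airport"), ("A", "Park"), ("B", "Airport")]
columns, rows = one_hot(records)
print(columns)
for row in rows:
    print(row)
```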

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [31]:
mumbai_grouped = mumbai_onehot.groupby('Neighborhood').mean().reset_index()
mumbai_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,Airport Lounge,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,Cafeteria,Café,Cheese Shop,Chinese Restaurant,City,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Creperie,Cricket Ground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Fish Market,Fishing Spot,Flea Market,Flower Shop,Food,Food Court,Food Service,Food Truck,Forest,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hockey Arena,Hot Dog Joint,Hotel,Hotel Pool,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lake,Light Rail Station,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Venue,Nightclub,Office,Opera House,Optical Shop,Other Great Outdoors,Park,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Racetrack,Recreation Center,Resort,Rest Area,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Supermarket,Tea Room,Tennis Court,Thai Restaurant,Theater,Toll Plaza,Tourist Information Center,Track,Trail,Train Station,Tree,Tunnel,Udupi Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Women's Store,Yoga Studio,Zoo
0,"A I Staff Colony , Santacruz P&t Colony",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Aareymilk Colony , Nagari Niwara , S R P F Camp",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Adai , Awre , Chawk , Chirner , Dighode , Gava...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Additional Ambernath,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Agarwadi , Dahisar tymanor , Datiware , Dhekal...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Agashi , Chikhal Dongre , Kophrad , Vatar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Agrav , Bapale , Chaul , Chinchoti , Malyan , ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0
7,"Agripada , BPC Jacob Circle , Chinchpokli , H...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0
8,"Airport (Mumbai), International Airport , Sah...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.083333,0.0,0.083333,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Ajivali , Barapada , Chikhale , Ongc Complex,p...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [32]:
mumbai_grouped.shape

(151, 183)
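
Each value in the grouped table is the share of a neighborhood's venues that fall into a given category, so 0.083333 corresponds to 1 venue out of 12. As a rough sketch of how such a frequency table can be built, assuming one-hot encoding followed by a per-neighborhood mean (the toy `venues` frame and its column names are made up for illustration and are not the notebook's data):

```python
import pandas as pd

# Hypothetical venue list: one row per venue found near a neighborhood
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Lounge", "Lounge", "Nightclub", "Flea Market",
                       "Toll Plaza", "Playground"],
})

# One-hot encode the categories, then put the neighborhood name back in front
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the one-hot rows is the frequency of each category per neighborhood
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

In this toy example every row of frequencies sums to 1, which is a quick sanity check for a table built this way.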

#### Let's print each neighborhood along with its top 5 most common venues

In [33]:
num_top_venues = 5

for hood in mumbai_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = mumbai_grouped[mumbai_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----A I Staff Colony , Santacruz P&t Colony ----
             venue  freq
0           Lounge  0.17
1   Sandwich Place  0.08
2      Flea Market  0.08
3  Thai Restaurant  0.08
4        Nightclub  0.08


----Aareymilk Colony , Nagari Niwara , S R P F Camp ----
                  venue  freq
0           Bus Station  0.29
1        Scenic Lookout  0.14
2     Indian Restaurant  0.14
3                  Park  0.14
4  Gym / Fitness Center  0.14


----Adai , Awre , Chawk , Chirner , Dighode , Gavan , Jasai , Jci Kamothe , Jui , Kelevane , Khanda Colony , Koproli , Kundevahal , Lodhivali , Morbe , Nere , Nhava , Panvel City , Panvel , Pargaon , Somatane , Ulwa , Vahal , Vaje , Vasheni , Wavandhal , Wavarle ----
                             venue  freq
0                       Toll Plaza  0.50
1                       Playground  0.25
2                 Basketball Court  0.25
3                Mobile Phone Shop  0.00
4  Molecular Gastronomy Restaurant  0.00


----Additional Ambernath ----
              

#### First, let's write a function to sort the venues in descending order.

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mumbai_grouped['Neighborhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
neighborhoods_venues_sorted.shape

(151, 11)
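
Note that when a neighborhood has fewer distinct venue categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is 0 (for example, "Mobile Phone Shop 0.00" in the printout above). A minimal sketch of one possible way to blank out such entries; the helper `top_venues_nonzero` and the toy row are hypothetical and not part of the notebook:

```python
import pandas as pd

def top_venues_nonzero(row, num_top_venues):
    """Return the top categories of a row, using None where the frequency is 0."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' label
    ranked = freqs.sort_values(ascending=False, kind="mergesort").head(num_top_venues)
    return [name if value > 0 else None for name, value in ranked.items()]

# Hypothetical row with only three categories that actually occur
row = pd.Series({
    "Neighborhood": "Toy Town",
    "Cafe": 0.25,
    "Park": 0.0,
    "Toll Plaza": 0.5,
    "Zoo": 0.0,
    "Playground": 0.25,
})

result = top_venues_nonzero(row, 4)
print(result)
```

The fourth slot comes back as `None` instead of an arbitrary zero-frequency category, which makes the sorted table easier to read for sparse neighborhoods.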

### 1.3 Clustering Neighborhoods

#### Run k-means to cluster the neighborhoods into 5 clusters.

In [36]:
# set number of clusters
kclusters = 5

mumbai_grouped_clustering = mumbai_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mumbai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 3, 0, 0, 0, 0, 0, 0], dtype=int32)
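
Since the first ten labels are almost all 0, it can be useful to check how many neighborhoods fall into each cluster before interpreting them. A small sketch, using a made-up label array in place of `kmeans.labels_` (the values below are an assumption for illustration only):

```python
import numpy as np

kclusters = 5

# Hypothetical labels standing in for kmeans.labels_
labels = np.array([0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 4, 1, 2, 4])

# Count the members of every cluster, including clusters that might be empty
sizes = np.bincount(labels, minlength=kclusters)
for cluster_id, size in enumerate(sizes):
    print(f"cluster {cluster_id}: {size} neighborhoods")
```

Very small clusters usually point to outlier neighborhoods with only one or two venues rather than to a meaningful group.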

In [37]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mumbai_merged = mumbai_neighborhoods

# join mumbai_neighborhoods (pincode, latitude/longitude) with the cluster labels and top venues of each neighborhood
mumbai_merged = mumbai_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='neighborhood')
mumbai_merged.head() # check the last columns!

Unnamed: 0,pincode,region,neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,400001,Mumbai,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683,0.0,Airport,Airport Lounge,Café,Zoo,Falafel Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
1,400002,Mumbai,"Kalbadevi , Ramwadi , S. C. Court , Thakurdwar",18.955026,72.833389,0.0,Indian Restaurant,Smoke Shop,Asian Restaurant,Juice Bar,Chinese Restaurant,Café,Food,BBQ Joint,Indian Sweet Shop,Market
2,400003,Mumbai,"B.P.Lane , Mandvi (Mumbai), Masjid , Null Bazar",18.954353,72.818101,0.0,Indian Restaurant,Ice Cream Shop,Juice Bar,Fast Food Restaurant,Pizza Place,Harbor / Marina,Italian Restaurant,Train Station,Breakfast Spot,Sandwich Place
3,400004,Mumbai,"Ambewadi (Mumbai), Charni Road , Chaupati , G...",18.907265,72.807068,0.0,Cricket Ground,Ice Cream Shop,Gym,Beach,Garden,Falafel Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish Market
4,400005,Mumbai,"Asvini , Colaba Bazar , Colaba , Holiday Camp ...",18.951834,72.801253,0.0,Gym,Restaurant,Hotel,Vegetarian / Vegan Restaurant,Convenience Store,Food Truck,Dessert Shop,Electronics Store,Fishing Spot,Fish Market


#### Finding out the unique values in cluster labels 

Some neighborhoods have NaN in the Cluster Labels column. This indicates that no venues were found within the 500 m radius, so these neighborhoods do not belong to any cluster.

In [39]:
mumbai_merged['Cluster Labels'].unique().tolist()

[0.0, 4.0, nan, 1.0, 2.0, 3.0]

#### Dropping the rows that have NaN values in their Cluster Labels

In [40]:
mumbai_merged = mumbai_merged.dropna()
mumbai_merged = mumbai_merged.reset_index()
mumbai_merged

Unnamed: 0,index,pincode,region,neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,400001,Mumbai,"Bazargate , Elephanta Caves Po , M.P.T. , Stoc...",19.093878,72.862683,0.0,Airport,Airport Lounge,Café,Zoo,Falafel Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
1,1,400002,Mumbai,"Kalbadevi , Ramwadi , S. C. Court , Thakurdwar",18.955026,72.833389,0.0,Indian Restaurant,Smoke Shop,Asian Restaurant,Juice Bar,Chinese Restaurant,Café,Food,BBQ Joint,Indian Sweet Shop,Market
2,2,400003,Mumbai,"B.P.Lane , Mandvi (Mumbai), Masjid , Null Bazar",18.954353,72.818101,0.0,Indian Restaurant,Ice Cream Shop,Juice Bar,Fast Food Restaurant,Pizza Place,Harbor / Marina,Italian Restaurant,Train Station,Breakfast Spot,Sandwich Place
3,3,400004,Mumbai,"Ambewadi (Mumbai), Charni Road , Chaupati , G...",18.907265,72.807068,0.0,Cricket Ground,Ice Cream Shop,Gym,Beach,Garden,Falafel Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish Market
4,4,400005,Mumbai,"Asvini , Colaba Bazar , Colaba , Holiday Camp ...",18.951834,72.801253,0.0,Gym,Restaurant,Hotel,Vegetarian / Vegan Restaurant,Convenience Store,Food Truck,Dessert Shop,Electronics Store,Fishing Spot,Fish Market
5,5,400006,Mumbai,Malabar Hill,18.964178,72.817092,0.0,Indian Restaurant,Electronics Store,Coffee Shop,Bowling Alley,Food Truck,Lounge,Fast Food Restaurant,Farmers Market,Chinese Restaurant,Bakery
6,6,400007,Mumbai,"Bharat Nagar (Mumbai), Grant Road , N.S.Patka...",18.962274,72.822632,4.0,Indian Restaurant,Ice Cream Shop,Nightclub,Zoo,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
7,7,400008,Mumbai,"Falkland Road , J.J.Hospital , Kamathipura , M...",18.955919,72.835719,0.0,Indian Restaurant,Smoke Shop,Flea Market,Café,Chinese Restaurant,Rest Area,Juice Bar,Indian Sweet Shop,Convenience Store,Grocery Store
8,8,400009,Mumbai,"Chinchbunder , Noor Baug , Princess Dock",18.967006,72.847926,0.0,Convenience Store,Indian Restaurant,Government Building,Bakery,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
9,9,400010,Mumbai,"Dockyard Road , Mazgaon Dock , Mazgaon Road , ...",18.975124,72.82439,0.0,Platform,Coffee Shop,Bakery,Athletics & Sports,Department Store,Dessert Shop,Flower Shop,Flea Market,Fishing Spot,Fish Market
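
The NaN labels come from the join itself: neighborhoods without any venues never made it into the grouped table, so they have no match on the right-hand side, and the label column becomes float. A minimal sketch with toy frames (all names and values are made up); it restricts `dropna` to the label column and casts the labels back to integers, which is a variation on the notebook's plain `dropna()`:

```python
import pandas as pd

# Hypothetical neighborhood table; 'B' has no venues and therefore no cluster
neighborhoods = pd.DataFrame({
    "neighborhood": ["A", "B", "C"],
    "Latitude": [19.0, 19.1, 19.2],
    "Longitude": [72.8, 72.9, 73.0],
})
labels = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [0, 4]})

# The join keeps every neighborhood; unmatched rows get NaN labels
merged = neighborhoods.join(labels.set_index("Neighborhood"), on="neighborhood")
print(merged["Cluster Labels"].tolist())

# Drop only rows without a label, then restore integer labels
cleaned = merged.dropna(subset=["Cluster Labels"]).reset_index(drop=True)
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```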


#### Finally, let's visualize the clusters

In [41]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mumbai_merged['Latitude'], mumbai_merged['Longitude'], mumbai_merged['neighborhood'], mumbai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
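
One detail of the marker colors: the index `int(cluster)-1` maps cluster label 0 to index -1, which Python reads as the last entry of the palette. Every cluster still gets its own color. A small sketch with placeholder strings instead of real hex colors (the `palette` list is a hypothetical stand-in for `rainbow`):

```python
kclusters = 5

# Placeholder palette standing in for the `rainbow` list of hex colors
palette = [f"color_{i}" for i in range(kclusters)]

# Same indexing as in the map cell: label 0 wraps around to the last color
mapping = {cluster: palette[int(cluster) - 1] for cluster in [0.0, 1.0, 2.0, 3.0, 4.0]}
for cluster, color in mapping.items():
    print(cluster, "->", color)
```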

#### Cluster 1

In [42]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 0, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

Unnamed: 0,pincode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,400001,72.862683,0.0,Airport,Airport Lounge,Café,Zoo,Falafel Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
1,400002,72.833389,0.0,Indian Restaurant,Smoke Shop,Asian Restaurant,Juice Bar,Chinese Restaurant,Café,Food,BBQ Joint,Indian Sweet Shop,Market
2,400003,72.818101,0.0,Indian Restaurant,Ice Cream Shop,Juice Bar,Fast Food Restaurant,Pizza Place,Harbor / Marina,Italian Restaurant,Train Station,Breakfast Spot,Sandwich Place
3,400004,72.807068,0.0,Cricket Ground,Ice Cream Shop,Gym,Beach,Garden,Falafel Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish Market
4,400005,72.801253,0.0,Gym,Restaurant,Hotel,Vegetarian / Vegan Restaurant,Convenience Store,Food Truck,Dessert Shop,Electronics Store,Fishing Spot,Fish Market
5,400006,72.817092,0.0,Indian Restaurant,Electronics Store,Coffee Shop,Bowling Alley,Food Truck,Lounge,Fast Food Restaurant,Farmers Market,Chinese Restaurant,Bakery
7,400008,72.835719,0.0,Indian Restaurant,Smoke Shop,Flea Market,Café,Chinese Restaurant,Rest Area,Juice Bar,Indian Sweet Shop,Convenience Store,Grocery Store
8,400009,72.847926,0.0,Convenience Store,Indian Restaurant,Government Building,Bakery,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
9,400010,72.82439,0.0,Platform,Coffee Shop,Bakery,Athletics & Sports,Department Store,Dessert Shop,Flower Shop,Flea Market,Fishing Spot,Fish Market
10,400011,72.840281,0.0,Indian Restaurant,Gym / Fitness Center,Coffee Shop,Chinese Restaurant,Restaurant,Roof Deck,Cafeteria,Lounge,Stadium,Bar


#### Cluster 2

In [43]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 1, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

Unnamed: 0,pincode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,400081,72.904066,1.0,Café,Zoo,Falafel Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant
112,401302,72.904066,1.0,Café,Zoo,Falafel Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant


#### Cluster 3

In [44]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 2, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

Unnamed: 0,pincode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
100,400710,72.79301,2.0,Fishing Spot,Zoo,Food Service,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market


#### Cluster 4

In [45]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 3, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

Unnamed: 0,pincode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
120,402105,73.134556,3.0,Movie Theater,Zoo,Food Court,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant,Farmers Market
147,421505,73.134556,3.0,Movie Theater,Zoo,Food Court,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant,Farmers Market
148,421506,73.134556,3.0,Movie Theater,Zoo,Food Court,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant,Farmers Market


#### Cluster 5

In [46]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 4, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

Unnamed: 0,pincode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,400007,72.822632,4.0,Indian Restaurant,Ice Cream Shop,Nightclub,Zoo,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
12,400013,72.845617,4.0,Indian Restaurant,Hotel,Optical Shop,Zoo,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
43,400058,72.882471,4.0,Indian Restaurant,Hotel,Food,Zoo,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
50,400066,72.852705,4.0,Indian Restaurant,Sandwich Place,Bakery,Zoo,Falafel Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
51,400067,72.882454,4.0,Indian Restaurant,Flea Market,Dessert Shop,Middle Eastern Restaurant,Mughlai Restaurant,Zoo,Falafel Restaurant,Flower Shop,Fishing Spot,Fish Market
58,400075,72.938062,4.0,Indian Restaurant,Ice Cream Shop,Restaurant,Zoo,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field
71,400089,72.928753,4.0,Playground,Indian Restaurant,Electronics Store,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant,Farmers Market
84,400104,72.952161,4.0,Indian Restaurant,Zoo,Event Space,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant
92,400612,72.935873,4.0,Indian Restaurant,Zoo,Event Space,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market,Field,Fast Food Restaurant
93,400614,73.006064,4.0,Indian Restaurant,Hotel,Farmers Market,Market,Middle Eastern Restaurant,Zoo,Food,Flower Shop,Flea Market,Fishing Spot
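
Instead of reading each cluster table by eye, the clusters can also be summarized in one step. A rough sketch on a toy frame loosely modeled on the tables above (the rows are illustrative, not the real data), counting the neighborhoods per cluster and picking the most frequent "1st Most Common Venue":

```python
import pandas as pd

# Hypothetical excerpt of a merged table with cluster labels
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1, 3, 3, 4, 4, 4],
    "1st Most Common Venue": [
        "Airport", "Indian Restaurant", "Indian Restaurant",
        "Café", "Café",
        "Movie Theater", "Movie Theater",
        "Indian Restaurant", "Indian Restaurant", "Playground",
    ],
})

# Number of neighborhoods in each cluster
sizes = toy.groupby("Cluster Labels").size().rename("Neighborhoods")

# Most frequent top venue in each cluster
summary = (
    toy.groupby("Cluster Labels")["1st Most Common Venue"]
       .agg(lambda s: s.value_counts().idxmax())
       .rename("Dominant 1st Venue")
)

print(pd.concat([sizes, summary], axis=1))
```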


In the next part, we will explore and cluster the Delhi neighborhoods in the same way.