# IBM Applied Data Science Capstone Project (Week 5)

Contents of the Notebook

1. Import libraries
2. Scrape data from webpage into a DataFrame
   A. Data Preprocessing
   B. Output as Prediction file (.csv)
3. Define Foursquare Credentials and Version
4. Top 225 venues that are within a radius of 600 meters for each post office
   A. Data Preprocessing
   B. Output as Prediction file (.csv)
5. Analyze Each Postal Office For Venue Category
6. List and display the top 5 existing facilities for each Pin Code
7. Exploratory Visualization 1
8. Feature Engineering for Business Problem
    A. Simplification
    B. Feature Selection
    C. Handling Categorical Data (One Hot Encoding)
9. Potential area for the development of different infrastructure
10. Best place to stay with vital infrastructure facilities nearby
11. Clustering And Exploratory Visualization 2
12. Examine Clusters
13. Observations:
14. Acknowledgments

# 1. Import libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd
import folium


In [2]:
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import requests


In [3]:
import lxml.html as lh

In [4]:
from sklearn.cluster import KMeans
print("Libraries imported.")

Libraries imported.


# 2. Checking & Reading the 'mumbailatlong' file directly to avoid data scraping.

In [5]:
clean_df = pd.read_csv('https://raw.githubusercontent.com/kushagravarshney/Coursera_Capstone/main/Week%205/mumbailatlong.csv',index_col='Unnamed: 0')
clean_df.head()

Unnamed: 0,City,Post Office,Pin Code,Latitude,Longitude
0,Mumbai,August Kranti Marg,400036,18.963549,72.809989
1,Mumbai,Aarey Milk Colony,400065,19.156129,72.870722
2,Mumbai,Andheri (East),400069,19.115883,72.854202
3,Mumbai,Andheri (West),400058,19.117249,72.833968
4,Mumbai,Antop Hill,400037,19.020761,72.865256


# 3. Define Foursquare Credentials and Version

In [7]:
# define Foursquare Credentials and Version
CLIENT_ID = '2AIIEPNRD1MSA1U4LQTR2HFRBY4EWKIIXF0IDM5NFSDCQ22P' # your Foursquare ID
CLIENT_SECRET = 'ZMNIRSPT2VJHW4VFHFBMTLND5DMUB3PMGUIWQZEXSH4MCRXS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
clean_df_new = clean_df.copy()

Your credentials:
CLIENT_ID: 2AIIEPNRD1MSA1U4LQTR2HFRBY4EWKIIXF0IDM5NFSDCQ22P
CLIENT_SECRET:ZMNIRSPT2VJHW4VFHFBMTLND5DMUB3PMGUIWQZEXSH4MCRXS
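
Hard-coding the client ID and secret in a notebook exposes them to anyone who can read it. A minimal sketch of one alternative, assuming the credentials have been exported as environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical and not part of the original notebook:

```python
import os


def load_foursquare_credentials(environ=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    environ = os.environ if environ is None else environ
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Demo with an explicit mapping so the sketch runs without real credentials.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID loaded:", bool(demo_id))
```

In the notebook itself, calling `load_foursquare_credentials()` with no argument would read from the real environment, and printing the secret could then be dropped.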


# 4. Checking & Reading the 'mumbaiexplore' file directly to avoid api calling.

In [8]:
venues_df = pd.read_csv('https://raw.githubusercontent.com/kushagravarshney/Coursera_Capstone/main/Week%205/mumbaiexplore.csv',index_col='Unnamed: 0')
venues_df.head()

Unnamed: 0,Post Office,Pin Code,Latitude,Longitude,City,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,August Kranti Marg,400036,18.963549,72.809989,Mumbai,Doolally Taproom,18.963809,72.807695,Brewery
1,August Kranti Marg,400036,18.963549,72.809989,Mumbai,symphony,18.963347,72.810251,Restaurant
2,August Kranti Marg,400036,18.963549,72.809989,Mumbai,Crossword,18.963474,72.807773,Bookstore
3,August Kranti Marg,400036,18.963549,72.809989,Mumbai,Swati Snacks,18.966442,72.813531,Indian Restaurant
4,August Kranti Marg,400036,18.963549,72.809989,Mumbai,Gustoso,18.964198,72.807726,Pizza Place


In [11]:
venues_df.groupby(['Post Office', 'Pin Code', 'City']).count().head()

Post Office,Pin Code,City,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Aarey Milk Colony,400065,Mumbai,2,2,2,2,2,2
Agashi,401301,Thane,3,3,3,3,3,3
Airoli Mode,400708,Navi Mumbai,3,3,3,3,3,3
Andheri (East),400069,Mumbai,14,14,14,14,14,14
Andheri (West),400058,Mumbai,22,22,22,22,22,22


In [10]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 191 unique categories.


In [12]:
venues_df['VenueCategory'].unique()[:20]

array(['Brewery', 'Restaurant', 'Bookstore', 'Indian Restaurant',
       'Pizza Place', 'Dessert Shop', 'History Museum',
       'Salon / Barbershop', 'Hotel', 'Bakery', 'Café', 'Coffee Shop',
       'Bar', 'Concert Hall', 'Italian Restaurant', 'Park', 'Lounge',
       'Deli / Bodega', 'Sandwich Place', 'Clothing Store'], dtype=object)

# 5. Analyze Each Postal Office For Venue Category

In [13]:
# one hot encoding
mumbai_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add the Post Office, Pin Code and City columns back to the dataframe
mumbai_onehot['Post Office'] = venues_df['Post Office'] 
mumbai_onehot['Pin Code'] = venues_df['Pin Code'] 
mumbai_onehot['City'] = venues_df['City'] 

# move the Post Office, Pin Code and City columns to the front
fixed_columns = list(mumbai_onehot.columns[-3:]) + list(mumbai_onehot.columns[:-3])
mumbai_onehot = mumbai_onehot[fixed_columns]

print(mumbai_onehot.shape)
mumbai_onehot.head()

(2340, 194)


Unnamed: 0,Post Office,Pin Code,City,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Aquarium,Arcade,Art Gallery,...,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,August Kranti Marg,400036,Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,August Kranti Marg,400036,Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,August Kranti Marg,400036,Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,August Kranti Marg,400036,Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,August Kranti Marg,400036,Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [14]:
mumbai_grouped = mumbai_onehot.groupby(["Post Office", "Pin Code", "City"]).mean().reset_index()

print(mumbai_grouped.shape)
mumbai_grouped.head()

(132, 194)


Unnamed: 0,Post Office,Pin Code,City,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Aquarium,Arcade,Art Gallery,...,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Aarey Milk Colony,400065,Mumbai,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agashi,401301,Thane,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Airoli Mode,400708,Navi Mumbai,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Andheri (East),400069,Mumbai,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0
4,Andheri (West),400058,Mumbai,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0
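
Taking the mean of the one-hot columns per post office gives the share of each category among that post office's venues. For example, the 0.071429 shown for Andheri (East) is 1/14, which matches the 14 venues counted for it earlier. A minimal sketch on made-up data (the frame `toy` and its values are illustrative assumptions, not rows from `venues_df`):

```python
import pandas as pd

# Made-up venues: post office "A" has four venues, "B" has two.
toy = pd.DataFrame({
    "Post Office": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Café", "Café", "Hotel", "Park", "Hotel", "Hotel"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=int)
onehot.insert(0, "Post Office", toy["Post Office"])

# The mean of an indicator column is the fraction of rows where it is 1.
shares = onehot.groupby("Post Office").mean()
print(shares)
```

Each row of `shares` sums to 1, so the values can be read directly as relative frequencies.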


# 6. List and display the top 5 existing facilities for each Pin Code

In [16]:
num_top_venues = 5
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ["Post Office", "Pin Code", "City"]
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'rd' (4th, 5th, ...) use the generic 'th' suffix
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Post Office'] = mumbai_grouped['Post Office']
neighborhoods_venues_sorted['Pin Code'] = mumbai_grouped['Pin Code']
neighborhoods_venues_sorted['City'] = mumbai_grouped['City']

for ind in np.arange(mumbai_grouped.shape[0]):
    row_categories = mumbai_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(132, 8)


Unnamed: 0,Post Office,Pin Code,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Aarey Milk Colony,400065,Mumbai,Fast Food Restaurant,Lake,Accessories Store,Miscellaneous Shop,Monument / Landmark
1,Agashi,401301,Thane,Cheese Shop,Restaurant,Scenic Lookout,Accessories Store,New American Restaurant
2,Airoli Mode,400708,Navi Mumbai,Cocktail Bar,Restaurant,Garden,Accessories Store,New American Restaurant
3,Andheri (East),400069,Mumbai,Hotel,Chinese Restaurant,Indian Restaurant,Food Truck,Camera Store
4,Andheri (West),400058,Mumbai,Indian Restaurant,Pub,Gym / Fitness Center,Café,Coffee Shop
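
When a post office has fewer than five distinct categories, the sorted row still yields five names, so categories with a share of zero (such as "Accessories Store" for Aarey Milk Colony, which has only two venues) appear in the table. A sketch of one way to keep only categories that are actually present. The helper `top_present_categories` and the sample values are assumptions for illustration:

```python
import pandas as pd


def top_present_categories(row, n=5):
    """Return up to n category names with a non-zero share, most frequent first."""
    row = row.astype(float)          # rows taken with .iloc may have object dtype
    present = row[row > 0]           # drop categories that never occur
    ordered = present.sort_values(ascending=False, kind="mergesort")
    return list(ordered.index[:n])


# Made-up shares for a post office with only two venue categories.
toy_row = pd.Series({
    "Accessories Store": 0.0,
    "Fast Food Restaurant": 0.6,
    "Lake": 0.4,
    "Miscellaneous Shop": 0.0,
})
print(top_present_categories(toy_row))
```

The result may be shorter than `n`, so a caller filling fixed "Most Common Venue" columns would need to pad the list, for example with `None`.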


In [17]:
# Set manually to get proper fit in the map
address = 'Mumbai'
latitude = 19.0760
longitude = 72.8777
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Mumbai are 19.076, 72.8777.


In [18]:
mumbai_merged = clean_df.copy()
mumbai_merged = mumbai_merged.join(neighborhoods_venues_sorted[["Pin Code", "1st Most Common Venue"]].set_index("Pin Code"), on="Pin Code")
print(mumbai_merged.shape)
mumbai_merged.head()

(136, 6)


Unnamed: 0,City,Post Office,Pin Code,Latitude,Longitude,1st Most Common Venue
0,Mumbai,August Kranti Marg,400036,18.963549,72.809989,Pizza Place
1,Mumbai,Aarey Milk Colony,400065,19.156129,72.870722,Fast Food Restaurant
2,Mumbai,Andheri (East),400069,19.115883,72.854202,Hotel
3,Mumbai,Andheri (West),400058,19.117249,72.833968,Indian Restaurant
4,Mumbai,Antop Hill,400037,19.020761,72.865256,Fast Food Restaurant


# 7. Exploratory Visualization

In [19]:
import folium
my_map = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label1,common in zip(mumbai_merged['Latitude'], mumbai_merged['Longitude'], mumbai_merged['Post Office'],mumbai_merged['1st Most Common Venue'] ):
    labelnew =  'Post office : {} , Top Existing Infrastructure  : {}'.format(label1,common)
    label = folium.Popup( labelnew, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(my_map)  
my_map

# 8. Feature Engineering for Business Problem

In [20]:
venues_df['VenueCategory'].unique()

array(['Brewery', 'Restaurant', 'Bookstore', 'Indian Restaurant',
       'Pizza Place', 'Dessert Shop', 'History Museum',
       'Salon / Barbershop', 'Hotel', 'Bakery', 'Café', 'Coffee Shop',
       'Bar', 'Concert Hall', 'Italian Restaurant', 'Park', 'Lounge',
       'Deli / Bodega', 'Sandwich Place', 'Clothing Store',
       'Chinese Restaurant', 'Farmers Market', 'Diner', 'Food Court',
       'Cosmetics Shop', 'Food Truck', 'Fast Food Restaurant', 'Lake',
       'Shopping Mall', 'Camera Store', 'Smoke Shop', 'Bus Station',
       'Vegetarian / Vegan Restaurant', 'Pub', 'Gym / Fitness Center',
       'Bagel Shop', 'Nightclub', 'Snack Place', 'Ice Cream Shop',
       'Pharmacy', 'Grocery Store', 'Trail', 'Multiplex',
       'Parsi Restaurant', 'Irani Cafe', 'Plaza', 'Seafood Restaurant',
       'Hostel', 'Train Station', 'Outdoors & Recreation', 'Sports Club',
       'General Entertainment', 'Gym', 'College Auditorium',
       'French Restaurant', 'Gourmet Shop', 'Salad Place',
     

In [21]:
# Quality Infrastructure
search_query = [
    'Restaurant', 'Hotel', 'Farmers Market', 'Shopping Mall', 'Gym / Fitness Center', 'Pharmacy',
    'Electronics Store', 'Indie Movie Theater', 'Light Rail Station', 'Metro Station', 'Train', 'Train Station', 'Garden',
    'Theater', 'ATM', 'Office', 'Bus Station', 'Bank', 'Market', 'Business Service', 'Monument / Landmark',
    'Resort', 'Hospital', 'Police Station', 'School', 'College', 'Café', 'Park', 'Playground',
    'Convention Center', 'College Auditorium', 'Government Building', 'Airport Terminal',
]
print(len(search_query))

33


In [22]:
# keep only the venues whose category is in the quality-infrastructure list
quality_dataframe = venues_df.loc[venues_df['VenueCategory'].isin(search_query)].copy()
quality_dataframe.shape

(473, 9)

In [23]:
quality_dataframe

Unnamed: 0,Post Office,Pin Code,Latitude,Longitude,City,VenueName,VenueLatitude,VenueLongitude,VenueCategory
1,August Kranti Marg,400036,18.963549,72.809989,Mumbai,symphony,18.963347,72.810251,Restaurant
8,August Kranti Marg,400036,18.963549,72.809989,Mumbai,Krishna Palace Residency Hotel,18.962266,72.813960,Hotel
11,August Kranti Marg,400036,18.963549,72.809989,Mumbai,Moshe's,18.963438,72.807810,Café
16,August Kranti Marg,400036,18.963549,72.809989,Mumbai,August Kranti Maidan,18.963433,72.810083,Park
18,August Kranti Marg,400036,18.963549,72.809989,Mumbai,di bella,18.965556,72.807004,Café
...,...,...,...,...,...,...,...,...,...
2327,B A R C,400085,19.016700,72.850000,Mumbai,Ramee Guestline Hotel,19.017085,72.844703,Hotel
2330,B A R C,400085,19.016700,72.850000,Mumbai,wadala bus depot,19.014879,72.852001,Bus Station
2332,B A R C,400085,19.016700,72.850000,Mumbai,Agyari Gardens,19.018896,72.852526,Park
2334,Talasari,401606,19.916700,73.233300,Thane,Jawhar Bus Station,19.919062,73.230400,Bus Station


In [24]:
# one hot encoding
qualitymumbai_onehot = pd.get_dummies(quality_dataframe[['VenueCategory']], prefix="", prefix_sep="")
# add the Post Office, Pin Code and City columns back to the dataframe
qualitymumbai_onehot['Post Office'] = quality_dataframe['Post Office'] 
qualitymumbai_onehot['Pin Code'] = quality_dataframe['Pin Code'] 
qualitymumbai_onehot['City'] = quality_dataframe['City'] 

# move the Post Office, Pin Code and City columns to the front
fixed_columns = list(qualitymumbai_onehot.columns[-3:]) + list(qualitymumbai_onehot.columns[:-3])
qualitymumbai_onehot = qualitymumbai_onehot[fixed_columns]

print(qualitymumbai_onehot.shape)
qualitymumbai_onehot.head()
print(qualitymumbai_onehot.columns.values)

(473, 27)
['Post Office' 'Pin Code' 'City' 'Airport Terminal' 'Bank' 'Bus Station'
 'Business Service' 'Café' 'College Auditorium' 'Electronics Store'
 'Farmers Market' 'Garden' 'Government Building' 'Gym / Fitness Center'
 'Hotel' 'Indie Movie Theater' 'Light Rail Station' 'Market'
 'Monument / Landmark' 'Park' 'Pharmacy' 'Playground' 'Resort'
 'Restaurant' 'Shopping Mall' 'Theater' 'Train Station']


In [26]:
# Sum the one-hot columns per post office, then add a row-wise total across all categories.
qualitymumbai_grouped = qualitymumbai_onehot.groupby(["Post Office", "Pin Code", "City"]).sum().reset_index()
qualitymumbai_grouped['Total infrastructure'] = qualitymumbai_grouped.drop(['Post Office','Pin Code','City'], axis=1).sum(axis=1)

In [27]:
qualitymumbai_grouped.shape

(111, 28)

In [28]:
print(qualitymumbai_grouped.shape)
qualitymumbai_grouped.head()

(111, 28)


Unnamed: 0,Post Office,Pin Code,City,Airport Terminal,Bank,Bus Station,Business Service,Café,College Auditorium,Electronics Store,...,Monument / Landmark,Park,Pharmacy,Playground,Resort,Restaurant,Shopping Mall,Theater,Train Station,Total infrastructure
0,Agashi,401301,Thane,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,1
1,Airoli Mode,400708,Navi Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,2
2,Andheri (East),400069,Mumbai,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,4
3,Andheri (West),400058,Mumbai,0,0,0,0,2,0,0,...,0,0,1,0,0,1,0,0,0,6
4,Anu Shakti Nagar,400094,Mumbai,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1

# What are the best locations in Mumbai as per infrastructure?

In [29]:
qualitymumbai_grouped[qualitymumbai_grouped['Total infrastructure'] == qualitymumbai_grouped['Total infrastructure'].max()].transpose()

Unnamed: 0,9
Post Office,Bandra (West)
Pin Code,400050
City,Mumbai
Airport Terminal,0
Bank,0
Bus Station,0
Business Service,0
Café,10
College Auditorium,1
Electronics Store,1


# Which areas lack infrastructure facilities?

In [30]:
badquality = qualitymumbai_grouped[qualitymumbai_grouped['Total infrastructure'] == qualitymumbai_grouped['Total infrastructure'].min()]
badquality

Unnamed: 0,Post Office,Pin Code,City,Airport Terminal,Bank,Bus Station,Business Service,Café,College Auditorium,Electronics Store,...,Monument / Landmark,Park,Pharmacy,Playground,Resort,Restaurant,Shopping Mall,Theater,Train Station,Total infrastructure
0,Agashi,401301,Thane,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,1
4,Anu Shakti Nagar,400094,Mumbai,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1
12,Bassien,401201,Thane,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,1
16,Bhandup (East),400042,Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1
19,Bhayander (East),401105,Thane,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,1
20,Boisar,401501,Thane,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,1
35,Ghansoli,400701,Navi Mumbai,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,1
43,Jacob Circle,400011,Mumbai,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,1
44,Jakegram,400606,Thane,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,1
45,Jawhar,401603,Thane,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


# 9. Potential area for the development of infrastructure of different kinds


1. Choose an infrastructure type to see which postal areas have the highest potential for it

In [31]:
yourchoiceinfra = 'Restaurant' # Select your choice of infrastructure from VenueCategory
badqualitychoice = qualitymumbai_grouped[qualitymumbai_grouped[yourchoiceinfra] == qualitymumbai_grouped[yourchoiceinfra].min()]
badqualitychoice['Post Office']

2                  Andheri (East)
4                Anu Shakti Nagar
6                         B A R C
7                  Ballard Estate
11                    Barve Nagar
                  ...            
102                         Vashi
103    Veer Jijamata Bhosle Udyan
106                      Vikhroli
109                        Wadala
110                         Worli
Name: Post Office, Length: 65, dtype: object

2. Choose a postal area to see which infrastructure types have the highest potential there

In [32]:
yourchoicearea = 'Mantralaya'   # Change with the name of postal area where you want to see potential
infraqualitychoice = qualitymumbai_grouped[qualitymumbai_grouped['Post Office'] == yourchoicearea].transpose()
infraqualitychoice = infraqualitychoice.reset_index()

In [33]:
print("These are infrastructures with highest potential in" , yourchoicearea, "area : " )
for i in range(len(infraqualitychoice)) : 
    if (infraqualitychoice.iloc[i, 1] == 0):
        print(infraqualitychoice.iloc[i, 0])

These are infrastructures with highest potential in Mantralaya area : 
Airport Terminal
Bank
Bus Station
Business Service
College Auditorium
Farmers Market
Garden
Government Building
Indie Movie Theater
Light Rail Station
Market
Monument / Landmark
Park
Pharmacy
Playground
Resort
Train Station


Recheck your chosen infrastructure at your chosen post office

In [34]:
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
search_query = 'School'
LIMIT = 5
radius = 500
latitude = 19.016700   # Latitude of your chosen post office
longitude = 72.850000  # Longitude of your chosen post office
VERSION = 20180604
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6034169c6db72d086aa8434a'},
 'response': {'venues': [{'id': '4f426fd2e4b0085fef97baaa',
    'name': 'Auxilium Convent School',
    'location': {'address': 'Wadala (W)',
     'crossStreet': 'Off Katrak Rd',
     'lat': 19.01620168738538,
     'lng': 72.85447591009165,
     'labeledLatLngs': [{'label': 'display',
       'lat': 19.01620168738538,
       'lng': 72.85447591009165}],
     'distance': 474,
     'postalCode': '400031',
     'cc': 'IN',
     'city': 'Mumbai',
     'state': 'Mahārāshtra',
     'country': 'India',
     'formattedAddress': ['Wadala (W) (Off Katrak Rd)',
      'Mumbai 400031',
      'Mahārāshtra',
      'India']},
    'categories': [{'id': '4bf58dd8d48988d1ab941735',
      'name': 'Student Center',
      'pluralName': 'Student Centers',
      'shortName': 'Student Center',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/education/studentcenter_',
       'suffix': '.png'},
      'primary': True}],
    'referralId
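
The request URL above is assembled with `str.format`, which leaves query values such as `Police Station` unescaped. A hedged sketch of building the same query string with the standard library instead. The helper name `build_search_url` and the demo credentials are assumptions; the endpoint and parameter names are the ones used in the cell above:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Return the search URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Demo values only; no request is sent.
url_demo = build_search_url("demo-id", "demo-secret", 19.0167, 72.85,
                            "20180604", "Police Station", 500, 5)
print(url_demo)
```

Passing the same `params` dictionary to `requests.get(SEARCH_ENDPOINT, params=params)` would have an equivalent effect, since `requests` encodes the values itself.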

In [35]:
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = pd.json_normalize(venues)



In [36]:
# keep only columns that include venue name, and anything that is associated with location
clean_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')]+ ['id']
clean_dataframe = dataframe.loc[:, clean_columns].copy()  # copy so the assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
clean_dataframe['categories'] = clean_dataframe.apply(get_category_type, axis=1)

# clean column names by keeping only last term
clean_dataframe.columns = [column.split('.')[-1] for column in clean_dataframe.columns]

clean_dataframe.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
0,Auxilium Convent School,Student Center,Wadala (W),Off Katrak Rd,19.016202,72.854476,"[{'label': 'display', 'lat': 19.01620168738538...",474,400031.0,IN,Mumbai,Mahārāshtra,India,"[Wadala (W) (Off Katrak Rd), Mumbai 400031, Ma...",4f426fd2e4b0085fef97baaa
1,King George School,High School,Hindu Colony,,19.021661,72.848383,"[{'label': 'display', 'lat': 19.02166087460775...",577,,IN,Mumbai,Mahārāshtra,India,"[Hindu Colony, Mumbai, Mahārāshtra, India]",4dbb90906e810768bf48c704
2,School,Law School,,,19.013098,72.849128,"[{'label': 'display', 'lat': 19.01309812095294...",411,,IN,Mumbai,Mahārāshtra,India,"[Mumbai, Mahārāshtra, India]",4f19aa3be4b0a9e6db8d7446
3,Good Luck Motor Training School,Automotive Shop,,,19.016811,72.84686,"[{'label': 'display', 'lat': 19.01681065035709...",330,,IN,,,India,[India],51bfeeee498e9cf446697ed5
4,Dadar Parsee Youths Assembly High School,College Academic Building,"Dadar Parsi Colony, Dadar East",,19.017843,72.852982,"[{'label': 'display', 'lat': 19.01784324645996...",338,400014.0,IN,Mumbai,Mahārāshtra,India,"[Dadar Parsi Colony, Dadar East, Mumbai 400014...",52f4ef6511d2b5a364b6975a
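
The same extraction can be done on the raw JSON before it reaches pandas, which makes the handling of missing fields explicit. A minimal sketch, assuming each venue is a dictionary shaped like the response printed earlier. The helper `venue_to_row` is hypothetical, and `sample_venue` is a trimmed copy of the first venue in that response:

```python
def venue_to_row(venue):
    """Flatten one venue dictionary into the fields used in this notebook."""
    categories = venue.get("categories") or []   # may be missing or empty
    location = venue.get("location", {})
    return {
        "name": venue.get("name"),
        "category": categories[0]["name"] if categories else None,
        "lat": location.get("lat"),
        "lng": location.get("lng"),
        "distance": location.get("distance"),
    }


sample_venue = {
    "id": "4f426fd2e4b0085fef97baaa",
    "name": "Auxilium Convent School",
    "location": {"lat": 19.01620168738538, "lng": 72.85447591009165, "distance": 474},
    "categories": [{"name": "Student Center"}],
}
print(venue_to_row(sample_venue))
print(venue_to_row({"name": "No category"}))
```

A list of such rows can be passed straight to `pd.DataFrame`.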


# 10. Best place to stay within a city for vital infrastructure facilities

In [38]:
quality_infra_mumbai2 = pd.read_csv('https://raw.githubusercontent.com/kushagravarshney/Coursera_Capstone/main/Week%205/essentialinfra.csv',index_col='Unnamed: 0')
quality_infra_mumbai2.head()

Unnamed: 0,Post Office,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,August Kranti Marg,18.963549,72.809989,Cumballa Hill Heart hospital,18.963778,72.809091,Emergency Room
1,Andheri (East),19.115883,72.854202,criticare hospital,19.118263,72.850639,Hospital
2,Andheri (West),19.117249,72.833968,Sujay Hospital,19.115915,72.83427,Hospital
3,Antop Hill,19.020761,72.865256,sai hospital,19.023059,72.862824,Hospital
4,Ballard Estate,18.936651,72.839132,Seaman Hospital,18.936524,72.83938,Medical Center


Alternatively, run every search query for every post office

In [40]:
search_query2 = [
    'Hospital', 'Food', 'Hotel', 'Shopping Mall', 'Pharmacy',
    'Metro Station', 'Train Station', 'ATM', 'Office', 'Bus Station', 'Bank', 'Market',
    'Police Station', 'School', 'College & University', 'Park',
]
categoryId = [
    '4bf58dd8d48988d104941735', '4d4b7105d754a06374d81259', '4bf58dd8d48988d1fa931735', '4bf58dd8d48988d1fd941735', '4bf58dd8d48988d10f951735',
    '4bf58dd8d48988d1fd931735', '4bf58dd8d48988d129951735', '52f2ab2ebcbc57f1066b8b56', '4bf58dd8d48988d124941735', '4bf58dd8d48988d1fe931735',
    '4bf58dd8d48988d10a951735', '50be8ee891d4fa8dcc7199a7', '4bf58dd8d48988d12e941735', '4bf58dd8d48988d13b941735', '4d4b7105d754a06372d81259',
    '4bf58dd8d48988d163941735',
]
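
The two lists have the same length (16 entries each) and appear to be matched by position, with the last pair ('Park', '4bf58dd8d48988d163941735') confirmed by the next cell. A small sketch of pairing them explicitly so that a mismatch in length is caught early. The helper `pair_queries` is hypothetical, and pairing the first three entries by position is an assumption about the author's intent:

```python
def pair_queries(terms, ids):
    """Map each search term to the category ID at the same position."""
    if len(terms) != len(ids):
        raise ValueError(f"{len(terms)} terms but {len(ids)} category IDs")
    return dict(zip(terms, ids))


# First three entries of each list, copied from the cell above.
demo_terms = ['Hospital', 'Food', 'Hotel']
demo_ids = ['4bf58dd8d48988d104941735', '4d4b7105d754a06374d81259', '4bf58dd8d48988d1fa931735']

for term, category_id in pair_queries(demo_terms, demo_ids).items():
    print(term, '->', category_id)
```

Iterating over such a mapping is one way to repeat the single-category search below for every infrastructure type.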

In [41]:
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
radius = 500
VERSION = 20180604
# Quality Infrastructure 
search_query2 = 'Park'
categoryId = '4bf58dd8d48988d163941735'
LIMIT = 1

In [42]:
def getNearbyVenues(names, lat1, long1, radius):
    """Query Foursquare once per post office and return all matches as one DataFrame.

    Relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION, search_query2, LIMIT and categoryId.
    """
    venues_list = []
    for name, lat, lng in zip(names, lat1, long1):

        # create the API request URL
        url1 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}&locale={}&categoryId={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, search_query2, radius, LIMIT,  'en', categoryId)
        # make the GET request
        results = requests.get(url1).json()["response"]["venues"]
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],
            v['categories'][0]['name']) for v in results])

    # build the DataFrame once, after all requests have completed
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])

    return nearby_venues

In [43]:
names=clean_df['Post Office']
latitudes=clean_df['Latitude']
longitudes=clean_df['Longitude']
all_venues = getNearbyVenues(names,latitudes, longitudes, radius )

In [44]:
# define the column names
all_venues.columns = ['Post Office','Latitude', 'Longitude','VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(all_venues.shape)
all_venues.head()

(23, 7)


Unnamed: 0,Post Office,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Bandra (West),19.058336,72.830267,D'Monte Park Recreation Centre,19.05963,72.825995,Event Space
1,Bangur Nagar,19.168814,72.833678,Jogger's park,19.164701,72.835153,Park
2,Colaba,18.915091,72.825969,Harish Mahindra Children's Park,18.914867,72.823688,Park
3,Cumballa Hill,18.969307,72.806538,"Amarsons Park, Breach Candy",18.972317,72.806327,Park
4,Ghatkopar (West),19.089719,72.904597,"Joggers park,L B S marg. Ghatkopar west",19.087582,72.901977,Park


In [45]:
quality_infra_mumbai = all_venues.copy()

In [46]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
quality_infra_mumbai2 = pd.concat([quality_infra_mumbai2, quality_infra_mumbai], ignore_index=True)
quality_infra_mumbai2.shape

(1042, 7)

In [47]:
quality_infra_mumbai2.tail(30)

Unnamed: 0,Post Office,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
1012,Ghansoli,19.119331,72.99951,CIDCO Park,19.121827,73.00301,Park
1013,Turbhe,19.076165,73.017662,Cidco Nature Park,19.07681,73.012673,Park
1014,Bhayandar,19.197152,72.811366,Kalpana Park,19.196491,72.814888,Park
1015,Boisar,19.209144,72.859183,Joggers Park (Thakur Zagadu Singh Charitable t...,19.210427,72.863646,Park
1016,Mira,19.282057,72.874144,sai baba nagar park,19.28389,72.870278,Park
1017,Thane (East),19.030826,73.019854,mother teresa park nerul east,19.028223,73.021927,Park
1018,Vesava (Versova),19.133736,72.814877,Park,19.137666,72.815483,Park
1019,Bandra (West),19.058336,72.830267,D'Monte Park Recreation Centre,19.05963,72.825995,Event Space
1020,Bangur Nagar,19.168814,72.833678,Jogger's park,19.164701,72.835153,Park
1021,Colaba,18.915091,72.825969,Harish Mahindra Children's Park,18.914867,72.823688,Park


In [48]:
# to_csv cannot write to a raw.githubusercontent.com URL; save locally, then upload the file to the repository
quality_infra_mumbai2.to_csv('essentialinfra.csv')

# CLUSTERING THE IMPORTED DATASET

In [49]:
quality_infra_mumbai2['VenueCategory'].unique()

array(['Emergency Room', 'Hospital', 'Medical Center', 'Eye Doctor',
       'Veterinarian', "Doctor's Office", 'Bus Line', 'Sandwich Place',
       'Fast Food Restaurant', 'Fried Chicken Joint', 'Bagel Shop',
       'Snack Place', 'Seafood Restaurant', 'Indian Restaurant',
       'Food Court', 'Chinese Restaurant', 'Food Truck', 'Cafeteria',
       'Pizza Place', 'Café', 'Middle Eastern Restaurant',
       'Vegetarian / Vegan Restaurant', 'Asian Restaurant', 'Restaurant',
       'Deli / Bodega', 'Burger Joint', 'Breakfast Spot',
       'Comfort Food Restaurant', 'Dessert Shop', 'Hotel',
       'Bed & Breakfast', 'Motel', 'Boarding House', 'Resort', 'Hostel',
       'Shopping Mall', 'Building', 'Pharmacy', 'Light Rail Station',
       'Metro Station', 'Train Station', 'Platform', 'Train', 'ATM',
       'Bank', 'Campaign Office', 'Office', 'Coworking Space',
       'Tech Startup', 'Conference Room', 'Bus Station', 'Market',
       'Flea Market', 'Police Station', 'High School',
       'E

In [50]:
# one hot encoding
quality_mumbai_onehot = pd.get_dummies(quality_infra_mumbai2[['VenueCategory']], prefix="", prefix_sep="")

# add the Post Office column back to the dataframe
quality_mumbai_onehot['Post Office'] = quality_infra_mumbai2['Post Office'] 

# move the Post Office column to the front
fixed_columns = list(quality_mumbai_onehot.columns[-1:]) + list(quality_mumbai_onehot.columns[:-1])
quality_mumbai_onehot = quality_mumbai_onehot[fixed_columns]

print(quality_mumbai_onehot.shape)
quality_mumbai_onehot.head()

(1042, 81)


Unnamed: 0,Post Office,ATM,Adult Education Center,Asian Restaurant,Bagel Shop,Bank,Bed & Breakfast,Boarding House,Breakfast Spot,Building,...,School,Seafood Restaurant,Shopping Mall,Snack Place,Student Center,Tech Startup,Train,Train Station,Vegetarian / Vegan Restaurant,Veterinarian
0,August Kranti Marg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Andheri (East),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Andheri (West),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Antop Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ballard Estate,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [51]:

qualitymumbai_grouped = quality_mumbai_onehot.groupby(["Post Office"]).sum().reset_index()

print(qualitymumbai_grouped.shape)
qualitymumbai_grouped.head()

(122, 81)


Unnamed: 0,Post Office,ATM,Adult Education Center,Asian Restaurant,Bagel Shop,Bank,Bed & Breakfast,Boarding House,Breakfast Spot,Building,...,School,Seafood Restaurant,Shopping Mall,Snack Place,Student Center,Tech Startup,Train,Train Station,Vegetarian / Vegan Restaurant,Veterinarian
0,Aarey Milk Colony,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Airoli Mode,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Andheri (East),1,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Andheri (West),1,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Antop Hill,1,0,0,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [52]:
qualitymumbai_grouped['Total infrastructure'] = qualitymumbai_grouped.drop(['Post Office'], axis=1).sum(axis=1)

In [53]:
qualitymumbai_groupedmax = qualitymumbai_grouped[qualitymumbai_grouped['Total infrastructure'] == qualitymumbai_grouped['Total infrastructure'].max()]
print("Best place to stay within a city for vital infrastructure facilities :")
qualitymumbai_groupedmax[['Post Office', 'Total infrastructure']]
print(qualitymumbai_groupedmax.shape)

Best place to stay within a city for vital infrastructure facilities :
(1, 82)


In [54]:
mumbai_merged2 = qualitymumbai_grouped.copy()
mumbai_merged2 = mumbai_merged2.join(clean_df[["Pin Code",'Latitude', 'Longitude', "Post Office" ]].set_index("Post Office"), on="Post Office")

In [55]:
fixed_columns = list(mumbai_merged2.columns[-3:]) + list(mumbai_merged2.columns[:-3])
mumbai_merged2 = mumbai_merged2[fixed_columns]

print(mumbai_merged2.shape)
mumbai_merged2.head()

(122, 85)


Unnamed: 0,Pin Code,Latitude,Longitude,Post Office,ATM,Adult Education Center,Asian Restaurant,Bagel Shop,Bank,Bed & Breakfast,...,Seafood Restaurant,Shopping Mall,Snack Place,Student Center,Tech Startup,Train,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Total infrastructure
0,400065,19.156129,72.870722,Aarey Milk Colony,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,400708,19.172979,73.003532,Airoli Mode,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
2,400069,19.115883,72.854202,Andheri (East),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,8
3,400058,19.117249,72.833968,Andheri (West),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,9
4,400037,19.020761,72.865256,Antop Hill,1,0,0,1,1,0,...,0,0,0,0,0,0,0,0,0,6
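
The join and column reordering above can be sketched on a small example. The frames below are hypothetical stand-ins (names, pin codes, and coordinates are illustrative only); the point is that `join` with `on="Post Office"` performs a left lookup against the indexed coordinate table.

```python
import pandas as pd

# Hypothetical toy frames; values are illustrative only
grouped = pd.DataFrame({"Post Office": ["A", "B"], "Total infrastructure": [3, 7]})
coords = pd.DataFrame({
    "Post Office": ["A", "B", "C"],
    "Pin Code": [1, 2, 3],
    "Latitude": [19.0, 19.1, 19.2],
    "Longitude": [72.8, 72.9, 73.0],
})

# Left join: every row of `grouped` is kept; coordinates are looked up by name
merged = grouped.join(coords.set_index("Post Office"), on="Post Office")

# Move the three appended columns to the front
front = list(merged.columns[-3:])
merged = merged[front + list(merged.columns[:-3])]
print(merged.columns.tolist())
```

If the lookup table contained the same post office name more than once, the join would emit one row per match, so comparing the row count before and after the join is a cheap sanity check.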


In [56]:
# set number of clusters
kclusters = 3

mumbai_2_grouped_clustering = mumbai_merged2[["Total infrastructure"]]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mumbai_2_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 2, 2, 1, 1, 1, 2, 2, 1], dtype=int32)
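
K-means label numbers are arbitrary: label 0 is not guaranteed to be the group with the lowest totals. If an ordered labeling is wanted, one option is to rank the cluster centers and remap the labels. The sketch below uses made-up totals and an explicit `n_init=10`; both are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical totals: three well-separated groups (low, medium, high)
totals = np.array([1, 2, 3, 20, 21, 22, 40, 41, 42], dtype=float).reshape(-1, 1)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(totals)

# Rank the cluster centers from smallest to largest and remap the labels,
# so that 0 = lowest totals and 2 = highest totals
order = np.argsort(km.cluster_centers_.ravel())
rank_of = np.empty_like(order)
rank_of[order] = np.arange(len(order))
ordered_labels = rank_of[km.labels_]
print(ordered_labels)
```

With this remapping, a larger label always means a larger average total, which makes tables and map legends easier to read.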

In [57]:
mumbai_mergedfinal = mumbai_merged2.copy()
# add clustering labels
mumbai_mergedfinal["Cluster Labels"] = kmeans.labels_
print(mumbai_mergedfinal.shape)
mumbai_mergedfinal.head() # check the last columns!

(122, 86)


Unnamed: 0,Pin Code,Latitude,Longitude,Post Office,ATM,Adult Education Center,Asian Restaurant,Bagel Shop,Bank,Bed & Breakfast,...,Shopping Mall,Snack Place,Student Center,Tech Startup,Train,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Total infrastructure,Cluster Labels
0,400065,19.156129,72.870722,Aarey Milk Colony,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1
1,400708,19.172979,73.003532,Airoli Mode,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,2,1
2,400069,19.115883,72.854202,Andheri (East),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,8,2
3,400058,19.117249,72.833968,Andheri (West),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,9,2
4,400037,19.020761,72.865256,Antop Hill,1,0,0,1,1,0,...,0,0,0,0,0,0,0,0,6,1


# 11. Clustering And Exploratory Visualization 2

In [58]:
# Coordinates set manually so the map is centered properly
address = 'Mumbai'
latitude = 19.0760
longitude = 72.8777
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Mumbai are 19.076, 72.8777.


In [59]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# color palette for the clusters, indexed by cluster label
rainbow = ['red', 'blue', 'orange', 'darkgreen', 'darkblue', 'black']
# add one marker per post office, colored by its cluster
for lat, lng, label1, common, cluster in zip(
        mumbai_mergedfinal['Latitude'], mumbai_mergedfinal['Longitude'],
        mumbai_mergedfinal['Post Office'], mumbai_mergedfinal['Total infrastructure'],
        mumbai_mergedfinal['Cluster Labels']):
    labelnew = 'Post office : {} , Total infrastructure : {}'.format(label1, common)
    label = folium.Popup(labelnew, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
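
Each marker's color comes from looking up the cluster label in a palette. A minimal sketch of that lookup, kept separate from the map code, is shown below; the helper name `cluster_color` and the wrap-around behavior are assumptions for illustration.

```python
# Hypothetical palette of CSS color names
PALETTE = ["red", "blue", "orange", "darkgreen", "darkblue", "black"]

def cluster_color(label: int) -> str:
    """Return a stable color for a non-negative cluster label."""
    # The modulo keeps the lookup valid even with more clusters than colors
    return PALETTE[label % len(PALETTE)]

for label in range(3):
    print(label, cluster_color(label))
```

Looking the color up directly by label keeps label 0 on the first palette entry, so the color order matches the label order.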

# 12. Examine Clusters

Cluster 0

In [60]:
mumbai_mergedfinal.loc[mumbai_mergedfinal['Cluster Labels'] == 0]

Unnamed: 0,Pin Code,Latitude,Longitude,Post Office,ATM,Adult Education Center,Asian Restaurant,Bagel Shop,Bank,Bed & Breakfast,...,Shopping Mall,Snack Place,Student Center,Tech Startup,Train,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Total infrastructure,Cluster Labels
12,400050,19.058336,72.830267,Bandra (West),1,0,0,0,1,0,...,1,0,0,0,0,0,0,0,12,0
18,400028,18.938771,72.835335,Bhavani Shankar Road,1,0,0,0,1,1,...,0,1,0,0,0,1,0,0,13,0
24,400093,19.115287,72.861808,Chakala MIDC,1,1,0,0,1,0,...,1,0,0,0,0,0,0,1,11,0
25,400071,19.061213,72.897591,Chembur,1,0,0,0,1,0,...,0,1,0,0,0,1,0,0,11,0
28,400039,18.938771,72.835335,Council Hall,1,0,0,0,1,1,...,0,1,0,0,0,1,0,0,13,0
29,400026,18.969307,72.806538,Cumballa Hill,1,0,0,0,1,0,...,1,1,0,0,0,0,0,0,14,0
30,400014,19.019282,72.842876,Dadar,1,0,0,0,1,0,...,1,0,0,0,0,1,0,0,13,0
34,400074,18.938771,72.835335,F C I Mumbai,1,0,0,0,1,1,...,0,1,0,0,0,1,0,0,13,0
35,401206,18.938771,72.835335,Ganeshpuri,1,0,0,0,1,1,...,0,1,0,0,0,1,0,0,13,0
37,400086,19.089719,72.904597,Ghatkopar (West),1,0,0,0,1,0,...,0,1,0,1,0,0,0,0,13,0


Cluster 1

In [61]:
mumbai_mergedfinal.loc[mumbai_mergedfinal['Cluster Labels'] == 1]

Unnamed: 0,Pin Code,Latitude,Longitude,Post Office,ATM,Adult Education Center,Asian Restaurant,Bagel Shop,Bank,Bed & Breakfast,...,Shopping Mall,Snack Place,Student Center,Tech Startup,Train,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Total infrastructure,Cluster Labels
0,400065,19.156129,72.870722,Aarey Milk Colony,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1
1,400708,19.172979,73.003532,Airoli Mode,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,2,1
4,400037,19.020761,72.865256,Antop Hill,1,0,0,1,1,0,...,0,0,0,0,0,0,0,0,6,1
5,400094,19.037528,72.928146,Anu Shakti Nagar,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1
6,401302,19.202322,73.002537,Arnala,1,0,0,0,1,0,...,0,0,0,0,1,0,0,0,4,1
9,400608,19.202322,73.002537,Balcum,1,0,0,0,1,0,...,0,0,0,0,1,0,0,0,4,1
14,400084,19.095283,72.900178,Barve Nagar,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,3,1
17,400042,19.148557,72.947066,Bhandup (East),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,4,1
19,401101,19.197152,72.811366,Bhayandar,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,3,1
36,400701,19.119331,72.99951,Ghansoli,1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,6,1



Cluster 2

In [62]:
mumbai_mergedfinal.loc[mumbai_mergedfinal['Cluster Labels'] == 2]

Unnamed: 0,Pin Code,Latitude,Longitude,Post Office,ATM,Adult Education Center,Asian Restaurant,Bagel Shop,Bank,Bed & Breakfast,...,Shopping Mall,Snack Place,Student Center,Tech Startup,Train,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Total infrastructure,Cluster Labels
2,400069,19.115883,72.854202,Andheri (East),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,8,2
3,400058,19.117249,72.833968,Andheri (West),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,9,2
7,400036,18.963549,72.809989,August Kranti Marg,1,0,0,0,1,0,...,0,0,0,0,0,1,0,0,10,2
8,400085,19.0167,72.85,B A R C,1,0,0,0,1,0,...,0,1,1,0,0,0,0,0,10,2
10,400038,18.936651,72.839133,Ballard Estate,0,0,0,0,1,1,...,0,1,0,0,0,0,0,0,9,2
11,400051,19.061657,72.849811,Bandra (East),1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,10,2
13,400090,19.168814,72.833678,Bangur Nagar,1,0,0,0,1,0,...,0,0,0,1,0,0,0,0,8,2
15,400611,19.018987,73.039095,Belapur,0,0,0,0,2,0,...,0,0,0,0,0,1,0,0,9,2
16,400078,19.143868,72.938433,Bhandup,1,0,0,0,1,0,...,1,0,0,0,0,1,0,0,10,2
20,401501,19.209144,72.859183,Boisar,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,7,2
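
Instead of reading the three tables row by row, the clusters can also be compared with a per-cluster summary of the totals. The sketch below uses a hypothetical frame with the same two column names as `mumbai_mergedfinal`; the values are made up.

```python
import pandas as pd

# Hypothetical rows mimicking the columns of mumbai_mergedfinal
df = pd.DataFrame({
    "Post Office": ["A", "B", "C", "D", "E", "F"],
    "Total infrastructure": [13, 12, 2, 4, 9, 8],
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
})

# Size and range of the totals within each cluster
summary = (
    df.groupby("Cluster Labels")["Total infrastructure"]
      .agg(["count", "min", "max", "mean"])
)
print(summary)
```

Non-overlapping min/max ranges across the clusters indicate that the clusters correspond to distinct low, medium, and high infrastructure levels.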


# 13. Observations:


Most of the infrastructure is concentrated in the southern areas of Mumbai, with the highest counts in cluster 0 and moderate counts in cluster 2. Cluster 1, by contrast, has very little infrastructure in its neighborhoods. These areas offer a great opportunity and high potential for opening new facilities, since there is little to no competition from existing ones. One can also check a specific type of infrastructure against the post office area of interest.

Developers whose facilities have a unique selling proposition that sets them apart from the competition can also consider the neighborhoods in cluster 2, where competition is moderate and an adequate amount of supporting infrastructure already exists. Lastly, people planning to settle in the city are advised to start in cluster 0, which already has a high concentration of infrastructure.

# 14. Acknowledgement

Conclusion:
In this project, I identified the business problems, specified the data required, extracted and prepared the data, visualized the results, and applied machine learning by grouping the data into 3 clusters based on their frequency similarities, arriving at a definitive solution to the business problems (mentioned in the results). The project gives recommendations to the relevant stakeholders, i.e. business developers, on the best locations to open new infrastructure. It also gives visitors and immigrants to the city guidance on which post office areas are suited for growth and living prosperously.