## IBM Applied Data Science Capstone Project

### Content in Notebook
1. Import libraries  
2. Import Data from csv  
3. Define Foursquare Credentials and Version  
4. Venues in the area  
5. Analyze each area for venue category  
6. Display top 5 existing facilities for each area  
7. Exploratory Visualization  
8. Feature Engineering for Business Problem  
9. Potential area for development of different infrastructures  
10. Best place to stay within a city for vital infrastructure facilities  
11. Exploratory Visualization 2  
12. Examine Clusters  
13. Observations  
14. Acknowledgments

### 1. Import Libraries

In [50]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd
import folium
import requests
import lxml.html as lh
import json
from sklearn.cluster import KMeans
print("Libraries imported.")

Libraries imported.


### 2. Import Data from CSV

In [2]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,City,Area,Postal Code,Latitude,Longitude
0,Singapore,West Coast,120381,1.318343,103.767821
1,Singapore,Jurong,608526,1.330026,103.742675
2,Singapore,Holland - Bukit Timah,279621,1.311813,103.788279
3,Singapore,Choa Chu Kang,680309,1.38504,103.7279
4,Singapore,Tanjong Pagar,168730,1.28613,103.809826


### 3. Foursquare Credentials and Version

In [49]:
CLIENT_ID = 'XXX'  # your Foursquare client ID
CLIENT_SECRET = 'XXX'  # your Foursquare client secret
VERSION = '20180605' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
clean_df_new = clean_df.copy()


Your credentials:
CLIENT_ID: XXX
CLIENT_SECRET: XXX
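
Hard-coding credentials in a shared notebook risks leaking them. The following is a minimal sketch, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical and are not part of the Foursquare API or of the original notebook.

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret


def mask(secret, visible=2):
    """Keep only the first few characters so the value is safe to print."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With this in place, a cell could print `mask(CLIENT_SECRET)` rather than the secret itself.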


### 4. Venues in the area

In [4]:
radius = 600
LIMIT = 225
venues = []

for lat, long, pc, area, city in zip(clean_df_new['Latitude'], clean_df_new['Longitude'], clean_df_new['Postal Code'], clean_df_new['Area'], clean_df_new['City']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,        CLIENT_SECRET,        VERSION,        lat,        long,        radius,         LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            area,            pc,            lat,             long,          city,
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

# Build the DataFrame once, after all areas have been queried
venues_df = pd.DataFrame(venues)
venues_df.head()

In [5]:
venues_df.columns = ['Area', 'Postal Code', 'Latitude', 'Longitude', 'City', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(487, 9)


Unnamed: 0,Area,Postal Code,Latitude,Longitude,City,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,West Coast,120381,1.318343,103.767821,Singapore,308 海鲜煮炒,1.321086,103.766526,Chinese Restaurant
1,West Coast,120381,1.318343,103.767821,Singapore,The Daily Scoop,1.323567,103.767714,Ice Cream Shop
2,West Coast,120381,1.318343,103.767821,Singapore,Fredo’s,1.322443,103.770365,Bakery
3,West Coast,120381,1.318343,103.767821,Singapore,Buttercake N Cream,1.321759,103.7698,Dessert Shop
4,West Coast,120381,1.318343,103.767821,Singapore,Classic Cakes,1.323566,103.767761,Bakery


In [6]:
venues_df.groupby(['Area', 'Postal Code', 'City']).count().head()

Area,Postal Code,City,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Aljunied,538808,Singapore,23,23,23,23,23,23
Ang Mo Kio,560723,Singapore,22,22,22,22,22,22
Bishan - Toa Payoh,311125,Singapore,4,4,4,4,4,4
Choa Chu Kang,680309,Singapore,12,12,12,12,12,12
East Coast,460124,Singapore,8,8,8,8,8,8


In [7]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 121 unique categories.


In [8]:
venues_df['VenueCategory'].unique()[:20]

array(['Chinese Restaurant', 'Ice Cream Shop', 'Bakery', 'Dessert Shop',
       'Video Game Store', 'Fried Chicken Joint', 'Food Court',
       'Pet Store', 'Indian Restaurant', 'Seafood Restaurant', 'Park',
       'Trail', 'Soup Place', 'Coffee Shop', 'Café', 'Dim Sum Restaurant',
       'Asian Restaurant', 'Sandwich Place', 'Supermarket',
       'Electronics Store'], dtype=object)
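
The venue-collection loop in this section indexes straight into the JSON response, so a failed request or a venue without a category raises `KeyError` or `IndexError`. Below is a minimal sketch of a more defensive parser. The response shape (`response` → `groups[0]` → `items` → `venue`) is taken from the loop above; the helper name `extract_venue_rows` and the second venue in the sample payload are hypothetical.

```python
def extract_venue_rows(payload, area, postal_code, lat, lng, city):
    """Turn one 'explore' response into rows shaped like venues_df."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category instead of failing
        category = categories[0]["name"] if categories else None
        rows.append((
            area, postal_code, lat, lng, city,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


# Sample payload: the first venue mirrors a row shown above,
# the second is made up to exercise the missing-category path.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "The Daily Scoop",
               "location": {"lat": 1.323567, "lng": 103.767714},
               "categories": [{"name": "Ice Cream Shop"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 1.32, "lng": 103.76},
               "categories": []}},
]}]}}

rows = extract_venue_rows(sample, "West Coast", 120381, 1.318343, 103.767821, "Singapore")
```

An empty or error payload such as `{}` simply yields an empty list, so the outer loop can continue with the next area.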

### 5. Analyze each area for venue category

In [9]:
sg_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

sg_onehot['Area'] = venues_df['Area'] 
sg_onehot['Postal Code'] = venues_df['Postal Code'] 
sg_onehot['City'] = venues_df['City'] 

fixed_columns = list(sg_onehot.columns[-3:]) + list(sg_onehot.columns[:-3])
sg_onehot = sg_onehot[fixed_columns]

print(sg_onehot.shape)
sg_onehot.head()

(487, 124)


Unnamed: 0,Area,Postal Code,City,American Restaurant,Arcade,Asian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,...,Sushi Restaurant,Tanning Salon,Thai Restaurant,Theater,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Yoga Studio,Zhejiang Restaurant
0,West Coast,120381,Singapore,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,West Coast,120381,Singapore,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,West Coast,120381,Singapore,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,West Coast,120381,Singapore,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,West Coast,120381,Singapore,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [10]:
sg_grouped = sg_onehot.groupby(["Area", "Postal Code", "City"]).mean().reset_index()

print(sg_grouped.shape)
sg_grouped.head()

(16, 124)


Unnamed: 0,Area,Postal Code,City,American Restaurant,Arcade,Asian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,...,Sushi Restaurant,Tanning Salon,Thai Restaurant,Theater,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Yoga Studio,Zhejiang Restaurant
0,Aljunied,538808,Singapore,0.0,0.0,0.130435,0.0,0.0,0.0,0.043478,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ang Mo Kio,560723,Singapore,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455
2,Bishan - Toa Payoh,311125,Singapore,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
3,Choa Chu Kang,680309,Singapore,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,East Coast,460124,Singapore,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
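
Taking the mean of one-hot columns per area gives each category's share of that area's venues. The sketch below reproduces two of the Aljunied values from the table above (23 venues in total) using only the standard library. The venue list is hypothetical: it assumes 3 Asian restaurants and 1 bakery, with the remaining 19 venues lumped under a placeholder `"Other"` label.

```python
from collections import Counter


def category_frequencies(categories):
    """Share of venues per category, equivalent to the mean of one-hot columns."""
    counts = Counter(categories)
    total = len(categories)
    return {name: count / total for name, count in counts.items()}


# Hypothetical venue list for one area with 23 venues
aljunied_like = ["Asian Restaurant"] * 3 + ["Bakery"] + ["Other"] * 19
freqs = category_frequencies(aljunied_like)

print(round(freqs["Asian Restaurant"], 6))  # 0.130435
print(round(freqs["Bakery"], 6))            # 0.043478
```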


### 6. Display top 5 existing facilities for each area

In [11]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ["Area", "Postal Code", "City"]
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd use the 'th' suffix
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Area'] = sg_grouped['Area']
neighborhoods_venues_sorted['Postal Code'] = sg_grouped['Postal Code']
neighborhoods_venues_sorted['City'] = sg_grouped['City']

for ind in np.arange(sg_grouped.shape[0]):
    row_categories = sg_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted

(16, 8)


Unnamed: 0,Area,Postal Code,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Aljunied,538808,Singapore,Coffee Shop,Food Court,Supermarket,Asian Restaurant,Pet Store
1,Ang Mo Kio,560723,Singapore,Asian Restaurant,Café,Zhejiang Restaurant,Chinese Restaurant,Bus Stop
2,Bishan - Toa Payoh,311125,Singapore,Theater,Gym,Movie Theater,Bus Station,Zhejiang Restaurant
3,Choa Chu Kang,680309,Singapore,Pet Store,Flower Shop,Garden Center,Lake,Farm
4,East Coast,460124,Singapore,Indian Restaurant,Food Court,Coffee Shop,Bus Stop,Supermarket
5,Holland - Bukit Timah,279621,Singapore,Japanese Restaurant,Food Court,Chinese Restaurant,Shopping Mall,American Restaurant
6,Jalan Besar,328127,Singapore,Café,Chinese Restaurant,Hotel,Coffee Shop,Italian Restaurant
7,Jurong,608526,Singapore,Japanese Restaurant,Café,Chinese Restaurant,Coffee Shop,Shopping Mall
8,Marine Parade,408600,Singapore,Food Court,Noodle House,Farmers Market,Chinese Restaurant,Dessert Shop
9,Marsiling - Yew Tee,689286,Singapore,Farm,Farmers Market,Zhejiang Restaurant,Hot Dog Joint,Dumpling Restaurant
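
The column labels above rely on a `try`/`except` around a three-element suffix list, which works for the top 5 but would yield labels such as "21th" for larger values of `num_top_venues`. A small sketch of an explicit ordinal helper follows; the function names `ordinal` and `venue_columns` are hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def venue_columns(num_top_venues):
    """Build the 'Nth Most Common Venue' column labels."""
    return [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]


print(venue_columns(5))
```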


In [12]:
address = 'Singapore'
latitude = 1.3521
longitude = 103.8198
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Singapore are 1.3521, 103.8198.


In [13]:
sg_merged = clean_df.copy()
sg_merged = sg_merged.join(neighborhoods_venues_sorted[["Postal Code", "1st Most Common Venue"]].set_index("Postal Code"), on="Postal Code")
print(sg_merged.shape)
sg_merged

(16, 6)


Unnamed: 0,City,Area,Postal Code,Latitude,Longitude,1st Most Common Venue
0,Singapore,West Coast,120381,1.318343,103.767821,Coffee Shop
1,Singapore,Jurong,608526,1.330026,103.742675,Japanese Restaurant
2,Singapore,Holland - Bukit Timah,279621,1.311813,103.788279,Japanese Restaurant
3,Singapore,Choa Chu Kang,680309,1.38504,103.7279,Pet Store
4,Singapore,Tanjong Pagar,168730,1.28613,103.809826,Chinese Restaurant
5,Singapore,Marine Parade,408600,1.319291,103.877267,Food Court
6,Singapore,Jalan Besar,328127,1.317285,103.841774,Café
7,Singapore,Bishan - Toa Payoh,311125,1.337527,103.828221,Theater
8,Singapore,Tampines,528523,1.35296,103.938151,Coffee Shop
9,Singapore,Pasir Ris - Punggol,545025,1.392501,103.876812,Coffee Shop


### 7. Exploratory Visualization

In [14]:
my_map = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label1,common in zip(sg_merged['Latitude'], sg_merged['Longitude'], sg_merged['Area'],sg_merged['1st Most Common Venue'] ):
    labelnew =  'Area : {} , Top Existing Infrastructure  : {}'.format(label1,common)
    label = folium.Popup( labelnew, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(my_map)  
my_map

### Image link if map not displayed

https://github.com/jjtan444/Coursera_Capstone/blob/master/week5/results/Results1.JPG

### 8. Feature Engineering for Business Problem

In [15]:
venues_df['VenueCategory'].unique()

array(['Chinese Restaurant', 'Ice Cream Shop', 'Bakery', 'Dessert Shop',
       'Video Game Store', 'Fried Chicken Joint', 'Food Court',
       'Pet Store', 'Indian Restaurant', 'Seafood Restaurant', 'Park',
       'Trail', 'Soup Place', 'Coffee Shop', 'Café', 'Dim Sum Restaurant',
       'Asian Restaurant', 'Sandwich Place', 'Supermarket',
       'Electronics Store', 'Bookstore', 'Sushi Restaurant',
       'Fast Food Restaurant', 'Canal', 'Japanese Restaurant',
       'Steakhouse', 'American Restaurant', 'Pie Shop', 'Grocery Store',
       'Shopping Mall', 'Hookah Bar', 'Snack Place', 'Jewelry Store',
       'Food', 'Clothing Store', 'Furniture / Home Store', 'Skating Rink',
       'Movie Theater', 'Multiplex', 'Vegetarian / Vegan Restaurant',
       'Bubble Tea Shop', 'Italian Restaurant', 'Frozen Yogurt Shop',
       'Korean Restaurant', 'Burger Joint', 'German Restaurant',
       'Gym / Fitness Center', 'Dumpling Restaurant', 'Cafeteria',
       'Ramen Restaurant', 'Karaoke Bar', '

In [16]:
search_query = [
    'Restaurant', 'Hotel', 'Supermarket', 'Farmers Market', 'Shopping Mall', 'Gym', 'Gym / Fitness Center', 'Pharmacy',
    'Electronics Store', 'Movie Theater', 'Light Rail Station', 'Metro Station', 'Train', 'Train Station', 'Garden',
    'Theater', 'ATM', 'Office', 'Bus Station', 'Bank', 'Market', 'Business Service', 'Monument / Landmark',
    'Resort', 'Hospital', 'Police Station', 'School', 'College', 'Café', 'Park', 'Playground',
    'Convention Center', 'College Auditorium', 'Government Building', 'Airport Terminal',
]
print(len(search_query))


35


In [17]:
# Keep only venues whose category is in the infrastructure list
quality_dataframe = venues_df.loc[venues_df['VenueCategory'].isin(search_query)]
quality_dataframe

Unnamed: 0,Area,Postal Code,Latitude,Longitude,City,VenueName,VenueLatitude,VenueLongitude,VenueCategory
11,West Coast,120381,1.318343,103.767821,Singapore,Firefly Park @ Clementi,1.320385,103.764844,Park
17,West Coast,120381,1.318343,103.767821,Singapore,Summer Hill,1.321941,103.770151,Café
24,West Coast,120381,1.318343,103.767821,Singapore,Sunset Railway Cafe,1.323714,103.767606,Café
27,West Coast,120381,1.318343,103.767821,Singapore,Sheng Siong Supermarket,1.315091,103.771130,Supermarket
28,West Coast,120381,1.318343,103.767821,Singapore,Challenger,1.315205,103.764809,Electronics Store
...,...,...,...,...,...,...,...,...,...
467,Ang Mo Kio,560723,1.372402,103.829752,Singapore,Thus Coffee,1.372647,103.829545,Café
468,Ang Mo Kio,560723,1.372402,103.829752,Singapore,Lower Peirce Reservoir Park,1.370299,103.826565,Park
469,Ang Mo Kio,560723,1.372402,103.829752,Singapore,Yam's Kitchen,1.371326,103.828778,Restaurant
483,Ang Mo Kio,560723,1.372402,103.829752,Singapore,JCU Cafeteria,1.375415,103.829084,Café


In [18]:
qualitysg_onehot = pd.get_dummies(quality_dataframe[['VenueCategory']], prefix="", prefix_sep="")
qualitysg_onehot['Area'] = quality_dataframe['Area'] 
qualitysg_onehot['Postal Code'] = quality_dataframe['Postal Code'] 
qualitysg_onehot['City'] = quality_dataframe['City'] 

fixed_columns = list(qualitysg_onehot.columns[-3:]) + list(qualitysg_onehot.columns[:-3])
qualitysg_onehot = qualitysg_onehot[fixed_columns]

print(qualitysg_onehot.shape)
qualitysg_onehot.head()
print(qualitysg_onehot.columns.values)

(95, 20)
['Area' 'Postal Code' 'City' 'Bus Station' 'Café' 'Electronics Store'
 'Farmers Market' 'Garden' 'Gym' 'Gym / Fitness Center' 'Hotel'
 'Light Rail Station' 'Movie Theater' 'Park' 'Pharmacy' 'Playground'
 'Restaurant' 'Shopping Mall' 'Supermarket' 'Theater']


In [19]:
qualitysg_grouped = qualitysg_onehot.groupby(["Area", "Postal Code", "City"]).sum().reset_index()
print(qualitysg_grouped.shape)
qualitysg_grouped

(16, 20)


Unnamed: 0,Area,Postal Code,City,Bus Station,Café,Electronics Store,Farmers Market,Garden,Gym,Gym / Fitness Center,Hotel,Light Rail Station,Movie Theater,Park,Pharmacy,Playground,Restaurant,Shopping Mall,Supermarket,Theater
0,Aljunied,538808,Singapore,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0
1,Ang Mo Kio,560723,Singapore,1,2,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0
2,Bishan - Toa Payoh,311125,Singapore,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1
3,Choa Chu Kang,680309,Singapore,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0
4,East Coast,460124,Singapore,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
5,Holland - Bukit Timah,279621,Singapore,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0
6,Jalan Besar,328127,Singapore,1,5,0,0,0,1,0,3,0,0,0,1,0,2,0,2,0
7,Jurong,608526,Singapore,1,5,0,0,0,0,1,0,0,1,0,0,0,0,3,1,0
8,Marine Parade,408600,Singapore,0,1,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0
9,Marsiling - Yew Tee,689286,Singapore,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0


In [20]:
qualitysg_grouped['Total infrastructure'] =  qualitysg_grouped[qualitysg_grouped.drop(['Area','Postal Code','City'], axis=1).columns.values].sum(axis=1)


In [21]:
qualitysg_grouped.shape

(16, 21)
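
As a worked check of the `Total infrastructure` column, the sketch below sums the non-zero counts for three of the areas shown in the grouped table above. It uses plain dictionaries rather than the DataFrame, covers only a subset of the 16 areas, and the variable names are illustrative.

```python
# Non-zero counts copied from the grouped table (rows 6, 7 and 9)
area_counts = {
    "Jalan Besar": {"Bus Station": 1, "Café": 5, "Gym": 1, "Hotel": 3,
                    "Pharmacy": 1, "Restaurant": 2, "Supermarket": 2},
    "Jurong": {"Bus Station": 1, "Café": 5, "Gym / Fitness Center": 1,
               "Movie Theater": 1, "Shopping Mall": 3, "Supermarket": 1},
    "Marsiling - Yew Tee": {"Farmers Market": 1},
}

# Row-wise sum, the same idea as .sum(axis=1) on the category columns
totals = {area: sum(counts.values()) for area, counts in area_counts.items()}
best = max(totals, key=totals.get)
worst = min(totals, key=totals.get)

print(totals)  # {'Jalan Besar': 15, 'Jurong': 12, 'Marsiling - Yew Tee': 1}
print(best, "|", worst)
```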

### Best location in Singapore as per infrastructure

In [22]:
qualitysg_grouped[qualitysg_grouped['Total infrastructure'] == qualitysg_grouped['Total infrastructure'].max()].transpose()

Unnamed: 0,6
Area,Jalan Besar
Postal Code,328127
City,Singapore
Bus Station,1
Café,5
Electronics Store,0
Farmers Market,0
Garden,0
Gym,1
Gym / Fitness Center,0


### Areas that lack infrastructure facilities

In [23]:
lowquality = qualitysg_grouped[qualitysg_grouped['Total infrastructure'] == qualitysg_grouped['Total infrastructure'].min()]
lowquality


Unnamed: 0,Area,Postal Code,City,Bus Station,Café,Electronics Store,Farmers Market,Garden,Gym,Gym / Fitness Center,...,Light Rail Station,Movie Theater,Park,Pharmacy,Playground,Restaurant,Shopping Mall,Supermarket,Theater,Total infrastructure
9,Marsiling - Yew Tee,689286,Singapore,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1


### 9. Potential area for development of different infrastructures

#### Identify which area has highest potential of success for your choice of infrastructure

In [24]:
yourchoiceinfra = 'Restaurant' # Select your choice of infrastructure from VenueCategory
lowqualitychoice = qualitysg_grouped[qualitysg_grouped[yourchoiceinfra] == qualitysg_grouped[yourchoiceinfra].min()]
lowqualitychoice['Area']

0                  Aljunied
2        Bishan - Toa Payoh
3             Choa Chu Kang
4                East Coast
5     Holland - Bukit Timah
7                    Jurong
8             Marine Parade
9       Marsiling - Yew Tee
10                 Nee Soon
11      Pasir Ris - Punggol
12                Sembawang
13                 Tampines
14            Tanjong Pagar
15               West Coast
Name: Area, dtype: object

#### Identify which infrastructure has highest potential of success for your choice of area

In [25]:
yourchoicearea = 'Jalan Besar'   # Change for your choice of area
infraqualitychoice = qualitysg_grouped[qualitysg_grouped['Area'] == yourchoicearea].transpose()
infraqualitychoice = infraqualitychoice.reset_index()
infraqualitychoice

Unnamed: 0,index,6
0,Area,Jalan Besar
1,Postal Code,328127
2,City,Singapore
3,Bus Station,1
4,Café,5
5,Electronics Store,0
6,Farmers Market,0
7,Garden,0
8,Gym,1
9,Gym / Fitness Center,0


In [26]:
print("These are infrastructures with highest potential in" , yourchoicearea, "area : " )
Xx=0
for i in range(len(infraqualitychoice)) : 
    if (infraqualitychoice.iloc[i, 1] == 0):
        print(infraqualitychoice.iloc[i, 0])
        Xx += 1
if Xx == 0:
    for i in range(len(infraqualitychoice)) : 
        if (infraqualitychoice.iloc[i, 1] == 1):
            print(infraqualitychoice.iloc[i, 0])

These are infrastructures with highest potential in Jalan Besar area : 
Electronics Store
Farmers Market
Garden
Gym / Fitness Center
Light Rail Station
Movie Theater
Park
Playground
Shopping Mall
Theater
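
The selection rule used above (list categories with a count of 0, and fall back to those with a count of 1 if none are missing) can be sketched as a small function. This version works on a plain dictionary of category counts instead of the transposed DataFrame; the name `development_candidates` is hypothetical, and the counts are the Jalan Besar row from the grouped table in section 8.

```python
def development_candidates(counts):
    """Return the categories that are absent, or failing that, present only once."""
    for threshold in (0, 1):
        matches = [name for name, n in counts.items() if n == threshold]
        if matches:
            return matches
    return []


jalan_besar = {
    "Bus Station": 1, "Café": 5, "Electronics Store": 0, "Farmers Market": 0,
    "Garden": 0, "Gym": 1, "Gym / Fitness Center": 0, "Hotel": 3,
    "Light Rail Station": 0, "Movie Theater": 0, "Park": 0, "Pharmacy": 1,
    "Playground": 0, "Restaurant": 2, "Shopping Mall": 0, "Supermarket": 2,
    "Theater": 0,
}

candidates = development_candidates(jalan_besar)
for name in candidates:
    print(name)
```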


### 10. Best place to stay within a city for vital infrastructure facilities

In [27]:
search_query2= ['Hospital','Food', 'Hotel', 'Shopping Mall', 'Pharmacy', 
                         'Metro Station', 'Train Station', 'ATM', 'Office', 'Bus Station', 'Bank', 'Market' ,
                          'Police Station', 'School', 'College & University', 'Park'
 ]
categoryId = ['4bf58dd8d48988d104941735','4d4b7105d754a06374d81259', '4bf58dd8d48988d1fa931735', '4bf58dd8d48988d1fd941735', '4bf58dd8d48988d10f951735', 
             '4bf58dd8d48988d1fd931735', '4bf58dd8d48988d129951735', '52f2ab2ebcbc57f1066b8b56', '4bf58dd8d48988d124941735','4bf58dd8d48988d1fe931735',
             '4bf58dd8d48988d10a951735', '50be8ee891d4fa8dcc7199a7','4bf58dd8d48988d12e941735', '4bf58dd8d48988d13b941735', '4d4b7105d754a06372d81259',
             '4bf58dd8d48988d163941735']

In [28]:
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated
radius = 500
VERSION = '20180604'
LIMIT = 1

In [29]:
def getNearbyVenues(names, lat1, long1, radius):
    venues_list=[]
    for name, lat, lng in zip(names, lat1, long1):
        for query,cat_Id in zip(search_query2,categoryId):

            # create the API request URL
            url1 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}&locale={}&categoryId={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, query, radius, LIMIT,  'en', cat_Id)
            # make the GET request
            results = requests.get(url1).json()["response"]["venues"]
            # return only relevant information for each nearby venue          
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['name'], 
                v['location']['lat'], 
                v['location']['lng'],
                v['categories'][0]['name']) for v in results])

    # Flatten the per-query lists into one DataFrame after all areas are processed
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])

    return nearby_venues
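
One caveat with building the request URL through string formatting: a query such as `'College & University'` contains `&`, which the server would read as a parameter separator. The sketch below shows one possible way to encode the parameters with the standard library. The endpoint and parameter names are the ones used in `getNearbyVenues` above; the helper `build_search_url` and its defaults are hypothetical.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, version, lat, lng,
                     query, category_id, radius=500, limit=1):
    """Build the venues/search URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
        "locale": "en",
        "categoryId": category_id,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("XXX", "XXX", "20180604", 1.318343, 103.767821,
                       "College & University", "4d4b7105d754a06372d81259")
print(url)
```

Passing the same dictionary as `params=` to `requests.get` is an equivalent option that leaves the encoding to `requests`.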


In [30]:
names=clean_df['Area']
latitudes=clean_df['Latitude']
longitudes=clean_df['Longitude']
all_venues = getNearbyVenues(names,latitudes, longitudes, radius )


In [31]:
all_venues.columns = ['Area','Latitude', 'Longitude','VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(all_venues.shape)
all_venues

(135, 7)


Unnamed: 0,Area,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,West Coast,1.318343,103.767821,National University Hospital Clinic G,1.317007,103.763242,Hospital
1,West Coast,1.318343,103.767821,Food Loft,1.320721,103.767061,Food Court
2,West Coast,1.318343,103.767821,The Clementi Mall,1.315036,103.764909,Shopping Mall
3,West Coast,1.318343,103.767821,Guardian Pharmacy,1.315212,103.764809,Pharmacy
4,West Coast,1.318343,103.767821,Clementi MRT Station (EW23),1.315095,103.764890,Metro Station
...,...,...,...,...,...,...,...
130,Ang Mo Kio,1.372402,103.829752,Sembawang Hills Food Centre,1.372387,103.829038,Food Court
131,Ang Mo Kio,1.372402,103.829752,Guardian Pharmacy,1.374321,103.827836,Pharmacy
132,Ang Mo Kio,1.372402,103.829752,Bus Stop 56039 (Aft Yio Chu Kang Rd),1.376789,103.828470,Bus Station
133,Ang Mo Kio,1.372402,103.829752,CHIJ St Nicholas Girls' School,1.374114,103.834337,High School


In [32]:
quality_infra_sg = all_venues.copy()

In [33]:
quality_infra_sg2 = quality_infra_sg.copy()

In [34]:
quality_infra_sg2.tail(30)


Unnamed: 0,Area,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
105,Aljunied,1.374143,103.87433,Nearest Mall To My Home,1.376143,103.879525,Shopping Mall
106,Aljunied,1.374143,103.87433,Guardian @ Serangoon North,1.369116,103.871546,Pharmacy
107,Aljunied,1.374143,103.87433,50 First Centre Serangoon North Ave 4 Manageme...,1.375561,103.875811,Tech Startup
108,Aljunied,1.374143,103.87433,Bus Stop 66531 (Opp Blk 531),1.374888,103.874684,Bus Line
109,Aljunied,1.374143,103.87433,Serangoon North Market,1.369742,103.873075,Market
110,Aljunied,1.374143,103.87433,Serangoon North Neighbourhood Police Post,1.369888,103.871366,Police Station
111,Aljunied,1.374143,103.87433,Rosyth School,1.373663,103.873368,School
112,Aljunied,1.374143,103.87433,Alex's College Library,1.372215,103.870516,College Library
113,Aljunied,1.374143,103.87433,Park,1.369922,103.872776,Park
114,Sembawang,1.439028,103.785135,"Khoo Teck Puat Hospital, Tower C, Level 3, Cli...",1.436016,103.786929,Hospital


In [35]:
quality_infra_sg2['VenueCategory'].unique()

array(['Hospital', 'Food Court', 'Shopping Mall', 'Pharmacy',
       'Metro Station', 'Bank', 'Office', 'Bus Line', 'Police Station',
       'School', 'College Lab', 'Trail', 'Hotel', 'Conference Room',
       'Bus Station', 'Music School', 'General College & University',
       'Park', 'Train Station', 'ATM', 'Halal Restaurant', 'High School',
       'Community College', 'Train', 'Private School',
       'College Administrative Building', 'Tech Startup',
       'Fish & Chips Shop', 'Market', 'College Academic Building',
       'Coworking Space', 'College Classroom', 'College Library',
       'Elementary School'], dtype=object)

In [36]:
quality_sg_onehot = pd.get_dummies(quality_infra_sg2[['VenueCategory']], prefix="", prefix_sep="")

quality_sg_onehot['Area'] = quality_infra_sg2['Area'] 

fixed_columns = list(quality_sg_onehot.columns[-1:]) + list(quality_sg_onehot.columns[:-1])
quality_sg_onehot = quality_sg_onehot[fixed_columns]

print(quality_sg_onehot.shape)
quality_sg_onehot.head()

(135, 35)


Unnamed: 0,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,College Classroom,College Lab,College Library,...,Park,Pharmacy,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station
0,West Coast,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,West Coast,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,West Coast,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
3,West Coast,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
4,West Coast,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [37]:
qualitysg_grouped = quality_sg_onehot.groupby(["Area"]).sum().reset_index()

print(qualitysg_grouped.shape)
qualitysg_grouped.head()

(15, 35)


Unnamed: 0,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,College Classroom,College Lab,College Library,...,Park,Pharmacy,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station
0,Aljunied,0,0,1,0,0,0,0,0,1,...,1,1,1,0,1,1,1,0,0,0
1,Ang Mo Kio,0,0,0,1,0,0,0,0,0,...,1,1,0,0,0,0,0,0,0,0
2,Bishan - Toa Payoh,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
3,Choa Chu Kang,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,East Coast,0,0,1,0,0,0,1,0,0,...,1,0,0,0,0,0,0,0,0,0


In [38]:
qualitysg_grouped['Total infrastructure'] =  qualitysg_grouped[qualitysg_grouped.drop(['Area'], axis=1).columns.values].sum(axis=1)
qualitysg_grouped

Unnamed: 0,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,College Classroom,College Lab,College Library,...,Pharmacy,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station,Total infrastructure
0,Aljunied,0,0,1,0,0,0,0,0,1,...,1,1,0,1,1,1,0,0,0,12
1,Ang Mo Kio,0,0,0,1,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,5
2,Bishan - Toa Payoh,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,2
3,Choa Chu Kang,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,East Coast,0,0,1,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,8
5,Holland - Bukit Timah,1,0,0,1,0,0,0,0,0,...,0,1,0,1,1,0,0,0,2,11
6,Jalan Besar,1,1,0,1,0,1,0,0,0,...,1,0,0,1,1,0,0,0,0,14
7,Jurong,0,1,0,1,0,0,0,0,0,...,1,0,0,0,1,0,0,0,0,12
8,Marine Parade,0,0,1,0,0,0,0,0,0,...,0,0,1,0,0,0,1,1,0,8
9,Nee Soon,0,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,4


In [39]:
qualitysg_groupedmax = qualitysg_grouped[qualitysg_grouped['Total infrastructure'] == qualitysg_grouped['Total infrastructure'].max()]
print("Best place to stay within a city for vital infrastructure facilities :")
qualitysg_groupedmax[['Area', 'Total infrastructure']]

Best place to stay within a city for vital infrastructure facilities :


Unnamed: 0,Area,Total infrastructure
6,Jalan Besar,14


In [40]:
sg_merged2 = qualitysg_grouped.copy()
sg_merged2 = sg_merged2.join(clean_df[["Postal Code",'Latitude', 'Longitude', "Area" ]].set_index("Area"), on="Area")

In [41]:
fixed_columns = list(sg_merged2.columns[-3:]) + list(sg_merged2.columns[:-3])
sg_merged2 = sg_merged2[fixed_columns]

print(sg_merged2.shape)
sg_merged2

(15, 39)


Unnamed: 0,Postal Code,Latitude,Longitude,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,...,Pharmacy,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station,Total infrastructure
0,538808,1.374143,103.87433,Aljunied,0,0,1,0,0,0,...,1,1,0,1,1,1,0,0,0,12
1,560723,1.372402,103.829752,Ang Mo Kio,0,0,0,1,0,0,...,1,0,0,0,0,0,0,0,0,5
2,311125,1.337527,103.828221,Bishan - Toa Payoh,0,0,1,0,0,0,...,0,0,0,0,0,1,0,0,0,2
3,680309,1.38504,103.7279,Choa Chu Kang,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,460124,1.329297,103.921288,East Coast,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,8
5,279621,1.311813,103.788279,Holland - Bukit Timah,1,0,0,1,0,0,...,0,1,0,1,1,0,0,0,2,11
6,328127,1.317285,103.841774,Jalan Besar,1,1,0,1,0,1,...,1,0,0,1,1,0,0,0,0,14
7,608526,1.330026,103.742675,Jurong,0,1,0,1,0,0,...,1,0,0,0,1,0,0,0,0,12
8,408600,1.319291,103.877267,Marine Parade,0,0,1,0,0,0,...,0,0,1,0,0,0,1,1,0,8
9,760234,1.434407,103.820064,Nee Soon,0,0,0,1,0,0,...,0,0,0,1,0,0,0,0,0,4


### Clustering the Dataset

In [42]:
kclusters = 3

sg_2_grouped_clustering = sg_merged2[["Total infrastructure"]]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sg_2_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 1, 1, 2, 0, 0, 0, 2, 1], dtype=int32)
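
Because the clustering uses a single feature, `Total infrastructure`, it amounts to splitting the areas into three bands of totals. The sketch below illustrates this with a plain-Python version of Lloyd's algorithm on the 15 totals shown in this notebook. It is not scikit-learn's `KMeans`: the initialisation (the mean of each third of the sorted values) is an assumption chosen to keep the example deterministic, and the label numbers it produces are arbitrary, so they differ from the notebook's labels even though the groupings agree.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on one-dimensional data with a deterministic start."""
    ordered = sorted(values)
    n = len(ordered)
    # Initial centers: mean of each of k roughly equal chunks of the sorted data
    centers = []
    for i in range(k):
        chunk = ordered[i * n // k:(i + 1) * n // k]
        centers.append(sum(chunk) / len(chunk))

    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every value
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Total infrastructure per area, copied from the tables in this notebook
totals = {
    "Aljunied": 12, "Ang Mo Kio": 5, "Bishan - Toa Payoh": 2, "Choa Chu Kang": 1,
    "East Coast": 8, "Holland - Bukit Timah": 11, "Jalan Besar": 14, "Jurong": 12,
    "Marine Parade": 8, "Nee Soon": 4, "Pasir Ris - Punggol": 9, "Sembawang": 12,
    "Tampines": 12, "Tanjong Pagar": 12, "West Coast": 13,
}

labels, centers = kmeans_1d(list(totals.values()), k=3)
groups = {}
for area, label in zip(totals, labels):
    groups.setdefault(label, []).append(area)
```

Here `groups` holds one low band (totals 1 to 5), one middle band (8 to 9) and one high band (11 to 14), the same three sets of areas listed in section 12.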

In [43]:
sg_mergedfinal = sg_merged2.copy()
# add clustering labels
sg_mergedfinal["Cluster Labels"] = kmeans.labels_
print(sg_mergedfinal.shape)
sg_mergedfinal

(15, 40)


Unnamed: 0,Postal Code,Latitude,Longitude,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,...,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station,Total infrastructure,Cluster Labels
0,538808,1.374143,103.87433,Aljunied,0,0,1,0,0,0,...,1,0,1,1,1,0,0,0,12,0
1,560723,1.372402,103.829752,Ang Mo Kio,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,5,1
2,311125,1.337527,103.828221,Bishan - Toa Payoh,0,0,1,0,0,0,...,0,0,0,0,1,0,0,0,2,1
3,680309,1.38504,103.7279,Choa Chu Kang,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1
4,460124,1.329297,103.921288,East Coast,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,8,2
5,279621,1.311813,103.788279,Holland - Bukit Timah,1,0,0,1,0,0,...,1,0,1,1,0,0,0,2,11,0
6,328127,1.317285,103.841774,Jalan Besar,1,1,0,1,0,1,...,0,0,1,1,0,0,0,0,14,0
7,608526,1.330026,103.742675,Jurong,0,1,0,1,0,0,...,0,0,0,1,0,0,0,0,12,0
8,408600,1.319291,103.877267,Marine Parade,0,0,1,0,0,0,...,0,1,0,0,0,1,1,0,8,2
9,760234,1.434407,103.820064,Nee Soon,0,0,0,1,0,0,...,0,0,1,0,0,0,0,0,4,1


### 11. Exploratory Visualization 2

In [44]:
address = 'Singapore'
latitude = 1.3521
longitude = 103.8198
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Singapore are 1.3521, 103.8198.


In [45]:
map_clusters  = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
rainbow = [    'red',    'blue',    'orange',    'darkgreen',    'darkblue',    'black']
# add markers to map
markers_colors = []
for lat, lng, label1,common, cluster in zip(sg_mergedfinal['Latitude'], sg_mergedfinal['Longitude'], sg_mergedfinal['Area'],sg_mergedfinal['Total infrastructure'] , sg_mergedfinal['Cluster Labels']):
    labelnew =  'Area : {} , Total infrastructure : {}'.format(label1,common)
    label = folium.Popup( labelnew, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters)
map_clusters

### Image link if map not displayed

https://github.com/jjtan444/Coursera_Capstone/blob/master/week5/results/Results6.JPG

### 12. Examine Clusters

In [46]:
sg_mergedfinal.loc[sg_mergedfinal['Cluster Labels'] == 0]

Unnamed: 0,Postal Code,Latitude,Longitude,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,...,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station,Total infrastructure,Cluster Labels
0,538808,1.374143,103.87433,Aljunied,0,0,1,0,0,0,...,1,0,1,1,1,0,0,0,12,0
5,279621,1.311813,103.788279,Holland - Bukit Timah,1,0,0,1,0,0,...,1,0,1,1,0,0,0,2,11,0
6,328127,1.317285,103.841774,Jalan Besar,1,1,0,1,0,1,...,0,0,1,1,0,0,0,0,14,0
7,608526,1.330026,103.742675,Jurong,0,1,0,1,0,0,...,0,0,0,1,0,0,0,0,12,0
11,738991,1.439028,103.785135,Sembawang,1,1,0,1,0,0,...,0,0,0,1,0,0,1,0,12,0
12,528523,1.35296,103.938151,Tampines,1,1,0,1,1,0,...,0,0,0,1,0,0,0,0,12,0
13,168730,1.28613,103.809826,Tanjong Pagar,0,1,0,1,0,0,...,0,0,0,1,0,0,0,0,12,0
14,120381,1.318343,103.767821,West Coast,0,2,1,0,0,0,...,1,0,1,1,0,1,0,0,13,0


In [47]:
sg_mergedfinal.loc[sg_mergedfinal['Cluster Labels'] == 1]

Unnamed: 0,Postal Code,Latitude,Longitude,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,...,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station,Total infrastructure,Cluster Labels
1,560723,1.372402,103.829752,Ang Mo Kio,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,5,1
2,311125,1.337527,103.828221,Bishan - Toa Payoh,0,0,1,0,0,0,...,0,0,0,0,1,0,0,0,2,1
3,680309,1.38504,103.7279,Choa Chu Kang,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1
9,760234,1.434407,103.820064,Nee Soon,0,0,0,1,0,0,...,0,0,1,0,0,0,0,0,4,1


In [48]:
sg_mergedfinal.loc[sg_mergedfinal['Cluster Labels'] == 2]

Unnamed: 0,Postal Code,Latitude,Longitude,Area,ATM,Bank,Bus Line,Bus Station,College Academic Building,College Administrative Building,...,Police Station,Private School,School,Shopping Mall,Tech Startup,Trail,Train,Train Station,Total infrastructure,Cluster Labels
4,460124,1.329297,103.921288,East Coast,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,8,2
8,408600,1.319291,103.877267,Marine Parade,0,0,1,0,0,0,...,0,1,0,0,0,1,1,0,8,2
10,545025,1.392501,103.876812,Pasir Ris - Punggol,0,0,0,1,0,0,...,0,0,0,1,0,0,0,1,9,2


## 13. Observations:

Most infrastructure is concentrated in the southern areas of Singapore. Cluster 0 has the highest totals (11 to 14 facilities per area) and cluster 2 has moderate totals (8 to 9). Cluster 1, by contrast, has very few facilities (1 to 5 per area). This represents an opportunity to build new infrastructure there, since it faces little to no competition from existing facilities.

A developer whose offering has a unique selling proposition that stands out from the competition could also open new infrastructure in the cluster 2 neighborhoods, which have moderate competition and an adequate number of supporting facilities. Lastly, people planning to settle in the city are advised to start in cluster 0, which already has a high concentration of infrastructure.

## 14. Acknowledgements

### Conclusion:

In this project, I have gone through the process of identifying the business problem, specifying the data required, extracting and preparing the data, visualizing the results, and applying machine learning by clustering the areas into 3 clusters based on their total infrastructure counts. The project provides recommendations to the relevant stakeholders, i.e. business developers, regarding the best locations to build new infrastructure. It also informs visitors and immigrants of the best areas to stay in.

Data obtained from [data.gov.sg](https://data.gov.sg/dataset/sgo-satellite-offices)
