Before we get the data and start exploring it, let's download all the dependencies that we will need.


In [35]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
#from pandas.json_normalize import json_normalize


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Download and Explore Dataset

Jakarta has a total of 6 Kabupaten/Kota (<a href="https://en.wikipedia.org/wiki/Regency_(Indonesia)">Regency</a>) and 44 Kecamatan (<a href="https://en.wikipedia.org/wiki/Districts_of_Indonesia">District</a>). In order to segment the Kabupaten and explore them, we will essentially need a dataset that contains the 5 Kabupaten and Kecamatan that exist in each Kabupaten as well as the the latitude and logitude coordinates of each Kabupaten. 

<b>Tips</b>: We got the data from <a href="https://www.jakarta.bps.go.id">BPS</a> and save it into file in server to avoid frequent access to Goverment website :)

In [154]:
data = pd.read_csv("data-jumlah-kecamatan-2019.csv") 
column_names = ['Kabupaten','Kecamatan','Luas']
data.columns = column_names
data.head()

Unnamed: 0,Kabupaten,Kecamatan,Luas
0,JAKARTA BARAT,CENGKARENG,26.55
1,JAKARTA BARAT,GROGOL PETAMBURAN,9.99
2,JAKARTA BARAT,KALI DERES,30.23
3,JAKARTA BARAT,KEBON JERUK,17.63
4,JAKARTA BARAT,KEMBANGAN,24.17


Thanks to geolocator Nominatim we will get the equivalent of address

In [149]:
geolocator = Nominatim(user_agent="ny_explorer")
coordinate = []

for index, row in data.iterrows():
    address = row['Kecamatan'] #+ ", " + row['Kabupaten']
    location = geolocator.geocode(address)
    if location is not None: 
        coordinate.append([location.latitude, location.longitude])
    else:
        coordinate.append([null, null])
   

df_geo = pd.DataFrame(coordinate, columns=['Latitude', 'Longitude'])
neighborhoods=pd.merge(data, df_geo, left_index=True, right_index=True)

<b>Tips</b>: Geolocator tends to consume time, hence we store the data to avoid retrieving the data from internet for further references

In [150]:
neighborhoods.to_csv(r'..\Coursera_Capstone\neighborhoods.csv')

In [155]:
neighborhoods = pd.read_csv("neighborhoods.csv") 
neighborhoods = neighborhoods.drop('Unnamed: 0', 1)

In [156]:
neighborhoods.shape

(44, 5)

Quickly examine the resulting dataframe and we will have columns as below:


In [157]:
neighborhoods.head()

Unnamed: 0,Kabupaten,Kecamatan,Luas,Latitude,Longitude
0,JAKARTA BARAT,CENGKARENG,26.55,-6.149093,106.734781
1,JAKARTA BARAT,GROGOL PETAMBURAN,9.99,-6.164188,106.788317
2,JAKARTA BARAT,KALI DERES,30.23,-6.1343,106.7058
3,JAKARTA BARAT,KEBON JERUK,17.63,-6.192572,106.769725
4,JAKARTA BARAT,KEMBANGAN,24.17,-6.193,106.7426


In [158]:
print('As mentioned in the introduction The dataframe has {} Kabupaten and {} Kecamatan.'.format(
        len(neighborhoods['Kabupaten'].unique()),
        neighborhoods.shape[0]
    )
)

As mentioned in the introduction The dataframe has 6 Kabupaten and 44 Kecamatan.


#### Use geopy library to get the latitude and longitude values of Jakarta.


In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.


In [44]:
address = 'Jakarta'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Jakarta are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Jakarta are -6.1753942, 106.827183.


#### Create a map of Jakarta with neighborhoods superimposed on top.


In [45]:
# create map of New York using latitude and longitude values
map_Jakarta = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Kabupaten'], neighborhoods['Kecamatan']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Jakarta)  
    
map_Jakarta

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle mark to reveal the name of Kecamatan and respective Kabupaten


Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.


#### Define Foursquare Credentials and Version


In [46]:
CLIENT_ID = 'WAFDIUY5OL25UM44YNMRQWZNT5OQ2CVL0RJD4QE4IHFJYS4R' # your Foursquare ID
CLIENT_SECRET = 'X044CFGQSJCIGKLPD2XNZREII0FISRGRZ0CIQQKDRSLQ0BSS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

#CLIENT_ID = 'your-client-ID' # your Foursquare ID
#CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
#VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: WAFDIUY5OL25UM44YNMRQWZNT5OQ2CVL0RJD4QE4IHFJYS4R
CLIENT_SECRET:X044CFGQSJCIGKLPD2XNZREII0FISRGRZ0CIQQKDRSLQ0BSS


#### Let's explore the first Kecamatan in our dataframe.


Let's try and get the Kecamatan's name.


In [166]:
jakarta_data=neighborhoods.reset_index(drop=True)

In [167]:
print('Get latitude and longitude values of Kecamatan {}.'.format(jakarta_data.loc[0, 'Kecamatan']))

Get latitude and longitude values of Kecamatan CENGKARENG.


In [168]:
neighborhood_latitude = jakarta_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = jakarta_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = jakarta_data.loc[0, 'Kecamatan'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of CENGKARENG are -6.1490933, 106.73478100000001.


#### Now, let's get the top 100 venues that are within a radius of 500 meters.


First, let's create the GET request URL. Name your URL **url**.


In [169]:
# type your answer here
LIMIT = 100 # limit of number of venues returned by Foursquare API


radius = 500 # define radius


 # create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=WAFDIUY5OL25UM44YNMRQWZNT5OQ2CVL0RJD4QE4IHFJYS4R&client_secret=X044CFGQSJCIGKLPD2XNZREII0FISRGRZ0CIQQKDRSLQ0BSS&v=20180605&ll=-6.1490933,106.73478100000001&radius=500&limit=100'

Send the GET request and examine the resutls


In [170]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6027558741cc7e0fe9587457'},
 'response': {'headerLocation': 'Cengkareng',
  'headerFullLocation': 'Cengkareng, Jakarta',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 6,
  'suggestedBounds': {'ne': {'lat': -6.1445932954999956,
    'lng': 106.73929859432982},
   'sw': {'lat': -6.153593304500004, 'lng': 106.7302634056702}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '50fb9af9e4b0c2329c07e0ca',
       'name': 'XXI Puri Indah Mall',
       'location': {'lat': -6.151018337252608,
        'lng': 106.73398277361841,
        'labeledLatLngs': [{'label': 'display',
          'lat': -6.151018337252608,
          'lng': 106.73398277361841}],
        'distance': 231,
        'cc': 'ID',
        'country': 'Indonesia',
     

From the Foursquare lab in the previous module, we know that all the information is in the _items_ key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.


In [171]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a _pandas_ dataframe.


In [172]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,XXI Puri Indah Mall,Movie Theater,-6.151018,106.733983
1,Studio 29,Music Venue,-6.146643,106.732274
2,Laut Dadap,Harbor / Marina,-6.148722,106.73294
3,Family Mart City Park Apartement,Night Market,-6.146726,106.735641
4,Mie Baso,Restaurant,-6.149799,106.730642


And how many venues were returned by Foursquare?


In [173]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.


<a id='item2'></a>


## 2. Explore Neighborhoods in Jakarta


#### Let's create a function to repeat the same process to all the neighborhoods in Jakarta


In [174]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Kecamatan', 
                  'Kecamatan Latitude', 
                  'Kecamatan Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [175]:
jakarta_venues = getNearbyVenues(names=jakarta_data['Kecamatan'],
                                   latitudes=jakarta_data['Latitude'],
                                   longitudes=jakarta_data['Longitude']
                                  )

CENGKARENG
GROGOL PETAMBURAN
KALI DERES
KEBON JERUK
KEMBANGAN
PALMERAH
TAMAN SARI
TAMBORA
CEMPAKA PUTIH
GAMBIR
JOHAR BARU
KEMAYORAN
MENTENG
SAWAH BESAR
SENEN
TANAH ABANG
CILANDAK
JAGAKARSA
KEBAYORAN BARU
KEBAYORAN LAMA
MAMPANG PRAPATAN
PANCORAN
PASAR MINGGU
PESANGGRAHAN
SETIA BUDI
TEBET
CAKUNG
CIPAYUNG
CIRACAS
DUREN SAWIT
JATINEGARA
KRAMAT JATI
MAKASAR
MATRAMAN
PASAR REBO
PULO GADUNG
CILINCING
KELAPA GADING
KOJA
PADEMANGAN
PENJARINGAN
TANJUNG PRIOK
KEPULAUAN SERIBU SELATAN
KEPULAUAN SERIBU UTARA


#### Let's check the size of the resulting dataframe


In [176]:
print(jakarta_venues.shape)
jakarta_venues.head()

(719, 7)


Unnamed: 0,Kecamatan,Kecamatan Latitude,Kecamatan Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,CENGKARENG,-6.149093,106.734781,XXI Puri Indah Mall,-6.151018,106.733983,Movie Theater
1,CENGKARENG,-6.149093,106.734781,Studio 29,-6.146643,106.732274,Music Venue
2,CENGKARENG,-6.149093,106.734781,Laut Dadap,-6.148722,106.73294,Harbor / Marina
3,CENGKARENG,-6.149093,106.734781,Family Mart City Park Apartement,-6.146726,106.735641,Night Market
4,CENGKARENG,-6.149093,106.734781,Mie Baso,-6.149799,106.730642,Restaurant


<b>Tips</b>: To avoid frequent access to Foursquare API since we only use Developer Free License, then we store the data into file locally

In [178]:
jakarta_venues.to_csv(r'..\Coursera_Capstone\jakarta_venues.csv')

In [179]:
#to import from file
jakarta_venues = pd.read_csv("jakarta_venues.csv") 
jakarta_venues = jakarta_venues.drop('Unnamed: 0', 1)

Let's check how many venues were returned for each Kecamatan


In [181]:
jakarta_venues.groupby('Kecamatan').count()

Unnamed: 0_level_0,Kecamatan Latitude,Kecamatan Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Kecamatan,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
CAKUNG,3,3,3,3,3,3
CEMPAKA PUTIH,7,7,7,7,7,7
CENGKARENG,6,6,6,6,6,6
CILANDAK,21,21,21,21,21,21
CILINCING,2,2,2,2,2,2
CIPAYUNG,2,2,2,2,2,2
CIRACAS,2,2,2,2,2,2
DUREN SAWIT,5,5,5,5,5,5
GAMBIR,15,15,15,15,15,15
GROGOL PETAMBURAN,35,35,35,35,35,35


#### Let's find out how many unique categories can be curated from all the returned venues


In [182]:
print('There are {} uniques categories.'.format(len(jakarta_venues['Venue Category'].unique())))

There are 156 uniques categories.


<a id='item3'></a>


## 3. Analyze Each Kecamatan


In [183]:
# one hot encoding
jakarta_onehot = pd.get_dummies(jakarta_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
jakarta_onehot['Kecamatan'] = jakarta_venues['Kecamatan'] 

# move neighborhood column to the first column
fixed_columns = [jakarta_onehot.columns[-1]] + list(jakarta_onehot.columns[:-1])
jakarta_onehot = jakarta_onehot[fixed_columns]

jakarta_onehot.head()

Unnamed: 0,Kecamatan,Accessories Store,Acehnese Restaurant,Airport Lounge,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Bakery,Balinese Restaurant,Bar,Basketball Court,Basketball Stadium,Bed & Breakfast,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Cafeteria,Café,Camera Store,Chinese Restaurant,Clothing Store,Coffee Shop,College Academic Building,College Cafeteria,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,High School,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Housing Development,Ice Cream Shop,Indonesian Meatball Place,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Lounge,Manadonese Restaurant,Massage Studio,Medical Center,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Multiplex,Music School,Music Store,Music Venue,Neighborhood,Night Market,Nightclub,Noodle House,Office,Optical Shop,Padangnese Restaurant,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Ramen Restaurant,Resort,Restaurant,Salon / Barbershop,Sandwich Place,Satay Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Spa,Sports Bar,Stadium,Steakhouse,Street Food Gathering,Student Center,Sundanese Restaurant,Supermarket,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theme Park,Toy / Game Store,Train Station,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,CENGKARENG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,CENGKARENG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,CENGKARENG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,CENGKARENG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,CENGKARENG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.


In [184]:
jakarta_onehot.shape

(719, 157)

#### Next, let's group rows by Kecamatan and by taking the mean of the frequency of occurrence of each category


In [185]:
jakarta_grouped = jakarta_onehot.groupby('Kecamatan').mean().reset_index()
jakarta_grouped

Unnamed: 0,Kecamatan,Accessories Store,Acehnese Restaurant,Airport Lounge,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Bakery,Balinese Restaurant,Bar,Basketball Court,Basketball Stadium,Bed & Breakfast,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Cafeteria,Café,Camera Store,Chinese Restaurant,Clothing Store,Coffee Shop,College Academic Building,College Cafeteria,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,High School,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Housing Development,Ice Cream Shop,Indonesian Meatball Place,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Lounge,Manadonese Restaurant,Massage Studio,Medical Center,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Multiplex,Music School,Music Store,Music Venue,Neighborhood,Night Market,Nightclub,Noodle House,Office,Optical Shop,Padangnese Restaurant,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Ramen Restaurant,Resort,Restaurant,Salon / Barbershop,Sandwich Place,Satay Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Spa,Sports Bar,Stadium,Steakhouse,Street Food Gathering,Student Center,Sundanese Restaurant,Supermarket,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theme Park,Toy / Game Store,Train Station,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,CAKUNG,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,CEMPAKA PUTIH,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,CENGKARENG,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,CILANDAK,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,CILINCING,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,CIPAYUNG,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,CIRACAS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,DUREN SAWIT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,GAMBIR,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.266667,0.0,0.066667,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,GROGOL PETAMBURAN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.114286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.057143,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.085714,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.114286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size


In [186]:
jakarta_grouped.shape

(44, 157)

#### Let's print each neighborhood along with the top 3 most common venues


We use only 3 to limit the research for this to, however it can be extended :)

In [188]:
num_top_venues = 3

for hood in jakarta_grouped['Kecamatan']:
    print("----"+hood+"----")
    temp = jakarta_grouped[jakarta_grouped['Kecamatan'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----CAKUNG----
          venue  freq
0   Gas Station  0.33
1  Neighborhood  0.33
2        Lounge  0.33


----CEMPAKA PUTIH----
                  venue  freq
0           Pizza Place  0.29
1             BBQ Joint  0.14
2  Fast Food Restaurant  0.14


----CENGKARENG----
          venue  freq
0   Music Venue  0.17
1         Diner  0.17
2  Night Market  0.17


----CILANDAK----
                   venue  freq
0  Indonesian Restaurant   0.1
1                    Gym   0.1
2      Convenience Store   0.1


----CILINCING----
               venue  freq
0               Park   0.5
1      Shopping Mall   0.5
2  Accessories Store   0.0


----CIPAYUNG----
               venue  freq
0     Shop & Service   0.5
1         Restaurant   0.5
2  Accessories Store   0.0


----CIRACAS----
              venue  freq
0        Playground   0.5
1  Department Store   0.5
2            Office   0.0


----DUREN SAWIT----
                       venue  freq
0  Indonesian Meatball Place   0.4
1          Convenience Store   0

#### Let's put that into a _pandas_ dataframe


First, let's write a function to sort the venues in descending order.


In [189]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [190]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Kecamatan']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Kecamatan'] = jakarta_grouped['Kecamatan']

for ind in np.arange(jakarta_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(jakarta_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Kecamatan,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,CAKUNG,Lounge,Gas Station,Neighborhood
1,CEMPAKA PUTIH,Pizza Place,BBQ Joint,Acehnese Restaurant
2,CENGKARENG,Restaurant,Night Market,Movie Theater
3,CILANDAK,Convenience Store,Gym,Indonesian Restaurant
4,CILINCING,Park,Shopping Mall,Wine Bar


<a id='item4'></a>
