# Capstone Project. Tunisia: Tourist agencies and Monuments

## i. Introduction

As part of my Data Science Capstone Project, in these sometimes dark and uncertain times, I have decided to explore Tunisia and help its tourist agencies improve their customer experience by combining data on popular venues and historic monuments.


Who might be interested in this project?

- Venue owners (mainly hostels in our case)
- Marketing agencies
- City tourism department
- Customers 

Note that this project is a pilot: if the idea proves viable, it could be scaled up to any city or venue type.

## ii. Data Usage

The data we will be using in the Project are:

1. Tunisian government open dataset to obtain the neighbourhoods and their locations

2. Foursquare open API for fetching the exact location and addresses of the venues

https://ru.foursquare.com/developers/login?continue=%2Fdevelopers%2Fapps

3. Institut National du Patrimoine to obtain the locations of monuments.

https://www.inp2020.tn/

4. Additional data from open sources for extending the monuments list, for example:

https://fr.wikipedia.org/wiki/Liste_des_monuments_class%C3%A9s_du_gouvernorat_de_Tunis   

## iii. Methodology

The structure of our work will be as follows:

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 2>

1. Locate the main neighbourhood clusters to find the most visited places in each area and learn what people there like

2. Explore the neighbourhoods in Tunisia

3. Analyze each neighbourhood that we have found

4. Cluster the neighbourhoods in an attempt to identify patterns

5. Create a map of the resulting clusters
</font>
</div>

In this project, we will use the Foursquare API to explore neighborhoods in Tunisia. We will use its explore endpoint to get the most common venue categories in each neighborhood, and then use these category frequencies as features to group the neighborhoods into clusters with the k-means algorithm. We will also use the Folium library to visualize the neighborhoods in Tunisia and their emerging clusters.
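The pipeline described above can be sketched end to end on a handful of made-up venues. This is a minimal illustration under stated assumptions, not the notebook's own code: the neighborhood names `A`, `B`, `C` and their categories are toy data, and `n_clusters=2` is chosen only because the toy set is tiny.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for the Foursquare results: one row per venue (made-up data)
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Venue Category': ['Café', 'Café', 'Café', 'Café', 'Historic Site', 'Hotel'],
})

# 1. One-hot encode the venue categories (one 0/1 column per category)
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot['Neighborhood'] = venues['Neighborhood']

# 2. Average per neighborhood: each value is the share of that category
profiles = onehot.groupby('Neighborhood').mean()

# 3. Cluster the neighborhoods on their category profiles
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = pd.Series(kmeans.labels_, index=profiles.index, name='Cluster')

print(profiles)
print(labels)
```

Neighborhoods `A` and `B` have identical profiles (only cafés), so they end up in the same cluster, while `C` forms its own.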

## iv. Results

- We fetched the Open Data dataset for Tunisia
- We created a map of Tunisia with the neighborhoods superimposed on it
- We used the Foursquare API to retrieve up to 100 venues within a 500-meter radius of each neighborhood and categorize them
- We one-hot encoded the venue categories and averaged them per neighborhood to obtain the frequency of occurrence of each venue category in each neighborhood
- We grouped the neighborhoods into 5 clusters using k-means and listed the top 10 venue categories of each neighborhood
- We examined each cluster
- We used the monuments dataframe to add a marker layer to our existing Tunisia clusters map

## v. Observations and Recommendations

If you are deciding where to open a tourist agency, our analysis helps you to know <i>what</i> kinds of venues are popular <i>where</i>.
We can see that many of the clusters have monuments as their most popular venue category, except for the 4th cluster. Although we have limited our monuments dataframe to 27 entries, we can clearly see that many monuments lie very close to our cluster points.

## vi. Conclusion

This project demonstrates how a dataframe can be combined with geographical data using Python. We used Folium to build our maps, and the Foursquare API provided the venue data for our analysis. As the data may not always be precise, I treated this project as an opportunity to improve my skills and apply them directly to a practical task. As I develop these skills further, I will continue to create notebooks using more advanced techniques and statistical methods.

# Work Zone 

# Part I. Tunisia Neighborhoods and Venues

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# uncomment the next line if you haven't completed the Foursquare API lab
# !conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# uncomment the next line if you haven't completed the Foursquare API lab
# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 1. Import Data

Tunisia is divided into 24 governorates, and our dataset lists the main city of each one, with its location and population. We will treat these 24 cities as our neighbourhoods.

Let's load the data.

In [2]:
tunis = pd.read_csv('tn.csv')

In [3]:
tunis.head()

Unnamed: 0,city,lat,lng,country,iso2,admin_name,capital,population,population_proper
0,Tunis,36.8008,10.18,Tunisia,TN,Tunis,primary,1200000.0,1200000.0
1,Sfax,34.75,10.72,Tunisia,TN,Sfax,admin,453050.0,277278.0
2,Sousse,35.83,10.625,Tunisia,TN,Sousse,admin,327004.0,164123.0
3,Gabès,33.9004,10.1,Tunisia,TN,Gabès,admin,219517.0,110075.0
4,Kairouan,35.6804,10.1,Tunisia,TN,Kairouan,admin,144522.0,119794.0


In [4]:
tunis.shape

(24, 9)

In [5]:
address = 'Tunis, Tunisia'

geolocator = Nominatim(user_agent="Tunis_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
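# note: Nominatim returned a point in central Tunisia here rather than Tunis city
# (36.8008, 10.18 in our dataset); it still works as the centre of a country-wide map
print('The geographical coordinates of Tunis are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Tunis are 33.8439408, 9.400138.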


Let's now create a map with our neighborhoods.

In [6]:
# create map of Tunisia using latitude and longitude values
# (country-level zoom so that all 24 cities are visible)
map_tunisia = folium.Map(location=[latitude, longitude], zoom_start=6)

# add markers to map
for lat, lng, label in zip(tunis.lat, tunis.lng, tunis.city):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(map_tunisia)
    
map_tunisia

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [7]:
import os

# Read the Foursquare credentials from environment variables instead of
# hard-coding and printing them (the variable names are our own choice)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


### 2. Explore neighborhoods

Here, we create a function that retrieves the venues around each neighborhood from the Foursquare explore endpoint.
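To see what a single explore request looks like, the URL can be built with `requests` without actually sending it. This is a sketch: the credentials are placeholders, and the coordinates are the Tunis row of our dataset.

```python
import requests

# Placeholder credentials (assumption: replace with your own; nothing is sent here)
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20180605',          # API version date used in this notebook
    'll': '36.8008,10.18',    # "lat,lng" of the neighborhood centre (Tunis)
    'radius': 500,            # search radius in meters
    'limit': 100,             # maximum number of venues returned
}

# Prepare (but do not send) the GET request so we can inspect the final URL
request = requests.Request(
    'GET', 'https://api.foursquare.com/v2/venues/explore', params=params
).prepare()
print(request.url)
```

Passing a `params` dictionary lets `requests` handle URL encoding (for example, the comma in `ll`), which is less error-prone than formatting the query string by hand.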

In [8]:
LIMIT = 100
radius = 500

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request for the explore endpoint
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
            
        # make the GET request and fail loudly on HTTP errors
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (venues without any category are skipped)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    # passing the columns here also works when no venues were returned at all
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 
                 'Neighborhood Latitude', 
                 'Neighborhood Longitude', 
                 'Venue', 
                 'Venue Latitude', 
                 'Venue Longitude', 
                 'Venue Category'])
    
    return nearby_venues

In [9]:
Tunis_venues = getNearbyVenues(names=tunis['city'],
                               latitudes=tunis['lat'],
                               longitudes=tunis['lng']
                              )

Tunis
Sfax
Sousse
Gabès
Kairouan
Bizerte
Gafsa
Nabeul
Ariana
Kasserine
Monastir
Tataouine
Medenine
Béja
Jendouba
El Kef
Mahdia
Sidi Bouzid
Tozeur
Siliana
Kebili
Zaghouan
Ben Arous
Manouba


In [10]:
print(Tunis_venues.shape)
Tunis_venues.head()

(147, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Tunis,36.8008,10.18,Théâtre Municipal de Tunis (المسرح البلدي بتونس),36.7992,10.180893,Theater
1,Tunis,36.8008,10.18,Al Kitab (الكتاب),36.799967,10.182274,Bookstore
2,Tunis,36.8008,10.18,Bar Garibaldi,36.802067,10.18159,Bar
3,Tunis,36.8008,10.18,Resto Café De Paris (Ex ARTE),36.799781,10.18163,French Restaurant
4,Tunis,36.8008,10.18,4ème Art,36.802041,10.181105,Theater


Let's check how many venues were returned for each neighborhood.
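Note that `groupby(...).count()` only lists neighborhoods that returned at least one venue: our dataset has 24 cities, but only 14 neighborhoods remain after grouping. A sketch on toy data showing one way to keep the empty neighborhoods visible, using `value_counts` and `reindex`:

```python
import pandas as pd

# Toy data: every neighborhood we queried, and the venues that came back
all_neighborhoods = ['Tunis', 'Gabès', 'Kairouan']
venues = pd.DataFrame({'Neighborhood': ['Tunis', 'Tunis', 'Kairouan']})

# value_counts only sees neighborhoods present in the venues table;
# reindex adds the missing ones back with a count of 0
counts = (venues['Neighborhood']
          .value_counts()
          .reindex(all_neighborhoods, fill_value=0))
print(counts)
```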

In [11]:
Tunis_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Ariana,8,8,8,8,8,8
Ben Arous,7,7,7,7,7,7
Bizerte,5,5,5,5,5,5
Béja,1,1,1,1,1,1
El Kef,6,6,6,6,6,6
Gafsa,1,1,1,1,1,1
Kairouan,4,4,4,4,4,4
Kebili,1,1,1,1,1,1
Manouba,7,7,7,7,7,7
Nabeul,7,7,7,7,7,7


Now let's check how many different venue categories we have.

In [12]:
print('There are {} unique categories.'.format(Tunis_venues['Venue Category'].nunique()))

There are 62 unique categories.


### 3. Analyzing each neighborhood

Now we analyze each neighborhood by one-hot encoding its venue categories.

In [13]:
# one hot encoding
Tunis_onehot = pd.get_dummies(Tunis_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Tunis_onehot['Neighborhood'] = Tunis_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Tunis_onehot.columns[-1]] + list(Tunis_onehot.columns[:-1])
Tunis_onehot = Tunis_onehot[fixed_columns]

Tunis_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,Big Box Store,Bookstore,Boutique,Burger Joint,Cafeteria,Café,Clothing Store,Coffee Shop,Cosmetics Shop,Cultural Center,Department Store,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food Truck,French Restaurant,Furniture / Home Store,Gastropub,Grocery Store,Gym / Fitness Center,Historic Site,History Museum,Hotel,Hotel Bar,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Mac & Cheese Joint,Market,Mediterranean Restaurant,Memorial Site,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Music Venue,Pastry Shop,Pizza Place,Plaza,Pub,Restaurant,Road,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Stadium,Steakhouse,Supermarket,Theater,Vietnamese Restaurant,Wings Joint,Women's Store
0,Tunis,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
1,Tunis,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Tunis,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Tunis,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Tunis,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [14]:
Tunis_onehot.shape

(147, 63)

Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category.

In [15]:
Tunis_grouped = Tunis_onehot.groupby('Neighborhood').mean().reset_index()
Tunis_grouped

Unnamed: 0,Neighborhood,African Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,Big Box Store,Bookstore,Boutique,Burger Joint,Cafeteria,Café,Clothing Store,Coffee Shop,Cosmetics Shop,Cultural Center,Department Store,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food Truck,French Restaurant,Furniture / Home Store,Gastropub,Grocery Store,Gym / Fitness Center,Historic Site,History Museum,Hotel,Hotel Bar,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Mac & Cheese Joint,Market,Mediterranean Restaurant,Memorial Site,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Music Venue,Pastry Shop,Pizza Place,Plaza,Pub,Restaurant,Road,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Stadium,Steakhouse,Supermarket,Theater,Vietnamese Restaurant,Wings Joint,Women's Store
0,Ariana,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
1,Ben Arous,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.428571,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bizerte,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Béja,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,El Kef,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Gafsa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Kairouan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Kebili,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8,Manouba,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.571429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Nabeul,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0


In [16]:
# confirming new size

Tunis_grouped.shape

(14, 63)

Let's print each neighborhood along with its top 5 most common venue categories.
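The loop in the next cell prints one table per neighborhood. As an alternative sketch (not the notebook's code), `Series.nlargest` can return the top categories for every neighborhood at once. The frequency table below is made up; with the real data you would start from `Tunis_grouped`. When frequencies are tied, `nlargest` keeps the column that appears first, so the order of tied categories may differ from the loop's `sort_values`.

```python
import pandas as pd

# Toy frequency table in the same shape as Tunis_grouped (made-up values)
grouped = pd.DataFrame({
    'Neighborhood': ['Ariana', 'Kairouan'],
    'Café': [0.5, 0.0],
    'Coffee Shop': [0.375, 0.0],
    'Historic Site': [0.0, 0.75],
    'Hotel': [0.125, 0.25],
})

num_top = 2
# For each neighborhood (row), keep the names of the num_top largest frequencies
top = (grouped.set_index('Neighborhood')
       .apply(lambda row: row.nlargest(num_top).index.tolist(), axis=1))
print(top)
```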

In [17]:
num_top_venues = 5

for hood in Tunis_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Tunis_grouped[Tunis_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ariana----
                venue  freq
0                Café  0.50
1         Coffee Shop  0.25
2          Restaurant  0.12
3         Supermarket  0.12
4  African Restaurant  0.00


----Ben Arous----
                  venue  freq
0                  Café  0.43
1  Gym / Fitness Center  0.14
2           Coffee Shop  0.14
3         Big Box Store  0.14
4     Mobile Phone Shop  0.14


----Bizerte----
               venue  freq
0              Diner   0.2
1      Grocery Store   0.2
2              Beach   0.2
3  Fish & Chips Shop   0.2
4        Flower Shop   0.2


----Béja----
                venue  freq
0      Soccer Stadium   1.0
1  African Restaurant   0.0
2         Pizza Place   0.0
3           Hotel Bar   0.0
4      Ice Cream Shop   0.0


----El Kef----
           venue  freq
0           Café  0.33
1          Plaza  0.17
2    Pizza Place  0.17
3  Historic Site  0.17
4      Gastropub  0.17


----Gafsa----
                 venue  freq
0                 Café   1.0
1   African Restaurant   

Let's put that into a dataframe, this time listing the top 10 venue categories for each neighborhood.

In [18]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # indicators only cover 1st, 2nd and 3rd
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Tunis_grouped['Neighborhood']

for ind in np.arange(Tunis_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Tunis_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(23)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ariana,Café,Coffee Shop,Supermarket,Restaurant,Electronics Store,Cultural Center,Department Store,Dessert Shop,Diner,Women's Store
1,Ben Arous,Café,Coffee Shop,Gym / Fitness Center,Mobile Phone Shop,Big Box Store,Food Truck,Flower Shop,Fish & Chips Shop,Cultural Center,Fast Food Restaurant
2,Bizerte,Grocery Store,Beach,Flower Shop,Fish & Chips Shop,Diner,Women's Store,Cosmetics Shop,Gastropub,Furniture / Home Store,French Restaurant
3,Béja,Soccer Stadium,Women's Store,Coffee Shop,Gastropub,Furniture / Home Store,French Restaurant,Food Truck,Flower Shop,Fish & Chips Shop,Fast Food Restaurant
4,El Kef,Café,Historic Site,Gastropub,Pizza Place,Plaza,Electronics Store,Department Store,Dessert Shop,Diner,Fast Food Restaurant
5,Gafsa,Café,Women's Store,Cosmetics Shop,Grocery Store,Gastropub,Furniture / Home Store,French Restaurant,Food Truck,Flower Shop,Fish & Chips Shop
6,Kairouan,Historic Site,Hotel,Ice Cream Shop,Restaurant,Cosmetics Shop,Furniture / Home Store,French Restaurant,Food Truck,Flower Shop,Fish & Chips Shop
7,Kebili,Vietnamese Restaurant,Women's Store,Coffee Shop,Gastropub,Furniture / Home Store,French Restaurant,Food Truck,Flower Shop,Fish & Chips Shop,Fast Food Restaurant
8,Manouba,Café,History Museum,Bakery,Cafeteria,Fish & Chips Shop,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant,Flower Shop
9,Nabeul,Coffee Shop,Shopping Mall,Food Truck,Flower Shop,Department Store,Café,Wings Joint,Beach,Dessert Shop,Grocery Store
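The cell above builds column names such as "1st Most Common Venue" from a three-element `indicators` list, which is enough for ten columns. If `num_top_venues` were ever raised past 10, a general suffix rule would be needed, because 11, 12 and 13 take "th". A sketch with a hypothetical helper `ordinal`:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th, including 11th, 12th and 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

n_columns = 12  # hypothetical: more columns than the notebook uses
columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, n_columns + 1)]
print(columns)
```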


### 4. Clustering neighborhoods

Let's run k-means to cluster our neighborhoods. 
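The notebook fixes `kclusters = 5`. One common way to sanity-check such a choice, not used in the original analysis, is to compare silhouette scores for several values of k. This sketch runs on random toy profiles built around three centres; with the real data you would pass `Tunis_grouped_clustering` instead of `X`. Silhouette scores are only defined for 2 ≤ k ≤ n_samples − 1.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data: 14 "neighborhoods" (as many as in Tunis_grouped) drawn around 3 centres
rng = np.random.default_rng(0)
centres = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
X = np.vstack([c + 0.05 * rng.standard_normal((n, 3))
               for c, n in zip(centres, [5, 5, 4])])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better

best_k = max(scores, key=scores.get)
print(scores, 'best k =', best_k)
```

On this toy data the score peaks at the true number of groups; on real venue profiles the curve is usually flatter, so the score is a hint rather than a verdict.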

In [20]:
# set number of clusters
kclusters = 5

Tunis_grouped_clustering = Tunis_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 keeps the older scikit-learn default explicit)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(Tunis_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 4, 2, 0, 0, 3, 1, 0, 4])

In [25]:
# add clustering labels (drop any previous labels first so the cell can be re-run safely)
neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels', errors='ignore')
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge the sorted venues with the tunis data to add latitude/longitude for each neighborhood
Tunis_merged = tunis.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='city')
Tunis_merged.head() # check the last columns!

Unnamed: 0,city,lat,lng,country,iso2,admin_name,capital,population,population_proper,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Tunis,36.8008,10.18,Tunisia,TN,Tunis,primary,1200000.0,1200000.0,4.0,Café,Fast Food Restaurant,Coffee Shop,Hotel,Bar,Restaurant,Theater,Movie Theater,Plaza,Mediterranean Restaurant
1,Sfax,34.75,10.72,Tunisia,TN,Sfax,admin,453050.0,277278.0,4.0,Coffee Shop,Furniture / Home Store,Café,Shopping Mall,French Restaurant,Food Truck,Flower Shop,Fish & Chips Shop,Cosmetics Shop,Fast Food Restaurant
2,Sousse,35.83,10.625,Tunisia,TN,Sousse,admin,327004.0,164123.0,4.0,Diner,Coffee Shop,Café,Athletics & Sports,Snack Place,Fast Food Restaurant,Department Store,Dessert Shop,Electronics Store,Women's Store
3,Gabès,33.9004,10.1,Tunisia,TN,Gabès,admin,219517.0,110075.0,,,,,,,,,,,
4,Kairouan,35.6804,10.1,Tunisia,TN,Kairouan,admin,144522.0,119794.0,3.0,Historic Site,Hotel,Ice Cream Shop,Restaurant,Cosmetics Shop,Furniture / Home Store,French Restaurant,Food Truck,Flower Shop,Fish & Chips Shop


In [33]:
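# neighborhoods for which Foursquare returned no venues (e.g. Gabès) have no cluster
# label; drop them instead of filling every missing value with an arbitrary cluster (2)
Tunis_merged = Tunis_merged.dropna(subset=['Cluster Labels']).copy()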


Let's try to visualize the resulting clusters
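The map colors each marker by its cluster label. Here is a sketch of that label-to-color mapping on its own: `cm.rainbow` samples `kclusters` evenly spaced colors and `colors.rgb2hex` converts them into hex strings that Folium accepts. Because k-means labels run from 0 to `kclusters - 1`, label k can index position k directly.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample kclusters evenly spaced colors from the rainbow colormap (one RGBA row each)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
# Convert each RGBA row to a '#rrggbb' string
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Cluster labels run from 0 to kclusters - 1, so they index the list directly
cluster_colors = {label: rainbow[label] for label in range(kclusters)}
print(cluster_colors)
```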

In [34]:
Tunis_merged['Cluster Labels'] = Tunis_merged['Cluster Labels'].astype(int)

# create map (country-level zoom so that all neighborhoods are visible)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=6)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (cluster labels run from 0 to kclusters - 1)
for lat, lon, poi, cluster in zip(Tunis_merged['lat'], Tunis_merged['lng'], Tunis_merged['city'], Tunis_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

Now let's continue by adding the monuments dataset.

# Part II. Monuments in Tunisia

In [43]:
mh=pd.read_excel('hmm.xlsx')
mh.head()

Unnamed: 0,Name,LATITUDE,LONGITUDE
0,Dougga,36.422938,9.219348
1,Amphithéâtre El Djem,35.296673,10.706873
2,Thermes d'Antonin,36.854231,10.33496
3,Musée national du Bardo,36.809459,10.134036
4,Quartier Punique de Byrsa,36.8525,10.322778


Now we will add the locations of 27 monuments in Tunisia to our cluster map.

In [44]:
from folium import plugins

# instantiate a marker cluster object for the monuments in the dataframe
monuments_mark = plugins.MarkerCluster().add_to(map_clusters)

# loop through the dataframe and add each monument to the marker cluster
for lat, lng, label in zip(mh.LATITUDE, mh.LONGITUDE, mh.Name):
    folium.Marker(
        location=[lat, lng],
        popup=folium.Popup(label, parse_html=True),
    ).add_to(monuments_mark)

# display map
map_clusters

Here we go! We have created a map containing our clusters as well as the locations of the monuments. How can we use it? Well, when opening a tourist agency, we can clearly see the places that might interest our customers. We have also divided our neighborhoods into clusters, which should help us diversify.
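The observation that many monuments lie close to our cluster points can be made measurable. This sketch computes the great-circle (haversine) distance from each city to its nearest monument, using a few coordinates copied from the tables above; with the real data you would loop over `Tunis_merged` and `mh` instead. The haversine formula and the 6371 km mean Earth radius are standard choices, not part of the original notebook, and `haversine_km` is a hypothetical helper.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Coordinates copied from the city and monument tables above
cities = {'Tunis': (36.8008, 10.18), 'Kairouan': (35.6804, 10.1)}
monuments = {
    'Musée national du Bardo': (36.809459, 10.134036),
    'Amphithéâtre El Djem': (35.296673, 10.706873),
    'Dougga': (36.422938, 9.219348),
}

# For each city, find the closest monument and its distance
nearest = {}
for city, (lat, lon) in cities.items():
    nearest[city] = min(
        ((name, haversine_km(lat, lon, mlat, mlon)) for name, (mlat, mlon) in monuments.items()),
        key=lambda pair: pair[1],
    )
    print(f'{city}: nearest monument is {nearest[city][0]}, {nearest[city][1]:.1f} km away')
```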