# Capstone Project

## Introduction

Although social media and the internet are now the dominant advertising channels, posters and billboards are still common in city streets. This suggests that promoting a product or event with posters around a city is an old but still effective way to reach people. The question I try to answer in this project is: where should you place an advertisement in a city like Bogotá to reach the greatest number of people?

The impact of an advertisement also depends on other factors, such as the product or event being promoted, the target customers, and the design strategy of the poster. Even so, choosing the right location can be an advantage, and sometimes the deciding factor, when we measure whether a campaign succeeded or failed. It makes no sense to spend a lot of money on a poster that ends up displayed where nobody will see it.

Hence, we should choose areas with good visibility where the poster can reach the right audience. That is why it is usually a good idea to advertise in or near public spaces, events, companies, restaurants, hotels, coffee shops, shopping centers, and trending places with a large flow of people.

Therefore, the objective of this project is to find the busiest areas in Bogotá, where an advertisement is most likely to get positive results.

## Data

The data covers Bogotá, the capital city of Colombia. The city is divided into 20 localities (boroughs), and we will use the Foursquare API to identify the venues around each locality.

From the Foursquare API I will get information about each venue: name, ID, location, and category. I will also use a table with the latitude and longitude of every locality in Bogotá.

### Import the data

In [1]:
# Import packages
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis (DataFrames)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import json
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

In [5]:
# Install folium
!pip install folium
import folium



In [14]:
# The code was removed by Watson Studio for sharing.

In [15]:
# Read the data of Localities of Bogotá
df_data_1 = pd.read_csv(body, sep=';')
df_data_1.head()

| | LOCALIDAD | LONGITUD | LATITUD | CODIGO | gp |
|---|---|---|---|---|---|
| 0 | CHAPINERO | -74.0467 | 4.6569 | 2 | -74.0467,4.6569 |
| 1 | TUNJUELITO | -74.1407 | 4.5875 | 6 | -74.1407,4.5875 |
| 2 | ANTONIO NARIÑO | -74.1009 | 4.5486 | 15 | -74.1009,4.5486 |
| 3 | PUENTE ARANDA | -74.1227 | 4.6149 | 16 | -74.1227,4.6149 |
| 4 | USAQUÉN | -74.0312 | 4.7485 | 1 | -74.0312,4.7485 |


### Map of Localities of Bogotá

In [6]:
LatBogota = 4.60971
LongBogota = -74.08175

In [7]:
# Map of Bogotá
map_bogota = folium.Map(location=[LatBogota,LongBogota], zoom_start=11)

for lat, lng, label in zip(df_data_1['LATITUD'], df_data_1['LONGITUD'], df_data_1['LOCALIDAD']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bogota)
    
map_bogota

### Venues of Bogotá

In [11]:
# The code was removed by Watson Studio for sharing.
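
The hidden cell above presumably defines the Foursquare credentials used later by `getNearbyVenues` (`CLIENT_ID`, `CLIENT_SECRET`, `VERSION`). A minimal sketch of one way to define them without writing secrets into the notebook, assuming they are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_VERSION`, the helper `load_foursquare_credentials`, and the default version date are hypothetical placeholders, not the values used in the original run:

```python
import os

def load_foursquare_credentials(env=None):
    """Read Foursquare credentials from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    required = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    client_id = env["FOURSQUARE_CLIENT_ID"]
    client_secret = env["FOURSQUARE_CLIENT_SECRET"]
    # API version date in YYYYMMDD form; this default is only a placeholder
    version = env.get("FOURSQUARE_VERSION", "20180605")
    return client_id, client_secret, version
```

In the notebook this could be used as `CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials()`.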

In [20]:
LIMIT = 100
def getNearbyVenues(names, latitudes, longitudes, radius=3000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['LOCALIDAD', 
                  'LATITUD', 
                  'LONGITUD', 
                  'LUGAR', 
                  'LUGAR LATITUD', 
                  'LUGAR LONGITUS', 
                  'CATEGORIA']
    
    return nearby_venues
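
The list comprehension inside `getNearbyVenues` assumes that the request succeeded and that every venue has at least one category. A minimal sketch of the same extraction step as a separate function, checked on a hand-made payload that mimics the `["response"]["groups"][0]["items"]` nesting used above. The helper name `extract_venues`, the fallback label `"Unknown"`, and the sample venue are hypothetical:

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows of venue information."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to a placeholder when a venue has no category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# hand-made payload with the same nesting as the API response (made-up venue)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 4.66, "lng": -74.05},
               "categories": []}}]}]}}
rows = extract_venues(sample, "CHAPINERO", 4.6569, -74.0467)
print(rows)
```

Separating the parsing from the HTTP call also makes it possible to test the extraction without spending API calls.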

In [21]:
lugares_Bogota = getNearbyVenues(names=df_data_1['LOCALIDAD'],
                                 latitudes=df_data_1['LATITUD'],
                                 longitudes=df_data_1['LONGITUD'])

CHAPINERO
TUNJUELITO
ANTONIO NARIÑO
PUENTE ARANDA
USAQUÉN
BOGOTÁ
BOSA
CIUDAD BOLÍVAR
RAFAEL URIBE URIBE
KENNEDY
BARRIOS UNIDOS
ENGATIVÁ
SUMAPAZ
TEUSAQUILLO
LA CANDELARIA
SANTA FE
SUBA
FONTIBÓN
LOS MÁRTIRES
SAN CRISTOBAL
USME


In [22]:
print(lugares_Bogota.shape)
lugares_Bogota.head()

(1351, 7)


| | LOCALIDAD | LATITUD | LONGITUD | LUGAR | LUGAR LATITUD | LUGAR LONGITUS | CATEGORIA |
|---|---|---|---|---|---|---|---|
| 0 | CHAPINERO | 4.6569 | -74.0467 | Bandido Bistro | 4.661514 | -74.050307 | French Restaurant |
| 1 | CHAPINERO | 4.6569 | -74.0467 | Quebrada La Vieja | 4.650833 | -74.049511 | Scenic Lookout |
| 2 | CHAPINERO | 4.6569 | -74.0467 | El Caracol Azul | 4.656121 | -74.053203 | Peruvian Restaurant |
| 3 | CHAPINERO | 4.6569 | -74.0467 | Harry Sasson | 4.659021 | -74.054525 | Restaurant |
| 4 | CHAPINERO | 4.6569 | -74.0467 | Brot Bakery & Cafe | 4.663257 | -74.050578 | Bakery |


## Methodology

### Exploratory Data Analysis

Let's check how many venues there are in each locality and eliminate the localities with fewer than 15 venues.

In [28]:
# count the number of venues per locality
lugares_Bogota.groupby('LOCALIDAD').count()

| LOCALIDAD | LATITUD | LONGITUD | LUGAR | LUGAR LATITUD | LUGAR LONGITUS | CATEGORIA |
|---|---|---|---|---|---|---|
| ANTONIO NARIÑO | 15 | 15 | 15 | 15 | 15 | 15 |
| BARRIOS UNIDOS | 100 | 100 | 100 | 100 | 100 | 100 |
| BOGOTÁ | 100 | 100 | 100 | 100 | 100 | 100 |
| BOSA | 16 | 16 | 16 | 16 | 16 | 16 |
| CHAPINERO | 100 | 100 | 100 | 100 | 100 | 100 |
| CIUDAD BOLÍVAR | 3 | 3 | 3 | 3 | 3 | 3 |
| ENGATIVÁ | 100 | 100 | 100 | 100 | 100 | 100 |
| FONTIBÓN | 78 | 78 | 78 | 78 | 78 | 78 |
| KENNEDY | 100 | 100 | 100 | 100 | 100 | 100 |
| LA CANDELARIA | 100 | 100 | 100 | 100 | 100 | 100 |


In [37]:
# Eliminate the localities with fewer than 15 venues
localities_to_drop = ['CIUDAD BOLÍVAR', 'SAN CRISTOBAL', 'SANTA FE', 'SUMAPAZ', 'USME']
# .copy() gives an independent DataFrame, so new columns can be added later without warnings
lugares_Bogota2 = lugares_Bogota[~lugares_Bogota['LOCALIDAD'].isin(localities_to_drop)].copy()
print(lugares_Bogota2.shape)
lugares_Bogota2.head()

(1329, 7)


| | LOCALIDAD | LATITUD | LONGITUD | LUGAR | LUGAR LATITUD | LUGAR LONGITUS | CATEGORIA |
|---|---|---|---|---|---|---|---|
| 0 | CHAPINERO | 4.6569 | -74.0467 | Bandido Bistro | 4.661514 | -74.050307 | French Restaurant |
| 1 | CHAPINERO | 4.6569 | -74.0467 | Quebrada La Vieja | 4.650833 | -74.049511 | Scenic Lookout |
| 2 | CHAPINERO | 4.6569 | -74.0467 | El Caracol Azul | 4.656121 | -74.053203 | Peruvian Restaurant |
| 3 | CHAPINERO | 4.6569 | -74.0467 | Harry Sasson | 4.659021 | -74.054525 | Restaurant |
| 4 | CHAPINERO | 4.6569 | -74.0467 | Brot Bakery & Cafe | 4.663257 | -74.050578 | Bakery |
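
The five localities above are dropped by name. A minimal sketch of the same filter driven by the threshold itself, so the list does not have to be maintained by hand. The function name `drop_sparse_localities` and the toy data are hypothetical; only the `LOCALIDAD` column name is taken from the notebook:

```python
import pandas as pd

def drop_sparse_localities(venues, min_venues=15, col="LOCALIDAD"):
    """Keep only the rows whose locality has at least `min_venues` venues."""
    # number of venues in the locality of each row
    counts = venues[col].map(venues[col].value_counts())
    return venues[counts >= min_venues].copy()

# toy data: locality A has 16 venues, locality B has only 3
toy = pd.DataFrame({"LOCALIDAD": ["A"] * 16 + ["B"] * 3,
                    "LUGAR": ["venue %d" % i for i in range(19)]})
kept = drop_sparse_localities(toy)
print(kept["LOCALIDAD"].unique())
```

With `min_venues=15`, a locality with exactly 15 venues, such as ANTONIO NARIÑO, is kept, which matches the hard-coded list.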


Let's check how many venues there are per category.

In [47]:
# count the number of venues per category
lugares_Bogota2.groupby('CATEGORIA').count()

| CATEGORIA | LOCALIDAD | LATITUD | LONGITUD | LUGAR | LUGAR LATITUD | LUGAR LONGITUS |
|---|---|---|---|---|---|---|
| Advertising Agency | 1 | 1 | 1 | 1 | 1 | 1 |
| Airport | 3 | 3 | 3 | 3 | 3 | 3 |
| Airport Lounge | 5 | 5 | 5 | 5 | 5 | 5 |
| Airport Service | 2 | 2 | 2 | 2 | 2 | 2 |
| American Restaurant | 2 | 2 | 2 | 2 | 2 | 2 |
| ... | ... | ... | ... | ... | ... | ... |
| Water Park | 1 | 1 | 1 | 1 | 1 | 1 |
| Wine Bar | 3 | 3 | 3 | 3 | 3 | 3 |
| Wings Joint | 9 | 9 | 9 | 9 | 9 | 9 |
| Women's Store | 1 | 1 | 1 | 1 | 1 | 1 |
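
Because `groupby(...).count()` repeats the same number in every column, the counts may be easier to read as a single sorted column. A short sketch using `value_counts` on made-up category data (the sample values are hypothetical; only the `CATEGORIA` column name comes from the notebook):

```python
import pandas as pd

toy = pd.DataFrame({"CATEGORIA": ["Bakery", "Restaurant", "Bakery",
                                  "Wine Bar", "Bakery", "Restaurant"]})
# one count per category, sorted from most to least frequent
category_counts = toy["CATEGORIA"].value_counts()
print(category_counts.head(10))
```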


## Machine Learning

### DBSCAN

In [39]:
from sklearn.cluster import DBSCAN 
from sklearn.preprocessing import StandardScaler

In [49]:
Data_set = lugares_Bogota2[['LUGAR LATITUD','LUGAR LONGITUS']]
Data_set = StandardScaler().fit_transform(Data_set)
db = DBSCAN(eps=0.15, min_samples=10).fit(Data_set)
labels = db.labels_
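
After `StandardScaler`, `eps=0.15` is measured in standard deviations of latitude and longitude rather than in metres, which makes the neighborhood radius hard to interpret. One alternative, shown here only as a sketch and not as the method used in this notebook, is to run DBSCAN on the raw coordinates with the haversine metric, so that `eps` can be given in kilometres. The function name `cluster_by_distance`, the 0.5 km radius, `min_samples=3`, and the toy points are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used to convert km to radians

def cluster_by_distance(lat_lon_deg, eps_km=0.5, min_samples=3):
    """Run DBSCAN on (latitude, longitude) pairs in degrees, with eps in km."""
    # the haversine metric expects (latitude, longitude) in radians
    coords = np.radians(np.asarray(lat_lon_deg, dtype=float))
    model = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=min_samples,
                   metric="haversine", algorithm="ball_tree")
    return model.fit(coords).labels_

# toy points: two tight groups roughly 150 m across and one isolated point
group_a = [(4.660 + dx, -74.050 + dy) for dx in (0, 0.001) for dy in (0, 0.001)]
group_b = [(4.600 + dx, -74.070 + dy) for dx in (0, 0.001) for dy in (0, 0.001)]
points = group_a + group_b + [(4.750, -74.200)]
labels = cluster_by_distance(points)
print(labels)
```

The isolated point receives the label -1, which is how DBSCAN marks noise.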

In [50]:
lugares_Bogota2['Labels']=labels
lugares_Bogota2.head()

| | LOCALIDAD | LATITUD | LONGITUD | LUGAR | LUGAR LATITUD | LUGAR LONGITUS | CATEGORIA | Labels |
|---|---|---|---|---|---|---|---|---|
| 0 | CHAPINERO | 4.6569 | -74.0467 | Bandido Bistro | 4.661514 | -74.050307 | French Restaurant | 0 |
| 1 | CHAPINERO | 4.6569 | -74.0467 | Quebrada La Vieja | 4.650833 | -74.049511 | Scenic Lookout | 0 |
| 2 | CHAPINERO | 4.6569 | -74.0467 | El Caracol Azul | 4.656121 | -74.053203 | Peruvian Restaurant | 0 |
| 3 | CHAPINERO | 4.6569 | -74.0467 | Harry Sasson | 4.659021 | -74.054525 | Restaurant | 0 |
| 4 | CHAPINERO | 4.6569 | -74.0467 | Brot Bakery & Cafe | 4.663257 | -74.050578 | Bakery | 0 |


In [51]:
lugares_Bogota2['Labels'].unique()

array([ 0, -1,  1,  2,  3,  4,  5,  6,  7, 13, 18,  8,  9, 10, 11, 12, 14,
       15, 16, 17, 19, 20, 21])

## Results

In [87]:
Final_table = lugares_Bogota2[['LOCALIDAD','LUGAR LATITUD','LUGAR LONGITUS','Labels']]

# remove the outliers (DBSCAN labels noise points as -1)
Final_table = Final_table.drop(Final_table[(Final_table['Labels']==-1)].index)

# count the number of venues per locality and cluster
Table2 = Final_table.groupby(['LOCALIDAD','Labels']).count()

# Final table: mean venue coordinates per (locality, cluster) plus the venue count
Final_table = Final_table.groupby(['LOCALIDAD','Labels']).mean()
Final_table['Count']= Table2['LUGAR LATITUD']
# scaled count, used later as the relative marker size on the map
Final_table['sdt Count'] = 2*(Table2['LUGAR LATITUD']+1-Table2['LUGAR LATITUD'].min())/(Table2['LUGAR LATITUD'].max()-Table2['LUGAR LATITUD'].min())
print(Final_table.head())

                       LUGAR LATITUD  LUGAR LONGITUS  Count  sdt Count
LOCALIDAD      Labels                                                 
ANTONIO NARIÑO 4            4.572171      -74.094196      4   0.080808
BARRIOS UNIDOS 0            4.660290      -74.060696     12   0.242424
               11           4.662466      -74.084493     54   1.090909
               14           4.655229      -74.104100     20   0.404040
BOGOTÁ         7            4.599708      -74.094924      1   0.020202
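
The `sdt Count` column rescales the venue count so that it can be used as a marker size. A short check of the formula in plain Python, assuming the smallest and largest group counts are 1 and 100 (values consistent with the rows shown in this notebook; the function name `scaled_count` is hypothetical):

```python
def scaled_count(count, min_count, max_count):
    """Same expression as the 'sdt Count' column: 2 * (count + 1 - min) / (max - min)."""
    return 2 * (count + 1 - min_count) / (max_count - min_count)

for count in (1, 4, 54, 100):
    print(count, round(scaled_count(count, 1, 100), 6))
```

Because of the `+ 1`, the smallest group maps to a small positive value rather than 0, so it still gets a visible circle, and the largest group maps to slightly more than 2.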


In [88]:
Final = Final_table.reset_index()
Final

| | LOCALIDAD | Labels | LUGAR LATITUD | LUGAR LONGITUS | Count | sdt Count |
|---|---|---|---|---|---|---|
| 0 | ANTONIO NARIÑO | 4 | 4.572171 | -74.094196 | 4 | 0.080808 |
| 1 | BARRIOS UNIDOS | 0 | 4.66029 | -74.060696 | 12 | 0.242424 |
| 2 | BARRIOS UNIDOS | 11 | 4.662466 | -74.084493 | 54 | 1.090909 |
| 3 | BARRIOS UNIDOS | 14 | 4.655229 | -74.1041 | 20 | 0.40404 |
| 4 | BOGOTÁ | 7 | 4.599708 | -74.094924 | 1 | 0.020202 |
| 5 | BOGOTÁ | 10 | 4.61595 | -74.085586 | 8 | 0.161616 |
| 6 | BOGOTÁ | 11 | 4.607775 | -74.070968 | 90 | 1.818182 |
| 7 | CHAPINERO | 0 | 4.659989 | -74.053965 | 100 | 2.020202 |
| 8 | ENGATIVÁ | 15 | 4.711159 | -74.111583 | 27 | 0.545455 |
| 9 | ENGATIVÁ | 16 | 4.695167 | -74.086021 | 32 | 0.646465 |


In [90]:
# draw one circle per (locality, cluster) pair

# create map
map_clusters = folium.Map(location=[LatBogota,LongBogota], zoom_start=11)
# cluster labels start at 0 (noise, -1, was removed), so we need max label + 1 colors
kclusters = int(Final['Labels'].max()) + 1
# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, cluster, poi, cnt in zip(Final['LUGAR LATITUD'], Final['LUGAR LONGITUS'], Final['Labels'], Final['LOCALIDAD'], Final['sdt Count']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10*cnt,
        popup=label,
        # index by the label itself so that every cluster gets its own color
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Finally, the map shows a circle for each dense area of Bogotá, sized in proportion to the number of venues in that area.

In [93]:
Final.sort_values('Count',ascending=False).head()

| | LOCALIDAD | Labels | LUGAR LATITUD | LUGAR LONGITUS | Count | sdt Count |
|---|---|---|---|---|---|---|
| 7 | CHAPINERO | 0 | 4.659989 | -74.053965 | 100 | 2.020202 |
| 19 | LA CANDELARIA | 11 | 4.604675 | -74.070945 | 93 | 1.878788 |
| 6 | BOGOTÁ | 11 | 4.607775 | -74.070968 | 90 | 1.818182 |
| 49 | USAQUÉN | 8 | 4.74389 | -74.041702 | 82 | 1.656566 |
| 24 | LOS MÁRTIRES | 11 | 4.603669 | -74.071974 | 74 | 1.494949 |
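
Cluster 11 appears under several localities in this ranking, because each row is a (locality, cluster) pair. A sketch of ranking the clusters themselves by summing the counts per label, using only the five rows shown above, so the totals are partial (the name `top_rows` is hypothetical):

```python
import pandas as pd

top_rows = pd.DataFrame({
    "LOCALIDAD": ["CHAPINERO", "LA CANDELARIA", "BOGOTÁ", "USAQUÉN", "LOS MÁRTIRES"],
    "Labels": [0, 11, 11, 8, 11],
    "Count": [100, 93, 90, 82, 74],
})
# total number of venue rows per DBSCAN cluster, largest first
per_cluster = top_rows.groupby("Labels")["Count"].sum().sort_values(ascending=False)
print(per_cluster)
```

If the same venue was returned for more than one locality, these totals count it more than once, so they are an upper bound on the number of distinct venues in each cluster.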


## Discussion

The project was limited by the number of venues returned per API call (at most 100), so it would be interesting to repeat the procedure after dividing the localities into neighborhoods and retrieving more venues for each one. Another aspect to improve is the radius used in the calls to the Foursquare API: fitting new models with different radii could change the results.
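
When localities are split into more query points, or the radius grows, the search circles are more likely to overlap and return the same venue several times. A minimal sketch of removing such repeats before clustering, assuming a venue can be identified by its name and coordinates (the notebook does not keep the Foursquare venue ID). The function name `drop_repeated_venues` and the toy rows are hypothetical:

```python
import pandas as pd

# column names as spelled in the notebook
VENUE_KEY = ["LUGAR", "LUGAR LATITUD", "LUGAR LONGITUS"]

def drop_repeated_venues(venues):
    """Keep the first occurrence of each venue, whichever locality returned it."""
    return venues.drop_duplicates(subset=VENUE_KEY).reset_index(drop=True)

# toy rows: the same made-up venue returned by two nearby localities
toy = pd.DataFrame({
    "LOCALIDAD": ["BOGOTÁ", "LA CANDELARIA", "CHAPINERO"],
    "LUGAR": ["Example Plaza", "Example Plaza", "Example Cafe"],
    "LUGAR LATITUD": [4.60, 4.60, 4.66],
    "LUGAR LONGITUS": [-74.07, -74.07, -74.05],
})
unique_venues = drop_repeated_venues(toy)
print(unique_venues)
```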

For the DBSCAN method I used the parameter min_samples equal to 10 (with eps equal to 0.15). Changing these parameters is an opportunity to find new results. Finally, it would be interesting to add more features to the clustering method, because the success of an advertisement does not depend only on how dense an area is.
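
A sketch of how such a parameter search could be organised: run DBSCAN for several `min_samples` values and record how many clusters and noise points each one produces. The function name `sweep_min_samples` and the synthetic points are hypothetical; for the real data, `X` would be the scaled coordinate array used in the DBSCAN section.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def sweep_min_samples(X, eps, candidates):
    """Return {min_samples: (number of clusters, number of noise points)}."""
    summary = {}
    for min_samples in candidates:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
        # the label -1 marks noise and is not a cluster
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int(np.sum(labels == -1))
        summary[min_samples] = (n_clusters, n_noise)
    return summary

# synthetic data: a dense group of 12 points and a sparse group of 5 points
dense = [(0.01 * i, 0.0) for i in range(12)]
sparse = [(5.0 + 0.01 * i, 5.0) for i in range(5)]
X = np.array(dense + sparse)
result = sweep_min_samples(X, eps=0.15, candidates=(3, 10))
print(result)
```

On this toy data the sparse group is a cluster when `min_samples` is 3 and becomes noise when it is 10, which illustrates how strongly the choice can change the set of "dense areas".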

## Conclusion

In this project, I created a list with at most 100 venues per locality and then clustered the venues with DBSCAN, a density-based method, to look for the areas with the most public venues. After grouping the venues, I found that the best areas to advertise with a poster are CHAPINERO, LA CANDELARIA, BOGOTÁ, USAQUÉN and LOS MÁRTIRES. There are still parameters to improve, but the model gives a reasonable first answer.

