# Valparaiso vs. Viña del Mar
## Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods
### Instructions:

>Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

>1. In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.
>2. In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they setup their office?

>These are just a couple of many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to provide sufficient justification of why you think what you want to do or solve is important and why would a client or a group of people be interested in your project.

>### Review criteria

>This capstone project will be graded by your peers. This capstone project is worth 70% of your total grade. The project will be completed over the course of 2 weeks.  Week 1 submissions will be worth 30% whereas week 2 submissions will be worth 40% of your total grade.

>#### For week 1, you will be required to submit the following:

>1. A description of the problem and a discussion of the background. (15 marks)
>2. A description of the data and how it will be used to solve the problem. (15 marks)

>#### For week 2, the final deliverables of the project will be:

>1. A link to your Notebook on your Github repository, showing your code. (15 marks)
>2. A full report consisting of all of the following components (15 marks):
>    - Introduction where you discuss the business problem and who would be interested in this project.
>    - Data where you describe the data that will be used to solve the problem and the source of the data.
>    - Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learnings were used and why.
>    - Results section where you discuss the results.
>    - Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
>    - Conclusion section where you conclude the report.
>3. Your choice of a presentation or blogpost. (10 marks)
___

# Valparaiso vs. Viña del Mar
## Introduction/Business Problem
### A description of the problem and a discussion of the background. (15 marks)
In my country, Chile, there are two cities that are very close to each other yet very different: Valparaiso and Viña del Mar. Both are tourist destinations, but the differences are noticeable when you visit them: Viña del Mar is considerably wealthier, with more commerce and large events, while Valparaiso has many more historical places, trading, and old stores that refuse to die. These differences matter when deciding where to open a specific business, so insights drawn from real data would help prospective business owners and entrepreneurs make an informed decision.

## Data
### A description of the data and how it will be used to solve the problem. (15 marks)
The data available from Foursquare offers a unique opportunity to cluster neighborhoods by the venues they contain, which makes it possible to compare specific areas of one city with areas of another. To gain insight into what kind of business could be opened in each city (and where within it), I will compare both cities using Foursquare's API and point out the advantages of one city over the other for opening a business in certain areas. I could then recommend, for example, that a client open a specific store in one of the two cities because there is less competition, because the current venues are not highly rated, or because of another indicator that emerges while working with the data.

## Week 2
### Importing libraries

In [1]:
import requests
import pandas as pd
import numpy as np
import random

In [3]:
from geopy.geocoders import Nominatim

In [4]:
from IPython.display import Image
from IPython.core.display import HTML

In [40]:
#from pandas.io.json import json_normalize

In [9]:
import folium

In [52]:
from IPython.display import display

In [89]:
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

### Defining Foursquare's API credentials

In [251]:
#CLIENT_ID = 'DFCYOGX34RLW0Y3GEB2ZQX1ZNE25SMFGIJEVDHOGIGBAH5R2'
#CLIENT_SECRET = 'Y0ILH5I0J3VB5B2ENNAVL3KCZBIS0XCRCVZV5GYGQUOMM5VS'
#VERSION = '20180604'
#LIMIT = 100
#radius = 1500
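
Hard-coded credentials are easy to leak when a notebook is shared. The following is a minimal sketch of one alternative, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical and not part of the original notebook.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names below are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.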

### Let's define a point at the center of Valparaiso and a different point at the center of Viña del Mar
#### Valparaiso, northwest corner of Parque Italia:

In [253]:
latitudeValpo = -33.047201
longitudeValpo = -71.614716

In [250]:
#urlValpo = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,latitudeValpo,longitudeValpo,VERSION,radius,LIMIT)
#urlValpo
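
The request URL above is assembled with positional `format` arguments, which is easy to get out of order. As a sketch only, the same URL could be built from named parameters with the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are assumptions; the endpoint, version, radius and limit are the ones used in this notebook.

```python
from urllib.parse import urlencode

# Same endpoint as in the cell above
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Build the explore URL with named, properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; the coordinates are the Valparaiso center defined above
url = build_explore_url("MY_ID", "MY_SECRET", -33.047201, -71.614716, "20180604", 1500, 100)
```

`urlencode` percent-encodes the comma in the `ll` value, which the server decodes back to `lat,lng`.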

In [33]:
resultsValpo = requests.get(urlValpo).json()
'There are {} venues near central Valparaiso'.format(len(resultsValpo['response']['groups'][0]['items']))

'There are 100 venues near central Valparaiso'

#### Viña del Mar, Plaza Vergara's center:

In [254]:
latitudeVina = -33.024584
longitudeVina = -71.551847
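
Since each city is queried within a 1500 m radius of its center, it is worth checking that the two search circles do not overlap. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; the helper name `haversine_km` is an assumption of this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Center points used in this notebook
valpo = (-33.047201, -71.614716)
vina = (-33.024584, -71.551847)

distance_km = haversine_km(*valpo, *vina)
search_radius_km = 1.5  # the 1500 m radius used in the queries
circles_overlap = distance_km < 2 * search_radius_km
```

The centers are a little over 6 km apart, so the two 1.5 km circles stay separate and no venue should be returned by both queries.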

In [252]:
#urlVina = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,latitudeVina,longitudeVina,VERSION,radius,LIMIT)
#urlVina

In [35]:
resultsVina = requests.get(urlVina).json()
'There are {} venues near central Viña del Mar'.format(len(resultsVina['response']['groups'][0]['items']))

'There are 100 venues near central Viña del Mar'
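
The reported count equals the `LIMIT` of 100 set earlier, so it probably reflects the request cap rather than the true number of venues in the area. A small sketch of how the items could be pulled out defensively and the cap detected; the helper names `extract_items` and `hit_limit` are hypothetical, and the sample response is hand-made in the same shape as the JSON used in this notebook.

```python
def extract_items(results):
    """Return the list of venue items from an explore response, or [] if absent."""
    groups = results.get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []


def hit_limit(items, limit):
    """True when the response is as long as the requested limit, i.e. possibly truncated."""
    return len(items) >= limit


# Tiny hand-made response, for illustration only
sample = {"response": {"groups": [{"items": [{"venue": {"name": "A"}}, {"venue": {"name": "B"}}]}]}}
items = extract_items(sample)
```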

### Let's process these JSONs and convert them to dataframes:

In [36]:
itemsValpo = resultsValpo['response']['groups'][0]['items']
itemsValpo[0]

{'reasons': {'count': 0,
  'items': [{'summary': 'This spot is popular',
    'type': 'general',
    'reasonName': 'globalInteractionReason'}]},
 'venue': {'id': '570bcfb2498eb339ee9d9d95',
  'name': 'Sazón Nazca',
  'location': {'address': 'Rodríguez  473',
   'lat': -33.04674935109481,
   'lng': -71.61606561490522,
   'labeledLatLngs': [{'label': 'display',
     'lat': -33.04674935109481,
     'lng': -71.61606561490522}],
   'distance': 135,
   'cc': 'CL',
   'city': 'Valparaíso',
   'state': 'Valparaíso',
   'country': 'Chile',
   'formattedAddress': ['Rodríguez  473',
    'Valparaíso',
    'Valparaíso',
    'Chile']},
  'categories': [{'id': '4eb1bfa43b7b52c0e1adc2e8',
    'name': 'Peruvian Restaurant',
    'pluralName': 'Peruvian Restaurants',
    'shortName': 'Peruvian',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/peruvian_',
     'suffix': '.png'},
    'primary': True}],
  'photos': {'count': 0, 'groups': []}},
 'referralId': 'e-0-570bcfb2498eb339ee9d9d95-

In [37]:
itemsVina = resultsVina['response']['groups'][0]['items']
itemsVina[0]

{'reasons': {'count': 0,
  'items': [{'summary': 'This spot is popular',
    'type': 'general',
    'reasonName': 'globalInteractionReason'}]},
 'venue': {'id': '506f4611e4b0184b5aa12d2d',
  'name': 'Frank Hostel',
  'location': {'address': 'Avenida Valparaiso',
   'crossStreet': 'Etchevers',
   'lat': -33.02446367351446,
   'lng': -71.55465197864842,
   'labeledLatLngs': [{'label': 'display',
     'lat': -33.02446367351446,
     'lng': -71.55465197864842}],
   'distance': 262,
   'postalCode': '2571511',
   'cc': 'CL',
   'city': 'Viña del Mar',
   'state': 'Valparaíso',
   'country': 'Chile',
   'formattedAddress': ['Avenida Valparaiso (Etchevers)',
    '2571511 Viña del Mar',
    'Valparaíso',
    'Chile']},
  'categories': [{'id': '4bf58dd8d48988d1ee931735',
    'name': 'Hostel',
    'pluralName': 'Hostels',
    'shortName': 'Hostel',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hostel_',
     'suffix': '.png'},
    'primary': True}],
  'photos': {'count': 

In [39]:
dataframeValpo = pd.json_normalize(itemsValpo)
dataframeVina = pd.json_normalize(itemsVina)

In [42]:
filteredColumnsValpo = ['venue.name','venue.categories'] + [col for col in dataframeValpo.columns if col.startswith('venue.location.')]+['venue.id']
dataframeFilteredValpo = dataframeValpo.loc[:,filteredColumnsValpo]
filteredColumnsVina = ['venue.name','venue.categories'] + [col for col in dataframeVina.columns if col.startswith('venue.location.')]+['venue.id']
dataframeFilteredVina = dataframeVina.loc[:,filteredColumnsVina]

In [44]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
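
To make the behavior of `get_category_type` concrete, here is a small self-contained sketch that repeats the function and applies it to two hand-written rows. Plain dictionaries stand in for pandas rows, which is an assumption of this example; both support the same key lookups.

```python
def get_category_type(row):
    """Return the name of the first category in a row, or None if there is none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hand-written rows for illustration
row_with_category = {'venue.categories': [{'name': 'Peruvian Restaurant', 'primary': True}]}
row_without_category = {'venue.categories': []}

first = get_category_type(row_with_category)
second = get_category_type(row_without_category)
```

Only the first category of each venue is kept, so a venue listed under several categories is counted once.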

In [45]:
dataframeFilteredValpo['venue.categories']=dataframeFilteredValpo.apply(get_category_type,axis=1)
dataframeFilteredVina['venue.categories']=dataframeFilteredVina.apply(get_category_type,axis=1)

In [47]:
dataframeFilteredValpo.columns = [col.split('.')[-1] for col in dataframeFilteredValpo.columns]
dataframeFilteredVina.columns =  [col.split('.')[-1] for col in dataframeFilteredVina.columns]
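
Keeping only the last dot-separated part of each column name is convenient, but two different columns could end up with the same short name. A minimal sketch of the renaming plus a collision check; the helper names and the column `venue.location.id` are hypothetical and used only to show a clash.

```python
from collections import Counter


def shorten_columns(columns):
    """Keep only the last dot-separated part of each column name."""
    return [col.split('.')[-1] for col in columns]


def duplicated_names(names):
    """Return the names that appear more than once after shortening."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.id']
short = shorten_columns(columns)
clashes = duplicated_names(shorten_columns(['venue.id', 'venue.location.id']))
```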

In [48]:
dataframeFilteredValpo.head(10)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,cc,city,state,country,formattedAddress,crossStreet,postalCode,neighborhood,id
0,Sazón Nazca,Peruvian Restaurant,Rodríguez 473,-33.046749,-71.616066,"[{'label': 'display', 'lat': -33.0467493510948...",135,CL,Valparaíso,Valparaíso,Chile,"[Rodríguez 473, Valparaíso, Valparaíso, Chile]",,,,570bcfb2498eb339ee9d9d95
1,Habrakadabra Sabores,Pizza Place,Independencia 2089,-33.048186,-71.615182,"[{'label': 'display', 'lat': -33.048186, 'lng'...",117,CL,Valparaíso,Valparaíso,Chile,"[Independencia 2089 (Freire), Valparaíso, Valp...",Freire,,,54cd707f498e0e083b18ba42
2,La Riviera,Pizza Place,Pedro Montt 2405,-33.047278,-71.610947,"[{'label': 'display', 'lat': -33.0472778114022...",351,CL,Valparaíso,Valparaíso,Chile,"[Pedro Montt 2405, Valparaíso, Valparaíso, Chile]",,,,4de410f145dd180ae56d2f0a
3,Bogarín,Sandwich Place,Plaza Victoria 1670,-33.046735,-71.619637,"[{'label': 'display', 'lat': -33.0467354163366...",462,CL,Valparaíso,Valparaíso,Chile,"[Plaza Victoria 1670, Valparaíso, Valparaíso, ...",,,,4b61eaf6f964a5202a2b2ae3
4,Govindas,Vegetarian / Vegan Restaurant,General Cruz 539,-33.048297,-71.613903,"[{'label': 'display', 'lat': -33.0482972466697...",143,CL,Valparaíso,Valparaíso,Chile,"[General Cruz 539 (Parque Italia), Valparaíso,...",Parque Italia,,,549452ef498e025da41fbe78
5,Hotzenplotz,German Restaurant,Hector Calvo 331,-33.048339,-71.62251,"[{'label': 'display', 'lat': -33.048339, 'lng'...",738,CL,Valparaíso,Valparaíso,Chile,"[Hector Calvo 331, Valparaíso, Valparaíso, Chile]",,,,54136e12498e5447674366c9
6,Sazón Nazca,Peruvian Restaurant,Edwards 636,-33.047146,-71.619604,"[{'label': 'display', 'lat': -33.0471459946967...",456,CL,Valparaíso,Valparaíso,Chile,"[Edwards 636, Valparaíso, Valparaíso, Chile]",,,,557c7d97498e95a6f8f191f8
7,Café CasaPlan,Art Gallery,Brasil 1490,-33.044888,-71.62099,"[{'label': 'display', 'lat': -33.044888, 'lng'...",639,CL,Valparaíso,Valparaíso,Chile,"[Brasil 1490, Valparaíso, Valparaíso, Chile]",,,,56043f43498ec58350f23b9e
8,Museo de Historia Natural de Valparaíso,History Museum,Condell 1546,-33.046391,-71.621133,"[{'label': 'display', 'lat': -33.0463906513234...",605,CL,Valparaíso,Valparaíso,Chile,"[Condell 1546, Valparaíso, Valparaíso, Chile]",,,,4b5219d8f964a520dd6727e3
9,Arte y Salero,Tapas Restaurant,Av. Pedro Montt 2382,-33.047448,-71.611258,"[{'label': 'display', 'lat': -33.047448205669,...",323,CL,Valparaíso,Valparaíso,Chile,"[Av. Pedro Montt 2382 (2do piso), Valparaíso, ...",2do piso,,,50f416a219a9c283eb806268


In [49]:
dataframeFilteredVina.head(10)

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
0,Frank Hostel,Hostel,Avenida Valparaiso,Etchevers,-33.024464,-71.554652,"[{'label': 'display', 'lat': -33.0244636735144...",262,2571511.0,CL,Viña del Mar,Valparaíso,Chile,"[Avenida Valparaiso (Etchevers), 2571511 Viña ...",506f4611e4b0184b5aa12d2d
1,Purolivo,Gourmet Shop,Galería Somar,Loc. 6-9,-33.024226,-71.553209,"[{'label': 'display', 'lat': -33.0242261422981...",133,,CL,Viña del Mar,Valparaíso,Chile,"[Galería Somar (Loc. 6-9), Viña del Mar, Valpa...",51ffd3dee4b07abc4ba8469b
2,Panzoni,Italian Restaurant,Paseo Cousiño 12,e/ Viana y Av. Valparaíso,-33.025731,-71.553494,"[{'label': 'display', 'lat': -33.0257310185469...",199,,CL,Viña del Mar,Valparaíso,Chile,"[Paseo Cousiño 12 (e/ Viana y Av. Valparaíso),...",4bba278498c7ef3bd0c43202
3,Fuente de Soda Cevasco,Hot Dog Joint,Av. Valparaiso 700,,-33.024952,-71.552665,"[{'label': 'display', 'lat': -33.0249521757472...",86,,CL,Viña del Mar,Valparaíso,Chile,"[Av. Valparaiso 700, Viña del Mar, Valparaíso,...",4b5b31d5f964a5203bea28e3
4,Déjà Vu,Latin American Restaurant,"Calle Viana 144, 2do. piso",Paseo Cousiño,-33.025785,-71.553563,"[{'label': 'display', 'lat': -33.0257852977384...",208,,CL,Viña del Mar,Valparaíso,Chile,"[Calle Viana 144, 2do. piso (Paseo Cousiño), V...",4d4ae4399544a093bf0c37e7
5,Bogarín,Juice Bar,Av. Valparaíso 533,Quinta,-33.024535,-71.554497,"[{'label': 'display', 'lat': -33.0245345160844...",247,,CL,Viña del Mar,Valparaíso,Chile,"[Av. Valparaíso 533 (Quinta), Viña del Mar, Va...",4b802f0af964a520285a30e3
6,La Nonna,Fast Food Restaurant,Quinta 255,,-33.025106,-71.554167,"[{'label': 'display', 'lat': -33.0251055711397...",224,,CL,,,Chile,"[Quinta 255, Chile]",4f28110de4b055df2814aca6
7,Quinta Vergara,Forest,,,-33.02799,-71.552421,"[{'label': 'display', 'lat': -33.0279900316624...",382,,CL,,,Chile,[Chile],541c8e4f498e0fca5a734959
8,Hotel Pacifico,Hotel,2 Poniente 154,,-33.020759,-71.553613,"[{'label': 'display', 'lat': -33.0207588089243...",456,,CL,,,Chile,"[2 Poniente 154, Chile]",4e3c713a62e19d61096127cc
9,Empanadas Royal,Diner,Traslaviña 138,,-33.023503,-71.558202,"[{'label': 'display', 'lat': -33.0235028231584...",605,,CL,Viña del Mar,Valparaíso,Chile,"[Traslaviña 138, Viña del Mar, Valparaíso, Chile]",4e87e9204fc6adea8c8f1898


In [79]:
# Remove apostrophes so that Folium can render every category as a popup label. The problem was "Men's store".
dataframeFilteredValpo['categories'] = dataframeFilteredValpo['categories'].str.replace("'", "", regex=False)

### Now let's create a map for Valparaiso, Viña del Mar and both combined too
#### Valparaiso:

In [81]:
venuesMapValpo = folium.Map(location=[latitudeValpo,longitudeValpo],zoom_start=15)

folium.CircleMarker(
    [latitudeValpo,longitudeValpo],
    radius=10,
    popup='Center of Valparaiso',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6).add_to(venuesMapValpo)

for lat, lng, label in zip(dataframeFilteredValpo.lat, dataframeFilteredValpo.lng, dataframeFilteredValpo.categories):
    folium.CircleMarker([lat,lng],radius=5,fill=True,popup=label,color='blue',fill_color='blue',fill_opacity=0.6).add_to(venuesMapValpo)

display(venuesMapValpo)

#### Viña del Mar:

In [82]:
venuesMapVina = folium.Map(location=[latitudeVina,longitudeVina],zoom_start=15)

folium.CircleMarker(
    [latitudeVina,longitudeVina],
    radius=10,
    popup='Center of Viña del Mar',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6).add_to(venuesMapVina)

for lat, lng, label in zip(dataframeFilteredVina.lat, dataframeFilteredVina.lng, dataframeFilteredVina.categories):
    folium.CircleMarker([lat,lng],radius=5,fill=True,popup=label,color='blue',fill_color='blue',fill_opacity=0.6).add_to(venuesMapVina)

display(venuesMapVina)

In [179]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
dataframeValpoVina = pd.concat([dataframeFilteredVina, dataframeFilteredValpo], ignore_index=True, sort=False)
dataframeValpoVina.tail(5)

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
195,Delicias Express,Empanada Restaurant,Urriola 358,Prat,-33.040724,-71.627812,"[{'label': 'display', 'lat': -33.0407243530363...",1418,,CL,Valparaíso,Valparaíso,Chile,"[Urriola 358 (Prat), Valparaíso, Valparaíso, C...",4c77e78d9f51a0939c82b627
196,Casa Galos Apart Hotel,Hotel,Galos 595,Almirante Montt,-33.04406,-71.630037,"[{'label': 'display', 'lat': -33.0440601947333...",1471,,CL,,Valparaíso,Chile,"[Galos 595 (Almirante Montt), Cerro Alegre, Va...",4fdf7be8e5e8242caa618b82
197,Hotel Thomas Somerscales,Hotel,San Enrique 446 Co Alegre,,-33.04334,-71.629977,"[{'label': 'display', 'lat': -33.0433395484664...",1487,,CL,Valparaíso,Valparaíso,Chile,"[San Enrique 446 Co Alegre, Valparaíso, Valpar...",4b805ef0f964a5204c6c30e3
198,La Joya Hostel,Hostel,Quillota 80,,-33.044919,-71.603845,"[{'label': 'display', 'lat': -33.0449192616162...",1045,,CL,Valparaíso,Valparaíso,Chile,"[Quillota 80, Valparaíso, Valparaíso, Chile]",56c8e1d4cd10968b51e5e52d
199,Natur-in,Juice Bar,Avenida Colón 2634,Morris,-33.050432,-71.608877,"[{'label': 'display', 'lat': -33.0504316806504...",652,,CL,Valparaíso,Valparaíso,Chile,"[Avenida Colón 2634 (Morris), Valparaíso, Valp...",4d55fea4d0a72c0ffc77227c


#### Valparaiso and Viña del Mar:

In [184]:
latitudeValpoVina = -33.037395
longitudeValpoVina = -71.584546

venuesMapValpoVina2 = folium.Map(location=[latitudeValpoVina,longitudeValpoVina],zoom_start=14)

folium.CircleMarker(
    [latitudeVina,longitudeVina],
    radius=10,
    popup='Center of Viña del Mar',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6).add_to(venuesMapValpoVina2)

folium.CircleMarker(
    [latitudeValpo,longitudeValpo],
    radius=10,
    popup='Center of Valparaiso',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6).add_to(venuesMapValpoVina2)

for lat, lng, label in zip(dataframeValpoVina.lat, dataframeValpoVina.lng, dataframeValpoVina.categories):
    folium.CircleMarker([lat,lng],radius=5,fill=True,popup=label,color='blue',fill_color='blue',fill_opacity=0.6).add_to(venuesMapValpoVina2)

    
display(venuesMapValpoVina2)

### Let's see which kinds of venues are most common in each city

In [135]:
df = dataframeFilteredValpo.groupby('categories').count()
df['id'].sort_values(ascending=False).head(10)

categories
Restaurant             9
Hotel                  7
Pizza Place            6
Bar                    4
Café                   4
Bakery                 3
Dessert Shop           3
Peruvian Restaurant    3
Scenic Lookout         3
Neighborhood           3
Name: id, dtype: int64

In [141]:
df = dataframeFilteredVina.groupby('categories').count()
df['id'].sort_values(ascending=False).head(10)

categories
Sushi Restaurant      7
Coffee Shop           6
Italian Restaurant    5
Bed & Breakfast       5
Hotel                 5
Ice Cream Shop        4
Burger Joint          3
Pizza Place           3
Tea Room              3
Dessert Shop          2
Name: id, dtype: int64
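
The two top-10 lists can be compared directly to see which categories are common to both cities and which appear in only one. The sketch below copies the counts printed above into plain dictionaries and uses set operations; the variable names are assumptions of this example.

```python
# Counts copied from the two outputs above
valpo_top = {
    'Restaurant': 9, 'Hotel': 7, 'Pizza Place': 6, 'Bar': 4, 'Café': 4,
    'Bakery': 3, 'Dessert Shop': 3, 'Peruvian Restaurant': 3,
    'Scenic Lookout': 3, 'Neighborhood': 3,
}
vina_top = {
    'Sushi Restaurant': 7, 'Coffee Shop': 6, 'Italian Restaurant': 5,
    'Bed & Breakfast': 5, 'Hotel': 5, 'Ice Cream Shop': 4, 'Burger Joint': 3,
    'Pizza Place': 3, 'Tea Room': 3, 'Dessert Shop': 2,
}

shared = sorted(set(valpo_top) & set(vina_top))      # in both top-10 lists
only_valpo = sorted(set(valpo_top) - set(vina_top))  # top-10 in Valparaiso only
only_vina = sorted(set(vina_top) - set(valpo_top))   # top-10 in Viña del Mar only
```

Only three categories appear in both lists (Dessert Shop, Hotel and Pizza Place), which already hints at how differently the two centers are composed.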

### Now I will one-hot encode the venue categories for each city

In [202]:
vinaValpoOneHot = pd.get_dummies(dataframeValpoVina[['categories']],prefix="",prefix_sep="")
vinaValpoOneHot['city'] = dataframeValpoVina['city']
vinaValpoFixedColumns = [vinaValpoOneHot.columns[-1]] + list(vinaValpoOneHot.columns[:-1])
vinaValpoOneHot = vinaValpoOneHot[vinaValpoFixedColumns]
vinaValpoOneHot.head()

Unnamed: 0,city,Art Gallery,Art Museum,Austrian Restaurant,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,...,Supermarket,Surf Spot,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Yoga Studio
0,Viña del Mar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Viña del Mar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Viña del Mar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Viña del Mar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Viña del Mar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [203]:
vinaValpoGrouped = vinaValpoOneHot.groupby('city').mean().reset_index()
vinaValpoGrouped

Unnamed: 0,city,Art Gallery,Art Museum,Austrian Restaurant,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,...,Supermarket,Surf Spot,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Yoga Studio
0,Cerro Concepción,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Valparaiso,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Valparaíso,0.022727,0.011364,0.0,0.0,0.034091,0.034091,0.0,0.022727,0.0,...,0.0,0.0,0.011364,0.0,0.011364,0.0,0.011364,0.011364,0.022727,0.0
3,Viña Del Mar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Viña del Mar,0.0,0.011628,0.011628,0.011628,0.0,0.023256,0.011628,0.046512,0.011628,...,0.011628,0.0,0.081395,0.011628,0.011628,0.023256,0.0,0.0,0.0,0.011628
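
Each value in the grouped table is the share of a city's venues that fall in a category: one-hot encoding turns every venue into a row of zeros and a single one, and the per-city mean of a column is therefore a frequency. A small sketch with toy data (not taken from the Foursquare results) shows the mechanics; passing `dtype=int` is a choice of this example so the columns are plain 0/1 integers.

```python
import pandas as pd

# Toy data, for illustration only
venues = pd.DataFrame({
    'city': ['A', 'A', 'A', 'A', 'B', 'B'],
    'categories': ['Hotel', 'Hotel', 'Bar', 'Café', 'Bar', 'Bar'],
})

# One column per category, with 1 where the venue belongs to it
one_hot = pd.get_dummies(venues[['categories']], prefix="", prefix_sep="", dtype=int)
one_hot['city'] = venues['city']

# The mean of each 0/1 column per city is the category's share in that city
frequencies = one_hot.groupby('city').mean()
```

Each row of `frequencies` sums to 1, and a group with very few venues produces extreme values such as 1.0 for a single category.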


### We can see above that we have a problem: Cerro Concepción is within Valparaíso, Valparaíso is duplicated (í and i are separated), and Viña del Mar is duplicated due to a capital letter.

### I will fix this now:

In [192]:
#vinaValpoOneHot['city'] = vinaValpoOneHot['city'].str.replace("Viña Del Mar","Viña del Mar")
#vinaValpoOneHot['city'] = vinaValpoOneHot['city'].str.replace("Cerro Concepción","Valparaíso")
#vinaValpoOneHot['city'] = vinaValpoOneHot['city'].str.replace("Valparaiso","Valparaíso")

In [193]:
#vinaValpoGrouped = vinaValpoOneHot.groupby('city').mean().reset_index()
#vinaValpoGrouped

Unnamed: 0,city,Art Gallery,Art Museum,Austrian Restaurant,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,...,Supermarket,Surf Spot,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Yoga Studio
0,Valparaíso,0.022222,0.011111,0.0,0.0,0.033333,0.033333,0.0,0.022222,0.0,...,0.0,0.0,0.011111,0.0,0.011111,0.0,0.011111,0.011111,0.022222,0.0
1,Viña del Mar,0.0,0.011494,0.011494,0.011494,0.0,0.022989,0.011494,0.045977,0.011494,...,0.011494,0.0,0.08046,0.011494,0.011494,0.022989,0.0,0.0,0.0,0.011494
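
One way to express this clean-up, sketched here under assumptions, is an exact-match lookup table instead of chained string replacements. The mapping holds the three variants seen in the grouped table; the helper `normalize_city` and its `fallback` argument are hypothetical additions, the latter meant for venues whose city field is empty.

```python
# Spelling variants observed in the grouped table, mapped to one canonical name
CANONICAL_CITY = {
    'Viña Del Mar': 'Viña del Mar',
    'Cerro Concepción': 'Valparaíso',
    'Valparaiso': 'Valparaíso',
}


def normalize_city(city, fallback=None):
    """Map a raw city label to its canonical name.

    `fallback` (an assumption of this sketch) is returned when the label is
    missing, for example the city whose center the venue was queried around.
    """
    # None, NaN (which is not equal to itself) or a blank string count as missing
    if city is None or city != city or str(city).strip() == '':
        return fallback
    city = str(city).strip()
    return CANONICAL_CITY.get(city, city)
```

With pandas, such a function could be applied to the `city` column through `Series.map`; labels that are not in the table pass through unchanged.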


In [204]:
vinaValpoGrouped.shape

(5, 90)

In [205]:
num_top_venues = 10
for city in vinaValpoGrouped['city']:
    print("----"+city+"----")
    temp = vinaValpoGrouped[vinaValpoGrouped['city']== city].T.reset_index()
    temp.columns = ['Venue','Freq']
    temp = temp.iloc[1:]
    temp['Freq'] = temp['Freq'].astype(float)
    temp = temp.round({'Freq':2})
    print(temp.sort_values('Freq',ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Cerro Concepción----
                   Venue  Freq
0           Dessert Shop   1.0
1            Art Gallery   0.0
2            Pizza Place   0.0
3               Pie Shop   0.0
4    Peruvian Restaurant   0.0
5  Performing Arts Venue   0.0
6                   Park   0.0
7   Other Great Outdoors   0.0
8              Nightclub   0.0
9           Neighborhood   0.0


----Valparaiso----
                   Venue  Freq
0                   Café   1.0
1            Art Gallery   0.0
2                 Museum   0.0
3            Pizza Place   0.0
4               Pie Shop   0.0
5    Peruvian Restaurant   0.0
6  Performing Arts Venue   0.0
7                   Park   0.0
8   Other Great Outdoors   0.0
9              Nightclub   0.0


----Valparaíso----
                 Venue  Freq
0           Restaurant  0.08
1          Pizza Place  0.07
2                Hotel  0.06
3         Neighborhood  0.03
4               Bakery  0.03
5                  Bar  0.03
6  Peruvian Restaurant  0.03
7                 C

In [206]:
def return_most_common_venues(row,num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [207]:
num_top_venues = 10
indicators = ['st','nd','rd']

columns = ['city']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1,indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
vinaValpoSortedVenues = pd.DataFrame(columns=columns)
vinaValpoSortedVenues['city'] = vinaValpoGrouped['city']

for ind in np.arange(vinaValpoGrouped.shape[0]):
    vinaValpoSortedVenues.iloc[ind,1:] = return_most_common_venues(vinaValpoGrouped.iloc[ind,:],num_top_venues)
    
vinaValpoSortedVenues.head()

Unnamed: 0,city,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Cerro Concepción,Dessert Shop,Yoga Studio,Ice Cream Shop,Diner,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Forest
1,Valparaiso,Café,Yoga Studio,Cupcake Shop,Diner,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Forest
2,Valparaíso,Restaurant,Pizza Place,Hotel,Café,Peruvian Restaurant,Neighborhood,Bar,Bakery,Japanese Restaurant,Bed & Breakfast
3,Viña Del Mar,Burger Joint,Yoga Studio,Cupcake Shop,Diner,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Forest
4,Viña del Mar,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bed & Breakfast,Hotel,Ice Cream Shop,Pizza Place,Diner,Park,Burger Joint
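
The column headers above are built with a `try`/`except` around a three-item suffix list, which works for the first ten positions. As a sketch only, a small helper could produce the ordinal suffix for any position; the name `ordinal` is an assumption of this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f"{n}{suffix}"


num_top_venues = 10
columns = ['city'] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
```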


### Now I will cluster venues to get a visual representation

In [208]:
kclusters = 5
vinaValpoClustering = vinaValpoGrouped.drop(columns='city')
kmeans = KMeans(n_clusters=kclusters,random_state=0).fit(vinaValpoClustering)
kmeans.labels_[0:10]

array([2, 1, 4, 3, 0], dtype=int32)
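
With five grouped rows and `kclusters = 5`, every row lands in a cluster of its own, so the labels above carry no more information than the city names. A quick sanity check, sketched with a hypothetical helper, makes this visible.

```python
def is_trivial_clustering(labels):
    """True when every sample received its own cluster label."""
    return len(set(labels)) == len(labels)


labels = [2, 1, 4, 3, 0]  # kmeans.labels_ as printed above
trivial = is_trivial_clustering(labels)
```

A more informative clustering would need more rows than clusters, for example by grouping venues into smaller areas than whole cities.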

In [240]:
vinaValpoSortedVenues.insert(0,'Cluster Labels',kmeans.labels_)
vinaValpoSortedVenues

Unnamed: 0,Cluster Labels,city,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Cerro Concepción,Dessert Shop,Yoga Studio,Ice Cream Shop,Diner,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Forest
1,1,Valparaiso,Café,Yoga Studio,Cupcake Shop,Diner,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Forest
2,4,Valparaíso,Restaurant,Pizza Place,Hotel,Café,Peruvian Restaurant,Neighborhood,Bar,Bakery,Japanese Restaurant,Bed & Breakfast
3,3,Viña Del Mar,Burger Joint,Yoga Studio,Cupcake Shop,Diner,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Forest
4,0,Viña del Mar,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bed & Breakfast,Hotel,Ice Cream Shop,Pizza Place,Diner,Park,Burger Joint


In [241]:
vinaValpoMerged3 = dataframeValpoVina
vinaValpoMerged3 = vinaValpoMerged3.join(vinaValpoSortedVenues.set_index('city'), on='city')
vinaValpoMerged3.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Frank Hostel,Hostel,Avenida Valparaiso,Etchevers,-33.024464,-71.554652,"[{'label': 'display', 'lat': -33.0244636735144...",262,2571511.0,CL,...,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bed & Breakfast,Hotel,Ice Cream Shop,Pizza Place,Diner,Park,Burger Joint
1,Purolivo,Gourmet Shop,Galería Somar,Loc. 6-9,-33.024226,-71.553209,"[{'label': 'display', 'lat': -33.0242261422981...",133,,CL,...,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bed & Breakfast,Hotel,Ice Cream Shop,Pizza Place,Diner,Park,Burger Joint
2,Panzoni,Italian Restaurant,Paseo Cousiño 12,e/ Viana y Av. Valparaíso,-33.025731,-71.553494,"[{'label': 'display', 'lat': -33.0257310185469...",199,,CL,...,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bed & Breakfast,Hotel,Ice Cream Shop,Pizza Place,Diner,Park,Burger Joint
3,Fuente de Soda Cevasco,Hot Dog Joint,Av. Valparaiso 700,,-33.024952,-71.552665,"[{'label': 'display', 'lat': -33.0249521757472...",86,,CL,...,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bed & Breakfast,Hotel,Ice Cream Shop,Pizza Place,Diner,Park,Burger Joint
4,Déjà Vu,Latin American Restaurant,"Calle Viana 144, 2do. piso",Paseo Cousiño,-33.025785,-71.553563,"[{'label': 'display', 'lat': -33.0257852977384...",208,,CL,...,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bed & Breakfast,Hotel,Ice Cream Shop,Pizza Place,Diner,Park,Burger Joint


In [246]:
vinaValpoMerged3['Cluster Labels']

0      0.0
1      0.0
2      0.0
3      0.0
4      0.0
      ... 
195    4.0
196    NaN
197    4.0
198    4.0
199    4.0
Name: Cluster Labels, Length: 200, dtype: float64
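
The `NaN` at row 196 comes from the join on `city`: that venue has an empty city field, so no cluster label is found for it. Care is needed when such rows are dropped, because removing missing values from the label column alone shifts every later label onto the wrong venue. The sketch below uses toy rows (plain dictionaries, an assumption of this example) to show the difference.

```python
# Toy rows in the spirit of the merged table: one row has no cluster label
rows = [
    {'name': 'A', 'cluster': 4.0},
    {'name': 'B', 'cluster': None},  # no match in the join
    {'name': 'C', 'cluster': 4.0},
    {'name': 'D', 'cluster': 0.0},
]

names = [r['name'] for r in rows]
labels_only = [r['cluster'] for r in rows if r['cluster'] is not None]

# Zipping all names with the filtered labels shifts every label after the gap
misaligned = list(zip(names, labels_only))

# Filtering whole rows keeps each name with its own label
aligned = [(r['name'], int(r['cluster'])) for r in rows if r['cluster'] is not None]
```

In pandas, filtering whole rows corresponds to calling `dropna(subset=['Cluster Labels'])` on the DataFrame before iterating over its columns.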

In [248]:
map_clusters = folium.Map(location=[latitudeValpoVina,longitudeValpoVina],zoom_start=11)
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0,1,len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
# Drop rows without a cluster label as whole rows, so coordinates, names and labels stay aligned
clustered = vinaValpoMerged3.dropna(subset=['Cluster Labels'])
for lat,lon,poi,cluster in zip(clustered['lat'],clustered['lng'],clustered['name'],clustered['Cluster Labels']):
    cluster = int(cluster)
    label = folium.Popup(str(poi)+ ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat,lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

In [249]:
map_clusters = folium.Map(location=[latitudeValpoVina,longitudeValpoVina],zoom_start=11)
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0,1,len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
# Drop rows without a cluster label as whole rows, so coordinates, names and labels stay aligned
clustered = vinaValpoMerged3.dropna(subset=['Cluster Labels'])
for lat,lon,poi,cluster in zip(clustered['lat'],clustered['lng'],clustered['name'],clustered['Cluster Labels']):
    cluster = int(cluster)
    label = folium.Popup(str(poi)+ ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat,lon],
        radius=15,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters