## Project Neighborhood
This project involves scraping a webpage to get list of coordinates of Barcelona barios and use those coordinates to match the neighborhoods in Foursquare API. 

In [112]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import json
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # to convert address in latitude and longitude value
# for plotting     
import matplotlib.cm as cm
import matplotlib.colors as colors
# for K-means clustering
from sklearn.cluster import KMeans
# for mapping with folium
!conda install -c conda-forge folium=0.5.0 --yes
import folium
from sklearn import preprocessing
print('Libraries imported')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.16.0                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported


The above cell takes care of the necessary libraries that need to be imported. The next step is to download the link and store the URL into the object wikipedia_link

Now, since the necessary libraries are imported we are going to import the data of Barcelona districts and their respective coordinates. The list of postal codes and their 
respective coordinates have been obtained courtesy of the website: https://www.aggdata.com/free/spain-postal-codes in a csv format. However this data consists of all the places in deifferent states in Spain, we will only use the ones for Barcelona. We will use this data as a dataframe 
for our analysis. We will use the following line of code to import it as dataframe from the data downloaded from internet and uploaded to the watson server

In [113]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postal Code,Place Name,State,County,City,Latitude,Longitude
0,1001,Vitoria-Gasteiz,Pais Vasco,PV,Álava,42.85,-2.6667
1,1002,Vitoria-Gasteiz,Pais Vasco,PV,Álava,42.85,-2.6667
2,1003,Vitoria-Gasteiz,Pais Vasco,PV,Álava,42.85,-2.6667
3,1004,Vitoria-Gasteiz,Pais Vasco,PV,Álava,42.85,-2.6667
4,1005,Vitoria-Gasteiz,Pais Vasco,PV,Álava,42.8435,-2.6748


Now, the dataframe is filtered for only for the City of Barcelona 

In [114]:
barna_df = df_data_1[df_data_1['City']=='Barcelona']
barna_df = barna_df.reset_index(drop=True)
barna_df.head()

Unnamed: 0,Postal Code,Place Name,State,County,City,Latitude,Longitude
0,8001,Barcelona,Cataluna,CT,Barcelona,41.3818,2.1685
1,8002,Barcelona,Cataluna,CT,Barcelona,41.3838,2.1744
2,8003,Barcelona,Cataluna,CT,Barcelona,41.3862,2.1799
3,8004,Barcelona,Cataluna,CT,Barcelona,41.3765,2.1669
4,8005,Barcelona,Cataluna,CT,Barcelona,41.3944,2.1874


Lets see how the postal codes look like in a map of Barcelona

In [115]:
address    = 'Barcelona, Spain'
geolocator = Nominatim()
location   = geolocator.geocode(address)
latitude   = location.latitude
longitude  = location.longitude

print('The coordinates of Barcelona are {}, {}.'.format(latitude, longitude))

map_barna= folium.Map(location=[latitude,longitude], zoom_start=12)

#adding markers to the Map
for lat, long, label in zip(barna_df['Latitude'], barna_df['Longitude'], barna_df['Place Name']):
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
    [lat,long],
    radius=5,
    popup=label,
    colour='blue',
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_barna)

map_barna




The coordinates of Barcelona are 41.3828939, 2.1774322.


As we can see from the map that Barcelona postal codes also include the suburbs and small towns which fall under the bigger city of Barcelona. However we are only interested in the main city of Barcelona. So will filter only those areas of the city. Further we will rename the place name ( which is same for all the 'barios' in the above dataframe ) to the actual 'barios' names matching with a different dataset

In [116]:
barna_df =barna_df.loc[barna_df['Place Name'].isin(['Barcelona'])]
barna_df =barna_df.reset_index(drop=True)

In [117]:
barna_df.head()

Unnamed: 0,Postal Code,Place Name,State,County,City,Latitude,Longitude
0,8001,Barcelona,Cataluna,CT,Barcelona,41.3818,2.1685
1,8002,Barcelona,Cataluna,CT,Barcelona,41.3838,2.1744
2,8003,Barcelona,Cataluna,CT,Barcelona,41.3862,2.1799
3,8004,Barcelona,Cataluna,CT,Barcelona,41.3765,2.1669
4,8005,Barcelona,Cataluna,CT,Barcelona,41.3944,2.1874


In [118]:
barna_df.drop(['State','County','City'],axis=1,inplace=True)

In [119]:
barna_df.shape

(45, 4)

Manually, we will insert the names of Place Names with their actual names using google map reference. ( It is done manually because there is no concise data available for all the districts poste code wise and also since the number of post codes are not many, it is feasible )

In [None]:
barna_df['Place Name'][0]='El Raval'
barna_df['Place Name'][1]='Gothic Quarter' 
barna_df['Place Name'][2]='Ciutat Vella, La Barcelonata'
barna_df['Place Name'][3]='El Poble Sec' 
barna_df['Place Name'][4]='El Poble Nou'
barna_df['Place Name'][5]='Sant Gervasi, Hospital Plato' 
barna_df['Place Name'][6]='City Center, Passeig de Gracia'
barna_df['Place Name'][7]='Diagonal' 
barna_df['Place Name'][8]='Girona, Jardins de Jaume Perich, Carrer d\'Arago'
barna_df['Place Name'][9]='placa de Tetuan, Urquinaona' 
barna_df['Place Name'][10]='Urgell,Jardin de Cesar Martinell'
barna_df['Place Name'][11]='Vila de Gracia' 
barna_df['Place Name'][12]='Fort Pienc, Sagrada Familia'
barna_df['Place Name'][13]='Sants, Taragona, Placa del Centre' 
barna_df['Place Name'][14]='Sant Antoni, Rocafort'
barna_df['Place Name'][15]='Porta,Virrei Amat, La prosperitat ' 
barna_df['Place Name'][16]='Vallvidrera'
barna_df['Place Name'][17]='Sant Marti, La Verneda, Glories, Arc de Triomf' 
barna_df['Place Name'][18]='El Besos I El Maresme, Paolo Alto Market'
barna_df['Place Name'][19]='La Verneda I La Pau, Sant Marti de Provencals' 
barna_df['Place Name'][20]='Sant Gervasi-Galvany, La Bonanova, Turo Parc'
barna_df['Place Name'][21]='Sant Gervasi-La Bonanova, Av. Tibidabo,  El Putxet' 
barna_df['Place Name'][22]='Vallcarca I Els Penitents, El parc del Turo del Putxet, El Coll'
barna_df['Place Name'][23]='Gracia, Can Baro, Alfons X' 
barna_df['Place Name'][24]='El Baix Guinardo, El Camp D\'En Grassot I Gracia  Nova'
barna_df['Place Name'][25]='El Camp de L\'Arpa del Clot' 
barna_df['Place Name'][26]='La Sagrega'
barna_df['Place Name'][27]='Les Corts, La Maternitat I Sant Ramon, Sants-Montjuic' 
barna_df['Place Name'][28]='La Nova Esquerra de L\'Exiample, Entenca'
barna_df['Place Name'][29]='Sant Adreu, Baro de Viver' 
barna_df['Place Name'][30]='Vilapicina, Can Peguera, El Turo de La Peira'
barna_df['Place Name'][31]='El Carmel, Horta, La Font D\'En Fargues' 
barna_df['Place Name'][32]='La Trinitat Nova, Trinitat Vella, Torre Baro, Vallbona'
barna_df['Place Name'][33]='Saria, Pedralbes, Ciutat Universitaria' 
barna_df['Place Name'][34]='Horta, Sant Genis Dels Agudells, La Teixonera'
barna_df['Place Name'][35]='Hospital Clinic, La Fira' 
barna_df['Place Name'][36]='Verdaguer, Passeig de Sant Joan'
barna_df['Place Name'][37]='Parc de Montjuic, La Marina de Port, La Marina del Prat Vermell' 
barna_df['Place Name'][38]='Port de Barcelona'
barna_df['Place Name'][39]='Zona Franca' 
barna_df['Place Name'][40]='El Guinardo'
barna_df['Place Name'][41]='Les Roquetes'


In [121]:
barna_df.drop(barna_df.index[42], inplace=True)


In [122]:
barna_df.drop(barna_df.index[42], inplace=True)


In [123]:
barna_df.drop(barna_df.index[42], inplace=True)


In [124]:
barna_df = barna_df.rename(columns={"Place Name": "Neighborhood"})
barna_df.head()

Unnamed: 0,Postal Code,Neighborhood,Latitude,Longitude
0,8001,El Raval,41.3818,2.1685
1,8002,Gothic Quarter,41.3838,2.1744
2,8003,"Ciutat Vella, La Barcelonata",41.3862,2.1799
3,8004,El Poble Sec,41.3765,2.1669
4,8005,El Poble Nou,41.3944,2.1874


Login to Foursquare API and get the requests

In [125]:
# The code was removed by Watson Studio for sharing.

Lets check a data from Foursquare request

In [126]:
#creating url and requesiting the API data
neighborhood_lng = barna_df.loc[0, 'Longitude']
neighborhood_lat = barna_df.loc[0, 'Latitude']
radius = 500
Query   = 'coffee'
#create request
url='https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    API_VERSION,
    neighborhood_lat,
    neighborhood_lng,
    radius,
    LIMIT,
    Query)
results = requests.get(url).json()


Now, Lets get all the venues 

In [127]:
def getNearbyvenues( names, latitudes, longitudes, radius = 500 ):
    
    venues_list=[]
    for name, lat, lng, in zip( names, latitudes, longitudes ):
        print(name)
        #create request
        url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            API_VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        
        #make get request
        results=requests.get(url).json()["response"]['groups'][0]['items']
      
        #returning only the relevant information
        venues_list.append([(name,lat,lng,v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],
                             v['venue']['categories'][0]['name'], v['venue']['id'] ) for v in results])
        
        nearby_venues= pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns=['Neighbourhood',
                               'Neighbourhood latitude',
                               'Neighbourhood longitude',
                               'Venue',
                               'Venue Latitude',
                               'Venue Longitude',
                               'Venue category',
                               'Id']
    return(nearby_venues)

In [128]:
barcelona_venues=getNearbyvenues(names=barna_df['Neighborhood'], latitudes=barna_df['Latitude'],longitudes=barna_df['Longitude'])

El Raval
Gothic Quarter
Ciutat Vella, La Barcelonata
El Poble Sec
El Poble Nou
Sant Gervasi, Hospital Plato
City Center, Passeig de Gracia
Diagonal
Girona, Jardins de Jaume Perich, Carrer d'Arago
placa de Tetuan, Urquinaona
Urgell,Jardin de Cesar Martinell
Vila de Gracia
Fort Pienc, Sagrada Familia
Sants, Taragona, Placa del Centre
Sant Antoni, Rocafort
Porta,Virrei Amat, La prosperitat 
Vallvidrera
Sant Marti, La Verneda, Glories, Arc de Triomf
El Besos I El Maresme, Paolo Alto Market
La Verneda I La Pau, Sant Marti de Provencals
Sant Gervasi-Galvany, La Bonanova, Turo Parc
Sant Gervasi-La Bonanova, Av. Tibidabo,  El Putxet
Vallcarca I Els Penitents, El parc del Turo del Putxet, El Coll
Gracia, Can Baro, Alfons X
El Baix Guinardo, El Camp D'En Grassot I Gracia  Nova
El Camp de L'Arpa del Clot
La Sagrega
Les Corts, La Maternitat I Sant Ramon, Sants-Montjuic
La Nova Esquerra de L'Exiample, Entenca
Sant Adreu, Baro de Viver
Vilapicina, Can Peguera, El Turo de La Peira
El Carmel, Horta, L

In [129]:
barcelona_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood latitude,Neighbourhood longitude,Venue,Venue Latitude,Venue Longitude,Venue category,Id
0,El Raval,41.3818,2.1685,La Central del Raval,41.383586,2.168949,Bookstore,4b71cd74f964a5204b5d2de3
1,El Raval,41.3818,2.1685,Chivuo's,41.382961,2.169948,Sandwich Place,567873b9498e9f35c38e192e
2,El Raval,41.3818,2.1685,Llop,41.381849,2.169511,Restaurant,59de559766fc6558b0956589
3,El Raval,41.3818,2.1685,Caravelle,41.382337,2.168818,Restaurant,5040f5e5e4b06be18fa8af73
4,El Raval,41.3818,2.1685,A Tu Bola,41.380096,2.169054,Tapas Restaurant,52ffc95e498e5f219673b9d1


In [130]:
barcelona_venues.groupby('Neighbourhood').count().head() #counting the number of venues in each neighborhood

Unnamed: 0_level_0,Neighbourhood latitude,Neighbourhood longitude,Venue,Venue Latitude,Venue Longitude,Venue category,Id
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
"City Center, Passeig de Gracia",100,100,100,100,100,100,100
"Ciutat Vella, La Barcelonata",100,100,100,100,100,100,100
Diagonal,100,100,100,100,100,100,100
"El Baix Guinardo, El Camp D'En Grassot I Gracia Nova",100,100,100,100,100,100,100
"El Besos I El Maresme, Paolo Alto Market",57,57,57,57,57,57,57


Next we get the results for the number of ratings for each venue ( which will give us the faint idea of most visited places ) and the cost ( whether expensive or cheap ) 

In [131]:
# lets get those data of expense and number of ratings
#results = []
#for idt in barcelona_venues['Id']:
#    url='https://api.foursquare.com/v2/venues/'+idt+'?&client_id={}&client_secret={}&v={}&ll={},{}'.format(
#        CLIENT_ID,
#        CLIENT_SECRET,
#        API_VERSION,
#        neighborhood_lat,
#        neighborhood_lng)
#    try:
#        results.append(requests.get(url).json()["response"]['venue']['attributes']['groups'][0]['items'][0]['priceTier'])
#    except:
#        results.append('Na')


In [132]:
#analysing neighborhoods
barcelona_onehot = pd.get_dummies(barcelona_venues[['Venue category']], prefix = '', prefix_sep='')
barcelona_onehot['Neighbourhood']=barcelona_venues['Neighbourhood']
fixed_columns = [barcelona_onehot.columns[-1]]+list(barcelona_onehot.columns[:-1])
barcelona_onehot= barcelona_onehot[fixed_columns]
barcelona_onehot.shape
#grouping by neighbourhood
barcelona_grouped=barcelona_onehot.groupby('Neighbourhood').mean().reset_index()
barcelona_grouped

Unnamed: 0,Neighbourhood,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"City Center, Passeig de Gracia",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0
1,"Ciutat Vella, La Barcelonata",0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.02,...,0.0,0.0,0.03,0.0,0.0,0.0,0.04,0.01,0.0,0.0
2,Diagonal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0
3,"El Baix Guinardo, El Camp D'En Grassot I Graci...",0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01
4,"El Besos I El Maresme, Paolo Alto Market",0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,...,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,El Camp de L'Arpa del Clot,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01
6,"El Carmel, Horta, La Font D'En Fargues",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,El Guinardo,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,...,0.017857,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0
8,El Poble Nou,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,El Poble Sec,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0


In [133]:
def return_most_common_venues(row, num_top_venues):
    row_categories=row.iloc[1:]
    row_categories_sorted=row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [134]:
#printing most common venues
num_top_venues=10
for hood in barcelona_grouped['Neighbourhood']:
    print('____'+hood+'____')
    temp=barcelona_grouped[barcelona_grouped['Neighbourhood']==hood].T.reset_index()
    temp.columns=['venue','freq']
    temp=temp.iloc[1:]
    temp['freq']=temp['freq'].astype(float)
    temp=temp.round({'freq':2})
    print(temp.sort_values('freq',ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
#sorting in descending order
num_top_venues=10
indicators=['st','nd','rd']
columns=['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append("{}{} Most Common Venue".format(ind+1,indicators[ind]))
    except:
        columns.append("{}th Most Common Venue".format(ind+1))
#new dataframe
neighbourhoods_venuessorted = pd.DataFrame(columns=columns)
neighbourhoods_venuessorted['Neighbourhood']=barcelona_grouped['Neighbourhood']
for ind in np.arange(barcelona_grouped.shape[0]):
    neighbourhoods_venuessorted.iloc[ind,1:] = return_most_common_venues(barcelona_grouped.iloc[ind, :], num_top_venues)
neighbourhoods_venuessorted

____City Center, Passeig de Gracia____
                      venue  freq
0                     Hotel  0.16
1            Clothing Store  0.06
2  Mediterranean Restaurant  0.05
3                Restaurant  0.04
4                    Hostel  0.04
5          Tapas Restaurant  0.04
6       Sporting Goods Shop  0.03
7             Women's Store  0.03
8              Cocktail Bar  0.03
9        Spanish Restaurant  0.03


____Ciutat Vella, La Barcelonata____
                venue  freq
0    Tapas Restaurant  0.11
1                 Bar  0.07
2        Cocktail Bar  0.06
3            Wine Bar  0.04
4  Italian Restaurant  0.04
5               Hotel  0.03
6                Café  0.03
7  Spanish Restaurant  0.03
8         Coffee Shop  0.03
9         Pizza Place  0.03


____Diagonal____
                      venue  freq
0                     Hotel  0.18
1                  Boutique  0.10
2  Mediterranean Restaurant  0.06
3        Spanish Restaurant  0.05
4          Tapas Restaurant  0.04
5       Japanese 

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"City Center, Passeig de Gracia",Hotel,Clothing Store,Mediterranean Restaurant,Restaurant,Hostel,Tapas Restaurant,Cocktail Bar,Café,Bookstore,Spanish Restaurant
1,"Ciutat Vella, La Barcelonata",Tapas Restaurant,Bar,Cocktail Bar,Wine Bar,Italian Restaurant,Pizza Place,Dessert Shop,Spanish Restaurant,Vegetarian / Vegan Restaurant,Mediterranean Restaurant
2,Diagonal,Hotel,Boutique,Mediterranean Restaurant,Spanish Restaurant,Japanese Restaurant,Tapas Restaurant,Hostel,Restaurant,Women's Store,Bakery
3,"El Baix Guinardo, El Camp D'En Grassot I Graci...",Italian Restaurant,Restaurant,Coffee Shop,Plaza,Burger Joint,Hotel,Deli / Bodega,Bar,Latin American Restaurant,Café
4,"El Besos I El Maresme, Paolo Alto Market",Mediterranean Restaurant,Pizza Place,Restaurant,Spanish Restaurant,Italian Restaurant,Bar,Bakery,Indian Restaurant,Food & Drink Shop,Café
5,El Camp de L'Arpa del Clot,Café,Hotel,Spanish Restaurant,Restaurant,Grocery Store,Mediterranean Restaurant,Mexican Restaurant,Tapas Restaurant,Coffee Shop,Plaza
6,"El Carmel, Horta, La Font D'En Fargues",Spanish Restaurant,Hotel,Sporting Goods Shop,Soccer Stadium,Mediterranean Restaurant,Soccer Field,Bakery,Hockey Arena,Cocktail Bar,Coffee Shop
7,El Guinardo,Bakery,Hotel,Restaurant,Italian Restaurant,Gastropub,Mediterranean Restaurant,Grocery Store,Supermarket,Spanish Restaurant,Gym
8,El Poble Nou,Spanish Restaurant,Hostel,Coffee Shop,Rock Club,Café,Hotel,Music Venue,Burger Joint,Tapas Restaurant,Bistro
9,El Poble Sec,Tapas Restaurant,Cocktail Bar,Bar,Theater,Spanish Restaurant,Mediterranean Restaurant,Café,Hotel,Restaurant,Beer Bar


In [135]:
# Kmeans clustering
Kclusters = 10
barcelona_grouped_clustering= barcelona_grouped.drop('Neighbourhood',1)
#run k means
kmeans=KMeans(n_clusters=Kclusters, random_state=0).fit(barcelona_grouped_clustering)
kmeans.labels_[0:10]


array([1, 8, 1, 7, 7, 7, 7, 7, 7, 8], dtype=int32)

In [136]:
barcelona_merged=barna_df

barcelona_merged['Cluster Labels']=kmeans.labels_
neighbourhoods_venuessorted.shape
barcelona_merged=barcelona_merged.join(neighbourhoods_venuessorted.set_index('Neighbourhood'), on='Neighborhood')
barcelona_merged

Unnamed: 0,Postal Code,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,8001,El Raval,41.3818,2.1685,1,Hotel,Tapas Restaurant,Spanish Restaurant,Bar,Cocktail Bar,Bookstore,Vegetarian / Vegan Restaurant,Coffee Shop,Italian Restaurant,Café
1,8002,Gothic Quarter,41.3838,2.1744,8,Plaza,Hotel,Tapas Restaurant,Ice Cream Shop,Spanish Restaurant,Dessert Shop,Café,Japanese Restaurant,Gift Shop,Coffee Shop
2,8003,"Ciutat Vella, La Barcelonata",41.3862,2.1799,1,Tapas Restaurant,Bar,Cocktail Bar,Wine Bar,Italian Restaurant,Pizza Place,Dessert Shop,Spanish Restaurant,Vegetarian / Vegan Restaurant,Mediterranean Restaurant
3,8004,El Poble Sec,41.3765,2.1669,7,Tapas Restaurant,Cocktail Bar,Bar,Theater,Spanish Restaurant,Mediterranean Restaurant,Café,Hotel,Restaurant,Beer Bar
4,8005,El Poble Nou,41.3944,2.1874,7,Spanish Restaurant,Hostel,Coffee Shop,Rock Club,Café,Hotel,Music Venue,Burger Joint,Tapas Restaurant,Bistro
5,8006,"Sant Gervasi, Hospital Plato",41.3975,2.1535,7,Restaurant,Furniture / Home Store,Cocktail Bar,Japanese Restaurant,Mediterranean Restaurant,Tapas Restaurant,Spanish Restaurant,Hotel,Café,Ice Cream Shop
6,8007,"City Center, Passeig de Gracia",41.39,2.1682,7,Hotel,Clothing Store,Mediterranean Restaurant,Restaurant,Hostel,Tapas Restaurant,Cocktail Bar,Café,Bookstore,Spanish Restaurant
7,8008,Diagonal,41.3936,2.1595,7,Hotel,Boutique,Mediterranean Restaurant,Spanish Restaurant,Japanese Restaurant,Tapas Restaurant,Hostel,Restaurant,Women's Store,Bakery
8,8009,"Girona, Jardins de Jaume Perich, Carrer d'Arago",41.3897,2.166,7,Hotel,Clothing Store,Bookstore,Restaurant,Mediterranean Restaurant,Tapas Restaurant,Women's Store,Seafood Restaurant,Cocktail Bar,Bakery
9,8010,"placa de Tetuan, Urquinaona",41.3902,2.1733,8,Hotel,Café,Restaurant,Hostel,Coffee Shop,Spanish Restaurant,Tapas Restaurant,Bed & Breakfast,Vegetarian / Vegan Restaurant,Supermarket


In [137]:
# visulaize clusters
map_clusters=folium.Map(location=[latitude,longitude],zoom_start=12)
x = np.arange(Kclusters)
ys=[i+x+(i*x)**2 for i in range(Kclusters)]
colors_array=cm.rainbow(np.linspace(0,1,len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
#add markers
markers_colors=[]
for lat, lon, poi, cluster in zip(barcelona_merged['Latitude'],barcelona_merged['Longitude'], barcelona_merged['Neighborhood'], barcelona_merged['Cluster Labels']):
    label=folium.Popup(str(poi)+' Cluster '+str(cluster),parse_html=True)
    folium.CircleMarker(
        [lat,lon],
        radiesu=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

Now we will input the prices of rent per square meters for each district and then cluster the neighborhoods to see which neighborhoods are most similar. The data over rent prices are from the last quarter of 2017 and have been obtained from Open Data BCN  

In [138]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Dte.,Barris,1r Trimestre,2n Trimestre,3r Trimestre,4rt Trimestre,Acumulat/1r Trimestre,Acumulat/2n Trimestre,Acumulat/3r Trimestre,Acumulat/4rt Trimestre
0,BARCELONA,,13,,,,,,,
1,1,1. el Raval,135,,,,,,,
2,1,2. el Barri Gòtic,141,,,,,,,
3,1,3. la Barceloneta,195,,,,,,,
4,1,"4. Sant Pere, Santa Caterina i la Ribera",15,,,,,,,


In [139]:
df_data1 = df_data_2[df_data_2.columns[1:3]]
df_data3 = df_data1.rename(columns={'1r Trimestre': 'Price per sq. m'})
df_data3.head()

Unnamed: 0,Barris,Price per sq. m
0,,13
1,1. el Raval,135
2,2. el Barri Gòtic,141
3,3. la Barceloneta,195
4,"4. Sant Pere, Santa Caterina i la Ribera",15


In [140]:
df_data3.drop(df_data3.index[0], inplace=True)

In [141]:
df_data4 = df_data3[df_data3['Price per sq. m'].str.contains("nd") == False] # droppinf rows with NaN or Nd values 
# next step is to select only those rows which partially match the rows of the main barcelona Table
# cleaning the data
df_data4.loc[:,'Barris']= [x.split('.')[1] for x in df_data4['Barris']]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item_labels[indexer[info_axis]]] = value


In [None]:
# adding the datasets manually
df5 = barcelona_merged
df5['Rent per sq m'] = 1
df5['Rent per sq m'][1]  = 19.5
df5['Rent per sq m'][10] = 13.5
df5['Rent per sq m'][13] = 14.1
df5['Rent per sq m'][11] = 13
df5['Rent per sq m'][17] = 13
df5['Rent per sq m'][0]  = 19.8
df5['Rent per sq m'][3]  = 12.7
df5['Rent per sq m'][2]  = 12.3
df5['Rent per sq m'][4]  = 9.5
df5['Rent per sq m'][5]  = 12.9
df5['Rent per sq m'][6]  = 11.2
df5['Rent per sq m'][7]  = 19.5
df5['Rent per sq m'][8]  = 13.4
df5['Rent per sq m'][9]  = 13
df5['Rent per sq m'][11] = 13.5
df5['Rent per sq m'][12] = 12.8
df5['Rent per sq m'][14] = 11.9
df5['Rent per sq m'][15] = 11.2
df5['Rent per sq m'][16] = 13.2
df5['Rent per sq m'][18] = 9.8
df5['Rent per sq m'][19] = 10
df5['Rent per sq m'][20] = 13.2
df5['Rent per sq m'][21] = 13.6
df5['Rent per sq m'][22] = 9.4
df5['Rent per sq m'][23] = 9.9
df5['Rent per sq m'][24] = 11.5
df5['Rent per sq m'][25] = 11.6
df5['Rent per sq m'][26] = 12.4
df5['Rent per sq m'][27] = 12.4
df5['Rent per sq m'][28] = 14.5
df5['Rent per sq m'][29] = 14.6
df5['Rent per sq m'][30] = 15
df5['Rent per sq m'][31] = 13.2
df5['Rent per sq m'][32] = 12.4
df5['Rent per sq m'][33] = 16
df5['Rent per sq m'][34] = 14
df5['Rent per sq m'][35] = 14.1
df5['Rent per sq m'][36] = 10.1
df5['Rent per sq m'][37] = 9.8
df5['Rent per sq m'][38] = 14.9
df5['Rent per sq m'][39] = 11.2
df5['Rent per sq m'][40] = 11.1
df5['Rent per sq m'][41] = 13.7


In [149]:
df5.head()

Unnamed: 0,Postal Code,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Rent per sq m
0,8001,El Raval,41.3818,2.1685,1,Hotel,Tapas Restaurant,Spanish Restaurant,Bar,Cocktail Bar,Bookstore,Vegetarian / Vegan Restaurant,Coffee Shop,Italian Restaurant,Café,19
1,8002,Gothic Quarter,41.3838,2.1744,8,Plaza,Hotel,Tapas Restaurant,Ice Cream Shop,Spanish Restaurant,Dessert Shop,Café,Japanese Restaurant,Gift Shop,Coffee Shop,19
2,8003,"Ciutat Vella, La Barcelonata",41.3862,2.1799,1,Tapas Restaurant,Bar,Cocktail Bar,Wine Bar,Italian Restaurant,Pizza Place,Dessert Shop,Spanish Restaurant,Vegetarian / Vegan Restaurant,Mediterranean Restaurant,12
3,8004,El Poble Sec,41.3765,2.1669,7,Tapas Restaurant,Cocktail Bar,Bar,Theater,Spanish Restaurant,Mediterranean Restaurant,Café,Hotel,Restaurant,Beer Bar,12
4,8005,El Poble Nou,41.3944,2.1874,7,Spanish Restaurant,Hostel,Coffee Shop,Rock Club,Café,Hotel,Music Venue,Burger Joint,Tapas Restaurant,Bistro,9


In [150]:
# normalizing the rent column
def standardize( df, label):
    """
    standardizes a series with name ``label'' within the pd.DataFrame
    ``df''.
    """
    df = df.copy(deep=True)
    series = df.loc[:, label]
    avg = series.mean()
    stdv = series.std()
    series_standardized = (series - avg)/ stdv
    return series_standardized
Out = standardize(df5, 'Rent per sq m')
new=[]
for i in df5['Rent per sq m']:
    new.append(Out[i])

In [151]:
# final result , Hence we can show the similarity or dissimilarity between different neighborhoods alongwith Rent prices, whcih will recommend the person where to get a house
df5['Rent per sq m'] = new

In [155]:
df5.head()

Unnamed: 0,Postal Code,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Rent per sq m
0,8001,El Raval,41.3818,2.1685,1,Hotel,Tapas Restaurant,Spanish Restaurant,Bar,Cocktail Bar,Bookstore,Vegetarian / Vegan Restaurant,Coffee Shop,Italian Restaurant,Café,-1.036621
1,8002,Gothic Quarter,41.3838,2.1744,8,Plaza,Hotel,Tapas Restaurant,Ice Cream Shop,Spanish Restaurant,Dessert Shop,Café,Japanese Restaurant,Gift Shop,Coffee Shop,-1.036621
2,8003,"Ciutat Vella, La Barcelonata",41.3862,2.1799,1,Tapas Restaurant,Bar,Cocktail Bar,Wine Bar,Italian Restaurant,Pizza Place,Dessert Shop,Spanish Restaurant,Vegetarian / Vegan Restaurant,Mediterranean Restaurant,-0.23036
3,8004,El Poble Sec,41.3765,2.1669,7,Tapas Restaurant,Cocktail Bar,Bar,Theater,Spanish Restaurant,Mediterranean Restaurant,Café,Hotel,Restaurant,Beer Bar,-0.23036
4,8005,El Poble Nou,41.3944,2.1874,7,Spanish Restaurant,Hostel,Coffee Shop,Rock Club,Café,Hotel,Music Venue,Burger Joint,Tapas Restaurant,Bistro,0.17277
