# Business Problem

The following analysis finds similar areas in the city of Madrid (Spain), where similarity is measured by the venues contained in each area.

This analysis will be useful for a real estate business that wants to advise potential customers on which areas of Madrid best match their preferences in terms of services, businesses, or any other venue found in each area.

An example of why this is useful for the customer is the following: the customer likes a certain area, but its prices are too high. The customer then asks the real estate agency which other areas are the best alternatives (i.e. the most similar). We will use this work to find the closest alternatives to the reference area provided by the customer (his/her favorite, but expensive, area to live in).

# Location Data used to perform the analysis

Our first step will be to divide the city of Madrid into areas distributed across the whole territory. For that purpose, we will use all the postal codes that exist in the city.

A complete list of the postal codes can be found here:
http://distritopostal.es/madrid/madrid
    
There are 55 in total, which will be written directly in the program. The list used in the code below also includes the code 28000, giving 56 entries.

Once we have the postal codes ready to be used, the geopy Nominatim geocoder and the Foursquare API will be used to retrieve the following information:
- Coordinates of each postal code (Nominatim).
- Venues that are located in each postal code area (Foursquare).
- The coordinates of each individual venue (Foursquare).

We will use a radius of 1000 meters to retrieve the data around each postal code area. 

For **example**, we will take the postal code **28003** in Madrid, which is near the center. First, we will retrieve the latitude and longitude associated with that postal code; then we will find which venues lie within a radius of 1000 meters of that location. We will repeat the procedure for all the postal codes.
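To make the 1000 meter radius concrete, the sketch below computes the great-circle distance between two points with the haversine formula. It is not part of the original workflow (the radius filter is applied by the venue search itself); the helper names `haversine_m` and `within_radius` are hypothetical, and the two coordinate pairs are the ones that appear for 28003 and 28004 in the geocoding output of this notebook.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, point, radius_m=1000):
    """True when `point` lies no farther than `radius_m` from `center`."""
    return haversine_m(center[0], center[1], point[0], point[1]) <= radius_m


# Coordinates as they appear in this notebook's geocoding output.
center_28003 = (40.439374, -3.699367)
center_28004 = (40.428463, -3.703477)

distance = haversine_m(*center_28003, *center_28004)
print(round(distance), within_radius(center_28003, center_28004))
```

The two centers are a little over 1.2 km apart, so two 1000 meter circles drawn around them overlap. Some venues may therefore be returned for more than one postal code.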

# Methodology used to find similarities among Postal Code areas

First, we will identify the top 10 most common venues for each postal code area.

Secondly, we will group all the postal code areas into clusters according to similarities in the venues present in those areas.

We will use the K-Means algorithm to form these groups. All the postal codes will be reduced to 5 groups (or clusters).

Finally, we will show which areas are similar using the classification given by K-Means: all areas included in the same cluster will be considered similar. The customer just needs to find his/her favorite area, identified by its postal code; the other areas in the same cluster are the customer's alternatives when looking for a similar neighborhood for a future home.
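The notion of similarity behind this methodology can be illustrated with a small sketch. K-Means groups points by Euclidean distance, so two areas count as similar when their venue-frequency profiles are close. The profiles, area names, and the helpers `profile_distance` and `rank_alternatives` below are hypothetical and only illustrate the idea; the notebook itself relies on cluster membership rather than on a direct ranking.

```python
import math


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two category-frequency profiles (dicts)."""
    categories = set(profile_a) | set(profile_b)
    return math.sqrt(sum(
        (profile_a.get(c, 0.0) - profile_b.get(c, 0.0)) ** 2 for c in categories
    ))


def rank_alternatives(favorite, profiles):
    """Other areas sorted from most to least similar to `favorite`."""
    others = [area for area in profiles if area != favorite]
    return sorted(others, key=lambda area: profile_distance(profiles[favorite], profiles[area]))


# Hypothetical profiles: share of venues per category in each area.
profiles = {
    "A": {"Tapas Restaurant": 0.5, "Bar": 0.3, "Park": 0.2},
    "B": {"Tapas Restaurant": 0.4, "Bar": 0.4, "Park": 0.2},
    "C": {"Park": 0.6, "Gym": 0.4},
}

print(rank_alternatives("A", profiles))
```

Area B differs from A only slightly in two categories, while C shares a single category with A, so B ranks first.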

# Import Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


## Postal Codes in Madrid into a Data Frame

In [2]:
# Postal codes of Madrid, stored as strings so that text can be appended to them later

postcode_list = ['28001','28002','28003','28004','28005','28006','28007','28008','28009','28010','28011','28012','28013','28014',
'28015','28016','28017','28018','28019','28020','28021','28022','28023','28024','28025','28026','28027',
'28028','28029','28030','28031','28032','28033','28034','28035','28036','28037','28038','28039','28040',
'28041','28042','28043','28044','28045','28046','28047','28048','28049','28050','28051','28052','28053',
'28054','28055','28000']

postcode_list = pd.DataFrame(postcode_list)

## Foursquare API Credentials

In [45]:
# The code was removed by Watson Studio for sharing.

## Get coordinates for each Postal Code and merge them in a single Data Frame

In [4]:
address_vector = postcode_list + ' España'

geolocator = Nominatim(user_agent="foursquare_agent")

lat_lng = []

for n in address_vector[0]:
    location = geolocator.geocode(n)
    latitude = location.latitude
    longitude = location.longitude
    lat_lng.append([
        latitude,
        longitude])

lat_lng = pd.DataFrame(lat_lng)

lat_lng.columns = ['postCode_lat','postCode_lng']
postcode_list.columns = ['postCode']

postcode_list['postCode_lat'] = lat_lng['postCode_lat']
postcode_list['postCode_lng'] = lat_lng['postCode_lng']

In [5]:
postcode_list.head()

Unnamed: 0,postCode,postCode_lat,postCode_lng
0,28001,40.424066,-3.686055
1,28002,40.444363,-3.671973
2,28003,40.439374,-3.699367
3,28004,40.428463,-3.703477
4,28005,40.408363,-3.70858
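The geocoding loop above assumes that every query returns a result. geopy's `geocode` returns `None` when nothing matches, which would make the `.latitude` access fail. A minimal sketch of a more defensive loop follows; the function name `geocode_postcodes`, the `pause_s` parameter, and the idea of passing the geocoder in as a callable are assumptions made for illustration, not part of the original notebook.

```python
import time


def geocode_postcodes(postcodes, geocode, country="España", pause_s=1.0):
    """Return {postcode: (lat, lng) or None} using the supplied geocode callable."""
    coords = {}
    for code in postcodes:
        location = geocode("{} {}".format(code, country))
        if location is None:
            # No match for this query: record the gap instead of raising.
            coords[code] = None
        else:
            coords[code] = (location.latitude, location.longitude)
        time.sleep(pause_s)  # pause between requests to a shared public service
    return coords
```

With the geocoder defined above, this could be called as `geocode_postcodes(postcode_list['postCode'], geolocator.geocode)`. Postal codes that map to `None` can then be inspected or dropped before requesting venues.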


## Preliminary look at how the postal code locations are distributed across the city of Madrid

In [6]:
address = 'Madrid, Spain'

geolocator = Nominatim(user_agent="madrid_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Madrid are {}, {}.'.format(latitude, longitude))

# create map of Madrid using latitude and longitude values
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each postal code to the map
for lat, lng, label in zip(postcode_list['postCode_lat'], postcode_list['postCode_lng'], postcode_list['postCode']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)  
    
map_madrid

The geographical coordinates of Madrid are 40.4167047, -3.7035825.


## Define functions to extract information about the venues for each postal code

In [7]:
def getNearbyVenues(postCodes, latitudes, longitudes, radius=1000, LIMIT = 100):
    
    venues_list=[]
    for postCode, lat, lng in zip(postCodes, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            postCode, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['postCode', 
                  'postCode Latitude', 
                  'postCode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [8]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
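`getNearbyVenues` indexes `categories[0]` directly, so a venue without categories would raise an `IndexError`. The sketch below shows one way to flatten a response defensively. The function name `extract_venues` and the sample payload are hypothetical; the nesting (`response` → `groups` → `items` → `venue`) simply mirrors what the function above reads.

```python
def extract_venues(payload, post_code):
    """Flatten an explore-style response into (postCode, name, lat, lng, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use None when a venue carries no category instead of failing.
        category = categories[0]["name"] if categories else None
        rows.append((post_code, venue.get("name"), location.get("lat"), location.get("lng"), category))
    return rows


# Hypothetical payload with the same nesting the function above expects.
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Bar Ejemplo",
                           "location": {"lat": 40.44, "lng": -3.70},
                           "categories": [{"name": "Bar"}]}},
                {"venue": {"name": "Sin Categoria",
                           "location": {"lat": 40.43, "lng": -3.69},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample, "28003")
print(rows)
```

Rows with a `None` category can be filtered out before one-hot encoding, so they do not create a spurious column.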

## Get Venues in Madrid for each Postal Code


In [9]:
madrid_venues = getNearbyVenues(postCodes=postcode_list['postCode'],
                                   latitudes=postcode_list['postCode_lat'],
                                   longitudes=postcode_list['postCode_lng']
                                  )

## Mark each venue in the map for a general overview

In [10]:
# add markers to map
for lat, lng, label in zip(madrid_venues['Venue Latitude'], madrid_venues['Venue Longitude'], madrid_venues['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)  

# add markers to map
for lat, lng, label in zip(postcode_list['postCode_lat'], postcode_list['postCode_lng'], postcode_list['postCode']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)  
        
map_madrid

## Counting venues by category

In [11]:
madrid_venues.groupby('Venue Category').count()

Venue Category,postCode,postCode Latitude,postCode Longitude,Venue,Venue Latitude,Venue Longitude
Accessories Store,3,3,3,3,3,3
Airport,1,1,1,1,1,1
Airport Terminal,3,3,3,3,3,3
American Restaurant,13,13,13,13,13,13
Aquarium,1,1,1,1,1,1
Arcade,1,1,1,1,1,1
Arepa Restaurant,2,2,2,2,2,2
Argentinian Restaurant,16,16,16,16,16,16
Art Gallery,22,22,22,22,22,22
Art Museum,16,16,16,16,16,16


In [12]:
print('There are {} unique categories.'.format(len(madrid_venues['Venue Category'].unique())))

There are 268 unique categories.


## Analyze Each Postal Code Area

In [13]:
# one hot encoding
madrid_onehot = pd.get_dummies(madrid_venues[['Venue Category']], prefix="", prefix_sep="")

# add postCode column back to dataframe
madrid_onehot['postCode'] = madrid_venues['postCode'] 

# move postCode column to the first column
fixed_columns = [madrid_onehot.columns[-1]] + list(madrid_onehot.columns[:-1])
madrid_onehot = madrid_onehot[fixed_columns]

madrid_onehot.head()

Unnamed: 0,postCode,Accessories Store,Airport,Airport Terminal,American Restaurant,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Trail,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Camera Store,Candy Store,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Residence Hall,College Stadium,Comedy Club,Comfort Food Restaurant,Comic Shop,Community College,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Shop,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Embassy / Consulate,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Latin American Restaurant,Library,Light Rail Station,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music School,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Other Event,Other Nightlife,Outdoors & Recreation,Paella Restaurant,Palace,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Lab,Pie Shop,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Taverna,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Track,Track Stadium,Trade School,Trail,Train Station,Travel Agency,Travel Lounge,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Warehouse Store,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,28001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,28001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,28001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,28001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,28001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [14]:
madrid_onehot.shape

(3479, 269)

### Group rows by postCode, taking the mean of the frequency of occurrence of each category

In [15]:
madrid_grouped = madrid_onehot.groupby('postCode').mean().reset_index()

In [16]:
madrid_grouped.shape

(56, 269)
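Why the mean works here: each venue contributes a row with a single 1 in its category column, so the per-postCode mean of a column is the share of that area's venues falling in that category. A toy sketch with made-up venues (the data below is hypothetical and much smaller than `madrid_venues`):

```python
import pandas as pd

# Hypothetical venues: four in 28001, two in 28002.
venues = pd.DataFrame({
    "postCode": ["28001", "28001", "28001", "28001", "28002", "28002"],
    "Venue Category": ["Bar", "Bar", "Café", "Hotel", "Bar", "Park"],
})

# One indicator column per category; cast to float so the mean is numeric
# regardless of the dtype get_dummies returns.
onehot = pd.get_dummies(venues["Venue Category"]).astype(float)
onehot.insert(0, "postCode", venues["postCode"])

# Mean of the indicators per postal code = relative frequency of each category.
grouped = onehot.groupby("postCode").mean()
print(grouped)
```

Each row of `grouped` sums to 1, so areas with many venues and areas with few venues are compared on the same scale.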

In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

## Top 10 Venues by postal code

In [18]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['postCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postCodes_venues_sorted = pd.DataFrame(columns=columns)
postCodes_venues_sorted['postCode'] = madrid_grouped['postCode']

for ind in np.arange(madrid_grouped.shape[0]):
    postCodes_venues_sorted.iloc[ind, 1:] = return_most_common_venues(madrid_grouped.iloc[ind, :], num_top_venues)

postCodes_venues_sorted.head()

Unnamed: 0,postCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,28000,Spanish Restaurant,Restaurant,Tapas Restaurant,Bar,Japanese Restaurant,Café,Plaza,Italian Restaurant,Brewery,Hotel
1,28001,Restaurant,Spanish Restaurant,Italian Restaurant,Boutique,Tapas Restaurant,Japanese Restaurant,Hotel,Burger Joint,Bakery,Clothing Store
2,28002,Spanish Restaurant,Restaurant,Hotel,Tapas Restaurant,Supermarket,Theme Restaurant,Café,Mediterranean Restaurant,Middle Eastern Restaurant,Sushi Restaurant
3,28003,Spanish Restaurant,Tapas Restaurant,Restaurant,Café,Japanese Restaurant,Bar,Plaza,Italian Restaurant,Burger Joint,Bakery
4,28004,Restaurant,Tapas Restaurant,Plaza,Bookstore,Café,Bar,Spanish Restaurant,Hotel,Ice Cream Shop,Coffee Shop
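The column labels above are built with a `try`/`except` around a three-element suffix list, which gives the right result for 1 to 10 but would produce labels such as "21th" for a larger `num_top_venues`. A small alternative sketch, using a hypothetical helper named `ordinal`, handles the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["postCode"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this yields the same labels as the cell above.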


## Cluster Neighborhoods

In [19]:
# set number of clusters
kclusters = 5

madrid_grouped_clustering = madrid_grouped.drop('postCode', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2], dtype=int32)
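The number of clusters is fixed at 5 in this notebook. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The sketch below does this on hypothetical toy data with three obvious groups; it is not something the original analysis ran, and on the real frequency matrix the picture may be far less clear-cut.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data (hypothetical): three tight, well-separated groups of points.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [0.0, 5.0], [0.1, 5.0], [0.0, 5.1],
])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    # Silhouette is close to 1 when clusters are compact and far apart.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

To try this on the notebook's data, `X` would be replaced by `madrid_grouped_clustering`.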

## Clusters Data Frame

In [20]:
# add clustering labels
postCodes_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

madrid_merged = postcode_list

# join the sorted venues (with cluster labels) to the postal code coordinates
madrid_merged = madrid_merged.join(postCodes_venues_sorted.set_index('postCode'), on='postCode')

madrid_merged.head() # check the last columns!

Unnamed: 0,postCode,postCode_lat,postCode_lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,28001,40.424066,-3.686055,2,Restaurant,Spanish Restaurant,Italian Restaurant,Boutique,Tapas Restaurant,Japanese Restaurant,Hotel,Burger Joint,Bakery,Clothing Store
1,28002,40.444363,-3.671973,2,Spanish Restaurant,Restaurant,Hotel,Tapas Restaurant,Supermarket,Theme Restaurant,Café,Mediterranean Restaurant,Middle Eastern Restaurant,Sushi Restaurant
2,28003,40.439374,-3.699367,2,Spanish Restaurant,Tapas Restaurant,Restaurant,Café,Japanese Restaurant,Bar,Plaza,Italian Restaurant,Burger Joint,Bakery
3,28004,40.428463,-3.703477,2,Restaurant,Tapas Restaurant,Plaza,Bookstore,Café,Bar,Spanish Restaurant,Hotel,Ice Cream Shop,Coffee Shop
4,28005,40.408363,-3.70858,2,Tapas Restaurant,Spanish Restaurant,Plaza,Bar,Coffee Shop,Art Gallery,Hotel,Pizza Place,Restaurant,Market


## Map Clusters

In [21]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(madrid_merged['postCode_lat'], madrid_merged['postCode_lng'], madrid_merged['postCode'], madrid_merged['Cluster Labels'].fillna(0).astype(int)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

#### Cluster 1

In [22]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 0, madrid_merged.columns[[0] + list(range(4, madrid_merged.shape[1]))]]

Unnamed: 0,postCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,28049,Department Store,Pub,Diner,Café,Food Court,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Stand


#### Cluster 2

In [23]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 1, madrid_merged.columns[[0] + list(range(4, madrid_merged.shape[1]))]]

Unnamed: 0,postCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,28007,Spanish Restaurant,Museum,Hotel,Bakery,Peruvian Restaurant,Gym,Bar,Indian Restaurant,Italian Restaurant,Korean Restaurant
10,28011,Spanish Restaurant,Park,Grocery Store,Beer Garden,Pizza Place,Restaurant,Brewery,Pool,Bar,Bakery
16,28017,Spanish Restaurant,Bar,Bakery,Grocery Store,Restaurant,Tapas Restaurant,Clothing Store,Park,Soccer Field,Brewery
17,28018,Pizza Place,Park,Supermarket,Bar,Tapas Restaurant,Grocery Store,Coffee Shop,Restaurant,Spanish Restaurant,Music Venue
18,28019,Spanish Restaurant,Bar,Park,Tapas Restaurant,Pizza Place,Beer Garden,Plaza,Restaurant,Coffee Shop,Playground
20,28021,Sandwich Place,Café,Grocery Store,Bakery,Athletics & Sports,Train Station,Mexican Restaurant,Plaza,Sports Bar,Food Service
22,28023,Deli / Bodega,Café,Spanish Restaurant,Gym,Diner,Pizza Place,Pharmacy,Restaurant,Paella Restaurant,Candy Store
23,28024,Italian Restaurant,Bar,Pizza Place,Grocery Store,Diner,Ice Cream Shop,Park,Tapas Restaurant,Pub,Shopping Mall
24,28025,Fast Food Restaurant,Tapas Restaurant,Grocery Store,Stadium,Breakfast Spot,Bar,Brewery,Pizza Place,Café,Coffee Shop
25,28026,Spanish Restaurant,Grocery Store,Bar,Seafood Restaurant,Coffee Shop,Restaurant,Bakery,Farmers Market,Playground,Park


#### Cluster 3

In [24]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2, madrid_merged.columns[[0] + list(range(4, madrid_merged.shape[1]))]]

Unnamed: 0,postCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,28001,Restaurant,Spanish Restaurant,Italian Restaurant,Boutique,Tapas Restaurant,Japanese Restaurant,Hotel,Burger Joint,Bakery,Clothing Store
1,28002,Spanish Restaurant,Restaurant,Hotel,Tapas Restaurant,Supermarket,Theme Restaurant,Café,Mediterranean Restaurant,Middle Eastern Restaurant,Sushi Restaurant
2,28003,Spanish Restaurant,Tapas Restaurant,Restaurant,Café,Japanese Restaurant,Bar,Plaza,Italian Restaurant,Burger Joint,Bakery
3,28004,Restaurant,Tapas Restaurant,Plaza,Bookstore,Café,Bar,Spanish Restaurant,Hotel,Ice Cream Shop,Coffee Shop
4,28005,Tapas Restaurant,Spanish Restaurant,Plaza,Bar,Coffee Shop,Art Gallery,Hotel,Pizza Place,Restaurant,Market
5,28006,Spanish Restaurant,Restaurant,Tapas Restaurant,Coffee Shop,Hotel,Boutique,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Jewelry Store
7,28008,Hotel,Spanish Restaurant,Tapas Restaurant,Plaza,Park,Cocktail Bar,Garden,Café,Coffee Shop,Mediterranean Restaurant
8,28009,Spanish Restaurant,Restaurant,Tapas Restaurant,Italian Restaurant,Bakery,Ice Cream Shop,Dessert Shop,Garden,Park,Burger Joint
9,28010,Tapas Restaurant,Spanish Restaurant,Plaza,Restaurant,Bar,Italian Restaurant,Hotel,Japanese Restaurant,Coffee Shop,Ice Cream Shop
11,28012,Plaza,Restaurant,Café,Coffee Shop,Hotel,Spanish Restaurant,Bar,Art Gallery,Tapas Restaurant,Theater


#### Cluster 4

In [25]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 3, madrid_merged.columns[[0] + list(range(4, madrid_merged.shape[1]))]]

Unnamed: 0,postCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,28055,Restaurant,Airport Terminal,Diner,Park,Gym / Fitness Center,Asian Restaurant,Zoo,Food & Drink Shop,Fish Market,Flea Market


#### Cluster 5

In [26]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 4, madrid_merged.columns[[0] + list(range(4, madrid_merged.shape[1]))]]

Unnamed: 0,postCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,28022,Sports Club,Trail,Soccer Stadium,Beer Bar,Park,Mediterranean Restaurant,Metro Station,Sporting Goods Shop,Plaza,Diner
29,28030,Plaza,Bar,Metro Station,Café,Park,Coffee Shop,Bakery,Concert Hall,Supermarket,Sushi Restaurant


## Conclusion

We have classified all the postal codes in Madrid into 5 clusters according to the similarity of the venues and services found in those areas. We can now provide customers with alternative areas that are as close as possible to a favorite area chosen by the customer.

For example, if a customer wants to live in the area with the postal code **28003** but it is too expensive, he/she could choose among the most similar ones. In particular, **28003** belongs to **cluster 3** (cluster label 2 in the code), which also contains the following alternatives:

In [41]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2, ['postCode']]

Unnamed: 0,postCode
0,28001
1,28002
2,28003
3,28004
4,28005
5,28006
7,28008
8,28009
9,28010
11,28012
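The lookup above hard-codes the cluster label. A minimal sketch of a reusable helper follows; the function name `similar_postcodes` is hypothetical, and the toy frame only reuses a few postCode/label pairs shown earlier in this notebook. It assumes the frame has the `postCode` and `Cluster Labels` columns of `madrid_merged`.

```python
import pandas as pd


def similar_postcodes(merged, favorite):
    """Postal codes sharing the cluster label of `favorite`, excluding it."""
    label = merged.loc[merged["postCode"] == favorite, "Cluster Labels"].iloc[0]
    same = merged.loc[merged["Cluster Labels"] == label, "postCode"]
    return [code for code in same if code != favorite]


# Small excerpt of the labels shown earlier in this notebook.
toy = pd.DataFrame({
    "postCode": ["28001", "28003", "28007", "28049"],
    "Cluster Labels": [2, 2, 1, 0],
})

print(similar_postcodes(toy, "28003"))
```

On the full data this would be called as `similar_postcodes(madrid_merged, '28003')`. A postal code that is alone in its cluster, such as 28049, yields an empty list.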
