# The Battle of Neighborhoods  
### Content

#### 1. Introduction:

    1.1 Scenario and Background.

    1.2 Problem to be resolved.

    1.3 Interested Audience.
    

#### 2. Data Section:

    2.1 Data Required to resolve the problem.

    2.2 Data sources and manipulation.
    
    2.3 How the data will be manipulated.
    
    2.4 Data Mapping.
    

#### 3. Methodology:

    3.1 Steps taken and strategy followed.

    3.2 Data science methods, machine learning, mapping tools and exploratory data analysis.
    

#### 4. Results obtained:

    Discussion of the results and how they help to make a decision.
    

#### 5. Discussion about observations:

    Observations about the obtained data.
    

#### 6. Last conclusions:

    Decision taken and conclusion report.

# 1. Introduction Section:
### Discussion of the business problem and the audience who would be interested in this project.

### 1.1 Scenario and  Background

The goal is to find a house to rent in the Canadian city of Vancouver. The house must meet certain requirements regarding its location, surroundings, nearby places of interest and proximity to public transport.
The aim is not to compare Vancouver with another city, but to meet the minimum requirements of a target client.
Although this problem could be solved simply by hiring a real estate agent, the techniques and knowledge acquired in the IBM course will be applied instead.



### 1.2  Problem to be resolved:   

As detailed in the previous section, a home must be found in the city of Vancouver. The home must meet the following requirements:

- It must be in a neighborhood with a family atmosphere.
- There must be parks nearby for walking and playing sports.
- It must be close to public transportation.
- There must be sports facilities in the vicinity.
- The area should be easily accessible by road.
- It must be located near ski areas.
- There must be a rich choice of places to eat, with coffee shops and restaurants in the immediate vicinity.

### 1.3 Interested Audience:

The audience for this project is a hypothetical target client who must move from a European capital to the city of Vancouver. He is a manager at a multinational company with high purchasing power and no budget limit on rent, since the rent will be paid by his employer.
The location requirements aim to find an environment similar to the client's current place of residence.

# 2. Data Section:
### Description of the data and its sources that will be used to solve the problem

### 2.1 Data Required to resolve the problem

To make the choice of the home that the client is looking for in Vancouver, the following information is needed:

- Information about the neighborhoods of the city, with their geographic location data.
- Information on the city's means of transport, with their location data.
- Data on the houses available for rent in the city.
- Geographical data on green leisure areas.
- Information about the city's restaurants and cafés, with their geographical coordinates.


In [2]:
import numpy as np
import time
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
import requests
from pandas import json_normalize  # pandas >= 1.0; older versions expose it as pandas.io.json.json_normalize
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
!conda install -c conda-forge folium=0.5.0 --yes
import folium
from folium import plugins
import matplotlib.cm as cm
import matplotlib.colors as colors
import seaborn as sns
from sklearn.cluster import KMeans
print('Done!')

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Done!


### 2.2 Data sources and manipulation.

Data on the neighborhoods and postal codes of the city of Vancouver were obtained here:

https://www.geonames.org/postalcode-search.html?q=vancouver&country=CA&adminCode1=BC

Because the table is very small, it was copied directly from the page and tidied in an Excel file. This was quick to do and left more time for the other parts of the project.

The data on points of interest such as transportation, leisure centers, parks, restaurants and others were obtained directly from Foursquare.

Geolocating the neighborhoods directly in the project made it possible to see at a glance which neighborhoods of the city were most interesting for the target client.

Since the rent is paid by the client's employer, price is not a constraint. Data on rental housing in Vancouver were therefore set aside at first, and all efforts were focused on deciding which neighborhood, within the district chosen through geolocation, best meets the client's requirements.


**This is the table obtained from the page  
https://www.geonames.org/postalcode-search.html?q=vancouver&country=CA&adminCode1=BC   
and then edited in an Excel sheet to create a .csv file.**
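
The copy-and-tidy step could also be done in pandas instead of Excel. The following is a minimal sketch, assuming the pasted text is tab-separated and already has the three columns used in this notebook; the `pasted` string is a hypothetical excerpt, not the actual page content.

```python
import io
import pandas as pd

# Hypothetical excerpt of text pasted from the web page (tab-separated).
pasted = (
    "Code\tBorough\tNeighbourhood\n"
    "V5S\tKillarney\tKillarney\n"
    "V5K\tNorth Hastings\tNorth Hastings\n"
)

# Parse the pasted text, keep the columns of interest and drop repeated rows.
table = pd.read_csv(io.StringIO(pasted), sep="\t")
table = table[["Code", "Borough", "Neighbourhood"]].drop_duplicates()

# CSV text that could be written to a file such as code_vcbr.csv.
csv_text = table.to_csv(index=False)
```

Writing `csv_text` to disk (or calling `table.to_csv('code_vcbr.csv', index=False)`) would produce a file that the next cell can read.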

In [3]:
path_vcbr = 'code_vcbr.csv'

In [4]:
vcbr = pd.read_csv(path_vcbr)

In [5]:
vcbr.head()

Unnamed: 0,Code,Borough,Neighbourhood
0,V5S,Killarney,Killarney
1,V5K,North Hastings,North Hastings
2,V5L,North Grandview,Woodlands
3,V5P,SE Kensington,Victoria
4,V5R,South Renfrew,Collingwood


**We are going to assign coordinates to the data of the neighborhoods.**

In [6]:
!pip -q install geocoder
import geocoder

In [7]:
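def g_latlng(postal_code, max_attempts=5):
    """Return [latitude, longitude] for a Vancouver postal code."""
    lat_lng_coords = None
    attempts = 0

    # The ArcGIS geocoder occasionally returns nothing, so retry a few times
    # instead of looping forever when a code cannot be resolved.
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.arcgis('{}, Vancouver, British Columbia'.format(postal_code))
        lat_lng_coords = g.latlng
        attempts += 1
    # Unresolved codes yield [None, None], which becomes NaN in the dataframe.
    return lat_lng_coords if lat_lng_coords is not None else [None, None]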

**We are going to store the coordinates in a DataFrame.**

In [9]:
postal_codes = vcbr['Code']    
coordinates = [g_latlng(postal_code) for postal_code in postal_codes.tolist()]

In [10]:
vcbr_loc = vcbr.copy()  # work on a copy so the original table is left untouched

# The obtained coordinates (latitude and longitude) are joined with the dataframe as shown
vcbr_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
vcbr_loc['Latitude'] = vcbr_coordinates['Latitude']
vcbr_loc['Longitude'] = vcbr_coordinates['Longitude']

**Now we analyze the integrity of the table that we just created.**

In [11]:
vcbr_loc.head(10)

Unnamed: 0,Code,Borough,Neighbourhood,Latitude,Longitude
0,V5S,Killarney,Killarney,49.215345,-123.041225
1,V5K,North Hastings,North Hastings,49.281665,-123.03998
2,V5L,North Grandview,Woodlands,49.2807,-123.066842
3,V5P,SE Kensington,Victoria,49.22337,-123.0671
4,V5R,South Renfrew,Collingwood,49.239335,-123.041105
5,V5T,East Mount Pleasant,East Mount Pleasant,49.26341,-123.091214
6,V5Z,East Fairview,South Cambie,49.247102,-123.12098
7,V6E,South West,South West End,49.28387,-123.128981
8,V6K,Central Kitsilano,Central Kitsilano,49.267105,-123.165282
9,V6L,North West,Arbutus Ridge,49.249915,-123.165854
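
Beyond looking at the first rows, the integrity of the table can be checked programmatically. The sketch below uses a small sample in the same shape as `vcbr_loc`; the last two rows mirror a pair from the North Vancouver subset shown later that shares one postal code and therefore one coordinate pair. The names `sample`, `missing` and `shared` are introduced here for illustration only.

```python
import pandas as pd

# Small sample in the shape of the table above.
sample = pd.DataFrame({
    "Code": ["V5S", "V5K", "V7G", "V7G"],
    "Neighbourhood": ["Killarney", "North Hastings", "Outer East", "Deep Cove"],
    "Latitude": [49.215345, 49.281665, 49.388046, 49.388046],
    "Longitude": [-123.041225, -123.03998, -122.934285, -122.934285],
})

# Rows where geocoding produced no coordinates.
missing = sample[sample[["Latitude", "Longitude"]].isna().any(axis=1)]

# Rows whose coordinate pair is shared with another row.
shared = sample[sample.duplicated(subset=["Latitude", "Longitude"], keep=False)]
```

Neighborhoods that share coordinates will also receive identical venue lists from any location-based query, which is worth keeping in mind when reading the clustering results.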


### 2.3 How the data will be manipulated.

As described in the previous section, the neighborhood data were copied directly from the page provided and quickly edited in an Excel sheet. Since the data were already clean, it was only necessary to reorder the columns and save the file with a .csv extension so that it could be read easily in the notebook.

The data will be used as follows:

- Foursquare and geopy data are used to find the top 10 venue categories for each neighborhood in the North Vancouver district, the area designated for locating the home, and to group the neighborhoods into clusters.
- Foursquare and geopy data are used to plot the locations of the places of interest.
- Geopy-distance and Nominatim are used to geolocate neighborhoods and places of interest.
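
To illustrate what a distance computation between two geolocated points involves, here is a minimal sketch of the haversine formula using only the standard library. It is a stand-in for `geopy.distance` (for example `geopy.distance.geodesic`), not the code used in this notebook; the function name `haversine_m` and the spherical Earth radius are assumptions of the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Coordinates of V5S (Killarney) and V5K (North Hastings) from the table above.
d = haversine_m(49.215345, -123.041225, 49.281665, -123.03998)
print('Distance: {:.0f} m'.format(d))
```

The two points lie almost on the same meridian, so the result is dominated by the difference in latitude (roughly 7.4 km).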

In [12]:
print('The dataframe has {} boroughs and {} neighbourhoods.'.format(
        len(vcbr_loc['Borough'].unique()),
        vcbr_loc.shape[0]
    )
)

The dataframe has 30 boroughs and 52 neighbourhoods.


In [14]:
#001
way = 'Vancouver, Canada'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(way)
latitude = location.latitude
longitude = location.longitude
print('The coordinates are {}, {}.'.format(latitude, longitude))

The coordinates are 49.2608724, -123.1139529.


### 2.4 Data Mapping.

Several maps were created for the analysis of the project:

  - Map of the city of Vancouver with the districts.
  - Map of the district of North Vancouver with its neighborhoods.
  - Map with the distribution of points of interest.

In [15]:
#002
# create map of Vancouver using latitude and longitude values
map_vancouver = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(vcbr_loc['Latitude'], vcbr_loc['Longitude'], vcbr_loc['Borough'], vcbr_loc['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_vancouver)  
    
map_vancouver

# 3. Methodology:
### 3.1 Steps taken and strategy followed.
**After inspecting the neighborhoods of Vancouver on the map, the district of North Vancouver is chosen as the target of the analysis.**

In [22]:
#003
north_data = vcbr_loc[vcbr_loc['Borough'] == 'North Vancouver'].reset_index(drop=True)
north_data

Unnamed: 0,Code,Borough,Neighbourhood,Latitude,Longitude
0,V7G,North Vancouver,Outer East,49.388046,-122.934285
1,V7H,North Vancouver,Inner East,49.316885,-122.990073
2,V7J,North Vancouver,East Central,49.332998,-123.01899
3,V7K,North Vancouver,North Central,49.34602,-123.039847
4,V7L,North Vancouver,South Central,49.318325,-123.056241
5,V7M,North Vancouver,Southwest Central,49.320777,-123.08262
6,V7N,North Vancouver,Northwest Central,49.34352,-123.073501
7,V7P,North Vancouver,Southwest,49.320565,-123.115307
8,V7R,North Vancouver,Northwest,49.369327,-123.1007
9,V7G,North Vancouver,Deep Cove,49.388046,-122.934285


### 3.2 Data science methods, machine learning, mapping tools and exploratory data analysis.

**With the target district chosen, the geolocation, mapping and clustering techniques needed to analyze the options are applied, so that a decision can be made from the data obtained.
The procedure is the same as the one followed in the New York City case study: points of interest are searched for in each neighborhood of the district, and the neighborhoods are then clustered by their points of interest to help decide on the next steps.**   

**Geolocate the district of North Vancouver with Nominatim.**

In [20]:
address = 'North Vancouver, Vancouver'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of North Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of North Vancouver are 49.3207133, -123.0737831.


In [23]:
# create map of North Vancouver using latitude and longitude values
map_north = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(north_data['Latitude'], north_data['Longitude'], north_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_north)  
    
map_north

**Next we connect to the Foursquare service, which requires our access credentials.**

In [24]:
#006
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20190712' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


**We pick one of the neighborhoods in the district and look up its coordinates.**

In [25]:
north_data.loc[14, 'Neighbourhood']

'Horseshoe Bay'

In [26]:
neighborhood_latitude = north_data.loc[14, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = north_data.loc[14, 'Longitude'] # neighborhood longitude value

neighborhood_name = north_data.loc[14, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Horseshoe Bay are 49.320565000000045, -123.11530663899998.


**Now, let's get the top 100 venues within a radius of 500 meters of Horseshoe Bay.**  
**First, let's create the GET request URL and store it in `url`.**

In [27]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20190712&ll=49.320565000000045,-123.11530663899998&radius=500&limit=100'
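
As an alternative to formatting the query string by hand, the parameters can be collected in a dictionary and encoded with the standard library. This is a sketch under assumptions: the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical, and the fallback values are placeholders rather than working credentials.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; the fallbacks are placeholders only.
params = {
    "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
    "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
    "v": "20190712",
    "ll": "{},{}".format(49.320565, -123.115307),
    "radius": 500,
    "limit": 100,
}

# urlencode percent-encodes reserved characters (the comma in `ll` becomes %2C).
explore_url = "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)
```

Reading the credentials from the environment keeps them out of the notebook and out of any printed URL.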

In [28]:
# Send the GET request and examine the results
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d3043d992e7a9002c9080c1'},
 'response': {'headerLocation': 'Norgate',
  'headerFullLocation': 'Norgate, North Vancouver',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 16,
  'suggestedBounds': {'ne': {'lat': 49.32506500450005,
    'lng': -123.10841584180133},
   'sw': {'lat': 49.31606499550004, 'lng': -123.12219743619863}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4aac4ed0f964a520655d20e3',
       'name': 'The Edge Climbing Centre',
       'location': {'address': '# 2 - 1485 Welch Street',
        'lat': 49.31726811197159,
        'lng': -123.11454564228615,
        'labeledLatLngs': [{'label': 'display',
          'lat': 49.31726811197159,
          'lng': -123.11454564228615}],
        'distance': 371,
   

In [29]:
# function that extracts the category of the venue
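def get_category_type(row):
    # rows come either from a raw venue ('categories') or from a
    # flattened explore result ('venue.categories')
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']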

**We create a new data frame with the points of interest returned by the API's GET request.**

In [31]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,The Edge Climbing Centre,Gym,49.317268,-123.114546
1,Mooyah,Burger Joint,49.323332,-123.113961
2,La Taqueria Pinche Taco Shop,Taco Place,49.317522,-123.11197
3,The Tomahawk,Diner,49.322817,-123.11315
4,Sushi Man Japanese Restaurant,Japanese Restaurant,49.323781,-123.111095
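
The flattening step can be seen in isolation on a tiny payload. The sketch below assumes pandas 1.0 or later (where `pd.json_normalize` is available) and uses a hand-built `items` list as a stand-in for `results['response']['groups'][0]['items']`, with only the fields needed here.

```python
import pandas as pd

# Minimal stand-in for results['response']['groups'][0]['items'].
items = [
    {"venue": {"name": "The Edge Climbing Centre",
               "categories": [{"name": "Gym"}],
               "location": {"lat": 49.317268, "lng": -123.114546}}},
    {"venue": {"name": "Mooyah",
               "categories": [{"name": "Burger Joint"}],
               "location": {"lat": 49.323332, "lng": -123.113961}}},
]

# Nested keys become dotted column names such as 'venue.location.lat'.
flat = pd.json_normalize(items)

# Keep only the name of the first category of each venue.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
```

The dotted column names explain why the notebook later splits each name on `"."` and keeps the last part.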


**We check the number of venues returned by Foursquare.**

In [32]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

16 venues were returned by Foursquare.


#### Explore Neighborhoods in North Vancouver

In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
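
The list comprehension above indexes `v['venue']['categories'][0]`, which would fail for a venue that has no category. A more defensive variant of that step might look like the following sketch; the helper `venue_rows` and the sample payload are hypothetical and only illustrate the idea.

```python
def venue_rows(name, lat, lng, items):
    """Build one output row per venue, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical payload: the second venue has an empty category list.
sample_items = [
    {"venue": {"name": "Mountain Market",
               "categories": [{"name": "Convenience Store"}],
               "location": {"lat": 49.341978, "lng": -123.037982}}},
    {"venue": {"name": "Unnamed Spot",
               "categories": [],
               "location": {"lat": 49.3460, "lng": -123.0398}}},
]
rows = venue_rows("North Central", 49.34602, -123.039847, sample_items)
```

Rows with a `None` category can then be dropped or kept explicitly before the one-hot encoding step.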

**We list the venues according to the neighborhoods of North Vancouver**

In [34]:
north_venues = getNearbyVenues(names=north_data['Neighbourhood'],
                                   latitudes=north_data['Latitude'],
                                   longitudes=north_data['Longitude']
                                  )

Outer East
Inner East
East Central
North Central
South Central
Southwest Central
Northwest Central
Southwest
Northwest
Deep Cove
Lower Lonsdale
Lynn Valley
Dundarave Village
Edgemont Village
Horseshoe Bay
Ambleside


#### Let's check the size of the resulting dataframe

In [35]:
print(north_venues.shape)
north_venues.head()

(74, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Inner East,49.316885,-122.990073,Green Mountain Landscaping,49.316347,-122.986294,Construction & Landscaping
1,Inner East,49.316885,-122.990073,Mccartney Creek Park,49.320577,-122.993579,Baseball Field
2,North Central,49.34602,-123.039847,Endless Summer Landscapes,49.34843,-123.038549,Construction & Landscaping
3,North Central,49.34602,-123.039847,Mountain Market,49.341978,-123.037982,Convenience Store
4,South Central,49.318325,-123.056241,Filthy Cleaning,49.320605,-123.054664,Paper / Office Supplies Store


#### Let's check how many venues were returned for each neighborhood

In [36]:
north_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ambleside,4,4,4,4,4,4
Dundarave Village,16,16,16,16,16,16
Horseshoe Bay,16,16,16,16,16,16
Inner East,2,2,2,2,2,2
Lower Lonsdale,4,4,4,4,4,4
North Central,2,2,2,2,2,2
Northwest,7,7,7,7,7,7
Northwest Central,1,1,1,1,1,1
South Central,2,2,2,2,2,2
Southwest,16,16,16,16,16,16
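
Fewer neighborhoods appear in this count than were queried, because a neighborhood with no venues within the radius contributes no rows. A quick way to list such neighborhoods is sketched below; the `requested` and `returned` values are a made-up subset for illustration, not the actual results.

```python
# Neighborhoods that were queried (illustrative subset).
requested = ["Outer East", "Inner East", "East Central", "North Central"]

# 'Neighborhood' values of the returned venues (illustrative subset).
returned = ["Inner East", "Inner East", "North Central", "North Central"]

# Queried neighborhoods for which no venue came back, in query order.
seen = set(returned)
without_venues = [name for name in requested if name not in seen]
print(without_venues)
```

In the notebook the same check could be run with `north_data['Neighbourhood']` and `north_venues['Neighborhood']`. A larger search radius is one way to reduce the number of empty neighborhoods.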


#### Let's find out how many unique categories can be curated from all the returned venues

In [37]:
print('There are {} unique categories.'.format(len(north_venues['Venue Category'].unique())))

There are 32 unique categories.


#### Analyze Each Neighborhood

In [38]:
# one hot encoding
north_onehot = pd.get_dummies(north_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
north_onehot['Neighborhood'] = north_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [north_onehot.columns[-1]] + list(north_onehot.columns[:-1])
north_onehot = north_onehot[fixed_columns]

north_onehot.head()

Unnamed: 0,Neighborhood,Bank,Baseball Field,Burger Joint,Bus Stop,Coffee Shop,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store,Greek Restaurant,Grocery Store,Gym,Japanese Restaurant,Liquor Store,Market,Martial Arts Dojo,Mexican Restaurant,Mobile Phone Shop,Mountain,Other Great Outdoors,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pharmacy,Playground,Shopping Mall,Ski Chairlift,Sporting Goods Shop,Taco Place,Toy / Game Store,Trail
0,Inner East,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Inner East,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,North Central,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,North Central,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,South Central,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0


#### And let's examine the new dataframe size.

In [39]:
north_onehot.shape

(74, 33)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [40]:
north_grouped = north_onehot.groupby('Neighborhood').mean().reset_index()
north_grouped

Unnamed: 0,Neighborhood,Bank,Baseball Field,Burger Joint,Bus Stop,Coffee Shop,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store,Greek Restaurant,Grocery Store,Gym,Japanese Restaurant,Liquor Store,Market,Martial Arts Dojo,Mexican Restaurant,Mobile Phone Shop,Mountain,Other Great Outdoors,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pharmacy,Playground,Shopping Mall,Ski Chairlift,Sporting Goods Shop,Taco Place,Toy / Game Store,Trail
0,Ambleside,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Dundarave Village,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0625,0.0
2,Horseshoe Bay,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0625,0.0
3,Inner East,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Lower Lonsdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
5,North Central,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Northwest,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.428571
7,Northwest Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,South Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Southwest,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0625,0.0


In [41]:
north_grouped.shape

(11, 33)
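
The effect of one-hot encoding followed by a grouped mean is easiest to see on a few rows. In this small sketch, the three venues are chosen so that the resulting frequencies match the Inner East and Northwest Central rows of the table above; the variable names are introduced for the example only.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["Inner East", "Inner East", "Northwest Central"],
    "Venue Category": ["Construction & Landscaping", "Baseball Field", "Park"],
})

# One column per category, with 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="").astype(float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicator columns is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
```

Each row of `freq` sums to 1, so a neighborhood with a single venue gets a frequency of 1.0 for that one category. This makes neighborhoods with very few venues look more specialised than they may really be.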

**Let's now look at the top 10 venue categories for each neighborhood.**

In [46]:
num_top_venues = 10

for hood in north_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = north_grouped[north_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ambleside----
                 venue  freq
0                 Bank  0.25
1   Mexican Restaurant  0.25
2     Department Store  0.25
3             Pharmacy  0.25
4     Toy / Game Store  0.00
5           Taco Place  0.00
6  Sporting Goods Shop  0.00
7        Ski Chairlift  0.00
8        Shopping Mall  0.00
9           Playground  0.00


----Dundarave Village----
                  venue  freq
0         Grocery Store  0.06
1        Discount Store  0.06
2  Other Great Outdoors  0.06
3     Mobile Phone Shop  0.06
4         Shopping Mall  0.06
5          Liquor Store  0.06
6   Japanese Restaurant  0.06
7                   Gym  0.06
8      Greek Restaurant  0.06
9  Outdoor Supply Store  0.06


----Horseshoe Bay----
                  venue  freq
0         Grocery Store  0.06
1        Discount Store  0.06
2  Other Great Outdoors  0.06
3     Mobile Phone Shop  0.06
4         Shopping Mall  0.06
5          Liquor Store  0.06
6   Japanese Restaurant  0.06
7                   Gym  0.06
8      Gree

#### Let's put that into a pandas dataframe  
#### First, let's write a function to sort the venues in descending order.

In [47]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
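
When a neighborhood has fewer distinct categories than `num_top_venues`, the remaining positions are filled with categories whose frequency is zero, as the Ambleside listing above shows. One possible variant that keeps only categories actually present is sketched here; `top_nonzero_venues` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but drop categories with zero frequency."""
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])

# Ambleside frequencies for a subset of the category columns.
row = pd.Series({"Neighborhood": "Ambleside", "Bank": 0.25, "Coffee Shop": 0.0,
                 "Department Store": 0.25, "Mexican Restaurant": 0.25,
                 "Pharmacy": 0.25, "Trail": 0.0})
top = top_nonzero_venues(row, 10)
```

The returned list can be shorter than `num_top_venues`, so it would need padding (for example with `None`) before being written into a fixed-width table.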

#### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [48]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = north_grouped['Neighborhood']

for ind in np.arange(north_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(north_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ambleside,Bank,Department Store,Mexican Restaurant,Pharmacy,Gym,Grocery Store,Greek Restaurant,Discount Store,Diner,Convenience Store
1,Dundarave Village,Other Great Outdoors,Outdoor Supply Store,Grocery Store,Gym,Japanese Restaurant,Liquor Store,Toy / Game Store,Diner,Mobile Phone Shop,Discount Store
2,Horseshoe Bay,Other Great Outdoors,Outdoor Supply Store,Grocery Store,Gym,Japanese Restaurant,Liquor Store,Toy / Game Store,Diner,Mobile Phone Shop,Discount Store
3,Inner East,Baseball Field,Construction & Landscaping,Trail,Toy / Game Store,Burger Joint,Bus Stop,Coffee Shop,Convenience Store,Department Store,Diner
4,Lower Lonsdale,Park,Playground,Market,Burger Joint,Bus Stop,Coffee Shop,Baseball Field,Construction & Landscaping,Liquor Store,Convenience Store


#### Cluster Neighborhoods  
#### Run k-means to cluster the neighborhoods into 5 clusters.

In [49]:
# set number of clusters
kclusters = 5

north_grouped_clustering = north_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(north_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 3, 2, 0, 1, 2, 4, 1])
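
The number of clusters is fixed at 5 here. A common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squares) for several values of k. The sketch below does this on synthetic data with three well-separated groups; the data and the range of k are assumptions of the example and say nothing about the best k for the North Vancouver table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic feature matrix: three tight groups of five points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(center, 0.05, size=(5, 3)) for center in (0.0, 0.5, 1.0)])

# Fit k-means for several k and record the inertia of each fit.
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 4))
```

The inertia typically drops sharply until k reaches the number of real groups and flattens afterwards. With only about a dozen neighborhoods, as in this notebook, such a curve should be read with caution.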

#### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [50]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

north_merged = north_data

north_merged = north_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

north_merged.sample()

Unnamed: 0,Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,V7N,North Vancouver,Northwest Central,49.34352,-123.073501,2.0,Park,Trail,Liquor Store,Baseball Field,Burger Joint,Bus Stop,Coffee Shop,Construction & Landscaping,Convenience Store,Department Store
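
The cluster label appears as `2.0` rather than `2` because the join leaves `NaN` for neighborhoods that have no row in `neighborhoods_venues_sorted`, which forces the column to a float type. Here is a small sketch of the effect and one way to recover integer labels. The labels for Inner East and Northwest Central follow the tables in this notebook, while East Central is assumed, for the example only, to be a neighborhood without venues.

```python
import pandas as pd

districts = pd.DataFrame({"Neighbourhood": ["Inner East", "East Central", "Northwest Central"]})
labels = pd.DataFrame({"Neighborhood": ["Inner East", "Northwest Central"],
                       "Cluster Labels": [3, 2]})

# Left join: East Central has no match, so its label is NaN and the column is float64.
merged = districts.join(labels.set_index("Neighborhood"), on="Neighbourhood")

# Keep only clustered neighbourhoods and restore integer labels.
clustered = merged.dropna(subset=["Cluster Labels"]).copy()
clustered["Cluster Labels"] = clustered["Cluster Labels"].astype(int)
```

Integer labels matter later on, because they are used to index the list of marker colours.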


In [68]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# neighbourhoods without venues have no cluster label (NaN) after the join, so skip them
clustered = north_merged.dropna(subset=['Cluster Labels'])
for lat, lon, poi, cluster in zip(clustered['Latitude'], clustered['Longitude'], clustered['Neighbourhood'], clustered['Cluster Labels']):
    cluster = int(cluster)  # labels become floats after the join; list indices must be ints
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**We examine the clusters below**

In [57]:
north_merged.loc[north_merged['Cluster Labels'] == 0, north_merged.columns[[1] + list(range(5, north_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,North Vancouver,0.0,Construction & Landscaping,Convenience Store,Trail,Toy / Game Store,Baseball Field,Burger Joint,Bus Stop,Coffee Shop,Department Store,Diner


In [58]:
north_merged.loc[north_merged['Cluster Labels'] == 1, north_merged.columns[[1] + list(range(5, north_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,North Vancouver,1.0,Other Great Outdoors,Outdoor Supply Store,Grocery Store,Gym,Japanese Restaurant,Liquor Store,Toy / Game Store,Diner,Mobile Phone Shop,Discount Store
8,North Vancouver,1.0,Trail,Ski Chairlift,Bus Stop,Coffee Shop,Mountain,Diner,Gym,Grocery Store,Greek Restaurant,Discount Store
12,North Vancouver,1.0,Other Great Outdoors,Outdoor Supply Store,Grocery Store,Gym,Japanese Restaurant,Liquor Store,Toy / Game Store,Diner,Mobile Phone Shop,Discount Store
14,North Vancouver,1.0,Other Great Outdoors,Outdoor Supply Store,Grocery Store,Gym,Japanese Restaurant,Liquor Store,Toy / Game Store,Diner,Mobile Phone Shop,Discount Store
15,North Vancouver,1.0,Bank,Department Store,Mexican Restaurant,Pharmacy,Gym,Grocery Store,Greek Restaurant,Discount Store,Diner,Convenience Store


In [59]:
north_merged.loc[north_merged['Cluster Labels'] == 2, north_merged.columns[[1] + list(range(5, north_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,North Vancouver,2.0,Park,Playground,Market,Burger Joint,Bus Stop,Coffee Shop,Baseball Field,Construction & Landscaping,Liquor Store,Convenience Store
6,North Vancouver,2.0,Park,Trail,Liquor Store,Baseball Field,Burger Joint,Bus Stop,Coffee Shop,Construction & Landscaping,Convenience Store,Department Store
10,North Vancouver,2.0,Park,Playground,Market,Burger Joint,Bus Stop,Coffee Shop,Baseball Field,Construction & Landscaping,Liquor Store,Convenience Store


In [60]:
north_merged.loc[north_merged['Cluster Labels'] == 3, north_merged.columns[[1] + list(range(5, north_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North Vancouver,3.0,Baseball Field,Construction & Landscaping,Trail,Toy / Game Store,Burger Joint,Bus Stop,Coffee Shop,Convenience Store,Department Store,Diner


In [61]:
north_merged.loc[north_merged['Cluster Labels'] == 4, north_merged.columns[[1] + list(range(5, north_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,North Vancouver,4.0,Paper / Office Supplies Store,Martial Arts Dojo,Trail,Gym,Grocery Store,Greek Restaurant,Discount Store,Diner,Department Store,Liquor Store
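
The five cells above repeat the same filter with a different label. The same overview can be produced in one pass with `groupby`, as in this sketch. The small `merged` frame copies four rows from the tables above and stands in for `north_merged`; integer labels are assumed.

```python
import pandas as pd

# Four rows copied from the cluster tables above.
merged = pd.DataFrame({
    "Neighbourhood": ["Inner East", "North Central", "Northwest Central", "South Central"],
    "Cluster Labels": [3, 0, 2, 4],
    "1st Most Common Venue": ["Baseball Field", "Construction & Landscaping",
                              "Park", "Paper / Office Supplies Store"],
})

# Map each cluster label to the neighbourhoods it contains.
summary = {
    int(label): group["Neighbourhood"].tolist()
    for label, group in merged.groupby("Cluster Labels")
}

for label, names in summary.items():
    print(label, names)
```

Listing the neighborhood names next to each label also makes the cluster tables easier to read than the borough column, which is the same for every row.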


# 4. Results obtained:
### Discussion of the results and how they help to make a decision.

**Clustering the neighborhoods by their venues is a tool for making a decision based on the client's interests.**

# 5. Discussion about observations:
### Observations about the obtained data.

**Looking at the clusters of neighborhoods obtained from their venues, the second cluster is the most interesting given the client's preferences, since he is looking for a wide range of services, from leisure to transport.**
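
One way to make this comparison more explicit is to score each neighborhood by the share of its venues that fall into categories matching the client's requirements. The following is only an illustrative sketch: the frequencies are copied from the grouped table for two neighborhoods, and the `wanted` set is a hypothetical mapping of the requirements to Foursquare categories, not part of the original analysis.

```python
# Venue frequencies copied from the grouped table for two neighbourhoods.
freqs = {
    "Northwest": {"Bus Stop": 0.142857, "Coffee Shop": 0.142857, "Mountain": 0.142857,
                  "Ski Chairlift": 0.142857, "Trail": 0.428571},
    "Lower Lonsdale": {"Market": 0.25, "Park": 0.5, "Playground": 0.25},
}

# Hypothetical mapping of the client's requirements to Foursquare categories.
wanted = {"Park", "Playground", "Trail", "Ski Chairlift", "Bus Stop", "Coffee Shop"}

# Score = total frequency of the wanted categories in each neighbourhood.
scores = {
    name: sum(freq for category, freq in categories.items() if category in wanted)
    for name, categories in freqs.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

Such a score depends entirely on which categories are counted as relevant, so it supports the reading of the clusters rather than replacing it.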

# 6. Last conclusions:
### Decision taken and conclusion report.

**As explained in the previous section, the client will be given this information in a report showing the best options identified through the geolocation of venues and the clustering of neighborhoods according to the interests of the project.**