# Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods (Week 1)


Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

1) In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.

2) In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

These are just a couple of the many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to justify why you think the problem you want to solve is important and why a client or a group of people would be interested in your project.

**REVIEW CRITERIA**

This capstone project will be graded by your peers. This capstone project is worth 70% of your total grade. The project will be completed over the course of 2 weeks. Week 1 submissions will be worth 30% whereas week 2 submissions will be worth 40% of your total grade.

For this week, you will be required to submit the following:

1) A description of the problem and a discussion of the background. (15 marks)
2) A description of the data and how it will be used to solve the problem. (15 marks)

For the second week, the final deliverables of the project will be:

1) A link to your Notebook on your Github repository, showing your code. (15 marks)
2) A full report consisting of all of the following components (15 marks):

- Introduction where you discuss the business problem and who would be interested in this project.
- Data where you describe the data that will be used to solve the problem and the source of the data.
- Methodology section, which represents the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and which machine learning techniques were used and why.
- Results section where you discuss the results.
- Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
- Conclusion section where you conclude the report.
3) Your choice of a presentation or blog post. (10 marks)


----


Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

This submission will eventually become your Introduction/Business Problem section in your final report. So I recommend that you push the report (having your Introduction/Business Problem section only for now) to your Github repository and submit a link to it.

-----------


# Introduction & Business Problem

**Problem Background:**

I have worked for Bureau van Dijk for over two years, and we are encouraged to think about tools that could improve the customer experience in our products. Since we specialize in banking and insurance, one thing currently missing from our databases is a map showing the location of banks and bank branches in Brussels. This project is a proof of concept (POC) that could later be applied to the larger regions covered by our products.
This study sheds light on one factor that shapes a bank branch network: location. We will analyze the locations of bank branches on the basis of their geographical characteristics and image.

This type of analysis could interest our customers, since banking institutions already carry out their own studies and analyses of branch locations.

A study done in 2019 shows that, although banking preferences have shifted toward desktop and mobile convenience, 30% of respondents still prefer to go to the bank in person. Despite the ability to perform many banking operations on a mobile or desktop device, the physical proximity of a bank still matters to banking customers, which is why branch location remains a relevant question for us.
Finally, this project is interesting not only to banks for its analytical value but also to other users, as it provides a visual tool that will improve their overall experience.

**Problem Description:**

A bank's branch strategy takes two forms: establishing a new branch or relocating an existing one. In both cases, location emerges as the most important issue. A branch is directly tied to the growth of a bank, as it is the frontline of sales. Thus, when choosing a location for a branch, each bank takes various factors into account, such as the level of income, branch functions, competition, land value, growth potential, and the number of financial institutions. A good location means attracting more customers.

The project consists of segmenting and clustering banking neighborhoods in Brussels. We will convert addresses into their equivalent latitude and longitude values. We will use the Foursquare API to explore neighborhoods in Brussels: we will use the explore function to get all banks in each neighborhood, and then use this feature to group the neighborhoods into clusters with the k-means clustering algorithm. Finally, we will use the Folium library to visualize the banking neighborhoods in Brussels and their emerging clusters.

This will allow banks to study where they could open a new branch: either outside the biggest clusters, where there is higher demand and less competition, or inside the most popular ones, where they would reach a broader range of customers.
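
The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's final code: the neighborhood names come from the Brussels dataset used in this notebook, but the bank hits are made-up placeholders rather than real Foursquare results, and `kmeans_1d` is a hypothetical pure-Python stand-in for `sklearn.cluster.KMeans`, used only so that the example runs without extra dependencies.

```python
from collections import Counter


def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means (Lloyd's algorithm); a stand-in for sklearn's KMeans."""
    ordered = sorted(set(values))
    # Deterministic initialisation: spread the k centroids across the sorted values.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids


neighborhoods = ["ANDERLECHT CENTRE - WAYEZ", "SCHEUT", "BIZET - ROUE - CERIA", "NEERPEDE"]

# Placeholder search results: one entry per bank found, tagged with its neighborhood.
bank_hits = ["ANDERLECHT CENTRE - WAYEZ"] * 5 + ["SCHEUT"] * 4 + ["BIZET - ROUE - CERIA"]

# Feature used for clustering: number of banks per neighborhood (0 if none was found).
counts = Counter(bank_hits)
features = [counts[name] for name in neighborhoods]

labels, centroids = kmeans_1d(features, k=2)
for name, n_banks, label in zip(neighborhoods, features, labels):
    print(f"{name}: {n_banks} banks -> cluster {label}")
```

With these placeholder counts, the two bank-dense neighborhoods end up in one cluster and the two sparse ones in the other, which is the kind of split a bank could use to compare high-competition and low-competition areas.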

**Banking population in Brussels:**

A quick search on Orbis gives us the following data:

- Number of active banks in Brussels: 469
- Of which branches: 293


**Target Audience:**

The target audience for this project is Bureau van Dijk and its customers.


**Success Criteria:**

The project will be a success if it delivers a convincing POC that leads Bureau van Dijk to approve the project and a further exploration of this issue.


----

Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

This submission will eventually become your Data section in your final report. So I recommend that you push the report (having your Data section) to your Github repository and submit a link to it.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Download and Explore Dataset</a>


2.  <a href="#item2">Explore Neighborhoods in Brussels</a>


3.  <a href="#item3">Analyze Each Neighborhood</a>


4.  <a href="#item4">Cluster Neighborhoods</a>


5.  <a href="#item5">Examine Clusters</a>  
    </font>
    </div>


Before we get the data and start exploring it, let's import all the dependencies that we will need.


In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# uncomment the next line if you haven't completed the Foursquare API lab
# (the comment must sit on its own line, otherwise conda receives it as arguments)
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
# transform JSON file into a pandas dataframe
# (pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


# 1. Download and Explore Dataset

Brussels has a total of 19 communes (or 22 if you divide Brussels Centre into 4 different ones). In order to segment the communes and explore them, we need a dataset that contains them as well as the latitude and longitude coordinates of each.

I wasn't able to find this dataset for free on the web, so I had to create it myself using a mix of information found on the web and in BvD products. Here is the link to the dataset:

"-------"


**Load and explore the data**

Next, let's load the data and take a quick look at our dataframe.

In [3]:
df = pd.read_excel(r'c://Users/PristerM/Desktop/Wijken_Quartiers.xls')
df

| Unnamed: 0 | Numéro Quartier | Nom | Communes | Latitude | Longitude | Unnamed: 5 | Bruxelles | Latitude.1 | 50.8466 | Longitude .1 | 4.3528 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | CUREGHEM VETERINAIRE | Anderlecht | 50.8333 | 4.3 | | Anderlecht | Latitude | 50.8333 | Longitude | 4.3 |
| 1 | 52 | VEEWEYDE - AURORE | Anderlecht | 50.8333 | 4.3 | | Auderghem | Latitude | 50.8167 | Longitude | 4.4333 |
| 2 | 53 | BIZET - ROUE - CERIA | Anderlecht | 50.8333 | 4.3 | | Berchem-Sainte-Agathe | Latitude | 50.8667 | Longitude | 4.2833 |
| 3 | 54 | VOGELENZANG - ERASME | Anderlecht | 50.8333 | 4.3 | | Etterbeek | Latitude | 50.8358 | Longitude | 4.388 |
| 4 | 55 | NEERPEDE | Anderlecht | 50.8333 | 4.3 | | Evere | Latitude | 50.8667 | Longitude | 4.4 |
| 5 | 56 | BON AIR | Anderlecht | 50.8333 | 4.3 | | Forest | Latitude | 50.8 | Longitude | 4.3167 |
| 6 | 57 | SCHERDEMAEL | Anderlecht | 50.8333 | 4.3 | | Ganshoren | Latitude | 50.8667 | Longitude | 4.3 |
| 7 | 58 | ANDERLECHT CENTRE - WAYEZ | Anderlecht | 50.8333 | 4.3 | | Ixelles | Latitude | 50.8333 | Longitude | 4.3667 |
| 8 | 59 | SCHEUT | Anderlecht | 50.8333 | 4.3 | | Jette | Latitude | 50.8667 | Longitude | 4.3333 |
| 9 | 60 | BUFFON | Anderlecht | 50.8333 | 4.3 | | Koekelberg | Latitude | 50.8667 | Longitude | 4.3333 |
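
The preview suggests that the sheet holds two tables side by side: a list of neighborhoods on the left and a commune-to-coordinates lookup on the right (the columns from `Bruxelles` onward, whose first row appears to have been read as column headers). Assuming that reading is correct, the sketch below shows how the lookup could be used to attach coordinates to each neighborhood. It uses plain Python structures with a few values copied from the preview; the function name `attach_coordinates` and the dictionary keys are hypothetical.

```python
# Commune -> (latitude, longitude), copied from the right-hand columns of the preview.
commune_coords = {
    "Bruxelles": (50.8466, 4.3528),
    "Anderlecht": (50.8333, 4.3),
    "Auderghem": (50.8167, 4.4333),
    "Etterbeek": (50.8358, 4.388),
}

# (Numéro Quartier, Nom, Commune), copied from the left-hand columns of the preview.
neighborhood_rows = [
    (12, "CUREGHEM VETERINAIRE", "Anderlecht"),
    (52, "VEEWEYDE - AURORE", "Anderlecht"),
    (59, "SCHEUT", "Anderlecht"),
]


def attach_coordinates(rows, coords):
    """Return one dict per neighborhood with its commune's coordinates attached."""
    result = []
    for number, name, commune in rows:
        lat, lon = coords[commune]
        result.append({"number": number, "name": name, "commune": commune,
                       "latitude": lat, "longitude": lon})
    return result


located = attach_coordinates(neighborhood_rows, commune_coords)

# Count how many distinct coordinate pairs the neighborhoods actually have.
distinct_points = {(row["latitude"], row["longitude"]) for row in located}
print(len(located), "neighborhoods,", len(distinct_points), "distinct coordinate pair(s)")
```

One consequence worth noting: because every neighborhood inherits its commune's coordinates, all Anderlecht neighborhoods share a single point, so a radius search around them would return the same venues. Per-neighborhood coordinates would be needed for a finer segmentation.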


#### Let's check the size of the resulting dataframe


In [7]:
print(df.shape)

(145, 3)


**Define Foursquare credentials and version**

In [4]:
CLIENT_ID = 'ZW31EMD1O0ZK0NNSV30GYDLCDSI2HDVXMDMEGB3ECTRYCMJP' # your Foursquare ID
CLIENT_SECRET = 'A44GWGMHYV2PMWGN3T0MYQHB0KJJUIXESHZBGBLGRGGE50KT' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ZW31EMD1O0ZK0NNSV30GYDLCDSI2HDVXMDMEGB3ECTRYCMJP
CLIENT_SECRET:A44GWGMHYV2PMWGN3T0MYQHB0KJJUIXESHZBGBLGRGGE50KT
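
Since this notebook is meant to be pushed to a public repository, it may be safer not to hard-code or print the credentials in full. The sketch below is one possible approach, not part of the original lab: it reads the values from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical choices made for this example.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables (names are assumptions)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demonstration with a fake environment; in the notebook, call the function without arguments.
demo_env = {"FOURSQUARE_CLIENT_ID": "id123456", "FOURSQUARE_CLIENT_SECRET": "secret123"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
print("CLIENT_SECRET:", mask(demo_secret))
```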


In [8]:
address = 'Brussels'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

GeocoderUnavailable: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Max retries exceeded with url: /search?q=Brussels&format=json&limit=1 (Caused by SSLError(SSLError("bad handshake: SysCallError(10054, 'WSAECONNRESET')",),))
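
The geocoding request above failed because the Nominatim service could not be reached, so `latitude` and `longitude` were never assigned. A hedged sketch of one way to keep the notebook running in that case is shown below: wrap the lookup and fall back to known coordinates. The helper `geocode_with_fallback` is hypothetical, and the fallback values assume that the `Bruxelles` entry in the dataset preview (50.8466, 4.3528) is the city centre. To keep the example runnable offline, a fake geocoder that always fails stands in for `geolocator.geocode`.

```python
def geocode_with_fallback(geocode, address, fallback):
    """Return (latitude, longitude) for address, or fallback if the lookup fails."""
    try:
        location = geocode(address)
    except Exception as error:  # e.g. geopy's GeocoderUnavailable
        print(f"Geocoding failed ({error!r}); using fallback coordinates.")
        return fallback
    if location is None:  # geopy returns None when nothing matches the query
        return fallback
    return location.latitude, location.longitude


# Assumption: these are the Brussels centre coordinates listed in the dataset preview.
BRUSSELS_CENTRE = (50.8466, 4.3528)


def unavailable_geocoder(address):
    """Fake geocoder that simulates the connection error seen above."""
    raise ConnectionError("service unreachable")


latitude, longitude = geocode_with_fallback(unavailable_geocoder, "Brussels", BRUSSELS_CENTRE)
print(latitude, longitude)
```

In the notebook itself, the call would presumably look like `geocode_with_fallback(geolocator.geocode, address, BRUSSELS_CENTRE)`.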

The Foursquare venue search request has the following structure:

https://api.foursquare.com/v2/venues/search?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&ll=LATITUDE,LONGITUDE&v=VERSION&query=QUERY&radius=RADIUS&limit=LIMIT

In [5]:
search_query = 'Bank'
radius = 500
print(search_query + ' .... OK!')

Bank .... OK!


In [6]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

NameError: name 'latitude' is not defined
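
The `NameError` above occurs because the geocoding cell failed, so `latitude` and `longitude` do not exist when the URL is built. The sketch below shows one way to build the same request once coordinates are available, using `urllib.parse.urlencode` so that each parameter is escaped properly. The function `build_search_url` is a hypothetical helper, the credentials are placeholders, and the coordinates assume that the `Bruxelles` entry in the dataset preview (50.8466, 4.3528) is the city centre.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, latitude, longitude,
                     version="20180604", query="Bank", radius=500, limit=30):
    """Build a Foursquare venue search URL following the template shown above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{latitude},{longitude}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "ll" readable instead of encoding it as %2C.
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params, safe=",")


# Placeholder credentials; the coordinates are an assumption taken from the dataset preview.
url = build_search_url("CLIENT_ID", "CLIENT_SECRET", 50.8466, 4.3528)
print(url)
```

The default arguments mirror the values defined earlier in the notebook (`VERSION`, `search_query`, `radius`, and `LIMIT`).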