# Capstone Project - The Battle of Neighborhoods (Week 1)
## Issue Statement:
Large cities are highly diverse and often serve as the financial centres of their countries. To stay close to markets and customers, companies face a challenging choice: where should a future regional headquarters be located to best serve the company's purpose? Multiple factors, such as taxation, labour and skill abundance and political stability, are essential variables in this decision.

For this Capstone assignment, we assume that we are advising a company on where to locate its European headquarters. The company has pre-selected three cities in which it already has a local office:

* [London (GBR)](https://en.wikipedia.org/wiki/London)
* [Zurich (CHE)](https://en.wikipedia.org/wiki/Z%C3%BCrich)
* [Barcelona (ESP)](https://en.wikipedia.org/wiki/Barcelona)

Each of these cities is an attractive place to live and work. All are close to international airports and popular with expats. The organisation wants to choose the location of its new headquarters based on the following decision matrix:

<table><thead>
<tr><th>Nb.</th><th>Attribute</th><th>Details</th><th>Weight</th></tr>
</thead>
<tbody>
<tr><td>1.</td><td>Labour climate</td><td>Labour cost, productivity, skills availability</td><td>6</td></tr>
<tr><td>2.</td><td>Political environment</td><td>Effectiveness of government, policy consistency, corruption</td><td>7</td></tr>
<tr><td>3.</td><td>Access to universities</td><td>Nb. of universities and reputation, ranking, size</td><td>5</td></tr>
<tr><td>4.</td><td>Quality of life</td><td>Standard of living, recreation, health</td><td>4</td></tr>
<tr><td>5.</td><td>Labour and skill abundance</td><td>Skill level, unemployment rate, labour unions, wage rate</td><td>1</td></tr>
<tr><td>6.</td><td>Cost of labour</td><td>Productivity, exchange rate</td><td>2</td></tr>
<tr><td>7.</td><td>Tax structure</td><td>Corp. tax rate, social sec. cost, regulatory barriers</td><td>3</td></tr>
</tbody>
</table>
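
A minimal sketch of how the decision matrix could be applied, assuming that each city is scored per attribute on a common scale (for example 1 to 10) and that a higher weight marks a more important attribute. The attribute keys and the `weighted_score` / `rank_cities` helpers are hypothetical names for illustration; no city scores are given here because they are the output of the data analysis.

```python
# Weights copied from the decision matrix above.
# Assumption: a higher weight means a more important attribute.
WEIGHTS = {
    "labour_climate": 6,
    "political_environment": 7,
    "access_to_universities": 5,
    "quality_of_life": 4,
    "labour_and_skill_abundance": 1,
    "cost_of_labour": 2,
    "tax_structure": 3,
}


def weighted_score(scores: dict[str, float], weights: dict[str, int] = WEIGHTS) -> float:
    """Return the weight-normalised average of one city's attribute scores.

    `scores` maps each attribute key to a score on a common (hypothetical)
    scale; every attribute in `weights` must be scored.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[attr] * scores[attr] for attr in weights) / total_weight


def rank_cities(city_scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank cities by weighted score, best first."""
    ranked = [(city, weighted_score(scores)) for city, scores in city_scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Dividing by the total weight keeps the result on the same scale as the input scores, so the ranking stays readable if the board later adjusts individual weights.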

Special focus is placed on suitability for "Gen-Y", as this generation will become both the predominant customer base and the main source of future employees and leaders.

Generation Y, also known as "Millennials", is the demographic cohort following Generation X, with the early 1980s as starting birth years and the mid-1990s to early 2000s as ending birth years ([Wikipedia](https://en.wikipedia.org/wiki/Millennials)). Millennials have the following characteristics:
* Millennials are tech-savvy: they grew up with technology and rely on it.
* Millennials are family-centric and are willing to trade high pay for fewer billable hours, flexible schedules, and a better work/life balance. 
* Millennials are confident, ambitious, achievement-oriented but have high expectations towards their employers and aren't afraid to question authority. Generation Y wants meaningful work and a solid learning curve.
* Millennials are team-oriented. They want to be included and involved. 
* Generation Y craves attention, feedback and guidance. 
* Generation Y is prone to "job-hopping" as they're always looking for something new and better. 

An interesting summary is provided by [Goldman Sachs](https://www.goldmansachs.com/insights/archive/millennials/). In any case, the new location should be "cool" with regard to this generation's lifestyle and preferences, giving employees a natural reason to stay and limiting the need to relocate.

## Success criteria and audience:
The analysis is successful if, based on the obtained data set, a recommendation of a) the city and b) the borough within that city can be made to the board of directors. The board shall be able to take the decision with confidence and move on to initiate the project "The New EUR Headquarters".

The decision makers are not interested in the data-gathering process itself, but very much in the validity and robustness of the findings used to determine the cost and prospects of the setup.

## Methodology:
1. Define and agree on the decision matrix with the board, i.e. ensure the weights and priorities reflect the client's needs.
2. Obtain the data, i.e. check which data can support each attribute in the matrix.
3. Clean and harmonise the data from the various sources.
4. Prepare the data set and analyse it for the three cities.
5. Cluster the data, determining the optimal number of clusters for the pre-selected cities, to allow a pre-selection of the recommended city.
6. Based on the pre-selection, compare the boroughs to find the best match for the company's needs.
7. Summarise the findings in a recommendation.
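
Step 5 does not name a criterion for the "optimal" number of clusters. One common option is to fit k-means for several values of k and keep the one with the highest silhouette score (the elbow method on k-means inertia is an alternative). The sketch below assumes scikit-learn and NumPy are installed and that each row of `features` describes one borough, for example its venue-category frequencies; `best_k` is a hypothetical helper, not the author's code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k(features: np.ndarray, k_range=range(2, 9), random_state: int = 0) -> tuple[int, dict[int, float]]:
    """Fit k-means for each k and return the k with the highest silhouette score.

    `features` is an (n_samples, n_features) array. The silhouette score needs
    2 <= k <= n_samples - 1, so larger k values are skipped.
    """
    scores = {}
    for k in k_range:
        if k >= len(features):
            break
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    if not scores:
        raise ValueError("not enough samples for any k in k_range")
    best = max(scores, key=scores.get)
    return best, scores
```

Features on very different scales (e.g. rent in currency units next to venue frequencies) should be standardised first, otherwise the largest-valued column dominates the distances.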

## Data Acquisition and Cleaning
### Data Sources:
The assignment draws on a wide variety of well-known and well-regarded data sources. The emphasis is on applying three different methods to obtain the data:

* a) direct API connection
* b) downloading CSV and XLSX files
* c) web crawling, i.e. extracting information directly from the HTML code
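
For web crawling, pandas' `read_html` is the usual shortcut, but it needs an additional parser library (lxml, or BeautifulSoup with html5lib). The dependency-free sketch below uses the standard library's `html.parser` to show the underlying idea: walk the tags and collect the cell text row by row. It assumes a simple, non-nested table and leaves out fetching the page (e.g. with `urllib.request.urlopen`); `TableParser` and `parse_html_table` are illustrative names.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_html_table(html: str) -> list[list[str]]:
    """Return the rows of a simple HTML table as lists of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```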

All data sources are merged into one data frame based on their geographic identifier (postal code or borough).
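
A minimal sketch of such a merge with pandas, assuming both tables have already had their postal-code column renamed to a common `postcode` column (a hypothetical name). The key is normalised first, because the same code can arrive as an integer from one source and as a padded string from another; `merge_on_postcode` is an illustrative helper.

```python
import pandas as pd


def merge_on_postcode(left: pd.DataFrame, right: pd.DataFrame, key: str = "postcode") -> pd.DataFrame:
    """Left-join `right` onto `left` by a normalised postal-code key."""

    def normalise(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        # Harmonise the key: string type, trimmed, upper case
        out[key] = out[key].astype(str).str.strip().str.upper()
        return out

    merged = pd.merge(
        normalise(left),
        normalise(right),
        on=key,
        how="left",
        validate="many_to_one",  # each postcode may appear only once on the right
        indicator=True,
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    if unmatched:
        print(f"{unmatched} row(s) without a match in the right-hand table")
    return merged.drop(columns="_merge")
```

Note that `astype(str)` turns a float column such as `8005.0` into `"8005.0"`, so postal codes read as floats (because of missing values) should be converted to integers or strings before merging.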

**Sources used:**
* [Foursquare API](https://foursquare.com/developers/)
* [The World Bank](https://databank.worldbank.org/source/worldwide-governance-indicators#)
* [OECD](https://stats.oecd.org/)
* [Rental prices, London](https://www.rentbarometer.com/london/all-prices/by-name.html)
* [Rental prices, Zurich](https://www.hev-schweiz.ch/vermieten/statistiken/mietpreise/durchschnittliche-mietpreise/)
* [House prices per m² by district, Barcelona](https://www.statista.com/statistics/765380/average-price-per-square-meter-of-houses-in-barcelona-by-district/)
* [Statista](https://www.statista.com/)
* [Transparency International, Corruption Perceptions Index 2019](https://www.transparency.org/en/cpi/2019/results/table)
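
For the API route, the sketch below builds a request URL for Foursquare's legacy v2 `venues/explore` endpoint, which returns venues around a latitude/longitude pair. The endpoint and parameter names reflect the v2 API as used in many Coursera notebooks; Foursquare has since introduced a newer Places API with different authentication, so treat this as an assumption to check against the current documentation. `explore_url` is a hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 endpoint (assumption: verify against current API docs)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(client_id: str, client_secret: str, lat: float, lng: float,
                radius: int = 500, limit: int = 100, version: str = "20200601") -> str:
    """Build a venues/explore request URL for the area around (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",  # centre point of the search
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"
```

The response can then be fetched with `urllib.request.urlopen` or `requests.get(...).json()`. Real credentials should be kept out of notebooks that are shared.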

**Postal codes and boroughs:**
* [London](https://www.doogal.co.uk/london_postcodes.php)
* [Barcelona](https://en.wikipedia.org/wiki/List_of_postal_codes_in_Spain#08000%E2%80%9308999:_Barcelona)
* [Zurich](https://www.geonames.org/postal-codes/CH/ZH/zurich.html)

### Data Cleaning:
The data shall be inspected, cleaned and verified to ensure:
* high validity, i.e. the degree to which the data conform to defined business rules or constraints
* high accuracy, i.e. the degree to which the data are close to the true values
* high completeness, i.e. the degree to which all required data are known
* high consistency, i.e. the degree to which the data agree within the data set and across sources
* high uniformity, i.e. the degree to which the data are specified using the same units of measure
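
Some of these criteria can be measured automatically. The sketch below, assuming a pandas data frame with `Latitude` and `Longitude` columns (as in the London postcode file), reports completeness per column, a simple validity rule (coordinates within the WGS84 range) and exact duplicate rows as a consistency signal. Accuracy and uniformity need comparison with reference sources or explicit unit and currency conversion and are not covered. `quality_report` is an illustrative name.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, lat_col: str = "Latitude", lon_col: str = "Longitude") -> dict:
    """Compute simple indicators for some of the data-quality criteria above."""
    report = {}
    # Completeness: share of non-missing values per column
    report["completeness"] = df.notna().mean().round(3).to_dict()
    # Validity: coordinates must lie within the WGS84 range
    # (missing coordinates also fail this check)
    lat_ok = df[lat_col].between(-90, 90)
    lon_ok = df[lon_col].between(-180, 180)
    report["invalid_coordinates"] = int((~(lat_ok & lon_ok)).sum())
    # Consistency: exact duplicate rows often point to overlapping sources
    report["duplicate_rows"] = int(df.duplicated().sum())
    return report
```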

### Acquiring Data sets:

```python
# Download London/UK postal codes along with longitude and latitude information
import pandas as pd

plz_lon = pd.read_csv('https://www.doogal.co.uk/UKPostcodesCSV.ashx?area=London')
plz_lon.head()
```

| | Postcode | In Use? | Latitude | Longitude | Easting | Northing | Grid Ref | County | District | Ward | ... | User Type | Last updated | Nearest station | Distance to station | Postcode area | Postcode district | Police force | Water company | Plus Code | Average Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BR1 1AA | Yes | 51.401546 | 0.015415 | 540291 | 168873 | TQ402688 | Greater London | Bromley | Bromley Town | ... | 0 | 2020-06-03 | Bromley South | 0.218257 | BR | BR1 | Metropolitan Police | Thames Water | 9F32C228+J5 | 63100 |
| 1 | BR1 1AB | Yes | 51.406333 | 0.015208 | 540262 | 169405 | TQ402694 | Greater London | Bromley | Bromley Town | ... | 0 | 2020-06-03 | Bromley North | 0.253666 | BR | BR1 | Metropolitan Police | Thames Water | 9F32C248+G3 | 56100 |
| 2 | BR1 1AD | No | 51.400057 | 0.016715 | 540386 | 168710 | TQ403687 | Greater London | Bromley | Bromley Town | ... | 1 | 2020-06-03 | Bromley South | 0.044559 | BR | BR1 | Metropolitan Police | | 9F32C228+2M | 63100 |
| 3 | BR1 1AE | Yes | 51.404543 | 0.014195 | 540197 | 169204 | TQ401692 | Greater London | Bromley | Bromley Town | ... | 0 | 2020-06-03 | Bromley North | 0.462939 | BR | BR1 | Metropolitan Police | Thames Water | 9F32C237+RM | 63100 |
| 4 | BR1 1AF | Yes | 51.401392 | 0.014948 | 540259 | 168855 | TQ402688 | Greater London | Bromley | Bromley Town | ... | 0 | 2020-06-03 | Bromley South | 0.227664 | BR | BR1 | Metropolitan Police | Thames Water | 9F32C227+HX | 63100 |
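
The London file has one row per postcode, whereas step 6 of the methodology compares boroughs. A sketch of collapsing it to one row per `District` (which in this file appears to correspond to the London borough, e.g. Bromley), keeping only postcodes marked as in use. Averaging `Average Income` over postcodes is an unweighted simplification, and `district_centroids` is a hypothetical helper; it would be called as `district_centroids(plz_lon)`.

```python
import pandas as pd


def district_centroids(postcodes: pd.DataFrame) -> pd.DataFrame:
    """Mean location and income per district, using only postcodes in use."""
    in_use = postcodes[postcodes["In Use?"] == "Yes"]
    return (
        in_use.groupby("District", as_index=False)
        .agg(
            Latitude=("Latitude", "mean"),
            Longitude=("Longitude", "mean"),
            AverageIncome=("Average Income", "mean"),
            Postcodes=("Postcode", "count"),
        )
    )
```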


*The code for this cell was removed by Watson Studio for sharing; its output, the Zurich postal codes, is shown below.*

| | Poastalcode | District | State | Valid | geo_point_2d |
|---|---|---|---|---|---|
| 0 | 8000 | Zurich | ZH | 01.01.90 | |
| 1 | 8005 | Zurich | ZH | 01.01.90 | 47.3889310308, 8.52186331434 |
| 2 | 8010 | Zurich BZ FP | ZH | 01.03.03 | |
| 3 | 8022 | Zurich | ZH | 10.01.98 | |
| 4 | 8024 | Zurich | ZH | 01.01.90 | |
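
In the Zurich table the coordinates are stored as one text column, `geo_point_2d`, in the form "latitude, longitude", and several rows have none. Before merging with the other sources, a cleaning step along these lines could split the column into numeric `Latitude`/`Longitude` columns (matching the London file), drop rows without coordinates and fix the misspelt `Poastalcode` header. This is a sketch assuming the data frame looks like the output above; `split_geo_point` is a hypothetical helper.

```python
import pandas as pd


def split_geo_point(df: pd.DataFrame, col: str = "geo_point_2d") -> pd.DataFrame:
    """Split a 'lat, lon' text column into numeric Latitude/Longitude columns.

    Rows without coordinates are dropped and the misspelt 'Poastalcode'
    header is renamed to 'Postalcode'.
    """
    out = df.rename(columns={"Poastalcode": "Postalcode"})
    out = out.dropna(subset=[col]).copy()
    coords = out[col].str.split(",", expand=True)
    out["Latitude"] = pd.to_numeric(coords[0].str.strip())
    out["Longitude"] = pd.to_numeric(coords[1].str.strip())
    return out.drop(columns=col)
```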
