# Capstone Project - The Battle of the Neighborhoods

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction/Business Problem
## 1. A description of the problem and a discussion of the background

In this project I will compare the neighborhoods of two large European cities, Amsterdam and Berlin.

My target audience is people who know one of the cities well and want to find out which neighborhood of the other city is most similar to the one they like.

They could use this data to choose the location of their hotel or apartment when visiting or moving to the other city.

The criteria I will take into consideration are population density, area, and nearby venues.
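These criteria still have to be combined into a single notion of "most similar". One possible approach, sketched here as a hypothetical and not necessarily the method used later in the project, is to standardize the numeric criteria (z-scores) and pick the nearest neighbor by Euclidean distance. The `most_similar` helper and the `FEATURES` list are illustrative names, venue data is left out, and the sample rows are copied from the output tables in the Data section.

```python
import pandas as pd

# Hypothetical feature list: two of the criteria named above (venues left out here)
FEATURES = ['Population density per km²', 'Area km²']


def most_similar(target: pd.Series, candidates: pd.DataFrame, features=FEATURES) -> pd.Series:
    """Return the candidate row closest to `target` in standardized feature space."""
    mean = candidates[features].mean()
    std = candidates[features].std()
    # z-score both sides with the candidates' statistics so each feature counts equally
    z_candidates = (candidates[features] - mean) / std
    z_target = (target[features].astype(float) - mean) / std
    # Euclidean distance from the target to every candidate row
    distances = ((z_candidates - z_target) ** 2).sum(axis=1) ** 0.5
    return candidates.loc[distances.idxmin()]


# Rows copied from the Berlin and Amsterdam tables in the Data section
berlin = pd.DataFrame({
    'Postal Code': [10115],
    'Area km²': [2.421],
    'Population density per km²': [6888.89],
})
amsterdam = pd.DataFrame({
    'Postal Code': [1011, 1012, 1015],
    'Area km²': [1.032, 1.207, 0.776],
    'Population density per km²': [6401.16, 5855.01, 7636.6],
})

match = most_similar(berlin.iloc[0], amsterdam)
print(int(match['Postal Code']))  # prints 1012
```

With these few rows the area of 10115 lies far outside the Amsterdam range, so area dominates the distance and 1012 (the largest of the three) wins; with all postal codes and venue features added, the balance between criteria would change.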

# Data 
## 2. A description of the data and how it will be used to solve the problem. 

I will take postal codes, neighborhood names and geolocation data from **https://postal-codes.cybo.com/germany/berlin/#listcodes** and **https://postal-codes.cybo.com/netherlands/amsterdam/#listcodes**. Then, using the Foursquare API, I will obtain data about the venues in each neighborhood to compare them.

The details page for each postal code on **cybo.com** has **Neighborhoods**, **Coordinates**, **area** and **population** fields.
The Foursquare endpoint **https://api.foursquare.com/v2/venues/explore** provides information about the venues.
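A minimal sketch of how a request to this endpoint could be built, assuming the v2 query parameters `client_id`, `client_secret`, `v` (an API version date), `ll`, `radius` and `limit`. The helper name `build_explore_url`, the default version date, radius and limit, and the placeholder credentials are assumptions for illustration only.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Build a venues/explore URL for one point (hypothetical helper, assumed defaults)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,          # API version date, YYYYMMDD (assumed value)
        'll': f'{lat},{lng}',  # "latitude,longitude" of the search center
        'radius': radius,      # search radius in meters (assumed value)
        'limit': limit,        # maximum number of venues to return (assumed value)
    }
    # urlencode percent-escapes the comma in "ll" as %2C
    return f'{EXPLORE_URL}?{urlencode(params)}'


# Placeholder credentials; coordinates of postal code 10115 from the details page
url = build_explore_url(52.533707, 13.387224, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
print(url)
# The JSON response could then be fetched with, e.g., requests.get(url).json()
```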

The first entry in the Berlin postal code list:


Postal Code|City|Administrative Region|Population|Area
 --- | --- | --- | --- | --- 
10115|Berlin|Berlin|16,678|2.421 km²

The fields I will use from its details page, **https://postal-codes.cybo.com/germany/10115_berlin-berlin/**, are:

Field|Value
:--- | ---
**Neighborhoods**|Mitte
**Coordinates**|52.533706954560394° / 13.387223860255002°

The interesting information in the API response is in the `items` list, especially the `venue` fields.
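Assuming the v2 explore response nests its results as `response` → `groups` → `items`, with each item holding a `venue` object that has `name`, `categories` and `location` (`lat`, `lng`), the venue fields could be flattened as sketched below. `extract_venues` is a hypothetical helper, and the sample response is hand-made and abbreviated, not real API output.

```python
def extract_venues(data):
    """Flatten the venue fields of a venues/explore JSON response into a list of dicts."""
    venues = []
    for group in data['response']['groups']:
        for item in group['items']:
            venue = item['venue']
            categories = venue.get('categories', [])
            venues.append({
                'name': venue['name'],
                # A venue may list several categories; keep the first one, if any
                'category': categories[0]['name'] if categories else None,
                'lat': venue['location']['lat'],
                'lng': venue['location']['lng'],
            })
    return venues


# Hand-made sample in the assumed response shape (not real API output)
sample = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Example Coffee',
                           'categories': [{'name': 'Coffee Shop'}],
                           'location': {'lat': 52.5337, 'lng': 13.3872}}},
                {'venue': {'name': 'Example Square',
                           'categories': [],
                           'location': {'lat': 52.5301, 'lng': 13.3900}}},
            ]
        }]
    }
}

for row in extract_venues(sample):
    print(row['name'], row['category'])
```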


The data from cybo.com was collected in a separate notebook; here I import the resulting files:

```python
import pandas as pd
```

```python
def clean_df_codes(df: pd.DataFrame) -> pd.DataFrame:
    # Strip thousands separators such as "16,678" from every string column
    df = df.replace(',', '', regex=True)
    # Convert to numbers; unparseable values become NaN
    df['Population'] = pd.to_numeric(df['Population'], errors='coerce')
    df['Area km²'] = pd.to_numeric(df['Area km²'], errors='coerce')
    # Drop rows with missing or unparseable values
    df.dropna(inplace=True)
    # Inhabitants per km², rounded to two decimals
    df['Population density per km²'] = (df['Population'] / df['Area km²']).round(2)
    df['Population'] = df['Population'].astype(int)
    return df
```

```python
berlin_codes_df = pd.read_csv('berlin_codes.csv', delimiter=';')
berlin_codes_df = clean_df_codes(berlin_codes_df)
berlin_codes_df.head()
```

| | Postal Code | City | Administrative Region | Population | Area km² | Population density per km² |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 10115 | Berlin | Berlin | 16678 | 2.421 | 6888.89 |
| 1 | 10117 | Berlin | Berlin | 24223 | 3.321 | 7293.89 |
| 2 | 10119 | Berlin | Berlin | 7408 | 0.857 | 8644.11 |
| 3 | 10178 | Berlin | Berlin | 14069 | 1.872 | 7515.49 |
| 4 | 10179 | Berlin | Berlin | 15897 | 2.183 | 7282.18 |
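As an alternative to stripping commas with a regex, `pandas.read_csv` can parse the thousands separator itself through its `thousands` argument, which the details cells already use. A minimal sketch on a two-row in-memory sample copied from the table above; the inline CSV only mimics the assumed raw layout of `berlin_codes.csv`. Rows with non-numeric values would still need the `errors='coerce'` and `dropna` steps of `clean_df_codes`.

```python
from io import StringIO

import pandas as pd

# Two rows in the assumed raw layout of berlin_codes.csv (semicolon-separated,
# populations written with a comma as thousands separator)
raw = StringIO(
    'Postal Code;City;Administrative Region;Population;Area km²\n'
    '10115;Berlin;Berlin;16,678;2.421\n'
    '10117;Berlin;Berlin;24,223;3.321\n'
)

# thousands=',' makes "16,678" parse as the integer 16678
df = pd.read_csv(raw, sep=';', thousands=',')
df['Population density per km²'] = (df['Population'] / df['Area km²']).round(2)
print(df[['Postal Code', 'Population', 'Population density per km²']])
```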


```python
berlin_details_df = pd.read_csv('berlin_details.csv', delimiter=';', thousands=',')
berlin_details_df.head()
```

| | Postal Code | Median Age | Neighborhoods | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | 10115 | 43.0 | Mitte | 52.533707 | 13.387224 |
| 1 | 10117 | 43.0 | Mitte | 52.518746 | 13.390193 |
| 2 | 10119 | 43.0 | Bezirk Pankow, Mitte | 52.532666 | 13.407149 |
| 3 | 10178 | 43.0 | Mitte | 52.523474 | 13.412203 |
| 4 | 10179 | 43.0 | Luisenstadt, Mitte | 52.514591 | 13.419699 |
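The **Neighborhoods** column can hold several comma-separated names per postal code (for example `Luisenstadt, Mitte`). If neighborhoods are later compared one by one, a sketch like the following, using the five Berlin rows shown above, would give each postal code/neighborhood pair its own row with `Series.str.split` and `DataFrame.explode`. The `Neighborhood` column name is an illustrative assumption.

```python
import pandas as pd

# The five rows shown above (only the columns needed here)
details = pd.DataFrame({
    'Postal Code': [10115, 10117, 10119, 10178, 10179],
    'Neighborhoods': ['Mitte', 'Mitte', 'Bezirk Pankow, Mitte', 'Mitte', 'Luisenstadt, Mitte'],
})

# Split "A, B" into ["A", "B"], then give each list element its own row
long_df = (
    details.assign(Neighborhood=details['Neighborhoods'].str.split(', '))
           .explode('Neighborhood')
           .drop(columns='Neighborhoods')
           .reset_index(drop=True)
)

# How many postal codes touch each neighborhood
print(long_df['Neighborhood'].value_counts())
```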


```python
amsterdam_codes_df = pd.read_csv('amsterdam_codes.csv', delimiter=';')
amsterdam_codes_df = clean_df_codes(amsterdam_codes_df)
amsterdam_codes_df.rename(columns={'Postal District': 'Postal Code'}, inplace=True)
amsterdam_codes_df.head()
```

| | Postal Code | City | Administrative Region | Population | Area km² | Population density per km² |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1011 | Amsterdam | North Holland | 6606 | 1.032 | 6401.16 |
| 1 | 1012 | Amsterdam | North Holland | 7067 | 1.207 | 5855.01 |
| 2 | 1013 | Amsterdam | North Holland | 26792 | 6.3 | 4252.7 |
| 3 | 1014 | Amsterdam | North Holland | 15056 | 2.699 | 5578.36 |
| 4 | 1015 | Amsterdam | North Holland | 5926 | 0.776 | 7636.6 |


```python
amsterdam_details_df = pd.read_csv('amsterdam_details.csv', delimiter=';', thousands=',')
amsterdam_details_df.head()
```

| | Postal Code | Median Age | Neighborhoods | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | 1011 | 36.3 | Amsterdam-Centrum, Stadsdeel Centrum | 52.371124 | 4.903752 |
| 1 | 1012 | 36.3 | Amsterdam-Centrum, Centrum, De Wallen, Stadsde... | 52.373179 | 4.89491 |
| 2 | 1013 | 36.3 | Amsterdam-Centrum, Amsterdam-West, Haarlemmerb... | 52.387662 | 4.883396 |
| 3 | 1014 | 36.3 | Amsterdam-West, Bedrijventerrein Sloterdijk, S... | 52.393066 | 4.853503 |
| 4 | 1015 | 36.3 | Amsterdam-Centrum, Grachtengordel-West, Jordaan | 52.378205 | 4.882973 |


```python
berlin_merged = pd.merge(berlin_codes_df, berlin_details_df, on='Postal Code', how='left')
berlin_merged.head()
```

| | Postal Code | City | Administrative Region | Population | Area km² | Population density per km² | Median Age | Neighborhoods | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 10115 | Berlin | Berlin | 16678 | 2.421 | 6888.89 | 43.0 | Mitte | 52.533707 | 13.387224 |
| 1 | 10117 | Berlin | Berlin | 24223 | 3.321 | 7293.89 | 43.0 | Mitte | 52.518746 | 13.390193 |
| 2 | 10119 | Berlin | Berlin | 7408 | 0.857 | 8644.11 | 43.0 | Bezirk Pankow, Mitte | 52.532666 | 13.407149 |
| 3 | 10178 | Berlin | Berlin | 14069 | 1.872 | 7515.49 | 43.0 | Mitte | 52.523474 | 13.412203 |
| 4 | 10179 | Berlin | Berlin | 15897 | 2.183 | 7282.18 | 43.0 | Luisenstadt, Mitte | 52.514591 | 13.419699 |
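Because `how='left'` keeps every postal code from the codes file, a code without a matching details row ends up with `NaN` coordinates and neighborhoods. A small check, sketched here on hypothetical toy frames in which one code is deliberately missing from the details, uses the `indicator=True` option of `pd.merge` to flag such rows and `validate='one_to_one'` to raise an error if a postal code is duplicated on either side.

```python
import pandas as pd

# Hypothetical toy data: 10119 is intentionally absent from the details frame
codes = pd.DataFrame({'Postal Code': [10115, 10117, 10119],
                      'Population': [16678, 24223, 7408]})
details = pd.DataFrame({'Postal Code': [10115, 10117],
                        'Latitude': [52.533707, 52.518746],
                        'Longitude': [13.387224, 13.390193]})

merged = pd.merge(codes, details, on='Postal Code', how='left',
                  indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
                  validate='one_to_one')   # raises MergeError on duplicate keys
missing = merged.loc[merged['_merge'] == 'left_only', 'Postal Code']
print(missing.tolist())  # prints [10119]
```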


```python
amsterdam_merged = pd.merge(amsterdam_codes_df, amsterdam_details_df, on='Postal Code', how='left')
amsterdam_merged.head()
```

| | Postal Code | City | Administrative Region | Population | Area km² | Population density per km² | Median Age | Neighborhoods | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1011 | Amsterdam | North Holland | 6606 | 1.032 | 6401.16 | 36.3 | Amsterdam-Centrum, Stadsdeel Centrum | 52.371124 | 4.903752 |
| 1 | 1012 | Amsterdam | North Holland | 7067 | 1.207 | 5855.01 | 36.3 | Amsterdam-Centrum, Centrum, De Wallen, Stadsde... | 52.373179 | 4.89491 |
| 2 | 1013 | Amsterdam | North Holland | 26792 | 6.3 | 4252.7 | 36.3 | Amsterdam-Centrum, Amsterdam-West, Haarlemmerb... | 52.387662 | 4.883396 |
| 3 | 1014 | Amsterdam | North Holland | 15056 | 2.699 | 5578.36 | 36.3 | Amsterdam-West, Bedrijventerrein Sloterdijk, S... | 52.393066 | 4.853503 |
| 4 | 1015 | Amsterdam | North Holland | 5926 | 0.776 | 7636.6 | 36.3 | Amsterdam-Centrum, Grachtengordel-West, Jordaan | 52.378205 | 4.882973 |
