# The Battle of Neighboroughs - Part 1

## Background of the Business Problem

Milan, a metropolis in Italy's northern Lombardy region, is a global capital of fashion and design. Home to the national stock exchange, it is also a financial hub known for its high-end restaurants and shops, which makes it an attractive place to start a new business.
During the daytime, especially in the morning and at lunchtime, office areas offer large opportunities for coffee shops. Reasonably priced shops are typically full during lunch hours (11 am to 2 pm). Given this scenario, we will go through the benefits and pitfalls of opening a combined breakfast and lunch coffee shop in dense office areas.
The city of Milan is divided into 9 municipalities (municipi), but to target daily office workers I will later concentrate on its 4 busiest business boroughs: Centro Storico, Stazione Centrale, Città Studi and Porta Garibaldi.
We will go through each step of this project and address them separately. I first outline the initial data preparation and then describe the next steps for the battle of neighborhoods in Milan.


## Preparation for Data (Data Section)

Importing Libraries

```python
import requests
import json

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
```

### Get table of municipalities of Milan from Wikipedia page

We use the BeautifulSoup library to scrape the web page.

```python
response = requests.get('https://en.wikipedia.org/wiki/Zones_of_Milan').text
soup = BeautifulSoup(response, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})
```

```python
table_rows = table.find_all('tr')

res = []
for tr in table_rows:
    td = tr.find_all('td')
    # keep the stripped text of every non-empty cell in the row
    row = [cell.text.strip() for cell in td if cell.text.strip()]
    if row:  # rows without <td> cells (such as the header row) are skipped
        res.append(row)

df = pd.DataFrame(res, columns=["Num", "LongName", "Area", "Population", "Density", "Districts"])
# remove column Districts
df = df.drop(columns=['Districts'])
df
```
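The loop above drops empty cells, so a row containing a blank cell would shift its remaining values into the wrong columns. As a hedged, dependency-free sketch (it uses the standard library's `html.parser` instead of BeautifulSoup), the extractor below keeps one entry per `<td>`, empty or not, so columns stay aligned. The class name `TableRows` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows: lists of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # keep the cell even when it is empty, so columns stay aligned
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # text inside nested tags (links, spans) is still collected
        if self._cell is not None:
            self._cell.append(data)


html = """
<table>
  <tr><th>Num</th><th>Name</th><th>Note</th></tr>
  <tr><td>1</td><td>Centro storico</td><td></td></tr>
  <tr><td>2</td><td><a href="#">Stazione Centrale</a></td><td>x</td></tr>
</table>
"""
parser = TableRows()
parser.feed(html)
print(parser.rows)
```

The design choice is to record empty strings rather than filter them out; filtering, if needed, can then happen per column after the table has its final shape.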

|   | Num | LongName | Area | Population | Density |
|---|---|---|---|---|---|
| 0 | 1 | Centro storico | 9.67 | 96315.0 | 11074 |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | 12.58 | 153.109 | 13031 |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | 14.23 | 141229.0 | 10785 |
| 3 | 4 | Porta Vittoria, Forlanini | 20.95 | 156.369 | 8069 |
| 4 | 5 | Vigentino, Chiaravalle, Gratosoglio | 29.87 | 123779.0 | 4487 |
| 5 | 6 | Barona, Lorenteggio | 18.28 | 149000.0 | 8998 |
| 6 | 7 | Baggio, De Angeli, San Siro | 31.34 | 170814.0 | 6093 |
| 7 | 8 | Fiera, Gallaratese, Quarto Oggiaro | 23.72 | 181669.0 | 8326 |
| 8 | 9 | Porta Garibaldi, Niguarda | 21.12 | 181598.0 | 9204 |
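In this output the population of municipalities 2 and 4 appears as 153.109 and 156.369. A municipality of 12.58 km² with a density of 13031 cannot have 153 inhabitants, so these values most likely use the Italian convention of a dot as the thousands separator (153,109 and 156,369). A minimal cleaning sketch, assuming that a dot or comma followed by groups of exactly three digits is always a thousands separator; `parse_population` is a hypothetical helper, and a genuine decimal such as `9.670` would be misread under that assumption.

```python
import re


def parse_population(raw: str) -> int:
    """Convert a scraped population string to an int (hypothetical helper).

    Assumption: '.' or ',' followed by exactly three digits is a thousands
    separator, as in '153.109' for 153,109 inhabitants.
    """
    s = raw.strip()
    if re.fullmatch(r"\d{1,3}([.,]\d{3})+", s):
        return int(re.sub(r"[.,]", "", s))
    # otherwise treat it as an ordinary number such as '96315' or '96315.0'
    return int(float(s))


# possible usage on the scraped column (assumption):
# df['Population'] = df['Population'].astype(str).map(parse_population)
for raw in ["153.109", "156.369", "96315", "96315.0"]:
    print(raw, "->", parse_population(raw))
```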


Derive a short name for each municipality

```python
# we assume the first name listed in LongName is the name of the municipality
df['Shortname'] = df.LongName.str.split(',').str[0]
# normalise case: first letter upper-case, all other letters lower-case
df['Shortname'] = df.Shortname.str.capitalize()
```
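The case normalisation matters because the borough names are later matched as exact strings, both in the `isin` filter and in the merge with the property-price table. A plain-Python sketch of the same two steps, using a hypothetical `short_name` helper (it additionally strips surrounding whitespace, which the pandas version does not):

```python
def short_name(long_name: str) -> str:
    """First comma-separated name, normalised with str.capitalize() (hypothetical helper)."""
    first = long_name.split(",")[0].strip()
    # capitalize(): first character upper-case, every other character lower-case
    return first.capitalize()


for name in ["Centro storico", "Stazione Centrale, Gorla, Turro", "Città Studi, Lambrate, Porta Venezia"]:
    print(short_name(name))
```

Because both tables pass through the same normalisation, names such as `Città Studi` and `Città studi` end up identical before the join.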

```python
df
```

|   | Num | LongName | Area | Population | Density | Shortname |
|---|---|---|---|---|---|---|
| 0 | 1 | Centro storico | 9.67 | 96315.0 | 11074 | Centro storico |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | 12.58 | 153.109 | 13031 | Stazione centrale |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | 14.23 | 141229.0 | 10785 | Città studi |
| 3 | 4 | Porta Vittoria, Forlanini | 20.95 | 156.369 | 8069 | Porta vittoria |
| 4 | 5 | Vigentino, Chiaravalle, Gratosoglio | 29.87 | 123779.0 | 4487 | Vigentino |
| 5 | 6 | Barona, Lorenteggio | 18.28 | 149000.0 | 8998 | Barona |
| 6 | 7 | Baggio, De Angeli, San Siro | 31.34 | 170814.0 | 6093 | Baggio |
| 7 | 8 | Fiera, Gallaratese, Quarto Oggiaro | 23.72 | 181669.0 | 8326 | Fiera |
| 8 | 9 | Porta Garibaldi, Niguarda | 21.12 | 181598.0 | 9204 | Porta garibaldi |


### Get geo coordinates of boroughs

We use the geopy library to query the Nominatim (OpenStreetMap) geocoding service.

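```python
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# recent geopy versions reject the default user agent for Nominatim;
# any descriptive string works (this one is a made-up example)
geolocator = Nominatim(user_agent="milan-coffee-shop-study")
# Nominatim's usage policy allows at most one request per second
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

address = df['Shortname'] + ", Milano, MI, Lom, Italy"
coord = address.apply(geocode).apply(lambda x: (x.latitude, x.longitude))
df['Coordinates'] = coord
df
```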

|   | Num | LongName | Area | Population | Density | Shortname | Coordinates |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Centro storico | 9.67 | 96315.0 | 11074 | Centro storico | (45.41921235, 9.07080197950279) |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | 12.58 | 153.109 | 13031 | Stazione centrale | (45.4866591, 9.2072566) |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | 14.23 | 141229.0 | 10785 | Città studi | (45.4770557, 9.2265746) |
| 3 | 4 | Porta Vittoria, Forlanini | 20.95 | 156.369 | 8069 | Porta vittoria | (45.4622607, 9.2095796) |
| 4 | 5 | Vigentino, Chiaravalle, Gratosoglio | 29.87 | 123779.0 | 4487 | Vigentino | (45.4399296, 9.2004923) |
| 5 | 6 | Barona, Lorenteggio | 18.28 | 149000.0 | 8998 | Barona | (45.4388451, 9.1546701) |
| 6 | 7 | Baggio, De Angeli, San Siro | 31.34 | 170814.0 | 6093 | Baggio | (45.4614328, 9.0910822) |
| 7 | 8 | Fiera, Gallaratese, Quarto Oggiaro | 23.72 | 181669.0 | 8326 | Fiera | (45.5202499, 9.0789880116852) |
| 8 | 9 | Porta Garibaldi, Niguarda | 21.12 | 181598.0 | 9204 | Porta garibaldi | (45.4806652, 9.1868884) |
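The geocoding step has two weak points: `geocode` returns `None` when Nominatim finds no match, which makes the `lambda` fail with an `AttributeError`, and Nominatim's usage policy allows at most one request per second. geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) covers the delay; the hedged sketch below shows both ideas by hand and works with any callable that returns either `None` or an object with `latitude`/`longitude` attributes. The function name `geocode_all` is hypothetical, and the demo uses a fake geocoder instead of the real service.

```python
import time
from types import SimpleNamespace
from typing import Callable, Dict, Iterable, Optional, Tuple

Coord = Optional[Tuple[float, float]]


def geocode_all(
    names: Iterable[str],
    geocode: Callable[[str], object],
    suffix: str = ", Milano, MI, Lom, Italy",
    min_delay: float = 1.0,
) -> Dict[str, Coord]:
    """Geocode each name, leaving at least min_delay seconds between calls.

    Names the geocoder cannot resolve map to None instead of raising.
    """
    results: Dict[str, Coord] = {}
    last_call = None
    for name in names:
        if last_call is not None:
            # sleep only for the part of the delay that has not elapsed yet
            wait = min_delay - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        location = geocode(name + suffix)
        results[name] = None if location is None else (location.latitude, location.longitude)
    return results


# offline demo: dict.get returns None for unknown queries, like a failed lookup
fake_db = {
    "Città studi, Milano, MI, Lom, Italy": SimpleNamespace(latitude=45.4770557, longitude=9.2265746),
}
print(geocode_all(["Città studi", "Nowhere"], fake_db.get, min_delay=0.0))
```

With the real service one would pass `geolocator.geocode` as the callable and keep the default one-second delay.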


```python
# split the (latitude, longitude) tuples into two columns
df[['Latitude', 'Longitude']] = df['Coordinates'].apply(pd.Series)
df = df.drop(columns=['Coordinates'])
df
```

|   | Num | LongName | Area | Population | Density | Shortname | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Centro storico | 9.67 | 96315.0 | 11074 | Centro storico | 45.419212 | 9.070802 |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | 12.58 | 153.109 | 13031 | Stazione centrale | 45.486659 | 9.207257 |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | 14.23 | 141229.0 | 10785 | Città studi | 45.477056 | 9.226575 |
| 3 | 4 | Porta Vittoria, Forlanini | 20.95 | 156.369 | 8069 | Porta vittoria | 45.462261 | 9.20958 |
| 4 | 5 | Vigentino, Chiaravalle, Gratosoglio | 29.87 | 123779.0 | 4487 | Vigentino | 45.43993 | 9.200492 |
| 5 | 6 | Barona, Lorenteggio | 18.28 | 149000.0 | 8998 | Barona | 45.438845 | 9.15467 |
| 6 | 7 | Baggio, De Angeli, San Siro | 31.34 | 170814.0 | 6093 | Baggio | 45.461433 | 9.091082 |
| 7 | 8 | Fiera, Gallaratese, Quarto Oggiaro | 23.72 | 181669.0 | 8326 | Fiera | 45.52025 | 9.078988 |
| 8 | 9 | Porta Garibaldi, Niguarda | 21.12 | 181598.0 | 9204 | Porta garibaldi | 45.480665 | 9.186888 |
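Not every geocoded point is plausible: the result for Centro storico (45.419, 9.071) lies well south-west of Milan Cathedral (the Duomo), which sits at roughly 45.464 N, 9.192 E in the heart of the historic centre. A quick sanity check is to compute each point's great-circle distance from the Duomo with the haversine formula and flag anything implausibly far for a central business borough. The Duomo coordinates and the 5 km threshold below are assumptions chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

DUOMO = (45.4642, 9.1917)  # approximate position of Milan Cathedral (assumption)
EARTH_RADIUS_KM = 6371.0   # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def suspicious(points, max_km=5.0):
    """Names whose point lies more than max_km from the Duomo."""
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat, lon, *DUOMO) > max_km]


# geocoded coordinates of the four selected boroughs, copied from the table above
points = {
    "Centro storico": (45.419212, 9.070802),
    "Stazione centrale": (45.486659, 9.207257),
    "Città studi": (45.477056, 9.226575),
    "Porta garibaldi": (45.480665, 9.186888),
}
for name, (lat, lon) in points.items():
    print(f"{name}: {haversine_km(lat, lon, *DUOMO):.1f} km from the Duomo")
print("check manually:", suspicious(points))
```

Only Centro storico is flagged, which suggests that its coordinates should be corrected (for example by geocoding a landmark inside the borough) before they are used for venue searches.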


```python
# keep only the four business boroughs studied here
df = df.loc[df['Shortname'].isin(["Centro storico", "Stazione centrale", "Città studi", "Porta garibaldi"])]
df = df.reset_index(drop=True)
df
```

|   | Num | LongName | Area | Population | Density | Shortname | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Centro storico | 9.67 | 96315.0 | 11074 | Centro storico | 45.419212 | 9.070802 |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | 12.58 | 153.109 | 13031 | Stazione centrale | 45.486659 | 9.207257 |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | 14.23 | 141229.0 | 10785 | Città studi | 45.477056 | 9.226575 |
| 3 | 9 | Porta Garibaldi, Niguarda | 21.12 | 181598.0 | 9204 | Porta garibaldi | 45.480665 | 9.186888 |
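`isin` silently drops anything that does not match exactly, so a spelling or capitalisation difference (for example `Città Studi` instead of `Città studi`) would leave fewer than four rows without raising any error. A small guard, sketched here with the hypothetical helper `missing_targets`, makes such a mismatch visible:

```python
from typing import Iterable, List

TARGETS = ["Centro storico", "Stazione centrale", "Città studi", "Porta garibaldi"]


def missing_targets(found: Iterable[str], targets: List[str] = TARGETS) -> List[str]:
    """Return the target boroughs that do not appear among the found names."""
    present = set(found)
    return [t for t in targets if t not in present]


# possible usage after the filter (assumption):
# assert not missing_targets(df['Shortname']), missing_targets(df['Shortname'])
print(missing_targets(["Centro storico", "Stazione centrale", "Città Studi", "Porta garibaldi"]))
```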


### Get the average property price in the major municipalities

We get the data from the page "Mercato Immobiliare a Milano" by web scraping.

```python
response2 = requests.get('https://www.mercato-immobiliare.info/lombardia/milano/milano.html').text
soup2 = BeautifulSoup(response2, 'html.parser')
table2 = soup2.find('table', {'id': 'childrentable'})
```

```python
table_rows2 = table2.find_all('tr')

res = []
for tr in table_rows2:
    td = tr.find_all('td')
    # keep the stripped text of every non-empty cell in the row
    row = [cell.text.strip() for cell in td if cell.text.strip()]
    if row:
        res.append(row)

df2 = pd.DataFrame(res, columns=["Shortname", "Price", "Link"])
df2 = df2.drop(columns=['Link'])

# same case normalisation as in df, so that the names match in the join
df2['Shortname'] = df2.Shortname.str.capitalize()
```

We select only data for the major municipalities: Centro Storico, Stazione Centrale, Città Studi and Porta Garibaldi.


```python
df2 = df2.loc[df2['Shortname'].isin(["Centro storico", "Stazione centrale", "Città studi", "Porta garibaldi"])]
df2 = df2.reset_index(drop=True)
df2
```

|   | Shortname | Price |
|---|---|---|
| 0 | Centro storico | € 6.500 /m² |
| 1 | Città studi | € 3.950 /m² |
| 2 | Porta garibaldi | € 5.500 /m² |
| 3 | Stazione centrale | € 4.550 /m² |
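The `Price` column is still text in the Italian format (`€ 6.500 /m²` means 6,500 euros per square metre), so it cannot yet be compared or sorted numerically. A minimal converter, assuming every value follows that pattern with a dot as the thousands separator; `parse_price` is a hypothetical helper that could be applied with `df2['Price'].map(parse_price)`.

```python
import re


def parse_price(text: str) -> int:
    """Turn a price string such as '€ 6.500 /m²' into euros per m² (hypothetical helper).

    Assumes the Italian format where '.' separates thousands.
    """
    # ASCII digits and dots only, so the superscript '²' is never matched
    match = re.search(r"[0-9][0-9.]*", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return int(match.group().replace(".", ""))


for raw in ["€ 6.500 /m²", "€ 3.950 /m²", "€ 5.500 /m²", "€ 4.550 /m²"]:
    print(raw, "->", parse_price(raw))
```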


### Join the datasets

```python
# inner join: keep only the boroughs present in both tables
df3 = pd.merge(df, df2, on='Shortname', how='inner')
df3
```

|   | Num | LongName | Area | Population | Density | Shortname | Latitude | Longitude | Price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Centro storico | 9.67 | 96315.0 | 11074 | Centro storico | 45.419212 | 9.070802 | € 6.500 /m² |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | 12.58 | 153.109 | 13031 | Stazione centrale | 45.486659 | 9.207257 | € 4.550 /m² |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | 14.23 | 141229.0 | 10785 | Città studi | 45.477056 | 9.226575 | € 3.950 /m² |
| 3 | 9 | Porta Garibaldi, Niguarda | 21.12 | 181598.0 | 9204 | Porta garibaldi | 45.480665 | 9.186888 | € 5.500 /m² |
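With `how='inner'`, any borough whose name differs between the two tables is dropped from `df3` without warning. pandas can report this directly through `pd.merge(..., how='outer', indicator=True)`, which adds a `_merge` column with the values `left_only`, `right_only` and `both`. The plain-Python sketch below shows the same bookkeeping on the key columns alone; `join_report` is a hypothetical name, and the deliberately mismatched `Stazione Centrale` is invented test input.

```python
from typing import Dict, Iterable, List


def join_report(left_keys: Iterable[str], right_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Classify join keys the way merge(indicator=True) labels rows."""
    left, right = set(left_keys), set(right_keys)
    return {
        "both": sorted(left & right),
        "left_only": sorted(left - right),
        "right_only": sorted(right - left),
    }


report = join_report(
    ["Centro storico", "Stazione centrale", "Città studi", "Porta garibaldi"],
    ["Centro storico", "Città studi", "Porta garibaldi", "Stazione Centrale"],
)
print(report)
```

An inner join would keep only the three names under `both`; the `left_only` and `right_only` lists point straight at the spelling that needs fixing.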


## Conclusion
### Part 1: Description of Problem and Data Preparation

We now have an initial DataFrame with the names of the major municipalities, the coordinates of those districts and their average property price. Since we want to concentrate only on lunch coffee shops targeting office workers, we need an idea of the best business areas in Milan before comparing all the municipalities.
Here we concentrate on the four best business boroughs:

- Centro Storico
- Stazione Centrale
- Città Studi
- Porta Garibaldi

As the next step, we will use Foursquare data to obtain information on coffee shops. With it, we can start our battle of neighborhoods for opening a coffee shop in Milan.
