# **A Tale of Two Cities!**

### *An analysis for comparison of cities using Foursquare data and Machine Learning*

## **Introduction**

Choosing between **Delhi** and **Mumbai** is always a hard decision: both are truly multicultural, cosmopolitan cities in India, one of the world's fastest-developing nations. Besides being two of India's most important financial and political centres, they are major centres for commerce, science, fashion, arts, culture and gastronomy. Both Delhi (officially the National Capital Territory (NCT) of Delhi) and Mumbai (the capital of the Indian state of Maharashtra) have a rich history and are two of the most visited and sought-after cities in India. Mumbai is the second-most populous city in the country after Delhi (11 million in the city proper) and the seventh-most populous city in the world, with a population of roughly 20 million. Mumbai lies on the Konkan coast on the west coast of India and has a deep natural harbour. Delhi is a city and a union territory of India containing New Delhi, the capital of India. It is bordered by the state of Haryana on three sides and by Uttar Pradesh to the east.


Our goal is to compare the two cities to see how similar or dissimilar they are. Such a comparison lets users identify similar neighbourhoods across cities based on the amenities and services offered locally, and so helps in understanding local activity: where the hubs of different activities are, how citizens experience the city, and how they use its resources.
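To make "similar based on amenities" concrete, here is a minimal sketch of one possible similarity measure: cosine similarity between venue-category counts. The category names and counts are invented for illustration and are not Foursquare results, and the measure itself is an assumption rather than the method used later in this analysis.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two category-count profiles."""
    # Only categories present in both profiles contribute to the dot product
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical venue categories per neighbourhood (illustration only)
area_a = Counter(["Cafe", "Cafe", "Indian Restaurant", "Gym"])
area_b = Counter(["Cafe", "Indian Restaurant", "Indian Restaurant", "Market"])
area_c = Counter(["Park", "Metro Station"])

sim_ab = cosine_similarity(area_a, area_b)  # shared categories, so above zero
sim_ac = cosine_similarity(area_a, area_c)  # no shared categories, so zero
print(round(sim_ab, 3), sim_ac)
```

A score near 1 means the two neighbourhoods offer a similar mix of venues, and a score of 0 means they share no categories.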

### What kind of clientele would benefit from such an analysis?

- A potential job seeker with transferable skills may wish to search for jobs in selected cities that best match their qualifications and experience in terms of salaries, social benefits, or even culture fit for expats.
- Further, a person buying or renting a home in a new city may want to look for recommendations for locations in the city similar to other cities known to them.
- Similarly, a large corporation looking to expand its locations to other cities might benefit from such an analysis.
- Many within-city urban planning computations might also benefit from modelling a city’s relationship to other cities.



## **Data Preparation**

To solve the problem at hand, data extraction was done as follows:

**Web scraping**: City data was extracted from the respective Wikipedia pages using the Requests and BeautifulSoup libraries.

In [1]:
# Import Required Libraries
from bs4 import BeautifulSoup
import requests
from geopy.geocoders import Nominatim
import folium
from pandas import json_normalize
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
geolocator = Nominatim(user_agent="ny_explorer")

#### **Mumbai**

The city of Mumbai consists of two distinct regions: Mumbai City district and Mumbai Suburban district, which form two separate revenue districts of Maharashtra. The city district region is also commonly referred to as the Island City or South Mumbai. Mumbai Suburban district lies to the north of Mumbai City district and comprises all of Mumbai's suburbs. The western part of the Mumbai Suburban district forms the Western Suburbs and the eastern portion forms the Eastern Suburbs. The suburbs of Chembur, Govandi, Mankhurd and Trombay lie to the south-east of the Eastern Suburbs. These suburbs are generally not considered as part of the Eastern Suburbs and are sometimes referred to as the "Harbour Suburbs".

The total area of Mumbai is 603.4 km² (233 sq mi). Of this, the island city spans 67.79 km² (26 sq mi), while the suburban district spans 370 km² (143 sq mi), together accounting for 437.71 km² (169 sq mi) under the administration of Brihanmumbai Municipal Corporation (BMC). The remaining area belongs to Defence, Mumbai Port Trust, Atomic Energy Commission and Borivali National Park, which are out of the jurisdiction of the BMC. Mumbai lies at the mouth of the Ulhas River on the western coast of India, in the coastal region known as the Konkan. It sits on Salsette Island, partially shared with the Thane district. Mumbai is bounded by the Arabian Sea to the west. Borivali National Park is located partly in the Mumbai Suburban district, and partly in the Thane district, and it extends over an area of 103.09 km² (39.80 sq mi).

In [3]:
## URL of the Wikipedia page listing Mumbai's neighbourhoods
url_mum = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai"
html_mum = requests.get(url_mum).text
soup_mum = BeautifulSoup(html_mum, 'html5lib')

## Clean and store extracted data in a dataframe
table_mum = []
for var in soup_mum.find('table').find_all('tr')[1:]:
    row = var.find_all('td')
    cell = {}
    cell['Area'] = row[0].text.split('\n')[0]
    cell['Location'] = row[1].text.split('\n')[0]
    cell['Latitude'] = float(row[2].text.split('\n')[0])
    cell['Longitude'] = float(row[3].text.split('\n')[0])
    table_mum.append(cell)
df_mum = pd.DataFrame(table_mum)

df_mum.set_index("Area", inplace=True)

## Fixing Incorrect Values
df_mum.loc["Nehru Nagar", "Latitude"] = 19.0640
df_mum.loc["Nehru Nagar", "Longitude"] = 72.8826
df_mum.loc["Hindu colony", "Latitude"] = 19.0197
df_mum.loc["Hindu colony", "Longitude"] = 72.8479

df_mum.reset_index(inplace=True)

## Display DataFrame
df_mum

| | Area | Location | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Amboli | Andheri,Western Suburbs | 19.129300 | 72.843400 |
| 1 | Chakala, Andheri | Western Suburbs | 19.111388 | 72.860833 |
| 2 | D.N. Nagar | Andheri,Western Suburbs | 19.124085 | 72.831373 |
| 3 | Four Bungalows | Andheri,Western Suburbs | 19.124714 | 72.827210 |
| 4 | Lokhandwala | Andheri,Western Suburbs | 19.130815 | 72.829270 |
| ... | ... | ... | ... | ... |
| 88 | Parel | South Mumbai | 18.990000 | 72.840000 |
| 89 | Gowalia Tank | Tardeo,South Mumbai | 18.962450 | 72.809703 |
| 90 | Dava Bazaar | South Mumbai | 18.946882 | 72.831362 |
| 91 | Dharavi | Mumbai | 19.040208 | 72.850850 |
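The scraping loop above assumes every table row has four cells and that the coordinate cells always parse as floats. As a hedged sketch, the per-row logic could be moved into a small helper that skips malformed rows instead of raising. The name `parse_row` and the sample rows are hypothetical, and the helper uses `strip()` where the original uses `split('\n')[0]`.

```python
def parse_row(cells):
    """Turn [area, location, lat, long] cell strings into a record, or None."""
    if len(cells) < 4:
        return None  # header rows or malformed rows have too few cells
    try:
        lat = float(cells[2].strip())
        lng = float(cells[3].strip())
    except ValueError:
        return None  # coordinate cell is empty or not numeric
    return {
        "Area": cells[0].strip(),
        "Location": cells[1].strip(),
        "Latitude": lat,
        "Longitude": lng,
    }

# Sample cell texts: the first mirrors a real row, the others are made up
rows = [
    ["Amboli\n", "Andheri,Western Suburbs\n", "19.1293\n", "72.8434\n"],
    ["Hypothetical Area", "Hypothetical Location", "", ""],  # missing coordinates
    ["Only one cell"],
]
records = [r for r in map(parse_row, rows) if r is not None]
print(records)
```

With BeautifulSoup, the cell texts for one row could be built as `[td.text for td in tr.find_all('td')]` and passed to the helper.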


##### **Defining a Function to grab Latitude and Longitude**

In [4]:
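def getlatlong(place):
    """Return (latitude, longitude) for a place name using Nominatim."""
    location = geolocator.geocode(place)
    if location is None:
        # geocode() returns None when no match is found
        raise ValueError("No geocoding result for {!r}".format(place))
    return (location.latitude, location.longitude)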
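Geocoding services can fail transiently or return no match, so a more defensive variant may be useful when looking up many places in a loop. The following is a sketch under stated assumptions: `safe_latlong`, its `retries` and `delay` parameters, and the stub geocoder are hypothetical names introduced here, and catching a broad `Exception` stands in for whichever specific errors the geocoder raises.

```python
import time
from types import SimpleNamespace

def safe_latlong(geocode, place, retries=2, delay=1.0):
    """Return (lat, lon) for `place`, or None if nothing usable comes back."""
    for attempt in range(retries + 1):
        try:
            location = geocode(place)
        except Exception:
            # Treat any geocoder error as transient: wait, then try again
            if attempt < retries:
                time.sleep(delay)
            continue
        if location is None:
            return None  # the service answered but found no match
        return (location.latitude, location.longitude)
    return None  # every attempt raised an error

# Stub geocoder so the sketch runs without network access
def fake_geocode(place):
    known = {"Mumbai, IN": SimpleNamespace(latitude=19.0759899, longitude=72.8773928)}
    return known.get(place)

print(safe_latlong(fake_geocode, "Mumbai, IN"))
print(safe_latlong(fake_geocode, "Nowhere"))
```

With the notebook's geocoder this would be called as `safe_latlong(geolocator.geocode, 'Mumbai, IN')`. Returning `None` lets the caller decide how to record a failed lookup.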

##### **Get Mumbai's Latitude and Longitude**

In [5]:
address_mum = 'Mumbai, IN'
latlong_mum = getlatlong(address_mum)
print('The geographical coordinates of Mumbai are {}, {}.'.format(latlong_mum[0], latlong_mum[1]))

The geographical coordinates of Mumbai are 19.0759899, 72.8773928.


##### **Visualizing Mumbai's neighbourhoods on a map**

In [6]:
map_mum = folium.Map(location=[latlong_mum[0], latlong_mum[1]], zoom_start=10)

for lat, lng, location, area in zip(df_mum['Latitude'], df_mum['Longitude'], df_mum['Location'], df_mum['Area']):
    label = """Area: {}\nLocation: {}""".format(area, location)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mum)
map_mum

#### **National Capital Territory of Delhi**

Delhi is a vast city and is home to a population of more than 16 million people. It is a microcosm of India and its residents belong to varied ethnic, religious and linguistic groups. As the second-largest city, and the capital of the nation, its 11 districts comprise multiple neighbourhoods. The large expanse of the city comprises residential districts that range from poor to affluent, and small and large commercial districts, across its municipal extent.

The Wikipedia list used below covers major neighbourhoods within the National Capital Territory of Delhi only. It is not complete, and it groups the neighbourhoods by the districts of the metropolis.

In [7]:
## URL of the Wikipedia page listing Delhi's neighbourhoods
url_del = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Delhi"
html_del = requests.get(url_del).text
soup_del = BeautifulSoup(html_del, 'html5lib')

## Find, clean and store data in a dataframe
table_del = []
## Each district is an <h2> heading followed by a list of its neighbourhoods
n = soup_del.find('div', class_='mw-parser-output').find_all('h2')[1:-3]
for v in n:
    location = v.text.split('[')[0]
    ll = v.find_next_sibling().find_all('li')
    for i in ll:
        area = i.text
        try:
            lat, lng = getlatlong(area + ', Delhi')
        except Exception:
            ## Geocoding failed: store (0, 0) as a placeholder
            lat, lng = 0, 0
        table_del.append({'Location': location, 'Area': area,
                          'Latitude': lat, 'Longitude': lng})

df_del = pd.DataFrame(table_del)

df_del.set_index("Area", inplace=True)

## Fixing Incorrect Values
df_del.loc["Rohini Sub City", "Latitude"] = 28.7383
df_del.loc["Rohini Sub City", "Longitude"] = 77.0822
df_del.loc["Jamia Nagar", "Latitude"] = 28.5539
df_del.loc["Jamia Nagar", "Longitude"] = 77.2956
df_del.loc["Dwarka Sub City", "Latitude"] = 28.5823
df_del.loc["Dwarka Sub City", "Longitude"] = 77.0500
df_del.loc["Kamal Hans Nagar", "Latitude"] = 28.680556
df_del.loc["Kamal Hans Nagar", "Longitude"] = 77.203611
df_del.loc["Rajender Nagar", "Latitude"] = 28.6372
df_del.loc["Rajender Nagar", "Longitude"] = 77.1824
df_del.loc["Sagar Pur", "Latitude"] = 28.6007
df_del.loc["Sagar Pur", "Longitude"] = 77.1031

df_del.reset_index(inplace=True)

## Display the dataframe
df_del

| | Area | Location | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Adarsh Nagar | North West Delhi | 28.716580 | 77.170422 |
| 1 | Ashok Vihar | North West Delhi | 28.699453 | 77.184826 |
| 2 | Begum Pur | North West Delhi | 28.725503 | 77.058371 |
| 3 | Karala | North West Delhi | 28.735140 | 77.032511 |
| 4 | Narela | North West Delhi | 28.842610 | 77.091835 |
| ... | ... | ... | ... | ... |
| 120 | Rajouri Garden | West Delhi | 28.651190 | 77.124260 |
| 121 | Tihar Village | West Delhi | 28.634636 | 77.107112 |
| 122 | Tilak Nagar | West Delhi | 28.636548 | 77.096496 |
| 123 | Vikas Nagar | West Delhi | 28.644009 | 77.054470 |
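Because failed lookups are stored as (0, 0) and a geocoder can also match a same-named place elsewhere, it can help to flag coordinates that fall implausibly far from the city centre before fixing them by hand. Below is a minimal sketch using the haversine distance. The helper names and the 60 km threshold are assumptions chosen for illustration, and the last sample row is made up.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(h))

def flag_suspect(rows, centre, max_km=60.0):
    """Return the areas whose coordinates lie more than max_km from centre."""
    return [area for area, lat, lon in rows
            if haversine_km(centre[0], centre[1], lat, lon) > max_km]

delhi_centre = (28.6517178, 77.2219388)  # Delhi coordinates as geocoded in this notebook
rows = [
    ("Adarsh Nagar", 28.716580, 77.170422),
    ("Narela", 28.842610, 77.091835),
    ("Hypothetical failed lookup", 0.0, 0.0),  # placeholder written on failure
]
suspects = flag_suspect(rows, delhi_centre)
print(suspects)
```

On the full dataframe, the same check could be fed with `zip(df_del['Area'], df_del['Latitude'], df_del['Longitude'])`.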


##### **Get Delhi's Latitude and Longitude**

In [8]:
address_del = 'Delhi, IN'
latlong_del = getlatlong(address_del)
print('The geographical coordinates of Delhi are {}, {}.'.format(latlong_del[0], latlong_del[1]))

The geographical coordinates of Delhi are 28.6517178, 77.2219388.


##### **Visualizing Delhi's neighbourhoods on a map**

In [11]:
map_del = folium.Map(location=[latlong_del[0], latlong_del[1]], zoom_start=10)

for lat, lng, location, area in zip(df_del['Latitude'], df_del['Longitude'], df_del['Location'], df_del['Area']):
    label = """Area: {}\nLocation: {}""".format(area, location)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_del)
map_del