# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a cargo facility. Specifically, this report is aimed at stakeholders interested in opening a **Drone Shipping Facility** in **İstanbul**, Turkey.

Since it will be the first Drone Cargo Facility in İstanbul, we need to choose the best possible location.

We will use data science techniques to shortlist the most promising neighborhoods, using the density of BİM market locations as the criterion. The advantages of each area will then be laid out clearly so that stakeholders can choose the best possible final location.

#### Let's start by importing the required libraries

In [1]:
#import libraries
import pandas as pd
import numpy as np
import folium
import geopy
import requests
import json
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from collections import Counter
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
print('Libraries imported.')

Libraries imported.


In [2]:
pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)
pd.options.mode.chained_assignment = None

### Obtaining the borough and population data from the internet

In [3]:
url ="https://www.nufusu.com/ilceleri/istanbul-ilceleri-nufusu"
dfs = pd.read_html(url, encoding="UTF-8")
df = dfs[0][['İlçe','Toplam Nüfus']]
df.columns = ['Borough','Population']
print(df.shape)
print(df.isnull().value_counts())
df

(39, 2)
Borough  Population
False    False         39
dtype: int64


Unnamed: 0,Borough,Population
0,Esenyurt,954.579
1,Küçükçekmece,792.821
2,Bağcılar,745.125
3,Pendik,711.894
4,Ümraniye,710.28
5,Bahçelievler,611.059
6,Sultangazi,534.565
7,Üsküdar,531.825
8,Maltepe,513.316
9,Gaziosmanpaşa,491.962
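
The `Population` values above look like thousands-grouped counts (for example `954.579` for 954,579 residents) that were parsed as decimals, presumably because the source page uses `.` as the thousands separator. A minimal sketch of restoring the head counts, assuming every value is the true population divided by 1000; `restore_population` is a hypothetical helper, not part of the notebook:

```python
def restore_population(parsed_value):
    """Undo a thousands separator that was parsed as a decimal point.

    Assumption: the parsed float equals the real population / 1000.
    """
    return int(round(parsed_value * 1000))


# sample values copied from the table above
for borough, parsed in [('Esenyurt', 954.579), ('Ümraniye', 710.28)]:
    print(borough, restore_population(parsed))
```

`pd.read_html` also accepts a `thousands` argument, so passing `thousands='.'` at load time may avoid the problem altogether. The marker radius used in the map cell (`popu*0.015`) is tuned to the values as parsed, so it would need rescaling if the column is converted.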


### Fetching the borough coordinates with the geopy library

In [4]:

# Create the geocoder once and reuse it for every borough
geolocator = Nominatim(user_agent="project")
lat = []
lng = []
add = []
for item in df.Borough:
    address = item + ', İstanbul'
    location = geolocator.geocode(address)
    add.append(location.address)
    lat.append(location.latitude)
    lng.append(location.longitude)
    print(location.address)

Esenyurt, İstanbul, Marmara Bölgesi, Türkiye
Küçükçekmece, İstanbul, Marmara Bölgesi, 34290, Türkiye
Bağcılar, İstanbul, Marmara Bölgesi, Türkiye
Pendik, İstanbul, Marmara Bölgesi, 34890, Türkiye
Ümraniye, İstanbul, Marmara Bölgesi, Türkiye
Bahçelievler Mahallesi, Bahçelievler, İstanbul, Marmara Bölgesi, 34180, Türkiye
Sultangazi, İstanbul, Marmara Bölgesi, Türkiye
Üsküdar, İstanbul, Marmara Bölgesi, Türkiye
Maltepe, İstanbul, Marmara Bölgesi, 34844, Türkiye
Gaziosmanpaşa, İstanbul, Marmara Bölgesi, Türkiye
Kadıköy, İstanbul, Marmara Bölgesi, Türkiye
Kartal, İstanbul, Marmara Bölgesi, 34860, Türkiye
Başakşehir, İstanbul, Marmara Bölgesi, Türkiye
Esenler, İstanbul, Marmara Bölgesi, Türkiye
Avcılar, İstanbul, Marmara Bölgesi, Türkiye
Kağıthane, İstanbul, Marmara Bölgesi, Türkiye
İstanbul, Beyazıt Mahallesi, Fatih, İstanbul, Marmara Bölgesi, 34126, Türkiye
Sancaktepe, İstanbul, Marmara Bölgesi, 34887, Türkiye
Ataşehir, İstanbul, Marmara Bölgesi, Türkiye
Eyüpsultan, İstanbul, Marmara Bölge
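
The loop above assumes every lookup succeeds, but `geolocator.geocode` returns `None` when no match is found, which would make `location.address` raise an `AttributeError`. A minimal sketch of a more defensive loop; `geocode_boroughs` is a hypothetical helper, and `geocode` stands for any callable that behaves like `geolocator.geocode` (a stub is used here so the example runs offline):

```python
from collections import namedtuple

# stand-in for the objects geopy returns; only the attributes used above
Location = namedtuple('Location', ['address', 'latitude', 'longitude'])


def geocode_boroughs(boroughs, geocode, city='İstanbul'):
    """Return (rows, missing): coordinates for resolved boroughs, names of the rest."""
    rows, missing = [], []
    for name in boroughs:
        location = geocode('{}, {}'.format(name, city))
        if location is None:  # no match: record it instead of crashing
            missing.append(name)
            continue
        rows.append((name, location.latitude, location.longitude))
    return rows, missing


def fake_geocode(address):
    """Offline stub that only knows one address (hypothetical data source)."""
    known = {'Esenyurt, İstanbul': Location('Esenyurt, İstanbul', 41.03424, 28.680018)}
    return known.get(address)


rows, missing = geocode_boroughs(['Esenyurt', 'Nowhere'], fake_geocode)
print(rows)
print(missing)
```

When calling the public Nominatim service for all 39 boroughs, geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) can be wrapped around `geolocator.geocode` to space out the requests.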

In [5]:
df['Address'] = add
df['Latitude'] = lat
df['Longitude'] = lng
df.head()

Unnamed: 0,Borough,Population,Address,Latitude,Longitude
0,Esenyurt,954.579,"Esenyurt, İstanbul, Marmara Bölgesi, Türkiye",41.03424,28.680018
1,Küçükçekmece,792.821,"Küçükçekmece, İstanbul, Marmara Bölgesi, 34290...",41.000214,28.780889
2,Bağcılar,745.125,"Bağcılar, İstanbul, Marmara Bölgesi, Türkiye",41.033899,28.857898
3,Pendik,711.894,"Pendik, İstanbul, Marmara Bölgesi, 34890, Türkiye",40.876589,29.233342
4,Ümraniye,710.28,"Ümraniye, İstanbul, Marmara Bölgesi, Türkiye",41.022269,29.090073


In [6]:
df.drop(['Address'], axis=1, inplace=True)
df.head()

Unnamed: 0,Borough,Population,Latitude,Longitude
0,Esenyurt,954.579,41.03424,28.680018
1,Küçükçekmece,792.821,41.000214,28.780889
2,Bağcılar,745.125,41.033899,28.857898
3,Pendik,711.894,40.876589,29.233342
4,Ümraniye,710.28,41.022269,29.090073


In [7]:
location = geolocator.geocode('Beyoğlu, İstanbul')
lat_ist=location.latitude
lng_ist=location.longitude
print('Coordinates for İstanbul are Lat={}, Lng={}.'.format(lat_ist,lng_ist))

Coordinates for İstanbul are Lat=41.0284233, Lng=28.9736808.


### Let's plot the boroughs on the map (marker size scales with borough population)

In [8]:
map_istanbul = folium.Map(location=[lat_ist,lng_ist],zoom_start=10)
#add markers to map
for boro, popu, lati, long in zip(df.Borough, df.Population, df.Latitude, df.Longitude):
    label='{}\n{}'.format(boro,popu)
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lati, long],
        radius = popu*0.015,
        popup = label,
        color='red',
        fill_color='orange',
        fill_opacity = 0.7,
        parse_html=False).add_to(map_istanbul)
map_istanbul

### Exploring the Foursquare API response to examine its data structure

In [9]:
# use the first row of df to explore the structure
lat_exp = df.loc[0, 'Latitude']
lng_exp = df.loc[0, 'Longitude']
exp_name = df.loc[0, 'Borough']
print (exp_name,lat_exp,lng_exp)

Esenyurt 41.0342402 28.6800178


In [10]:
import os

url = 'https://api.foursquare.com/v2/venues/search'
# Credentials are read from the environment instead of being hardcoded;
# the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an assumed convention.
params = dict(
    client_id=os.environ['FOURSQUARE_CLIENT_ID'],
    client_secret=os.environ['FOURSQUARE_CLIENT_SECRET'],
    v='20200609',
    ll='{},{}'.format(lat_exp, lng_exp),
    radius=3000,
    query='bim')
response = requests.get(url=url, params=params).json()['response']['venues']
response

[{'id': '4ec636e0722eeaa24c818a14',
  'name': 'BİM',
  'location': {'lat': 41.04243300044461,
   'lng': 28.65251160682164,
   'labeledLatLngs': [{'label': 'display',
     'lat': 41.04243300044461,
     'lng': 28.65251160682164}],
   'distance': 2483,
   'cc': 'TR',
   'country': 'Türkiye',
   'formattedAddress': ['Türkiye']},
  'categories': [{'id': '4bf58dd8d48988d1fd941735',
    'name': 'Shopping Mall',
    'pluralName': 'Shopping Malls',
    'shortName': 'Mall',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/mall_',
     'suffix': '.png'},
    'primary': True}],
  'referralId': 'v-1604476416',
  'hasPerk': False},
 {'id': '50554b36e4b0b914259b80f1',
  'name': 'BİM',
  'location': {'lat': 41.00490421579907,
   'lng': 28.70277765324506,
   'labeledLatLngs': [{'label': 'display',
     'lat': 41.00490421579907,
     'lng': 28.70277765324506}],
   'distance': 3783,
   'cc': 'TR',
   'country': 'Türkiye',
   'formattedAddress': ['Türkiye']},
  'categories': [{'id': '
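
In the sample response, the second venue reports `'distance': 3783` even though the request asked for `radius = 3000`, so the radius does not appear to act as a hard cut-off for this search. A minimal sketch of filtering on the `distance` field, assuming each venue has the structure printed above; `venues_within` is a hypothetical helper:

```python
def venues_within(venues, max_distance):
    """Keep (name, lat, lng) for venues whose reported distance is within max_distance metres."""
    kept = []
    for venue in venues:
        location = venue.get('location', {})
        distance = location.get('distance')
        if distance is None or distance > max_distance:
            continue  # skip venues with no distance or outside the radius
        kept.append((venue['name'], location['lat'], location['lng']))
    return kept


# trimmed copies of the two venues shown in the response above
sample = [
    {'name': 'BİM', 'location': {'lat': 41.04243300044461, 'lng': 28.65251160682164, 'distance': 2483}},
    {'name': 'BİM', 'location': {'lat': 41.00490421579907, 'lng': 28.70277765324506, 'distance': 3783}},
]
print(venues_within(sample, 3000))
```

Whether to apply such a filter is a design choice: keeping the farther venues widens coverage, while dropping them makes each borough's search area consistent.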

### Let's write a function to fetch the BİM market coordinates from the Foursquare API

In [11]:
import os

# Credentials come from the environment (the variable names are an assumed convention)
client_id = os.environ['FOURSQUARE_CLIENT_ID']
client_secret = os.environ['FOURSQUARE_CLIENT_SECRET']
v = '20200609'

def GetBim(boroughs, latitudes, longitudes, radius=3000, query='bim'):
    """Search Foursquare around each borough centre and collect the matching venues."""
    markets = []
    for boro, lati, long in zip(boroughs, latitudes, longitudes):
        print(boro)
        url = 'https://api.foursquare.com/v2/venues/search'
        params = dict(
            client_id=client_id,
            client_secret=client_secret,
            v=v,
            ll='{},{}'.format(lati, long),
            radius=radius,
            query=query)

        results = requests.get(url, params=params).json()['response']['venues']

        # keep the borough the search was centred on, plus each venue's name and coordinates
        for market in results:
            markets.append([
                boro,
                lati,
                long,
                market['name'],
                market['location']['lat'],
                market['location']['lng']])

    nearby_markets = ['Borough', 'Borough Latitude', 'Borough Longitude',
                      'Market Name', 'Market Latitude', 'Market Longitude']

    nearby_values = pd.DataFrame(markets, columns=nearby_markets)
    return nearby_values

In [12]:
bim_markets = GetBim(boroughs=df['Borough'],latitudes=df['Latitude'], longitudes=df['Longitude'])

Esenyurt
Küçükçekmece
Bağcılar
Pendik
Ümraniye
Bahçelievler
Sultangazi
Üsküdar
Maltepe
Gaziosmanpaşa
Kadıköy
Kartal
Başakşehir
Esenler
Avcılar
Kağıthane
Fatih
Sancaktepe
Ataşehir
Eyüpsultan
Beylikdüzü
Sarıyer
Sultanbeyli
Zeytinburnu
Güngören
Arnavutköy
Şişli
Bayrampaşa
Tuzla
Çekmeköy
Büyükçekmece
Beykoz
Beyoğlu
Bakırköy
Silivri
Beşiktaş
Çatalca
Şile
Adalar


In [13]:
print(bim_markets.shape)
bim_markets.head()

(927, 6)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Market Name,Market Latitude,Market Longitude
0,Esenyurt,41.03424,28.680018,BİM,41.042433,28.652512
1,Esenyurt,41.03424,28.680018,BİM,41.004904,28.702778
2,Esenyurt,41.03424,28.680018,Bim Agena Evleri,41.045673,28.674354
3,Esenyurt,41.03424,28.680018,BİM,41.003947,28.706028
4,Esenyurt,41.03424,28.680018,Bim Bahcesehir,41.06366,28.688366


In [14]:
bim_markets.dtypes

Borough               object
Borough Latitude     float64
Borough Longitude    float64
Market Name           object
Market Latitude      float64
Market Longitude     float64
dtype: object

### These steps correct some typos in the market names

In [15]:
bim_markets['Market Name']=bim_markets['Market Name'].str.lower()
bim_markets.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Market Name,Market Latitude,Market Longitude
0,Esenyurt,41.03424,28.680018,bi̇m,41.042433,28.652512
1,Esenyurt,41.03424,28.680018,bi̇m,41.004904,28.702778
2,Esenyurt,41.03424,28.680018,bim agena evleri,41.045673,28.674354
3,Esenyurt,41.03424,28.680018,bi̇m,41.003947,28.706028
4,Esenyurt,41.03424,28.680018,bim bahcesehir,41.06366,28.688366


In [16]:
# Map dotted, dotless and accented variants of "i" to a plain "i";
# 'i\u0307' is "i" followed by a combining dot, produced by lowercasing "İ"
for variant in ['i\u0307', 'í', 'ı', 'ì']:
    bim_markets['Market Name'] = bim_markets['Market Name'].str.replace(variant, 'i', regex=True)
bim_markets.loc[109, 'Market Name']

'bimeks'
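
The replacements above are needed because lowercasing the Turkish capital `İ` yields `i` followed by a combining dot (U+0307), which looks like `i` but does not compare equal to it. A minimal sketch of the same normalisation in plain Python; `normalize_name` is a hypothetical helper, and unlike the cell above it drops every combining dot, not only those that follow an `i`:

```python
# map accented and dotless variants to a plain "i":
# U+00ED (i with acute), U+00EC (i with grave), U+0131 (dotless i)
I_VARIANTS = str.maketrans({'\u00ed': 'i', '\u00ec': 'i', '\u0131': 'i'})


def normalize_name(name):
    """Lowercase a venue name and fold i-like characters into ASCII 'i'."""
    lowered = name.lower().replace('\u0307', '')  # drop the combining dot left by 'İ'
    return lowered.translate(I_VARIANTS)


print(normalize_name('BİM') == 'bim')
print(normalize_name('Bım Market'))
```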

### We need to check whether the results contain any other venues. I don't want any data other than "bim".

In [17]:
bim_markets['Market Name'].str.contains(r'\bbim\b')

0       True
1       True
2       True
3       True
4       True
5       True
6       True
7       True
8       True
9       True
10      True
11      True
12      True
13      True
14      True
15      True
16      True
17      True
18      True
19      True
20      True
21      True
22      True
23      True
24      True
25      True
26      True
27      True
28      True
29      True
30      True
31      True
32      True
33      True
34      True
35      True
36      True
37      True
38      True
39      True
40      True
41      True
42      True
43      True
44      True
45      True
46      True
47      True
48      True
49     False
50      True
51      True
52      True
53      True
54      True
55      True
56      True
57      True
58      True
59      True
60      True
61      True
62      True
63      True
64      True
65      True
66      True
67      True
68      True
69      True
70      True
71      True
72      True
73      True
74      True
75      True
76      True

In [18]:
# keep only the rows whose name contains "bim" as a whole word
bim_markets = bim_markets[bim_markets['Market Name'].str.contains(r'\bbim\b')].copy()
bim_markets

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Market Name,Market Latitude,Market Longitude
0,Esenyurt,41.03424,28.680018,bim,41.042433,28.652512
1,Esenyurt,41.03424,28.680018,bim,41.004904,28.702778
2,Esenyurt,41.03424,28.680018,bim agena evleri,41.045673,28.674354
3,Esenyurt,41.03424,28.680018,bim,41.003947,28.706028
4,Esenyurt,41.03424,28.680018,bim bahcesehir,41.06366,28.688366
5,Esenyurt,41.03424,28.680018,özyurtlar bim market,41.02946,28.680719
6,Esenyurt,41.03424,28.680018,bim,41.045856,28.668203
7,Esenyurt,41.03424,28.680018,bim,41.028233,28.65379
8,Esenyurt,41.03424,28.680018,bim haramidere bölge müdürlüğü,41.011596,28.68988
9,Esenyurt,41.03424,28.680018,bim,41.01726,28.661623


In [19]:
bim_markets['Market Name'].str.contains(r'\bbim\b').value_counts()

True    902
Name: Market Name, dtype: int64
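
The pattern `\bbim\b` relies on word boundaries: it matches `bim` only as a whole word, which is why a name such as `bimeks` is rejected while `bim agena evleri` is kept. A small sketch with Python's `re` module, using names taken from the outputs above:

```python
import re

BIM_WORD = re.compile(r'\bbim\b')

names = ['bim', 'bim agena evleri', 'özyurtlar bim market', 'bimeks']
# True when "bim" appears as a standalone word in the name
matches = {name: bool(BIM_WORD.search(name)) for name in names}
print(matches)
```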

In [20]:
bim_markets.drop('Market Name', axis=1, inplace=True)
bim_markets.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Market Latitude,Market Longitude
0,Esenyurt,41.03424,28.680018,41.042433,28.652512
1,Esenyurt,41.03424,28.680018,41.004904,28.702778
2,Esenyurt,41.03424,28.680018,41.045673,28.674354
3,Esenyurt,41.03424,28.680018,41.003947,28.706028
4,Esenyurt,41.03424,28.680018,41.06366,28.688366


### These are all the BİM markets that Foursquare returned for İstanbul
### Let's plot them

In [21]:
bim_markets.groupby('Borough').count()

Unnamed: 0_level_0,Borough Latitude,Borough Longitude,Market Latitude,Market Longitude
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Arnavutköy,11,11,11,11
Ataşehir,30,30,30,30
Avcılar,19,19,19,19
Bahçelievler,29,29,29,29
Bakırköy,29,29,29,29
Bayrampaşa,30,30,30,30
Bağcılar,30,30,30,30
Başakşehir,12,12,12,12
Beykoz,16,16,16,16
Beylikdüzü,29,29,29,29


In [22]:
map_bim = folium.Map(location=[lat_ist,lng_ist],zoom_start=10)
#add markers to map
for boro, lati, long in zip(bim_markets['Borough'], bim_markets['Market Latitude'], bim_markets['Market Longitude']):
    label = boro
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lati, long],
        radius = 2,
        popup = label,
        color='red',
        fill_color='orange',
        fill_opacity = 0.7,
        parse_html=False).add_to(map_bim)
map_bim

### Now we can start ML modeling
#### The first step is to convert the market coordinates to a NumPy array

In [23]:
dbscan_data = bim_markets[['Market Latitude','Market Longitude']]
dbscan_data = dbscan_data.values.astype('float32', copy=False)
dbscan_data

array([[41.042435, 28.652512],
       [41.004906, 28.702778],
       [41.045673, 28.674355],
       ...,
       [41.17115 , 29.609386],
       [41.176834, 29.612692],
       [41.16028 , 29.590572]], dtype=float32)

In [24]:
# Standardize each coordinate to zero mean and unit variance
dbscan_data_scaler = StandardScaler().fit(dbscan_data)
dbscan_data = dbscan_data_scaler.transform(dbscan_data)
dbscan_data

array([[ 0.40922052, -1.5510353 ],
       [-0.2299254 , -1.2878929 ],
       [ 0.46437752, -1.4366881 ],
       ...,
       [ 2.6013389 ,  3.4581745 ],
       [ 2.6981397 ,  3.4754784 ],
       [ 2.4161828 ,  3.3596835 ]], dtype=float32)

In [25]:
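# Construct the model
# min_samples: a core point needs at least 12 markets (itself included) within eps
# eps: neighborhood radius of 0.12, measured in standardized units (not degrees or km)
model = DBSCAN(eps=0.12, min_samples=12, metric='euclidean').fit(dbscan_data)
cluster_numbers = Counter(model.labels_)
cluster_numbers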

Counter({-1: 324,
         0: 25,
         1: 189,
         2: 45,
         3: 22,
         4: 45,
         5: 78,
         6: 18,
         7: 53,
         8: 12,
         9: 21,
         10: 39,
         11: 13,
         12: 18})
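
Because DBSCAN runs on standardized coordinates, `eps=0.12` is a distance in standard deviations rather than in kilometres, and latitude and longitude are scaled by different amounts. A rough sketch of what that radius corresponds to on the ground, assuming the scaler's standard deviations can be recovered from the first two rows of the raw and standardized arrays printed above (float32 rounding makes this approximate) and a spherical Earth of radius 6371 km:

```python
import math

# first two rows of the raw and standardized arrays shown above
raw = [(41.042435, 28.652512), (41.004906, 28.702778)]
scaled = [(0.40922052, -1.5510353), (-0.2299254, -1.2878929)]

# StandardScaler computes z = (x - mean) / std, so std = delta_x / delta_z
std_lat = (raw[0][0] - raw[1][0]) / (scaled[0][0] - scaled[1][0])
std_lng = (raw[0][1] - raw[1][1]) / (scaled[0][1] - scaled[1][1])

eps = 0.12
km_per_degree = math.pi * 6371 / 180   # length of one degree of latitude
mean_lat = math.radians(41.03)         # approximate latitude of İstanbul (assumption)

north_south_km = eps * std_lat * km_per_degree
east_west_km = eps * std_lng * km_per_degree * math.cos(mean_lat)
print(round(north_south_km, 2), round(east_west_km, 2))
```

Under these assumptions the neighborhood is not a circle but an ellipse stretched east-west, because the longitudes are spread more widely than the latitudes. If a circular radius in kilometres is preferred, scikit-learn's `DBSCAN` also accepts `metric='haversine'` on coordinates given in radians.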

### Let's add the cluster labels to our data frame

In [26]:
bim_markets['cluster_labels'] = model.labels_
bim_markets.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Market Latitude,Market Longitude,cluster_labels
0,Esenyurt,41.03424,28.680018,41.042433,28.652512,-1
1,Esenyurt,41.03424,28.680018,41.004904,28.702778,0
2,Esenyurt,41.03424,28.680018,41.045673,28.674354,-1
3,Esenyurt,41.03424,28.680018,41.003947,28.706028,0
4,Esenyurt,41.03424,28.680018,41.06366,28.688366,-1


### It is time to plot again.

In [31]:
# create map
map_cluster = folium.Map(location=[lat_ist, lng_ist], zoom_start=10)

# one color per label; cluster_numbers also counts the noise label -1
k = len(cluster_numbers)
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map
for boro, lati, long, clust in zip(bim_markets['Borough'], bim_markets['Market Latitude'],
                                   bim_markets['Market Longitude'], bim_markets['cluster_labels']):
    label = folium.Popup(boro + ' Bim Cluster: ' + str(clust), parse_html=True)
    folium.CircleMarker(
        [lati, long],
        radius=2,
        popup=label,
        color=rainbow[int(clust)],  # label -1 (noise) picks the last color
        fill=True,
        fill_opacity=0.7,
        fill_color='white').add_to(map_cluster)
map_cluster

In [28]:
# which boroughs fall inside the largest cluster (label 1)
winning_cluster = bim_markets[model.labels_ == 1][['Borough','cluster_labels']]
print(winning_cluster.groupby('Borough').count().sum())
winning_cluster.groupby('Borough').count()

cluster_labels    189
dtype: int64


Unnamed: 0_level_0,cluster_labels
Borough,Unnamed: 1_level_1
Bahçelievler,27
Bakırköy,26
Bayrampaşa,16
Bağcılar,26
Esenler,23
Eyüpsultan,6
Fatih,2
Gaziosmanpaşa,5
Güngören,30
Zeytinburnu,28


## Conclusion:
 * The model produced 13 clusters plus a noise group (label -1, 324 markets), and Cluster 1 looks the most promising: it is the largest, with a total of 189 markets from 10 boroughs.
 * If the study is extended, it would be worthwhile to tune the DBSCAN parameters automatically and to compare the outcome with other unsupervised methods to determine which gives the best result.
 * I used the DBSCAN model in my solution and was satisfied with the results. Although smoother results might be obtained by enlarging and diversifying the data in the data frame, I think the outcome would be similar to the result of this study.
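
To compare parameter settings as the second point suggests, each run needs a compact score. A minimal sketch that summarises one run by its number of clusters, share of noise points and largest cluster, applied to the label counts reported above for `eps=0.12` and `min_samples=12`; `summarize_labels` is a hypothetical helper:

```python
from collections import Counter

# label counts copied from the DBSCAN output above (-1 is noise)
cluster_numbers = Counter({-1: 324, 0: 25, 1: 189, 2: 45, 3: 22, 4: 45, 5: 78,
                           6: 18, 7: 53, 8: 12, 9: 21, 10: 39, 11: 13, 12: 18})


def summarize_labels(label_counts):
    """Return (number of clusters, noise share, label of the largest cluster)."""
    total = sum(label_counts.values())
    clusters = {label: n for label, n in label_counts.items() if label != -1}
    noise_share = label_counts.get(-1, 0) / total
    largest = max(clusters, key=clusters.get) if clusters else None
    return len(clusters), noise_share, largest


n_clusters, noise_share, largest = summarize_labels(cluster_numbers)
print(n_clusters, round(noise_share, 3), largest)
```

Running the same summary over a grid of `eps` and `min_samples` values would show how sensitive the result is to those two choices.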
