# Aim: To plot and identify commercial markets using point-of-interest (POI) data

### The task is to group a city's points of interest into clusters that represent distinct commercial centres or markets (the city could be your own). Point-of-interest (POI) data gives the location of each place along with descriptive tags such as school, type of outlet, or type of building.

## Importing necessary libraries


In [167]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.pyplot import figure
%matplotlib inline

## Importing Geopandas

A library built on pandas and several other libraries for working with geospatial data

In [168]:
import geopandas as gpd

## Reading the GeoJSON file exported from OpenStreetMap with the help of Overpass Turbo

The area is North Delhi

In [169]:
df = gpd.read_file("rohini.geojson")

## The POIs were extracted with an Overpass Turbo query; the POI types requested are listed in the query script.

In [170]:
df.head()

Unnamed: 0,id,@id,amenity,atm,brand:wikidata,brand:wikipedia,name,name:en,operator,website,...,unisex,addr:district,addr:subdistrict,branch:type,drink:sugarcane_juice,air_conditioning,drive_through,studio,healthcare,geometry
0,node/280741143,node/280741143,bank,yes,Q2003549,en:Axis Bank,Axis Bank,Axis Bank,Canara Bank,https://www.axisbank.com,...,,,,,,,,,,POINT (77.19423500000001 28.64725)
1,node/355436037,node/355436037,atm,,,,ICICI Bank,,,,...,,,,,,,,,,POINT (77.1723786 28.6458869)
2,node/355436042,node/355436042,fast_food,,,,Dominos Pizza,,,,...,,,,,,,,,,POINT (77.1722681 28.6457998)
3,node/459771176,node/459771176,cinema,,,,"Fun Cinemas, CRM, Shahdara",,Fun Multiplex Pvt Ltd,,...,,,,,,,,,,POINT (77.30198 28.656726)
4,node/496457107,node/496457107,bus_station,,,,,,,,...,,,,,,,,,,POINT (77.2516076 28.6108671)


In [171]:
df.shape

(1123, 126)

Extracting the longitude and latitude coordinates from the 'geometry' column

In [173]:
df['Long'] = df['geometry'].x
df['Lat'] = df['geometry'].y

Converting the Coordinates into an array of geo pairs

In [174]:
# One (Lat, Long) pair per row, giving an array of shape (n_points, 2)
coordinates = df[['Lat', 'Long']].to_numpy()

# Visualization:

## Folium, a Python library built on the interactive JavaScript library Leaflet, is used here to plot and visualise the data points on a map.

### Tried various map packages such as Basemap and Geopy, but didn't get satisfactory results

In [175]:
import folium
from folium import plugins
from folium.plugins import MarkerCluster

In [176]:
# Initialising a folium map instance centred on the North Delhi area
m = folium.Map([ 28.67304, 77.19767], zoom_start=12)
m

In [177]:
# Show each data point on the map as a circular marker
for index, row in df.iterrows():
    folium.CircleMarker([row['Lat'], row['Long']],
                        radius=8,
                        popup=row['name'],
                        fill_color="#3db7e4",  # light blue fill
                        ).add_to(m)

# Display the map with the data points
m

### Analyzing the density of POI in areas of Map

In [178]:
# Adding a heatmap layer to the folium map to show the density of the data points
# (HeatMap takes a list of [lat, long] pairs)
m.add_child(plugins.HeatMap(df[['Lat', 'Long']].values.tolist(), radius=13))
m

## Grouping nearby POIs into clusters using folium's MarkerCluster plugin

Explore the map to see which shops are found in the high-density markets

In [179]:
# Zipping the coordinates into a list of (lat, long) tuples
locations = list(zip(df.Lat, df.Long))

#Creating the icon for the data points
icons = [folium.Icon(icon="shop", prefix="fa") for _ in range(len(locations))]

cluster = MarkerCluster(locations=locations, icons=icons)
m.add_child(cluster)
m

# Clustering with Machine Learning

### In data science and machine learning, KMeans and DBSCAN are two of the most popular clustering (unsupervised) algorithms.

### Density-based clustering algorithms rely on the concept of reachability, i.e. how many neighbours a point has within a given radius. DBSCAN is attractive here because it does not need the parameter k, the number of clusters to find, which KMeans requires; instead it takes a radius (eps) and a minimum neighbour count (min_samples). When you don't know how many clusters are hidden in the dataset and cannot easily visualise it, DBSCAN is a good choice. The number of clusters it produces varies with the input data.
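To make the reachability idea concrete, here is a minimal pure-Python sketch of the DBSCAN procedure on a handful of made-up 2D points. It is a teaching aid under simplifying assumptions (Euclidean distance, brute-force neighbour search), not scikit-learn's implementation; the names `region_query` and `dbscan` and the sample points are hypothetical.

```python
from math import hypot

NOISE = -1  # same convention as scikit-learn: -1 marks noise


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Two tight groups of four points each, plus one isolated point (made-up data)
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.2, min_samples=3)
print(labels)
```

The number of clusters is never passed in: it falls out of how many dense groups the radius and neighbour threshold happen to find.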



In [157]:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Standardising the coordinates before fitting
pairs = df[['Lat', 'Long']]
pairs = StandardScaler().fit_transform(pairs)

# eps is in standardised units; a core point needs 7 points (itself included) within eps
db = DBSCAN(eps=0.3, min_samples=7).fit(pairs)
labels = db.labels_
print(labels[500:560])
df["Market"] = labels

# Counting the clusters; label -1 marks noise points that belong to no cluster
realClusterNum = len(set(labels)) - (1 if -1 in labels else 0)
clusterNum = len(set(labels))

[0 2 2 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0
 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
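Because the cell above standardises latitude and longitude, `eps=0.3` is measured in standard deviations of the data rather than in metres. One possible alternative, which is not what this notebook does, is to think of `eps` as a real-world distance using the haversine formula. The sketch below is pure Python; the helper names and the Earth radius constant are assumptions of the example, and the two sample coordinates are the ICICI Bank and Dominos Pizza rows from the `df.head()` output.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def km_to_radians(km):
    """Convert a distance on the Earth's surface to an angle in radians."""
    return km / EARTH_RADIUS_KM


# Distance between two neighbouring POIs taken from the df.head() output
d_km = haversine_km(28.6458869, 77.1723786, 28.6457998, 77.1722681)
print(f"{d_km * 1000:.1f} m")

# A hypothetical 200 m neighbourhood radius expressed as an angle
eps_rad = km_to_radians(0.2)
```

With scikit-learn, a radius like `eps_rad` could be passed to `DBSCAN(eps=eps_rad, metric="haversine", algorithm="ball_tree")`, fitted on the coordinates converted to radians instead of on standardised values; the 200 m figure is only a placeholder to tune.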


In [159]:
set(labels)

{-1, 0, 1, 2, 3}
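In scikit-learn's DBSCAN, the label -1 is reserved for noise, so the set above describes four markets plus the unclustered points. A small sketch of how cluster sizes and the noise count could be read off a label array follows; the `labels` list here is made up for illustration, and in the notebook it would be `db.labels_`.

```python
from collections import Counter

# Hypothetical labels; in the notebook this would be db.labels_
labels = [0, 0, 1, -1, 2, 1, 0, -1, 3, 0]

sizes = Counter(labels)        # points per label, noise included
n_noise = sizes.pop(-1, 0)     # remove the noise label and keep its count
n_clusters = len(sizes)        # the remaining keys are real clusters

print("clusters:", n_clusters)
print("noise points:", n_noise)
print("largest cluster:", sizes.most_common(1)[0])
```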

In [166]:
# IDs are categorised into different markets, and a new Market column is added to the dataframe

df.head()

Unnamed: 0,id,@id,amenity,atm,brand:wikidata,brand:wikipedia,name,name:en,operator,website,...,drink:sugarcane_juice,air_conditioning,drive_through,studio,healthcare,geometry,Long,Lat,Clus_Db,Market
0,node/280741143,node/280741143,bank,yes,Q2003549,en:Axis Bank,Axis Bank,Axis Bank,Canara Bank,https://www.axisbank.com,...,,,,,,POINT (77.19423500000001 28.64725),77.194235,28.64725,0,0
1,node/355436037,node/355436037,atm,,,,ICICI Bank,,,,...,,,,,,POINT (77.1723786 28.6458869),77.172379,28.645887,0,0
2,node/355436042,node/355436042,fast_food,,,,Dominos Pizza,,,,...,,,,,,POINT (77.1722681 28.6457998),77.172268,28.6458,0,0
3,node/459771176,node/459771176,cinema,,,,"Fun Cinemas, CRM, Shahdara",,Fun Multiplex Pvt Ltd,,...,,,,,,POINT (77.30198 28.656726),77.30198,28.656726,1,1
4,node/496457107,node/496457107,bus_station,,,,,,,,...,,,,,,POINT (77.2516076 28.6108671),77.251608,28.610867,0,0


# Things Left

## 1. Plotting the DBSCAN clusters in Folium or another interactive map package
## 2. Including the OSM way and relation element types in the clustering
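For the first item, one possible approach is to colour each marker by its DBSCAN label. The sketch below is a starting point rather than a finished solution: the palette, the grey noise colour, and the helper names `cluster_color` and `add_cluster_markers` are all hypothetical. Only `folium.CircleMarker` comes from folium itself.

```python
# Hypothetical palette; any list of hex colours would do
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"]
NOISE_COLOR = "#999999"


def cluster_color(label):
    """Grey for DBSCAN noise (-1), otherwise cycle through PALETTE."""
    if label == -1:
        return NOISE_COLOR
    return PALETTE[label % len(PALETTE)]


def add_cluster_markers(fmap, rows):
    """Add one CircleMarker per (lat, lon, label) tuple to an existing folium map."""
    import folium  # imported lazily so cluster_color works without folium installed

    for lat, lon, label in rows:
        color = cluster_color(label)
        folium.CircleMarker(
            location=[lat, lon],
            radius=5,
            color=color,
            fill=True,
            fill_color=color,
            popup=f"Market {label}",
        ).add_to(fmap)
    return fmap
```

In this notebook it could be called as `add_cluster_markers(folium.Map([28.67304, 77.19767], zoom_start=12), zip(df.Lat, df.Long, df.Market))`, which would draw every POI in the colour of its market and leave noise points grey.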