# Battle of Neighbourhood

#### Description of the problem 
Aim: I run a successful restaurant in Madhapur, Hyderabad, and I want to open a new one in a Central Bangalore neighbourhood that resembles Madhapur. The goal is to find which Central Bangalore neighbourhood is most like Madhapur, Hyderabad.

#### Using data to solve it
1. First, I collect data on venues near Madhapur using the Foursquare API.
2. Then I get the list of Central Bangalore neighbourhoods from Wikipedia using BeautifulSoup.
3. Then I collect nearby venue data for every Central Bangalore neighbourhood and append it to a dataframe.
4. Then I append the Madhapur data.
5. Then I preprocess the data for K-Means clustering.
6. Finally, I perform K-Means clustering and find which neighbourhood is most like Madhapur.

### So, the idea for finding a Madhapur-like place is to run K-Means clustering and see which neighbourhoods are clustered with Madhapur

In [1]:
import time
import pandas as pd 
import numpy as np
import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library

#### Getting latitude and longitude of Madhapur

In [2]:
#Hyderabad Residence
address = 'madhapur,Telangana,India'

geolocator = Nominatim(user_agent='battle_of_neighbourhood') # arbitrary app name; geopy requires a user_agent
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Madhapur, Hyderabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Madhapur, Hyderabad, India are 17.4408578, 78.3916289.


#### Getting nearby places of Madhapur from the Foursquare API

In [3]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

CLIENT_ID='MWZDOMU0JPKMO3BMF12XA2WPII2B0PEPYVCVLEPX0BOSUW0B'
CLIENT_SECRET='KMSXKXBRBMI02YEDTGIV1YQKIH0CW2MVYGG12LYQ23NJ0IY0'
VERSION='20180605'
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=MWZDOMU0JPKMO3BMF12XA2WPII2B0PEPYVCVLEPX0BOSUW0B&client_secret=KMSXKXBRBMI02YEDTGIV1YQKIH0CW2MVYGG12LYQ23NJ0IY0&v=20180605&ll=17.4408578,78.3916289&radius=1000&limit=100'
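The same request URL can also be assembled from named parameters instead of a positional format string. The following is only a sketch: `build_explore_url` is a hypothetical helper, and the credentials are placeholders rather than real keys.

```python
from urllib.parse import urlencode

# Endpoint taken from the cell above.
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Hypothetical helper: build the explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "ll" readable instead of encoding it as %2C.
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


# Placeholder credentials (assumption), with the Madhapur coordinates from above.
example_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                                17.4408578, 78.3916289, 1000, 100)
print(example_url)
```

Naming each parameter makes it harder to swap, say, `radius` and `limit` by accident, which is easy to do with seven positional `{}` slots.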

In [4]:
results = requests.get(url).json()

In [5]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
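To see what this helper returns, here is a small check on hand-made rows. The sample dictionaries are invented for illustration and are not real API output; the function is repeated so the snippet runs on its own.

```python
def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Invented sample rows covering the three cases the helper handles.
flattened_row = {'venue.name': 'Sample Cafe',
                 'venue.categories': [{'name': 'Coffee Shop'}]}
plain_row = {'name': 'Sample Hotel',
             'categories': [{'name': 'Hotel'}, {'name': 'Bar'}]}
empty_row = {'venue.name': 'Unlabelled', 'venue.categories': []}

for row in (flattened_row, plain_row, empty_row):
    print(get_category_type(row))
```

Only the first category is kept, so a venue tagged as both a hotel and a bar is counted as a hotel.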

In [67]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues['neighbour']='madhapur'
nearby_venues.head(10)



| | name | categories | lat | lng | neighbour |
|---|---|---|---|---|---|
| 0 | Bangalore Golf Club | Golf Course | 12.989681 | 77.585933 | madhapur |
| 1 | Taj West End | Hotel | 12.984572 | 77.584893 | madhapur |
| 2 | Masala Klub | Indian Restaurant | 12.984993 | 77.585115 | madhapur |
| 3 | Chitra Kala Parishad | Art Gallery | 12.989295 | 77.581115 | madhapur |
| 4 | Shangri-La Hotel, Bengaluru | Hotel | 12.992112 | 77.588446 | madhapur |
| 5 | ITC Windsor | Hotel | 12.994131 | 77.585896 | madhapur |
| 6 | Bangalore Turf Club | Racetrack | 12.983914 | 77.58314 | madhapur |
| 7 | The Sugar Factory | Nightclub | 12.990041 | 77.58632 | madhapur |
| 8 | Mynt | Coffee Shop | 12.984629 | 77.584989 | madhapur |
| 9 | Blue Ginger | Vietnamese Restaurant | 12.984804 | 77.584045 | madhapur |


### Getting the neighbourhoods of Central Bangalore

In [68]:
# get Central Bangalore neighbourhoods
from bs4 import BeautifulSoup 
url = requests.get('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore').text
soup = BeautifulSoup(url,'lxml')
table_post = soup.find('table')
fields = table_post.find_all('td')

neighbourhood = []

for i in range(0, len(fields), 3):
    neighbourhood.append(fields[i].text.strip())
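The loop above takes every third cell, which only yields neighbourhood names if each table row has exactly three cells with the name first. A minimal sketch of that stride on invented cell texts (the values below are placeholders, not the actual Wikipedia content):

```python
# Invented cell texts standing in for the <td> contents of a
# three-column table, flattened row by row.
cells = [
    ' Domlur\n', 'image-1', 'summary-1',
    ' Indiranagar\n', 'image-2', 'summary-2',
    ' Ulsoor\n', 'image-3', 'summary-3',
]

# Step through the flat list three cells at a time, keeping the first of each row.
names = [cells[i].strip() for i in range(0, len(cells), 3)]
print(names)

# Simple guard: a cell count that is not a multiple of three means
# the table layout differs from what the stride assumes.
layout_ok = len(cells) % 3 == 0
print(layout_ok)
```

If the page layout changes (an extra column, merged cells), the stride silently picks up the wrong cells, so a check like `layout_ok` is worth keeping next to the scrape.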

In [69]:

frames = [] # one dataframe of venues per neighbourhood
geolocator = Nominatim(user_agent='battle_of_neighbourhood') # arbitrary app name; geopy requires a user_agent
for neighbour in neighbourhood:
    address = neighbour+',Karnataka,India'
    location = geolocator.geocode(address)
    time.sleep(1) # stay within Nominatim's limit of one request per second
    if location is None:
        continue
    # separate names so Madhapur's latitude, longitude and results are not overwritten
    b_latitude = location.latitude
    b_longitude = location.longitude
    print('The geographical coordinates of {}, Karnataka, India are {}, {}.'.format(neighbour, b_latitude, b_longitude))
    
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        b_latitude, 
        b_longitude, 
        radius, 
        LIMIT)
    b_results = requests.get(url).json()
    if 'warning' in b_results['response']:
        continue
    b_venues = b_results['response']['groups'][0]['items']
    
    b_nearby_venues = json_normalize(b_venues)
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    b_nearby_venues = b_nearby_venues.loc[:, filtered_columns]

    b_nearby_venues['venue.categories'] = b_nearby_venues.apply(get_category_type, axis=1)

    b_nearby_venues.columns = [col.split(".")[-1] for col in b_nearby_venues.columns]
    b_nearby_venues['neighbour'] = neighbour
    frames.append(b_nearby_venues)

df = pd.concat(frames) # DataFrame.append was removed in pandas 2.0

The geographical coordinates of Cantonment area, Karnataka, India are 13.019567, 77.50958888613079.
The geographical coordinates of Domlur, Karnataka, India are 12.9624669, 77.6381958.
The geographical coordinates of Indiranagar, Karnataka, India are 12.9732913, 77.6404672.
The geographical coordinates of Malleswaram, Karnataka, India are 13.0163411, 77.55866418238408.
The geographical coordinates of Pete area, Karnataka, India are 13.023959, 77.024307.
The geographical coordinates of Sadashivanagar, Karnataka, India are 13.0077079, 77.5795893.
The geographical coordinates of Seshadripuram, Karnataka, India are 12.9931876, 77.5753419.
The geographical coordinates of Shivajinagar, Karnataka, India are 12.986391, 77.6075416.
The geographical coordinates of Ulsoor, Karnataka, India are 12.9778793, 77.6246697.
The geographical coordinates of Vasanth Nagar, Karnataka, India are 12.988721250000001, 77.58516877601824.
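One geocoded point in this output stands out: Pete area has longitude 77.02, while the others lie between roughly 77.5 and 77.6. A quick distance check can flag such results before they are used. The sketch below uses the haversine formula with coordinates copied from the output above; `haversine_km` and the 25 km threshold are illustrative assumptions, not part of the original notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Coordinates copied from the printed output above.
reference = (12.98872125, 77.58516878)   # Vasanth Nagar, used here as a central point
pete_area = (13.023959, 77.024307)
domlur = (12.9624669, 77.6381958)

MAX_DISTANCE_KM = 25  # assumed threshold for "still in Bangalore"

pete_km = haversine_km(*reference, *pete_area)
domlur_km = haversine_km(*reference, *domlur)
print(pete_km > MAX_DISTANCE_KM, domlur_km > MAX_DISTANCE_KM)
```

With these inputs Pete area comes out at roughly 60 km from Vasanth Nagar, so the geocoder most likely matched a different place with a similar name. It also does not appear among the encoded neighbourhoods later on.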




In [76]:
df = pd.concat([df, nearby_venues]) # DataFrame.append was removed in pandas 2.0
df

| | name | categories | lat | lng | neighbour |
|---|---|---|---|---|---|
| 0 | New Udupi Grand | Vegetarian / Vegan Restaurant | 13.022775 | 77.509830 | Cantonment area |
| 1 | woodland | Shoe Store | 13.024306 | 77.511055 | Cantonment area |
| 2 | bhat canteen | Fast Food Restaurant | 13.016796 | 77.504154 | Cantonment area |
| 3 | Pizza Corner | Pizza Place | 13.014279 | 77.504747 | Cantonment area |
| 0 | Lavonne | Café | 12.963909 | 77.638579 | Domlur |
| ... | ... | ... | ... | ... | ... |
| 56 | Cafe Coffee Day | Coffee Shop | 12.992711 | 77.588854 | madhapur |
| 57 | KFC | Fast Food Restaurant | 12.988550 | 77.593868 | madhapur |
| 58 | Dinesh Chat | Snack Place | 12.993201 | 77.589005 | madhapur |
| 59 | Reliance Digital | Electronics Store | 12.989190 | 77.593099 | madhapur |


In [77]:
from sklearn import preprocessing

In [78]:
le = preprocessing.LabelEncoder()
le.fit(df['neighbour'])
le.classes_

array(['Cantonment area', 'Domlur', 'Indiranagar', 'Malleswaram',
       'Sadashivanagar', 'Seshadripuram', 'Shivajinagar', 'Ulsoor',
       'Vasanth Nagar', 'madhapur'], dtype=object)
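The order of `le.classes_` matters later, because the cluster labels are matched to neighbourhood names by position. `LabelEncoder` stores the sorted unique values, and uppercase letters sort before lowercase ones, which is why `madhapur` comes last. The sketch below mirrors that behaviour in plain Python on a few names from the output; it does not call `LabelEncoder` itself.

```python
# A few neighbourhood names, with a repeat, in arbitrary order.
names = ['madhapur', 'Ulsoor', 'Domlur', 'Vasanth Nagar', 'Domlur']

# Sorted unique values: the same ordering rule LabelEncoder applies to strings.
classes = sorted(set(names))

# Integer code for each name is its position in the sorted list.
code_of = {name: code for code, name in enumerate(classes)}
encoded = [code_of[name] for name in names]

print(classes)
print(encoded)
```

Had the label been written as `Madhapur`, it would sort between `Indiranagar` and `Malleswaram` instead of last, and every code after it would shift by one.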

In [79]:
df['label_neighbour']=le.transform(df['neighbour'])

In [80]:
train_df=df.drop(['name','lat','lng','neighbour'],axis=1)

In [81]:
train_df=pd.get_dummies(train_df, prefix=['categories'])
train_x=train_df.groupby('label_neighbour').sum()

In [93]:
train_x

| label_neighbour | categories_Accessories Store | categories_Afghan Restaurant | categories_American Restaurant | categories_Andhra Restaurant | categories_Arcade | categories_Art Gallery | categories_Art Museum | categories_Arts & Crafts Store | categories_Asian Restaurant | categories_Athletics & Sports | ... | categories_Tea Room | categories_Tex-Mex Restaurant | categories_Thai Restaurant | categories_Theater | categories_Trail | categories_Train Station | categories_Udupi Restaurant | categories_Vegetarian / Vegan Restaurant | categories_Vietnamese Restaurant | categories_Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| 2 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 2 | 1 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 5 | 0 | 0 | 1 | 0 | 2 | 2 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| 6 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 7 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 2 | 1 | ... | 0 | 0 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 1 | 2 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 |
| 9 | 0 | 0 | 0 | 0 | 1 | 2 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 |
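Each row of `train_x` is simply a count of venues per category for one neighbourhood. The same table can be built by hand with `collections.Counter`, which may make the `get_dummies` plus `groupby(...).sum()` step easier to follow. The venue list below is invented for illustration.

```python
from collections import Counter, defaultdict

# Invented (neighbourhood, category) pairs, one per venue.
venues = [
    ('Domlur', 'Café'),
    ('Domlur', 'Café'),
    ('Domlur', 'Hotel'),
    ('Ulsoor', 'Hotel'),
]

# Count how often each category occurs in each neighbourhood.
counts = defaultdict(Counter)
for neighbour, category in venues:
    counts[neighbour][category] += 1

# Fixed column order, so every neighbourhood gets a vector of the same length.
categories = sorted({category for _, category in venues})
matrix = {neighbour: [counts[neighbour][category] for category in categories]
          for neighbour in counts}

print(categories)
print(matrix)
```

Because these are raw counts, a neighbourhood with only a handful of venues ends up with a much smaller vector than one that hits the 100-venue limit, which affects the distances K-Means works with.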


## K-Means Clustering

In [122]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4).fit(train_x)

In [123]:
labels = kmeans.labels_
print(labels)

[2 0 0 2 2 3 1 2 3 3]
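K-Means starts from randomly chosen centroids, so the cluster numbers printed above (and occasionally the groupings themselves) can differ between runs. Passing `random_state` makes a run repeatable. The sketch below uses an invented count matrix, not the notebook's `train_x`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented category-count matrix: rows 0-1 and rows 2-3 form two obvious groups.
X = np.array([
    [5, 0, 1],
    [4, 0, 1],
    [0, 6, 0],
    [0, 5, 1],
])

# Fixing random_state (and n_init) makes repeated fits return the same labels.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_

# A second fit with the same settings reproduces the first.
labels_again = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).labels_

print(labels)
print((labels == labels_again).all())
```

The label numbers themselves carry no meaning; only which rows share a label matters.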


In [124]:
for x,y in zip(le.classes_,labels):
    print(x," : ",y)

Cantonment area  :  2
Domlur  :  0
Indiranagar  :  0
Malleswaram  :  2
Sadashivanagar  :  2
Seshadripuram  :  3
Shivajinagar  :  1
Ulsoor  :  2
Vasanth Nagar  :  3
madhapur  :  3
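Rather than reading the listing by eye, the neighbourhoods that share a cluster with Madhapur can be picked out directly. The sketch below hard-codes the class names and labels printed above so that it runs on its own; in the notebook, `le.classes_` and `kmeans.labels_` would take their place.

```python
# Class names and cluster labels copied from the outputs above.
classes = ['Cantonment area', 'Domlur', 'Indiranagar', 'Malleswaram',
           'Sadashivanagar', 'Seshadripuram', 'Shivajinagar', 'Ulsoor',
           'Vasanth Nagar', 'madhapur']
labels = [2, 0, 0, 2, 2, 3, 1, 2, 3, 3]

cluster_of = dict(zip(classes, labels))
target = 'madhapur'

# Every other neighbourhood assigned to the same cluster as the target.
cluster_mates = [name for name, label in cluster_of.items()
                 if label == cluster_of[target] and name != target]
print(cluster_mates)
```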


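From the clustering, Seshadripuram and Vasanth Nagar in Central Bangalore fall in the same cluster (3) as Madhapur, Hyderabad. Note, however, that the rows labelled madhapur in the outputs above carry Bangalore coordinates (lat ≈ 12.98, lng ≈ 77.58), and that rows 8 (Vasanth Nagar) and 9 (madhapur) of train_x are identical in the columns shown. The most likely cause is that `results` had been overwritten by the neighbourhood loop before the Madhapur cell was run, so the Madhapur request should be re-run and the clustering repeated before relying on this match.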