# Battle of Neighborhoods  

# Introduction
Recently, many of my friends and family members have moved to Coimbatore from other parts of India and from other countries. They would like to live in localities similar to the ones they left.
The aim of this project is to find the neighborhood in Coimbatore that is most similar to the one a person currently lives in.

Anyone thinking of moving to Coimbatore can use this project in the same way.

# Data
The latitudes and longitudes of the Coimbatore neighborhoods are obtained with the Nominatim geocoder from the geopy.geocoders module in Python.
Venue data is collected with the Foursquare API. For each location, the features extracted are the categories of nearby venues, such as shops, restaurants and parks. The same data is collected for the user's current address and then used to find the most similar locality in Coimbatore, which is the suggested place to move into.  
No other datasets are being used.

# Methodology

## Import Libraries

In [1]:
import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim
import requests
import json
#!conda install -c conda-forge folium=0.5.0 --yes
import folium

## Get Latitude and Longitude of Coimbatore Localities
These are the major neighborhoods in Coimbatore. Their latitudes and longitudes are found using the Nominatim geocoder from the geopy.geocoders module.

In [2]:
cbe_localities =  ['Kovaipudur','Kuniyamuthur','RS Puram','Ukkadam','Gandhipuram','Podanur',
                   'Ganapathy','Peelamedu','Singanallur','Saibaba Colony','Race Course']

In [3]:
latitudes = []
longitudes = []
geolocator = Nominatim(user_agent="ny_explorer")
for locality in cbe_localities:
    address = locality+', Coimbatore'
    location = geolocator.geocode(address)
    latitudes.append(location.latitude)
    longitudes.append(location.longitude)
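Nominatim's `geocode` returns `None` when it cannot resolve an address, so the loop above would stop with an `AttributeError` on the first locality it fails to find. A minimal sketch that skips and reports such localities instead, assuming the geocoding function is passed in as a parameter; `geocode_localities`, `FakeLocation` and the dictionary stub are hypothetical names introduced here so the example runs offline:

```python
class FakeLocation:
    """Offline stand-in for a geopy Location (only .latitude/.longitude are used)."""
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def geocode_localities(localities, geocode, city="Coimbatore"):
    """Geocode '<locality>, <city>' for each locality.

    Returns (resolved, missing): resolved maps locality -> (lat, lng);
    missing lists the localities for which geocode returned None.
    """
    resolved, missing = {}, []
    for locality in localities:
        location = geocode(f"{locality}, {city}")
        if location is None:  # no match found for this address
            missing.append(locality)
            continue
        resolved[locality] = (location.latitude, location.longitude)
    return resolved, missing


# Hypothetical stub in place of geolocator.geocode, so the example runs offline.
stub_results = {"RS Puram, Coimbatore": FakeLocation(11.008018, 76.950166)}
resolved, missing = geocode_localities(["RS Puram", "Nowhere"], stub_results.get)
print(resolved)  # {'RS Puram': (11.008018, 76.950166)}
print(missing)   # ['Nowhere']
```

With the real geocoder you would pass `geolocator.geocode`. Nominatim's usage policy allows at most one request per second, and geopy's `RateLimiter` (from `geopy.extra.rate_limiter`) can enforce that, e.g. `RateLimiter(geolocator.geocode, min_delay_seconds=1)`.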

## Create a dataframe of Coimbatore Locations
The neighborhoods and their latitudes and longitudes are combined into a dataframe.

In [4]:
column_names = ['Neighborhood', 'Latitude', 'Longitude']
cbe_locations = pd.DataFrame(columns = column_names)

In [5]:
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
cbe_locations = pd.DataFrame(list(zip(cbe_localities, latitudes, longitudes)), columns=column_names)

In [6]:
cbe_locations

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Kovaipudur | 10.945415 | 76.93926 |
| 1 | Kuniyamuthur | 10.964104 | 76.956256 |
| 2 | RS Puram | 11.008018 | 76.950166 |
| 3 | Ukkadam | 10.989522 | 76.956107 |
| 4 | Gandhipuram | 11.018271 | 76.967774 |
| 5 | Podanur | 10.958561 | 76.988307 |
| 6 | Ganapathy | 11.03656 | 76.969257 |
| 7 | Peelamedu | 11.026958 | 76.994581 |
| 8 | Singanallur | 11.002859 | 77.023495 |
| 9 | Saibaba Colony | 11.024334 | 76.944788 |


## Map of Coimbatore Neighborhoods
This map shows where each of these neighborhoods is located.

In [10]:
location = geolocator.geocode('Coimbatore, Tamil Nadu')
latitude = location.latitude
longitude = location.longitude

map_cbe = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, neighborhood in zip(cbe_locations['Latitude'], cbe_locations['Longitude'], cbe_locations['Neighborhood']):
    label = neighborhood
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cbe)  
    
map_cbe

## Use Foursquare API to get details of each locality 

### Client Id and Client Secret 
(Hidden. The cells below assume that `CLIENT_ID`, `CLIENT_SECRET` and `VERSION` are defined here.)

## Function to get the category of venue
This function returns the name of the first category listed for a venue, or None if the venue has no categories.

In [8]:
def get_category_type(row):
    # Venue rows store categories under either 'categories' or 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Function to get nearby venues
This function gets the venues near each given location. It calls the Foursquare API's explore endpoint to fetch popular venues within `radius` metres (5000 by default) of a latitude and longitude, returning at most `LIMIT` (100 by default) venues per location.

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, LIMIT = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
  
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
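The list comprehension above reads `v['venue']['categories'][0]['name']`, which raises an `IndexError` for a venue with an empty category list, the case `get_category_type` already handles. A sketch that separates parsing the response from making the request, so it can be checked offline; the helper names and the second sample venue are made up, and the sample only mirrors the fields `getNearbyVenues` reads, not the full Foursquare response:

```python
def get_category_name(venue):
    """Name of the venue's first category, or None if it has none."""
    categories = venue.get("categories", [])
    return categories[0]["name"] if categories else None


def parse_explore_items(name, lat, lng, items):
    """Flatten explore-response items into one tuple per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     get_category_name(venue)))
    return rows


# Hand-made items shaped like response["groups"][0]["items"] in getNearbyVenues.
sample_items = [
    {"venue": {"name": "Nilgris Kovaipudur",
               "location": {"lat": 10.946301, "lng": 76.933981},
               "categories": [{"name": "Department Store"}]}},
    {"venue": {"name": "Unlabelled Place",  # invented venue with no categories
               "location": {"lat": 10.95, "lng": 76.94},
               "categories": []}},
]
rows = parse_explore_items("Kovaipudur", 10.945415, 76.93926, sample_items)
print(rows[0][-1], rows[1][-1])  # Department Store None
```

For the request itself, `requests.get(url, params={...})` lets `requests` build and encode the query string instead of formatting it by hand.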

## Get the Nearby Venues of Locations in Coimbatore

In [10]:
cbe_venues = getNearbyVenues(names=cbe_locations['Neighborhood'],
                                   latitudes=cbe_locations['Latitude'],
                                   longitudes=cbe_locations['Longitude']
                                  )

In [11]:
cbe_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Kovaipudur | 10.945415 | 76.93926 | Sita Paani Restaurant | 10.954885 | 76.953049 | Indian Restaurant |
| 1 | Kovaipudur | 10.945415 | 76.93926 | Nilgris Kovaipudur | 10.946301 | 76.933981 | Department Store |
| 2 | Kovaipudur | 10.945415 | 76.93926 | Sundarapuram Bus Stop | 10.957637 | 76.972168 | Bus Stop |
| 3 | Kovaipudur | 10.945415 | 76.93926 | Shree Anandhaas | 10.959043 | 76.971924 | Indian Restaurant |
| 4 | Kovaipudur | 10.945415 | 76.93926 | Allwin Hotel | 10.9545 | 76.973755 | Hotel |


In [12]:
cbe_venues.shape

(790, 7)

## Group by Neighborhood
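I grouped the venues by neighborhood to count how many venues were returned for each location. Several neighborhoods reached the cap of 100 venues set by `LIMIT`, so they may have more venues than were returned.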

In [13]:
cbe_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Ganapathy | 100 | 100 | 100 | 100 | 100 | 100 |
| Gandhipuram | 52 | 52 | 52 | 52 | 52 | 52 |
| Kovaipudur | 8 | 8 | 8 | 8 | 8 | 8 |
| Kuniyamuthur | 52 | 52 | 52 | 52 | 52 | 52 |
| Peelamedu | 100 | 100 | 100 | 100 | 100 | 100 |
| Podanur | 27 | 27 | 27 | 27 | 27 | 27 |
| RS Puram | 100 | 100 | 100 | 100 | 100 | 100 |
| Race Course | 100 | 100 | 100 | 100 | 100 | 100 |
| Saibaba Colony | 100 | 100 | 100 | 100 | 100 | 100 |
| Singanallur | 51 | 51 | 51 | 51 | 51 | 51 |


In [14]:
print('There are {} unique categories.'.format(len(cbe_venues['Venue Category'].unique())))

There are 49 unique categories.
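Both counts can be computed a little more directly. A minimal sketch on made-up rows that use the same column names as `cbe_venues`: `groupby(...).size()` gives one count per neighborhood instead of repeating it in every column, and `nunique()` counts distinct categories (it skips missing values, unlike `len(unique())`).

```python
import pandas as pd

# Made-up rows with the same column names as cbe_venues.
venues = pd.DataFrame({
    "Neighborhood": ["Kovaipudur", "Kovaipudur", "Podanur"],
    "Venue Category": ["Indian Restaurant", "Hotel", "Indian Restaurant"],
})

# One venue count per neighborhood (count() repeats it for every column).
per_neighborhood = venues.groupby("Neighborhood").size()
print(per_neighborhood)

# Number of distinct categories; nunique() ignores missing values.
n_categories = venues["Venue Category"].nunique()
print(n_categories)  # 2
```

Since each explore request is capped at `LIMIT` venues, a count of exactly 100 most likely means the neighborhood has more venues than were returned.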


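## One-Hot Encoding
One-hot encoding with pandas' `get_dummies` function turns each venue category into its own 0/1 column. Averaging these columns per neighborhood then gives the share of each category among that neighborhood's venues, which is easier to work with.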

In [15]:
cbe_onehot = pd.get_dummies(cbe_venues[['Venue Category']], prefix="", prefix_sep="")

cbe_onehot['Neighborhood'] = cbe_venues['Neighborhood'] 

fixed_columns = [cbe_onehot.columns[-1]] + list(cbe_onehot.columns[:-1])
cbe_onehot = cbe_onehot[fixed_columns]

cbe_grouped = cbe_onehot.groupby('Neighborhood').mean().reset_index()
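A toy example of what this cell computes, on made-up venues: after one-hot encoding, the per-neighborhood mean of each column is the share of that category among the neighborhood's venues. The `dtype=int` argument is an addition for readability; the original call works without it.

```python
import pandas as pd

# Made-up venues: three in neighborhood A, one in neighborhood B.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Bakery", "Bakery", "Hotel", "Hotel"],
})

# prefix="" and prefix_sep="" keep the plain category names as column names;
# dtype=int gives 0/1 columns (pandas >= 2.0 otherwise returns booleans).
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot["Neighborhood"] = venues["Neighborhood"]

# The mean of a 0/1 column is the share of that category in the neighborhood.
shares = onehot.groupby("Neighborhood").mean()
print(shares)
```

Here A gets a Bakery share of 2/3 and a Hotel share of 1/3, while B is entirely Hotel.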

## Function to return the most common values
This function returns the `num_top_venues` most frequent venue categories for a neighborhood, sorted from most to least frequent.

In [16]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
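`sort_values` uses quicksort by default, which does not guarantee any order among categories with equal shares, and ties are common when a location has few venues (Kovaipudur has 8). A sketch using `Series.nlargest`, whose `keep='first'` option the pandas documentation describes as returning the first occurrences among equal values; `top_categories` is a hypothetical name:

```python
import pandas as pd


def top_categories(row, num_top_venues):
    """Names of the num_top_venues largest shares in a cbe_grouped-style row."""
    shares = row.iloc[1:].astype(float)  # skip 'Neighborhood'; object -> float
    # keep="first": among equal shares, earlier columns win.
    return list(shares.nlargest(num_top_venues, keep="first").index)


row = pd.Series({"Neighborhood": "A", "Bakery": 0.5, "Hotel": 0.5, "Park": 0.0})
print(top_categories(row, 1))  # ['Bakery']
```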

## Create a dataframe of most common venue categories

In [17]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

cbe_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
cbe_neighborhoods_venues_sorted['Neighborhood'] = cbe_grouped['Neighborhood']

for ind in np.arange(cbe_grouped.shape[0]):
    cbe_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cbe_grouped.iloc[ind, :], num_top_venues)
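The `indicators` list covers 1st to 3rd and falls back to "th" for everything else, which is correct up to 20 but would produce "21th" if `num_top_venues` were raised past 20. A sketch of a general helper; `ordinal` is a name introduced here, not part of the notebook:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:  # 10th-20th, 110th-120th, ... always take "th"
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[-1])  # 1st Most Common Venue | 10th Most Common Venue
```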



In [18]:
cbe_neighborhoods_venues_sorted

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ganapathy | Indian Restaurant | Ice Cream Shop | Clothing Store | Multiplex | Asian Restaurant | Café | Shopping Mall | Fast Food Restaurant | Hotel | Chinese Restaurant |
| 1 | Gandhipuram | Indian Restaurant | Ice Cream Shop | Multiplex | Vegetarian / Vegan Restaurant | Shopping Mall | Italian Restaurant | Hotel | Clothing Store | Fast Food Restaurant | Asian Restaurant |
| 2 | Kovaipudur | Indian Restaurant | Hotel | Train Station | Vegetarian / Vegan Restaurant | Department Store | Bus Station | Bus Stop | Clothing Store | Food Court | Fast Food Restaurant |
| 3 | Kuniyamuthur | Indian Restaurant | Asian Restaurant | Clothing Store | Multiplex | Ice Cream Shop | Café | Hotel | Department Store | Park | Pizza Place |
| 4 | Peelamedu | Indian Restaurant | Ice Cream Shop | Clothing Store | Asian Restaurant | Hotel | Multiplex | Vegetarian / Vegan Restaurant | Shopping Mall | Restaurant | Café |
| 5 | Podanur | Indian Restaurant | Bakery | Asian Restaurant | Italian Restaurant | Ice Cream Shop | Multiplex | BBQ Joint | Bus Station | Clothing Store | Coffee Shop |
| 6 | RS Puram | Indian Restaurant | Clothing Store | Asian Restaurant | Multiplex | Ice Cream Shop | Café | Pizza Place | Fast Food Restaurant | Hotel | Shopping Mall |
| 7 | Race Course | Indian Restaurant | Ice Cream Shop | Asian Restaurant | Clothing Store | Hotel | Fast Food Restaurant | Vegetarian / Vegan Restaurant | Multiplex | Café | Coffee Shop |
| 8 | Saibaba Colony | Indian Restaurant | Asian Restaurant | Clothing Store | Ice Cream Shop | Café | Multiplex | Pizza Place | Fast Food Restaurant | Hotel | Shopping Mall |
| 9 | Singanallur | Indian Restaurant | Ice Cream Shop | Coffee Shop | Fast Food Restaurant | Hotel | Nightclub | Restaurant | Pizza Place | Chinese Restaurant | Café |


## Convert each category to a number
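I took the unique venue categories, in the order in which they first appear in `cbe_venues`, and replaced each category name with its position in that list plus one (so the first category becomes 1).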

In [19]:
unique_venues =  cbe_venues['Venue Category'].unique()
cbe_neighborhoods_venues_numbered = cbe_neighborhoods_venues_sorted.replace(to_replace = unique_venues, value=list(range(1,len(unique_venues)+1)))
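The `replace` call is equivalent to looking each category up in a dictionary of codes. A sketch using the first five venue categories from `cbe_venues.head()` shown earlier; `code_of` is a hypothetical name. Because `pd.unique` keeps the order of first appearance, this reproduces the notebook's codes for those categories (Indian Restaurant is 1, Hotel is 4).

```python
import pandas as pd

# Venue categories of the first five rows of cbe_venues, in order.
categories = pd.Series(["Indian Restaurant", "Department Store", "Bus Stop",
                        "Indian Restaurant", "Hotel"])

# pd.unique keeps the order of first appearance; codes start at 1.
code_of = {name: code for code, name in enumerate(pd.unique(categories), start=1)}
print(code_of)
# {'Indian Restaurant': 1, 'Department Store': 2, 'Bus Stop': 3, 'Hotel': 4}

# Encode a row of top categories by looking each name up in the mapping.
top = pd.DataFrame({"1st": ["Indian Restaurant"], "2nd": ["Hotel"]})
encoded = top.apply(lambda col: col.map(code_of))
print(encoded.values.tolist())  # [[1, 4]]
```

The codes are labels, not quantities: Hotel (4) is not "closer" to Department Store (2) than to Multiplex (23) in any meaningful sense.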


In [20]:
cbe_neighborhoods_venues_numbered

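|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ganapathy | 1 | 15 | 16 | 23 | 12 | 24 | 9 | 19 | 4 | 37 |
| 1 | Gandhipuram | 1 | 15 | 23 | 6 | 9 | 34 | 4 | 16 | 19 | 12 |
| 2 | Kovaipudur | 1 | 4 | 7 | 6 | 2 | 5 | 3 | 16 | 35 | 19 |
| 3 | Kuniyamuthur | 1 | 12 | 16 | 23 | 15 | 24 | 4 | 2 | 10 | 20 |
| 4 | Peelamedu | 1 | 15 | 16 | 12 | 4 | 23 | 6 | 9 | 22 | 24 |
| 5 | Podanur | 1 | 29 | 12 | 34 | 15 | 23 | 8 | 5 | 16 | 31 |
| 6 | RS Puram | 1 | 16 | 12 | 23 | 15 | 24 | 20 | 19 | 4 | 9 |
| 7 | Race Course | 1 | 15 | 12 | 16 | 4 | 19 | 6 | 23 | 24 | 31 |
| 8 | Saibaba Colony | 1 | 12 | 16 | 15 | 24 | 23 | 20 | 19 | 4 | 9 |
| 9 | Singanallur | 1 | 15 | 31 | 19 | 4 | 38 | 22 | 20 | 37 | 24 |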


## Set up X and y for the Model
The features X are the numeric codes of each neighborhood's ten most common venue categories.

In [21]:
X = cbe_neighborhoods_venues_numbered.iloc[:,1:].values

In [22]:
X

array([[1, 15, 16, 23, 12, 24, 9, 19, 4, 37],
       [1, 15, 23, 6, 9, 34, 4, 16, 19, 12],
       [1, 4, 7, 6, 2, 5, 3, 16, 35, 19],
       [1, 12, 16, 23, 15, 24, 4, 2, 10, 20],
       [1, 15, 16, 12, 4, 23, 6, 9, 22, 24],
       [1, 29, 12, 34, 15, 23, 8, 5, 16, 31],
       [1, 16, 12, 23, 15, 24, 20, 19, 4, 9],
       [1, 15, 12, 16, 4, 19, 6, 23, 24, 31],
       [1, 12, 16, 15, 24, 23, 20, 19, 4, 9],
       [1, 15, 31, 19, 4, 38, 22, 20, 37, 24],
       [1, 16, 12, 23, 15, 24, 20, 19, 4, 6]], dtype=object)

The target y is the neighborhood name.

In [23]:
y = cbe_neighborhoods_venues_numbered['Neighborhood']

In [24]:
y

0          Ganapathy
1        Gandhipuram
2         Kovaipudur
3       Kuniyamuthur
4          Peelamedu
5            Podanur
6           RS Puram
7        Race Course
8     Saibaba Colony
9        Singanallur
10           Ukkadam
Name: Neighborhood, dtype: object

## Model
I used two classification models, a decision tree and a support vector machine (SVM), both from Python's scikit-learn package.

In [25]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm

Both models are trained on X and y.

In [26]:
tree_classifier = DecisionTreeClassifier(criterion="entropy", max_depth = 4)
tree_classifier.fit(X,y)

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=4,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

In [27]:
svm_classifier = svm.SVC(kernel='rbf', gamma='auto')
svm_classifier.fit(X, y) 

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

## User's Location
As a sample location, I used the neighborhood where my cousin, who is looking to move to Coimbatore, currently lives.

In [28]:
user_address = 'Jafferkhanpet, Chennai'
user_location = geolocator.geocode(user_address)
user_latitude = user_location.latitude
user_longitude = user_location.longitude

In [29]:
user_latitude

13.0299401

In [30]:
user_longitude

80.2056195

## User's Nearby Venues
The same function is used to get the nearby venues for the location of the user.

In [31]:
user_venues = getNearbyVenues(names = [user_address], latitudes = [user_latitude], longitudes = [user_longitude])

In [32]:
user_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Jafferkhanpet, Chennai | 13.02994 | 80.205619 | Palazzo | 13.050383 | 80.209541 | Multiplex |
| 1 | Jafferkhanpet, Chennai | 13.02994 | 80.205619 | Bombay Chat | 13.0403 | 80.193134 | Fast Food Restaurant |
| 2 | Jafferkhanpet, Chennai | 13.02994 | 80.205619 | Q Bar | 13.016606 | 80.204853 | Restaurant |
| 3 | Jafferkhanpet, Chennai | 13.02994 | 80.205619 | ITC Grand Chola | 13.01044 | 80.220669 | Hotel |
| 4 | Jafferkhanpet, Chennai | 13.02994 | 80.205619 | Hilton | 13.016621 | 80.204787 | Hotel |


## Data Processing
Similarly, the user's venue data is processed into the same format as the training data: it is one-hot encoded, and then the most common venue categories are found and numbered.

In [33]:
user_onehot = pd.get_dummies(user_venues[['Venue Category']], prefix="", prefix_sep="")

user_onehot['Neighborhood'] = user_venues['Neighborhood'] 

fixed_columns = [user_onehot.columns[-1]] + list(user_onehot.columns[:-1])
user_onehot = user_onehot[fixed_columns]

user_grouped = user_onehot.groupby('Neighborhood').mean().reset_index()

In [34]:
user_grouped

|   | Neighborhood | Andhra Restaurant | Asian Restaurant | BBQ Joint | Bakery | Bar | Breakfast Spot | Burger Joint | Café | Chinese Restaurant | ... | Restaurant | Sandwich Place | Sculpture Garden | Shopping Mall | Snack Place | South Indian Restaurant | Spa | Tennis Stadium | Vegetarian / Vegan Restaurant | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jafferkhanpet, Chennai | 0.01 | 0.02 | 0.03 | 0.02 | 0.01 | 0.01 | 0.01 | 0.02 | 0.05 | ... | 0.03 | 0.03 | 0.01 | 0.02 | 0.01 | 0.03 | 0.01 | 0.01 | 0.04 | 0.01 |


In [35]:
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

user_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
user_neighborhoods_venues_sorted['Neighborhood'] = user_grouped['Neighborhood']

for ind in np.arange(user_grouped.shape[0]):
    user_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(user_grouped.iloc[ind, :], num_top_venues)

user_neighborhoods_venues_sorted

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jafferkhanpet, Chennai | Indian Restaurant | Hotel | Ice Cream Shop | Clothing Store | Chinese Restaurant | Multiplex | Italian Restaurant | Vegetarian / Vegan Restaurant | Fast Food Restaurant | BBQ Joint |


In [36]:
user_neighborhoods_venues_numbered = user_neighborhoods_venues_sorted.replace(to_replace = unique_venues, value= list(range(1,len(unique_venues)+1)))

In [37]:
user_neighborhoods_venues_numbered

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jafferkhanpet, Chennai | 1 | 4 | 15 | 16 | 37 | 23 | 34 | 6 | 19 | 8 |


## Predict the Location to Move into
With the user's data in the same format as the training data, we use both models to predict which Coimbatore neighborhood is most similar.

In [38]:
X_test = user_neighborhoods_venues_numbered.iloc[:,1:].values

In [39]:
X_test

array([[1, 4, 15, 16, 37, 23, 34, 6, 19, 8]], dtype=object)

In [40]:
tree_classifier.predict(X_test)

array(['Race Course'], dtype=object)

In [41]:
svm_classifier.predict(X_test)

array(['Saibaba Colony'], dtype=object)

## Another User Location to Test
As a second test, I used another location and repeated the same steps to process the user's data.

In [44]:
user_address = 'Vikroli, Mumbai'
user_location = geolocator.geocode(user_address)
user_latitude = user_location.latitude
user_longitude = user_location.longitude

user_venues = getNearbyVenues(names = [user_address], latitudes = [user_latitude], longitudes = [user_longitude])

user_onehot = pd.get_dummies(user_venues[['Venue Category']], prefix="", prefix_sep="")

user_onehot['Neighborhood'] = user_venues['Neighborhood'] 

fixed_columns = [user_onehot.columns[-1]] + list(user_onehot.columns[:-1])
user_onehot = user_onehot[fixed_columns]

user_grouped = user_onehot.groupby('Neighborhood').mean().reset_index()

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

user_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
user_neighborhoods_venues_sorted['Neighborhood'] = user_grouped['Neighborhood']

for ind in np.arange(user_grouped.shape[0]):
    user_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(user_grouped.iloc[ind, :], num_top_venues)

user_neighborhoods_venues_sorted

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Vikroli, Mumbai | Indian Restaurant | Restaurant | Café | Lounge | Multiplex | Fast Food Restaurant | Gym | Hotel | Pub | Italian Restaurant |


In [46]:
user_neighborhoods_venues_numbered = user_neighborhoods_venues_sorted.replace(to_replace = unique_venues, value= list(range(1,len(unique_venues)+1)))
user_neighborhoods_venues_numbered

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Vikroli, Mumbai | 1 | 22 | 24 | Lounge | 23 | 19 | Gym | 4 | Pub | 34 |


Some of the user's venue categories (Lounge, Gym and Pub) do not appear among the Coimbatore venue categories, so they were not converted to numbers. There are two options: recommend that the user not move to Coimbatore, or set these categories to 0 and then predict the most similar neighborhood. I chose the second option.
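The next cell replaces any remaining text value with 0 using a regular expression. A sketch of a more explicit alternative, assuming a dictionary `code_of` from category name to code (a hypothetical name; only three of the notebook's codes are shown): categories never seen in Coimbatore become `NaN` in `map`, and `fillna(0)` sets them to 0.

```python
import pandas as pd

# Subset of the Coimbatore category codes from the numbered table.
code_of = {"Indian Restaurant": 1, "Hotel": 4, "Multiplex": 23}
user_top = pd.DataFrame({"1st": ["Indian Restaurant"], "2nd": ["Lounge"], "3rd": ["Multiplex"]})

# Unknown categories map to NaN; fill them with 0 and convert back to integers.
encoded = user_top.apply(lambda col: col.map(code_of)).fillna(0).astype(int)
print(encoded.values.tolist())  # [[1, 0, 23]]
```

Note that 0 is itself just another label to the models; it marks "category unknown in Coimbatore" rather than meaning "absent".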

In [62]:
X_test = user_neighborhoods_venues_numbered.iloc[:,1:].replace(to_replace='[A-Z a-z]+',value=0,regex=True).values
X_test

array([[ 1, 22, 24,  0, 23, 19,  0,  4,  0, 34]])

In [63]:
tree_classifier.predict(X_test)

array(['Kovaipudur'], dtype=object)

In [64]:
svm_classifier.predict(X_test)

array(['Kuniyamuthur'], dtype=object)
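The processing cell for the second location (In [44]) repeats, line for line, the steps already run for Jafferkhanpet. A minimal sketch of wrapping them in one reusable function, so each new test location needs a single call; `top_categories_table` is a hypothetical name, and its column names ("Top 1", "Top 2", ...) differ from the notebook's. It also pads with `None` when a location has fewer than `num_top` categories, a case where assigning a shorter list into ten columns in the original loop would likely raise an error.

```python
import pandas as pd


def top_categories_table(venues, num_top=10):
    """One row per neighborhood listing its num_top most common venue categories.

    venues needs 'Neighborhood' and 'Venue Category' columns, as returned
    by getNearbyVenues.
    """
    # One 0/1 column per category, then the share of each per neighborhood.
    onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
    onehot["Neighborhood"] = venues["Neighborhood"]
    shares = onehot.groupby("Neighborhood").mean()

    rows = {}
    for name, row in shares.iterrows():
        top = list(row.nlargest(num_top).index)
        top += [None] * (num_top - len(top))  # pad when there are too few categories
        rows[name] = top
    columns = [f"Top {i}" for i in range(1, num_top + 1)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)


# Made-up venues for a single user location.
sample_venues = pd.DataFrame({
    "Neighborhood": ["Vikroli, Mumbai"] * 3,
    "Venue Category": ["Pub", "Pub", "Gym"],
})
table = top_categories_table(sample_venues, num_top=3)
print(table)
```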

# Results

For Jafferkhanpet, the decision tree predicts Race Course and the SVM predicts Saibaba Colony as the most similar neighborhood. For Vikroli, the decision tree predicts Kovaipudur and the SVM predicts Kuniyamuthur. Please feel free to try other locations as well.

## Discussion
The main observation is that Coimbatore does not yet have many venue categories, at least on Foursquare (only 49 unique categories across the neighborhoods queried). As a result, the models are unlikely to predict well for user locations outside India, whose venue categories may differ considerably.  
The two models also gave different predictions in both examples. This is likely because each neighborhood contributes only one training row, so the models have very little data to learn from. In addition, the category codes are arbitrary labels, yet both models treat them as numeric values.
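A different approach, named here plainly as a swapped-in technique rather than the method used above, is to skip training and compare each Coimbatore neighborhood's category-share vector (the rows of `cbe_grouped`) with the user's (`user_grouped`) directly, ranking neighborhoods by cosine similarity. This uses every category instead of only the top ten, needs no numeric codes, and treats categories absent on one side as 0. A minimal sketch with made-up shares; `cbe_shares` and `user_shares` are hypothetical stand-ins:

```python
import numpy as np
import pandas as pd

# Made-up category shares; stand-ins for cbe_grouped and user_grouped.
cbe_shares = pd.DataFrame(
    {"Indian Restaurant": [0.5, 0.2], "Hotel": [0.5, 0.0], "Park": [0.0, 0.8]},
    index=["Neighborhood A", "Neighborhood B"],
)
user_shares = pd.Series({"Indian Restaurant": 0.6, "Hotel": 0.3, "Lounge": 0.1})

# Align on the union of categories; a category missing on one side counts as 0.
all_cats = cbe_shares.columns.union(user_shares.index)
A = cbe_shares.reindex(columns=all_cats, fill_value=0.0).to_numpy()
u = user_shares.reindex(all_cats, fill_value=0.0).to_numpy()

# Cosine similarity between the user's vector and each neighborhood's vector.
similarity = A @ u / (np.linalg.norm(A, axis=1) * np.linalg.norm(u))
ranking = pd.Series(similarity, index=cbe_shares.index).sort_values(ascending=False)
print(ranking)
```

With the notebook's data, `cbe_shares` would be `cbe_grouped.set_index('Neighborhood')` and `user_shares` the single row `user_grouped.set_index('Neighborhood').iloc[0]`.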

## Conclusion
This project tests whether a suitable neighborhood to move into can be found from a given user location. The results may not be very accurate, but they seem reasonable: I have personally noticed similarities between Jafferkhanpet, Chennai and Race Course, Coimbatore.