# Capstone Project - The Battle of Neighborhoods 
## Exploring venues in Aurangabad, Maharashtra, India

## Table of Contents

* [Introduction/Business Problem](#introduction)
* [Data Collection from APIs](#data_collection)
* [Data Cleaning](#data_cleaning)


## Introduction/Business Problem <a name="introduction"></a>

The aim of this project is to identify venues in Aurangabad, Maharashtra, India based on their ratings and average prices. In this notebook, we will identify various venues in the city of **Aurangabad, Maharashtra, India**, using the **Foursquare API** and the **Zomato API**, to help visitors select the restaurants that suit them best.

When users visit a city, they start looking for places to visit during their stay. They primarily choose places based on venue ratings and average prices, so that the locations fit their budget.

Here, we'll **identify places that are fit for various individuals** based on the information collected from the two APIs and Data Science. Once we have the plot with the venues, any company can launch an application using the same data and suggest such venues to its users.

## Data Collection from APIs <a name="data_collection"></a>

To begin with, we will take a look at **Aurangabad, Maharashtra on the map** using the `folium` library.

We will also fetch the data from **two different APIs**.
* **Foursquare API:** We will use the Foursquare API to fetch venues in Aurangabad, starting from the city center and extending up to 10 kilometers in each direction.
* **Zomato API:** The Zomato API provides information about various venues including the complete address, user ratings, price for two people, price range and a lot more.

### Aurangabad

**Aurangabad** is a city in Maharashtra state, in India. It’s known for the 17th-century marble Bibi ka Maqbara shrine, styled on the Taj Mahal. The nearby Shivaji Maharaj Museum, dedicated to the Maratha king Shivaji, displays war weapons and a coin collection. North of the city, the Aurangabad Caves comprise ancient, rock-cut Buddhist shrines. West of the city, battlements surround the medieval Daulatabad Fort. There are many venues (especially restaurants, hotels and cafes) which can be explored.

We can use the `geopy` library to extract the latitude and longitude values of Aurangabad, but we'll directly supply the values in this case.

In [1]:
ABD_LATITUDE = 19.8733
ABD_LONGITUDE = 75.3285
print('The geographical coordinates of Aurangabad are {}, {}.'.format(ABD_LATITUDE, ABD_LONGITUDE))

The geographical coordinates of Aurangabad are 19.8733, 75.3285.


Let's use the `folium` library to create a **complete map zoomed on Aurangabad**. We'll also plot a marker on the coordinates we just identified above. This would give us a relatively good look at the center point we will be considering. 

In [2]:
!pip -q install folium
import folium

aurangabad_map = folium.Map(location = [ABD_LATITUDE, ABD_LONGITUDE], zoom_start = 13)
folium.Marker([ABD_LATITUDE, ABD_LONGITUDE]).add_to(aurangabad_map)
aurangabad_map.save("Aurangabad Map.html")
aurangabad_map

### Foursquare API

We begin by fetching all venues in **Aurangabad** within a range of 10 kilometers using the Foursquare API. The Foursquare API has an `explore` endpoint, which returns venue recommendations within a given radius of the given coordinates. We will use this endpoint to find all the venues we need.

In [3]:
FOURSQUARE_CLIENT_ID = '4TNADADWJ34J4T0XI5Q4SGGAQMHUIDKYMSRG322XW3HUR01R'
FOURSQUARE_CLIENT_SECRET = 'DIDEPD4QN3FOBCVDO1OEVWCEMGBSO4FP4THO1HZCZ4XPWCGQ'
RADIUS = 10000 # 10 Km
NO_OF_VENUES = 100
VERSION = '20200525' # Current date

We define the `get_category_type` method to get the correct category for each venue.

In [4]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
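
As a quick illustration of what this helper expects, the sketch below applies the same logic to a few hand-written rows. The sample rows are made up, and their shape (a list of category dictionaries, each with a `name` key) is inferred from the function body rather than taken from the Foursquare documentation.

```python
# Mirrors the logic of get_category_type so the example runs on its own.
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical rows; plain dicts stand in for DataFrame rows here.
sample_rows = [
    {'venue.categories': [{'name': 'Ice Cream Shop'}, {'name': 'Dessert Shop'}]},
    {'categories': [{'name': 'Multiplex'}]},
    {'venue.categories': []},
]

# Only the first category of each venue is kept; an empty list yields None.
sample_categories = [get_category_type(row) for row in sample_rows]
print(sample_categories)
```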

We'll call the API repeatedly until we have all venues within the given distance. A single call returns at most 100 venues, so we fetch all venues by calling the API iteratively and increasing the offset each time.

* The Foursquare API requires a `client_id` and a `client_secret`, which are available after creating a developer account.
* We will set the radius to 10 kilometers.
* The version is a required parameter that specifies the date on which we are browsing, so that the API returns the latest data.
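
The offset-based paging described above can be sketched without any network access. In this minimal example, `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake fetcher simply slices a list to stand in for the API; only the stopping rule (a page shorter than the limit ends the loop) is taken from this notebook.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect items by calling fetch_page(offset, limit) until a short page arrives."""
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return items

# Hypothetical stand-in for the API: 230 fake venues served in slices.
FAKE_VENUES = ['venue_{}'.format(i) for i in range(230)]
calls = []

def fake_fetch_page(offset, limit):
    calls.append(offset)  # record each requested offset
    return FAKE_VENUES[offset:offset + limit]

all_venues = fetch_all(fake_fetch_page)
print(len(all_venues), calls)
```

With 230 fake venues, the loop requests offsets 0, 100 and 200 and stops after the third page, which holds only 30 items.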

In [5]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import requests

pd.set_option('display.max_rows', None)

offset = 0
total_venues = 0
foursquare_venues = pd.DataFrame(columns = ['name', 'categories', 'lat', 'lng'])

while (True):
    url = ('https://api.foursquare.com/v2/venues/explore?client_id={}'
           '&client_secret={}&v={}&ll={},{}&radius={}&limit={}&offset={}').format(FOURSQUARE_CLIENT_ID, 
                                                                        FOURSQUARE_CLIENT_SECRET, 
                                                                        VERSION, 
                                                                        ABD_LATITUDE, 
                                                                        ABD_LONGITUDE, 
                                                                        RADIUS,
                                                                        NO_OF_VENUES,
                                                                        offset)
    result = requests.get(url).json()
    venues_fetched = len(result['response']['groups'][0]['items'])
    total_venues = total_venues + venues_fetched
    print("Total {} venues fetched within a total radius of {} Km".format(venues_fetched, RADIUS/1000))

    venues = result['response']['groups'][0]['items']
    venues = json_normalize(venues)

    # Filter the columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    venues = venues.loc[:, filtered_columns]

    # Filter the category for each row
    venues['venue.categories'] = venues.apply(get_category_type, axis = 1)

    # Clean all column names
    venues.columns = [col.split(".")[-1] for col in venues.columns]
    foursquare_venues = pd.concat([foursquare_venues, venues], axis = 0, sort = False)
    
    # A page shorter than the limit means there are no more venues to fetch
    if venues_fetched < NO_OF_VENUES:
        break
    else:
        offset = offset + NO_OF_VENUES

foursquare_venues = foursquare_venues.reset_index(drop = True)
print("\nTotal {} venues fetched".format(total_venues))

Total 24 venues fetched within a total radius of 10.0 Km

Total 24 venues fetched




### Zomato API

The Zomato API provides a search endpoint that looks up venues using filters such as query, latitude, longitude and more. It also requires a user key, which is available with a developer account.

We'll pass the `name`, `lat`, and `lng` values of the venues fetched from the Foursquare API to the search endpoint to get more information about each venue.

* The query will be the name of the venue.
* The start defines from what offset we want to start, so we'll keep it at 0.
* The count defines the number of restaurants we want to fetch. As we have the exact location coordinates, we'll fetch only one.
* We will supply the latitude and longitude values.
* We will sort by `real_distance` so that each search returns the venue closest to the supplied coordinates.
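
Venue names can contain spaces or `&`, which need URL-encoding before they go into a query string. The sketch below builds the query from the parameters listed above using only the standard library; `build_search_query` is a hypothetical helper, and the parameter names are the ones used in this notebook rather than a statement of the full Zomato API.

```python
from urllib.parse import urlencode

def build_search_query(name, lat, lng):
    """Return the URL-encoded query string for one venue lookup."""
    params = {
        'q': name,                # venue name from Foursquare
        'start': 0,               # offset of the first result
        'count': 1,               # only the best match is needed
        'lat': lat,
        'lon': lng,
        'sort': 'real_distance',  # nearest result first
    }
    return urlencode(params)

query = build_search_query('Kream & Krunch', 19.8766, 75.3461)
print(query)
```

Here the `&` in the name is encoded as `%26`, so it is not mistaken for a parameter separator.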

In [6]:
headers = {'user-key': 'd860d13cdce8ce2cd4bf1c64134bf09c'}
venues_information = []

for index, row in foursquare_venues.iterrows():
    print("Fetching data for venue: {}".format(index + 1))
    venue = []
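    url = 'https://developers.zomato.com/api/v2.1/search'
    # Pass the filters via params so that names with spaces or '&' are URL-encoded
    params = {'q': row['name'], 'start': 0, 'count': 1,
              'lat': row['lat'], 'lon': row['lng'], 'sort': 'real_distance'}
    result = requests.get(url, headers = headers, params = params).json()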
    if (len(result['restaurants']) > 0):
        venue.append(result['restaurants'][0]['restaurant']['name'])
        venue.append(result['restaurants'][0]['restaurant']['location']['latitude'])
        venue.append(result['restaurants'][0]['restaurant']['location']['longitude'])
        venue.append(result['restaurants'][0]['restaurant']['average_cost_for_two'])
        venue.append(result['restaurants'][0]['restaurant']['price_range'])
        venue.append(result['restaurants'][0]['restaurant']['user_rating']['aggregate_rating'])
        venue.append(result['restaurants'][0]['restaurant']['location']['address'])
        venues_information.append(venue)
    else:
        venues_information.append(np.zeros(6))
    
zomato_venues = pd.DataFrame(venues_information, 
                                  columns = ['venue', 'latitude', 
                                             'longitude', 'price_for_two', 
                                             'price_range', 'rating', 'address'])

Fetching data for venue: 1
Fetching data for venue: 2
Fetching data for venue: 3
Fetching data for venue: 4
Fetching data for venue: 5
Fetching data for venue: 6
Fetching data for venue: 7
Fetching data for venue: 8
Fetching data for venue: 9
Fetching data for venue: 10
Fetching data for venue: 11
Fetching data for venue: 12
Fetching data for venue: 13
Fetching data for venue: 14
Fetching data for venue: 15
Fetching data for venue: 16
Fetching data for venue: 17
Fetching data for venue: 18
Fetching data for venue: 19
Fetching data for venue: 20
Fetching data for venue: 21
Fetching data for venue: 22
Fetching data for venue: 23
Fetching data for venue: 24


## Data Cleaning <a name="data_cleaning"></a>

The data from multiple sources might not always align. Thus, it is **important to combine the data retrieved from multiple sources properly**.

We'll first plot both sets of data points on the map. We'll then try to combine data points whose latitude and longitude values are very close to one another. Finally, we will inspect the selected venues so that any remaining mismatches are removed from the final dataset before we begin any analysis.

We will first plot the Foursquare data on the map.

In [7]:
aurangabad_map = folium.Map(location = [ABD_LATITUDE, ABD_LONGITUDE], zoom_start = 13)

for name, latitude, longitude in zip(foursquare_venues['name'], foursquare_venues['lat'], foursquare_venues['lng']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 10,
        popup = label,
        color = 'green',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(aurangabad_map)  

aurangabad_map.save("Venues by Foursquare.html")
aurangabad_map

From the map, we can infer that there are clusters of venues around **Cidco** and **Jalna Road to Akashwani**. We can also plot the category count to see which types of venue are most common.
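
A category count can be sketched with the standard library alone. The list below is copied from the categories of the first ten Foursquare venues shown in the combined output later in this notebook, so the tallies cover those ten rows only, not the full dataset.

```python
from collections import Counter

# Categories of the first ten Foursquare venues, as displayed in this notebook.
categories = ['Ice Cream Shop', 'Multiplex', 'Restaurant', 'City', 'Hotel',
              'Restaurant', 'Indian Restaurant', 'Smoke Shop', 'Cave', 'Multiplex']

category_counts = Counter(categories)

# most_common() lists the categories from most to least frequent.
for category, count in category_counts.most_common():
    print('{:<20}{}'.format(category, count))
```

On the full DataFrame, `foursquare_venues['categories'].value_counts()` should give the same kind of tally, and calling `.plot(kind='bar')` on the result would be one way to chart it.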

We will also plot the Zomato data on the map.

In [32]:
aurangabad_map = folium.Map(location = [ABD_LATITUDE, ABD_LONGITUDE], zoom_start = 13)

for venue, address, latitude, longitude in zip(zomato_venues['venue'], zomato_venues['address'], 
                                               zomato_venues['latitude'], zomato_venues['longitude']):
    # Use the Zomato venue name; `name` is a leftover variable from the Foursquare loop
    label = '{}, {}'.format(venue, address)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = '#cc3535',
        fill_opacity = 0.7,
        parse_html = False).add_to(aurangabad_map)  
    
aurangabad_map.save("Venues by Zomato.html")
aurangabad_map

We can see that there are many venues identified by both Foursquare and Zomato. 

To combine the two datasets, I'll have to check that the latitude and longitude values of each corresponding venue match. Thus, I'll round both the latitude and longitude values to 4 decimal places.

In [11]:
foursquare_venues['lat'] = foursquare_venues['lat'].apply(lambda lat: round(float(lat), 4))
foursquare_venues['lng'] = foursquare_venues['lng'].apply(lambda lng: round(float(lng), 4))
zomato_venues['latitude'] = zomato_venues['latitude'].apply(lambda lat: round(float(lat), 4))
zomato_venues['longitude'] = zomato_venues['longitude'].apply(lambda lng: round(float(lng), 4))
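
To get a feel for what 4 decimal places means on the ground, the sketch below converts 0.0001 degrees into metres at Aurangabad's latitude. It assumes a spherical Earth with a mean radius of 6,371 km, so the figures are approximations; `degrees_to_metres` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres; an approximation

def degrees_to_metres(delta_deg, latitude_deg):
    """Approximate ground distance of a small change in latitude and in longitude."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = delta_deg * metres_per_degree
    # Longitude lines converge towards the poles, hence the cosine factor.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

lat_m, lng_m = degrees_to_metres(0.0001, 19.8733)
print('0.0001 degrees is about {:.1f} m north-south and {:.1f} m east-west'.format(lat_m, lng_m))
```

Under these assumptions, one step in the fourth decimal place is roughly 10 to 11 metres, so two sources that place the same venue a few metres apart can still round to slightly different values.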

In [15]:
dataset = pd.concat([foursquare_venues, zomato_venues], axis = 1)
dataset['lat_diff'] = dataset['latitude'] - dataset['lat']
dataset['lng_diff'] = dataset['longitude'] - dataset['lng']
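
The difference columns make it possible to keep only pairs that sit close together. The sketch below applies such a filter to four rows copied from the output of this notebook; the tolerance of 0.0004 degrees is an assumed value chosen for illustration, and `is_close` is a hypothetical helper.

```python
# (name, venue, lat_diff, lng_diff) copied from rows of the notebook output.
rows = [
    ('Naturals', 'Natural Ice Cream', 0.0002, -0.0001),
    ('Satyam Cineplex', 'Hotel Sai Sarang', 0.0091, -0.0088),
    ('Kream & Krunch', 'Kream N Krunch', 0.0, 0.0),
    ('Ellora Caves', 'Sanjay Pav Bhaji', -0.0304, -0.0015),
]

MAX_DIFF = 0.0004  # assumed tolerance in degrees; not a value from this notebook

def is_close(lat_diff, lng_diff, tolerance=MAX_DIFF):
    """True when both coordinate differences are within the tolerance."""
    return abs(lat_diff) <= tolerance and abs(lng_diff) <= tolerance

close_matches = [(name, venue) for name, venue, lat_diff, lng_diff in rows
                 if is_close(lat_diff, lng_diff)]
print(close_matches)
```

On the DataFrame itself, a comparable filter would be `dataset[(abs(dataset['lat_diff']) <= 0.0004) & (abs(dataset['lng_diff']) <= 0.0004)]`, again with the tolerance as an assumption.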

In [16]:
selected_venues = dataset
selected_venues

| | name | categories | lat | lng | venue | latitude | longitude | price_for_two | price_range | rating | address | lat_diff | lng_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Naturals | Ice Cream Shop | 19.8763 | 75.3454 | Natural Ice Cream | 19.8765 | 75.3453 | 150.0 | 1.0 | 4.6 | Ahinsa Bhavan, Near Rajendra Tyre Services, Ja... | 0.0002 | -0.0001 |
| 1 | Satyam Cineplex | Multiplex | 19.8771 | 75.3724 | Hotel Sai Sarang | 19.8862 | 75.3636 | 200.0 | 1.0 | 2.9 | House 35, Satyam Nagar, N5, CIDCO, Aurangabad | 0.0091 | -0.0088 |
| 2 | Kream & Krunch | Restaurant | 19.8766 | 75.3461 | Kream N Krunch | 19.8766 | 75.3461 | 1000.0 | 3.0 | 4.3 | 2, Near Akashwani Circle, Mahesh Nagar, Jalna ... | 0.0 | 0.0 |
| 3 | Aurangabad | City | 19.8721 | 75.3246 | Idli Junction & Tea Corner | 19.872 | 75.3252 | 100.0 | 1.0 | 0.0 | Gopinath Chamber, Beside Jai Dev Travels, Aada... | -0.0001 | 0.0006 |
| 4 | Vivanta By Taj | Hotel | 19.9076 | 75.346 | Vivanta by Taj | 19.9079 | 75.3457 | 2200.0 | 4.0 | 4.3 | Vivanta by Taj, Rauza Baugh, CIDCO, Aurangabad | 0.0003 | -0.0003 |
| 5 | Madhuban Kaleidoscope | Restaurant | 19.8497 | 75.3628 | Madhuban Garden Restaurant | 19.85 | 75.363 | 800.0 | 3.0 | 3.8 | Beed Bypass, Madhuban, Gadia Vihar, Aurangabad | 0.0003 | 0.0002 |
| 6 | Green Leaf | Indian Restaurant | 19.8795 | 75.366 | Green Leaf | 19.8793 | 75.366 | 750.0 | 3.0 | 4.9 | Shop 6-9, Fame Tapadiya Multiplex, Town Centre... | -0.0002 | 0.0 |
| 7 | Tara Pan Center | Smoke Shop | 19.8673 | 75.3258 | Chai Villa | 19.8674 | 75.3259 | 500.0 | 2.0 | 3.1 | Ram Mandir Road, Near Tara Pan Center, New Usm... | 0.0001 | 0.0001 |
| 8 | Ellora Caves | Cave | 19.9103 | 75.3657 | Sanjay Pav Bhaji | 19.8799 | 75.3642 | 150.0 | 1.0 | 3.1 | Shop 1, Near SBI Bank, Opposite Ellora Complex... | -0.0304 | -0.0015 |
| 9 | Pvr Multiplex | Multiplex | 19.8728 | 75.3771 | Green Leaf | 19.8793 | 75.366 | 750.0 | 3.0 | 4.9 | Shop 6-9, Fame Tapadiya Multiplex, Town Centre... | 0.0065 | -0.0111 |


I'll now select the venue name from the **Zomato API**. I'll also compute the average price per person by dividing the `price_for_two` column by 2, then remove that column from the dataset along with the other unnecessary columns.

In [17]:
selected_venues['average_price'] = selected_venues['price_for_two']/2
selected_venues = selected_venues.drop(columns = ['name', 'lat', 'lng', 'lat_diff', 'lng_diff', 'price_for_two'])

Let's take a look at the resulting dataset.

In [18]:
selected_venues

| | categories | venue | latitude | longitude | price_range | rating | address | average_price |
|---|---|---|---|---|---|---|---|---|
| 0 | Ice Cream Shop | Natural Ice Cream | 19.8765 | 75.3453 | 1.0 | 4.6 | Ahinsa Bhavan, Near Rajendra Tyre Services, Ja... | 75.0 |
| 1 | Multiplex | Hotel Sai Sarang | 19.8862 | 75.3636 | 1.0 | 2.9 | House 35, Satyam Nagar, N5, CIDCO, Aurangabad | 100.0 |
| 2 | Restaurant | Kream N Krunch | 19.8766 | 75.3461 | 3.0 | 4.3 | 2, Near Akashwani Circle, Mahesh Nagar, Jalna ... | 500.0 |
| 3 | City | Idli Junction & Tea Corner | 19.872 | 75.3252 | 1.0 | 0.0 | Gopinath Chamber, Beside Jai Dev Travels, Aada... | 50.0 |
| 4 | Hotel | Vivanta by Taj | 19.9079 | 75.3457 | 4.0 | 4.3 | Vivanta by Taj, Rauza Baugh, CIDCO, Aurangabad | 1100.0 |
| 5 | Restaurant | Madhuban Garden Restaurant | 19.85 | 75.363 | 3.0 | 3.8 | Beed Bypass, Madhuban, Gadia Vihar, Aurangabad | 400.0 |
| 6 | Indian Restaurant | Green Leaf | 19.8793 | 75.366 | 3.0 | 4.9 | Shop 6-9, Fame Tapadiya Multiplex, Town Centre... | 375.0 |
| 7 | Smoke Shop | Chai Villa | 19.8674 | 75.3259 | 2.0 | 3.1 | Ram Mandir Road, Near Tara Pan Center, New Usm... | 250.0 |
| 8 | Cave | Sanjay Pav Bhaji | 19.8799 | 75.3642 | 1.0 | 3.1 | Shop 1, Near SBI Bank, Opposite Ellora Complex... | 75.0 |
| 9 | Multiplex | Green Leaf | 19.8793 | 75.366 | 3.0 | 4.9 | Shop 6-9, Fame Tapadiya Multiplex, Town Centre... | 375.0 |


I'll drop the venues with a rating of `0.0`, as this means they have not been rated yet.

In [19]:
selected_venues = selected_venues[selected_venues['rating'] != 0.0].reset_index(drop = True)
print("Total venues available: {}".format(selected_venues.shape[0]))
selected_venues

Total venues available: 20


| | categories | venue | latitude | longitude | price_range | rating | address | average_price |
|---|---|---|---|---|---|---|---|---|
| 0 | Ice Cream Shop | Natural Ice Cream | 19.8765 | 75.3453 | 1.0 | 4.6 | Ahinsa Bhavan, Near Rajendra Tyre Services, Ja... | 75.0 |
| 1 | Multiplex | Hotel Sai Sarang | 19.8862 | 75.3636 | 1.0 | 2.9 | House 35, Satyam Nagar, N5, CIDCO, Aurangabad | 100.0 |
| 2 | Restaurant | Kream N Krunch | 19.8766 | 75.3461 | 3.0 | 4.3 | 2, Near Akashwani Circle, Mahesh Nagar, Jalna ... | 500.0 |
| 3 | Hotel | Vivanta by Taj | 19.9079 | 75.3457 | 4.0 | 4.3 | Vivanta by Taj, Rauza Baugh, CIDCO, Aurangabad | 1100.0 |
| 4 | Restaurant | Madhuban Garden Restaurant | 19.85 | 75.363 | 3.0 | 3.8 | Beed Bypass, Madhuban, Gadia Vihar, Aurangabad | 400.0 |
| 5 | Indian Restaurant | Green Leaf | 19.8793 | 75.366 | 3.0 | 4.9 | Shop 6-9, Fame Tapadiya Multiplex, Town Centre... | 375.0 |
| 6 | Smoke Shop | Chai Villa | 19.8674 | 75.3259 | 2.0 | 3.1 | Ram Mandir Road, Near Tara Pan Center, New Usm... | 250.0 |
| 7 | Cave | Sanjay Pav Bhaji | 19.8799 | 75.3642 | 1.0 | 3.1 | Shop 1, Near SBI Bank, Opposite Ellora Complex... | 75.0 |
| 8 | Multiplex | Green Leaf | 19.8793 | 75.366 | 3.0 | 4.9 | Shop 6-9, Fame Tapadiya Multiplex, Town Centre... | 375.0 |
| 9 | Pizza Place | Delite Oven Fresh | 19.8797 | 75.3232 | 2.0 | 3.4 | Shop 4, Samarth Nagar Road, Nirala Bazar, Aura... | 175.0 |
