# Data Extraction

### This notebook extracts information about restaurants in Jaipur. It is divided into three phases:

#### [Phase - 1](#phase1): 
Extract the link for each locality from Zomato.
#### [Phase - 2](#phase2): 
Extract information about each restaurant in a locality.
#### [Phase - 3](#phase3): 
Geocode the restaurant addresses into latitude and longitude using the Google Geocoding API.

In [86]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import googlemaps
import json

# Phase - 1 <a name='phase1'></a>
**Get the links to all the localities from Zomato**

In [87]:
url = 'https://www.zomato.com/jaipur/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

**We will fetch the names of the localities along with their respective URLs and store them in a dictionary.**

In [88]:
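# The locality links are read from the third <section> of the city page.
list_ = soup.find_all('section')[2]
data = dict()
for i in list_.find_all('a'):
    link = i.attrs['href']
    # The first child of each anchor holds the locality name.
    locality = i.contents[0].string.strip("\n ")
    data[locality] = link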

In [89]:
for key in data:
    print(data[key])

https://www.zomato.com/jaipur/c-scheme-restaurants
https://www.zomato.com/jaipur/malviya-nagar-restaurants
https://www.zomato.com/jaipur/tonk-road-restaurants
https://www.zomato.com/jaipur/vaishali-nagar-restaurants
https://www.zomato.com/jaipur/mi-road-restaurants
https://www.zomato.com/jaipur/mansarovar-restaurants
https://www.zomato.com/jaipur/raja-park-restaurants
https://www.zomato.com/jaipur/bani-park-restaurants
https://www.zomato.com/jaipur/bais-godam-restaurants
https://www.zomato.com/jaipur/adarsh-nagar-restaurants
https://www.zomato.com/jaipur/ajmer-highway-restaurants
https://www.zomato.com/jaipur/lal-kothi-restaurants
https://www.zomato.com/jaipur/gopalbari-restaurants
https://www.zomato.com/jaipur/sodala-restaurants
https://www.zomato.com/jaipur/shyam-nagar-restaurants
https://www.zomato.com/jaipur/sindhi-camp-restaurants
https://www.zomato.com/jaipur/pink-city-restaurants
https://www.zomato.com/jaipur/amer-restaurants
https://www.zomato.com/jaipur/civil-lines-restaurants

# Phase - 2 <a name='phase2'></a>
**Get the details about all the restaurants in a locality.**

Restaurant Names and their Category

In [90]:
def getName(soup):
    resInfo = soup.find_all('article', class_='search-result')
    name = [item.find(class_='result-title') for item in resInfo]
    name = [item.string.strip("\n ") for item in name]
    category = [item.find_all('a',class_='zdark') for item in resInfo]
    for index, item in enumerate(category):
        for innerIndex, val in enumerate(item):
            category[index][innerIndex] = val.string
    return name, category

Restaurant Address

In [91]:
def getAddress(soup):
    resAdd = soup.find_all('div', class_='search-result-address')
    address = []
    for i in resAdd:
        address.append(i.string.strip(" "))
    return address

Restaurant Ratings and Votes

In [92]:
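def getRateAndVotes(soup):
    rate = soup.find_all('div', class_='search_result_rating')
    ratings = []
    votes = []
    for r in rate:
        # Append exactly one rating and one vote count per restaurant so that
        # both lists stay aligned with the other columns.
        try:
            val = r.find_all('div', class_='rating-popup')[0].string.strip("\n ")
        except (IndexError, AttributeError):
            val = 'No-Rating'  # placeholder introduced here for a missing rating
        try:
            vote = r.find_all('span')[0].string.replace('votes', '').strip()
        except (IndexError, AttributeError):
            vote = 'No-Votes'
        ratings.append(val)
        votes.append(vote)
    return ratings, votes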
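
One detail worth noting when cleaning the vote text: `str.strip(" votes")` removes any leading or trailing characters drawn from the set of space, `v`, `o`, `t`, `e`, `s`, rather than the literal suffix. For purely numeric counts the result is the same, but a suffix-aware helper is more predictable. The sketch below is illustrative only; `parseVotes` is a hypothetical name that does not appear in the notebook.

```python
def parseVotes(text):
    """Return the vote count from text such as '123 votes' (hypothetical helper)."""
    text = text.strip()
    suffix = 'votes'
    # Remove the literal suffix only when it is actually present.
    if text.endswith(suffix):
        text = text[:-len(suffix)]
    return text.strip()


print(parseVotes(' 123 votes'))      # suffix removed, digits kept
print('test votes'.strip(' votes'))  # strip() treats its argument as a character set
```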

Cuisines for each restaurant and cost for two.

In [93]:
def getCostAndCuisine(soup):
    cuiCost = soup.find_all('div', class_='search-page-text')
    cost = []
    cuisines = []
    for c in cuiCost:
        # Cuisines and cost are parsed separately so that a missing value in
        # one does not drop the entry in the other; both lists stay aligned.
        try:
            temp = [item.string.strip("\n ")
                    for item in c.find(class_='clearfix').find_all('a')]
        except AttributeError:
            temp = []
        try:
            price = c.find(class_='res-cost').find(class_='pl0').string
        except AttributeError:
            price = 'No Cost given.'
        cuisines.append(temp)
        cost.append(price)
    return cost, cuisines

Driver function that extracts all the data using the functions above.

In [108]:
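def extractData(headers,loc,key):
    # Appends one list per page to the module-level result lists
    # (names, category, address, ...), which must exist before the call.
    urlPhase2 = loc
    key = [key]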
    while True:
        responsePhase2 = requests.get(urlPhase2, headers=headers)
        soup = BeautifulSoup(responsePhase2.text, 'html.parser')
        
        tempNames,tempCat = getName(soup)
        names.append(tempNames)
        category.append(tempCat)
        
        address.append(getAddress(soup))
        
        tempRatings, tempVotes = getRateAndVotes(soup)
        ratings.append(tempRatings)
        votes.append(tempVotes)
        
        tempCost, tempCuisines = getCostAndCuisine(soup)
        costForTwo.append(tempCost)
        cuisines.append(tempCuisines)
        
        locality.append(key*len(tempNames))
        
        # Look for the link to the next page; print the page just scraped.
        links = soup.find('div', class_='search-pagination-top')
        print(urlPhase2)
        try:
            nextPage = links.find('a', class_='next').attrs['href']
            urlPhase2 = 'https://www.zomato.com'+nextPage
        except (AttributeError, KeyError):  # no next page: stop paginating
            break
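
The pagination step above prepends the site root to the `href` of the next-page link. As a hedged alternative, `urllib.parse.urljoin` from the standard library resolves the link against the current page URL and works whether the `href` is relative or already absolute. `nextPageUrl` is a hypothetical helper name, and the sample `href` is modelled on the URLs printed in the output below.

```python
from urllib.parse import urljoin


def nextPageUrl(currentUrl, href):
    """Resolve the next-page href against the page it was found on."""
    return urljoin(currentUrl, href)


current = 'https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0'
print(nextPageUrl(current, '/jaipur/c-scheme-restaurants?nearby=0&page=2'))
```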

In [109]:
names=[]
address=[]
category=[]
votes=[]
ratings=[]
costForTwo=[]
cuisines=[]
locality=[]
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
for key in data:
    url = data[key]+'?nearby=0'
    extractData(headers,url,key)
    print(key)

https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=2
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=3
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=4
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=5
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=6
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=7
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=8
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=9
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=10
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=11
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=12
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=13
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page=14
https://www.zomato.com/jaipur/c-scheme-restaurants?nearby=0&page

https://www.zomato.com/jaipur/mi-road-restaurants?nearby=0&page=7
https://www.zomato.com/jaipur/mi-road-restaurants?nearby=0&page=8
MI Road
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=2
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=3
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=4
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=5
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=6
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=7
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=8
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=9
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=10
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=11
https://www.zomato.com/jaipur/mansarovar-restaurants?nearby=0&page=12
https://www.zomato.com/jaipur/mansar

https://www.zomato.com/jaipur/sindhi-camp-restaurants?nearby=0&page=5
Sindhi Camp
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=2
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=3
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=4
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=5
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=6
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=7
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=8
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=9
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=10
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=11
https://www.zomato.com/jaipur/pink-city-restaurants?nearby=0&page=12
Pink City
https://www.zomato.com/jaipur/amer-restaurants?nearby=0
https://www.zomato.com/jaipur/amer-resta

In [111]:
len(locality)

312

The results are lists of lists (one inner list per page); to proceed, we must flatten them.

In [112]:
import numpy as np
import itertools

locality_ = list(itertools.chain(*locality))
names_ = list(itertools.chain(*names))
address_ = list(itertools.chain(*address))
category_ = list(itertools.chain(*category))
costForTwo_ = list(itertools.chain(*costForTwo))
cuisines_ = list(itertools.chain(*cuisines))
votes_ = list(itertools.chain(*votes))
ratings_ = list(itertools.chain(*ratings))
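
Before combining the flattened lists into a table, it can help to confirm that every column has the same number of entries, since a mismatch would shift values between restaurants. Below is a minimal sketch with made-up sample data; `flatten` and `checkAligned` are hypothetical helper names.

```python
import itertools


def flatten(listOfLists):
    """Flatten one level of nesting: [[a, b], [c]] -> [a, b, c]."""
    return list(itertools.chain.from_iterable(listOfLists))


def checkAligned(columns):
    """Raise ValueError if the named columns differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError('Column lengths differ: %s' % lengths)
    return lengths


# Made-up sample: two pages of results for two columns.
samplePagesNames = [['Cafe A', 'Cafe B'], ['Cafe C']]
samplePagesVotes = [['10', '25'], ['No-Votes']]

flatNames = flatten(samplePagesNames)
flatVotes = flatten(samplePagesVotes)
print(checkAligned({'names': flatNames, 'votes': flatVotes}))
```

The same check could be run on `locality_`, `names_`, `address_` and the other flattened lists before they are passed to `pd.DataFrame`.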

Store the data in a pandas DataFrame and export it to a CSV file.

In [125]:
df = pd.DataFrame(data=[locality_,names_,address_,category_,
                        costForTwo_,cuisines_,ratings_,votes_])
df = df.T
columns = ['Locality','RestaurantName','Address',
           'Category','CostForTwo','Cuisines','Ratings','votes']
df.columns = columns

In [127]:
df.to_csv('Restaurants.csv', index=False)
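
The `Category` and `Cuisines` columns hold Python lists, which `to_csv` writes as their string representation (for example `['North Indian', 'Chinese']`). If a plain delimited string is preferred in the file, the lists could be joined first. This is a sketch under that assumption; `joinList` is a hypothetical helper name.

```python
def joinList(values, sep=', '):
    """Join a list into one delimited string; pass other values through."""
    if isinstance(values, list):
        # Skip None entries, which can occur when a tag has no single string.
        return sep.join(str(v) for v in values if v is not None)
    return values


print(joinList(['North Indian', 'Chinese']))
```

With the DataFrame above, this could be applied as `df['Cuisines'].apply(joinList)` before calling `to_csv`.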

# Phase - 3 <a name='phase3'></a>

In [None]:
# API_KEY = None
# with open('/home/hotpie/Projects/keys/API_KEYS.json','r') as keyFile:
#     f = json.load(keyFile)
    
# API_KEY = f['google-api-key']
# gmaps = googlemaps.Client(key=API_KEY)
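
As a sketch of how Phase 3 might proceed once the client is created: `gmaps.geocode(address)` in the `googlemaps` package returns a list of result dictionaries in the Geocoding API JSON format, with the coordinates under `geometry` and then `location`. The helper below only parses such a result, so it runs without an API key. `firstLatLng` is a hypothetical name and the sample response is made up for illustration.

```python
def firstLatLng(results):
    """Return (lat, lng) from the first geocoding result, or (None, None)."""
    if not results:
        return None, None
    location = results[0].get('geometry', {}).get('location', {})
    return location.get('lat'), location.get('lng')


# Made-up response in the shape of a Geocoding API result list.
sampleResults = [{'geometry': {'location': {'lat': 26.9, 'lng': 75.8}}}]
print(firstLatLng(sampleResults))
print(firstLatLng([]))  # no match for the address
```

With a real client this would be called as `firstLatLng(gmaps.geocode(address))` for each entry in the `Address` column.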