# Data Extraction

### This notebook extracts information about restaurants in Jaipur. It is divided into three phases:

#### [Phase - 1](#phase1): 
Extract the link for each locality from Zomato.
#### [Phase - 2](#phase2): 
Extract the information about each restaurant in a locality.
#### [Phase - 3](#phase3): 
Geocode each restaurant's address into latitude and longitude using the Google Geocoding API.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Phase - 1 <a name='phase1'></a>
**Get the links to all the localities from Zomato**

In [2]:
url = 'https://www.zomato.com/jaipur/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

**We will fetch the names of the localities along with their respective URLs and store them in a dictionary.**

In [7]:
list_ = soup.find_all('section')[2]
data = dict()
for i in list_.find_all('a'):
    link = i.attrs['href']
    locality = i.contents[0].string.strip("\n ")
    data.update({locality: link})

In [8]:
data

{'C Scheme': 'https://www.zomato.com/jaipur/c-scheme-restaurants',
 'Malviya Nagar': 'https://www.zomato.com/jaipur/malviya-nagar-restaurants',
 'Tonk Road': 'https://www.zomato.com/jaipur/tonk-road-restaurants',
 'Vaishali Nagar': 'https://www.zomato.com/jaipur/vaishali-nagar-restaurants',
 'MI Road': 'https://www.zomato.com/jaipur/mi-road-restaurants',
 'Mansarovar': 'https://www.zomato.com/jaipur/mansarovar-restaurants',
 'Raja Park': 'https://www.zomato.com/jaipur/raja-park-restaurants',
 'Bani Park': 'https://www.zomato.com/jaipur/bani-park-restaurants',
 'Bais Godam': 'https://www.zomato.com/jaipur/bais-godam-restaurants',
 'Adarsh Nagar': 'https://www.zomato.com/jaipur/adarsh-nagar-restaurants',
 'Ajmer Highway': 'https://www.zomato.com/jaipur/ajmer-highway-restaurants',
 'Lal Kothi': 'https://www.zomato.com/jaipur/lal-kothi-restaurants',
 'Gopalbari': 'https://www.zomato.com/jaipur/gopalbari-restaurants',
 'Sodala': 'https://www.zomato.com/jaipur/sodala-restaurants',
 'Shyam Na

# Phase - 2 <a name='phase2'></a>
**Get the details about all the restaurants in a locality.**

In [9]:
urlPhase2 = 'https://www.zomato.com/jaipur/c-scheme-restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
responsePhase2 = requests.get(urlPhase2, headers=headers)
soupPhase2 = BeautifulSoup(responsePhase2.text, 'html.parser')

Restaurant Names

In [144]:
def getName():
    resName = soupPhase2.find_all('a', class_='result-title')
    names = []
    for i in resName:
        names.append(i.string.strip("\n "))
    return names

Restaurant Address

In [143]:
def getAddress():
    resAdd = soupPhase2.find_all('div', class_='search-result-address')
    address = []
    for i in resAdd:
        address.append(i.string.strip(" "))
    return address

Restaurant Ratings and Votes

In [142]:
def getRateAndVotes():
    rate = soupPhase2.find_all('div', class_='search_result_rating')
    ratings = []
    votes = []
    for r in rate:
        ratings.append(r.find_all('div', class_='rating-popup')[0].string.strip("\n "))
        votes.append(r.find_all('span')[0].string.strip(" votes"))
    return ratings, votes
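
The strings returned by `getRateAndVotes` are still text. Note that `str.strip(" votes")` removes any of the characters in `" votes"` from both ends rather than the literal suffix, which happens to work for purely numeric counts. A minimal sketch of converting the values to numbers follows; `parse_votes` and `parse_rating` are hypothetical helpers, and the non-numeric placeholder `'NEW'` is an assumption about what an unrated restaurant might show.

```python
def parse_votes(text):
    """Return the vote count in `text` as an int, or None if it has no digits."""
    # Keep only the digits, so '1,234 votes' and '1234 votes' both parse.
    digits = ''.join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def parse_rating(text):
    """Return the rating as a float, or None for non-numeric placeholders."""
    try:
        return float(text.strip())
    except ValueError:
        # Assumption: unrated restaurants show a label instead of a number.
        return None


print(parse_votes('1234 votes'))
print(parse_rating(' 4.1\n'))
print(parse_rating('NEW'))
```

Returning `None` instead of raising keeps one unusual entry from aborting the whole page.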

Cuisines and cost for two for each restaurant

In [137]:
def getCostAndCuisine():
    cuiCost = soupPhase2.find_all('div', class_='search-page-text')
    cost = []
    cuisines = []
    for c in cuiCost:
        temp = []
        cost.append(c.find(class_='res-cost').find(class_='pl0').string)
        for item in c.find(class_='clearfix').find_all('a'):
            temp.append(item.string.strip("\n "))
        cuisines.append(temp)
    return cost, cuisines
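
Each helper above returns a separate list, with one entry per restaurant. One way to combine them into one record per restaurant is sketched below. `build_records` and the column names are hypothetical, and the sample values are made up for illustration, not scraped. The length check guards against a page where one field is missing for some restaurant, which would otherwise misalign the lists silently.

```python
def build_records(names, addresses, ratings, votes, costs, cuisines):
    """Zip the parallel per-field lists into one dict per restaurant."""
    columns = {
        'name': names,
        'address': addresses,
        'rating': ratings,
        'votes': votes,
        'cost_for_two': costs,
        'cuisines': cuisines,
    }
    # All lists must have the same length, or rows would be misaligned.
    lengths = {key: len(values) for key, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'field lists differ in length: {lengths}')
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


# Made-up sample values, standing in for the output of the helpers above.
records = build_records(
    names=['Cafe A', 'Diner B'],
    addresses=['1 Example Road, C Scheme', '2 Example Lane, C Scheme'],
    ratings=['4.1', '3.8'],
    votes=['120', '45'],
    costs=['500', '800'],
    cuisines=[['Cafe', 'Italian'], ['North Indian']],
)
print(records[0])
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(records)` to get a table with one row per restaurant.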

Get the link of next page for the locality

In [48]:
links = soupPhase2.find('div', class_='search-pagination-top')
nextPage = links.find('a', class_='next')
nextPage.attrs['href']

'/jaipur/c-scheme-restaurants?page=2'
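
The `href` above is relative, so it has to be resolved against the site's address before it can be requested. A minimal sketch using the standard library's `urljoin` follows. `absolute_next_url` and `iter_page_urls` are hypothetical helpers, and `get_next_href` is an assumed callable that returns the `href` of the "next" link for a page URL, or `None` on the last page. In this notebook it would wrap the `requests.get` and `find` calls above.

```python
from urllib.parse import urljoin


def absolute_next_url(current_url, next_href):
    """Resolve a relative 'next' href against the URL of the current page."""
    if not next_href:
        return None
    return urljoin(current_url, next_href)


def iter_page_urls(start_url, get_next_href, max_pages=100):
    """Yield the URL of each result page, following 'next' links."""
    url = start_url
    seen = set()
    # Stop on the last page, on a repeated URL, or at the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        yield url
        url = absolute_next_url(url, get_next_href(url))


print(absolute_next_url(
    'https://www.zomato.com/jaipur/c-scheme-restaurants',
    '/jaipur/c-scheme-restaurants?page=2',
))
# → https://www.zomato.com/jaipur/c-scheme-restaurants?page=2
```

The `seen` set and `max_pages` limit are defensive choices, so that a pagination link pointing back to an earlier page cannot cause an endless loop.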