<a href="https://colab.research.google.com/github/sohaaan/restaurant_scrapping/blob/main/restaurant_scrapping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

First, import the necessary libraries.

In [None]:
import pandas as pd  # to manipulate dataframes
import requests  # to get the data from the URL
import json  # to handle the JSON data from the URL
import time  # to add a delay so that the Google API has time to validate the page token

To find restaurants around a place, we need the coordinates of that place; we then search for results within a certain radius of it.

For example, I used a radius of 1000 m when searching for restaurants around Mirpur-1. We need to be careful with the radius because the Google API returns at most **60** results per search (up to three pages of 20 results each). So in a densely populated place like Dhaka City, the radius should be small so that we don't miss any restaurants. In my case a 1 km radius returned 58 results, which is under the limit, so I assume I did not miss any.

We can space the search coordinates so that their circles overlap generously and together cover all restaurants in the area; later, in the cleaning phase, we will remove the duplicates.

I focus only on the Mirpur area. I picked 4 spots and will retrieve the data around each of them.

I will search for restaurants by keyword. This returns places whose type is restaurant, places that have "restaurant" in their name, and places that are recognized as places to eat.
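
To check how the chosen spots relate to each other, here is a minimal sketch that measures the distance between every pair of search centres and lists the pairs whose circles overlap. The helper names `haversine_m` and `overlapping_pairs` are my own, not part of any library, and the Earth radius of 6,371 km is the usual spherical approximation.

```python
import math
from itertools import combinations

# Search centres and radius, same values as in the Parameters cell.
coordinates = ['23.799378,90.352992', '23.817345,90.372628',
               '23.784263,90.361646', '23.801518,90.378728']
radius_m = 1000

EARTH_RADIUS_M = 6371000  # spherical approximation


def haversine_m(a, b):
    """Great-circle distance in metres between two 'lat,lng' strings."""
    lat1, lng1 = map(float, a.split(','))
    lat2, lng2 = map(float, b.split(','))
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def overlapping_pairs(points, radius):
    """Return (i, j, distance) for each pair of circles that overlap."""
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        d = haversine_m(points[i], points[j])
        if d < 2 * radius:  # two circles overlap when centres are closer than 2r
            pairs.append((i, j, d))
    return pairs


for i, j, d in overlapping_pairs(coordinates, radius_m):
    print(f'spots {i} and {j} overlap, centres {d:.0f} m apart')
```

Pairs that are not printed are more than 2 km apart, so some area between them is not covered by either circle at this radius.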

In [1]:
# Parameters
coordinates = ['23.799378,90.352992','23.817345,90.372628','23.784263,90.361646','23.801518,90.378728']
keyword = 'restaurant'
radius = '1000'
api_key = 'API KEY' # API key removed here as requested; the results were already generated with the real key
final_data = []
price_stars = ['*','**','***','****']

Then we will build the URL with all the necessary parameters and send a GET request for the data.

The data is received as JSON and then stored in the `final_data` list.
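
As an alternative to joining the URL by hand, here is a small sketch that builds the same query string with the standard library's `urlencode`, which also escapes special characters. The function name `build_url` and the constant `BASE_URL` are hypothetical names I chose for illustration; the endpoint and parameter names are the ones used in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json'


def build_url(coordinate, radius, keyword, api_key, page_token=None):
    """Build a Nearby Search URL.

    For the first page we send location, radius and keyword;
    for follow-up pages only the key and the page token are sent,
    mirroring what the loop in this notebook does.
    """
    if page_token:
        params = {'key': api_key, 'pagetoken': page_token}
    else:
        params = {'location': coordinate, 'radius': radius,
                  'keyword': keyword, 'key': api_key}
    return BASE_URL + '?' + urlencode(params)


# 'API KEY' is a placeholder, exactly as in the Parameters cell.
first_page = build_url('23.799378,90.352992', '1000', 'restaurant', 'API KEY')
```

Note that `urlencode` percent-encodes the comma in the coordinate and turns the space in the placeholder key into `+`, so the result looks slightly different from the hand-built string but carries the same parameters.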

In [None]:
for coordinate in coordinates:
  url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location='+coordinate+'&radius='+str(radius)+'&keyword='+str(keyword)+'&key='+str(api_key)
  while True:
    print(url)
    response = requests.get(url)
    js = json.loads(response.text)
    results = js['results']
    for result in results:
      name = result['name']
      lat = result['geometry']['location']['lat']
      lng = result['geometry']['location']['lng']

      if 'rating' in result: # some restaurants aren't rated yet, so we need a condition for them
        rating = result['rating']
      else:
        rating = 'NA'

      if 'user_ratings_total' in result: # for unrated restaurants the number of ratings will be 0
        total_rating = result['user_ratings_total']
      else:
        total_rating = 0

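      # Similarly, some restaurants have no price level; we put NA there.
      # Level 0 ("free") is also mapped to NA, because price_stars[0-1] would wrap around to '****'.
      price = result.get('price_level')
      if price is not None and int(price) >= 1:
        prices = price_stars[int(price)-1]
      else:
        prices = 'NA'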
        
      data = [name, lat, lng, rating, total_rating, prices]
      final_data.append(data)
    time.sleep(2)
    if 'next_page_token' not in js:
      break
    else:
      next_page_token = js['next_page_token']
      url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?key='+str(api_key)+'&pagetoken='+str(next_page_token)
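
The three `if`/`else` blocks in the loop can also be written more compactly with `dict.get`, which returns a default when a field is missing. This is only a sketch of that idea: `parse_place` and `PRICE_STARS` are names I made up, and treating a price level of 0 as 'NA' is my assumption.

```python
# Price levels 1-4 mapped to stars; anything else (missing or 0) becomes 'NA'.
PRICE_STARS = {1: '*', 2: '**', 3: '***', 4: '****'}


def parse_place(result):
    """Turn one entry of js['results'] into a row for final_data."""
    location = result['geometry']['location']
    return [
        result['name'],
        location['lat'],
        location['lng'],
        result.get('rating', 'NA'),           # not rated yet -> 'NA'
        result.get('user_ratings_total', 0),  # no ratings -> 0
        PRICE_STARS.get(result.get('price_level'), 'NA'),
    ]


# A made-up result with only the mandatory fields, for illustration.
sample = {'name': 'Example Restaurant',
          'geometry': {'location': {'lat': 23.8, 'lng': 90.36}}}
row = parse_place(sample)
```

Inside the loop this would replace the body of `for result in results:` with `final_data.append(parse_place(result))`.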


## I removed the output because it contained API information, but I kept it in the **file sent by email**

Now we will move all the data from `final_data` into the `df` dataframe.


In [None]:
labels = ['Place Name', 'Latitude', 'Longitude', 'Rating', 'Number of People Rated', 'Price Level']
df = pd.DataFrame.from_records(final_data, columns=labels)

In [None]:
df

Unnamed: 0,Place Name,Latitude,Longitude,Rating,Number of People Rated,Price Level
0,GOVINDA'S SWEETS & RESTAURANT,23.805616,90.361518,4.6,146,
1,Melting Pot (Mirpur),23.800431,90.355435,5.0,4,
2,Royal Bengal Restaurant,23.802989,90.352788,4.3,44,
3,Sub Station Restaurant,23.806030,90.352018,4.0,108,
4,PERI PASTA,23.803441,90.354751,4.0,1850,**
...,...,...,...,...,...,...
226,Bay Leaf Restaurant,23.809889,90.367557,3.7,949,**
227,Mayer Doa Restaurant,23.797376,90.369816,3.7,28,
228,Xinxian Restaurant- Mirpur 10,23.812559,90.366854,4.0,2477,***
229,Juice World and Chinese Restaurant,23.790726,90.387760,4.7,9,


Now, because the search areas overlap, some restaurants will appear twice. We will remove those using the pandas `drop_duplicates` function.

In [None]:
df.drop_duplicates(keep='first', inplace=True)
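
Called without arguments, `drop_duplicates` only removes rows that match in every column. If the same place came back from two searches with, say, a slightly different rating count, both rows would stay. Here is a hedged sketch of a stricter variant that identifies a place by name and position only; the rows are made up for illustration and are not real API output.

```python
import pandas as pd

labels = ['Place Name', 'Latitude', 'Longitude', 'Rating',
          'Number of People Rated', 'Price Level']

# Made-up rows: the first two describe the same place, but the
# rating count differs between the two (hypothetical) requests.
rows = [
    ['Cafe A', 23.80, 90.35, 4.0, 10, 'NA'],
    ['Cafe A', 23.80, 90.35, 4.0, 11, 'NA'],
    ['Cafe B', 23.81, 90.36, 3.5, 5, '*'],
]
demo = pd.DataFrame.from_records(rows, columns=labels)

# Compare only the identifying columns and keep the first occurrence.
deduped = demo.drop_duplicates(
    subset=['Place Name', 'Latitude', 'Longitude'], keep='first'
).reset_index(drop=True)
```

Whether this stricter rule is needed depends on the data; for the results in this notebook, the full-row comparison may already be enough.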

In [None]:
df

Unnamed: 0,Place Name,Latitude,Longitude,Rating,Number of People Rated,Price Level
0,GOVINDA'S SWEETS & RESTAURANT,23.805616,90.361518,4.6,146,
1,Melting Pot (Mirpur),23.800431,90.355435,5.0,4,
2,Royal Bengal Restaurant,23.802989,90.352788,4.3,44,
3,Sub Station Restaurant,23.806030,90.352018,4.0,108,
4,PERI PASTA,23.803441,90.354751,4.0,1850,**
...,...,...,...,...,...,...
223,Vai Vai Hotel and Restaurant,23.791080,90.386762,3.9,32,
225,Kabab Museum & Restaurant,23.794015,90.386891,3.7,203,
227,Mayer Doa Restaurant,23.797376,90.369816,3.7,28,
229,Juice World and Chinese Restaurant,23.790726,90.387760,4.7,9,


Finally, we will convert the dataframe `df` to a CSV file and save it.

In [None]:
df.to_csv('restarants_mipur.csv')

# Notes:


1. The radius is very important. If we want to retrieve data from a densely populated place, we should keep the radius very small so that we don't lose any restaurants. For a sparsely populated place, we can use a much larger radius; this way we get the data with fewer GET requests.
2. In our country, restaurants are sometimes named "hotel". We could include those by also searching with the keyword 'hotel', but that may pick up residential hotels as well, so I excluded it.
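
For the first note, one simple way to tell whether a radius was too large is to count how many results each search centre returned and flag any that reached the 60-result limit. This is a minimal sketch; `saturated_searches` is a hypothetical helper, and it assumes the rows were collected per coordinate in a dictionary rather than in the single `final_data` list used above.

```python
MAX_RESULTS_PER_SEARCH = 60  # limit mentioned earlier in this notebook


def saturated_searches(rows_by_coordinate, cap=MAX_RESULTS_PER_SEARCH):
    """Return the coordinates whose search hit the result cap.

    rows_by_coordinate maps a 'lat,lng' string to the list of rows
    retrieved for that centre. A search that returns `cap` rows may
    have been cut off, so its radius should probably be reduced.
    """
    return [coordinate for coordinate, rows in rows_by_coordinate.items()
            if len(rows) >= cap]


# Made-up counts for illustration: one search is full, one is not.
example = {
    '23.799378,90.352992': [None] * 58,
    '23.817345,90.372628': [None] * 60,
}
too_wide = saturated_searches(example)
```

Any coordinate in `too_wide` is a candidate for a smaller radius or for being split into several nearby centres.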

