### Web Scraping Lab

Welcome! In today's lab we're going to build a web scraper that creates a dataset from the restaurant listings on a Yelp page.

You can find the web page here: https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1

The lab questions build on the material discussed in class.

#### Step 1:  Scrape the number of reviews for each restaurant

Using a method similar to the one we used to find the title of each restaurant, find the number of reviews for the 30 restaurants listed on this web page.

**Hint:** the `isdigit()` string method will be helpful for detecting whether a string consists only of digits.
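
As a rough illustration of the hint, the sketch below filters a list of strings down to those made up entirely of digits. The sample strings and the name `candidates` are made up for illustration and are not taken from the Yelp page.

```python
# Hypothetical text values, similar to what a scraper might pull from several tags
candidates = ['283', 'Dishoom', '1841', '££', '', '4.5']

# str.isdigit() is True only for a non-empty string whose characters are all digits,
# so 'Dishoom', '££', '' and '4.5' (because of the '.') are all rejected
review_counts = [int(text) for text in candidates if text.isdigit()]
print(review_counts)
```

Filtering before calling `int()` avoids a `ValueError` on strings that are not whole numbers.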

In [30]:
# Step 1 - importing libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup


In [3]:
# Step 2 - connecting to a url
url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1'
yelp_req = requests.get(url)


In [4]:
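# Step 3 - feeding text into a scraper
# naming the parser explicitly avoids a warning and keeps results consistent across machines
scraper = BeautifulSoup(yelp_req.text, 'html.parser')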


In [20]:
# scraping individual titles
titles = scraper.find_all('a', {'class': 'css-166la90'})
titles = [title.text for title in titles if len(title.text) > 1]
titles

['The Mayfair Chippy',
 'Dishoom',
 'Flat Iron',
 'Ffiona’s Restaurant',
 'Restaurant Gordon Ramsay',
 'The Fat Bear',
 'The Breakfast Club',
 'Padella',
 'Dishoom',
 'The Golden Chippy']

In [15]:
# finding number of reviews
reviews = scraper.find_all('span', {'class': 'reviewCount__09f24__EUXPN'})
reviews = [int(review.text) for review in reviews]
reviews

[283, 1841, 380, 268, 204, 122, 494, 207, 547, 107]

#### Step 2:  Find the price range for each restaurant

Let's create data for the price range of each restaurant as well, as denoted by the £ symbol shown on each listing.

**Hint 1:** The information you need can be selected using the same approach we used to find the number of reviews.

**Hint 2:** In Python, you can write the £ symbol inside a string with the escape sequence `\xA3`.
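
To make the second hint concrete, here is a minimal sketch that uses the escape sequence to keep only strings consisting entirely of pound signs. The list `sample_prices` is hypothetical and only stands in for text pulled from the page.

```python
# '\xA3' is an escape sequence for the pound sign (Unicode code point U+00A3)
pound = '\xA3'
print(pound == '£')

# Hypothetical strings, mixing price ranges with other text
sample_prices = ['££', '££££', 'Dishoom', '']

# Keep only non-empty strings whose characters are all pound signs
price_only = [p for p in sample_prices if p and set(p) == {pound}]
print(price_only)
```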

In [24]:
price_range = scraper.find_all('span', {'class': 'priceRange__09f24__2O6le'})
price_range = [prices.text for prices in price_range]
price_range

['££', '££', '££', '££', '££££', '££', '££', '££', '££', '££']

#### Step 3:  Build a DataFrame for the Title, Number of Ratings, and Price Range of Each Restaurant

You will need to create a dictionary structured in the following way:

` {
    'Title': [list with the titles of each restaurant],
    'NumRatings': [list with the number of ratings of each restaurant],
    'Price Range': [list with the price range of each restaurant]
}`
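
Before building the DataFrame, it may help to confirm that the three lists have the same length, since `pd.DataFrame` raises a `ValueError` when the lists in a dictionary differ in length. The sketch below is one way to check this, using shortened three-item lists as stand-ins for the full scraped data; the names `lengths` and `all_same_length` are introduced here for illustration.

```python
# Shortened stand-ins for the scraped lists (first three restaurants only)
titles = ['The Mayfair Chippy', 'Dishoom', 'Flat Iron']
reviews = [283, 1841, 380]
price_range = ['££', '££', '££']

data_dict = {'Title': titles, 'NumRatings': reviews, 'Price Range': price_range}

# Record the length of each list; a DataFrame needs them all to match
lengths = {key: len(values) for key, values in data_dict.items()}
print(lengths)

# A single distinct length means every column has the same number of rows
all_same_length = len(set(lengths.values())) == 1
print(all_same_length)
```

If the lengths differ, a likely cause is a selector that matched extra tags or missed a restaurant with no price or review count.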

In [32]:
data_dict = {'Title': titles, 'NumRatings': reviews, 'Price Range': price_range}
df = pd.DataFrame(data_dict)
df

| | Title | NumRatings | Price Range |
|---|---|---|---|
| 0 | The Mayfair Chippy | 283 | ££ |
| 1 | Dishoom | 1841 | ££ |
| 2 | Flat Iron | 380 | ££ |
| 3 | Ffiona’s Restaurant | 268 | ££ |
| 4 | Restaurant Gordon Ramsay | 204 | ££££ |
| 5 | The Fat Bear | 122 | ££ |
| 6 | The Breakfast Club | 494 | ££ |
| 7 | Padella | 207 | ££ |
| 8 | Dishoom | 547 | ££ |
| 9 | The Golden Chippy | 107 | ££ |
