## Scraping IMDB User Reviews


### Step 1

Write code to drive a browser to:

1. Visit the IMDb [review page](https://www.imdb.com/title/tt1856010/reviews) for *House of Cards*;

2. Tick the checkbox to hide spoilers (using `span.lister-checkbox`);

3. Sort reviews by total votes (by invoking `.send_keys("Total Votes")` on an appropriate Web element).



In [17]:
# starter code
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By


# provide your code below

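# start Chrome with a driver binary managed by webdriver_manager
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)

driver.get("https://www.imdb.com/title/tt1856010/reviews")

# tick the checkbox that hides spoilers
checkbox = driver.find_element(By.CSS_SELECTOR, "span.lister-checkbox")
checkbox.click()

# pick the sort order by typing its label into the sort control
selector = driver.find_element(By.NAME, "sort")
selector.send_keys("Total Votes")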

### Step 2


Initially, only 25 reviews are shown on the page. A "load more" button at the bottom of the page loads an additional 25 reviews each time it is clicked.

Please complete the code below to simulate repeatedly clicking on the button until all the reviews are rendered and the button is no longer visible.
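
The loop works because the wait raises an error once the button stops becoming clickable. As a rough, browser-free sketch of that control flow, polling against a deadline can be modelled as below. The names `wait_until`, `FakePage`, and `load_all` are hypothetical, and the built-in `TimeoutError` merely stands in for the exception Selenium raises; this is an illustration of the idea, not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=1.0, poll=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(poll)


class FakePage:
    """Hypothetical stand-in for a page that loads a batch of reviews per click."""

    def __init__(self, total_reviews, batch=25):
        self.total = total_reviews
        self.batch = batch
        self.loaded = min(batch, total_reviews)

    def find_button(self):
        # The button only counts as clickable while reviews remain to be loaded.
        return self if self.loaded < self.total else None

    def click(self):
        self.loaded = min(self.loaded + self.batch, self.total)


def load_all(page, timeout=0.2):
    """Click the button until the wait times out; return the number of clicks."""
    clicks = 0
    while True:
        try:
            button = wait_until(page.find_button, timeout=timeout)
        except TimeoutError:
            break  # the button never became clickable again, so we are done
        button.click()
        clicks += 1
    return clicks
```

The real loop has the same shape: the wait either hands back a clickable button or raises, and the raised error is the signal to stop.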



In [18]:
# starter code

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# replace ... in the try clause below with your code
while True:
    try:
        # tip: use element_to_be_clickable()
        # recall: an error will be thrown when an expected condition cannot be met
        # within the specified time interval
        load_more_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CLASS_NAME, "ipl-load-more__button"))
        )
        load_more_button.click()
    except TimeoutException:
        # break out of the loop if the button cannot be found anymore
        break

### Step 3


Pass the source code of the **fully rendered** review page to `BeautifulSoup` to scrape review data.


The pieces of information you need to extract for each review include:

- the title,

- the review text, 

- the ID of the user who gave it, 

- the review date, 

- and the numerical rating (a number out of 10; some reviews have this piece of information missing; in this case, use `None` as a placeholder).
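
Once the raw strings have been pulled out of each review, they usually need a little normalisation. The sketch below is one possible approach and is not part of the starter code. The helper names are hypothetical, and it assumes that ratings arrive as digit strings, that dates look like `5 June 2017`, and that profile links have the form `/user/ur.../`; check these against the actual page before relying on them.

```python
import re
from datetime import datetime


def parse_rating(text):
    """Return the rating as an int, or None when the review has no rating."""
    if text is None or not text.strip():
        return None
    return int(text.strip())


def parse_review_date(text):
    """Convert a date such as '5 June 2017' (assumed format) to a date object."""
    return datetime.strptime(text.strip(), "%d %B %Y").date()


def user_id_from_href(href):
    """Extract an ID like 'ur1234567' from a profile link (assumed URL shape)."""
    match = re.search(r"/user/(ur\d+)", href or "")
    return match.group(1) if match else None
```

Keeping the `None` check inside `parse_rating` means the scraping loop can pass along whatever it found, including nothing, without a separate branch for unrated reviews.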



In [19]:
# starter code

from bs4 import BeautifulSoup

# provide your code below
# parse the HTML of the fully rendered page held by the Selenium driver,
# not a fresh HTTP response (which would contain only the first 25 reviews)
soup = BeautifulSoup(driver.page_source, 'html.parser')

reviews = soup.select('div.review-container')
review_data = []

for review in reviews:
    title = review.select_one('a.title').get_text()
    review_text = review.select_one('div.text.show-more__control').get_text()
    user_id = review.select_one("span.display-name-link").get_text()
    review_date = review.select_one("span.review-date").get_text()

    # some reviews have no rating; use None as a placeholder
    rating_tag = review.select_one("span.rating-other-user-rating > span:first-of-type")
    rating_value = rating_tag.get_text() if rating_tag else None

    # keep the extracted fields so they can be reused after the loop
    review_data.append({
        "title": title,
        "review_text": review_text,
        "user_id": user_id,
        "review_date": review_date,
        "rating": rating_value,
    })

    print("Title:", title)
    print("Review Text:", review_text)
    print("User ID:", user_id)
    print("Review Date:", review_date)
    print("Rating:", rating_value)
    print("----------------------------------------")

Title:  Stood firm for four seasons, collapsed completely in Season 5

Review Text: 'House of Cards' much of the time was one of the most compelling shows. Sadly, it has also become one of the most frustrating. Not since 'Once Upon a Time' and 'The Walking Dead', and before that 'Lost' has such a brilliant show of great promise declined so rapidly.Lets start with the many great things first. For the first four seasons, 'House of Cards' was seriously addictive, must-watch television and very quickly became one of my favourite shows. Throughout its run, it's one of the most stylish and most atmospheric shows personally seen, with cinematic-quality photography and production design. The direction was smart and intelligent, especially the first two episodes with David Fincher's, to me one of the better directors of the last twenty five or so years, involvement (the first episode earning Fincher a Primetime Emmy) and the music knew when to have presence and when to tone things down to let t

---


### Optional: Practice on CSS Selectors

[This web game](https://flukeout.github.io/) asks you to write CSS selectors. It is a fun way to test your knowledge, and because it explains each selector as you go, it is a good learning tool as well.

Play levels 1-9, 12-14, and 28-29, and provide the required answers below:


> Level 5 answer:

> Level 8 answer:

> Level 13 answer:

> Level 29 answer: