# Web Scraping 

In this notebook, I use sentiment analysis on Rotten Tomatoes movie reviews to predict users' star ratings. The first step is scraping the audience reviews from the Rotten Tomatoes website. For each review, I parse the user ID, the review text and the star rating, then add these features to a dataframe.

### Importing Modules

In [18]:
import pandas as pd
import requests
import urllib.request
import time
from bs4 import BeautifulSoup

### Connecting to the URL

In [2]:
# Test URL: audience reviews for a single movie
url = 'https://www.rottentomatoes.com/m/black_panther_2018/reviews?type=user'
response = requests.get(url)
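
The request above assumes the page comes back successfully. As a minimal sketch, the call can be wrapped so that a hanging or failed request raises an error instead of handing an error page to the parser. The helper name `fetch_html` and the 10-second timeout are my own assumptions, not part of the original notebook:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text
```

With this in place, `fetch_html(url)` could stand in for `response.text` when building the soup.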

### Brewing the soup

In [3]:
# Parse the response HTML into a BeautifulSoup tree
soup = BeautifulSoup(response.text, 'html.parser')

### Isolating all the relevant user data

In [4]:
# Collect every review item on the page, then inspect the first one
user_all = soup.find_all('li', {'class': 'audience-reviews__item'})
user = user_all[0]
print(user.prettify())

<li class="audience-reviews__item" data-qa="review-item">
 <div class="audience-reviews__user-wrap">
  <a href="/user/id/978824977">
   <span class="audience-review__default-image">
   </span>
  </a>
  <div class="audience-reviews__name-wrap">
   <a class="audience-reviews__name" data-qa="review-name" href="/user/id/978824977">
    Diego O
   </a>
  </div>
 </div>
 <div class="audience-reviews__review-wrap">
  <span class="audience-reviews__score">
   <span class="star-display" data-qa="star-display">
    <span class="star-display__filled">
    </span>
    <span class="star-display__filled">
    </span>
    <span class="star-display__filled">
    </span>
    <span class="star-display__filled">
    </span>
    <span class="star-display__half">
    </span>
   </span>
  </span>
  <span class="audience-reviews__duration" data-qa="review-duration">
   Jan 08, 2021
  </span>
  <p class="audience-reviews__review js-review-text clamp clamp-8 js-clamp" data-qa="review-text">
   Good but not tha

### Parsing the star rating, review and user id

In [5]:
# Star rating
full_stars  = user.find_all('span', {'class': 'star-display__filled'})
half_stars  = user.find_all('span', {'class': 'star-display__half'})
star_rating = len(full_stars)+(len(half_stars)/2)

print(star_rating)

4.5
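
The arithmetic here is: each filled star counts as 1 and each half star as 0.5. A small sketch of the same logic as a reusable function, run on a hand-written snippet that mirrors the markup printed earlier. The function name `parse_star_rating` and the sample HTML are illustrative assumptions:

```python
from bs4 import BeautifulSoup


def parse_star_rating(review_tag):
    """Count filled stars as 1.0 and half stars as 0.5."""
    full_stars = review_tag.find_all('span', class_='star-display__filled')
    half_stars = review_tag.find_all('span', class_='star-display__half')
    return len(full_stars) + len(half_stars) / 2


# Hand-written sample (not scraped): three filled stars and one half star.
sample_html = """
<li class="audience-reviews__item">
  <span class="star-display">
    <span class="star-display__filled"></span>
    <span class="star-display__filled"></span>
    <span class="star-display__filled"></span>
    <span class="star-display__half"></span>
  </span>
</li>
"""
sample = BeautifulSoup(sample_html, 'html.parser').find('li')
print(parse_star_rating(sample))  # 3.5
```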


In [6]:
# Extracting review
review = user.find('p', {'class': 'audience-reviews__review js-review-text clamp clamp-8 js-clamp'})

print(review.text)

Good but not that surprising.
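
One caveat with the lookup above: when the class value passed to `find` contains several space-separated classes, BeautifulSoup compares it against the exact string of the `class` attribute, so the match breaks if the site reorders or drops any of the helper classes. A possibly sturdier sketch matches on the single class `audience-reviews__review` and strips the surrounding whitespace. The sample HTML below is hand-written for illustration:

```python
from bs4 import BeautifulSoup

# Hand-written sample (not scraped) mirroring the review markup.
sample_html = """
<li class="audience-reviews__item">
  <div class="audience-reviews__review-wrap">
    <p class="audience-reviews__review js-review-text clamp clamp-8 js-clamp">
      Good but not that surprising.
    </p>
  </div>
</li>
"""
item = BeautifulSoup(sample_html, 'html.parser').find('li')

# Match on one class only; the other classes can change without breaking this.
review_tag = item.find('p', class_='audience-reviews__review')

# get_text(strip=True) trims the whitespace around the review text.
# Fall back to None if the review paragraph is missing.
review_text = review_tag.get_text(strip=True) if review_tag else None
print(review_text)
```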


In [13]:
# Extracting user ID
id_     = user.find('a')['href']
id_list = id_.split('/')
user_id = id_list[3]

print(id_)
print(id_list)
print(user_id)

/user/id/978824977
['', 'user', 'id', '978824977']
978824977
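
Indexing `id_list[3]` relies on the link having exactly the form `/user/id/<number>`. As a hedged alternative, taking the last path segment gives the same result here and also tolerates a trailing slash. The helper name `user_id_from_href` is hypothetical:

```python
def user_id_from_href(href):
    """Return the last path segment of a profile link such as '/user/id/123'."""
    # Drop any trailing slash, then split once from the right.
    return href.rstrip('/').rsplit('/', 1)[-1]


print(user_id_from_href('/user/id/978824977'))  # 978824977
```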


In [17]:
# Parse the star rating, review and user ID for every review on the page
for user in soup.find_all('li', {'class': 'audience-reviews__item'}):
    # Star rating
    full_stars  = user.find_all('span', {'class': 'star-display__filled'})
    half_stars  = user.find_all('span', {'class': 'star-display__half'})
    star_rating = len(full_stars)+(len(half_stars)/2)
    print(star_rating)

    # Review
    review = user.find('p', {'class': 'audience-reviews__review js-review-text clamp clamp-8 js-clamp'}).text
    print(review)

    # Extracting user ID
    id_     = user.find('a')['href']
    id_list = id_.split('/')
    user_id = id_list[3]
    print(user_id)
    
    print()

4.5
Good but not that surprising.
978824977

5.0
It was exciting to see this in theaters with my friends, and I had an extraordinary time. Again, this was a right place right time kind of film.
906471241

4.0
I'm not a huge Marvel fan, but this movie is VERY good. I watched it with my dad, and we both LOVED it. If you like Marvel or don't, this one is for you!
978925578

0.5
It was absolutely appaling!!!!! I have never been more bored (or ashamed of Marvel) in my entire life. Dreadful movie, would never watch again or recommend.
978898527

1.5
Best movie of all time?  Best drama of all time? Have they lost their collective minds?
This smacks of social justice sympathy.
977911687

5.0
Black panther lived up to the height
978866570

2.0
Not sure what all the hype was about. This movie is just so-so, and kind of boring at times.
978071323

5.0
R.I.P. Chadwick Boseman
978762264

2.0
Average superhero movie. Nothing "super" special about it.
978883340

5.0
A masterpiece and a cultural pheno