# Rotten Tomatoes
***
## All Time lists: The best movies ever made by genre and type, ranked by our unique `adjusted Tomatometer!`

<div class="alert alert-block alert-info">
The Adjusted Tomatometer score takes into account the number of reviews, the year of release, and the average Tomatometer scores of other films released contemporaneously. It is primarily used when comparing or ranking films across several decades.
</div>

This project demonstrates web scraping with the `BeautifulSoup` library on an HTML page whose content is nested in divs and other tags and styled with Cascading Style Sheets (CSS). The objective is to extract every data point for each ranked movie, store each data point in its own list, group the information in tabular form, and save it as an Excel file and a CSV file in the working directory.

Data points:
1.	Rank                 : position on the list (a higher number is a worse rank)
2.	Title                : title of the movie
3.	Year                 : release year
4.	Score                : Tomatometer score
5.	Adjusted Tomatometer : score calculated by Rotten Tomatoes
6.	Critics Consensus    : the critics' general agreement
7.	Synopsis             : summary of the movie
8.	Starring             : actor(s)
9.	Director             : Director

Project index:

1. Initial Setup
2. Finding the div container of all the data points
3. Extracting every data point from the soup object to a list
4. Using pandas dataframe to give a tabular form to the extracted data
5. Exporting the data to Excel and CSV (comma-separated values) files

## 1. Initial Setup

In [1]:
# Import the libraries and packages needed to:
# - download the page source and save it into a variable
# - parse the page source into a well-formed HTML document
# - arrange the extracted information in a friendly tabular format

import requests
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
# Steps:

# 1. Define the base_site URL for the page we'll get the data from.
#    Note: any other page from https://editorial.rottentomatoes.com/all-time-lists/ should work.
#    Examples are:
#       base_site = "https://editorial.rottentomatoes.com/guide/best-horror-movies-of-all-time/"
#       base_site = "https://editorial.rottentomatoes.com/guide/best-sci-fi-movies-of-all-time/"
#       base_site = "https://editorial.rottentomatoes.com/guide/140-essential-action-movies-to-watch-now/"

# 2. Request the page at base_site and save the response in a variable
#    (usually named 'r', after "response") for further manipulation.
# 3. Extract the response content into a variable named html.
# 4. Create a BeautifulSoup object from the html variable using 'lxml' as the parser.
# 5. Export the created object to your working directory as a file for structure analysis.

# Define the base URL
base_site = "https://editorial.rottentomatoes.com/guide/best-horror-movies-of-all-time/"

In [3]:
# Request the page source from base_site
r = requests.get(base_site)

In [4]:
# Extract the response (r) content
html = r.content
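
The request above assumes the download succeeds. A minimal sketch of a more defensive fetch follows. The helper name `fetch_html`, the 10-second timeout, and the injectable `get` parameter are assumptions made for illustration and are not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # A timeout keeps the call from hanging indefinitely on a stalled connection.
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so an error page is never parsed as if it were the movie list.
    response.raise_for_status()
    return response.content
```

With this helper, the two cells above could be written as `html = fetch_html(base_site)`.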

In [5]:
# Create the BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')

<div class="alert alert-block alert-warning">
BeautifulSoup ranks the lxml parser as the best one.
If a parser is not explicitly stated in the Beautiful Soup constructor,
the best one available on the current machine is chosen. This means that the same 
piece of code can give different results on different computers.
</div>
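
To make the warning concrete, the sketch below parses a deliberately invalid fragment with an explicitly named parser. The fragment `<a></p>` is an illustrative assumption, and `html.parser` is used here only because it ships with Python; the notebook itself uses `lxml`.

```python
from bs4 import BeautifulSoup

# A deliberately malformed fragment: the closing </p> has no opening tag.
broken = "<a></p>"

# Naming the parser explicitly makes the result reproducible across machines.
builtin_soup = BeautifulSoup(broken, "html.parser")
print(builtin_soup)  # the built-in parser drops the stray </p>

# With lxml installed, BeautifulSoup(broken, "lxml") would instead wrap the
# fragment in <html> and <body> tags, which is why the parser should be pinned.
```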

In [6]:
# Extracting and formatting the name of the list
listName = base_site.split("/")[4].replace('-', '_')

# Exporting the html to a file in the working directory
with open('rt_'+ listName + '.html', 'wb') as file:
    file.write(soup.prettify('utf-8'))
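
Indexing the split URL at position 4 works for the guide pages listed earlier, but it depends on the exact number of slashes in the address. A sketch that reads the last path segment instead, using only the standard library, is shown below. The function name `list_name_from_url` is hypothetical.

```python
from urllib.parse import urlparse


def list_name_from_url(url):
    """Return the last path segment of a URL with hyphens turned into underscores."""
    path = urlparse(url).path  # e.g. '/guide/best-horror-movies-of-all-time/'
    slug = path.strip("/").split("/")[-1]
    return slug.replace("-", "_")


base_site = "https://editorial.rottentomatoes.com/guide/best-horror-movies-of-all-time/"
print(list_name_from_url(base_site))  # best_horror_movies_of_all_time
```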

## 2. Finding the div container of all the data points

In [7]:
# After reviewing the exported HTML file (rt_<listName>.html), the div tag with class = "row countdown-item" was identified as
# the container of all the sub-divs that hold and format every data element of a movie.

# Defining the main container of all movie data elements
divs = soup.find_all("div", {"class": "row countdown-item"})
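
Passing the full string "row countdown-item" matches only tags whose class attribute is exactly that string, in that order. The sketch below contrasts this with a CSS selector, which matches regardless of class order. The two-movie fragment and its placeholder titles are assumptions and are far simpler than the real page.

```python
from bs4 import BeautifulSoup

sample_html = """
<div class="row countdown-item"><h2><a>First movie</a></h2></div>
<div class="countdown-item row"><h2><a>Second movie</a></h2></div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Exact match on the whole class attribute string, as in the cell above.
exact = sample_soup.find_all("div", {"class": "row countdown-item"})

# CSS selector: any div that carries both classes, in any order.
by_selector = sample_soup.select("div.row.countdown-item")

print(len(exact), len(by_selector))  # 1 2
```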

## 3. Extracting every data point from the soup object to a list

In [8]:
# It is time to start extracting all the information we are interested in and save it to variables for
# later processing. List comprehensions are the technique used to filter the desired data from the divs
# container and store the results in list variables.

# In detail: every data element and its formatting tag or class.

# 0. Principal container > div class = "row countdown-item"
# 1. Rank# >  div class="countdown-index"
# 2. Title, Year, Score > div h2 (a, class="subtle start-year", class="tMeterScore")
# 3. Adjusted Score > div "class": "countdown-adjusted-score"
# 4. Critics Consensus > class="info critics-consensus"
# 5. Synopsis > class="info synopsis"
# 6. Starring > class="info cast"
# 7. Director > class="info director"


# Extracting movie rank#
rank = [None if rank.find('div', {"class": "countdown-index"}) is None
    else rank.find('div', {"class": "countdown-index"}).text.strip('#')
        for rank in divs]
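
The extraction cells in this section call `find` twice per movie: once for the `None` check and once to read the text. A small helper can do the lookup once. This is only a sketch: `text_or_none` and the sample HTML are hypothetical and not part of the original notebook.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, tag_name, class_name):
    """Return the stripped text of the first matching child tag, or None."""
    tag = parent.find(tag_name, {"class": class_name})
    return None if tag is None else tag.text.strip()


sample_html = """
<div class="row countdown-item"><div class="countdown-index">#2</div></div>
<div class="row countdown-item"></div>
"""
sample_divs = BeautifulSoup(sample_html, "html.parser").find_all(
    "div", {"class": "row countdown-item"}
)

# The second container has no rank div, so the helper returns None for it.
sample_rank = [text_or_none(div, "div", "countdown-index") for div in sample_divs]
print(sample_rank)  # ['#2', None]
```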

In [9]:
# Extracting movie title
title = [None if title.find('a') is None
    else title.find('a').text.strip() 
        for title in (div.find('h2') for div in divs)]

In [10]:
# Extracting movie year
year = [None if year.find('span', {'class':'start-year'}).text.strip("()") == ""
    else year.find('span', {'class':'start-year'}).text.strip("()")
        for year in (div.find('h2') for div in divs)]

In [11]:
# Extracting movie score
score = [None if score.find('span', {'class':'tMeterScore'}).text.strip("%") == ""
    else int(score.find('span', {'class':'tMeterScore'}).text.strip("%"))
        for score in (div.find('h2') for div in divs)]

In [12]:
# Extracting movie adjusted score
adjusted_score = [None if adjusted_score.find('div', {"class": "countdown-adjusted-score"}) is None
    else int(adjusted_score.find('div', {"class": "countdown-adjusted-score"}).contents[1].strip('% '))
        for adjusted_score in divs]

In [13]:
# Extracting movie critics-consensus
# (str.strip() removes a set of characters, not a prefix, so the label is removed with replace())
critics_consensus = [None if critics_consensus.find('div', {"class": "info critics-consensus"}) is None
    else critics_consensus.find('div', {"class": "info critics-consensus"}).text.replace('Critics Consensus:', '', 1).strip()
        for critics_consensus in divs]

In [14]:
# Extracting movie synopsis
synopsis = [None if synopsis.find('div', {"class": "info synopsis"}) is None
    else synopsis.find('div', {"class": "info synopsis"}).contents[1].strip()
        for synopsis in divs]

In [15]:
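# Extracting movie cast
# (replace() removes the literal label; strip() with the label would also eat letters of the names)
cast = [None if cast.find('div', {"class": "info cast"}) is None
    else cast.find('div', {"class": "info cast"}).text.replace('Starring:', '', 1).strip()
        for cast in divs]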

In [16]:
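# Extracting movie director
# (replace() removes the literal label; strip() with the label would also eat letters of the names)
director = [None if director.find('div', {"class": "info director"}) is None
    else director.find('div', {"class": "info director"}).text.replace('Directed By:', '', 1).strip()
        for director in divs]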
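
A note on removing labels: `str.strip(chars)` treats its argument as a set of characters to remove from both ends, not as a literal prefix, so stripping with a label such as `Directed By:` can take letters of the name with it. The sketch below shows the difference on a hypothetical string. The exact text inside the real divs is an assumption.

```python
raw = "\nDirected By: Dario Argento"  # hypothetical text of an "info director" div

# strip() keeps removing characters from both ends while they are in the set.
stripped = raw.strip("\nDirected By: ")

# replace() with count=1 removes the literal label once; strip() then trims whitespace.
cleaned = raw.replace("Directed By:", "", 1).strip()

print(repr(stripped))  # 'ario Argento'
print(repr(cleaned))   # 'Dario Argento'
```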

## 4. Using pandas dataframe to give a tabular form to the extracted data

In [17]:
# Steps

# 1. Set the display.max_colwidth option to None. By default, pandas truncates any text beyond
#    a certain length (as would happen in the Cast and Consensus columns). Setting the
#    maximum column width to None makes each column as wide as needed to display
#    the whole text.
# 2. Set floating-point values to display with two decimal places.
# 3. Create a pandas dataframe to hold and organize the data in tabular form.
# 4. Populate the movies_info dataframe with every list created (title, rank, year, etc.).
# 5. Sort the data by the Adjusted Tomatometer field to show one of the many benefits
#    of having the information in a pandas dataframe.

# Setting column width to None
pd.set_option('display.max_colwidth', None)

# Setting floats to two decimal places (no currency symbol: these are scores, not prices)
pd.options.display.float_format = '{:,.2f}'.format
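
`display.float_format` expects a callable that turns one float into a string, and the bound `.format` method of a format string fits that role. The sketch below checks what two candidate format strings produce. The sample values are made up for illustration.

```python
plain = "{:,.2f}".format
with_dollar = "${:,.2f}".format

print(plain(73.367))        # 73.37
print(with_dollar(73.367))  # $73.37
print(plain(1234567.891))   # 1,234,567.89
```

The option only affects float columns; integer columns are displayed unchanged.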

In [18]:
# Creating the dataframe
movies_info = pd.DataFrame()

In [19]:
# Populating the dataframe
movies_info["Movie Title"] = title
movies_info["Rank"] = rank
movies_info["Year"] = year
movies_info["Score"] = score
movies_info["Adjusted Tomatometer"] = adjusted_score
movies_info["Consensus"] = critics_consensus
movies_info["Synopsis"] = synopsis
movies_info["Cast"] = cast
movies_info["Director"] = director
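
The same table can also be built in one call from a dict of lists, and the text columns that hold numbers can be converted afterwards. The sketch below uses two rows copied from this notebook's output. In the notebook the lists come from the extraction cells, and the conversion step is a suggestion rather than part of the original.

```python
import pandas as pd

# Two rows copied from the notebook output; rank and year arrive as strings.
title = ["Get Out", "Us"]
rank = ["2", "4"]
year = ["2017", "2019"]
score = [98, 93]

sample_info = pd.DataFrame(
    {"Movie Title": title, "Rank": rank, "Year": year, "Score": score}
)

# Convert the string columns to numbers so that sorting is numeric, not alphabetical.
for column in ["Rank", "Year"]:
    sample_info[column] = pd.to_numeric(sample_info[column])

print(sample_info.dtypes)
```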

In [20]:
# Displaying the first 10 rows of the movies_info dataframe
movies_info.head(10).style.hide(axis='index')

Movie Title,Rank,Year,Score,Adjusted Tomatometer,Consensus,Synopsis,Cast,Director
A Nightmare on Elm Street 3: Dream Warriors,200,1987,71,73367,A Nightmare on Elm Street 3: Dream Warriors offers an imaginative and surprisingly satisfying rebound for a franchise already starting to succumb to sequelitis.,"During a hallucinatory incident, young Kristen Parker (Patricia Arquette) has her wrists slashed by dream-stalking monster Freddy Krueger (Robert Englund)....","Heather Langenkamp, Patricia Arquette, Craig Wasson, Larry Fishburne",Chuck Russell
Phenomena,199,1985,76,75174,No consensus yet.,An American (Jennifer Connelly) at a Swiss finishing school calls on insects to help a paralyzed scientist (Donald Pleasence) fight...,"Jennifer Connelly, Donald Pleasence, Dalila Di Lazzaro, Fausta Avelli",Dario Argento
Bram Stoker's Dracula,198,1992,76,80034,"Overblown in the best sense of the word, Francis Ford Coppola's vision of Bram Stoker's Dracula rescues the character from decades of campy interpretations -- and features some terrific performances to boot.",Adaptation of Bram Stoker's classic vampire novel. Gary Oldman plays Dracula whose lonely soul is determined to reunite with his...,"Gary Oldman, Winona Ryder, Anthony Hopkins, Keanu Reeves",Francis Ford Coppola
Hellraiser,197,1987,71,73519,"Elevated by writer-director Clive Barker's fiendishly unique vision, Hellraiser offers a disquieting - and sadistically smart - alternative to mindless gore.",Sexual deviant Frank (Sean Chapman) inadvertently opens a portal to hell when he tinkers with a box he bought while...,"Andrew Robinson, Clare Higgins, Ashley Laurence, Sean Chapman",Clive Barker
It's Alive!,196,1974,70,70611,"Tough and unpleasant, It's Alive throttles the viewer with its bizarre mutant baby theatrics.","Leaving their son, Chris (Daniel Holzman), with a family friend (William Wellman Jr.), Frank (John P. Ryan) and Lenore Davis...","John P. Ryan, Sharon Farrell, Andrew Duggan, Guy Stockwell",Larry Cohen
Jacob's Ladder,195,1990,73,77529,"Even with its disorienting leaps of logic and structure, Jacob's Ladder is an engrossing, nerve-shattering experience.","After returning home from the Vietnam War, veteran Jacob Singer (Tim Robbins) struggles to maintain his sanity. Plagued by hallucinations...","Tim Robbins, Elizabeth Peña, Danny Aiello, Matt Craven",Adrian Lyne
Open Water,194,2003,71,78226,A low budget thriller with some intense moments.,Daniel (Daniel Travis) and Susan (Blanchard Ryan) embark on a tropical vacation with their scuba-diving certifications in tow. During a...,"Blanchard Ryan, Daniel Travis, Saul Stein, Estelle Lau",Chris Kentis
The Mist,193,2007,72,77415,Frank Darabont's impressive camerawork and politically incisive script make The Mist a truly frightening experience.,"After a powerful storm damages their Maine home, David Drayton (Thomas Jane) and his young son head into town to...","Thomas Jane, Marcia Gay Harden, Laurie Holden, Andre Braugher",Frank Darabont
The Ring,192,2002,71,77104,"With little gore and a lot of creepy visuals, The Ring gets under your skin, thanks to director Gore Verbinski's haunting sense of atmosphere and an impassioned performance from Naomi Watts.",It sounds like just another urban legend -- a videotape filled with nightmarish images leads to a phone call foretelling...,"Naomi Watts, Martin Henderson, David Dorfman, Brian Cox",Gore Verbinski
Phantasm,191,1979,74,76476,Phantasm: Remastered adds visual clarity to the first installment in one of horror's most enduring -- and endearingly idiosyncratic -- franchises.,"The residents of a small town have begun dying under strange circumstances, leading young Mike (Michael Baldwin) to investigate. After...","Michael Baldwin, Bill Thornbury, Reggie Bannister, Kathy Lester",Don Coscarelli


In [21]:
# Sorting the dataframe to show the top 10 movies by Adjusted Tomatometer
movies_info.sort_values('Adjusted Tomatometer', ascending=False).head(10).style.hide(axis='index')

Movie Title,Rank,Year,Score,Adjusted Tomatometer,Consensus,Synopsis,Cast,Director
Get Out,2,2017,98,129043,"Funny, scary, and thought-provoking, Get Out seamlessly weaves its trenchant social critiques into a brilliantly effective and entertaining horror/comedy thrill ride.","Now that Chris and his girlfriend, Rose, have reached the meet-the-parents milestone of dating, she invites him for a weekend...","Daniel Kaluuya, Allison Williams, Catherine Keener, Bradley Whitford",Jordan Peele
Us,4,2019,93,127662,"With Jordan Peele's second inventive, ambitious horror film, we have seen how to beat the sophomore jinx, and it is Us.","Accompanied by her husband, son and daughter, Adelaide Wilson returns to the beachfront home where she grew up as a...","Lupita Nyong'o, Winston Duke, Elisabeth Moss, Tim Heidecker",Jordan Peele
The Invisible Man,9,2020,92,121500,"Smart, well-acted, and above all scary, The Invisible Man proves that sometimes, the classic source material for a fresh reboot can be hiding in plain sight.","After staging his own suicide, a crazed scientist uses his power to become invisible to stalk and terrorize his ex-girlfriend....","Elisabeth Moss, Oliver Jackson-Cohen, Aldis Hodge, Storm Reid",Leigh Whannell
A Quiet Place,10,2018,96,119157,A Quiet Place artfully plays on elemental fears with a ruthlessly intelligent creature feature that's as original as it is scary -- and establishes director John Krasinski as a rising talent.,"If they hear you, they hunt you. A family must live in silence to avoid mysterious creatures that hunt by...","Emily Blunt, John Krasinski, Millicent Simmonds, Noah Jupe",John Krasinski
It,41,2017,86,115240,"Well-acted and fiendishly frightening with an emotionally affecting story at its core, It amplifies the horror in Stephen King's classic story without losing touch with its heart.","Seven young outcasts in Derry, Maine, are about to face their worst nightmare -- an ancient, shape-shifting evil that emerges...","Jaeden Lieberher, Jeremy Ray Taylor, Sophia Lillis, Finn Wolfhard",Andy Muschietti
The Cabinet of Dr. Caligari,3,1919,99,115195,"Arguably the first true horror film, The Cabinet of Dr. Caligari set a brilliantly high bar for the genre -- and remains terrifying nearly a century after it first stalked the screen.","At a carnival in Germany, Francis (Friedrich Feher) and his friend Alan (Rudolf Lettinger) encounter the crazed Dr. Caligari (Werner...","Werner Krauss, Conrad Veidt, Lil Dagover, Friedrich Feher",Robert Wiene
The Lighthouse,15,2019,90,113615,"A gripping story brilliantly filmed and led by a pair of powerhouse performances, The Lighthouse further establishes Robert Eggers as a filmmaker of exceptional talent.",Two lighthouse keepers try to maintain their sanity while living on a remote and mysterious New England island in the...,"Robert Pattinson, Willem Dafoe, Valeriia Karaman, Logan Hawkes",Robert Eggers
Hereditary,18,2018,89,112897,"Hereditary uses its classic setup as the framework for a harrowing, uncommonly unsettling horror film whose cold touch lingers long beyond the closing credits.","When the matriarch of the Graham family passes away, her daughter and grandchildren begin to unravel cryptic and increasingly terrifying...","Toni Collette, Gabriel Byrne, Alex Wolff, Ann Dowd",Ari Aster
The Witch,31,2015,90,111674,"As thought-provoking as it is visually compelling, The Witch delivers a deeply unsettling exercise in slow-building horror that suggests great things for debuting writer-director Robert Eggers.","In 1630 New England, panic and despair envelops a farmer, his wife and their children when youngest son Samuel suddenly...","Anya Taylor-Joy, Ralph Ineson, Kate Dickie, Harvey Scrimshaw",Robert Eggers
10 Cloverfield Lane,33,2016,90,110513,"Smart, solidly crafted, and palpably tense, 10 Cloverfield Lane makes the most of its confined setting and outstanding cast -- and suggests a new frontier for franchise filmmaking.","After surviving a car accident, Michelle (Mary Elizabeth Winstead) wakes up to find herself in an underground bunker with two...","John Goodman, Mary Elizabeth Winstead, John Gallagher Jr., Douglas M. Griffin",Dan Trachtenberg
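
`DataFrame.nlargest` is an equivalent shortcut for sorting in descending order and keeping the first rows. The sketch below uses three rows copied from the output above and a hypothetical dataframe name, `sample_info`.

```python
import pandas as pd

sample_info = pd.DataFrame(
    {
        "Movie Title": ["The Invisible Man", "Get Out", "Us"],
        "Adjusted Tomatometer": [121500, 129043, 127662],
    }
)

# Keep the 2 rows with the largest Adjusted Tomatometer, highest first.
top = sample_info.nlargest(2, "Adjusted Tomatometer")
print(top["Movie Title"].tolist())  # ['Get Out', 'Us']
```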


## 5. Exporting the data to Excel and CSV (comma-separated values) files

In [22]:
# Note: index is set to False so that the dataframe index (0, 1, 2, ...) of each movie is not saved to the file (the index is purely internal).
# header is set to True so that the column names are saved.

# Write data to excel file
movies_info.to_excel(listName + '.xlsx', index = False, header = True)

# Write data to CSV file
movies_info.to_csv(listName + '.csv', index = False, header = True)
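
One way to check an export is to read it back and compare. The sketch below does this for CSV, using an in-memory buffer instead of a file and a two-row sample dataframe copied from the notebook output; the variable names are hypothetical. Note that `to_excel` additionally needs an Excel engine such as `openpyxl` installed.

```python
import io

import pandas as pd

sample_info = pd.DataFrame({"Movie Title": ["Get Out", "Us"], "Score": [98, 93]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
sample_info.to_csv(buffer, index=False, header=True)

# Rewind the buffer, read the CSV text back, and inspect the restored table.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Because `index=False` was used, the restored table has only the two data columns and no extra unnamed index column.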