# Marvel Cinematic Universe Movie Ratings

Author: Eze Ahunanya 

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

This project explores the ratings of the Marvel Cinematic Universe movies. In particular, it aims to answer two questions: 'How do the scores from critics and the general audience compare?' and 'What is the relationship between box office earnings and movie ratings?'. The movie titles are taken from Wikipedia, and the ratings data are sourced from the Rotten Tomatoes website.

<a id='wrangling'></a>
## Data Wrangling

In [2]:
import requests 
from bs4 import BeautifulSoup

In [3]:
# Wikipedia page that lists the MCU film titles
url = 'https://en.wikipedia.org/wiki/Marvel_Cinematic_Universe#Films'

# request the page, stop early on an HTTP error, and keep the response
response = requests.get(url)
response.raise_for_status()

In [4]:
# parse html file and save to soup variable
soup = BeautifulSoup(response.content, 'lxml')

In [5]:
movies_list = []

# collect every row header once instead of re-scanning the page on each iteration
row_headers = soup.find_all('th', scope="row")

# the film titles are the 15th to 37th row headers (indices 14 to 36)
for i in range(14, 37):
    movie_line = row_headers[i]
    movie_title = movie_line.contents[0].contents[0].contents[0]
    print(i, movie_title)
    movies_list.append(movie_title)

14 Iron Man
15 The Incredible Hulk
16 Iron Man 2
17 Thor
18 Captain America: The First Avenger
19 Marvel's The Avengers
20 Iron Man 3
21 Thor: The Dark World
22 Captain America: The Winter Soldier
23 Guardians of the Galaxy
24 Avengers: Age of Ultron
25 Ant-Man
26 Captain America: Civil War
27 Doctor Strange
28 Guardians of the Galaxy Vol. 2
29 Spider-Man: Homecoming
30 Thor: Ragnarok
31 Black Panther
32 Avengers: Infinity War
33 Ant-Man and the Wasp
34 Captain Marvel
35 Avengers: Endgame
36 Spider-Man: Far From Home
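
The fixed index range 14 to 36 depends on the layout of the Wikipedia page at the time of scraping. As a minimal sketch, assuming the text of every row header has already been collected into a plain list of strings, the films could instead be selected by their first and last titles. The helper `slice_between` and the sample list are hypothetical and only illustrate the idea.

```python
def slice_between(titles, first, last):
    """Return the items from `first` through `last`, inclusive.

    Raises ValueError if either title is missing from the list.
    """
    start = titles.index(first)
    end = titles.index(last, start)  # search for `last` only after `first`
    return titles[start:end + 1]


# Illustrative stand-in for the row-header text scraped from the page
row_headers = ['Phase One', 'Iron Man', 'The Incredible Hulk',
               'Iron Man 2', 'Untitled sequel']

selected = slice_between(row_headers, 'Iron Man', 'Iron Man 2')
print(selected)
```

With the real page, `first` and `last` would be 'Iron Man' and 'Spider-Man: Far From Home', the first and last titles printed above.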


In [16]:
urls_list = []

for movie_title in movies_list:
    
    # format movie strings for urls
    movie_title = (movie_title.lower().replace(" ", "_").replace("-", "_")
    .replace(":", "").replace("'", "").replace(".", ""))
    url = 'https://www.rottentomatoes.com/m/{}'.format(movie_title)
    print(url)
    urls_list.append(url) 

https://www.rottentomatoes.com/m/iron_man
https://www.rottentomatoes.com/m/the_incredible_hulk
https://www.rottentomatoes.com/m/iron_man_2
https://www.rottentomatoes.com/m/thor
https://www.rottentomatoes.com/m/captain_america_the_first_avenger
https://www.rottentomatoes.com/m/marvels_the_avengers
https://www.rottentomatoes.com/m/iron_man_3
https://www.rottentomatoes.com/m/thor_the_dark_world
https://www.rottentomatoes.com/m/captain_america_the_winter_soldier
https://www.rottentomatoes.com/m/guardians_of_the_galaxy
https://www.rottentomatoes.com/m/avengers_age_of_ultron
https://www.rottentomatoes.com/m/ant_man
https://www.rottentomatoes.com/m/captain_america_civil_war
https://www.rottentomatoes.com/m/doctor_strange
https://www.rottentomatoes.com/m/guardians_of_the_galaxy_vol_2
https://www.rottentomatoes.com/m/spider_man_homecoming
https://www.rottentomatoes.com/m/thor_ragnarok
https://www.rottentomatoes.com/m/black_panther
https://www.rottentomatoes.com/m/avengers_infinity_war
https://w
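
The chained `replace` calls above handle only the punctuation that occurs in these 23 titles. A possible alternative, shown here as a sketch rather than the notebook's method, is a small regular-expression helper. The name `title_to_slug` is hypothetical. Unlike the original chain, it collapses runs of spaces and hyphens into one underscore and drops any other non-alphanumeric character.

```python
import re


def title_to_slug(title):
    """Lowercase a title, drop punctuation, and join words with underscores."""
    slug = title.lower()
    slug = re.sub(r"[ \-]+", "_", slug)     # spaces and hyphens become underscores
    slug = re.sub(r"[^a-z0-9_]", "", slug)  # drop colons, apostrophes, periods, etc.
    return slug


for title in ["Marvel's The Avengers", "Guardians of the Galaxy Vol. 2"]:
    print('https://www.rottentomatoes.com/m/{}'.format(title_to_slug(title)))
```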

In [7]:
# correct faulty url addresses
urls_list[13] = 'https://www.rottentomatoes.com/m/doctor_strange_2016'
urls_list[17] = 'https://www.rottentomatoes.com/m/black_panther_2018'
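
Patching by list position works as long as the order of `urls_list` never changes. A sketch of a more order-independent variant keys the same two corrections by movie title. `URL_OVERRIDES` and `apply_overrides` are hypothetical names, and the two-item lists below are only a small stand-in for `movies_list` and `urls_list`.

```python
# Corrected addresses keyed by title (the same two corrections as above)
URL_OVERRIDES = {
    'Doctor Strange': 'https://www.rottentomatoes.com/m/doctor_strange_2016',
    'Black Panther': 'https://www.rottentomatoes.com/m/black_panther_2018',
}


def apply_overrides(titles, urls, overrides):
    """Return a copy of `urls` with the entry for each overridden title replaced."""
    return [overrides.get(title, url) for title, url in zip(titles, urls)]


# Small stand-in for the full title and URL lists
titles = ['Thor', 'Doctor Strange']
urls = ['https://www.rottentomatoes.com/m/thor',
        'https://www.rottentomatoes.com/m/doctor_strange']

fixed = apply_overrides(titles, urls, URL_OVERRIDES)
print(fixed)
```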

In [8]:
urls_list

['https://www.rottentomatoes.com/m/iron_man',
 'https://www.rottentomatoes.com/m/the_incredible_hulk',
 'https://www.rottentomatoes.com/m/iron_man_2',
 'https://www.rottentomatoes.com/m/thor',
 'https://www.rottentomatoes.com/m/captain_america_the_first_avenger',
 'https://www.rottentomatoes.com/m/marvels_the_avengers',
 'https://www.rottentomatoes.com/m/iron_man_3',
 'https://www.rottentomatoes.com/m/thor_the_dark_world',
 'https://www.rottentomatoes.com/m/captain_america_the_winter_soldier',
 'https://www.rottentomatoes.com/m/guardians_of_the_galaxy',
 'https://www.rottentomatoes.com/m/avengers_age_of_ultron',
 'https://www.rottentomatoes.com/m/ant_man',
 'https://www.rottentomatoes.com/m/captain_america_civil_war',
 'https://www.rottentomatoes.com/m/doctor_strange_2016',
 'https://www.rottentomatoes.com/m/guardians_of_the_galaxy_vol_2',
 'https://www.rottentomatoes.com/m/spider_man_homecoming',
 'https://www.rottentomatoes.com/m/thor_ragnarok',
 'https://www.rottentomatoes.com/m/bla

In [9]:
import get_movie_module as gm

In [10]:
df = gm.get_movie_data(urls_list)
df

Unnamed: 0,index,score,averageRating,scoreSentiment,reviewCount,ratingCount,scoreType,likedCount,notLikedCount,certified,tomatometerState,audienceClass,movie_title,release_date_theaters,box_office_gross_usa,runtime
0,tomatometerAllCritics,94,7.71,POSITIVE,279,279,,261,18,True,certified-fresh,,Iron Man (2008),"May 2, 2008",$318.3M,2h6m
1,tomatometerTopCritics,90,7.40,POSITIVE,58,58,,52,6,True,certified-fresh,,Iron Man (2008),"May 2, 2008",$318.3M,2h6m
2,audienceAll,91,4.26,POSITIVE,82484,1083066,ALL,220183,22279,False,,upright,Iron Man (2008),"May 2, 2008",$318.3M,2h6m
3,audienceVerified,,,,0,0,VERIFIED,0,0,False,,,Iron Man (2008),"May 2, 2008",$318.3M,2h6m
4,tomatometerAllCritics,67,6.16,POSITIVE,233,233,,156,77,False,fresh,,The Incredible Hulk (2008),"Jun 13, 2008",$134.5M,1h52m
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,audienceVerified,,,,0,0,VERIFIED,0,0,False,,,Avengers: Endgame (2019),"Apr 26, 2019",$858.4M,3h1m
88,tomatometerAllCritics,90,7.44,POSITIVE,441,441,,399,42,True,certified-fresh,,Spider-Man: Far From Home (2019),"Jul 2, 2019",$390.7M,2h9m
89,tomatometerTopCritics,88,6.80,POSITIVE,51,51,,45,6,True,certified-fresh,,Spider-Man: Far From Home (2019),"Jul 2, 2019",$390.7M,2h9m
90,audienceAll,93,4.53,POSITIVE,14596,94123,ALL,87828,6295,False,,upright,Spider-Man: Far From Home (2019),"Jul 2, 2019",$390.7M,2h9m
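
The `box_office_gross_usa` and `runtime` columns arrive as strings such as `$318.3M` and `2h6m`, so they need converting to numbers before earnings can be compared with ratings. The following is a minimal sketch of that conversion. The helper names are hypothetical, and the `K` and `B` suffixes are an assumption, since only `M` appears in the rows shown above.

```python
import re


def parse_box_office(text):
    """Convert a string such as '$318.3M' to US dollars, or None if unparseable."""
    multipliers = {'K': 1e3, 'M': 1e6, 'B': 1e9}  # K and B are assumed suffixes
    match = re.fullmatch(r'\$([\d.]+)([KMB])', text.strip())
    if match is None:
        return None
    return float(match.group(1)) * multipliers[match.group(2)]


def parse_runtime(text):
    """Convert a string such as '2h6m' to minutes, or None if unparseable."""
    match = re.fullmatch(r'(?:(\d+)h)?\s*(?:(\d+)m)?', text.strip())
    if match is None or not any(match.groups()):
        return None
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


print(parse_box_office('$318.3M'), parse_runtime('2h6m'))
```

With pandas, these could be applied column-wise, for example `df['runtime'].map(parse_runtime)`.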
