# Investigating Fandango Movie Ratings

This project looks at Fandango's movie ratings. In 2015, Hickey published an analysis of Fandango's rating system for FiveThirtyEight, and the data behind that analysis is linked below.

Our goal is to determine whether there has been any change in Fandango's rating system after Hickey's analysis.

`fandango_score_comparison.csv` contains [the data used for the analysis](https://github.com/fivethirtyeight/data/tree/master/fandango), obtained in 2015.<br>
`movie_ratings_16_17.csv` contains movie ratings [data for 214 of the most popular movies released in 2016 and 2017](https://github.com/mircealex/Movie_ratings_2016_17), collected after the article was published.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

pd.options.display.max_columns = 100

## Exploratory analysis

In [2]:
before = pd.read_csv('fandango_score_comparison.csv')
print('{} rows x {} columns'.format(before.shape[0], before.shape[1]))
before.head()

146 rows x 22 columns


Unnamed: 0,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,Metacritic_norm,Metacritic_user_nom,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,4.3,3.3,3.55,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
1,Cinderella (2015),85,80,67,7.5,7.1,5.0,4.5,4.25,4.0,3.35,3.75,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5
2,Ant-Man (2015),80,90,64,8.1,7.8,5.0,4.5,4.0,4.5,3.2,4.05,3.9,4.0,4.5,3.0,4.0,4.0,627,103660,12055,0.5
3,Do You Believe? (2015),18,84,22,4.7,5.4,5.0,4.5,0.9,4.2,1.1,2.35,2.7,1.0,4.0,1.0,2.5,2.5,31,3136,1793,0.5
4,Hot Tub Time Machine 2 (2015),14,28,29,3.4,5.1,3.5,3.0,0.7,1.4,1.45,1.7,2.55,0.5,1.5,1.5,1.5,2.5,88,19560,1021,0.5


In [3]:
after = pd.read_csv('movie_ratings_16_17.csv')
print('{} rows x {} columns'.format(after.shape[0], after.shape[1]))
after.head()

214 rows x 15 columns


Unnamed: 0,movie,year,metascore,imdb,tmeter,audience,fandango,n_metascore,n_imdb,n_tmeter,n_audience,nr_metascore,nr_imdb,nr_tmeter,nr_audience
0,10 Cloverfield Lane,2016,76,7.2,90,79,3.5,3.8,3.6,4.5,3.95,4.0,3.5,4.5,4.0
1,13 Hours,2016,48,7.3,50,83,4.5,2.4,3.65,2.5,4.15,2.5,3.5,2.5,4.0
2,A Cure for Wellness,2016,47,6.6,40,47,3.0,2.35,3.3,2.0,2.35,2.5,3.5,2.0,2.5
3,A Dog's Purpose,2017,43,5.2,33,76,4.5,2.15,2.6,1.65,3.8,2.0,2.5,1.5,4.0
4,A Hologram for the King,2016,58,6.1,70,57,3.0,2.9,3.05,3.5,2.85,3.0,3.0,3.5,3.0


We can isolate the columns that offer information about Fandango's ratings in separate variables so we have just the data we're interested in more readily available for later use.

In [4]:
fandango_before = before[['FILM', 'Fandango_Stars', 'Fandango_Ratingvalue', 'Fandango_votes', 'Fandango_Difference']].copy()
fandango_before.head()

Unnamed: 0,FILM,Fandango_Stars,Fandango_Ratingvalue,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),5.0,4.5,14846,0.5
1,Cinderella (2015),5.0,4.5,12640,0.5
2,Ant-Man (2015),5.0,4.5,12055,0.5
3,Do You Believe? (2015),5.0,4.5,1793,0.5
4,Hot Tub Time Machine 2 (2015),3.5,3.0,1021,0.5


In [5]:
fandango_after = after[['movie', 'year', 'fandango']].copy()
fandango_after.head()

Unnamed: 0,movie,year,fandango
0,10 Cloverfield Lane,2016,3.5
1,13 Hours,2016,4.5
2,A Cure for Wellness,2016,3.0
3,A Dog's Purpose,2017,4.5
4,A Hologram for the King,2016,3.0


Hickey picked 146 films that were in theaters in 2015 and had 30 or more reviews on Fandango. Dataquest's dataset contains 214 movies released in 2016 and 2017 with a significant number of reviews, without specifying how many. Because both samples were drawn using defined criteria, not every movie had a chance of being picked, so these are not random samples. Therefore, they are unlikely to be representative of the population (all films in Fandango's database).

## Change of goal

Because the samples are unlikely to be representative of all films in Fandango's database, we can't use them to determine whether Fandango's rating system changed after Hickey's analysis. Our new goal is instead to determine whether there's any difference between Fandango's ratings for popular movies in 2015 and Fandango's ratings for popular movies in 2016. To do this, we need to reassess our samples.

## Preparing the samples

With the new goal, we now have two populations that we want to describe and compare with each other:

1. All of Fandango's ratings for popular movies released in 2015.
2. All of Fandango's ratings for popular movies released in 2016.
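
One way to narrow each data set to a single release year is sketched below. It is only an illustration, not necessarily the approach taken later in this notebook: it uses a few rows copied from the previews above in place of the full files, assumes every `FILM` title ends with the year in parentheses (as the previewed rows do), and introduces a hypothetical `Year` column along with the names `fandango_2015` and `fandango_2016`.

```python
import pandas as pd

# A few rows copied from the previews above, standing in for the full data sets.
fandango_before = pd.DataFrame({
    'FILM': ['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)'],
    'Fandango_Stars': [5.0, 5.0, 5.0],
})
fandango_after = pd.DataFrame({
    'movie': ['10 Cloverfield Lane', '13 Hours', "A Dog's Purpose"],
    'year': [2016, 2016, 2017],
    'fandango': [3.5, 4.5, 4.5],
})

# The 2015 data has no year column, so take the four digits from the
# trailing "(YYYY)" in the title. 'Year' is a hypothetical column name.
fandango_before['Year'] = (
    fandango_before['FILM']
    .str.extract(r'\((\d{4})\)\s*$', expand=False)
    .astype(int)
)

# Keep only the release year each population is defined on.
fandango_2015 = fandango_before[fandango_before['Year'] == 2015].copy()
fandango_2016 = fandango_after[fandango_after['year'] == 2016].copy()

print(len(fandango_2015), len(fandango_2016))
```

If any title in the full file lacked a trailing year, `astype(int)` would raise an error, which makes the assumption easy to verify on the real data.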

The term "popular" is vague, so we need to define it precisely before continuing. We'll use Hickey's benchmark of 30 fan ratings and consider a movie "popular" only if it has 30 or more fan ratings on Fandango's website.
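
For the 2015 data, this definition can be checked directly against the `Fandango_votes` column. The snippet below is a minimal sketch that uses the five previewed rows instead of the full file; the constant `POPULARITY_THRESHOLD` and the mask `is_popular` are names introduced here for illustration.

```python
import pandas as pd

# Hickey's benchmark: at least 30 fan ratings (hypothetical constant name).
POPULARITY_THRESHOLD = 30

# The five previewed rows, standing in for the full 2015 data set.
fandango_before = pd.DataFrame({
    'FILM': [
        'Avengers: Age of Ultron (2015)',
        'Cinderella (2015)',
        'Ant-Man (2015)',
        'Do You Believe? (2015)',
        'Hot Tub Time Machine 2 (2015)',
    ],
    'Fandango_votes': [14846, 12640, 12055, 1793, 1021],
})

# Boolean mask: True where a movie meets the popularity definition.
is_popular = fandango_before['Fandango_votes'] >= POPULARITY_THRESHOLD

print(is_popular.all())                         # do all rows qualify?
print(fandango_before['Fandango_votes'].min())  # smallest vote count in the sample
```

Running the same check on the full `fandango_before` frame would show whether every movie in Hickey's sample meets the threshold.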

The `fandango_after` dataset doesn't include the number of fan ratings, so let's select a random sample from it and manually check the number of ratings on Fandango's website.

In [6]:
fandango_after.sample(10, random_state=0)

Unnamed: 0,movie,year,fandango
197,The Take (Bastille Day),2016,4.0
37,Come and Find Me,2016,4.0
89,Kickboxer,2016,4.0
176,The Founder,2016,4.0
170,The Darkness,2016,2.5
75,Ice Age: Collision Course,2016,4.0
96,Lion,2016,4.0
137,Ride Along 2,2016,4.0
5,A Monster Calls,2016,4.0
83,Jane Got a Gun,2016,3.5


In [12]:
%%html

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-amwm{font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <td class="tg-amwm">Movie</td>
    <td class="tg-amwm">Ratings</td>
  </tr>
  <tr>
    <td class="tg-0lax">The Take (Bastille Day)</td>
    <td class="tg-0lax">3792</td>
  </tr>
  <tr>
    <td class="tg-0lax">Come and Find Me</td>
    <td class="tg-0lax">551</td>
  </tr>
  <tr>
    <td class="tg-0lax">Kickboxer</td>
    <td class="tg-0lax">1106</td>
  </tr>
  <tr>
    <td class="tg-0lax">The Founder*</td>
    <td class="tg-0lax">32203</td>
  </tr>
  <tr>
    <td class="tg-0lax">The Darkness</td>
    <td class="tg-0lax">6671</td>
  </tr>
  <tr>
    <td class="tg-0lax">Ice Age: Collision Course</td>
    <td class="tg-0lax">32999</td>
  </tr>
  <tr>
    <td class="tg-0lax">Lion</td>
    <td class="tg-0lax">39603</td>
  </tr>
  <tr>
    <td class="tg-0lax">Ride Along 2</td>
    <td class="tg-0lax">35016</td>
  </tr>
  <tr>
    <td class="tg-0lax">A Monster Calls*</td>
    <td class="tg-0lax">25202</td>
  </tr>
  <tr>
    <td class="tg-0lax">Jane Got a Gun</td>
    <td class="tg-0lax">14039</td>
  </tr>
</table>

*Movies released in 2017, not in 2016 as stated in the dataset
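
The manually collected counts can also be compared against the 30-rating benchmark in code. This is a small sketch that retypes the figures from the table above; the frame `manual_counts` and its column names are introduced here for illustration.

```python
import pandas as pd

# Fan rating counts copied by hand from the table above.
manual_counts = pd.DataFrame({
    'movie': [
        'The Take (Bastille Day)', 'Come and Find Me', 'Kickboxer',
        'The Founder', 'The Darkness', 'Ice Age: Collision Course',
        'Lion', 'Ride Along 2', 'A Monster Calls', 'Jane Got a Gun',
    ],
    'fan_ratings': [3792, 551, 1106, 32203, 6671, 32999, 39603, 35016, 25202, 14039],
})

# Flag movies that meet the benchmark of 30 fan ratings.
manual_counts['popular'] = manual_counts['fan_ratings'] >= 30

# The mean of a boolean column is the share of True values.
share_popular = manual_counts['popular'].mean()
print('{:.0%} of the sampled movies have 30 or more fan ratings'.format(share_popular))
```

With only ten movies checked, this share is a rough indication and says little about the remaining movies in the dataset.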

0,1
Movie,Ratings
The Take (Bastille Day),3792
Come and Find Me,551
Kickboxer,1106
The Founder*,32203
The Darkness,6671
Ice Age: Collision Course,32999
Lion,39603
Ride Along 2,35016
A Monster Calls*,25202
