# Investigating Fandango Movie Ratings

In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest (Fandango is an online movie ratings aggregator).

Fandango displays a 5-star rating system on their website, where the minimum rating is 0 stars and the maximum is 5 stars.

Hickey found a significant discrepancy between the number of stars displayed to users and the actual rating, which he recovered from the HTML of the page. Specifically, he found that:
<ul>
    <li>The actual rating was almost always rounded up to the nearest half-star. For instance, a 4.1 movie would be rounded off to 4.5 stars, not to 4 stars, as you may expect.</li>
    <li>In the case of 8% of the ratings analyzed, the rounding up was done to the nearest whole star. For instance, a 4.5 rating would be rounded off to 5 stars.</li>
    <li>For one movie rating, the rounding off was completely bizarre: from a rating of 4 in the HTML of the page to a displayed rating of 5 stars.</li>
</ul>
Fandango's officials replied that the biased rounding off was caused by a bug in their system rather than being intentional, and they promised to fix the bug as soon as possible. Presumably, this has already happened, although we can't tell for sure since the actual rating value doesn't seem to be displayed anymore in the pages' HTML.
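To make the rounding behaviour Hickey described concrete, here is a minimal sketch that contrasts rounding up to the next half-star with ordinary rounding to the nearest half-star. The function names are hypothetical and this is not Fandango's actual code; it only reproduces the arithmetic of the examples above.

```python
import math


def round_up_to_half(rating):
    """Round a rating up to the next half-star (the behaviour Hickey reported)."""
    return math.ceil(rating * 2) / 2


def round_to_nearest_half(rating):
    """Round a rating to the nearest half-star (what a reader would expect)."""
    return math.floor(rating * 2 + 0.5) / 2


def round_up_to_whole(rating):
    """Round a rating up to the next whole star (reported for 8% of ratings)."""
    return float(math.ceil(rating))


for rating in (4.1, 4.5):
    print(
        rating,
        round_up_to_half(rating),
        round_to_nearest_half(rating),
        round_up_to_whole(rating),
    )
```

For a 4.1 rating, rounding up gives 4.5 stars while rounding to the nearest half-star gives 4.0, which is the gap described in the first bullet point.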

In this project, we'll analyze more recent movie ratings data to determine whether there has been any change in Fandango's rating system after Hickey's analysis.

# Importing libraries

In [10]:
import pandas as pd
import seaborn as sns

# Reading the data

In [11]:
fandango_score_comparison = pd.read_csv("fandango_score_comparison.csv")
fandango_score_comparison.head(5)

Unnamed: 0,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,4.3,...,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
1,Cinderella (2015),85,80,67,7.5,7.1,5.0,4.5,4.25,4.0,...,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5
2,Ant-Man (2015),80,90,64,8.1,7.8,5.0,4.5,4.0,4.5,...,3.9,4.0,4.5,3.0,4.0,4.0,627,103660,12055,0.5
3,Do You Believe? (2015),18,84,22,4.7,5.4,5.0,4.5,0.9,4.2,...,2.7,1.0,4.0,1.0,2.5,2.5,31,3136,1793,0.5
4,Hot Tub Time Machine 2 (2015),14,28,29,3.4,5.1,3.5,3.0,0.7,1.4,...,2.55,0.5,1.5,1.5,1.5,2.5,88,19560,1021,0.5


In [12]:
fandango_score_comparison.describe()

Unnamed: 0,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,Metacritic_norm,...,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
count,146.0,146.0,146.0,146.0,146.0,146.0,146.0,146.0,146.0,146.0,...,146.0,146.0,146.0,146.0,146.0,146.0,146.0,146.0,146.0,146.0
mean,60.849315,63.876712,58.808219,6.519178,6.736986,4.089041,3.845205,3.042466,3.193836,2.940411,...,3.368493,3.065068,3.226027,2.972603,3.270548,3.380137,185.705479,42846.205479,3848.787671,0.243836
std,30.168799,20.02443,19.517389,1.510712,0.958736,0.540386,0.502831,1.50844,1.001222,0.975869,...,0.479368,1.5146,1.007014,0.990961,0.788116,0.502767,316.606515,67406.509171,6357.778617,0.152665
min,5.0,20.0,13.0,2.4,4.0,3.0,2.7,0.25,1.0,0.65,...,2.0,0.5,1.0,0.5,1.0,2.0,4.0,243.0,35.0,0.0
25%,31.25,50.0,43.5,5.7,6.3,3.5,3.5,1.5625,2.5,2.175,...,3.15,1.5,2.5,2.125,3.0,3.0,33.25,5627.0,222.25,0.1
50%,63.5,66.5,59.0,6.85,6.9,4.0,3.9,3.175,3.325,2.95,...,3.45,3.0,3.5,3.0,3.5,3.5,72.5,19103.0,1446.0,0.2
75%,89.0,81.0,75.0,7.5,7.4,4.5,4.2,4.45,4.05,3.75,...,3.7,4.5,4.0,4.0,4.0,3.5,168.5,45185.75,4439.5,0.4
max,100.0,94.0,94.0,9.6,8.6,5.0,4.8,5.0,4.7,4.7,...,4.3,5.0,4.5,4.5,5.0,4.5,2375.0,334164.0,34846.0,0.5


In [15]:
fandango_score_info = ['FILM', 'Fandango_Stars', 'Fandango_Ratingvalue', 'Fandango_votes', 'Fandango_Difference']

In [16]:
fandango_score_comparison[fandango_score_info]

Unnamed: 0,FILM,Fandango_Stars,Fandango_Ratingvalue,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),5.0,4.5,14846,0.5
1,Cinderella (2015),5.0,4.5,12640,0.5
2,Ant-Man (2015),5.0,4.5,12055,0.5
3,Do You Believe? (2015),5.0,4.5,1793,0.5
4,Hot Tub Time Machine 2 (2015),3.5,3.0,1021,0.5
...,...,...,...,...,...
141,Mr. Holmes (2015),4.0,4.0,1348,0.0
142,'71 (2015),3.5,3.5,192,0.0
143,"Two Days, One Night (2014)",3.5,3.5,118,0.0
144,Gett: The Trial of Viviane Amsalem (2015),3.5,3.5,59,0.0


In [13]:
movie_rating = pd.read_csv("movie_ratings_16_17.csv")
movie_rating.head(5)

Unnamed: 0,movie,year,metascore,imdb,tmeter,audience,fandango,n_metascore,n_imdb,n_tmeter,n_audience,nr_metascore,nr_imdb,nr_tmeter,nr_audience
0,10 Cloverfield Lane,2016,76,7.2,90,79,3.5,3.8,3.6,4.5,3.95,4.0,3.5,4.5,4.0
1,13 Hours,2016,48,7.3,50,83,4.5,2.4,3.65,2.5,4.15,2.5,3.5,2.5,4.0
2,A Cure for Wellness,2016,47,6.6,40,47,3.0,2.35,3.3,2.0,2.35,2.5,3.5,2.0,2.5
3,A Dog's Purpose,2017,43,5.2,33,76,4.5,2.15,2.6,1.65,3.8,2.0,2.5,1.5,4.0
4,A Hologram for the King,2016,58,6.1,70,57,3.0,2.9,3.05,3.5,2.85,3.0,3.0,3.5,3.0


In [17]:
movie_rating.describe()

Unnamed: 0,year,metascore,imdb,tmeter,audience,fandango,n_metascore,n_imdb,n_tmeter,n_audience,nr_metascore,nr_imdb,nr_tmeter,nr_audience
count,214.0,214.0,214.0,214.0,214.0,214.0,214.0,214.0,214.0,214.0,214.0,214.0,214.0,214.0
mean,2016.107477,53.266355,6.440654,53.621495,58.626168,3.89486,2.663318,3.220327,2.681075,2.931308,2.658879,3.214953,2.691589,2.915888
std,0.310444,17.843585,1.030056,30.242396,21.10004,0.516781,0.892179,0.515028,1.51212,1.055002,0.924619,0.526803,1.519273,1.060352
min,2016.0,11.0,3.5,0.0,11.0,2.5,0.55,1.75,0.0,0.55,0.5,2.0,0.0,0.5
25%,2016.0,39.0,5.825,27.0,43.25,3.5,1.95,2.9125,1.35,2.1625,2.0,3.0,1.5,2.0
50%,2016.0,53.5,6.5,56.5,60.5,4.0,2.675,3.25,2.825,3.025,2.5,3.0,3.0,3.0
75%,2016.0,66.0,7.2,83.0,76.75,4.5,3.3,3.6,4.15,3.8375,3.5,3.5,4.0,4.0
max,2017.0,99.0,8.5,99.0,93.0,5.0,4.95,4.25,4.95,4.65,5.0,4.0,5.0,4.5


In [18]:
movie_rating_info = ['movie', 'year', 'fandango']

In [19]:
movie_rating[movie_rating_info]

Unnamed: 0,movie,year,fandango
0,10 Cloverfield Lane,2016,3.5
1,13 Hours,2016,4.5
2,A Cure for Wellness,2016,3.0
3,A Dog's Purpose,2017,4.5
4,A Hologram for the King,2016,3.0
...,...,...,...
209,X-Men: Apocalypse,2016,4.0
210,XX,2017,3.0
211,xXx: Return of Xander Cage,2017,4.0
212,Zoolander 2,2016,2.5


# Defining the population

We found that different sampling criteria were used for each dataset.
The dataset **fandango_score_comparison** contains movies that, as of **Aug. 24th, 2015**, had:
<ul>
    <li>At least <b>30</b> fan reviews on Fandango.</li>
    <li>Tickets on sale in 2015.</li>
    <li>Ratings on Rotten Tomatoes and RT User.</li>
    <li>Scores on Metacritic, Metacritic User and IMDb.</li>
</ul>
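The first criterion can be spot-checked against the data. The sketch below is a minimal example that uses only the rows visible in the notebook output above instead of the full CSV; the names `sample`, `MIN_VOTES`, and `meets_threshold` are introduced here for illustration.

```python
import pandas as pd

# Rows copied from the displayed output above; the full dataset has 146 rows.
sample = pd.DataFrame({
    "FILM": [
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "Do You Believe? (2015)",
        "Hot Tub Time Machine 2 (2015)",
        "Mr. Holmes (2015)",
        "'71 (2015)",
        "Two Days, One Night (2014)",
        "Gett: The Trial of Viviane Amsalem (2015)",
    ],
    "Fandango_votes": [14846, 12640, 12055, 1793, 1021, 1348, 192, 118, 59],
})

MIN_VOTES = 30  # threshold stated in the sampling criteria

# Boolean mask: True where a movie meets the documented threshold.
meets_threshold = sample["Fandango_votes"] >= MIN_VOTES

print("All movies meet the threshold:", meets_threshold.all())
print("Movies below the threshold:", sample.loc[~meets_threshold, "FILM"].tolist())
```

The same comparison can be applied to `fandango_score_comparison["Fandango_votes"]`. The `describe()` output above reports a minimum of 35 votes, which is consistent with the criterion.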

The **movie_rating** dataset contains movies that:

<ul>
    <li>Are among the most popular, with a significant number of votes (the exact threshold is not specified).</li>
    <li>Were released in 2016 or 2017.</li>
</ul>

These criteria show that the sampling in both cases was selective rather than random, so not every movie had the same chance of being included in the samples. The resulting samples are therefore very unlikely to be representative of the entire population of interest.

At this point, we can either collect new data or adjust the goal of our analysis, changing the population of interest so that the available samples become representative of it. The second approach is much faster and, in general, more realistic, so we'll take it.

# Analysis Goal

Our initial goal, as stated in the introduction, was to analyze more recent movie ratings data and determine whether there has been any change in Fandango's rating system after Hickey's analysis. As we modify this statement, we have to make sure that the new goal remains a fairly good proxy for the original one.

We see that both samples contain popular movies (even though the quantitative definition of popularity differs between them) that have ratings from the same review sources: Rotten Tomatoes, Metacritic, IMDB, and Fandango (for now, we're interested only in Fandango's ratings). The dataset from before Hickey's analysis (the fandango_score_comparison dataframe) relates to **2015**, while the one from after it (movie_rating) relates to **2016 and 2017**.

We can therefore exclude the data for **2017** from the second dataset and **compare only the year before the analysis (2015) with the year after it (2016)**.

**In this way, our new project goal is to analyze popular movie ratings for 2015 and determine whether Fandango changed their rating system for popular movies in 2016.**
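One way to isolate the two years is sketched below. It assumes that the release year in the first dataset is the four-digit number in parentheses at the end of each `FILM` title, as it is in the rows displayed above. The frames `before` and `after` are small stand-ins built from those displayed rows, and the `Year` column and the `fandango_2015` and `fandango_2016` names are introduced here for illustration.

```python
import pandas as pd

# Small stand-ins for the two datasets, using rows shown in the output above.
before = pd.DataFrame({
    "FILM": [
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Two Days, One Night (2014)",
    ],
    "Fandango_Stars": [5.0, 5.0, 3.5],
})
after = pd.DataFrame({
    "movie": ["10 Cloverfield Lane", "A Dog's Purpose", "Zoolander 2"],
    "year": [2016, 2017, 2016],
    "fandango": [3.5, 4.5, 2.5],
})

# Assumption: the year is the "(YYYY)" suffix at the end of each title.
before["Year"] = (
    before["FILM"].str.extract(r"\((\d{4})\)\s*$", expand=False).astype(int)
)

# Keep only the movies from the year of interest in each dataset.
fandango_2015 = before[before["Year"] == 2015].copy()
fandango_2016 = after[after["year"] == 2016].copy()

print(fandango_2015[["FILM", "Fandango_Stars"]])
print(fandango_2016[["movie", "fandango"]])
```

On the full data, the same two filters would be applied to `fandango_score_comparison` and `movie_rating` respectively.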