# Introduction 

- In October 2015, data journalist Walt Hickey analyzed movie ratings data and found strong evidence that Fandango's rating system was biased and dishonest (Fandango is an online movie ratings aggregator).
- Hickey found a significant discrepancy between the number of stars displayed to users and the actual rating, which he found in the HTML of the page. Specifically:

1. The actual rating was almost always rounded up to the nearest half-star. For instance, a 4.1 movie would be rounded up to 4.5 stars, not down to 4 stars, as you might expect.
2. For 8% of the ratings analyzed, the rating was rounded up to the nearest whole star. For instance, a 4.5 rating would be rounded up to 5 stars.
3. For one movie, the rounding was completely bizarre: a rating of 4 in the HTML of the page was displayed as 5 stars.
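
To make the first finding concrete, here is a minimal sketch contrasting "always round up to the nearest half-star" with ordinary rounding to the nearest half-star. The function names `round_up_to_half` and `round_to_nearest_half` are hypothetical helpers for illustration; they are not taken from Hickey's analysis or from Fandango's code.

```python
import math


def round_up_to_half(rating):
    """Round a rating up to the next half-star (the behaviour Hickey observed)."""
    return math.ceil(rating * 2) / 2


def round_to_nearest_half(rating):
    """Round a rating to the nearest half-star (what a user would likely expect)."""
    return math.floor(rating * 2 + 0.5) / 2


rating = 4.1
print(round_up_to_half(rating))       # 4.5
print(round_to_nearest_half(rating))  # 4.0
```

For a rating of 4.1, the two rules differ by half a star, which is the kind of gap described in the list above.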

- In this project, we'll analyze more recent movie ratings data to determine whether there has been any change in Fandango's rating system after Hickey's analysis.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

fandango = pd.read_csv('Downloads/Datasets/fandango_score_comparison.csv')
ratings_16_17 = pd.read_csv('Downloads/Datasets/movie_ratings_16_17.csv')

In [2]:
print(fandango.head())

                             FILM  RottenTomatoes  RottenTomatoes_User  \
0  Avengers: Age of Ultron (2015)              74                   86   
1               Cinderella (2015)              85                   80   
2                  Ant-Man (2015)              80                   90   
3          Do You Believe? (2015)              18                   84   
4   Hot Tub Time Machine 2 (2015)              14                   28   

   Metacritic  Metacritic_User  IMDB  Fandango_Stars  Fandango_Ratingvalue  \
0          66              7.1   7.8             5.0                   4.5   
1          67              7.5   7.1             5.0                   4.5   
2          64              8.1   7.8             5.0                   4.5   
3          22              4.7   5.4             5.0                   4.5   
4          29              3.4   5.1             3.5                   3.0   

   RT_norm  RT_user_norm  ...  IMDB_norm  RT_norm_round  RT_user_norm_round  \
0     3

In [3]:
print(ratings_16_17.head())

                     movie  year  metascore  imdb  tmeter  audience  fandango  \
0      10 Cloverfield Lane  2016         76   7.2      90        79       3.5   
1                 13 Hours  2016         48   7.3      50        83       4.5   
2      A Cure for Wellness  2016         47   6.6      40        47       3.0   
3          A Dog's Purpose  2017         43   5.2      33        76       4.5   
4  A Hologram for the King  2016         58   6.1      70        57       3.0   

   n_metascore  n_imdb  n_tmeter  n_audience  nr_metascore  nr_imdb  \
0         3.80    3.60      4.50        3.95           4.0      3.5   
1         2.40    3.65      2.50        4.15           2.5      3.5   
2         2.35    3.30      2.00        2.35           2.5      3.5   
3         2.15    2.60      1.65        3.80           2.0      2.5   
4         2.90    3.05      3.50        2.85           3.0      3.0   

   nr_tmeter  nr_audience  
0        4.5          4.0  
1        2.5          4.0  
2 

In [8]:
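# Keep only the columns relevant to Fandango's ratings.
# .copy() gives independent frames and avoids SettingWithCopyWarning on later edits.

before = fandango[['FILM', 'Fandango_Stars', 'Fandango_Ratingvalue', 'Fandango_votes', 'Fandango_Difference']].copy()
after = ratings_16_17[['movie', 'year', 'fandango']].copy()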

In [9]:
before.describe(include='all')

| | FILM | Fandango_Stars | Fandango_Ratingvalue | Fandango_votes | Fandango_Difference |
| --- | --- | --- | --- | --- | --- |
| count | 146 | 146.0 | 146.0 | 146.0 | 146.0 |
| unique | 146 | | | | |
| top | Avengers: Age of Ultron (2015) | | | | |
| freq | 1 | | | | |
| mean | | 4.089041 | 3.845205 | 3848.787671 | 0.243836 |
| std | | 0.540386 | 0.502831 | 6357.778617 | 0.152665 |
| min | | 3.0 | 2.7 | 35.0 | 0.0 |
| 25% | | 3.5 | 3.5 | 222.25 | 0.1 |
| 50% | | 4.0 | 3.9 | 1446.0 | 0.2 |
| 75% | | 4.5 | 4.2 | 4439.5 | 0.4 |


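The minimum value of Fandango_Stars is 3.0, so no movie in this sample was displayed with fewer than 3 stars; judging by Fandango's displayed ratings, there are effectively no bad movies. Fandango_Difference also averages about 0.24 stars, which is consistent with the rounding-up that Hickey described and points to a strong upward bias.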
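
As a rough sketch of how the gap between displayed and underlying ratings could be checked row by row, the snippet below rebuilds only the first five rows printed by `fandango.head()` earlier. It assumes that `Fandango_Difference` is the displayed stars minus `Fandango_Ratingvalue`; the column name `difference` and the variable `star_shares` are introduced here for illustration only.

```python
import pandas as pd

# First five rows as printed by fandango.head(); the full dataset has 146 movies.
sample = pd.DataFrame({
    "FILM": [
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "Do You Believe? (2015)",
        "Hot Tub Time Machine 2 (2015)",
    ],
    "Fandango_Stars": [5.0, 5.0, 5.0, 5.0, 3.5],
    "Fandango_Ratingvalue": [4.5, 4.5, 4.5, 4.5, 3.0],
})

# Assumption: the gap is displayed stars minus the underlying rating value.
sample["difference"] = sample["Fandango_Stars"] - sample["Fandango_Ratingvalue"]

# Percentage of movies at each displayed star level.
star_shares = sample["Fandango_Stars"].value_counts(normalize=True).sort_index() * 100

print(sample[["FILM", "difference"]])
print(star_shares)
```

Running the same two expressions on the full `before` frame would give the distribution for all 146 movies; the five-row sample here only illustrates the mechanics and should not be read as a result.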

In [10]:
after.describe(include='all')

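| | movie | year | fandango |
| --- | --- | --- | --- |
| count | 214 | 214.0 | 214.0 |
| unique | 214 | | |
| top | 10 Cloverfield Lane | | |
| freq | 1 | | |
| mean | | 2016.107477 | 3.89486 |
| std | | 0.310444 | 0.516781 |
| min | | 2016.0 | 2.5 |
| 25% | | 2016.0 | 3.5 |
| 50% | | 2016.0 | 4.0 |
| 75% | | 2016.0 | 4.5 |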


Even the later ratings of popular movies on Fandango suggest a bias: the median is 4.0, so at least 50% of the movies are rated 4 stars or above.
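
For a first, rough comparison of the two periods, the summary statistics reported in the two `describe()` outputs can be set side by side. This sketch only does arithmetic on those printed numbers. It assumes that `Fandango_Stars` and `fandango` both hold the displayed star rating and are therefore comparable; the dictionary names are hypothetical.

```python
# Summary statistics copied from the two describe() outputs.
before_stats = {"count": 146, "mean": 4.089041, "min": 3.0, "median": 4.0}  # Fandango_Stars
after_stats = {"count": 214, "mean": 3.89486, "min": 2.5, "median": 4.0}    # fandango

# Difference in mean and minimum displayed rating between the two samples.
mean_drop = before_stats["mean"] - after_stats["mean"]
min_drop = before_stats["min"] - after_stats["min"]

print(f"Mean displayed rating fell by {mean_drop:.3f} stars")
print(f"Minimum displayed rating fell by {min_drop:.1f} stars")
print(f"Median unchanged: {before_stats['median'] == after_stats['median']}")
```

On these numbers the mean and the minimum are somewhat lower in the later sample while the median stays at 4.0. Whether that reflects a real change in the rating system depends on how comparable the two samples are, which this arithmetic alone cannot establish.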