# Movie Review Analysis

__We analyze movie reviews to display the average rating for each viewer and each movie. First, we read in the CSV file containing the ratings.__

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the ratings; the first column (viewer names) becomes the row index
reviews = pd.read_csv('reviews.csv', index_col=0)
reviews
```

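| | Peter Rabbit | Black Panther | The Post | Molly's Game | A Wrinkle In Time | Game Night |
|---|---|---|---|---|---|---|
| Matthew | NaN | 5 | 3.0 | 4.0 | NaN | 4.0 |
| John | NaN | 4 | NaN | NaN | NaN | 2.0 |
| Diana | 4.0 | 3 | 5.0 | NaN | 5.0 | 3.0 |
| Cynthia | 3.0 | 5 | NaN | 4.0 | NaN | NaN |
| Christina | 4.0 | 5 | NaN | 3.0 | 4.0 | NaN |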


__We see that some values are missing, since not every viewer has seen every movie. We leave them as NaN rather than replacing them with 0, because zeros would drag the means down. (By default, pandas skips NaN values when computing the mean.)__
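
To see how much a 0-fill would distort the averages, here is a minimal sketch that rebuilds the ratings table above inline (so it runs without `reviews.csv`) and compares the default NaN-skipping mean with the mean computed after `fillna(0)`.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Ratings copied from the table above; NaN marks a movie the viewer has not seen
reviews = pd.DataFrame(
    [
        [nan, 5, 3, 4, nan, 4],      # Matthew
        [nan, 4, nan, nan, nan, 2],  # John
        [4, 3, 5, nan, 5, 3],        # Diana
        [3, 5, nan, 4, nan, nan],    # Cynthia
        [4, 5, nan, 3, 4, nan],      # Christina
    ],
    index=["Matthew", "John", "Diana", "Cynthia", "Christina"],
    columns=["Peter Rabbit", "Black Panther", "The Post",
             "Molly's Game", "A Wrinkle In Time", "Game Night"],
)

# Default behavior (skipna=True): each mean uses only the ratings that exist
skip_nan = reviews.mean(axis=1)

# Treating "not seen" as a 0-star rating pulls every incomplete row down
zero_filled = reviews.fillna(0).mean(axis=1)

print(pd.DataFrame({"skip NaN": skip_nan, "NaN as 0": zero_filled}))
```

For John, for example, the average falls from 3.0 (his two ratings divided by two) to 1.0 (the same two ratings divided by six).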

__Next, we display the average rating for each viewer (across each row, `axis=1`) and for each movie (down each column, `axis=0`).__

```python
# Average rating per viewer: mean across each row
reviews.mean(axis=1)
```

```
Matthew      4.0
John         3.0
Diana        4.0
Cynthia      4.0
Christina    4.0
dtype: float64
```

```python
# Average rating per movie: mean down each column
reviews.mean(axis=0)
```

```
Peter Rabbit         3.666667
Black Panther        4.400000
The Post             4.000000
Molly's Game         3.666667
A Wrinkle In Time    4.500000
Game Night           3.000000
dtype: float64
```

__From this, we see that "A Wrinkle In Time" has the highest average rating (4.5), and John is the harshest critic with an average of 3.0; every other viewer averages exactly 4.0.__
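
Both conclusions can also be read off programmatically. In this sketch (table rebuilt inline), `idxmax`/`idxmin` return the label of the largest/smallest mean, and dividing `sum()` by `count()`, both of which ignore NaN, reproduces `mean()`, which makes concrete what "ignoring NaN" means.

```python
import numpy as np
import pandas as pd

nan = np.nan
reviews = pd.DataFrame(
    [
        [nan, 5, 3, 4, nan, 4],      # Matthew
        [nan, 4, nan, nan, nan, 2],  # John
        [4, 3, 5, nan, 5, 3],        # Diana
        [3, 5, nan, 4, nan, nan],    # Cynthia
        [4, 5, nan, 3, 4, nan],      # Christina
    ],
    index=["Matthew", "John", "Diana", "Cynthia", "Christina"],
    columns=["Peter Rabbit", "Black Panther", "The Post",
             "Molly's Game", "A Wrinkle In Time", "Game Night"],
)

movie_means = reviews.mean(axis=0)
viewer_means = reviews.mean(axis=1)

# Label of the highest-rated movie and of the viewer with the lowest average
top_movie = movie_means.idxmax()
harshest = viewer_means.idxmin()
print(top_movie, movie_means[top_movie])  # A Wrinkle In Time 4.5
print(harshest, viewer_means[harshest])   # John 3.0

# NaN-skipping mean = (sum of existing ratings) / (number of existing ratings)
manual = reviews.sum(axis=0) / reviews.count(axis=0)
print(np.allclose(manual, movie_means))   # True
```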


__Next, we normalize each movie's ratings with min-max scaling, using the following equation:__


![normalization formula](norm.png "Formula")
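
Written out in LaTeX (the symbols $v$, $m$, $u$ are our own notation, not taken from the image), the next cell computes the following for viewer $v$ and movie $m$. The minimum and maximum run over the viewers $u$ who rated movie $m$, because `reviews.min()` and `reviews.max()` reduce each column separately and skip NaN:

```latex
x'_{v,m} = \frac{x_{v,m} - \min_{u} x_{u,m}}{\max_{u} x_{u,m} - \min_{u} x_{u,m}}
```

Each movie's lowest rating therefore becomes 0 and its highest becomes 1. If all of a movie's ratings were identical (for example, a movie with a single rating), the denominator would be 0 and pandas would return NaN for that movie.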

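```python
# Min-max normalization: reviews.min() and reviews.max() are computed per column (movie),
# so each movie's lowest rating maps to 0 and its highest to 1; NaN entries stay NaN
normalized = (reviews - reviews.min()) / (reviews.max() - reviews.min())
normalized
```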

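| | Peter Rabbit | Black Panther | The Post | Molly's Game | A Wrinkle In Time | Game Night |
|---|---|---|---|---|---|---|
| Matthew | NaN | 1.0 | 0.0 | 1.0 | NaN | 1.0 |
| John | NaN | 0.5 | NaN | NaN | NaN | 0.0 |
| Diana | 1.0 | 0.0 | 1.0 | NaN | 1.0 | 0.5 |
| Cynthia | 0.0 | 1.0 | NaN | 1.0 | NaN | NaN |
| Christina | 1.0 | 1.0 | NaN | 0.0 | 0.0 | NaN |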
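
As a sanity check on the normalized table, here is the "Game Night" column worked by hand, in a sketch with its three existing ratings typed in directly. Its minimum is 2 and its maximum is 4, so Matthew's 4 maps to (4 - 2) / (4 - 2) = 1, John's 2 maps to 0, and Diana's 3 maps to 0.5.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Only the Game Night column from the ratings table; NaN = not seen
game_night = pd.Series(
    [4, 2, 3, nan, nan],
    index=["Matthew", "John", "Diana", "Cynthia", "Christina"],
    name="Game Night",
)

# min() and max() skip NaN, so the range comes from the three real ratings
lo, hi = game_night.min(), game_night.max()
scaled = (game_night - lo) / (hi - lo)
print(scaled)
```

The NaN entries stay NaN because arithmetic on NaN propagates it, which is why the gaps in the normalized table line up with the gaps in the original ratings.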


__Next, we display the normalized average ratings for each viewer and each movie.__

```python
# Average normalized rating per viewer (across each row)
normalized.mean(axis=1)
```

```
Matthew      0.750000
John         0.250000
Diana        0.700000
Cynthia      0.666667
Christina    0.500000
dtype: float64
```

```python
# Average normalized rating per movie (down each column)
normalized.mean(axis=0)
```

```
Peter Rabbit         0.666667
Black Panther        0.700000
The Post             0.500000
Molly's Game         0.666667
A Wrinkle In Time    0.500000
Game Night           0.500000
dtype: float64
```

__In the normalized data, John is still clearly the harshest critic (0.25), but the other viewers, who were all tied at 4.0 in the raw data, now range from 0.50 (Christina) to 0.75 (Matthew).__
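
To make the viewer comparison explicit, this sketch (table rebuilt inline) places the raw and normalized viewer averages side by side and ranks them, with rank 1 for the highest average. `method="min"` gives tied viewers the same, smallest rank number in their tie.

```python
import numpy as np
import pandas as pd

nan = np.nan
reviews = pd.DataFrame(
    [
        [nan, 5, 3, 4, nan, 4],      # Matthew
        [nan, 4, nan, nan, nan, 2],  # John
        [4, 3, 5, nan, 5, 3],        # Diana
        [3, 5, nan, 4, nan, nan],    # Cynthia
        [4, 5, nan, 3, 4, nan],      # Christina
    ],
    index=["Matthew", "John", "Diana", "Cynthia", "Christina"],
    columns=["Peter Rabbit", "Black Panther", "The Post",
             "Molly's Game", "A Wrinkle In Time", "Game Night"],
)

# Same column-wise min-max scaling as in the notebook
normalized = (reviews - reviews.min()) / (reviews.max() - reviews.min())

viewers = pd.DataFrame({
    "raw mean": reviews.mean(axis=1),
    "normalized mean": normalized.mean(axis=1),
})
# Rank 1 = highest average; tied viewers share the smallest rank in their tie
viewers["raw rank"] = viewers["raw mean"].rank(ascending=False, method="min")
viewers["normalized rank"] = viewers["normalized mean"].rank(ascending=False, method="min")
print(viewers.round(3))
```

In the raw data four viewers share rank 1, while after normalization every viewer gets a distinct rank.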

__Also, "Black Panther" now appears to be the highest-rated movie (0.70, versus 0.50 for "A Wrinkle In Time"). The difference seems to arise because every viewer saw "Black Panther" and it received three 5-star ratings, whereas only two viewers saw "A Wrinkle In Time" and it received only one 5-star rating. For a dataset like this one, where not every viewer has rated every movie, normalization seems to give a more accurate picture than the raw averages.__
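
The explanation above can be checked directly. This sketch (table rebuilt inline) counts, for each movie, how many viewers rated it and how many 5-star ratings it received, next to its raw and normalized means.

```python
import numpy as np
import pandas as pd

nan = np.nan
reviews = pd.DataFrame(
    [
        [nan, 5, 3, 4, nan, 4],      # Matthew
        [nan, 4, nan, nan, nan, 2],  # John
        [4, 3, 5, nan, 5, 3],        # Diana
        [3, 5, nan, 4, nan, nan],    # Cynthia
        [4, 5, nan, 3, 4, nan],      # Christina
    ],
    index=["Matthew", "John", "Diana", "Cynthia", "Christina"],
    columns=["Peter Rabbit", "Black Panther", "The Post",
             "Molly's Game", "A Wrinkle In Time", "Game Night"],
)
normalized = (reviews - reviews.min()) / (reviews.max() - reviews.min())

summary = pd.DataFrame({
    "ratings": reviews.count(),         # non-NaN entries per movie
    "5-star": (reviews == 5).sum(),     # NaN == 5 is False, so gaps are not counted
    "raw mean": reviews.mean(),
    "normalized mean": normalized.mean(),
})
print(summary.round(3))
```

"Black Panther" is the only movie rated by all five viewers and the only one with more than one 5-star rating.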