# Movie Ratings

## Average, Normalized & Standardized

### Task:

Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movies that they have seen on a scale of 1 to 5. There should be at least one movie that not everyone has seen!
Take the results (observations) and store them somewhere (like a SQL database, or a .CSV file). Load the information into a pandas dataframe. Your solution should include Python and pandas code that accomplishes the following:
1. Load the ratings by user information that you collected into a pandas dataframe.
2. Show the average ratings for each user and each movie.
3. Create a new pandas dataframe, with normalized ratings for each user. Again, show the average ratings for each user and each movie.
4. Provide a text-based conclusion: explain what might be advantages and disadvantages of using normalized ratings instead of the actual ratings.
5. [Extra credit] Create another new pandas dataframe, with standardized ratings for each user. Once again, show the average ratings for each user and each movie.

In [3]:
import pandas as pd
import numpy as np

##### Load ratings in a Pandas dataframe:

In [4]:
movie_ratings = pd.read_csv(r'C:\Users\DJEli112\Desktop\CUNY SPS\Spring 2017\IS 362\MovieRatings.csv', index_col=0)

##### Use .head() to see the first five rows of ratings table:

In [5]:
movie_ratings.head()

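| | Logan | Get Out | Beauty and the Beast | The Lego Batman Movie | Fifty Shades Darker | Kong: Skull Island |
|---|---|---|---|---|---|---|
| Elene | 4.0 | NaN | 5.0 | 4.0 | 2.0 | 1.0 |
| Ximena | NaN | 4.0 | 4.0 | 5.0 | 3.0 | NaN |
| Jess | 4.0 | NaN | 4.0 | 4.0 | NaN | 4.0 |
| Rebecca | 5.0 | 2.0 | NaN | 3.0 | 3.0 | 3.0 |
| Hillary | NaN | 2.0 | 3.0 | NaN | 2.0 | 2.0 |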
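
For readers who do not have the CSV file, the same table can be rebuilt directly in code. This is only a sketch: the values are retyped from the output above, `np.nan` stands for a movie the person has not seen, and `ratings_per_user` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the table above; np.nan = movie not seen.
movie_ratings = pd.DataFrame(
    {
        "Logan": [4, np.nan, 4, 5, np.nan],
        "Get Out": [np.nan, 4, np.nan, 2, 2],
        "Beauty and the Beast": [5, 4, 4, np.nan, 3],
        "The Lego Batman Movie": [4, 5, 4, 3, np.nan],
        "Fifty Shades Darker": [2, 3, np.nan, 3, 2],
        "Kong: Skull Island": [1, np.nan, 4, 3, 2],
    },
    index=["Elene", "Ximena", "Jess", "Rebecca", "Hillary"],
)

# How many movies each person actually rated (NaN is not counted).
ratings_per_user = movie_ratings.count(axis=1)
print(ratings_per_user)
```

Because every column contains at least one `NaN`, pandas stores the ratings as floats, which matches the `4.0`-style values in the output above.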


### Average Ratings

##### Show average ratings for each movie:

In [22]:
avg_movie_ratings = movie_ratings.mean()
avg_movie_ratings

Logan                    4.333333
Get Out                  2.666667
Beauty and the Beast     4.000000
The Lego Batman Movie    4.000000
Fifty Shades Darker      2.500000
Kong: Skull Island       2.500000
dtype: float64

##### Show average ratings for each user:

In [33]:
avg_user_ratings = movie_ratings.mean(axis=1)
avg_user_ratings

Elene      3.20
Ximena     4.00
Jess       4.00
Rebecca    3.20
Hillary    2.25
dtype: float64
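
These averages are taken only over the movies each person has rated, because `mean` skips `NaN` by default (`skipna=True`). The sketch below illustrates that behaviour with the ratings retyped from the table; `manual_avg` and `avg_if_unseen_were_zero` are names introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the table above; np.nan = movie not seen.
movie_ratings = pd.DataFrame(
    {
        "Logan": [4, np.nan, 4, 5, np.nan],
        "Get Out": [np.nan, 4, np.nan, 2, 2],
        "Beauty and the Beast": [5, 4, 4, np.nan, 3],
        "The Lego Batman Movie": [4, 5, 4, 3, np.nan],
        "Fifty Shades Darker": [2, 3, np.nan, 3, 2],
        "Kong: Skull Island": [1, np.nan, 4, 3, 2],
    },
    index=["Elene", "Ximena", "Jess", "Rebecca", "Hillary"],
)

# mean() ignores NaN, so it equals (sum of ratings) / (number of ratings given).
avg_user_ratings = movie_ratings.mean(axis=1)
manual_avg = movie_ratings.sum(axis=1) / movie_ratings.count(axis=1)

# Treating an unseen movie as a rating of 0 would drag the average down.
avg_if_unseen_were_zero = movie_ratings.fillna(0).mean(axis=1)

print(avg_user_ratings["Hillary"])         # 2.25 (9 points over 4 rated movies)
print(avg_if_unseen_were_zero["Hillary"])  # 1.5  (9 points over all 6 movies)
```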

### Normalized Ratings

##### Apply the min-max normalization formula to each movie column and view new values in a pandas dataframe:

In [28]:
normalized_ratings = (movie_ratings - movie_ratings.min()) / (movie_ratings.max() - movie_ratings.min())
normalized_ratings

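| | Logan | Get Out | Beauty and the Beast | The Lego Batman Movie | Fifty Shades Darker | Kong: Skull Island |
|---|---|---|---|---|---|---|
| Elene | 0.0 | NaN | 1.0 | 0.5 | 0.0 | 0.0 |
| Ximena | NaN | 1.0 | 0.5 | 1.0 | 1.0 | NaN |
| Jess | 0.0 | NaN | 0.5 | 0.5 | NaN | 1.0 |
| Rebecca | 1.0 | 0.0 | NaN | 0.0 | 1.0 | 0.666667 |
| Hillary | NaN | 0.0 | 0.0 | NaN | 0.0 | 0.333333 |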
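
The cell above rescales each movie column, because `min()` and `max()` work down the columns by default. The task asks for ratings normalized for each user, so a row-wise version is sketched below as one possible alternative, not as the method used in this notebook. The names `user_min`, `user_max`, and `normalized_by_user` are introduced here for illustration, and the data is retyped from the ratings table.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the table above; np.nan = movie not seen.
movie_ratings = pd.DataFrame(
    {
        "Logan": [4, np.nan, 4, 5, np.nan],
        "Get Out": [np.nan, 4, np.nan, 2, 2],
        "Beauty and the Beast": [5, 4, 4, np.nan, 3],
        "The Lego Batman Movie": [4, 5, 4, 3, np.nan],
        "Fifty Shades Darker": [2, 3, np.nan, 3, 2],
        "Kong: Skull Island": [1, np.nan, 4, 3, 2],
    },
    index=["Elene", "Ximena", "Jess", "Rebecca", "Hillary"],
)

# Lowest and highest rating given by each person (one value per row).
user_min = movie_ratings.min(axis=1)
user_max = movie_ratings.max(axis=1)

# axis=0 aligns the per-user values with the row index, so every row
# is rescaled against that person's own minimum and maximum.
normalized_by_user = movie_ratings.sub(user_min, axis=0).div(user_max - user_min, axis=0)
print(normalized_by_user)
```

One caveat of this variant: Jess rated every movie a 4, so her range is zero and the division produces `NaN` for her whole row. Such users would need a separate rule, for example a fixed midpoint value.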


##### Show average normalized ratings for each movie:

In [29]:
avg_normalized_movie_ratings = normalized_ratings.mean()
avg_normalized_movie_ratings

Logan                    0.333333
Get Out                  0.333333
Beauty and the Beast     0.500000
The Lego Batman Movie    0.500000
Fifty Shades Darker      0.500000
Kong: Skull Island       0.500000
dtype: float64

##### Show average normalized ratings for each user:

In [30]:
avg_normalized_user_ratings = normalized_ratings.mean(axis=1)
avg_normalized_user_ratings

Elene      0.300000
Ximena     0.875000
Jess       0.500000
Rebecca    0.533333
Hillary    0.083333
dtype: float64

### Standardized Ratings

##### Apply the standardization (z-score) formula to each movie column and view new values in a pandas dataframe:

In [24]:
standardized_ratings = (movie_ratings-movie_ratings.mean()) / movie_ratings.std()
standardized_ratings

| | Logan | Get Out | Beauty and the Beast | The Lego Batman Movie | Fifty Shades Darker | Kong: Skull Island |
|---|---|---|---|---|---|---|
| Elene | -0.57735 | NaN | 1.224745 | 0.0 | -0.866025 | -1.161895 |
| Ximena | NaN | 1.154701 | 0.0 | 1.224745 | 0.866025 | NaN |
| Jess | -0.57735 | NaN | 0.0 | 0.0 | NaN | 1.161895 |
| Rebecca | 1.154701 | -0.57735 | NaN | -1.224745 | 0.866025 | 0.387298 |
| Hillary | NaN | -0.57735 | -1.224745 | NaN | -0.866025 | -0.387298 |
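
As with normalization, the cell above standardizes each movie column. A per-user version, which is closer to the wording of the extra-credit task, might look like the sketch below. It is an illustrative alternative, not the calculation that produced the table above; `user_mean`, `user_std`, and `standardized_by_user` are names introduced here, and the data is retyped from the ratings table.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the table above; np.nan = movie not seen.
movie_ratings = pd.DataFrame(
    {
        "Logan": [4, np.nan, 4, 5, np.nan],
        "Get Out": [np.nan, 4, np.nan, 2, 2],
        "Beauty and the Beast": [5, 4, 4, np.nan, 3],
        "The Lego Batman Movie": [4, 5, 4, 3, np.nan],
        "Fifty Shades Darker": [2, 3, np.nan, 3, 2],
        "Kong: Skull Island": [1, np.nan, 4, 3, 2],
    },
    index=["Elene", "Ximena", "Jess", "Rebecca", "Hillary"],
)

# Each person's own mean and sample standard deviation (ddof=1, the pandas default).
user_mean = movie_ratings.mean(axis=1)
user_std = movie_ratings.std(axis=1)

# Subtract and divide row by row; axis=0 aligns on the user index.
standardized_by_user = movie_ratings.sub(user_mean, axis=0).div(user_std, axis=0)
print(standardized_by_user)
```

With this variant, each user's standardized ratings average to roughly zero, so generous and harsh raters are put on the same footing. Jess is again the exception: her ratings have zero spread, so her row comes out as `NaN`.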


##### Show average standardized ratings for each movie:

In [26]:
avg_standardized_movie_ratings = standardized_ratings.mean()
avg_standardized_movie_ratings

Logan                    5.181041e-16
Get Out                  1.480297e-16
Beauty and the Beast     0.000000e+00
The Lego Batman Movie    0.000000e+00
Fifty Shades Darker      0.000000e+00
Kong: Skull Island       0.000000e+00
dtype: float64
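
Values such as `5.181041e-16` are floating-point rounding residue rather than meaningful averages: the mean of a standardized column is zero by construction. A small check on the three Logan ratings from the table illustrates this. The Series below is retyped from the data, and `z` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# The three Logan ratings from the table above.
logan = pd.Series([4.0, 4.0, 5.0], index=["Elene", "Jess", "Rebecca"])

# Same z-score formula as in the notebook, applied to one column.
z = (logan - logan.mean()) / logan.std()

# The mean may print as a tiny nonzero number, but it is zero within tolerance,
# and the standardized column has a standard deviation of one.
print(z.mean())
print(np.isclose(z.mean(), 0.0))
print(np.isclose(z.std(), 1.0))
```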

##### Show average standardized ratings for each user:

In [27]:
avg_standardized_user_ratings = standardized_ratings.mean(axis=1)
avg_standardized_user_ratings

Elene     -0.276105
Ximena     0.811368
Jess       0.146136
Rebecca    0.121186
Hillary   -0.763855
dtype: float64

### Final Analysis:

Websites such as IMDb and Rotten Tomatoes compile similar data, in which people rate movies and TV shows. For this dataset, normalized ratings and standardized ratings offer two different ways of presenting the same observations.

Although normalized ratings are harder to read at first, they have several advantages for movie ratings. Min-max normalization rescales every rating to a common range of 0 to 1, so movies whose ratings span different ranges end up on the same footing. If someone later extends the dataset with more people or movies, the same formula can be reapplied, although the minimum and maximum then have to be recomputed. The normalized data is still an ordinary dataframe, so it can be averaged by movie title and by user just as easily as the original ratings.

The main disadvantage is interpretation. Since the ratings are rescaled to run from 0 to 1, a normalized value no longer shows how much a movie was actually liked, and at first glance it may not accurately depict the ratings by title or by user. For example, Logan (average 4.33) and Get Out (average 2.67) both have an average normalized rating of 0.333 in the results above, because each movie was rescaled against its own lowest and highest rating. With only three or four ratings per movie, a single extreme rating also sets the minimum or maximum and shifts every other value.

As stated in the Learning pandas text, the data can be plotted to compare the normalized and standardized ratings for further analysis before drawing a conclusion. Overall, standardized and normalized ratings are both pertinent in describing a user's movie rating. It is up to the data scientist to decide how to analyze the ratings for each movie using averages, normalization, and standardization.