Choose six recent popular movies.  Ask at least five people that you know (friends, family, classmates, 
imaginary friends) to rate each of these movies that they have seen on a scale of 1 to 5.  There should be 
at least one movie that not everyone has seen! 
Take the results (observations) and store them somewhere (like a SQL database, or a .CSV file).  Load the 
information into a pandas dataframe.  Your solution should include Python and pandas code that 
accomplishes the following: 
1. Load the ratings by user information that you collected into a pandas dataframe. 
2. Show the average ratings for each user and each movie. 
3. Create a new pandas dataframe, with normalized ratings for each user.  Again, show the average 
ratings for each user and each movie. 
4. Provide a text-based conclusion: explain what might be advantages and disadvantages of using 
normalized ratings instead of the actual ratings. 
5. [Extra credit] Create another new pandas dataframe, with standardized ratings for each user.  
Once again, show the average ratings for each user and each movie.

## Import pandas and numpy libraries

In [49]:
import pandas as pd
import numpy as np

In [50]:
# The first CSV column holds the viewer names, so use it as the row index.
mr = pd.read_csv("movie_rating.csv", index_col=0)

# AVERAGE DATA

In [51]:
mr

Unnamed: 0,Beauty and the Beast,Legionnaire,Black Diamond,Strike back,Wakanda,Aqua Man
Evans,3,4.0,2.0,4.0,5.0,
Diana,4,4.0,3.0,,5.0,4.0
David,4,3.0,,4.0,,
Michelle,5,5.0,,,4.0,5.0
Rose,5,5.0,1.0,5.0,4.0,5.0
Benard,3,,4.0,5.0,,3.0
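
For readers who do not have `movie_rating.csv`, the same frame can be rebuilt directly from the table above. This is only a sketch: the name `mr_inline` is made up for illustration, and `NaN` is assumed to mean the viewer has not seen the movie.

```python
import numpy as np
import pandas as pd

# Ratings copied from the table displayed above; NaN marks an unseen movie.
viewers = ["Evans", "Diana", "David", "Michelle", "Rose", "Benard"]
ratings = {
    "Beauty and the Beast": [3, 4, 4, 5, 5, 3],
    "Legionnaire": [4, 4, 3, 5, 5, np.nan],
    "Black Diamond": [2, 3, np.nan, np.nan, 1, 4],
    "Strike back": [4, np.nan, 4, np.nan, 5, 5],
    "Wakanda": [5, 5, np.nan, 4, 4, np.nan],
    "Aqua Man": [np.nan, 4, np.nan, 5, 5, 3],
}

# mr_inline is a hypothetical stand-in for the frame loaded from the CSV.
mr_inline = pd.DataFrame(ratings, index=viewers)

# mean() skips NaN by default, so unseen movies do not count as zeros.
average_by_movie = mr_inline.mean()
average_by_viewer = mr_inline.mean(axis=1)
```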


## Replace NaN values with 0

In [52]:
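# Display a zero-filled copy only; mr itself keeps its NaN values,
# so the averages below skip movies a viewer has not seen.
mr.replace(np.nan, 0)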

Unnamed: 0,Beauty and the Beast,Legionnaire,Black Diamond,Strike back,Wakanda,Aqua Man
Evans,3,4.0,2.0,4.0,5.0,0.0
Diana,4,4.0,3.0,0.0,5.0,4.0
David,4,3.0,0.0,4.0,0.0,0.0
Michelle,5,5.0,0.0,0.0,4.0,5.0
Rose,5,5.0,1.0,5.0,4.0,5.0
Benard,3,0.0,4.0,5.0,0.0,3.0


## Average ratings for each movie

In [53]:
average_by_movie = mr.mean()
average_by_movie

Beauty and the Beast    4.00
Legionnaire             4.20
Black Diamond           2.50
Strike back             4.50
Wakanda                 4.50
Aqua Man                4.25
dtype: float64

## Average ratings for each user

In [54]:
average_by_viewer = mr.mean(axis=1)
average_by_viewer

Evans       3.600000
Diana       4.000000
David       3.666667
Michelle    4.750000
Rose        4.166667
Benard      3.750000
dtype: float64

# NORMALIZED DATA

In [55]:
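# Min-max scaling to the range 0 to 1. mr.min() and mr.max() are computed
# per column, so each movie is scaled across the viewers who rated it.
normalized_ratings = (mr - mr.min()) / (mr.max() - mr.min())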
normalized_ratings

Unnamed: 0,Beauty and the Beast,Legionnaire,Black Diamond,Strike back,Wakanda,Aqua Man
Evans,0.0,0.5,0.333333,0.0,1.0,
Diana,0.5,0.5,0.666667,,1.0,0.5
David,0.5,0.0,,0.0,,
Michelle,1.0,1.0,,,0.0,1.0
Rose,1.0,1.0,0.0,1.0,0.0,1.0
Benard,0.0,,1.0,1.0,,0.0


In [56]:
# Display a zero-filled copy only; normalized_ratings keeps its NaN values.
normalized_ratings.replace(np.nan, 0)

Unnamed: 0,Beauty and the Beast,Legionnaire,Black Diamond,Strike back,Wakanda,Aqua Man
Evans,0.0,0.5,0.333333,0.0,1.0,0.0
Diana,0.5,0.5,0.666667,0.0,1.0,0.5
David,0.5,0.0,0.0,0.0,0.0,0.0
Michelle,1.0,1.0,0.0,0.0,0.0,1.0
Rose,1.0,1.0,0.0,1.0,0.0,1.0
Benard,0.0,0.0,1.0,1.0,0.0,0.0


## Average normalized ratings for each user

In [57]:
normalized_avg_by_user = normalized_ratings.mean(axis = 1)
normalized_avg_by_user

Evans       0.366667
Diana       0.633333
David       0.166667
Michelle    0.750000
Rose        0.666667
Benard      0.500000
dtype: float64

## Average normalized ratings for each movie

In [58]:
normalized_avg_by_movie = normalized_ratings.mean()
normalized_avg_by_movie

Beauty and the Beast    0.500
Legionnaire             0.600
Black Diamond           0.500
Strike back             0.500
Wakanda                 0.500
Aqua Man                0.625
dtype: float64
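
The assignment asks for ratings normalized for each user, while `mr.min()` and `mr.max()` in the normalization cell work per movie column. A per-user variant might look like the sketch below. It assumes the same ratings as the table in this notebook; `normalized_by_user`, `row_min`, and `row_max` are illustrative names, not part of the original solution.

```python
import numpy as np
import pandas as pd

# Ratings copied from the notebook's table; NaN marks an unseen movie.
viewers = ["Evans", "Diana", "David", "Michelle", "Rose", "Benard"]
mr = pd.DataFrame(
    {
        "Beauty and the Beast": [3, 4, 4, 5, 5, 3],
        "Legionnaire": [4, 4, 3, 5, 5, np.nan],
        "Black Diamond": [2, 3, np.nan, np.nan, 1, 4],
        "Strike back": [4, np.nan, 4, np.nan, 5, 5],
        "Wakanda": [5, 5, np.nan, 4, 4, np.nan],
        "Aqua Man": [np.nan, 4, np.nan, 5, 5, 3],
    },
    index=viewers,
)

# Lowest and highest rating given by each viewer (row-wise, NaN skipped).
row_min = mr.min(axis=1)
row_max = mr.max(axis=1)

# Subtract and divide along the index so each row uses its own min and range.
# A viewer who gave every movie the same rating would get NaN here (0 / 0).
normalized_by_user = mr.sub(row_min, axis=0).div(row_max - row_min, axis=0)

# Averages of the per-user normalized ratings.
normalized_avg_by_user = normalized_by_user.mean(axis=1)
normalized_avg_by_movie = normalized_by_user.mean()
```

With this version each viewer's lowest rating maps to 0 and highest to 1, so the scale reflects how a viewer ranks movies relative to their own habits rather than relative to other viewers.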

# CONCLUSION

Normalized ratings have both advantages and disadvantages. The main advantage is that normalization rescales every rating to a common range of 0 to 1, which puts the values on the same scale and makes them easier to compare and to plot. It also reduces the effect of different rating habits, such as one set of ratings clustering between 4 and 5 while another spreads from 1 to 5. A disadvantage is that the rescaled values lose their original meaning: a 0 no longer means a bad rating, only the lowest rating in its group, and min-max scaling is sensitive to outliers because a single extreme rating sets the range for all the others. Null values are another problem. Movies that were not rated remain null after normalization, and replacing those nulls with 0 can make the data unreliable and confusing, because a 0 then looks like the lowest possible rating rather than a movie that was not seen.
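
The effect of null values on an average can be checked with the Black Diamond column from this notebook. The sketch below compares the mean that skips missing ratings with the mean after filling them with 0; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Black Diamond ratings from the notebook's table; NaN means "not seen".
black_diamond = pd.Series(
    [2, 3, np.nan, np.nan, 1, 4],
    index=["Evans", "Diana", "David", "Michelle", "Rose", "Benard"],
)

# Series.mean() skips NaN by default: (2 + 3 + 1 + 4) / 4 = 2.5
mean_skipping_missing = black_diamond.mean()

# After zero-filling, unseen movies count as ratings of 0: 10 / 6, about 1.67
mean_zero_filled = black_diamond.fillna(0).mean()

print(mean_skipping_missing, mean_zero_filled)
```

Zero-filling lowers the average from 2.5 to about 1.67 even though nobody rated the movie lower, which is why the averages in this notebook are computed on the frames that still contain `NaN`.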

# STANDARDIZED DATA

In [59]:
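# Z-score per movie column: subtract the column mean, divide by the column
# standard deviation (pandas uses the sample standard deviation, ddof=1).
standardized = (mr - mr.mean()) / mr.std()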
standardized

Unnamed: 0,Beauty and the Beast,Legionnaire,Black Diamond,Strike back,Wakanda,Aqua Man
Evans,-1.118034,-0.239046,-0.387298,-0.866025,0.866025,
Diana,0.0,-0.239046,0.387298,,0.866025,-0.261116
David,0.0,-1.434274,,-0.866025,,
Michelle,1.118034,0.956183,,,-0.866025,0.783349
Rose,1.118034,0.956183,-1.161895,0.866025,-0.866025,0.783349
Benard,-1.118034,,1.161895,0.866025,,-1.305582


In [60]:
# Display a zero-filled copy only; standardized keeps its NaN values.
standardized.replace(np.nan, 0)

Unnamed: 0,Beauty and the Beast,Legionnaire,Black Diamond,Strike back,Wakanda,Aqua Man
Evans,-1.118034,-0.239046,-0.387298,-0.866025,0.866025,0.0
Diana,0.0,-0.239046,0.387298,0.0,0.866025,-0.261116
David,0.0,-1.434274,0.0,-0.866025,0.0,0.0
Michelle,1.118034,0.956183,0.0,0.0,-0.866025,0.783349
Rose,1.118034,0.956183,-1.161895,0.866025,-0.866025,0.783349
Benard,-1.118034,0.0,1.161895,0.866025,0.0,-1.305582


## Average standardized ratings for each user

In [61]:
standardized_avg_by_users = standardized.mean(axis=1)
standardized_avg_by_users

Evans      -0.348876
Diana       0.150632
David      -0.766767
Michelle    0.497885
Rose        0.282612
Benard     -0.098924
dtype: float64

## Average standardized ratings for each movie

In [62]:
standardized_avg_by_movie = standardized.mean(axis=0)
standardized_avg_by_movie

Beauty and the Beast    0.000000e+00
Legionnaire            -1.776357e-16
Black Diamond           0.000000e+00
Strike back             0.000000e+00
Wakanda                 0.000000e+00
Aqua Man                5.551115e-17
dtype: float64
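
The extra-credit task asks for standardized ratings for each user, whereas `mr.mean()` and `mr.std()` in the standardization cell are computed per movie. A per-user version could be sketched as follows, using three rows copied from the ratings table for brevity. The names `mr_subset` and `standardized_by_user` are hypothetical, and the sample standard deviation (the pandas default) is an assumption about what is wanted.

```python
import numpy as np
import pandas as pd

movies = ["Beauty and the Beast", "Legionnaire", "Black Diamond",
          "Strike back", "Wakanda", "Aqua Man"]

# Three viewers' rows copied from the notebook's table; NaN means "not seen".
mr_subset = pd.DataFrame(
    [
        [3, 4, 2, 4, 5, np.nan],
        [4, 3, np.nan, 4, np.nan, np.nan],
        [5, 5, 1, 5, 4, 5],
    ],
    index=["Evans", "David", "Rose"],
    columns=movies,
)

# Mean and sample standard deviation of each viewer's own ratings.
row_mean = mr_subset.mean(axis=1)
row_std = mr_subset.std(axis=1)  # ddof=1 by default

# Align on the index so each row is centred and scaled by its own statistics.
standardized_by_user = mr_subset.sub(row_mean, axis=0).div(row_std, axis=0)

# Each viewer's standardized ratings average to zero up to round-off.
user_means = standardized_by_user.mean(axis=1)
print(np.allclose(user_means, 0.0))
```

Values such as `-1.776357e-16` in the per-movie output above are floating-point round-off around zero, which is why the check here uses `np.allclose` instead of comparing with `== 0`.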