In [1]:
# Import pandas and NumPy; the ratings are loaded into a pandas DataFrame in the next cell.
import pandas as pd
import numpy as np

In [2]:
movieRatings = pd.read_csv('movies.csv', index_col = 0)
movieRatings

Names,LOTR,GOT,Avangers,Mulan,Pets_Reunite,The_Mandalorian
Alex,5.0,5.0,2.0,,1.0,4.0
Ben,5.0,5.0,,3.0,3.0,5.0
Ed,1.0,5.0,3.0,3.0,,5.0
Artur,4.0,,3.0,4.0,1.0,5.0
Aria,,5.0,4.0,5.0,2.0,


In [3]:
# Replace all NaN values with 0 before calculating average ratings.
# Note: each missing rating then counts as a 0 in every mean computed afterwards.

movieRatings.fillna(value=0, inplace=True)
movieRatings

Names,LOTR,GOT,Avangers,Mulan,Pets_Reunite,The_Mandalorian
Alex,5.0,5.0,2.0,0.0,1.0,4.0
Ben,5.0,5.0,0.0,3.0,3.0,5.0
Ed,1.0,5.0,3.0,3.0,0.0,5.0
Artur,4.0,0.0,3.0,4.0,1.0,5.0
Aria,0.0,5.0,4.0,5.0,2.0,0.0


In [4]:
# Average rating per user.
userAverage = movieRatings.mean(axis = 1)
userAverage

Names
Alex     2.833333
Ben      3.500000
Ed       2.833333
Artur    2.833333
Aria     2.666667
dtype: float64

In [5]:
# Average ratings per movie.
movieRatings.mean()

LOTR               3.0
GOT                4.0
Avangers           2.4
Mulan              3.0
Pets_Reunite       1.4
The_Mandalorian    3.8
dtype: float64
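
Because the missing ratings were replaced with 0 before these means were taken, every gap counts as a zero rating. As a point of comparison, the sketch below rebuilds the table by hand (values copied from the first output, since `movies.csv` itself is not shown here) and lets pandas skip the missing entries instead. The names `ratings`, `filled_movie_mean`, and `observed_movie_mean` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Ratings copied by hand from the first table; NaN marks a missing rating.
ratings = pd.DataFrame(
    {
        "LOTR": [5, 5, 1, 4, np.nan],
        "GOT": [5, 5, 5, np.nan, 5],
        "Avangers": [2, np.nan, 3, 3, 4],
        "Mulan": [np.nan, 3, 3, 4, 5],
        "Pets_Reunite": [1, 3, np.nan, 1, 2],
        "The_Mandalorian": [4, 5, 5, 5, np.nan],
    },
    index=pd.Index(["Alex", "Ben", "Ed", "Artur", "Aria"], name="Names"),
)

# Zero-filled means: every missing rating is treated as a 0.
filled_movie_mean = ratings.fillna(0).mean()

# NaN-aware means: DataFrame.mean skips NaN by default (skipna=True).
observed_movie_mean = ratings.mean()
observed_user_mean = ratings.mean(axis=1)

comparison = pd.DataFrame(
    {"zero_filled": filled_movie_mean, "observed_only": observed_movie_mean}
)
print(comparison)
```

Which version is appropriate depends on what a missing rating is taken to mean. If it only means "not watched yet", the observed-only means are arguably the fairer summary.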

In [6]:
# New pandas DataFrame with min-max normalized ratings.
# min() and max() work column-wise, so each movie (not each user) is scaled to the range 0 to 1.
normalized = (movieRatings - movieRatings.min()) / (movieRatings.max() - movieRatings.min())
normalized

Names,LOTR,GOT,Avangers,Mulan,Pets_Reunite,The_Mandalorian
Alex,1.0,1.0,0.5,0.0,0.333333,0.8
Ben,1.0,1.0,0.0,0.6,1.0,1.0
Ed,0.2,1.0,0.75,0.6,0.0,1.0
Artur,0.8,0.0,0.75,0.8,0.333333,1.0
Aria,0.0,1.0,1.0,1.0,0.666667,0.0
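
The expression in this cell subtracts column minima and divides by column ranges, so it rescales each movie rather than each user. If per-user scaling is what is wanted, one possible sketch is shown below. It rebuilds the zero-filled table by hand from the output above, and the names `movie_ratings`, `row_min`, `row_range`, and `per_user` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Zero-filled ratings copied by hand from the table printed after fillna.
movie_ratings = pd.DataFrame(
    {
        "LOTR": [5.0, 5.0, 1.0, 4.0, 0.0],
        "GOT": [5.0, 5.0, 5.0, 0.0, 5.0],
        "Avangers": [2.0, 0.0, 3.0, 3.0, 4.0],
        "Mulan": [0.0, 3.0, 3.0, 4.0, 5.0],
        "Pets_Reunite": [1.0, 3.0, 0.0, 1.0, 2.0],
        "The_Mandalorian": [4.0, 5.0, 5.0, 5.0, 0.0],
    },
    index=pd.Index(["Alex", "Ben", "Ed", "Artur", "Aria"], name="Names"),
)

# Per-user (row-wise) minimum and range.
row_min = movie_ratings.min(axis=1)
row_range = movie_ratings.max(axis=1) - row_min

# axis=0 aligns each per-user Series with the row index before broadcasting.
# A user who gave every movie the same rating would have range 0 and get NaN.
per_user = movie_ratings.sub(row_min, axis=0).div(row_range, axis=0)
print(per_user.round(3))
```

In this particular table every user has a 0 (a filled gap) and a 5, so the row-wise result is simply each rating divided by 5.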


In [7]:
# Average normalized rating per user
normalized.mean(axis = 1)

Names
Alex     0.605556
Ben      0.766667
Ed       0.591667
Artur    0.613889
Aria     0.611111
dtype: float64

### Conclusion
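- Min-max normalization has advantages and disadvantages. An advantage is that it rescales every movie's ratings to a common 0-to-1 range, which makes the columns directly comparable and easy to plot. A disadvantage is how it handles missing ('null') values. Because the NaN entries were filled with 0, a missing rating becomes each column's minimum and is indistinguishable from the lowest possible score. The scaling can also create null values of its own: when a column's maximum equals its minimum, the denominator is 0 and the result is NaN. Both effects make the data unreliable and confusing to the user.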
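
To make the point about null values concrete, here is a minimal sketch that applies the same min-max formula to the ratings before any filling, so missing entries stay missing. The table is rebuilt by hand from the first output, and `ratings`, `col_min`, `col_range`, and `normalized_observed` are illustrative names rather than part of the original notebook.

```python
import numpy as np
import pandas as pd

# Ratings copied by hand from the first table; NaN marks a missing rating.
ratings = pd.DataFrame(
    {
        "LOTR": [5, 5, 1, 4, np.nan],
        "GOT": [5, 5, 5, np.nan, 5],
        "Avangers": [2, np.nan, 3, 3, 4],
        "Mulan": [np.nan, 3, 3, 4, 5],
        "Pets_Reunite": [1, 3, np.nan, 1, 2],
        "The_Mandalorian": [4, 5, 5, 5, np.nan],
    },
    index=pd.Index(["Alex", "Ben", "Ed", "Artur", "Aria"], name="Names"),
)

# min() and max() skip NaN, so the range is based on observed ratings only.
col_min = ratings.min()
col_range = ratings.max() - col_min

# Missing ratings stay NaN instead of turning into the column minimum.
normalized_observed = (ratings - col_min) / col_range
print(normalized_observed.round(3))
```

In this sketch every observed GOT rating is 5, so that column's range is 0 and the whole column comes out as NaN. This is a case where min-max scaling itself produces nulls, and such columns would need separate handling.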

### STANDARDIZED DATA

In [10]:
movie_standardized = (movieRatings - movieRatings.mean())/movieRatings.std()
movie_standardized

Names,LOTR,GOT,Avangers,Mulan,Pets_Reunite,The_Mandalorian
Alex,0.852803,0.447214,-0.263752,-1.603567,-0.350823,0.092253
Ben,0.852803,0.447214,-1.582513,0.0,1.403293,0.553519
Ed,-0.852803,0.447214,0.395628,0.0,-1.227881,0.553519
Artur,0.426401,-1.788854,0.395628,0.534522,-0.350823,0.553519
Aria,-1.279204,0.447214,1.055009,1.069045,0.526235,-1.752809


In [11]:
# Average standardized rating per user.
movie_standardized.mean(axis=1)

Names
Alex    -0.137646
Ben      0.279052
Ed      -0.114054
Artur   -0.038268
Aria     0.010915
dtype: float64

In [12]:
# Average standardized rating per movie (zero up to floating-point error).
movie_standardized.mean(axis=0)

LOTR              -4.440892e-17
GOT                1.110223e-17
Avangers           8.881784e-17
Mulan              0.000000e+00
Pets_Reunite       1.110223e-16
The_Mandalorian    8.881784e-17
dtype: float64

### Conclusion
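- Standardization centers each movie's ratings around 0 and scales them by the column's standard deviation, so every column has mean 0 (up to floating-point error, as the per-movie output above shows) and standard deviation 1. Putting the data on a consistent scale in this way can improve the performance of methods that are sensitive to feature scale.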
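
The properties claimed above can be checked directly. The sketch below rebuilds the zero-filled table by hand from the earlier output and verifies the column means and standard deviations. The names `movie_ratings`, `standardized`, and `population` are illustrative. One detail worth knowing is that pandas' `std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to the population version (`ddof=0`), so the two give slightly different z-scores.

```python
import numpy as np
import pandas as pd

# Zero-filled ratings copied by hand from the table printed after fillna.
movie_ratings = pd.DataFrame(
    {
        "LOTR": [5.0, 5.0, 1.0, 4.0, 0.0],
        "GOT": [5.0, 5.0, 5.0, 0.0, 5.0],
        "Avangers": [2.0, 0.0, 3.0, 3.0, 4.0],
        "Mulan": [0.0, 3.0, 3.0, 4.0, 5.0],
        "Pets_Reunite": [1.0, 3.0, 0.0, 1.0, 2.0],
        "The_Mandalorian": [4.0, 5.0, 5.0, 5.0, 0.0],
    },
    index=pd.Index(["Alex", "Ben", "Ed", "Artur", "Aria"], name="Names"),
)

# Same formula as the notebook cell: column-wise z-scores with ddof=1.
standardized = (movie_ratings - movie_ratings.mean()) / movie_ratings.std()

# Column means are 0 up to rounding; column standard deviations are 1.
print(standardized.mean().round(12))
print(standardized.std())

# Population variant (ddof=0), matching NumPy's default for np.std.
population = (movie_ratings - movie_ratings.mean()) / movie_ratings.std(ddof=0)
print(population.round(6))
```

With only five ratings per movie the choice of `ddof` is visible: Alex's GOT score is about 0.447 with the sample standard deviation and 0.5 with the population one.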