# IS 362 Week 7 Assignment

Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates,
imaginary friends) to rate each of these movies that they have seen on a scale of 1 to 5. There should be
at least one movie that not everyone has seen!
Take the results (observations) and store them somewhere (like a SQL database, or a .CSV file). Load the
information into a pandas dataframe. Your solution should include Python and pandas code that
accomplishes the following:

1. Load the ratings by user information that you collected into a pandas dataframe.
2. Show the average ratings for each user and each movie.
3. Create a new pandas dataframe, with normalized ratings for each user. Again, show the average
ratings for each user and each movie.
4. Provide a text-based conclusion: explain what might be advantages and disadvantages of using
normalized ratings instead of the actual ratings.
5. [Extra credit] Create another new pandas dataframe, with standardized ratings for each user.
Once again, show the average ratings for each user and each movie.

In [1]:
import csv
import pandas as pd

## Create a CSV file

In [17]:
with open('movies.csv', 'w', newline='') as csvfile:
    # newline='' lets the csv module control line endings; without it,
    # Windows writes an extra blank row after every record
    filewriter = csv.writer(csvfile, delimiter=',')
    filewriter.writerow(['User', 'Black_Panther', 'Wonder_Woman', 'Logan', 
                         'Beauty_and_the_Beast', 'Coco', 'The_Shape_of_Water'])
    filewriter.writerow(['Laiza', 5, 5, 4, 5, 5, ''])
    filewriter.writerow(['Joy', 5, 5, 4, 5, '', ''])
    filewriter.writerow(['Doreen', '', 4, 3, 5, 5, ''])
    filewriter.writerow(['Elijah', 5, 3, 5, 2, 5, ''])
    filewriter.writerow(['Lucas', 5, '', 4, '', 5, ''])

In [18]:
# Read the CSV file back and print each row to verify its contents

with open('movies.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

['User', 'Black_Panther', 'Wonder_Woman', 'Logan', 'Beauty_and_the_Beast', 'Coco', 'The_Shape_of_Water']
['Laiza', '5', '5', '4', '5', '5', '']
['Joy', '5', '5', '4', '5', '', '']
['Doreen', '', '4', '3', '5', '5', '']
['Elijah', '5', '3', '5', '2', '5', '']
['Lucas', '5', '', '4', '', '5', '']


## Load CSV file into a pandas DataFrame

In [49]:
ratings = pd.read_csv('movies.csv')
ratings

     User  Black_Panther  Wonder_Woman  Logan  Beauty_and_the_Beast  Coco  The_Shape_of_Water
0   Laiza            5.0           5.0      4                   5.0   5.0                 NaN
1     Joy            5.0           5.0      4                   5.0   NaN                 NaN
2  Doreen            NaN           4.0      3                   5.0   5.0                 NaN
3  Elijah            5.0           3.0      5                   2.0   5.0                 NaN
4   Lucas            5.0           NaN      4                   NaN   5.0                 NaN


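The rating columns appear to have different data types: Logan is displayed as integers, while the other movies are displayed as floats. pandas stores a column with missing values as float64 so that it can hold NaN; Logan has no missing ratings, so it was read as int64.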
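
The type inference behind this can be reproduced on a small scale. A minimal sketch, assuming a hypothetical in-memory CSV (not movies.csv) in which column `a` is complete and column `b` has one empty field:

```python
import io

import pandas as pd

# Hypothetical miniature CSV: column "a" is complete, column "b" has one empty field.
sample_csv = io.StringIO("a,b\n4,5\n3,\n5,2\n")
sample = pd.read_csv(sample_csv)

# "a" is inferred as int64; "b" becomes float64 because the empty field is read as NaN,
# and NaN is a floating-point value.
print(sample.dtypes)
```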

In [20]:
# To verify the datatype

ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
User                    5 non-null object
Black_Panther           4 non-null float64
Wonder_Woman            4 non-null float64
Logan                   5 non-null int64
Beauty_and_the_Beast    4 non-null float64
Coco                    4 non-null float64
The_Shape_of_Water      0 non-null float64
dtypes: float64(5), int64(1), object(1)
memory usage: 360.0+ bytes


In [23]:
# Convert the Logan column from int64 to float64 so that all rating columns share one dtype

ratings['Logan'] = ratings['Logan'].astype(float)
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
User                    5 non-null object
Black_Panther           4 non-null float64
Wonder_Woman            4 non-null float64
Logan                   5 non-null float64
Beauty_and_the_Beast    4 non-null float64
Coco                    4 non-null float64
The_Shape_of_Water      0 non-null float64
dtypes: float64(6), object(1)
memory usage: 360.0+ bytes


## Average

### Average rating for each movie

In [25]:
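# numeric_only=True skips the non-numeric User column (required in pandas 2.0 and later)
ratings.mean(numeric_only=True)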

Black_Panther           5.00
Wonder_Woman            4.25
Logan                   4.00
Beauty_and_the_Beast    4.25
Coco                    5.00
The_Shape_of_Water       NaN
dtype: float64

### Average rating of each user

In [58]:
# axis=1 averages across each row; numeric_only=True skips the User column
ratings.mean(axis=1, numeric_only=True)

0    4.800000
1    4.750000
2    4.250000
3    4.000000
4    4.666667
dtype: float64

In [57]:
# Index a copy of the dataframe by user name so that the averages are labeled with each user's name.

ratings_copy = ratings.copy()
ratings_copy = ratings_copy.set_index(['User'])
ratings_copy.mean(axis=1)

User
Laiza     4.800000
Joy       4.750000
Doreen    4.250000
Elijah    4.000000
Lucas     4.666667
dtype: float64

## Normalized

In [59]:
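# Min-max scaling. min() and max() are computed per column, so each movie is scaled to [0, 1];
# a column whose ratings are all equal has a range of 0 and becomes NaN.
normalized_ratings = (ratings_copy - ratings_copy.min()) / (ratings_copy.max() - ratings_copy.min())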
normalized_ratings

        Black_Panther  Wonder_Woman  Logan  Beauty_and_the_Beast  Coco  The_Shape_of_Water
User
Laiza             NaN           1.0    0.5                   1.0   NaN                 NaN
Joy               NaN           1.0    0.5                   1.0   NaN                 NaN
Doreen            NaN           0.5    0.0                   1.0   NaN                 NaN
Elijah            NaN           0.0    1.0                   0.0   NaN                 NaN
Lucas             NaN           NaN    0.5                   NaN   NaN                 NaN


### Average of normalized ratings for each movie

In [60]:
normalized_ratings.mean()

Black_Panther             NaN
Wonder_Woman            0.625
Logan                   0.500
Beauty_and_the_Beast    0.750
Coco                      NaN
The_Shape_of_Water        NaN
dtype: float64

### Average of normalized ratings by user

In [61]:
normalized_ratings.mean(axis=1)

User
Laiza     0.833333
Joy       0.833333
Doreen    0.500000
Elijah    0.333333
Lucas     0.500000
dtype: float64
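
The assignment asks for ratings normalized for each user, whereas `ratings_copy.min()` and `ratings_copy.max()` work per column, so the result above is scaled per movie. A minimal sketch of the row-wise alternative, assuming the same ratings rebuilt inline so that the block runs on its own; `ratings_by_user`, `user_min`, `user_range`, and `per_user_normalized` are illustrative names, not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Same ratings as movies.csv, rebuilt inline so this block is self-contained.
ratings_by_user = pd.DataFrame(
    {
        "Black_Panther": [5, 5, np.nan, 5, 5],
        "Wonder_Woman": [5, 5, 4, 3, np.nan],
        "Logan": [4, 4, 3, 5, 4],
        "Beauty_and_the_Beast": [5, 5, 5, 2, np.nan],
        "Coco": [5, np.nan, 5, 5, 5],
        "The_Shape_of_Water": [np.nan] * 5,
    },
    index=pd.Index(["Laiza", "Joy", "Doreen", "Elijah", "Lucas"], name="User"),
)

# Row-wise minimum and range: one value per user, ignoring unseen movies (NaN).
user_min = ratings_by_user.min(axis=1)
user_range = ratings_by_user.max(axis=1) - user_min

# axis=0 aligns each per-user Series with the row index, so every row is
# scaled by that user's own minimum and range.
per_user_normalized = ratings_by_user.sub(user_min, axis=0).div(user_range, axis=0)

print(per_user_normalized.mean())        # average normalized rating per movie
print(per_user_normalized.mean(axis=1))  # average normalized rating per user
```

A user who gave every movie the same rating would have a range of zero and come out as NaN under this scheme.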

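Comparing the raw and normalized averages shows that they can rank the movies differently: Wonder_Woman and Beauty_and_the_Beast both have a raw average of 4.25, but their normalized averages are 0.625 and 0.750. Normalization here means min-max scaling: each value is shifted by the minimum and divided by the range (maximum minus minimum), which maps the values to the interval [0, 1]. Because the minimum and maximum above are taken per column, the scaling is applied to each movie. The advantage is that normalized values are unit-free and sit on a common scale, which makes data measured on different scales or collected from different sources easier to compare. There are two drawbacks. First, min-max scaling is sensitive to outliers: a single extreme value stretches the range and compresses the "normal" data into a very small interval. Second, a column whose ratings are all identical has a range of zero, so the division is 0/0 and the result is NaN; this is why Black_Panther and Coco, which every viewer rated 5, have no normalized average.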
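
The outlier drawback is easy to see with numbers. A minimal sketch, assuming a hypothetical set of scores on an unbounded scale (the 1 to 5 ratings in this assignment cannot produce an outlier this extreme):

```python
import pandas as pd

# Hypothetical scores: four typical values and one extreme outlier.
scores = pd.Series([4.0, 4.5, 5.0, 5.5, 100.0])

# Min-max scaling maps the minimum to 0 and the maximum to 1.
scaled = (scores - scores.min()) / (scores.max() - scores.min())

# The four typical values are squeezed into a narrow band near 0,
# while the outlier alone sits at 1.
print(scaled)
```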

## Standardized

In [62]:
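# z-score per column: subtract each movie's mean and divide by its sample standard deviation (ddof=1)
standardized_ratings = (ratings_copy - ratings_copy.mean()) / ratings_copy.std()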
standardized_ratings

        Black_Panther  Wonder_Woman     Logan  Beauty_and_the_Beast  Coco  The_Shape_of_Water
User
Laiza             NaN      0.783349  0.000000                   0.5   NaN                 NaN
Joy               NaN      0.783349  0.000000                   0.5   NaN                 NaN
Doreen            NaN     -0.261116 -1.414214                   0.5   NaN                 NaN
Elijah            NaN     -1.305582  1.414214                  -1.5   NaN                 NaN
Lucas             NaN           NaN  0.000000                   NaN   NaN                 NaN


### Average of standardized ratings for each movie

In [63]:
standardized_ratings.mean()

Black_Panther                    NaN
Wonder_Woman            5.551115e-17
Logan                   0.000000e+00
Beauty_and_the_Beast    0.000000e+00
Coco                             NaN
The_Shape_of_Water               NaN
dtype: float64

### Average of standardized ratings by user

In [64]:
standardized_ratings.mean(axis=1)

User
Laiza     0.427783
Joy       0.427783
Doreen   -0.391777
Elijah   -0.463790
Lucas     0.000000
dtype: float64