# Movie Recommendation Comparison

There are many different types of recommendation algorithms that can be developed depending on elements such as business purpose and data available. 

In this notebook the focus is on introducing a few types of recommendation engine, from the really simple (top rated movies) to the slightly more advanced.

The following methodologies will be looked at:

1.0 Adapted "Basket" Analysis

2.0 Correlation (Pearson R)

3.0 k-Nearest Neighbour

4.0 SVD

All of the above can be considered a form of Collaborative Filtering (aside from perhaps Basket Analysis, but we will explain that in more detail later). Collaborative Filtering in essence uses individual user preferences to recommend films that similar users have purchased or rated.

As the aim of this notebook is to introduce recommendation concepts, they will be explained using "out of the box" tools in Python to produce results. To judge results we will use common sense to see if the recommendations for **Star Wars** make sense.

Although we are creating recommendations based on a single movie, some of the methodologies below can be simply edited to make this more personalised for a user, and this will be explained in more detail with each step. 



Depending on need, the above tools and judgements may not be suitable, so some tweaks to the methodologies or more statistical validation will be required. This will be talked about in more detail later.

##### Dataset: MovieLens database

    
This dataset is created and maintained by GroupLens and is made up of user reviews of movies and movie descriptions. They have released a number of datasets of varying sizes (including one with 20M movie reviews); for this exercise, the 100K dataset will be used.

### 0.1 Data Import and Investigation

In [4]:
import numpy as np
import pandas as pd

In [314]:
# pass in column names for each CSV and read them using pandas. 
# Column names available in the readme file

#Reading users file:
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv(r'C:\Users\suhaib.qazi\Downloads\ml-100k\ml-100k\u.user', sep='|', names=u_cols,
 encoding='latin-1')

#Reading ratings file:
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv(r'C:\Users\suhaib.qazi\Downloads\ml-100k\ml-100k\u.data', sep='\t', names=r_cols,
 encoding='latin-1')

#Reading items file:
i_cols = ['movie_id', 'movie title' ,'release date','video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
 'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
items = pd.read_csv(r'C:\Users\suhaib.qazi\Downloads\ml-100k\ml-100k\u.item', sep='|', names=i_cols,
 encoding='latin-1')

In [158]:
users.head()

Unnamed: 0,user_id,age,sex,occupation,zip_code
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


In [159]:
items.head()

Unnamed: 0,movie_id,movie title,release date,video release date,IMDb URL,unknown,Action,Adventure,Animation,Children's,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,Toy Story (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
1,2,GoldenEye (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?GoldenEye%20(...,0,1,1,0,0,...,0,0,0,0,0,0,0,1,0,0
2,3,Four Rooms (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Four%20Rooms%...,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,4,Get Shorty (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,Copycat (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


In [160]:
ratings.head()

Unnamed: 0,user_id,movie_id,rating,unix_timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [9]:
ratings['rating'].unique()

array([3, 1, 2, 4, 5], dtype=int64)

In [36]:
#isolate movie title for movie ID
cols = ['movie_id','movie title']
items2 = items[cols]

#Bring in movie names into ratings table
rating_items_raw = pd.merge(ratings,items2,how='left',on='movie_id')

#discard timestamp and movie ID
rating_items_raw = rating_items_raw.drop(['unix_timestamp','movie_id'],axis=1)

### Average rating table

In [146]:

average_rating = rating_items_raw.groupby('movie title').agg({'user_id':'count','rating':'mean'}).reset_index()
average_rating = average_rating.rename(columns = {'user_id':'ratingCount'})

In [224]:
rating_items2 = rating_items_raw.merge(average_rating[['movie title','ratingCount']], on='movie title')

In [149]:
# Minimum number of ratings a film needs to qualify for the top rated list
rating_count_min = 100

### 0.2 Top Rated Movie Recommender

Before trying to build a personalised recommender, let's see what a simple "top 10" recommender would look like. This only looks at the movie review metrics to recommend films. It is not personalised at all, so may not be relevant to a specific user.

There will be two suggested lists: the top 10 rated movies and the top 10 most reviewed movies.

In [152]:
top_rated = average_rating[(average_rating['ratingCount'] > rating_count_min)].sort_values(by=['rating'], ascending=False)
top_reviewed = average_rating.sort_values(by=['ratingCount'], ascending=False)

#### Top rated movies

In [153]:
top_rated.head(10)

Unnamed: 0,movie title,ratingCount,rating
318,"Close Shave, A (1995)",112,4.491071
1281,Schindler's List (1993),298,4.466443
1652,"Wrong Trousers, The (1993)",118,4.466102
273,Casablanca (1942),243,4.45679
1317,"Shawshank Redemption, The (1994)",283,4.44523
1215,Rear Window (1954),209,4.38756
1572,"Usual Suspects, The (1995)",267,4.385768
1398,Star Wars (1977),583,4.358491
3,12 Angry Men (1957),125,4.344
303,Citizen Kane (1941),198,4.292929


#### Top reviewed movies

In [154]:
top_reviewed.head(10)

Unnamed: 0,movie title,ratingCount,rating
1398,Star Wars (1977),583,4.358491
333,Contact (1997),509,3.803536
498,Fargo (1996),508,4.155512
1234,Return of the Jedi (1983),507,4.00789
860,Liar Liar (1997),485,3.156701
460,"English Patient, The (1996)",481,3.656965
1284,Scream (1996),478,3.441423
1523,Toy Story (1995),452,3.878319
32,Air Force One (1997),431,3.63109
744,Independence Day (ID4) (1996),429,3.438228


Just by eyeing the results, we can see a varied mix of films appearing in the top 10s. The two lists contain different sets of movies; only Star Wars appears in both.

## 1.0 Market Basket Analysis Recommender

Basket Analysis tends to be used in retail to find associations between items being bought. It uses the concepts of Support, Confidence and Lift to decide whether items are associated or not.

E.g. when pizza is bought, garlic bread is also bought in a high proportion of the pizza transactions.

(KDnuggets provides a good detailed explanation of this analysis: https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html)
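
As a small illustration of the three measures, the sketch below computes them for the pizza and garlic bread example. The baskets are made up for illustration and have nothing to do with the MovieLens data; the `support` helper is a hypothetical name used only here.

```python
# Hypothetical baskets, invented purely for illustration
baskets = [
    {"pizza", "garlic bread"},
    {"pizza", "garlic bread", "cola"},
    {"pizza"},
    {"cola"},
    {"garlic bread", "cola"},
]
n = len(baskets)


def support(*items):
    """Share of baskets that contain every item in `items`."""
    return sum(set(items) <= basket for basket in baskets) / n


support_pizza = support("pizza")                  # 3 of 5 baskets
support_bread = support("garlic bread")           # 3 of 5 baskets
support_both = support("pizza", "garlic bread")   # 2 of 5 baskets

# Confidence: of the baskets with pizza, what share also has garlic bread?
confidence = support_both / support_pizza
# Lift: how much more often the pair occurs than if the items were independent
lift = support_both / (support_pizza * support_bread)

print(f"support={support_both:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than would be expected by chance, while a lift below 1 suggests they appear together less often.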

#### 1.1 Data Manipulation

To get the association statistics, each film needs to be matched against every other film that has been reviewed by at least one of the same users.


In [37]:
#create variable of total users - used for calculations later
total_users = rating_items_raw['user_id'].nunique()


In [40]:
#merge table to itself based on user
merge1 = rating_items_raw.merge(rating_items_raw,on='user_id')
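
To see what the self-merge produces, here is a minimal sketch on a made-up three-row ratings table. The data is hypothetical; only the column names mirror `rating_items_raw`.

```python
import pandas as pd

# Hypothetical ratings, with the same columns as rating_items_raw
toy = pd.DataFrame({
    "user_id": [1, 1, 2],
    "rating": [5, 3, 4],
    "movie title": ["Film A", "Film B", "Film A"],
})

# Joining the table to itself on user_id pairs every film a user rated
# with every film that same user rated (including the film itself).
pairs = toy.merge(toy, on="user_id")
print(pairs[["user_id", "movie title_x", "movie title_y", "rating_x", "rating_y"]])
```

User 1 rated two films and so contributes four rows, while user 2 contributes one. The merged table therefore grows with the square of the number of ratings per user, which is why it becomes large on the full dataset.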


#### 1.2 Rating Differences and normalisation

Just because two films are reviewed by the same user, it does not necessarily mean the user liked both films. So the below is an attempt to develop a weight by which an association is upweighted or downweighted depending on whether two films tend to be rated well or badly together.

This is done by taking the average difference in rating between two films, with each difference multiplied by the rating of the first film. A large positive value means users rated the first film highly and the second film much lower, which indicates a bad review association; a low value indicates a better one. The raw difference alone does not capture how well the films were rated (e.g. it is 0 both for two films rated 5 and for two films rated 1), so multiplying by the rating of the first film is an attempt to give more emphasis to cases where the first film was rated highly.

Normalising the differences so that they lie (approximately) between 0 and 1 then gives a value we can subtract from 1 to obtain a weight, which is used to penalise bad review associations.

In [41]:
#Rating difference between the two films, scaled by the rating of the first film
merge1['rating_diff'] = (merge1['rating_x'] - merge1['rating_y'])*merge1['rating_x']
#Get user count of movies reviewed by same users and average rating difference of the two movies by those users
merge2 = merge1.groupby(['movie title_x','movie title_y']).agg({'user_id':'count','rating_x':'mean','rating_diff':'mean'}).reset_index()
#Get minimum rating difference, offset by 0.01 before normalising
#(with this offset the smallest normalised value is slightly below 0 rather than exactly 0)
minim = (merge2['rating_diff'].min())+0.01
#Min-max normalise rating differences
merge2['rating_diff_norm'] = (merge2['rating_diff'] - (minim)) / (merge2['rating_diff'].max() - minim)
#Squared normalised difference (not used in the calculations below)
merge2['rating_diff_norm2'] = merge2['rating_diff_norm'].pow(2)
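
To make the weighting concrete, the sketch below applies the same formula to a few hand-picked rating pairs. The pairs are hypothetical, and the scaling is simplified to plain min-max over those four rows (no 0.01 offset), so the numbers will not match the notebook's.

```python
import pandas as pd

# Hypothetical (rating_x, rating_y) pairs for two films
toy = pd.DataFrame({"rating_x": [5, 5, 1, 1], "rating_y": [5, 1, 1, 5]})

# Same formula as above: difference scaled by the first film's rating
toy["rating_diff"] = (toy["rating_x"] - toy["rating_y"]) * toy["rating_x"]

# Plain min-max scaling to [0, 1] (simplified: no 0.01 offset)
lo, hi = toy["rating_diff"].min(), toy["rating_diff"].max()
toy["rating_diff_norm"] = (toy["rating_diff"] - lo) / (hi - lo)

# Weight later applied to support: 1 - normalised difference
toy["weight"] = 1 - toy["rating_diff_norm"]
print(toy)
```

In this toy case the 5/1 pair gets a weight of 0, the 5/5 and 1/1 pairs get the same weight as each other, and the 1/5 pair gets the largest weight. That is worth keeping in mind when interpreting the weighted lift.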

#### 1.3 Support, Confidence and Lift Calculations

In [44]:
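#discard rows where movie titles match
#.copy() gives an independent DataFrame, so the column assignments below do not act on a slice of merge2
multi_item_support = merge2[merge2['movie title_x'] != merge2['movie title_y']].copy()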

In [71]:
#Find support
multi_item_support['multi_support']=multi_item_support['user_id']/total_users
#Calculate weighted support
multi_item_support['multi_support_upweight']=multi_item_support['multi_support']*(1-multi_item_support['rating_diff_norm'])



Find individual movie supports

In [266]:
#count of users per movie (copied so that adding a support column does not modify average_rating)
item_support = average_rating.copy()

In [267]:
item_support['support']=item_support['ratingCount']/total_users

Merge individual movie supports to the original table twice

In [269]:
#Merge for Movie 1
total1 = multi_item_support.merge(item_support, left_on='movie title_x', right_on='movie title')
total1b = total1.drop(['movie title','ratingCount'], axis=1)

In [272]:
#Merge for Movie 2
total2 = total1b.merge(item_support, left_on='movie title_y', right_on='movie title')
total2b = total2.drop(['movie title','ratingCount'], axis=1)

Calculate Confidence and two versions of Lift (one without weights, and the other weighted)

In [275]:
total2b['Confidence']= total2b['multi_support']/total2b['support_x']
total2b['Lift']= total2b['multi_support']/(total2b['support_x']*total2b['support_y'])
total2b['Lift2']= total2b['multi_support_upweight']/(total2b['support_x']*total2b['support_y'])





#### 1.4 Model Evaluation

Find the top 10 films associated with Star Wars.

Set confidence to greater than 0.5, which means that more than half of the users who rated Star Wars also rated the other film, i.e. the number of users who rated both films is greater than the number who rated Star Wars but did not rate the other film. The cut-off additionally requires more than 300 users to have rated both films.

In [277]:
film = 'Star Wars (1977)'

In [278]:
cut_off = 300
confidence = 0.5

In [280]:
x = total2b[(total2b['movie title_x']==film) & (total2b['user_id'] > cut_off) & (total2b['Confidence'] > confidence)].sort_values(by=['Lift2'], ascending=False)
x['movie title_y'].head(10)

331343               Empire Strikes Back, The (1980)
857083                Raiders of the Lost Ark (1981)
882736                     Return of the Jedi (1983)
444288                         Godfather, The (1972)
958572              Silence of the Lambs, The (1991)
543597     Indiana Jones and the Last Crusade (1989)
846368                           Pulp Fiction (1994)
81820                      Back to the Future (1985)
365002                                  Fargo (1996)
1099451                             Toy Story (1995)
Name: movie title_y, dtype: object

In [281]:
y = total2b[(total2b['movie title_x']==film) & (total2b['user_id'] > cut_off)& (total2b['Confidence'] > confidence)].sort_values(by=['Lift'], ascending=False)
y['movie title_y'].head(10)

882736                     Return of the Jedi (1983)
331343               Empire Strikes Back, The (1980)
543597     Indiana Jones and the Last Crusade (1989)
857083                Raiders of the Lost Ark (1981)
81820                      Back to the Future (1985)
1016487              Star Trek: First Contact (1996)
444288                         Godfather, The (1972)
958572              Silence of the Lambs, The (1991)
540759                 Independence Day (ID4) (1996)
1099451                             Toy Story (1995)
Name: movie title_y, dtype: object

The recommendations seem logical. Users who rate Star Wars also rate its sequels, as well as the Indiana Jones films (also produced by Lucasfilm and starring Harrison Ford, so this seems sensible).

Typical basket analysis looks at two items that are likely to appear in a basket together, and recommendations are made on that basis. Its scope is very narrow, and it works very well if the aim is to increase basket size or spend. This adapted basket analysis looks at a customer's "lifetime" basket (i.e. everything they have ever watched/rated) and makes recommendations on that, so in this sense it is more akin to Collaborative Filtering.

However, one major difference between this analysis and Collaborative Filtering is that here we are recommending based on _association_, whereas in Collaborative Filtering we are recommending based on _similarity_. Similarity is far more personalised, as it narrows the scope to users that are similar to each other.

## 2.0 Pearson R Correlation Recommender


Correlation based recommenders are simple but really effective collaborative filtering methodologies built on the user-item matrix. For this example, we start by looking at a film (Star Wars) and comparing its ratings by users to the ratings the same users gave every other film. This produces a correlation (similarity) with each and every other film, so films with high **positive** correlations can be recommended, as they are rated in a similar fashion by users.
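
As a minimal sketch of the idea, the toy example below correlates one film's ratings with every other column using pandas' `corrwith`, which only uses the users who rated both films (missing ratings are `NaN`). The ratings are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical user x film ratings; NaN means "not rated"
toy = pd.DataFrame({
    "Film A": [5, 4, 1, 2, np.nan],
    "Film B": [4, 5, 2, 1, 3],
    "Film C": [1, 2, 5, np.nan, 4],
})

# Pearson correlation of every column with Film A, over co-rated users only
similar_to_a = toy.corrwith(toy["Film A"])
print(similar_to_a.sort_values(ascending=False))
```

Here Film B is rated in a similar fashion to Film A (positive correlation), while Film C is rated in the opposite way by the users who rated both (negative correlation).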

In [341]:

ratings_pivot = rating_items_raw.pivot_table(index='user_id', columns='movie title',values='rating', aggfunc='min')
user_id = ratings_pivot.index
movie_title = ratings_pivot.columns

In [167]:
film = 'Star Wars (1977)'

film_idx = ratings_pivot.columns.get_loc(film)

In [194]:
film_ratings = ratings_pivot[film]
similar_to_film = ratings_pivot.corrwith(film_ratings)
corr_film = pd.DataFrame(similar_to_film, columns=['pearsonR']).reset_index()
corr_film.dropna(inplace=True)
corr_summary = corr_film.merge(average_rating,on='movie title')



In [195]:
print("recommendation for :", film)
corr_summary[(corr_summary['ratingCount']>=300) & (corr_summary['pearsonR']!=1)].sort_values('pearsonR', ascending=False).head(10)

recommendation for : Star Wars (1977)


Unnamed: 0,movie title,pearsonR,ratingCount,rating
397,"Empire Strikes Back, The (1980)",0.747981,367,4.20436
1051,Return of the Jedi (1983),0.672556,507,4.00789
1026,Raiders of the Lost Ark (1981),0.536117,420,4.252381
647,Indiana Jones and the Last Crusade (1989),0.350107,331,3.930514
388,E.T. the Extra-Terrestrial (1982),0.303619,300,3.833333
97,Back to the Future (1985),0.274839,350,3.834286
1243,"Terminator, The (1984)",0.262255,301,3.933555
1001,"Princess Bride, The (1987)",0.259711,324,4.17284
1188,Star Trek: First Contact (1996),0.255529,365,3.660274
644,Independence Day (ID4) (1996),0.248754,429,3.438228


Again we have a logical list of similar movie recommendations, with the other Star Wars and Indiana Jones films, as well as other 80s sci-fi/fantasy family classics: E.T., Back to the Future, The Terminator and The Princess Bride.

An interesting note here is that some films could have a high **negative** correlation. This could suggest two films that are polar opposites, in that users who rated Star Wars highly rate the other film really poorly.
But looking at the least correlated movies, we do not have anything with a high negative correlation, so nothing too interesting.

In [196]:
corr_summary[(corr_summary['ratingCount']>=300) & (corr_summary['pearsonR']!=1)].sort_values('pearsonR', ascending=True).head(10)

Unnamed: 0,movie title,pearsonR,ratingCount,rating
1012,Pulp Fiction (1994),-0.021568,394,4.060914
673,Jerry Maguire (1996),0.038988,384,3.710938
1134,"Silence of the Lambs, The (1991)",0.039488,390,4.289744
432,Fargo (1996),0.044415,508,4.155512
1067,"Rock, The (1996)",0.047937,378,3.693122
401,"English Patient, The (1996)",0.049013,481,3.656965
497,"Full Monty, The (1997)",0.060216,315,3.926984
253,Chasing Amy (1997),0.062663,379,3.83905
1274,Titanic (1997),0.081928,350,4.245714
1088,"Saint, The (1997)",0.10584,316,3.123418


### Personalised Recommendations

The above approach recommends films based on a single film, in this case Star Wars. However, we could just as easily make this personalised and recommend films based on a single user's rating history.

This would involve transposing the matrix so that we have movies as rows and users as columns. We would then pick a single user and look at all the users who have a high correlation with that user.

From that list of users, to get a list of recommended films, we could take the movies with the highest average rating that the focus user has not watched (or use a similar approach, e.g. weighting ratings by correlation).
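
The steps above could be sketched roughly as follows. This is only an illustration on made-up ratings, not something run on the MovieLens data; the names `target` and `min_corr` and the 0.5 cut-off are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical user x film ratings (rows: users, columns: films); NaN = not rated
ratings = pd.DataFrame(
    {
        "Film A": [5, 5, 1, 4],
        "Film B": [4, 5, 2, 5],
        "Film C": [1, 2, 5, 1],
        "Film D": [np.nan, 5, 1, 4],
        "Film E": [np.nan, 2, 5, np.nan],
    },
    index=["u1", "u2", "u3", "u4"],
)

target = "u1"    # the user we want recommendations for (illustrative)
min_corr = 0.5   # illustrative cut-off for what counts as a "similar" user

# Transpose so users become columns, then correlate every user with the target
by_user = ratings.T
user_corr = by_user.corrwith(by_user[target]).drop(target)
similar_users = user_corr[user_corr > min_corr].index

# Average the similar users' ratings for films the target has not rated
unseen = ratings.columns[ratings.loc[target].isna()]
recommendations = (
    ratings.loc[similar_users, unseen].mean().sort_values(ascending=False)
)
print(recommendations)
```

A plain mean is the simplest choice here; weighting each similar user's rating by their correlation with the target user would be a natural refinement.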

#### Cold Start

This approach is highly reliant on good data to compute correlation coefficients. It does not do well with missing or low levels of data. If a user has a low count of reviews, it will be hard to find highly correlated users, and any high correlations that are found may be suspect.
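
One simple safeguard, sketched below on made-up data, is to count how many ratings each correlation is actually based on and to distrust correlations built on very few of them. The example uses films as columns, but the same applies when correlating users. The notebook's `ratingCount >= 300` filter is a related, coarser idea.

```python
import numpy as np
import pandas as pd

# Hypothetical user x film ratings; NaN means "not rated"
toy = pd.DataFrame({
    "Film A": [5, 4, 1, 2, 3],
    "Film B": [4, 5, 2, 1, np.nan],
    "Film C": [np.nan, np.nan, np.nan, 1, 2],
})

# Number of users who rated both Film A and each other film
rated = toy.notna()
overlap = rated.loc[rated["Film A"]].sum()

corr = toy.corrwith(toy["Film A"])
summary = pd.DataFrame({"pearsonR": corr, "overlap": overlap})
print(summary)
```

Film C ends up with a perfect correlation to Film A, but it rests on only two co-rated users, which is exactly the kind of result that should be treated as suspect.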

## 3.0 k-Nearest Neighbour Recommender

The correlation method above finds movies/users that have an overall similarity to the focus movie/user.

The nearest neighbour method looks at similarity between two n-dimensional vectors in an n-dimensional space, based on a metric such as euclidean (distance) or cosine (angle).

For this example, we will use cosine similarity, which is the dot product of the two vectors divided by the product of the two vectors' lengths (or magnitudes).

(formula and more info can be found here: https://en.wikipedia.org/wiki/Cosine_similarity)
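
As a quick illustration of that definition, the sketch below computes cosine similarity with NumPy for a few made-up rating vectors. The `cosine_similarity` helper and the vectors are hypothetical and only serve this example.

```python
import numpy as np


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Hypothetical rating vectors for three films across four users (0 = not rated)
film_a = np.array([4.0, 2.0, 0.0, 2.0])
film_b = np.array([2.0, 1.0, 0.0, 1.0])   # same direction as film_a, half the length
film_c = np.array([0.0, 0.0, 5.0, 0.0])   # no users in common with film_a

sim_ab = cosine_similarity(film_a, film_b)
sim_ac = cosine_similarity(film_a, film_c)
print(sim_ab, sim_ac)
```

Because only the angle matters, `film_a` and `film_b` are maximally similar even though their magnitudes differ, while `film_c` shares no raters with `film_a` and has a similarity of 0. Note that scikit-learn's `metric='cosine'` reports cosine *distance*, i.e. 1 minus this similarity.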


In [226]:
rating_items2['ratingCount'].quantile(np.arange(0,1,.05))

0.00      1.0
0.05     19.0
0.10     35.0
0.15     47.0
0.20     62.0
0.25     72.0
0.30     86.0
0.35    100.0
0.40    116.0
0.45    128.0
0.50    146.0
0.55    162.0
0.60    175.0
0.65    194.0
0.70    218.0
0.75    240.0
0.80    264.0
0.85    295.0
0.90    336.0
0.95    420.0
Name: ratingCount, dtype: float64

Looking at the distribution of rating counts (computed per rating row, so popular films are counted many times), 5% of ratings belong to films with 19 or fewer ratings. Since we want a good number of ratings for the analysis, we will limit to films with more than 100 user ratings, which retains roughly 65% of the ratings.

In [282]:
rating_items3 = rating_items2[rating_items2['ratingCount'] > 100]

In [283]:
rating_items_piv = rating_items3.pivot_table(index='movie title',columns='user_id',values='rating', aggfunc='min').fillna(0)

For quicker calculations we convert our DataFrame to a CSR (compressed sparse row) matrix.

In [198]:
from scipy.sparse import csr_matrix
rating_items_matrix = csr_matrix(rating_items_piv.values)

We fit the cosine k-NN model to the movie-by-user matrix with algorithm='brute' (which is needed when using the cosine metric).

In [298]:
from sklearn.neighbors import NearestNeighbors

model_knn = NearestNeighbors(metric='cosine',algorithm='brute')
model_knn.fit(rating_items_matrix)

NearestNeighbors(algorithm='brute', leaf_size=30, metric='euclidean',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)

Get the film's index

In [299]:
film_idx = rating_items_piv.index.get_loc(film)

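Find the 10 nearest neighbours to Star Wars, based on how close they are to it. (Note that the fitted-model output above reports metric='euclidean', and the distances printed below, roughly 50 to 80, are on a Euclidean scale; cosine distances between non-negative rating vectors would lie between 0 and 1.)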

In [353]:
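# .values gives the row as a NumPy array, which can be reshaped into a single query row
# n_neighbors = 11 because the closest match is expected to be the film itself
distances, indices = model_knn.kneighbors(rating_items_piv.iloc[film_idx, :].values.reshape(1, -1), n_neighbors = 11)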

for i in range(0, len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(rating_items_piv.index[film_idx]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, rating_items_piv.index[indices.flatten()[i]], distances.flatten()[i]))

Recommendations for Star Wars (1977):

1: Return of the Jedi (1983), with distance of 50.149775672479336:
2: Raiders of the Lost Ark (1981), with distance of 69.50539547402057:
3: Empire Strikes Back, The (1980), with distance of 71.06335201775947:
4: Toy Story (1995), with distance of 73.08898685848642:
5: Independence Day (ID4) (1996), with distance of 77.42092740338364:
6: Indiana Jones and the Last Crusade (1989), with distance of 77.78174593052023:
7: Godfather, The (1972), with distance of 78.2559901860554:
8: Star Trek: First Contact (1996), with distance of 79.31582439841371:
9: Back to the Future (1985), with distance of 79.45438943192504:
10: Silence of the Lambs, The (1991), with distance of 80.24961059095551:


### Personalised Recommendations

Just like the correlation method, this method can be personalised to recommend films based on a single user's rating history.

This would again involve making the user the focus: we would find users that are 'close' to the target user, and make recommendations in the same way as with correlation.

#### Cold Start

This approach suffers from the same problems as correlation.

## 4.0 SVD Recommender

Singular Value Decomposition is one of the latest trends in recommendation engines. It formed the basis of most of the best algorithms for The Netflix Prize: https://en.wikipedia.org/wiki/Netflix_Prize.

It is a matrix factorisation technique that aims to reduce the dimension of the matrix by learning the latent features of users/movies that describe the ratings effectively.

E.g. for a user-item matrix of size m x n (where m is the number of users and n is the number of movies), two smaller matrices (one for users, one for movies) would be created:

- m x d

- n x d

where d is the number of latent dimensions.

The dot product of these two smaller matrices (the first multiplied by the transpose of the second) is the approximation of our original m x n matrix.


For SVD, our original matrix X is split into **3 smaller matrices** U, S and V, with the diagonal elements of S being the singular values of X, U (similar to m x d) representing the user features and V (similar to n x d) representing the item features.

The product of the 3 smaller matrices is the prediction of X:

X_hat = U x S x V^T
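
As a minimal sketch of this decomposition, the example below uses NumPy's `np.linalg.svd` on a small made-up rating matrix and then keeps only the d largest singular values. The matrix and the choice of d = 2 are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 4 users x 3 films rating matrix
X = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Thin SVD: X = U @ diag(s) @ Vt, with singular values s sorted largest first
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_full = U @ np.diag(s) @ Vt

# Keep only the d largest singular values for a low-rank approximation
d = 2
X_hat = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

print(np.round(s, 3))
print(np.round(X_hat, 2))
```

Using all singular values reproduces X exactly (up to floating point error), while truncating to d dimensions gives an approximation whose error is determined by the singular values that were dropped.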



Utilising SVD is a great way to predict user ratings, which suggests why it has been used so successfully in competitions like the Netflix Prize. However, we will adapt it to find movies similar to Star Wars.

For this we will only look at reducing the dimension of 

In [354]:
rating_items_piv

movie title,1,2,3,4,5,6,7,8,9,10,...,934,935,936,937,938,939,940,941,942,943
101 Dalmatians (1996),2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,...,2.0,0.0,0.0,2.0,4.0,0.0,0.0,0.0,0.0,0.0
12 Angry Men (1957),5.0,0.0,0.0,0.0,0.0,4.0,4.0,0.0,0.0,5.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2001: A Space Odyssey (1968),4.0,0.0,0.0,0.0,4.0,5.0,5.0,0.0,0.0,5.0,...,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0
Absolute Power (1997),0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0
"Abyss, The (1989)",3.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,4.0,...,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0
Ace Ventura: Pet Detective (1994),3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
"Adventures of Priscilla, Queen of the Desert, The (1994)",0.0,0.0,0.0,0.0,5.0,0.0,4.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0
"African Queen, The (1951)",0.0,0.0,0.0,0.0,0.0,4.0,5.0,0.0,0.0,5.0,...,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0
Air Force One (1997),0.0,4.0,2.0,5.0,0.0,0.0,4.0,0.0,0.0,0.0,...,0.0,4.0,3.0,4.0,3.0,0.0,5.0,4.0,5.0,0.0
Aladdin (1992),4.0,0.0,0.0,0.0,4.0,2.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,5.0,0.0


In [365]:
rating_items_matrix

<334x943 sparse matrix of type '<class 'numpy.float64'>'
	with 64407 stored elements in Compressed Sparse Row format>