RECOMMENDATION PROBLEMS
Types
	- Collaborative Filtering (Alternating Least Squares OR Nearest Neighbour) 
	- Association Rules
	- Content Based Filtering 

	1. Compute the predicted ratings of all movies for all users
	2. Sort the movies in descending order of predicted rating
	3. Pick the top 10 movies the user hasn't watched
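The three steps can be sketched for a single user in NumPy. This assumes the predicted ratings from step 1 are already available as an array and that the watched movies are given as a boolean mask. `top_n_unwatched` is a hypothetical helper written for illustration, not part of any library.

```python
import numpy as np

def top_n_unwatched(predicted, watched, n=10):
    """Indices of the n unwatched movies with the highest predicted rating.

    predicted: 1-D float array, predicted rating for every movie (step 1)
    watched:   1-D bool array, True where the user has already watched the movie
    """
    candidates = np.flatnonzero(~watched)                        # unwatched movie indices
    order = np.argsort(-predicted[candidates], kind="stable")    # step 2: descending order
    return candidates[order][:n]                                 # step 3: top n


# Toy data: movie 2 has the highest prediction but is already watched.
predicted = np.array([4.1, 2.0, 4.8, 3.3, 4.5])
watched = np.array([False, False, True, False, False])
print(top_n_unwatched(predicted, watched, n=2))  # [4 0]
```

The watched movies are removed before sorting, so asking for 10 always returns up to 10 new movies. Filtering after sorting would return fewer.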

Collaborative Filtering Algorithms - predict a user's rating of an item based on that user's previous behaviour (and the behaviour of similar users)
Purchases/Browsing/Clicks/Reviews -> Collaborative Filtering -> Top Picks/Like This/Recommendation
Basically, if two users like many of the same products, each is likely to also like other products the other one likes
Training Data -> User_ID | Product_ID | Rating (Explicit, e.g. star ratings / Implicit, e.g. purchases or clicks)
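The "two users who like the same products" idea is what the nearest-neighbour flavour of collaborative filtering uses directly. The sketch below assumes a small made-up user x product matrix where 0 means "not rated." It finds the most similar user by cosine similarity and ranks the target user's unrated products by that neighbour's ratings. The helper names are hypothetical, and treating 0 as a rating inside the cosine is a simplification.

```python
import numpy as np

# Rows = users, columns = products; 0 means "not rated" (illustrative data only).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 1],
    [1, 0, 2, 5],
], dtype=float)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_user(ratings, user):
    # Similarity of `user` to every other user; the user itself is excluded with -inf.
    sims = [cosine_similarity(ratings[user], ratings[other]) if other != user else -np.inf
            for other in range(len(ratings))]
    return int(np.argmax(sims))

def recommend_from_neighbour(ratings, user):
    neighbour = most_similar_user(ratings, user)
    unrated = np.flatnonzero(ratings[user] == 0)
    # Rank the user's unrated products by the neighbour's rating, highest first.
    return unrated[np.argsort(-ratings[neighbour, unrated], kind="stable")]

print(recommend_from_neighbour(ratings, 0))  # [2]
```

User 0 and user 1 rate products 0, 1 and 3 similarly, so product 2, which user 1 rated 5, is recommended to user 0.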


RECOMMENDATIONS WITH COLLABORATIVE FILTERING (USING ALTERNATING LEAST SQUARES (ALS))
(SUPERVISED)
Library: implicit (for implicit-feedback data) -> alternating_least_squares

Latent Factor Analysis -> hidden influence analysis (e.g. Genre/Cast/Success/Recent)
Rate every user on how much each of these factors matters to them / Rate every movie on a scale of 1-5 for each factor
So every user and every movie is now represented by 4 numbers, i.e. a point in 4-D space
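With users and movies in the same 4-D factor space, a predicted rating is the dot product of the two vectors. The numbers below are invented for illustration. They assume the user's four values are preference weights that sum to 1, which keeps the result on the movies' 1-5 scale.

```python
import numpy as np

FACTORS = ["genre", "cast", "success", "recent"]  # the illustrative factors from the text

# Hypothetical user: how much each factor matters to them (weights sum to 1).
user = np.array([0.4, 0.3, 0.2, 0.1])
# Hypothetical movie: scored 1-5 on each factor.
movie = np.array([5.0, 3.0, 4.0, 2.0])

# 0.4*5 + 0.3*3 + 0.2*4 + 0.1*2
predicted_rating = float(user @ movie)
print(round(predicted_rating, 2))  # 3.9
```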
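User_ID | Product_ID | Rating -> Latent Factor Analysis -> Users quantified by some factors | Products quantified by some factors
The factors are not known beforehand; they are learned from the ratings
Represent the training data as a rating matrix R (User x Product) and factor it into a User Factor Matrix P and a Product Factor Matrix Q: $R \approx P Q^{\top}$, so e.g. $R_{43} = P_4 \cdot Q_3$ (user 4's rating of product 3 is the dot product of row 4 of P and row 3 of Q)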
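The matrix form is easiest to see through the shapes. This sketch uses random factor matrices with small illustrative sizes. It shows that multiplying P by the transpose of Q fills in a predicted rating for every user/product pair, and that a single entry is just one user row dotted with one product row. NumPy indices start at 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_products, k = 6, 5, 4        # k = number of latent factors (illustrative sizes)

P = rng.random((n_users, k))            # user factor matrix: one row per user
Q = rng.random((n_products, k))         # product factor matrix: one row per product

R_hat = P @ Q.T                         # full predicted rating matrix
print(R_hat.shape)                      # (6, 5)

# One entry = dot product of one user row and one product row (user 4, product 3).
assert np.isclose(R_hat[4, 3], P[4] @ Q[3])
```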

We only have ratings for the items each user has actually reviewed, so most of R is empty
Once the matrix is decomposed into P and Q, we can compute a predicted rating for any user/product pair, including products the user has never rated
The problem is expressed as finding the user and product factor vectors (P, Q) that minimize the total error against the known ratings in the Training Data
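"Total error against the Training Data" only counts the cells of R that hold a known rating. The empty cells are what the model is asked to predict, so they are not scored. The sketch below assumes the known cells are marked with a boolean mask. `observed_squared_error` is a hypothetical helper name.

```python
import numpy as np

def observed_squared_error(R, observed, P, Q):
    """Sum of squared errors over the known ratings only.

    R:        (users x products) rating matrix; unobserved entries can hold anything
    observed: boolean mask of the same shape, True where a rating is known
    P, Q:     user and product factor matrices, so that R is approximated by P @ Q.T
    """
    residual = (R - P @ Q.T)[observed]   # keep only the cells we have ratings for
    return float(np.sum(residual ** 2))


R = np.array([[5.0, 0.0], [0.0, 3.0]])
observed = np.array([[True, False], [False, True]])
print(observed_squared_error(R, observed, np.eye(2), np.eye(2)))  # 20.0
```

Here the identity factors predict 1 on the diagonal, so the two known ratings contribute (5 - 1)^2 + (3 - 1)^2 = 20.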

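Alternating Least Squares (ALS) -> an optimization technique for minimizing this error
Rui = actual rating given by user u to item i
Qi = Item Factor Vector (for item i)
Pu = User Factor Vector (for user u)
The predicted rating is Qi · Pu. With all Qi fixed, the error is a quadratic function of each Pu (and vice versa), so each step is an ordinary least-squares problem with a closed-form solution. ALS alternates between solving for all Pu and all Qi until the values no longer change (or for a fixed number of iterations)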
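Written out, with $r_{ui}$, $q_i$, $p_u$ standing for Rui, Qi, Pu and $K$ for the set of (user, item) pairs that have a known rating, the unregularized objective and the user step of ALS look like this. $Q_u$ stacks the $q_i^{\top}$ of the items user $u$ rated and $r_u$ holds those ratings. The item step is symmetric, with the roles of $p$ and $q$ swapped.

```latex
\min_{P,\,Q} \sum_{(u,i) \in K} \left( r_{ui} - q_i^{\top} p_u \right)^2
\qquad\Longrightarrow\qquad
p_u = \left( Q_u^{\top} Q_u \right)^{-1} Q_u^{\top} r_u \quad (\text{all } q_i \text{ fixed})
```

The inverse only exists when $Q_u^{\top} Q_u$ is non-singular, which can fail for users with few ratings. The regularization term addresses this as well as overfitting.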
The number of hidden factors can be chosen by the user 

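Add a regularization term that penalizes large values in the factor vectors, so the model does not overfit the known ratings (a bigger risk the more factors there are)
The regularization term has a lambda (λ) parameter which the user can tune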
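A from-scratch sketch of ALS for explicit ratings, assuming the regularized objective $\sum_{(u,i) \in K} (r_{ui} - q_i^{\top} p_u)^2 + \lambda (\sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2)$. With λ added, each step becomes a small ridge regression: $p_u = (Q_u^{\top} Q_u + \lambda I)^{-1} Q_u^{\top} r_u$, and likewise for $q_i$. This is only meant to show the mechanics. The `implicit` library used below implements a confidence-weighted variant aimed at implicit feedback, and `als` here is a hypothetical function, not its API.

```python
import numpy as np

def als(R, observed, k=2, lam=0.1, iterations=20, seed=0):
    """Factor R ~= P @ Q.T using only the observed ratings (explicit-rating ALS sketch).

    R:        (n_users x n_items) float array of ratings
    observed: boolean mask, True where R holds a known rating
    k:        number of latent factors (chosen by the user)
    lam:      regularization strength lambda
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))   # small random starting factors
    Q = rng.normal(scale=0.1, size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(iterations):
        # Fix Q and solve a ridge regression for each user vector p_u.
        for u in range(n_users):
            Qu = Q[observed[u]]                    # factors of the items user u rated
            P[u] = np.linalg.solve(Qu.T @ Qu + reg, Qu.T @ R[u, observed[u]])
        # Fix P and solve a ridge regression for each item vector q_i.
        for i in range(n_items):
            Pi = P[observed[:, i]]                 # factors of the users who rated item i
            Q[i] = np.linalg.solve(Pi.T @ Pi + reg, Pi.T @ R[observed[:, i], i])
    return P, Q


R = np.array([[5, 3, 1], [4, 3, 1], [1, 1, 5], [1, 2, 4]], dtype=float)
observed = np.ones_like(R, dtype=bool)
observed[0, 2] = False                             # pretend user 0 never rated item 2
P, Q = als(R, observed, k=2, lam=0.1, iterations=50)
print((P @ Q.T)[0, 2])                             # predicted rating for the missing cell
```

Because λ > 0, the matrix passed to `np.linalg.solve` is always invertible, even for a user or item with no ratings.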


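In [1]:
import implicit
import pandas as pd

# MovieLens 100k u.data: tab-separated userId, itemId, rating, timestamp (timestamp is not loaded)
dataFile = '/pluralsight/ml-100k/u.data'
data = pd.read_csv(dataFile, sep="\t", header=None, usecols=[0, 1, 2], names=['userId', 'itemId', 'rating'])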

In [2]:
data.head()

   userId  itemId  rating
0     196     242       3
1     186     302       3
2      22     377       1
3     244      51       2
4     166     346       1


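In [3]:
# Convert the three-column dataframe (userId, itemId, rating) into a sparse rating matrix.
# Rows are item category codes and columns are user category codes (an item x user matrix).
# Category codes are 0-based positions in the sorted list of IDs, not the raw MovieLens IDs.
from scipy.sparse import coo_matrix

data['userId'] = data['userId'].astype("category")
data['itemId'] = data['itemId'].astype("category")
rating_matrix = coo_matrix((data['rating'].astype(float),
                           (data['itemId'].cat.codes.copy(),
                            data['userId'].cat.codes.copy())))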

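In [5]:
# Call alternating_least_squares to break the rating matrix down into user factors and item factors.
# It takes two parameters: the number of factors and the regularization parameter (lambda).
# Check which is which: ML-100k has 943 users and 1682 items, so user_factors.shape[0] should be 943.
# rating_matrix has items as rows, and the deprecated function may return the row (item) factors first.
user_factors, item_factors = implicit.alternating_least_squares(rating_matrix, factors=10, regularization=0.01)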

This method is deprecated. Please use the AlternatingLeastSquares class instead
100%|████████████████████████████████████████| 15.0/15 [00:00<00:00, 19.12it/s]


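In [6]:
# Predict a score for every movie for one user: dot each item factor vector with that user's vector.
# Rows are indexed by category code, which for ML-100k is userId - 1, so row 196 is userId 197;
# data['userId'].cat.categories.get_loc(196) gives the row for userId 196.
user196=item_factors.dot(user_factors[196])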

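In [7]:
# Sort the predicted scores and pick the indices of the top three. These are item category codes
# (itemId = data['itemId'].cat.categories[code]); movies the user already rated are not excluded.
import heapq
heapq.nlargest(3, range(len(user196)), user196.take)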

[17, 89, 473]
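Steps 1-3 from the top of these notes can be applied to the factor matrices with a raw MovieLens userId. The raw ID is translated to its category code first. Movies the user already rated are skipped, and codes are translated back to item IDs at the end. This sketch assumes the categorical columns from In [3] and assumes that row k of each factor matrix belongs to category code k. `recommend_for_user` is a hypothetical helper, and the toy data below stands in for MovieLens and the trained factors.

```python
import numpy as np
import pandas as pd

def recommend_for_user(data, user_factors, item_factors, user_id, n=10):
    """Top-n item IDs the user has not rated, ranked by predicted score (hypothetical helper)."""
    users = data['userId'].cat.categories
    items = data['itemId'].cat.categories
    u = users.get_loc(user_id)                                  # raw userId -> row (category code)
    scores = item_factors @ user_factors[u]                     # predicted score for every item
    rated = data.loc[data['userId'] == user_id, 'itemId'].cat.codes.to_numpy()
    scores[rated] = -np.inf                                     # skip movies already rated
    top = np.argsort(-scores, kind="stable")[:n]
    top = top[np.isfinite(scores[top])]                         # never return rated movies
    return items[top].tolist()                                  # codes -> raw item IDs


# Toy stand-ins for the ratings dataframe and trained factors (not MovieLens data).
data = pd.DataFrame({'userId': [10, 10, 20, 20, 30],
                     'itemId': [100, 200, 200, 300, 400],
                     'rating': [5, 3, 4, 2, 1]})
data['userId'] = data['userId'].astype("category")
data['itemId'] = data['itemId'].astype("category")
user_factors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])             # one row per user code
item_factors = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.0]])  # one row per item code

print(recommend_for_user(data, user_factors, item_factors, user_id=10))  # [400, 300]
```

User 10 already rated items 100 and 200, which score highest, so the remaining items are returned in score order.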