In [123]:
import numpy as np
import tensorflow as tf
import sklearn
import csv
import pandas as pd

# MovieLens 100k

In [134]:
names = ['user_id', 'item_id', 'rating', 'timestamp']
result = pd.read_csv('ml-100k/u.data', names=names, sep='\t')

In [140]:
print(result)

       user_id  item_id  rating  timestamp
0          196      242       3  881250949
1          186      302       3  891717742
2           22      377       1  878887116
3          244       51       2  880606923
4          166      346       1  886397596
5          298      474       4  884182806
6          115      265       2  881171488
7          253      465       5  891628467
8          305      451       3  886324817
9            6       86       3  883603013
10          62      257       2  879372434
11         286     1014       5  879781125
12         200      222       5  876042340
13         210       40       3  891035994
14         224       29       3  888104457
15         303      785       3  879485318
16         122      387       5  879270459
17         194      274       2  879539794
18         291     1042       4  874834944
19         234     1184       2  892079237
20         119      392       4  886176814
21         167      486       4  892738452
22         

# Introduction

In model-based methods, a summarized model of data is created up front, as with supervised and unsupervised learning methods. Therefore, the training is clearly separated from the prediction phase. <br>
Examples of such methods in traditional machine learning include decision trees, rule-based methods, Bayes classifiers, regression models, support vector machines, and neural networks.

Unlike in data classification, where only the class variable is missing, any entry in the ratings matrix may be missing.

# Decision and Regression Trees

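The Gini index of a set $S$ lies between 0 and 1, with smaller values indicating greater discriminative power: $$ G(S) = 1 - \sum_{i=1}^r p_i^2 $$
Here $p_i$ is the fraction of data points in $S$ that belong to class $i$, and $r$ is the number of classes.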

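When $S$ is split into $S_1$ and $S_2$, containing $n_1$ and $n_2$ points respectively, the overall Gini index of the split is the weighted average:

$$ Gini(S \Rightarrow [S_1, S_2]) = \dfrac{n_1 \cdot G(S_1) + n_2 \cdot G(S_2)}{n_1 + n_2} $$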
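The two formulas above can be checked numerically. The following is a minimal sketch; the helper names `gini_index` and `gini_split` are illustrative and are not part of scikit-learn.

```python
import numpy as np

def gini_index(labels):
    """Gini index G(S) = 1 - sum_i p_i^2 for a 1-D collection of class labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    # p_i is the fraction of points that belong to class i.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def gini_split(left_labels, right_labels):
    """Weighted average of the Gini indices of the two sides of a split."""
    n1, n2 = len(left_labels), len(right_labels)
    return (n1 * gini_index(left_labels) + n2 * gini_index(right_labels)) / (n1 + n2)

labels = np.array([1, 1, 1, 0, 0, 0])
print(gini_index(labels))                  # balanced two-class set
print(gini_split(labels[:3], labels[3:]))  # split that separates the classes perfectly
```

A split that separates the classes perfectly has a Gini index of 0, which is why a decision tree prefers the split with the smallest value.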

### Binary matrix

In [37]:
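class BinaryMatrix():
    """Random 0/1 matrix whose last column is the label and the rest are features."""

    def __init__(self):
        # Populated by random_init().
        self.matrix = None

    def random_init(self, size):
        # Draw every entry uniformly from {0, 1}.
        self.matrix = np.random.randint(2, size=size)

    def get_label(self):
        # The last column is the target.
        return self.matrix[:, -1]

    def get_train_data(self):
        # All columns except the last are features.
        return self.matrix[:, :-1]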

In [40]:
binary_matrix = BinaryMatrix()
binary_matrix.random_init(size=[100, 100])

In [41]:
print(binary_matrix.matrix)
print(binary_matrix.get_label())
print(binary_matrix.get_train_data())

[[1 1 0 ... 1 1 1]
 [0 0 1 ... 1 1 0]
 [0 0 1 ... 0 1 1]
 ...
 [0 0 0 ... 1 0 0]
 [0 1 1 ... 0 0 1]
 [0 0 1 ... 0 0 0]]
[1 0 1 0 0 1 0 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 1 0 0 0
 0 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 0 1
 0 1 0 0 1 1 0 1 0 1 1 1 1 0 1 1 0 1 0 1 0 0 1 0 1 0]
[[1 1 0 ... 1 1 1]
 [0 0 1 ... 1 1 1]
 [0 0 1 ... 0 0 1]
 ...
 [0 0 0 ... 0 1 0]
 [0 1 1 ... 0 0 0]
 [0 0 1 ... 1 0 0]]


In [43]:
from sklearn import tree
from sklearn.model_selection import train_test_split

In [44]:
X_train, X_test, y_train, y_test = train_test_split(binary_matrix.get_train_data(),
                                                   binary_matrix.get_label(), 
                                                    test_size=0.2, random_state=42)

In [58]:
clf = tree.DecisionTreeClassifier(random_state=42)

In [59]:
clf.fit(X=X_train, y=y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best')

In [60]:
predict_test = clf.predict(X_test)

In [61]:
accuracy = np.sum(y_test == predict_test) / len(y_test)
print(accuracy)

0.6


In [62]:
!pip install graphviz

Collecting graphviz
  Downloading https://files.pythonhosted.org/packages/47/87/313cd4ea4f75472826acb74c57f94fc83e04ba93e4ccf35656f6b7f502e2/graphviz-0.9-py2.py3-none-any.whl
predict-client 1.7.2 has requirement grpcio==1.11.0, but you'll have grpcio 1.14.0 which is incompatible.
predict-client 1.7.2 has requirement numpy==1.13.1, but you'll have numpy 1.15.0 which is incompatible.
Installing collected packages: graphviz
Successfully installed graphviz-0.9
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


### Sparse Matrix

In [78]:
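from sklearn.decomposition import TruncatedSVD
# Note: sparse_random_matrix was deprecated in scikit-learn 0.22 and removed in 0.24,
# so this import only works on older releases.
from sklearn.random_projection import sparse_random_matrix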

In [85]:
ratings_matrix = sparse_random_matrix(1000, 1000, density=0.05, random_state=42)

We need to choose the $j^{th}$ item as the target and use the other $n - 1$ columns as features.

In [86]:
svd = TruncatedSVD(n_components=10, n_iter=10, random_state=42)


TruncatedSVD(algorithm='randomized', n_components=10, n_iter=10,
       random_state=42, tol=0.0)

Example: the $j^{th}$ column is the last column.

In [93]:
X_data = ratings_matrix[:, :-1]
y_data = ratings_matrix[:, -1]

In [96]:
svd.fit(X_data)
print(svd.singular_values_)

[1.99437071 1.99214704 1.97315971 1.96586987 1.95468439 1.94150428
 1.9347887  1.93256985 1.92155791 1.90991485]


Then we train a decision tree on the reduced dense matrix of size $m \times d$.

In [97]:
reduction_ratings_matrix = svd.transform(X_data)

In [101]:
print(reduction_ratings_matrix)

[[-0.04078747  0.01740403 -0.02456857 ... -0.13284343  0.05258134
   0.10191544]
 [-0.01047693 -0.0370801  -0.08007284 ... -0.02028828 -0.05122547
  -0.00393312]
 [-0.00324058  0.01206909 -0.05224786 ...  0.06357679  0.03219327
  -0.10568864]
 ...
 [ 0.14103169  0.02498886  0.06968854 ...  0.05991687 -0.0404613
   0.09306438]
 [-0.04791901 -0.05253255 -0.0669215  ... -0.00280977 -0.0266462
   0.05423395]
 [ 0.02091143 -0.01567058 -0.03447271 ...  0.01500747  0.07855035
  -0.03599526]]


In [110]:
clf = tree.DecisionTreeRegressor()

# Convert the sparse target column to a flat 1-D array before fitting.
clf.fit(reduction_ratings_matrix, y_data.toarray().ravel())

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')

In [118]:
print(clf.predict(reduction_ratings_matrix))
print(clf.predict(reduction_ratings_matrix).shape)

[ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.14142136  0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
 -0.14142136  0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.14142136  0.          0.          0.
  0.14142136  0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.14142136  0.
  0.          0.          0.          0.          0.          0.
  0.         -0.14142136 

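The steps above handle a single target item. To predict every rating, we must loop through all items, treating each one in turn as the target column.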
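The loop over items can be sketched as follows. This is only an illustration under several assumptions: the function name `predict_all_items` is hypothetical, the toy matrix is generated with `scipy.sparse.random` rather than `sparse_random_matrix`, and, as in the cells above, missing entries are treated as zeros and each tree is fit on all rows. A more careful version would train each tree only on the users who actually rated item $j$.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.tree import DecisionTreeRegressor

def predict_all_items(ratings, n_components=5, random_state=42):
    """For each item j, reduce the other n - 1 columns with SVD and fit a tree on column j."""
    ratings = sparse.csc_matrix(ratings)
    n_users, n_items = ratings.shape
    predictions = np.zeros((n_users, n_items))
    for j in range(n_items):
        # Features: every column except the target column j.
        other = [k for k in range(n_items) if k != j]
        X = ratings[:, other]
        y = ratings[:, j].toarray().ravel()
        # Reduce the sparse m x (n - 1) matrix to a dense m x d matrix.
        svd = TruncatedSVD(n_components=n_components, random_state=random_state)
        X_reduced = svd.fit_transform(X)
        # Fit one regression tree per target item.
        tree_j = DecisionTreeRegressor(random_state=random_state)
        tree_j.fit(X_reduced, y)
        predictions[:, j] = tree_j.predict(X_reduced)
    return predictions

# Small random sparse matrix standing in for a ratings matrix (an assumption for the demo).
toy = sparse.random(50, 12, density=0.2, random_state=0, format="csc")
predicted = predict_all_items(toy)
print(predicted.shape)
```

Fitting a separate SVD and tree for every item is expensive for large catalogs, so the number of components and the tree depth are the main knobs to keep the cost manageable.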

# Rule-based Collaborative Filtering
(recommenderlab in R)

Consider a transaction database $T = \{ T_1, \ldots, T_m \}$ containing $m$ transactions, which are defined on $n$ items $I$. $I$ is the universal set of items, and each transaction $T_i$ is a subset of the items in $I$.

$(\textbf{Support})$ The $support$ of an itemset $X \subseteq I$ is the fraction of transactions in $T$ of which $X$ is a subset. <br>
If the support of an itemset is at least equal to a predefined threshold $s$, then the itemset is said to be frequent. This threshold is referred to as the $\textit{minimum support}$, and such itemsets are referred to as $\textit{frequent itemsets}$ or $\textit{frequent patterns}$.

$(\textbf{Confidence})$ The confidence of the rule $X \Rightarrow Y$ is the conditional probability that a transaction in $T$ contains $Y$, given that it also contains $X$. Therefore, the confidence is obtained by dividing the support of $X \cup Y$ by the support of $X$.

$(\textbf{Association Rules})$ A rule $X \Rightarrow Y$ is said to be an association rule at a minimum support of $s$ and minimum confidence of $c$, if the following two conditions are satisfied:<br>
1. The support of $X \cup Y$ is at least $s$
2. The confidence of $X \Rightarrow Y$ is at least $c$
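The definitions of support, confidence, and association rules above translate directly into code. The sketch below uses hypothetical helper names and a made-up four-transaction database; it is meant only to illustrate the definitions, not an efficient frequent-pattern miner.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Support of X union Y divided by the support of X."""
    support_x = support(x, transactions)
    if support_x == 0:
        return 0.0  # X never occurs, so the rule has no evidence behind it
    return support(set(x) | set(y), transactions) / support_x

def is_association_rule(x, y, transactions, min_support, min_confidence):
    """Check both conditions: support of X union Y >= s and confidence >= c."""
    return (support(set(x) | set(y), transactions) >= min_support
            and confidence(x, y, transactions) >= min_confidence)

# Toy transaction database (illustrative data only).
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"milk"}),
]

print(support({"bread", "butter"}, transactions))       # 2 of 4 transactions
print(confidence({"bread"}, {"butter"}, transactions))  # 2 of the 3 bread transactions
print(is_association_rule({"bread"}, {"butter"}, transactions, 0.5, 0.6))
```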


# Naive Bayes Collaborative Filtering

Bayes' rule: $$ P(A|B) = \dfrac{P(A) \cdot P(B|A)}{P(B)} $$
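One way Bayes' rule can be applied to ratings is sketched below: take $A$ to be the event that the missing rating equals a value $v$, take $B$ to be the user's observed ratings, and treat those ratings as conditionally independent given $A$ (the naive assumption). The function name, the toy matrix, and the Laplace smoothing parameter `alpha` are assumptions made for this illustration and do not come from the notes above.

```python
import numpy as np

def naive_bayes_posterior(ratings, user, item, values, alpha=1.0):
    """Posterior over `values` for ratings[user, item]; np.nan marks a missing rating."""
    ratings = np.asarray(ratings, dtype=float)
    # Items (other than the target) that this user has rated.
    observed_items = [k for k in range(ratings.shape[1])
                      if k != item and not np.isnan(ratings[user, k])]
    rated_target = ~np.isnan(ratings[:, item])
    log_post = np.zeros(len(values))
    for idx, v in enumerate(values):
        # Users who gave the target item the rating v.
        has_v = rated_target & (ratings[:, item] == v)
        # Prior P(A), with Laplace smoothing.
        prior = (has_v.sum() + alpha) / (rated_target.sum() + alpha * len(values))
        log_post[idx] = np.log(prior)
        for k in observed_items:
            # Likelihood P(B_k | A): among users with rating v on the target,
            # the fraction who rated item k the same way as this user.
            rated_k = has_v & ~np.isnan(ratings[:, k])
            match = rated_k & (ratings[:, k] == ratings[user, k])
            likelihood = (match.sum() + alpha) / (rated_k.sum() + alpha * len(values))
            log_post[idx] += np.log(likelihood)
    # Normalizing plays the role of dividing by P(B).
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Toy binary ratings (illustrative data only); user 0 has not rated item 2.
R = np.array([
    [1, 1, np.nan],
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
])
posterior = naive_bayes_posterior(R, user=0, item=2, values=[0, 1])
print(posterior)
```

Working in log space avoids numerical underflow when a user has rated many items, since the product of many small likelihoods would otherwise round to zero.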

# Using an Arbitrary Classification Model as a Blackbox

The first step is to initialize the missing entries in the matrix with row averages, column averages, or the predictions of any simple collaborative filtering algorithm. Optionally, mean-center each row first to remove user bias, in which case the missing entries are filled with 0.

Then apply the following two-step iterative approach:
1. Use algorithm $A$ to estimate the missing entries of each column by setting it as the target variable and the remaining columns as the feature variables. For the remaining columns, use the current set of filled-in values to create a complete matrix of feature variables. The observed ratings in the target column are used for training, and the missing ratings are predicted.
2. Update all the missing entries based on the predictions of algorithm $A$ on each target column.
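The initialization and the two iterative steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `iterative_imputation` is hypothetical, missing entries are encoded as `np.nan`, row averages are used for initialization, a fixed number of iterations replaces a convergence check, and a shallow `DecisionTreeRegressor` stands in for the arbitrary algorithm $A$.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def iterative_imputation(ratings, make_model, n_iterations=5):
    """Fill np.nan entries by repeatedly predicting each column from the others."""
    ratings = np.asarray(ratings, dtype=float)
    missing = np.isnan(ratings)
    # Initialization: fill missing entries with row (user) averages.
    # Assumes every row has at least one observed rating.
    row_means = np.nanmean(ratings, axis=1, keepdims=True)
    filled = np.where(missing, row_means, ratings)
    for _ in range(n_iterations):
        updated = filled.copy()
        for j in range(ratings.shape[1]):
            target_missing = missing[:, j]
            if not target_missing.any() or target_missing.all():
                continue  # nothing to predict, or nothing to train on
            # Step 1: column j is the target; the other columns, with their
            # current filled-in values, are the features.
            features = np.delete(filled, j, axis=1)
            model = make_model()
            model.fit(features[~target_missing], ratings[~target_missing, j])
            updated[target_missing, j] = model.predict(features[target_missing])
        # Step 2: update all missing entries with the new predictions.
        filled = updated
    return filled

# Toy ratings matrix (illustrative data only).
R = np.array([
    [5, 4, np.nan, 1],
    [4, np.nan, 1, 1],
    [1, 1, 5, np.nan],
    [np.nan, 1, 4, 5],
    [5, 5, 1, np.nan],
], dtype=float)
completed = iterative_imputation(
    R, lambda: DecisionTreeRegressor(max_depth=2, random_state=0))
print(completed)
```

Passing a factory such as `make_model` keeps the routine independent of the learner, which is the point of treating the classification or regression model as a black box.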