# Restricted Boltzmann machine

In [1]:
import numpy as np
from rbm import RBM

### Generate dummy recommender data

In [2]:
'''Made up ratings of somewhat rigid watchers'''

# number of watchers
n_p = 1000
# number of unique movies
movies = 80
# types of people (in terms of what types of movies they like)
types = 10

# which types does each person belong to
people = np.random.randint(types,size=n_p)

def get_like(i):
    '''generate an array of like probabilities for the movies, then randomly draw whether
        a specific watcher liked each one or not'''
    # type of ith person
    t = people[i]
    draws = np.random.random(movies)
    # probability array. Each type favours 30-50% of the movies (44% on average, fixed by
    # the type index) with probability 0.95 and likes the others with probability 0.05
    like = np.array([0.95 if abs(m%types-t)<3 else 0.05 for m in range(movies)])
    return (like > draws).astype(int)

ratings = np.zeros((len(people),movies))

# populate the ratings array
for i in range(len(ratings)):
    ratings[i] = get_like(i)

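# X is another name for `ratings` (same array), so the masking below also applies to X
X = ratings
# save the real, fully observed values (for verification later)
X_copy = X.copy()
# randomly set some ratings to -1 (= didn't watch)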
for i in range(len(ratings)):
    # number of movies seen
    seen = np.random.randint(movies//8,movies+1)
    # index of movies not seen
    not_seen = np.random.choice(np.arange(movies),movies-seen,replace=False)
    ratings[i,not_seen] = -1

In [3]:
(X_copy == 0).mean(),(X_copy == 1).mean(),(X == -1).mean()

(0.5572125, 0.4427875, 0.440075)

Liked and disliked movies are roughly balanced (about 44% likes and 56% dislikes)<br>
About 44% of the ratings are missing from the training set (`X`)
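
These proportions can be checked against the generator itself. The sketch below recomputes the expected values from the same rule used in `get_like` and the same range used for `seen`, assuming the types are equally likely; the names `like_prob`, `favoured` and `expected_missing` are introduced here for illustration only.

```python
import numpy as np

movies, types = 80, 10  # same values as in the data-generation cell

# like_prob[t, m] = probability that a watcher of type t likes movie m
t = np.arange(types)[:, None]
m = np.arange(movies)[None, :]
like_prob = np.where(np.abs(m % types - t) < 3, 0.95, 0.05)

# share of movies each type favours (those with probability 0.95)
favoured = (like_prob > 0.5).mean(axis=1)
print(favoured)

# expected share of 1s in the true ratings when all types are equally likely
print(like_prob.mean())

# expected share of -1 entries: `seen` is uniform on [movies//8, movies]
seen = np.arange(movies // 8, movies + 1)
expected_missing = (movies - seen).mean() / movies
print(expected_missing)
```

The expected share of likes and the expected share of missing entries both come out close to 0.44, in line with the sampled values printed above.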

### Fitting

In [4]:
# arbitrarily chose 10 hidden nodes (case by case decision)
rbm = RBM(movies,10)

In [5]:
# ignore -1 flags (didn't watch) during the training of the machine
rbm.fit(X,learning_rate=0.02,ignore=-1)
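
The internals of the local `rbm` module are not shown here, so the following is only a sketch of one common way an `ignore` flag can be honoured: mask the flagged visible units so they contribute nothing to a contrastive-divergence (CD-1) update. The function `masked_cd1_step` and all of its parameter names are hypothetical and are not taken from `rbm.RBM`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_cd1_step(V, W, b_vis, b_hid, learning_rate=0.02, ignore=-1, rng=None):
    """One CD-1 update that skips visible units flagged with `ignore`.

    Hypothetical helper, not the API of the `rbm` module.
    V: (n_samples, n_visible), W: (n_visible, n_hidden), float arrays.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = (V != ignore).astype(float)   # 1 where a rating was observed
    v0 = np.where(mask == 1, V, 0.0)     # flagged units are set to 0

    # positive phase: hidden activations driven by the observed ratings only
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)

    # negative phase: reconstruct the visible layer, then mask it again
    v1_prob = sigmoid(h0 @ W.T + b_vis) * mask
    h1_prob = sigmoid(v1_prob @ W + b_hid)

    # masked entries are 0 in both v0 and v1_prob, so they add nothing here
    n = V.shape[0]
    W += learning_rate * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis += learning_rate * (v0 - v1_prob).sum(axis=0) / n
    b_hid += learning_rate * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid
```

Because the gradient is divided by the number of samples rather than by the number of observed ratings per movie, rarely watched movies receive smaller updates in this sketch; normalising per movie would be an alternative design choice.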

### Predictions

In [6]:
# prediction of entire rating array
pred = rbm.predict(X)

**Total accuracy**

In [8]:
# note that X_copy is the real ratings, where all watchers have seen all movies
(X_copy == pred).mean()

0.7875

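Agreement with the true ratings over the entire array: this covers the observed entries of the input (`X`), which the machine should conserve, as well as the missing ones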

**Accuracy of unknowns**

In [9]:
# select only the predictions for movies that were not watched
(X_copy[X==-1] == pred[X==-1]).mean()

0.733880588536045

The ability of the machine to predict the viewers' likely taste for movies they have not watched (stored in `X_copy` but not available to the machine during training)<br>
About 73% of these ratings are correctly predicted
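
An accuracy figure on the unknowns is easier to judge next to a trivial baseline. The sketch below predicts, for every hidden entry, the most common observed rating of that movie; `majority_baseline_accuracy` is a name introduced here for illustration and is not part of the `rbm` module.

```python
import numpy as np

def majority_baseline_accuracy(X_obs, X_true, ignore=-1):
    """Accuracy on hidden entries when each movie is predicted as its
    most common observed rating (hypothetical helper)."""
    observed = X_obs != ignore
    likes = ((X_obs == 1) & observed).sum(axis=0)
    counts = observed.sum(axis=0)
    # predict 1 for a movie only if more than half of its observed ratings are likes
    movie_pred = (likes > counts / 2).astype(int)
    pred = np.broadcast_to(movie_pred, X_obs.shape)
    hidden = ~observed
    # assumes at least one hidden entry exists
    return (pred[hidden] == X_true[hidden]).mean()

# tiny hand-made example: rows are watchers, columns are movies
X_true_demo = np.array([[1, 0], [1, 0], [0, 0], [1, 1]])
X_obs_demo = np.array([[1, 0], [1, -1], [-1, 0], [1, -1]])
acc = majority_baseline_accuracy(X_obs_demo, X_true_demo)
print(acc)
```

With the arrays from this notebook the call would be `majority_baseline_accuracy(X, X_copy)`. Since the generator flips each preference with probability 0.05, no predictor should be expected to exceed roughly 95% on this data.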