# [ML on GCP C10] Content-Based Filtering by Hand

## Overview
This lab shows you how to do content-based filtering using low-level TensorFlow commands.

## Objectives
In this lab, you learn to perform the following tasks:

* Create and compute a user feature matrix

* Compute where each user lies in the feature embedding space

* Create recommendations for new movies based on similarity measures between the user and movie feature vectors


## Introduction
In this lab, you'll provide movie recommendations for a set of users. Content-based filtering uses features of the items and users to generate recommendations. This small example uses low-level TensorFlow operations and a very small set of movies and users to illustrate how the same steps work in a larger content-based recommendation system.

## Open a Datalab notebook
1. In the Datalab browser, navigate to datalab > notebooks > training-data-analyst > courses > machine_learning > deepdive > 10_recommend > labs and open content_based_by_hand.ipynb.

2. Read the commentary, click Clear | Clear all Cells, then run the Python snippets in each cell step by step (use Shift+Enter to run a cell).



In [1]:
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
print(tf.__version__)

1.8.0


In [2]:
users = ['Ryan', 'Danielle',  'Vijay', 'Chris']
movies = ['Star Wars', 'The Dark Knight', 'Shrek', 'The Incredibles', 'Bleu', 'Memento']
features = ['Action', 'Sci-Fi', 'Comedy', 'Cartoon', 'Drama']

num_users = len(users)
num_movies = len(movies)
num_feats = len(features)

In [26]:
# each row represents a user's rating for the different movies
users_movies = [[4,  6,  8,  0, 0, 0],
                [0,  0, 10,  0, 8, 3],
                [0,  6,  0,  0, 3, 7],
                [10, 9,  0,  5, 0, 2]]

# features of the movies one-hot encoded
# e.g. columns could represent ['Action', 'Sci-Fi', 'Comedy', 'Cartoon', 'Drama']
movies_feats = [[1, 1, 0, 0, 1],
                [1, 1, 0, 0, 0],
                [0, 0, 1, 1, 0],
                [1, 0, 1, 1, 0],
                [0, 0, 0, 0, 1],
                [1, 0, 0, 0, 1]]

In [28]:
users_movies = tf.constant(users_movies, dtype = tf.float32)
movies_feats = tf.constant(movies_feats, dtype = tf.float32)

In [4]:
users_feats = tf.constant([[0.25       ,0.25       ,0.2        ,0.2        ,0.1       ],
                           [0.0882353  ,0.         ,0.29411766 ,0.29411766 ,0.32352942],
                           [0.44827586 ,0.20689656 ,0.         ,0.         ,0.3448276 ],
                           [0.3880597  ,0.2835821  ,0.07462686 ,0.07462686 ,0.17910448]])
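The notebook hard-codes `users_feats`, but the numbers look consistent with a simple derivation: weight each movie's feature vector by the user's rating, sum over movies, and normalize each row to sum to 1. The following NumPy sketch reproduces the values under that assumption. The derivation is inferred from the numbers rather than stated in the lab text, and the names `weighted` and `users_feats_np` are illustrative.

```python
import numpy as np

# Ratings (users x movies) and one-hot movie features (movies x features),
# copied from the cells above.
users_movies = np.array([[4,  6,  8, 0, 0, 0],
                         [0,  0, 10, 0, 8, 3],
                         [0,  6,  0, 0, 3, 7],
                         [10, 9,  0, 5, 0, 2]], dtype=np.float32)
movies_feats = np.array([[1, 1, 0, 0, 1],
                         [1, 1, 0, 0, 0],
                         [0, 0, 1, 1, 0],
                         [1, 0, 1, 1, 0],
                         [0, 0, 0, 0, 1],
                         [1, 0, 0, 0, 1]], dtype=np.float32)

# Assumption: a user's affinity for a feature is the rating-weighted
# sum of that feature over all movies the user rated.
weighted = users_movies @ movies_feats                      # shape (4, 5)

# Normalize each row so one user's feature weights sum to 1.
users_feats_np = weighted / weighted.sum(axis=1, keepdims=True)
print(users_feats_np)
```

For the first user, the weighted sums are `[10, 10, 8, 8, 4]`, which normalize to `[0.25, 0.25, 0.2, 0.2, 0.1]`, the first row of `users_feats`.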

In [24]:
with tf.Session() as sess:
    feats_top = tf.nn.top_k(users_feats[1], num_feats)
    feats_ind = feats_top[1]
    print(sess.run(feats_top))
    print(sess.run(feats_ind))
    top_feats = tf.gather_nd(features, tf.expand_dims(feats_ind, axis = 1))
    top_feats = sess.run(top_feats)
    print(top_feats)

TopKV2(values=array([0.32352942, 0.29411766, 0.29411766, 0.0882353 , 0.        ],
      dtype=float32), indices=array([4, 2, 3, 0, 1], dtype=int32))
[4 2 3 0 1]
[b'Drama' b'Comedy' b'Cartoon' b'Action' b'Sci-Fi']
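To see what `tf.nn.top_k` and `tf.gather_nd` do here without a TensorFlow session, the ranking can be sketched in plain NumPy. This is an illustrative equivalent, not the lab's code. The names `user_feats`, `ranked_ind`, and `ranked_feats` are made up for this sketch, and the stable sort is an assumption chosen so that tied weights keep their original order, matching the printed output.

```python
import numpy as np

features = ['Action', 'Sci-Fi', 'Comedy', 'Cartoon', 'Drama']

# Feature weights for the user at index 1, copied from users_feats.
user_feats = np.array([0.0882353, 0.0, 0.29411766, 0.29411766, 0.32352942],
                      dtype=np.float32)

# Sort indices by descending weight. A stable sort keeps tied entries
# (Comedy and Cartoon here) in their original order.
ranked_ind = np.argsort(-user_feats, kind='stable')

# Look up the feature names in ranked order (the role tf.gather_nd plays above).
ranked_feats = [features[i] for i in ranked_ind]

print(ranked_ind)
print(ranked_feats)
```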


In [None]:
def find_user_top_feats(user_index):
    # returns a list of the rank ordered features of most importance for a given user
    feats_ind = tf.nn.top_k(users_feats[user_index], num_feats)[1]
    return tf.gather_nd(features, tf.expand_dims(feats_ind, axis = 1))

In [33]:
print(users_feats[1])
print(movies_feats[0])
tf.tensordot(users_feats[1], movies_feats[0], axes = 1)

Tensor("strided_slice_26:0", shape=(5,), dtype=float32)
Tensor("strided_slice_27:0", shape=(5,), dtype=float32)


<tf.Tensor 'Tensordot_4:0' shape=() dtype=float32>
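The cell above prints only graph tensors, because in TensorFlow 1.x nothing is evaluated until it is run in a session. As a rough check of what the dot product would evaluate to, here is the same computation in NumPy; the variable names `user_1`, `star_wars`, and `score` are illustrative.

```python
import numpy as np

# Feature weights for the user at index 1 (second row of users_feats).
user_1 = np.array([0.0882353, 0.0, 0.29411766, 0.29411766, 0.32352942])

# One-hot features of the movie at index 0 (first row of movies_feats).
star_wars = np.array([1, 1, 0, 0, 1], dtype=float)

# The dot product adds up the user's weights for exactly the
# features this movie has (Action, Sci-Fi, Drama).
score = np.dot(user_1, star_wars)
print(score)
```

The result is about 0.41, the sum of this user's Action, Sci-Fi, and Drama weights.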

In [34]:
all_users_ratings_t = tf.matmul(users_feats, tf.transpose(movies_feats))
with tf.Session() as sess:
    print(sess.run(all_users_ratings_t))

[[0.6        0.5        0.4        0.65       0.1        0.35      ]
 [0.4117647  0.0882353  0.5882353  0.67647064 0.32352942 0.4117647 ]
 [1.         0.6551724  0.         0.44827586 0.3448276  0.79310346]
 [0.8507463  0.6716418  0.14925373 0.53731346 0.17910448 0.5671642 ]]
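Each entry `[i, j]` of this matrix is the dot product of user `i`'s feature vector with movie `j`'s feature vector, so the scores are similarity values rather than predictions on the original 0 to 10 rating scale. A NumPy sketch of the same product, with the illustrative name `all_users_ratings`, makes the shapes explicit.

```python
import numpy as np

# users x features, copied from users_feats above.
users_feats = np.array([[0.25,       0.25,       0.2,        0.2,        0.1],
                        [0.0882353,  0.0,        0.29411766, 0.29411766, 0.32352942],
                        [0.44827586, 0.20689656, 0.0,        0.0,        0.3448276],
                        [0.3880597,  0.2835821,  0.07462686, 0.07462686, 0.17910448]])

# movies x features, copied from movies_feats above.
movies_feats = np.array([[1, 1, 0, 0, 1],
                         [1, 1, 0, 0, 0],
                         [0, 0, 1, 1, 0],
                         [1, 0, 1, 1, 0],
                         [0, 0, 0, 0, 1],
                         [1, 0, 0, 0, 1]], dtype=float)

# (4 users x 5 feats) @ (5 feats x 6 movies) -> (4 users x 6 movies)
all_users_ratings = users_feats @ movies_feats.T
print(all_users_ratings.shape)
print(all_users_ratings.round(4))
```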


In [36]:
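# keep the predicted score only where the user has not rated the movie (rating == 0);
# mask already-rated movies with -inf so they can never be ranked as recommendations
all_users_ratings_new = tf.where(tf.equal(users_movies, tf.zeros_like(users_movies)),
                                  all_users_ratings_t,
                                  -np.inf*tf.ones_like(tf.cast(users_movies, tf.float32)))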

In [38]:
with tf.Session() as sess:
    print(sess.run(all_users_ratings_new))

[[      -inf       -inf       -inf 0.65       0.1        0.35      ]
 [0.4117647  0.0882353        -inf 0.67647064       -inf       -inf]
 [1.               -inf 0.         0.44827586       -inf       -inf]
 [      -inf       -inf 0.14925373       -inf 0.17910448       -inf]]
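The masking step treats a rating of 0 as "not yet rated". As a sketch of the same idea outside TensorFlow, `np.where` takes the same three arguments (condition, value if true, value if false). The array `predicted` below copies the scores printed earlier, and the name `masked` is illustrative.

```python
import numpy as np

# Original ratings; 0 is treated as "not rated".
users_movies = np.array([[4,  6,  8, 0, 0, 0],
                         [0,  0, 10, 0, 8, 3],
                         [0,  6,  0, 0, 3, 7],
                         [10, 9,  0, 5, 0, 2]], dtype=float)

# Predicted scores, copied from the printed matmul output.
predicted = np.array([[0.6,       0.5,       0.4,        0.65,       0.1,        0.35],
                      [0.4117647, 0.0882353, 0.5882353,  0.67647064, 0.32352942, 0.4117647],
                      [1.0,       0.6551724, 0.0,        0.44827586, 0.3448276,  0.79310346],
                      [0.8507463, 0.6716418, 0.14925373, 0.53731346, 0.17910448, 0.5671642]])

# Keep the score where the movie is unrated; otherwise use -inf so the
# movie always sorts last when ranking candidates.
masked = np.where(users_movies == 0, predicted, -np.inf)
print(masked)
```

Using `-inf` rather than 0 matters because a legitimate predicted score can itself be 0, as it is for the third user and Shrek.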


In [39]:
# count the unrated movies (rating == 0) for each user
num_to_recommend = tf.reduce_sum(
    tf.cast(tf.equal(users_movies, tf.zeros_like(users_movies)), dtype = tf.float32),
    axis = 1)

In [40]:
with tf.Session() as sess:
    print(sess.run(num_to_recommend))

[3. 3. 3. 2.]
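The notebook excerpt stops at the per-user count of unrated movies. One plausible way to finish, sketched here in NumPy as an assumption rather than the lab's own solution, is to rank each user's masked scores and keep as many titles as that user has unrated movies. The name `recommendations` and the stable tie-breaking are choices made for this sketch.

```python
import numpy as np

users = ['Ryan', 'Danielle', 'Vijay', 'Chris']
movies = ['Star Wars', 'The Dark Knight', 'Shrek', 'The Incredibles', 'Bleu', 'Memento']

users_movies = np.array([[4,  6,  8, 0, 0, 0],
                         [0,  0, 10, 0, 8, 3],
                         [0,  6,  0, 0, 3, 7],
                         [10, 9,  0, 5, 0, 2]], dtype=float)
movies_feats = np.array([[1, 1, 0, 0, 1],
                         [1, 1, 0, 0, 0],
                         [0, 0, 1, 1, 0],
                         [1, 0, 1, 1, 0],
                         [0, 0, 0, 0, 1],
                         [1, 0, 0, 0, 1]], dtype=float)
users_feats = np.array([[0.25,       0.25,       0.2,        0.2,        0.1],
                        [0.0882353,  0.0,        0.29411766, 0.29411766, 0.32352942],
                        [0.44827586, 0.20689656, 0.0,        0.0,        0.3448276],
                        [0.3880597,  0.2835821,  0.07462686, 0.07462686, 0.17910448]])

# Score every (user, movie) pair, then hide movies the user already rated.
scores = users_feats @ movies_feats.T
unrated = users_movies == 0
masked = np.where(unrated, scores, -np.inf)

# Number of unrated movies per user (the notebook's num_to_recommend).
num_to_recommend = unrated.sum(axis=1)

# For each user, rank unrated movies by descending score.
recommendations = {}
for i, user in enumerate(users):
    ranked = np.argsort(-masked[i], kind='stable')[:num_to_recommend[i]]
    recommendations[user] = [movies[j] for j in ranked]
    print(user, recommendations[user])
```

Because rated movies carry a score of `-inf`, truncating the ranking at `num_to_recommend[i]` returns only titles the user has not rated.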
