# Recommender System 

## Understanding Recommender Systems

### Content Based Filtering
        - Uses item features to provide recommendations (uses the features of a certain item to recommend other items with similar features)
        - A good example: if a user watches a video with a certain set of traits, we can recommend other videos that share those traits
        - A limitation of content-based filtering is that it only leverages item similarities
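
To make the idea concrete, here is a minimal sketch of content-based filtering. The video names, the three feature columns, and the helper `most_similar_items` are all hypothetical, and cosine similarity is just one reasonable choice of similarity measure, not something the notes above prescribe.

```python
import numpy as np

# Hypothetical item features: columns are [action, comedy, documentary].
item_names = ["video_a", "video_b", "video_c", "video_d"]
item_features = np.array([
    [1.0, 0.0, 0.0],   # video_a: pure action
    [0.9, 0.1, 0.0],   # video_b: mostly action
    [0.2, 0.8, 0.0],   # video_c: mostly comedy
    [0.0, 0.2, 0.8],   # video_d: mostly documentary
])

def most_similar_items(watched_index, features, top_k=2):
    """Rank the other items by cosine similarity to the watched item."""
    norms = np.linalg.norm(features, axis=1)
    unit = features / norms[:, None]
    scores = unit @ unit[watched_index]
    scores[watched_index] = -np.inf  # never recommend the item itself
    return np.argsort(scores)[::-1][:top_k]

# The user watched video_a; recommend the items whose features are closest.
recommended = most_similar_items(0, item_features)
print([item_names[i] for i in recommended])
```

Note that nothing here looks at other users, which is exactly the limitation mentioned above.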

### Collaborative Filtering
        - Uses similarities between items and users simultaneously to provide recommendations (gives recommendations to user A based on the similar interests of user B)
        - Explicit feedback: the user gives direct feedback (ratings, comments, etc.)
        - Implicit feedback: more subtle, the user's indirect behavior towards an item (watch-time, click-rate, etc.)
        - For example, if user 1 has watched movies A, B, and C but user 2 has only watched movies A and C, we can use what we know about user 1 to recommend 
          that user 2 watch movie B. 
        - Collaborative filtering is (to me) the better route to go
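
The movie A/B/C example can be sketched in a few lines. This is a simple neighbour-based illustration using Jaccard similarity on watch histories, which is an assumption made here for brevity; it is not the matrix factorization approach discussed later. The names `watched`, `jaccard`, and `recommend` are hypothetical.

```python
# Watch histories from the example above (implicit feedback).
watched = {
    "user_1": {"A", "B", "C"},
    "user_2": {"A", "C"},
}

def jaccard(a, b):
    """Overlap between two sets of watched movies, from 0 to 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, histories):
    """Suggest movies seen by the most similar other user but not by the target."""
    others = [u for u in histories if u != target]
    neighbour = max(others, key=lambda u: jaccard(histories[target], histories[u]))
    return sorted(histories[neighbour] - histories[target])

print(recommend("user_2", watched))
```

Because user 2 shares movies A and C with user 1, the only unseen movie from user 1's history, movie B, is suggested.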

#### Collaborative Filtering in practice
        - We can assign each user a value between -1 and 1 describing their taste along a single trait, and assign each movie a value on the same scale for how strongly it shows that trait;
          when a user's value and a movie's value share a sign, their product is positive (predicted interest), and when the signs differ the product is negative (predicted lack of interest)
        - In this example we hand-engineered one-dimensional embeddings. In practice the embeddings have many more dimensions, and we learn them automatically; 
          that's the beauty of collaborative filtering models. 
        -  U is the user embedding matrix and V is the item (movie) embedding matrix; their product U V^T is A, the predicted feedback matrix.
        - Our optimization objective is to minimize the sum of the squared differences between the feedback labels and the predicted feedback
        - We can solve this using stochastic gradient descent (SGD) or Weighted Alternating Least Squares (WALS); WALS is specific to this problem
        - The idea of WALS is that in each iteration we alternate between fixing U and solving for V, and then fixing V and solving for U. 
        - WALS usually converges much faster than SGD, but SGD is more flexible with other loss functions
        - We have only talked about observed items, but we still have the un-observed ones
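
The alternating idea can be sketched with NumPy. This is a simplified, unweighted alternating least squares loop over the observed entries only, not a full WALS implementation. The small ratings matrix is hypothetical, and the ridge term `reg` is an assumption added here so each small linear system stays well-posed.

```python
import numpy as np

def als(feedback, observed, dim=2, reg=0.1, iterations=20, seed=0):
    """Factorize feedback ~ U @ V.T using only entries where `observed` is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = feedback.shape
    U = rng.normal(scale=0.1, size=(n_users, dim))
    V = rng.normal(scale=0.1, size=(n_items, dim))
    ridge = reg * np.eye(dim)  # assumption: small regularizer, not from the notes
    for _ in range(iterations):
        # Fix V and solve a small least-squares problem for each user row of U.
        for i in range(n_users):
            cols = observed[i]
            Vi = V[cols]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ feedback[i, cols])
        # Fix U and solve for each item row of V.
        for j in range(n_items):
            rows = observed[:, j]
            Uj = U[rows]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ feedback[rows, j])
    return U, V

def squared_error(feedback, observed, U, V):
    """Sum of squared differences between labels and predictions on observed entries."""
    diff = (feedback - U @ V.T)[observed]
    return float(np.sum(diff ** 2))

# Hypothetical ratings; 0 marks an unobserved (user, movie) pair.
feedback = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])
observed = feedback > 0

U0, V0 = als(feedback, observed, iterations=0)  # random starting point
U, V = als(feedback, observed)
print("error before:", squared_error(feedback, observed, U0, V0))
print("error after: ", squared_error(feedback, observed, U, V))
```

Each inner solve is an ordinary least-squares problem because one factor is held fixed, which is why this style of solver tends to converge in few iterations.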


##### Un-Observed Items in collaborative filtering using matrix factorization
        - Minimizing the objective over only the observed items is not what we want: a trivial solution (for example, predicting the same high value everywhere) can drive the loss down while generalizing poorly
        - We can fix this using weighted matrix factorization: we treat unobserved entries as 0, but we also scale the un-observed part of the objective function
          so it is not over-weighted
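
A small sketch of the weighted objective, under the assumptions that unobserved entries are treated as 0 and that a single scalar weight (called `unobserved_weight` here, a hypothetical name) scales the unobserved term:

```python
import numpy as np

def weighted_mf_loss(feedback, observed, U, V, unobserved_weight=0.1):
    """Squared error on observed entries plus a down-weighted term on unobserved ones."""
    predicted = U @ V.T
    observed_term = np.sum((feedback - predicted)[observed] ** 2)
    # Unobserved entries are treated as 0, so their error is just the prediction.
    unobserved_term = np.sum(predicted[~observed] ** 2)
    return float(observed_term + unobserved_weight * unobserved_term)

# Hypothetical 2x2 case: each user has rated exactly one movie.
feedback = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
observed = np.array([[True, False],
                     [False, True]])

# Embeddings that predict 1 everywhere, the trivial "all ones" solution.
U = np.ones((2, 1))
V = np.ones((2, 1))

print(weighted_mf_loss(feedback, observed, U, V, unobserved_weight=0.0))
print(weighted_mf_loss(feedback, observed, U, V, unobserved_weight=0.1))
```

With a weight of 0 the all-ones prediction looks perfect, which is the failure mode of fitting observed entries only; a positive weight penalizes it.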

# Building The Recommender System

In [2]:
import tensorflow_datasets as tfds
import tensorflow as tf
import numpy as np
import pandas as pd

In [3]:
# Load the MovieLens 100k ratings dataset (this returns a tf.data.Dataset, not a pandas DataFrame)
df = tfds.load("movielens/100k-ratings", split="train")

Downloading and preparing dataset 4.70 MiB (download: 4.70 MiB, generated: 32.41 MiB, total: 37.10 MiB) to ~/tensorflow_datasets/movielens/100k-ratings/0.1.1...

Dataset movielens downloaded and prepared to ~/tensorflow_datasets/movielens/100k-ratings/0.1.1. Subsequent calls will reuse this data.


In [5]:
for x in df.take(1).as_numpy_iterator():
    print(x)

{'bucketized_user_age': 45.0, 'movie_genres': array([7]), 'movie_id': b'357', 'movie_title': b"One Flew Over the Cuckoo's Nest (1975)", 'raw_user_age': 46.0, 'timestamp': 879024327, 'user_gender': True, 'user_id': b'138', 'user_occupation_label': 4, 'user_occupation_text': b'doctor', 'user_rating': 4.0, 'user_zip_code': b'53211'}
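
As a possible next step, records shaped like the one above could be turned into the feedback matrix discussed earlier. The sketch below uses plain Python dicts rather than the TensorFlow dataset so it runs on its own. The first record copies three fields from the printed example; the other two records and the helper `build_feedback_matrix` are hypothetical.

```python
import numpy as np

# The first record copies fields from the example printed above;
# the remaining records are hypothetical, made up only for illustration.
records = [
    {"user_id": b"138", "movie_id": b"357", "user_rating": 4.0},
    {"user_id": b"138", "movie_id": b"1", "user_rating": 2.0},   # hypothetical
    {"user_id": b"42", "movie_id": b"357", "user_rating": 5.0},  # hypothetical
]

def build_feedback_matrix(records):
    """Return (matrix, user_ids, movie_ids); unobserved entries are left as NaN."""
    user_ids = sorted({r["user_id"].decode() for r in records})
    movie_ids = sorted({r["movie_id"].decode() for r in records})
    user_index = {u: i for i, u in enumerate(user_ids)}
    movie_index = {m: j for j, m in enumerate(movie_ids)}
    matrix = np.full((len(user_ids), len(movie_ids)), np.nan)
    for r in records:
        i = user_index[r["user_id"].decode()]
        j = movie_index[r["movie_id"].decode()]
        matrix[i, j] = r["user_rating"]
    return matrix, user_ids, movie_ids

matrix, user_ids, movie_ids = build_feedback_matrix(records)
print(user_ids, movie_ids)
print(matrix)
```

Keeping unobserved pairs as NaN makes it easy to derive the observed mask with `~np.isnan(matrix)` before any factorization step.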




# Resources
        - TensorFlow YouTube videos (https://www.youtube.com/watch?v=BthUPVwA59s&list=PLQY2H8rRoyvy2MiyUBz5RWZr5MPFkV3qz)
        - How to Design and Build a Recommendation System Pipeline in Python (Jill Cates) (https://www.youtube.com/watch?v=v_mONWiFv0k)
        - Building a Recommendation System in Python (https://www.youtube.com/watch?v=G4MBc40rQ2k)