<a href="https://colab.research.google.com/github/noahgift/fundamentals_ai_ml/blob/master/Lesson3_5_recommendations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recommender Systems

## Nearest Neighbor CF: Basics

* One of the most popular and common methods for collaborative filtering (CF), and very intuitive
* Comes in user-based and item-based variants
* Naive version: use a very simple similarity metric, the Pearson correlation coefficient
    * ***Measures the covariance of two variables divided by the product of their standard deviations.***
    * The problem with the basic version is that it doesn't account for people's **baseline preferences.**
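
To make the Pearson bullet concrete, here is a minimal sketch of the coefficient computed over the items two users have both rated. The function name `pearson_similarity`, the use of `np.nan` for unrated items, and the sample ratings are assumptions made for illustration; they are not part of Surprise.

```python
import numpy as np

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (NaN = unrated)."""
    a = np.asarray(ratings_a, dtype=float)
    b = np.asarray(ratings_b, dtype=float)
    both = ~np.isnan(a) & ~np.isnan(b)      # keep co-rated items only
    if both.sum() < 2:
        return 0.0                           # not enough overlap to correlate
    a, b = a[both], b[both]
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    # Covariance divided by the product of standard deviations
    # (the 1/n factors cancel, so plain sums are enough).
    denom = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    if denom == 0:
        return 0.0                           # a user with constant ratings
    return float((a_centered * b_centered).sum() / denom)

# Hypothetical ratings for five items; np.nan marks an item the user did not rate.
alice = [5, 4, np.nan, 2, 1]
bob = [4, 3, 5, 1, np.nan]
similarity = pearson_similarity(alice, bob)
print(similarity)
```

On the three shared items Bob rates exactly one point lower than Alice, so the two rating patterns are perfectly correlated and the similarity comes out at 1.0 (up to floating point).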


## What is in a Rating?

* Implicit vs. explicit ratings
    * The strongest signal is in explicit ratings made by people.
    * Implicit: did you follow someone, buy the movie, or watch the trailer?

# Surprise SKLearn Recommendation Framework

[Surprise](http://surpriselib.com/) is a Python scikit for building and analyzing recommender systems.

### Training a recommender with Surprise

In [0]:
!pip install -q scikit-surprise



In [0]:
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

In [0]:
# Load the MovieLens 100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

Dataset ml-100k could not be found. Do you want to download it? [Y/n] Y
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k


In [0]:
# Sample a random trainset and testset;
# the test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=.25)

In [0]:
# We'll use the famous SVD algorithm.
algo = SVD()

In [0]:
# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

In [0]:
# Then compute RMSE
accuracy.rmse(predictions)

RMSE: 0.9440


0.9440155966414212
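
As a rough illustration of what the RMSE figure measures, the sketch below computes the root-mean-square error by hand: the square root of the mean squared difference between true and estimated ratings. The helper `rmse` and the toy pairs are hypothetical and are not taken from the MovieLens run above.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one (true, estimated) pair")
    squared_errors = [(true_r - est) ** 2 for true_r, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) ratings for four user-item pairs.
toy_pairs = [(4.0, 3.0), (2.0, 3.0), (5.0, 5.0), (1.0, 3.0)]
toy_rmse = rmse(toy_pairs)
print(f"RMSE: {toy_rmse:.4f}")
```

Because errors are squared before averaging, the single two-point miss in the toy data contributes as much as four one-point misses would.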

### Get top-N recommendations for each user

https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-get-the-top-n-recommendations-for-each-user

In [0]:
from collections import defaultdict

In [0]:
def get_top_n(predictions, n=10):
    '''Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 10.

    Returns:
        A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
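
The grouping-then-ranking idea in `get_top_n` can be tried without training a model. The sketch below is one possible variant, assuming a hypothetical `ToyPrediction` namedtuple with the same five fields the loop above unpacks; it uses `heapq.nlargest` in place of a full sort, which is a design choice of this sketch and not the approach from the Surprise FAQ.

```python
from collections import defaultdict, namedtuple
import heapq

# Hypothetical stand-in for a prediction record:
# (user id, item id, true rating, estimated rating, details).
ToyPrediction = namedtuple('ToyPrediction', ['uid', 'iid', 'r_ui', 'est', 'details'])

def top_n_with_heap(predictions, n=10):
    """Group (iid, est) pairs by user, then keep each user's n highest estimates."""
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))
    # nlargest returns the pairs ordered from highest to lowest estimate.
    return {uid: heapq.nlargest(n, pairs, key=lambda pair: pair[1])
            for uid, pairs in by_user.items()}

toy_predictions = [
    ToyPrediction('u1', 'a', None, 3.5, {}),
    ToyPrediction('u1', 'b', None, 4.8, {}),
    ToyPrediction('u1', 'c', None, 4.1, {}),
    ToyPrediction('u2', 'a', None, 2.0, {}),
    ToyPrediction('u2', 'c', None, 4.9, {}),
]
toy_top = top_n_with_heap(toy_predictions, n=2)
print(toy_top)
```

When each user has many candidate items and `n` is small, selecting the largest `n` avoids sorting every user's full list, though for a dataset of this size the difference is unlikely to matter.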

In [0]:
# First train an SVD algorithm on the movielens dataset.
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
algo = SVD()
algo.fit(trainset)

# Then predict ratings for all pairs (u, i) that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)

top_n = get_top_n(predictions, n=10)

# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

196 ['114', '316', '318', '178', '483', '357', '429', '315', '12', '512']
186 ['528', '169', '194', '133', '318', '136', '197', '603', '496', '498']
22 ['100', '12', '64', '318', '357', '483', '530', '98', '11', '191']
244 ['127', '408', '14', '474', '275', '285', '178', '483', '12', '641']
166 ['50', '408', '64', '963', '174', '178', '199', '22', '251', '923']
298 ['64', '408', '313', '251', '192', '169', '963', '923', '114', '480']
115 ['114', '408', '223', '285', '180', '64', '474', '515', '427', '135']
253 ['178', '169', '408', '174', '172', '480', '709', '136', '641', '515']
305 ['647', '515', '114', '606', '496', '607', '316', '651', '513', '1194']
6 ['603', '654', '114', '606', '923', '657', '611', '589', '83', '647']
62 ['185', '661', '641', '187', '223', '615', '963', '175', '478', '272']
286 ['86', '19', '963', '12', '479', '496', '318', '292', '488', '427']
200 ['272', '64', '186', '641', '408', '114', '12', '316', '1039', '487']
210 ['318', '515', '12', '64', '480', '603', 

### How to get the k nearest neighbors of a user (or item)


In [0]:
import io 
from surprise import KNNBaseline
from surprise import Dataset
from surprise import get_dataset_dir

In [0]:
def read_item_names():
    """Read the u.item file from MovieLens 100-k dataset and return two
    mappings to convert raw ids into movie names and movie names into raw ids.
    """

    file_name = get_dataset_dir() + '/ml-100k/ml-100k/u.item'
    rid_to_name = {}
    name_to_rid = {}
    with io.open(file_name, 'r', encoding='ISO-8859-1') as f:
        for line in f:
            line = line.split('|')
            rid_to_name[line[0]] = line[1]
            name_to_rid[line[1]] = line[0]

    return rid_to_name, name_to_rid


Run KNN

In [0]:
# First, train the algorithm to compute the similarities between items
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
sim_options = {'name': 'pearson_baseline', 'user_based': False}
algo = KNNBaseline(sim_options=sim_options)
algo.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBaseline at 0x7f53103b5fd0>

Print out KNN Mappings

In [0]:
# Read the mappings raw id <-> movie name
rid_to_name, name_to_rid = read_item_names()

# Retrieve inner id of the movie Toy Story
toy_story_raw_id = name_to_rid['Toy Story (1995)']
toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)

# Retrieve inner ids of the nearest neighbors of Toy Story.
toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)

# Convert inner ids of the neighbors into names.
toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id)
                       for inner_id in toy_story_neighbors)
toy_story_neighbors = (rid_to_name[rid]
                       for rid in toy_story_neighbors)

print()
print('The 10 nearest neighbors of Toy Story are:')
for movie in toy_story_neighbors:
    print(movie)



The 10 nearest neighbors of Toy Story are:
Beauty and the Beast (1991)
Raiders of the Lost Ark (1981)
That Thing You Do! (1996)
Lion King, The (1994)
Craft, The (1996)
Liar Liar (1997)
Aladdin (1992)
Cool Hand Luke (1967)
Winnie the Pooh and the Blustery Day (1968)
Indiana Jones and the Last Crusade (1989)
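
To show the idea behind an item-based neighbor lookup without the library, here is a minimal NumPy sketch. It uses plain cosine similarity between item rating columns, which is a simpler stand-in and not the `pearson_baseline` measure configured above; the function `nearest_items` and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def nearest_items(ratings, item_index, k=2):
    """Return the indices of the k items most similar to `item_index`.

    `ratings` is a users x items matrix where 0 means "not rated".
    """
    item_vectors = ratings.T.astype(float)        # one row per item
    norms = np.linalg.norm(item_vectors, axis=1)
    norms[norms == 0] = 1.0                       # avoid division by zero
    unit_vectors = item_vectors / norms[:, None]
    similarities = unit_vectors @ unit_vectors[item_index]
    similarities[item_index] = -np.inf            # exclude the item itself
    return [int(i) for i in np.argsort(-similarities)[:k]]

# Hypothetical ratings: 4 users (rows) x 4 items (columns).
toy_ratings = np.array([
    [5, 5, 1, 0],
    [4, 5, 1, 1],
    [1, 1, 5, 4],
    [0, 1, 4, 5],
])
neighbors_of_item_0 = nearest_items(toy_ratings, item_index=0, k=2)
print(neighbors_of_item_0)
```

Item 1 is rated almost identically to item 0 by every user, so it ranks first among the neighbors; treating unrated entries as 0 is a simplification that a real similarity measure would handle more carefully.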


# Real-World Social Network Problems


**Real World Problems**

*   UX
*   Ethics
*   Operational Complexity



![socialNetwork](https://user-images.githubusercontent.com/58792/62396018-f20cc100-b53f-11e9-9b34-ac791c0b732a.png)

## Next Improvements?

* Recommend Popular People for "Cold Start"
* Convert to Supervised ML problem
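
One way to read the cold-start bullet: a new user has no history to compare against, so a popularity ranking can serve as a default list. The sketch below assumes a hypothetical list of `(follower, followed)` pairs and a hypothetical helper `recommend_for_cold_start`; it is an illustration of the idea, not a production approach.

```python
from collections import Counter

def recommend_for_cold_start(follow_events, n=3):
    """Return the n most-followed accounts as a default list for new users.

    `follow_events` is an iterable of (follower, followed) pairs.
    """
    follower_counts = Counter(followed for _follower, followed in follow_events)
    return [account for account, _count in follower_counts.most_common(n)]

# Hypothetical follow graph.
toy_follows = [
    ('ann', 'zoe'), ('bob', 'zoe'), ('cat', 'zoe'),
    ('ann', 'yan'), ('bob', 'yan'),
    ('cat', 'xia'),
]
cold_start_picks = recommend_for_cold_start(toy_follows, n=2)
print(cold_start_picks)
```

Once a user has followed a few accounts, the popularity list can give way to a personalized model such as the ones trained earlier in this notebook.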

# Cloud APIs?

https://azure.microsoft.com/en-us/blog/building-recommender-systems-with-azure-machine-learning-service/

![Cloud APIs](https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/7e796df5-ce15-488c-9c1b-ec110059b5d2.png)

## Related Areas?

[Topic Modeling with Gensim](https://radimrehurek.com/gensim/)

## ML Maturity Model?

![ml maturity model](https://user-images.githubusercontent.com/58792/62405528-88ef7280-b56c-11e9-90a6-8ede635419a4.png)

# References



*   [Netflix Recommender Systems](https://dl.acm.org/citation.cfm?id=2843948)
*   [Surprise](http://surpriselib.com/), a Python scikit for building and analyzing recommender systems
*   [Azure recommender](https://azure.microsoft.com/en-us/blog/building-recommender-systems-with-azure-machine-learning-service/)

