# Building Recommendation Engines with PySpark

This course shows you how to build recommendation engines using `Alternating Least Squares` (ALS) in PySpark. Using the popular MovieLens dataset and the Million Songs dataset, it walks step by step through the intuition behind the ALS algorithm and the code to train, test, and implement ALS models on various types of customer data.

## Table of Contents

- [Introduction](#intro)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns

path = "data/dc36/"

In [None]:
# Create a local SparkContext, the low-level entry point to Spark
from pyspark import SparkContext
sc = SparkContext("local", "First App")
print(sc)

In [None]:
# Create a SparkSession, the entry point for DataFrames; it reuses the active SparkContext
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('First App').getOrCreate()

In [None]:
# Return spark version
print(spark.version)

# Return python version
import sys
print(sys.version_info)

---
<a id='intro'></a>

## Why learn how to build recommendation engines?

<img src="images/spark6_001.png" alt="" style="width: 800px;"/>

<img src="images/spark6_002.png" alt="" style="width: 800px;"/>

<img src="images/spark6_003.png" alt="" style="width: 800px;"/>

## See the power of a recommendation engine

Taylor and Jane both like watching movies. Taylor likes only dramas, comedies, and romances. Jane likes only action, adventure, and otherwise exciting films. One of the greatest benefits of ALS-based recommendation engines is that they can identify movies or items that users will like, even when the users themselves expect not to like them. Take a look at the movie ratings that Taylor and Jane have provided below. It stands to reason that their different preferences should generate different recommendations.

- Take a look at `TJ_ratings` using the `.show()` method, and any other methods you prefer, to see how each of them rated the movies they've seen.
- Pass the user names to the provided `get_ALS_recs()` function to see which movies ALS recommends for Jane and Taylor based on those ratings. Do the recommendations make sense to you?

```
In [1]: TJ_ratings.show()
+---------+--------------------+------+
|user_name|          movie_name|rating|
+---------+--------------------+------+
|   Taylor|            Twilight|   4.9|
|   Taylor|  A Walk to Remember|   4.5|
|   Taylor|        The Notebook|   5.0|
|   Taylor|Raiders of the Lo...|   1.2|
|   Taylor|      The Terminator|   1.0|
|   Taylor|      Mrs. Doubtfire|   1.0|
|     Jane|            Iron Man|   4.8|
|     Jane|Raiders of the Lo...|   4.9|
|     Jane|      The Terminator|   4.6|
|     Jane|           Anchorman|   1.2|
|     Jane|        Pretty Woman|   1.0|
|     Jane|           Toy Story|   1.2|
+---------+--------------------+------+

In [2]: get_ALS_recs(["Taylor","Jane"])
    userId  pred_rating                 title          genres
0   Taylor         3.89   Seven Pounds (2008)           Drama
1   Taylor         3.61      Cure, The (1995)           Drama
2   Taylor         3.55  Kiss Me, Guido (1997          Comedy
3   Taylor         3.29  You've Got Mail (199  Comedy|Romance
4   Taylor         3.27  10 Things I Hate Abo  Comedy|Romance
5   Taylor         3.26  Corrina, Corrina (19  Comedy|Drama|R
6     Jane         4.96           Fear (1996)        Thriller
7     Jane         4.85  Lord of the Rings: T  Adventure|Fant
8     Jane         4.70  Lord of the Rings: T  Adventure|Fant
9     Jane         4.55  No Holds Barred (198          Action
10    Jane         4.54  Lord of the Rings: T  Action|Adventu
11    Jane         4.30  Band of Brothers (20  Action|Drama|W
12    Jane         4.26   Transformers (2007)  Action|Sci-Fi|
```
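To give a feel for what happens behind a function like `get_ALS_recs()`, here is a minimal NumPy sketch of the alternating least squares idea, run on the twelve ratings from `TJ_ratings` above. It is not Spark's implementation and not the course's helper: `toy_als` is a hypothetical function, and the `rank`, `reg`, `n_iter`, and `seed` values are illustrative assumptions.

```python
import numpy as np

# Ratings copied from the TJ_ratings table above; np.nan marks "not rated".
# Column order: Twilight, A Walk to Remember, The Notebook, Raiders of the Lo...,
# The Terminator, Mrs. Doubtfire, Iron Man, Anchorman, Pretty Woman, Toy Story
R = np.array([
    [4.9, 4.5, 5.0, 1.2, 1.0, 1.0, np.nan, np.nan, np.nan, np.nan],  # Taylor
    [np.nan, np.nan, np.nan, 4.9, 4.6, np.nan, 4.8, 1.2, 1.0, 1.2],  # Jane
])
observed = ~np.isnan(R)


def toy_als(R, observed, rank=2, reg=0.1, n_iter=20, seed=0):
    """Hypothetical helper: factor R ~ U @ V.T using only the observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        # Hold item factors fixed; solve a small ridge regression per user.
        for u in range(n_users):
            Vu = V[observed[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ R[u, observed[u]])
        # Hold user factors fixed; solve a small ridge regression per item.
        for i in range(n_items):
            Ui = U[observed[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ R[observed[:, i], i])
    return U, V


U, V = toy_als(R, observed)
pred = U @ V.T  # dense matrix: a predicted rating for every user/movie pair
rmse = np.sqrt(np.mean((pred[observed] - R[observed]) ** 2))
print(f"RMSE on observed ratings: {rmse:.3f}")
```

The product `U @ V.T` fills in every blank cell of the ratings matrix, which is where recommendations come from. With only two users and two shared movies, the filled-in values here are not meaningful recommendations; the output of `get_ALS_recs()` above presumably draws on a much larger ratings dataset.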

## Recommendation Engine Types and Data Types

<img src="images/spark6_004.png" alt="" style="width: 800px;"/>

<img src="images/spark6_005.png" alt="" style="width: 800px;"/>

In [None]:
# Stop the SparkSession and its underlying SparkContext
spark.stop()

<img src="images/spark6_006.png" alt="" style="width: 800px;"/>
