# See the power of a recommendation engine

Taylor and Jane both like watching movies. Taylor only likes dramas, comedies, and romances. Jane likes only action, adventure, and otherwise exciting films. One of the greatest benefits of ALS-based recommendation engines is that they can identify movies or items that users will like, even when the users themselves would not expect to like them. Take a look at the movie ratings that Taylor and Jane have provided below (the `TJ_ratings` DataFrame). Given their different tastes, it stands to reason that they would receive different recommendations.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, FloatType

# Create a SparkSession
spark = SparkSession.builder \
    .appName("test") \
    .getOrCreate()
```

```python
# View TJ_ratings (TJ_ratings and get_ALS_recs are not defined in this notebook)
# TJ_ratings.show()

# Generate recommendations for Jane and Taylor
# get_ALS_recs(["Jane", "Taylor"])
```
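
Since `get_ALS_recs` is not defined here, the following is a minimal plain-Python sketch of what alternating least squares (ALS) does, not Spark's implementation. It alternates between holding the movie factors fixed and solving a small ridge regression for each user, then doing the reverse, and finally scores each user's unrated movies with a dot product. The ratings, the `rank`, `lam`, and `iters` values, and all function names are hypothetical. Unlike this sketch, Spark's `ALS` scales `regParam` by the number of ratings each user or item has.

```python
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def half_step(fixed, observed, rank, lam):
    """Hold one set of factors fixed and solve a ridge regression for every row of the other."""
    new = {}
    for key, pairs in observed.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            v = fixed[other]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        new[key] = solve(a, b)
    return new


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def als(ratings, rank=2, lam=0.1, iters=20, seed=0):
    """Factor {(user, item): rating} into latent vectors; also return the loss after each iteration."""
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    rng = random.Random(seed)
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    history = []
    for _ in range(iters):
        users = half_step(items, by_user, rank, lam)  # fix items, solve users
        items = half_step(users, by_item, rank, lam)  # fix users, solve items
        err = sum((r - dot(users[u], items[i])) ** 2 for (u, i), r in ratings.items())
        reg = lam * sum(dot(v, v) for v in list(users.values()) + list(items.values()))
        history.append(err + reg)
    return users, items, history


# Hypothetical toy ratings (1-5), not the TJ_ratings data from the exercise
ratings = {
    ("Taylor", "The Notebook"): 5, ("Taylor", "Titanic"): 5, ("Taylor", "Die Hard"): 1,
    ("Sam", "The Notebook"): 4, ("Sam", "Titanic"): 5, ("Sam", "Love Actually"): 5,
    ("Jane", "Die Hard"): 5, ("Jane", "Mad Max"): 5, ("Jane", "The Notebook"): 1,
    ("Alex", "Die Hard"): 4, ("Alex", "Mad Max"): 5, ("Alex", "Speed"): 5,
}
users, items, history = als(ratings)

# Score every movie each user has not rated yet, highest first
for user in ["Taylor", "Jane"]:
    seen = {i for (u, i) in ratings if u == user}
    scores = sorted(((dot(users[user], items[i]), i) for i in items if i not in seen), reverse=True)
    print(user, [(title, round(s, 2)) for s, title in scores])
```

Because each half-step exactly minimizes the regularized squared error while the other factors are held fixed, the values in `history` never increase. The printed lists rank each user's unrated movies by predicted score, which is how a user can be steered toward titles they never rated.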

# Power of recommendation engines

What is a reason for learning to build recommendation engines?

- Show users items/products relevant to them that they may not know are available.

# Collaborative vs content-based filtering

Below are statements that are often used when providing recommendations. Select the ones that indicate collaborative filtering.

- "Users that bought that also bought this."
- "Other people like you also liked this movie."
- "80% of your friends liked this movie, we think you'll like it too."
- "Here are top choices from similar users."
- Not collaborative (content-based): "Because you liked that product, we think you'll like this product."
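
Statements like "other people like you also liked this movie" describe user-based collaborative filtering. A minimal sketch, assuming a tiny hypothetical ratings dictionary: measure how similar two users are with cosine similarity over the movies both have rated, then score each unseen movie by a similarity-weighted sum of the other users' ratings. Real systems usually mean-center ratings first; that step is omitted here for brevity.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
ratings = {
    "you": {"Up": 5, "Wall-E": 4, "Heat": 1},
    "ana": {"Up": 5, "Wall-E": 5, "Coco": 5, "Heat": 2},
    "ben": {"Up": 1, "Heat": 5, "Ronin": 5},
}


def cosine(a, b):
    """Cosine similarity of two users over the movies both of them rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    num = sum(a[m] * b[m] for m in common)
    den = sqrt(sum(a[m] ** 2 for m in common)) * sqrt(sum(b[m] ** 2 for m in common))
    return num / den


def recommend(user, ratings, k=1):
    """Rank movies `user` has not seen by a similarity-weighted sum of other users' ratings."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for movie, r in theirs.items():
            if movie not in mine:
                scores[movie] = scores.get(movie, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("you", ratings))
```

Here `recommend("you", ratings)` returns `["Coco"]`: "ana" rates movies much like "you" do, so her rating of Coco counts for far more than "ben"'s rating of Ronin. No movie attributes are used, only other users' behavior.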


# Collaborative vs content-based filtering part II

Look at the `df` DataFrame using the `.show()` method and/or the `.columns` attribute, and determine whether it is best suited for "collaborative filtering", "content-based filtering", or "both".

```python
# Column names of df (the value of df.columns)
columns = ['UserId',
           'MovieId',
           'Movie_Title',
           'Genre',
           'Language',
           'Year_Produced',
           'rating']
```

- Because this dataset includes descriptive tags like genre and language, as well as user ratings, it is suited for both collaborative and content-based filtering.
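
The two approaches read different columns of `df`: collaborative filtering needs only the `UserId`, `MovieId`, and `rating` triples, while content-based filtering compares items by descriptive columns such as `Genre` and `Language`. A minimal content-based sketch, using hypothetical rows and Jaccard similarity over each movie's tags:

```python
# Hypothetical rows shaped like df: (MovieId, Movie_Title, Genre, Language)
movies = [
    (1, "Up", "Animation", "English"),
    (2, "Coco", "Animation", "English"),
    (3, "Heat", "Crime", "English"),
    (4, "Amelie", "Comedy", "French"),
]


def tags(row):
    """The descriptive features a content-based recommender compares."""
    _, _, genre, language = row
    return {("Genre", genre), ("Language", language)}


def jaccard(a, b):
    """Share of tags two movies have in common."""
    return len(a & b) / len(a | b)


def similar_to(title, movies):
    """Rank the other movies by tag overlap with `title`; no user ratings are needed."""
    target = tags(next(m for m in movies if m[1] == title))
    others = [m for m in movies if m[1] != title]
    return sorted(others, key=lambda m: jaccard(target, tags(m)), reverse=True)


print([m[1] for m in similar_to("Up", movies)])
```

`similar_to("Up", movies)` ranks Coco first (same genre and language), then Heat (same language only), then Amelie (no shared tags).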

# Implicit vs explicit data

Recall the differences between implicit and explicit ratings. Look at the `df1` DataFrame to determine whether it contains implicit or explicit ratings data. Its schema is `DataFrame[Movie_Title: string, Genre: string, Num_Views: string]`.

```python
# Type "implicit" or "explicit"
# Num_Views records how often a movie was watched, not a rating the user gave it
answer = "implicit"
```
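
Implicit-feedback ALS (the idea behind the `implicitPrefs=True` and `alpha` options of Spark's `ALS`) treats any positive count as a preference and uses the size of the count as confidence in that preference. A plain-Python sketch of that conversion, with hypothetical rows and an assumed `alpha` of 0.5. Note that `Num_Views` is a string in `df1`, so it has to be cast to a number first.

```python
# Hypothetical rows shaped like df1: (Movie_Title, Genre, Num_Views), Num_Views stored as a string
rows = [("Up", "Animation", "3"), ("Heat", "Crime", "0"), ("Coco", "Animation", "12")]

ALPHA = 0.5  # assumed confidence scale; a tuning parameter, not a value from the exercise


def implicit_signal(num_views: str, alpha: float = ALPHA):
    """Turn a raw view count into a (preference, confidence) pair."""
    views = int(num_views)               # cast, since df1 stores Num_Views as a string
    preference = 1 if views > 0 else 0   # watched at least once -> assumed interest
    confidence = 1 + alpha * views       # more views -> more trust in that preference
    return preference, confidence


for title, _, views in rows:
    print(title, implicit_signal(views))
```

A movie watched 12 times gets the same preference as one watched 3 times, but a much larger confidence, so the model works harder to fit it.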

# Ratings data types

Markus watches a lot of movies, including documentaries, superhero movies, classics, and dramas. Drawing on your previous experience with Spark, use the `markus_ratings` DataFrame, which contains the number of times Markus has seen movies in various genres, and decide whether these are implicit or explicit ratings. Its schema is `DataFrame[Movie_Title: string, Genre: string, Num_Views: int]`.

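```python
# Group the data by "Genre" and total the numeric columns (here Num_Views) per genre
# (markus_ratings is not defined in this notebook)
# markus_ratings.groupBy("Genre").sum().show()
```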
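
Summing `Num_Views` per genre turns Markus's viewing history into a rough measure of how strongly he prefers each genre, which is again implicit data. The same aggregation that `groupBy("Genre").sum()` performs (Spark reports the total in a column named `sum(Num_Views)`) can be sketched in plain Python over hypothetical rows:

```python
from collections import Counter

# Hypothetical rows shaped like markus_ratings: (Movie_Title, Genre, Num_Views)
markus_rows = [
    ("Iron Man", "Superhero", 4),
    ("Avengers", "Superhero", 6),
    ("Casablanca", "Classic", 2),
    ("Free Solo", "Documentary", 3),
]

views_by_genre = Counter()
for _, genre, views in markus_rows:
    views_by_genre[genre] += views  # running total per genre, like sum(Num_Views)

print(views_by_genre.most_common())
```

For these rows, `most_common()` lists Superhero (10), Documentary (3), and Classic (2): nobody asked Markus to rate anything, yet his behavior points to superhero movies.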

# Alternate uses of recommendation engines

Select the best definition of "latent features".

- Latent features aren't directly observable by humans, and need mathematical operations to uncover them.

# Confirm understanding of latent features

Matrix `P` is provided here. Its columns represent movies and its rows represent several latent features. Use your understanding of Spark commands to view `P` and see if you can work out what some of the latent features might represent. After examining the matrix, look at the DataFrame `Pi`, which labels each row with a rough approximation of what that latent feature could represent, and see how close your guesses were.

P shows:
```
    +--------+------------+--------+---------+------------+------+----------+
    |Iron Man|Finding Nemo|Avengers|Toy Story|Forrest Gump|Wall-E|Green Mile|
    +--------+------------+--------+---------+------------+------+----------+
    |     0.2|         2.4|     0.1|      2.4|           0|   2.5|         0|
    |     1.5|         1.4|     1.4|      1.3|         1.8|   1.8|       2.5|
    |     2.5|         1.1|     2.4|      0.9|         0.2|   0.9|      0.09|
    |     1.9|           2|     1.5|      2.2|         1.2|   0.3|      0.01|
    |       0|           0|       0|      2.3|         2.2|     0|       2.5|
    +--------+------------+--------+---------+------------+------+----------+
```

Pi shows:
```
    +---------+--------+------------+--------+---------+------------+------+----------+
    | Lat Feat|Iron Man|Finding Nemo|Avengers|Toy Story|Forrest Gump|Wall-E|Green Mile|
    +---------+--------+------------+--------+---------+------------+------+----------+
    | Animated|     0.2|         2.4|     0.1|      2.4|           0|   2.5|         0|
    |    Drama|     1.5|         1.4|     1.4|      1.3|         1.8|   1.8|       2.5|
    |Superhero|     2.5|         1.1|     2.4|      0.9|         0.2|   0.9|      0.09|
    |   Comedy|     1.9|           2|     1.5|      2.2|         1.2|   0.3|      0.01|
    |Tom Hanks|       0|           0|       0|      1.8|         2.2|     0|       2.5|
    +---------+--------+------------+--------+---------+------------+------+----------+
    
```
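
In a factorization such as ALS, each user is also described by a vector over the same latent features, and a predicted rating is the dot product of that vector with a movie's column of `P`. A small sketch using the values of `P` shown above and a hypothetical user who weights the third row (labelled Superhero in `Pi`) at 1.0 and the fourth row (Comedy) at 0.5:

```python
# Values of P from the table above: rows are latent features, columns are movies
movies = ["Iron Man", "Finding Nemo", "Avengers", "Toy Story", "Forrest Gump", "Wall-E", "Green Mile"]
P = [
    [0.2, 2.4, 0.1, 2.4, 0.0, 2.5, 0.0],
    [1.5, 1.4, 1.4, 1.3, 1.8, 1.8, 2.5],
    [2.5, 1.1, 2.4, 0.9, 0.2, 0.9, 0.09],
    [1.9, 2.0, 1.5, 2.2, 1.2, 0.3, 0.01],
    [0.0, 0.0, 0.0, 2.3, 2.2, 0.0, 2.5],
]

# Hypothetical user factor vector: one weight per row (latent feature) of P
user = [0.0, 0.0, 1.0, 0.5, 0.0]


def predicted_ratings(user, P, movies):
    """Dot product of the user's vector with each movie's column of P."""
    return {
        title: sum(weight * P[row][col] for row, weight in enumerate(user))
        for col, title in enumerate(movies)
    }


scores = predicted_ratings(user, P, movies)
for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{title:<13} {score:.2f}")
```

For this user, Iron Man scores highest (2.5 + 0.5 * 1.9 = 3.45), followed by Avengers (2.4 + 0.5 * 1.5 = 3.15), while Green Mile, which loads almost entirely on the Drama and Tom Hanks rows, scores close to zero.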