# ✏️ Intro to content-based recommendations

- Recommendations made by finding items with similar attributes are called **content-based** recommendations.
- We encode each item's attributes in a way that lets us compare items mathematically.
- To do that, we transform the data so that items are in the rows and attributes are in the columns.

```python
import pandas as pd

# Cross-tabulate: one row per book, one column per genre (book_genre_df is assumed to be loaded)
pd.crosstab(book_genre_df['Book'], book_genre_df['Genre'])
```
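
To make the reshaping concrete, here is a minimal sketch on made-up data. The book titles and genres below are hypothetical; only the `Book` and `Genre` column names come from the snippet above.

```python
import pandas as pd

# Hypothetical toy data: one row per (book, genre) pair
book_genre_df = pd.DataFrame({
    "Book": ["The Hobbit", "The Hobbit", "Dune", "Dune", "Emma"],
    "Genre": ["Fantasy", "Adventure", "Sci-Fi", "Adventure", "Romance"],
})

# Items become rows, attributes become columns, cells count occurrences
book_genre_table = pd.crosstab(book_genre_df["Book"], book_genre_df["Genre"])
print(book_genre_table)
```

Each cell holds 1 when the genre applies to the book and 0 otherwise, so every book is now a numeric vector that can be compared with any other.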

# ✏️ Why use content-based models?

Imagine you are working for a large retailer that has a constantly changing product line, with new items being added every day. Why might content-based models be a good choice to make recommendations on your data?

**A)** You are always guaranteed better recommendations with content-based data. ❌

**B)** Content-based models always recommend the newest products; customers always like the newest products no matter what their past preferences were. ❌

**C)** As the recommendations are based on the item attributes rather than user feedback, recommendations can be made on never-before-purchased products. ✅
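
As a rough illustration of answer C, the sketch below scores a brand-new item against existing ones using attributes alone. The product names and attributes are hypothetical, and Jaccard similarity is just one possible measure chosen for the example; these notes do not say which measure is used.

```python
import pandas as pd

# Hypothetical item-by-attribute table; "new_item" has no purchase history
attributes = pd.DataFrame(
    {"waterproof": [1, 1, 0], "wool": [0, 0, 1], "hooded": [1, 0, 0]},
    index=["rain_jacket", "new_item", "sweater"],
)

def jaccard(a, b):
    """Shared attributes divided by attributes present in either item."""
    both = ((a == 1) & (b == 1)).sum()
    either = ((a == 1) | (b == 1)).sum()
    return both / either if either else 0.0

# Compare the new item with every other item, using attributes only
new = attributes.loc["new_item"]
scores = attributes.drop(index="new_item").apply(lambda row: jaccard(row, new), axis=1)
print(scores.sort_values(ascending=False))
```

No user feedback appears anywhere in the calculation, which is why an item nobody has bought yet can still be matched to similar products.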




# ✏️ Creating content-based data

As much as you might want to jump right to finding similar items and making recommendations, you first need to get your data in a usable format. In the next few exercises, you will explore your base data and work through how to format that data to be used for content-based recommendations.

As a reminder, the desired outcome is a row per movie with each column indicating whether a genre applies to the movie. You will be looking at `movie_genre_df`, which contains these columns:

- `name` - Name of movie
- `genre_list` - Genre that the movie has been labeled as

A movie may have multiple genres, and therefore multiple rows. In this exercise, you will particularly focus on one movie (`Toy Story` in this case) to be able to clearly see what is happening with the data.

```python
import pandas as pd

df = pd.read_csv('movies.csv')
df.head()
```

| Unnamed: 0 | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


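```python
# Split "Toy Story (1995)" on parentheses; assumes the first parenthesised group is the year
df[['name', 'year']] = df['title'].str.split(r'\(|\)', expand=True).iloc[:, [0, 1]]
df['name'] = df['name'].str.strip()  # drop the space left before "("

df
```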

| Unnamed: 0 | movieId | title | genres | name | year |
|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | Toy Story | 1995 |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy | Jumanji | 1995 |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | Grumpier Old Men | 1995 |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance | Waiting to Exhale | 1995 |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | Father of the Bride Part II | 1995 |
| ... | ... | ... | ... | ... | ... |
| 9737 | 193581 | Black Butler: Book of the Atlantic (2017) | Action\|Animation\|Comedy\|Fantasy | Black Butler: Book of the Atlantic | 2017 |
| 9738 | 193583 | No Game No Life: Zero (2017) | Animation\|Comedy\|Fantasy | No Game No Life: Zero | 2017 |
| 9739 | 193585 | Flint (2017) | Drama | Flint | 2017 |
| 9740 | 193587 | Bungo Stray Dogs: Dead Apple (2018) | Action\|Animation | Bungo Stray Dogs: Dead Apple | 2018 |


```python
# The title is now redundant with name and year
df = df.drop(columns="title")

df
```

| Unnamed: 0 | movieId | genres | name | year |
|---|---|---|---|---|
| 0 | 1 | Adventure\|Animation\|Children\|Comedy\|Fantasy | Toy Story | 1995 |
| 1 | 2 | Adventure\|Children\|Fantasy | Jumanji | 1995 |
| 2 | 3 | Comedy\|Romance | Grumpier Old Men | 1995 |
| 3 | 4 | Comedy\|Drama\|Romance | Waiting to Exhale | 1995 |
| 4 | 5 | Comedy | Father of the Bride Part II | 1995 |
| ... | ... | ... | ... | ... |
| 9737 | 193581 | Action\|Animation\|Comedy\|Fantasy | Black Butler: Book of the Atlantic | 2017 |
| 9738 | 193583 | Animation\|Comedy\|Fantasy | No Game No Life: Zero | 2017 |
| 9739 | 193585 | Drama | Flint | 2017 |
| 9740 | 193587 | Action\|Animation | Bungo Stray Dogs: Dead Apple | 2018 |
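
The `df` above stores all genres of a movie in one pipe-delimited string, while `movie_genre_df` is described as having one row per movie and genre. One possible way to get from the first shape to the second is sketched below. Whether the exercise's `movie_genre_df` was actually built this way is an assumption, and the three sample rows are copied from the output above as a stand-in for the full file.

```python
import pandas as pd

# Small stand-in for the df shown above (values copied from its first rows)
df = pd.DataFrame({
    "name": ["Toy Story", "Jumanji", "Father of the Bride Part II"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy", "Comedy"],
})

# Split the pipe-delimited string into a list, then give each genre its own row
movie_genre_df = (
    df.assign(genre_list=df["genres"].str.split("|"))
      .explode("genre_list")[["name", "genre_list"]]
      .reset_index(drop=True)
)
print(movie_genre_df)
```

After this step a movie with five genres, such as Toy Story, occupies five rows, which matches the note that a movie may have multiple rows.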


How many different movies are contained in `movie_genre_df`?


- 50 ❌

- 21 ✅

- 11 ❌

- Get the rows in `movie_genre_df` whose `name` equals `Toy Story` and save them as `toy_story_genres`.
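
The counting question and the filtering step can be sketched as follows. The miniature `movie_genre_df` here is hypothetical, so its counts will not match the exercise's answer of 21, and `toy_story_genres_ct` is an illustrative name for the reshaped result.

```python
import pandas as pd

# Hypothetical miniature of movie_genre_df: one row per (movie, genre)
movie_genre_df = pd.DataFrame({
    "name": ["Toy Story", "Toy Story", "Toy Story", "Jumanji", "Jumanji"],
    "genre_list": ["Adventure", "Animation", "Comedy", "Adventure", "Fantasy"],
})

# Count distinct movies (counting rows would overcount multi-genre movies)
n_movies = movie_genre_df["name"].nunique()

# Keep only the Toy Story rows
toy_story_genres = movie_genre_df[movie_genre_df["name"] == "Toy Story"]

# Reshape to one row per movie and one column per genre
toy_story_genres_ct = pd.crosstab(toy_story_genres["name"], toy_story_genres["genre_list"])

print(n_movies)
print(toy_story_genres)
print(toy_story_genres_ct)
```

Looking at a single movie first makes it easy to check that the cross-tabulation collapses its several rows into one row with a column per genre.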