In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.max_columns", None)

## Streaming Service Recommender
#### Goals

- Build the data frame that the recommender will use

We want our final data frame to look like the following:

| streaming_service | Drama | Comedy | Animation | ... |
|-------------------|-------|--------|-----------|-----|
| **Netflix**       | 0.3765 | 0.2735 | 0.1680   | ... |
| **Amazon**        | 0.3349 | 0.2432 | 0.1619   | ... |
| ... | ... | ... |... | ...|

Our genres should be our columns and the streaming services should be the index.
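A minimal sketch of that target layout, built by hand from the illustrative values in the table above (only three genres and two services shown). The index name `streaming_service` and the column-axis name `genre` are the ones the pivot table later in the notebook ends up with:

```python
import pandas as pd

# Illustrative values copied from the target-layout table; only three genres shown.
target = pd.DataFrame(
    {
        "Drama": [0.3765, 0.3349],
        "Comedy": [0.2735, 0.2432],
        "Animation": [0.1680, 0.1619],
    },
    index=pd.Index(["Netflix", "Amazon"], name="streaming_service"),
)
# Name the column axis so the frame reads "genre across, service down".
target.columns.name = "genre"

# One row per streaming service, one column per genre.
print(target.loc["Netflix", "Drama"])
```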

### 1. Import data

In [2]:
netflix = pd.read_pickle("../Data/netflix_genres_ratio.pkl")

In [3]:
amazon = pd.read_pickle("../Data/amazon_genres_ratio.pkl")

In [4]:
hbo = pd.read_pickle("../Data/hbo_genres_ratio.pkl")

### 2. Get list of total genres

We will first get the list of genres for each streaming service, then concatenate the three lists and convert the result to a set to remove duplicates.
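The union-and-deduplicate step can be sketched on toy lists (hypothetical values, not the real data). `sorted()` returns a new sorted list, so it folds the set conversion of In [7] and the `.sort()` call of In [8] into one expression:

```python
# Toy genre lists (hypothetical); the notebook uses each pickle's "genre" column.
netflix_genres = ["Drama", "Comedy", "Animation"]
amazon_genres = ["Drama", "Comedy", "Documentary"]
hbo_genres = ["Drama", "Mystery"]

# Concatenate the lists, drop duplicates with a set, then sort for a stable order.
total_genres = sorted(set(netflix_genres + amazon_genres + hbo_genres))
print(total_genres)
# ['Animation', 'Comedy', 'Documentary', 'Drama', 'Mystery']
```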

In [5]:
netflix_genres = netflix["genre"].to_list()

amazon_genres = amazon["genre"].to_list()

hbo_genres = hbo["genre"].to_list()

In [6]:
len(netflix_genres)

26

In [7]:
total_genres = list(set(netflix_genres + amazon_genres + hbo_genres))

len(total_genres)

26

In [8]:
total_genres.sort()

In [9]:
total_genres

['Action',
 'Adventure',
 'Animation',
 'Biography',
 'Comedy',
 'Crime',
 'Documentary',
 'Drama',
 'Family',
 'Fantasy',
 'Game-Show',
 'History',
 'Horror',
 'Music',
 'Musical',
 'Mystery',
 'News',
 'Reality-TV',
 'Romance',
 'Sci-Fi',
 'Short',
 'Sport',
 'Talk-Show',
 'Thriller',
 'War',
 'Western']

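Looks like the three streaming services share the same genres: the combined list has 26 genres, exactly as many as Netflix's list alone, so neither Amazon nor HBO adds a genre that Netflix lacks. Netflix and Amazon each have all 26 genres, while HBO has 24.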
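Coverage can also be checked directly, rather than inferred from list lengths, with a set difference per service. A minimal sketch on hypothetical toy lists (in the notebook the lists would come from each frame's `genre` column):

```python
# Hypothetical toy lists; in the notebook these come from each pickle's "genre" column.
genres_by_service = {
    "Netflix": ["Drama", "Comedy", "Horror"],
    "Amazon": ["Drama", "Comedy", "Horror"],
    "HBO": ["Drama", "Comedy"],
}

# Union of every genre seen on any service.
all_genres = set().union(*genres_by_service.values())

# For each service, the genres that exist elsewhere but not on that service.
missing = {
    service: sorted(all_genres - set(genres))
    for service, genres in genres_by_service.items()
}
print(missing)
# {'Netflix': [], 'Amazon': [], 'HBO': ['Horror']}
```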

### 3. Create new data frame

Since they share the same genres, we will prepare each data frame, concatenate them, and finally create a pivot table to use for the recommender.

#### i. Prepare each data frame

In [10]:
netflix["streaming_service"] = "Netflix"

netflix_genres = netflix[["genre", "ratio", "streaming_service"]]

netflix_genres.head()

|   | genre | ratio | streaming_service |
|---|-------|-------|-------------------|
| 0 | Drama | 0.3767 | Netflix |
| 1 | Comedy | 0.2736 | Netflix |
| 2 | Documentary | 0.1718 | Netflix |
| 3 | Animation | 0.1675 | Netflix |
| 4 | Crime | 0.1466 | Netflix |


In [11]:
amazon["streaming_service"] = "Amazon"

amazon_genres = amazon[["genre", "ratio", "streaming_service"]]

amazon_genres.head()

|   | genre | ratio | streaming_service |
|---|-------|-------|-------------------|
| 0 | Drama | 0.3349 | Amazon |
| 1 | Comedy | 0.2432 | Amazon |
| 2 | Documentary | 0.2018 | Amazon |
| 3 | Animation | 0.1619 | Amazon |
| 4 | Crime | 0.1252 | Amazon |


In [12]:
hbo["streaming_service"] = "HBO"

hbo_genres = hbo[["genre", "ratio", "streaming_service"]]

hbo_genres.head()

|   | genre | ratio | streaming_service |
|---|-------|-------|-------------------|
| 0 | Drama | 0.5089 | HBO |
| 1 | Comedy | 0.3905 | HBO |
| 2 | Crime | 0.1538 | HBO |
| 3 | Documentary | 0.1006 | HBO |
| 4 | Mystery | 0.0828 | HBO |

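The three preparation cells repeat the same two steps. A minimal sketch of the same preparation as a loop, assuming each pickle holds `genre` and `ratio` columns (small hypothetical frames stand in for the pickles here). `DataFrame.assign` returns a new frame, so, unlike the cells above, it leaves the loaded frames unmodified:

```python
import pandas as pd

# Toy stand-ins for the pickled frames (hypothetical values).
frames = {
    "Netflix": pd.DataFrame({"genre": ["Drama", "Comedy"], "ratio": [0.4, 0.3]}),
    "Amazon": pd.DataFrame({"genre": ["Drama", "Comedy"], "ratio": [0.35, 0.25]}),
    "HBO": pd.DataFrame({"genre": ["Drama", "Mystery"], "ratio": [0.5, 0.1]}),
}

# Tag each frame with its service name and keep only the three needed columns.
prepared = {
    service: df.assign(streaming_service=service)[["genre", "ratio", "streaming_service"]]
    for service, df in frames.items()
}

print(prepared["HBO"])
```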

#### ii. Append data frames

In [13]:
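# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames row-wise.
# drop=True discards the old per-service row numbers instead of keeping them as a column.
genres_features = pd.concat([netflix_genres, amazon_genres, hbo_genres]).reset_index(drop=True)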
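An alternative to adding a `streaming_service` column to each frame beforehand: `pd.concat` can tag each block itself through its `keys` argument, which adds an outer index level that can then be moved back into a regular column. A minimal sketch on hypothetical toy frames:

```python
import pandas as pd

# Toy per-service frames without a service column (hypothetical values).
netflix = pd.DataFrame({"genre": ["Drama", "Comedy"], "ratio": [0.4, 0.3]})
hbo = pd.DataFrame({"genre": ["Drama", "Mystery"], "ratio": [0.5, 0.1]})

# keys= adds an outer index level naming the source frame; names= labels both levels.
combined = pd.concat([netflix, hbo], keys=["Netflix", "HBO"], names=["streaming_service", "row"])

# Move the service level into a column, then replace the repeated row numbers with 0..n-1.
genres_features = combined.reset_index(level="streaming_service").reset_index(drop=True)
print(genres_features)
```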

#### iii. Create pivot table

In [14]:
genres_recommender = (
    genres_features
    .pivot_table(index="streaming_service", columns="genre", values="ratio")
    .fillna(0)  # a service with no titles in a genre gets a ratio of 0
)
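To see what `pivot_table` and `fillna(0)` do when a service lacks a genre, here is a minimal sketch on a hypothetical long-format frame in which HBO has no `Short` row:

```python
import pandas as pd

# Toy long-format data (hypothetical): HBO has no "Short" row.
genres_features = pd.DataFrame({
    "streaming_service": ["Netflix", "Netflix", "HBO"],
    "genre": ["Drama", "Short", "Drama"],
    "ratio": [0.4, 0.01, 0.5],
})

# Rows = services, columns = genres. pivot_table averages duplicate
# (service, genre) pairs by default; with one row per pair it only reshapes.
wide = genres_features.pivot_table(index="streaming_service",
                                   columns="genre",
                                   values="ratio")

# The (HBO, Short) cell has no source row, so it is NaN; fillna(0) makes it 0.0.
wide = wide.fillna(0)
print(wide)
```

`DataFrame.pivot` would reshape the same data, but it raises a `ValueError` if any (service, genre) pair occurs twice, whereas `pivot_table` silently averages such duplicates.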

In [15]:
genres_recommender

| streaming_service | Action | Adventure | Animation | Biography | Comedy | Crime | Documentary | Drama | Family | Fantasy | Game-Show | History | Horror | Music | Musical | Mystery | News | Reality-TV | Romance | Sci-Fi | Short | Sport | Talk-Show | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Amazon | 0.1212 | 0.1228 | 0.1619 | 0.0191 | 0.2432 | 0.1252 | 0.2018 | 0.3349 | 0.1045 | 0.0574 | 0.0152 | 0.0606 | 0.0263 | 0.0096 | 0.0032 | 0.0582 | 0.0048 | 0.1021 | 0.055 | 0.0327 | 0.0144 | 0.0152 | 0.0072 | 0.0287 | 0.0112 | 0.0199 |
| HBO | 0.0769 | 0.0533 | 0.0473 | 0.0355 | 0.3905 | 0.1538 | 0.1006 | 0.5089 | 0.0414 | 0.0473 | 0.0059 | 0.0769 | 0.0118 | 0.0237 | 0.0059 | 0.0828 | 0.0296 | 0.0237 | 0.0769 | 0.0237 | 0.0 | 0.0533 | 0.0414 | 0.0355 | 0.0059 | 0.0 |
| Netflix | 0.1399 | 0.119 | 0.1675 | 0.0221 | 0.2736 | 0.1466 | 0.1718 | 0.3767 | 0.062 | 0.0571 | 0.0123 | 0.0387 | 0.0344 | 0.0178 | 0.0061 | 0.0571 | 0.0018 | 0.0883 | 0.0785 | 0.0344 | 0.0025 | 0.016 | 0.011 | 0.0479 | 0.0098 | 0.0031 |


### 4. Export final data frame

In [16]:
# genres_recommender.to_pickle("../Data/genres_recommender.pkl")
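The export cell is commented out. A minimal sketch of the pickle round trip, writing to a temporary directory instead of `../Data` and using a small hypothetical stand-in for `genres_recommender`, confirms that the index name and columns survive serialization:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for genres_recommender (hypothetical values).
genres_recommender = pd.DataFrame(
    {"Drama": [0.33, 0.51], "Comedy": [0.24, 0.39]},
    index=pd.Index(["Amazon", "HBO"], name="streaming_service"),
)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "genres_recommender.pkl"
    genres_recommender.to_pickle(path)
    restored = pd.read_pickle(path)

# Raises AssertionError if values, index, columns, or dtypes differ.
pd.testing.assert_frame_equal(restored, genres_recommender)
```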