<h1 align='center'>Recommender system</h1>
<h2 align='center'>Introduction to similarity learning and recommendation systems</h2>
<h3 align='center'>Training work</h3>

In [2]:
import pandas as pd
import numpy as np

<h2 align='center'>Part 1</h2>
<h3 align='center'>Preparing data</h3>

First, let's take a look at the data:

In [6]:
raw_data = pd.read_csv("dataset/netflix_titles.csv")
raw_data.head()

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...


So we should drop some of the features and keep only the most relevant, easy-to-evaluate ones.

This is somewhat problematic: without information about users' actions there is no way to assess the weight of each feature. The task should therefore be reformulated as keeping the most unambiguous features.

The uncertain features are "date_added", "release_year" and "duration". They cannot unambiguously show what a user would prefer, although they could still take part in the evaluation under collaborative approaches. "show_id" takes no part in the calculation, so we can drop it.
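
One way to make this choice less subjective is to check how complete and how varied each column is. The following is only a minimal sketch: the `sample` frame is a hypothetical hand-built stand-in for the real dataset, and `summarize_columns` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample that mimics a few columns of netflix_titles.csv
sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3", "s4"],
    "type": ["TV Show", "Movie", "Movie", "Movie"],
    "director": [None, "Jorge Michel Grau", "Gilbert Chan", "Shane Acker"],
    "country": ["Brazil", "Mexico", "Singapore", "United States"],
    "rating": ["TV-MA", "TV-MA", "R", "PG-13"],
})


def summarize_columns(df):
    """Return the number of missing and distinct values for each column."""
    return pd.DataFrame({
        "missing": df.isna().sum(),   # NaN / None entries per column
        "unique": df.nunique(),       # distinct non-missing values per column
    })


summary = summarize_columns(sample)
print(summary)
```

A column such as "show_id", which is distinct for every row, carries no similarity signal, while a column with many missing values may need special handling before it is compared.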

And now we can transform our dataframe:

In [23]:
# Drop the identifier and the ambiguous features; the title becomes the index
input_data = raw_data.drop(columns=['show_id', 'title', 'date_added', 'release_year', 'duration'])
input_data.index = raw_data['title']
input_data.head()

title,type,director,cast,country,rating,listed_in,description
3%,TV Show,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,TV-MA,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
7:19,Movie,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,TV-MA,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
23:59,Movie,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,R,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
9,Movie,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,PG-13,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
21,Movie,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,PG-13,Dramas,A brilliant group of students become card-coun...


Note also that movies and TV shows are mixed in this dataframe. Whether TV shows should be recommended when a movie is analysed is debatable and vague, so the two should be separated:

In [27]:
movie = input_data.query('type == "Movie"')
movie.head()

title,type,director,cast,country,rating,listed_in,description
7:19,Movie,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,TV-MA,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
23:59,Movie,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,R,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
9,Movie,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,PG-13,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
21,Movie,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,PG-13,Dramas,A brilliant group of students become card-coun...
122,Movie,Yasir Al Yasiri,"Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed...",Egypt,TV-MA,"Horror Movies, International Movies","After an awful accident, a couple admitted to ..."


In [26]:
tv_show = input_data.query('type == "TV Show"')
tv_show.head()

title,type,director,cast,country,rating,listed_in,description
3%,TV Show,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,TV-MA,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
46,TV Show,Serdar Akar,"Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan...",Turkey,TV-MA,"International TV Shows, TV Dramas, TV Mysteries",A genetics professor experiments with a treatm...
1983,TV Show,,"Robert Więckiewicz, Maciej Musiał, Michalina O...","Poland, United States",TV-MA,"Crime TV Shows, International TV Shows, TV Dramas","In this dark alt-history thriller, a naïve law..."
1994,TV Show,Diego Enrique Osorno,,Mexico,TV-MA,"Crime TV Shows, Docuseries, International TV S...",Archival video and new interviews examine Mexi...
Feb-09,TV Show,,"Shahd El Yaseen, Shaila Sabt, Hala, Hanadi Al-...",,TV-14,"International TV Shows, TV Dramas","As a psychology professor faces Alzheimer's, h..."
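
As a quick sanity check on a split like the one above, the two subsets should together cover every row exactly once. The sketch below runs on a hypothetical `sample` frame indexed by title (an assumption standing in for `input_data`). Dropping the now-constant "type" column is an optional extra step, not something the notebook does.

```python
import pandas as pd

# Hypothetical sample indexed by title, shaped like input_data
sample = pd.DataFrame(
    {
        "type": ["TV Show", "Movie", "Movie", "TV Show"],
        "rating": ["TV-MA", "TV-MA", "R", "TV-14"],
    },
    index=pd.Index(["3%", "7:19", "23:59", "46"], name="title"),
)

movie_part = sample.query('type == "Movie"')
tv_part = sample.query('type == "TV Show"')

# The two parts should account for every row exactly once
covers_all = len(movie_part) + len(tv_part) == len(sample)

# After the split "type" is constant within each part, so it can be dropped
movie_part = movie_part.drop(columns="type")
tv_part = tv_part.drop(columns="type")

print(covers_all)
print(movie_part)
```

If `covers_all` were false on the real data, some rows would have a "type" value other than the two queried ones (or a missing value) and would be silently lost.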


<h2 align='center'>Part 2</h2>
<h3 align='center'>Approaches for similarity evaluation</h3>

<h3 align='center'>2.1. Similarity measure</h3>