# Defining the Statement of Problem

<li> This notebook implements a movie recommender system
<li> Recommender systems suggest items such as movies or songs to users based on their interests or usage history
<li> In this example we will use user-based and item-based collaborative filtering
<li> Dataset MovieLens: <a href="https://grouplens.org/datasets/movielens/100k/">https://grouplens.org/datasets/movielens/100k/</a>
<li> Photo Credit: <a href="https://pxhere.com/en/photo/1588369">https://pxhere.com/en/photo/1588369</a>
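
To make the difference between the two collaborative filtering variants concrete, here is a minimal sketch on a hypothetical toy rating matrix (the users `u1`..`u4`, the movies, and the ratings are invented for illustration and are not part of MovieLens). It assumes Pearson correlation as the similarity measure, which is one common choice and not necessarily the only one this notebook could use.

```python
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies, NaN means "not rated".
nan = float("nan")
ratings = pd.DataFrame(
    {
        "Movie A": [5, 4, 1, nan],
        "Movie B": [4, 5, 2, 1],
        "Movie C": [1, 2, 5, 4],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Item-based: compare movie columns using the users who rated both movies.
item_similarity = ratings.corr(method="pearson")

# User-based: compare user rows, so transpose before correlating.
user_similarity = ratings.T.corr(method="pearson")

print(item_similarity.round(2))
print(user_similarity.round(2))
```

In this toy data, movies A and B attract similar ratings while A and C attract opposite ones, and users `u1` and `u2` have similar taste while `u1` and `u3` do not.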


# Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Data extraction and verification

In [2]:
movie_titles_df = pd.read_csv('Movie_Id_Titles')
movie_titles_df

Unnamed: 0,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)
...,...,...
1677,1678,Mat' i syn (1997)
1678,1679,B. Monkey (1998)
1679,1680,Sliding Doors (1998)
1680,1681,You So Crazy (1994)


In [3]:
movie_titles_df.head(9)

Unnamed: 0,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)
5,6,Shanghai Triad (Yao a yao yao dao waipo qiao) ...
6,7,Twelve Monkeys (1995)
7,8,Babe (1995)
8,9,Dead Man Walking (1995)


In [4]:
movie_titles_df.tail(9)

Unnamed: 0,item_id,title
1673,1674,Mamma Roma (1962)
1674,1675,"Sunchaser, The (1996)"
1675,1676,"War at Home, The (1996)"
1676,1677,Sweet Nothing (1995)
1677,1678,Mat' i syn (1997)
1678,1679,B. Monkey (1998)
1679,1680,Sliding Doors (1998)
1680,1681,You So Crazy (1994)
1681,1682,Scream of Stone (Schrei aus Stein) (1991)


In [5]:
movies_rating_df = pd.read_csv('u.data', sep='\t', names=['user_id','item_id','rating','timestamp'])

In [6]:
movies_rating_df

Unnamed: 0,user_id,item_id,rating,timestamp
0,0,50,5,881250949
1,0,172,5,881250949
2,0,133,1,881250949
3,196,242,3,881250949
4,186,302,3,891717742
...,...,...,...,...
99998,880,476,3,880175444
99999,716,204,5,879795543
100000,276,1090,1,874795795
100001,13,225,2,882399156
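
The `u.data` file is tab-separated and has no header row, which is why `names=` is passed to `read_csv`. The sketch below reproduces that loading step on an in-memory copy of the first five rows shown in the output above, so it runs without the file, and then counts ratings per user. The variable names (`sample_df`, `ratings_per_user`) are illustrative only.

```python
import io
import pandas as pd

# First rows as displayed in the output: user_id, item_id, rating, timestamp.
rows = [
    (0, 50, 5, 881250949),
    (0, 172, 5, 881250949),
    (0, 133, 1, 881250949),
    (196, 242, 3, 881250949),
    (186, 302, 3, 891717742),
]
raw_text = "\n".join("\t".join(str(value) for value in row) for row in rows)

# The file has no header line, so the column names are supplied explicitly.
columns = ["user_id", "item_id", "rating", "timestamp"]
sample_df = pd.read_csv(io.StringIO(raw_text), sep="\t", names=columns)

# A per-user count and mean is a quick sanity check on the loaded ratings.
ratings_per_user = sample_df.groupby("user_id")["rating"].agg(["count", "mean"])
print(ratings_per_user)
```

A check like this makes unusual entries easy to spot, such as the three leading rows with `user_id` 0 in the output.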


## Dropping columns that are not needed

In [7]:
movies_rating_df = movies_rating_df.drop(columns=['timestamp'])
movies_rating_df

Unnamed: 0,user_id,item_id,rating
0,0,50,5
1,0,172,5
2,0,133,1
3,196,242,3
4,186,302,3
...,...,...,...
99998,880,476,3
99999,716,204,5
100000,276,1090,1
100001,13,225,2
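
One practical caveat with this step: running the drop cell a second time raises a `KeyError`, because `timestamp` is already gone. A minimal sketch of a re-runnable variant, using a hypothetical two-row frame (`toy`) and the `errors="ignore"` option of `DataFrame.drop`:

```python
import pandas as pd

# Hypothetical two-row frame with the same columns as the ratings data.
toy = pd.DataFrame(
    {
        "user_id": [0, 196],
        "item_id": [50, 242],
        "rating": [5, 3],
        "timestamp": [881250949, 881250949],
    }
)

# Returns a new frame; the original keeps its timestamp column.
trimmed = toy.drop(columns=["timestamp"], errors="ignore")

# Dropping again is a no-op instead of an error, so the cell can be re-run safely.
trimmed_again = trimmed.drop(columns=["timestamp"], errors="ignore")

print(list(trimmed_again.columns))
```

Whether to silence the error is a judgment call: it makes the cell idempotent, but it also hides a misspelled column name.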


## Statistical summary of the data

In [8]:
movies_rating_df.describe()

Unnamed: 0,user_id,item_id,rating
count,100003.0,100003.0,100003.0
mean,462.470876,425.520914,3.529864
std,266.622454,330.797791,1.125704
min,0.0,1.0,1.0
25%,254.0,175.0,3.0
50%,447.0,322.0,4.0
75%,682.0,631.0,4.0
max,943.0,1682.0,5.0


In [9]:
movies_rating_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100003 entries, 0 to 100002
Data columns (total 3 columns):
user_id    100003 non-null int64
item_id    100003 non-null int64
rating     100003 non-null int64
dtypes: int64(3)
memory usage: 2.3 MB
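
Note that `describe()` also reports a mean and standard deviation for `user_id` and `item_id`, which are identifiers, so only the `rating` column is meaningful there. A minimal sketch of a more targeted summary, using a small hypothetical frame (`toy`) in place of the real ratings:

```python
import pandas as pd

# Hypothetical ratings; the real frame has the same three columns.
toy = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3],
        "item_id": [10, 20, 10, 30, 20],
        "rating": [5, 3, 4, 4, 1],
    }
)

# Summary statistics only for the column where they make sense.
rating_summary = toy["rating"].describe()

# How often each rating value occurs, ordered by rating.
rating_counts = toy["rating"].value_counts().sort_index()

# For identifier columns, the number of distinct values is the useful figure.
n_users = toy["user_id"].nunique()
n_items = toy["item_id"].nunique()

print(rating_summary)
print(rating_counts)
print(n_users, n_items)
```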


# Merge/Join to match the movie titles with the user ratings

In [10]:
movies_rating_df = pd.merge(movies_rating_df, movie_titles_df, on='item_id')
movies_rating_df

Unnamed: 0,user_id,item_id,rating,title
0,0,50,5,Star Wars (1977)
1,290,50,5,Star Wars (1977)
2,79,50,4,Star Wars (1977)
3,2,50,5,Star Wars (1977)
4,8,50,5,Star Wars (1977)
...,...,...,...,...
99998,840,1674,4,Mamma Roma (1962)
99999,655,1640,3,"Eighth Day, The (1996)"
100000,655,1637,3,Girls Town (1996)
100001,655,1630,3,"Silence of the Palace, The (Saimt el Qusur) (1..."


In [11]:
movies_rating_df.shape

(100003, 4)
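
The unchanged row count (100003 before and after) suggests that every rating found exactly one title. That expectation can also be made explicit in the merge itself. The sketch below uses three titles from the listing above and hypothetical ratings; it relies on the `validate` and `indicator` parameters of `pd.merge`, and uses a left join so that unmatched ratings would be kept and flagged instead of silently dropped.

```python
import pandas as pd

# Three titles taken from the movie listing.
titles = pd.DataFrame(
    {
        "item_id": [1, 2, 3],
        "title": ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
    }
)

# Hypothetical ratings that reference those items.
ratings = pd.DataFrame(
    {
        "user_id": [10, 11, 12, 10],
        "item_id": [1, 1, 2, 3],
        "rating": [5, 4, 3, 2],
    }
)

# validate raises MergeError if an item_id appears more than once in titles;
# indicator adds a _merge column showing where each row came from.
merged = pd.merge(
    ratings,
    titles,
    on="item_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Ratings without a matching title would show up here as "left_only".
unmatched = merged[merged["_merge"] != "both"]

print(merged)
print(len(unmatched))
```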