> ## Recommender System using Python

Recommendation systems, which build on linear algebra, are broadly classified into two categories:
* Content Based: **Filtering based on the similarity between items.**
* Collaborative Based (which is further classified into two subcategories): **Filtering based on the user preferences.**
    * Memory Based.
    * Model Based.
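
>To make the distinction concrete, here is a minimal sketch on made-up data. The genre flags, the ratings and the `cosine_similarity` helper are all assumptions for illustration and are not part of the MovieLens data used later. Content-based filtering compares items by their own attributes, while memory-based collaborative filtering compares them by how users rated them.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (hypothetical helper)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Content based: each row describes a movie by genre flags
# (action, comedy, sci-fi). The values are invented for this sketch.
item_features = np.array([
    [1, 0, 1],   # movie A: action + sci-fi
    [1, 0, 1],   # movie B: action + sci-fi
    [0, 1, 0],   # movie C: comedy
], dtype=float)
sim_ab = cosine_similarity(item_features[0], item_features[1])
sim_ac = cosine_similarity(item_features[0], item_features[2])

# Collaborative (memory based): rows are users, columns are movies A, B, C.
# Items are compared through the ratings users gave them.
user_ratings = np.array([
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 5],
], dtype=float)
collab_ab = cosine_similarity(user_ratings[:, 0], user_ratings[:, 1])
collab_ac = cosine_similarity(user_ratings[:, 0], user_ratings[:, 2])

print("content based:  A~B =", sim_ab, " A~C =", sim_ac)
print("collaborative:  A~B =", collab_ab, " A~C =", collab_ac)
```

>In both views movie A comes out closer to movie B than to movie C, but for different reasons: shared attributes in the first case, similar user ratings in the second.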
    


>## **Content Based** Recommender System example:

 >Import the main libraries used below:

In [None]:
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

> `column_names` sets the column labels of our DataFrame. The ratings file is tab-separated, so we pass `sep='\t'` to `pd.read_csv`.

In [None]:
column_names=['user_id','item_id','rating','timestamp']
df =pd.read_csv('../input/movielens/u.data.csv',sep='\t',names=column_names)
movieT=pd.read_csv('../input/movie-titles/Movie_Id_Titles')
df.head(6)

>The columns of the table are as follows:

* **user_id**: ID of the user.<br>
* **item_id**: ID of the item (in this case it's movies).<br>
* **rating**: Rating given to a movie.<br>
* **timestamp**: The time at which the rating was given.



In [None]:
#Similarly,
movieT.head(5) #To check first 5 rows of the DataFrame.

>Now use `pd.merge` to merge the **movieT** and **df** DataFrames on **item_id**.<br>

`merger['rating'].value_counts()` counts how many times each value occurs in the **rating** column.<br>

`merger[merger['rating']==5].count()` counts the non-null entries in each column of **merger** for the rows where the rating is 5.<br><br>
     These two statements can be used to compute some useful statistics from the data.<br>
     For more documentation on the Pandas merge method:
      __[Click here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html)__


In [None]:
merger=pd.merge(df,movieT,on='item_id')
merger.head(5)
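
>A small sketch of the merge and the two statistics on made-up tables. The `toy_*` names and their values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical miniature versions of the ratings and titles tables.
toy_ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'item_id': [10, 20, 10, 20],
    'rating':  [5, 3, 5, 4],
})
toy_titles = pd.DataFrame({
    'item_id': [10, 20],
    'title':   ['Movie A', 'Movie B'],
})

# Inner join on the shared key: every rating row gains its movie title.
toy_merged = pd.merge(toy_ratings, toy_titles, on='item_id')

# How often each rating value occurs.
rating_counts = toy_merged['rating'].value_counts()

# Number of rows whose rating is exactly 5.
n_five_star = int((toy_merged['rating'] == 5).sum())

print(toy_merged)
print(rating_counts)
print("5-star ratings:", n_five_star)
```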

>The statement below groups the data by the **title** column and computes the average (mean) of the **rating** column, sorted from highest to lowest.<br>
Note: This returns a Series object.

In [None]:
merger.groupby('title')['rating'].mean().sort_values(ascending=False).head()

>The statement below groups the data by the **title** column and counts the entries in the **rating** column, returning the titles with the most ratings.<br>
Note: This returns a Series object.

In [None]:
merger.groupby('title')['rating'].count().sort_values(ascending=False).head()

>Converting the mean ratings into a DataFrame object named **ratings** with the statement below.

In [None]:
ratings = pd.DataFrame(merger.groupby('title')['rating'].mean())
ratings.head()

>Similarly, add a new column named **NumofRatings**, holding the number of ratings per title.

In [None]:
ratings['NumofRatings'] = pd.DataFrame(merger.groupby('title')['rating'].count())
ratings.head()
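
>As an alternative sketch, the mean and the count can also be computed in a single pass with `agg`. The `toy` DataFrame below is made up for illustration. The resulting columns are named `mean` and `count` rather than **rating** and **NumofRatings**.

```python
import pandas as pd

# Hypothetical ratings for two movies.
toy = pd.DataFrame({
    'title':  ['Movie A', 'Movie A', 'Movie B', 'Movie B', 'Movie B'],
    'rating': [5, 3, 4, 4, 1],
})

# One groupby, two aggregations: average rating and number of ratings.
summary = toy.groupby('title')['rating'].agg(['mean', 'count'])
print(summary)
```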

>The first plot, titled **Count of Rating given by the user**, shows the distribution of the number of ratings per movie: most movies have received only a few ratings.<br>
The second plot, titled **Rating provided by the user**, shows the distribution of the average rating per movie: most averages fall between 3 and 4.

In [None]:
plt.figure(figsize=(10,4))
ratings['NumofRatings'].hist(bins=80,color='black') 
plt.title("Count of Rating given by the user")
plt.figure(figsize=(10,4))
ratings['rating'].hist(bins=50,color='black') 
plt.title("Rating provided by the user")

>The joint plot can be used to analyze the relationship between **rating** and **number of ratings**. The two appear positively related rather than strictly proportional: movies with more ratings tend to have higher average ratings, as the plot below suggests.

In [None]:
sns.jointplot(x='rating',y='NumofRatings',data=ratings,alpha=0.3,color='k',kind='scatter',marker="*",)

>Converting the DataFrame into matrix form using `pivot_table`, a generalization of the __[Pivot](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html)__ method that aggregates duplicate entries (by mean, by default).<br>
Each cell holds the rating a user gave to a movie. Note that there will be a lot of NaN values, because most people have not seen most of the movies.


In [None]:
moviematrix = merger.pivot_table(index='user_id',columns='title',values='rating')
moviematrix.head(5)
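
>A minimal sketch of the same reshaping on made-up data, showing where the NaN values come from. The `toy` DataFrame and the `sparsity` measure are assumptions for illustration.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair.
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'title':   ['Movie A', 'Movie B', 'Movie A', 'Movie B'],
    'rating':  [5, 3, 4, 2],
})

# Users become rows, titles become columns, ratings fill the cells.
toy_matrix = toy.pivot_table(index='user_id', columns='title', values='rating')

# Fraction of cells with no rating (user never rated that movie).
sparsity = toy_matrix.isna().mean().mean()

print(toy_matrix)
print("share of missing cells:", sparsity)
```

>User 2 never rated Movie B and user 3 never rated Movie A, so those two cells are NaN.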

>Most rated movies:

In [None]:
ratings.sort_values('NumofRatings',ascending=False).head(10)

>Let's choose two movies: **Star Wars** and **Liar Liar**.<br>
The user ratings for these movies can be inspected using the statements below:

In [None]:
starwars_Uratings = moviematrix['Star Wars (1977)']
liarliar_Uratings = moviematrix['Liar Liar (1997)']
print("The object type is: {}".format(type(starwars_Uratings))) # Each column is a Pandas Series object
print(starwars_Uratings.head())
print(liarliar_Uratings.head())


>The __[corrwith](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.corrwith.html)__ method computes the pairwise correlation between the columns (or rows) of a DataFrame and a Series or another DataFrame. Here, every movie column of **moviematrix** is correlated with the ratings of the chosen movie.

In [None]:
similar_starwars = moviematrix.corrwith(starwars_Uratings)
similar_liarliar = moviematrix.corrwith(liarliar_Uratings)
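
>A small sketch of what `corrwith` returns, on a made-up user-by-movie matrix. The `toy_matrix` values are assumptions for illustration. Missing ratings are skipped pair by pair, so each correlation uses only the users who rated both movies.

```python
import numpy as np
import pandas as pd

# Hypothetical rating matrix: rows are users, columns are movies.
toy_matrix = pd.DataFrame({
    'Movie A': [5, 4, 1, 2],
    'Movie B': [4, 5, 2, 1],
    'Movie C': [1, 2, 5, np.nan],   # user 4 did not rate Movie C
}, index=pd.Index([1, 2, 3, 4], name='user_id'))

# Correlate every column with the ratings of Movie A.
similar_to_a = toy_matrix.corrwith(toy_matrix['Movie A'])
print(similar_to_a.sort_values(ascending=False))
```

>Movie A correlates perfectly with itself, positively with Movie B (rated alike by the same users) and negatively with Movie C (rated in the opposite way).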

>Clean **similar_starwars** and **similar_liarliar** by removing NaN values and converting each Series into a DataFrame. In the data below, the **Correlation_StarWars** column shows how strongly the ratings of each other movie in the data set are correlated with those of **Star Wars (1977)**.

In [None]:
corr1starwars = pd.DataFrame(similar_starwars,columns=['Correlation_StarWars'])
corr1starwars.dropna(inplace=True)
corr1starwars.head()

>Sorting the above data by correlation gives the movies most similar to **Star Wars (1977)**, where a value of 1.0 means perfectly correlated.<br>
A correlation computed from only a few shared ratings is unreliable, so we also need the number of ratings per movie, computed earlier as:<br>`ratings['NumofRatings'] = pd.DataFrame(merger.groupby('title')['rating'].count())`
<br>The two DataFrames are combined using the __[join](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html)__
 method as <br>`corr1starwars = corr1starwars.join(ratings['NumofRatings'])` <br>Before the join, the sorted output is:

In [None]:
corr1starwars.sort_values('Correlation_StarWars',ascending=False).head(50)

>For movies having **NumofRatings > 100**, we can sort by **Correlation_StarWars** to give movie recommendations to the user as follows:

In [None]:
corr1starwars = corr1starwars.join(ratings['NumofRatings'])

In [None]:
Recommendations_StarWars=corr1starwars[corr1starwars['NumofRatings']>100].sort_values('Correlation_StarWars',ascending=False).head(10)

In [None]:
print("Recommendations based on Star Wars (1977) movie are")
Recommendations_StarWars

>The same steps apply to **Liar Liar (1997)** as follows:

In [None]:
corr1liarliar = pd.DataFrame(similar_liarliar,columns=['Correlation_liarliar'])
corr1liarliar.dropna(inplace=True)  # drop NaN correlations for Liar Liar (not Star Wars)
corr1liarliar.head()
corr1liarliar.sort_values('Correlation_liarliar',ascending=False).head(50)
corr1liarliar = corr1liarliar.join(ratings['NumofRatings'])
Recommendations_liarliar=corr1liarliar[corr1liarliar['NumofRatings']>100].sort_values('Correlation_liarliar',ascending=False).head(10)
print("Recommendations based on Liar Liar (1997) movie are:")
Recommendations_liarliar
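
>Since the same steps were repeated for both movies, they could be wrapped in one function. The following is only a sketch: `recommend_similar`, its parameters and the toy data are hypothetical and not part of the original notebook. Unlike the cells above, it also drops the query movie itself from the result, which is a design choice made here.

```python
import numpy as np
import pandas as pd

def recommend_similar(matrix, counts, title, min_ratings=100, top_n=10):
    """Rank movies by rating correlation with `title` (hypothetical helper).

    matrix: user x title rating matrix (as built with pivot_table).
    counts: Series with the number of ratings per title.
    """
    # Correlate every movie with the chosen one and drop undefined values.
    corr = matrix.corrwith(matrix[title]).dropna()
    result = corr.to_frame('Correlation').join(counts.rename('NumofRatings'))
    # Keep only movies with more than `min_ratings` ratings.
    result = result[result['NumofRatings'] > min_ratings]
    # Exclude the query movie, which always correlates perfectly with itself.
    result = result.drop(index=title, errors='ignore')
    return result.sort_values('Correlation', ascending=False).head(top_n)

# Made-up data to exercise the function.
toy_matrix = pd.DataFrame({
    'Movie A': [5, 4, 1, 2],
    'Movie B': [4, 5, 2, 1],
    'Movie C': [1, 2, 5, np.nan],
}, index=pd.Index([1, 2, 3, 4], name='user_id'))
toy_counts = toy_matrix.count()   # ratings per movie: 4, 4, 3

recs = recommend_similar(toy_matrix, toy_counts, 'Movie A', min_ratings=3)
print(recs)
```

>With the notebook's own objects, a call along the lines of `recommend_similar(moviematrix, ratings['NumofRatings'], 'Star Wars (1977)')` would be expected to follow the same steps as the cells above, minus the query movie.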