# What is a Recommendation System?
A recommender (or recommendation) system is a subclass of information filtering system that seeks to predict the rating or preference a user would give to an item.

They are primarily used in applications where a person or entity interacts with a product or service. To improve their experience, we try to personalize the product to their needs, and to do that we look at their past interactions with it.

*In one line* -> **Specialized content for everyone.**

*For further info, [Wiki](https://en.wikipedia.org/wiki/Recommender_system#:~:text=A%20recommender%20system%2C%20or%20a,would%20give%20to%20an%20item.)*

Here we will cover the following:

1. Popularity based Recommender System
2. Cosine Similarity (code from scratch)


# Popularity based recommender system
As the name suggests, it recommends whatever is currently trending or popular across the site. This is particularly useful when you have no past data about a user to base recommendations on. It is not tailored to any particular user or audience group: everyone sees the same list.

# Import packages and dataset

In [None]:
import numpy as np
import pandas as pd

We will recommend movies based on the ratings they have received. From the full dataset, we will use only the movies and ratings data.

In [None]:
#Import movie ratings
data_ratings = pd.read_csv('../input/movie-lens-small-latest-dataset/ratings.csv')
data_ratings.head()

In [None]:
#Import movie data
data_movies = pd.read_csv('../input/movie-lens-small-latest-dataset/movies.csv')
data_movies.head()

In [None]:
#Merge both the datasets
movie_ratings = pd.merge(data_movies, data_ratings, on = 'movieId')
print(movie_ratings.shape)
movie_ratings.head()

# Recommend Popular Movies

This dataset needs little preprocessing and has no NaN values, so we can proceed directly to recommending popular movies based on ratings.
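The absence of NaN values can be checked rather than assumed. The sketch below runs the check on a small hypothetical frame (`toy` and its values are made up for illustration). On the real data the same two calls would be applied to `movie_ratings`.

```python
import pandas as pd

# Hypothetical stand-in for the merged movie_ratings frame
toy = pd.DataFrame({
    "movieId": [1, 1, 2],
    "title": ["Movie A", "Movie A", "Movie B"],
    "rating": [4.0, 5.0, 3.5],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# True if any cell in the frame is missing
has_missing = bool(toy.isna().any().any())
print(has_missing)
```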

**Things to do:**

* Group the ratings by movie title and find each title's mean rating
* Sort movies based on ratings from highest to lowest
* Recommend top n popular movies

In [None]:
#Groupby all movie titles together and find their mean ratings
movie_ratings.groupby('title')['rating'].mean().head()

In [None]:
#Sort movies based on ratings from highest to lowest
movie_ratings.groupby('title')['rating'].mean().sort_values(ascending = False)

In [None]:
#Recommend top n popular movies
n = 10

movie_ratings.groupby('title')['rating'].mean().sort_values(ascending = False).head(n)

**How many users have rated a given movie?**

In [None]:
movie_ratings['title'].value_counts()
#Equivalent counts: movie_ratings.groupby('title')['rating'].count().sort_values(ascending = False)

**What is each movie's mean rating, and how many people rated it?**

In [None]:
#Combine the mean rating and the number of ratings per title in one DataFrame
data = pd.DataFrame(movie_ratings.groupby('title')['rating'].mean())
data['rating_counts'] = movie_ratings.groupby('title')['rating'].count()
#data['rating_counts'] = movie_ratings['title'].value_counts() #equivalent, aligned on title
data.head()
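Sorting by mean rating alone can put a movie with a single 5-star vote at the top of the list. One possible refinement, sketched here on hypothetical toy data, is to keep only titles with at least some minimum number of ratings before sorting. The names `toy_ratings`, `stats`, and `min_ratings`, and the threshold value, are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy ratings (not taken from the MovieLens files)
toy_ratings = pd.DataFrame({
    "title": ["A", "A", "A", "B", "C", "C"],
    "rating": [4.0, 4.5, 5.0, 5.0, 3.0, 4.0],
})

# Mean rating and number of ratings per title in a single pass
stats = toy_ratings.groupby("title")["rating"].agg(
    rating="mean", rating_counts="count"
)

min_ratings = 2  # assumed threshold; tune it for the real dataset

# Drop sparsely rated titles, then rank the rest by mean rating
popular = (
    stats[stats["rating_counts"] >= min_ratings]
    .sort_values("rating", ascending=False)
)
print(popular)
```

In this toy example, title `B` has the highest mean but only one rating, so it is filtered out and `A` ranks first.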

# Calculating Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors: it is the cosine of the angle between them. Here we will write the code for cosine similarity from scratch.

*For more info -> [Wiki](https://en.wikipedia.org/wiki/Cosine_similarity#:~:text=Cosine%20similarity%20is%20a%20measure,to%20both%20have%20length%201.)*
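For reference, the standard definition for two vectors $x$ and $y$ of length $n$ is:

$$
\text{cosine\_similarity}(x, y) = \cos\theta = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
$$

The numerator is the dot product of the two vectors, and each factor in the denominator is the Euclidean length of one vector. The value is 1 when the vectors point in the same direction and 0 when they are orthogonal.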

**Things to do:**

* Import math package
* Create square root and cosine similarity function

In [None]:
#load packages
from math import sqrt

#Creating 2 functions, vector length (square root of the sum of squares) and cosine similarity, just like the formula

def square_rooted(x):
    #Euclidean length of x; left unrounded so rounding error does not carry into the final score
    return sqrt(sum(a*a for a in x))

def cosine_similarity(x,y):
    numerator = sum(a*b for a,b in zip(x,y))
    denominator = square_rooted(x) * square_rooted(y)
    return round(numerator / denominator, 3)

print(cosine_similarity([3,45,7,2],[2,54,13,15]))

**A score of 0.972 is close to 1, so the two vectors point in nearly the same direction, i.e. they are highly similar.**
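As a cross-check, the same quantity can be computed with NumPy, which the notebook already imports. This is only a sketch: the function name `cosine_similarity_np` is hypothetical, and the explicit zero-vector guard is an added assumption, since cosine similarity is undefined for zero vectors.

```python
import numpy as np

def cosine_similarity_np(x, y):
    """Cosine of the angle between x and y (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Product of the two Euclidean lengths
    denominator = np.linalg.norm(x) * np.linalg.norm(y)
    if denominator == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(np.dot(x, y) / denominator)

score = cosine_similarity_np([3, 45, 7, 2], [2, 54, 13, 15])
print(round(score, 3))  # 0.972, matching the from-scratch version
```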