# Recommendation Engine

A recommendation engine filters data using different algorithms and recommends the most relevant items to users. It first captures a customer's past behavior and, based on that, recommends products the customer is likely to buy.

If a completely new user visits an e-commerce site, that site will not have any past history of that user. So how does the site go about recommending products to the user in such a scenario? One possible solution could be to recommend the best selling products, i.e. the products which are high in demand. Another possible solution could be to recommend the products which would bring the maximum profit to the business.
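The two fallback strategies above can be sketched in a few lines. This is only an illustration: the `orders` log, its field layout, and the function names `best_sellers` and `most_profitable` are assumptions made for the example, not part of any real system.

```python
from collections import Counter

# Hypothetical order log: (user_id, product_id, profit_per_unit)
orders = [
    ("u1", "phone", 50.0),
    ("u2", "phone", 50.0),
    ("u3", "laptop", 200.0),
    ("u1", "case", 5.0),
    ("u4", "phone", 50.0),
    ("u2", "case", 5.0),
]

def best_sellers(orders, n=2):
    """Return the n most frequently ordered products."""
    counts = Counter(product for _, product, _ in orders)
    return [product for product, _ in counts.most_common(n)]

def most_profitable(orders, n=2):
    """Return the n products with the highest total profit."""
    profit = Counter()
    for _, product, unit_profit in orders:
        profit[product] += unit_profit
    return [product for product, _ in profit.most_common(n)]

print(best_sellers(orders))
print(most_profitable(orders))
```

Neither function needs any history for the visiting user, which is why both work for a completely new visitor.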

Before diving deeper into this topic, consider two simple ways to recommend items to users:

1) Recommend the items that are most popular among all users.
2) Divide the users into segments based on their preferences (user features) and recommend items based on the segment each user belongs to.
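The segment-based idea could look roughly like the sketch below. The segments, the interaction data, and the helper names are all hypothetical. In practice the segments would come from user features rather than a hand-written dictionary.

```python
from collections import Counter, defaultdict

# Hypothetical data: each user's segment and the items they interacted with.
user_segment = {"u1": "gamer", "u2": "gamer", "u3": "reader", "u4": "reader"}
interactions = [
    ("u1", "console"), ("u2", "console"), ("u2", "headset"),
    ("u3", "novel"), ("u4", "novel"), ("u4", "e-reader"),
]

def popular_by_segment(user_segment, interactions):
    """Count item popularity separately inside each user segment."""
    counts = defaultdict(Counter)
    for user, item in interactions:
        counts[user_segment[user]][item] += 1
    return counts

def recommend_for_segment(segment, counts, n=1):
    """Recommend the n most popular items within the given segment."""
    return [item for item, _ in counts[segment].most_common(n)]

segment_counts = popular_by_segment(user_segment, interactions)
print(recommend_for_segment("gamer", segment_counts))
print(recommend_for_segment("reader", segment_counts))
```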

The data can be collected by two means: explicitly and implicitly. Explicit data is information that is provided intentionally, i.e. input from the users such as movie ratings. Implicit data is information that is not provided intentionally but gathered from available data streams like search history, clicks, order history, etc.

There are two main types of recommendation algorithms:

1) Content based filtering

2) Collaborative filtering 



## Content based filtering

This algorithm recommends products which are similar to the ones that a user has liked in the past.

![image.png](attachment:image.png)

Consider the example of Netflix. All the information related to each user is saved in vector form. This vector contains the past behavior of the user, i.e. the movies the user liked or disliked and the ratings they gave. It is known as the profile vector. All the information related to movies is stored in another vector called the item vector. The item vector contains the details of each movie, such as genre, cast, director, etc.

The content-based filtering algorithm computes the cosine of the angle between the profile vector and the item vector, i.e. the cosine similarity. Based on the cosine value, which ranges from -1 to 1, the movies are arranged in descending order, and one of the following two approaches is used for recommendations:

1) Top-n approach: the top n movies are recommended (n can be decided by the business).
2) Rating scale approach: a threshold is set and all movies above that threshold are recommended.
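As a minimal sketch of the ranking step described above, the code below scores each movie by cosine similarity against a profile vector, then applies the top-n and threshold selections. The three-genre feature layout, the movie names, and the helper names are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical feature order: [action, comedy, romance]
profile_vector = [1.0, 0.0, 0.5]
item_vectors = {
    "Movie A": [1.0, 0.0, 0.0],
    "Movie B": [0.0, 1.0, 0.0],
    "Movie C": [0.5, 0.0, 1.0],
}

def rank_items(profile, items):
    """Return (name, score) pairs sorted by descending similarity."""
    scores = {name: cosine_similarity(profile, vec) for name, vec in items.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

def top_n(ranked, n):
    """Top-n approach: keep the n best-scoring items."""
    return [name for name, _ in ranked[:n]]

def above_threshold(ranked, threshold):
    """Rating scale approach: keep every item scoring above the threshold."""
    return [name for name, score in ranked if score > threshold]

ranked = rank_items(profile_vector, item_vectors)
print(top_n(ranked, 2))
print(above_threshold(ranked, 0.85))
```

The top-n approach always returns a fixed number of items, while the threshold approach may return anything from none to all of them, depending on how the cutoff is chosen.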

Other methods that can be used to calculate the similarity are:

1) Euclidean distance
2) Pearson’s correlation
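For reference, the standard textbook definitions of these two measures are given below. The notation is an assumption of this note: $x$ and $y$ are two vectors of length $n$, and $\bar{x}$, $\bar{y}$ are their means.

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

$$
r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Euclidean distance is a distance, so a smaller value means the vectors are more similar. Pearson's correlation, like cosine similarity, ranges from -1 to 1, and a larger value means more similar.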

A major drawback of this algorithm is that it is limited to recommending items of the same type. It will not recommend products that are unlike those the user has bought or liked in the past. So if a user has watched or liked only action movies, the system will recommend only action movies. It's a very narrow way of building an engine.

## Collaborative filtering

### User-User collaborative filtering
The collaborative filtering algorithm uses “User Behavior” for recommending items.

![image.png](attachment:image.png)

This algorithm is quite time-consuming, as it involves calculating the similarity for each user and then calculating a prediction from each similarity score.
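One common way to turn user similarities into a prediction is a similarity-weighted average of other users' ratings. The sketch below assumes that formulation, with cosine similarity computed over co-rated movies. The `ratings` data and both function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie: rating}
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def user_similarity(ratings, u, v):
    """Cosine similarity over the movies both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_user(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = user_similarity(ratings, user, other)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

prediction = predict_user_user(ratings, "alice", "m4")
print(round(prediction, 2))
```

The loop in `predict_user_user` visits every other user for every prediction, which is where the computational cost comes from.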

One way of handling this problem is to select only a few users (neighbors) instead of all of them to make predictions, i.e. instead of making predictions for all similarity values, we choose only a few similarity values. There are various ways to select the neighbors:

1) Select a threshold similarity and choose all the users above that value.
2) Randomly select the users.
3) Arrange the neighbors in descending order of their similarity value and choose the top-N users.
4) Use clustering to choose neighbors.
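The first three selection strategies can be sketched as follows. The `similarities` scores are made up, and the function names are illustrative. Clustering-based selection is left out because it depends on the choice of clustering algorithm.

```python
import random

# Hypothetical similarity scores between a target user and other users.
similarities = {"u1": 0.92, "u2": 0.35, "u3": 0.78, "u4": 0.10, "u5": 0.64}

def neighbors_above_threshold(similarities, threshold):
    """Keep every user whose similarity exceeds the threshold."""
    return [user for user, sim in similarities.items() if sim > threshold]

def random_neighbors(similarities, k, seed=None):
    """Pick k users uniformly at random (seed only for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(sorted(similarities), k)

def top_n_neighbors(similarities, n):
    """Sort users by descending similarity and keep the first n."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:n]

print(neighbors_above_threshold(similarities, 0.6))
print(top_n_neighbors(similarities, 2))
print(random_neighbors(similarities, 2, seed=0))
```

Threshold and top-N selection keep the most similar users, while random selection only reduces the amount of computation and ignores similarity entirely.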

This algorithm is useful when the number of users is small. It is not effective when there are a large number of users, as it takes a lot of time to compute the similarity between all user pairs. This leads us to item-item collaborative filtering, which is effective when there are more users than items being recommended.

### Item-Item collaborative filtering
In this algorithm, we compute the similarity between each pair of items.

![image.png](attachment:image.png)

In our case, we find the similarity between each pair of movies and, based on that, recommend movies similar to the ones the user has liked in the past. This algorithm works similarly to user-user collaborative filtering, with one small change: instead of taking the weighted sum of the ratings of “user-neighbors”, we take the weighted sum of the ratings of “item-neighbors”.
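A minimal sketch of the item-neighbor weighted sum follows, assuming cosine similarity between movies computed over the users who rated both. The `ratings` data and function names are hypothetical. Many practical systems mean-center the ratings first, because plain cosine on all-positive ratings tends to produce uniformly high similarities.

```python
import math

# Hypothetical ratings: user -> {movie: rating}
ratings = {
    "alice": {"m1": 5, "m2": 1, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5},
    "carol": {"m1": 5, "m2": 1},
}

def item_similarity(ratings, i, j):
    """Cosine similarity between movies i and j over users who rated both."""
    users = [u for u, r in ratings.items() if i in r and j in r]
    if not users:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (norm_i * norm_j)

def predict_item_item(ratings, user, item):
    """Weighted average of the user's own ratings on similar movies."""
    numerator = denominator = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        sim = item_similarity(ratings, item, other_item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None

prediction = predict_item_item(ratings, "carol", "m3")
print(round(prediction, 2))
```

The prediction uses only the target user's own ratings, weighted by how similar each rated movie is to the movie being predicted.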



#### Cold start

What happens if a new user or a new item is added to the dataset? This situation is called a cold start. There are two types of cold start:

1) Visitor cold start
2) Product cold start