### Recommender Systems:

What is a recommender system? Before we dive in, let's look at some examples of recommender systems we use every day.

- Amazon's recommendation-based product search.
- YouTube's video recommendations based on the video currently being watched.
- The "Surprise Me" feature in Netflix.
- Instagram Reels.
- TikTok.

Broadly, there are three classes of recommender systems:
- Pre-2006: Simple techniques (content-based, similarity-based).
- 2007-2015: Matrix factorization (made famous by the Netflix Prize).
- 2015+: Deep learning based approaches.


**Problem Formulation:**
- Imagine we have a set of users and a set of items.
- Users: $U_i$, $i = 1, \dots, n$; Items: $I_j$, $j = 1, \dots, m$ (both $n$ and $m$ are very large).

**Task:**
- Suggest items (preferably ranked) that the user might be interested in. For example, $U_i \rightarrow I_{10}, I_{12}, I_{16}, I_{18}$.

**Dataset:**
- The data is represented as a matrix $A$:
    <img src='https://drive.google.com/uc?id=1550hQlw6UPFy0-MX-rSFG8hc31r1t85c'>
- The matrix is built from historical data: $A_{ij}$ could hold the rating that user $U_i$ gave to item $I_j$ (a product bought or a video watched). The cell is empty if user $U_i$ did not interact with item $I_j$.
- Most cells will be empty. Hence, this matrix is also referred to as a sparse matrix.
- $A_{n \times m}$ would be very large. For example, there could be $10^8$ videos on YouTube, of which each of us might have watched around 1000 in the last 10 years.
- If there were around $10^9$ users, the fraction of non-empty cells would be $\frac{10^9 \text{ users} \times 10^3 \text{ videos}}{10^{17}} = 10^{-5}$, i.e. only 1 in every 100k cells of $A$ is non-empty.
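The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the counts are the rough figures quoted above, not measured data.

```python
# Back-of-the-envelope density of the interaction matrix A.
# All figures are the illustrative numbers from the text, not measurements.
n_users = 10**9            # rows of A
n_items = 10**8            # columns of A
watched_per_user = 10**3   # non-empty cells per row (assumed the same for everyone)

total_cells = n_users * n_items                 # 10^17 cells in A
non_empty_cells = n_users * watched_per_user    # 10^12 filled cells
density = non_empty_cells / total_cells         # fraction of filled cells

print(f"fraction of non-empty cells = {density:.0e}")
print(f"about 1 in {total_cells // non_empty_cells:,} cells is non-empty")
```

Storing all $10^{17}$ cells is impractical, which is why in practice only the non-empty entries are kept.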

#### ***Question:*** Can you use techniques that we already know to recommend items to a user?
**Task:**
- Given $A_{n \times m}$, we need to recommend some items to user $U_i$.

**Idea 1:**
- $U_i \rightarrow I_{10}, I_{12}$ (items already bought).
- Now, $\forall j$, compute Sim($I_j$, $I_{10}$) and Sim($I_j$, $I_{12}$), where Sim is cosine similarity, to find the items most similar to $I_{10}$ and $I_{12}$.
    <img src='https://drive.google.com/uc?id=1YV7uqWE7t0RN2BIHjxBzTb4yYyez2oXL'>
- This is nothing but Item-Item similarity based recommender system.
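A minimal sketch of Idea 1 using NumPy is shown below. The toy matrix, the function names `item_item_similarity` and `similar_items`, and the choice to treat empty cells as 0 are all assumptions made for illustration; a real system would work on a sparse representation of $A$.

```python
import numpy as np

# Toy ratings matrix A (rows = users, columns = items).
# 0 stands in for an empty cell; the values are made up for illustration.
A = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

def item_item_similarity(A):
    """Cosine similarity between every pair of item columns of A."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0      # avoid division by zero for unrated items
    A_unit = A / norms           # scale each column to unit length
    return A_unit.T @ A_unit     # (m x m) matrix of cosine similarities

def similar_items(sim, item, k=2):
    """Indices of the k items most similar to `item`, excluding itself."""
    scores = sim[item].copy()
    scores[item] = -np.inf       # never recommend the item itself
    return np.argsort(-scores)[:k].tolist()

sim = item_item_similarity(A)
print(similar_items(sim, item=0, k=1))   # item most similar to item 0
```

For a user who bought $I_{10}$ and $I_{12}$, the same lookup would be run once per purchased item and the results merged.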


**Idea 2:**
- $U_i \rightarrow I_{10}, I_{18}$ (items already bought).
- Now, $\forall j \neq i$, compute Sim($U_j$, $U_i$), where Sim is cosine similarity, to find the users most similar to $U_i$.
- Let $U_{10}$, $U_{26}$ and $U_{58}$ be the users most similar to $U_i$.
- Next, we collect the items bought by these users and exclude the ones already bought by user $U_i$.
- We can then use a frequency-based approach, ranking the remaining items by how many of the similar users bought them, to pick the suggestions.
    <img src='https://drive.google.com/uc?id=1UXbBpyE2D2SwQ_y3aZDs8_n9GrpeDpUF'>
- This is nothing but User-User similarity based recommender system.
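The steps of Idea 2 can be sketched as follows. The toy purchase matrix, the function name `recommend_user_user`, and its parameters `n_neighbors` and `top_k` are hypothetical and chosen only to make the example small.

```python
import numpy as np
from collections import Counter

# Toy purchase matrix (rows = users, columns = items); 1 = bought, 0 = not bought.
A = np.array([
    [1, 1, 0, 0, 0],   # user 0 (the target user)
    [1, 1, 1, 0, 0],   # user 1
    [1, 0, 1, 1, 0],   # user 2
    [0, 0, 0, 1, 1],   # user 3
], dtype=float)

def recommend_user_user(A, user, n_neighbors=2, top_k=2):
    # Step 1: cosine similarity between the target user and every other user.
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0
    A_unit = A / norms[:, None]
    sims = A_unit @ A_unit[user]
    sims[user] = -np.inf                      # exclude the user themselves
    neighbors = np.argsort(-sims)[:n_neighbors]

    # Step 2: count items bought by the neighbours, skipping already-bought ones.
    already_bought = set(np.flatnonzero(A[user]).tolist())
    counts = Counter()
    for v in neighbors:
        for item in np.flatnonzero(A[v]).tolist():
            if item not in already_bought:
                counts[item] += 1

    # Step 3: suggest the most frequently bought remaining items.
    return [item for item, _ in counts.most_common(top_k)]

print(recommend_user_user(A, user=0))
```

Here users 1 and 2 are the nearest neighbours of user 0, and item 2 ranks first because both of them bought it.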


*Note:* The ideas above are what we refer to as collaborative filtering based recommender systems, which use similarities between users and items simultaneously to provide recommendations.

#### Question: What if there is a new user/new item? (Often referred to as the cold start problem)


<img src='https://drive.google.com/uc?id=1_sNEkPEf81kVRWzDRPUnfCbm--oPigYv'>

**Idea 1:**
- Frequently bought or popular items could be recommended.
- We can go one step deeper, e.g. location-based popular recommendations using GPS/pincode.
- This relies on additional data about users (metadata) collected at the beginning, e.g. credit card type, gender, age, etc.
- If the item is new, we would have seller-provided information. Using this and some additional data, we can find user-user or item-item similarities.
- This approach is called a content-based recommender system.
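As a rough illustration of the first two bullets, the sketch below recommends the most popular items overall, or the most popular items within a pincode when one is known. The purchase log, its field layout, and the function name `popular_items` are made up for this example.

```python
from collections import Counter

# Hypothetical purchase log: (user_id, pincode, item_id). Made-up data.
purchases = [
    ("u1", "560001", "umbrella"),
    ("u2", "560001", "umbrella"),
    ("u3", "560001", "raincoat"),
    ("u4", "110001", "heater"),
    ("u5", "110001", "heater"),
    ("u6", "110001", "umbrella"),
]

def popular_items(purchases, pincode=None, top_k=2):
    """Most frequently bought items, optionally restricted to one pincode."""
    counts = Counter(
        item for _, pin, item in purchases
        if pincode is None or pin == pincode
    )
    return [item for item, _ in counts.most_common(top_k)]

print(popular_items(purchases))                    # global popularity
print(popular_items(purchases, pincode="110001"))  # popularity near the new user
```

A new user with no history can be served from these lists immediately, since nothing about their own behaviour is required.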

**Idea 2:**
- Regression/Classification based approach.
<img src='https://drive.google.com/uc?id=1k6ry8MhtY71XluXcujZnWiFhgRAVsBQr'>
- Let's say we have $U_i \in \mathbb{R}^{d'}$ (user metadata) and $I_j \in \mathbb{R}^{d}$ (item features provided by the seller).
- Assume that $A_{ij} = 4$, which is the rating given by user $U_i$ to item $I_j$.
- Then we could treat these ratings as $y_i$ and use $x_i = [U_i, I_j]$ (the user and item feature vectors combined, as shown in the figure), and build a classification/regression model to predict $y_i$.
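A minimal sketch of this setup is given below, using ordinary least squares from NumPy as a stand-in for whichever regression model is actually chosen. The feature values are random, the observed ratings are invented, and the helper names `build_row` and `predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_user, d_item = 3, 2                              # d' and d in the text
user_features = rng.normal(size=(4, d_user))       # made-up user metadata
item_features = rng.normal(size=(5, d_item))       # made-up seller-provided features

# Observed cells of A as (user index, item index, rating). Invented values.
observed = [(0, 1, 4.0), (0, 3, 2.0), (1, 0, 5.0),
            (2, 2, 3.0), (3, 4, 1.0), (1, 3, 4.0)]

def build_row(u, j):
    """x = [U_u, I_j, 1]: user and item features concatenated, plus a bias term."""
    return np.concatenate([user_features[u], item_features[j], [1.0]])

X = np.stack([build_row(u, j) for u, j, _ in observed])
y = np.array([r for _, _, r in observed])

# Fit a linear regression by least squares (assumption: a linear model suffices).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(u, j):
    """Predicted rating for any (user, item) pair, including unseen ones."""
    return float(build_row(u, j) @ w)

print(predict(2, 0))   # rating estimate for a cell that was empty in A
```

Because the model only needs feature vectors, it can score a brand-new user or item as long as their metadata is available.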

#### Question: For a new user, what will be the rating?

- Here, for each new user, we would need to predict ratings $\forall j$, i.e. for every item.
- This is very expensive, as we would have a few million products/videos.
<img src='https://drive.google.com/uc?id=1dRIZErwFms_Vv6dlespED7p6Kl4hKeLk'>


#### Question: Back in the day, when this methodology (collaborative filtering) was used, wasn't it time-consuming to calculate so many similarities on the fly? How was it productionized?

- The Item-Item Similarity matrix would be calculated nightly/weekly.
<img src='https://drive.google.com/uc?id=1awajKu90E1Uy6ZHrPZrkAOfcRnzjR3J7'>
- This matrix information would then be stored in a couple of dictionaries.
<img src='https://drive.google.com/uc?id=1421aZY_4jYBnhOYxIlwuFve4m-nvQIcV'>
- One dictionary would store items as **keys** and their similar items as **values**. These key-value pairs can be looked up in O(1) time on average.
<img src='https://drive.google.com/uc?id=1AmmSV2mMwc68srdtqww4edi9C-KMZNwC'>
- Another dictionary would store users as **keys** and the products purchased or videos watched as **values**. These key-value pairs can likewise be looked up in O(1) time on average.
- These dictionaries could also be stored as distributed dictionaries using Redis/Memcached.
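The offline/online split described above can be sketched with plain Python dictionaries. The similarity values, the IDs, and the function names `build_similar_items` and `recommend` are hypothetical; in a distributed setup the two dictionaries would live in a key-value store instead of in process memory.

```python
import numpy as np

item_ids = ["I1", "I2", "I3", "I4"]
# Hypothetical precomputed item-item similarity matrix (symmetric, made up).
sim = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])

def build_similar_items(sim, item_ids, k=2):
    """Offline (nightly/weekly) step: map each item to its k most similar items."""
    table = {}
    for idx, item in enumerate(item_ids):
        order = [j for j in np.argsort(-sim[idx]).tolist() if j != idx][:k]
        table[item] = [item_ids[j] for j in order]
    return table

# Dictionary 1: item -> similar items. Dictionary 2: user -> items bought/watched.
similar_items = build_similar_items(sim, item_ids)
user_history = {"U1": ["I1"], "U2": ["I3", "I4"]}

def recommend(user, top_k=2):
    """Online step: dictionary lookups only, no similarity computation."""
    history = user_history.get(user, [])
    seen = set(history)
    recs = []
    for item in history:
        for candidate in similar_items[item]:
            if candidate not in seen and candidate not in recs:
                recs.append(candidate)
    return recs[:top_k]

print(recommend("U1"))
print(recommend("U2"))
```

At request time the cost depends only on the length of the user's history and on `k`, not on the total number of items.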