# Recommendation Systems

# Outline

1. What are recommendation systems?
2. Why do we need recommendation systems?
3. Applications of recommendation systems in products we use daily
    + E-commerce sites: Amazon/Flipkart
    + Video streaming: Netflix/YouTube/Prime Video
    + Music streaming: Saavn/Gaana/Spotify
    + Book recommendations: Goodreads
4. Formal problem statement
5. What are the different techniques for building recommendation systems?
    + Content-based
    + Collaborative filtering
    + Hybrid
6. Dataset: brief description
7. Building recommendation systems incrementally
    + Recommend the most popular items
    + Item-item similarity recommendation
    + User-user similarity recommendation
    + Matrix-factorization-based recommendation
8. Problems with recommendation systems in general

# Use Cases

<img src='images/netflix1.png'>


<br>


<img src='images/netflix3.png'>

<br>

<img src='images/amazon.png'>

<br>

<img src='images/amazon2.png'>

# The Dataset - Reviews of Apps for Android

Download it from http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Apps_for_Android_5.json.gz

In [5]:
import pandas as pd
import gzip
from pprint import pprint
import matplotlib
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [7]:
ls data

meta_Apps_for_Android.json.gz       reviews_Apps_for_Android_5.json
ratings_Apps_for_Android.csv        reviews_Apps_for_Android_5.json.gz
ratings_Movies_and_TV.csv


In [8]:
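# adapted from the loader given on the dataset site
import ast

def parse(path):
    # Yield one review record (a dict) per line of the gzipped file.
    with gzip.open(path, 'rt', encoding='utf-8') as g:
        for line in g:
            # literal_eval accepts only Python literals, whereas eval would
            # run any code found in the file (assumes each line is a plain
            # dict literal, as in this dataset)
            yield ast.literal_eval(line)

def getDF(path):
    # Collect the records into a DataFrame indexed 0..n-1.
    records = {i: d for i, d in enumerate(parse(path))}
    return pd.DataFrame.from_dict(records, orient='index')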


In [9]:
df = getDF('data/reviews_Apps_for_Android_5.json.gz')

In [10]:
df.head()

| | reviewerID | asin | reviewerName | helpful | reviewText | overall | summary | unixReviewTime | reviewTime |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A1N4O8VOJZTDVB | B004A9SDD8 | Annette Yancey | [1, 1] | Loves the song, so he really couldn't wait to ... | 3.0 | Really cute | 1383350400 | 11 2, 2013 |
| 1 | A2HQWU6HUKIEC7 | B004A9SDD8 | Audiobook lover "Kathy" | [0, 0] | Oh, how my little grandson loves this app. He'... | 5.0 | 2-year-old loves it | 1323043200 | 12 5, 2011 |
| 2 | A1SXASF6GYG96I | B004A9SDD8 | Barbara Gibbs | [0, 0] | I found this at a perfect time since my daught... | 5.0 | Fun game | 1337558400 | 05 21, 2012 |
| 3 | A2B54P9ZDYH167 | B004A9SDD8 | Brooke Greenstreet "Babylove" | [3, 4] | My 1 year old goes back to this game over and ... | 5.0 | We love our Monkeys! | 1354752000 | 12 6, 2012 |
| 4 | AFOFZDTX5UC6D | B004A9SDD8 | C. Galindo | [1, 1] | There are three different versions of the song... | 5.0 | This is my granddaughters favorite app on my K... | 1391212800 | 02 1, 2014 |


<br>
<br>

## Metadata

    reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B

    asin - ID of the product, e.g. 0000013714

    reviewerName - name of the reviewer

    helpful - helpfulness votes for the review as [helpful, total], e.g. [2, 3] means 2/3

    reviewText - text of the review

    overall - rating of the product

    summary - summary of the review

    unixReviewTime - time of the review (unix time)

    reviewTime - time of the review (raw)
    
<br>
<br>

In [15]:
#rename certain columns
df = df.rename(columns={'asin':'itemID','overall':'rating'})

In [16]:
from random import choice

desktop_agents = ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0']
 
def get_random_header():
    return {'User-Agent': choice(desktop_agents),'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}

# get name of the items
import requests
from bs4 import BeautifulSoup
from functools import lru_cache

# cache up to 1000 item names to avoid requesting the same page repeatedly
@lru_cache(maxsize=1000)
def get_name_of_item(itemID):
    # Scrape the app title from the item's Amazon page. Returns "" when the
    # title element is missing (e.g. the page layout changed or the request
    # was blocked).
    url = "https://www.amazon.com/dp/"
    item_url = url + itemID
    # timeout is an added safeguard so a stalled request cannot hang the cell
    html = requests.get(item_url, headers=get_random_header(), timeout=10)
    soup = BeautifulSoup(html.content, "html5lib")
    item = soup.find(name='div', attrs={'id': 'mas-title'})
    item_name = ""
    if item is not None and len(item.contents) > 0:
        item_name = item.contents[0].string
    return item_name

In [17]:
get_name_of_item("B00FAPF5U0")

'Candy Crush Saga'

In [18]:
print("number of reviews: ",df.shape[0])
print("number of items: ",df.itemID.unique().shape[0])
print("number of users: ",df.reviewerID.unique().shape[0])

number of reviews:  752937
number of items:  13209
number of users:  87271


# Data Exploration
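
A quick first look, using only the three counts printed above, is how full the user-item rating matrix would be. The sketch below assumes at most one review per user-item pair; the variable names are illustrative and not part of the original notebook.

```python
# Counts taken from the output of the previous cell
n_reviews = 752937
n_items = 13209
n_users = 87271

# Each (user, item) pair is one cell of the user-item rating matrix
n_cells = n_users * n_items

# Fraction of cells that hold an observed rating
# (assumption: at most one review per user-item pair)
density = n_reviews / n_cells

print(f"possible user-item pairs: {n_cells}")
print(f"density of the rating matrix: {density:.4%}")
print(f"average reviews per user: {n_reviews / n_users:.1f}")
print(f"average reviews per item: {n_reviews / n_items:.1f}")
```

A matrix this sparse is the main reason the later models work from neighbours or latent factors instead of direct look-ups.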

# Data Preparation
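
The similarity and factorization models need the reviews reshaped into a user-item matrix. The following is a minimal sketch on a toy DataFrame with hypothetical values that uses the renamed columns (`reviewerID`, `itemID`, `rating`). Averaging duplicate reviews and filling missing ratings with 0 are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the review DataFrame (hypothetical values)
toy = pd.DataFrame({
    'reviewerID': ['u1', 'u1', 'u2', 'u2', 'u3', 'u3', 'u3'],
    'itemID':     ['a',  'b',  'a',  'c',  'b',  'c',  'c'],
    'rating':     [5.0,  3.0,  4.0,  2.0,  1.0,  5.0,  3.0],
})

# One row per user, one column per item;
# repeated (user, item) reviews are averaged
user_item = toy.pivot_table(index='reviewerID', columns='itemID',
                            values='rating', aggfunc='mean')

# Unrated cells come out as NaN; 0 is used here to mark "no rating"
user_item_filled = user_item.fillna(0.0)
print(user_item_filled)
```

A dense pivot like this is only practical for small samples. With 87271 users and 13209 items, the full matrix would most likely need a sparse representation.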

# Implementation



## Popularity Based Model
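
The simplest baseline recommends the same most popular items to every user. One possible sketch, on toy data with hypothetical values, ranks items by number of ratings and breaks ties by mean rating. The function name `most_popular` and the `min_ratings` threshold are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the review DataFrame (hypothetical values)
toy = pd.DataFrame({
    'reviewerID': ['u1', 'u1', 'u2', 'u2', 'u3', 'u3', 'u4'],
    'itemID':     ['a',  'b',  'a',  'c',  'b',  'c',  'c'],
    'rating':     [5.0,  3.0,  4.0,  2.0,  1.0,  5.0,  3.0],
})

def most_popular(ratings, n=2, min_ratings=1):
    """Return the n items with the most ratings, ties broken by mean rating."""
    stats = ratings.groupby('itemID')['rating'].agg(['count', 'mean'])
    # Drop items with too few ratings to be trusted
    stats = stats[stats['count'] >= min_ratings]
    ranked = stats.sort_values(['count', 'mean'], ascending=False)
    return ranked.head(n)

print(most_popular(toy, n=2))
```

This model ignores who is asking, so it is not personalised, but it gives a reference point for the later models and a fallback for users with no history.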

## Similarity Based Models

### Item-Item Similarity
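
Item-item similarity treats each item as the vector of ratings it received and recommends items whose vectors point in a similar direction. The sketch below uses cosine similarity on a toy matrix with hypothetical values. Cosine is one common choice, not necessarily the measure used in the talk, and the helper names are illustrative.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: items); 0 means "not rated"
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 2.0],
              [0.0, 1.0, 4.0]])
items = ['a', 'b', 'c']

def cosine_similarity_matrix(M):
    """Cosine similarity between every pair of rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = M / norms
    return unit @ unit.T

# Items are the columns of R, so compare the rows of its transpose
item_sim = cosine_similarity_matrix(R.T)

def similar_items(item, n=1):
    """Return the n items most similar to `item`, excluding the item itself."""
    idx = items.index(item)
    order = np.argsort(-item_sim[idx])
    return [items[j] for j in order if j != idx][:n]

print(similar_items('a'))
```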

### User-User Similarity
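
User-user similarity works the other way round: it finds users with similar rating vectors and predicts a rating as a similarity-weighted average of what those neighbours gave the item. The following is a sketch under that assumption, on a toy matrix with hypothetical values. `predict_rating` is an illustrative name.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: items); 0 means "not rated"
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 2.0],
              [0.0, 1.0, 4.0]])

def cosine_similarity_matrix(M):
    """Cosine similarity between every pair of rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = M / norms
    return unit @ unit.T

user_sim = cosine_similarity_matrix(R)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    rated = R[:, item] > 0      # users who rated this item
    rated[user] = False         # never use the target user's own rating
    weights = user_sim[user, rated]
    if weights.sum() == 0:
        return 0.0              # no similar user has rated the item
    return float(weights @ R[rated, item] / weights.sum())

# User 0 has not rated item 2; estimate it from users 1 and 2
print(predict_rating(0, 2))
```

Because user 0 is closer to user 1 than to user 2, the estimate lands nearer to user 1's rating of 2 than to user 2's rating of 4.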

## Matrix Factorization Based Approach

<img src='images/mf.jpeg'>

<br>

In [25]:
# data preparation

In [26]:
# matrix factorization svds

In [27]:
# predictions
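
The three placeholder cells above (data preparation, factorization with `svds`, predictions) could be filled in along the following lines. This is a sketch on a toy matrix with hypothetical values, not the notebook's original code. It assumes `scipy.sparse.linalg.svds` as the factorization routine and, as a simplification, computes each user's mean over all cells, including the unrated zeros.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Toy user-item matrix (hypothetical values); 0 means "not rated"
R = np.array([[5.0, 3.0, 0.0, 1.0, 0.0],
              [4.0, 0.0, 0.0, 1.0, 2.0],
              [1.0, 1.0, 0.0, 5.0, 0.0],
              [0.0, 1.0, 5.0, 4.0, 0.0]])

# data preparation: subtract each user's mean so that generous and
# harsh raters become comparable
user_means = R.mean(axis=1, keepdims=True)
R_demeaned = R - user_means

# matrix factorization: keep the k largest singular values
# (svds requires k < min(R.shape))
k = 2
U, sigma, Vt = svds(R_demeaned, k=k)

# predictions: multiply the factors back together and restore the means
pred = U @ np.diag(sigma) @ Vt + user_means

def recommend(user, n=2):
    """Return indices of the n unrated items with the highest predicted score."""
    unrated = np.where(R[user] == 0)[0]
    ranked = unrated[np.argsort(-pred[user, unrated])]
    return ranked[:n].tolist()

print(recommend(0))
```

The number of latent factors `k` is a tuning parameter. On the real data it would normally be chosen by checking prediction error on held-out ratings.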

# What next?

# Questions?

# References

1. [My blog post](https://medium.com/data-science-for-everyone/how-does-recommendation-systems-work-f3e1c96b14e8)
2. https://beckernick.github.io/matrix-factorization-recommender/
3. https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-recommendation-engine-python/
4. [Machine Learning Paradigms - Applications in Recommender Systems](https://www.springer.com/in/book/9783319191348)
5. https://towardsdatascience.com/how-did-we-build-book-recommender-systems-in-an-hour-part-2-k-nearest-neighbors-and-matrix-c04b3c2ef55c