# <center> Recommendation Systems

## <center> Collaborative Filtering

<center><img src="uservsitem.jpeg">

## <center> Model-based
<center><img src="modelbased.png">

## <center> Similarity Metrics

### Cosine Similarity
<center><img src="cosinesim.png" height=500 width=500></center>
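
As a minimal sketch of the formula in the figure, cosine similarity is the dot product of two vectors divided by the product of their norms. The helper name `cosine_sim` and the two preference vectors are made up for illustration and are not taken from `sushi_recs.csv`.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical like vectors for two customers (1 = liked, 0 = not liked)
u = [1, 0, 1, 1]
v = [1, 1, 0, 1]

# 2 shared likes, each vector has norm sqrt(3), so the result is 2/3
print(round(cosine_sim(u, v), 3))  # 0.667
```

A value of 1 means the two vectors point in the same direction, and 0 means the customers share no likes.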

### Jaccard Similarity
<center><img src="jaccard.png" height=500 width=500></center>
<center><img src="jaccard2.png" height=500 width=500></center>
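
For reference, the definition illustrated above can be written compactly. For two sets $A$ and $B$ (here, the sets of sushi two customers liked):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

As a made-up example, take $A = \{\text{tuna}, \text{salmon}, \text{eel}\}$ and $B = \{\text{salmon}, \text{eel}, \text{shrimp}\}$. The intersection has 2 elements and the union has 4, so $J(A, B) = 2/4 = 0.5$.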

### <center> User-based example

In [None]:
import numpy as np
import pandas as pd
df = pd.read_csv('sushi_recs.csv')
df

In [None]:
df.fillna(0,inplace=True)
df

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
# df.loc[[0]] keeps the row as a 1-row DataFrame, so no reshape is needed
print('Customer 0 and 1 similarity:', cosine_similarity(df.loc[[0]], df.loc[[1]])[0][0])

In [None]:
# df.loc[[1]] keeps the row as a 1-row DataFrame, so no reshape is needed
print('Customer 1 and 2 similarity:', cosine_similarity(df.loc[[1]], df.loc[[2]])[0][0])

In [None]:
## recommendations for customer 4
# cosine similarity between customer 4 and every customer (customer 4 itself is included)
similarities = cosine_similarity(df.loc[[4]], df)[0]
recs_df = pd.DataFrame()
recs_df['Cosine Similarity'] = similarities
recs_df['Customer'] = df.index
recs_df.sort_values('Cosine Similarity', ascending=False)

In [None]:
df.loc[[4,1]]

We can suggest that customer EN tries a tempura roll.
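
The recommendation step above (suggest what the most similar customer liked and the target customer has not) can be sketched as follows. The function name `recommend_from_neighbor`, the customer labels, and the small like matrix are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical like matrix (rows = customers, columns = sushi); 1 = liked, 0 = not liked
likes = pd.DataFrame(
    {"tuna roll": [1, 1], "tempura roll": [0, 1], "eel roll": [1, 0]},
    index=["target", "neighbor"],
)

def recommend_from_neighbor(likes, target, neighbor):
    """Return items the neighbor liked that the target has not liked yet."""
    liked_by_neighbor = likes.loc[neighbor] == 1
    new_to_target = likes.loc[target] == 0
    mask = (liked_by_neighbor & new_to_target).to_numpy()
    return list(likes.columns[mask])

print(recommend_from_neighbor(likes, "target", "neighbor"))  # ['tempura roll']
```

In practice one would usually pool suggestions from several of the most similar customers rather than a single neighbor.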

In [None]:
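def jaccard_similarity(list1, list2):
    """Size of the intersection divided by the size of the union."""
    s1 = set(list1)
    s2 = set(list2)
    union = s1.union(s2)
    # two empty sets have an empty union; return 0.0 instead of dividing by zero
    return len(s1.intersection(s2)) / len(union) if union else 0.0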

In [None]:
df

In [None]:
# for each customer, list the sushi they liked (value 1.0)
customer_likes = []
for customer in df.index:
    customer_likes.append([sushi for sushi in df.columns if df.loc[customer, sushi]==1.0])
customer_likes

In [None]:
jaccard_similarity(customer_likes[2], customer_likes[1])

In [None]:
## Jaccard similarity between customer 4 and every customer
recs_df = pd.DataFrame()
similarities = []
for likes in customer_likes:
    similarities.append(jaccard_similarity(customer_likes[4], likes))
recs_df['Jaccard Similarity'] = similarities
recs_df['Liked Sushi'] = customer_likes
recs_df.sort_values('Jaccard Similarity', ascending=False)

### <center> Item-based example

In [None]:
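df = pd.read_csv('sushi_recs.csv')
# simulate ratings: each like (1.0) becomes a random 1-5 rating, everything else becomes NaN (unrated)
options = [1,2,2,3,3,3,4,4,5]  # repeated values make middle ratings more likely
for col in df.columns:
    df[col] = [np.random.choice(options) if x==1.0 else np.nan for x in df[col]]
df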

In [None]:
# reshape the ratings matrix into long format: one (customer, sushi, rating) row per cell
customers, sushis, ratings = [],[],[]
for sushi, column in enumerate(df.columns):
    for customer, rating in zip(df.index, df[column]):
        customers.append(customer)
        sushis.append(sushi)  # each sushi is stored as its column position
        ratings.append(rating)
data_df = pd.DataFrame()
data_df['Customer'] = customers
data_df['Sushi'] = sushis
data_df['Rating'] = ratings
data_df

In [None]:
from surprise import KNNBasic, Reader, Dataset

In [None]:
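reader = Reader(rating_scale=(1,5))
# drop unrated pairs: filling them with 0 would treat them as real ratings below the 1-5 scale
data = Dataset.load_from_df(data_df.dropna(), reader)
# user_based must be the boolean False; the string 'False' is truthy and would select user-based similarity
sim_options = {'name': 'cosine', 'user_based': False}
algo = KNNBasic(sim_options=sim_options)
model = algo.fit(data.build_full_trainset())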

In [None]:
df

In [None]:
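## getting predicted ratings for unrated sushi
customer = 4
# column positions of the sushi this customer has not rated
not_rated = [list(df.columns).index(x) for x in df.columns[df.loc[customer].isna()]]
for i in not_rated:
    # predict(uid, iid): pass the chosen customer rather than a hard-coded 0
    print(df.columns[i],":",model.predict(customer,i).est)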
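
To make the item-based prediction above less of a black box, here is a rough sketch of the underlying idea: a customer's rating for an item is estimated as a similarity-weighted average of that customer's ratings on other items. The ratings matrix and the helper names `item_cosine` and `predict` are made up for illustration. This is a simplification and not Surprise's actual implementation, which among other things restricts the average to the k nearest neighbours.

```python
import numpy as np

# Hypothetical ratings (rows = customers, columns = items); np.nan = not rated
R = np.array([
    [5.0, 4.0, np.nan],
    [4.0, 5.0, 4.0],
    [1.0, 2.0, 1.0],
])

def item_cosine(R, i, j):
    """Cosine similarity of items i and j over customers who rated both."""
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    a, b = R[both, i], R[both, j]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def predict(R, user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    rated = [j for j in range(R.shape[1]) if j != item and not np.isnan(R[user, j])]
    sims = np.array([item_cosine(R, item, j) for j in rated])
    return float(sims @ R[user, rated] / sims.sum())

# Customer 0 rated items 0 and 1 as 5 and 4, so the estimate for item 2 lies between 4 and 5
print(predict(R, user=0, item=2))
```

Because the weights are similarities, items that are rated more like the target item pull the estimate harder toward the customer's rating for them.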

#### <center> <a href="https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html">Prediction Algorithms </a>


### <center> Singular Value Decomposition

<center><img src='svd_image.png'>
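
As a small illustration of the decomposition in the figure, the sketch below factors a made-up ratings matrix with NumPy and keeps only the top `k` singular values. The matrix and the variable names are assumptions for this example. The `customer_factors` array plays the same role as the output of `TruncatedSVD.fit_transform`, although scikit-learn's result may differ in sign and by small numerical differences.

```python
import numpy as np

# Hypothetical 4x3 ratings matrix (rows = customers, columns = items); 0 = not rated
R = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
])

# Decomposition R = U @ diag(s) @ Vt, with singular values sorted in descending order
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2  # number of latent factors to keep
customer_factors = U[:, :k] * s[:k]           # one k-dimensional row per customer
item_factors = Vt[:k, :].T                    # one k-dimensional row per item
R_approx = customer_factors @ item_factors.T  # rank-k approximation of R

print(customer_factors.shape, item_factors.shape)  # (4, 2) (3, 2)
print(np.round(R_approx, 1))
```

Plotting the rows of `customer_factors` or `item_factors` gives a 2-D picture in which customers or items with similar rating patterns tend to sit close together.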

In [None]:
from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(n_components=2)
df_transformed = svd.fit_transform(df.fillna(0).T)

In [None]:
df_transformed

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12,12))
plt.title('Item-based')
ax.scatter(df_transformed[:,0],df_transformed[:,1],s=300)
# label each point with its sushi name
for i, sushi in enumerate(df.columns):
    ax.annotate(sushi, (df_transformed[i,0], df_transformed[i,1]))

In [None]:
svd = TruncatedSVD(n_components=2)
df_transformed = svd.fit_transform(df.fillna(0))
fig, ax = plt.subplots(figsize=(12,12))
plt.title('User-based')
ax.scatter(df_transformed[:,0],df_transformed[:,1],s=300)
# label each point with its customer id
for i, customer in enumerate(df.index):
    ax.annotate(customer, (df_transformed[i,0], df_transformed[i,1]))

<center><img src="svd.png">

# <center> Activity
<center> The files <i>anime_data.csv</i> and <i>anime_ratings.csv</i> contain data from 1000 users on the top 100 anime shows from Myanimelist.com.

<b>1)</b> Perform a train-test split using a test size of 0.15 and a random state of 4. (https://surprise.readthedocs.io/en/stable/getting_started.html#train-test-split-example)

In [None]:
from surprise.model_selection import train_test_split


<b>2)</b> Choose a collaborative filtering method and implement it on the ratings dataset.

<b>3)</b> Check the RMSE of the model on the test set.

In [None]:
from surprise.accuracy import rmse


<b>4)</b> Try additional prediction algorithms and compare the performance. Choose the best one. https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html

<b>5)</b> What are the top 5 shows you would recommend for user 47?