## Item-Based Collaborative Filtering

This is the second version of our anime recommendation system. We will use another dataset, `rating.csv`, which is directly linked to `anime.csv` through the `anime_id` column. What is different in this project? First, let's explain the title:
- **Collaborative Filtering (CF)**: Recommend items based only on the users' past behavior.
- **Item-based (IB)**: Find similar items to those that I have previously liked.

Item-based recommenders are also personalized systems, whereas the Content-Based (CB) recommender we built in the previous project was not personalized.

So the question is:

"*Personalized vs. non-personalized CF?*"

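CF recommendations are personalized because the prediction depends on the target user's own history. In user-based CF it comes from the ratings of similar users, and those neighbors are different for each target user; in item-based CF it comes from the target user's ratings of items similar to the one being scored.
A non-personalized collaborative recommendation can be generated by averaging the ratings of *all* users, so every user receives the same list.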
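To make the contrast concrete, here is a minimal sketch on a made-up 3 × 3 ratings table (the users, items and scores are hypothetical). The non-personalized score of an item is just its mean rating over all users, so the ranking is identical for everyone.

```python
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are items, None/NaN = not rated
ratings = pd.DataFrame(
    {"A": [5.0, 4.0, 1.0], "B": [4.0, None, 2.0], "C": [1.0, 2.0, 5.0]},
    index=["u1", "u2", "u3"],
)

# Non-personalized: one score per item, averaged over ALL users (NaN is ignored)
non_personalized = ratings.mean()
print(non_personalized.sort_values(ascending=False))
```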

We will follow the IB-CF process:
- Look at the items the target user has rated.
- Compute how similar they are to the target item.
    - Similarity is computed only from past ratings by other users!
- Select the k most similar items.
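The three steps can be sketched on a small made-up ratings table. This sketch turns the k neighbors into a predicted rating with the common similarity-weighted average; the function name `predict_item_based` and the toy data are hypothetical, and the notebook below scores items a little differently (it sums similarity × rating without normalizing).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: users x items, NaN = not rated
ratings = pd.DataFrame(
    {
        "A": [5.0, 4.0, 1.0, 2.0],
        "B": [4.0, 5.0, 2.0, 1.0],
        "C": [1.0, 2.0, 5.0, 4.0],
        "D": [np.nan, 4.0, 2.0, 1.0],
    },
    index=["u1", "u2", "u3", "u4"],
)


def predict_item_based(ratings, user, target, k=2):
    """Predict ratings[user, target] from the user's own ratings of the
    k rated items most similar to `target` (Pearson similarity)."""
    # Step 1: items the target user has rated
    rated = ratings.loc[user].dropna()
    # Step 2: similarity of each rated item to the target item; DataFrame.corr
    # uses only the users who rated both items of a pair (other users' ratings)
    sims = ratings.corr(method="pearson")[target].loc[rated.index].dropna()
    # Step 3: keep the k most similar items
    top = sims.sort_values(ascending=False).head(k)
    # Similarity-weighted average of the user's ratings on those neighbors
    return (top * rated[top.index]).sum() / top.abs().sum()


print(round(predict_item_based(ratings, "u1", "D"), 2))
```

User `u1` has not rated `D`, but `D` correlates strongly with `A` and `B`, which `u1` rated highly, so the prediction lands near the top of the scale.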

No time to waste, let's start!

**Feel free to download the dataset from [here](https://www.kaggle.com/CooperUnion/anime-recommendations-database)**
 

In [1]:
# load libraries

import pandas as pd
import numpy as np

# and datasets

rates = pd.read_csv('rating.csv')
anime = pd.read_csv('anime.csv')

rates.shape

(7813737, 3)

The `rates` dataset has almost 8 million rows. To keep computation time and memory usage manageable, we keep only a sample: the ratings given by users with `user_id` up to 2000.
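The notebook loads the full file and then filters it. If memory is tight, a possible alternative is to filter while reading, using the `chunksize` and `dtype` parameters of `pd.read_csv`. The sketch below is an assumption, not the notebook's method: `load_sample` is a hypothetical helper, and a tiny in-memory CSV with the same columns stands in for `rating.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'rating.csv' (same columns, a few made-up rows)
csv_text = "user_id,anime_id,rating\n1,20,-1\n1,24,-1\n2000,5,8\n2001,6,7\n3000,1,9\n"


def load_sample(source, max_user=2000, chunksize=2):
    """Read ratings chunk by chunk and keep only rows with user_id <= max_user,
    so the full table never sits in memory at once."""
    # Small integer types are enough for ids and ratings from -1 to 10
    dtypes = {"user_id": "int32", "anime_id": "int32", "rating": "int8"}
    with pd.read_csv(source, dtype=dtypes, chunksize=chunksize) as reader:
        return pd.concat([chunk[chunk["user_id"] <= max_user] for chunk in reader])


sample = load_sample(io.StringIO(csv_text))
print(sample)
```

On the real file you would pass the path and a much larger chunk, for example `load_sample('rating.csv', chunksize=1_000_000)`.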

In [2]:
# keep the ratings of the first 2000 users; .copy() makes an independent DataFrame
mini_rates = rates[rates.user_id <= 2000].copy()
mini_rates.head()

|   | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 1 | 20 | -1 |
| 1 | 1 | 24 | -1 |
| 2 | 1 | 79 | -1 |
| 3 | 1 | 226 | -1 |
| 4 | 1 | 241 | -1 |


#names = anime.filter(['name','anime_id'], axis=1)
#mini_rates = mini_rates.merge(names, left_on = 'anime_id', right_on = 'anime_id', suffixes= ['_user', ''])

The `rating` column has some `-1` values. These mark anime that the user watched but did not rate. They are not real scores, so we replace them with `NaN` (missing).
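Why not simply keep the `-1`s? A quick check on made-up numbers shows that pandas would average them in as if they were scores, while `NaN` is skipped by `mean()` and by the correlations computed later.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings for one anime: two real scores and one "watched, not rated" (-1)
scores = pd.Series([8, 10, -1])

print(scores.mean())                      # -1 is averaged in as if it were a score
print(scores.replace(-1, np.nan).mean())  # NaN is skipped, leaving the two real scores
```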

In [3]:
# Replace -1 in the rating column only (no regex needed for a numeric value)
mini_rates = mini_rates.assign(rating=mini_rates['rating'].replace(-1, np.nan))
mini_rates.head(2)

|   | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 1 | 20 | NaN |
| 1 | 1 | 24 | NaN |


In [4]:
# Reshape into a user x anime matrix: one row per user_id, one column per anime_id,
# NaN where the user has no rating for that anime

user_ratings = mini_rates.pivot_table(index='user_id', columns='anime_id', values='rating')
user_ratings.head()

| user_id / anime_id | 1 | 5 | 6 | 7 | 8 | 15 | 16 | 17 | 18 | 19 | ... | 33798 | 33902 | 33934 | 33964 | 34085 | 34103 | 34136 | 34173 | 34240 | 34325 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | NaN | NaN | 8.0 | NaN | NaN | 6.0 | NaN | 6.0 | 6.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
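Most cells above are `NaN`: each user has rated only a small fraction of all titles. A minimal sketch of the same reshape on hypothetical long-format data, plus one way to measure how empty the resulting matrix is:

```python
import pandas as pd

# Hypothetical long-format ratings with the same columns as mini_rates
long_df = pd.DataFrame(
    {"user_id": [1, 1, 2, 3], "anime_id": [20, 24, 20, 79], "rating": [8.0, 7.0, 9.0, 6.0]}
)

# One row per user, one column per anime; missing (user, anime) pairs become NaN
matrix = long_df.pivot_table(index="user_id", columns="anime_id", values="rating")

# Share of empty cells in the matrix (sparsity)
sparsity = matrix.isna().to_numpy().mean()
print(matrix)
print(f"sparsity: {sparsity:.2f}")
```

Running `user_ratings.isna().to_numpy().mean()` on the real matrix gives its sparsity, which is why the next step only trusts correlations backed by enough common raters.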


We are about to create a matrix of similarities between anime titles. This is a [Pearson correlation coefficient matrix](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient). Pearson's correlation is the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) of the mean-centered rating vectors.
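The "centered cosine" relationship can be checked numerically. In this sketch the two rating vectors are made up and have no missing values; subtracting each vector's mean and then taking the cosine gives exactly NumPy's Pearson coefficient.

```python
import numpy as np

# Two hypothetical rating vectors for the same four users (no missing values)
x = np.array([8.0, 6.0, 9.0, 4.0])
y = np.array([7.0, 5.0, 10.0, 6.0])


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


# Pearson's r is the cosine similarity of the mean-centered vectors
centered = cosine(x - x.mean(), y - y.mean())
pearson = np.corrcoef(x, y)[0, 1]
print(centered, pearson)
```

With `DataFrame.corr`, the same computation is done pairwise, using only the users who rated both titles of a pair.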

In [5]:
# The min_periods argument throws out results (sets them to NaN)
# where fewer than 100 users rated a given anime pair.
# This step is slow, so please be patient.

corr_matrix = user_ratings.corr(method='pearson', min_periods=100)
corr_matrix.head()

| anime_id | 1 | 5 | 6 | 7 | 8 | 15 | 16 | 17 | 18 | 19 | ... | 33798 | 33902 | 33934 | 33964 | 34085 | 34103 | 34136 | 34173 | 34240 | 34325 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.0 | NaN | 0.318329 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 0.318329 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
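The many `NaN`s, including the diagonal entry for anime 5, come from `min_periods`: pandas sets a pair to `NaN` whenever fewer than `min_periods` users rated both titles, and that applies to a title paired with itself too. A small sketch with made-up data and `min_periods=3`:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: "A" and "B" share 4 raters, "C" has only 2 raters in total
df = pd.DataFrame(
    {
        "A": [8.0, 6.0, 9.0, 4.0],
        "B": [7.0, 5.0, 10.0, 6.0],
        "C": [9.0, 3.0, np.nan, np.nan],
    }
)

# Pairs with fewer than 3 common raters get NaN, even C with itself
corr = df.corr(method="pearson", min_periods=3)
print(corr)
```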


We pick a user who has rated some anime in the past (here, user 1) and use that user's history to find recommendations for them.

In [6]:
user1 = user_ratings.loc[1].dropna()
user1

anime_id
8074     10.0
11617    10.0
11757    10.0
15451    10.0
Name: 1, dtype: float64

In [7]:
# collect one Series of scaled similarities per rated anime
# (Series.append was removed in pandas 2.0, so we concatenate at the end)
parts = []

for anime_id, rating in user1.items():
    print('Adding similars for ', anime_id, '...')
    # Retrieve titles similar to this one that the user rated
    sims = corr_matrix[anime_id].dropna()
    # Scale each similarity by how the user rated this title
    sims = sims * rating
    # Add the scores to the list of similarity candidates
    parts.append(sims)

recoms = pd.concat(parts)

Adding similars for  8074 ...
Adding similars for  11617 ...
Adding similars for  11757 ...
Adding similars for  15451 ...
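Together with the `groupby(...).sum()` in the next cell, this loop gives every candidate title $j$ the score

$$
\text{score}(j) = \sum_{i \in I_u} \text{sim}(i, j)\, r_{u,i}
$$

where $I_u$ is the set of titles rated by user $u$, $\text{sim}(i, j)$ is the corresponding entry of `corr_matrix` (pairs that are `NaN` are skipped), and $r_{u,i}$ is the user's rating of title $i$. The score is not normalized, so a title similar to several of the user's favorites accumulates a higher score. The user's own titles also get a large score, since $\text{sim}(i, i) = 1$ adds their full rating; that is why they appear at the top of the list below before being dropped.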


In [8]:
# group by anime: the same title can be similar to several of user1's ratings,
# so we sum its scores
recoms = recoms.groupby(recoms.index).sum()

# sort from the highest to the lowest score
recoms = recoms.sort_values(ascending=False)
print(recoms.head(10))

# drop the titles user1 has already rated; we don't want to recommend them again
filter_sims = recoms.drop(user1.index, errors='ignore')

# final recommendations: look up the name and average rating of the top 5
print('\nrecommendations for user1\n')
for r in filter_sims.index[:5]:
    row = anime.loc[anime['anime_id'] == r].iloc[0]
    print('title: ', row['name'], ',  ', 'rating:', row['rating'])

11617    28.801591
15451    28.371814
8074     25.305184
13663    24.768280
12549    24.706200
11757    23.848984
3712     22.868168
19163    22.542697
9367     22.539561
11319    22.485182
dtype: float64

recommendations for user1

title:  To LOVE-Ru Darkness ,   rating: 7.82
title:  Dakara Boku wa, H ga Dekinai. ,   rating: 6.96
title:  Zero no Tsukaima: Princesses no Rondo ,   rating: 7.6
title:  Date A Live II ,   rating: 7.5
title:  Freezing ,   rating: 7.2
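To repeat the procedure for other users, the last two cells can be folded into one function. This is a sketch only: `recommend_for_user` is a hypothetical name, it expects a precomputed similarity matrix so the slow `corr` step runs once, and it is demonstrated on a tiny made-up matrix rather than the real data.

```python
import numpy as np
import pandas as pd


def recommend_for_user(user_ratings, corr_matrix, user_id, n=5):
    """Hypothetical helper: sum similarity x rating over the user's rated
    titles, drop the titles already rated, and return the top-n scores."""
    rated = user_ratings.loc[user_id].dropna()
    # Each rated title contributes similarity * rating to every neighbor
    parts = [corr_matrix[item].dropna() * rating for item, rating in rated.items()]
    scores = pd.concat(parts).groupby(level=0).sum()
    # Never recommend what the user has already rated
    scores = scores.drop(rated.index, errors="ignore")
    return scores.sort_values(ascending=False).head(n)


# Hypothetical toy matrix (users x anime_id), NaN = not rated
toy = pd.DataFrame(
    {10: [9.0, 8.0, 2.0, 7.0], 20: [8.0, 9.0, 3.0, np.nan], 30: [np.nan, 2.0, 9.0, 3.0]},
    index=[1, 2, 3, 4],
)
toy_corr = toy.corr(method="pearson", min_periods=2)
print(recommend_for_user(toy, toy_corr, user_id=4, n=2))
```

In the notebook, `recommend_for_user(user_ratings, corr_matrix, user_id=1)` should return the same five `anime_id`s as the list above; their names can then be looked up in `anime` as in the last cell.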
