## Collaborative Filtering

Collaborative filtering is among the most sought-after, widely implemented, and mature recommendation technologies available. Collaborative recommender systems aggregate ratings or recommendations of objects, recognize commonalities between users on the basis of their ratings, and generate new recommendations based on inter-user comparisons. Their greatest strength is that they are completely independent of any machine-readable representation of the objects being recommended, so they work well for complex objects where variations in taste are responsible for much of the variation in preferences. Collaborative filtering rests on the assumption that people who agreed in the past will agree in the future, and that they will like objects similar to those they liked in the past.

There are two main types:

**User-Based Collaborative Filtering** - These systems recommend products to a user that similar users have liked. To measure the similarity between two users, we can use either Pearson correlation or cosine similarity. To illustrate, picture a ratings matrix in which each row represents a user and the columns correspond to different movies, except the last column, which records the similarity between that user and the target user.

**Item-Based Collaborative Filtering** - Instead of measuring the similarity between users, item-based CF recommends items based on their similarity to the items that the target user has rated. As before, the similarity can be computed with Pearson correlation or cosine similarity. The major difference is that item-based collaborative filtering fills in the blanks vertically (column by column), as opposed to the horizontal (row by row) manner of user-based CF.
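To make the row-versus-column distinction concrete, here is a minimal sketch on a made-up ratings matrix. The `toy` data and the `cosine_similarity` helper are assumptions for illustration only and are not part of the MovieLens data used later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies, NaN = not rated.
toy = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, np.nan],
        "Movie B": [4.0, 5.0, 2.0, 1.0],
        "Movie C": [1.0, 2.0, 5.0, 4.0],
        "Movie D": [np.nan, 1.0, 4.0, 5.0],
    },
    index=["u1", "u2", "u3", "u4"],
)

# User-based view: compare rows (users). DataFrame.corr compares columns,
# so transpose first; missing ratings are skipped pair by pair.
user_sim = toy.T.corr(method="pearson")

# Item-based view: compare columns (movies) directly.
item_sim = toy.corr(method="pearson")


def cosine_similarity(a, b):
    """Cosine similarity restricted to the entries both Series have rated."""
    mask = a.notna() & b.notna()
    a, b = a[mask].to_numpy(), b[mask].to_numpy()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print(user_sim.round(2))
print(item_sim.round(2))
print(round(cosine_similarity(toy.loc["u1"], toy.loc["u2"]), 3))
```

Users `u1` and `u2` rate the shared movies similarly and come out positively correlated, while `u1` and `u3` have opposite tastes and come out negatively correlated.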

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


### Loading Data

In [2]:
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('Data/u.data', sep='\t', names=column_names)

In [3]:
df.head()

| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 0 | 50 | 5 | 881250949 |
| 1 | 0 | 172 | 5 | 881250949 |
| 2 | 0 | 133 | 1 | 881250949 |
| 3 | 196 | 242 | 3 | 881250949 |
| 4 | 186 | 302 | 3 | 891717742 |


In [4]:
movie_titles = pd.read_csv("Data/Movie_Id_Titles")
movie_titles.head()

| | item_id | title |
|---|---|---|
| 0 | 1 | Toy Story (1995) |
| 1 | 2 | GoldenEye (1995) |
| 2 | 3 | Four Rooms (1995) |
| 3 | 4 | Get Shorty (1995) |
| 4 | 5 | Copycat (1995) |


In [5]:
df = pd.merge(df,movie_titles,on='item_id')
df.head()

| | user_id | item_id | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 0 | 50 | 5 | 881250949 | Star Wars (1977) |
| 1 | 290 | 50 | 5 | 880473582 | Star Wars (1977) |
| 2 | 79 | 50 | 4 | 891271545 | Star Wars (1977) |
| 3 | 2 | 50 | 5 | 888552084 | Star Wars (1977) |
| 4 | 8 | 50 | 5 | 879362124 | Star Wars (1977) |
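`pd.merge` defaults to an inner join, so any rating whose `item_id` is missing from the titles table would be dropped without notice. The following sketch shows one way to check for that. It runs on hypothetical miniature tables (`ratings_df`, `titles_df`) rather than the real files, and uses the `how`, `validate`, and `indicator` parameters of `pd.merge`.

```python
import pandas as pd

# Hypothetical miniature versions of the ratings and titles tables.
ratings_df = pd.DataFrame(
    {"user_id": [0, 290, 79, 5], "item_id": [50, 50, 1, 999], "rating": [5, 5, 4, 3]}
)
titles_df = pd.DataFrame(
    {"item_id": [1, 50], "title": ["Toy Story (1995)", "Star Wars (1977)"]}
)

# how="left" keeps every rating; validate raises if item_id is duplicated
# in the titles table; indicator adds a "_merge" column with the match status.
merged = pd.merge(
    ratings_df,
    titles_df,
    on="item_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} rating(s) without a title")
print(merged.drop(columns="_merge"))
```

If the unmatched count is zero on the real data, the inner merge used in the notebook loses nothing.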


#### Item-Based Collaborative Filtering

In [6]:
#Average rating for each movie
ratings = pd.DataFrame(df.groupby('title')['rating'].mean())


In [7]:
# number of ratings for each movie
ratings['num of ratings'] = df.groupby('title')['rating'].count()

In [8]:
ratings.head()

| title | rating | num of ratings |
|---|---|---|
| 'Til There Was You (1997) | 2.333333 | 9 |
| 1-900 (1994) | 2.6 | 5 |
| 101 Dalmatians (1996) | 2.908257 | 109 |
| 12 Angry Men (1957) | 4.344 | 125 |
| 187 (1997) | 3.02439 | 41 |


In [9]:
movie_item_mat = df.pivot_table(index='user_id',columns='title',values='rating')
movie_item_mat.head()

| user_id | 'Til There Was You (1997) | 1-900 (1994) | 101 Dalmatians (1996) | 12 Angry Men (1957) | 187 (1997) | 2 Days in the Valley (1996) | 20,000 Leagues Under the Sea (1954) | 2001: A Space Odyssey (1968) | 3 Ninjas: High Noon At Mega Mountain (1998) | 39 Steps, The (1935) | ... | Yankee Zulu (1994) | Year of the Horse (1997) | You So Crazy (1994) | Young Frankenstein (1974) | Young Guns (1988) | Young Guns II (1990) | Young Poisoner's Handbook, The (1995) | Zeus and Roxanne (1997) | unknown | Á köldum klaka (Cold Fever) (1994) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | 2.0 | 5.0 | NaN | NaN | 3.0 | 4.0 | NaN | NaN | ... | NaN | NaN | NaN | 5.0 | 3.0 | NaN | NaN | NaN | 4.0 | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
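Most cells in a user-by-movie matrix are empty because each user rates only a small share of the catalogue. As a rough sketch, the sparsity can be measured as below. `toy_df` and `toy_mat` are hypothetical stand-ins for the notebook's `df` and `movie_item_mat`.

```python
import pandas as pd

# Hypothetical long-format ratings, standing in for the merged ratings table.
toy_df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3, 3, 3],
        "title": ["A", "B", "A", "A", "B", "C"],
        "rating": [5, 3, 4, 2, 5, 1],
    }
)

# Same pivot as in the notebook: one row per user, one column per movie.
toy_mat = toy_df.pivot_table(index="user_id", columns="title", values="rating")

# Fraction of user/movie pairs that have no rating.
sparsity = toy_mat.isna().to_numpy().mean()
print(f"shape={toy_mat.shape}, sparsity={sparsity:.2%}")

# Number of ratings each movie received (non-NaN entries per column).
print(toy_mat.count())
```

Running the same two expressions on `movie_item_mat` would show how sparse the real matrix is, which matters because correlations computed from only a few shared ratings are unreliable.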


In [10]:
ratings=ratings.sort_values('num of ratings',ascending=False)

In [11]:
ratings.head()

| title | rating | num of ratings |
|---|---|---|
| Star Wars (1977) | 4.359589 | 584 |
| Contact (1997) | 3.803536 | 509 |
| Fargo (1996) | 4.155512 | 508 |
| Return of the Jedi (1983) | 4.00789 | 507 |
| Liar Liar (1997) | 3.156701 | 485 |


In [12]:
def Recommend_similar_item(x):
    movie_user_ratings = movie_item_mat[x]  # user ratings of the selected movie
    similar_to_movie = movie_item_mat.corrwith(movie_user_ratings)  # correlation of x with every movie
    corr_movie = pd.DataFrame(similar_to_movie, columns=['Correlation'])
    corr_movie.dropna(inplace=True)  # drop movies with NaN correlation
    corr_movie = corr_movie.join(ratings['num of ratings'])  # add the number of ratings per movie
    # top 5 most correlated movies among those with more than 100 ratings
    print(corr_movie[corr_movie['num of ratings'] > 100].sort_values('Correlation', ascending=False).head())

In [13]:
Recommend_similar_item('Star Wars (1977)')


                                                    Correlation  \
title                                                             
Star Wars (1977)                                       1.000000   
Empire Strikes Back, The (1980)                        0.748353   
Return of the Jedi (1983)                              0.672556   
Raiders of the Lost Ark (1981)                         0.536117   
Austin Powers: International Man of Mystery (1997)     0.377433   

                                                    num of ratings  
title                                                               
Star Wars (1977)                                               584  
Empire Strikes Back, The (1980)                                368  
Return of the Jedi (1983)                                      507  
Raiders of the Lost Ark (1981)                                 420  
Austin Powers: International Man of Mystery (1997)             130  


In [14]:
Recommend_similar_item('Liar Liar (1997)')


                       Correlation  num of ratings
title                                             
Liar Liar (1997)          1.000000             485
Batman Forever (1995)     0.516968             114
Mask, The (1994)          0.484650             129
Down Periscope (1996)     0.472681             101
Con Air (1997)            0.469828             137
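In both outputs the queried movie tops its own list with a correlation of 1.0, and the function prints rather than returns its result. The sketch below is one possible variant that returns a DataFrame and drops the query title. The name `similar_items`, its parameters, and the toy matrix are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd


def similar_items(mat, title, min_ratings=100, top_n=5):
    """Return the top_n movies most correlated with `title`.

    `mat` is a user x movie ratings matrix with NaN for missing ratings.
    Movies with `min_ratings` or fewer ratings are filtered out.
    """
    corr = mat.corrwith(mat[title]).rename("Correlation")
    out = corr.to_frame().join(mat.count().rename("num of ratings"))
    out = out.dropna(subset=["Correlation"])
    out = out.drop(index=title, errors="ignore")  # exclude the query movie itself
    out = out[out["num of ratings"] > min_ratings]
    return out.sort_values("Correlation", ascending=False).head(top_n)


# Hypothetical toy matrix: rows are users, columns are movies.
toy_mat = pd.DataFrame(
    {
        "A": [5, 4, 1, 2, np.nan],
        "B": [4, 5, 2, 1, 3],
        "C": [1, 2, 5, 4, np.nan],
        "D": [np.nan, 5, 3, np.nan, 4],
    },
    index=[1, 2, 3, 4, 5],
)

result = similar_items(toy_mat, "A", min_ratings=3)
print(result)
```

Movie `D` shares only two raters with `A` and would correlate perfectly by chance. The rating-count threshold removes it, which is the same role the `> 100` filter plays in the notebook.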


#### User-Based Collaborative Filtering

In [15]:
df.head()

| | user_id | item_id | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 0 | 50 | 5 | 881250949 | Star Wars (1977) |
| 1 | 290 | 50 | 5 | 880473582 | Star Wars (1977) |
| 2 | 79 | 50 | 4 | 891271545 | Star Wars (1977) |
| 3 | 2 | 50 | 5 | 888552084 | Star Wars (1977) |
| 4 | 8 | 50 | 5 | 879362124 | Star Wars (1977) |


In [16]:
rating_user = pd.DataFrame(df.groupby('user_id')['rating'].mean())  # average rating given by each user
rating_user['num of ratings'] = df.groupby('user_id')['rating'].count()  # number of ratings given by each user

In [17]:
rating_user

| user_id | rating | num of ratings |
|---|---|---|
| 0 | 3.666667 | 3 |
| 1 | 3.610294 | 272 |
| 2 | 3.709677 | 62 |
| 3 | 2.796296 | 54 |
| 4 | 4.333333 | 24 |
| ... | ... | ... |
| 939 | 4.265306 | 49 |
| 940 | 3.457944 | 107 |
| 941 | 4.045455 | 22 |
| 942 | 4.265823 | 79 |


In [18]:
moviemat = df.pivot_table(index='title',columns='user_id',values='rating')
moviemat.head() # item-user matrix

| title | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 934 | 935 | 936 | 937 | 938 | 939 | 940 | 941 | 942 | 943 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 'Til There Was You (1997) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1-900 (1994) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 101 Dalmatians (1996) | NaN | 2.0 | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | ... | 2.0 | NaN | NaN | 2.0 | 4.0 | NaN | NaN | NaN | NaN | NaN |
| 12 Angry Men (1957) | NaN | 5.0 | NaN | NaN | NaN | NaN | 4.0 | 4.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 187 (1997) | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


In [19]:
def get_similar_users(x):
    user = moviemat[x]  # ratings given by user x
    similar_to_user = moviemat.corrwith(user)  # correlation with every user
    corr_user1 = pd.DataFrame(similar_to_user, columns=['Correlation'])
    corr_user1.dropna(inplace=True)  # drop users with NaN correlation
    corr_user1 = corr_user1.join(rating_user['num of ratings'])
    return corr_user1.sort_values('Correlation', ascending=False).head()

In [20]:
z=int(input("Enter user no."))
pred=get_similar_users(z)
pred

Enter user no.1



| user_id | Correlation | num of ratings |
|---|---|---|
| 351 | 1.0 | 44 |
| 531 | 1.0 | 30 |
| 39 | 1.0 | 22 |
| 866 | 1.0 | 20 |
| 1 | 1.0 | 272 |
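Every user in this output has a correlation of exactly 1.0, including user 1 itself. A likely cause is that these pairs share only a handful of co-rated movies, where a perfect correlation is easy to reach by chance. The sketch below shows one way to require a minimum overlap and exclude the target user. The function `similar_users`, its `min_overlap` parameter, and the toy matrix are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def similar_users(item_user_mat, user_id, min_overlap=5, top_n=5):
    """Rank other users by Pearson correlation with `user_id`.

    `item_user_mat` has movies as rows and users as columns (NaN = unrated).
    Users sharing fewer than `min_overlap` co-rated movies are discarded.
    """
    corr = item_user_mat.corrwith(item_user_mat[user_id]).rename("Correlation")
    # Co-rated count: among movies the target rated, how many each user rated.
    rated = item_user_mat.notna()
    overlap = rated[rated[user_id]].sum().rename("overlap")
    out = pd.concat([corr, overlap], axis=1).drop(index=user_id)
    out = out[out["overlap"] >= min_overlap].dropna(subset=["Correlation"])
    return out.sort_values("Correlation", ascending=False).head(top_n)


# Hypothetical toy matrix: rows are movies, columns are user ids.
toy = pd.DataFrame(
    {
        1: [5, 4, 1, 2, 3, np.nan],
        2: [4, 5, 2, 1, 3, 2],
        3: [5, 4, np.nan, np.nan, np.nan, 1],
        4: [1, 2, 5, 4, 3, 5],
    },
    index=["m1", "m2", "m3", "m4", "m5", "m6"],
)

print(similar_users(toy, 1, min_overlap=1))  # user 3 ranks first on 2 shared movies
print(similar_users(toy, 1, min_overlap=3))  # user 3 is filtered out
```

With a low threshold, user 3 ranks first on the strength of two shared movies. Raising `min_overlap` leaves only users with enough evidence behind their score.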


In [21]:
similar_users=pred.index
similar_users # top 5 users with most similarity

Int64Index([351, 531, 39, 866, 1], dtype='int64', name='user_id')

In [22]:
current_user=list(df[df['user_id']==z]["title"])
current_user # movies of user z

['Star Wars (1977)',
 'Empire Strikes Back, The (1980)',
 'Gone with the Wind (1939)',
 'Kolya (1996)',
 'Legends of the Fall (1994)',
 'Hunt for Red October, The (1990)',
 'Remains of the Day, The (1993)',
 'Men in Black (1997)',
 'Star Trek: First Contact (1996)',
 'To Wong Foo, Thanks for Everything! Julie Newmar (1995)',
 'Batman Forever (1995)',
 'Die Hard (1988)',
 'Twister (1996)',
 'Toy Story (1995)',
 'Aladdin (1992)',
 'Jaws (1975)',
 'Chasing Amy (1997)',
 'Silence of the Lambs, The (1991)',
 'Right Stuff, The (1983)',
 'Sleepless in Seattle (1993)',
 'Sting, The (1973)',
 'Crumb (1994)',
 'French Twist (Gazon maudit) (1995)',
 'Evil Dead II (1987)',
 'Last of the Mohicans, The (1992)',
 'Get Shorty (1995)',
 'Fargo (1996)',
 'Return of the Jedi (1983)',
 'Dead Poets Society (1989)',
 'Sound of Music, The (1965)',
 'Angels and Insects (1995)',
 'Nightmare on Elm Street, A (1984)',
 'Brothers McMullen, The (1995)',
 'Young Guns (1988)',
 "Mr. Holland's Opus (1995)",
 'Jean de

In [23]:
recommended_movie = []  # movies rated by the similar users
for uid in similar_users:
    titles = df[df['user_id'] == uid]['title']
    recommended_movie.extend(titles)

In [24]:
movie=set(recommended_movie)-set(current_user) # Recommended movies
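The set difference above treats every unseen movie equally, whatever rating the similar users gave it. One common refinement is to score each candidate by a similarity-weighted average of the neighbours' ratings. The sketch below illustrates this on hypothetical data. `toy_df`, `neighbour_sim`, and the similarity values are made up, and the weighting assumes positive similarities.

```python
import pandas as pd

# Hypothetical long-format ratings and neighbour similarities to user 1.
toy_df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 2, 3, 3],
        "title": ["A", "B", "A", "C", "D", "C", "D"],
        "rating": [5, 4, 5, 4, 2, 5, 1],
    }
)
neighbour_sim = pd.Series({2: 0.9, 3: 0.6}, name="sim")  # assumed values
target_user = 1

# Movies the target user has already rated.
seen = set(toy_df.loc[toy_df["user_id"] == target_user, "title"])

# Keep only neighbours' ratings of movies the target has not seen.
is_neighbour = toy_df["user_id"].isin(neighbour_sim.index)
is_unseen = ~toy_df["title"].isin(seen)
cand = toy_df[is_neighbour & is_unseen].copy()

cand["sim"] = cand["user_id"].map(neighbour_sim)
cand["weighted"] = cand["rating"] * cand["sim"]

# Similarity-weighted mean rating per unseen movie, best first.
sums = cand.groupby("title")[["weighted", "sim"]].sum()
scores = (sums["weighted"] / sums["sim"]).sort_values(ascending=False)
print(scores)
```

Here movie `C` scores well above `D` because both neighbours rated it highly, whereas a plain set difference would list the two side by side.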

In [25]:
for i in movie:
    print(i)

Spice World (1997)
Assignment, The (1997)
In & Out (1997)
Everyone Says I Love You (1996)
Devil's Advocate, The (1997)
Firestorm (1998)
Desperate Measures (1998)
Free Willy 3: The Rescue (1997)
Washington Square (1997)
Tango Lesson, The (1997)
Dante's Peak (1997)
Mrs. Brown (Her Majesty, Mrs. Brown) (1997)
Spawn (1997)
Kundun (1997)
Half Baked (1998)
Postman, The (1997)
Flubber (1997)
Anastasia (1997)
Wag the Dog (1997)
Wings of the Dove, The (1997)
Rosewood (1997)
Soul Food (1997)
Game, The (1997)
Apt Pupil (1998)
Great Expectations (1998)
Scream 2 (1997)
Saint, The (1997)
Mortal Kombat: Annihilation (1997)
Horse Whisperer, The (1998)
Scream (1996)
Volcano (1997)
Titanic (1997)
Shadow Conspiracy (1997)
Cats Don't Dance (1997)
G.I. Jane (1997)
Alien: Resurrection (1997)
Peacemaker, The (1997)
Murder at 1600 (1997)
Home Alone 3 (1997)
Boogie Nights (1997)
Ice Storm, The (1997)
Air Force One (1997)
Midnight in the Garden of Good and Evil (1997)
Liar Liar (1997)
Deconstructing Harry (1997