In this notebook, I detail the workings of a recommendation system based on memory-based Collaborative Filtering (CF). This approach groups similar users and similar items together using similarity statistics computed directly from the rating data, rather than a trained machine learning model.

## Import libraries and movie dataset

In [None]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import os
from google.colab import drive
drive.mount('/content/drive')

ratings = pd.read_csv('drive/MyDrive/CZ4032 Project 2/Reference/input/ratings.csv')
movies = pd.read_csv('drive/MyDrive/CZ4032 Project 2/Reference/input/movies.csv')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Take a look at the ratings and movies dataset

The ratings dataset has 4 attributes: `userId`, which is unique to each user; `movieId`, which identifies each movie; `rating`, on a scale from 0.5 to 5; and `timestamp`, in seconds since the Unix epoch (1 January 1970, UTC).

In [None]:
ratings.head(5)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
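
As a small illustration of the timestamp format, the sketch below converts the raw integers into readable dates. It rebuilds three rows from the output above by hand; the `sample` dataframe and the `rated_at` column name are my own and are not used anywhere else in this notebook.

```python
import pandas as pd

# First rows of ratings.head(5), copied from the output above
sample = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [1, 3, 6],
    "rating": [4.0, 4.0, 4.0],
    "timestamp": [964982703, 964981247, 964982224],
})

# Interpret the integers as seconds since 1970-01-01 00:00:00 UTC
sample["rated_at"] = pd.to_datetime(sample["timestamp"], unit="s")
print(sample[["timestamp", "rated_at"]])
```

The first rating turns out to date from July 2000, which is a quick sanity check that the unit really is seconds.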


Using `describe`, we see that the mean rating is 3.50 with a standard deviation of 1.04, across a total of 100,836 ratings.

In [None]:
ratings.describe()

Unnamed: 0,userId,movieId,rating,timestamp
count,100836.0,100836.0,100836.0,100836.0
mean,326.127564,19435.295718,3.501557,1205946000.0
std,182.618491,35530.987199,1.042529,216261000.0
min,1.0,1.0,0.5,828124600.0
25%,177.0,1199.0,3.0,1019124000.0
50%,325.0,2991.0,3.5,1186087000.0
75%,477.0,8122.0,4.0,1435994000.0
max,610.0,193609.0,5.0,1537799000.0


For the movies dataset, we have the `movieId`, the title string with the release year in brackets to separate movies that share a title but were released in different years, and the genres separated by the | symbol.

In [None]:
movies.head(5)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
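
The title and genre strings can be pulled apart with pandas string methods if needed. This is only a sketch on three hand-copied rows; the `sample_movies` dataframe and the `genre_list` and `year` columns are hypothetical names, and the rest of the notebook does not depend on them.

```python
import pandas as pd

sample_movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# Split the pipe-separated genre string into a Python list per movie
sample_movies["genre_list"] = sample_movies["genres"].str.split("|")

# Pull the four-digit year out of the trailing "(YYYY)" in the title (kept as a string)
sample_movies["year"] = sample_movies["title"].str.extract(r"\((\d{4})\)\s*$", expand=False)

print(sample_movies[["title", "year", "genre_list"]])
```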


We have a total of 9742 movies collected.

In [None]:
movies.shape

(9742, 3)

## Merge the two datasets into one: MovieRatings

We use pandas `merge` to inner join the ratings and movies on `movieId`. An inner join keeps only the rows whose `movieId` appears in both dataframes; rows without a match are discarded. Then we drop the `timestamp` column.

In [None]:
movieratings = pd.merge(ratings, movies, on = 'movieId')
movieratings.drop(['timestamp'], axis=1, inplace=True)
movieratings.head(5)

Unnamed: 0,userId,movieId,rating,title,genres
0,1,1,4.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
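
To make the inner-join behaviour concrete, here is a minimal sketch on made-up data (the `toy_ratings` and `toy_movies` frames are invented for illustration). It compares an inner join with a left join so you can see which rows get discarded.

```python
import pandas as pd

# Made-up data: movieId 99 has no entry in toy_movies, movieId 30 has no rating
toy_ratings = pd.DataFrame({"userId": [1, 2, 3], "movieId": [10, 20, 99], "rating": [4.0, 3.5, 5.0]})
toy_movies = pd.DataFrame({"movieId": [10, 20, 30], "title": ["A (2000)", "B (2001)", "C (2002)"]})

# how="inner" is the default of pd.merge; it is spelled out here for clarity
inner = pd.merge(toy_ratings, toy_movies, on="movieId", how="inner")

# A left join keeps every rating and leaves the title missing where there is no match
left = pd.merge(toy_ratings, toy_movies, on="movieId", how="left")

print(inner)
print(left)
```

The inner join drops the rating for `movieId` 99, while the left join keeps it with a missing title.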


## Types of Memory-Based Collaborative Filtering

There are 2 types of memory-based CF in common use: **User-User** and **Item-Item**. The first recommends what similar users liked; the second recommends items similar to those the user already liked. Similarity is measured with a metric such as **Jaccard Similarity**, **Cosine Similarity** or, as in this notebook, Pearson correlation. I will walk you through an example of how each one works.

**User-User CF**: Here you build a pivot table with one row per `userId`, one column per movie title, and the rating as the value. This shows how each user rated each movie in matrix form, with NaN wherever a user has not rated a movie.

In [None]:
ratingscollection = movieratings.pivot_table(index = 'userId', columns = 'title', values = 'rating')
ratingscollection.head(5)

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...All the Marbles (1981),...And Justice for All (1979),00 Schneider - Jagd auf Nihil Baxter (1994),1-900 (06) (1994),10 (1979),10 Cent Pistol (2015),10 Cloverfield Lane (2016),10 Items or Less (2006),10 Things I Hate About You (1999),10 Years (2011),"10,000 BC (2008)",100 Girls (2000),100 Streets (2016),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),101 Dalmatians II: Patch's London Adventure (2003),101 Reykjavik (101 Reykjavík) (2000),102 Dalmatians (2000),10th & Wolf (2006),"10th Kingdom, The (2000)","10th Victim, The (La decima vittima) (1965)","11'09""01 - September 11 (2002)",11:14 (2003),"11th Hour, The (2007)",12 Angry Men (1957),12 Angry Men (1997),12 Chairs (1971),12 Chairs (1976),12 Rounds (2009),12 Years a Slave (2013),...,Zathura (2005),Zatoichi and the Chest of Gold (Zatôichi senryô-kubi) (Zatôichi 6) (1964),Zazie dans le métro (1960),Zebraman (2004),"Zed & Two Noughts, A (1985)",Zeitgeist: Addendum (2008),Zeitgeist: Moving Forward (2011),Zeitgeist: The Movie (2007),Zelary (2003),Zelig (1983),Zero Dark Thirty (2012),Zero Effect (1998),"Zero Theorem, The (2013)",Zero de conduite (Zero for Conduct) (Zéro de conduite: Jeunes diables au collège) (1933),Zeus and Roxanne (1997),Zipper (2015),Zodiac (2007),Zombeavers (2014),Zombie (a.k.a. Zombie 2: The Dead Are Among Us) (Zombi 2) (1979),Zombie Strippers! (2008),Zombieland (2009),Zone 39 (1997),"Zone, The (La Zona) (2007)",Zookeeper (2011),Zoolander (2001),Zoolander 2 (2016),Zoom (2006),Zoom (2015),Zootopia (2016),Zulu (1964),Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,
2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,
3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
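
The wide output above is hard to read, so here is the same pivot on a tiny made-up dataset (the `toy` frame and its titles are invented). It also measures how sparse the matrix is, that is, what fraction of user/movie cells have no rating. Note that `pivot_table` averages duplicate user/title pairs by default, which matters if two different movies share an identical title string.

```python
import pandas as pd

# Made-up long-format ratings: one row per (user, movie) rating
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "title": ["A (2000)", "B (2001)", "A (2000)", "C (2002)"],
    "rating": [4.0, 5.0, 3.0, 2.5],
})

# One row per user, one column per title, NaN where the user gave no rating
toy_matrix = toy.pivot_table(index="userId", columns="title", values="rating")

# Fraction of cells that are empty
sparsity = toy_matrix.isna().to_numpy().mean()

print(toy_matrix)
print(sparsity)
```

The same `isna().to_numpy().mean()` expression can be applied to `ratingscollection` to see how empty the real matrix is.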


Next, we use **Pearson Correlation (PC)** to determine which other users are most similar to a particular user. PC measures how strongly two users' ratings vary together after each user's ratings are centred on that user's own mean. It ranges from -1 (opposite tastes) to 1 (identical tastes).
![picture](https://editor.analyticsvidhya.com/uploads/39170Formula.JPG)
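
For reference, the standard Pearson correlation between two users can be written out as below. The notation is my own assumption and may differ from the image: $r_{u,i}$ is user $u$'s rating of movie $i$, $\bar{r}_u$ is user $u$'s mean rating, and the sums run over the movies being compared.

```latex
\mathrm{sim}(u, v) =
\frac{\sum_{i} (r_{u,i} - \bar{r}_u)\,(r_{v,i} - \bar{r}_v)}
     {\sqrt{\sum_{i} (r_{u,i} - \bar{r}_u)^2}\;\sqrt{\sum_{i} (r_{v,i} - \bar{r}_v)^2}}
```

The numerator is large and positive when both users rate the same movies above (or below) their own averages, and the denominator rescales the result into the range from -1 to 1.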

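Before computing correlations, I fill every NaN in a user's row with that user's mean rating. A filled cell then sits exactly at the user's mean, so an unrated movie adds no deviation to the correlation.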

In [None]:
ratingscollection = ratingscollection.apply(lambda row: row.fillna(row.mean()), axis=1)
ratingscollection.head(5)

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...All the Marbles (1981),...And Justice for All (1979),00 Schneider - Jagd auf Nihil Baxter (1994),1-900 (06) (1994),10 (1979),10 Cent Pistol (2015),10 Cloverfield Lane (2016),10 Items or Less (2006),10 Things I Hate About You (1999),10 Years (2011),"10,000 BC (2008)",100 Girls (2000),100 Streets (2016),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),101 Dalmatians II: Patch's London Adventure (2003),101 Reykjavik (101 Reykjavík) (2000),102 Dalmatians (2000),10th & Wolf (2006),"10th Kingdom, The (2000)","10th Victim, The (La decima vittima) (1965)","11'09""01 - September 11 (2002)",11:14 (2003),"11th Hour, The (2007)",12 Angry Men (1957),12 Angry Men (1997),12 Chairs (1971),12 Chairs (1976),12 Rounds (2009),12 Years a Slave (2013),...,Zathura (2005),Zatoichi and the Chest of Gold (Zatôichi senryô-kubi) (Zatôichi 6) (1964),Zazie dans le métro (1960),Zebraman (2004),"Zed & Two Noughts, A (1985)",Zeitgeist: Addendum (2008),Zeitgeist: Moving Forward (2011),Zeitgeist: The Movie (2007),Zelary (2003),Zelig (1983),Zero Dark Thirty (2012),Zero Effect (1998),"Zero Theorem, The (2013)",Zero de conduite (Zero for Conduct) (Zéro de conduite: Jeunes diables au collège) (1933),Zeus and Roxanne (1997),Zipper (2015),Zodiac (2007),Zombeavers (2014),Zombie (a.k.a. Zombie 2: The Dead Are Among Us) (Zombi 2) (1979),Zombie Strippers! (2008),Zombieland (2009),Zone 39 (1997),"Zone, The (La Zona) (2007)",Zookeeper (2011),Zoolander (2001),Zoolander 2 (2016),Zoom (2006),Zoom (2015),Zootopia (2016),Zulu (1964),Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
1,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,...,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.366379,4.0,4.366379
2,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,...,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.0,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276,3.948276
3,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,...,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897,2.435897
4,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,5.0,3.555556,3.555556,3.555556,3.555556,3.555556,...,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556,3.555556
5,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,...,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364,3.636364
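
To see what the row-wise fill does, the sketch below applies it to a small made-up matrix and compares it with a vectorised alternative that avoids calling a Python lambda once per user. The `toy_matrix` data and the names `filled_apply`, `user_means` and `filled_vectorised` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up user x movie matrix with missing ratings
toy_matrix = pd.DataFrame(
    {
        "A (2000)": [4.0, 3.0, np.nan],
        "B (2001)": [5.0, np.nan, np.nan],
        "C (2002)": [np.nan, 1.0, 2.5],
    },
    index=pd.Index([1, 2, 3], name="userId"),
)

# Same row-wise fill as in the cell above
filled_apply = toy_matrix.apply(lambda row: row.fillna(row.mean()), axis=1)

# Alternative: after transposing, users are columns, so fillna can take a
# Series of per-user means keyed by userId
user_means = toy_matrix.mean(axis=1)
filled_vectorised = toy_matrix.T.fillna(user_means).T

print(filled_apply)
```

User 1 rated two movies 4.0 and 5.0, so the missing third rating becomes 4.5. Both approaches give the same matrix on this toy input.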


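I use pandas' built-in `DataFrame.corr`, which computes pairwise Pearson correlations between columns by default. Transposing the matrix first makes each user a column, so the result is a user-by-user similarity matrix.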

In [None]:
usercorrelation = ratingscollection.T.corr()
usercorrelation.head(5)

userId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,...,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
1,1.0,0.001264516,0.0005525772,0.048419,0.021847,-0.045497,-0.006199672,0.047013,0.01950985,-0.008754088,-0.056206,0.013266,0.05938345,0.021815,0.037209,0.009664,0.037997,0.043789,0.085249,0.06699,0.011732,-0.004171,-0.023124,0.008417,0.001181,-0.01351819,0.045303,0.001611,-0.006969,0.003274438,0.006956394,0.04251,0.012918,0.011187,0.02150784,0.02652436,-0.04934852,0.009979,-0.072666,-0.041753,...,-0.03037625,0.046056,0.01408778,0.0407187,0.00123265,-8.180768e-05,0.089396,1.834885e-23,-0.1050026,0.019428,0.007481,0.009521898,-0.005892852,-0.01114252,0.01002267,0.011086,-0.04307956,-0.019135,-0.059246,0.095191,0.0382762,-0.05950684,0.082403,-0.006195,0.03652909,0.035349,0.102631,0.004179383,0.052614,0.069528,0.018127,-0.017172,-0.015221,-0.03705875,-0.02912138,0.012016,0.055261,0.075224,-0.02571255,0.010932
2,0.001265,1.0,-8.476071e-25,-0.017164,0.021796,-0.021051,-0.01111357,-0.048085,-5.534891e-23,0.003011629,-0.002745,0.0,0.003047559,0.012518,-0.03328,-0.014771,-0.001263,-0.047825,-0.000106,0.00061,-0.001949,-0.024828,-0.001169,-0.037418,0.016208,5.351820000000001e-23,-3.7543970000000006e-23,-0.00424,-0.019714,-0.03399468,-1.282363e-22,-0.0329,0.008467,-0.008806,-5.546318e-24,-0.03370691,0.007806392,-0.042923,0.0,-0.026103,...,-9.716309e-24,-0.006322,0.02238275,-0.05331292,5.74238e-23,-8.105733000000001e-23,-0.029436,6.444172e-23,4.0374170000000006e-23,-0.009902,-0.071276,-0.02549881,-0.001193537,-6.607594e-23,0.01072662,-0.012535,-4.196319e-23,-0.04809,-0.035668,0.002703,0.002482889,-0.01031647,-0.02612,0.000338,-4.207526e-23,0.031263,0.006005,0.001917654,-0.000213,-0.00098,-0.050551,-0.031581,-0.001688,-3.345291e-23,-3.088913e-23,0.006226,-0.020504,-0.006001,-0.060091,0.024999
3,0.000553,-8.476071e-25,1.0,-0.01126,-0.031539,0.0048,3.308497e-25,-0.032471,5.123183e-25,2.20398e-25,0.0,0.0,2.345529e-25,0.008453,-0.046046,-0.005142,-0.00469,-0.012322,-0.00956,-0.021145,-0.022848,0.010239,-0.03015,-0.009327,-0.015357,-4.927043e-25,-0.005515577,0.016249,-0.017537,-2.298376e-24,-0.001704944,-0.045848,-0.030532,0.029048,5.1612329999999996e-26,-6.239109000000001e-25,-3.753826e-25,-0.028985,0.028645,-0.003331,...,-0.02542548,-0.00593,-2.362342e-25,-3.146267e-25,-5.318024e-25,-0.04999971,-0.037601,-5.944731e-25,-0.001013576,0.015241,-0.037817,3.81923e-25,9.702808000000001e-25,6.096952e-25,-2.118945e-25,0.031381,-0.01332141,-0.014251,-0.024086,-0.02841,3.576149e-25,-3.7924760000000003e-25,0.01092,-0.007823,3.916421e-25,0.018377,-0.020662,2.830972e-25,0.000223,-0.004669,-0.004904,-0.016117,0.017749,3.0852900000000003e-25,-0.001430628,-0.037289,-0.007789,-0.013001,7.905178e-25,0.01955
4,0.048419,-0.01716402,-0.01125978,1.0,-0.02962,0.013956,0.05809139,0.002065,-0.005873603,0.05159032,0.062453,-0.016056,-0.05903304,-0.058388,0.033399,-0.027573,-0.039002,-0.00409,0.009273,0.035423,-0.005827,-0.006829,-0.014836,-0.008388,-0.008842,-0.01729245,0.01700677,0.014851,0.033426,0.0009596864,-0.006446306,0.056834,0.011651,0.050031,0.004037649,-0.001010067,-0.01558075,0.022612,0.02364,-0.002802,...,0.005153728,-0.014396,-0.001012132,0.03299309,-0.02610603,-0.02594862,0.06023,-2.015518e-23,-0.02099,-0.078191,0.049113,-0.0263061,-0.03076547,-0.008766061,0.004294412,-0.016293,-0.04037277,0.007293,-0.008866,0.087103,-0.01826945,-0.01676408,0.057924,-0.007381,0.009839663,-0.001616,0.044671,-0.01585653,0.014862,0.023748,-0.037687,0.063122,0.02764,-0.01378212,0.04003747,0.02059,0.014628,-0.037569,-0.01788358,-0.000995
5,0.021847,0.02179571,-0.03153892,-0.02962,1.0,0.009111,0.01011715,-0.012284,-1.950809e-24,-0.03316512,-0.013396,0.06644,0.01171895,-0.038016,-0.0153,-0.033442,0.05243,0.021218,-0.014108,0.044086,-0.004803,-0.02254,-0.010344,0.003476,0.021465,0.1042304,0.008737227,0.024003,-0.025456,0.0345571,0.02994713,0.065724,0.057004,0.040536,0.3012556,-0.01759305,0.1350826,0.160828,-0.024023,0.015391,...,-0.01884562,0.034142,0.03135125,0.03843883,0.05390044,-2.857975e-24,0.102826,2.279426e-24,0.003943181,0.005858,0.023493,-1.455653e-24,-0.02190478,0.1170911,0.03089337,-0.045611,0.0925675,0.121282,0.102781,0.050302,-1.3631109999999999e-24,-0.05476072,0.041755,0.007496,-0.009246321,0.020641,0.030517,-1.082644e-24,0.012748,0.021435,0.015964,0.012427,0.027076,0.01246135,-0.03627206,0.026319,0.031896,-0.001751,0.09382892,-0.000278
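
As a sanity check on the transpose, the following sketch computes the user-by-user correlation on a small made-up matrix with pandas and compares it against `numpy.corrcoef`, which treats each row as a variable by default. The `filled` data is invented and has no missing values.

```python
import numpy as np
import pandas as pd

# Made-up, already-filled user x movie matrix
filled = pd.DataFrame(
    [[4.0, 5.0, 4.5], [3.0, 2.0, 1.0], [2.5, 2.5, 2.0]],
    index=pd.Index([1, 2, 3], name="userId"),
    columns=["A (2000)", "B (2001)", "C (2002)"],
)

# corr() correlates columns, so transpose to make every user a column
user_corr = filled.T.corr()

# NumPy reference: rows are variables (users) by default
reference = np.corrcoef(filled.to_numpy())

print(user_corr)
```

The result is square with one row and one column per user, and every user correlates perfectly with itself on the diagonal.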


Now let us look at user 1 only. We want to see which other users rate movies most like user 1 does. Here, userId 301 is the most similar to userId 1, with the highest non-self correlation of 0.125.

In [None]:
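# Correlations between user 1 and every user, looked up by userId label
user1corr = usercorrelation.loc[1]
# Drop undefined correlations and sort into a new Series rather than in place
user1corr = user1corr.dropna().sort_values(ascending=False)
user1corr.head(4)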

userId
1      1.000000
301    0.124799
597    0.102631
414    0.101348
Name: 1, dtype: float64

Now for the fun part of the recommendation system. Suppose we notice that user 1 has not rated **movieId 32: 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)'**. I will demonstrate how to estimate that rating from the ratings of similar users, and recommend the movie to user 1 if the estimate exceeds a threshold.

The query below returns no rows, confirming that user 1 has no rating for it.

In [None]:
movieratings[ (movieratings['userId'] == 1) & (movieratings['movieId'] == 32) ]

Unnamed: 0,userId,movieId,rating,title,genres


First I take the 100 users most similar to user 1. The slice starts at position 1 because position 0 is user 1 itself. Of these 100 users, 56 have rated movieId 32: 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', so we can use those 56 users to estimate user 1's rating for it.

In [None]:
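# The 100 users most correlated with user 1 (position 0 is user 1 itself)
hundreduserslikeuser1 = user1corr[1:101].keys()
# userIds of everyone who rated movieId 32
ratedmovie32 = set(movieratings.loc[movieratings['movieId'] == 32, 'userId'])
# Keep the similar users who rated it, in order of decreasing similarity
users = [user for user in hundreduserslikeuser1 if user in ratedmovie32]
count = len(users)
print(count)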

56


Using these 56 users, I multiply each user's rating of the movie by that user's similarity to user 1, add up the products, and divide by the sum of the similarities. This similarity-weighted average gives user 1 an estimated **rating of 4.01 stars**. That is above 4, so the movie is a reasonable candidate to recommend. This is how naive User-User CF works.

In [None]:
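sum_similarity = 0
weighted_ratings = 0
# Ratings of movieId 32 indexed by userId (each user rates a movie at most once)
movie32ratings = movieratings[movieratings['movieId'] == 32].set_index('userId')['rating']
for user in users:
    similarity = user1corr.loc[user]
    weighted_ratings += similarity * movie32ratings.loc[user]  # similarity-weighted rating
    sum_similarity += similarity

# Weighted average: sum(similarity * rating) / sum(similarity)
print(weighted_ratings / sum_similarity)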

4.012647949637183
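
The same calculation can be wrapped in a small function. This is a sketch, not the notebook's own code: `predict_rating` is a hypothetical helper, and the similarity and rating values below are made up so the arithmetic is easy to follow by hand.

```python
import pandas as pd


def predict_rating(similarities: pd.Series, neighbour_ratings: pd.Series) -> float:
    """Similarity-weighted average of the neighbours' ratings.

    Both Series are indexed by userId; only users present in both are used.
    """
    common = similarities.index.intersection(neighbour_ratings.index)
    weights = similarities.loc[common]
    return float((weights * neighbour_ratings.loc[common]).sum() / weights.sum())


# Made-up similarities and ratings for three neighbours
toy_similarities = pd.Series({301: 0.5, 597: 0.3, 414: 0.2})
toy_ratings = pd.Series({301: 5.0, 597: 4.0, 414: 3.0})

# (0.5*5.0 + 0.3*4.0 + 0.2*3.0) / (0.5 + 0.3 + 0.2)
print(predict_rating(toy_similarities, toy_ratings))
```

Dividing by the plain sum of similarities behaves well only while the weights are positive. A common variant divides by the sum of absolute similarities and averages each neighbour's deviation from their own mean rating instead of the raw rating; that variant is not what this notebook does.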


**Item-Item CF**: This approach looks for movies similar to one the user already prefers, that is, movies that users rate in a similar way to it.

![picture](https://miro.medium.com/max/609/1*4-c4LZRDJVFXBzWiRpaK4A.png)

This helps recommend more movies that the user is likely to enjoy. The method closely follows the one used above. Say we want to find the movies rated most similarly to Jurassic Park (1993).

![picture](https://a.ltrbxd.com/resized/sm/upload/8x/wj/zt/pp/jurassic-park-1200-1200-675-675-crop-000000.jpg?k=e43b153360)

I start again by creating the pivot table of users against movie titles, but this time I fill the NaN values with the **mean of each movie** rather than the mean of each user.

In [None]:
ratingscollection = movieratings.pivot_table(index = 'userId', columns = 'title', values = 'rating')
ratingscollection = ratingscollection.apply(lambda col : col.fillna(col.mean()), axis=0)
ratingscollection.head(5)

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...All the Marbles (1981),...And Justice for All (1979),00 Schneider - Jagd auf Nihil Baxter (1994),1-900 (06) (1994),10 (1979),10 Cent Pistol (2015),10 Cloverfield Lane (2016),10 Items or Less (2006),10 Things I Hate About You (1999),10 Years (2011),"10,000 BC (2008)",100 Girls (2000),100 Streets (2016),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),101 Dalmatians II: Patch's London Adventure (2003),101 Reykjavik (101 Reykjavík) (2000),102 Dalmatians (2000),10th & Wolf (2006),"10th Kingdom, The (2000)","10th Victim, The (La decima vittima) (1965)","11'09""01 - September 11 (2002)",11:14 (2003),"11th Hour, The (2007)",12 Angry Men (1957),12 Angry Men (1997),12 Chairs (1971),12 Chairs (1976),12 Rounds (2009),12 Years a Slave (2013),...,Zathura (2005),Zatoichi and the Chest of Gold (Zatôichi senryô-kubi) (Zatôichi 6) (1964),Zazie dans le métro (1960),Zebraman (2004),"Zed & Two Noughts, A (1985)",Zeitgeist: Addendum (2008),Zeitgeist: Moving Forward (2011),Zeitgeist: The Movie (2007),Zelary (2003),Zelig (1983),Zero Dark Thirty (2012),Zero Effect (1998),"Zero Theorem, The (2013)",Zero de conduite (Zero for Conduct) (Zéro de conduite: Jeunes diables au collège) (1933),Zeus and Roxanne (1997),Zipper (2015),Zodiac (2007),Zombeavers (2014),Zombie (a.k.a. Zombie 2: The Dead Are Among Us) (Zombi 2) (1979),Zombie Strippers! (2008),Zombieland (2009),Zone 39 (1997),"Zone, The (La Zona) (2007)",Zookeeper (2011),Zoolander (2001),Zoolander 2 (2016),Zoom (2006),Zoom (2015),Zootopia (2016),Zulu (1964),Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
1,4.0,4.0,3.5,5.0,4.0,1.5,3.176471,3.0,3.666667,3.285714,2.0,3.166667,4.5,4.0,3.375,1.25,3.678571,2.666667,3.527778,3.5,2.705882,3.25,2.5,3.074468,3.431818,2.5,3.5,2.777778,4.5,2.75,4.0,4.0,3.75,4.0,4.149123,5.0,4.5,5.0,3.0,3.625,...,3.375,4.0,3.0,3.5,3.0,4.5,5.0,3.75,4.0,4.0,4.107143,3.966667,3.0,4.0,1.0,2.0,3.710526,2.5,4.0,0.5,3.877358,2.0,3.5,2.75,3.509259,2.5,2.5,2.5,3.890625,4.0,1.5,4.0625,3.666667,3.0,3.0,3.863636,2.770833,2.0,4.0,1.0
2,4.0,4.0,3.5,5.0,4.0,1.5,3.176471,3.0,3.666667,3.285714,2.0,3.166667,4.5,4.0,3.375,1.25,3.678571,2.666667,3.527778,3.5,2.705882,3.25,2.5,3.074468,3.431818,2.5,3.5,2.777778,4.5,2.75,4.0,4.0,3.75,4.0,4.149123,5.0,4.5,5.0,3.0,3.625,...,3.375,4.0,3.0,3.5,3.0,4.5,5.0,3.75,4.0,4.0,4.107143,3.966667,3.0,4.0,1.0,2.0,3.710526,2.5,4.0,0.5,3.0,2.0,3.5,2.75,3.509259,2.5,2.5,2.5,3.890625,4.0,1.5,4.0625,3.666667,3.0,3.0,3.863636,2.770833,2.0,3.134615,1.0
3,4.0,4.0,3.5,5.0,4.0,1.5,3.176471,3.0,3.666667,3.285714,2.0,3.166667,4.5,4.0,3.375,1.25,3.678571,2.666667,3.527778,3.5,2.705882,3.25,2.5,3.074468,3.431818,2.5,3.5,2.777778,4.5,2.75,4.0,4.0,3.75,4.0,4.149123,5.0,4.5,5.0,3.0,3.625,...,3.375,4.0,3.0,3.5,3.0,4.5,5.0,3.75,4.0,4.0,4.107143,3.966667,3.0,4.0,1.0,2.0,3.710526,2.5,4.0,0.5,3.877358,2.0,3.5,2.75,3.509259,2.5,2.5,2.5,3.890625,4.0,1.5,4.0625,3.666667,3.0,3.0,3.863636,2.770833,2.0,3.134615,1.0
4,4.0,4.0,3.5,5.0,4.0,1.5,3.176471,3.0,3.666667,3.285714,2.0,3.166667,4.5,4.0,3.375,1.25,3.678571,2.666667,3.527778,3.5,2.705882,3.25,2.5,3.074468,3.431818,2.5,3.5,2.777778,4.5,2.75,4.0,4.0,3.75,4.0,5.0,5.0,4.5,5.0,3.0,3.625,...,3.375,4.0,3.0,3.5,3.0,4.5,5.0,3.75,4.0,4.0,4.107143,3.966667,3.0,4.0,1.0,2.0,3.710526,2.5,4.0,0.5,3.877358,2.0,3.5,2.75,3.509259,2.5,2.5,2.5,3.890625,4.0,1.5,4.0625,3.666667,3.0,3.0,3.863636,2.770833,2.0,3.134615,1.0
5,4.0,4.0,3.5,5.0,4.0,1.5,3.176471,3.0,3.666667,3.285714,2.0,3.166667,4.5,4.0,3.375,1.25,3.678571,2.666667,3.527778,3.5,2.705882,3.25,2.5,3.074468,3.431818,2.5,3.5,2.777778,4.5,2.75,4.0,4.0,3.75,4.0,4.149123,5.0,4.5,5.0,3.0,3.625,...,3.375,4.0,3.0,3.5,3.0,4.5,5.0,3.75,4.0,4.0,4.107143,3.966667,3.0,4.0,1.0,2.0,3.710526,2.5,4.0,0.5,3.877358,2.0,3.5,2.75,3.509259,2.5,2.5,2.5,3.890625,4.0,1.5,4.0625,3.666667,3.0,3.0,3.863636,2.770833,2.0,3.134615,1.0


Compute the Pearson correlation matrix between movies, keep only the column for Jurassic Park, and sort it from highest to lowest. Then drop all null values.

In [None]:
JPcorrelation = ratingscollection.corr()
JPcorrelation = JPcorrelation['Jurassic Park (1993)']
JPcorrelation = JPcorrelation.sort_values(ascending=False)
JPcorrelation.dropna(inplace=True)
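
Since only one column of the movie-by-movie matrix is kept, it may be cheaper to correlate every movie against that single column directly with `DataFrame.corrwith`. The sketch below checks on made-up data that both routes agree; the two "hypothetical" movies and their ratings are invented for illustration.

```python
import pandas as pd

# Made-up, already-filled user x movie matrix
filled = pd.DataFrame({
    "Jurassic Park (1993)": [5.0, 4.0, 2.0, 3.0],
    "Movie X (hypothetical)": [4.5, 4.0, 2.5, 3.0],
    "Movie Y (hypothetical)": [1.0, 2.0, 5.0, 4.0],
})
target = "Jurassic Park (1993)"

# Route used above: full correlation matrix, then pick one column
via_full_matrix = filled.corr()[target]

# Alternative: correlate each column with the target column only
via_corrwith = filled.corrwith(filled[target])

print(via_corrwith.sort_values(ascending=False))
```

On the real matrix the alternative would read `ratingscollection.corrwith(ratingscollection['Jurassic Park (1993)'])`, which skips computing all the movie pairs that are thrown away afterwards.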

And here are the 10 movies most similarly rated to Jurassic Park (1993), with their correlation values. The first of the 11 rows is Jurassic Park itself. Enjoy.

In [None]:
JPcorrelation.head(11)

title
Jurassic Park (1993)                           1.000000
Fugitive, The (1993)                           0.324717
Lethal Weapon (1987)                           0.318646
Independence Day (a.k.a. ID4) (1996)           0.263629
Mission: Impossible (1996)                     0.258080
Ghostbusters (a.k.a. Ghost Busters) (1984)     0.256527
Mulan (1998)                                   0.255672
Rise of the Planet of the Apes (2011)          0.248134
Bug's Life, A (1998)                           0.240964
Indiana Jones and the Temple of Doom (1984)    0.239826
Die Hard (1988)                                0.239294
Name: Jurassic Park (1993), dtype: float64
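
If you want exactly the top N without the query movie in the list, a small helper can drop it before taking the largest values. `top_similar` is a hypothetical function name; the example Series reuses four correlation values from the output above.

```python
import pandas as pd


def top_similar(correlations: pd.Series, title: str, n: int = 10) -> pd.Series:
    """Return the n highest correlations, excluding the query title itself."""
    return correlations.drop(labels=title).nlargest(n)


# A few values copied from the output above
example = pd.Series({
    "Jurassic Park (1993)": 1.000000,
    "Mulan (1998)": 0.255672,
    "Fugitive, The (1993)": 0.324717,
    "Lethal Weapon (1987)": 0.318646,
})

print(top_similar(example, "Jurassic Park (1993)", n=2))
```

With the notebook's variables this would be called as `top_similar(JPcorrelation, 'Jurassic Park (1993)')`.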