We will build a basic recommendation system using Python and pandas.
In this notebook, we focus on suggesting the items (here, movies) that are most similar to a particular item.
Keep in mind that this is not a robust, full-fledged recommendation system. More accurately, it only tells you which movies are most similar to your movie choice.

# Importing Libraries

In [1]:
import numpy as np
import pandas as pd

# Importing Datasets From Excel Files

In [2]:
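#raw strings (r"...") keep the backslashes in the Windows path from being read as escape sequences
movies=pd.read_excel(r"D:\CV\Projects present in CV\Movie recommender system/movies.xlsx")
ratings=pd.read_excel(r"D:\CV\Projects present in CV\Movie recommender system/ratings.xlsx",sheet_name="ratings")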

In [3]:
#dropping the extra columns
#during data cleaning I added some extra columns to try a genre-based recommendation, so they have to be dropped here
movies=movies.drop(['Genres','Genere','Genere2','Genere3','Genere4','Column1'],axis=1)

In [4]:
#this table gives the movieId and the title for each movie
#the remaining columns flag the genres each movie belongs to
movies.head()

Unnamed: 0,movieId,title,Adventure,Animation,Children,Comedy,Fantasy,Romance,Crime,Thriller,Drama,Horror,Sci-Fi,Mystery,War,Musical,Action,Documentary,NO Genre
0,1,Toy Story (1995),True,True,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False
1,2,Jumanji (1995),True,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False
2,3,Grumpier Old Men (1995),False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False
3,4,Waiting to Exhale (1995),False,False,False,True,False,True,False,False,True,False,False,False,False,False,False,False,False
4,5,Father of the Bride Part II (1995),False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False


In [5]:
#the userId column identifies the users who rated the movies in the data set
#movieId links each rating to a movie in the movies table
ratings.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


In [6]:
#merging the two dataframes on their shared movieId column
ratings=pd.merge(movies,ratings,on='movieId')
#keeping only movieId, title, userId and rating
ratings=ratings.iloc[:,[0,1,-2,-1]]
ratings

Unnamed: 0,movieId,title,userId,rating
0,1,Toy Story (1995),1,4.0
1,1,Toy Story (1995),5,4.0
2,1,Toy Story (1995),7,4.5
3,1,Toy Story (1995),15,2.5
4,1,Toy Story (1995),17,4.5
...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),184,4.0
100832,193583,No Game No Life: Zero (2017),184,3.5
100833,193585,Flint (2017),184,3.5
100834,193587,Bungo Stray Dogs: Dead Apple (2018),184,3.5
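
As a minimal sketch of what the merge above does, the toy frames below (made-up rows, not the notebook's data) are joined on `movieId`, the key the two tables share. The names `toy_movies`, `toy_ratings` and `toy_merged` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Toy stand-ins for the two tables (hypothetical rows, for illustration only)
toy_movies = pd.DataFrame({"movieId": [1, 2], "title": ["Movie A", "Movie B"]})
toy_ratings = pd.DataFrame(
    {"userId": [10, 11, 10], "movieId": [1, 1, 2], "rating": [4.0, 3.5, 5.0]}
)

# Inner join on the shared key: every rating row picks up its movie title
toy_merged = pd.merge(toy_movies, toy_ratings, on="movieId")
print(toy_merged)
```

Each rating row appears once in the result, with the matching title attached, which is the long format the pivot step needs.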


In [7]:
#making a pivot table, similar to a pivot table in Excel
#to build a correlation table, the data must be in this format: one row per user, one column per movie
user_ratings=ratings.pivot_table(index=['userId'],columns=['title'],values='rating')
user_ratings

userId,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
1,,,,,,,,,,,...,,,,,,,,,4.0,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,,,,,,,,,,,...,,,,,,,,,,
607,,,,,,,,,,,...,,,,,,,,,,
608,,,,,,,,,,,...,,,,,,4.5,3.5,,,
609,,,,,,,,,,,...,,,,,,,,,,
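
The reshaping done by `pivot_table` can be seen on a few made-up rows. This is only a sketch: `toy_long` and `toy_wide` are hypothetical names, and the ratings are invented.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair
toy_long = pd.DataFrame(
    {
        "userId": [1, 1, 2, 3],
        "title": ["Movie A", "Movie B", "Movie A", "Movie B"],
        "rating": [4.0, 2.0, 5.0, 3.0],
    }
)

# One row per user, one column per title; pairs with no rating become NaN
toy_wide = toy_long.pivot_table(index="userId", columns="title", values="rating")
print(toy_wide)
```

User 2 never rated "Movie B", so that cell is NaN. Since most users rate only a small share of the movies, the full table is mostly NaN for the same reason.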


In [8]:
#there are many NaN values
#movies rated by fewer than 10 users have to be dropped, since they may add noise to the data set

In [9]:
#dropping the movies that were rated by fewer than 10 users
#replacing NaN with 0
#the values could have been standardized here, but we didn't do that because there were many 0s
user_ratings=user_ratings.dropna(thresh=10,axis=1).fillna(0)
user_ratings

userId,"'burbs, The (1989)",(500) Days of Summer (2009),10 Cloverfield Lane (2016),10 Things I Hate About You (1999),"10,000 BC (2008)",101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),12 Years a Slave (2013),127 Hours (2010),...,Zack and Miri Make a Porno (2008),Zero Dark Thirty (2012),Zero Effect (1998),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zootopia (2016),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
607,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,3.0,0.0,4.5,3.5,0.0
609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
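
To make the effect of `thresh` concrete, here is a small sketch on invented data. It uses `thresh=2` instead of the notebook's 10 so that three users are enough; the frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-movie table with gaps
toy_wide = pd.DataFrame(
    {
        "Movie A": [4.0, 5.0, 3.0],
        "Movie B": [np.nan, 2.0, np.nan],
        "Movie C": [np.nan, 4.0, 1.0],
    },
    index=pd.Index([1, 2, 3], name="userId"),
)

# thresh=2 keeps only the columns with at least 2 non-NaN ratings,
# so "Movie B" (rated once) is dropped
toy_kept = toy_wide.dropna(thresh=2, axis=1)

# The remaining gaps are filled with 0, as in the cell above
toy_filled = toy_kept.fillna(0)
print(toy_filled)
```

Note that `thresh` counts the non-missing values that a column must have in order to be kept, not the number of missing values that is tolerated.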


In [10]:
#using correlation
#here we measure the correlation between movies, based on the ratings given by the users
#we use Pearson's correlation method
item_similarity=user_ratings.corr(method='pearson')
item_similarity

title,"'burbs, The (1989)",(500) Days of Summer (2009),10 Cloverfield Lane (2016),10 Things I Hate About You (1999),"10,000 BC (2008)",101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),12 Years a Slave (2013),127 Hours (2010),...,Zack and Miri Make a Porno (2008),Zero Dark Thirty (2012),Zero Effect (1998),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zootopia (2016),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
"'burbs, The (1989)",1.000000,0.063117,-0.023768,0.143482,0.011998,0.087931,0.224052,0.034223,0.009277,0.008331,...,0.017477,0.032470,0.134701,0.153158,0.101301,0.049897,0.003233,0.187953,0.062174,0.353194
(500) Days of Summer (2009),0.063117,1.000000,0.142471,0.273989,0.193960,0.148903,0.142141,0.159756,0.135486,0.200135,...,0.374515,0.178655,0.068407,0.414585,0.355723,0.252226,0.216007,0.053614,0.241092,0.125905
10 Cloverfield Lane (2016),-0.023768,0.142471,1.000000,-0.005799,0.112396,0.006139,-0.016835,0.031704,-0.024275,0.272943,...,0.242663,0.099059,-0.023477,0.272347,0.241751,0.195054,0.319371,0.177846,0.096638,0.002733
10 Things I Hate About You (1999),0.143482,0.273989,-0.005799,1.000000,0.244670,0.223481,0.211473,0.011784,0.091964,0.043383,...,0.243118,0.104858,0.132460,0.091853,0.158637,0.281934,0.050031,0.121029,0.130813,0.110612
"10,000 BC (2008)",0.011998,0.193960,0.112396,0.244670,1.000000,0.234459,0.119132,0.059187,-0.025882,0.089328,...,0.260261,0.087592,0.094913,0.184521,0.242299,0.240231,0.094773,0.088045,0.203002,0.083518
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zoolander (2001),0.049897,0.252226,0.195054,0.281934,0.240231,0.184324,0.274260,0.122107,0.017351,0.091416,...,0.304364,0.243820,-0.006269,0.242033,0.299522,1.000000,0.108147,0.097147,0.338034,0.109455
Zootopia (2016),0.003233,0.216007,0.319371,0.050031,0.094773,0.054024,0.077594,0.056742,0.063325,0.225747,...,0.286213,0.156603,0.011418,0.214385,0.298504,0.108147,1.000000,0.046885,0.200762,0.020595
eXistenZ (1999),0.187953,0.053614,0.177846,0.121029,0.088045,0.047804,0.085606,-0.001708,0.002528,0.128638,...,0.088202,0.028566,0.167541,0.145741,0.068763,0.097147,0.046885,1.000000,0.163022,0.138611
xXx (2002),0.062174,0.241092,0.096638,0.130813,0.203002,0.156932,0.248820,0.074306,0.037469,0.153335,...,0.271180,0.193624,0.080585,0.209840,0.203285,0.338034,0.200762,0.163022,1.000000,0.065673


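The table shows how the movies are correlated with each other: every pair of movies gets a Pearson coefficient between -1 and 1, computed from the users' ratings.
Movies of the same kind tend to be correlated with each other, for example action movies with other action movies and romantic movies with other romantic movies.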
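
One point worth checking is how the 0-fill interacts with Pearson correlation: after `fillna(0)`, "not rated" is treated like a rating of 0. The sketch below compares this with an alternative that is not used in this notebook, namely leaving the NaN values in place so that `corr` only compares users who rated both movies. The data is invented and `min_periods=3` is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings of two movies by five users
toy = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, np.nan, 2.0],
        "Movie B": [4.0, 5.0, 2.0, 1.0, np.nan],
    }
)

# Approach used in the notebook: unrated cells become 0 before correlating
corr_zero_filled = toy.fillna(0).corr(method="pearson").loc["Movie A", "Movie B"]

# Alternative: keep NaN so only users who rated both movies are compared;
# min_periods=3 requires at least 3 such users, otherwise the result is NaN
corr_pairwise = toy.corr(method="pearson", min_periods=3).loc["Movie A", "Movie B"]

print(round(corr_zero_filled, 3), round(corr_pairwise, 3))
```

The two coefficients differ, so the choice between the two treatments can change which movies come out as "most similar".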

In [13]:
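#getting recommendations based on the item-to-item correlation of ratings

#this function weights a movie's correlations by the user's rating and returns them sorted in descending order
#a rating above 2.5 gives a positive weight, a rating below 2.5 gives a negative weight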
def get_similar_movies(movie_name,user_rating):
    similar_score = item_similarity[movie_name]*(user_rating-2.5)
    similar_score = similar_score.sort_values(ascending=False)
    
    return similar_score
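
The role of the `(user_rating-2.5)` weight is easier to see on a tiny example. The sketch below uses an invented 3x3 similarity matrix and a variant of the function that takes the matrix as an argument; `score_similar`, `toy_similarity` and the `midpoint` parameter are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical 3x3 similarity matrix (values invented for illustration)
titles = ["Movie A", "Movie B", "Movie C"]
toy_similarity = pd.DataFrame(
    [[1.0, 0.6, -0.2], [0.6, 1.0, 0.1], [-0.2, 0.1, 1.0]],
    index=titles,
    columns=titles,
)


def score_similar(similarity, movie_name, user_rating, midpoint=2.5):
    """Weight one movie's similarity column by the rating's distance from the midpoint."""
    scores = similarity[movie_name] * (user_rating - midpoint)
    return scores.sort_values(ascending=False)


liked = score_similar(toy_similarity, "Movie A", 5)     # weight +2.5
disliked = score_similar(toy_similarity, "Movie A", 1)  # weight -1.5
print(liked.index[0], disliked.index[0])
```

With a high rating, the movies most correlated with "Movie A" rise to the top. With a low rating the sign flips, so the movie that is negatively correlated with "Movie A" ranks first.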


In [41]:
#a user who has rated these movies; this user is meant to be an action lover
#more (movie, rating) pairs can be added to the list

In [40]:
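#(movie title, rating) pairs for one example user
action_lover = [("Catch Me If You Can (2002)",1),("10 Things I Hate About You (1999)",2)]

#building a new data frame with one row of similarity scores per rated movie
#(DataFrame.append was removed in pandas 2.0, so the rows are collected in a list)
similar_movies = pd.DataFrame([get_similar_movies(movie,rating) for movie,rating in action_lover])

#adding up the scores over all rated movies and ranking the candidates
all_recommend = similar_movies.sum().sort_values(ascending=False)

#check_seen is not defined anywhere in this notebook as shown;
#this set lookup is a stand-in that skips the movies the user has already rated
seen = {movie for movie,_ in action_lover}

#Series.iteritems was removed in pandas 2.0; items() is its replacement
for movie,score in all_recommend.items():
    if movie not in seen:
        print(movie)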

Disclosure (1994)
Madness of King George, The (1994)
Like Water for Chocolate (Como agua para chocolate) (1992)
Jean de Florette (1986)
Specialist, The (1994)
Juror, The (1996)
Star Trek: Generations (1994)
Sliver (1993)
Nixon (1995)
Paper, The (1994)
Client, The (1994)
Free Willy 2: The Adventure Home (1995)
Crimson Tide (1995)
Nobody's Fool (1994)
Drop Zone (1994)
Diabolique (1996)
Kazaam (1996)
Fatal Instinct (1993)
Postman, The (Postino, Il) (1994)
Body Heat (1981)
Sudden Death (1995)
Exit to Eden (1994)
Don Juan DeMarco (1995)
All Quiet on the Western Front (1930)
Coco (2017)
Another Stakeout (1993)
Mulholland Falls (1996)
Cliffhanger (1993)
Dead Man Walking (1995)
Clear and Present Danger (1994)
Boys on the Side (1995)
Forget Paris (1995)
Mighty Aphrodite (1995)
Quiz Show (1994)
Screamers (1995)
Heavy Metal (1981)
Remains of the Day, The (1993)
City Slickers II: The Legend of Curly's Gold (1994)
Devil in a Blue Dress (1995)
Something to Talk About (1995)
Conquest of the Planet of

Avengers: Age of Ultron (2015)
Clueless (1995)
Neighbors (2014)
Saw II (2005)
Cop Land (1997)
Karate Kid, Part III, The (1989)
Adventures of Priscilla, Queen of the Desert, The (1994)
Lethal Weapon 3 (1992)
Robin Hood: Prince of Thieves (1991)
Final Destination 2 (2003)
Jurassic Park (1993)
Dr. No (1962)
Metropolis (2001)
One, The (2001)
Lethal Weapon 4 (1998)
Sword in the Stone, The (1963)
River Runs Through It, A (1992)
Six Days Seven Nights (1998)
Muppet Treasure Island (1996)
What About Bob? (1991)
Mission: Impossible - Rogue Nation (2015)
To Kill a Mockingbird (1962)
Night at the Museum: Battle of the Smithsonian (2009)
Showtime (2002)
Universal Soldier (1992)
Volcano (1997)
Frozen (2013)
Fox and the Hound, The (1981)
Guardians of the Galaxy 2 (2017)
Sexy Beast (2000)
Marie Antoinette (2006)
Rescuers Down Under, The (1990)
Bolt (2008)
Ugly Truth, The (2009)
Empire of the Sun (1987)
Porky's (1982)
Mystery Men (1999)
Great Escape, The (1963)
Transformers: Revenge of the Fallen (2009
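
Since the loop prints every remaining movie, the output is very long. A possible way to keep only the best few is sketched below; `top_n` and `toy_scores` are hypothetical names, and the scores are invented stand-ins for `all_recommend`.

```python
import pandas as pd


def top_n(scores, seen_titles, n=10):
    """Return the n highest-scoring titles that are not in seen_titles."""
    # errors="ignore" skips titles that are not present in the index
    unseen = scores.drop(labels=list(seen_titles), errors="ignore")
    return unseen.sort_values(ascending=False).head(n)


# Hypothetical summed scores, standing in for all_recommend
toy_scores = pd.Series(
    {"Movie A": 2.5, "Movie B": 1.5, "Movie C": -0.5, "Movie D": 0.9}
)
best = top_n(toy_scores, seen_titles={"Movie A"}, n=2)
print(best)
```

In the notebook, the same helper could be called with `all_recommend` and the titles in `action_lover`, assuming both are defined as in the cells above.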

In [38]:
#https://youtu.be/3ecNC-So0r4
#VIDEO TUTORIAL   
