# Introduction
Recommender Systems:
1. User Based Recommender Systems
1. Item Based Recommender Systems

<br>What is a recommender system?
   * Based on past behaviour, it predicts how likely a user is to prefer an item.
   * For example, Netflix uses a recommender system. It suggests new movies to people according to their past activity, such as the movies they have watched and rated.
   * The purpose of a recommender system is to recommend new things that people have not seen before.
   
<br>
1. User Based Collaborative Filtering
    * Collaborative filtering makes recommendations by combining your own experience with the experiences of other people.
    * First, we build a user vs. item matrix.
        * Each row is a user and each column is an item such as a movie, product, or website.
    * Second, we compute similarity scores between users.
        * Each row is a user, and each row is treated as a vector.
        * Compute the similarity between these rows (users).
    * Third, we find the users who are similar to you based on past behaviour.
    * Finally, the system suggests items that you have not experienced yet but that similar users have.
    * Let's work through an example of user based collaborative filtering.
        * Suppose there are two people.
        * The first one watched two movies: Lord of the Rings and Hobbit.
        * The second one watched only Lord of the Rings.
        * User based collaborative filtering computes the similarity of these two people and sees that both watched Lord of the Rings.
        * It then recommends Hobbit to the second person, as can be seen in the picture.
        * <a href="https://ibb.co/droZMy"><img src="https://preview.ibb.co/feq3EJ/resim_a.jpg" alt="resim_a" border="0"></a>
        
    * User based collaborative filtering has some problems.
        * In this system, each row of the matrix is a user. Comparing users and finding the similarity between them is therefore computationally expensive.
        * Also, people's habits can change over time, so making correct and useful recommendations becomes harder as time passes.
    * To address these problems, let's look at another recommender system: item based collaborative filtering.
1. Item Based Collaborative Filtering
    * In this system, instead of finding relationships between users, the items themselves, such as movies or products, are compared with each other.
    * In user based recommender systems, users' habits can change, which makes recommendation hard. In item based recommender systems, however, movies and products do not change, so recommendation is easier.
    * In addition, there are almost 7 billion people in the world, so comparing people requires a lot of computation. Comparing items requires less, because there are typically far fewer items than people.
    * In item based recommender systems, we build the same user vs. item matrix that we use in user based recommender systems.
        * Each row is a user and each column is an item such as a movie, product, or website.
        * This time, however, instead of calculating the similarity between rows, we calculate the similarity between columns, which are the items.
    * Let's look at how it works.
        * First, Lord of the Rings and Hobbit are similar because three different people liked both of them. This gives a similarity score between the two movies.
        * If the similarity is high enough, we can recommend Hobbit to other people who have only watched Lord of the Rings, as can be seen in the figure below.
        * <a href="https://imgbb.com/"><img src="https://image.ibb.co/maEQdd/resim_b.jpg" alt="resim_b" border="0"></a>
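
The two approaches above differ only in which axis of the user vs. item matrix is compared. A minimal sketch on a hypothetical toy matrix follows; the user names, ratings, and the `cosine_similarity_matrix` helper are assumptions made for illustration. It uses cosine similarity and treats "not rated" as 0, which is a simplification and not the method used later in this notebook (Pearson correlation on a pivot table with missing values).

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies, 0 means "not rated".
ratings = pd.DataFrame(
    {
        "Lord of the Rings": [5, 4, 5, 0],
        "Hobbit": [5, 0, 4, 0],
        "Titanic": [0, 0, 1, 5],
    },
    index=["user_a", "user_b", "user_c", "user_d"],
)


def cosine_similarity_matrix(matrix: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity between every pair of rows of `matrix` (illustrative helper)."""
    values = matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)  # assumes no all-zero row
    normalized = values / norms
    return pd.DataFrame(
        normalized @ normalized.T, index=matrix.index, columns=matrix.index
    )


# User based: compare rows (users) with each other.
user_similarity = cosine_similarity_matrix(ratings)

# Item based: compare columns (movies) with each other by transposing first.
item_similarity = cosine_similarity_matrix(ratings.T)

print(user_similarity.round(2))
print(item_similarity.round(2))
```

In this toy matrix the movie most similar to Lord of the Rings is Hobbit, so an item based system would suggest Hobbit to `user_b`, who has only rated Lord of the Rings.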




In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

['tag.csv', 'movie.csv', 'link.csv', 'genome_scores.csv', 'rating.csv', 'genome_tags.csv']


In [2]:
# import movie data set and look at columns
movie = pd.read_csv("../input/movie.csv")
movie.columns

Index(['movieId', 'title', 'genres'], dtype='object')

In [3]:
# we only need the movie id and title
movie = movie.loc[:,["movieId","title"]]
movie.head(10)

| | movieId | title |
|---|---|---|
| 0 | 1 | Toy Story (1995) |
| 1 | 2 | Jumanji (1995) |
| 2 | 3 | Grumpier Old Men (1995) |
| 3 | 4 | Waiting to Exhale (1995) |
| 4 | 5 | Father of the Bride Part II (1995) |
| 5 | 6 | Heat (1995) |
| 6 | 7 | Sabrina (1995) |
| 7 | 8 | Tom and Huck (1995) |
| 8 | 9 | Sudden Death (1995) |
| 9 | 10 | GoldenEye (1995) |


In [4]:
# import rating data and look at columns
rating = pd.read_csv("../input/rating.csv")
rating.columns

Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')

In [5]:
# we only need the user id, movie id and rating
rating = rating.loc[:,["userId","movieId","rating"]]
rating.head(10)

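| | userId | movieId | rating |
|---|---|---|---|
| 0 | 1 | 2 | 3.5 |
| 1 | 1 | 29 | 3.5 |
| 2 | 1 | 32 | 3.5 |
| 3 | 1 | 47 | 3.5 |
| 4 | 1 | 50 | 3.5 |
| 5 | 1 | 112 | 3.5 |
| 6 | 1 | 151 | 4.0 |
| 7 | 1 | 223 | 4.0 |
| 8 | 1 | 253 | 4.0 |
| 9 | 1 | 260 | 4.0 |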


In [6]:
# then merge movie and rating data on their shared movieId column
data = pd.merge(movie, rating, on="movieId")

In [7]:
# now let's look at our data
data.head(10)

| | movieId | title | userId | rating |
|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 3 | 4.0 |
| 1 | 1 | Toy Story (1995) | 6 | 5.0 |
| 2 | 1 | Toy Story (1995) | 8 | 4.0 |
| 3 | 1 | Toy Story (1995) | 10 | 4.0 |
| 4 | 1 | Toy Story (1995) | 11 | 4.5 |
| 5 | 1 | Toy Story (1995) | 12 | 4.0 |
| 6 | 1 | Toy Story (1995) | 13 | 4.0 |
| 7 | 1 | Toy Story (1995) | 14 | 4.5 |
| 8 | 1 | Toy Story (1995) | 16 | 3.0 |
| 9 | 1 | Toy Story (1995) | 19 | 5.0 |


* As can be seen in the data frame above, we have 4 features: movie id, title, user id, and rating.
* Using this data frame, we will build an item based recommender system.
* Let's look at the shape of the data. The data frame has about 20 million rows, which is too many. It can cause problems on Kaggle and even in desktop IDEs such as Spyder or PyCharm.
* Therefore, to learn how item based recommendation works, let's use only the first 1 million rows of the data.

In [8]:
data.shape

(20000263, 4)

In [9]:
data = data.iloc[:1000000,:]
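
The slice above keeps the first 1 million rows. Because the merged frame appears to be ordered by `movieId` (see `data.head(10)`), those rows cover only the earliest movies in the catalogue. The sketch below is not part of the original notebook; it compares that slice with a random sample from `DataFrame.sample` on a hypothetical toy frame. The `toy_data` values and `sample_size` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the merged `data` frame, ordered by movieId like the real one.
toy_data = pd.DataFrame(
    {
        "movieId": [1, 1, 1, 2, 2, 3, 3, 3],
        "title": ["Movie A"] * 3 + ["Movie B"] * 2 + ["Movie C"] * 3,
        "userId": [1, 2, 3, 1, 3, 2, 3, 4],
        "rating": [4.0, 5.0, 3.0, 2.0, 4.5, 3.5, 5.0, 1.0],
    }
)

sample_size = 4  # stands in for the 1,000,000 rows used in the notebook

# What the notebook does: keep the first rows, which only reach the first movies.
first_rows = toy_data.iloc[:sample_size, :]

# Alternative: draw rows at random so that any movie can appear in the subset.
random_rows = toy_data.sample(n=sample_size, random_state=42)

print(first_rows["title"].nunique(), "distinct movies in the first rows")
print(random_rows["title"].nunique(), "distinct movies in the random sample")
```

There is a trade-off: a random sample of the real data would touch many more distinct movies and users, so the resulting pivot table would likely be much wider and need more memory than the one built from the first rows.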

In [10]:
# make a pivot table whose rows are users, columns are movies, and values are ratings
pivot_table = data.pivot_table(index = ["userId"],columns = ["title"],values = "rating")
pivot_table.head(10)

title,Ace Ventura: When Nature Calls (1995),Across the Sea of Time (1995),"Amazing Panda Adventure, The (1995)","American President, The (1995)",Angela (1995),Angels and Insects (1995),Anne Frank Remembered (1995),Antonia's Line (Antonia) (1995),Assassins (1995),Babe (1995),Bad Boys (1995),Balto (1995),"Basketball Diaries, The (1995)",Beautiful Girls (1996),Bed of Roses (1996),Before and After (1996),Big Bully (1996),"Big Green, The (1995)",Bio-Dome (1996),"Birdcage, The (1996)",Black Sheep (1996),Boomerang (1992),Bottle Rocket (1996),"Boys of St. Vincent, The (1992)",Braveheart (1995),"Bridges of Madison County, The (1995)",Broken Arrow (1996),"Brothers McMullen, The (1995)",Carrington (1995),Casino (1995),Catwalk (1996),Chungking Express (Chung Hing sam lam) (1994),City Hall (1996),"City of Lost Children, The (Cité des enfants perdus, La) (1995)",Clueless (1995),"Confessional, The (Confessionnal, Le) (1995)",Copycat (1995),"Crossing Guard, The (1995)","Cry, the Beloved Country (1995)",Cutthroat Island (1995),...,Persuasion (1995),Pie in the Sky (1996),Pocahontas (1995),"Postman, The (Postino, Il) (1994)",Powder (1995),Race the Sun (1996),Restoration (1995),Richard III (1995),Rumble in the Bronx (Hont faan kui) (1995),Sabrina (1995),Screamers (1995),Sense and Sensibility (1995),Seven (a.k.a. Se7en) (1995),Shadows (Cienie) (1988),Shanghai Triad (Yao a yao yao dao waipo qiao) (1995),Shopping (1994),"Silences of the Palace, The (Saimt el Qusur) (1994)",Sonic Outlaws (1995),"Star Maker, The (Uomo delle stelle, L') (1995)","Steal Big, Steal Little (1995)",Sudden Death (1995),Target (1995),Taxi Driver (1976),Things to Do in Denver When You're Dead (1995),To Die For (1995),Tom and Huck (1995),Toy Story (1995),Twelve Monkeys (a.k.a. 
12 Monkeys) (1995),Two Bits (1995),Two if by Sea (1996),Unforgettable (1996),Up Close and Personal (1996),"Usual Suspects, The (1995)",Vampire in Brooklyn (1995),Waiting to Exhale (1995),When Night Is Falling (1995),"White Balloon, The (Badkonake sefid) (1995)",White Squall (1996),Wings of Courage (1995),"Young Poisoner's Handbook, The (1995)"
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.5,,,,,,,...,,,,,,,,,3.5,,,,3.5,,,,,,,,,,,,,,,3.5,,,,,3.5,,,,,,,
2,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,4.0,4.0,,,,,5.0,,,,,,,
4,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,,,,
5,,,,5.0,,,,,,,,,,,,,,,,5.0,,,,,4.0,,,,,,,,,,,,,,,,...,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,
6,,,,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,5.0,,5.0,,,,,,,,,,,,,,,5.0,,,,,4.0,,,,,,,,
7,,,,4.0,,,,,,,,,,,,,,,,,,2.0,,,,2.0,,,,3.0,,,,,,,,,,2.0,...,,,,,3.0,,,,,3.0,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,1.0,,,,,,,,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,3.0,,,,,,...,,,4.0,,,,,,,,,,5.0,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,
10,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,
11,3.5,,,,,,,,,,3.0,,,,,,,,2.0,,,,,,4.0,,,,,,,,,,4.5,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,4.5,5.0,,,,,,,,,,,,


* As can be seen from the table above, rows are users, columns are movies, and values are ratings.
* For example, user 11 gave a rating of 3.5 to "Ace Ventura: When Nature Calls (1995)" and a rating of 3.0 to "Bad Boys (1995)".
* Now let's set up a scenario: we run a movie website, and "Bad Boys (1995)" has been watched and rated by people. The question is which movies we should recommend to the people who watched "Bad Boys (1995)".
* To answer this question, we will find the similarity between "Bad Boys (1995)" and the other movies.
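
The similarity measure used here is the Pearson correlation between two movie columns, computed over the users who rated both movies. A small sketch on a hypothetical toy pivot table may make this concrete; the movie names and ratings are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy pivot table: rows are users, columns are movies, NaN means "not rated".
toy_pivot = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, np.nan, 1.0, 2.0],
        "Movie B": [4.0, 5.0, 3.0, np.nan, 1.0],
        "Movie C": [1.0, np.nan, 5.0, 4.0, 5.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

target = toy_pivot["Movie A"]

# corrwith correlates every column with `target`, ignoring users with a missing rating.
similarity = toy_pivot.corrwith(target)

# The same value for one pair, computed by hand on the users who rated both movies.
both_rated = toy_pivot[["Movie A", "Movie B"]].dropna()
by_hand = np.corrcoef(both_rated["Movie A"], both_rated["Movie B"])[0, 1]

print(similarity)
print(by_hand)
```

Movie B is rated similarly to Movie A by the users they share, so its correlation is positive, while Movie C is rated in the opposite direction and gets a negative correlation.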

In [11]:
movie_watched = pivot_table["Bad Boys (1995)"]
similarity_with_other_movies = pivot_table.corrwith(movie_watched)  # find correlation between "Bad Boys (1995)" and other movies
similarity_with_other_movies = similarity_with_other_movies.sort_values(ascending=False)
similarity_with_other_movies.head()

title
Bad Boys (1995)                        1.000000
Headless Body in Topless Bar (1995)    0.723747
Last Summer in the Hamptons (1995)     0.607554
Two Bits (1995)                        0.507008
Shadows (Cienie) (1988)                0.494186
dtype: float64

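* Based on these correlations, the movie to recommend to people who watched "Bad Boys (1995)" is "Headless Body in Topless Bar (1995)", the most similar title after "Bad Boys (1995)" itself.
* Although we do not take it into account here, the number of ratings for each movie is also important: a correlation computed from only a few ratings is not reliable.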
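
One hedged way to take the number of ratings into account is to count, for each movie, how many users rated both it and the watched movie, and to keep only movies above a threshold. The sketch below uses a hypothetical toy pivot table; the movie names, the ratings, and the `min_common_ratings` threshold are assumptions, not values from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy pivot table: "Obscure" was rated by only two users.
toy_pivot = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 2.0, 1.0, 3.0, 4.0],
        "Popular": [4.0, 5.0, 1.0, 2.0, 3.0, 5.0],
        "Obscure": [5.0, np.nan, np.nan, 1.0, np.nan, np.nan],
    }
)

target = toy_pivot["Target"]
min_common_ratings = 3  # hypothetical threshold; tune it for real data

correlation = toy_pivot.corrwith(target)

# For each movie, count the users who rated both that movie and the target.
common_ratings = toy_pivot[target.notna()].count()

summary = pd.DataFrame(
    {"correlation": correlation, "common_ratings": common_ratings}
)

# Keep only movies with enough shared raters, drop the target itself, sort by similarity.
reliable = (
    summary[summary["common_ratings"] >= min_common_ratings]
    .drop(index="Target")
    .sort_values("correlation", ascending=False)
)

print(summary)
print(reliable)
```

In this toy data "Obscure" has a perfect correlation with the target, but it rests on only two shared ratings, so the filter removes it and "Popular" becomes the recommendation.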

# Conclusion
What we learned:
* User based recommender systems
* Item based recommender systems
* How to find the correlation or similarity between two vectors
* How to build a very basic movie recommender system
* **If you have any questions, I will be happy to hear them.**