## **Item-Based Collaborative Filtering**

### **Business Problem**

* An online movie-streaming platform (for example, kuzukuzu.tv) wants to build a recommender system based on collaborative filtering.

* Having already tried content-based recommenders, the company now wants recommendations that reflect the opinions of its user community.

* When a user likes a movie, other movies with a similar rating pattern are recommended.
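
The idea in the last bullet can be sketched on a toy example. The ratings, film names, and the `pearson` helper below are invented for illustration; this is a minimal sketch of item-to-item Pearson correlation over users who rated both films, not necessarily how the platform would implement it.

```python
from math import sqrt

# Hypothetical ratings: {movie: {user: rating}}; all values are made up.
ratings = {
    "Movie A": {"u1": 5.0, "u2": 4.0, "u3": 1.0, "u4": 2.0},
    "Movie B": {"u1": 4.5, "u2": 4.0, "u3": 1.5, "u4": 2.5},
    "Movie C": {"u1": 1.0, "u2": 2.0, "u3": 5.0, "u4": 4.0},
}

def pearson(a, b):
    """Pearson correlation of two {user: rating} dicts over their shared users."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return float("nan")  # not enough shared raters to measure anything
    xs = [a[u] for u in common]
    ys = [b[u] for u in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # constant ratings have no defined correlation
    return cov / (sx * sy)

sim_ab = pearson(ratings["Movie A"], ratings["Movie B"])  # liked by the same users
sim_ac = pearson(ratings["Movie A"], ratings["Movie C"])  # liked by opposite users
print(round(sim_ab, 3), round(sim_ac, 3))
```

Movie B would be recommended to someone who liked Movie A because the same users rate both highly, while Movie C shows the opposite pattern.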

### **Dataset Story**

* The dataset is provided by MovieLens.

* It contains movies and the ratings given to those movies.

* The dataset contains roughly 20,000,000 ratings for roughly 27,000 movies.

* Dataset: https://grouplens.org/datasets/movielens/

#### **Step 1: Preparing the Dataset**

In [None]:
import pandas as pd
pd.set_option('display.max_columns', 500)
# Load the MovieLens movie metadata and the user ratings
movie = pd.read_csv('/content/drive/MyDrive/DSMLBC10/week_7 (10.11.22-16.11.22)/datasets/movie_lens_dataset/movie.csv')
rating = pd.read_csv('/content/drive/MyDrive/DSMLBC10/week_7 (10.11.22-16.11.22)/datasets/movie_lens_dataset/rating.csv')
# A left merge keeps every movie, including movies that have no ratings
df = movie.merge(rating, how="left", on="movieId")
df.head()

   movieId             title                                       genres  userId  rating            timestamp
0        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy     3.0     4.0  1999-12-11 13:36:47
1        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy     6.0     5.0  1997-03-13 17:50:52
2        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy     8.0     4.0  1996-06-05 13:37:51
3        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy    10.0     4.0  1999-11-25 02:44:47
4        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy    11.0     4.5  2009-01-02 01:13:41
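
The `userId` values in the output appear as floats (3.0, 6.0, ...). A likely reason is that the left merge keeps movies without any rating, which leaves `NaN` in `userId` and forces the column to a float dtype. The sketch below reproduces this on two made-up frames; `movie_toy` and `rating_toy` are hypothetical stand-ins for the real CSV files.

```python
import pandas as pd

# Toy stand-ins for movie.csv and rating.csv (made-up rows).
movie_toy = pd.DataFrame({"movieId": [1, 2], "title": ["Film A", "Film B"]})
rating_toy = pd.DataFrame(
    {"userId": [10, 11], "movieId": [1, 1], "rating": [4.0, 5.0]}
)

merged = movie_toy.merge(rating_toy, how="left", on="movieId")

# Film B has no ratings, so its userId and rating are NaN after the left merge.
unrated = merged[merged["rating"].isna()]
print(merged)
print(unrated["title"].tolist())
```

If unrated movies are not needed, `how="inner"` would drop them, although the filtering in the next step removes them anyway.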


#### **Step 2: Creating the User-Movie DataFrame**

In [None]:
df.shape

(20000797, 6)

In [None]:
df["title"].nunique()

27262

In [None]:
df["title"].value_counts().head()

Pulp Fiction (1994)                 67310
Forrest Gump (1994)                 66172
Shawshank Redemption, The (1994)    63366
Silence of the Lambs, The (1991)    63299
Jurassic Park (1993)                59715
Name: title, dtype: int64

In [None]:
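# Count ratings per title; indexing the Series directly avoids relying on the column
# name produced by pd.DataFrame(value_counts()), which differs between pandas versions
rating_counts = df["title"].value_counts()
# Titles with 10,000 ratings or fewer are treated as rare and dropped
rare_movies = rating_counts[rating_counts <= 10000].index
common_movies = df[~df["title"].isin(rare_movies)]
common_movies.shape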

(9050403, 6)

In [None]:
common_movies["title"].nunique()

462
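
Going by the outputs above, the filter keeps 462 of 27,262 titles (about 1.7%) but 9,050,403 of 20,000,797 rows (about 45%), so a small set of popular films carries a large share of the ratings. The same filtering logic on made-up data, with a hypothetical threshold of 1 instead of 10000, might look like this:

```python
import pandas as pd

# Made-up ratings; "Film C" is rated only once.
toy = pd.DataFrame({
    "title": ["Film A"] * 3 + ["Film B"] * 2 + ["Film C"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0, 5.0],
})

rare_threshold = 1  # hypothetical; the notebook uses 10000
counts = toy["title"].value_counts()
rare = counts[counts <= rare_threshold].index   # titles to drop
common = toy[~toy["title"].isin(rare)]          # rows of the remaining titles

kept_share = len(common) / len(toy)
print(sorted(common["title"].unique()), round(kept_share, 2))
```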

In [None]:
df["title"].nunique()

27262

In [None]:
user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
user_movie_df.shape

(137658, 462)
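
The resulting matrix has 137,658 × 462 cells but only about 9 million ratings behind it, so roughly 14% of the cells would be filled if each user rated a title at most once; the rest are `NaN`. The toy sketch below (made-up data) shows how `pivot_table` produces those `NaN` cells and how it averages duplicate user-title pairs by default.

```python
import pandas as pd

# Made-up long-format ratings; user 3 rated "Film B" twice.
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title": ["Film A", "Film B", "Film A", "Film B", "Film B"],
    "rating": [4.0, 2.0, 5.0, 3.0, 5.0],
})

# Rows become users and columns become titles; duplicates are averaged (aggfunc="mean").
wide = toy.pivot_table(index="userId", columns="title", values="rating")

# Share of empty cells, i.e. user-title pairs without a rating.
sparsity = wide.isna().to_numpy().mean()
print(wide)
print(round(sparsity, 3))
```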

In [None]:
user_movie_df.columns

Index(['10 Things I Hate About You (1999)', '12 Angry Men (1957)',
       '2001: A Space Odyssey (1968)', '28 Days Later (2002)', '300 (2007)',
       'A.I. Artificial Intelligence (2001)', 'Abyss, The (1989)',
       'Ace Ventura: Pet Detective (1994)',
       'Ace Ventura: When Nature Calls (1995)', 'Addams Family Values (1993)',
       ...
       'Wild Wild West (1999)', 'William Shakespeare's Romeo + Juliet (1996)',
       'Willy Wonka & the Chocolate Factory (1971)', 'Witness (1985)',
       'Wizard of Oz, The (1939)', 'X-Files: Fight the Future, The (1998)',
       'X-Men (2000)', 'X2: X-Men United (2003)', 'You've Got Mail (1998)',
       'Young Frankenstein (1974)'],
      dtype='object', name='title', length=462)

#### **Step 3: Making Item-Based Movie Recommendations**

In [None]:
movie_name = "Matrix, The (1999)"

In [None]:
movie_name = "X-Men (2000)"

In [None]:
# Keep the title string intact and store the film's rating column under a separate name
movie_ratings = user_movie_df[movie_name]

In [None]:
user_movie_df.corrwith(movie_ratings).sort_values(ascending=False).head(10)

title
X-Men (2000)                                                     1.000000
X2: X-Men United (2003)                                          0.716946
Spider-Man (2002)                                                0.492376
Iron Man (2008)                                                  0.458369
Spider-Man 2 (2004)                                              0.422594
Blade (1998)                                                     0.395497
Men in Black (a.k.a. MIB) (1997)                                 0.394806
Pirates of the Caribbean: The Curse of the Black Pearl (2003)    0.383056
Mummy, The (1999)                                                0.376553
Batman Begins (2005)                                             0.375067
dtype: float64
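
`corrwith` computes each correlation only over the users who rated both films, so a pair with very few shared raters can still score high. One way to check this, sketched on made-up data, is to count the shared raters next to each correlation; the `min_overlap` cut-off and the column names of `summary` are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Made-up user x title matrix; "Film C" shares only two raters with "Film A".
toy = pd.DataFrame(
    {
        "Film A": [5.0, 4.0, 1.0, 2.0, nan],
        "Film B": [4.5, 4.0, 1.5, 2.5, 3.0],
        "Film C": [5.0, 1.0, nan, nan, 4.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

target = toy["Film A"]
correlation = toy.corrwith(target)

# Number of users who rated both the target film and each other film.
rated = toy.notna().astype(int)
overlap = rated.mul(target.notna().astype(int), axis=0).sum()

min_overlap = 3  # hypothetical cut-off
summary = pd.DataFrame({"correlation": correlation, "overlap": overlap})
reliable = summary[summary["overlap"] >= min_overlap].sort_values(
    "correlation", ascending=False
)
print(summary)
print(reliable)
```

Here "Film C" reaches a correlation of 1.0 from just two shared raters and is filtered out, while "Film B" stays because its score rests on four. In the notebook itself this matters less, since every retained title has more than 10,000 ratings.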

In [None]:
# Pick a random title and rank all films by correlation with its rating column
movie_name = pd.Series(user_movie_df.columns).sample(1).values[0]
movie_ratings = user_movie_df[movie_name]
user_movie_df.corrwith(movie_ratings).sort_values(ascending=False).head(10)

title
True Romance (1993)            1.000000
Reservoir Dogs (1992)          0.339518
Scarface (1983)                0.326247
Pulp Fiction (1994)            0.323993
Natural Born Killers (1994)    0.294575
From Dusk Till Dawn (1996)     0.293865
Kill Bill: Vol. 2 (2004)       0.286415
Sin City (2005)                0.285757
Desperado (1995)               0.284598
Kill Bill: Vol. 1 (2003)       0.275683
dtype: float64

In [None]:
def check_film(keyword, user_movie_df):
    # Return the column titles that contain the keyword (case-sensitive match)
    return [col for col in user_movie_df.columns if keyword in col]

check_film("Insomnia", user_movie_df)

[]
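
The empty list most likely means that no title containing "Insomnia" passed the rating-count filter, since the search only covers the 462 retained columns. Because the match is also case-sensitive, a query such as "insomnia" would fail even for a retained title. A case-insensitive variant could be sketched as follows; `find_titles` and `sample_titles` are hypothetical names used only for this example.

```python
def find_titles(keyword, titles):
    """Case-insensitive substring search over an iterable of titles (hypothetical helper)."""
    needle = keyword.casefold()
    return [t for t in titles if needle in t.casefold()]

# A few titles taken from the column listing above.
sample_titles = ["X-Men (2000)", "X2: X-Men United (2003)", "Matrix, The (1999)"]
matches = find_titles("x-men", sample_titles)
print(matches)
```

With the real data, `user_movie_df.columns` would be passed in place of `sample_titles`.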

#### **Step 4: Preparing the Working Script**

In [None]:
def create_user_movie_df():
    """Build the user x title rating matrix for titles with more than 10,000 ratings."""
    import pandas as pd
    movie = pd.read_csv('/content/drive/MyDrive/DSMLBC10/week_7 (10.11.22-16.11.22)/datasets/movie_lens_dataset/movie.csv')
    rating = pd.read_csv('/content/drive/MyDrive/DSMLBC10/week_7 (10.11.22-16.11.22)/datasets/movie_lens_dataset/rating.csv')
    df = movie.merge(rating, how="left", on="movieId")
    # Use the value_counts Series directly so the filter works across pandas versions
    rating_counts = df["title"].value_counts()
    rare_movies = rating_counts[rating_counts <= 10000].index
    common_movies = df[~df["title"].isin(rare_movies)]
    user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
    return user_movie_df

user_movie_df = create_user_movie_df()
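
Each call to `create_user_movie_df()` re-reads both CSV files and rebuilds the pivot over roughly 20 million rows. If that becomes slow, the result could be cached on disk. The sketch below assumes a hypothetical `load_or_build` helper and a toy builder function; only `DataFrame.to_pickle` and `pd.read_pickle` are actual pandas calls.

```python
import os
import tempfile

import pandas as pd

def load_or_build(cache_path, builder):
    """Hypothetical helper: read a pickled DataFrame if present, else build and store it."""
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    frame = builder()
    frame.to_pickle(cache_path)
    return frame

calls = []

def toy_builder():
    # Stand-in for create_user_movie_df(); records how often it actually runs.
    calls.append(1)
    return pd.DataFrame({"Film A": [5.0, None], "Film B": [4.0, 3.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "user_movie_df.pkl")
    first = load_or_build(path, toy_builder)   # builds and writes the cache
    second = load_or_build(path, toy_builder)  # reads the cache, builder not called

print(len(calls), first.equals(second))
```

In the notebook this would be called as `load_or_build(some_path, create_user_movie_df)`, with `some_path` chosen by the reader.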

In [None]:
def item_based_recommender(movie_name, user_movie_df):
    """Return the 10 titles whose ratings correlate most with those of movie_name."""
    movie_ratings = user_movie_df[movie_name]
    return user_movie_df.corrwith(movie_ratings).sort_values(ascending=False).head(10)

In [None]:
item_based_recommender("Matrix, The (1999)", user_movie_df)

title
Matrix, The (1999)                                           1.000000
Matrix Reloaded, The (2003)                                  0.516906
Matrix Revolutions, The (2003)                               0.449588
Blade (1998)                                                 0.334493
Terminator 2: Judgment Day (1991)                            0.333882
Minority Report (2002)                                       0.332434
Mission: Impossible (1996)                                   0.320815
Lord of the Rings: The Fellowship of the Ring, The (2001)    0.318726
Lord of the Rings: The Two Towers, The (2002)                0.318086
Lord of the Rings: The Return of the King, The (2003)        0.314241
dtype: float64
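
The first row of the output is always the queried film itself with a correlation of 1.0, which leaves only nine actual recommendations. A variant that drops the film itself and checks that the title exists might look like the sketch below; `recommend_similar` and its `top_n` parameter are hypothetical and shown on made-up data.

```python
import pandas as pd

def recommend_similar(title, user_movie_df, top_n=10):
    """Hypothetical variant: the top_n most correlated titles, excluding `title` itself."""
    if title not in user_movie_df.columns:
        raise KeyError(f"{title!r} is not a column of user_movie_df")
    target = user_movie_df[title]
    corr = user_movie_df.corrwith(target)
    # Drop the film itself and any title without a defined correlation.
    corr = corr.drop(labels=title).dropna()
    return corr.sort_values(ascending=False).head(top_n)

# Made-up user x title matrix.
toy = pd.DataFrame({
    "Film A": [5.0, 4.0, 1.0, 2.0],
    "Film B": [4.5, 4.0, 1.5, 2.5],
    "Film C": [1.0, 2.0, 5.0, 4.0],
})
recs = recommend_similar("Film A", toy, top_n=2)
print(recs)
```

Raising a `KeyError` with the offending title makes a misspelled or filtered-out film easier to diagnose than the bare lookup failure.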

In [None]:
movie_name = pd.Series(user_movie_df.columns).sample(1).values[0]

In [None]:
item_based_recommender(movie_name, user_movie_df)

title
Remains of the Day, The (1993)       1.000000
Sense and Sensibility (1995)         0.398029
Little Women (1994)                  0.310405
Talented Mr. Ripley, The (1999)      0.307695
Postman, The (Postino, Il) (1994)    0.296924
Piano, The (1993)                    0.292561
Crying Game, The (1992)              0.291792
Gandhi (1982)                        0.287449
Much Ado About Nothing (1993)        0.286938
Quiz Show (1994)                     0.281122
dtype: float64