#Introduction:
This notebook uses a dataset from the GroupLens organisation that records, for a large number of users, which movies each user has watched. We apply association rule mining to these viewing histories: the support and confidence of movie combinations show which movies tend to be watched together, and the resulting rules can be used to suggest which movie to recommend to a user based on what they have already watched.
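
As a minimal sketch of the two measures named above, the snippet below computes support and confidence by hand. The baskets and the helper names `support` and `confidence` are made up for illustration and are not part of the dataset or of any library.

```python
# Toy baskets: each set holds the movies one hypothetical user has watched.
baskets = [
    {"Toy Story", "Heat", "Seven"},
    {"Toy Story", "Heat"},
    {"Toy Story", "Seven"},
    {"Heat"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

print(support({"Toy Story"}, baskets))               # 0.75
print(support({"Toy Story", "Heat"}, baskets))       # 0.5
print(confidence({"Toy Story"}, {"Heat"}, baskets))  # 0.5 / 0.75, about 0.667
```

Read as a rule, "Toy Story -> Heat" holds for two of the three toy users who watched Toy Story.
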
#Acknowledgement
The data is taken from the GroupLens MovieLens page:
https://grouplens.org/datasets/movielens/latest/

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
#imports
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
data = pd.read_csv('/content/drive/MyDrive/Rating1.csv')

In [4]:
data.head()

Unnamed: 0,userId,movieId,movie name
0,1,1,Toy Story (1995)
1,1,3,Grumpier Old Men (1995)
2,1,6,Heat (1995)
3,1,47,Seven (a.k.a. Se7en) (1995)
4,1,50,"Usual Suspects, The (1995)"


In [5]:
data['movieId'].nunique()

9724

In [6]:
data.shape

(100836, 3)

In [7]:
data['movie name'].value_counts()

Forrest Gump (1994)                   329
Shawshank Redemption, The (1994)      317
Pulp Fiction (1994)                   307
Silence of the Lambs, The (1991)      279
Matrix, The (1999)                    278
                                     ... 
Touch of Zen, A (Xia nu) (1971)         1
Redline (2009)                          1
Last Wave, The (1977)                   1
What Happened, Miss Simone? (2015)      1
Alex Cross (2012)                       1
Name: movie name, Length: 9719, dtype: int64

In [8]:
data.isnull().sum()

userId        0
movieId       0
movie name    0
dtype: int64

In [9]:
pd.set_option("display.max_rows", None)

In [10]:
data['movie name'].value_counts()

Forrest Gump (1994)                                                                                                                                               329
Shawshank Redemption, The (1994)                                                                                                                                  317
Pulp Fiction (1994)                                                                                                                                               307
Silence of the Lambs, The (1991)                                                                                                                                  279
Matrix, The (1999)                                                                                                                                                278
Star Wars: Episode IV - A New Hope (1977)                                                                                                                         251
Jura

In [11]:
user_count = pd.DataFrame(data.groupby('userId')['movie name'].count())

In [12]:
user_count.sort_values('movie name', ascending = False)

userId,movie name
414,2698
599,2478
474,2108
448,1864
274,1346
610,1302
68,1260
380,1218
606,1115
288,1055


In [13]:
user_count = user_count[user_count['movie name'] >= 50]

In [14]:
new_data = data[data['userId'].isin(user_count.index)]

In [15]:
new_data.shape

(93812, 3)

In [16]:
new_data.head()

Unnamed: 0,userId,movieId,movie name
0,1,1,Toy Story (1995)
1,1,3,Grumpier Old Men (1995)
2,1,6,Heat (1995)
3,1,47,Seven (a.k.a. Se7en) (1995)
4,1,50,"Usual Suspects, The (1995)"


In [17]:
new_data = new_data.copy()  # explicit copy, so the assignment below does not trigger SettingWithCopyWarning
new_data['movie_name'] = new_data['movie name']


In [18]:
new_data.head()

Unnamed: 0,userId,movieId,movie name,movie_name
0,1,1,Toy Story (1995),Toy Story (1995)
1,1,3,Grumpier Old Men (1995),Grumpier Old Men (1995)
2,1,6,Heat (1995),Heat (1995)
3,1,47,Seven (a.k.a. Se7en) (1995),Seven (a.k.a. Se7en) (1995)
4,1,50,"Usual Suspects, The (1995)","Usual Suspects, The (1995)"


In [19]:
new_data = new_data.drop('movie name',axis=1)

In [20]:
# build one basket per user: group by userId and collect that user's movie names into an array

movie_baskets = new_data.groupby(['userId']).movie_name.apply(np.array).reset_index()

movie_baskets.head()

Unnamed: 0,userId,movie_name
0,1,"[Toy Story (1995), Grumpier Old Men (1995), He..."
1,4,"[Get Shorty (1995), Twelve Monkeys (a.k.a. 12 ..."
2,6,"[Jumanji (1995), Grumpier Old Men (1995), Wait..."
3,7,"[Toy Story (1995), Usual Suspects, The (1995),..."
4,10,"[Pulp Fiction (1994), Forrest Gump (1994), Ala..."


In [21]:
# use TransactionEncoder to one-hot encode the baskets into a boolean matrix

te = TransactionEncoder()

te_ary = te.fit(movie_baskets['movie_name']).transform(movie_baskets['movie_name'])

te_ary

array([[False, False, False, ..., False,  True, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [ True, False, False, ...,  True, False, False]])
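
To make the encoding step concrete, here is a small pure-Python sketch of the same idea on three made-up baskets. It is not mlxtend's implementation; the variable names and the choice to sort the columns are assumptions of this sketch.

```python
# Hypothetical baskets standing in for movie_baskets['movie_name'].
baskets = [
    ["Toy Story (1995)", "Heat (1995)"],
    ["Heat (1995)", "Jumanji (1995)"],
    ["Toy Story (1995)"],
]

# One column per distinct title, in sorted order.
columns = sorted({title for basket in baskets for title in basket})

# One row per basket; a cell is True when the basket contains that title.
matrix = []
for basket in baskets:
    seen = set(basket)
    matrix.append([title in seen for title in columns])

print(columns)
for row in matrix:
    print(row)
```

Each row plays the role of one user and each column of one movie, which is the shape the frequent-itemset step expects.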

In [22]:
data_1 = pd.DataFrame(te_ary, columns = te.columns_)

data_1.head()

Unnamed: 0,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...All the Marbles (1981),...And Justice for All (1979),00 Schneider - Jagd auf Nihil Baxter (1994),10 (1979),10 Cent Pistol (2015),10 Cloverfield Lane (2016),10 Items or Less (2006),10 Things I Hate About You (1999),10 Years (2011),"10,000 BC (2008)",100 Girls (2000),100 Streets (2016),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),101 Dalmatians II: Patch's London Adventure (2003),101 Reykjavik (101 Reykjavík) (2000),102 Dalmatians (2000),"10th Kingdom, The (2000)","10th Victim, The (La decima vittima) (1965)","11'09""01 - September 11 (2002)",11:14 (2003),12 Angry Men (1957),12 Angry Men (1997),12 Chairs (1971),12 Chairs (1976),12 Rounds (2009),12 Years a Slave (2013),127 Hours (2010),13 Assassins (Jûsan-nin no shikaku) (2010),13 Ghosts (1960),...,Zathura (2005),Zatoichi and the Chest of Gold (Zatôichi senryô-kubi) (Zatôichi 6) (1964),Zazie dans le métro (1960),Zebraman (2004),"Zed & Two Noughts, A (1985)",Zeitgeist: Addendum (2008),Zeitgeist: Moving Forward (2011),Zeitgeist: The Movie (2007),Zelary (2003),Zelig (1983),Zero Dark Thirty (2012),Zero Effect (1998),"Zero Theorem, The (2013)",Zero de conduite (Zero for Conduct) (Zéro de conduite: Jeunes diables au collège) (1933),Zeus and Roxanne (1997),Zipper (2015),Zodiac (2007),Zombeavers (2014),Zombie (a.k.a. Zombie 2: The Dead Are Among Us) (Zombi 2) (1979),Zombie Strippers! (2008),Zombieland (2009),Zone 39 (1997),"Zone, The (La Zona) (2007)",Zookeeper (2011),Zoolander (2001),Zoolander 2 (2016),Zoom (2006),Zoom (2015),Zootopia (2016),Zulu (1964),Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [25]:

frequent_items = apriori(data_1, min_support=0.2, use_colnames=True)

frequent_items.shape

(9195, 2)
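
With `min_support=0.2`, an itemset is kept only if it appears in at least 20% of the user baskets. The level-wise search behind Apriori can be sketched in plain Python as below. This is an illustrative toy, not mlxtend's implementation; the function name `frequent_itemsets` and the baskets are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support):
    """Level-wise search: return {frozenset: support} for itemsets meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for b in baskets for item in b}
    result = {}
    k = 1
    while candidates:
        # Keep the size-k candidates whose support reaches the threshold.
        level = {}
        for cand in candidates:
            s = sum(cand <= b for b in baskets) / n
            if s >= min_support:
                level[cand] = s
        result.update(level)
        # Build size k+1 candidates and prune any that has an infrequent size-k subset.
        k += 1
        items = sorted({item for itemset in level for item in itemset})
        candidates = {
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        }
    return result

# Placeholder titles "A", "B", "C" stand in for movie names.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B"}]
found = frequent_itemsets(toy, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step relies on the fact that a set can only be frequent if all of its subsets are frequent, which is what keeps the search tractable on thousands of movie columns.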

In [26]:
frequent_items.head()

Unnamed: 0,support,itemsets
0,0.262338,(2001: A Space Odyssey (1968))
1,0.327273,(Ace Ventura: Pet Detective (1994))
2,0.212987,(Ace Ventura: When Nature Calls (1995))
3,0.218182,(Airplane! (1980))
4,0.394805,(Aladdin (1992))


In [27]:
# add a column with the number of items in each itemset
frequent_items['length'] = frequent_items['itemsets'].apply(len)
frequent_items.head()

Unnamed: 0,support,itemsets,length
0,0.262338,(2001: A Space Odyssey (1968)),1
1,0.327273,(Ace Ventura: Pet Detective (1994)),1
2,0.212987,(Ace Ventura: When Nature Calls (1995)),1
3,0.218182,(Airplane! (1980)),1
4,0.394805,(Aladdin (1992)),1


In [28]:
#restrict data to length >= 2
frequent_items = frequent_items[(frequent_items['length'] >= 2)]
frequent_items = frequent_items.sort_values(by='support', ascending = False)
frequent_items

Unnamed: 0,support,itemsets,length
935,0.498701,"(Pulp Fiction (1994), Forrest Gump (1994))",2
943,0.490909,"(Shawshank Redemption, The (1994), Forrest Gum...",2
1641,0.467532,"(Shawshank Redemption, The (1994), Pulp Fictio...",2
1644,0.454545,"(Pulp Fiction (1994), Silence of the Lambs, Th...",2
921,0.446753,"(Matrix, The (1999), Forrest Gump (1994))",2
1817,0.441558,"(Star Wars: Episode IV - A New Hope (1977), St...",2
912,0.438961,"(Jurassic Park (1993), Forrest Gump (1994))",2
946,0.436364,"(Silence of the Lambs, The (1991), Forrest Gum...",2
1737,0.433766,"(Shawshank Redemption, The (1994), Silence of ...",2
1487,0.420779,"(Star Wars: Episode IV - A New Hope (1977), Ma...",2


In [29]:
# a higher support threshold (0.25 instead of 0.2 above) yields fewer itemsets for rule generation
frequent_itemsets = apriori(data_1, min_support=0.25, use_colnames=True)



In [30]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold = 0.4)
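
The call above keeps rules whose confidence is at least 0.4. As a rough sketch of how rules can be derived from frequent itemsets, the snippet below splits each itemset into antecedent and consequent and applies the same kind of threshold. The `supports` dictionary and the `generate_rules` helper are hypothetical and only illustrate the idea.

```python
from itertools import combinations

# Hypothetical supports, shaped like the output of a frequent-itemset step.
supports = {
    frozenset({"A"}): 0.75,
    frozenset({"B"}): 0.75,
    frozenset({"A", "B"}): 0.5,
}

def generate_rules(supports, min_confidence):
    """Split each frequent itemset into antecedent -> consequent; keep confident rules."""
    rules = []
    for itemset, s in supports.items():
        # Every non-empty proper subset can act as an antecedent.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                conf = s / supports[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, conf))
    return rules

toy_rules = generate_rules(supports, min_confidence=0.4)
for antecedent, consequent, conf in toy_rules:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 3))
```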

In [31]:
rules["antecedent_len"] = rules["antecedents"].apply(len)

In [32]:
refined_rules = rules[ (rules['antecedent_len'] >= 2) & 
                      (rules['lift'] > 2) ]
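
The `lift > 2` filter keeps rules whose antecedent makes the consequent more than twice as likely as its overall frequency. As a check of how lift, leverage and conviction follow from the support columns, this sketch recomputes them from the three support values that the sorted output reports for rule 5038. The formulas are the standard definitions; the inputs are rounded to six decimals, so the results match the table only approximately.

```python
# Values copied from rule 5038 in the sorted rules output.
antecedent_support = 0.290909
consequent_support = 0.280519
support = 0.254545

# confidence: how often the consequent appears among baskets containing the antecedent
confidence = support / antecedent_support
# lift: confidence relative to the consequent's base rate
lift = confidence / consequent_support
# leverage: observed joint support minus the support expected under independence
leverage = support - antecedent_support * consequent_support
# conviction: ratio of expected to observed frequency of the rule failing
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 3), round(lift, 2), round(leverage, 3), round(conviction, 2))
# 0.875 3.12 0.173 5.76
```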

In [33]:
#order by lift

refined_rules.sort_values(by='lift', ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
5038,"(Star Wars: Episode IV - A New Hope (1977), Lo...",(Star Wars: Episode V - The Empire Strikes Bac...,0.290909,0.280519,0.254545,0.875,3.119213,0.17294,5.755844,2
5043,(Star Wars: Episode V - The Empire Strikes Bac...,"(Star Wars: Episode IV - A New Hope (1977), Lo...",0.280519,0.290909,0.254545,0.907407,3.119213,0.17294,7.658182,2
5042,"(Lord of the Rings: The Two Towers, The (2002)...","(Star Wars: Episode IV - A New Hope (1977), Lo...",0.280519,0.296104,0.254545,0.907407,3.06449,0.171483,7.602078,2
5039,"(Star Wars: Episode IV - A New Hope (1977), Lo...","(Lord of the Rings: The Two Towers, The (2002)...",0.296104,0.280519,0.254545,0.859649,3.06449,0.171483,5.126299,2
5001,(Lord of the Rings: The Fellowship of the Ring...,"(Star Wars: Episode IV - A New Hope (1977), Lo...",0.290909,0.290909,0.251948,0.866071,2.977121,0.16732,5.294545,2
4996,"(Star Wars: Episode IV - A New Hope (1977), Lo...",(Lord of the Rings: The Fellowship of the Ring...,0.290909,0.290909,0.251948,0.866071,2.977121,0.16732,5.294545,2
4998,"(Lord of the Rings: The Two Towers, The (2002)...","(Star Wars: Episode IV - A New Hope (1977), Lo...",0.280519,0.306494,0.251948,0.898148,2.930399,0.165971,6.808973,2
4973,(Star Wars: Episode V - The Empire Strikes Bac...,"(Star Wars: Episode IV - A New Hope (1977), Lo...",0.280519,0.306494,0.251948,0.898148,2.930399,0.165971,6.808973,2
4968,"(Star Wars: Episode IV - A New Hope (1977), Lo...",(Star Wars: Episode V - The Empire Strikes Bac...,0.306494,0.280519,0.251948,0.822034,2.930399,0.165971,4.042795,2
4999,"(Star Wars: Episode IV - A New Hope (1977), Lo...","(Lord of the Rings: The Two Towers, The (2002)...",0.306494,0.280519,0.251948,0.822034,2.930399,0.165971,4.042795,2
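
One way rules like these could be turned into recommendations is sketched below: for a given user, find the rules whose antecedent is fully contained in the user's watched set and suggest the consequent titles not yet watched. The `recommend` function, the rule triples and the placeholder titles are hypothetical and the confidence values are made up.

```python
def recommend(watched, rules, top_n=3):
    """Suggest unseen titles from rules whose antecedent the user has fully watched."""
    watched = set(watched)
    scores = {}
    for antecedent, consequent, confidence in rules:
        if set(antecedent) <= watched:
            for title in set(consequent) - watched:
                # Keep the highest confidence seen for each candidate title.
                scores[title] = max(scores.get(title, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Hypothetical (antecedents, consequents, confidence) triples.
toy_rules = [
    ({"Movie A", "Movie B"}, {"Movie C"}, 0.9),
    ({"Movie A"}, {"Movie D"}, 0.6),
    ({"Movie E"}, {"Movie F"}, 0.8),
]

suggestions = recommend({"Movie A", "Movie B"}, toy_rules)
print(suggestions)  # ['Movie C', 'Movie D']
```

With the notebook's `refined_rules` DataFrame, the triples could be taken from its `antecedents`, `consequents` and `confidence` columns, for example via `itertuples(index=False)`.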
