# An Introduction to Association Rules in Python

##### Association rule mining is a rule-based learning method used to find frequent patterns and correlations in datasets such as transactional and relational data.

##### In essence, it computes co-occurrence statistics between items and expresses them as implications of the form X → Y.

##### For instance, in market basket analysis, {diaper} → {beer} means that if diapers are bought, beer tends to be put into the basket as well.

#### 4 fundamental concepts in association rules:

* *(Not a rule metric)* Support(X): the fraction of all transactions that contain X.

* Support(X→Y) is the probability that X and Y occur together, over all transactions.

* Confidence(X→Y) is the probability that Y occurs given that X is present.

* Lift(X→Y) is the confidence of X→Y divided by the support of Y: how much more likely Y is when X is present, taking into account how popular Y is overall.

* Conviction(X→Y) is a measure of implication. A value well above 1 indicates that Y depends strongly on X; a value of 1 means that X and Y are unrelated.
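
For reference, the concepts above can be written compactly. The notation is introduced here only for illustration: N is the number of transactions and count(X) is the number of transactions that contain every item in X.

```latex
\begin{align*}
\mathrm{support}(X) &= \frac{\mathrm{count}(X)}{N} \\
\mathrm{support}(X \to Y) &= \mathrm{support}(X \cup Y) \\
\mathrm{confidence}(X \to Y) &= \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)} \\
\mathrm{lift}(X \to Y) &= \frac{\mathrm{confidence}(X \to Y)}{\mathrm{support}(Y)} \\
\mathrm{conviction}(X \to Y) &= \frac{1 - \mathrm{support}(Y)}{1 - \mathrm{confidence}(X \to Y)}
\end{align*}
```

Conviction is taken to be infinite when the confidence equals 1, because the denominator is then 0.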

So basically it is probability and statistics: a simple but useful decision-making tool for a wide range of uses, such as market basket analysis, customer relationship management, recommender systems, marketing, network traffic analysis, intrusion detection (fraud and malware detection) and bioinformatics.


# Example 1

### Before getting into the formulas and terminology, let's begin with a simple example.

Mlxtend is a rich and useful library for machine learning. It provides association rule mining methods built around a major algorithm, *apriori*.

You can install mlxtend via pip or conda.

In [1]:
!pip install mlxtend



In [2]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

To use association rules, we first need some data in one-hot encoded format.

Imagine a grocery database in which each order ID comes with some products...

In [39]:
data = {'ID':[1,2,3,4,5,6],
       'Onion':[1,0,0,1,1,1],
       'Potato':[1,1,0,1,1,1],
       'Burger':[1,1,0,0,1,1],
       'Milk':[0,1,1,1,0,1],
       'Beer':[0,0,1,0,1,0]}



In [41]:
data['ID']



[1, 2, 3, 4, 5, 6]

In [42]:
df = pd.DataFrame(data)
df.head()



Unnamed: 0,ID,Onion,Potato,Burger,Milk,Beer
0,1,1,1,1,0,0
1,2,0,1,1,1,0
2,3,0,0,0,1,1
3,4,1,1,0,1,0
4,5,1,1,1,0,1


In [43]:
df = df[['ID', 'Onion', 'Potato', 'Burger', 'Milk', 'Beer' ]]



In [44]:
df



Unnamed: 0,ID,Onion,Potato,Burger,Milk,Beer
0,1,1,1,1,0,0
1,2,0,1,1,1,0
2,3,0,0,0,1,1
3,4,1,1,0,1,0
4,5,1,1,1,0,1
5,6,1,1,1,1,0


### Then, we can generate frequent itemsets based on *support*.

Here we need to set the minimum support to a value in [0, 1]. Using `min_support=0.5` means we only want itemsets that occur in at least half of the transactions.

`apriori(df, min_support=0.5, use_colnames=False, max_len=None)`

In [50]:
frequent_itemsets = apriori(df[['Onion', 'Potato', 'Burger', 'Milk', 'Beer' ]], min_support=0.5, use_colnames=True)



In [51]:
frequent_itemsets



Unnamed: 0,support,itemsets
0,0.666667,(Onion)
1,0.833333,(Potato)
2,0.666667,(Burger)
3,0.666667,(Milk)
4,0.666667,"(Onion, Potato)"
5,0.5,"(Onion, Burger)"
6,0.666667,"(Potato, Burger)"
7,0.5,"(Milk, Potato)"
8,0.5,"(Onion, Potato, Burger)"


Itemsets with 1, 2 or 3 items are returned, each with support >= 0.5.

The only itemset with 3 products is [Onion, Potato, Burger].
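
To see where these numbers come from, the sketch below recomputes the frequent itemsets by brute force in plain Python. It is not how `apriori` is implemented: apriori prunes candidates whose subsets are already infrequent, while this version simply tries every combination, which is only practical for tiny examples. The helper names `support` and `frequent_itemsets_bruteforce` are made up for this illustration.

```python
from itertools import combinations

# The six example orders written as sets of purchased items
# (same data as the one-hot table above).
orders = [
    {"Onion", "Potato", "Burger"},
    {"Potato", "Burger", "Milk"},
    {"Milk", "Beer"},
    {"Onion", "Potato", "Milk"},
    {"Onion", "Potato", "Burger", "Beer"},
    {"Onion", "Potato", "Burger", "Milk"},
]


def support(itemset, orders):
    """Fraction of orders that contain every item in `itemset`."""
    hits = sum(1 for order in orders if set(itemset) <= order)
    return hits / len(orders)


def frequent_itemsets_bruteforce(orders, min_support):
    """Try every possible itemset and keep those with support >= min_support."""
    items = sorted(set().union(*orders))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, orders)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


frequent = frequent_itemsets_bruteforce(orders, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 6))
```

Beer appears in only 2 of the 6 orders, so it never passes the 0.5 threshold, and neither does any itemset that contains it.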

### Final step: generate the rules with their corresponding support, confidence and lift (as well as leverage and conviction):

`association_rules(df, metric='confidence', min_threshold=0.8)`

* Here, df means the frequent_itemsets dataframe;

* metric is the measure used to decide whether a rule is interesting. You can set it to, for example, support, confidence, lift, leverage or conviction;

* min_threshold is the minimum value of the specified metric for a rule to be kept.
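
Conceptually, rule generation splits each frequent itemset into an antecedent and a consequent and keeps the splits that pass the threshold. The sketch below does this for the confidence metric in plain Python. It is an illustration under that assumption, not mlxtend's actual code, and `count` and `rules_from_itemset` are hypothetical helper names.

```python
from itertools import combinations

# Same six orders as in the example above.
orders = [
    {"Onion", "Potato", "Burger"},
    {"Potato", "Burger", "Milk"},
    {"Milk", "Beer"},
    {"Onion", "Potato", "Milk"},
    {"Onion", "Potato", "Burger", "Beer"},
    {"Onion", "Potato", "Burger", "Milk"},
]


def count(itemset):
    """Number of orders that contain every item in `itemset`."""
    return sum(1 for order in orders if itemset <= order)


def rules_from_itemset(itemset, min_confidence):
    """Return (antecedent, consequent, confidence) for every confident split."""
    itemset = frozenset(itemset)
    kept = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            # confidence(X -> Y) = count(X and Y) / count(X)
            confidence = count(itemset) / count(antecedent)
            if confidence >= min_confidence:
                kept.append((antecedent, consequent, confidence))
    return kept


pair_rules = rules_from_itemset({"Onion", "Potato"}, min_confidence=0.8)
triple_rules = rules_from_itemset({"Onion", "Potato", "Burger"}, min_confidence=0.8)
for antecedent, consequent, confidence in pair_rules + triple_rules:
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

Working from raw counts instead of rounded supports keeps the comparison against the threshold exact for small examples like this one.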

In [52]:
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)



In [53]:
df



Unnamed: 0,ID,Onion,Potato,Burger,Milk,Beer
0,1,1,1,1,0,0
1,2,0,1,1,1,0
2,3,0,0,0,1,1
3,4,1,1,0,1,0
4,5,1,1,1,0,1
5,6,1,1,1,1,0


In [54]:
rules



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Onion),(Potato),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf,0.5
1,(Potato),(Onion),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667,1.0
2,(Onion),(Burger),0.666667,0.666667,0.5,0.75,1.125,0.055556,1.333333,0.333333
3,(Burger),(Onion),0.666667,0.666667,0.5,0.75,1.125,0.055556,1.333333,0.333333
4,(Potato),(Burger),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667,1.0
5,(Burger),(Potato),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf,0.5
6,"(Onion, Potato)",(Burger),0.666667,0.666667,0.5,0.75,1.125,0.055556,1.333333,0.333333
7,"(Onion, Burger)",(Potato),0.5,0.833333,0.5,1.0,1.2,0.083333,inf,0.333333
8,"(Potato, Burger)",(Onion),0.666667,0.666667,0.5,0.75,1.125,0.055556,1.333333,0.333333
9,(Onion),"(Potato, Burger)",0.666667,0.666667,0.5,0.75,1.125,0.055556,1.333333,0.333333


### Interpreting the result:

We can see that quite a few rules have a lift value above 1, which means that the items occur together more frequently than would be expected if they were independent.

Several have high confidence as well. Domain knowledge is still needed to explain why.

In [55]:
rules[(rules['lift'] > 1.125) & (rules['confidence'] > 0.7)]



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Onion),(Potato),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf,0.5
1,(Potato),(Onion),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667,1.0
4,(Potato),(Burger),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667,1.0
5,(Burger),(Potato),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf,0.5
7,"(Onion, Burger)",(Potato),0.5,0.833333,0.5,1.0,1.2,0.083333,inf,0.333333


Filtering on lift and confidence returns the rules that are relatively strongly associated in this data.

We can see that:

* **If Onion or Burger is in a user's basket, it is highly likely that the user will buy Potato as well.**
* **If Burger and Onion are both in a user's basket, it is highly likely that the user will also buy Potato.**

### Some notes on Lift, Conviction & Leverage:


1.  Lift(X→Y): how much more likely Y is to be bought when X is present, taking into account the popularity of Y as well.
    > When Lift = 1, X has no impact on Y  
    > When Lift > 1, X and Y occur together more often than expected under independence  
    > When Lift < 1, X and Y occur together less often than expected under independence
2.  Conviction(X→Y): a measure of implication that has the value 1 if the items are unrelated.
    > A high conviction value means that the consequent depends strongly on the antecedent. With a perfect confidence score, the denominator becomes 0 (due to 1 - 1), and the conviction is defined as 'inf'.
3.  Leverage(X→Y): the difference between the observed frequency of X and Y appearing together and the frequency that would be expected if X and Y were independent. A leverage value of 0 indicates independence.
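
The three metrics can be checked by hand for the Onion and Potato pair from Example 1. The sketch below works from raw transaction counts and uses exact fractions so the results can be compared with the rules table without rounding noise. `rule_metrics` is a helper name invented for this illustration.

```python
from fractions import Fraction


def rule_metrics(n_total, n_x, n_y, n_xy):
    """Confidence, lift, leverage and conviction of X -> Y from raw counts."""
    sup_x = Fraction(n_x, n_total)
    sup_y = Fraction(n_y, n_total)
    sup_xy = Fraction(n_xy, n_total)
    confidence = sup_xy / sup_x
    lift = confidence / sup_y
    # Observed joint frequency minus the frequency expected under independence.
    leverage = sup_xy - sup_x * sup_y
    # Conviction divides by (1 - confidence); a perfect rule gives infinity.
    conviction = float("inf") if confidence == 1 else (1 - sup_y) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Potato -> Onion: 6 orders, 5 with Potato, 4 with Onion, 4 with both.
potato_onion = rule_metrics(6, 5, 4, 4)
# Onion -> Potato: the same pair with the direction reversed.
onion_potato = rule_metrics(6, 4, 5, 4)

print({name: float(value) for name, value in potato_onion.items()})
print({name: float(value) for name, value in onion_potato.items()})
```

Lift and leverage are symmetric in X and Y, so both directions share them, while confidence and conviction depend on the direction of the rule.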

# Example 2

In [56]:
retail_shopping_basket = {'ID':[1,2,3,4,5,6],
                         'Basket':[['Beer', 'Diaper', 'Pretzels', 'Chips', 'Aspirin'],
                                   ['Diaper', 'Beer', 'Chips', 'Lotion', 'Juice', 'BabyFood', 'Milk'],
                                   ['Soda', 'Chips', 'Milk'],
                                   ['Soup', 'Beer', 'Diaper', 'Milk', 'IceCream'],
                                   ['Soda', 'Coffee', 'Milk', 'Bread'],
                                   ['Beer', 'Chips']
                                  ]
                         }



In [57]:
retail = pd.DataFrame(retail_shopping_basket)
retail



Unnamed: 0,ID,Basket
0,1,"[Beer, Diaper, Pretzels, Chips, Aspirin]"
1,2,"[Diaper, Beer, Chips, Lotion, Juice, BabyFood, Milk]"
2,3,"[Soda, Chips, Milk]"
3,4,"[Soup, Beer, Diaper, Milk, IceCream]"
4,5,"[Soda, Coffee, Milk, Bread]"
5,6,"[Beer, Chips]"




In [58]:
retail = retail[['ID', 'Basket']]



In [16]:
pd.options.display.max_colwidth=100



Suppose we have a table that maps each customer ID to a list of basket items:

In [59]:
retail



Unnamed: 0,ID,Basket
0,1,"[Beer, Diaper, Pretzels, Chips, Aspirin]"
1,2,"[Diaper, Beer, Chips, Lotion, Juice, BabyFood, Milk]"
2,3,"[Soda, Chips, Milk]"
3,4,"[Soup, Beer, Diaper, Milk, IceCream]"
4,5,"[Soda, Coffee, Milk, Bread]"
5,6,"[Beer, Chips]"


First we need to one-hot encode the baskets, but how?

In [60]:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
retail = pd.DataFrame(mlb.fit_transform(retail.Basket), columns=mlb.classes_)



In [61]:
retail



Unnamed: 0,Aspirin,BabyFood,Beer,Bread,Chips,Coffee,Diaper,IceCream,Juice,Lotion,Milk,Pretzels,Soda,Soup
0,1,0,1,0,1,0,1,0,0,0,0,1,0,0
1,0,1,1,0,1,0,1,0,1,1,1,0,0,0
2,0,0,0,0,1,0,0,0,0,0,1,0,1,0
3,0,0,1,0,0,0,1,1,0,0,1,0,0,1
4,0,0,0,1,0,1,0,0,0,0,1,0,1,0
5,0,0,1,0,1,0,0,0,0,0,0,0,0,0


Using scikit-learn's `MultiLabelBinarizer`, we can easily one-hot encode lists of items in a dataframe column!
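
A pandas-only route to the same encoding is sketched below. It assumes that no item name contains the separator character `|`. The final cast to boolean reflects my understanding that mlxtend prefers boolean input; treat that as an assumption and check the documentation of your installed version.

```python
import pandas as pd

baskets = pd.Series([
    ['Beer', 'Diaper', 'Pretzels', 'Chips', 'Aspirin'],
    ['Diaper', 'Beer', 'Chips', 'Lotion', 'Juice', 'BabyFood', 'Milk'],
    ['Soda', 'Chips', 'Milk'],
    ['Soup', 'Beer', 'Diaper', 'Milk', 'IceCream'],
    ['Soda', 'Coffee', 'Milk', 'Bread'],
    ['Beer', 'Chips'],
])

# Join each list into one '|'-separated string, then split that string
# into one indicator column per distinct item.
encoded = baskets.str.join('|').str.get_dummies(sep='|')

# Cast the 0/1 integers to booleans (assumed to be the preferred input type).
encoded = encoded.astype(bool)
print(encoded.sum().sort_values(ascending=False).head())
```

The columns come out in alphabetical order, matching the layout of the table above.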

In [64]:
frequent_itemsets_2 = apriori(retail, min_support=0.5,use_colnames=True)
frequent_itemsets_2



Unnamed: 0,support,itemsets
0,0.666667,(Beer)
1,0.666667,(Chips)
2,0.5,(Diaper)
3,0.666667,(Milk)
4,0.5,"(Beer, Chips)"
5,0.5,"(Beer, Diaper)"


Looking only at support(X→Y), [Beer, Chips] and [Beer, Diaper] are the two frequent baskets of interest.

But which one is more correlated than the other?

In [67]:
association_rules(frequent_itemsets_2, metric='lift',min_threshold=1)



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Beer),(Chips),0.666667,0.666667,0.5,0.75,1.125,0.055556,1.333333,0.333333
1,(Chips),(Beer),0.666667,0.666667,0.5,0.75,1.125,0.055556,1.333333,0.333333
2,(Beer),(Diaper),0.666667,0.5,0.5,0.75,1.5,0.166667,2.0,1.0
3,(Diaper),(Beer),0.5,0.666667,0.5,1.0,1.5,0.166667,inf,0.666667



Clearly, {Diaper, Beer} is the most strongly associated itemset in this data: it has the highest lift (1.5) in both directions.

# Example 3 - Movie Genre Associations

Playing only with basket analysis and imaginary datasets gets a bit boring.

In this example, let's play with an open dataset [MovieLens (small)](https://grouplens.org/datasets/movielens/).

This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies. These data were created by 671 users between January 09, 1995 and October 16, 2016.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

Let's take a look at the data first:

In [68]:
from google.colab import drive
drive.mount('/content/drive')

movies = pd.read_csv("/content/drive/My Drive/Colab Notebooks/Intellipaat/Weekend_ML_Batch/Association_Rule/movies.csv")

# movies = pd.read_csv('ml-latest-small/movies.csv')





In [70]:
movies.head(10)



Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
6,7,Sabrina (1995),Comedy|Romance
7,8,Tom and Huck (1995),Adventure|Children
8,9,Sudden Death (1995),Action
9,10,GoldenEye (1995),Action|Adventure|Thriller


In [71]:
# Pass the column by keyword: the positional axis argument is deprecated in pandas.
movies_ohe = movies.drop(columns='genres').join(movies.genres.str.get_dummies())



In [72]:
movies_ohe.head()



Unnamed: 0,movieId,title,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,...,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,Toy Story (1995),0,0,1,1,1,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2,Jumanji (1995),0,0,1,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3,Grumpier Old Men (1995),0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,0,0
3,4,Waiting to Exhale (1995),0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,0,0
4,5,Father of the Bride Part II (1995),0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [73]:
movies_ohe.shape



(9125, 22)

### Let's get back to analysing the genre associations:

In [74]:
movies_ohe.set_index(['movieId','title'],inplace=True)
movies_ohe



movieId,title,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
1,Toy Story (1995),0,0,1,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,Jumanji (1995),0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,Grumpier Old Men (1995),0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,Waiting to Exhale (1995),0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0
5,Father of the Bride Part II (1995),0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
162672,Mohenjo Daro (2016),0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0
163056,Shin Godzilla (2016),0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
163949,The Beatles: Eight Days a Week - The Touring Years (2016),0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
164977,The Gay Desperado (1936),0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [76]:
movies_ohe.Comedy.value_counts(normalize=True)



0    0.636712
1    0.363288
Name: Comedy, dtype: float64
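
The share of 1s in a one-hot column is the support of that single item, so the supports of all single genres can be read off as column means. Here is a small sketch on the first five movies shown earlier; the names `sample`, `sample_ohe` and `single_support` are only for this illustration.

```python
import pandas as pd

# First five rows of the movies table shown earlier.
sample = pd.DataFrame({
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)',
              'Waiting to Exhale (1995)', 'Father of the Bride Part II (1995)'],
    'genres': ['Adventure|Animation|Children|Comedy|Fantasy',
               'Adventure|Children|Fantasy',
               'Comedy|Romance',
               'Comedy|Drama|Romance',
               'Comedy'],
})

sample_ohe = sample['genres'].str.get_dummies(sep='|')

# The mean of a 0/1 column is the fraction of rows containing that genre,
# i.e. the support of the one-item itemset.
single_support = sample_ohe.mean().sort_values(ascending=False)
print(single_support)
```

On the full dataset, the same idea applied to `movies_ohe` should reproduce the one-item rows of the frequent itemsets table.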

In [77]:
movies_ohe.shape



(9125, 20)

In [82]:
frequent_itemsets_movies = apriori(movies_ohe,use_colnames=True,min_support=0.05)



In [83]:
frequent_itemsets_movies



Unnamed: 0,support,itemsets
0,0.169315,(Action)
1,0.122411,(Adventure)
2,0.06389,(Children)
3,0.363288,(Comedy)
4,0.120548,(Crime)
5,0.054247,(Documentary)
6,0.478356,(Drama)
7,0.071671,(Fantasy)
8,0.09611,(Horror)
9,0.059507,(Mystery)


In [84]:
rules_movies =  association_rules(frequent_itemsets_movies, metric='lift', min_threshold=1)



In [85]:
rules_movies.sort_values('lift',ascending=False)



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Adventure),(Action),0.122411,0.169315,0.058301,0.476276,2.812955,0.037575,1.586111,0.734401
1,(Action),(Adventure),0.169315,0.122411,0.058301,0.344337,2.812955,0.037575,1.338475,0.775868
8,(Crime),(Thriller),0.120548,0.189479,0.057863,0.48,2.533256,0.035022,1.558693,0.688214
9,(Thriller),(Crime),0.189479,0.120548,0.057863,0.305379,2.533256,0.035022,1.266089,0.746744
3,(Action),(Thriller),0.169315,0.189479,0.062904,0.371521,1.960746,0.030822,1.289654,0.589863
2,(Thriller),(Action),0.189479,0.169315,0.062904,0.331984,1.960746,0.030822,1.24351,0.604537
4,(Comedy),(Romance),0.363288,0.169315,0.090082,0.247964,1.464511,0.028572,1.104581,0.49815
5,(Romance),(Comedy),0.169315,0.363288,0.090082,0.532039,1.464511,0.028572,1.360609,0.381827
10,(Drama),(Romance),0.478356,0.169315,0.10126,0.211684,1.250236,0.020267,1.053746,0.383693
11,(Romance),(Drama),0.169315,0.478356,0.10126,0.598058,1.250236,0.020267,1.29781,0.240947


***As we can see, the support and hence the confidence values in this dataset are fairly small, which makes it difficult to interpret the result based on these two values alone. Lift and conviction, on the other hand, remain intuitive and representative. That is why we should understand the meaning of all 5 metrics to interpret the result accurately!***
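
To make this concrete, the sketch below recomputes confidence and lift for two rules using the supports copied from the table above. The rule with the larger support turns out to have the smaller lift. `confidence_and_lift` is a helper name made up for this example.

```python
# (antecedent support, consequent support, joint support), copied from the table above.
rows = {
    ("Adventure", "Action"): (0.122411, 0.169315, 0.058301),
    ("Drama", "Romance"): (0.478356, 0.169315, 0.101260),
}


def confidence_and_lift(antecedent_support, consequent_support, joint_support):
    """Derive confidence and lift of a rule from its three support values."""
    confidence = joint_support / antecedent_support
    lift = confidence / consequent_support
    return confidence, lift


results = {rule: confidence_and_lift(*values) for rule, values in rows.items()}
for rule, (confidence, lift) in results.items():
    print(rule, round(confidence, 3), round(lift, 3))
```

Drama → Romance covers almost twice as many movies as Adventure → Action, yet its lift is much closer to 1, so support alone says little about how strongly two genres are tied together.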

In [86]:
rules_movies[(rules_movies.lift>2)]



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Adventure),(Action),0.122411,0.169315,0.058301,0.476276,2.812955,0.037575,1.586111,0.734401
1,(Action),(Adventure),0.169315,0.122411,0.058301,0.344337,2.812955,0.037575,1.338475,0.775868
8,(Crime),(Thriller),0.120548,0.189479,0.057863,0.48,2.533256,0.035022,1.558693,0.688214
9,(Thriller),(Crime),0.189479,0.120548,0.057863,0.305379,2.533256,0.035022,1.266089,0.746744


* Although we might expect the {Romance, Drama} pair to stand out, it is not as strongly correlated as other pairs such as {Animation, Children}, which have much higher lift and conviction levels.

In [90]:
rules_movies[(rules_movies.lift>2)].sort_values(by=['lift','confidence'], ascending=[False,False])



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Adventure),(Action),0.122411,0.169315,0.058301,0.476276,2.812955,0.037575,1.586111,0.734401
1,(Action),(Adventure),0.169315,0.122411,0.058301,0.344337,2.812955,0.037575,1.338475,0.775868
8,(Crime),(Thriller),0.120548,0.189479,0.057863,0.48,2.533256,0.035022,1.558693,0.688214
9,(Thriller),(Crime),0.189479,0.120548,0.057863,0.305379,2.533256,0.035022,1.266089,0.746744


After subsetting and sorting by lift and confidence:

* The highest correlation: {Animation, Children} correlates in both directions! Recall those Pixar & Disney films that we love watching.
* {Children, Adventure} ...
* {Fantasy, Adventure} ... How to interpret these two pairs?

The best way is to go back to your movies table and check it out!

In [95]:
# pd.options.display.max_rows=50



In [97]:
movies.head()



Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


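To inspect a pair, we filter on one genre and exclude or include the other. For example, here are the movies that are Action but NOT Adventure, followed by those that are both: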

In [91]:
movies[ (movies.genres.str.contains('Action')) & (~movies.genres.str.contains('Adventure'))]



Unnamed: 0,movieId,title,genres
5,6,Heat (1995),Action|Crime|Thriller
8,9,Sudden Death (1995),Action
19,20,Money Train (1995),Action|Comedy|Crime|Drama|Thriller
22,23,Assassins (1995),Action|Crime|Thriller
40,42,Dead Presidents (1995),Action|Crime|Drama
...,...,...,...
9093,159093,Now You See Me 2 (2016),Action|Comedy|Thriller
9099,160080,Ghostbusters (2016),Action|Comedy|Horror|Sci-Fi
9100,160271,Central Intelligence (2016),Action|Comedy
9101,160438,Jason Bourne (2016),Action


In [93]:
movies[ (movies.genres.str.contains('Action')) & (movies.genres.str.contains('Adventure'))]



Unnamed: 0,movieId,title,genres
9,10,GoldenEye (1995),Action|Adventure|Thriller
14,15,Cutthroat Island (1995),Action|Adventure|Romance
42,44,Mortal Kombat (1995),Action|Adventure|Fantasy
80,86,White Squall (1996),Action|Adventure|Drama
87,95,Broken Arrow (1996),Action|Adventure|Thriller
...,...,...,...
9095,159690,Teenage Mutant Ninja Turtles: Out of the Shadows (2016),Action|Adventure|Comedy
9103,160563,The Legend of Tarzan (2016),Action|Adventure
9114,161594,Kingsglaive: Final Fantasy XV (2016),Action|Adventure|Animation|Drama|Fantasy|Sci-Fi
9116,161918,Sharknado 4: The 4th Awakens (2016),Action|Adventure|Horror|Sci-Fi
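
The filters above match substrings of the raw `genres` string, and `str.contains` treats its pattern as a regular expression by default, which can surprise you with a label such as `(no genres listed)`. An alternative sketch, shown here on a few rows only, filters on one-hot columns instead. The names `sample`, `flags`, `action_only` and `action_adventure` are introduced just for this example.

```python
import pandas as pd

sample = pd.DataFrame({
    'title': ['Heat (1995)', 'Sudden Death (1995)', 'GoldenEye (1995)', 'Jumanji (1995)'],
    'genres': ['Action|Crime|Thriller', 'Action', 'Action|Adventure|Thriller',
               'Adventure|Children|Fantasy'],
})

# One boolean column per genre, aligned with the rows of `sample`.
flags = sample['genres'].str.get_dummies(sep='|').astype(bool)

# Action but not Adventure.
action_only = sample[flags['Action'] & ~flags['Adventure']]
# Action and Adventure together.
action_adventure = sample[flags['Action'] & flags['Adventure']]

print(action_only['title'].tolist())
print(action_adventure['title'].tolist())
```

Because each genre is its own column, the match is exact and no regular-expression escaping is needed.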


So, well, what are these movies? I hardly know any of them... (which proves again that domain knowledge is of utmost importance in data science!)

Voilà, I know ***Tomorrowland (2015)***! We all know this movie, so we can sort of understand why {Children, Adventure} is an associated pair. It is not an animation, but its interesting fantasy storyline about discovering the secrets of a mystical place succeeded in appealing to young boys and girls.

There is more to discover. Try finding an interesting pair on your own!

# Summary

To recap, a straightforward 4-step approach to association rules:

1. One-hot encode the baskets in a dataframe.
2. Generate frequent itemsets using `apriori`.
3. Generate rules with `association_rules`.
4. Interpret & evaluate the result with the metrics.

### References:
1. [Introduction to Market Basket Analysis in Python](http://pbpython.com/market-basket-analysis.html)
2. [Movie genre associations](https://mathematicaforprediction.wordpress.com/2013/10/06/movie-genre-associations/)
3. [Mining Association Rules](https://paginas.fe.up.pt/~ec/files_0506/slides/04_AssociationRules.pdf)
4. [Association Rules Generation from Frequent Itemsets](https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/)
5. F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

