# Hands-On Exercise 8.1:
# Generating Association Rules From Transaction Data
***

## Objectives

#### In this exercise, you will perform association rule mining with Python. You will analyze transaction data by identifying items that frequently occur together in the data set. The goal is to show how association rule mining can uncover relationships between seemingly unrelated items.

### Overview

You will work on the Groceries data set (*Groceries.csv*). You will:

● Use the Apriori algorithm on the data set to mine association rules <br>
● Evaluate the derived rules through their measures of support, confidence,
and lift
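
The three measures named above can be computed by hand on a tiny example. The sketch below is only an illustration: the transactions are invented (they are not taken from the exercise data set), and the helper functions `support`, `confidence`, and `lift` are hypothetical names, not part of any library.

```python
# Toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule: bread -> milk
s = support({"bread", "milk"}, transactions)       # 3 of 5 transactions contain both
c = confidence({"bread"}, {"milk"}, transactions)  # 3 of the 4 bread transactions also contain milk
l = lift({"bread"}, {"milk"}, transactions)        # confidence relative to milk's overall support

print(round(s, 4), round(c, 4), round(l, 4))
```

A lift below 1, as in this toy rule, suggests that the antecedent makes the consequent slightly less likely than it is overall; a lift above 1 suggests the opposite.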

1. ❏ Import the **csv** and **pandas** libraries


In [None]:
import csv
import pandas as pd

2. ❏ Read the transactions from the external dataset *Groceries.csv* into a list of lists, one inner list of items per transaction

In [None]:
with open('Groceries.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    # Each CSV row is one transaction: a list of item names
    data = [list(rec) for rec in reader]

3. ❏ Import the **TransactionEncoder** class from **mlxtend.preprocessing**

In [None]:
from mlxtend.preprocessing import TransactionEncoder

4. ❏ Transform the unique labels in the list into a one-hot encoded array and convert them into a dataframe for display

In [None]:
te = TransactionEncoder()
te_ary = te.fit(data).transform(data)

df = pd.DataFrame(te_ary, columns=te.columns_)
df.head()
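
To see what the one-hot encoding in this step produces, here is a minimal plain-Python sketch of the same idea. It is not how **TransactionEncoder** is implemented; the toy transactions and the names `toy_data`, `columns`, and `encoded` are assumptions made for illustration.

```python
# Hypothetical toy transactions; the exercise itself reads them from Groceries.csv.
toy_data = [["milk", "bread"], ["bread"], ["butter", "milk"]]

# One column per unique item, in sorted order.
columns = sorted({item for transaction in toy_data for item in transaction})

# One row per transaction: True if the item is present, False otherwise.
encoded = [[item in transaction for item in columns] for transaction in toy_data]

print(columns)
for row in encoded:
    print(row)
```

Each row corresponds to one transaction and each column to one item, which is the shape the **apriori()** function expects.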

5. ❏ Import the **apriori** and **association_rules** functions from **mlxtend.frequent_patterns**

In [None]:
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

6. ❏ Generate itemsets using the **apriori()** function <br><br>
*Hint: Try different values for min_support to see different results. For example, try a min_support of .07 and then try smaller and greater values to see different results*

In [None]:
frequent_itemsets = apriori(df, min_support=0.07, use_colnames=True)
frequent_itemsets
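
The effect of min_support can be seen on a small example. The sketch below is a brute-force illustration that checks every possible itemset; it is not the Apriori algorithm (which prunes candidates) and not the mlxtend implementation. The toy transactions and the function name `frequent_itemsets_bruteforce` are hypothetical.

```python
from itertools import combinations

# Toy transactions, invented for illustration only.
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def frequent_itemsets_bruteforce(transactions, min_support):
    """Return {itemset: support} for every itemset whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = share of transactions containing all items of the candidate.
            s = sum(set(candidate) <= t for t in transactions) / n
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

loose = frequent_itemsets_bruteforce(toy, min_support=0.2)
strict = frequent_itemsets_bruteforce(toy, min_support=0.6)

print(len(loose), len(strict))
```

Raising the threshold keeps only the itemsets that appear in a larger share of transactions, so the stricter call returns fewer itemsets than the looser one.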

7. ❏ Use a bar chart to visualize the itemset frequencies<br><br>
*Hint: You may need to set an appropriate value for min_support in step 6 to avoid a cluttered visualization*<br><br>
*Hint 2: If the visualization doesn't appear when you first execute the cell, try re-executing it*

In [None]:
frequent_itemsets.plot.bar(x='itemsets', y='support')

8. ❏ Explore the sizes of the transactions by summing each row (axis=1) and using the **.value_counts()** method to count the number of transactions of each size. The result could then be stored in a dataframe and transposed for easier display

In [None]:
pd.DataFrame(df.sum(axis=1).value_counts()).T

9. ❏ Replace **.value_counts()** from the previous step with **.describe()** to generate summary statistics on the transaction sizes

In [None]:
df.sum(axis=1).describe()

10. ❏ Use the **.sum()** method on the entire dataframe to explore the frequency of each item across transactions

In [None]:
df.sum()

11. ❏ Divide the summed values in step 10 by the number of rows to explore the proportion of transactions that contain each item<br><br>
*Hint: .shape[0] will return the number of rows*

In [None]:
df.sum()/df.shape[0]

12. ❏ Find association rules in the dataset using the **association_rules()** function and a **confidence** metric

In [None]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0)
rules

13. ❏ Sort the rules by **lift**

In [None]:
rules.sort_values(by=['lift'], ascending=False).head()

14. ❏ Find rules that have a chocolate antecedent<br><br>
*Hint: You may need to lower the min_support value in step 6 considerably (e.g., min_support=.01) and re-execute steps 6 and 12 in order to generate any rules in this step*

In [None]:
rules[rules['antecedents'] == {'chocolate'}]
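
The equality test in this step keeps only rules whose antecedent is exactly chocolate. If you also want rules where chocolate appears alongside other items in the antecedent, one possible approach is a membership test. The sketch below uses an invented stand-in dataframe named `toy_rules`, with made-up lift values, so that it runs on its own; with real results you would apply the same filter to `rules`.

```python
import pandas as pd

# Invented stand-in for the `rules` dataframe; the numbers are not real results.
toy_rules = pd.DataFrame({
    "antecedents": [
        frozenset({"chocolate"}),
        frozenset({"chocolate", "soda"}),
        frozenset({"milk"}),
    ],
    "consequents": [
        frozenset({"milk"}),
        frozenset({"milk"}),
        frozenset({"chocolate"}),
    ],
    "lift": [1.2, 1.5, 1.2],
})

# True for every rule whose antecedent set contains chocolate, alone or with other items.
contains_chocolate = toy_rules["antecedents"].apply(lambda items: "chocolate" in items)
chocolate_rules = toy_rules[contains_chocolate]

print(chocolate_rules)
```

The third toy rule is excluded because chocolate appears only in its consequent.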