# Association Rule Mining

Association rule mining uses pattern recognition to identify and quantify relationships between different but related items.

Say, for example, eggs and bread are frequently purchased together. With that finding you can increase sales by:

- Placing eggs and bread next to each other so that a customer who buys one does not have to walk far to pick up the other.
- Advertising to buyers of either eggs or bread to increase their propensity to purchase the paired product.
- Offering discounts on eggs and bread when a customer buys both in one purchase.

*Association rule:*

"If eggs are purchased, then the probability of also buying bread is ___"

*Can also be represented:*

{eggs} -> {bread}

## Advantages 

- Fast 
- Works with small quantities of data 
- Little (if any) feature engineering required

> Feature engineering is the process of transforming raw data into predictive features that fit the requirements of a machine learning model and/or improve its performance.

## Three Ways to Measure Association

1. Support
2. Confidence
3. Lift

*Illustrating with Bread and Eggs*

Scenario: 5000 total transactions in a supermarket

- `A` = Bread purchases = 500 transactions
- `C` = Eggs purchases = 350 transactions
- `A -> C` = Purchases containing both bread and eggs = 150 transactions

### Support

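Support is the relative frequency of an item (or itemset) within a dataset.  
The support of a rule is the support of the combined itemset, that is, the fraction of transactions that contain both A and C: `support(A->C) = support(A ∪ C)`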

Example: support(bread) = (transactions containing bread)/(total number of transactions) = 500/5000 = 0.1 
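
The support calculation can be checked with a few lines of Python. This is a minimal sketch using the counts from the scenario above; the helper name `support` and the variable names are assumptions made for illustration.

```python
# Counts taken from the supermarket scenario above.
total_transactions = 5000
bread_count = 500   # transactions containing bread (A)
eggs_count = 350    # transactions containing eggs (C)
both_count = 150    # transactions containing both bread and eggs


def support(count, total):
    """Fraction of all transactions that contain the item or itemset."""
    return count / total


support_bread = support(bread_count, total_transactions)
support_eggs = support(eggs_count, total_transactions)
support_rule = support(both_count, total_transactions)  # support(A->C)

print(round(support_bread, 2), round(support_eggs, 2), round(support_rule, 2))
```

The rule support (150/5000) is the value reused in the confidence calculation.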

### Confidence

Confidence is the probability of seeing the consequent item (the "then" term) in a transaction, given that the transaction also contains the antecedent item (the "if" term).

- `IF` one item is purchased,
- `THEN` how likely is it that another item is also purchased?

In other words, confidence measures how often the if-then statement holds among the transactions that contain the antecedent.

`confidence(A->C) = support(A->C) / support(A)`
> Where A is the antecedent and C is the consequent

Example: confidence(bread -> eggs) = (150/5000) / (500/5000) = 0.3

So there is a 30% likelihood that eggs will be bought if bread is purchased.
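
The same arithmetic as a small Python sketch, assuming the counts from the scenario; the function name `confidence` is a hypothetical helper, not part of any library.

```python
total_transactions = 5000
bread_count = 500   # transactions containing the antecedent (bread)
both_count = 150    # transactions containing bread and eggs


def confidence(both_count, antecedent_count, total):
    """confidence(A->C) = support(A->C) / support(A)."""
    support_rule = both_count / total
    support_antecedent = antecedent_count / total
    return support_rule / support_antecedent


conf_bread_eggs = confidence(both_count, bread_count, total_transactions)
print(round(conf_bread_eggs, 2))  # 0.3
```

Because the total cancels out, the result equals 150/500: of the transactions that contain bread, 30% also contain eggs.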

### Lift 

Lift measures how much more often the antecedent and consequent occur together than would be expected if they occurred independently.

`lift(A->C) = confidence(A->C)/support(C)`

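Example: lift(bread -> eggs) = 0.3 / (350/5000) = 0.3 / 0.07 ≈ 4.29

With a lift score of about 4.29, customers who buy bread are more than four times as likely to buy eggs as the average customer, so we can conclude that: "If a customer buys bread, then they are more likely to also buy eggs"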

#### Lift Scores 

- `>1` = A and C are positively associated. If A is purchased, C is more likely to be purchased than it is on average.
- `<1` = A and C are negatively associated. If A is purchased, C is less likely to be purchased than it is on average.
- `1` = There is no association between A and C.
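
The lift calculation and the score bands can be sketched together in Python. The helpers `lift` and `describe_lift` are hypothetical names introduced for this example, and the tolerance used to decide "equal to 1" is an arbitrary assumption.

```python
def lift(both_count, antecedent_count, consequent_count, total):
    """lift(A->C) = confidence(A->C) / support(C)."""
    confidence = both_count / antecedent_count
    support_consequent = consequent_count / total
    return confidence / support_consequent


def describe_lift(value, tolerance=1e-9):
    """Map a lift score to the three bands described above."""
    if abs(value - 1) <= tolerance:
        return "no association"
    return "positive association" if value > 1 else "negative association"


# Bread -> eggs, using the scenario counts: 150 joint, 500 bread, 350 eggs, 5000 total.
lift_bread_eggs = lift(150, 500, 350, 5000)
print(round(lift_bread_eggs, 2), describe_lift(lift_bread_eggs))  # 4.29 positive association
```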

## Apriori

The Apriori algorithm is a commonly used algorithm for implementing association rule mining over structured data.
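
To give a feel for what Apriori does, here is a minimal pure-Python sketch of its level-wise search for frequent itemsets. It is an illustration under stated assumptions, not the mlxtend implementation: the function name `apriori_frequent_itemsets`, the toy transactions, and the 0.4 support threshold are all made up for this example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support_of(itemset):
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support_of(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = keep_frequent(singles)
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Toy data, invented for illustration only.
toy_transactions = [
    {"bread", "eggs"},
    {"bread", "eggs", "milk"},
    {"bread"},
    {"milk"},
    {"eggs", "milk"},
]
frequent = apriori_frequent_itemsets(toy_transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search manageable: an itemset can only be frequent if all of its subsets are frequent, so `{bread, eggs, milk}` is discarded without counting because `{bread, milk}` falls below the threshold.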

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
```

### Data Loading

```python
# Load the groceries transactions: one row per basket, one item per column.
data = pd.read_csv('../../inputs/groceries.csv')
data.head()
```

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | citrus fruit | semi-finished bread | margarine | ready soups | | | | | |
| 1 | tropical fruit | yogurt | coffee | | | | | | |
| 2 | whole milk | | | | | | | | |
| 3 | pip fruit | yogurt | cream cheese | meat spreads | | | | | |
| 4 | other vegetables | whole milk | condensed milk | long life bakery product | | | | | |


### Data Conversion

```python
# One-hot encode every (column position, item) pair.
basket_sets = pd.get_dummies(data)
basket_sets.head()
```

Unnamed: 0,1_Instant food products,1_UHT-milk,1_artif. sweetener,1_baby cosmetics,1_bags,1_baking powder,1_bathroom cleaner,1_beef,1_berries,1_beverages,1_bottled beer,1_bottled water,1_brandy,1_brown bread,1_butter,1_butter milk,1_candy,1_canned beer,1_canned fish,1_canned fruit,1_canned vegetables,1_cat food,1_chewing gum,1_chicken,1_chocolate,1_chocolate marshmallow,1_citrus fruit,1_cleaner,1_cling film/bags,1_coffee,1_condensed milk,1_cookware,1_cream cheese,1_curd,1_curd cheese,1_decalcifier,1_dental care,1_dessert,1_detergent,1_dish cleaner,...,9_pastry,9_pet care,9_photo/film,9_pickled vegetables,9_popcorn,9_pot plants,9_potato products,9_processed cheese,9_red/blush wine,9_rice,9_rolls/buns,9_root vegetables,9_rubbing alcohol,9_rum,9_salt,9_salty snack,9_sauces,9_seasonal products,9_semi-finished bread,9_shopping bags,9_skin care,9_sliced cheese,9_soda,9_soft cheese,9_sparkling wine,9_specialty bar,9_specialty cheese,9_specialty vegetables,9_spread cheese,9_sugar,9_sweet spreads,9_tea,9_vinegar,9_waffles,9_whipped/sour cream,9_white bread,9_white wine,9_whole milk,9_yogurt,9_zwieback
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
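
Note that calling `pd.get_dummies` on the wide table prefixes each item with its column position, so the same product can end up in several columns (for example `1_whole milk` and `9_whole milk`). A possible alternative, sketched below on a tiny hypothetical stand-in for the groceries table, is to stack the item columns first so that each product gets exactly one column per basket. The three-row DataFrame is invented for illustration.

```python
import pandas as pd

# Tiny hypothetical stand-in for the wide groceries table
# (one row per transaction, items spread across positional columns).
data = pd.DataFrame({
    1: ["citrus fruit", "tropical fruit", "whole milk"],
    2: ["margarine", "yogurt", None],
    3: [None, "whole milk", None],
})

# Stack the item columns into one long Series indexed by (row, column),
# dropping the empty cells.
items = data.stack().dropna()

# One-hot encode the item names, then collapse back to one row per transaction.
basket_sets = pd.get_dummies(items).groupby(level=0).max().astype(bool)
print(basket_sets)
```

In this layout `whole milk` is a single boolean column regardless of where it appeared in the basket, which is the one-hot shape that mlxtend's `apriori` expects as input.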
