# Association Analysis (Market Basket)
## Install and Load Packages


In [2]:
import apriori
import os

## Association Analysis with apriori

The Apriori algorithm finds the frequent itemsets in the data, meaning sets of items that appear together in at least a minimum share of transactions, and derives association rules between the items.

The algorithm uses a "bottom-up" approach: frequent itemsets are extended one item at a time (candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further frequent extensions can be found.
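The bottom-up search can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the `apriori` module used in this notebook: the function name `frequent_itemsets`, the `min_support` parameter, and the toy transactions are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Hypothetical helper for illustration only.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Share of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep the single items that are frequent on their own
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Test the surviving candidates against the data
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1  # the loop stops once a level yields no frequent itemsets
    return frequent


# Toy data (made up for this sketch)
toy_transactions = [
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"milk", "bread", "yogurt"},
    {"bread"},
]
result = frequent_itemsets(toy_transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print("{0} - {1:.2f}".format(", ".join(sorted(itemset)), s))
```

On this toy data the pair bread/yogurt falls below the threshold, so the only three-item candidate is pruned without being counted, and the search stops at that level.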

We'll use the "groceries.csv" dataset and set the minimum confidence to 0.05.

In [3]:
# Build the path portably instead of hard-coding a Windows separator
data_path = os.path.join(os.getcwd(), "groceries.csv")
item_supports, rules = apriori.run_apriori(data_path, min_confidence=0.05)

### Support
The support of an item I is the number of transactions containing I divided by the total number of transactions.

We can check the five items with the highest levels of support in the transaction dataset.

In [4]:
for items, support in item_supports[: 5]:
    print("{0} - {1:.2f}".format(", ".join(items), support))

whole milk - 0.26
other vegetables - 0.19
rolls/buns - 0.18
soda - 0.17
yogurt - 0.14


The first line means that whole milk appears in 26% of the transactions in this dataset.
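To make the definition concrete, here is a small sketch that computes support by hand. The `support` helper and the four baskets are made up for illustration and are not taken from "groceries.csv" or from the `apriori` module.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in itemset (hypothetical helper)."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Toy baskets (assumed data, not the real dataset)
baskets = [
    ["whole milk", "yogurt"],
    ["whole milk", "rolls/buns"],
    ["soda"],
    ["whole milk", "other vegetables", "yogurt"],
]

milk_support = support(baskets, ["whole milk"])
print("whole milk - {0:.2f}".format(milk_support))  # 3 of 4 baskets -> 0.75
```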
### Confidence
Confidence is the proportion of transactions containing item I1 in which item I2 also appears. The confidence of the rule I1 => I2 is the number of transactions containing both I1 and I2 divided by the number of transactions containing I1.

We can also check the five rules with the highest levels of confidence.

In [5]:
# Each entry pairs (antecedent, consequent) with the rule's confidence
for items, confidence in rules[:5]:
    print("{0} => {1} - {2:.2f}".format(", ".join(items[0]), ", ".join(items[1]), confidence))

yogurt => whole milk - 0.40
other vegetables => whole milk - 0.39
rolls/buns => whole milk - 0.31
whole milk => other vegetables - 0.29
whole milk => rolls/buns - 0.22


The first line means that of all the baskets with yogurt, 40% also contained whole milk.
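The confidence formula can be checked by hand on a few baskets. The sketch below assumes a hypothetical `confidence` helper and made-up baskets. Returning 0.0 when the antecedent never occurs is a choice made for this example, not something the notebook's `apriori` module is known to do.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent => consequent (hypothetical helper)."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions that contain the antecedent
    with_antecedent = [set(t) for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0  # assumption: undefined confidence is reported as 0
    # Of those, count the ones that also contain the consequent
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)


# Toy baskets (assumed data, not the real dataset)
baskets = [
    ["whole milk", "yogurt"],
    ["whole milk", "rolls/buns"],
    ["soda"],
    ["whole milk", "other vegetables", "yogurt"],
]

yogurt_to_milk = confidence(baskets, ["yogurt"], ["whole milk"])
milk_to_yogurt = confidence(baskets, ["whole milk"], ["yogurt"])
print("yogurt => whole milk - {0:.2f}".format(yogurt_to_milk))  # 2 of 2 -> 1.00
print("whole milk => yogurt - {0:.2f}".format(milk_to_yogurt))  # 2 of 3 -> 0.67
```

The two results differ because confidence is not symmetric: the denominator is the count of baskets containing the left-hand side of the rule.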