# Objective


Market Basket Analysis is a powerful tool for translating vast amounts of customer transaction and viewing data into simple rules for product promotion and recommendation. The purpose of this notebook is to learn how to perform Market Basket Analysis using the Apriori algorithm, standard and custom metrics, association rules, aggregation and pruning, and visualization. It also reinforces these skills by building recommendations for a small grocery store, a library, an e-book seller, a novelty gift retailer, and a movie streaming service.

# Market Basket Analysis

This section covers the basics of Market Basket Analysis: association rules, metrics, and pruning. You’ll then apply these concepts to help a small grocery store improve its promotional and product placement efforts.

## What is market basket analysis?

1. Identify products frequently purchased together
    - Example: biography and history books
2. Construct recommendations based on these findings
    - Example: place the biography and history sections together

## Use cases
- Construct association rules
    - {antecedent} -> {consequent}
- Identify items frequently purchased
    

In [None]:
import pandas as pd

books = pd.read_csv('/kaggle/input/market-basket-analysis-dataset/bookstore_transactions.csv')
print(books.head(2))

In [None]:
transactions = books['Transaction'].apply(lambda t: t.split(','))
transactions = list(transactions)
transactions

## The basics of market basket analysis

Market basket analysis uses lists of transactions to identify useful associations between items. Such associations can be written in the form of a rule that has an antecedent and a consequent. Let's assume a small grocery store has asked you to look at their transaction data. After some analysis, you find the rule given below.

`{cereal} → {milk}`

Which statement about this rule is correct?

`{cereal} is the antecedent, {milk} is the consequent, and both are items.`

## Cross-selling products

The small grocery store has decided to cross-sell chewing gum with either coffee, cereal, or bread. To determine which of the three items is best to use, the store owner has performed an experiment. For one week, she sold chewing gum next to the register and recorded all transactions where it was purchased with either coffee, cereal, or bread. The transactions from that week are available as a list of lists named `transactions`. Each transaction is either `['coffee','gum']`, `['cereal','gum']`, or `['bread','gum']`.

The cell below applies the same counting approach to the bookstore data loaded earlier, with `Bookmark` playing the role of gum and `History`, `Biography`, and `Fiction` as the candidate items.

In [None]:
history = transactions.count(['History', 'Bookmark'])
biography = transactions.count(['Biography', 'Bookmark'])
fiction = transactions.count(['Fiction', 'Bookmark'])

print('history:', history)
print('biography:', biography)
print('fiction:', fiction)

## Identifying association rules

### Association rules
- Association rule
    - Contains an antecedent and a consequent
        - {health} -> {cooking}
- Multi-antecedent rule
    - {humor, travel} -> {language}
- Multi-consequent rule
    - {biography} -> {history, language}


### Generating rules with itertools

In [None]:
from itertools import permutations

flattened = [item for transaction in transactions for item in transaction]
items = list(set(flattened))

In [None]:
rules = list(permutations(items,2))
print(rules)

In [None]:
print(len(rules))

### MLxtend package

```python
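import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# apriori expects a one-hot encoded DataFrame, not a list of lists
encoder = TransactionEncoder().fit(transactions)
onehot = pd.DataFrame(encoder.transform(transactions), columns=encoder.columns_)

# Keep itemsets of at most two items that appear in at least 0.1% of transactions
frequent_itemsets = apriori(onehot, min_support=0.001, max_len=2, use_colnames=True)

# Keep rules whose lift is at least 1.0
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1.0)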
```


## Multiple antecedents and consequents

Market basket analysis revolves around the use of association rules, which are if-then statements about the relationship between two sets of items. The rule {coffee} → {milk}, for instance, is read as "if coffee then milk," where coffee is the antecedent and milk is the consequent. Many rules have multiple antecedents and consequents.

- Multiple antecedents: `{sugar, flour} -> {sweet puff}`
- Multiple consequents: `{tea} -> {milk, biscuit}`
- Multiple antecedents and consequents: `{biscuit, jam} -> {milk, cereal}`
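
As a rough illustration of how multi-antecedent and multi-consequent rules arise, the sketch below splits a single itemset into every possible antecedent/consequent pair using `itertools.combinations`. The helper name `rules_from_itemset` and the three-item example are assumptions made for this illustration; they are not part of the original notebook.

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """Yield every split of `itemset` into a non-empty antecedent and consequent."""
    items = sorted(itemset)
    # Antecedent sizes run from 1 to len(items) - 1 so both sides stay non-empty
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

# Hypothetical itemset, chosen only for illustration
split_rules = list(rules_from_itemset({'biscuit', 'jam', 'milk'}))

for antecedent, consequent in split_rules:
    print(set(antecedent), '->', set(consequent))

print(len(split_rules))
```

An itemset with k items produces 2^k - 2 such rules, which is one reason the number of candidate rules grows so quickly and pruning becomes necessary.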

## Generating association rules

As you saw, the function `permutations` from the module `itertools` can be used to quickly generate the set of all one-antecedent, one-consequent rules. You do not, of course, know which of these rules are useful. You simply know that each is a valid way to combine two items.

Let's practice generating and counting the set of all rules for a subset of the grocery dataset: coffee, tea, milk, and sugar.
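
A minimal sketch of that exercise, assuming the four items are simply given as a Python list (the grocery dataset itself is not loaded here):

```python
from itertools import permutations

# The four-item subset named in the text, typed in by hand for illustration
grocery_items = ['coffee', 'tea', 'milk', 'sugar']

# Each ordered pair (antecedent, consequent) is one candidate rule
grocery_rules = list(permutations(grocery_items, 2))

for antecedent, consequent in grocery_rules:
    print('{' + antecedent + '} -> {' + consequent + '}')

# n items yield n * (n - 1) ordered pairs
print(len(grocery_rules))
```

Because order matters, `{coffee} -> {tea}` and `{tea} -> {coffee}` are counted as two distinct rules.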

## The simplest metric

### Metrics and Pruning

- A metric is a measure of performance for rules
    - {humor} -> {poetry}: 0.81
- Pruning is the use of metrics to discard weak rules
    - Retain {humor} -> {poetry}
        
- The support metric measures the share of transactions that contain the itemset
    - number of transactions with item(s) / number of transactions 
    - number of transactions with milk / total transactions 
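
The support formula above can be sketched directly on a list of transactions. The toy transactions and the helper name `support` below are assumptions made for illustration only.

```python
# Toy transactions, invented for illustration only
toy_transactions = [
    ['milk', 'bread'],
    ['milk', 'coffee'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]

def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

support_milk = support(['milk'], toy_transactions)
support_milk_bread = support(['milk', 'bread'], toy_transactions)

print(support_milk)        # milk appears in 3 of 4 transactions
print(support_milk_bread)  # milk and bread appear together in 2 of 4
```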


## One-hot encoding transaction data

We will use a common pipeline for preprocessing data for use in market basket analysis. The first step is to import a pandas DataFrame and select the column that contains transactions. Each transaction in the column is a string that consists of a number of items, each separated by a comma. The next step is to use a lambda function to split each transaction string into a list, thereby transforming the column into a list of lists.

This list of lists is already available as `transactions`. You will now transform `transactions` into a one-hot encoded DataFrame, where each column consists of `True` and `False` values that indicate whether an item was included in a transaction.

In [None]:
from mlxtend.preprocessing import TransactionEncoder
encoder = TransactionEncoder().fit(transactions)
onehot = encoder.transform(transactions)

In [None]:
onehot = pd.DataFrame(onehot, columns=encoder.columns_)
print(onehot)

### Computing the support metric

You one-hot encoded transactions as the DataFrame onehot. Here you'll make use of that DataFrame and the support metric to help the owner. First, she has asked you to identify frequently purchased items, which you'll do by computing support at the item-level. And second, she asked you to check whether the rule {Fiction} → {Poetry} has a support of over 0.05. Note that onehot has been defined and is available. 

### Computing support for single items

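Because each column of `onehot` is Boolean, the column mean equals the share of transactions that contain the item, so the `mean()` method returns item-level support.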

In [None]:
print(onehot.mean())

In [None]:
import numpy as np

onehot['Fiction+Poetry'] = np.logical_and(onehot['Fiction'],onehot['Poetry'])

print(onehot.mean())

# Association Rules

Association rules tell us that two or more items are related. Metrics allow us to quantify the usefulness of those relationships. This section covers six metrics for evaluating association rules: support, confidence, lift, conviction, leverage, and Zhang's metric. You’ll then use association rules and metrics to assist a library and an e-book seller.


### Confidence and lift

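> Confidence({Milk} → {Coffee}) = support(Milk & Coffee) / support(Milk) = 0.20 / 1.0 = 0.20

Because milk appears in every transaction (a support of 1.0), the confidence of the rule equals the joint support of milk and coffee. Knowing that a customer bought milk therefore tells us nothing extra about whether they also bought coffee.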
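
The arithmetic above can be reproduced with a few lines of plain Python. The two Boolean lists below are invented so that the numbers match the example: milk is in all five transactions and coffee in exactly one.

```python
# Toy one-hot columns, invented to match the 0.20 / 1.0 example
milk = [True, True, True, True, True]
coffee = [True, False, False, False, False]

n = len(milk)
support_milk = sum(milk) / n
support_both = sum(m and c for m, c in zip(milk, coffee)) / n

# Confidence of {Milk} -> {Coffee}: joint support divided by antecedent support
confidence_milk_to_coffee = support_both / support_milk

print(support_milk, support_both, confidence_milk_to_coffee)
```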

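**Lift**: Lift provides another metric for evaluating the relationship between items. A lift greater than 1 indicates that the items occur together more often than independence would predict.
- Numerator: proportion of transactions that contain both X and Y
- Denominator: proportion expected if X and Y were assigned randomly and independently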
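
A minimal sketch of the lift calculation, assuming each item is represented as a list of Booleans with one entry per transaction. The function name `lift` and both toy examples are assumptions for illustration.

```python
def lift(x, y):
    """Lift of X and Y: support(X & Y) / (support(X) * support(Y))."""
    n = len(x)
    support_x = sum(x) / n
    support_y = sum(y) / n
    support_xy = sum(a and b for a, b in zip(x, y)) / n
    return support_xy / (support_x * support_y)

# Toy case 1: the two items always appear together
always_together = lift([True, True, False, False],
                       [True, True, False, False])

# Toy case 2: the two items co-occur exactly as often as chance predicts
independent = lift([True, True, False, False],
                   [True, False, True, False])

print(always_together, independent)
```

Unlike confidence, lift is symmetric: swapping X and Y gives the same value.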


### Recommending books with support

A library wants to get members to read more and has decided to use market basket analysis to figure out how. They approach you to do the analysis. You are given the data in one-hot encoded format in a pandas DataFrame called books.

Each column in the DataFrame corresponds to a book and has the value `True` if the book is contained in a reader's library and is rated highly. To make things simpler, we'll work with shortened book names: Hunger, Potter, and Twilight. The cells below run the same pairwise support computation on the bookstore categories in `onehot`.

In [None]:
supportBF = np.logical_and(onehot['Biography'], onehot['Fiction']).mean()
supportBF

In [None]:
supportBP = np.logical_and(onehot['Biography'], onehot['Poetry']).mean()
supportBP

In [None]:
supportBH = np.logical_and(onehot['Biography'], onehot['History']).mean()
supportBH

In [None]:
supportPH = np.logical_and(onehot['Poetry'], onehot['History']).mean()
supportPH

> Output for the library's books:
- Hunger Games and Harry Potter: 0.12
- Hunger Games and Twilight: 0.09
- Harry Potter and Twilight: 0.14

Based on the support metric, Harry Potter and Twilight appear to be the best options for cross-promotion. In the next problem, we'll consider whether we should use Harry Potter to promote Twilight or Twilight to promote Harry Potter.

### Refining support with confidence

After reporting your findings, the library asks you about the direction of the relationship. Should they use Harry Potter to promote Twilight or Twilight to promote Harry Potter?

After thinking about this, you decide to compute the confidence metric, which has a direction, unlike support. You'll compute it for both {Potter} → {Twilight} and {Twilight} → {Potter}.

In [None]:
# Note: `books` here is the library's one-hot encoded DataFrame described above,
# not the bookstore transactions loaded at the start of the notebook

# Compute support for Potter and Twilight
supportPT = np.logical_and(books['Potter'], books['Twilight']).mean()

# Compute support for Potter
supportP = books['Potter'].mean()

# Compute support for Twilight
supportT = books['Twilight'].mean()

# Compute the confidence of {Potter} → {Twilight} and {Twilight} → {Potter}
confidencePT = supportPT / supportP
confidenceTP = supportPT / supportT

# Print results
print('{0:.2f}, {1:.2f}'.format(confidencePT, confidenceTP))

> Even though the support is identical for the two association rules, the confidence is much higher for {Twilight} → {Potter}. Twilight has a lower support than Harry Potter, so dividing the joint support by it gives a larger value.

# Aggregation and Pruning


The fundamental problem of Market Basket Analysis is determining how to translate vast amounts of customer decisions into a small number of useful rules. This process typically starts with the application of the Apriori algorithm and involves the use of additional strategies, such as pruning and aggregation. You will learn how to use these methods and ultimately apply them in exercises where you assist a retailer in selecting a physical store layout and performing product cross-promotions.
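
To make aggregation and pruning concrete, here is a small sketch in plain Python. It assumes a hypothetical item-to-category mapping and a hypothetical minimum-support threshold; none of the item names or values come from the datasets used in this notebook.

```python
# Hypothetical item-to-category mapping and transactions, for illustration only
category_of = {
    'bagel': 'bread', 'baguette': 'bread',
    'espresso': 'coffee', 'latte': 'coffee',
}
item_transactions = [
    ['bagel', 'latte'],
    ['baguette', 'espresso'],
    ['bagel', 'baguette'],
    ['latte'],
]

# Aggregation: replace each item with its category and drop duplicates
category_transactions = [
    sorted({category_of[item] for item in t}) for t in item_transactions
]

def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset.issubset(t) for t in transactions) / len(transactions)

# Item-level pairs are rare, but the aggregated pair is much more common
support_bagel_latte = support(['bagel', 'latte'], item_transactions)
support_bread_coffee = support(['bread', 'coffee'], category_transactions)

# Pruning: keep the pair only if it clears an assumed support threshold
min_support = 0.5
keep_bread_coffee = support_bread_coffee >= min_support

print(support_bagel_latte, support_bread_coffee, keep_bread_coffee)
```

Aggregating to categories trades detail for stronger, more stable signals, which can be more useful for decisions such as store layout.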


# Visualizing Rules

This section shows how visualizations are used to guide the pruning process and summarize final results, which typically take the form of itemsets or rules. You’ll master the three most useful visualizations (heatmaps, scatterplots, and parallel coordinates plots) and apply them to assist a movie streaming service.

**Important links:**
- [Instacart Market Basket Analysis](https://medium.com/kaggle-blog/instacart-market-basket-analysis-feda2700cded)
- [Food Discovery with Uber Eats: Recommending for the Marketplace](https://eng.uber.com/uber-eats-recommending-marketplace/)