In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
import seaborn as sns

In [2]:
df_2010 = pd.read_excel('online_retail.xlsx', sheet_name = 0)
df_2011 = pd.read_excel('online_retail.xlsx', sheet_name = 1)

## 1. Market Basket Analysis with Apriori

In [3]:
# Preparing data for Apriori
transactions_2010 = df_2010.groupby('Invoice')['Description'].apply(list)
transactions_2011 = df_2011.groupby('Invoice')['Description'].apply(list)

In [4]:
# Convert all items to strings
transactions_2010 = transactions_2010.apply(lambda items: [str(item) for item in items])
transactions_2011 = transactions_2011.apply(lambda items: [str(item) for item in items])

In [5]:
# Combine transactions
transactions = pd.concat([transactions_2010, transactions_2011])

In [6]:
# Transaction encoding
te = TransactionEncoder()
te_data = te.fit(transactions).transform(transactions)
df_te = pd.DataFrame(te_data, columns=te.columns_)

In [7]:
# Applying Apriori
frequent_itemsets = apriori(df_te, min_support=0.01, use_colnames=True)

In [8]:
# Extracting association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.head()

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (72 SWEETHEART FAIRY CAKE CASES) | (60 TEATIME FAIRY CAKE CASES) | 0.027286 | 0.039806 | 0.013725 | 0.503014 | 12.636785 | 1.0 | 0.012639 | 1.932035 | 0.946698 | 0.257192 | 0.482411 | 0.423913 |
| 1 | (60 TEATIME FAIRY CAKE CASES) | (72 SWEETHEART FAIRY CAKE CASES) | 0.039806 | 0.027286 | 0.013725 | 0.344812 | 12.636785 | 1.0 | 0.012639 | 1.484632 | 0.959041 | 0.257192 | 0.326433 | 0.423913 |
| 2 | (PACK OF 60 DINOSAUR CAKE CASES) | (60 TEATIME FAIRY CAKE CASES) | 0.024125 | 0.039806 | 0.012428 | 0.515152 | 12.941704 | 1.0 | 0.011468 | 1.980401 | 0.945541 | 0.241306 | 0.495052 | 0.413682 |
| 3 | (60 TEATIME FAIRY CAKE CASES) | (PACK OF 60 DINOSAUR CAKE CASES) | 0.039806 | 0.024125 | 0.012428 | 0.312213 | 12.941704 | 1.0 | 0.011468 | 1.418863 | 0.960983 | 0.241306 | 0.29521 | 0.413682 |
| 4 | (PACK OF 60 PINK PAISLEY CAKE CASES) | (60 TEATIME FAIRY CAKE CASES) | 0.037192 | 0.039806 | 0.017344 | 0.466339 | 11.715431 | 1.0 | 0.015864 | 1.799259 | 0.949974 | 0.290748 | 0.444216 | 0.45103 |



### Overview
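This analysis uses the Apriori algorithm to extract association rules from the combined 2010 and 2011 invoices. Each invoice is treated as one transaction, itemsets must appear in at least 1% of transactions (`min_support=0.01`), and only rules with a lift of at least 1 are kept. The goal is to identify products that are frequently bought together.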
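
As a rough illustration of the level-wise search that Apriori performs, here is a minimal pure-Python sketch. It is not how mlxtend implements `apriori`; the function name `mine_frequent_itemsets` and the toy baskets are hypothetical and exist only for this example.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support.

    Hypothetical helper for illustration; support is the fraction of
    transactions that contain the whole itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in sorted({item for basket in baskets for item in basket}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Made-up baskets, not taken from the retail dataset.
toy_baskets = [
    ["cake cases", "candles", "napkins"],
    ["cake cases", "candles"],
    ["cake cases", "napkins"],
    ["candles"],
]
toy_itemsets = mine_frequent_itemsets(toy_baskets, min_support=0.5)
```

The pruning step is what keeps the search tractable: a three-item candidate is only counted if all of its two-item subsets already met the support threshold.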

### Association Rules

#### Key Findings
1. **Rule 1**: Invoices that contain *60 Teatime Fairy Cake Cases* also contain *72 Sweetheart Fairy Cake Cases* about a third of the time.
   - **Support**: 0.0137
   - **Confidence**: 34.48%
   - **Lift**: 12.64

2. **Rule 2**: The reverse direction is stronger: about half of the invoices that contain *72 Sweetheart Fairy Cake Cases* also contain *60 Teatime Fairy Cake Cases*.
   - **Support**: 0.0137
   - **Confidence**: 50.30%
   - **Lift**: 12.64

3. **Rule 3**: Invoices that contain *60 Teatime Fairy Cake Cases* also contain *Pack of 60 Dinosaur Cake Cases* in just under a third of cases.
   - **Support**: 0.0124
   - **Confidence**: 31.22%
   - **Lift**: 12.94

#### Interpretation
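- **Support** is the fraction of all transactions that contain every item in the rule.
- **Confidence** is the fraction of transactions containing the antecedent that also contain the consequent, i.e. the rule's support divided by the antecedent's support.
- **Lift** is the confidence divided by the consequent's support. A value above 1 means the items appear together more often than would be expected if they were bought independently; the lifts of roughly 12 to 13 reported here indicate a strong association.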
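
To make these definitions concrete, the sketch below recomputes confidence, lift, and leverage for row 0 of the rules output from its three support values. The function name `rule_metrics` is a hypothetical helper, and the inputs are the rounded figures shown in the output, so the results match the reported columns only to about three decimal places.

```python
def rule_metrics(support_ab, support_a, support_b):
    """Derive rule metrics from supports (hypothetical helper for illustration).

    support_ab: support of antecedent and consequent together
    support_a:  antecedent support
    support_b:  consequent support
    """
    confidence = support_ab / support_a          # P(B | A)
    lift = confidence / support_b                # P(B | A) / P(B)
    leverage = support_ab - support_a * support_b  # P(A and B) - P(A) * P(B)
    return {"confidence": confidence, "lift": lift, "leverage": leverage}


# Row 0: (72 SWEETHEART FAIRY CAKE CASES) -> (60 TEATIME FAIRY CAKE CASES)
metrics = rule_metrics(support_ab=0.013725, support_a=0.027286, support_b=0.039806)
print(metrics)
```

Swapping `support_a` and `support_b` gives the reverse rule: the lift and leverage are unchanged because they are symmetric, while the confidence drops because the new antecedent is the more common item.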

### Conclusion
Understanding these relationships allows businesses to make informed decisions about product placement, cross-promotions, and inventory management to enhance sales strategies.

## 2. Price Elasticity Analysis

In [9]:
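# Function to calculate price elasticity
def calculate_elasticity(df):
    # Filter first and copy, so the caller's DataFrame is left untouched and
    # pandas does not warn about assigning to a slice.
    df = df[df['Quantity'] != 0].copy()  # Avoid division by zero
    df['Revenue'] = df['Quantity'] * df['Price']
    # NOTE: pct_change() compares each row with the previous row in file order,
    # regardless of product, so this is a row-to-row ratio, not a per-product elasticity.
    df['Elasticity'] = (df['Quantity'].pct_change() / df['Price'].pct_change()).replace([np.inf, -np.inf], np.nan)
    return df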

In [10]:
# Calculate for 2010 and 2011
elasticity_2010 = calculate_elasticity(df_2010)
elasticity_2011 = calculate_elasticity(df_2011)

elastic_products_2010 = elasticity_2010[['Description', 'Elasticity']].dropna().groupby('Description').mean()
elastic_products_2011 = elasticity_2011[['Description', 'Elasticity']].dropna().groupby('Description').mean()

In [11]:
print("Top Elastic Products 2010:")
print(elastic_products_2010.sort_values(by='Elasticity', ascending=False).head())
print("Top Elastic Products 2011:")
print(elastic_products_2011.sort_values(by='Elasticity', ascending=False).head())

Top Elastic Products 2010:
                         Elasticity
Description                        
ebay sales              2837.944444
mouldy                  1001.000000
Rusty                    852.000000
Given away               734.333333
wet/smashed/unsellable   701.000000
Top Elastic Products 2011:
                               Elasticity
Description                              
Thrown away-rusty             2377.000000
Unsaleable, destroyed.        1672.000000
Printing smudges/thrown away  1510.666667
lost??                        1132.000000
mouldy, thrown away.           867.666667


### Overview
This analysis examines the price elasticity of products for 2010 and 2011. Elasticity is the percentage change in quantity demanded divided by the percentage change in price; a magnitude above 1 means demand changes proportionally more than price. Here it is approximated from the percentage change in `Quantity` and `Price` between consecutive rows, then averaged by `Description`.
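
As a small worked example of the definition, the sketch below computes a point elasticity from one before/after observation. The function name `price_elasticity` and the numbers are made up for illustration and do not come from the dataset.

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """Point elasticity: % change in quantity / % change in price.

    Hypothetical helper; returns None when the price did not change,
    because the ratio is undefined in that case.
    """
    pct_quantity = (q_new - q_old) / q_old
    pct_price = (p_new - p_old) / p_old
    if pct_price == 0:
        return None
    return pct_quantity / pct_price


# Made-up numbers: a 10% price rise accompanied by a 20% drop in quantity.
example = price_elasticity(q_old=100, q_new=80, p_old=2.00, p_new=2.20)
print(example)  # approximately -2: demand falls about twice as fast as price rises
```

Demand elasticities are normally negative, since quantity falls as price rises, so large positive values usually point to a data or method problem rather than to unusually responsive demand.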

### Top Elastic Products

#### 2010
1. **ebay sales**
   - **Elasticity**: 2837.94
2. **mouldy**
   - **Elasticity**: 1001.00
3. **Rusty**
   - **Elasticity**: 852.00
4. **Given away**
   - **Elasticity**: 734.33
5. **wet/smashed/unsellable**
   - **Elasticity**: 701.00

#### 2011
1. **Thrown away-rusty**
   - **Elasticity**: 2377.00
2. **Unsaleable, destroyed**
   - **Elasticity**: 1672.00
3. **Printing smudges/thrown away**
   - **Elasticity**: 1510.67
4. **lost??**
   - **Elasticity**: 1132.00
5. **mouldy, thrown away**
   - **Elasticity**: 867.67

### Interpretation
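In principle, products with high elasticity show a strong demand response to price changes. The entries listed above should be read with caution, though: descriptions such as *mouldy*, *Given away*, and *lost??* appear to be stock write-off or adjustment notes rather than product names, and the values were computed between consecutive rows irrespective of product. They are best treated as artifacts until such rows are filtered out and elasticity is computed per product. With cleaned data, this insight helps in: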

- **Pricing Strategy**: Adjusting prices for elastic products can significantly impact sales volumes.
- **Marketing Focus**: Promotional strategies can target these products to optimize sales.
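
Elasticity is defined for one product at a time, so one common way to estimate it is the slope of log quantity against log price within each product. The sketch below illustrates that idea on made-up rows; it is not the method used in this notebook, and the function name `loglog_elasticity` and the assumption that valid sales have positive price and quantity are mine.

```python
import math
from collections import defaultdict


def loglog_elasticity(rows):
    """Estimate elasticity per product as the least-squares slope of
    log(quantity) on log(price).

    rows: iterable of (description, price, quantity) tuples.
    Hypothetical helper for illustration only.
    """
    points = defaultdict(list)
    for description, price, quantity in rows:
        # Assumption: rows with non-positive price or quantity are returns,
        # write-offs, or adjustments, and are excluded.
        if price > 0 and quantity > 0:
            points[description].append((math.log(price), math.log(quantity)))

    estimates = {}
    for description, pts in points.items():
        # A slope needs at least two distinct prices.
        if len({x for x, _ in pts}) < 2:
            continue
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        estimates[description] = sxy / sxx
    return estimates


# Made-up rows: "TOY PRODUCT" follows quantity = 100 * price ** -2.
toy_rows = [
    ("TOY PRODUCT", 1.0, 100.0),
    ("TOY PRODUCT", 2.0, 25.0),
    ("TOY PRODUCT", 4.0, 6.25),
    ("mouldy", 0.0, -50.0),        # adjustment-style row, excluded
    ("SINGLE PRICE ITEM", 3.0, 10.0),
    ("SINGLE PRICE ITEM", 3.0, 12.0),
]
toy_estimates = loglog_elasticity(toy_rows)
```

On these toy rows only `TOY PRODUCT` receives an estimate, close to -2; the adjustment-style row and the item sold at a single price are skipped because no slope can be fitted for them.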

### Conclusion
Identifying highly elastic products enables businesses to make data-driven pricing decisions, optimizing revenue while addressing consumer sensitivity to price changes.