### Problem (Part II)

**Problem:** Cross-selling means selling additional products to a customer by analyzing their shopping trends and comparing them with general shopping trends. In e-commerce, retailers often offer customers bundles of products with attractive offers to boost sales.

**Dataset:** The online retail transactions dataset is available from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets/online+retail). 

**Objective:** We want to understand which merchandise items customers purchase together and use that knowledge to suggest additions to a customer's original purchase.

**Approach:** Use the **association rule mining** technique for cross-selling. More details are covered below.
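Association rules are usually judged by three measures: support, confidence, and lift. The sketch below computes them by brute force on a handful of toy baskets. The basket contents and the helper names (`support`, `confidence`, `lift`) are assumptions made for illustration; they are not taken from the dataset or from any library.

```python
# Toy baskets (hypothetical data, not taken from the Online Retail dataset).
baskets = [
    {"tea", "sugar", "milk"},
    {"tea", "sugar"},
    {"tea", "biscuits"},
    {"milk", "biscuits"},
    {"tea", "sugar", "biscuits"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) from the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence of the rule divided by the consequent's baseline support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


# Measures for the rule {tea} -> {sugar}
rule_support = support({"tea", "sugar"}, baskets)
rule_confidence = confidence({"tea"}, {"sugar"}, baskets)
rule_lift = lift({"tea"}, {"sugar"}, baskets)
print(rule_support, rule_confidence, rule_lift)
```

For these baskets the rule tea → sugar has support 3/5, confidence 3/4, and lift 1.25 (up to floating-point rounding). A lift above 1 means that sugar appears with tea more often than its overall frequency would suggest.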

In [19]:
# Load Necessary Dependencies

import csv
import pandas as pd
import matplotlib.pyplot as plt
import Orange
from Orange.data import Domain, DiscreteVariable, ContinuousVariable
from orangecontrib.associate.fpgrowth import *

%matplotlib inline

### Part I: Exploratory Analysis

Load the Online Transaction Dataset

In [20]:
cs_mba = pd.read_excel(io=r'../data/Online Retail.xlsx')

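# Keep UK transactions only, excluding refunds (invoice numbers containing "C")
# and rows with a negative quantity
cs_mba_uk = cs_mba[cs_mba.Country == 'United Kingdom']
cs_mba_uk = cs_mba_uk[~(cs_mba_uk.InvoiceNo.str.contains("C") == True)]
# Explicit comparison avoids the precedence trap in `~Quantity<0`, which parses as `(~Quantity) < 0`
cs_mba_uk = cs_mba_uk[cs_mba_uk.Quantity >= 0]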

In [21]:
cs_mba_uk.head()

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |


In [22]:
cs_mba_uk.shape

(486286, 8)

In [23]:
cs_mba_uk.InvoiceNo.value_counts().shape

(18786,)

### Part II: Data Preparation

**FP growth algorithm:** This is the algorithm we use for the association rule mining task. It stores itemset frequency information in a compact data structure called an FP-tree and uses a divide-and-conquer strategy to find frequent itemsets without generating every candidate itemset (there are $2^k$ possible itemsets for $k$ items).

![title](fp_growth.png)
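To make the FP-tree idea concrete, here is a minimal pure-Python sketch of the tree-building step only. The recursive mining over conditional pattern bases is left out, and this is not the implementation Orange uses. The class `FPNode`, the functions `build_fp_tree` and `show`, and the toy transactions are all hypothetical names and data chosen for illustration.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree; returns the root node and the frequent-item counts."""
    transactions = [set(t) for t in transactions]
    # Pass 1: count how many transactions contain each item, drop infrequent items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {item: n for item, n in counts.items() if n >= min_count}
    root = FPNode(None)
    # Pass 2: insert each transaction with its items ordered by descending
    # frequency (ties broken alphabetically) so that common prefixes share nodes.
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def show(node, depth=0):
    """Return the tree as a list of indented 'item:count' lines."""
    lines = []
    for item in sorted(node.children):
        child = node.children[item]
        lines.append("  " * depth + f"{item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


# Toy transactions (hypothetical, not from the Online Retail dataset).
transactions = [
    ["tea", "sugar", "milk"],
    ["tea", "sugar"],
    ["tea", "biscuits"],
    ["milk", "biscuits"],
    ["tea", "sugar", "biscuits"],
]
root, frequent = build_fp_tree(transactions, min_count=2)
print("\n".join(show(root)))
```

The five toy transactions contain 12 item occurrences in total, yet the tree needs only 7 nodes because transactions that start with the same frequent items share a path. That sharing is what lets FP growth count itemsets without enumerating every candidate.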

**Orange:** The framework we use for FP growth and the core data structures it needs. In practice, this means converting the pandas dataframe into an *Orange table* data structure.
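Whatever the target container, the conversion amounts to turning one row per invoice line into one row per invoice with a true/false column per item. The sketch below shows that reshaping in plain Python. It makes no claim about Orange's own API: the function `one_hot_baskets` is a hypothetical helper, and the invoice labelled `HYPOTHETICAL-1` is invented so that the example has more than one basket.

```python
def one_hot_baskets(rows):
    """Turn (invoice, item) pairs into a boolean invoice-by-item matrix.

    Returns (invoices, items, matrix), where matrix[r][c] is True when
    invoice invoices[r] contains item items[c].
    """
    baskets = {}
    for invoice, item in rows:
        baskets.setdefault(invoice, set()).add(item)
    invoices = sorted(baskets)
    items = sorted({item for _, item in rows})
    matrix = [[item in baskets[invoice] for item in items]
              for invoice in invoices]
    return invoices, items, matrix


# The first three rows mirror invoice 536365 from the head() output above;
# the last two belong to a made-up invoice added for illustration only.
rows = [
    ("536365", "WHITE HANGING HEART T-LIGHT HOLDER"),
    ("536365", "WHITE METAL LANTERN"),
    ("536365", "CREAM CUPID HEARTS COAT HANGER"),
    ("HYPOTHETICAL-1", "WHITE METAL LANTERN"),
    ("HYPOTHETICAL-1", "WHITE METAL LANTERN"),  # repeated line collapses to one flag
]
invoices, items, matrix = one_hot_baskets(rows)
for invoice, flags in zip(invoices, matrix):
    print(invoice, flags)
```

On the real data, the pairs would come from the `InvoiceNo` and `Description` columns of `cs_mba_uk`, after rows with missing descriptions have been handled. With thousands of distinct items the matrix is mostly false, so a sparse representation is usually a better fit than nested lists.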


Association Rule Mining with FP Growth