# BAX 453 - Assignment 2
# Michael Chen

## Business Case

Online retail company XYZ sells various products and is looking to increase its revenue by promoting cross-selling opportunities (i.e., selling related or complementary items) to its customers. The company wants to apply advanced analytics to its historical transactional data to answer the following business question:

When a customer buys an item, what are the related or complementary items that can be presented to them to promote cross-selling?
 

## Data Preparation

In [130]:
import pandas as pd
import re

In [131]:
txac = pd.read_excel('transactions_by_dept.xlsx')

In [132]:
txac.head()

Unnamed: 0,POS Txn,Dept,ID,Sales U
0,16120100160021008773,0261:HOSIERY,250,2
1,16120100160021008773,0634:VITAMINS & HLTH AIDS,102,1
2,16120100160021008773,0879:PET SUPPLIES,158,2
3,16120100160021008773,0973:CANDY,175,2
4,16120100160021008773,0982:SPIRITS,176,1


In [133]:
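def get_rid_of_dept_num(string):
    """Helper function to strip the Dept number prefix (e.g. '0261:') from a Dept label."""
    # Raw string avoids the invalid "\:" escape; capture everything after the first colon.
    return re.findall(r":(.*)", string)[0]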

In [134]:
txac['Dept_strip'] = txac['Dept'].apply(get_rid_of_dept_num)

In [135]:
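# Collect each transaction's departments into a list (one row per POS transaction).
# Building the list directly avoids a join/split round trip, which would break on names containing commas.
items_by_pos = txac.groupby('POS Txn')['Dept_strip'].apply(list).reset_index()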

In [136]:
items_by_pos.head()

Unnamed: 0,POS Txn,Dept_strip
0,16120100160021008773,"[HOSIERY, VITAMINS & HLTH AIDS, PET SUPPLIES, ..."
1,16120100160021008774,"[HEALTH AIDS, PERSONAL CARE]"
2,16120100160021008775,"[PRE-RECORDED A/V, SMALL ELECTRICS, SPIRITS]"
3,16120100160021008776,[GENERAL GROCERIES]
4,16120100160021008777,[SPIRITS]


## Association Rule Mining

Machine Learning Project: Generate frequent itemsets and association rules for a recommender system using Association Rule Mining. Feel free to choose appropriate levels of minimum support and minimum confidence (e.g., 0.20 support and 30% confidence).

In [137]:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

### Frequent itemsets

In [138]:
items_set = list(items_by_pos['Dept_strip'])
te = TransactionEncoder()
te_ary = te.fit(items_set).transform(items_set)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)

frequent_itemsets

Unnamed: 0,support,itemsets
0,0.049903,(AMERICAN GREETINGS)
1,0.010174,(AS SEEN ON TV)
2,0.011143,(AUDIO ELECTRONICS)
3,0.016957,(BABY CARE)
4,0.054264,(BARBER SERVICES)
...,...,...
76,0.017926,"(PERSONAL CARE, HOUSEHOLD CLEANING)"
77,0.010659,"(HOUSEHOLD CLEANING, SPIRITS)"
78,0.025194,"(TOBACCO, SPIRITS)"
79,0.037306,"(SPIRITS, WINE)"
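
To make the `support` column concrete: the support of an itemset is the fraction of transactions that contain every item in it. The following is a minimal sketch in plain Python, not the `mlxtend` implementation. The four baskets are copied from rows 1 to 4 of `items_by_pos.head()` purely as a toy sample, and the `support` helper is a hypothetical name used only for illustration.

```python
# Toy sample: rows 1-4 of items_by_pos.head() above (not the full dataset).
baskets = [
    ["HEALTH AIDS", "PERSONAL CARE"],
    ["PRE-RECORDED A/V", "SMALL ELECTRICS", "SPIRITS"],
    ["GENERAL GROCERIES"],
    ["SPIRITS"],
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset.issubset(basket))
    return hits / len(baskets)


spirits_support = support({"SPIRITS"}, baskets)
pair_support = support({"HEALTH AIDS", "PERSONAL CARE"}, baskets)
print(spirits_support, pair_support)
```

With `min_support=0.01`, `apriori` keeps only the itemsets whose support, computed this way over all transactions, is at least 1%.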


### Association Rules

In [142]:
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(MASS COSMETICS),(BEAUTY CARE),0.025194,0.064438,0.010174,0.403846,6.267206,0.008551,1.56933
1,(GENERAL GROCERIES),(BEVERAGES),0.047965,0.122578,0.020833,0.434343,3.543418,0.014954,1.551158
2,(PERSONAL CARE),(HEALTH AIDS),0.073643,0.096899,0.031008,0.421053,4.345263,0.023872,1.559901
3,(WINE),(SPIRITS),0.093023,0.152132,0.037306,0.401042,2.636146,0.023154,1.415571
4,"(PERSONAL CARE, HOUSEHOLD CLEANING)",(HEALTH AIDS),0.017926,0.096899,0.010659,0.594595,6.136216,0.008922,2.227649
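
The metric columns can be cross-checked by hand. As a sketch, the snippet below recomputes confidence, lift, leverage, and conviction for the WINE → SPIRITS rule from the three support values in the table, using the standard definitions of these metrics. The variable names are illustrative.

```python
# Support values copied from the WINE -> SPIRITS row of the table above.
antecedent_support = 0.093023   # share of transactions containing WINE
consequent_support = 0.152132   # share of transactions containing SPIRITS
rule_support = 0.037306         # share containing both WINE and SPIRITS

# confidence: how often SPIRITS appears in transactions that contain WINE
confidence = rule_support / antecedent_support
# lift: confidence relative to the baseline frequency of SPIRITS (> 1 means positive association)
lift = confidence / consequent_support
# leverage: observed co-occurrence minus what independence would predict
leverage = rule_support - antecedent_support * consequent_support
# conviction: expected rate of WINE-without-SPIRITS under independence,
# divided by the observed rate at which the rule fails
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```

Up to rounding of the inputs, these values should agree with the `confidence`, `lift`, `leverage`, and `conviction` columns for that row.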


## Business Recommendations

* In-Store Arrangement Optimization
    - Based on the association rules in the table above, the store could adjust how items are arranged. For example, place small displays of beverages among the general groceries, in the fresh section, and in other areas with free space (GENERAL GROCERIES → BEVERAGES has a confidence of 0.43 and a lift of 3.54). This could increase sales of those items, and therefore revenue, through cross-selling.
    - The aisles in the store can be arranged according to the relationships between antecedents and consequents to promote more complementary or related purchases.
    <br>
* Online Recommendation System
  - These rules could be used to implement a recommendation system that suggests what to buy next. When a customer puts certain items in the shopping cart, the system could offer recommendations such as "People who buy this also buy..." based on the antecedents and consequents of the rules above. For example, once a customer buys MASS COSMETICS, items from BEAUTY CARE would be recommended. This could also increase revenue through cross-selling.
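
As a rough illustration of how such a recommender could work, the sketch below looks up every rule whose antecedent is fully contained in the cart and suggests the consequents not already in it, highest confidence first. The five rules are copied from the table above. The `recommend` function and the `RULES` structure are hypothetical names for illustration, not part of `mlxtend`.

```python
# (antecedent, consequent, confidence) triples copied from the rules table above.
RULES = [
    (frozenset({"MASS COSMETICS"}), "BEAUTY CARE", 0.403846),
    (frozenset({"GENERAL GROCERIES"}), "BEVERAGES", 0.434343),
    (frozenset({"PERSONAL CARE"}), "HEALTH AIDS", 0.421053),
    (frozenset({"WINE"}), "SPIRITS", 0.401042),
    (frozenset({"PERSONAL CARE", "HOUSEHOLD CLEANING"}), "HEALTH AIDS", 0.594595),
]


def recommend(cart, rules=RULES):
    """Return departments to suggest for `cart`, best confidence first."""
    cart = set(cart)
    best = {}  # consequent -> highest confidence among matching rules
    for antecedent, consequent, confidence in rules:
        # A rule applies when all of its antecedent items are in the cart.
        if antecedent <= cart and consequent not in cart:
            best[consequent] = max(confidence, best.get(consequent, 0.0))
    return sorted(best, key=best.get, reverse=True)


print(recommend(["MASS COSMETICS", "WINE"]))
```

Ranking by confidence is only one option; ranking by lift instead would favor suggestions that are less likely to be bought anyway.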

## Additional business use cases

Research and identify three additional business use cases where Association Rule Mining can be used to deliver business outcomes. Practice your storytelling communication skills!

<table>
  <tr>
    <th>Business Question</th>
    <th>Data Need</th>
    <th>Predictive Analytics</th>
    <th>Business Action</th>
    <th>Business Outcome</th>  
  </tr>
  <tr>
    <td>Create more targeted store ads and catalogs at a supermarket.</td>
    <td>Customers' basket information and demographic information collected at checkout through barcode scanning and loyalty programs.</td>
    <td>Association Rule Mining</td>
    <td>Based on the analytical results, the store learns which customer groups tend to buy which items together or sequentially. Customized emails, coupons, and catalogs can then be created and delivered.</td>
    <td>Increased cross-selling and a better customer experience through customized services.</td>
  </tr>
  <tr>
    <td>Coronavirus gene sequencing and mutation studies.</td>
    <td>Viral genome data collected from around the world, with lab-tested results.</td>
    <td>Association Rule Mining</td>
    <td>Discover associations among traits of the coronavirus in different countries, and understand how it mutates and spreads from one place to another.</td>
    <td>Mining such relationships could help us understand the virus better, from its origin to its current status. A better understanding of where it could go might also help in developing medical treatments or vaccines.</td>
  </tr>
  <tr>
    <td>Website usage analytics and click-through increase.</td>
    <td>Users' web logs with tracking information collected through cookies, similar to what Google Analytics does.</td>
    <td>Association Rule Mining</td>
    <td>Connect the dots among the actions a user takes before clicking through on certain ads, which can point to business opportunities. A dynamic, customized experience (for example, website UI and ad appearance) could be built and tailored to groups of customers.</td>
    <td>Such detailed mapping of customer behavior and usage preferences could help us understand users' habits, and business leads can be created by tailoring content to what each user likes. Both up-selling and cross-selling can result.</td>
  </tr>
</table>