<font color = "#CC3D3D"><b>
# (DW Practice #3) Market Basket Analysis

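- Market basket analysis is a technique for finding associations, or rules, among the products customers purchase, based on their transaction records.  
  - (Expressing an association rule) Customers who buy `item A` and `item B` also buy `item C`: *(item A) & (item B) => (item C)*
- It is mainly used for cross-selling, product placement, fraud detection, and product catalog design.  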
<img align='left' src='https://blog.rsquaredacademy.com/img/mba_steps.png' style='width: 80%; height: auto;'>

- Market basket analysis produces a very large number of association rules, so evaluation criteria such as those below are needed to pick out the useful ones.  
<img align='left' src='http://drive.google.com/uc?export=view&id=191LWlu63r0T3GIv-FX-x7Ds4bezBfxfU' style='width: 80%; height: auto;'>
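
As a rough illustration of how such criteria are computed, the sketch below derives support, confidence, and lift for one rule. It assumes these three are the measures meant here, which matches the columns mlxtend reports later in this notebook. The `baskets` table and the `rule_metrics` helper are hypothetical and are not part of the L-company data.

```python
import pandas as pd

# Hypothetical toy baskets: one row per transaction, True = item purchased
baskets = pd.DataFrame(
    {
        "Biscuits": [1, 1, 0, 1, 0],
        "Snacks":   [1, 1, 1, 0, 0],
        "Tofu":     [0, 1, 1, 0, 1],
    }
).astype(bool)


def rule_metrics(data, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    has_a = data[antecedent].all(axis=1)   # rows containing every antecedent item
    has_c = data[consequent].all(axis=1)   # rows containing every consequent item
    support = (has_a & has_c).mean()       # P(A and C)
    confidence = support / has_a.mean()    # P(C | A)
    lift = confidence / has_c.mean()       # P(C | A) / P(C)
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, ["Biscuits"], ["Snacks"])
print(f"support={support:.3f}, confidence={confidence:.3f}, lift={lift:.3f}")
```

A lift above 1 suggests the two items appear together more often than they would if purchased independently.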

#### Data Preparation

In [2]:
import pandas as pd
import numpy as np

In [3]:
# read raw data
cs = pd.read_csv('L사_고객정보.csv')
gd = pd.read_csv('L사_상품정보.csv')
tr = pd.read_csv('L사_거래정보.csv')

# merge data 
gd.pd_c = gd.pd_c.astype(str) 
df = pd.merge(tr, cs).merge(gd, on='pd_c')
df['de_dt'] = pd.to_datetime(df['de_dt'].astype(str))  # unit-less astype('datetime64') fails on recent pandas

In [5]:
# transform data
store_data = pd.pivot_table(df, index='clnt_id', columns='clac_nm2', values='buy_ct', aggfunc=np.size, fill_value=0) \
            .applymap(lambda x: 1 if x>=1 else 0).reset_index() # apply works on a Series (one column); applymap works element-wise on the whole DataFrame
transactions = store_data.iloc[:,1:]
transactions

clac_nm2,Arts / Crafts Supplies,Audios,Bikes,Biscuits,Body Care,Boy's Toys,Breads,Business Paper Products,Cameras / Camcorders,Camping,...,Women's Lower Bodywear / Bottoms,Women's Outwear,Women's Socks and Hosiery,Women's Special Materials Clothing,Women's Special Use Clothing,Women's Sport Shoes,Women's Underwear,Women's Upper Bodywear / Tops,Writing Pads,Writing Supplies
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10093,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10094,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
10095,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10096,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
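
The same long-to-wide transformation can be sketched on a small table to make the resulting shape easier to see. This is an alternative formulation using `pd.crosstab`, not the code used above. The `toy` frame is made-up data that only borrows the `clnt_id` and `clac_nm2` column names.

```python
import pandas as pd

# Hypothetical long-format purchases: one row per purchase line
toy = pd.DataFrame(
    {
        "clnt_id": [1, 1, 1, 2, 2, 3],
        "clac_nm2": ["Snacks", "Snacks", "Biscuits", "Snacks", "Breads", "Breads"],
    }
)

# crosstab counts purchase lines per customer and category;
# comparing with zero collapses the counts into a boolean basket matrix
toy_baskets = pd.crosstab(toy["clnt_id"], toy["clac_nm2"]) > 0
print(toy_baskets.astype(int))
```

A boolean matrix like this one has one row per customer and one column per category, which is the input layout that `apriori` in mlxtend expects.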


In [4]:
transactions.sum().sort_values(ascending=False).head(20)

clac_nm2
Instant Noodles        5231
Snacks                 5186
Tofu / Bean Sprouts    4749
Leaf Vegetables        4361
Fruit Vegetables       4251
Biscuits               4034
Retort Pouches         3625
Root Vegetables        3193
Mushrooms              3041
Sauces                 3008
Western Vegetables     2692
Instant Cup Noodles    2519
Mature Sauces          2399
Seasonings             2334
Candies                2226
Dried Noodles          2009
Cooking Oils           1924
Pies                   1830
Cereals                1680
Restaurants            1675
dtype: int64

#### Frequent Itemset Extraction - Apriori

In [7]:
# Apriori is a widely used association rule mining algorithm; running it here requires the mlxtend package
!pip install mlxtend



In [8]:
from mlxtend.frequent_patterns import apriori, association_rules

In [12]:
# Extract only the frequent itemsets with support of at least 20% and list them in descending order of support
freq_items = apriori(transactions, min_support=0.2, use_colnames=True)
freq_items.sort_values(by='support', ascending=False)

Unnamed: 0,support,itemsets
4,0.518023,(Instant Noodles)
12,0.513567,(Snacks)
13,0.470291,(Tofu / Bean Sprouts)
5,0.431868,(Leaf Vegetables)
2,0.420974,(Fruit Vegetables)
...,...,...
85,0.202119,"(Instant Noodles, Biscuits, Tofu / Bean Sprout..."
56,0.201921,"(Instant Noodles, Biscuits, Fruit Vegetables)"
45,0.201822,"(Leaf Vegetables, Western Vegetables)"
79,0.201327,"(Instant Noodles, Root Vegetables, Tofu / Bean..."
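
To show the idea behind the level-wise search, here is a minimal pure-Python sketch of the Apriori principle: an itemset can only be frequent if all of its subsets are frequent. It is a teaching sketch under that assumption, not mlxtend's implementation. The `apriori_sketch` function and the `toy` baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Level-wise frequent itemset search; returns {frozenset: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


toy = [
    {"Snacks", "Biscuits"},
    {"Snacks", "Biscuits", "Breads"},
    {"Snacks", "Breads"},
    {"Biscuits"},
]
result = apriori_sketch(toy, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what keeps the search tractable: a three-item candidate is discarded without counting as soon as one of its two-item subsets is known to be infrequent.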


#### Deriving Association Rules

In [11]:
# Show only the association rules with confidence of at least 80%
rules = association_rules(freq_items, metric='confidence')
rules.query('confidence >= 0.8') # items that rarely show up should be grouped under a broader category

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Biscuits),(Snacks),0.399485,0.513567,0.334819,0.838126,1.63197,0.129656,3.005008
1,(Instant Cup Noodles),(Instant Noodles),0.249455,0.518023,0.203902,0.817388,1.577898,0.074678,2.639346
2,(Instant Cup Noodles),(Snacks),0.249455,0.513567,0.203308,0.815006,1.586951,0.075196,2.629452
3,(Mushrooms),(Tofu / Bean Sprouts),0.301149,0.470291,0.248762,0.826044,1.756453,0.107135,3.045075
4,(Root Vegetables),(Tofu / Bean Sprouts),0.316201,0.470291,0.254209,0.803946,1.709465,0.105502,2.701854
5,"(Biscuits, Fruit Vegetables)",(Instant Noodles),0.245593,0.518023,0.201921,0.822177,1.587143,0.074698,2.710435
6,"(Biscuits, Fruit Vegetables)",(Snacks),0.245593,0.513567,0.216181,0.880242,1.713977,0.090053,4.061797
7,"(Biscuits, Fruit Vegetables)",(Tofu / Bean Sprouts),0.245593,0.470291,0.205684,0.8375,1.780812,0.090184,3.259747
8,"(Biscuits, Leaf Vegetables)",(Instant Noodles),0.247178,0.518023,0.20509,0.829728,1.601718,0.077046,2.83062
9,"(Biscuits, Instant Noodles)",(Snacks),0.300753,0.513567,0.263319,0.875535,1.704812,0.108863,3.908193
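
The metric columns in this table are related by simple arithmetic. The sketch below recomputes them for rule 0, (Biscuits) => (Snacks), from the three support values shown above, using the standard definitions of confidence, lift, leverage, and conviction. The results should agree with the table up to rounding of the displayed inputs.

```python
# Values copied from rule 0 in the table above: (Biscuits) => (Snacks)
antecedent_support = 0.399485   # share of customers who bought Biscuits
consequent_support = 0.513567   # share of customers who bought Snacks
support = 0.334819              # share of customers who bought both

confidence = support / antecedent_support                      # P(C | A)
lift = confidence / consequent_support                         # P(C | A) / P(C)
leverage = support - antecedent_support * consequent_support   # P(A and C) - P(A) * P(C)
conviction = (1 - consequent_support) / (1 - confidence)       # (1 - P(C)) / (1 - P(C | A))

print(f"confidence={confidence:.4f}, lift={lift:.4f}")
print(f"leverage={leverage:.4f}, conviction={conviction:.4f}")
```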


<font color = "#CC3D3D"><b>
# End