# Association Rule Analysis

| A method for finding association rules among events

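## Measures for association rule analysis
1. Support: (transactions containing both A and B) / (all transactions)
2. Confidence: (transactions containing both A and B) / (transactions containing A)

    *Indicates how strongly A is associated with B*
3. Lift: confidence / (proportion of all transactions that contain B)

    *When purchases of A and B are unrelated (independent), the lift is 1*

    **Negative association: lift < 1**

    **Positive association: lift > 1**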
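To make the three measures concrete, here is a minimal sketch in plain Python. The transactions and the helper names (`support`, `confidence`, `lift`) are hypothetical and chosen only for illustration; they are not taken from the groceries data.

```python
# Toy transactions (hypothetical data, not from the groceries file).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """support(A and B) / support(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """confidence(A -> B) / support(B)."""
    return confidence(a, b, transactions) / support(b, transactions)

sup = support({"milk", "bread"}, transactions)
conf = confidence({"milk"}, {"bread"}, transactions)
lft = lift({"milk"}, {"bread"}, transactions)
print(sup, conf, lft)
```

Here the lift of milk -> bread comes out slightly above 1, which reads as a weak positive association in this toy data.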

## Run-test

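| A test applied before mining association rules: it checks whether a sequence of binary observations occurred in random order or not (**that is, whether the observations are associated**)
| The test statistic is approximately normally distributed, so a Z-statistic is used

    Null hypothesis: the sequence of observations is random
    Alternative hypothesis: the sequence of observations is not random; the observations are associated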

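```runstest_1samp(x, cutoff, correction)```
* x: (array) the observed sequence; it is split into two groups by `cutoff`, so binary data coded as integers (for example 0/1) can be passed directly
* cutoff: ('mean', 'median', or a number) the threshold used to split the data into two groups
* correction: for sample sizes below 50 the normal approximation is less accurate, so the test statistic is adjusted by a 0.5 correction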
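The idea behind the test can be sketched by hand. The function below is a plain-Python illustration of the Wald-Wolfowitz runs test with the normal approximation; it is not the library implementation, it omits the small-sample correction, and the name `runs_test` and the sample sequence are assumptions made for this example.

```python
import math

def runs_test(x, cutoff):
    """Runs test with the normal approximation (no continuity correction)."""
    # Dichotomize: True if the value is at or above the cutoff.
    flags = [v >= cutoff for v in x]
    n1 = sum(flags)
    n2 = len(flags) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    # A run is a maximal block of identical consecutive values.
    runs = 1 + sum(a != b for a, b in zip(flags, flags[1:]))
    n = n1 + n2
    # Expected number of runs and its variance under randomness.
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return runs, z, p_value

# Hypothetical sequence: ten 0s followed by ten 1s (clearly not random order).
clustered = [0] * 10 + [1] * 10
runs, z, p = runs_test(clustered, cutoff=0.5)
print(runs, z, p)
```

With only two runs where about eleven would be expected, the Z-statistic is strongly negative and the null hypothesis of randomness is rejected for this sequence.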

## Association analysis example - groceries data

In [2]:
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder # converts purchase lists into transaction format
from mlxtend.frequent_patterns import apriori # generates frequent itemsets
from mlxtend.frequent_patterns import association_rules # generates association rules

In [5]:
df = pd.read_csv('https://raw.githubusercontent.com/ADPclass/ADP_book_ver01/main/data/groceries.csv', header=None) # header=None: the file has no header row, so the first row is read as data
df.head()

Unnamed: 0,0
0,"citrus fruit,semi-finished bread,margarine,rea..."
1,"tropical fruit,yogurt,coffee"
2,whole milk
3,"pip fruit,yogurt,cream cheese,meat spreads"
4,"other vegetables,whole milk,condensed milk,lon..."


In [23]:
# Preprocess into the input format that TransactionEncoder expects

def for_transaction(row):
    llist = np.array(row.str.split(','))
    return llist
llist = for_transaction(df[0])
llist[0]

['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups']

```from mlxtend.preprocessing import TransactionEncoder```

The input must be array-like: a collection of transactions, each given as a list of items
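As a rough picture of what transaction encoding produces, the sketch below builds the same kind of one-hot table in plain Python. It only illustrates the idea and is not how the library is implemented; the third transaction and the variable names are made up for this example.

```python
# Each transaction is a list of item names (the third one is hypothetical).
transactions = [
    ["tropical fruit", "yogurt", "coffee"],
    ["whole milk"],
    ["yogurt", "whole milk"],
]

# One column per distinct item, in sorted order.
columns = sorted({item for t in transactions for item in t})

# One row per transaction, one boolean per item: True if the item was bought.
onehot = [[item in t for item in columns] for t in transactions]

print(columns)
for row in onehot:
    print(row)
```

Each row has exactly as many `True` values as the transaction has distinct items, which is a quick sanity check on the encoded table.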

In [27]:
# Encode into transaction (one-hot) format
te = TransactionEncoder()
groceries = te.fit_transform(llist)
groceries = pd.DataFrame(groceries, columns=te.columns_) # te.columns_: every item that appears in the purchases
groceries.head()

Unnamed: 0,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,baby food,bags,baking powder,bathroom cleaner,beef,berries,beverages,bottled beer,bottled water,brandy,brown bread,butter,butter milk,cake bar,candles,candy,canned beer,canned fish,canned fruit,canned vegetables,cat food,cereals,chewing gum,chicken,chocolate,chocolate marshmallow,citrus fruit,cleaner,cling film/bags,cocoa drinks,coffee,condensed milk,cooking chocolate,cookware,cream,...,salty snack,sauces,sausage,seasonal products,semi-finished bread,shopping bags,skin care,sliced cheese,snack products,soap,soda,soft cheese,softener,sound storage medium,soups,sparkling wine,specialty bar,specialty cheese,specialty chocolate,specialty fat,specialty vegetables,spices,spread cheese,sugar,sweet spreads,syrup,tea,tidbits,toilet cleaner,tropical fruit,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,...,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False


```from mlxtend.frequent_patterns import apriori```

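An algorithm that searches item combinations level by level and keeps those whose support meets a minimum threshold (frequent itemsets); rules with high support, confidence, and lift are then derived from these itemsets

* Computation time grows exponentially as the number of items increases
* Takes a one-hot encoded DataFrame and outputs the frequent itemsets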
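The level-by-level search can be sketched as follows. This is a simplified, unoptimized illustration of the Apriori idea on hypothetical toy data; the function name `apriori_sketch` is an assumption for this example and does not reflect the internals of mlxtend.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    tsets = [set(t) for t in transactions]

    def sup(itemset):
        return sum(itemset <= t for t in tsets) / n

    # Level 1: frequent single items.
    items = {item for t in tsets for item in t}
    current = {frozenset([i]): sup(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: sup(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy transactions.
transactions = [
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["milk"],
    ["bread"],
    ["butter"],
]
frequent = apriori_sketch(transactions, min_support=0.4)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: a candidate is never counted if one of its subsets already failed the support threshold.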

In [37]:
pd.set_option('display.max_rows', 10)

In [35]:
# Generate frequent itemsets
groceries_ap = apriori(groceries, min_support=0.01, use_colnames=True) # minimum support 1%, use item names instead of column indices
groceries_ap.head(50)


Unnamed: 0,support,itemsets
0,0.033452,(UHT-milk)
1,0.017692,(baking powder)
2,0.052466,(beef)
3,0.033249,(berries)
4,0.026029,(beverages)
5,0.080529,(bottled beer)
6,0.110524,(bottled water)
7,0.06487,(brown bread)
8,0.055414,(butter)
9,0.027961,(butter milk)


In [39]:
# Derive association rules
rules = association_rules(groceries_ap, metric='confidence', min_threshold=0.3) # filter on confidence, minimum value 0.3
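Conceptually, rule generation splits each frequent itemset into an antecedent and a consequent and keeps the splits that pass the confidence threshold. The sketch below shows that idea in plain Python; the support values, the function name `rules_sketch`, and the variable `toy_rules` are hypothetical, and the sketch assumes every subset of a frequent itemset is also present in the support table.

```python
from itertools import combinations

# Hypothetical supports of frequent itemsets.
supports = {
    frozenset(["milk"]): 0.6,
    frozenset(["bread"]): 0.6,
    frozenset(["milk", "bread"]): 0.4,
}

def rules_sketch(supports, min_confidence):
    """Return (antecedent, consequent, confidence, lift) tuples."""
    found = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                b = itemset - a
                conf = sup / supports[a]
                if conf >= min_confidence:
                    found.append((a, b, conf, conf / supports[b]))
    return found

toy_rules = rules_sketch(supports, min_confidence=0.3)
for a, b, conf, lft in toy_rules:
    print(sorted(a), "->", sorted(b), conf, lft)
```

Raising `min_confidence` above the confidence of both rules in this toy table would leave no rules at all, which mirrors how the threshold filters the real output.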


In [40]:
# Top 10 association rules by lift
top_10 = rules.sort_values('lift', ascending=False)[:10]

In [41]:
top_10

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
74,"(citrus fruit, other vegetables)",(root vegetables),0.028876,0.108998,0.010371,0.359155,3.295045,0.007224,1.390354,0.717225
96,"(tropical fruit, other vegetables)",(root vegetables),0.035892,0.108998,0.012303,0.342776,3.14478,0.008391,1.355705,0.707403
1,(beef),(root vegetables),0.052466,0.108998,0.017387,0.331395,3.040367,0.011668,1.332628,0.708251
73,"(citrus fruit, root vegetables)",(other vegetables),0.017692,0.193493,0.010371,0.586207,3.029608,0.006948,1.949059,0.68199
95,"(tropical fruit, root vegetables)",(other vegetables),0.021047,0.193493,0.012303,0.584541,3.020999,0.008231,1.941244,0.683367
98,"(whole milk, other vegetables)",(root vegetables),0.074835,0.108998,0.023183,0.309783,2.842082,0.015026,1.2909,0.700572
79,"(whole milk, curd)",(yogurt),0.026131,0.139502,0.010066,0.385214,2.761356,0.006421,1.399671,0.654974
91,"(root vegetables, rolls/buns)",(other vegetables),0.024301,0.193493,0.012201,0.502092,2.59489,0.007499,1.619792,0.629935
100,"(root vegetables, yogurt)",(other vegetables),0.025826,0.193493,0.012913,0.5,2.584078,0.007916,1.613015,0.629266
121,"(tropical fruit, whole milk)",(yogurt),0.042298,0.139502,0.01515,0.358173,2.567516,0.009249,1.340701,0.637483
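As a quick check on how the columns relate, the confidence, lift, and leverage of the (beef) -> (root vegetables) row can be recomputed from its three support values. The inputs below are copied from the table and are already rounded, so the results are expected to agree with the table only to about three or four decimal places.

```python
# Values copied from the (beef) -> (root vegetables) row of the table.
antecedent_support = 0.052466   # share of transactions containing beef
consequent_support = 0.108998   # share containing root vegetables
support = 0.017387              # share containing both

confidence = support / antecedent_support
lift = confidence / consequent_support
# Leverage: observed joint support minus the support expected under independence.
leverage = support - antecedent_support * consequent_support

print(confidence, lift, leverage)
```

A lift of about 3 means that root vegetables appear roughly three times as often in baskets containing beef as they do across all baskets.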
