# Apriori Keyword Combinations


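## Introduction

### Association rules

Also known as **market basket analysis**. Introduced by Agrawal et al. (1993), association rule mining uncovers relationships among items hidden in large volumes of transaction data, which helps explain how customer purchasing behavior relates to product sales.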

### Transaction records from a hypermarket

- Summary of purchase transactions: reveals distinctive patterns in customer purchasing behavior
  | Order ID | Items (code) |
  | -------- | ------------ |
  | 101 | Milk (A), Bread (B), Cookies (C), Orange juice (D) |
  | 102 | Bread (B), Cookies (C), Soda (E), Instant noodles (F) |
  | 103 | Milk (A), Cookies (C), Fruit (G) |
  | 104 | Milk (A), Bread (B), Orange juice (D), Instant noodles (F), Fruit (G) |
  | 105 | Cookies (C), Soda (E), Fruit (G) |
- Market-basket binary table
  | Order ID | Milk (A) | Bread (B) | Cookies (C) | Orange juice (D) | Soda (E) | Instant noodles (F) | Fruit (G) |
  | -------- | -------- | --------- | ----------- | ---------------- | -------- | ------------------- | --------- |
  | 101 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
  | 102 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
  | 103 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
  | 104 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
  | 105 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |

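### Metrics

- Support
  - The probability that an itemset appears in the whole sample
  - `Support(X) = number(X) / number(AllSamples)`
- Confidence
  - The probability that Y also occurs, given that X has occurred
  - `Confidence(X→Y) = P(Y|X) = P(X∩Y) / P(X)`
- Lift
  - The probability that Y occurs given X, divided by the overall probability that Y occurs
  - Lift reflects how strongly X and Y are associated: a lift above 1 indicates a positive association (the higher, the stronger), a lift of 1 indicates that X and Y are independent, and a lift below 1 indicates a negative association (the lower, the stronger)
  - `Lift(X→Y) = Confidence(X→Y) / P(Y) = P(Y|X) / P(Y)`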
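
As a worked example, the three metrics can be computed by hand for the rule Milk (A) → Bread (B) using the five orders in the table above. This is a minimal sketch: the helper names `support`, `confidence`, and `lift` are chosen for illustration, and `P(X∩Y)` is taken to mean the fraction of orders that contain every item of both X and Y.

```python
# The five example orders from the table above, as sets of item codes.
orders = {
    101: {"A", "B", "C", "D"},
    102: {"B", "C", "E", "F"},
    103: {"A", "C", "G"},
    104: {"A", "B", "D", "F", "G"},
    105: {"C", "E", "G"},
}


def support(itemset):
    """Fraction of orders that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in orders.values() if itemset <= items)
    return hits / len(orders)


def confidence(x, y):
    """Confidence(X→Y) = Support(X and Y together) / Support(X)."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """Lift(X→Y) = Confidence(X→Y) / Support(Y)."""
    return confidence(x, y) / support(y)


print(round(support({"A", "B"}), 3))       # 0.4   (orders 101 and 104)
print(round(confidence({"A"}, {"B"}), 3))  # 0.667 (2 of the 3 orders with milk)
print(round(lift({"A"}, {"B"}), 3))        # 1.111 (0.667 / 0.6)
```

A lift slightly above 1 suggests that milk buyers are only marginally more likely than average to buy bread in this tiny sample.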

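### Confidence vs Lift

1. Confidence

   - The probability that B also occurs when condition A occurs.
   - Range: 0 to 1; the higher the value, the more reliable the rule.
   - Suitable when you need rules that hold with high probability.
   - Advantages:
     - Intuitive and widely used.
     - Good for a first pass that filters out low-value rules.
   - Disadvantages:
     - Easily inflated by items that have high support overall, so it may surface rules that are common but offer little insight.

2. Lift
   - Measures whether A → B is a meaningful association rather than a coincidence.
   - Interpretation:
     - `= 1`: A and B are independent
     - `> 1`: A raises the probability of B (positive association)
     - `< 1`: A lowers the probability of B (negative association)
   - Advantages:
     - Effectively filters out "spurious associations" that arise only because an item is frequent.
     - Good for finding rules that carry real insight.
   - Disadvantages:
     - When support is very low, even a rule with a high lift may be statistically unstable.

| Research goal / situation | Recommended metric |
| ------------------------- | ------------------ |
| Find common, high-probability purchase associations (e.g. product recommendation) | Confidence |
| Find genuinely "interesting" or "unusual" associations (e.g. market insight) | Lift |
| Aim for precise rules: filter with Confidence first, then refine with Lift | Confidence + Lift |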
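
The difference between the two metrics shows up in the five-order example from the introduction. The sketch below compares Milk (A) → Cookies (C) with Milk (A) → Orange juice (D); the helper names are illustrative. Both rules have the same confidence, but cookies appear in four of the five orders, so their rule has a lift below 1.

```python
# The five example orders, as sets of item codes.
orders = [
    {"A", "B", "C", "D"},
    {"B", "C", "E", "F"},
    {"A", "C", "G"},
    {"A", "B", "D", "F", "G"},
    {"C", "E", "G"},
]


def support(itemset):
    """Fraction of orders that contain every item in `itemset`."""
    return sum(1 for items in orders if itemset <= items) / len(orders)


def rule_metrics(x, y):
    """Return (confidence, lift) for the rule X → Y."""
    conf = support(x | y) / support(x)
    return conf, conf / support(y)


conf_ac, lift_ac = rule_metrics({"A"}, {"C"})
conf_ad, lift_ad = rule_metrics({"A"}, {"D"})

print(round(conf_ac, 3), round(lift_ac, 3))  # 0.667 0.833
print(round(conf_ad, 3), round(lift_ad, 3))  # 0.667 1.667
```

Ranked by confidence alone, the two rules tie. Lift separates them: buying milk makes orange juice more likely than average and cookies slightly less likely.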


## Loading and cleaning the data


In [22]:
import pandas as pd

# Load the raw PTT posts about sports bras
ptt_data = pd.read_csv("data/PTT_運動內衣_所有資料.csv")

In [23]:
# Drop duplicate rows and rows with missing values
ptt_data = ptt_data.drop_duplicates()
ptt_data = ptt_data.dropna()

# Concatenate the title (標題) and the body (內文) into a new column, 所有文 ("all text")
ptt_data["所有文"] = ptt_data["標題"] + ptt_data["內文"]

# Normalize spelling variants of the brand name
ptt_data["所有文"] = ptt_data["所有文"].str.replace("adidas", "Adidas")
ptt_data["所有文"] = ptt_data["所有文"].str.replace("ADIDAS", "Adidas")
ptt_data["所有文"] = ptt_data["所有文"].str.replace("addias", "Adidas")

In [24]:
# Keep only the posts that mention the target brand
ptt_data = ptt_data[ptt_data["所有文"].str.contains("Adidas")]

In [25]:
# Remove meaningless strings; list them first (extend the list as needed)
removeword = [
    "span",
    "class",
    "f3",
    "https",
    "imgur",
    "h1",
    "_   blank",
    "href",
    "rel",
    "nofollow",
    "target",
    "cdn",
    "cgi",
    "b4",
    "jpg",
    "hl",
    "b1",
    "f5",
    "f4",
    "goo.gl",
    "f2",
    "email",
    "map",
    "f1",
    "f6",
    "__cf___",
    "data",
    "bbshtml",
    "cf",
    "f0",
    "b2",
    "b3",
    "b5",
    "b6",
    "原文內容",
    "原文連結",
    "作者標題",
    "時間",
    "看板",
    "<",
    ">",
    "，",
    "。",
    "？",
    "—",
    "閒聊",
    "・",
    "/",
    " ",
    "=",
    '"',
    "\n",
    "」",
    "「",
    "！",
    "[",
    "]",
    "：",
    "‧",
    "╦",
    "╔",
    "╗",
    "║",
    "╠",
    "╬",
    "╬",
    ":",
    "╰",
    "╩",
    "╯",
    "╭",
    "╮",
    "│",
    "╪",
    "─",
    "《",
    "》",
    ".",
    "、",
    "（",
    "）",
    "　",
    "*",
    "※",
    "~",
    "○",
    "”",
    "“",
    "～",
    "@",
    "＋",
    "\r",
    "▁",
    ")",
    "(",
    "-",
    "═",
    "?",
    ",",
    "!",
    "…",
    "&",
    ";",
    "『",
    "』",
    "#",
    "＝",
    "＃",
    "\\",
    "\\n",
    '"',
    "的",
    "^",
    "︿",
    "＠",
    "$",
    "＄",
    "%",
    "％",
    "＆",
    "＊",
    "＿",
    "+",
    "'",
    "{",
    "}",
    "｛",
    "｝",
    "|",
    "｜",
    "．",
    "‵",
    "`",
    "；",
    "●",
    "§",
    "※",
    "○",
    "△",
    "▲",
    "◎",
    "☆",
    "★",
    "◇",
    "◆",
    "□",
    "■",
    "▽",
    "▼",
    "㊣",
    "↑",
    "↓",
    "←",
    "→",
    "↖",
    "XD",
    "XDD",
    "QQ",
    "【",
    "】",
    "Fw",
    "心得",
    "贈送",
]

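# regex=False makes each entry match literally (entries such as "goo.gl" or "["
# contain regex metacharacters), whatever the installed pandas version's default is
for word in removeword:
    ptt_data["所有文"] = ptt_data["所有文"].str.replace(word, "", regex=False)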
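
The loop above scans the column once per list entry. As an alternative sketch, not the method used in this notebook, the entries could be combined into one escaped regular expression and removed in a single pass. The short token list and the `clean` helper below are hypothetical and for illustration only.

```python
import re

# Hypothetical, shortened token list for illustration only.
tokens_to_remove = ["XD", "XDD", "[", "]", "(", ")", ".", " "]

# Longest tokens first, so that "XDD" is tried before "XD";
# re.escape turns every token into a literal, including "." and "[".
pattern = re.compile(
    "|".join(re.escape(t) for t in sorted(tokens_to_remove, key=len, reverse=True))
)


def clean(text):
    """Remove every listed token from `text` in a single pass."""
    return pattern.sub("", text)


print(clean("[問題] Adidas XDD (跑鞋)..."))  # → 問題Adidas跑鞋
```

On a DataFrame this could be applied with `ptt_data["所有文"].apply(clean)`. A single pass can give slightly different results from the sequential loop: removing one token never creates a new match for another.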

## Extracting keywords from 所有文


In [26]:
import jieba

jieba.set_dictionary("dict/dict.txt.big")
jieba.load_userdict("dict/user_dict.txt")

Building prefix dict from /Volumes/Dev/nkust/nkust-homework/semester-6/marketing/05-voice/dict/dict.txt.big ...
Loading model from cache /var/folders/qj/62r8d09n5hn3nm_bdzf0dcpr0000gn/T/jieba.u7f356f34462f1b91de22574178031f11.cache
Loading model from cache /var/folders/qj/62r8d09n5hn3nm_bdzf0dcpr0000gn/T/jieba.u7f356f34462f1b91de22574178031f11.cache
Loading model cost 0.170 seconds.
Prefix dict has been built successfully.


In [40]:
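import jieba.analyse

ptt_data = ptt_data.dropna(subset=["所有文"])

# 關鍵字: every token produced by jieba's word segmentation
ptt_data["關鍵字"] = ptt_data["所有文"].apply(lambda x: list(jieba.cut(x)))
# 重要關鍵字: the five highest-weighted keywords of each post (TF-IDF based)
ptt_data["重要關鍵字"] = ptt_data["所有文"].apply(
    lambda x: jieba.analyse.extract_tags(str(x), topK=5)
)

ptt_data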

Unnamed: 0.1,Unnamed: 0,看板,分類,貼文時間,所有文,標題,內文,留言,關鍵字,評價_推,評價_中立,評價_噓,重要關鍵字,重要關鍵字stopwords,關鍵字stopwords
2374,2381,basketball,配件,2017-09-02 04:34:54,配件球鞋推薦小弟最近想買新球鞋看中這兩雙1NIKEHYPERDUNK2016LOW2Adid...,[配件] 球鞋推薦,小弟最近想買新球鞋\n看中這兩雙\n1.NIKE HYPERDUNK 2016 LOW\n2...,"[{'type': '推', 'user': 'a52214', 'content': '正...","[配件, 球鞋, 推薦, 小弟, 最近, 想買, 新球, 鞋, 看, 中, 這, 兩雙, 1...",9,5,0,"[推薦, 想買, 新球, 兩雙, 1NIKEHYPERDUNK2016LOW2]","[推薦, 想買, 新球, 兩雙, 1NIKEHYPERDUNK2016LOW2]","[配件, 球鞋, 推薦, 小弟, 最近, 想買, 新球, 鞋, 中, 兩雙, 1NIKEHY..."
2401,2408,basketball,其他,2017-07-25 23:59:27,其他基本款NIKEAdidas球褲最近想要買球褲主要是鎖定NIKEAdidas基本款就是那種...,"[其他] 基本款的NIKE,adidas球褲","最近想要買球褲,\n\n主要是鎖定NIKE,ADIDAS基本款,\n\n就是那種底色黑白灰藍...","[{'type': '推', 'user': 'crawford438', 'content...","[其他, 基本, 款, NIKE, Adidas, 球褲, 最近, 想要, 買球, 褲, 主...",10,9,0,"[Adidas, NIKE, 球褲, 那種, 什麼]","[Adidas, NIKE, 球褲, 那種]","[基本, 款, NIKE, Adidas, 球褲, 最近, 想要, 買球, 褲, 主要, 鎖..."
2604,2612,basketball,配件,2016-11-27 21:29:37,配件冬天打籃球服飾各位好今天才注意到這裡可以討論配件真是太棒了想請問一下冬天籃球服飾問題是這...,[配件] 冬天打籃球服飾,各位好\n今天才注意到這裡可以討論配件\n真是太棒了\n\n想請問一下冬天籃球服飾問題\n\...,"[{'type': '→', 'user': 'AirFang', 'content': '...","[配件, 冬天, 打籃球, 服飾, 各位, 好, 今天, 才, 注意, 到, 這裡, 可以,...",6,8,0,"[打籃球, 袖緊, 身衣, 外套, 服飾]","[打籃球, 袖緊, 身衣, 外套, 服飾]","[配件, 冬天, 打籃球, 服飾, 今天, 才, 注意, 討論, 配件, 真, 太棒了, 想..."
2657,2665,basketball,傷害,2016-10-16 15:57:08,傷害物理治療師復健師建議台北前陣子扭傷腳踝基本上應該已經好差不多了這次腳踝大翻船後嚇到為了往...,[傷害] 物理治療師(復健師)建議?(台北),前陣子扭傷的腳踝，基本上應該已經好的差不多了。\n這次腳踝大翻船後嚇到，為了往後能夠更安全的...,"[{'type': '→', 'user': 'tmmtilc', 'content': '...","[傷害, 物理治療師, 復健, 師, 建議, 台北, 前陣子, 扭傷腳, 踝, 基本上, 應...",4,4,0,"[腳踝, 訓練, 復健, 護踝, 肌力]","[腳踝, 訓練, 復健, 護踝, 肌力]","[傷害, 物理治療師, 復健, 師, 建議, 台北, 前陣子, 扭傷腳, 踝, 基本上, 應..."
2662,2670,basketball,配件,2016-10-14 22:48:35,配件懇請有穿LBJ13代板友入內護踝問題是這樣小弟前陣子打籃球翻船現在已經好得差不多了打算買...,[配件] 懇請有穿LBJ-13代的板友入內(護踝問題),\n是這樣的，小弟前陣子打籃球翻船，現在已經好得差不多了。\n打算買個保護性更好的球鞋&am...,"[{'type': '推', 'user': 'iamlilt', 'content': '...","[配件, 懇請, 有, 穿, LBJ13, 代板, 友入, 內護踝, 問題, 是, 這樣, ...",2,4,0,"[LBJ13, 護踝, 這樣, Adidas, 候選]","[LBJ13, 護踝, Adidas, 候選]","[配件, 懇請, 穿, LBJ13, 代板, 友入, 內護踝, 問題, 小弟, 前陣子, 打..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15145,15288,Road_Running,問題,2018-10-13 10:34:48,問題AdidasAdizeroBoston7大概就如前一篇討論nikepegasus35一樣...,[問題] Adidas Adizero Boston 7,大概就如前一篇討論nike pegasus 35一樣\n\n想問一下，版上有人穿這雙鞋子跑嗎...,"[{'type': '推', 'user': 'kasim0607', 'content':...","[問題, Adidas, AdizeroBoston7, 大概, 就, 如前, 一篇, 討論...",29,19,0,"[Adidas, 這雙, Nike, 入手, 問題]","[Adidas, 這雙, Nike, 入手, 問題]","[問題, Adidas, AdizeroBoston7, 大概, 如前, 一篇, 討論, n..."
15156,15299,Road_Running,心得,2018-10-10 18:30:15,初戰世界六大馬第一場柏林awasonyahahablogspotcom201809d3htm...,[心得] 初戰世界六大馬第一場～柏林,"<a href=""https://wasonyahaha.blogspot.com/2018...","[{'type': '推', 'user': 'FCBXaVi5566', 'content...","[初戰, 世界, 六, 大馬, 第, 一場, 柏林, awasonyahahablogspo...",9,5,0,"[台灣, 其實, 合照, 還有, 一個]","[台灣, 其實, 合照, 一個]","[初戰, 世界, 大馬, 一場, 柏林, awasonyahahablogspotcom20..."
15360,15505,Road_Running,問題,2018-08-17 10:53:20,問題慢跑鞋腳型小弟最近愛上慢跑近來在選購慢跑鞋時發現miznoasics都有針對腳型去對慢跑...,[問題] 慢跑鞋腳型,小弟最近愛上慢跑，近來在選購慢跑鞋時，\n發現mizno、asics都有針對腳型，\n去對慢...,"[{'type': '推', 'user': 'XR125', 'content': '在這...","[問題, 慢跑鞋, 腳型, 小弟, 最近, 愛上, 慢跑, 近來, 在, 選購, 慢跑鞋, ...",8,4,0,"[慢跑鞋, 腳型, 大廠, 其他, 問題]","[慢跑鞋, 腳型, 大廠, 問題]","[問題, 慢跑鞋, 腳型, 小弟, 最近, 愛上, 慢跑, 近來, 選購, 慢跑鞋, 時, ..."
15425,15571,Road_Running,跑鞋,2018-07-26 23:53:16,跑鞋AdidasBoston7今年三月左右Adidas發表了Boston7同時也發了波馬紀念...,[跑鞋] Adidas Boston 7,\n今年三月左右，Adidas發表了Boston 7，\n同時也發了波馬紀念配色！\n\n<...,"[{'type': '推', 'user': 'n88526', 'content': '好...","[跑, 鞋, Adidas, Boston7, 今年, 三月, 左右, Adidas, 發表...",18,8,0,"[Adidas, Boston7, 一整, 發表, 同時]","[Adidas, Boston7, 一整, 發表]","[跑, 鞋, Adidas, Boston7, 今年, 三月, 左右, Adidas, 發表..."


In [28]:
# Remove stopwords from a list of tokens.
# The stopword file is read once, not once per row.
with open("dict/stopwords.txt", "r", encoding="utf-8-sig") as f:
    STOPWORDS = set(f.read().split("\n"))


def stopwords(tokens):
    """Return the tokens that are not in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]

In [29]:
ptt_data["重要關鍵字stopwords"] = ptt_data["重要關鍵字"].apply(
    stopwords
)  # nearly identical to the 重要關鍵字 column


ptt_data["關鍵字stopwords"] = ptt_data["關鍵字"].apply(
    stopwords
)  # differs substantially from the 關鍵字 column

ptt_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 76 entries, 2374 to 15429
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Unnamed: 0      76 non-null     int64 
 1   看板              76 non-null     object
 2   分類              76 non-null     object
 3   貼文時間            76 non-null     object
 4   所有文             76 non-null     object
 5   標題              76 non-null     object
 6   內文              76 non-null     object
 7   留言              76 non-null     object
 8   關鍵字             76 non-null     object
 9   評價_推            76 non-null     int64 
 10  評價_中立           76 non-null     int64 
 11  評價_噓            76 non-null     int64 
 12  重要關鍵字           76 non-null     object
 13  重要關鍵字stopwords  76 non-null     object
 14  關鍵字stopwords    76 non-null     object
dtypes: int64(4), object(11)
memory usage: 9.5+ KB


## Apriori association analysis


In [30]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

In [31]:
# Collect each post's filtered top keywords as one transaction for Apriori
records = ptt_data["重要關鍵字stopwords"].tolist()

In [32]:
# One-hot encode the transactions with TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(records).transform(records)
df_trans = pd.DataFrame(te_ary, columns=te.columns_)  # convert the array to a DataFrame
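
To make the encoding step concrete, the sketch below builds the same kind of boolean table in plain Python. It illustrates the idea only and is not mlxtend's implementation; the three keyword lists are hypothetical stand-ins for posts.

```python
# Hypothetical keyword lists standing in for three posts.
records = [
    ["Adidas", "NIKE", "球褲"],
    ["腳踝", "護踝"],
    ["Adidas", "護踝"],
]

# One column per distinct keyword, in sorted order.
columns = sorted({word for record in records for word in record})

# One row per post: True where the post contains the column's keyword.
one_hot = [[word in record for word in columns] for record in records]

print(columns)
for row in one_hot:
    print(row)
```

Each row plays the role of one order in the market-basket binary table from the introduction, with keywords in place of products.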

In [33]:
# Use apriori to find the frequent itemsets
frequent_itemsets = apriori(df_trans, min_support=0.01, use_colnames=True)

frequent_itemsets

Unnamed: 0,support,itemsets
0,0.026316,(101)
1,0.013158,(10k)
2,0.013158,(1300)
3,0.013158,(17158)
4,0.013158,(181)
...,...,...
1865,0.013158,"(袖緊, 打籃球, 外套, 服飾, 身衣)"
1866,0.013158,"(大阪, 比賽, 廁所, 這次, 訓練)"
1867,0.013158,"(護踝, 肌力, 復健, 腳踝, 訓練)"
1868,0.013158,"(臂套, 指紋, 手機, 解鎖, 跑步)"


### Computing association rules


In [34]:
from mlxtend.frequent_patterns import association_rules

In [35]:
# Generate association rules with the default metric and threshold: metric="confidence" and min_threshold=0.8.
rules1 = association_rules(frequent_itemsets)

rules1

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(101),(Adidas),0.026316,0.355263,0.026316,1.0,2.814815,1.0,0.016967,inf,0.662162,0.074074,1.0,0.537037
1,(世仇),(101),0.013158,0.026316,0.013158,1.0,38.000000,1.0,0.012812,inf,0.986667,0.500000,1.0,0.750000
2,(報名),(101),0.013158,0.026316,0.013158,1.0,38.000000,1.0,0.012812,inf,0.986667,0.500000,1.0,0.750000
3,(101),(比賽),0.026316,0.039474,0.026316,1.0,25.333333,1.0,0.025277,inf,0.986486,0.666667,1.0,0.833333
4,(球衣),(101),0.013158,0.026316,0.013158,1.0,38.000000,1.0,0.012812,inf,0.986667,0.500000,1.0,0.750000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8637,"(挑色, 猶豫)","(挑選, 連結, 號物品)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000
8638,"(連結, 號物品)","(挑選, 挑色, 猶豫)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000
8639,"(號物品, 猶豫)","(挑選, 挑色, 連結)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000
8640,(挑色),"(挑選, 連結, 號物品, 猶豫)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000
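
Conceptually, rule generation splits each frequent itemset into an antecedent and a consequent in every possible way and keeps the splits whose metric passes the threshold. The sketch below does this for the single itemset {A, B, D} from the hypermarket example. It recomputes supports from the raw orders for simplicity, whereas `association_rules` works from the supports stored in `frequent_itemsets`. The function name `rules_from_itemset` is hypothetical.

```python
from itertools import combinations

# The five example orders, as sets of item codes.
orders = [
    {"A", "B", "C", "D"},
    {"B", "C", "E", "F"},
    {"A", "C", "G"},
    {"A", "B", "D", "F", "G"},
    {"C", "E", "G"},
]


def support(itemset):
    """Fraction of orders that contain every item in `itemset`."""
    return sum(1 for t in orders if itemset <= t) / len(orders)


def rules_from_itemset(itemset, min_confidence):
    """Return (antecedent, consequent, confidence, lift) tuples for one itemset."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                rules.append((antecedent, consequent, conf, conf / support(consequent)))
    return rules


found = rules_from_itemset({"A", "B", "D"}, min_confidence=0.8)
for antecedent, consequent, conf, lift in found:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 3), round(lift, 3))
```

Of the six possible splits, four pass the 0.8 confidence threshold. The rules A → {B, D} and B → {A, D} are dropped because their confidence is only about 0.667.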


In [36]:
# Same as rules1, with the default metric and threshold written out explicitly
rules2 = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.8)

rules2

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(101),(Adidas),0.026316,0.355263,0.026316,1.0,2.814815,1.0,0.016967,inf,0.662162,0.074074,1.0,0.537037
1,(世仇),(101),0.013158,0.026316,0.013158,1.0,38.000000,1.0,0.012812,inf,0.986667,0.500000,1.0,0.750000
2,(報名),(101),0.013158,0.026316,0.013158,1.0,38.000000,1.0,0.012812,inf,0.986667,0.500000,1.0,0.750000
3,(101),(比賽),0.026316,0.039474,0.026316,1.0,25.333333,1.0,0.025277,inf,0.986486,0.666667,1.0,0.833333
4,(球衣),(101),0.013158,0.026316,0.013158,1.0,38.000000,1.0,0.012812,inf,0.986667,0.500000,1.0,0.750000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8637,"(挑色, 猶豫)","(挑選, 連結, 號物品)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000
8638,"(連結, 號物品)","(挑選, 挑色, 猶豫)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000
8639,"(號物品, 猶豫)","(挑選, 挑色, 連結)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000
8640,(挑色),"(挑選, 連結, 號物品, 猶豫)",0.013158,0.013158,0.013158,1.0,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.0,1.000000


In [37]:
# Generate association rules using lift as the metric; a rule is kept only if its lift is at least 1.000000001.
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.000000001)

rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(101),(Adidas),0.026316,0.355263,0.026316,1.000000,2.814815,1.0,0.016967,inf,0.662162,0.074074,1.000000,0.537037
1,(Adidas),(101),0.355263,0.026316,0.026316,0.074074,2.814815,1.0,0.016967,1.051579,1.000000,0.074074,0.049049,0.537037
2,(世仇),(101),0.013158,0.026316,0.013158,1.000000,38.000000,1.0,0.012812,inf,0.986667,0.500000,1.000000,0.750000
3,(101),(世仇),0.026316,0.013158,0.013158,0.500000,38.000000,1.0,0.012812,1.973684,1.000000,0.500000,0.493333,0.750000
4,(101),(台北),0.026316,0.052632,0.013158,0.500000,9.500000,1.0,0.011773,1.894737,0.918919,0.200000,0.472222,0.375000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10931,(挑選),"(挑色, 連結, 號物品, 猶豫)",0.026316,0.013158,0.013158,0.500000,38.000000,1.0,0.012812,1.973684,1.000000,0.500000,0.493333,0.750000
10932,(挑色),"(挑選, 連結, 號物品, 猶豫)",0.013158,0.013158,0.013158,1.000000,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.000000,1.000000
10933,(連結),"(挑選, 挑色, 號物品, 猶豫)",0.026316,0.013158,0.013158,0.500000,38.000000,1.0,0.012812,1.973684,1.000000,0.500000,0.493333,0.750000
10934,(號物品),"(挑選, 挑色, 連結, 猶豫)",0.013158,0.013158,0.013158,1.000000,76.000000,1.0,0.012985,inf,1.000000,1.000000,1.000000,1.000000


## Questions


### How do we select the rows of rules where antecedents == 'Adidas', confidence >= 0.1, and lift > 1.000000001?


In [38]:
import pandas as pd

# Select the rules whose antecedent is 'Adidas', with confidence >= 0.1 and lift > 1.000000001
filtered_rules = rules[
    (rules["antecedents"] == frozenset(["Adidas"]))
    & (rules["confidence"] >= 0.1)
    & (rules["lift"] > 1.000000001)
]

filtered_rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
142,(Adidas),(台北),0.355263,0.052632,0.039474,0.111111,2.111111,1.0,0.020776,1.065789,0.816327,0.107143,0.061728,0.430556
212,(Adidas),(這雙),0.355263,0.065789,0.039474,0.111111,1.688889,1.0,0.016101,1.050987,0.632653,0.103448,0.048513,0.355556
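
The `antecedents` column holds `frozenset` objects, so the filter above keeps only rules whose antecedent is exactly {Adidas}. The sketch below contrasts that with a looser filter that keeps any rule whose antecedent contains "Adidas". The miniature `rules_demo` table and its numbers are hypothetical and only mimic the column layout shown above.

```python
import pandas as pd

# Hypothetical miniature rules table with made-up values.
rules_demo = pd.DataFrame(
    {
        "antecedents": [
            frozenset(["Adidas"]),
            frozenset(["Adidas", "Nike"]),
            frozenset(["101"]),
        ],
        "consequents": [
            frozenset(["台北"]),
            frozenset(["問題"]),
            frozenset(["Adidas"]),
        ],
        "confidence": [0.11, 0.50, 1.00],
        "lift": [2.11, 1.40, 2.81],
    }
)

# Exact match: the antecedent is exactly {"Adidas"}.
exact = rules_demo[rules_demo["antecedents"].apply(lambda s: s == frozenset(["Adidas"]))]

# Looser match: "Adidas" appears anywhere in the antecedent.
contains = rules_demo[rules_demo["antecedents"].apply(lambda s: "Adidas" in s)]

print(len(exact), len(contains))  # 1 2
```

Which filter is appropriate depends on the question: the exact match isolates what co-occurs with the brand alone, while the looser match also returns rules whose antecedent pairs the brand with other keywords.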


### Which words most often appear together with a brand (Adidas)? Top five by confidence


In [39]:
adidas_rules = rules[rules["antecedents"] == frozenset(["Adidas"])]

# Sort by confidence and keep the top five
top_adidas_rules = adidas_rules.sort_values(by="confidence", ascending=False).head(5)

top_adidas_rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
212,(Adidas),(這雙),0.355263,0.065789,0.039474,0.111111,1.688889,1.0,0.016101,1.050987,0.632653,0.103448,0.048513,0.355556
142,(Adidas),(台北),0.355263,0.052632,0.039474,0.111111,2.111111,1.0,0.020776,1.065789,0.816327,0.107143,0.061728,0.430556
1,(Adidas),(101),0.355263,0.026316,0.026316,0.074074,2.814815,1.0,0.016967,1.051579,1.0,0.074074,0.049049,0.537037
111,(Adidas),(runner),0.355263,0.026316,0.026316,0.074074,2.814815,1.0,0.016967,1.051579,1.0,0.074074,0.049049,0.537037
1749,(Adidas),"(Nike, 問題)",0.355263,0.026316,0.026316,0.074074,2.814815,1.0,0.016967,1.051579,1.0,0.074074,0.049049,0.537037


### As a final step, filter the articles that contain both "Adidas" and "101", and save them as adidas_and_101_articles.csv


In [46]:
# Keep the articles that contain both "Adidas" and "101"
adidas_and_101 = ptt_data[
    ptt_data["所有文"].str.contains("Adidas") & ptt_data["所有文"].str.contains("101")
]

# Save to file
adidas_and_101.to_csv("adidas_and_101_articles.csv", index=False)

### As a final step, filter the articles that contain both "Adidas" and "台北", and save them as adidas_and_taipei_articles.csv


In [47]:
# Keep the articles that contain both "Adidas" and "台北"
adidas_and_taipei = ptt_data[
    ptt_data["所有文"].str.contains("Adidas") & ptt_data["所有文"].str.contains("台北")
]

# Save to file
adidas_and_taipei.to_csv("adidas_and_taipei_articles.csv", index=False)