<img src="splitApplyCombine.png" width="400">

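Pandas' `groupby` is a powerful data-analysis tool: it splits a dataset into groups and then runs a computation or applies a function to each group independently. The approach is modeled on SQL's GROUP BY clause, but pandas is more flexible, since each group can be passed to arbitrary Python functions for aggregation, transformation, or filtering.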

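### What it does
A `groupby` operation usually involves one or more of the following steps:
- **Splitting**: divide the data into groups based on one or more keys.
- **Applying**: apply a function to each group independently to compute or transform something (aggregation, transformation, filtering).
- **Combining**: assemble the per-group results back into a single data structure.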
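
The three steps above can be spelled out by hand to see what the one-line `groupby` call does. This is a minimal sketch on a made-up six-row frame; the names `pieces`, `sums`, `manual`, and `builtin` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data": [0, 5, 10, 5, 10, 15],
})

# Split: iterating a GroupBy yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby("key")}

# Apply: reduce each piece to a single number.
sums = {key: int(group["data"].sum()) for key, group in pieces.items()}

# Combine: collect the per-group results into one Series.
manual = pd.Series(sums, name="data").rename_axis("key")

# The built-in call performs the same split, apply, and combine internally.
builtin = df.groupby("key")["data"].sum()

print(manual)
print(builtin)
```

In practice the built-in form is preferable; the manual version is only useful for seeing where each step happens.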

In [50]:
import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({
    'key': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'data': [0, 5, 10, 5, 10, 15, 10, 15, 20]
})

print(df)
# Group by 'key' and compute each group's sum
group_sum = df.groupby('key').sum()
print(group_sum)
# Group by 'key' and compute each group's mean
group_mean = df.groupby('key').mean()
print(group_mean)

  key  data
0   A     0
1   B     5
2   C    10
3   A     5
4   B    10
5   C    15
6   A    10
7   B    15
8   C    20
     data
key      
A      15
B      30
C      45
     data
key      
A     5.0
B    10.0
C    15.0
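
Besides aggregation, the "apply" step can also filter whole groups. As a small sketch on the same nine-row frame, `GroupBy.filter` keeps only the rows belonging to groups that satisfy a condition; the threshold of 5 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "data": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})

# Keep the rows of every group whose mean of 'data' is strictly above 5.
# Group means are A=5, B=10, C=15, so all rows of group A are dropped.
kept = df.groupby("key").filter(lambda g: g["data"].mean() > 5)

print(kept)
```

Unlike an aggregation, the result has the original row shape and index, just with the rejected groups removed.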


In [51]:
import yfinance as yf

In [54]:
tsmc = yf.download("2330.tw", start="2023-01-01", end="2023-12-31")

[*********************100%%**********************]  1 of 1 completed


In [57]:
tsmc.head()

Date,Open,High,Low,Close,Adj Close,Volume
2023-01-03,446.0,453.5,443.0,453.0,443.656769,14885824
2023-01-04,449.5,455.0,448.5,449.5,440.228943,19188422
2023-01-05,459.0,459.5,455.0,458.5,449.043335,23549581
2023-01-06,455.0,459.5,455.0,458.5,449.043335,20886011
2023-01-09,468.0,481.0,467.5,481.0,471.079224,46666263


In [58]:
tsmc.Close - tsmc.Close.shift()

Date
2023-01-03     NaN
2023-01-04    -3.5
2023-01-05     9.0
2023-01-06     0.0
2023-01-09    22.5
              ... 
2023-12-25    -1.0
2023-12-26     5.0
2023-12-27     6.0
2023-12-28     1.0
2023-12-29     0.0
Name: Close, Length: 239, dtype: float64

In [69]:
tsmc["change_delta"] = tsmc.Close.diff()
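
The two cells above compute the same thing: `Series.diff()` is shorthand for subtracting the previous row obtained with `shift()`. A quick check, using the first five closing prices from the `head()` output as stand-in data:

```python
import pandas as pd

close = pd.Series([453.0, 449.5, 458.5, 458.5, 481.0], name="Close")

by_shift = close - close.shift()  # subtract the previous row explicitly
by_diff = close.diff()            # the same operation in one call

# The first element has no predecessor, so both results start with NaN.
# Series.equals treats NaNs in matching positions as equal.
same = by_shift.equals(by_diff)

print(by_diff)
print(same)
```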

In [120]:
# "漲" = up, "跌" = down, "平" = flat (the first row, whose diff is NaN, also lands in "平")
tsmc["change"] = tsmc.Close.diff().apply(lambda x: "漲" if x > 0 else ("跌" if x < 0 else "平"))

In [67]:
tsmc.groupby("change")["Close"].count()

change
平     17
漲    114
跌    108
Name: Close, dtype: int64
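
If the first row (which has no previous close) should stay unlabeled rather than be counted as flat, one option is to map the sign of the change to a label. This is a sketch on stand-in prices, not the notebook's method; the English labels and the variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

close = pd.Series([453.0, 449.5, 458.5, 458.5, 481.0])
delta = close.diff()

# np.sign gives 1.0, -1.0, 0.0, or NaN. NaN has no entry in the mapping,
# so the first row stays NaN instead of being labeled "flat".
labels = np.sign(delta).map({1.0: "up", -1.0: "down", 0.0: "flat"})

# value_counts skips NaN by default, so the unlabeled row is not counted.
counts = labels.value_counts()

print(labels)
print(counts)
```

`groupby` also drops NaN keys by default, so a row labeled this way would simply be left out of the grouped statistics.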

In [72]:
tsmc.groupby("change")[["Close", "change_delta"]].describe()

,Close,Close,Close,Close,Close,Close,Close,Close,change_delta,change_delta,change_delta,change_delta,change_delta,change_delta,change_delta,change_delta
change,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
平,17.0,555.382353,41.592341,453.0,544.0,565.0,585.0,593.0,16.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
漲,114.0,546.144737,28.704839,458.5,525.75,548.0,569.75,593.0,114.0,6.403509,5.802379,1.0,2.25,5.0,9.0,40.0
跌,108.0,538.731481,27.734592,449.5,519.0,537.5,562.0,590.0,108.0,-5.462963,4.434204,-21.0,-7.0,-4.0,-2.0,-0.5


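On a candlestick chart, a red K (紅K; or green K, depending on the charting color convention) and a black K (黑K) describe the price move within a single trading day, whereas "up" (漲) and "down" (跌) usually describe the change relative to the previous trading day's close. The two are related, but they are not the same thing. The relationship is examined in detail below: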

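### Single-day price action (red K / black K) versus day-over-day change (up / down):

1. **Red K and up days**:
   - A red K means the day's close is above its open, i.e. the price rose within that day.
   - If several consecutive days are red Ks and each day's close is also above the previous day's close, those red Ks also reflect a day-over-day uptrend.

2. **Black K and down days**:
   - A black K means the day's close is below its open, i.e. the price fell within that day.
   - If several consecutive days are black Ks and each day's close is also below the previous day's close, the stock is in a day-over-day downtrend as well.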

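### How they relate:

- **Single-day dynamics vs. trend analysis**: Red and black Ks give a direct picture of market sentiment and price action within one day. "Up" and "down" describe the change against the previous trading day, which helps when analyzing a stock's short-term direction.
- **Sentiment indicator**: A run of red Ks may signal positive sentiment and a potential uptrend, while a run of black Ks may signal negative sentiment and a potential downtrend.
- **Aid to price analysis**: Red and black Ks can help judge whether there is short-term upward or downward momentum, while day-over-day gains and losses, followed over a longer window, help identify medium- and long-term trends.

In short, a day's red or black K and its rise or fall against the previous day are closely related but distinct. The K color gives insight into a single day's dynamics; combining runs of red or black Ks with day-over-day price changes can reveal broader market trends.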
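
The distinction can be made concrete with a tiny example. The prices below are hypothetical, chosen so that the second day opens well above the previous close but then closes below its own open; `red_K` and `up` are illustrative column names.

```python
import pandas as pd

# Hypothetical prices: day 2 gaps up at the open, then sells off intraday.
bars = pd.DataFrame({
    "Open":  [100.0, 106.0, 103.0],
    "Close": [102.0, 104.0, 101.0],
})

# Candle color compares the close with the same day's open.
bars["red_K"] = bars["Close"] > bars["Open"]

# Up/down compares the close with the previous day's close.
# The first row has no previous close, so the comparison is False there.
bars["up"] = bars["Close"].diff() > 0

print(bars)
```

The second row is a black K (close 104 below open 106) on a day that still closed higher than the previous day (104 versus 102), so the two labels disagree.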

In [74]:
tsmc["K_delta"] = tsmc.Close - tsmc.Open

In [76]:
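# "紅K" = red K (close >= open), "黑K" = black K (close < open).
# Note: days where close == open (K_delta of 0) are counted as 紅K here.
tsmc["K_status"] = tsmc.K_delta.apply(lambda x: "紅K" if x >= 0 else "黑K")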

In [80]:
tsmc.groupby("K_status").K_delta.mean()

K_status
紅K    2.829545
黑K   -3.841121
Name: K_delta, dtype: float64

In [81]:
tsmc.groupby("K_status").K_delta.describe()

K_status,count,mean,std,min,25%,50%,75%,max
紅K,132.0,2.829545,3.09631,0.0,1.0,2.0,4.0,18.0
黑K,107.0,-3.841121,2.80212,-15.0,-5.5,-3.0,-2.0,-0.5


In [85]:
tsmc.groupby(["change", "K_status"])[["change_delta", "K_delta"]].describe()

,,change_delta,change_delta,change_delta,change_delta,change_delta,change_delta,change_delta,change_delta,K_delta,K_delta,K_delta,K_delta,K_delta,K_delta,K_delta,K_delta
change,K_status,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
平,紅K,12.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13.0,2.576923,2.27162,0.0,1.0,2.0,4.0,7.0
平,黑K,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,-2.5,0.57735,-3.0,-3.0,-2.5,-2.0,-2.0
漲,紅K,86.0,6.773256,5.987564,1.0,3.0,5.0,9.0,40.0,86.0,3.593023,3.394209,0.0,1.0,3.0,5.75,18.0
漲,黑K,28.0,5.267857,5.12525,1.0,2.0,4.0,7.25,23.0,28.0,-3.017857,1.897767,-8.0,-4.0,-3.0,-1.75,-0.5
跌,紅K,33.0,-4.409091,3.81087,-19.0,-6.0,-3.0,-2.0,-1.0,33.0,0.939394,1.197377,0.0,0.0,1.0,1.0,5.0
跌,黑K,75.0,-5.926667,4.630023,-21.0,-8.0,-5.0,-2.5,-0.5,75.0,-4.22,3.06929,-15.0,-6.0,-4.0,-1.75,-1.0
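
When only the number of days in each combination matters, `pd.crosstab` gives a more compact view than a two-key `describe()`. The sketch below uses a handful of made-up labels in place of the real `change` and `K_status` columns.

```python
import pandas as pd

# Hypothetical labels standing in for the "change" and "K_status" columns.
days = pd.DataFrame({
    "change":   ["up", "up", "up", "down", "down", "flat"],
    "K_status": ["red", "red", "black", "black", "red", "red"],
})

# Count of days for every (change, K_status) pair; missing pairs become 0.
counts = pd.crosstab(days["change"], days["K_status"])

# Row-normalized version: for each kind of day, the share of each candle color.
shares = pd.crosstab(days["change"], days["K_status"], normalize="index")

print(counts)
print(shares)
```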


In [97]:
tsmc.groupby(tsmc.index.month)[["change_delta", "K_delta"]].mean()

Date,change_delta,K_delta
1,5.75,-0.269231
2,-0.611111,-0.722222
3,0.956522,0.73913
4,-1.823529,-2.088235
5,2.545455,-0.840909
6,0.9,0.6
7,-0.52381,-1.095238
8,-0.727273,-1.136364
9,-1.3,0.7
10,0.3,0.6
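
Grouping by `index.month` uses the month number as the key, which is fine for a single calendar year like the data here. If the data spanned several years, the same month from different years would be pooled together. A sketch of the difference on made-up values, with `to_period("M")` as one way to keep the year:

```python
import pandas as pd

idx = pd.to_datetime(["2022-01-10", "2022-02-10", "2023-01-10", "2023-02-10"])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Month number only: January 2022 and January 2023 share one group.
by_month_number = s.groupby(s.index.month).mean()

# Year-month periods: each calendar month is its own group.
by_calendar_month = s.groupby(s.index.to_period("M")).mean()

print(by_month_number)
print(by_calendar_month)
```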


In [106]:
tsmc.resample("2W")[["change_delta", "K_delta"]].mean()

Date,change_delta,K_delta
2023-01-08,1.833333,2.5
2023-01-22,6.357143,0.071429
2023-02-05,7.8,-2.4
2023-02-19,-2.4,-1.3
2023-03-05,-0.25,1.0
2023-03-19,0.2,-0.1
2023-04-02,1.5,0.8
2023-04-16,-2.428571,-2.0
2023-04-30,-1.4,-2.15
2023-05-14,-0.666667,-1.333333
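
The dates in the output above are bin labels, not trading days: with `"2W"` each bin ends on a Sunday and is labeled by that end date. A small sketch with the first five closing prices from the `head()` output shows how the first rows are assigned; the variable names are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06", "2023-01-09"]
)
close = pd.Series([453.0, 449.5, 458.5, 458.5, 481.0], index=dates)
change = close.diff()

# Two-week bins, each labeled by the Sunday that closes it.
biweekly = change.resample("2W").mean()
labels = list(biweekly.index.strftime("%Y-%m-%d"))

print(biweekly)
```

The four days up to 2023-01-06 fall in the bin labeled 2023-01-08, and 2023-01-09 starts the bin labeled 2023-01-22. The mean skips the NaN in the first row, which is why the first bin averages three values.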


In [112]:
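# String names select pandas' own reducers; recent pandas versions warn when
# given the builtins max/sum or np.mean/np.std instead.
tsmc.resample("2W")[["change_delta", "K_delta"]].agg(["max", "mean", "sum", "std"])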

,change_delta,change_delta,change_delta,change_delta,K_delta,K_delta,K_delta,K_delta
Date,max,mean,sum,std,max,mean,sum,std
2023-01-08,9.0,1.833333,5.5,6.448514,7.0,2.5,10.0,3.488075
2023-01-22,22.5,6.357143,44.5,8.811518,13.0,0.071429,0.5,6.153783
2023-02-05,40.0,7.8,39.0,21.822007,2.0,-2.4,-12.0,7.231874
2023-02-19,17.0,-2.4,-24.0,10.864826,2.0,-1.3,-13.0,2.830391
2023-03-05,11.0,-0.25,-2.0,7.478541,18.0,1.0,8.0,9.227289
2023-03-19,13.0,0.2,2.0,6.460134,3.0,-0.1,-1.0,2.424413
2023-04-02,16.0,1.5,15.0,7.261007,10.0,0.8,8.0,5.287301
2023-04-16,6.0,-2.428571,-17.0,4.995236,2.0,-2.0,-14.0,2.645751
2023-04-30,8.5,-1.4,-14.0,5.526703,3.5,-2.15,-21.5,4.661008
2023-05-14,6.0,-0.666667,-6.0,4.41588,3.0,-1.333333,-12.0,3.427827
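
Passing a list of functions to `agg` produces two-level column headers, as in the table above. When flat, self-describing column names are easier to work with, `groupby(...).agg` also accepts named aggregations. A minimal sketch on made-up data; the output column names are arbitrary choices for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "A", "B", "B"],
    "data": [1.0, 3.0, 10.0, 20.0],
})

# Each keyword becomes an output column: name=(input column, reducer).
summary = df.groupby("key").agg(
    data_max=("data", "max"),
    data_mean=("data", "mean"),
    data_sum=("data", "sum"),
)

print(summary)
```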


In [124]:
df2 = tsmc.groupby("change")[["change_delta", "K_delta"]]

In [None]:
df2.transform
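
The cell above only references the `transform` method without calling it. As a sketch of what a call typically looks like (on made-up data, not the notebook's result): `transform` computes a per-group value and broadcasts it back to every original row, so the output lines up with the input frame.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "A", "B", "B"],
    "data": [1.0, 3.0, 10.0, 20.0],
})

grouped = df.groupby("key")["data"]

# One value per original row, aligned to df's index.
df["group_mean"] = grouped.transform("mean")

# A common use: measure each row against its own group's average.
df["demeaned"] = df["data"] - df["group_mean"]

print(df)
```

This differs from `mean()`, which returns one row per group rather than one row per input row.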

In [122]:
tsmc.groupby("change")[["change_delta", "K_delta"]].mean()

change,change_delta,K_delta
平,0.0,1.382353
漲,6.403509,1.969298
跌,-5.462963,-2.643519


In [123]:
tsmc.groupby("change")[["change_delta", "K_delta"]].std()

change,change_delta,K_delta
平,0.0,2.976625
漲,5.802379,4.206656
跌,4.434204,3.555995
