# 政策銘柄バックテスト試作

Grok銘柄と同じ粒度で政策銘柄をバックテストする。

## 目的
- meta.parquetから政策銘柄を抽出
- tags（政策の種類）ごとのパフォーマンスを分析
- phase1, phase2, phase3_1pct/2pct/3pct の各種統計を算出
- `political_trending_archive.parquet` を作成

In [3]:
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime, timedelta

# パス設定
DATA_DIR = Path('../data/parquet')
META_PATH = DATA_DIR / 'meta.parquet'
GROK_ARCHIVE_PATH = DATA_DIR / 'backtest' / 'grok_trending_archive.parquet'

print(f"meta.parquet: {META_PATH.exists()}")
print(f"grok_trending_archive.parquet: {GROK_ARCHIVE_PATH.exists()}")

meta.parquet: True
grok_trending_archive.parquet: True


## 1. 政策銘柄を抽出

In [4]:
# meta.parquetから政策銘柄を抽出
meta_df = pd.read_parquet(META_PATH)
political_stocks = meta_df[meta_df['categories'].apply(lambda x: any('政策銘柄' == cat for cat in x) if x is not None else False)].copy()

print(f"政策銘柄数: {len(political_stocks)}")
print(f"\ntags分布:")
tags_exploded = political_stocks.explode('tags')['tags'].value_counts()
print(tags_exploded)

political_stocks.head()

政策銘柄数: 34

tags分布:
tags
インフラ・建設      15
経済安全保障       14
半導体・先端技術     12
エネルギー安全保障    12
地方創生・規制改革     8
防衛・安全保障       7
デジタル・AI       6
Name: count, dtype: int64


Unnamed: 0,ticker,code,stock_name,market,sectors,series,topixnewindexseries,categories,tags
2,4063.T,4063,信越化学工業,プライム,化学,素材・化学,TOPIX Core30,"[TOPIX_CORE30, 政策銘柄]","[半導体・先端技術, 経済安全保障]"
7,6501.T,6501,日立製作所,プライム,電気機器,電機・精密,TOPIX Core30,"[TOPIX_CORE30, 政策銘柄]","[インフラ・建設, 半導体・先端技術, デジタル・AI, エネルギー安全保障]"
8,6503.T,6503,三菱電機,プライム,電気機器,電機・精密,TOPIX Core30,"[TOPIX_CORE30, 政策銘柄]","[防衛・安全保障, 半導体・先端技術, インフラ・建設]"
9,6758.T,6758,ソニーグループ,プライム,電気機器,電機・精密,TOPIX Core30,"[TOPIX_CORE30, 政策銘柄]","[半導体・先端技術, デジタル・AI]"
12,7011.T,7011,三菱重工業,プライム,機械,機械,TOPIX Core30,"[TOPIX_CORE30, 政策銘柄]","[防衛・安全保障, エネルギー安全保障, インフラ・建設]"


## 2. Grokバックテストの期間を確認

In [5]:
# Grokバックテストの期間を確認
grok_archive = pd.read_parquet(GROK_ARCHIVE_PATH)
grok_archive['selection_date'] = pd.to_datetime(grok_archive['selection_date'])
grok_archive['backtest_date'] = pd.to_datetime(grok_archive['backtest_date'])

print(f"Grokバックテスト期間: {grok_archive['selection_date'].min()} 〜 {grok_archive['selection_date'].max()}")
print(f"総銘柄数: {len(grok_archive)}")
print(f"総日数: {grok_archive['selection_date'].nunique()}")

# selection_dateのリスト
selection_dates = sorted(grok_archive['selection_date'].unique())
print(f"\nselection_dates (最初5件): {selection_dates[:5]}")

Grokバックテスト期間: 2025-10-28 00:00:00 〜 2025-11-06 00:00:00
総銘柄数: 69
総日数: 6

selection_dates (最初5件): [Timestamp('2025-10-28 00:00:00'), Timestamp('2025-10-29 00:00:00'), Timestamp('2025-10-30 00:00:00'), Timestamp('2025-11-04 00:00:00'), Timestamp('2025-11-05 00:00:00')]


## 3. サンプル: 1日分の政策銘柄バックテスト

まずは1日分（例: 2025-10-30）で試してみる。

In [6]:
import yfinance as yf

# テスト用: 1日分
test_selection_date = pd.Timestamp('2025-10-30')
test_backtest_date = test_selection_date  # Grokと同じく selection_date = backtest_date

print(f"テスト日: selection_date={test_selection_date.date()}, backtest_date={test_backtest_date.date()}")

# 政策銘柄リスト
political_tickers = political_stocks['ticker'].tolist()
print(f"政策銘柄数: {len(political_tickers)}")

テスト日: selection_date=2025-10-30, backtest_date=2025-10-30
政策銘柄数: 34


In [7]:
# yfinanceで株価データ取得（5分足 & 日足）
# backtest_dateの1日分のデータを取得

# 日足データ（寄付・引け確認）
start_date = (test_backtest_date - timedelta(days=5)).strftime('%Y-%m-%d')
end_date = (test_backtest_date + timedelta(days=1)).strftime('%Y-%m-%d')

# サンプル: 最初の5銘柄で試す
sample_tickers = political_tickers[:5]
print(f"サンプル銘柄: {sample_tickers}")

daily_data = {}
for ticker in sample_tickers:
    try:
        df = yf.download(ticker, start=start_date, end=end_date, interval='1d', progress=False)
        if not df.empty:
            daily_data[ticker] = df
    except Exception as e:
        print(f"Error {ticker}: {e}")

print(f"取得成功: {len(daily_data)}銘柄")

# 確認
if daily_data:
    first_ticker = list(daily_data.keys())[0]
    print(f"\n{first_ticker} のデータ:")
    print(daily_data[first_ticker])

サンプル銘柄: ['4063.T', '6501.T', '6503.T', '6758.T', '7011.T']


  df = yf.download(ticker, start=start_date, end=end_date, interval='1d', progress=False)
  df = yf.download(ticker, start=start_date, end=end_date, interval='1d', progress=False)
  df = yf.download(ticker, start=start_date, end=end_date, interval='1d', progress=False)
  df = yf.download(ticker, start=start_date, end=end_date, interval='1d', progress=False)
  df = yf.download(ticker, start=start_date, end=end_date, interval='1d', progress=False)


取得成功: 5銘柄

4063.T のデータ:
Price        Close    High     Low    Open    Volume
Ticker      4063.T  4063.T  4063.T  4063.T    4063.T
Date                                                
2025-10-27  4725.0  4932.0  4725.0  4835.0  13987100
2025-10-28  4672.0  4764.0  4668.0  4734.0   8112400
2025-10-29  4629.0  4695.0  4623.0  4673.0   6846900
2025-10-30  4599.0  4633.0  4562.0  4620.0   8644800


In [8]:
# 5分足データ取得（morning_high/low, daily_max_gain/drawdown計算用）
intraday_data = {}
for ticker in sample_tickers:
    try:
        df = yf.download(ticker, start=test_backtest_date.strftime('%Y-%m-%d'), 
                         end=(test_backtest_date + timedelta(days=1)).strftime('%Y-%m-%d'), 
                         interval='5m', progress=False)
        if not df.empty:
            intraday_data[ticker] = df
    except Exception as e:
        print(f"Error {ticker}: {e}")

print(f"5分足取得成功: {len(intraday_data)}銘柄")

if intraday_data:
    first_ticker = list(intraday_data.keys())[0]
    print(f"\n{first_ticker} の5分足データ:")
    print(intraday_data[first_ticker].head())

  df = yf.download(ticker, start=test_backtest_date.strftime('%Y-%m-%d'),
  df = yf.download(ticker, start=test_backtest_date.strftime('%Y-%m-%d'),
  df = yf.download(ticker, start=test_backtest_date.strftime('%Y-%m-%d'),
  df = yf.download(ticker, start=test_backtest_date.strftime('%Y-%m-%d'),
  df = yf.download(ticker, start=test_backtest_date.strftime('%Y-%m-%d'),


5分足取得成功: 5銘柄

4063.T の5分足データ:
Price                       Close    High     Low    Open  Volume
Ticker                     4063.T  4063.T  4063.T  4063.T  4063.T
Datetime                                                         
2025-10-30 00:05:00+00:00  4589.0  4612.0  4585.0  4597.0  154700
2025-10-30 00:10:00+00:00  4592.0  4599.0  4585.0  4588.0  103900
2025-10-30 00:15:00+00:00  4602.0  4606.0  4592.0  4593.0  121700
2025-10-30 00:20:00+00:00  4607.0  4608.0  4599.0  4601.0   82200
2025-10-30 00:25:00+00:00  4608.0  4614.0  4603.0  4608.0   89400


## 4. バックテスト処理（Grokと同じロジック）

既存の `scripts/pipeline/save_backtest_to_archive.py` のロジックを参考に実装

In [9]:
# バックテスト用関数（簡易版）
def calculate_backtest_stats(ticker, selection_date, backtest_date, daily_df, intraday_df, meta_row):
    """
    1銘柄のバックテスト統計を計算
    
    Returns:
        dict: バックテスト結果
    """
    result = {
        'selection_date': selection_date.strftime('%Y-%m-%d'),
        'backtest_date': backtest_date.strftime('%Y-%m-%d'),
        'ticker': ticker,
        'company_name': meta_row['stock_name'],
        'category': '政策銘柄',
        'tags': ', '.join(meta_row['tags']) if isinstance(meta_row['tags'], list) else '',
    }
    
    # backtest_date の日足データ
    target_day = daily_df[daily_df.index.date == backtest_date.date()]
    if target_day.empty:
        return None
    
    open_price = target_day['Open'].iloc[0]
    close_price = target_day['Close'].iloc[0]
    high_price = target_day['High'].iloc[0]
    low_price = target_day['Low'].iloc[0]
    volume = target_day['Volume'].iloc[0]
    
    result['buy_price'] = open_price
    result['daily_close'] = close_price
    result['high'] = high_price
    result['low'] = low_price
    result['volume'] = int(volume)
    
    # Phase1: 前場引け（11:30）
    if intraday_df is not None and not intraday_df.empty:
        morning_data = intraday_df[(intraday_df.index.time >= pd.Timestamp('09:00').time()) & 
                                   (intraday_df.index.time <= pd.Timestamp('11:30').time())]
        if not morning_data.empty:
            morning_close = morning_data['Close'].iloc[-1]
            result['sell_price_phase1'] = morning_close
            result['phase1_return'] = (morning_close - open_price) / open_price
            result['phase1_win'] = result['phase1_return'] > 0
            result['profit_per_100_shares_phase1'] = (morning_close - open_price) * 100
            
            # morning max/min
            result['morning_high'] = morning_data['High'].max()
            result['morning_low'] = morning_data['Low'].min()
            result['morning_max_gain_pct'] = (result['morning_high'] - open_price) / open_price * 100
            result['morning_max_drawdown_pct'] = (result['morning_low'] - open_price) / open_price * 100
        else:
            result['sell_price_phase1'] = None
            result['phase1_return'] = None
            result['phase1_win'] = None
            result['profit_per_100_shares_phase1'] = None
            result['morning_high'] = None
            result['morning_low'] = None
            result['morning_max_gain_pct'] = None
            result['morning_max_drawdown_pct'] = None
    else:
        result['sell_price_phase1'] = None
        result['phase1_return'] = None
        result['phase1_win'] = None
        result['profit_per_100_shares_phase1'] = None
        result['morning_high'] = None
        result['morning_low'] = None
        result['morning_max_gain_pct'] = None
        result['morning_max_drawdown_pct'] = None
    
    # Phase2: 大引け（15:30）
    result['sell_price'] = close_price
    result['phase2_return'] = (close_price - open_price) / open_price
    result['phase2_win'] = result['phase2_return'] > 0
    result['profit_per_100_shares_phase2'] = (close_price - open_price) * 100
    
    # daily max/min
    result['daily_max_gain_pct'] = (high_price - open_price) / open_price * 100
    result['daily_max_drawdown_pct'] = (low_price - open_price) / open_price * 100
    
    # Phase3: ±1%/2%/3% （簡易版: 5分足データから計算）
    for pct in [1, 2, 3]:
        profit_threshold = open_price * (1 + pct/100)
        loss_threshold = open_price * (1 - pct/100)
        
        if intraday_df is not None and not intraday_df.empty:
            # 利確・損切判定
            profit_hit = intraday_df[intraday_df['High'] >= profit_threshold]
            loss_hit = intraday_df[intraday_df['Low'] <= loss_threshold]
            
            if not profit_hit.empty and not loss_hit.empty:
                # どちらが先か
                if profit_hit.index[0] < loss_hit.index[0]:
                    exit_price = profit_threshold
                    exit_reason = f'profit_take_{pct}.0%'
                else:
                    exit_price = loss_threshold
                    exit_reason = f'stop_loss_-{pct}.0%'
            elif not profit_hit.empty:
                exit_price = profit_threshold
                exit_reason = f'profit_take_{pct}.0%'
            elif not loss_hit.empty:
                exit_price = loss_threshold
                exit_reason = f'stop_loss_-{pct}.0%'
            else:
                exit_price = close_price
                exit_reason = 'eod_close'
        else:
            exit_price = close_price
            exit_reason = 'eod_close'
        
        phase_return = (exit_price - open_price) / open_price
        result[f'phase3_{pct}pct_return'] = phase_return
        result[f'phase3_{pct}pct_win'] = phase_return > 0
        result[f'phase3_{pct}pct_exit_reason'] = exit_reason
        result[f'profit_per_100_shares_phase3_{pct}pct'] = (exit_price - open_price) * 100
    
    return result

print("バックテスト関数定義完了")

バックテスト関数定義完了


In [10]:
# サンプル実行
results = []
for ticker in sample_tickers:
    if ticker not in daily_data:
        continue
    
    meta_row = political_stocks[political_stocks['ticker'] == ticker].iloc[0]
    daily_df = daily_data[ticker]
    intraday_df = intraday_data.get(ticker, None)
    
    result = calculate_backtest_stats(ticker, test_selection_date, test_backtest_date, 
                                       daily_df, intraday_df, meta_row)
    if result:
        results.append(result)

# DataFrame化
results_df = pd.DataFrame(results)
print(f"バックテスト結果: {len(results_df)}銘柄")
results_df

バックテスト結果: 5銘柄


  result['volume'] = int(volume)
  result['volume'] = int(volume)
  result['volume'] = int(volume)
  result['volume'] = int(volume)
  result['volume'] = int(volume)


Unnamed: 0,selection_date,backtest_date,ticker,company_name,category,tags,buy_price,daily_close,high,low,...,phase3_1pct_exit_reason,profit_per_100_shares_phase3_1pct,phase3_2pct_return,phase3_2pct_win,phase3_2pct_exit_reason,profit_per_100_shares_phase3_2pct,phase3_3pct_return,phase3_3pct_win,phase3_3pct_exit_reason,profit_per_100_shares_phase3_3pct
0,2025-10-30,2025-10-30,4063.T,信越化学工業,政策銘柄,,Ticker 4063.T 4620.0 Name: 2025-10-30 00:00...,Ticker 4063.T 4599.0 Name: 2025-10-30 00:00...,Ticker 4063.T 4633.0 Name: 2025-10-30 00:00...,Ticker 4063.T 4562.0 Name: 2025-10-30 00:00...,...,stop_loss_-1.0%,Ticker 4063.T -4620.0 Name: 2025-10-30 00:00...,Ticker 4063.T -0.02 Name: 2025-10-30 00:00:0...,Ticker 4063.T False Name: 2025-10-30 00:00:...,stop_loss_-2.0%,Ticker 4063.T -9240.0 Name: 2025-10-30 00:00...,Ticker 4063.T -0.03 Name: 2025-10-30 00:00:0...,Ticker 4063.T False Name: 2025-10-30 00:00:...,stop_loss_-3.0%,Ticker 4063.T -13860.0 Name: 2025-10-30 00:0...
1,2025-10-30,2025-10-30,6501.T,日立製作所,政策銘柄,,Ticker 6501.T 4890.0 Name: 2025-10-30 00:00...,Ticker 6501.T 4963.0 Name: 2025-10-30 00:00...,Ticker 6501.T 4977.0 Name: 2025-10-30 00:00...,Ticker 6501.T 4864.0 Name: 2025-10-30 00:00...,...,stop_loss_-1.0%,Ticker 6501.T -4890.0 Name: 2025-10-30 00:00...,Ticker 6501.T -0.02 Name: 2025-10-30 00:00:0...,Ticker 6501.T False Name: 2025-10-30 00:00:...,stop_loss_-2.0%,Ticker 6501.T -9780.0 Name: 2025-10-30 00:00...,Ticker 6501.T -0.03 Name: 2025-10-30 00:00:0...,Ticker 6501.T False Name: 2025-10-30 00:00:...,stop_loss_-3.0%,Ticker 6501.T -14670.0 Name: 2025-10-30 00:0...
2,2025-10-30,2025-10-30,6503.T,三菱電機,政策銘柄,,Ticker 6503.T 4175.0 Name: 2025-10-30 00:00...,Ticker 6503.T 4206.0 Name: 2025-10-30 00:00...,Ticker 6503.T 4244.0 Name: 2025-10-30 00:00...,Ticker 6503.T 4168.0 Name: 2025-10-30 00:00...,...,stop_loss_-1.0%,Ticker 6503.T -4175.0 Name: 2025-10-30 00:00...,Ticker 6503.T -0.02 Name: 2025-10-30 00:00:0...,Ticker 6503.T False Name: 2025-10-30 00:00:...,stop_loss_-2.0%,Ticker 6503.T -8350.0 Name: 2025-10-30 00:00...,Ticker 6503.T -0.03 Name: 2025-10-30 00:00:0...,Ticker 6503.T False Name: 2025-10-30 00:00:...,stop_loss_-3.0%,Ticker 6503.T -12525.0 Name: 2025-10-30 00:0...
3,2025-10-30,2025-10-30,6758.T,ソニーグループ,政策銘柄,,Ticker 6758.T 4283.0 Name: 2025-10-30 00:00...,Ticker 6758.T 4232.0 Name: 2025-10-30 00:00...,Ticker 6758.T 4289.0 Name: 2025-10-30 00:00...,Ticker 6758.T 4216.0 Name: 2025-10-30 00:00...,...,stop_loss_-1.0%,Ticker 6758.T -4283.0 Name: 2025-10-30 00:00...,Ticker 6758.T -0.02 Name: 2025-10-30 00:00:0...,Ticker 6758.T False Name: 2025-10-30 00:00:...,stop_loss_-2.0%,Ticker 6758.T -8566.0 Name: 2025-10-30 00:00...,Ticker 6758.T -0.03 Name: 2025-10-30 00:00:0...,Ticker 6758.T False Name: 2025-10-30 00:00:...,stop_loss_-3.0%,Ticker 6758.T -12849.0 Name: 2025-10-30 00:0...
4,2025-10-30,2025-10-30,7011.T,三菱重工業,政策銘柄,,Ticker 7011.T 4460.0 Name: 2025-10-30 00:00...,Ticker 7011.T 4538.0 Name: 2025-10-30 00:00...,Ticker 7011.T 4538.0 Name: 2025-10-30 00:00...,Ticker 7011.T 4439.0 Name: 2025-10-30 00:00...,...,stop_loss_-1.0%,Ticker 7011.T -4460.0 Name: 2025-10-30 00:00...,Ticker 7011.T -0.02 Name: 2025-10-30 00:00:0...,Ticker 7011.T False Name: 2025-10-30 00:00:...,stop_loss_-2.0%,Ticker 7011.T -8920.0 Name: 2025-10-30 00:00...,Ticker 7011.T -0.03 Name: 2025-10-30 00:00:0...,Ticker 7011.T False Name: 2025-10-30 00:00:...,stop_loss_-3.0%,Ticker 7011.T -13380.0 Name: 2025-10-30 00:0...


## 5. tags別集計

Grokの"reason"と同じく、tags別にパフォーマンスを集計

In [11]:
# tags別集計
if not results_df.empty:
    # tagsを展開
    results_df['tags_list'] = results_df['tags'].str.split(', ')
    tags_expanded = results_df.explode('tags_list')
    
    # Phase2で集計（データ型を明示的に変換）
    tags_expanded['profit_per_100_shares_phase2'] = pd.to_numeric(tags_expanded['profit_per_100_shares_phase2'], errors='coerce')
    tags_expanded['phase2_return'] = pd.to_numeric(tags_expanded['phase2_return'], errors='coerce')
    
    tags_stats = tags_expanded.groupby('tags_list').agg({
        'ticker': 'count',
        'profit_per_100_shares_phase2': ['sum', 'mean'],
        'phase2_return': 'mean',
        'phase2_win': lambda x: (x.sum() / len(x) * 100) if len(x) > 0 else 0
    }).round(2)
    
    tags_stats.columns = ['count', 'total_profit', 'avg_profit', 'avg_return_pct', 'win_rate_pct']
    tags_stats = tags_stats.sort_values('total_profit', ascending=False)
    
    print("\ntags別パフォーマンス (Phase2):")
    print(tags_stats)
else:
    print("データなし")


tags別パフォーマンス (Phase2):
           count  total_profit  avg_profit  avg_return_pct  \
tags_list                                                    
               5           0.0         NaN             NaN   

                        win_rate_pct  
tags_list                             
           [nan, nan, nan, nan, nan]  


## 6. 次のステップ

1. 全期間（Grokと同じ）でバックテスト実行
2. `political_trending_archive.parquet` として保存
3. フロントエンド（dev/analyze）で表示できるようにAPI追加

In [12]:
print("試作完了")
print("\n次のステップ:")
print("1. 全期間でバックテスト実行（スクリプト化）")
print("2. political_trending_archive.parquet 保存")
print("3. dev/analyze API追加")

試作完了

次のステップ:
1. 全期間でバックテスト実行（スクリプト化）
2. political_trending_archive.parquet 保存
3. dev/analyze API追加
