# EV Charging → Long-Duration Parking Prediction (Ranking)

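This notebook walks through training, evaluation, and inference with `AutoGluon` for a ranking task: predicting the location cluster of the first long parking event (>= 6 h) that follows a charging session.

- Feature generation: pairwise features that combine the context of each charging event (weekday, hour, charge cluster) with the historical tendencies of each candidate parking cluster (popularity, typical start hour, distance, and so on)
- Training: binary classification over the candidate set (positive = the cluster where the vehicle actually parked); the predicted positive-class probability serves as the ranking score
- Evaluation: Top-k accuracy, MRR, MAP, and NDCG@k, computed per charge group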
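
The pairwise framing in the list above can be illustrated with a small sketch. It is not the code in `ranking.dataset`; the toy frames and their values are hypothetical, and only the column names `group_id`, `charge_cluster`, `park_cluster`, `candidate_cluster`, and `label` are taken from this notebook.

```python
import pandas as pd

# Hypothetical toy data: two charging events and the cluster where each
# vehicle actually parked afterwards.
charges = pd.DataFrame({
    "group_id": ["g1", "g2"],
    "charge_cluster": [505, 606],
    "park_cluster": [101, 202],
})
# Hypothetical candidate parking clusters shared by both events.
candidates = pd.DataFrame({"candidate_cluster": [101, 202, 303]})

# One row per (charging event, candidate) pair.
pairs = charges.merge(candidates, how="cross")
# Binary target: 1 where the candidate is the true parking cluster, else 0.
pairs["label"] = (pairs["candidate_cluster"] == pairs["park_cluster"]).astype(int)

print(pairs[["group_id", "candidate_cluster", "label"]])
```

Each group ends up with exactly one positive row, so a classifier's positive-class probability can be sorted within a group to produce a ranking.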


In [6]:
# Load libraries
from pathlib import Path
import pandas as pd
import numpy as np
from autogluon.tabular import TabularPredictor
# Project utilities (from this repository)
import sys
sys.path.append(str(Path("EV-Battery-Parking-Degradation-Mitigation").resolve()))
from ranking.dataset import (
    load_sessions,
    prepare_sessions,
    build_charge_to_next_long_table,
    compute_cluster_centroids_by_vehicle,
    build_candidate_pool_per_vehicle,
    build_ranking_training_data,
    get_feature_columns,
)
from ranking.metrics import (
    top_k_accuracy_at_k,
    mean_reciprocal_rank,
    mean_average_precision,
    ndcg_at_k,
)
DATA_PATH = Path("../eda/ev_sessions_test.csv")
OUTDIR = Path("./outputs/ranking_demo")
OUTDIR.mkdir(parents=True, exist_ok=True)


In [None]:
# Parameters (adjust as needed)
LONG_PARK_THRESHOLD_MIN = 360  # Definition of a long parking event (minutes)
CAND_TOP_N_PER_VEHICLE = 10   # Per-vehicle candidate Top-N
CAND_GLOBAL_TOP_N = 20        # Global candidate Top-N (used to fill the pool)
NEG_SAMPLE_K = 10             # Negative-sampling K for training (0 or negative disables it)
HOUR_BIN_SIZE = 3             # Hour-bin width (h); e.g. 3 gives 3 h bins 0-2, 3-5, ...
ALPHA_SMOOTH = 1.0            # Laplace smoothing alpha
AG_PRESETS = 'medium_quality_faster_train'  # AutoGluon preset
TIME_LIMIT = 300              # Training time limit (seconds)
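
How `HOUR_BIN_SIZE` and `ALPHA_SMOOTH` are consumed inside `ranking.dataset` is not shown in this notebook. The following is a minimal sketch of what the two comments describe, assuming integer-division hour bins and standard additive (Laplace) smoothing; the helper names `hour_bin` and `smoothed_share` and the counts are hypothetical.

```python
import pandas as pd

HOUR_BIN_SIZE = 3   # same value as in the parameter cell
ALPHA_SMOOTH = 1.0  # same value as in the parameter cell


def hour_bin(hour: int, bin_size: int = HOUR_BIN_SIZE) -> int:
    """Map an hour of day (0-23) to a bin index; bin_size=3 maps 0-2 to 0, 3-5 to 1, ..."""
    return hour // bin_size


def smoothed_share(counts: pd.Series, alpha: float = ALPHA_SMOOTH) -> pd.Series:
    """Laplace-smoothed share: (count + alpha) / (total + alpha * n_categories)."""
    return (counts + alpha) / (counts.sum() + alpha * len(counts))


# Hypothetical long-parking counts per cluster for one vehicle.
counts = pd.Series({101: 8, 202: 1, 303: 0})
share = smoothed_share(counts)
print(hour_bin(18), share.round(3).to_dict())
```

With smoothing, a cluster that was never observed (303 here) still receives a small non-zero popularity, which avoids hard zeros when history is sparse.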


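## 1. Data loading and preprocessing

- Timestamps are normalized to the Asia/Tokyo time zone
- Long parking events (>= 6 h) are identified, and each charge is linked to the next long parking event that occurs before the following charge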
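
The two steps above can be sketched with plain pandas. This is an illustration under stated assumptions, not the implementation of `prepare_sessions` or `build_charge_to_next_long_table`: the input columns `session_type`, `cluster`, `start_time`, and `duration_minutes` are hypothetical, as are the toy rows.

```python
import pandas as pd

LONG_PARK_THRESHOLD_MIN = 360

# Hypothetical session log for one vehicle, timestamps stored in UTC.
sessions = pd.DataFrame({
    "hashvin": ["v1"] * 5,
    "session_type": ["charge", "park", "park", "charge", "charge"],
    "cluster": [505, 900, 101, 606, 505],
    "start_time": pd.to_datetime(
        ["2025-09-02 09:30", "2025-09-02 10:30", "2025-09-02 11:30",
         "2025-09-03 09:30", "2025-09-03 12:00"],
        utc=True,
    ),
    "duration_minutes": [42, 30, 660, 48, 40],
})

# Step 1: normalize to Asia/Tokyo.
sessions["start_time"] = sessions["start_time"].dt.tz_convert("Asia/Tokyo")
sessions = sessions.sort_values(["hashvin", "start_time"]).reset_index(drop=True)

# Step 2: every charge opens a new block that lasts until the next charge.
is_charge = sessions["session_type"] == "charge"
sessions["block"] = is_charge.astype(int).groupby(sessions["hashvin"]).cumsum()

# First long parking event inside each block.
is_long = (sessions["session_type"] == "park") & (
    sessions["duration_minutes"] >= LONG_PARK_THRESHOLD_MIN
)
first_long = (
    sessions[is_long]
    .groupby(["hashvin", "block"], as_index=False)
    .first()[["hashvin", "block", "cluster", "start_time"]]
    .rename(columns={"cluster": "park_cluster", "start_time": "park_start_time"})
)

# Left join keeps charges that have no long parking event before the next charge.
charges = sessions[is_charge].rename(
    columns={"cluster": "charge_cluster", "start_time": "charge_start_time"}
)
linked = charges.merge(first_long, on=["hashvin", "block"], how="left")
print(linked[["charge_cluster", "charge_start_time", "park_cluster"]])
```

Charges without a matching event keep a missing `park_cluster`, which is why a count such as `notna().sum()` can be smaller than the number of charges.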


In [7]:
sessions = load_sessions(DATA_PATH)
sessions = prepare_sessions(sessions, long_park_threshold_minutes=LONG_PARK_THRESHOLD_MIN)
c2p = build_charge_to_next_long_table(sessions)
display(c2p.head(10))
print("charges:", len(c2p), "with_label:", c2p["park_cluster"].notna().sum())


| | hashvin | weekday | charge_cluster | charge_start_time | charge_start_hour | charge_end_time | park_cluster | park_start_time | park_start_hour | park_duration_minutes | gap_minutes | dist_charge_to_park_km |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | hv_0001_demo | 1 | 505 | 2025-09-02 18:30:00 | 18 | 2025-09-02 19:12:00 | 101.0 | 2025-09-02 20:30:00 | 20.0 | 660.0 | 78.0 | 3.793725 |
| 1 | hv_0001_demo | 2 | 505 | 2025-09-03 18:45:00 | 18 | 2025-09-03 19:29:00 | 101.0 | 2025-09-03 20:30:00 | 20.0 | 660.0 | 61.0 | 3.793725 |
| 2 | hv_0001_demo | 3 | 606 | 2025-09-04 18:30:00 | 18 | 2025-09-04 19:18:00 | 101.0 | 2025-09-04 20:30:00 | 20.0 | 660.0 | 72.0 | 4.558132 |
| 3 | hv_0001_demo | 4 | 505 | 2025-09-05 18:30:00 | 18 | 2025-09-05 19:31:00 | 101.0 | 2025-09-05 20:30:00 | 20.0 | 660.0 | 59.0 | 3.793725 |
| 4 | hv_0001_demo | 5 | 606 | 2025-09-06 14:30:00 | 14 | 2025-09-06 15:12:00 | 101.0 | 2025-09-06 20:30:00 | 20.0 | 660.0 | 318.0 | 4.558132 |
| 5 | hv_0001_demo | 6 | 505 | 2025-09-07 12:45:00 | 12 | 2025-09-07 13:30:00 | 101.0 | 2025-09-07 20:30:00 | 20.0 | 660.0 | 420.0 | 3.793725 |
| 6 | hv_0001_demo | 1 | 505 | 2025-09-09 20:15:00 | 20 | 2025-09-09 20:49:00 | 101.0 | 2025-09-09 20:30:00 | 20.0 | 660.0 | -19.0 | 3.793725 |
| 7 | hv_0001_demo | 2 | 505 | 2025-09-10 19:00:00 | 19 | 2025-09-10 19:35:00 | 101.0 | 2025-09-10 20:30:00 | 20.0 | 660.0 | 55.0 | 3.793725 |
| 8 | hv_0001_demo | 3 | 505 | 2025-09-11 18:45:00 | 18 | 2025-09-11 19:24:00 | 101.0 | 2025-09-11 20:30:00 | 20.0 | 660.0 | 66.0 | 3.793725 |
| 9 | hv_0001_demo | 4 | 505 | 2025-09-12 20:45:00 | 20 | 2025-09-12 21:33:00 | 101.0 | 2025-09-13 20:30:00 | 20.0 | 660.0 | 1377.0 | 3.793725 |


charges: 21 with_label: 19


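## 2. Building the candidate set and features

- Candidates are each vehicle's Top-N most frequent long-parking clusters; when a vehicle has too few, the pool is filled from the global Top-N
- Representative coordinates (centroid) of each candidate cluster, and its distance from the location where charging ended
- Popularity of each candidate cluster (global and per-vehicle), its typical start hour, and the circular distance between that hour and the charging hour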
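
The ingredients listed above can be sketched in a few lines of standard-library Python. Everything here is an assumption made for illustration: the helper names are hypothetical, the fill rule tops the pool up to the per-vehicle N, and the distance is taken to be great-circle (haversine) distance, which the notebook does not confirm.

```python
import math
from collections import Counter


def build_pool(vehicle_history, global_history, top_n_vehicle=10, top_n_global=20):
    """Vehicle's most frequent clusters first, then global favourites as fill."""
    pool = [c for c, _ in Counter(vehicle_history).most_common(top_n_vehicle)]
    for c, _ in Counter(global_history).most_common(top_n_global):
        if len(pool) >= top_n_vehicle:
            break
        if c not in pool:
            pool.append(c)
    return pool


def circular_hour_distance(h1, h2):
    """Distance between two hours on a 24 h clock, so 23 and 1 are 2 h apart."""
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Hypothetical histories of long-parking clusters.
vehicle_history = [101, 101, 101, 202]
global_history = [101, 303, 303, 404, 202, 202, 202]
pool = build_pool(vehicle_history, global_history, top_n_vehicle=3)
print(pool, circular_hour_distance(23, 1), round(haversine_km(0, 0, 0, 1), 2))
```

The circular distance matters because a plain difference would treat a 23:00 charge and a 01:00 parking start as 22 h apart rather than 2 h.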


In [8]:
centroids = compute_cluster_centroids_by_vehicle(sessions)
cand_pool = build_candidate_pool_per_vehicle(
    sessions, top_n_per_vehicle=CAND_TOP_N_PER_VEHICLE, global_top_n=CAND_GLOBAL_TOP_N
)
df_rank = build_ranking_training_data(
    df_sessions=sessions,
    charge_to_long=c2p,
    candidate_pool=cand_pool,
    centroids_by_vehicle=centroids,
    negative_sample_k=NEG_SAMPLE_K,
)
df_rank.head()  # not rendered: only the last expression of a cell is displayed
print("ranking rows:", len(df_rank), "groups:", df_rank["group_id"].nunique())


ranking rows: 38 groups: 19


## 3. Training and validation split (by group)

- 80/20 split by group_id (one group per charging event)
- AutoGluon binary classification (positive = the correct cluster)


In [9]:
# group-wise split
gids = df_rank["group_id"].unique()
rng = np.random.default_rng(42)
rng.shuffle(gids)
n_val = max(1, int(0.2 * len(gids)))
val_ids = set(gids[:n_val])
train_df = df_rank[~df_rank["group_id"].isin(val_ids)].copy()
val_df = df_rank[df_rank["group_id"].isin(val_ids)].copy()
features, cat_cols = get_feature_columns(df_rank)
label = "label"
predictor = TabularPredictor(
    label=label,
    path=str(OUTDIR / "autogluon"),
    problem_type="binary",
    eval_metric="roc_auc",
)
predictor.fit(
    train_data=train_df[[*features, label]],
    time_limit=TIME_LIMIT,
    presets=AG_PRESETS,
)


Preset alias specified: 'medium_quality_faster_train' maps to 'medium_quality'.
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.4.0
Python Version:     3.12.10
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.26200
CPU Count:          16
Memory Avail:       18.09 GB / 31.17 GB (58.0%)
Disk Space Avail:   834.74 GB / 930.73 GB (89.7%)
Presets specified: ['medium_quality_faster_train']
Using hyperparameters preset: hyperparameters='default'
Beginning AutoGluon training ... Time limit = 300s
AutoGluon will save models to "c:\workspace\src\kaggle\ml-study\EV-Battery-Parking-Degradation-Mitigation\train\outputs\ranking_demo\autogluon"
Train Data Rows:    32
Train Data Columns: 14
Label Column:       label
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    18518.26 MB

<autogluon.tabular.predictor.predictor.TabularPredictor at 0x298e5b1df40>

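## 4. Prediction scores and ranking evaluation

- Within each group, sort candidates by probability score in descending order, then compute Top-k hit rate, MRR, MAP, and NDCG@3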


In [10]:
proba = predictor.predict_proba(val_df[features])
if isinstance(proba, pd.DataFrame):
    scores = proba[1].to_numpy() if 1 in proba.columns else proba.iloc[:, -1].to_numpy()
else:
    scores = np.asarray(proba)
val_scored = val_df.assign(score=scores)
metrics = {
    "top1": top_k_accuracy_at_k(val_scored, "score", "group_id", "label", k=1),
    "top3": top_k_accuracy_at_k(val_scored, "score", "group_id", "label", k=3),
    "MRR": mean_reciprocal_rank(val_scored, "score", "group_id", "label"),
    "MAP": mean_average_precision(val_scored, "score", "group_id", "label"),
    "NDCG@3": ndcg_at_k(val_scored, "score", "group_id", "label", k=3),
}
display(pd.Series(metrics))
val_scored.sort_values(["group_id", "score"], ascending=[True, False]).head(10)  # not rendered: not the cell's last expression
val_scored.to_csv(OUTDIR / "val_scored_rows.csv", index=False)


top1      0.666667
top3      1.000000
MRR       0.833333
MAP       0.833333
NDCG@3    0.876977
dtype: float64
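
The functions in `ranking.metrics` are not shown here, so the following is only a sketch of the standard definitions, assuming exactly one positive row per group. The toy frame is hypothetical: three groups with two candidates each, where the positive is ranked first in two groups and second in the third.

```python
import numpy as np
import pandas as pd


def rank_of_positive(df, score_col="score", group_col="group_id", label_col="label"):
    """1-based rank of the positive row inside each group (score descending)."""
    ranks = df.groupby(group_col)[score_col].rank(method="first", ascending=False)
    return ranks[df[label_col] == 1].to_numpy()


# Hypothetical scored validation rows.
toy = pd.DataFrame({
    "group_id": ["a", "a", "b", "b", "c", "c"],
    "label":    [1, 0, 1, 0, 1, 0],
    "score":    [0.9, 0.1, 0.8, 0.2, 0.3, 0.7],
})
r = rank_of_positive(toy)  # ranks of the positives: 1, 1, 2

top1 = np.mean(r <= 1)
top3 = np.mean(r <= 3)
mrr = np.mean(1.0 / r)  # equals MAP when every group has a single positive
# With one positive the ideal DCG is 1, so NDCG@3 reduces to 1 / log2(rank + 1).
ndcg3 = np.mean(np.where(r <= 3, 1.0 / np.log2(r + 1), 0.0))
print(round(top1, 6), round(top3, 6), round(mrr, 6), round(ndcg3, 6))
```

Under these definitions the toy ranking yields 0.666667, 1.0, 0.833333, and 0.876977, the same values as the validation output above. That is consistent with three validation groups in which one positive was ranked second, and it also explains why MRR and MAP coincide.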

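## 5. Inference (Top-k candidate output)

- For every charging event, output the Top-k candidate clusters together with their scores; the cell scores all of `df_rank`, so training groups are included alongside validation groups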


In [11]:
top_k = 3
proba_all = predictor.predict_proba(df_rank[features])
if isinstance(proba_all, pd.DataFrame):
    # Prefer the column named after the positive class; otherwise use the last column
    scores_all = (
        proba_all[1].to_numpy() if 1 in proba_all.columns else proba_all.iloc[:, -1].to_numpy()
    )
else:
    scores_all = np.asarray(proba_all)
scored_all = df_rank.assign(score=scores_all)
rows = []
for gid, g in scored_all.groupby("group_id"):
    gg = g.sort_values("score", ascending=False).head(top_k)
    rows.append(
        {
            "group_id": gid,
            "hashvin": gg["hashvin"].iloc[0],
            "charge_cluster": gg["charge_cluster"].iloc[0],
            "ranked_candidates": ",".join(map(str, gg["candidate_cluster"].tolist())),
            "scores": ",".join(f"{s:.6f}" for s in gg["score"].tolist()),
        }
    )
pred_topk = pd.DataFrame(rows)
display(pred_topk.head(10))
pred_topk.to_csv(OUTDIR / "predictions_topk.csv", index=False)
print("saved:", OUTDIR / "predictions_topk.csv")


| | group_id | hashvin | charge_cluster | ranked_candidates | scores |
|---|---|---|---|---|---|
| 0 | hv_0001_demo__2025-09-02T18:30:00 | hv_0001_demo | 505 | 101,202 | 1.000000,0.003333 |
| 1 | hv_0001_demo__2025-09-03T18:45:00 | hv_0001_demo | 505 | 101,202 | 1.000000,0.003333 |
| 2 | hv_0001_demo__2025-09-04T18:30:00 | hv_0001_demo | 606 | 101,202 | 1.000000,0.016667 |
| 3 | hv_0001_demo__2025-09-05T18:30:00 | hv_0001_demo | 505 | 101,202 | 1.000000,0.003333 |
| 4 | hv_0001_demo__2025-09-06T14:30:00 | hv_0001_demo | 606 | 101,202 | 0.990000,0.060000 |
| 5 | hv_0001_demo__2025-09-07T12:45:00 | hv_0001_demo | 505 | 101,202 | 0.993333,0.040000 |
| 6 | hv_0001_demo__2025-09-09T20:15:00 | hv_0001_demo | 505 | 101,202 | 0.980000,0.000000 |
| 7 | hv_0001_demo__2025-09-10T19:00:00 | hv_0001_demo | 505 | 101,202 | 0.996667,0.000000 |
| 8 | hv_0001_demo__2025-09-11T18:45:00 | hv_0001_demo | 505 | 101,202 | 1.000000,0.003333 |
| 9 | hv_0001_demo__2025-09-12T20:45:00 | hv_0001_demo | 505 | 101,202 | 0.980000,0.000000 |


saved: outputs\ranking_demo\predictions_topk.csv
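
Because the inference cell stores `ranked_candidates` and `scores` as comma-joined strings, downstream code has to split them again. Below is a minimal sketch of one way to expand the file into one row per (group, rank). The two CSV rows are illustrative values in the layout the cell writes, and the `rank` column is an addition for this example.

```python
import io

import pandas as pd

# Illustrative rows in the layout of predictions_topk.csv.
csv_text = '''group_id,hashvin,charge_cluster,ranked_candidates,scores
hv_0001_demo__2025-09-02T18:30:00,hv_0001_demo,505,"101,202","1.000000,0.003333"
hv_0001_demo__2025-09-06T14:30:00,hv_0001_demo,606,"101,202","0.990000,0.060000"
'''
# Keep the list-like columns as strings so they can be split on commas.
pred = pd.read_csv(
    io.StringIO(csv_text), dtype={"ranked_candidates": str, "scores": str}
)

long = (
    pred.assign(
        candidate_cluster=pred["ranked_candidates"].str.split(","),
        score=pred["scores"].str.split(","),
    )
    .explode(["candidate_cluster", "score"])  # one row per ranked candidate
    .astype({"candidate_cluster": int, "score": float})
)
# Candidates were written in descending score order, so position gives the rank.
long["rank"] = long.groupby("group_id").cumcount() + 1
print(long[["group_id", "rank", "candidate_cluster", "score"]])
```

Multi-column `explode` requires pandas 1.3 or later; with an older version the two columns would need to be expanded separately.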
