## Overview

Since the final decision variable in this competition is the **position size** $p_t$, the problem naturally fits into a **predict-then-optimize** framework: first extract as much *useful probabilistic information* as possible from data, and then design a decision rule that is *explicitly aligned with the competition score*.

My solution consists of four tightly coupled components.

### 1) Probabilistic forecasting

Instead of relying on a single point prediction, I train a family of machine learning regression models to characterize the *conditional distribution* of market excess returns. Concretely, the models are used to predict:

- The conditional mean of excess return  
- The conditional variance  
- Multiple left-tail quantiles (used to approximate CVaR and other tail risk measures)

The motivation is straightforward: in an extremely low signal-to-noise environment, point forecasts alone are fragile, while distributional information provides much richer signals for downstream risk-aware decision making.

### 2) Single-step position optimization

The official competition score is defined over a rolling time window and involves non-linear, non-convex statistics (Sharpe ratio, volatility ratios, geometric means). Directly optimizing this window-based score is impractical.

Instead, I derive a **single-time-step surrogate optimization problem** that approximates the dominant behavior of the score. The resulting objective explicitly incorporates:

- Variance penalty (Sharpe-style risk control)  
- CVaR penalty (expected loss in the worst tail)  
- Tail variance penalty (via conditional second moments)  
- Multi-order turnover penalties to suppress excessive trading  
- Explicit penalties aligned with the competition rules $\rho_1$ and $\rho_2$

This step is where most of the mathematical derivation effort is spent: the goal is to translate a window-level evaluation metric into a tractable, stable, single-step decision problem.

### 3) Ensembling

In the final stage, I ensemble both **models** and **strategies**:

- Multiple ML models (LightGBM, CatBoost, XGBoost)  
- Two types of trading logic:
  - A simple signal-based position rule (as provided in the demo submission notebook)
  - The derived optimization-based position policy

The ensemble weights are tuned using walk-forward validation to balance robustness and performance.

### 4) Hyperparameter Tuning

All penalty weights, CVaR confidence levels, turnover orders, and ensemble coefficients are selected via **walk-forward hyperparameter search**, ensuring that the final solution is not only high-scoring but also stable across time.

## Problem Formulation

At trading day $t$, we define:

- Position: $p_t$  
- Market return: $r_t^m$  
- Risk-free rate: $r_t^f$  
- Market excess return: $e_t^m = r_t^m - r_t^f$

Then the strategy return is
$$
r_t^s = r_t^f(1 - p_t) + p_t r_t^m = p_t e_t^m + r_t^f,
$$
and the strategy excess return is
$$
e_t^s = r_t^s - r_t^f = p_t e_t^m.
$$

The Sharpe ratio (the most important metric in this competition) is defined as

$$\text{Sharpe}=
\frac{
\left(\prod_{t=1}^{T}(1+e_t^s)\right)^{\frac{1}{T}} - 1
}{
\operatorname{std}(\{r_t^s\}_{t=1}^{T})
}
\sqrt{252}.
$$
That is, the geometric-mean daily excess return of the strategy divided by the standard deviation of its daily returns, annualized by $\sqrt{252}$.

The final competition score applies two penalties to the Sharpe ratio:

- **Volatility penalty**: if annualized strategy volatility exceeds $1.2\times$ market volatility, the excess part is penalized linearly:
  $$
  \rho_1=1+\max\!\left(0,\frac{\operatorname{std}(\{r_t^s\}_{t=1}^{T})}{\operatorname{std}(\{r_t^m\}_{t=1}^{T})}-1.2\right)
  $$

- **Underperformance penalty**: the strategy cannot “sit in cash and earn interest” while consistently underperforming the market:  
  $$\rho_2=
  1 + 0.01 \max\!\left(
  0,
  \left[
  \left(\prod_{t=1}^{T}(1+e_t^m)\right)^{\frac{1}{T}}-
  \left(\prod_{t=1}^{T}(1+e_t^s)\right)^{\frac{1}{T}}
  \right]
  \cdot 252 \cdot 100
  \right).
  $$

The final score is

$$
\text{Score}=
\frac{1}{\rho_1}\cdot\frac{1}{\rho_2}\cdot\text{Sharpe}.
$$
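The score defined above can be written down in a few lines, which is handy for local backtests. The following is a minimal sketch that transcribes the formulas in this section; the function name `competition_score` is hypothetical, and the use of the population standard deviation (`np.std` with its default `ddof=0`) is an assumption, since the text does not specify the estimator.

```python
import numpy as np

def competition_score(p, r_m, r_f):
    """Sketch of the score from the formulas above (not the official scorer)."""
    p, r_m, r_f = (np.asarray(a, dtype=float) for a in (p, r_m, r_f))
    e_m = r_m - r_f            # market excess return
    e_s = p * e_m              # strategy excess return
    r_s = r_f + e_s            # strategy return
    T = len(p)

    # Geometric-mean daily excess returns
    g_s = np.prod(1.0 + e_s) ** (1.0 / T) - 1.0
    g_m = np.prod(1.0 + e_m) ** (1.0 / T) - 1.0

    # Assumption: population std (ddof=0). A flat strategy (std_s == 0) is not handled.
    std_s = np.std(r_s)
    std_m = np.std(r_m)

    sharpe = g_s / std_s * np.sqrt(252)
    rho1 = 1.0 + max(0.0, std_s / std_m - 1.2)
    rho2 = 1.0 + 0.01 * max(0.0, (g_m - g_s) * 252 * 100)
    return sharpe / (rho1 * rho2), sharpe, rho1, rho2
```

With $p_t = 1$ on every day the strategy replicates the market, so both penalties equal 1 and the score reduces to the Sharpe ratio.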

From these equations, we can see that $e_t^m$ is a random variable, while $p_t$ is the decision variable.  
The score is mainly driven by $\{e_t^s\}_{t=1}^{T}$ and $\operatorname{std}(\{r_t^s\}_{t=1}^{T})$: we want **high returns with low variance**.

---

## Probabilistic Forecasting

I didn’t spend excessive effort on the forecasting pipeline, mainly because the explanatory power of features severely limits the upper bound of predictability. Also, my domain knowledge of financial markets is limited, so the forecasting part is intentionally kept lightweight and robust.

The core idea here is **not** to chase a highly accurate point forecast, but to extract as much *distributional information* as possible from weak signals, which is far more useful for downstream risk-aware decision making.


### Feature Engineering

- Log-mapping of `date_id` to capture long-term trends  
- Fourier encodings with periods of 5, 21, 63, and 252 trading days to model multi-scale seasonality  
- Missing indicators + dummy values for early macro and sentiment signals  
- Removal of highly collinear features to stabilize tree-based models  

### Models and Targets

Let $e_t^m$ denote the market excess return. Instead of modeling only its conditional mean, I decompose the forecasting task into **multiple regression problems** targeting different aspects of the conditional distribution:

1. **Conditional mean**
   
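   A standard regression model is trained with MAE loss. Strictly speaking, MAE targets the conditional median (the 0.5-quantile), which is used here as a robust proxy for the conditional mean: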
   $$
   \mu_t \;=\; \mathbb{E}[e_t^m \mid \mathcal{F}_t],
   $$
   where $\mathcal{F}_t$ denotes the available features at time $t$.

2. **Conditional variance**
   
   After fitting the mean model, I compute residuals
   $$
   \varepsilon_t = e_t^m - \mu_t,
   $$
   and train a second model to predict the conditional variance:
   $$
   \sigma_t^2 \;\approx\; \mathbb{E}[\varepsilon_t^2 \mid \mathcal{F}_t].
   $$

   This explicitly separates *directional signal* from *uncertainty magnitude*.

3. **Tail quantiles (for CVaR estimation)**
   
   Multiple quantile regression models are trained for tail levels $\alpha_1,\ldots,\alpha_M$:
   $$
   q_{\alpha,t} \;=\; \inf\{x:\mathbb{P}(e_t^m \le x \mid \mathcal{F}_t)\ge \alpha\}.
   $$

Overall, this setup aims to **maximize the utility of weak predictability**:  
improving feature explainability on one hand, and enriching the target’s distributional description on the other.
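
Quantile objectives in gradient-boosting libraries are based on the pinball loss, whose minimizer is the requested quantile. The sketch below uses synthetic data and a hypothetical helper `pinball_loss` to illustrate this; it is not the training code of this solution.

```python
import numpy as np

def pinball_loss(y, q, alpha):
    """Average pinball (quantile) loss of a prediction q at level alpha."""
    diff = np.asarray(y, dtype=float) - q
    # alpha * diff when y is above q, (alpha - 1) * diff when y is below q
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1.0) * diff)))

# Synthetic "returns" for illustration only
rng = np.random.default_rng(0)
y = rng.normal(0.0, 0.01, 20_000)

# Search over constant predictions: the best one sits at the empirical quantile
alpha = 0.1
candidates = np.linspace(-0.03, 0.03, 601)
losses = [pinball_loss(y, c, alpha) for c in candidates]
best_q = candidates[int(np.argmin(losses))]
empirical_q = np.quantile(y, alpha)
```

At $\alpha = 0.5$ the pinball loss is half the absolute error, which is why an MAE-trained model estimates the median.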

### Ensemble

All regression tasks (mean, variance, and quantiles) are implemented using an ensemble of:

- LightGBM  
- CatBoost  
- XGBoost  

These models are chosen mainly for convenience and robustness, especially their ability to handle missing values natively. For simplicity and stability, I use the **same ensemble weights across all quantile levels**, which worked well in practice and avoided unnecessary degrees of freedom.


---

## Optimization

### 1) Pure return maximization

If we only consider maximizing expected excess return, the single-step problem is

$$
\max_{p_t}\; \mathbb{E}[e_t^s]=
\max_{p_t}\; p_t \mu_t,
$$
where $\mu_t = \mathbb{E}[e_t^m]$ is the predicted mean excess return.

Because the objective is linear in $p_t$, taking the derivative implies a boundary solution:
- If $\mu_t < 0$, choose $p_t = 0$ (do not trade).
- If $\mu_t > 0$, push $p_t$ to the upper bound (e.g., $p_t = 2$).

This explains why naive “maximize return” sizing tends to produce extreme positions (either no trade or all-in).


### 2) Variance-penalized objective (Sharpe-style sizing)

Introducing variance penalty yields a Sharpe-like surrogate:
$$
\max_{p_t}\;
p_t \mu_t - \lambda_{\text{var}} p_t^2 \sigma_t^2,
$$
where $\sigma_t^2 = \operatorname{Var}(e_t^m)$ is the predicted conditional variance.

Now the decision variable appears as a quadratic term, and the unconstrained optimum becomes
$$
p_t^\star = \frac{\mu_t}{2\lambda_{\text{var}}\sigma_t^2}.
$$

Compared to pure return maximization, this already prevents extreme leverage and yields smoother position sizing.
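
As a small illustration of the closed form above, the sketch below computes $p_t^\star$ and clips it to the allowed range. The function name `mean_variance_position` and the variance floor of `1e-12` are assumptions made for this example; the bounds $[0, 2]$ follow the position limits used elsewhere in this notebook.

```python
import numpy as np

def mean_variance_position(mu, sigma2, lam_var, p_min=0.0, p_max=2.0):
    """Maximizer of p*mu - lam_var * p^2 * sigma2 over [p_min, p_max]."""
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    # Assumed tiny floor on the variance to avoid division by zero
    denom = 2.0 * lam_var * np.maximum(sigma2, 1e-12)
    # The objective is concave in p, so clipping the unconstrained optimum
    # gives the constrained optimum on the interval.
    return np.clip(mu / denom, p_min, p_max)
```

For example, `mean_variance_position(0.0005, 0.0001, 2.0)` evaluates $0.0005 / (2 \cdot 2 \cdot 0.0001) = 1.25$, while a negative predicted mean is clipped to a position of 0.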


### 3) CVaR, tail variance, and turnover penalties

Building on the variance-penalized formulation, I further incorporate a downside tail risk penalty (CVaR), a tail “volatility” penalty (via conditional second moments in the left tail), and higher-order lagged turnover penalties to smooth trading dynamics. The practical single-step objective is:
$$
\max_{p_t}\;
p_t \mu_t-
\lambda_{\text{var}} p_t^2 \sigma_t^2+
\lambda_{\text{cvar}} p_t \,\text{CVaR}_\alpha(e_t^m)
$$

$$
-\lambda_{\text{tail}} p_t^2 \mathbb{E}\!\left[(e_t^m)^2 \mid e_t^m \le q_\alpha\right]
-\lambda_k \left|p_t - \frac{1}{K}\sum_{k=1}^{K} p_{t-k}\right|.
$$

Here $q_\alpha$ is the (left-tail) $\alpha$-quantile of $e_t^m$ (i.e., VaR at level $\alpha$), and the CVaR term is defined as the conditional expectation in the worst $\alpha$ fraction of outcomes:
$$
\text{CVaR}_\alpha(e_t^m)=\mathbb{E}\!\left[e_t^m \mid e_t^m \le q_\alpha\right].
$$
Defined this way, $\text{CVaR}_\alpha$ is a left-tail mean return and is typically negative, so the term $\lambda_{\text{cvar}} p_t\,\text{CVaR}_\alpha$ enters with a plus sign: a heavier downside tail lowers the objective in proportion to the position.
In practice, I estimate $q_\alpha$ and $\text{CVaR}_\alpha$ from my **multiple quantile regression models**. Suppose the models output conditional quantile forecasts $\hat q_{\beta,t}\approx q_\beta(e_t^m\mid \mathcal{F}_t)$ for a grid of tail levels $\beta\in(0,\alpha]$ (e.g., several small quantiles up to $\alpha$). Then I use the standard identity that CVaR can be written as an average of tail quantiles:
$$
\text{CVaR}_\alpha(X)=\frac{1}{\alpha}\int_{0}^{\alpha} q_u(X)\,du,
$$
and approximate it numerically using the predicted quantiles:
$$
\widehat{\text{CVaR}}_{\alpha,t}\approx \frac{1}{\alpha}\sum_{j=1}^{M}\hat q_{\beta_j,t}\,\Delta\beta_j,
\quad \Delta\beta_j=\beta_j-\beta_{j-1},\ \beta_0=0,\ \beta_M=\alpha.
$$

Similarly, the tail variance proxy is implemented via the conditional second moment in the left tail. Using the same quantile-grid approximation,
$$
\mathbb{E}\!\left[X^2\mid X\le q_\alpha\right]=\frac{1}{\alpha}\int_{0}^{\alpha} q_u(X)^2\,du
\ \Rightarrow\
\widehat{m}_{2,\alpha,t}\approx \frac{1}{\alpha}\sum_{j=1}^{M}\hat q_{\beta_j,t}^2\,\Delta\beta_j.
$$
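
The two quantile-grid approximations above amount to a Riemann sum over the predicted tail quantiles. A minimal sketch, assuming the quantile predictions for one day are already available as an array (the helper name `tail_stats_from_quantiles` is hypothetical):

```python
import numpy as np

def tail_stats_from_quantiles(q_hat, betas):
    """Approximate CVaR and the tail second moment from tail quantiles.

    q_hat : predicted quantiles at levels betas (last axis indexes the grid)
    betas : increasing tail levels with betas[-1] == alpha
    """
    q_hat = np.asarray(q_hat, dtype=float)
    betas = np.asarray(betas, dtype=float)
    alpha = betas[-1]
    d_beta = np.diff(betas, prepend=0.0)  # beta_j - beta_{j-1}, with beta_0 = 0

    cvar = (q_hat * d_beta).sum(axis=-1) / alpha
    m2 = (q_hat ** 2 * d_beta).sum(axis=-1) / alpha
    return cvar, m2

# Illustration with a known distribution: X ~ Uniform(-1, 1), q_u = -1 + 2u
betas = np.arange(1, 21) * 0.01           # 0.01, 0.02, ..., 0.20
q_uniform = -1.0 + 2.0 * betas
cvar_u, m2_u = tail_stats_from_quantiles(q_uniform, betas)
```

On an evenly spaced grid the weights $\Delta\beta_j/\alpha$ are all equal to $1/M$, so both estimates reduce to plain averages over the predicted quantiles. For the uniform example the exact value is $-1+\alpha=-0.8$, and the right-endpoint sum gives $-0.79$, an error of one grid step.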


Overall, these tail-aware penalties (CVaR + tail second moment) prevent the optimizer from taking aggressive positions when the predicted downside tail is heavy, while the turnover terms suppress “chasing noise”. Together, they substantially stabilize the position path and improve realized Sharpe under short evaluation windows.



### 4) Deriving step-wise penalties from $\rho_1$ and $\rho_2$

Even though $\rho_1$ and $\rho_2$ are window-based and non-convex (ratios of std, geometric means), we can derive simple single-step hinge penalties that mimic their dominant effects.

#### (a) Volatility penalty $\rho_1$

Recall
$$
\rho_1 =
1 + \max\!\left(
0,
\frac{\operatorname{std}(\{r_t^s\}_{t=1}^{T})}
     {\operatorname{std}(\{r_t^m\}_{t=1}^{T})}-1.2
\right).
$$

Since $r_t^s = r_t^f + p_t e_t^m$ and $r_t^f$ is approximately constant within the window,
$$
\frac{\operatorname{std}(\{r_t^s\})}{\operatorname{std}(\{r_t^m\})}
\approx
\frac{\operatorname{std}(\{p_t e_t^m\})}{\operatorname{std}(\{e_t^m\})}.
$$

Let $x_t = e_t^m$ and $w_t = p_t$. Using
$$
\operatorname{std}^2(w x) = \mathbb{E}[w^2 x^2] - \big(\mathbb{E}[w x]\big)^2,
$$
and the usual daily-return approximation that $\mathbb{E}[x]$ is small (so $\big(\mathbb{E}[w x]\big)^2$ is second-order), we take
$$
\operatorname{std}^2(w x) \approx \mathbb{E}[w^2 x^2].
$$

If $w_t$ varies slowly and is weakly correlated with $x_t^2$ in the window,
$$
\mathbb{E}[w^2 x^2] \approx \mathbb{E}[w^2]\mathbb{E}[x^2].
$$

Therefore,
$$
\operatorname{std}(p_t e_t^m)
\approx
\sqrt{\mathbb{E}[p_t^2]}\sqrt{\mathbb{E}[(e_t^m)^2]},
\quad
\operatorname{std}(e_t^m)\approx \sqrt{\mathbb{E}[(e_t^m)^2]},
$$
so the ratio becomes
$$
\frac{\operatorname{std}(r_t^s)}{\operatorname{std}(r_t^m)}
\approx
\sqrt{\mathbb{E}[p_t^2]}.
$$

This shows $\rho_1$ is essentially penalizing the RMS position exceeding $1.2$.  
A conservative single-step surrogate is to softly penalize positions above $1.2$:
$$
\text{pen}_1(p_t) = \lambda_1\max(p_t-1.2,0).
$$
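
The RMS approximation behind this surrogate can be sanity-checked numerically. The sketch below uses synthetic returns and a slowly varying position path, both invented for illustration, and compares the realized volatility ratio with the RMS position.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50_000

# Synthetic daily excess returns (illustrative parameters, not competition data)
e_m = rng.normal(0.0003, 0.01, T)

# Slowly varying positions that do not depend on the returns
p = 1.0 + 0.5 * np.sin(np.linspace(0.0, 20.0, T))

# Left-hand side: realized volatility ratio std(p * e) / std(e)
vol_ratio = np.std(p * e_m) / np.std(e_m)

# Right-hand side: root-mean-square position
rms_position = np.sqrt(np.mean(p ** 2))
```

Under these assumptions the two quantities agree closely. The agreement degrades when positions are strongly correlated with squared returns, which is exactly the condition stated in the derivation.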


#### (b) Underperformance penalty $\rho_2$


Recall
$$
\rho_2=
1+0.01 \max\!\left(
0,
\left[
\left(\prod_{t=1}^{T}(1+e_t^m)\right)^{\frac{1}{T}}-
\left(\prod_{t=1}^{T}(1+e_t^s)\right)^{\frac{1}{T}}
\right]
\cdot 252 \cdot 100
\right),
$$
where $e_t^s = p_t e_t^m$.

Define geometric mean growth rates
$$
G_m = \left(\prod_{t=1}^{T}(1+e_t^m)\right)^{\frac{1}{T}},
\quad
G_s = \left(\prod_{t=1}^{T}(1+e_t^s)\right)^{\frac{1}{T}}.
$$

Use log transform:
$$
\log G = \frac{1}{T}\sum_{t=1}^{T}\log(1+e_t).
$$

Then
$$
\Delta := \log G_m - \log G_s=
\frac{1}{T}\sum_{t=1}^{T}\left[\log(1+e_t^m)-\log(1+p_t e_t^m)\right].
$$

For daily returns, $|e_t^m|\ll 1$, so apply the Taylor expansion
$$
\log(1+z)=z-\frac{z^2}{2}+O(z^3).
$$

Thus
$$
\log(1+e_t^m)-\log(1+p_t e_t^m)
\approx
(1-p_t)e_t^m-\frac{1-p_t^2}{2}(e_t^m)^2,
$$
and
$$
\Delta
\approx
\frac{1}{T}\sum_{t=1}^{T}(1-p_t)e_t^m-
\frac{1}{2T}\sum_{t=1}^{T}(1-p_t^2)(e_t^m)^2.
$$

The first term is first-order in $e_t^m$ and usually dominates at daily frequency.  
Assuming the market has positive long-run excess returns (so the window-average of $e_t^m$ is positive), the sign of the dominant term is mainly controlled by $(1-p_t)$:

- If $p_t<1$, then $(1-p_t)>0$, so $\Delta>0$ and $G_m>G_s$ (strategy underperforms market)  
- If $p_t>1$, then $(1-p_t)<0$, so $\Delta<0$ and typically $G_s\ge G_m$ (no underperformance)

Finally, map back from log space to level space. Since $G=\exp(\log G)$, for small $\Delta$,
$$
G_m - G_s = G_s\,(e^\Delta-1)\approx G_s\,\Delta.
$$

All constants (including $0.01\times252\times100$ and the scale of $G_s$) can be absorbed into a tunable weight, giving a practical single-step surrogate that penalizes under-investment:
$$
\text{pen}_2(p_t)=\lambda_2\max(1-p_t,0).
$$
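
The second-order expansion of $\Delta$ used in this derivation can be checked against the exact log-growth gap. This is a sketch on synthetic data; the helper name `log_growth_gap` is hypothetical.

```python
import numpy as np

def log_growth_gap(p, e_m):
    """Exact and second-order approximate values of log G_m - log G_s."""
    p = np.asarray(p, dtype=float)
    e_m = np.asarray(e_m, dtype=float)
    # Exact: window average of log(1 + e) - log(1 + p * e)
    exact = np.mean(np.log1p(e_m) - np.log1p(p * e_m))
    # Approximation: (1 - p) e - (1 - p^2) e^2 / 2, averaged over the window
    approx = np.mean((1.0 - p) * e_m) - 0.5 * np.mean((1.0 - p ** 2) * e_m ** 2)
    return exact, approx

# Synthetic window of daily excess returns and arbitrary positions in [0, 2]
rng = np.random.default_rng(0)
e_demo = rng.normal(0.0003, 0.01, 500)
p_demo = rng.uniform(0.0, 2.0, 500)
exact_gap, approx_gap = log_growth_gap(p_demo, e_demo)
```

At daily return magnitudes the neglected third-order terms are tiny, so the two values are nearly identical.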

#### (c) Final single-step penalty

Putting the two surrogates together, the final single-step penalty is
$$
\lambda_{1}\max(p_t-1.2,0)+\lambda_{2}\max(1-p_t,0).
$$

### Final single-step optimization problem

Combining all components, the final single-step optimization problem is
$$
\max_{p_t} \mathcal{J}(p_t)=
p_t \mu_t
-\lambda_{\text{var}} p_t^2 \sigma_t^2
+\lambda_{\text{cvar}} p_t \,\widehat{\text{CVaR}}_{\alpha,t}
-\lambda_{\text{tail}} p_t^2 \widehat{m}_{2,\alpha,t}
$$

$$
-\lambda_k |p_t - \frac{1}{K}\sum_{k=1}^{K} p_{t-k}|
-\lambda_{1}\max(p_t-1.2,0)
-\lambda_{2}\max(1-p_t,0).
$$

The full objective becomes non-convex but is still a univariate optimization problem in $p_t$.  
Using Big-M + MIP solvers would be overkill, so I simply used grid search, which is robust and fast in practice.





---

## Hyperparameter Search

The derivation maps a window-based score into a single-step optimization formulation, but many hyperparameters remain to be tuned (e.g., penalty weights, CVaR confidence level, and turnover order).

Since lagged features are used, I applied walk-forward evaluation and optimized the following stability-aware objective:
$$
\max_{\theta}\left(
\mathbb{E}_{\text{folds}}[\text{Score}(\theta)]-
\gamma\cdot
\max\!\left(
\operatorname{Var}_{\text{folds}}[\text{Score}(\theta)],
\;\tau
\right)
\right),
$$
where $\theta$ denotes hyperparameters, $\gamma$ controls the strength of stability regularization, and $\tau$ is a variance floor to avoid overly small variance estimates dominating the objective.

This explicitly trades off high average score and low variance across folds.
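
The stability-aware objective above is simple to evaluate once per-fold scores are available. A minimal sketch, assuming the population variance across folds; the function name `stability_objective` and the example values of $\gamma$ and $\tau$ are placeholders, not the values used in this solution.

```python
import numpy as np

def stability_objective(fold_scores, gamma, tau):
    """Mean fold score minus gamma times the floored variance across folds."""
    s = np.asarray(fold_scores, dtype=float)
    # Assumption: population variance (ddof=0) across folds
    return float(s.mean() - gamma * max(s.var(), tau))

# Toy example with three fold scores
value = stability_objective([1.0, 2.0, 3.0], gamma=0.5, tau=0.1)
```

The floor $\tau$ means that two candidates whose fold variance is already below $\tau$ are ranked purely by their mean score.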

## Evaluation Results

Walk-forward backtesting was conducted on the public training set provided by the competition. The initial training window length was 5,186 observations, and the validation window covered 125 trading days. The final version of the solution achieved the following results across all folds:

| fold | train_start_idx | train_end_idx | val_start_idx | val_end_idx | sharpe  | vol_penalty | return_penalty | score    |
|------|------------------|--------------|---------------|------------|---------|-------------|----------------|----------|
| 0    | 0                | 5185         | 5186          | 5310       | 2.20394 | 1.00000     | 1.35662        | 1.62458  |
| 1    | 0                | 5310         | 5311          | 5435       | 0.65284 | 1.00000     | 1.00000        | 0.65284  |
| 2    | 0                | 5435         | 5436          | 5560       | 0.44624 | 1.17504     | 1.00000        | 0.37977  |
| 3    | 0                | 5560         | 5561          | 5685       | 0.98497 | 1.00000     | 1.00000        | 0.98497  |
| 4    | 0                | 5685         | 5686          | 5810       | 2.17824 | 1.00000     | 1.00001        | 2.17821  |
| 5    | 0                | 5810         | 5811          | 5935       | 2.58647 | 1.00000     | 1.05382        | 2.45438  |
| 6    | 0                | 5935         | 5936          | 6060       | 2.19054 | 1.00000     | 1.00000        | 2.19054  |
| 7    | 0                | 6060         | 6061          | 6185       | 1.34628 | 1.00000     | 1.00000        | 1.34628  |
| 8    | 0                | 6185         | 6186          | 6310       | 1.01225 | 1.19272     | 1.00000        | 0.84869  |
| 9    | 0                | 6310         | 6311          | 6435       | 1.66722 | 1.00000     | 1.00000        | 1.66722  |
| 10   | 0                | 6435         | 6436          | 6560       | -0.67137| 1.26973     | 1.00933        | -0.52386 |
| 11   | 0                | 6560         | 6561          | 6685       | 1.46267 | 1.00000     | 1.09990        | 1.32981  |
| 12   | 0                | 6685         | 6686          | 6810       | 1.88504 | 1.00000     | 1.00000        | 1.88504  |
| 13   | 0                | 6810         | 6811          | 6935       | 2.74985 | 1.00000     | 1.00000        | 2.74985  |
| 14   | 0                | 6935         | 6936          | 7060       | 3.37012 | 1.00000     | 2.00316        | 1.68240  |
| 15   | 0                | 7060         | 7061          | 7185       | 0.69500 | 1.00000     | 1.00000        | 0.69500  |
| 16   | 0                | 7185         | 7186          | 7310       | -0.96679| 1.33497     | 1.61720        | -0.44781 |
| 17   | 0                | 7310         | 7311          | 7435       | 2.49966 | 1.00000     | 1.00000        | 2.49966  |
| 18   | 0                | 7435         | 7436          | 7560       | 1.46091 | 1.13292     | 1.00000        | 1.28951  |
| 19   | 0                | 7560         | 7561          | 7685       | 0.26140 | 1.38698     | 1.00000        | 0.18847  |
| 20   | 0                | 7685         | 7686          | 7810       | 1.97577 | 1.10118     | 1.00000        | 1.79423  |
| 21   | 0                | 7810         | 7811          | 7935       | 2.53332 | 1.00000     | 1.00000        | 2.53332  |
| 22   | 0                | 7935         | 7936          | 8060       | 2.05840 | 1.20488     | 1.00000        | 1.70838  |
| 23   | 0                | 8060         | 8061          | 8185       | -1.72920| 1.33011     | 5.65410        | -0.22993 |
| 24   | 0                | 8185         | 8186          | 8310       | 0.24246 | 1.00000     | 1.00000        | 0.24246  |
| 25   | 0                | 8310         | 8311          | 8435       | 2.07074 | 1.00000     | 1.00000        | 2.07074  |
| 26   | 0                | 8435         | 8436          | 8560       | 1.07785 | 1.00000     | 1.00041        | 1.07740  |
| 27   | 0                | 8560         | 8561          | 8685       | 2.45194 | 1.00000     | 1.00000        | 2.45194  |
| 28   | 0                | 8685         | 8686          | 8810       | 1.17868 | 1.01147     | 1.00000        | 1.16531  |
| 29   | 0                | 8810         | 8811          | 8935       | 0.07893 | 1.33028     | 1.00000        | 0.05933  |
| 30   | 0                | 8935         | 8936          | 9047       | 2.07276 | 1.00000     | 1.00000        | 2.07276  |
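
The fold boundaries in the table follow an expanding-window scheme. The sketch below regenerates them; the helper name `walk_forward_folds` is hypothetical, and the total of 9,048 rows is inferred from the last validation index in the table.

```python
def walk_forward_folds(n_total, initial_train, val_size):
    """Expanding-window folds as (train_start, train_end, val_start, val_end)."""
    folds = []
    val_start = initial_train
    while val_start < n_total:
        # The last fold may be shorter than val_size
        val_end = min(val_start + val_size - 1, n_total - 1)
        # Training always starts at 0 and ends right before validation
        folds.append((0, val_start - 1, val_start, val_end))
        val_start = val_end + 1
    return folds

# n_total is inferred from the table (last val_end_idx is 9047)
folds = walk_forward_folds(n_total=9048, initial_train=5186, val_size=125)
```

This yields 31 folds, with the final fold covering indices 8936 to 9047.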

In [None]:
import os, sys, subprocess, shutil, glob

WHEELS_DIR = "/kaggle/input/xgb-wheels-builder/wheels"
TARGET_DIR = "/kaggle/working/pydeps"

# 0) Make sure the wheels directory exists
assert os.path.isdir(WHEELS_DIR), WHEELS_DIR
print("Wheel files:", [os.path.basename(p) for p in glob.glob(os.path.join(WHEELS_DIR, "*.whl"))])

# 1) Remove any old target directory (the most reliable option)
if os.path.exists(TARGET_DIR):
    shutil.rmtree(TARGET_DIR)
os.makedirs(TARGET_DIR, exist_ok=True)

# 2) Offline install, forcing an overwrite
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--no-index",
    "--find-links", WHEELS_DIR,
    "--target", TARGET_DIR,
    "--upgrade",           # force overwriting an existing directory
    "--no-deps",           # install only xgboost itself so other dependencies are left untouched
    "xgboost==3.1.2",
    "-q"
])

# 3) Put TARGET_DIR at the front of sys.path
if TARGET_DIR not in sys.path:
    sys.path.insert(0, TARGET_DIR)

In [None]:
'''
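No code is imported from other files; everything lives in this file so that the single notebook can run on its own.
When migrating to the online environment, remember to:
1. Remove local paths
2. Remove the score function
3. Remove local test code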
'''

from pathlib import Path
import joblib
import os
import numpy as np
import pandas as pd
import polars as pl
from dataclasses import dataclass, asdict
from typing import Optional, Dict, List, Any, Callable
import lightgbm as lgb
from catboost import CatBoostRegressor
import xgboost as xgb
import kaggle_evaluation.default_inference_server


#%% ======================PATHS===============================
DATA_PATH: str= '/kaggle/input/hull-tactical-market-prediction'
MODEL_DIR: str =  "/kaggle/input/lgbm4market" 
FEATURE_DIR: str = "/kaggle/input/features4market"


#%%=================MODEL CONFIGS=========================

# LightGBM parameters
@dataclass
class LGBMParams:
    n_estimators: int = 50
    num_leaves: int =20
    max_depth: int = 8
    min_data_in_leaf: int = 800
    learning_rate: float = 0.02
    objective: str = 'quantile'
    alpha:float =0.5
    l1_regularization: float = 10
    l2_regularization: float = 5
    random_state: int = 42
    verbosity: int = -1

@dataclass
class CATBOOSTParams:
    iterations: int = 100
    # allow alpha to be injected; loss_function will be constructed in __post_init__
    alpha: float = 0.5
    loss_function: Optional[str] = None
    depth: int = 7
    learning_rate: float = 0.02
    l2_leaf_reg: float = 3.0
    min_data_in_leaf: int = 800
    random_seed: int = 42
    verbose: bool=False

    def __post_init__(self):
        # if loss_function not provided explicitly, build quantile loss using alpha
        if not self.loss_function:
            self.loss_function = f'Quantile:alpha={self.alpha}'

@dataclass
class XGBParams:
    n_estimators: int = 100
    max_depth: int = 8
    learning_rate: float = 0.03
    subsample: float = 0.8
    colsample_bytree: float = 0.8
    min_child_weight: float = 1.0
    reg_alpha: float = 0.17
    reg_lambda: float = 8.0

    # quantile
    quantile_alpha: float = 0.5
    objective: str = "reg:quantileerror"

    # runtime
    random_state: int = 42
    n_jobs: int = -1
    verbosity: int = 0


#%%=================STRATEGY CONFIGS=========================

# ============ RETURNS TO SIGNAL CONFIGS ============
MIN_SIGNAL: float = 0.0                         # Minimum value for the daily signal 
MAX_SIGNAL: float = 2.0                         # Maximum value for the daily signal 

#==============NAIVE STRATEGY HYPER-PARAMETERS=================
MULTIPLIER_Q50: float = 139.36               # Multiplier for scaling the signal in naive strategy
MULTIPLER_CVAR: float = 0.449          # Multiplier for scaling the CVaR component in naive strategy

#=============CVaR / Tail-Variance strategy hyper-parameters==========
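TAIL_ALPHA: float = 0.2             # Lower-tail quantiles in (0, alpha] are used to estimate CVaR
TAIL_ALPHA_STEP: float = 0.01        # Step size of the quantile grid

LAMBDA_CVAR: float = 0.001            # Penalty weight on the mean tail loss (CVaR)
LAMBDA_TAIL_VAR: float = 0.2766       # Penalty weight on the tail variance
LAMBDA_VARIANCE: float = 0.1486         # Penalty weight on the variance
LAMBDA_TURNOVER: float = 0.01999          # Penalty weight on turnover
LAMBDA_VOL: float = 0.0001                 # Penalty weight on volatility (corresponds to rho1)
LAMBDA_RETURN_PEN: float = 0.00049        # Penalty weight on underperformance (corresponds to rho2)
INIT_SIGNAL: float = 0.4               # Initial position
TURNOVER_ORDER: int = 2                # Turnover order (number of lagged positions)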

#============Ensemble===========================
STRATEGY_ENSEMBLE_WEIGHT=0.1                 # Weight of the naive strategy in the strategy ensemble

LGBM_ENSEMBLE_WEIGHT: float = 0.234         # Model ensemble weight 0.11608688993930603 
CATBOOST_ENSEMBLE_WEIGHT: float = 0.6         # Model ensemble weight
XGBOOST_ENSEMBLE_WEIGHT: float = 1-(LGBM_ENSEMBLE_WEIGHT + CATBOOST_ENSEMBLE_WEIGHT)  # Model ensemble weight


#%% =============Utils Model Functions==================

class Regressor:
    """
    Generic regressor wrapper.
    - model_name: one of "LGBM", "CATBOOST", or "XGBOOST"
    - alpha: quantile level passed to the chosen estimator's quantile objective
    Provides fit(...) and predict(...), which call the underlying estimator
    """
    def __init__(self, model_name: str, alpha):
        self.model_name = model_name
        if model_name=="LGBM":
            self.params = asdict(LGBMParams(alpha=alpha))
        elif model_name=="CATBOOST":
            # build CATBOOST params, but remove internal-only fields (like 'alpha')
            params = asdict(CATBOOSTParams(alpha=alpha))
            # 'alpha' is used to construct loss_function but is not a valid CatBoost param
            params.pop('alpha', None)
            self.params = params
        elif model_name=="XGBOOST":
            self.params = asdict(XGBParams(quantile_alpha=alpha))
        else:
            self.params = {}
        self.model = self._build_model()

    def _build_model(self):
        name_up = self.model_name.upper()
        # LightGBM
        if name_up.startswith("LGBM"):
            return lgb.LGBMRegressor(**self.params)
        # CatBoost
        if name_up.startswith("CatBoost".upper()):
            return CatBoostRegressor(**self.params)
        # XGBoost
        if name_up.startswith("XGBOOST"):
            params = dict(self.params)
            params.setdefault("enable_categorical", True)
            return xgb.XGBRegressor(**params)

    def fit(self, X, y, **fit_kwargs):
        # Accept polars DataFrame/Series or pandas/numpy
        X_in = X.to_pandas() if hasattr(X, "to_pandas") else X
        y_in = y.to_pandas() if hasattr(y, "to_pandas") else y

        # Ensure the underlying model exists (build it if missing) and satisfy static type checkers
        if getattr(self, "model", None) is None:
            # Try to (re)build the model if it wasn't created in __init__
            self.model = self._build_model()
        assert self.model is not None, "No underlying model available; check model_name"
        self.model.fit(X_in, y_in, **fit_kwargs)
        return self

    def predict(self, X, **predict_kwargs):
        
        if getattr(self, "model", None) is None:
            raise ValueError("No model is built. Call fit(...) before predict(...) or ensure model is initialized.")

        X_in = X.to_pandas() if hasattr(X, "to_pandas") else X

        # assign to local variable so type checkers can infer it's not None
        model = getattr(self, "model", None)
        assert model is not None, "No underlying model available; check model_name"
        preds = model.predict(X_in, **predict_kwargs)

        # If original input was pandas, preserve the index and return a Series
        try:
            if isinstance(X, (pd.DataFrame, pd.Series)):
                return pd.Series(preds, index=X.index)
        except Exception:
            pass

        return preds

def load_model(directory: str= MODEL_DIR, custom_name: str= "None") -> Any:
    """
    Load model by trying known extensions in order.
    Returns the underlying estimator/booster (not the wrapper).
    """

    # 1) XGBoost json
    xgb_path = os.path.join(directory, f"{custom_name}.json")
    if os.path.exists(xgb_path):
        m = xgb.XGBRegressor()
        m.load_model(xgb_path)
        return m

    # 2) CatBoost cbm
    cb_path = os.path.join(directory, f"{custom_name}.cbm")
    if os.path.exists(cb_path):
        m = CatBoostRegressor()
        m.load_model(cb_path)
        return m

    # 3) LightGBM booster txt
    lgb_path = os.path.join(directory, f"{custom_name}.txt")
    if os.path.exists(lgb_path):
        return lgb.Booster(model_file=lgb_path)

    # 4) Fallback: pkl (not recommended, but kept as a last resort)
    pkl_path = os.path.join(directory, f"{custom_name}.pkl")
    if os.path.exists(pkl_path):
        return joblib.load(pkl_path)

    raise FileNotFoundError(f"Model not found for name={custom_name} in {directory}")


#%% ==============Utils Strategy Classes==================

def predict_cvar_tail(
    models_tail: list[Regressor],
    X_val: pd.DataFrame,
    tail_alpha: float = TAIL_ALPHA,
    tail_alpha_step: float = TAIL_ALPHA_STEP,
) -> dict:
    
    n_samples= X_val.shape[0]

    # ---- Multiple lower-tail quantiles: estimate CVaR ----
    alphas = np.arange(tail_alpha_step, tail_alpha + 1e-8, tail_alpha_step)
    K = len(alphas)

    q_preds = np.zeros((n_samples, K), dtype=float)
    for j, a in enumerate(alphas):
        q_preds[:, j] = models_tail[j].predict(X_val)

    # CVaR_alpha^r ≈ (1/K) * sum q(τ_k)
    cvar_r = q_preds.mean(axis=1)  # shape: (n_samples,)

    # Tail second moment & variance
    tail_second_moment = (q_preds ** 2).mean(axis=1)
    sigma_tail_sq = tail_second_moment - cvar_r ** 2
    sigma_tail_sq = np.maximum(sigma_tail_sq, 0.0)

    results_df={
        'cvar_r': cvar_r,
        'sigma_tail_sq': sigma_tail_sq,
    }

    return results_df

def predict_variance(
    model_var: Regressor,
    X_val: pd.DataFrame,
) -> pd.Series:
    
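    preds = model_var.predict(X_val)
    # Invert the log transform; this assumes the variance model was trained on log(variance + 1e-6).
    raw_var = np.exp(preds) - 1e-6
    var_preds = np.maximum(raw_var, 0.0)  # clip tiny negative values to zero
    
    return pd.Series(var_preds)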

def F_w(
    w: np.ndarray, # position decision
    mu: float,
    variance:float,
    cvar:float,
    tail_var: float,
    w_prev_mean: float,
    lambda_variance: float,
    lambda_cvar: float,
    lambda_tail_var: float,
    lambda_turnover: float,
    lambda_vol: float,
    lambda_return_pen: float,
) -> np.ndarray:

    w = np.asarray(w, dtype=float)

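    # Return term
    r_strategy= mu * w

    # Variance penalty
    p_variance = lambda_variance * variance * (w**2)

    # CVaR penalty (cvar is a left-tail mean, typically negative, so p_cvar is non-negative for w >= 0)
    p_cvar = -lambda_cvar * cvar * w

    # Tail-variance penalty
    p_tail_var = lambda_tail_var * tail_var * (w**2)

    # Turnover penalty (squared deviation from the mean of recent positions)
    p_turnover = lambda_turnover * (w - w_prev_mean)**2

    # Penalty for volatility above the market (rho_1 surrogate)
    p_vol= lambda_vol * np.maximum(0.0, w-1.2)

    # Underperformance penalty (rho_2 surrogate, squared hinge)
    p_return_pen = lambda_return_pen * np.maximum(0.0, 1.0 - w)**2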
    
    return r_strategy - p_variance - p_cvar - p_tail_var - p_turnover - p_vol - p_return_pen

def maximize_F_grid(
    F_w_func,              # function handle, e.g. F_w
    *F_args,               # positional arguments forwarded to F_w (everything except w)
    w_min: float = 0.0,
    w_max: float = 2.0,
    n_grid: int = 1001,
    refine: bool = True,
    refine_radius: float = 0.02,
    refine_factor: int = 10,
    **F_kwargs,            # keyword arguments forwarded to F_w (everything except w)
):
    """
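    Maximize F(w) over the interval [w_min, w_max] by grid search.

    Parameters
    ----
    F_w_func : callable
        Objective function with a signature like F_w(w, *F_args, **F_kwargs), where w is an np.ndarray.
    *F_args, **F_kwargs :
        Remaining arguments (everything except w) forwarded to F_w_func.
    w_min, w_max : search interval
    n_grid : number of points in the initial grid
    refine : whether to run a second, finer search around the initial optimum
    refine_radius : half-width of the refinement interval on each side
    refine_factor : multiplier applied to n_grid in the refinement stage

    Returns
    ----
    best_w : float, approximate maximizer w*
    best_val : float, approximate maximum F(w*)
    """
    # ===== Initial coarse grid search =====
    w_grid = np.linspace(w_min, w_max, n_grid)
    # Key point: forward the arguments to F_w_func unchanged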
    F_vals = F_w_func(w_grid, *F_args, **F_kwargs)

    idx_best = int(np.argmax(F_vals))
    best_w = float(w_grid[idx_best])
    best_val = float(F_vals[idx_best])

    # ===== Optional: refine once more around the best point =====
    if refine:
        left = max(w_min, best_w - refine_radius)
        right = min(w_max, best_w + refine_radius)

        if right > left:
            n_refine = max(200, n_grid * refine_factor)
            w_grid_refine = np.linspace(left, right, n_refine)
            F_vals_refine = F_w_func(w_grid_refine, *F_args, **F_kwargs)

            idx_best_refine = int(np.argmax(F_vals_refine))
            best_w_refine = float(w_grid_refine[idx_best_refine])
            best_val_refine = float(F_vals_refine[idx_best_refine])

            if best_val_refine > best_val:
                best_w, best_val = best_w_refine, best_val_refine

    return best_w, best_val

class SignalGenerator:

    def __init__(
        self,
        model_q50,
        models_tail: list,
        model_var,
        ensemble_weight: float = 0.5,
    ):
        self.model_q50 = model_q50
        self.models_tail = models_tail
        self.model_var = model_var
        self.init_signal = INIT_SIGNAL
        self.turnover_order = TURNOVER_ORDER
        self.prev_signals = np.array([self.init_signal] * self.turnover_order,dtype=float)
        self.ensemble_weight = ensemble_weight

    def call_var_cvar_turnover(self, X_val: pd.DataFrame) -> pd.Series:
        n_samples = X_val.shape[0]

        mu = self.model_q50.predict(X_val)
        mu = mu.values if isinstance(mu, pd.Series) else mu

        result_tail = predict_cvar_tail(
            models_tail=self.models_tail,
            X_val=X_val,
            tail_alpha=TAIL_ALPHA,
            tail_alpha_step=TAIL_ALPHA_STEP
        )
        cvar = result_tail['cvar_r']
        tail_variance = result_tail['sigma_tail_sq']

        variance = predict_variance(
            model_var=self.model_var,
            X_val=X_val
        )
        variance = variance.values if isinstance(variance, pd.Series) else variance

        signal = np.zeros(n_samples, dtype=float)
        for i in range(n_samples):
            w_i, _ = maximize_F_grid(
                F_w_func=F_w,
                mu=mu[i],
                variance=variance[i],
                cvar=cvar[i],
                tail_var=tail_variance[i],
                w_prev_mean=self.prev_signals.mean(),
                lambda_variance=LAMBDA_VARIANCE,
                lambda_cvar=LAMBDA_CVAR,
                lambda_tail_var=LAMBDA_TAIL_VAR,
                lambda_turnover=LAMBDA_TURNOVER,
                lambda_vol=LAMBDA_VOL,
                lambda_return_pen=LAMBDA_RETURN_PEN,
            )
            signal[i] = w_i

            self.prev_signals = np.roll(self.prev_signals, -1)
            self.prev_signals[-1] = w_i

        return pd.Series(signal)
    
    def call_naive(self, X_val: pd.DataFrame, multiplier_q50: float = MULTIPLIER_Q50, multipler_cvar: float = MULTIPLER_CVAR) -> pd.Series:

        y_pred_q50 = self.model_q50.predict(X_val)
        cvar=predict_cvar_tail(
            models_tail=self.models_tail,
            X_val=X_val,
            tail_alpha=TAIL_ALPHA,
            tail_alpha_step=TAIL_ALPHA_STEP
        )['cvar_r']

        signal= np.clip(
            (y_pred_q50) * multiplier_q50 + cvar * multipler_cvar + 1.0
            , MIN_SIGNAL, MAX_SIGNAL
        )

        return pd.Series(signal)
    
    def call_ensemble(
        self,
        X_val: pd.DataFrame) -> pd.Series:
        signal_cvar = self.call_var_cvar_turnover(X_val)
        signal_naive = self.call_naive(X_val)

        # Align the two series first in case their indexes differ (strongly recommended)
        signal_cvar, signal_naive = signal_cvar.align(signal_naive, join='inner')

        w = float(self.ensemble_weight)

        cvar = signal_cvar.to_numpy(dtype=np.float64)
        naive = signal_naive.to_numpy(dtype=np.float64)

        data = (1.0 - w) * cvar + w * naive

        signal = pd.Series(
            data=data,
            index=signal_cvar.index,
            name="position"
        )
        
        return signal
    
#%%=================Utils Data Functions=========================
def data_cleaning(df: pd.DataFrame) -> pd.DataFrame:
    """
    Cleans the DataFrame by handling missing values.

    Args:
        df (pd.DataFrame): The input DataFrame.

    Returns:
        pd.DataFrame: The cleaned DataFrame.
    """
    
    # Convert D1-D9 to categorical
    cat_cols = [f"D{i}" for i in range(1, 10)]
    for c in cat_cols:
        if df[c].isna().any():
            df[c]= df[c].astype("category")
        else:
            df[c] = df[c].astype('int64').astype("category")

    return df


def feature_engineering(df: pd.DataFrame) -> pd.DataFrame:
    """
    Performs feature engineering on the DataFrame.

    Args:
        df (pd.DataFrame): The input DataFrame.

    Returns:
        pd.DataFrame: The DataFrame with new features.
    """
    
    # create U1 and U2 
    if all(c in df.columns for c in ['I2', 'I1']):
        df['U1'] = df['I2'] - df['I1']
    if all(c in df.columns for c in ['M11', 'I2', 'I9', 'I7']):
        denom = (df['I2'] + df['I9'] + df['I7']) / 3.0
        # avoid division by zero
        denom = denom.replace(0, np.nan)
        df['U2'] = df['M11'] / denom

    # Long-term trend
    #df["t_norm"] = (df["date_id"] - df["date_id"].min()) / (df["date_id"].max() - df["date_id"].min())
    df["t_log"] = np.log1p(df["date_id"])
    
    
    # Periodic (cyclical) encoding
    periods = [5, 21, 63, 252]  # week, month, quarter, year (in trading days)
    for p in periods:
        df[f"sin_{p}"] = np.sin(2 * np.pi * df["date_id"] / p)
        df[f"cos_{p}"] = np.cos(2 * np.pi * df["date_id"] / p)
    
    
    # Interaction between an earnings indicator and market momentum
    if 'E1' in df.columns and 'M1' in df.columns:
        df['E1_M1_interaction'] = df['E1'] * df['M1']
    
    
    # Missing-value indicator plus a sentinel fill value for macro and sentiment signals
    df['E7_isna'] = df['E7'].isna().astype(int)
    df['E7_filled'] = df['E7'].fillna(-1)
    df['S3_isna'] = df['S3'].isna().astype(int)
    df['S3_filled'] = df['S3'].fillna(-1)

    # Drop 'I5' and 'I9', which are strongly collinear with other features
    df = df.drop(columns=['I5', 'I9'], errors='ignore')
    
    # Add lagged features
    # if not ONLINE:  # in ONLINE mode the lagged features are already provided in the test DataFrame
    #     lagged_features = [
    #         'forward_returns', 'risk_free_rate', 'target'
    #     ]
    #     for feat in lagged_features:
    #         if feat in df.columns:
    #             df[f'lagged_{feat}'] = df[feat].shift(1)

    #     if 'lagged_target' in df.columns:
    #         df = df.rename(columns={'lagged_target': 'lagged_market_forward_excess_returns'})

    # Defragment the DataFrame
    df = df.copy()
    
    return df

def load_selected_features(LAGGED:bool=True) -> List[str]:
    # Load selected features from file and convert to list
    with open(f'{FEATURE_DIR}/selected_features.txt', 'r') as f:
        selected_features = f.read().splitlines()
        if len(selected_features) and selected_features[-1] == '':
            selected_features = selected_features[:-1]

    # Create a sibling model that ignores the lagged features
    if not LAGGED:
        lagged_feats = [
            'lagged_forward_returns', 'lagged_risk_free_rate', 'lagged_target'
        ]
        selected_features = [feat for feat in selected_features if feat not in lagged_feats]
     
    return selected_features

def create_dataset(
    df,
    noneFeatureCols=['date_id', 'target','forward_returns','risk_free_rate']) -> pd.DataFrame:
    """
    Cleans the data, engineers features, and keeps only the selected columns.

    Args:
        df: The input DataFrame (pandas, or any object exposing `to_pandas()`).
        noneFeatureCols: Names of the non-feature columns (currently unused).

    Returns:
        pd.DataFrame: The DataFrame restricted to the ID/target columns and the
            selected features that exist in `df`.
    """

    # Needed when called from the predict API, which may pass a non-pandas DataFrame
    if not isinstance(df, pd.DataFrame):
        df = df.to_pandas()

    # --- Data Cleaning ---
    df = data_cleaning(df)

    # --- Feature Engineering ---
    df = feature_engineering(df)
    
    # --- Select Features ---
    FEATURES = load_selected_features()

    selected_cols = ["date_id", "target","forward_returns", "risk_free_rate"] + FEATURES

    # Ensure selected columns exist in df
    selected_cols = [c for c in selected_cols if c in df.columns]

    return df.loc[:, selected_cols]



#%%==================Preparation======================

# Load Models
model_q50_LGBM = load_model(custom_name="LGBM_model_q50")
models_tail_LGBM = []
for a in np.arange(TAIL_ALPHA_STEP, TAIL_ALPHA + 1e-8, TAIL_ALPHA_STEP):
    model_tail = load_model(custom_name=f"LGBM_model_tail_{a:.3f}")
    models_tail_LGBM.append(model_tail)
model_var_LGBM = load_model(custom_name="LGBM_model_var")

model_q50_CATBOOST = load_model(custom_name="CATBOOST_model_q50")
models_tail_CATBOOST = []
for a in np.arange(TAIL_ALPHA_STEP, TAIL_ALPHA + 1e-8, TAIL_ALPHA_STEP):
    model_tail = load_model(custom_name=f"CATBOOST_model_tail_{a:.3f}")
    models_tail_CATBOOST.append(model_tail)
model_var_CATBOOST = load_model(custom_name="CATBOOST_model_var")

model_q50_XGBOOST = load_model(custom_name="XGBOOST_model_q50")
models_tail_XGBOOST = []
for a in np.arange(TAIL_ALPHA_STEP, TAIL_ALPHA + 1e-8, TAIL_ALPHA_STEP):
    model_tail = load_model(custom_name=f"XGBOOST_model_tail_{a:.3f}")
    models_tail_XGBOOST.append(model_tail)
model_var_XGBOOST = load_model(custom_name="XGBOOST_model_var")

# Create Signal Generators
generator_LGBM=SignalGenerator(
    model_q50=model_q50_LGBM,
    models_tail=models_tail_LGBM,
    model_var=model_var_LGBM,
    ensemble_weight=STRATEGY_ENSEMBLE_WEIGHT
)

generator_CATBOOST=SignalGenerator(
    model_q50=model_q50_CATBOOST,
    models_tail=models_tail_CATBOOST,
    model_var=model_var_CATBOOST,
    ensemble_weight=STRATEGY_ENSEMBLE_WEIGHT
)

generator_XGBOOST=SignalGenerator(
    model_q50=model_q50_XGBOOST,
    models_tail=models_tail_XGBOOST,
    model_var=model_var_XGBOOST,
    ensemble_weight=STRATEGY_ENSEMBLE_WEIGHT
)


#%% =================Prediction API==================

def predict(test:pl.DataFrame) -> float:

    # Data Pipeline
    df: pd.DataFrame = create_dataset(test)
    FEATURES: list[str] = load_selected_features()
    X_test: pd.DataFrame = df[FEATURES]   # usually a single row here
    
    # Generate Signal
    signal_series_LGBM=generator_LGBM.call_ensemble(X_test)
    signal_series_CATBOOST=generator_CATBOOST.call_ensemble(X_test)
    signal_series_XGBOOST=generator_XGBOOST.call_ensemble(X_test)

    #signal_series= MODEL_ENSEMBLE_WEIGHT * signal_series_LGBM + (1 - MODEL_ENSEMBLE_WEIGHT) * signal_series_CATBOOST
    signal_series= (LGBM_ENSEMBLE_WEIGHT * signal_series_LGBM +
                    CATBOOST_ENSEMBLE_WEIGHT * signal_series_CATBOOST +
                    XGBOOST_ENSEMBLE_WEIGHT * signal_series_XGBOOST)

    # Clip strictly to the [MIN_SIGNAL, MAX_SIGNAL] range
    signal_series = signal_series.clip(lower=MIN_SIGNAL, upper=MAX_SIGNAL)
    signal = float(signal_series.iloc[0])

    # Fallback: a NaN or Inf would be rejected, so return 1.0 instead
    if not np.isfinite(signal):
        signal = 1.0

    return signal

inference_server = kaggle_evaluation.default_inference_server.DefaultInferenceServer(predict)
if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):  # set only when running on the Kaggle competition platform
    inference_server.serve()
else:
    inference_server.run_local_gateway((DATA_PATH,))
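
The coarse-to-fine grid search used by `maximize_F_grid` can be sanity-checked on a toy objective whose maximizer is known in closed form. The sketch below is only an illustration: `toy_objective`, `grid_argmax`, and all parameter values are hypothetical and keep just the variance and turnover penalties, not the full `F_w`.

```python
import numpy as np

def toy_objective(w, mu, variance, w_prev, lam_var, lam_turn):
    """Toy single-step objective: expected return minus variance and turnover penalties."""
    return mu * w - lam_var * variance * w**2 - lam_turn * (w - w_prev) ** 2

def grid_argmax(f, lo, hi, n):
    """Return the grid point in [lo, hi] that maximizes f."""
    grid = np.linspace(lo, hi, n)
    return float(grid[int(np.argmax(f(grid)))])

# Hypothetical inputs, chosen only for illustration.
params = dict(mu=5e-4, variance=1e-4, w_prev=1.0, lam_var=2.0, lam_turn=1e-3)

def f(w):
    return toy_objective(w, **params)

# Coarse pass over [0, 2], then a finer pass within +/- 0.02 of the coarse optimum.
w_coarse = grid_argmax(f, 0.0, 2.0, 1001)
w_fine = grid_argmax(f, max(0.0, w_coarse - 0.02), min(2.0, w_coarse + 0.02), 2001)

# Closed-form maximizer of this concave quadratic, clipped to the search interval.
w_star = (params["mu"] + 2 * params["lam_turn"] * params["w_prev"]) / (
    2 * (params["lam_var"] * params["variance"] + params["lam_turn"])
)
w_star = float(np.clip(w_star, 0.0, 2.0))

print(w_coarse, w_fine, w_star)
```

For a smooth objective like this one, the refinement pass mainly buys resolution. The real `F_w` has kinks at 1.0 and 1.2, which is one reason a grid search is a reasonable choice over a gradient-based solver.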

## On End-to-End Learning (Value-Oriented)

End-to-end predict-then-optimize learning is definitely trending, but after several rounds of forecasting practice, I’ve come to feel that it’s still very hard to *truly* train a forecasting model directly on **decision value** in settings like this competition.

A major obstacle is that this is essentially a **single-step prediction + decision** problem, while the evaluation is a **window-level score** (Sharpe with non-linear penalties). In such a setting, we usually cannot write the decision value as a clean closed-form objective, nor can we cast it as a convex optimization problem. That means we cannot clearly answer a fundamental question: *what is the actual marginal “value” of improving a one-step forecast by some amount?*
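
To make the obstacle concrete, here is a minimal sketch, assuming a plain non-annualized Sharpe ratio rather than the official score. It shows that the effect of improving a single day's return depends on the rest of the window. The helper names and the two windows are hypothetical.

```python
import numpy as np

def sharpe(returns):
    """Plain (non-annualized) Sharpe ratio: mean over sample standard deviation."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / r.std(ddof=1)

def marginal_gain(window, idx, delta):
    """Change in window Sharpe when the return at position idx improves by delta."""
    bumped = np.array(window, dtype=float)
    bumped[idx] += delta
    return sharpe(bumped) - sharpe(window)

# Two hypothetical windows that share the same final-day return (0.01).
calm_window = [0.002, -0.001, 0.003, 0.000, 0.01]
wild_window = [0.030, -0.040, 0.050, -0.020, 0.01]

# Apply the same one-step improvement to the last day of each window.
gain_calm = marginal_gain(calm_window, idx=-1, delta=0.005)
gain_wild = marginal_gain(wild_window, idx=-1, delta=0.005)

print(gain_calm, gain_wild)
```

In this toy setup the two gains even differ in sign: in the calm window the extra return mostly inflates volatility, while in the wild window it mostly lifts the mean. A one-step loss has no access to that context, which is why the marginal value of a forecast improvement is hard to pin down.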

Another issue is the additional complexity introduced by end-to-end training. Replacing highly efficient ML frameworks (e.g., gradient boosting) with an end-to-end pipeline that embeds a decision model often makes the workflow significantly heavier, while paradoxically making it harder to learn a simple and reasonable representation. As I mentioned in my earlier post, a regression model’s representation should arguably be **as simple as possible**—everything else is basically feature engineering.

That said, *forecast value* is still a fascinating topic. In equity markets, even when the $R^2$ hovers around zero, the signal can still be economically meaningful and monetizable, which is honestly quite magical. Also, even if we don’t train the forecasting model in a value-oriented way, we can still evaluate **hyperparameters** by decision performance—and this is extremely practical. Hyperparameter search is fundamentally heuristic anyway (trial-based optimization rather than a fully modelable objective), so it makes perfect sense that the target is no longer just predictive accuracy, but rather **decision utility**.

In summary, my current belief is:
- **Model parameters** should be trained with efficient frameworks using simple, convex statistical losses (e.g., MAE/quantile loss), and
- **Hyperparameters** can be tuned with trial-based methods directly on decision metrics (score/utility).

The reason is that the hyperparameter space is much smaller, so the complexity of value-oriented, end-to-end optimization is far more manageable at that level than at the parameter level.
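
As a rough illustration of tuning a hyperparameter directly on a decision metric, the sketch below runs a small random search over a position multiplier, loosely modeled on `multiplier_q50` in `call_naive`. Everything here is an assumption made for the example: the data are synthetic, the decision rule is simplified, and a plain Sharpe ratio stands in for the competition score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for out-of-fold forecasts and realized excess returns.
n_days = 500
mu_hat = rng.normal(0.0, 0.002, n_days)                   # hypothetical forecasts
realized = 0.3 * mu_hat + rng.normal(0.0, 0.01, n_days)   # weakly related outcomes

def positions(forecast, multiplier):
    """Hypothetical decision rule: linear tilt around full exposure, clipped to [0, 2]."""
    return np.clip(1.0 + multiplier * forecast, 0.0, 2.0)

def decision_score(multiplier):
    """Decision metric: Sharpe ratio of the strategy returns (stand-in for the real score)."""
    strat = positions(mu_hat, multiplier) * realized
    return strat.mean() / strat.std(ddof=1)

# Trial-based search: sample candidate values and keep the best-scoring one.
candidates = rng.uniform(0.0, 400.0, size=50)
scores = np.array([decision_score(m) for m in candidates])
best_multiplier = float(candidates[scores.argmax()])
best_score = float(scores.max())

print(best_multiplier, best_score)
```

The forecasting model itself is never retrained inside the loop. Only the decision-side hyperparameter is scored, which keeps each trial cheap. In practice the score should be computed on held-out folds to limit overfitting to one window.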

## On Mindset

One last thing that feels worth sharing is the mindset required for forecasting practice. With a limited dataset, weak predictability, and a short online evaluation window, this is still a competition heavily driven by randomness. Sometimes it even makes you question time-series forecasting itself: *will what we predict really remain valid in the future?* I don’t think any forecaster can guarantee that.

When you finally finish building a solution and see good offline results, you feel happy—but also strangely powerless. Even if you beat the market on average, those folds with ridiculously bad scores can still make you doubt the whole craft and feel discouraged. This is, in the end, a statistical game: no method can make every single sample point “good.” Trying to make everything “all the best” is often just the trap of overfitting. The real question is: *will tomorrow be one of those bad samples?* No one knows.

Indeed, the discussion board is full of comments saying this is a luck-driven competition, and I agree that the final leaderboard can be extremely noisy due to the short test window. However, the only thing we can do is to keep analyzing the data, deriving better theory, refining algorithms, and improving statistical performance. Don’t get overly excited by a lucky sample, and don’t let a bad sample ruin your mental state. Trust the power of statistics; only time will tell. Maybe that’s what “faith in forecasting” means.