Fix: Remove window-normalization leakage in AR training (causal lookback normalization)

Hi Kronos team, 

While reviewing train_predictor and the dataset, I noticed a potential data-leakage issue in __getitem__:

- In AR training we use
  
  ```
  token_in  = [token_seq_0[:, :-1], token_seq_1[:, :-1]]
  token_out = [token_seq_0[:, 1:],  token_seq_1[:, 1:]]
  ```
  
- But `token_seq_*` comes from `x_tensor`, which was Z-scored using the entire window (`lookback_window + predict_window + 1` steps).
As a result, every `token_in` position depends on statistics that include future points in the same window. Those points are not available at inference time, so the inputs seen in training can be distributed differently from the inputs seen in real-time inference.
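
To make the concern concrete, here is a minimal, self-contained sketch (not the repository's code). The window sizes, the toy values, and the `zscore` helper are all made up for illustration. It compares two windows that share the same past but have different futures, and checks whether the normalized past changes.

```python
from statistics import fmean, pstdev

def zscore(values, ref, eps=1e-5):
    """Normalize `values` with the mean/std computed over `ref` (hypothetical helper)."""
    mu = fmean(ref)
    sigma = pstdev(ref)
    return [(v - mu) / (sigma + eps) for v in values]

lookback_window = 4  # illustrative size, not the project's config value

# Two toy windows with an identical past and a different future.
window_a = [10.0, 11.0, 12.0, 11.0, 12.0, 13.0]
window_b = [10.0, 11.0, 12.0, 11.0, 20.0, 30.0]

# Whole-window statistics: the normalized *past* depends on the future values.
full_a = zscore(window_a, window_a)[:lookback_window]
full_b = zscore(window_b, window_b)[:lookback_window]
leaks = full_a != full_b

# Lookback-only statistics: the normalized past is unaffected by the future.
causal_a = zscore(window_a, window_a[:lookback_window])[:lookback_window]
causal_b = zscore(window_b, window_b[:lookback_window])[:lookback_window]
causal_ok = causal_a == causal_b

print("whole-window normalization leaks future info:", leaks)
print("lookback-only normalization is causal:", causal_ok)
```

With whole-window statistics, the same history maps to different model inputs depending on what happens next, which is the mismatch described above.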


So, I opened a PR with the fix and a small comment tying the change back to the paper rationale: https://github.com/shiyu-coder/Kronos/pull/83

**What I changed (minimal fix):**
- Add code comments that tie the change back to the relevant rationale in the paper.
- Compute Z-score using only the lookback segment.
- Start training from the lookback boundary (predict future steps only), to better simulate deployment.
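
For readers who want the idea without opening the PR, the two functional changes can be sketched roughly as below. This is one possible reading under stated assumptions, not the code in the PR: the function names, the `clip` and `eps` defaults, and the use of a loss mask (rather than slicing the sequence) are all hypothetical.

```python
import numpy as np

def normalize_window_causal(x, lookback_window, clip=5.0, eps=1e-5):
    """Z-score a (time, features) window using statistics from the lookback rows only.

    `clip` and `eps` are illustrative defaults, not values taken from the project.
    """
    past = x[:lookback_window]
    mean = past.mean(axis=0)
    std = past.std(axis=0)
    x_norm = (x - mean) / (std + eps)
    return np.clip(x_norm, -clip, clip)

def future_loss_mask(seq_len, lookback_window):
    """Boolean mask over the shifted targets token_seq[:, 1:].

    Target j corresponds to original position j + 1; only positions at or
    beyond the lookback boundary contribute to the loss.
    """
    target_positions = np.arange(1, seq_len)
    return target_positions >= lookback_window

# Toy window: lookback_window=4, predict_window=2, plus 1 (7 rows, 3 features).
lookback_window, predict_window = 4, 2
seq_len = lookback_window + predict_window + 1
x = np.arange(seq_len * 3, dtype=float).reshape(seq_len, 3)

x_future_changed = x.copy()
x_future_changed[lookback_window:] += 100.0  # perturb only the future rows

norm_a = normalize_window_causal(x, lookback_window)
norm_b = normalize_window_causal(x_future_changed, lookback_window)
mask = future_loss_mask(seq_len, lookback_window)
```

In this sketch the normalized lookback rows are identical for both windows, and the mask keeps the full context as model input while restricting the loss to future steps. Future rows are still normalized with lookback statistics, so they may hit the clip bounds more often than before.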

If you’d like, I can follow up with a fully causal sliding/online normalization variant as well.

