

# Chronos-2 Soft Group Masking Extension - Progress Summary

## The Idea

**Hypothesis**: Hard group masking in Chronos-2 can block useful information when related time series are placed in different groups. Relaxing this constraint via similarity-weighted soft masking could allow beneficial cross-group learning while maintaining group structure.

**Key insight**: Instead of binary masking (attend=1, ignore=0), use continuous weights (0 to 1) based on how similar series are, allowing the model to learn from related series even across group boundaries.

---

## Implementation

Modified Chronos-2 to add soft masking capability (inference-only, no training required):

**Core changes**:
```python
import torch

# 1. Compute pairwise similarity between all time series in the batch
similarity_matrix = compute_input_similarity(context, similarity_type="correlation")

# 2. Convert similarity to a soft attention mask
#    Within-group pairs: weight = 1.0 (full attention)
#    Cross-group pairs:  weight = similarity (expected in [0, 1])
soft_mask = hard_mask + (1 - hard_mask) * similarity_matrix

# 3. Apply temperature scaling and convert to an additive attention bias
#    (a weight of 0 maps to -inf, i.e. a fully masked pair)
attention_bias = torch.log(soft_mask) * temperature
```

**Modified files**: `model.py` (soft mask construction), `pipeline.py` (similarity computation and parameter passing)
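
To make the three steps concrete, here is a minimal, dependency-free sketch of the mask construction on a toy batch. The helper name `soft_mask_bias`, the `group_ids` argument, and the clamping of similarities to [0, 1] are assumptions made for illustration; they are not taken from `model.py`.

```python
import math

def soft_mask_bias(group_ids, similarity, temperature=5.0):
    """Build an additive attention bias from group ids and a similarity matrix.

    Hypothetical helper: it mirrors the formula above, not the actual model code.
    """
    n = len(group_ids)
    bias = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hard = 1.0 if group_ids[i] == group_ids[j] else 0.0
            # Assumption: similarities are clamped to [0, 1] so that log() is defined.
            sim = min(max(similarity[i][j], 0.0), 1.0)
            soft = hard + (1.0 - hard) * sim
            # A weight of 0 is mapped to -inf, which corresponds to a fully masked pair.
            bias[i][j] = math.log(soft) * temperature if soft > 0.0 else -math.inf
    return bias

# Toy example: series 0 and 1 share a group, series 2 is in another group.
groups = [0, 0, 1]
sim = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, -0.2],
    [0.5, -0.2, 1.0],
]
bias = soft_mask_bias(groups, sim, temperature=5.0)
```

Within-group pairs get a bias of 0 (unchanged attention), cross-group pairs get a negative bias that grows with dissimilarity, and a non-positive similarity is fully masked. If every series shares one group, the bias is zero everywhere and soft masking changes nothing.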

---

## Similarity Matrix: What We Use and Alternatives

The similarity matrix determines how much cross-group attention is allowed. We implemented three approaches:

### 1. **Correlation (Pearson)** - Currently Used
- Measures linear relationships between series
- Best for: Series with similar patterns but different scales
- Formula: Normalized covariance

### 2. **Cosine Similarity** - Available, Not Yet Tested
- Measures directional similarity (ignores magnitude)
- Best for: Series with similar shapes but different amplitudes
- Formula: Angle between vectors

### 3. **Distance (Gaussian Kernel)** - Available, Not Yet Tested
- Measures numerical proximity
- Best for: Series that are close in value space
- Formula: exp(-euclidean_distance / scale)
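
As a rough illustration of how the three measures differ, the sketch below computes each one for a pair of equal-length series. The function names and the `scale` parameter are illustrative; the actual `compute_input_similarity` may handle details such as missing values or rescaling to [0, 1] differently.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cosine(x, y):
    """Cosine of the angle between the two series viewed as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def gaussian_kernel(x, y, scale=1.0):
    """exp(-euclidean_distance / scale), following the formula listed above."""
    return math.exp(-math.dist(x, y) / scale)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]  # same shape as `a`, ten times the scale
c = [4.0, 3.0, 2.0, 1.0]      # reversed trend

print(pearson(a, b), cosine(a, b), gaussian_kernel(a, b))
print(pearson(a, c))
```

Pearson and cosine treat `a` and `b` as essentially identical despite the scale difference, while the Gaussian kernel sees them as far apart. Pearson can also be negative (as for `a` and `c`), so it has to be mapped into [0, 1] before the logarithm in the soft mask is taken.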

---

## Temperature Parameter

**Purpose**: Controls how permissive cross-group attention is

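- **Low temperature (e.g., 1.0)**: Permissive - the cross-group weight equals the similarity itself, so even moderately similar series can share information
- **High temperature (e.g., 20.0)**: Stricter - only very similar series attend across groups (closer to hard masking)

**Mathematical effect**: `attention_bias = log(similarity) * temperature`
- Because similarity is at most 1, `log(similarity)` is never positive, so a larger temperature makes the cross-group bias more negative
- Inside the softmax, this multiplies each cross-group pair's unnormalized attention weight by `similarity ** temperature`
- Higher temperature amplifies the effect of similarity differences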
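
Under the formula as written, adding `log(similarity) * temperature` to the attention logits is the same as scaling the unnormalized attention weight by `similarity ** temperature`. The short sketch below tabulates that factor for a few values; `effective_weight` is a name introduced here for illustration.

```python
import math

def effective_weight(similarity, temperature):
    """Multiplicative factor exp(log(similarity) * temperature) = similarity ** temperature."""
    return math.exp(math.log(similarity) * temperature)

for temperature in (1.0, 5.0, 20.0):
    weights = [effective_weight(s, temperature) for s in (0.5, 0.9, 0.99)]
    print(temperature, [round(w, 4) for w in weights])
```

At temperature 1.0 the weight is the similarity itself. At temperature 20.0 a similarity of 0.9 is reduced to about 0.12, so only near-identical series keep substantial cross-group attention.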

---

## Testing Conclusions

We tested the soft masking approach on multiple datasets with varying characteristics:

**Key findings**:
1. **Unrelated series (M4 Hourly)**: No improvement - as expected, cross-group learning doesn't help when series are from different domains

2. **Related series (ETT datasets)**: Small improvements (~6% on full data)
   - Statistically significant with large samples
   - **BUT**: Predictions are 98% identical to baseline (high correlation)
   - Only ~61% of samples improved (not far above the 50% expected by chance)
   - Suggests **smoothing effect** rather than meaningful cross-learning

3. **Small vs large samples**: Improvement dropped from 28% (100 samples) to 6% (117k samples)
   - Indicates the small-sample estimate was inflated by sampling noise rather than reflecting a real effect
   - Real effect is much smaller than initially observed

**Current interpretation**: Soft masking provides marginal benefit in current configuration - potentially just reducing prediction variance rather than enabling meaningful cross-group learning.

---

## Still to Explore

To determine if soft masking can provide meaningful improvements, we need to test:

1. **Different similarity measures**: 
   - Cosine similarity (shape-based)
   - Distance-based (proximity-based)
   - Compare which captures useful relationships best

2. **Temperature sweep**: 
   - Test 1.0, 10.0, 20.0 (only tested 5.0 so far)
   - Find optimal trade-off between group separation and cross-learning

3. **Different datasets**:
   - Weekly vs hourly granularity
   - Datasets with stronger inter-series relationships

4. **Deeper analysis**:
   - Per-feature results (which features benefit?)
   - Temporal patterns (does it help more for short/long-term forecasts?)

**Goal**: Determine if there's a configuration where soft masking provides substantial practical benefit, or conclude it's not effective for inference-only scenarios.
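
One way to organize the similarity and temperature experiments is a small grid sweep. The sketch below assumes a user-supplied `evaluate(similarity_type, temperature)` callback that returns an error metric such as MASE. The callback, the `"cosine"` and `"distance"` option strings, and the dummy scorer are placeholders, not part of the existing pipeline.

```python
from itertools import product

def sweep(evaluate, similarity_types, temperatures):
    """Evaluate every (similarity_type, temperature) pair; return results sorted by score."""
    results = [
        {"similarity_type": s, "temperature": t, "score": evaluate(s, t)}
        for s, t in product(similarity_types, temperatures)
    ]
    return sorted(results, key=lambda r: r["score"])  # lower error first

# Placeholder scorer so the sketch runs; replace it with a real evaluation run.
def dummy_evaluate(similarity_type, temperature):
    return len(similarity_type) + temperature

grid = sweep(dummy_evaluate, ["correlation", "cosine", "distance"], [1.0, 5.0, 10.0, 20.0])
for row in grid[:3]:
    print(row)
```

Keeping the hard-masking baseline in the same table makes it easier to see whether any configuration beats it by more than run-to-run noise.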


---

# Report Conclusion

## Conclusion for Your Report

Based on these results, here's what you can conclude:

---

### **Main Finding: Soft Group Masking Shows NO Meaningful Improvement**

**Quantitative Summary:**
- MASE improvement: +2.80% (p=0.26, **not significant**)
- WQL improvement: +0.02% (p=0.98, **not significant**)
- Cohen's d for MASE: 0.0116 (**negligible** - the conventional threshold for a "small" effect is 0.2)
- Cohen's d for WQL: 0.0002 (**negligible**)
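
For reference, here is a minimal sketch of how a paired comparison of this kind can be computed from per-dataset scores. The numbers are made up for illustration and are not the benchmark results. The summary above does not say which variant of Cohen's d was used; this sketch uses the paired-difference form (mean difference divided by the standard deviation of the differences), and a pooled-standard-deviation form would give different values.

```python
import math
import statistics

def paired_effect(baseline, variant):
    """Return (mean difference, paired t statistic, Cohen's d on the differences)."""
    # Positive difference means the variant has the lower error.
    diffs = [b - v for b, v in zip(baseline, variant)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    cohens_d = mean_diff / sd_diff
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat, cohens_d

# Made-up per-dataset MASE values, for illustration only.
baseline_mase = [0.80, 1.10, 0.95, 1.30, 0.70]
soft_mase = [0.79, 1.12, 0.93, 1.29, 0.71]

mean_diff, t_stat, d = paired_effect(baseline_mase, soft_mase)
print(mean_diff, t_stat, d)
```

The t statistic is converted to a p-value with a t distribution with n - 1 degrees of freedom, which is what a paired t-test routine in a statistics library does internally.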

---

### **What This Means:**

1. **Hypothesis NOT supported**: The soft group masking extension does **not provide meaningful improvements** over standard hard group masking when applied at inference-only.

2. **Effect size reveals the truth**: While there's a 2.8% MASE improvement on average, Cohen's d = 0.0116 indicates this is **practically meaningless**. Even if it were statistically significant (which it isn't), the effect is too small to matter in practice.

3. **Consistent with individual dataset findings**: This aligns with your earlier tests on ETT_1H (d=0.2) and Walmart (d=0.04), where effect sizes ranged from negligible to borderline small despite small percentage improvements.

---

### **Possible Explanations (for Discussion section):**

1. **Training-inference mismatch**: The model was trained with hard group masking. Introducing soft masking only at inference creates a distribution shift the model wasn't prepared for.

2. **Learned group structures are sufficient**: Chronos-2 may have already learned to encode relationship patterns during pretraining. The hard group boundaries at inference don't limit performance because the encoder already captured cross-series patterns.

3. **Similarity metric limitations**: Pearson correlation on raw context may not capture the semantic relationships the model uses internally. The model's learned representations may encode more complex relationships than simple correlation.

4. **Small batch sizes**: Most benchmark datasets have small batches (5-862 series). When hard masking already places all series in one group (group_id=0 in default mode), every pair already has full attention, so soft masking has little or nothing left to relax.

---

### **Recommendations for Report:**

**Section: Results**
```
We evaluated the soft group masking extension on 25 datasets from the 
Chronos Benchmark II zero-shot evaluation suite. Soft masking showed 
a 2.80% improvement in MASE and 0.02% in WQL over baseline hard 
masking. However, paired t-tests revealed these differences were not 
statistically significant (MASE: p=0.26; WQL: p=0.98). More critically, 
effect size analysis (Cohen's d) showed negligible practical significance 
(MASE: d=0.0116; WQL: d=0.0002), far below the 0.2 threshold for 
"small" effects.
```

**Section: Conclusion**
```
The inference-only soft group masking extension does not provide 
meaningful improvements over standard Chronos-2. We conclude that:
(1) hard group masking boundaries are not a limiting factor at inference,
(2) modifications to attention mechanisms likely require corresponding 
changes during training to be effective, and (3) the model's pretrained 
representations already capture sufficient cross-series relationships.
```

**Section: Future Work**
```
Future research should explore:
- Training Chronos models with soft group masking from scratch
- Alternative similarity metrics based on learned embeddings rather than raw values
- Adaptive temperature parameters learned per-dataset
- Ablation studies on different batch sizes and group configurations
```

---

### **This is a VALID research contribution!**

Negative results are valuable - you've shown that a seemingly reasonable hypothesis (soft masking should help) **does not hold in practice**. This saves other researchers time and provides insights into how Chronos-2 actually works.