# ----------------------------------------------------------------------------------------------------------------
# Model Misspecification Robustness Evaluation
# ----------------------------------------------------------------------------------------------------------------

"""
We evaluate the robustness of three dynamic-programming (DP) derived strategies to model error across 997 Polymarket price paths.

Core Question: How wrong can traders be about probabilities and price dynamics before strategies fail?

DP Strategies (tested under varying degrees of model error):
* Uniform Regime Policy: Conservative - assumes no predictable price dynamics
* Random Walk Regime Policy: Moderate - assumes momentum/drift optionality  
* Mean Reverting Regime Policy: Aggressive - assumes prices revert to fair value

Model Misspecification Stress Tests:
1. Probability Estimation Errors:
   * Test p_subj ∈ {0.1, 0.3, 0.5, 0.7, 0.9} representing varying confidence levels
   * Measure performance degradation vs |p_subj - true_outcome|
   
2. Regime Misclassification Errors:
   * Each strategy assumes different price dynamics (Uniform/RW/MR)
   * Measure LL_trans = average log-likelihood of the actual price transitions under the assumed regime
   
3. Extreme Case: Random Side Selection
   * Tests completely uninformative probability estimates
   * Represents worst-case scenario of no predictive ability
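
The probability-error axis in test 1 can be sketched as follows. This is a minimal illustration that
assumes outcomes are coded as 0/1; the helper name `probability_error` is hypothetical and not taken
from the notebook.

```python
P_SUBJ_GRID = (0.1, 0.3, 0.5, 0.7, 0.9)  # subjective probabilities listed in test 1

def probability_error(p_subj, outcome):
    # Absolute gap between the subjective probability and the realized 0/1 outcome.
    return abs(p_subj - outcome)

# Every error level reachable on this grid over both outcomes.
# Rounding only hides floating-point noise such as 0.30000000000000004.
reachable = sorted({round(probability_error(p, o), 10)
                    for p in P_SUBJ_GRID for o in (0, 1)})
print(reachable)
```

On this grid the reachable error levels coincide with the grid itself, so errors of exactly 0 or 1
are never simulated.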

Experimental Design:
* Each path simulated with initial wealth (W₀) = 1, contracts (x₀) = 0
* Variable horizons: the DP table is computed once for T_max; each path enters it at its own market horizon T_market
* For Mean Reverting: fair_value = p_subj (assumes market converges to consensus)
* Realistic frictions: 5% buy spread, 10% sell spread

Dataset Construction (Addressing Historical Bias):
* Problem: In the raw data, 90% of YES contracts resolve favorably, so a trivial "always bet YES" strategy looks profitable
* Solution: Per-market random contract sampling:
  - One contract (YES or NO) per market → independent observations
  - Random selection → outcomes balanced 50/50 in expectation, whatever the YES base rate
  - All real price paths, no synthetic data
* Result: 997 independent balanced paths enabling valid inference
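
A rough sketch of the per-market sampling step, under stated assumptions: the record fields
(`market_id`, `yes_path`, `no_path`, `yes_wins`) and the function name are hypothetical, and the toy
markets below are placeholders that only demonstrate the balancing effect. They are not the real
price paths used in the study.

```python
import random

def sample_one_contract_per_market(markets, seed=0):
    # Pick YES or NO with equal probability for each market (hypothetical record layout).
    rng = random.Random(seed)
    sampled = []
    for m in markets:
        side = rng.choice(["YES", "NO"])
        path = m["yes_path"] if side == "YES" else m["no_path"]
        # The sampled contract pays out if its own side won.
        wins = m["yes_wins"] if side == "YES" else not m["yes_wins"]
        sampled.append({"market_id": m["market_id"], "side": side,
                        "path": path, "outcome": int(wins)})
    return sampled

# Toy markets in which 90% of YES contracts resolve favorably, as in the raw data.
toy_markets = [{"market_id": i, "yes_path": [0.6, 0.7], "no_path": [0.4, 0.3],
                "yes_wins": i % 10 != 0} for i in range(10_000)]
sampled = sample_one_contract_per_market(toy_markets, seed=1)
win_share = sum(row["outcome"] for row in sampled) / len(sampled)
print(len(sampled), round(win_share, 3))
```

Because the side is chosen by a fair coin, the sampled contract wins with probability
0.5 × 0.9 + 0.5 × 0.1 = 0.5, independent of how skewed the YES base rate is.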

Robustness Metrics:
* Maximum Tolerable Error: Largest |p_subj - true_outcome| before strategy underperforms baseline
* Regime Sensitivity: Performance degradation slope vs LL_trans (how regime fit affects returns)
* Risk-Robustness Tradeoff: How risk of ruin increases with model errors
* Graceful Degradation Score: How smoothly performance declines with increasing errors
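
One possible reading of the Maximum Tolerable Error metric, sketched under assumptions: performance
is summarized per error level, the baseline is supplied by the caller (it is not defined here), and
the scan stops at the first level where the strategy falls below it. The function name and the
numbers are purely illustrative, not results.

```python
def max_tolerable_error(perf_by_error, baseline_by_error):
    # Walk error levels from small to large and keep the last level at which
    # the strategy still matches or beats the baseline.
    tolerable = None
    for err in sorted(perf_by_error):
        if perf_by_error[err] < baseline_by_error[err]:
            break
        tolerable = err
    return tolerable  # None means the strategy already fails at the smallest error

# Made-up terminal-wealth summaries, for illustration only.
example_perf = {0.1: 1.20, 0.3: 1.10, 0.5: 1.00, 0.7: 0.80, 0.9: 0.50}
example_baseline = {err: 1.00 for err in example_perf}
print(max_tolerable_error(example_perf, example_baseline))
```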

Key Analyses:
1. Error Tolerance Curves: Performance vs |p_subj - outcome| for each strategy
2. Regime Fit Sensitivity: Performance vs LL_trans for each strategy
3. Combined Error Surface: Performance over (LL_trans, |p_subj-outcome|) space
4. Worst-Case Protection: Minimum performance across all error scenarios

"""

# ----------------------------------------------------------------------------------------------------------------
# Model Error Robustness Surfaces
# ----------------------------------------------------------------------------------------------------------------

"""
Map strategy robustness to combined model errors in probability estimation and regime classification.

Error Dimensions:
1. Probability Error: |p_subj - true_outcome| ∈ [0, 1]
   - 0 = Perfect prediction, 0.5 = Uninformative estimate (p_subj = 0.5), 1 = Perfectly wrong
   
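2. Regime Fit Error: measured inversely by LL_trans = (1/T) × Σₜ log p(cₜ₊₁ | cₜ),
   where p is the transition probability under the assumed regime and T is the number of transitions
   - Higher LL_trans = better fit between assumed and actual price dynamics
   - Lower LL_trans = worse fit (more regime misclassification)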
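
The LL_trans formula can be sketched as below. The transition models are assumptions made for this
example only: prices are snapped to a discrete grid, Uniform spreads mass evenly, and Random Walk
and Mean Reverting use a discretized Gaussian step centered on cₜ or on cₜ + kappa × (fair_value - cₜ).
The parameters `sigma` and `kappa` are hypothetical and may not match the kernels behind the DP tables.

```python
import math

def transition_probs(c, grid, regime, fair_value=0.5, sigma=0.05, kappa=0.2):
    # Probability of each grid price at t+1 given price c at t (assumed kernels).
    if regime == "uniform":
        return [1.0 / len(grid)] * len(grid)
    if regime == "random_walk":
        center = c
    elif regime == "mean_reverting":
        center = c + kappa * (fair_value - c)
    else:
        raise ValueError("unknown regime: " + regime)
    weights = [math.exp(-0.5 * ((g - center) / sigma) ** 2) for g in grid]
    total = sum(weights)
    return [w / total for w in weights]

def ll_trans(path, grid, regime, **params):
    # Average log-probability of the observed transitions, (1/T) * sum_t log p(c_{t+1} | c_t).
    idx = [min(range(len(grid)), key=lambda k: abs(grid[k] - c)) for c in path]
    n_transitions = len(idx) - 1
    total = 0.0
    for t in range(n_transitions):
        probs = transition_probs(grid[idx[t]], grid, regime, **params)
        total += math.log(probs[idx[t + 1]])
    return total / n_transitions

price_grid = [k / 100 for k in range(1, 100)]  # 0.01 ... 0.99
toy_path = [0.30, 0.34, 0.38, 0.41]            # illustrative drift toward 0.5
for regime in ("uniform", "random_walk", "mean_reverting"):
    print(regime, round(ll_trans(toy_path, price_grid, regime), 3))
```

Under the Uniform assumption every transition has probability 1/K, so LL_trans equals -log K
for any path; the other regimes score higher only when the path moves the way they expect.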

Robustness Surface Analysis:
For each (probability_error, regime_fit) combination:
* Calculate each strategy's risk-adjusted performance
* Identify which strategy maximizes worst-case performance
* Map regions of strategy dominance under uncertainty

Robustness Interpretation:
The optimal strategy depends not on being right, but on how wrong we might be:
- When confident AND regime fits well → Mean Reverting maximizes returns
- When uncertain OR regime fits poorly → Uniform minimizes losses  
- Moderate uncertainty → Random Walk provides balance

Practical Application:
Given estimates of our prediction accuracy (from past performance) and regime fit confidence,
this surface tells us which strategy to use to maximize robustness to our inevitable model errors.

Methodological Note:
* 997 independent paths provide statistical power for error surface estimation
* Analysis accounts for within-path dependence across p_subj variations
* Surfaces estimated via kernel smoothing over empirical distributions
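
As an illustration of the kernel smoothing step, here is a minimal Nadaraya-Watson estimator over
the (LL_trans, probability error) plane. The Gaussian product kernel, the bandwidths, and the
function name are assumptions made for this sketch; the smoother actually used may differ.

```python
import math

def kernel_smooth(points, values, query, bandwidths):
    # Weighted average of observed performance, with weights that decay with
    # distance from the query point in (LL_trans, probability error) space.
    num = 0.0
    den = 0.0
    for (ll, err), value in zip(points, values):
        z_ll = (ll - query[0]) / bandwidths[0]
        z_err = (err - query[1]) / bandwidths[1]
        weight = math.exp(-0.5 * (z_ll ** 2 + z_err ** 2))
        num += weight * value
        den += weight
    return num / den if den > 0 else float("nan")

# Two made-up observations equidistant from the query point.
demo_points = [(-3.0, 0.1), (-2.0, 0.1)]
demo_values = [1.0, 3.0]
print(kernel_smooth(demo_points, demo_values, query=(-2.5, 0.1), bandwidths=(0.5, 0.2)))
```

Evaluating this on a grid of query points for each strategy would give one surface per strategy;
bandwidth choice controls how much local noise is averaged away.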
"""