# Testing Optional Early Stopping

This notebook demonstrates the new optional early stopping feature in ICOE using the California Housing dataset.
We will run two scenarios:
1. **Validation of Early Stopping (Default)**: Show how optimization stops when no improvement is found.
2. **Forced Full Run (Disabled)**: Show how optimization continues even without improvement.

**Note**: Real optimization runs usually keep improving, so to make early stopping likely to trigger we use a strict tolerance (`0.0`) and only a few trials per phase. This makes it likely that a later phase fails to beat the global best, which ends the run.
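Everything in this notebook depends on how the tolerance is compared against the global best. Below is a minimal sketch of one plausible rule. It assumes ICOE treats `early_stopping_tolerance` as the amount of degradation allowed relative to the global best, which matches the "must beat global best to continue if minimizing" comment in Scenario 1. `should_continue` is a hypothetical name and not part of ICOE's API. The example numbers come from Phase 2 of Scenario 1's log.

```python
def should_continue(phase_best: float, global_best: float,
                    tolerance: float = 0.0, minimize: bool = True) -> bool:
    """Hypothetical stopping rule, NOT ICOE's actual implementation.

    Assumes `tolerance` is how much worse than the global best a phase
    may score before optimization stops.
    """
    if minimize:
        # Lower is better (e.g. RMSE): the phase must stay strictly below
        # global_best + tolerance to keep going.
        return phase_best < global_best + tolerance
    # Higher is better (e.g. R^2): mirror the comparison.
    return phase_best > global_best - tolerance


# Phase 2 of Scenario 1 scored 0.3788 against a global best of 0.3656.
print(should_continue(0.3788, 0.3656, tolerance=0.0))   # False: stop
print(should_continue(0.3788, 0.3656, tolerance=0.02))  # True: within tolerance
```

With `tolerance=0.0`, any phase that fails to beat the global best ends the run. A positive tolerance gives later phases some slack.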

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from icoe.estimator import ICOERegressor

# Load Real Data
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Use a small subset for speed
X = X.iloc[:1000]
y = y[:1000]

print(f"Dataset shape: {X.shape}")

Dataset shape: (1000, 8)


## Scenario 1: Early Stopping Enabled (Default)

We run with `early_stopping=True`. We request `n_phases=10` but expect the run to stop earlier as soon as a phase fails to beat the global best score.
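To see why the run can end after only a phase or two, here is a hypothetical replay of the phase loop over a fixed list of per-phase best scores. It assumes that when `early_stopping=True`, a phase whose best score is not strictly below `global_best + tolerance` still counts as run but ends the optimization. `replay_phases` and the score values are illustrative, not ICOE internals.

```python
from typing import Sequence


def replay_phases(phase_scores: Sequence[float], early_stopping: bool,
                  tolerance: float = 0.0) -> int:
    """Return how many phases would run (lower score = better).

    Hypothetical sketch of the phase loop, not ICOE's implementation.
    """
    global_best = float("inf")
    phases_run = 0
    for score in phase_scores:
        phases_run += 1  # the phase runs before its score is judged
        if early_stopping and score >= global_best + tolerance:
            break  # no sufficient improvement: stop after this phase
        global_best = min(global_best, score)
    return phases_run


# Illustrative per-phase best RMSE values.
scores = [0.3656, 0.3788, 0.3500, 0.3400]
print(replay_phases(scores, early_stopping=True))   # 2
print(replay_phases(scores, early_stopping=False))  # 4
```

Under this rule, the later improvements at phases 3 and 4 are never reached when early stopping is on. That is the trade-off Scenario 2 makes visible.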

In [2]:
print("--- Scenario 1: Early Stopping ENABLED ---")

est_stop = ICOERegressor(
    objective='regression',
    metric='rmse',
    n_phases=10,        # Request 10 phases
    n_trials=5,        # Few trials per phase to limit chance of improvement
    n_jobs=1,
    verbose=1,
    early_stopping=True, # ENABLED
    early_stopping_tolerance=0.0 # Strict: Must beat global best to continue if minimizing
)

est_stop.fit(X, y)

# Check how many phases actually ran
phases_run = set()
for t in est_stop.history_.get_trials():
    phases_run.add(t['phase_id'])
    
print(f"Requested Phases: {est_stop.n_phases}")
print(f"Actual Phases Run: {len(phases_run)}")

if len(phases_run) < est_stop.n_phases:
    print("SUCCESS: Early stopping triggered.")
else:
    print("NOTE: Optimization improved enough to run all phases (or just got lucky).")

--- Scenario 1: Early Stopping ENABLED ---
[ICOE] Phase 1/10 Start. Active Features: 8
[ICOE] Phase 1 Best Score: 0.3656 (Global Best: 0.3656)
[ICOE] Phase 2/10 Start. Active Features: 7
[ICOE] Phase 2 Best Score: 0.3788 (Global Best: 0.3656)
[ICOE] Early Stopping: Performance degraded. Stopping.
[ICOE] Optimization Complete. Refitting best model...
Requested Phases: 10
Actual Phases Run: 2
SUCCESS: Early stopping triggered.
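The phase-counting loop in the cell above is repeated in Scenario 2, so it could be factored into a small helper. This sketch assumes only what the cell already relies on: each trial record returned by `history_.get_trials()` is a mapping with a `'phase_id'` key. `count_phases` is a hypothetical helper, not part of ICOE.

```python
from typing import Any, Iterable, Mapping


def count_phases(trials: Iterable[Mapping[str, Any]]) -> int:
    """Count the distinct phase IDs among trial records."""
    return len({t["phase_id"] for t in trials})


# Illustrative records shaped like the trial dicts used above (other keys omitted).
trials = [{"phase_id": 0}, {"phase_id": 0}, {"phase_id": 1}]
print(count_phases(trials))  # 2
```

In the notebook this would be called as `count_phases(est_stop.history_.get_trials())`. Only distinct IDs matter, so it does not depend on whether phase IDs start at 0 or 1.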


## Scenario 2: Early Stopping Disabled

We run with `early_stopping=False`. Even if performance plateaus or degrades, the estimator should run all requested phases.
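Because this scenario never stops early, its log is a useful check on the bookkeeping: the "Global Best" column should be the running minimum of the per-phase RMSE. This minimal sketch recomputes it from the nine fully logged phase scores in the output below and lists the phases that set a new best.

```python
from itertools import accumulate

# Per-phase best RMSE for phases 1-9, copied from the Scenario 2 log
# (phase 10 is cut off in the captured output).
phase_scores = [0.3619, 0.4195, 0.4762, 0.3592, 0.4196,
                0.4026, 0.3533, 0.3780, 0.4022]

# Running minimum: the best RMSE seen up to and including each phase.
global_best = list(accumulate(phase_scores, min))

# Phases (1-based) that strictly beat the previous global best.
improved = [1] + [i + 1 for i in range(1, len(phase_scores))
                  if phase_scores[i] < global_best[i - 1]]

print(global_best)
print(improved)  # [1, 4, 7]
```

If the strict rule from Scenario 1 had applied here, this run would have ended at phase 2, even though phases 4 and 7 later set new global bests.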

In [3]:
print("\n--- Scenario 2: Early Stopping DISABLED ---")

est_full = ICOERegressor(
    objective='regression',
    metric='rmse',
    n_phases=10,        # Request 10 phases
    n_trials=5,
    n_jobs=1,
    verbose=1,
    early_stopping=False, # DISABLED
    early_stopping_tolerance=0.0
)

est_full.fit(X, y)

# Check how many phases actually ran
phases_run_full = set()
for t in est_full.history_.get_trials():
    phases_run_full.add(t['phase_id'])
    
print(f"Requested Phases: {est_full.n_phases}")
print(f"Actual Phases Run: {len(phases_run_full)}")

if len(phases_run_full) == est_full.n_phases:
    print("SUCCESS: Ran all phases despite potential stagnation.")
else:
    print("FAILURE: Stopped early unexpectedly.")


--- Scenario 2: Early Stopping DISABLED ---
[ICOE] Phase 1/10 Start. Active Features: 8
[ICOE] Phase 1 Best Score: 0.3619 (Global Best: 0.3619)
[ICOE] Phase 2/10 Start. Active Features: 8
[ICOE] Phase 2 Best Score: 0.4195 (Global Best: 0.3619)
[ICOE] Phase 3/10 Start. Active Features: 7
[ICOE] Phase 3 Best Score: 0.4762 (Global Best: 0.3619)
[ICOE] Phase 4/10 Start. Active Features: 8
[ICOE] Phase 4 Best Score: 0.3592 (Global Best: 0.3592)
[ICOE] Phase 5/10 Start. Active Features: 8
[ICOE] Phase 5 Best Score: 0.4196 (Global Best: 0.3592)
[ICOE] Phase 6/10 Start. Active Features: 8
[ICOE] Phase 6 Best Score: 0.4026 (Global Best: 0.3592)
[ICOE] Phase 7/10 Start. Active Features: 8
[ICOE] Phase 7 Best Score: 0.3533 (Global Best: 0.3533)
[ICOE] Phase 8/10 Start. Active Features: 8
[ICOE] Phase 8 Best Score: 0.3780 (Global Best: 0.3533)
[ICOE] Phase 9/10 Start. Active Features: 8
[ICOE] Phase 9 Best Score: 0.4022 (Global Best: 0.3533)
[ICOE] Phase 10/10 Start. Active Features: 8
[ICOE] Pha