# TPS Dec 2021 — Plan and Experiment Log

Goal: WIN A MEDAL (>= 0.95658 accuracy on LB).

Approach:
- Baseline: LightGBM multiclass with stratified 5-fold CV, robust seed control, early stopping.
- Features: start with raw features; handle Id if present; map target to 0-based for LGBM then back.
- Validation: 5-fold StratifiedKFold; track per-fold accuracy and OOF accuracy.
- Inference: average fold probs; argmax; write submission.csv.

Next Steps:
1) Load data, inspect shapes and target distribution.
2) Train LGBM baseline; log fold timings and accuracy.
3) If CV < 0.955, iterate: tune num_leaves, max_depth, feature_fraction, learning_rate; try more folds, seeds, or XGBoost/CatBoost, and simple feature interactions.
4) Ensembling/seeds if needed to push over medal threshold.

Experiment Log:
- [TBD] Exp01: LGBM 5F baseline.

In [2]:
import time, pandas as pd
print("Sanity check: starting quick I/O test...", flush=True)
t0 = time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
print(f"Loaded. train: {train.shape}, test: {test.shape}", flush=True)
print("Train columns (first 10):", list(train.columns[:10]))
print("Target value_counts (top):\n", train['Cover_Type'].value_counts().sort_index().head(10))
print(f"Done quick I/O test in {time.time()-t0:.2f}s", flush=True)

Sanity check: starting quick I/O test...


Loaded. train: (3600000, 56), test: (400000, 55)


Train columns (first 10): ['Id', 'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways', 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm']
Target value_counts (top):
 Cover_Type
1    1320866
2    2036254
3     176184
4        333
5          1
6      10237
7      56125
Name: count, dtype: int64
Done quick I/O test in 12.00s


In [12]:
import sys, subprocess, time, os, math, traceback, gc
t0_all = time.time()

def ensure(pkg):
    try:
        __import__(pkg)
        print(f"[ok] {pkg} already installed")
    except ImportError:
        print(f"[install] {pkg} ...", flush=True)
        subprocess.check_call([sys.executable, "-m", "pip", "install", pkg, "-q"])
        __import__(pkg)

# Ensure key packages
ensure('numpy'); ensure('pandas'); ensure('lightgbm')
import numpy as np, pandas as pd, lightgbm as lgb

from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit, KFold
from sklearn.metrics import accuracy_score

def downcast_df(df: pd.DataFrame) -> pd.DataFrame:
    for c in df.columns:
        if pd.api.types.is_integer_dtype(df[c]):
            df[c] = pd.to_numeric(df[c], downcast='integer')
        elif pd.api.types.is_float_dtype(df[c]):
            df[c] = pd.to_numeric(df[c], downcast='float')
    return df

def fast_onehot_to_cat(df: pd.DataFrame, prefix: str, start_at_one: bool = True):
    cols = [c for c in df.columns if c.startswith(prefix)]
    if not cols:
        return None, None
    # Sort by numeric suffix to ensure correct order
    def suf(c):
        try:
            return int(c.split('_')[-1])
        except Exception:
            return c
    cols = sorted(cols, key=suf)
    arr = df[cols].to_numpy(copy=False)
    if arr.dtype != np.int8:
        arr = arr.astype(np.int8, copy=False)
    cat = arr.argmax(axis=1).astype(np.int16)
    if start_at_one:
        cat = (cat + 1).astype(np.int16)
    cat_sum = arr.sum(axis=1).astype(np.int16)
    df.drop(columns=cols, inplace=True)
    return cat, cat_sum

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    Hhyd = 'Horizontal_Distance_To_Hydrology'
    Vhyd = 'Vertical_Distance_To_Hydrology'
    elev = 'Elevation'
    hs9, hs12, hs3 = 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm'
    aspect = 'Aspect'
    Hroad = 'Horizontal_Distance_To_Roadways'
    Hfire = 'Horizontal_Distance_To_Fire_Points'

    # Core engineered features
    if all(col in df.columns for col in [Hhyd, Vhyd]):
        df['Euclidean_Distance_To_Hydrology'] = np.sqrt(df[Hhyd]**2 + df[Vhyd]**2)
        df['Manhattan_Distance_To_Hydrology'] = np.abs(df[Hhyd]) + np.abs(df[Vhyd])
        df['Elevation_VD_Hydrology'] = df['Elevation'] - df[Vhyd]
        df['Elevation_Plus_VD_Hydrology'] = df['Elevation'] + df[Vhyd]
        # Extra FE per expert
        df['Elevation_minus_Euclidean_Dist_Hydrology'] = df['Elevation'] - df['Euclidean_Distance_To_Hydrology']
        df['Hydro_Ratio'] = df[Hhyd] / (df[Vhyd].abs() + 1.0)
        # Hydrology angle features
        ang = np.arctan2(df[Vhyd].astype(float), df[Hhyd].astype(float))
        df['Hydro_Angle_Sin'] = np.sin(ang)
        df['Hydro_Angle_Cos'] = np.cos(ang)
    else:
        df['Euclidean_Distance_To_Hydrology'] = 0.0
        df['Manhattan_Distance_To_Hydrology'] = 0.0
        df['Elevation_VD_Hydrology'] = 0.0
        df['Elevation_Plus_VD_Hydrology'] = 0.0
        df['Elevation_minus_Euclidean_Dist_Hydrology'] = 0.0
        df['Hydro_Ratio'] = 0.0
        df['Hydro_Angle_Sin'] = 0.0
        df['Hydro_Angle_Cos'] = 0.0

    if all(col in df.columns for col in [hs9, hs12, hs3]):
        df['Hillshade_Mean'] = (df[hs9] + df[hs12] + df[hs3]) / 3.0
        df['Hillshade_Min'] = df[[hs9, hs12, hs3]].min(axis=1)
        df['Hillshade_Max'] = df[[hs9, hs12, hs3]].max(axis=1)
        df['Hillshade_Range'] = df['Hillshade_Max'] - df['Hillshade_Min']
        df['Hillshade_Diff_9_3'] = df[hs9] - df[hs3]
    else:
        df['Hillshade_Mean'] = 0.0
        df['Hillshade_Min'] = 0.0
        df['Hillshade_Max'] = 0.0
        df['Hillshade_Range'] = 0.0
        df['Hillshade_Diff_9_3'] = 0.0

    if aspect in df.columns:
        rad = np.deg2rad(df[aspect].astype(float))
        df['Aspect_Sin'] = np.sin(rad)
        df['Aspect_Cos'] = np.cos(rad)
    else:
        df['Aspect_Sin'] = 0.0
        df['Aspect_Cos'] = 0.0

    # Distance interactions
    if Hroad in df.columns and Hfire in df.columns:
        df['Road_Fire_AbsDiff'] = np.abs(df[Hroad] - df[Hfire])
    else:
        df['Road_Fire_AbsDiff'] = 0.0
    if Hhyd in df.columns and Hroad in df.columns:
        df['Hydro_Road_AbsDiff'] = np.abs(df[Hhyd] - df[Hroad])
    else:
        df['Hydro_Road_AbsDiff'] = 0.0
    if Hhyd in df.columns and Hfire in df.columns:
        df['Hydro_Fire_AbsDiff'] = np.abs(df[Hhyd] - df[Hfire])
    else:
        df['Hydro_Fire_AbsDiff'] = 0.0

    # Compress Wilderness_Area and Soil_Type
    w_cat, w_sum = fast_onehot_to_cat(df, 'Wilderness_Area', start_at_one=True)
    if w_cat is not None:
        df['Wilderness_Area_cat'] = w_cat
        df['Wilderness_Area_Sum'] = w_sum
    else:
        df['Wilderness_Area_cat'] = 0
        df['Wilderness_Area_Sum'] = 0
    s_cat, s_sum = fast_onehot_to_cat(df, 'Soil_Type', start_at_one=True)
    if s_cat is not None:
        df['Soil_Type_cat'] = s_cat
        df['Soil_Type_Sum'] = s_sum
    else:
        df['Soil_Type_cat'] = 0
        df['Soil_Type_Sum'] = 0

    # Additional categorical interaction
    df['Soil_Wilderness_Interaction'] = (df['Soil_Type_cat'].astype(np.int32) * 100 + df['Wilderness_Area_cat'].astype(np.int32)).astype(np.int32)
    # Elevation binned (coarse)
    try:
        df['Elevation_Binned'] = pd.cut(df['Elevation'], bins=30, labels=False).astype('float32').fillna(-1).astype('int16')
    except Exception:
        df['Elevation_Binned'] = -1

    return downcast_df(df)

def compute_class_weights(y_arr, max_w=10.0):
    # y_arr expected 0..6 (may be missing some classes in subset)
    counts = np.bincount(y_arr, minlength=7).astype(np.float64)
    inv = np.zeros_like(counts)
    nonzero = counts > 0
    inv[nonzero] = 1.0 / counts[nonzero]
    inv[~nonzero] = 0.0
    # normalize to mean 1 over present classes
    if nonzero.any():
        inv = inv * (nonzero.sum() / inv[nonzero].sum())
    inv = np.clip(inv, 0.0, max_w)
    return inv

try:
    print("Loading data...", flush=True)
    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')
    print(f"train shape: {train.shape}, test shape: {test.shape}")

    # Optional DEV subset
    SEED = 42
    DEV_N = int(os.environ.get('DEV_N', '200000'))
    if DEV_N and DEV_N < len(train):
        print(f"Subsampling train to {DEV_N} rows for fast dev...", flush=True)
        y_full = train['Cover_Type']
        if y_full.value_counts().min() >= 2:
            try:
                sss = StratifiedShuffleSplit(n_splits=1, test_size=len(train)-DEV_N, random_state=SEED)
                for keep_idx, _ in sss.split(train, y_full):
                    train = train.iloc[keep_idx].reset_index(drop=True)
                    break
            except Exception as e:
                print(f"Stratified subsample failed ({e}); falling back to random sample.")
                train = train.sample(n=DEV_N, random_state=SEED).reset_index(drop=True)
        else:
            train = train.sample(n=DEV_N, random_state=SEED).reset_index(drop=True)
        print(f"New train shape: {train.shape}")

    # Identify Id column if present
    id_col = None
    for c in ['Id', 'id', 'ID']:
        if c in train.columns:
            id_col = c; break

    target_col = 'Cover_Type'
    assert target_col in train.columns, 'Target Cover_Type not found'

    # Map target 1..7 -> 0..6
    y = (train[target_col].astype(int) - 1)
    print("y class counts:")
    print(y.value_counts().sort_index())

    print('Engineering & downcasting features...', flush=True)
    t_feat = time.time()
    train_fe = add_features(train.copy())
    gc.collect()
    test_fe = add_features(test.copy())
    gc.collect()
    print(f"Feature engineering done in {time.time()-t_feat:.1f}s", flush=True)

    # Build feature list
    drop_cols = [c for c in [target_col, id_col] if c is not None]
    features = [c for c in train_fe.columns if c not in drop_cols]
    print(f"Using {len(features)} features")

    X = train_fe[features]
    X_test = test_fe[features]

    # CV setup: only use StratifiedKFold if ALL 7 classes present in subset
    N_SPLITS = 3
    present_classes = np.unique(y.values)
    if len(present_classes) == 7 and (y.value_counts().min() >= N_SPLITS):
        print("Using StratifiedKFold", flush=True)
        splitter = StratifiedKFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)
        split_iter = splitter.split(X, y)
    else:
        print(f"Using KFold (unique classes in subset={len(present_classes)})", flush=True)
        splitter = KFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)
        split_iter = splitter.split(X)

    # LightGBM CPU params (dev-fast)
    lgb_params = dict(
        objective='multiclass',
        num_class=7,
        metric='multi_logloss',
        learning_rate=0.08,
        num_leaves=256,
        max_depth=12,
        min_data_in_leaf=80,
        feature_fraction=0.9,
        bagging_fraction=0.9,
        bagging_freq=1,
        lambda_l1=0.0,
        lambda_l2=0.0,
        max_bin=255,
        n_estimators=2000,
        random_state=SEED,
        n_jobs=-1,
        force_row_wise=True
    )

    oof_probs = np.zeros((len(train), 7), dtype=np.float32)
    test_probs = np.zeros((len(test), 7), dtype=np.float32)
    fold_accuracies = []

    # Toggle weights for dev (disable to ensure splits form)
    USE_WEIGHTS = False
    cls_w = compute_class_weights(y.values, max_w=10.0) if USE_WEIGHTS else None
    if USE_WEIGHTS:
        print("Class weights:", {i: float(w) for i, w in enumerate(cls_w)})

    for fold, (tr_idx, va_idx) in enumerate(split_iter, 1):
        t0 = time.time()
        print(f"\n[Fold {fold}/{N_SPLITS}] train={len(tr_idx)} valid={len(va_idx)}", flush=True)
        X_tr, X_va = X.iloc[tr_idx], X.iloc[va_idx]
        y_tr, y_va = y.iloc[tr_idx], y.iloc[va_idx]

        sw = (cls_w[y_tr.values] if USE_WEIGHTS else None)

        model = lgb.LGBMClassifier(**lgb_params)
        model.fit(
            X_tr, y_tr,
            sample_weight=sw,
            eval_set=[(X_va, y_va)],
            eval_metric='multi_logloss',
            callbacks=[lgb.early_stopping(stopping_rounds=100, verbose=True)]
        )

        va_proba = model.predict_proba(X_va, raw_score=False)
        oof_probs[va_idx] = va_proba
        va_pred = np.argmax(va_proba, axis=1)
        acc = accuracy_score(y_va, va_pred)
        fold_accuracies.append(acc)
        print(f"Fold {fold} ACC: {acc:.6f} | best_iter: {model.best_iteration_} | elapsed: {time.time()-t0:.1f}s", flush=True)

        test_probs += model.predict_proba(X_test) / N_SPLITS

    oof_pred = np.argmax(oof_probs, axis=1)
    oof_acc = accuracy_score(y, oof_pred)
    print(f"\nOOF accuracy: {oof_acc:.6f}")
    print("Per-fold ACC:", ', '.join(f"{a:.6f}" for a in fold_accuracies))
    print(f"Total elapsed: {time.time()-t0_all:.1f}s")

    # Make submission
    sub = pd.DataFrame()
    sub['Id'] = test[id_col] if id_col is not None else np.arange(len(test))
    sub['Cover_Type'] = np.argmax(test_probs, axis=1) + 1  # back to 1..7
    sub_path = 'submission.csv'
    sub.to_csv(sub_path, index=False)
    print(f"Saved submission to {sub_path} with shape {sub.shape}")
except Exception as e:
    print("ERROR during run:", e)
    traceback.print_exc()

[ok] numpy already installed
[ok] pandas already installed
[ok] lightgbm already installed
Loading data...


train shape: (3600000, 56), test shape: (400000, 55)
Subsampling train to 200000 rows for fast dev...


New train shape: (200000, 56)
y class counts:
Cover_Type
0     73824
1    112812
2      9709
3        15
5       553
6      3087
Name: count, dtype: int64
Engineering & downcasting features...


Feature engineering done in 1.0s


Using 34 features
Using KFold (unique classes in subset=6)


Class weights: {0: 0.0011791827472196262, 1: 0.0007716553835650611, 2: 0.008966112589426478, 3: 5.803465808716112, 4: 0.0, 5: 0.15741769824727248, 6: 0.02819954231640482}

[Fold 1/3] train=133333 valid=66667




[LightGBM] [Info] Total Bins 6770
[LightGBM] [Info] Number of data points in the train set: 133333, number of used features: 34
[LightGBM] [Info] Start training from score -1.789562
[LightGBM] [Info] Start training from score -1.787947
[LightGBM] [Info] Start training from score -1.798613
[LightGBM] [Info] Start training from score -1.789196
[LightGBM] [Info] Start training from score -1.788293
[LightGBM] [Info] Start training from score -1.797001
Training until validation scores don't improve for 100 rounds




































































































































































































































































































































































































































































Did not meet early stopping. Best iteration is:
[3994]	valid_0's multi_logloss: 0.155231




ERROR during run: shape mismatch: value array of shape (66667,6) could not be broadcast to indexing result of shape (66667,7)


Traceback (most recent call last):
  File "/tmp/ipykernel_237/2958152172.py", line 270, in <module>
    oof_probs[va_idx] = va_proba
    ~~~~~~~~~^^^^^^^^
ValueError: shape mismatch: value array of shape (66667,6) could not be broadcast to indexing result of shape (66667,7)


In [13]:
import os, gc, json, pandas as pd, numpy as np, psutil, shutil, time
print("Post-run probe: checking OOF metrics and submission file...", flush=True)

def show(var):
    if var in globals():
        val = globals()[var]
        try:
            if isinstance(val, (list, tuple, np.ndarray)):
                print(f"- {var}: type={type(val).__name__}, len/shape={len(val) if hasattr(val,'__len__') else 'NA'}/{getattr(val,'shape', 'NA')}")
            else:
                print(f"- {var}: {val}")
        except Exception:
            print(f"- {var}: type={type(val).__name__}")
        return True
    else:
        print(f"- {var}: <not in globals>")
        return False

checked = []
for v in ['oof_acc','fold_accuracies','features','N_SPLITS','SEED','train','test','y','oof_probs','test_probs']:
    checked.append(show(v))

if 'y' in globals():
    try:
        vc = y.value_counts().sort_index()
        print("y value_counts (sorted):\n", vc)
    except Exception as e:
        print("Could not show y value_counts:", e)

sub_path = 'submission.csv'
if os.path.exists(sub_path):
    sub = pd.read_csv(sub_path)
    print(f"submission.csv exists. shape={sub.shape}; Cover_Type head:")
    print(sub.head())
    print("Cover_Type distribution (top 10):\n", sub['Cover_Type'].value_counts().sort_index().head(10))
else:
    print("submission.csv not found.")

# Quick disk/mem stats
total, used, free = shutil.disk_usage('/')
print(f"Disk free: {free/1e9:.1f} GB of {total/1e9:.1f} GB")
mem = psutil.virtual_memory()
print(f"RAM used: {mem.used/1e9:.1f} GB / {mem.total/1e9:.1f} GB")
gc.collect()
print("Probe done.", flush=True)

Post-run probe: checking OOF metrics and submission file...


- oof_acc: 0.36912
- fold_accuracies: type=list, len/shape=0/NA
- features: type=list, len/shape=34/NA
- N_SPLITS: 3
- SEED: 42
- train:              Id  Elevation  Aspect  Slope  Horizontal_Distance_To_Hydrology  \
0        379246       2893     112      7                               250   
1         70624       3254      35      8                               133   
2       1699813       2598      32      3                               207   
3        258430       3260      90     12                               424   
4       2721920       3355     280     18                               143   
...         ...        ...     ...    ...                               ...   
199995  1817267       2705     102     16                               176   
199996   702296       2997     102     24                               205   
199997  1383413       3342     281     27                               354   
199998  1982758       3010     158     27                               1

In [10]:
import numpy as np, pandas as pd, os, time
print("Computing OOF accuracy from existing oof_probs/y and saving submission from test_probs...", flush=True)
t0 = time.time()
if 'oof_probs' in globals() and 'y' in globals():
    oof_pred = np.argmax(oof_probs, axis=1)
    # y is 0-based already
    from sklearn.metrics import accuracy_score
    oof_acc = accuracy_score(y, oof_pred)
    print(f"OOF accuracy (recomputed): {oof_acc:.6f}")
else:
    print("oof_probs or y missing; cannot compute OOF.")

sub_path = 'submission.csv'
if 'test_probs' in globals() and 'test' in globals():
    # Determine ID column
    id_col = None
    for c in ['Id','id','ID']:
        if c in test.columns:
            id_col = c; break
    sub = pd.DataFrame()
    sub['Id'] = test[id_col] if id_col is not None else np.arange(len(test_probs))
    sub['Cover_Type'] = np.argmax(test_probs, axis=1) + 1  # back to 1..7
    sub.to_csv(sub_path, index=False)
    print(f"Saved submission to {sub_path} with shape {sub.shape}")
else:
    print("test_probs or test missing; cannot write submission.")
print(f"Done in {time.time()-t0:.2f}s", flush=True)

Computing OOF accuracy from existing oof_probs/y and saving submission from test_probs...


OOF accuracy (recomputed): 0.369120


Saved submission to submission.csv with shape (400000, 2)
Done in 0.29s


In [19]:
import sys, subprocess, time, os, gc, traceback
t0 = time.time()
print("Setting up GPU XGBoost and running 3-fold CV on DEV subset (200k)...", flush=True)

def pip_install(pkg):
    try:
        __import__(pkg.split('==')[0])
        print(f"[ok] {pkg} present")
    except Exception:
        print(f"[install] {pkg}")
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', pkg, '-q'])

try:
    pip_install('xgboost==2.0.3')
    import numpy as np, pandas as pd, xgboost as xgb
    from sklearn.model_selection import KFold, StratifiedShuffleSplit
    from sklearn.metrics import accuracy_score

    # Load data fresh
    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')

    # DEV subset 200k with stratified attempt else random
    SEED = 42
    DEV_N = int(os.environ.get('DEV_N', '200000'))
    y_full = train['Cover_Type']
    if DEV_N and DEV_N < len(train):
        if y_full.value_counts().min() >= 2:
            try:
                sss = StratifiedShuffleSplit(n_splits=1, test_size=len(train)-DEV_N, random_state=SEED)
                for keep_idx, _ in sss.split(train, y_full):
                    train = train.iloc[keep_idx].reset_index(drop=True)
                    break
            except Exception:
                train = train.sample(n=DEV_N, random_state=SEED).reset_index(drop=True)
        else:
            train = train.sample(n=DEV_N, random_state=SEED).reset_index(drop=True)

    id_col = 'Id' if 'Id' in train.columns else None
    y = (train['Cover_Type'].astype(int) - 1).astype(np.int32)

    # Self-contained feature engineering to avoid cross-cell function conflicts
    def fe(df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()
        Hhyd = 'Horizontal_Distance_To_Hydrology'; Vhyd = 'Vertical_Distance_To_Hydrology'
        hs9, hs12, hs3 = 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm'
        aspect = 'Aspect'; elev = 'Elevation'
        Hroad='Horizontal_Distance_To_Roadways'; Hfire='Horizontal_Distance_To_Fire_Points'

        if all(c in df.columns for c in [Hhyd, Vhyd]):
            df['Euclidean_Distance_To_Hydrology'] = np.sqrt(df[Hhyd]**2 + df[Vhyd]**2)
            df['Manhattan_Distance_To_Hydrology'] = np.abs(df[Hhyd]) + np.abs(df[Vhyd])
            df['Elevation_VD_Hydrology'] = df[elev] - df[Vhyd]
            df['Elevation_Plus_VD_Hydrology'] = df[elev] + df[Vhyd]
            df['Elevation_minus_Euclidean_Dist_Hydrology'] = df[elev] - df['Euclidean_Distance_To_Hydrology']
            df['Hydro_Ratio'] = df[Hhyd] / (df[Vhyd].abs() + 1.0)
            ang = np.arctan2(df[Vhyd].astype(float), df[Hhyd].astype(float))
            df['Hydro_Angle_Sin'] = np.sin(ang); df['Hydro_Angle_Cos'] = np.cos(ang)

        if all(c in df.columns for c in [hs9, hs12, hs3]):
            df['Hillshade_Mean'] = (df[hs9] + df[hs12] + df[hs3]) / 3.0
            df['Hillshade_Min'] = df[[hs9, hs12, hs3]].min(axis=1)
            df['Hillshade_Max'] = df[[hs9, hs12, hs3]].max(axis=1)
            df['Hillshade_Range'] = df['Hillshade_Max'] - df['Hillshade_Min']
            df['Hillshade_Diff_9_3'] = df[hs9] - df[hs3]

        if aspect in df.columns:
            rad = np.deg2rad(df[aspect].astype(float))
            df['Aspect_Sin'] = np.sin(rad); df['Aspect_Cos'] = np.cos(rad)

        if Hroad in df.columns and Hfire in df.columns:
            df['Road_Fire_AbsDiff'] = np.abs(df[Hroad] - df[Hfire])
        if Hhyd in df.columns and Hroad in df.columns:
            df['Hydro_Road_AbsDiff'] = np.abs(df[Hhyd] - df[Hroad])
        if Hhyd in df.columns and Hfire in df.columns:
            df['Hydro_Fire_AbsDiff'] = np.abs(df[Hhyd] - df[Hfire])

        # Compress Wilderness_Area one-hots (lexicographic order is fine for argmax index)
        w_cols = [c for c in df.columns if c.startswith('Wilderness_Area')]
        if w_cols:
            w_cols_sorted = sorted(w_cols)
            warr = df[w_cols_sorted].to_numpy(dtype=np.int8, copy=False)
            w_cat = warr.argmax(axis=1).astype(np.int16) + 1
            df['Wilderness_Area_cat'] = w_cat
            df.drop(columns=w_cols_sorted, inplace=True)
        else:
            df['Wilderness_Area_cat'] = 0

        # Compress Soil_Type one-hots
        s_cols = [c for c in df.columns if c.startswith('Soil_Type')]
        if s_cols:
            s_cols_sorted = sorted(s_cols)
            sarr = df[s_cols_sorted].to_numpy(dtype=np.int8, copy=False)
            s_cat = sarr.argmax(axis=1).astype(np.int16) + 1
            df['Soil_Type_cat'] = s_cat
            df.drop(columns=s_cols_sorted, inplace=True)
        else:
            df['Soil_Type_cat'] = 0

        # Interaction and binned elevation
        df['Soil_Wilderness_Interaction'] = (df['Soil_Type_cat'].astype(np.int32)*100 + df['Wilderness_Area_cat'].astype(np.int32)).astype(np.int32)
        try:
            df['Elevation_Binned'] = pd.cut(df[elev], bins=30, labels=False).astype('float32').fillna(-1).astype('int16')
        except Exception:
            df['Elevation_Binned'] = -1

        # Downcast numerics to float32 for XGB
        for c in df.columns:
            if c == 'Cover_Type' or c == 'Id':
                continue
            if pd.api.types.is_float_dtype(df[c]) or pd.api.types.is_integer_dtype(df[c]):
                df[c] = df[c].astype(np.float32)
        return df

    train_fe = fe(train)
    test_fe = fe(test)

    # Build feature list (drop Id and target)
    drop_cols = [c for c in ['Cover_Type', id_col] if c is not None]
    features = [c for c in train_fe.columns if c not in drop_cols]
    X = train_fe[features].astype(np.float32)
    X_test = test_fe[features].astype(np.float32)

    print(f"X shape: {X.shape}, X_test: {X_test.shape}, features: {len(features)}", flush=True)

    # XGBoost params (GPU)
    xgb_params = {
        'objective': 'multi:softprob',
        'num_class': 7,
        'tree_method': 'gpu_hist',
        'predictor': 'gpu_predictor',
        'learning_rate': 0.06,
        'max_depth': 8,
        'min_child_weight': 8,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
        'lambda': 1.0,
        'alpha': 0.0,
        'eval_metric': 'mlogloss'
    }

    N_SPLITS = 3
    kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)
    oof = np.zeros((len(X), 7), dtype=np.float32)
    test_pred = np.zeros((len(X_test), 7), dtype=np.float32)
    fold_acc = []

    for i, (tr, va) in enumerate(kf.split(X), 1):
        t_fold = time.time()
        print(f"[XGB Fold {i}/{N_SPLITS}] train={len(tr)} valid={len(va)}", flush=True)
        dtr = xgb.DMatrix(X.iloc[tr], label=y.iloc[tr])
        dva = xgb.DMatrix(X.iloc[va], label=y.iloc[va])
        dte = xgb.DMatrix(X_test)
        booster = xgb.train(
            params=xgb_params,
            dtrain=dtr,
            num_boost_round=2500,
            evals=[(dtr,'train'), (dva,'valid')],
            early_stopping_rounds=100,
            verbose_eval=200
        )
        p_va = booster.predict(dva, iteration_range=(0, booster.best_iteration+1))
        oof[va] = p_va
        p_te = booster.predict(dte, iteration_range=(0, booster.best_iteration+1))
        test_pred += p_te / N_SPLITS
        acc = accuracy_score(y.iloc[va], np.argmax(p_va, axis=1))
        fold_acc.append(acc)
        print(f"Fold {i} ACC={acc:.6f} | iters={booster.best_iteration+1} | elapsed={time.time()-t_fold:.1f}s", flush=True)
        del dtr, dva, dte, booster; gc.collect()

    oof_acc = accuracy_score(y, np.argmax(oof, axis=1))
    print(f"XGB OOF ACC: {oof_acc:.6f}; per-fold: {', '.join(f'{a:.6f}' for a in fold_acc)}")

    sub = pd.DataFrame({'Id': test['Id'] if 'Id' in test.columns else np.arange(len(test_pred)),
                        'Cover_Type': np.argmax(test_pred, axis=1) + 1})
    sub.to_csv('submission.csv', index=False)
    print(f"Saved submission.csv with shape {sub.shape}")

except Exception as e:
    print('ERROR in XGB cell:', e)
    traceback.print_exc()
print(f"Done in {time.time()-t0:.1f}s")

Setting up GPU XGBoost and running 3-fold CV on DEV subset (200k)...


[ok] xgboost==2.0.3 present


X shape: (200000, 32), X_test: (400000, 32), features: 32


[XGB Fold 1/3] train=133333 valid=66667


[0]	train-mlogloss:1.76225	valid-mlogloss:1.76322



    E.g. tree_method = "hist", device = "cuda"

Parameters: { "predictor" } are not used.



[200]	train-mlogloss:0.07214	valid-mlogloss:0.11955


[330]	train-mlogloss:0.05340	valid-mlogloss:0.11984


Fold 1 ACC=0.949240 | iters=232 | elapsed=6.7s



    E.g. tree_method = "hist", device = "cuda"



[XGB Fold 2/3] train=133333 valid=66667


[0]	train-mlogloss:1.76236	valid-mlogloss:1.76328



    E.g. tree_method = "hist", device = "cuda"

Parameters: { "predictor" } are not used.



[200]	train-mlogloss:0.07286	valid-mlogloss:0.11727


[382]	train-mlogloss:0.04758	valid-mlogloss:0.11737


Fold 2 ACC=0.951415 | iters=284 | elapsed=7.4s



    E.g. tree_method = "hist", device = "cuda"



[XGB Fold 3/3] train=133334 valid=66666


[0]	train-mlogloss:1.76234	valid-mlogloss:1.76319



    E.g. tree_method = "hist", device = "cuda"

Parameters: { "predictor" } are not used.



[200]	train-mlogloss:0.07347	valid-mlogloss:0.11758


[345]	train-mlogloss:0.05213	valid-mlogloss:0.11755


Fold 3 ACC=0.950875 | iters=246 | elapsed=6.7s



    E.g. tree_method = "hist", device = "cuda"



XGB OOF ACC: 0.950510; per-fold: 0.949240, 0.951415, 0.950875


Saved submission.csv with shape (400000, 2)
Done in 34.1s


In [25]:
import sys, subprocess, time, os, gc, traceback
t0 = time.time()
print("Setting up GPU CatBoost and running 3-fold CV on DEV subset (200k)...", flush=True)

def pip_install(pkg):
    try:
        __import__(pkg.split('==')[0])
        print(f"[ok] {pkg} present")
    except Exception:
        print(f"[install] {pkg}")
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', pkg, '-q'])

try:
    pip_install('catboost==1.2.5')
    import numpy as np, pandas as pd
    from catboost import CatBoostClassifier, Pool
    from sklearn.model_selection import KFold, StratifiedShuffleSplit
    from sklearn.metrics import accuracy_score

    # Load data fresh
    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')

    # DEV subset 200k with stratified attempt else random
    SEED = 42
    DEV_N = int(os.environ.get('DEV_N', '200000'))
    y_full = train['Cover_Type']
    if DEV_N and DEV_N < len(train):
        if y_full.value_counts().min() >= 2:
            try:
                sss = StratifiedShuffleSplit(n_splits=1, test_size=len(train)-DEV_N, random_state=SEED)
                for keep_idx, _ in sss.split(train, y_full):
                    train = train.iloc[keep_idx].reset_index(drop=True)
                    break
            except Exception:
                train = train.sample(n=DEV_N, random_state=SEED).reset_index(drop=True)
        else:
            train = train.sample(n=DEV_N, random_state=SEED).reset_index(drop=True)

    id_col = 'Id' if 'Id' in train.columns else None
    y = (train['Cover_Type'].astype(int) - 1).astype(np.int32)

    # Self-contained FE (same as XGB cell) to avoid cross-cell deps
    def fe(df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()
        Hhyd = 'Horizontal_Distance_To_Hydrology'; Vhyd = 'Vertical_Distance_To_Hydrology'
        hs9, hs12, hs3 = 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm'
        aspect = 'Aspect'; elev = 'Elevation'
        Hroad='Horizontal_Distance_To_Roadways'; Hfire='Horizontal_Distance_To_Fire_Points'

        if all(c in df.columns for c in [Hhyd, Vhyd]):
            df['Euclidean_Distance_To_Hydrology'] = np.sqrt(df[Hhyd]**2 + df[Vhyd]**2)
            df['Manhattan_Distance_To_Hydrology'] = np.abs(df[Hhyd]) + np.abs(df[Vhyd])
            df['Elevation_VD_Hydrology'] = df[elev] - df[Vhyd]
            df['Elevation_Plus_VD_Hydrology'] = df[elev] + df[Vhyd]
            df['Elevation_minus_Euclidean_Dist_Hydrology'] = df[elev] - df['Euclidean_Distance_To_Hydrology']
            df['Hydro_Ratio'] = df[Hhyd] / (df[Vhyd].abs() + 1.0)
            ang = np.arctan2(df[Vhyd].astype(float), df[Hhyd].astype(float))
            df['Hydro_Angle_Sin'] = np.sin(ang); df['Hydro_Angle_Cos'] = np.cos(ang)

        if all(c in df.columns for c in [hs9, hs12, hs3]):
            df['Hillshade_Mean'] = (df[hs9] + df[hs12] + df[hs3]) / 3.0
            df['Hillshade_Min'] = df[[hs9, hs12, hs3]].min(axis=1)
            df['Hillshade_Max'] = df[[hs9, hs12, hs3]].max(axis=1)
            df['Hillshade_Range'] = df['Hillshade_Max'] - df['Hillshade_Min']
            df['Hillshade_Diff_9_3'] = df[hs9] - df[hs3]

        if aspect in df.columns:
            rad = np.deg2rad(df[aspect].astype(float))
            df['Aspect_Sin'] = np.sin(rad); df['Aspect_Cos'] = np.cos(rad)

        if Hroad in df.columns and Hfire in df.columns:
            df['Road_Fire_AbsDiff'] = np.abs(df[Hroad] - df[Hfire])
        if Hhyd in df.columns and Hroad in df.columns:
            df['Hydro_Road_AbsDiff'] = np.abs(df[Hhyd] - df[Hroad])
        if Hhyd in df.columns and Hfire in df.columns:
            df['Hydro_Fire_AbsDiff'] = np.abs(df[Hhyd] - df[Hfire])

        # Compress Wilderness_Area one-hots (lexicographic order)
        w_cols = [c for c in df.columns if c.startswith('Wilderness_Area')]
        if w_cols:
            w_cols_sorted = sorted(w_cols)
            warr = df[w_cols_sorted].to_numpy(dtype=np.int8, copy=False)
            w_cat = warr.argmax(axis=1).astype(np.int16) + 1
            df['Wilderness_Area_cat'] = w_cat
            df.drop(columns=w_cols_sorted, inplace=True)
        else:
            df['Wilderness_Area_cat'] = 0

        # Compress Soil_Type one-hots (lexicographic order)
        s_cols = [c for c in df.columns if c.startswith('Soil_Type')]
        if s_cols:
            s_cols_sorted = sorted(s_cols)
            sarr = df[s_cols_sorted].to_numpy(dtype=np.int8, copy=False)
            s_cat = sarr.argmax(axis=1).astype(np.int16) + 1
            df['Soil_Type_cat'] = s_cat
            df.drop(columns=s_cols_sorted, inplace=True)
        else:
            df['Soil_Type_cat'] = 0

        # Interaction and binned elevation
        df['Soil_Wilderness_Interaction'] = (df['Soil_Type_cat'].astype(np.int32)*100 + df['Wilderness_Area_cat'].astype(np.int32)).astype(np.int32)
        try:
            df['Elevation_Binned'] = pd.cut(df[elev], bins=30, labels=False).astype('float32').fillna(-1).astype('int16')
        except Exception:
            df['Elevation_Binned'] = -1

        # Cast numerics to float32 for CatBoost too (it can handle float features well)
        for c in df.columns:
            if c == 'Cover_Type' or c == 'Id':
                continue
            if pd.api.types.is_float_dtype(df[c]) or pd.api.types.is_integer_dtype(df[c]):
                df[c] = df[c].astype(np.float32)
        return df

    train_fe = fe(train)
    test_fe = fe(test)

    # Build feature list (drop Id and target)
    drop_cols = [c for c in ['Cover_Type', id_col] if c is not None]
    features = [c for c in train_fe.columns if c not in drop_cols]
    X = train_fe[features].astype(np.float32)
    X_test = test_fe[features].astype(np.float32)
    print(f"X shape: {X.shape}, X_test: {X_test.shape}, features: {len(features)}", flush=True)

    # CatBoost params (GPU)
    cat_params = {
        'loss_function': 'MultiClass',
        'task_type': 'GPU',
        'devices': '0',
        'iterations': 5000,
        'learning_rate': 0.06,
        'depth': 8,
        'l2_leaf_reg': 4.0,
        'border_count': 254,
        'random_strength': 0.8,
        'bagging_temperature': 0.7,
        'od_type': 'Iter',
        'od_wait': 200,
        'verbose': 200,
        'random_seed': SEED,
        'classes_count': 7
    }

    def to_full_proba(p: np.ndarray, present_classes: np.ndarray, n_classes: int = 7) -> np.ndarray:
        # If CatBoost already outputs n_classes columns, return as-is; otherwise expand by present indices
        if p.shape[1] == n_classes:
            return p.astype(np.float32, copy=False)
        full = np.zeros((p.shape[0], n_classes), dtype=np.float32)
        full[:, present_classes] = p
        return full

    N_SPLITS = 3
    kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)
    oof_cat = np.zeros((len(X), 7), dtype=np.float32)
    test_pred_cat = np.zeros((len(X_test), 7), dtype=np.float32)
    fold_acc = []

    for i, (tr, va) in enumerate(kf.split(X), 1):
        t_fold = time.time()
        print(f"[CAT Fold {i}/{N_SPLITS}] train={len(tr)} valid={len(va)}", flush=True)
        tr_pool = Pool(X.iloc[tr], label=y.iloc[tr])
        va_pool = Pool(X.iloc[va], label=y.iloc[va])
        model = CatBoostClassifier(**cat_params)
        model.fit(tr_pool, eval_set=va_pool, use_best_model=True)
        present = np.unique(y.iloc[tr].values)
        p_va = model.predict_proba(va_pool)
        p_va_full = to_full_proba(p_va, present, n_classes=7)
        oof_cat[va] = p_va_full
        acc = accuracy_score(y.iloc[va], np.argmax(p_va_full, axis=1))
        fold_acc.append(acc)
        p_te = model.predict_proba(X_test)
        p_te_full = to_full_proba(p_te, present, n_classes=7)
        test_pred_cat += p_te_full / N_SPLITS
        print(f"Fold {i} ACC={acc:.6f} | best_tree={model.get_best_iteration()} | elapsed={time.time()-t_fold:.1f}s", flush=True)
        del tr_pool, va_pool, model; gc.collect()

    oof_acc_cat = accuracy_score(y, np.argmax(oof_cat, axis=1))
    print(f"CAT OOF ACC: {oof_acc_cat:.6f}; per-fold: {', '.join(f'{a:.6f}' for a in fold_acc)}")

    # Save a CatBoost-only submission preview (will be overwritten by ensemble later)
    sub_cat = pd.DataFrame({'Id': test['Id'] if 'Id' in test.columns else np.arange(len(test_pred_cat)),
                            'Cover_Type': np.argmax(test_pred_cat, axis=1) + 1})
    sub_cat.to_csv('submission_cat.csv', index=False)
    print(f"Saved submission_cat.csv with shape {sub_cat.shape}")

    # If XGB preds available (from previous cell), create a simple ensemble and write submission.csv
    if 'test_pred' in globals():
        # test_pred from XGB cell; oof available as 'oof'
        w_xgb, w_cat = 0.6, 0.4
        ens_test = w_xgb * test_pred + w_cat * test_pred_cat
        sub = pd.DataFrame({'Id': test['Id'] if 'Id' in test.columns else np.arange(len(ens_test)),
                            'Cover_Type': np.argmax(ens_test, axis=1) + 1})
        sub.to_csv('submission.csv', index=False)
        print(f"Saved ensemble submission.csv (XGB {w_xgb:.2f} + CAT {w_cat:.2f}) with shape {sub.shape}")
    else:
        print("XGB predictions not found in kernel; keeping submission_cat.csv only.")

except Exception as e:
    print('ERROR in CatBoost cell:', e)
    traceback.print_exc()
print(f"Done in {time.time()-t0:.1f}s")

Setting up GPU CatBoost and running 3-fold CV on DEV subset (200k)...


[ok] catboost==1.2.5 present


X shape: (200000, 32), X_test: (400000, 32), features: 32


[CAT Fold 1/3] train=133333 valid=66667


Found only 6 unique classes in the data, but have defined 7 classes. Probably something is wrong with data.


0:	learn: 1.5503531	test: 1.5503608	best: 1.5503608 (0)	total: 13.8ms	remaining: 1m 8s


200:	learn: 0.1262748	test: 0.1340450	best: 0.1340450 (200)	total: 2.38s	remaining: 56.9s


400:	learn: 0.1107387	test: 0.1251421	best: 0.1251421 (400)	total: 4.68s	remaining: 53.7s


600:	learn: 0.1018655	test: 0.1223072	best: 0.1223001 (599)	total: 7s	remaining: 51.2s


800:	learn: 0.0944463	test: 0.1212213	best: 0.1212208 (790)	total: 9.44s	remaining: 49.5s


1000:	learn: 0.0882855	test: 0.1204898	best: 0.1204823 (992)	total: 11.8s	remaining: 47s


1200:	learn: 0.0827691	test: 0.1199880	best: 0.1199839 (1198)	total: 14.1s	remaining: 44.7s


1400:	learn: 0.0775484	test: 0.1198333	best: 0.1197983 (1349)	total: 16.5s	remaining: 42.4s


1600:	learn: 0.0728532	test: 0.1197322	best: 0.1196356 (1495)	total: 19s	remaining: 40.3s


bestTest = 0.1196356372
bestIteration = 1495
Shrink model to first 1496 iterations.


Fold 1 ACC=0.949675 | best_tree=1495 | elapsed=21.2s


[CAT Fold 2/3] train=133333 valid=66667


Found only 6 unique classes in the data, but have defined 7 classes. Probably something is wrong with data.


0:	learn: 1.5500470	test: 1.5505635	best: 1.5505635 (0)	total: 13.4ms	remaining: 1m 6s


200:	learn: 0.1267807	test: 0.1347256	best: 0.1347256 (200)	total: 2.28s	remaining: 54.5s


400:	learn: 0.1101917	test: 0.1249686	best: 0.1249674 (399)	total: 4.56s	remaining: 52.3s


600:	learn: 0.1010499	test: 0.1217883	best: 0.1217883 (600)	total: 6.83s	remaining: 50s


800:	learn: 0.0940326	test: 0.1202643	best: 0.1202620 (799)	total: 9.22s	remaining: 48.3s


1000:	learn: 0.0876433	test: 0.1195130	best: 0.1195130 (1000)	total: 11.5s	remaining: 45.9s


1200:	learn: 0.0818445	test: 0.1189827	best: 0.1189740 (1187)	total: 13.8s	remaining: 43.6s


1400:	learn: 0.0765303	test: 0.1186273	best: 0.1186244 (1398)	total: 16.1s	remaining: 41.4s


1600:	learn: 0.0717977	test: 0.1184666	best: 0.1184499 (1596)	total: 18.5s	remaining: 39.2s


1800:	learn: 0.0674341	test: 0.1185135	best: 0.1184371 (1697)	total: 20.9s	remaining: 37.1s


bestTest = 0.1184370787
bestIteration = 1697
Shrink model to first 1698 iterations.


Fold 2 ACC=0.951085 | best_tree=1697 | elapsed=23.1s


[CAT Fold 3/3] train=133334 valid=66666


Found only 6 unique classes in the data, but have defined 7 classes. Probably something is wrong with data.


0:	learn: 1.5506373	test: 1.5503739	best: 1.5503739 (0)	total: 12.7ms	remaining: 1m 3s


200:	learn: 0.1272975	test: 0.1336025	best: 0.1336025 (200)	total: 2.35s	remaining: 56.2s


400:	learn: 0.1116826	test: 0.1246460	best: 0.1246460 (400)	total: 4.67s	remaining: 53.5s


600:	learn: 0.1025324	test: 0.1216219	best: 0.1216219 (600)	total: 7.04s	remaining: 51.6s


800:	learn: 0.0955271	test: 0.1202064	best: 0.1202036 (797)	total: 9.4s	remaining: 49.3s


1000:	learn: 0.0896963	test: 0.1197817	best: 0.1197770 (998)	total: 11.7s	remaining: 46.8s


1200:	learn: 0.0841207	test: 0.1191333	best: 0.1191333 (1200)	total: 14.1s	remaining: 44.5s


1400:	learn: 0.0790783	test: 0.1188381	best: 0.1187868 (1370)	total: 16.4s	remaining: 42.2s


1600:	learn: 0.0743322	test: 0.1188102	best: 0.1187413 (1534)	total: 18.9s	remaining: 40s


1800:	learn: 0.0700924	test: 0.1187026	best: 0.1186862 (1771)	total: 21.2s	remaining: 37.7s


bestTest = 0.1186861893
bestIteration = 1771
Shrink model to first 1772 iterations.


Fold 3 ACC=0.950395 | best_tree=1771 | elapsed=24.4s


CAT OOF ACC: 0.950385; per-fold: 0.949675, 0.951085, 0.950395


Saved submission_cat.csv with shape (400000, 2)


Saved ensemble submission.csv (XGB 0.60 + CAT 0.40) with shape (400000, 2)
Done in 85.9s


In [29]:
import sys, subprocess, time, os, gc, traceback
print("Full-data 5-fold training: XGBoost (GPU) + CatBoost (GPU) with enhanced FE and OOF weight optimization", flush=True)

def ensure(pkg):
    try:
        __import__(pkg.split('==')[0])
        print(f"[ok] {pkg} present")
    except Exception:
        print(f"[install] {pkg}")
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', pkg, '-q'])

t0_all = time.time()
try:
    ensure('xgboost==2.0.3')
    ensure('catboost==1.2.5')
    import numpy as np, pandas as pd, xgboost as xgb
    from catboost import CatBoostClassifier, Pool
    from sklearn.model_selection import KFold
    from sklearn.metrics import accuracy_score

    SEED = int(os.environ.get('SEED','42'))
    N_SPLITS = 5
    np.random.seed(SEED)

    print("Loading data...", flush=True)
    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')
    id_col = 'Id' if 'Id' in train.columns else None
    y = (train['Cover_Type'].astype(int) - 1).astype(np.int32)
    print(f"train: {train.shape}, test: {test.shape}")

    # Enhanced FE (keep one-hots; add compressed cats and additional interactions)
    def fe(df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()
        Hhyd = 'Horizontal_Distance_To_Hydrology'; Vhyd = 'Vertical_Distance_To_Hydrology'
        hs9, hs12, hs3 = 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm'
        aspect = 'Aspect'; elev = 'Elevation'
        Hroad = 'Horizontal_Distance_To_Roadways'; Hfire = 'Horizontal_Distance_To_Fire_Points'

        # Core hydrology distances/angles
        df['Euclidean_Distance_To_Hydrology'] = np.sqrt(df[Hhyd]**2 + df[Vhyd]**2)
        df['Manhattan_Distance_To_Hydrology'] = np.abs(df[Hhyd]) + np.abs(df[Vhyd])
        df['Elevation_VD_Hydrology'] = df[elev] - df[Vhyd]
        df['Elevation_Plus_VD_Hydrology'] = df[elev] + df[Vhyd]
        df['Elevation_minus_Euclidean_Dist_Hydrology'] = df[elev] - df['Euclidean_Distance_To_Hydrology']
        df['Hydro_Ratio'] = df[Hhyd] / (df[Vhyd].abs() + 1.0)
        ang = np.arctan2(df[Vhyd].astype(float), df[Hhyd].astype(float))
        df['Hydro_Angle_Sin'] = np.sin(ang); df['Hydro_Angle_Cos'] = np.cos(ang)

        # Hillshade statistics
        df['Hillshade_Mean'] = (df[hs9] + df[hs12] + df[hs3]) / 3.0
        df['Hillshade_Min'] = df[[hs9, hs12, hs3]].min(axis=1)
        df['Hillshade_Max'] = df[[hs9, hs12, hs3]].max(axis=1)
        df['Hillshade_Range'] = df['Hillshade_Max'] - df['Hillshade_Min']
        df['Hillshade_Diff_9_3'] = df[hs9] - df[hs3]

        # Aspect transforms
        rad = np.deg2rad(df[aspect].astype(float))
        df['Aspect_Sin'] = np.sin(rad); df['Aspect_Cos'] = np.cos(rad)

        # Distance interactions
        df['Road_Fire_AbsDiff'] = np.abs(df[Hroad] - df[Hfire])
        df['Hydro_Road_AbsDiff'] = np.abs(df[Hhyd] - df[Hroad])
        df['Hydro_Fire_AbsDiff'] = np.abs(df[Hhyd] - df[Hfire])

        # Expert additions
        df['Slope_Elevation_Product'] = df['Slope'] * df['Elevation']
        df['Elevation_div_Slope'] = df['Elevation'] / (df['Slope'] + 1.0)
        df['Hillshade_Std'] = df[[hs9, hs12, hs3]].std(axis=1)
        df['HS_9_over_Noon'] = df[hs9] / (df[hs12] + 1.0)
        df['Elevation_minus_H_Roads'] = df['Elevation'] - df[Hroad]
        df['Total_Distance_Sum'] = df[Hhyd] + df[Hroad] + df[Hfire]
        df['VHyd_Neg'] = (df[Vhyd] < 0).astype(np.int16)

        # Keep one-hots, also add compressed categories and sums
        w_cols = [c for c in df.columns if c.startswith('Wilderness_Area')]
        if w_cols:
            w_cols_sorted = sorted(w_cols)
            warr = df[w_cols_sorted].to_numpy(dtype=np.int8, copy=False)
            w_cat = warr.argmax(axis=1).astype(np.int16) + 1
            df['Wilderness_Area_cat'] = w_cat
            df['Wilderness_Area_Sum'] = warr.sum(axis=1).astype(np.int16)
        else:
            df['Wilderness_Area_cat'] = 0; df['Wilderness_Area_Sum'] = 0
        s_cols = [c for c in df.columns if c.startswith('Soil_Type')]
        if s_cols:
            s_cols_sorted = sorted(s_cols)
            sarr = df[s_cols_sorted].to_numpy(dtype=np.int8, copy=False)
            s_cat = sarr.argmax(axis=1).astype(np.int16) + 1
            df['Soil_Type_cat'] = s_cat
            df['Soil_Type_Sum'] = sarr.sum(axis=1).astype(np.int16)
        else:
            df['Soil_Type_cat'] = 0; df['Soil_Type_Sum'] = 0
        df['Soil_Wilderness_Interaction'] = (df['Soil_Type_cat'].astype(np.int32)*100 + df['Wilderness_Area_cat'].astype(np.int32)).astype(np.int32)
        # Elevation binned
        try:
            df['Elevation_Binned'] = pd.cut(df['Elevation'], bins=30, labels=False).astype('float32').fillna(-1).astype('int16')
        except Exception:
            df['Elevation_Binned'] = -1

        # Downcast numerics
        for c in df.columns:
            if c == 'Cover_Type':
                continue
            if pd.api.types.is_integer_dtype(df[c]):
                df[c] = pd.to_numeric(df[c], downcast='integer')
            elif pd.api.types.is_float_dtype(df[c]):
                df[c] = pd.to_numeric(df[c], downcast='float')
        return df

    t_feat = time.time()
    train_fe = fe(train)
    test_fe = fe(test)
    print(f"FE done in {time.time()-t_feat:.1f}s", flush=True)

    # Features (keep one-hots); drop Id and target
    drop_cols = ['Cover_Type'] + ([id_col] if id_col is not None else [])
    features = [c for c in train_fe.columns if c not in drop_cols]
    X = train_fe[features].astype(np.float32)
    X_test = test_fe[features].astype(np.float32)

    # Sanitize non-finite values
    def sanitize(df_: pd.DataFrame, name: str):
        a = df_.to_numpy()
        n_inf = np.isinf(a).sum()
        n_nan = np.isnan(a).sum()
        if n_inf or n_nan:
            print(f"Before sanitize {name}: inf={n_inf}, nan={n_nan}", flush=True)
        df_.replace([np.inf, -np.inf], np.nan, inplace=True)
        a = df_.to_numpy()
        n_inf2 = np.isinf(a).sum()
        n_nan2 = np.isnan(a).sum()
        if n_inf2 or n_nan2:
            print(f"After replace {name}: inf={n_inf2}, nan={n_nan2}", flush=True)
    sanitize(X, 'X')
    sanitize(X_test, 'X_test')
    print(f"Using {len(features)} features; X={X.shape}, X_test={X_test.shape}")

    # Common CV
    kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)

    # XGBoost params (expert)
    xgb_params = {
        'objective': 'multi:softprob',
        'num_class': 7,
        'device': 'cuda',
        'tree_method': 'hist',
        'learning_rate': 0.03,
        'max_depth': 9,
        'min_child_weight': 5,
        'subsample': 0.8,
        'colsample_bytree': 0.75,
        'lambda': 1.5,
        'alpha': 0.5,
        'eval_metric': 'mlogloss'
    }

    # CatBoost params (expert)
    cat_params = {
        'loss_function': 'MultiClass',
        'task_type': 'GPU',
        'iterations': 8000,
        'learning_rate': 0.035,
        'depth': 9,
        'l2_leaf_reg': 5.0,
        'border_count': 254,
        'random_strength': 0.8,
        'bagging_temperature': 0.7,
        'od_type': 'Iter',
        'od_wait': 250,
        'classes_count': 7,
        'random_seed': SEED,
        'verbose': 200
    }

    # Storage
    oof_xgb = np.zeros((len(X), 7), dtype=np.float32)
    oof_cat = np.zeros((len(X), 7), dtype=np.float32)
    test_xgb = np.zeros((len(X_test), 7), dtype=np.float32)
    test_cat = np.zeros((len(X_test), 7), dtype=np.float32)
    acc_xgb, acc_cat = [], []

    # Train per fold
    for fold, (tr, va) in enumerate(kf.split(X), 1):
        t_fold = time.time()
        print(f"\n[Fold {fold}/{N_SPLITS}] train={len(tr)} valid={len(va)}", flush=True)
        # XGB
        dtr = xgb.DMatrix(X.iloc[tr], label=y.iloc[tr], missing=np.nan)
        dva = xgb.DMatrix(X.iloc[va], label=y.iloc[va], missing=np.nan)
        dte = xgb.DMatrix(X_test, missing=np.nan)
        booster = xgb.train(
            params=xgb_params,
            dtrain=dtr,
            num_boost_round=5000,
            evals=[(dtr,'train'), (dva,'valid')],
            early_stopping_rounds=200,
            verbose_eval=250
        )
        p_va = booster.predict(dva, iteration_range=(0, booster.best_iteration+1))
        oof_xgb[va] = p_va
        p_te = booster.predict(dte, iteration_range=(0, booster.best_iteration+1))
        test_xgb += p_te / N_SPLITS
        ax = accuracy_score(y.iloc[va], np.argmax(p_va, axis=1)); acc_xgb.append(ax)
        print(f"XGB fold ACC={ax:.6f}; iters={booster.best_iteration+1}")
        del dtr, dva, dte, booster; gc.collect()

        # CatBoost
        tr_pool = Pool(X.iloc[tr], label=y.iloc[tr])
        va_pool = Pool(X.iloc[va], label=y.iloc[va])
        model = CatBoostClassifier(**cat_params)
        model.fit(tr_pool, eval_set=va_pool, use_best_model=True)
        p_va = model.predict_proba(va_pool)
        oof_cat[va] = p_va.astype(np.float32)
        p_te = model.predict_proba(X_test)
        test_cat += p_te.astype(np.float32) / N_SPLITS
        ac = accuracy_score(y.iloc[va], np.argmax(p_va, axis=1)); acc_cat.append(ac)
        print(f"CAT fold ACC={ac:.6f}; best_iter={model.get_best_iteration()}")
        del tr_pool, va_pool, model; gc.collect()
        print(f"Fold {fold} elapsed {time.time()-t_fold:.1f}s", flush=True)

    # OOF metrics
    oof_acc_xgb = accuracy_score(y, np.argmax(oof_xgb, axis=1))
    oof_acc_cat = accuracy_score(y, np.argmax(oof_cat, axis=1))
    print(f"\nOOF XGB: {oof_acc_xgb:.6f}; per-fold: {', '.join(f'{a:.6f}' for a in acc_xgb)}")
    print(f"OOF CAT: {oof_acc_cat:.6f}; per-fold: {', '.join(f'{a:.6f}' for a in acc_cat)}")

    # Optimize blend weight on OOF
    def blend_acc(w):
        p = w * oof_xgb + (1.0 - w) * oof_cat
        return accuracy_score(y, np.argmax(p, axis=1))
    best_w, best_acc = 0.5, 0.0
    for w in np.linspace(0.3, 0.7, 41):
        a = blend_acc(float(w))
        if a > best_acc:
            best_acc, best_w = a, float(w)
    print(f"Best OOF blend: w_xgb={best_w:.2f}, acc={best_acc:.6f}")

    # Blend test
    ens_test = best_w * test_xgb + (1.0 - best_w) * test_cat
    sub = pd.DataFrame({'Id': test[id_col] if id_col in test.columns else np.arange(len(ens_test)),
                        'Cover_Type': np.argmax(ens_test, axis=1) + 1})
    sub.to_csv('submission.csv', index=False)
    # Also save model-wise submissions for debugging
    pd.DataFrame({'Id': test[id_col] if id_col in test.columns else np.arange(len(test_xgb)),
                  'Cover_Type': np.argmax(test_xgb, axis=1) + 1}).to_csv('submission_xgb.csv', index=False)
    pd.DataFrame({'Id': test[id_col] if id_col in test.columns else np.arange(len(test_cat)),
                  'Cover_Type': np.argmax(test_cat, axis=1) + 1}).to_csv('submission_cat_full.csv', index=False)
    print(f"Saved submission.csv (blended) with shape {sub.shape}")
    print(f"Total elapsed: {time.time()-t0_all:.1f}s")

except Exception as e:
    print('ERROR in full-data training cell:', e)
    traceback.print_exc()

Full-data 5-fold training: XGBoost (GPU) + CatBoost (GPU) with enhanced FE and OOF weight optimization


[ok] xgboost==2.0.3 present
[ok] catboost==1.2.5 present
Loading data...


train: (3600000, 56), test: (400000, 55)


FE done in 12.3s


Before sanitize X: inf=5911, nan=0


After replace X: inf=0, nan=5911


Before sanitize X_test: inf=658, nan=0


After replace X_test: inf=0, nan=658


Using 85 features; X=(3600000, 85), X_test=(400000, 85)

[Fold 1/5] train=2880000 valid=720000


[0]	train-mlogloss:1.85145	valid-mlogloss:1.85155


[250]	train-mlogloss:0.07740	valid-mlogloss:0.08435


[500]	train-mlogloss:0.06629	valid-mlogloss:0.07756


[750]	train-mlogloss:0.06038	valid-mlogloss:0.07632


[1000]	train-mlogloss:0.05541	valid-mlogloss:0.07592


[1250]	train-mlogloss:0.05106	valid-mlogloss:0.07582


[1500]	train-mlogloss:0.04710	valid-mlogloss:0.07581


[1584]	train-mlogloss:0.04588	valid-mlogloss:0.07583


XGB fold ACC=0.962311; iters=1386




Found only 6 unique classes in the data, but have defined 7 classes. Probably something is wrong with data.
Label(s) 4 are not present in the train set. Perhaps, something is wrong with the data.


0:	learn: 1.6455023	test: 1.6455318	best: 1.6455318 (0)	total: 54.1ms	remaining: 7m 12s


200:	learn: 0.1184775	test: 0.1193431	best: 0.1193431 (200)	total: 9.57s	remaining: 6m 11s


400:	learn: 0.0965023	test: 0.0978437	best: 0.0978437 (400)	total: 19.3s	remaining: 6m 5s


600:	learn: 0.0884318	test: 0.0902715	best: 0.0902715 (600)	total: 29.1s	remaining: 5m 58s


800:	learn: 0.0838452	test: 0.0861929	best: 0.0861929 (800)	total: 38.7s	remaining: 5m 47s


1000:	learn: 0.0810057	test: 0.0838537	best: 0.0838537 (1000)	total: 48.6s	remaining: 5m 39s


1200:	learn: 0.0789459	test: 0.0823000	best: 0.0823000 (1200)	total: 58s	remaining: 5m 28s


1400:	learn: 0.0773026	test: 0.0811492	best: 0.0811492 (1400)	total: 1m 7s	remaining: 5m 17s


1600:	learn: 0.0760091	test: 0.0803490	best: 0.0803490 (1600)	total: 1m 16s	remaining: 5m 7s


1800:	learn: 0.0749190	test: 0.0797603	best: 0.0797603 (1800)	total: 1m 26s	remaining: 4m 57s


2000:	learn: 0.0739730	test: 0.0792901	best: 0.0792901 (2000)	total: 1m 35s	remaining: 4m 47s


2200:	learn: 0.0731134	test: 0.0789024	best: 0.0789024 (2200)	total: 1m 45s	remaining: 4m 36s


2400:	learn: 0.0723539	test: 0.0786229	best: 0.0786229 (2400)	total: 1m 54s	remaining: 4m 27s


2600:	learn: 0.0716611	test: 0.0783879	best: 0.0783879 (2600)	total: 2m 3s	remaining: 4m 16s


2800:	learn: 0.0710067	test: 0.0782005	best: 0.0782005 (2800)	total: 2m 12s	remaining: 4m 6s


3000:	learn: 0.0703777	test: 0.0780337	best: 0.0780337 (3000)	total: 2m 21s	remaining: 3m 56s


3200:	learn: 0.0697759	test: 0.0778789	best: 0.0778789 (3200)	total: 2m 31s	remaining: 3m 46s


3400:	learn: 0.0692201	test: 0.0777498	best: 0.0777498 (3400)	total: 2m 40s	remaining: 3m 36s


3600:	learn: 0.0686636	test: 0.0776363	best: 0.0776362 (3599)	total: 2m 49s	remaining: 3m 26s


3800:	learn: 0.0681093	test: 0.0775461	best: 0.0775461 (3800)	total: 2m 58s	remaining: 3m 17s


4000:	learn: 0.0675624	test: 0.0774365	best: 0.0774365 (4000)	total: 3m 7s	remaining: 3m 7s


4200:	learn: 0.0670228	test: 0.0773477	best: 0.0773477 (4200)	total: 3m 16s	remaining: 2m 57s


4400:	learn: 0.0665188	test: 0.0772770	best: 0.0772766 (4399)	total: 3m 26s	remaining: 2m 48s


4600:	learn: 0.0660356	test: 0.0772031	best: 0.0772031 (4600)	total: 3m 35s	remaining: 2m 38s


4800:	learn: 0.0655430	test: 0.0771417	best: 0.0771417 (4800)	total: 3m 44s	remaining: 2m 29s


5000:	learn: 0.0650552	test: 0.0770978	best: 0.0770978 (5000)	total: 3m 53s	remaining: 2m 20s


5200:	learn: 0.0645790	test: 0.0770483	best: 0.0770483 (5198)	total: 4m 2s	remaining: 2m 10s


5400:	learn: 0.0641023	test: 0.0769872	best: 0.0769872 (5400)	total: 4m 11s	remaining: 2m 1s


5600:	learn: 0.0636466	test: 0.0769580	best: 0.0769579 (5599)	total: 4m 21s	remaining: 1m 51s


5800:	learn: 0.0631815	test: 0.0769275	best: 0.0769271 (5799)	total: 4m 30s	remaining: 1m 42s


6000:	learn: 0.0627283	test: 0.0768934	best: 0.0768934 (6000)	total: 4m 39s	remaining: 1m 33s


6200:	learn: 0.0622703	test: 0.0768585	best: 0.0768575 (6194)	total: 4m 48s	remaining: 1m 23s


6400:	learn: 0.0618244	test: 0.0768372	best: 0.0768362 (6393)	total: 4m 58s	remaining: 1m 14s


6600:	learn: 0.0613808	test: 0.0768140	best: 0.0768139 (6599)	total: 5m 7s	remaining: 1m 5s


6800:	learn: 0.0609593	test: 0.0768007	best: 0.0767975 (6728)	total: 5m 16s	remaining: 55.9s


7000:	learn: 0.0605590	test: 0.0767846	best: 0.0767846 (7000)	total: 5m 26s	remaining: 46.5s


7200:	learn: 0.0601296	test: 0.0767798	best: 0.0767764 (7146)	total: 5m 35s	remaining: 37.2s


7400:	learn: 0.0597205	test: 0.0767606	best: 0.0767601 (7397)	total: 5m 44s	remaining: 27.9s


7600:	learn: 0.0592962	test: 0.0767496	best: 0.0767494 (7591)	total: 5m 54s	remaining: 18.6s


7800:	learn: 0.0588882	test: 0.0767377	best: 0.0767374 (7799)	total: 6m 3s	remaining: 9.28s


7999:	learn: 0.0584697	test: 0.0767264	best: 0.0767240 (7965)	total: 6m 13s	remaining: 0us
bestTest = 0.07672402344
bestIteration = 7965
Shrink model to first 7966 iterations.


CAT fold ACC=0.962121; best_iter=7965
Fold 1 elapsed 2477.7s



[Fold 2/5] train=2880000 valid=720000


[0]	train-mlogloss:1.85146	valid-mlogloss:1.85154


[250]	train-mlogloss:0.07760	valid-mlogloss:0.08403


[500]	train-mlogloss:0.06641	valid-mlogloss:0.07728


[750]	train-mlogloss:0.06043	valid-mlogloss:0.07606


[1000]	train-mlogloss:0.05551	valid-mlogloss:0.07570


[1250]	train-mlogloss:0.05115	valid-mlogloss:0.07561


[1500]	train-mlogloss:0.04720	valid-mlogloss:0.07562


[1580]	train-mlogloss:0.04603	valid-mlogloss:0.07564


XGB fold ACC=0.962483; iters=1381




0:	learn: 1.7695535	test: 1.7695566	best: 1.7695566 (0)	total: 59.1ms	remaining: 7m 52s


200:	learn: 0.1179695	test: 0.1182176	best: 0.1182176 (200)	total: 10.7s	remaining: 6m 54s


400:	learn: 0.0965680	test: 0.0972128	best: 0.0972128 (400)	total: 21.2s	remaining: 6m 41s


600:	learn: 0.0885758	test: 0.0896206	best: 0.0896206 (600)	total: 31.7s	remaining: 6m 30s


800:	learn: 0.0840530	test: 0.0855682	best: 0.0855682 (800)	total: 42.3s	remaining: 6m 20s


1000:	learn: 0.0811119	test: 0.0831105	best: 0.0831105 (1000)	total: 52.9s	remaining: 6m 9s


1200:	learn: 0.0790461	test: 0.0815440	best: 0.0815440 (1200)	total: 1m 3s	remaining: 5m 58s


1400:	learn: 0.0774096	test: 0.0804291	best: 0.0804291 (1400)	total: 1m 13s	remaining: 5m 47s


1600:	learn: 0.0761480	test: 0.0796449	best: 0.0796449 (1600)	total: 1m 23s	remaining: 5m 35s


1800:	learn: 0.0751263	test: 0.0790954	best: 0.0790954 (1800)	total: 1m 34s	remaining: 5m 24s


2000:	learn: 0.0742239	test: 0.0786984	best: 0.0786984 (2000)	total: 1m 44s	remaining: 5m 12s


2200:	learn: 0.0733996	test: 0.0783419	best: 0.0783419 (2200)	total: 1m 54s	remaining: 5m 1s


2400:	learn: 0.0726166	test: 0.0780524	best: 0.0780524 (2400)	total: 2m 4s	remaining: 4m 49s


2600:	learn: 0.0719567	test: 0.0778571	best: 0.0778571 (2600)	total: 2m 14s	remaining: 4m 38s


2800:	learn: 0.0713163	test: 0.0776675	best: 0.0776675 (2800)	total: 2m 24s	remaining: 4m 28s


3000:	learn: 0.0706908	test: 0.0775054	best: 0.0775054 (3000)	total: 2m 34s	remaining: 4m 17s


3200:	learn: 0.0700926	test: 0.0773785	best: 0.0773785 (3200)	total: 2m 44s	remaining: 4m 6s


3400:	learn: 0.0694950	test: 0.0772403	best: 0.0772403 (3400)	total: 2m 54s	remaining: 3m 56s


3600:	learn: 0.0689345	test: 0.0771334	best: 0.0771334 (3600)	total: 3m 4s	remaining: 3m 45s


3800:	learn: 0.0683834	test: 0.0770219	best: 0.0770219 (3800)	total: 3m 15s	remaining: 3m 35s


4000:	learn: 0.0678316	test: 0.0769282	best: 0.0769275 (3991)	total: 3m 25s	remaining: 3m 25s


4200:	learn: 0.0673110	test: 0.0768492	best: 0.0768492 (4200)	total: 3m 35s	remaining: 3m 14s


4400:	learn: 0.0668221	test: 0.0767862	best: 0.0767862 (4400)	total: 3m 45s	remaining: 3m 4s


4600:	learn: 0.0662927	test: 0.0767359	best: 0.0767359 (4599)	total: 3m 56s	remaining: 2m 54s


4800:	learn: 0.0658087	test: 0.0766839	best: 0.0766839 (4800)	total: 4m 6s	remaining: 2m 44s


5000:	learn: 0.0652998	test: 0.0766401	best: 0.0766401 (5000)	total: 4m 16s	remaining: 2m 33s


5200:	learn: 0.0648215	test: 0.0765865	best: 0.0765859 (5196)	total: 4m 26s	remaining: 2m 23s


5400:	learn: 0.0643342	test: 0.0765423	best: 0.0765423 (5400)	total: 4m 36s	remaining: 2m 13s


5600:	learn: 0.0638546	test: 0.0765045	best: 0.0765045 (5600)	total: 4m 47s	remaining: 2m 3s


5800:	learn: 0.0633735	test: 0.0764791	best: 0.0764787 (5785)	total: 4m 57s	remaining: 1m 52s


6000:	learn: 0.0629476	test: 0.0764617	best: 0.0764617 (6000)	total: 5m 7s	remaining: 1m 42s


6200:	learn: 0.0625058	test: 0.0764345	best: 0.0764343 (6198)	total: 5m 17s	remaining: 1m 32s


6400:	learn: 0.0620326	test: 0.0764046	best: 0.0764046 (6400)	total: 5m 28s	remaining: 1m 21s


6600:	learn: 0.0615866	test: 0.0763893	best: 0.0763890 (6596)	total: 5m 38s	remaining: 1m 11s


6800:	learn: 0.0611524	test: 0.0763820	best: 0.0763805 (6786)	total: 5m 48s	remaining: 1m 1s


7000:	learn: 0.0607125	test: 0.0763697	best: 0.0763680 (6940)	total: 5m 58s	remaining: 51.2s


7200:	learn: 0.0602803	test: 0.0763572	best: 0.0763566 (7182)	total: 6m 9s	remaining: 41s


7400:	learn: 0.0598588	test: 0.0763450	best: 0.0763450 (7399)	total: 6m 19s	remaining: 30.7s


7600:	learn: 0.0594200	test: 0.0763326	best: 0.0763316 (7596)	total: 6m 29s	remaining: 20.5s


7800:	learn: 0.0590148	test: 0.0763230	best: 0.0763222 (7784)	total: 6m 39s	remaining: 10.2s


7999:	learn: 0.0586132	test: 0.0763149	best: 0.0763149 (7999)	total: 6m 49s	remaining: 0us
bestTest = 0.07631486003
bestIteration = 7999


CAT fold ACC=0.962536; best_iter=7999
Fold 2 elapsed 2523.8s



[Fold 3/5] train=2880000 valid=720000


[0]	train-mlogloss:1.85145	valid-mlogloss:1.85151


[250]	train-mlogloss:0.07757	valid-mlogloss:0.08352


[500]	train-mlogloss:0.06652	valid-mlogloss:0.07669


[750]	train-mlogloss:0.06053	valid-mlogloss:0.07546


[1000]	train-mlogloss:0.05555	valid-mlogloss:0.07508


[1250]	train-mlogloss:0.05120	valid-mlogloss:0.07497


[1500]	train-mlogloss:0.04725	valid-mlogloss:0.07499


[1547]	train-mlogloss:0.04655	valid-mlogloss:0.07500


XGB fold ACC=0.962624; iters=1349




0:	learn: 1.7695536	test: 1.7695500	best: 1.7695500 (0)	total: 59ms	remaining: 7m 51s


200:	learn: 0.1181607	test: 0.1183694	best: 0.1183694 (200)	total: 10.7s	remaining: 6m 53s


400:	learn: 0.0967734	test: 0.0972338	best: 0.0972338 (400)	total: 21.1s	remaining: 6m 40s


600:	learn: 0.0886889	test: 0.0895456	best: 0.0895456 (600)	total: 31.7s	remaining: 6m 29s


800:	learn: 0.0841909	test: 0.0854915	best: 0.0854915 (800)	total: 42.3s	remaining: 6m 19s


1000:	learn: 0.0812743	test: 0.0830208	best: 0.0830208 (1000)	total: 52.9s	remaining: 6m 9s


1200:	learn: 0.0791647	test: 0.0813715	best: 0.0813715 (1200)	total: 1m 3s	remaining: 6m


1400:	learn: 0.0775187	test: 0.0802109	best: 0.0802109 (1400)	total: 1m 14s	remaining: 5m 48s


1600:	learn: 0.0762640	test: 0.0794330	best: 0.0794330 (1600)	total: 1m 24s	remaining: 5m 37s


1800:	learn: 0.0752359	test: 0.0788572	best: 0.0788572 (1800)	total: 1m 34s	remaining: 5m 25s


2000:	learn: 0.0743117	test: 0.0784029	best: 0.0784029 (2000)	total: 1m 44s	remaining: 5m 13s


2200:	learn: 0.0734626	test: 0.0780397	best: 0.0780397 (2200)	total: 1m 54s	remaining: 5m 2s


2400:	learn: 0.0727513	test: 0.0777843	best: 0.0777843 (2400)	total: 2m 4s	remaining: 4m 50s


2600:	learn: 0.0720499	test: 0.0775504	best: 0.0775504 (2600)	total: 2m 14s	remaining: 4m 39s


2800:	learn: 0.0714158	test: 0.0773515	best: 0.0773512 (2799)	total: 2m 24s	remaining: 4m 28s


3000:	learn: 0.0707845	test: 0.0771751	best: 0.0771751 (3000)	total: 2m 34s	remaining: 4m 17s


3200:	learn: 0.0702135	test: 0.0770333	best: 0.0770333 (3200)	total: 2m 44s	remaining: 4m 7s


3400:	learn: 0.0696079	test: 0.0768890	best: 0.0768890 (3400)	total: 2m 55s	remaining: 3m 56s


3600:	learn: 0.0690562	test: 0.0767717	best: 0.0767716 (3599)	total: 3m 5s	remaining: 3m 46s


3800:	learn: 0.0685094	test: 0.0766877	best: 0.0766870 (3798)	total: 3m 15s	remaining: 3m 35s


4000:	learn: 0.0679887	test: 0.0765926	best: 0.0765918 (3998)	total: 3m 25s	remaining: 3m 25s


4200:	learn: 0.0674610	test: 0.0765143	best: 0.0765140 (4199)	total: 3m 35s	remaining: 3m 14s


4400:	learn: 0.0669422	test: 0.0764348	best: 0.0764348 (4400)	total: 3m 45s	remaining: 3m 4s


4600:	learn: 0.0664526	test: 0.0763722	best: 0.0763713 (4596)	total: 3m 55s	remaining: 2m 54s


4800:	learn: 0.0659335	test: 0.0763217	best: 0.0763217 (4800)	total: 4m 6s	remaining: 2m 44s


5000:	learn: 0.0654390	test: 0.0762590	best: 0.0762590 (5000)	total: 4m 16s	remaining: 2m 33s


5200:	learn: 0.0649529	test: 0.0761975	best: 0.0761975 (5200)	total: 4m 26s	remaining: 2m 23s


5400:	learn: 0.0645121	test: 0.0761497	best: 0.0761497 (5400)	total: 4m 36s	remaining: 2m 13s


5600:	learn: 0.0640551	test: 0.0761189	best: 0.0761174 (5573)	total: 4m 46s	remaining: 2m 2s


5800:	learn: 0.0635927	test: 0.0760755	best: 0.0760755 (5800)	total: 4m 57s	remaining: 1m 52s


6000:	learn: 0.0631468	test: 0.0760488	best: 0.0760488 (6000)	total: 5m 7s	remaining: 1m 42s


6200:	learn: 0.0626709	test: 0.0760273	best: 0.0760273 (6200)	total: 5m 17s	remaining: 1m 32s


6400:	learn: 0.0622024	test: 0.0759879	best: 0.0759879 (6400)	total: 5m 28s	remaining: 1m 21s


6600:	learn: 0.0617661	test: 0.0759719	best: 0.0759684 (6544)	total: 5m 38s	remaining: 1m 11s


6800:	learn: 0.0613147	test: 0.0759462	best: 0.0759462 (6800)	total: 5m 48s	remaining: 1m 1s


7000:	learn: 0.0608590	test: 0.0759118	best: 0.0759118 (7000)	total: 5m 59s	remaining: 51.2s


7200:	learn: 0.0604121	test: 0.0759008	best: 0.0759008 (7200)	total: 6m 9s	remaining: 41s


7400:	learn: 0.0599882	test: 0.0758844	best: 0.0758819 (7364)	total: 6m 19s	remaining: 30.7s


7600:	learn: 0.0595637	test: 0.0758808	best: 0.0758797 (7496)	total: 6m 30s	remaining: 20.5s


7800:	learn: 0.0591465	test: 0.0758696	best: 0.0758696 (7800)	total: 6m 40s	remaining: 10.2s


7999:	learn: 0.0587223	test: 0.0758623	best: 0.0758613 (7993)	total: 6m 51s	remaining: 0us
bestTest = 0.0758612576
bestIteration = 7993
Shrink model to first 7994 iterations.


CAT fold ACC=0.962640; best_iter=7993
Fold 3 elapsed 2460.9s



[Fold 4/5] train=2880000 valid=720000


[0]	train-mlogloss:1.85144	valid-mlogloss:1.85152


[250]	train-mlogloss:0.07743	valid-mlogloss:0.08431


[500]	train-mlogloss:0.06633	valid-mlogloss:0.07754


[750]	train-mlogloss:0.06043	valid-mlogloss:0.07634


[1000]	train-mlogloss:0.05541	valid-mlogloss:0.07596


[1250]	train-mlogloss:0.05103	valid-mlogloss:0.07589


[1482]	train-mlogloss:0.04735	valid-mlogloss:0.07592


XGB fold ACC=0.962311; iters=1283




0:	learn: 1.7695922	test: 1.7696104	best: 1.7696104 (0)	total: 60.7ms	remaining: 8m 5s


200:	learn: 0.1180122	test: 0.1188137	best: 0.1188137 (200)	total: 10.5s	remaining: 6m 46s


400:	learn: 0.0965058	test: 0.0977058	best: 0.0977058 (400)	total: 20.9s	remaining: 6m 35s


600:	learn: 0.0884902	test: 0.0901654	best: 0.0901654 (600)	total: 31.3s	remaining: 6m 25s


800:	learn: 0.0840174	test: 0.0861931	best: 0.0861931 (800)	total: 41.8s	remaining: 6m 16s


1000:	learn: 0.0810555	test: 0.0837157	best: 0.0837157 (1000)	total: 52.2s	remaining: 6m 4s


1200:	learn: 0.0789357	test: 0.0821033	best: 0.0821033 (1200)	total: 1m 2s	remaining: 5m 54s


1400:	learn: 0.0773302	test: 0.0810007	best: 0.0810007 (1400)	total: 1m 12s	remaining: 5m 42s


1600:	learn: 0.0760722	test: 0.0802375	best: 0.0802375 (1600)	total: 1m 22s	remaining: 5m 31s


1800:	learn: 0.0749866	test: 0.0796509	best: 0.0796509 (1800)	total: 1m 33s	remaining: 5m 20s


2000:	learn: 0.0740886	test: 0.0792378	best: 0.0792378 (2000)	total: 1m 43s	remaining: 5m 8s


2200:	learn: 0.0732422	test: 0.0788850	best: 0.0788850 (2200)	total: 1m 58s	remaining: 5m 12s


2400:	learn: 0.0724750	test: 0.0786096	best: 0.0786096 (2400)	total: 2m 13s	remaining: 5m 11s


2600:	learn: 0.0717621	test: 0.0783629	best: 0.0783629 (2600)	total: 2m 23s	remaining: 4m 58s


2800:	learn: 0.0711059	test: 0.0781803	best: 0.0781803 (2800)	total: 2m 33s	remaining: 4m 45s


3000:	learn: 0.0704815	test: 0.0780026	best: 0.0780023 (2998)	total: 2m 43s	remaining: 4m 32s


3200:	learn: 0.0698774	test: 0.0778409	best: 0.0778409 (3200)	total: 2m 53s	remaining: 4m 20s


3400:	learn: 0.0692834	test: 0.0777225	best: 0.0777225 (3400)	total: 3m 3s	remaining: 4m 8s


3600:	learn: 0.0686990	test: 0.0776117	best: 0.0776117 (3600)	total: 3m 13s	remaining: 3m 56s


3800:	learn: 0.0681134	test: 0.0775186	best: 0.0775186 (3799)	total: 3m 23s	remaining: 3m 44s


4000:	learn: 0.0675808	test: 0.0774278	best: 0.0774278 (4000)	total: 3m 33s	remaining: 3m 33s


4200:	learn: 0.0670646	test: 0.0773571	best: 0.0773571 (4200)	total: 3m 43s	remaining: 3m 22s


4400:	learn: 0.0665319	test: 0.0772709	best: 0.0772709 (4400)	total: 3m 53s	remaining: 3m 10s


4600:	learn: 0.0660405	test: 0.0772108	best: 0.0772108 (4600)	total: 4m 3s	remaining: 2m 59s


4800:	learn: 0.0655412	test: 0.0771594	best: 0.0771594 (4800)	total: 4m 13s	remaining: 2m 49s


5000:	learn: 0.0650493	test: 0.0771122	best: 0.0771121 (4993)	total: 4m 29s	remaining: 2m 41s


5200:	learn: 0.0645606	test: 0.0770750	best: 0.0770750 (5199)	total: 4m 44s	remaining: 2m 33s


5400:	learn: 0.0640860	test: 0.0770404	best: 0.0770404 (5400)	total: 4m 58s	remaining: 2m 23s


5600:	learn: 0.0636044	test: 0.0770041	best: 0.0770040 (5594)	total: 5m 14s	remaining: 2m 14s


5800:	learn: 0.0631179	test: 0.0769669	best: 0.0769665 (5794)	total: 5m 28s	remaining: 2m 4s


6000:	learn: 0.0626570	test: 0.0769305	best: 0.0769304 (5999)	total: 5m 45s	remaining: 1m 54s


6200:	learn: 0.0622300	test: 0.0769002	best: 0.0769000 (6199)	total: 5m 59s	remaining: 1m 44s


6400:	learn: 0.0617823	test: 0.0768824	best: 0.0768819 (6395)	total: 6m 15s	remaining: 1m 33s


6600:	learn: 0.0613321	test: 0.0768531	best: 0.0768531 (6600)	total: 6m 30s	remaining: 1m 22s


6800:	learn: 0.0609152	test: 0.0768363	best: 0.0768363 (6800)	total: 6m 45s	remaining: 1m 11s


7000:	learn: 0.0604923	test: 0.0768181	best: 0.0768181 (7000)	total: 7m	remaining: 1m


7200:	learn: 0.0600556	test: 0.0767998	best: 0.0767982 (7181)	total: 7m 15s	remaining: 48.4s


7400:	learn: 0.0596499	test: 0.0767951	best: 0.0767907 (7346)	total: 7m 31s	remaining: 36.6s


7600:	learn: 0.0592161	test: 0.0767854	best: 0.0767834 (7592)	total: 7m 45s	remaining: 24.4s


7800:	learn: 0.0588050	test: 0.0767678	best: 0.0767665 (7794)	total: 8m 1s	remaining: 12.3s


7999:	learn: 0.0583894	test: 0.0767595	best: 0.0767581 (7954)	total: 8m 15s	remaining: 0us
bestTest = 0.07675805664
bestIteration = 7954
Shrink model to first 7955 iterations.


CAT fold ACC=0.962318; best_iter=7954
Fold 4 elapsed 2466.8s



[Fold 5/5] train=2880000 valid=720000


[0]	train-mlogloss:1.85147	valid-mlogloss:1.85154


[250]	train-mlogloss:0.07763	valid-mlogloss:0.08365


[500]	train-mlogloss:0.06653	valid-mlogloss:0.07695


[750]	train-mlogloss:0.06051	valid-mlogloss:0.07573


[1000]	train-mlogloss:0.05557	valid-mlogloss:0.07540


[1250]	train-mlogloss:0.05113	valid-mlogloss:0.07530


[1500]	train-mlogloss:0.04716	valid-mlogloss:0.07529


[1694]	train-mlogloss:0.04434	valid-mlogloss:0.07535


XGB fold ACC=0.962760; iters=1496


0:	learn: 1.7695587	test: 1.7695510	best: 1.7695510 (0)	total: 107ms	remaining: 14m 19s


200:	learn: 0.1177521	test: 0.1177183	best: 0.1177183 (200)	total: 16.2s	remaining: 10m 30s


400:	learn: 0.0967160	test: 0.0971537	best: 0.0971537 (400)	total: 30s	remaining: 9m 29s


600:	learn: 0.0885845	test: 0.0895287	best: 0.0895287 (600)	total: 46.7s	remaining: 9m 35s


800:	learn: 0.0841484	test: 0.0856031	best: 0.0856031 (800)	total: 1m 2s	remaining: 9m 19s


1000:	learn: 0.0812089	test: 0.0831830	best: 0.0831830 (1000)	total: 1m 12s	remaining: 8m 28s


1200:	learn: 0.0790644	test: 0.0815589	best: 0.0815589 (1200)	total: 1m 23s	remaining: 7m 50s


1400:	learn: 0.0774654	test: 0.0804554	best: 0.0804554 (1400)	total: 1m 33s	remaining: 7m 19s


1600:	learn: 0.0762152	test: 0.0796982	best: 0.0796982 (1600)	total: 1m 43s	remaining: 6m 53s


1800:	learn: 0.0751449	test: 0.0790995	best: 0.0790995 (1800)	total: 1m 53s	remaining: 6m 31s


2000:	learn: 0.0741916	test: 0.0786678	best: 0.0786678 (2000)	total: 2m 5s	remaining: 6m 15s


2200:	learn: 0.0733452	test: 0.0783122	best: 0.0783122 (2200)	total: 2m 21s	remaining: 6m 12s


2400:	learn: 0.0725612	test: 0.0780136	best: 0.0780136 (2400)	total: 2m 35s	remaining: 6m 3s


2600:	learn: 0.0718848	test: 0.0777969	best: 0.0777969 (2600)	total: 2m 51s	remaining: 5m 55s


2800:	learn: 0.0712355	test: 0.0776198	best: 0.0776198 (2800)	total: 3m 5s	remaining: 5m 44s


3000:	learn: 0.0706051	test: 0.0774487	best: 0.0774487 (3000)	total: 3m 20s	remaining: 5m 34s


3200:	learn: 0.0699848	test: 0.0773035	best: 0.0773031 (3199)	total: 3m 36s	remaining: 5m 24s


3400:	learn: 0.0693933	test: 0.0771746	best: 0.0771746 (3400)	total: 3m 50s	remaining: 5m 11s


3600:	learn: 0.0688197	test: 0.0770772	best: 0.0770772 (3599)	total: 4m 6s	remaining: 5m


3800:	learn: 0.0682640	test: 0.0769949	best: 0.0769949 (3800)	total: 4m 20s	remaining: 4m 47s


4000:	learn: 0.0677276	test: 0.0769110	best: 0.0769110 (4000)	total: 4m 36s	remaining: 4m 36s


4200:	learn: 0.0672009	test: 0.0768456	best: 0.0768456 (4200)	total: 4m 50s	remaining: 4m 22s


4400:	learn: 0.0666647	test: 0.0767649	best: 0.0767649 (4400)	total: 5m 6s	remaining: 4m 10s


4600:	learn: 0.0661496	test: 0.0766992	best: 0.0766992 (4600)	total: 5m 21s	remaining: 3m 57s


4800:	learn: 0.0656240	test: 0.0766368	best: 0.0766368 (4800)	total: 5m 36s	remaining: 3m 44s


5000:	learn: 0.0651610	test: 0.0765944	best: 0.0765944 (5000)	total: 5m 52s	remaining: 3m 31s


5200:	learn: 0.0646706	test: 0.0765516	best: 0.0765505 (5194)	total: 6m 7s	remaining: 3m 17s


5400:	learn: 0.0641920	test: 0.0765104	best: 0.0765104 (5400)	total: 6m 23s	remaining: 3m 4s


5600:	learn: 0.0637003	test: 0.0764600	best: 0.0764600 (5600)	total: 6m 36s	remaining: 2m 49s


5800:	learn: 0.0632296	test: 0.0764270	best: 0.0764264 (5797)	total: 6m 53s	remaining: 2m 36s


6000:	learn: 0.0627804	test: 0.0764096	best: 0.0764090 (5964)	total: 7m 8s	remaining: 2m 22s


6200:	learn: 0.0623232	test: 0.0763975	best: 0.0763975 (6200)	total: 7m 24s	remaining: 2m 8s


6400:	learn: 0.0618711	test: 0.0763763	best: 0.0763762 (6392)	total: 7m 39s	remaining: 1m 54s


6600:	learn: 0.0614146	test: 0.0763423	best: 0.0763423 (6600)	total: 7m 54s	remaining: 1m 40s


6800:	learn: 0.0609919	test: 0.0763253	best: 0.0763249 (6793)	total: 8m 5s	remaining: 1m 25s


7000:	learn: 0.0605772	test: 0.0763169	best: 0.0763169 (6990)	total: 8m 16s	remaining: 1m 10s


7200:	learn: 0.0601567	test: 0.0763104	best: 0.0763086 (7178)	total: 8m 26s	remaining: 56.2s


7400:	learn: 0.0597229	test: 0.0762935	best: 0.0762935 (7400)	total: 8m 36s	remaining: 41.8s


7600:	learn: 0.0593008	test: 0.0762831	best: 0.0762830 (7558)	total: 8m 46s	remaining: 27.7s


7800:	learn: 0.0588789	test: 0.0762761	best: 0.0762739 (7764)	total: 8m 57s	remaining: 13.7s


7999:	learn: 0.0584976	test: 0.0762700	best: 0.0762689 (7975)	total: 9m 7s	remaining: 0us
bestTest = 0.07626885851
bestIteration = 7975
Shrink model to first 7976 iterations.


CAT fold ACC=0.962478; best_iter=7975
Fold 5 elapsed 2815.0s



OOF XGB: 0.962498; per-fold: 0.962311, 0.962483, 0.962624, 0.962311, 0.962760
OOF CAT: 0.962419; per-fold: 0.962121, 0.962536, 0.962640, 0.962318, 0.962478


Best OOF blend: w_xgb=0.57, acc=0.962619


Saved submission.csv (blended) with shape (400000, 2)
Total elapsed: 12803.6s
