# Battery RUL Prediction - Complete End-to-End Training

This notebook walks through a complete workflow for predicting battery remaining useful life (RUL), from data acquisition to trained model export.

## 🎯 What This Notebook Does

1. **Data Acquisition** - Auto-loads dataset or generates if needed
2. **Feature Engineering** - Creates ML-ready features
3. **Model Training** - GPU-accelerated CatBoost
4. **Evaluation** - Comprehensive metrics and visualizations
5. **Export** - Multi-format model packaging

## ⏱️ Runtime

- With uploaded dataset: **~30 minutes**
- With auto-generation: **~45 minutes**

## 🔧 Requirements

1. **Enable GPU** (Settings → Accelerator → GPU P100/T4)
2. **Enable Internet** (Settings → Internet → On)
3. **Optional**: Upload battery-rul-parquet dataset for faster execution

## 📊 Expected Output

- Trained CatBoost model (.cbm, .onnx)
- Model metadata (metrics, features)
- Feature importance rankings
- Visualizations (4 PNG plots)
- Training report
- Deployment package (.zip)

## 📋 Step 1: Environment Setup

In [None]:
%%time
import sys
import subprocess

print("🔧 Installing dependencies...")
print("(This may take 2-3 minutes)\n")

# Install required packages
!pip install -q catboost==1.2 pyarrow==15.0.0 pandas==2.1.4 scikit-learn matplotlib seaborn

# Also install data generation deps in case we need to generate
!pip install -q numpy scipy pytz faker tqdm

print("\n✅ Dependencies installed successfully!")

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import os
from datetime import datetime, timedelta

# ML libraries
from catboost import CatBoostRegressor, Pool
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Configure plotting
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully")

# Check GPU availability (subprocess was imported in the setup cell above)
try:
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    gpu_available = result.returncode == 0
    print(f"\n🎮 GPU Available: {gpu_available}")
    if gpu_available:
        print("   GPU detected - Training will use GPU acceleration")
except Exception:
    # nvidia-smi is missing or could not be run
    gpu_available = False
    print("\n⚠️  GPU status unknown - Will attempt GPU training anyway")

## 📥 Step 2: Smart Data Acquisition

The next cell chooses the data source automatically:
- **Option A**: If you uploaded a dataset, it loads from `/kaggle/input/`
- **Option B**: If no dataset found, generates a small training set (7 days, 24 batteries)

### To use uploaded dataset:
1. Upload `battery-rul-parquet` as a Kaggle dataset
2. Add it to this notebook (Add Data → Your Datasets)
3. The cell will automatically detect and use it

In [None]:
%%time
print("="*80)
print("DATA ACQUISITION")
print("="*80)

# Check for uploaded Parquet dataset
KAGGLE_INPUT = Path('/kaggle/input')
possible_paths = [
    KAGGLE_INPUT / 'battery-rul-parquet',
    KAGGLE_INPUT / 'battery-rul-training-data',
    KAGGLE_INPUT / 'battery-rul-data'
]

dataset_path = None
for path in possible_paths:
    if path.exists():
        dataset_path = path
        print(f"\n✅ Found uploaded dataset at: {path}")
        print(f"   Using pre-generated data (fast mode)")
        break

if dataset_path is None:
    print("\n⚠️  No uploaded dataset found")
    print("   Will generate training data (takes ~15 minutes)")
    print("\n📊 Generation Parameters:")
    print("   - Duration: 7 days")
    print("   - Batteries: 24 (1 string)")
    print("   - Sampling: 60 seconds")
    print("   - Expected records: ~10,080 telemetry samples per battery")
    print("\n🚀 Starting data generation...\n")
    
    # Clone repository if needed
    if not os.path.exists('battery-rul-data-generation'):
        !git clone -q https://github.com/khiwniti/battery-rul-data-generation.git
        print("✓ Repository cloned")
    
    os.chdir('battery-rul-data-generation')
    
    # Generate data
    !python generate_battery_data.py --duration-days 7 --limit-batteries 24 --sampling-seconds 60 --output-dir ./output/training_data --seed 42
    
    # Resolve to an absolute path before leaving the repository directory;
    # a relative path would no longer point at the data after the chdir below
    dataset_path = Path('./output/training_data').resolve()
    os.chdir('/kaggle/working')
    print("\n✅ Data generation complete!")

print(f"\n📁 Dataset location: {dataset_path}")
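
As a sanity check on generated data, the expected telemetry volume follows directly from the generation parameters. The sketch below assumes the generator emits one row per battery per sampling interval; `expected_samples` is an illustrative helper, not part of the generator repository.

```python
def expected_samples(duration_days: int, sampling_seconds: int, n_batteries: int = 1) -> int:
    """Rows expected if every battery reports once per sampling interval."""
    seconds_per_day = 24 * 60 * 60
    samples_per_battery = duration_days * seconds_per_day // sampling_seconds
    return samples_per_battery * n_batteries


# Parameters passed to generate_battery_data.py in the cell above
per_battery = expected_samples(duration_days=7, sampling_seconds=60)
total = expected_samples(duration_days=7, sampling_seconds=60, n_batteries=24)
print(f"Per battery:   {per_battery:,}")
print(f"All batteries: {total:,}")
```

Comparing these numbers with the telemetry row count after loading is a quick way to spot a truncated generation run.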

## 📊 Step 3: Load and Prepare Data

In [None]:
%%time
print("="*80)
print("DATA LOADING")
print("="*80)

# Determine file format (Parquet or CSV)
use_parquet = (dataset_path / 'telemetry' / 'raw_telemetry.parquet').exists()
use_csv = (dataset_path / 'telemetry_jar_raw.csv.gz').exists()

if use_parquet:
    print("\n📦 Loading from Parquet files...")
    
    # Load master data
    df_battery = pd.read_parquet(dataset_path / 'master' / 'battery.parquet')
    df_location = pd.read_parquet(dataset_path / 'master' / 'location.parquet')
    
    # Load telemetry
    df_raw_telemetry = pd.read_parquet(dataset_path / 'telemetry' / 'raw_telemetry.parquet')
    df_calc_telemetry = pd.read_parquet(dataset_path / 'telemetry' / 'calc_telemetry.parquet')
    
    # Load RUL predictions
    df_rul = pd.read_parquet(dataset_path / 'ml' / 'rul_predictions.parquet')
    
    # Load feature store if exists
    feature_store_path = dataset_path / 'ml' / 'feature_store.parquet'
    if feature_store_path.exists():
        df_features = pd.read_parquet(feature_store_path)
    else:
        df_features = None
        print("   ⚠️  Feature store not found, will create from raw data")
    
elif use_csv:
    print("\n📄 Loading from CSV files...")
    
    # Load master data
    df_battery = pd.read_csv(dataset_path / 'battery.csv')
    df_location = pd.read_csv(dataset_path / 'location.csv')
    
    # Load telemetry
    df_raw_telemetry = pd.read_csv(dataset_path / 'telemetry_jar_raw.csv.gz')
    df_calc_telemetry = pd.read_csv(dataset_path / 'telemetry_jar_calc.csv')
    
    # Load RUL predictions
    df_rul = pd.read_csv(dataset_path / 'rul_prediction.csv')
    
    # Load feature store if exists
    feature_store_path = dataset_path / 'feature_store.csv.gz'
    if feature_store_path.exists():
        df_features = pd.read_csv(feature_store_path)
    else:
        df_features = None
        print("   ⚠️  Feature store not found, will create from raw data")
else:
    raise FileNotFoundError("No valid data files found in dataset")

print(f"\n✅ Data loaded successfully!")
print(f"   Batteries: {len(df_battery)}")
print(f"   Locations: {len(df_location)}")
print(f"   Telemetry records: {len(df_raw_telemetry):,}")
print(f"   RUL predictions: {len(df_rul):,}")
if df_features is not None:
    print(f"   Feature store records: {len(df_features):,}")
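
Before building features, it can help to confirm that the telemetry table has the columns the aggregation step relies on. The following is a minimal sketch: the column names are the ones used later in this notebook, while `missing_telemetry_columns` and the demo frame are hypothetical.

```python
import pandas as pd

# Columns consumed by the hourly aggregation in this notebook
REQUIRED_TELEMETRY_COLUMNS = [
    'battery_id', 'voltage_v', 'temperature_c', 'resistance_mohm', 'current_a',
]


def missing_telemetry_columns(df: pd.DataFrame) -> list:
    """Return the required columns that are absent from df."""
    missing = [c for c in REQUIRED_TELEMETRY_COLUMNS if c not in df.columns]
    # The timestamp column may be named either 'ts' or 'timestamp'
    if 'ts' not in df.columns and 'timestamp' not in df.columns:
        missing.append('ts|timestamp')
    return missing


# Made-up frame that lacks two of the required columns
demo = pd.DataFrame({
    'battery_id': [1],
    'ts': ['2024-01-01'],
    'voltage_v': [12.8],
    'temperature_c': [25.0],
})
missing = missing_telemetry_columns(demo)
print(f"Missing columns: {missing}")
```

In the notebook, the same check could be applied to `df_raw_telemetry` so that a schema mismatch fails early rather than inside the `groupby`.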

In [None]:
# Create feature store if it doesn't exist
if df_features is None:
    print("🔨 Creating feature store from raw telemetry...")
    print("   (This may take a few minutes)\n")
    
    # Convert timestamps
    ts_col = 'ts' if 'ts' in df_raw_telemetry.columns else 'timestamp'
    df_raw_telemetry[ts_col] = pd.to_datetime(df_raw_telemetry[ts_col])
    
    # Create hourly features (pd.Grouper labels each hourly window by its start time)
    df_features = df_raw_telemetry.groupby([
        'battery_id',
        pd.Grouper(key=ts_col, freq='1H')
    ]).agg({
        'voltage_v': ['mean', 'std', 'min', 'max'],
        'temperature_c': ['mean', 'std', 'min', 'max'],
        'resistance_mohm': ['mean', 'std'],
        'current_a': ['mean', 'max']
    }).reset_index()
    
    # Flatten column names
    df_features.columns = ['battery_id', 'window_end'] + [
        f"{col[0].replace('_v','').replace('_c','').replace('_mohm','').replace('_a','')}_{col[1]}"
        for col in df_features.columns[2:]
    ]
    
    # Rename to match expected schema
    df_features = df_features.rename(columns={
        'voltage_mean': 'v_mean',
        'voltage_std': 'v_std',
        'voltage_min': 'v_min',
        'voltage_max': 'v_max',
        'temperature_mean': 't_mean',
        'temperature_std': 't_std',
        'temperature_min': 't_min',
        'temperature_max': 't_max',
        'resistance_mean': 'r_internal_latest',
        'resistance_std': 'r_internal_trend',
        'current_mean': 'current_mean',
        'current_max': 'current_max'
    })
    
    # Add derived features
    df_features['v_range'] = df_features['v_max'] - df_features['v_min']
    df_features['t_delta_from_ambient'] = df_features['t_mean'] - 25.0  # Assume 25°C ambient
    
    # Add operational features (approximations)
    df_features['discharge_cycles_count'] = 0  # Would need more complex logic
    df_features['ah_throughput'] = df_features['current_mean'] * 1.0  # Simplified
    df_features['time_at_high_temp_pct'] = (df_features['t_max'] > 35).astype(float)
    
    print(f"✅ Feature store created: {len(df_features):,} records")

# Display sample
print("\n📊 Feature Store Sample:")
print(df_features.head())
print(f"\nFeatures available: {df_features.columns.tolist()}")
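
The hourly aggregation and column flattening above can be easier to follow on a tiny example. This sketch uses made-up readings and a generic flattening expression rather than the notebook's exact renaming. With pandas defaults, an hourly `pd.Grouper` labels each bin by its left edge, so the timestamp column holds the start of each hour.

```python
import pandas as pd

# Three made-up readings for one battery, spanning two hourly windows
demo = pd.DataFrame({
    'battery_id': [1, 1, 1],
    'ts': pd.to_datetime(['2024-01-01 00:10', '2024-01-01 00:50', '2024-01-01 01:05']),
    'voltage_v': [12.8, 12.6, 12.7],
})

hourly = (
    demo.groupby(['battery_id', pd.Grouper(key='ts', freq=pd.Timedelta(hours=1))])
        .agg({'voltage_v': ['mean', 'max']})
        .reset_index()
)

# Flatten the two-level column index: ('voltage_v', 'mean') -> 'voltage_v_mean'
hourly.columns = ['_'.join(filter(None, col)) for col in hourly.columns]

window_labels = hourly['ts'].tolist()
print(hourly)
```

The readings at 00:10 and 00:50 fall into the window labeled 00:00, and the 01:05 reading into the window labeled 01:00.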

## 🔬 Step 4: Feature Engineering & Data Preparation

In [None]:
%%time
print("="*80)
print("FEATURE ENGINEERING")
print("="*80)

# Merge features with RUL labels
print("\n🔗 Merging features with RUL labels...")

# Ensure timestamps are datetime
df_features['window_end'] = pd.to_datetime(df_features['window_end'])
rul_ts_col = 'prediction_time' if 'prediction_time' in df_rul.columns else 'timestamp'
df_rul[rul_ts_col] = pd.to_datetime(df_rul[rul_ts_col])

# Merge using nearest timestamp
df_train = pd.merge_asof(
    df_features.sort_values('window_end'),
    df_rul.sort_values(rul_ts_col),
    left_on='window_end',
    right_on=rul_ts_col,
    by='battery_id',
    direction='nearest',
    tolerance=pd.Timedelta('2 hours')
)

# Remove rows without RUL labels
df_train = df_train.dropna(subset=['rul_days'])

print(f"✓ Training samples after merge: {len(df_train):,}")
print(f"✓ Batteries in training set: {df_train['battery_id'].nunique()}")

# Define feature columns
voltage_features = ['v_mean', 'v_std', 'v_min', 'v_max', 'v_range']
temperature_features = ['t_mean', 't_std', 't_min', 't_max', 't_delta_from_ambient']
resistance_features = ['r_internal_latest', 'r_internal_trend']
operational_features = ['discharge_cycles_count', 'ah_throughput', 'time_at_high_temp_pct']

# Combine all features
feature_cols = (
    voltage_features + 
    temperature_features + 
    resistance_features + 
    operational_features
)

# Keep only features that exist
feature_cols = [f for f in feature_cols if f in df_train.columns]

print(f"\n📊 Features selected for training ({len(feature_cols)}):")
for i, feat in enumerate(feature_cols, 1):
    print(f"  {i:2d}. {feat}")

# Create derived features
print("\n🔨 Creating derived features...")

# Voltage health indicator
df_train['v_health_score'] = (
    (df_train['v_mean'] - 11.5) / (13.65 - 11.5)
).clip(0, 1)

# Temperature stress indicator
df_train['t_stress_score'] = (
    (df_train['t_max'] - 25) / 20
).clip(0, 1)

# Add derived features to list
feature_cols.extend(['v_health_score', 't_stress_score'])

# Handle missing values
print("\n🧹 Cleaning data...")
for col in feature_cols:
    if df_train[col].isnull().any():
        # Assign the result back; inplace=True on a column selection
        # may operate on a copy and leave df_train unchanged
        df_train[col] = df_train[col].fillna(df_train[col].median())

print(f"✓ Final dataset shape: {df_train.shape}")
print(f"✓ Total features: {len(feature_cols)}")
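
The nearest-timestamp merge is the step that decides which feature windows receive a label. Here is a small sketch of how `pd.merge_asof` behaves with `direction='nearest'` and a two-hour `tolerance`; all values are made up for illustration.

```python
import pandas as pd

features = pd.DataFrame({
    'battery_id': [1, 1],
    'window_end': pd.to_datetime(['2024-01-01 10:00', '2024-01-01 20:00']),
    'v_mean': [12.9, 12.7],
})
labels = pd.DataFrame({
    'battery_id': [1],
    'prediction_time': pd.to_datetime(['2024-01-01 11:00']),
    'rul_days': [400.0],
})

merged = pd.merge_asof(
    features.sort_values('window_end'),       # both sides must be sorted on the key
    labels.sort_values('prediction_time'),
    left_on='window_end',
    right_on='prediction_time',
    by='battery_id',                          # match only within the same battery
    direction='nearest',
    tolerance=pd.Timedelta('2 hours'),
)
print(merged[['window_end', 'rul_days']])
```

The 10:00 window is one hour from the label and receives it. The 20:00 window is nine hours away, so its `rul_days` is `NaN` and the later `dropna` removes that row.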

## 📈 Step 5: Exploratory Data Analysis

In [None]:
# RUL distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].hist(df_train['rul_days'], bins=50, edgecolor='black', alpha=0.7, color='steelblue')
axes[0].set_xlabel('RUL (days)', fontsize=12)
axes[0].set_ylabel('Frequency', fontsize=12)
axes[0].set_title('RUL Distribution', fontsize=14, fontweight='bold')
axes[0].axvline(df_train['rul_days'].median(), color='red', linestyle='--', 
                linewidth=2, label=f'Median: {df_train["rul_days"].median():.1f}')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].boxplot(df_train['rul_days'], patch_artist=True,
                boxprops=dict(facecolor='lightblue'))
axes[1].set_ylabel('RUL (days)', fontsize=12)
axes[1].set_title('RUL Box Plot', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('/kaggle/working/rul_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\n📊 RUL Statistics:")
print(df_train['rul_days'].describe())

In [None]:
# Feature correlations with RUL
correlations = df_train[feature_cols + ['rul_days']].corr()['rul_days'].drop('rul_days').sort_values()

plt.figure(figsize=(10, 8))
colors = ['red' if x < 0 else 'green' for x in correlations]
correlations.plot(kind='barh', color=colors, alpha=0.7)
plt.xlabel('Correlation with RUL', fontsize=12)
plt.title('Feature Correlations with RUL', fontsize=14, fontweight='bold')
plt.axvline(x=0, color='black', linestyle='-', linewidth=0.5)
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig('/kaggle/working/feature_correlations.png', dpi=150, bbox_inches='tight')
plt.show()

print("\n📊 Top 10 features correlated with RUL:")
print(correlations.abs().sort_values(ascending=False).head(10))

## 🎯 Step 6: Train/Test Split

In [None]:
# Prepare features and target
X = df_train[feature_cols].copy()
y = df_train['rul_days'].copy()

print(f"📊 Dataset Summary:")
print(f"   Feature matrix shape: {X.shape}")
print(f"   Target vector shape: {y.shape}")
print(f"   RUL range: {y.min():.1f} - {y.max():.1f} days")

# Stratified split by RUL bins
rul_bins = pd.cut(y, bins=5, labels=['very_low', 'low', 'medium', 'high', 'very_high'])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42,
    stratify=rul_bins
)

print(f"\n✅ Train/Test Split Complete:")
print(f"   Training set: {len(X_train):,} samples ({len(X_train)/len(X)*100:.1f}%)")
print(f"   Test set: {len(X_test):,} samples ({len(X_test)/len(X)*100:.1f}%)")
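
The split above samples rows at random, so hourly windows from the same battery can land in both the training and test sets. If the goal is to estimate performance on batteries the model has never seen, a group-aware split is one alternative. This is a sketch using scikit-learn's `GroupShuffleSplit` on synthetic data, not the method used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for df_train: 10 batteries with 5 hourly windows each
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'battery_id': np.repeat(np.arange(10), 5),
    'v_mean': rng.normal(12.8, 0.2, size=50),
    'rul_days': rng.uniform(100, 1000, size=50),
})

# Hold out 20% of the batteries (not 20% of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(demo, groups=demo['battery_id']))

train_batteries = set(demo.iloc[train_idx]['battery_id'])
test_batteries = set(demo.iloc[test_idx]['battery_id'])
print(f"Train batteries: {len(train_batteries)}, test batteries: {len(test_batteries)}")
print(f"Overlap: {train_batteries & test_batteries}")
```

With only 24 batteries in the generated dataset, a group split leaves few batteries in the test set, so the resulting metrics would be noisier than with the row-level split.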

## 🚀 Step 7: GPU-Accelerated Model Training

This step trains a CatBoost regression model with GPU acceleration.
Training typically takes 10-20 minutes, depending on dataset size.
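
The training cell hard-codes `task_type='GPU'`, which fails if no GPU is attached to the session. A minimal sketch of a fallback follows; `pick_task_type` is a hypothetical helper, and the resulting dictionary is only an example of keyword arguments that could be passed to `CatBoostRegressor`.

```python
import shutil
import subprocess


def pick_task_type() -> str:
    """Return 'GPU' if nvidia-smi runs successfully, otherwise 'CPU'."""
    if shutil.which('nvidia-smi') is None:
        return 'CPU'
    try:
        result = subprocess.run(['nvidia-smi'], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return 'CPU'
    return 'GPU' if result.returncode == 0 else 'CPU'


task_type = pick_task_type()

# Keyword arguments that could be unpacked as CatBoostRegressor(**params)
params = {'task_type': task_type, 'iterations': 2000, 'learning_rate': 0.05, 'depth': 8}
if task_type == 'GPU':
    params['devices'] = '0'  # only meaningful when training on GPU
print(params)
```

CPU training would take noticeably longer than the runtime quoted above, but it lets the rest of the notebook run when the accelerator is unavailable.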

In [None]:
# Create CatBoost pools
train_pool = Pool(X_train, y_train)
test_pool = Pool(X_test, y_test)

print("✅ CatBoost data pools created")

In [None]:
%%time
print("="*80)
print("MODEL TRAINING")
print("="*80)

# Configure CatBoost model with GPU
model = CatBoostRegressor(
    # GPU Configuration
    task_type='GPU',
    devices='0',
    
    # Model hyperparameters
    iterations=2000,
    learning_rate=0.05,
    depth=8,
    l2_leaf_reg=3,
    
    # Loss function
    loss_function='RMSE',
    eval_metric='MAE',
    
    # Regularization
    random_strength=1,
    bagging_temperature=1,
    
    # Early stopping
    early_stopping_rounds=100,
    use_best_model=True,
    
    # Output
    verbose=100,
    random_seed=42
)

print("\n🔧 Model Configuration:")
print(f"   Task type: GPU")
print(f"   Iterations: 2000")
print(f"   Learning rate: 0.05")
print(f"   Tree depth: 8")
print(f"   Early stopping: 100 rounds")

print("\n🚀 Starting training...\n")
start_time = datetime.now()

model.fit(
    train_pool,
    eval_set=test_pool,
    plot=True
)

training_time = (datetime.now() - start_time).total_seconds()
print(f"\n{'='*80}")
print(f"✅ Training completed!")
print(f"   Time: {training_time:.1f} seconds ({training_time/60:.1f} minutes)")
print(f"   Best iteration: {model.get_best_iteration()}")
print(f"{'='*80}")

## 📊 Step 8: Model Evaluation

In [None]:
# Make predictions
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Calculate metrics
train_mae = mean_absolute_error(y_train, y_pred_train)
test_mae = mean_absolute_error(y_test, y_pred_test)

train_rmse = np.sqrt(mean_squared_error(y_train, y_pred_train))
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))

train_r2 = r2_score(y_train, y_pred_train)
test_r2 = r2_score(y_test, y_pred_test)

# Display results
print("="*80)
print("MODEL PERFORMANCE")
print("="*80)
print(f"\n📈 Training Set:")
print(f"   MAE:  {train_mae:.2f} days")
print(f"   RMSE: {train_rmse:.2f} days")
print(f"   R²:   {train_r2:.4f}")

print(f"\n📊 Test Set:")
print(f"   MAE:  {test_mae:.2f} days")
print(f"   RMSE: {test_rmse:.2f} days")
print(f"   R²:   {test_r2:.4f}")

print(f"\n🎯 Overfitting Check:")
print(f"   MAE gap:  {abs(test_mae - train_mae):.2f} days")
print(f"   RMSE gap: {abs(test_rmse - train_rmse):.2f} days")

# Accuracy within thresholds
test_errors = np.abs(y_test - y_pred_test)
within_7_days = (test_errors <= 7).mean() * 100
within_30_days = (test_errors <= 30).mean() * 100
within_60_days = (test_errors <= 60).mean() * 100

print(f"\n🎯 Prediction Accuracy:")
print(f"   Within 7 days:  {within_7_days:.1f}%")
print(f"   Within 30 days: {within_30_days:.1f}%")
print(f"   Within 60 days: {within_60_days:.1f}%")
print("="*80)

# Success criteria check
print("\n✅ Success Criteria:")
if test_mae < 30:
    print(f"   ✓ MAE < 30 days: PASS ({test_mae:.2f} days)")
else:
    print(f"   ✗ MAE < 30 days: FAIL ({test_mae:.2f} days)")

if test_r2 > 0.85:
    print(f"   ✓ R² > 0.85: PASS ({test_r2:.4f})")
else:
    print(f"   ✗ R² > 0.85: FAIL ({test_r2:.4f})")
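
An absolute threshold such as MAE < 30 days is easier to interpret next to a trivial baseline. The sketch below compares a model's MAE with that of always predicting the training-set median. The arrays are made-up values in days, and `mae` and `within_days_pct` are illustrative helpers.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def within_days_pct(y_true, y_pred, threshold_days: float) -> float:
    """Percentage of predictions whose absolute error is at most threshold_days."""
    errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float((errors <= threshold_days).mean() * 100)


# Made-up RUL values in days, for illustration only
y_train_demo = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
y_test_demo = np.array([150.0, 250.0, 450.0])
y_pred_demo = np.array([160.0, 240.0, 400.0])

# Baseline: predict the training median for every test sample
baseline_pred = np.full_like(y_test_demo, np.median(y_train_demo))

model_mae = mae(y_test_demo, y_pred_demo)
baseline_mae = mae(y_test_demo, baseline_pred)
pct_within_30 = within_days_pct(y_test_demo, y_pred_demo, 30)

print(f"Model MAE:      {model_mae:.2f} days")
print(f"Baseline MAE:   {baseline_mae:.2f} days")
print(f"Within 30 days: {pct_within_30:.1f}%")
```

In the notebook, the same comparison could use `y_train`, `y_test`, and `y_pred_test`. A model MAE close to the baseline would suggest the features add little beyond the label distribution.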

In [None]:
# Visualization: Predicted vs Actual
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Scatter plot
axes[0].scatter(y_test, y_pred_test, alpha=0.5, s=20, color='steelblue')
axes[0].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 
             'r--', lw=2, label='Perfect prediction')
axes[0].set_xlabel('Actual RUL (days)', fontsize=12)
axes[0].set_ylabel('Predicted RUL (days)', fontsize=12)
axes[0].set_title(f'Predicted vs Actual RUL\nMAE: {test_mae:.2f} days, R²: {test_r2:.4f}',
                  fontsize=14, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Residuals
residuals = y_test - y_pred_test
axes[1].scatter(y_pred_test, residuals, alpha=0.5, s=20, color='coral')
axes[1].axhline(y=0, color='r', linestyle='--', lw=2)
axes[1].set_xlabel('Predicted RUL (days)', fontsize=12)
axes[1].set_ylabel('Residuals (days)', fontsize=12)
axes[1].set_title('Residual Plot', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('/kaggle/working/prediction_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

## 🔍 Step 9: Feature Importance Analysis

In [None]:
# Get feature importance
feature_importance = model.get_feature_importance()
feature_names = X_train.columns

# Create DataFrame
importance_df = pd.DataFrame({
    'feature': feature_names,
    'importance': feature_importance
}).sort_values('importance', ascending=False)

print("🔍 Feature Importance (Top 15):")
print(importance_df.head(15).to_string(index=False))

# Save to CSV
importance_df.to_csv('/kaggle/working/feature_importance.csv', index=False)
print("\n✅ Feature importance saved to feature_importance.csv")

In [None]:
# Visualize feature importance
plt.figure(figsize=(10, 8))
top_n = min(20, len(importance_df))
top_features = importance_df.head(top_n)

colors = plt.cm.viridis(np.linspace(0, 1, len(top_features)))
plt.barh(range(len(top_features)), top_features['importance'], color=colors)
plt.yticks(range(len(top_features)), top_features['feature'])
plt.xlabel('Importance', fontsize=12)
plt.title(f'Top {top_n} Feature Importance', fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig('/kaggle/working/feature_importance.png', dpi=150, bbox_inches='tight')
plt.show()

## 💾 Step 10: Model Export

This step saves the model in multiple formats for deployment.

In [None]:
%%time
print("="*80)
print("MODEL EXPORT")
print("="*80)

model_dir = Path('/kaggle/working')

# 1. Native CatBoost format (.cbm)
model_path_cbm = model_dir / 'rul_model.cbm'
model.save_model(str(model_path_cbm))
print(f"\n✅ Model saved (CatBoost): {model_path_cbm}")
print(f"   Size: {model_path_cbm.stat().st_size / 1024 / 1024:.2f} MB")

# 2. ONNX format for deployment
try:
    model_path_onnx = model_dir / 'rul_model.onnx'
    model.save_model(
        str(model_path_onnx),
        format="onnx",
        export_parameters={
            'onnx_domain': 'ai.catboost',
            'onnx_model_version': 1,
            'onnx_doc_string': 'Battery RUL Prediction Model'
        }
    )
    print(f"✅ Model saved (ONNX): {model_path_onnx}")
    print(f"   Size: {model_path_onnx.stat().st_size / 1024 / 1024:.2f} MB")
except Exception as e:
    print(f"⚠️  ONNX export not available: {e}")

In [None]:
# Save model metadata
metadata = {
    'model_type': 'CatBoostRegressor',
    'task': 'Battery RUL Prediction',
    'target': 'rul_days',
    'training_date': datetime.now().isoformat(),
    'training_time_seconds': training_time,
    
    # Data info
    'training_samples': len(X_train),
    'test_samples': len(X_test),
    'features': feature_cols,
    'num_features': len(feature_cols),
    
    # Hyperparameters
    'hyperparameters': {
        'iterations': model.get_param('iterations'),
        'learning_rate': model.get_param('learning_rate'),
        'depth': model.get_param('depth'),
        'l2_leaf_reg': model.get_param('l2_leaf_reg'),
    },
    
    # Performance metrics
    'metrics': {
        'train': {
            'mae': float(train_mae),
            'rmse': float(train_rmse),
            'r2': float(train_r2)
        },
        'test': {
            'mae': float(test_mae),
            'rmse': float(test_rmse),
            'r2': float(test_r2)
        },
        'accuracy_thresholds': {
            'within_7_days_pct': float(within_7_days),
            'within_30_days_pct': float(within_30_days),
            'within_60_days_pct': float(within_60_days)
        }
    },
    
    # Feature importance
    'top_10_features': importance_df.head(10).to_dict('records')
}

# Save metadata
metadata_path = model_dir / 'model_metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f, indent=2)

print(f"\n✅ Metadata saved: {metadata_path}")
print(f"   Test MAE: {metadata['metrics']['test']['mae']:.2f} days")
print(f"   Test R²: {metadata['metrics']['test']['r2']:.4f}")
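
The `features` list stored in the metadata records the column order the model was trained on. A consumer of the model can use it to align incoming data before calling `predict`. This is a sketch under that assumption: `align_features` is a hypothetical helper, and the inline JSON stands in for the contents of `model_metadata.json`.

```python
import json

import pandas as pd


def align_features(df: pd.DataFrame, metadata: dict) -> pd.DataFrame:
    """Return df restricted to the training features, in training order."""
    expected = metadata['features']
    missing = [name for name in expected if name not in df.columns]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    return df[expected]


# Stand-in for the saved metadata; the names are a subset of this notebook's features
metadata_demo = json.loads('{"features": ["v_mean", "t_max", "r_internal_latest"]}')

# Incoming data with shuffled column order and one extra column
incoming = pd.DataFrame({
    't_max': [31.0],
    'r_internal_latest': [4.2],
    'v_mean': [12.9],
    'extra': [0],
})
aligned = align_features(incoming, metadata_demo)
print(aligned.columns.tolist())
```

Raising on missing features, rather than filling them silently, keeps a schema drift between training and serving visible.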

In [None]:
# Create deployment package
import zipfile

deployment_package = model_dir / 'rul_model_deployment.zip'

with zipfile.ZipFile(deployment_package, 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Add model files
    zipf.write(model_path_cbm, 'rul_model.cbm')
    if (model_dir / 'rul_model.onnx').exists():
        zipf.write(model_dir / 'rul_model.onnx', 'rul_model.onnx')
    
    # Add metadata and documentation
    zipf.write(metadata_path, 'model_metadata.json')
    zipf.write(model_dir / 'feature_importance.csv', 'feature_importance.csv')
    
    # Add visualizations
    for viz in ['rul_distribution.png', 'feature_correlations.png', 
                'prediction_analysis.png', 'feature_importance.png']:
        viz_path = model_dir / viz
        if viz_path.exists():
            zipf.write(viz_path, viz)

print(f"\n✅ Deployment package created: {deployment_package}")
print(f"   Size: {deployment_package.stat().st_size / 1024 / 1024:.2f} MB")
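
After packaging, a quick integrity check can confirm that the archive is readable and contains the expected files. The sketch below uses only the standard library; `verify_package` is a hypothetical helper and is exercised on a throwaway archive with placeholder contents.

```python
import tempfile
import zipfile
from pathlib import Path


def verify_package(zip_path, required_names):
    """Return required members that are absent; raise if the archive is corrupt."""
    with zipfile.ZipFile(zip_path) as zf:
        bad_member = zf.testzip()  # None when every member passes its CRC check
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member: {bad_member}")
        present = set(zf.namelist())
    return [name for name in required_names if name not in present]


# Demo on a temporary archive; in the notebook, pass deployment_package instead
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / 'demo.zip'
    with zipfile.ZipFile(demo_zip, 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('rul_model.cbm', b'placeholder bytes')
        zf.writestr('model_metadata.json', '{}')
    missing = verify_package(
        demo_zip,
        ['rul_model.cbm', 'model_metadata.json', 'feature_importance.csv'],
    )

print(f"Missing from package: {missing}")
```

An empty list means every required file is present. The ONNX file is best left out of the required list, since its export is optional in this notebook.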

## ✅ Step 11: Model Verification

In [None]:
print("="*80)
print("MODEL VERIFICATION")
print("="*80)

# Test model loading
print("\n🔍 Testing model loading...")
test_model = CatBoostRegressor()
test_model.load_model(str(model_path_cbm))
print("✅ Model loaded successfully")

# Test prediction
print("\n🔍 Testing prediction...")
sample_prediction = test_model.predict(X_test.iloc[:1])
print(f"✅ Sample prediction: {sample_prediction[0]:.1f} days")
print(f"   Actual RUL: {y_test.iloc[0]:.1f} days")
print(f"   Error: {abs(sample_prediction[0] - y_test.iloc[0]):.1f} days")

# List all output files
print("\n📂 Output Files:")
print("="*80)
for file in sorted(model_dir.glob('*')):
    if file.is_file() and not file.name.startswith('.'):
        size_mb = file.stat().st_size / 1024 / 1024
        print(f"   {file.name:40s} {size_mb:8.2f} MB")
print("="*80)

print("\n🎉 Model verification complete!")
print("\n✅ ALL OUTPUTS READY FOR DOWNLOAD")

## 📥 Download Instructions

### Method 1: Direct Download (Easiest)

1. Click the **folder icon** (📁) on the left sidebar
2. Navigate to files in `/kaggle/working/`
3. Click **three dots** (⋮) next to each file
4. Select **"Download"**

### Method 2: Kaggle API

After saving this notebook version:

```bash
kaggle kernels output YOUR_USERNAME/battery-rul-training -p ./model
```

### Files to Download

- `rul_model.cbm` - Main model file (1-5 MB)
- `model_metadata.json` - Performance metrics
- `feature_importance.csv` - Feature rankings
- `rul_model_deployment.zip` - Complete package
- `*.png` - Visualizations (4 files)

---

## 🚀 Next Steps

1. **Download Model**: Use methods above
2. **Test Locally**: Load and test model
3. **Deploy**: Integrate with backend API
4. **Monitor**: Track performance in production

---

**🎉 Congratulations! You've successfully trained a Battery RUL prediction model!**