# 🛩️ Aircraft Engine RUL Prediction - ML/DL Models

## Remaining Useful Life Prediction using Machine Learning & Deep Learning

This notebook builds and compares multiple approaches for predicting Remaining Useful Life (RUL) of aircraft turbofan engines using the NASA C-MAPSS dataset.

### Models Implemented:
1. **Random Forest Regressor** - Baseline ML model
2. **Gradient Boosting (XGBoost)** - Advanced ensemble method
3. **LSTM Neural Network** - Deep learning for sequence modeling
4. **1D CNN** - Convolutional approach for time series

### Evaluation Metric:
The PHM scoring function penalizes late predictions (overestimated RUL) more heavily than early ones:
$$s = \sum_{i=1}^{n} s_i, \qquad s_i = \begin{cases} e^{-d_i/a_1} - 1 & \text{if } d_i < 0 \text{ (early)} \\ e^{d_i/a_2} - 1 & \text{if } d_i \geq 0 \text{ (late)} \end{cases}$$
where $d_i = \widehat{RUL}_i - RUL_{\text{true},i}$ is the prediction error for engine $i$, $a_1 = 13$, and $a_2 = 10$. Lower scores are better.
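
To make the asymmetry concrete, consider two predictions that miss by the same 10 cycles in opposite directions. This is a worked example using the constants above, with values rounded to two decimals:

$$d = -10 \text{ (early)}: \quad e^{10/13} - 1 \approx 1.16 \qquad\qquad d = +10 \text{ (late)}: \quad e^{10/10} - 1 \approx 1.72$$

The late prediction costs roughly 1.5 times as much as the early one, and the ratio grows as the size of the error increases.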


## 1. Import Libraries


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings

# Scikit-learn
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# XGBoost
try:
    import xgboost as xgb
    XGBOOST_AVAILABLE = True
except ImportError:
    print("⚠️ XGBoost not installed. Run: pip install xgboost")
    XGBOOST_AVAILABLE = False

# Deep Learning
try:
    import tensorflow as tf
    from tensorflow.keras.models import Sequential, Model
    from tensorflow.keras.layers import (Dense, LSTM, Dropout, Conv1D, MaxPooling1D, 
                                          Flatten, BatchNormalization, Input, Bidirectional)
    from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
    from tensorflow.keras.optimizers import Adam
    TENSORFLOW_AVAILABLE = True
    print(f"✅ TensorFlow version: {tf.__version__}")
except ImportError:
    print("⚠️ TensorFlow not installed. Run: pip install tensorflow")
    TENSORFLOW_AVAILABLE = False

warnings.filterwarnings('ignore')

# Plotting settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['font.size'] = 11

COLORS = {
    'primary': '#1E3A5F',
    'secondary': '#3D7EAA',
    'accent': '#F39C12',
    'success': '#27AE60',
    'danger': '#E74C3C',
    'warning': '#F1C40F'
}

print("✅ Libraries loaded successfully!")


## 2. Load and Prepare Data


In [None]:
# Data path
DATA_PATH = Path(r"C:\Users\Prshant Verma\Documents\Projects\DataSets\aircraft-engine-failure-data")

# Column definitions
index_columns = ['unit_number', 'time_cycles']
operational_settings = ['op_setting_1', 'op_setting_2', 'op_setting_3']

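# Sensor order follows the C-MAPSS documentation (Saxena et al., 2008):
# the fourth sensor is T50 and the tenth is epr (engine pressure ratio)
sensor_columns = [
    'T2', 'T24', 'T30', 'T50', 'P2', 'P15', 'P30', 'Nf',
    'Nc', 'epr', 'Ps30', 'phi', 'NRf', 'NRc', 'BPR', 'farB',
    'htBleed', 'Nf_dmd', 'PCNfR_dmd', 'W31', 'W32'
]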

column_names = index_columns + operational_settings + sensor_columns

# Sensors to drop (low variance based on EDA)
drop_sensors = ['T2', 'P2', 'P15', 'Nf_dmd', 'PCNfR_dmd', 'farB']

# Maximum RUL cap (piecewise linear degradation assumption)
RUL_CAP = 125
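
The cap encodes the piecewise linear assumption: an engine is treated as healthy (constant RUL) early in life, and the label only starts counting down once fewer than `RUL_CAP` cycles remain. The sketch below illustrates this on a hypothetical engine that fails after 200 cycles; the engine length is an assumption chosen for illustration, not a value from the dataset.

```python
import numpy as np

RUL_CAP = 125  # same cap as in the notebook

# Hypothetical engine that fails after 200 cycles
time_cycles = np.arange(1, 201)
linear_rul = time_cycles.max() - time_cycles   # 199, 198, ..., 0
capped_rul = np.minimum(linear_rul, RUL_CAP)   # flat at the cap, then linear

print(capped_rul[:3])    # early life: held at the cap
print(capped_rul[-3:])   # end of life: counts down to 0
# Number of cycles during which the label stays at the cap
print(int((capped_rul == RUL_CAP).sum()))
```

With a cap of 125 and 200 recorded cycles, the label stays flat for the first 75 cycles and then decreases by one per cycle.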


In [None]:
def load_dataset(dataset_id='FD001'):
    """Load train, test, and RUL data for a given dataset."""
    
    train_df = pd.read_csv(
        DATA_PATH / f'train_{dataset_id}.txt',
        sep='\\s+', header=None, names=column_names
    )
    
    test_df = pd.read_csv(
        DATA_PATH / f'test_{dataset_id}.txt',
        sep='\\s+', header=None, names=column_names
    )
    
    rul_df = pd.read_csv(
        DATA_PATH / f'RUL_{dataset_id}.txt',
        sep='\\s+', header=None, names=['RUL']
    )
    
    return train_df, test_df, rul_df


def compute_rul(df, rul_cap=RUL_CAP):
    """Compute RUL for training data with optional capping."""
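    df = df.copy()

    # RUL = cycles remaining until the engine's last recorded cycle.
    # transform('max') broadcasts each engine's final cycle to all of its rows,
    # which avoids a slow row-wise apply.
    max_cycles = df.groupby('unit_number')['time_cycles'].transform('max')
    df['RUL'] = max_cycles - df['time_cycles']
    
    # Cap RUL (piecewise linear assumption)
    if rul_cap:
        df['RUL'] = df['RUL'].clip(upper=rul_cap)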
    
    return df


def add_features(df):
    """Add rolling statistics and derived features."""
    df = df.copy()
    
    # Useful sensors (excluding low variance)
    useful_sensors = [s for s in sensor_columns if s not in drop_sensors]
    
    # Rolling statistics per engine
    for sensor in useful_sensors:
        # Rolling mean (window = 5 cycles)
        df[f'{sensor}_roll_mean'] = df.groupby('unit_number')[sensor].transform(
            lambda x: x.rolling(window=5, min_periods=1).mean()
        )
        
        # Rolling std (window = 5 cycles)
        df[f'{sensor}_roll_std'] = df.groupby('unit_number')[sensor].transform(
            lambda x: x.rolling(window=5, min_periods=1).std()
        )
    
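    # Fill NaN from rolling operations (rolling std is NaN on each engine's
    # first cycle). fillna(method=...) is deprecated in pandas 2.x, so use
    # the dedicated bfill/ffill methods.
    df = df.bfill().ffill()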
    
    return df


# Load FD001 dataset
train_df, test_df, rul_df = load_dataset('FD001')

# Compute RUL for training data
train_df = compute_rul(train_df)

# Add engineered features
train_df = add_features(train_df)
test_df = add_features(test_df)

print(f"Training data shape: {train_df.shape}")
print(f"Test data shape: {test_df.shape}")
print(f"Number of features: {len([c for c in train_df.columns if c not in index_columns + ['RUL']])}")
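
The per-engine rolling statistics can be checked on a toy frame. The sketch below uses two hypothetical engines and a window of 2 (rather than 5) to keep the output short. It shows that windows restart at each engine boundary and that the first rolling std of every engine is NaN, which is why missing values are filled after the rolling step.

```python
import pandas as pd

# Two hypothetical engines with three cycles each
toy = pd.DataFrame({
    'unit_number': [1, 1, 1, 2, 2, 2],
    'time_cycles': [1, 2, 3, 1, 2, 3],
    'T30':         [10.0, 12.0, 14.0, 100.0, 102.0, 104.0],
})

grouped = toy.groupby('unit_number')['T30']
toy['T30_roll_mean'] = grouped.transform(
    lambda x: x.rolling(window=2, min_periods=1).mean()
)
toy['T30_roll_std'] = grouped.transform(
    lambda x: x.rolling(window=2, min_periods=1).std()
)

print(toy)
```

Engine 2's first rolling mean is 100.0, not an average that includes engine 1's last reading, because the rolling window is applied within each group.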


## 3. Feature Preparation


In [None]:
# Define feature columns (exclude index and target)
feature_columns = [c for c in train_df.columns if c not in index_columns + ['RUL']]

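# Remove low-variance sensors from features.
# Match exact column names: a substring test would also drop 'T24' and its
# rolling features, because 'T2' is a substring of 'T24'. No rolling features
# exist for the dropped sensors, so an exact match is sufficient.
feature_columns = [c for c in feature_columns if c not in drop_sensors]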

print(f"Number of features: {len(feature_columns)}")
print(f"Features: {feature_columns[:10]}...")  # Show first 10


In [None]:
# Prepare training data for ML models
X_train_full = train_df[feature_columns].values
y_train_full = train_df['RUL'].values

# Normalize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_full)

# Split into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_train_scaled, y_train_full, test_size=0.2, random_state=42
)

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
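
Consecutive cycles from the same engine are strongly correlated, so a row-wise random split can place near-duplicate rows in both the training and validation sets and make validation scores look optimistic. One alternative is to hold out whole engines. The sketch below shows the idea on a hypothetical `unit_number` array (10 engines, 20 cycles each); it is an illustration under those assumptions, not the split used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for train_df['unit_number']: 10 engines, 20 cycles each
unit_number = np.repeat(np.arange(1, 11), 20)

# Hold out 20% of the engines (not 20% of the rows)
engines = rng.permutation(np.unique(unit_number))
n_val = max(1, int(round(0.2 * len(engines))))
val_engines = engines[:n_val]

val_mask = np.isin(unit_number, val_engines)
train_idx = np.flatnonzero(~val_mask)
val_idx = np.flatnonzero(val_mask)

print(f"train rows: {len(train_idx)}, val rows: {len(val_idx)}")
print(f"validation engines: {sorted(val_engines.tolist())}")
```

The index arrays could then be used to select rows of the scaled feature matrix and the target vector. scikit-learn's `GroupShuffleSplit` offers the same kind of grouped split.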


In [None]:
# Prepare test data
# For test set, we use the LAST observation of each engine
test_last = test_df.groupby('unit_number').last().reset_index()

X_test = test_last[feature_columns].values
X_test_scaled = scaler.transform(X_test)
y_test = rul_df['RUL'].values

print(f"Test set: {X_test_scaled.shape}")
print(f"True RUL values: {len(y_test)}")


## 4. Evaluation Metrics


In [None]:
def phm_score(y_true, y_pred, a1=13, a2=10):
    """
    PHM Challenge scoring function.
    Penalizes late predictions more heavily than early predictions.
    
    Args:
        y_true: True RUL values
        y_pred: Predicted RUL values
        a1: Parameter for early predictions (default: 13)
        a2: Parameter for late predictions (default: 10)
    
    Returns:
        Total PHM score (lower is better)
    """
    d = y_pred - y_true  # Estimated - True
    
    scores = np.where(
        d < 0,
        np.exp(-d / a1) - 1,  # Early prediction
        np.exp(d / a2) - 1    # Late prediction
    )
    
    return np.sum(scores)


def evaluate_model(y_true, y_pred, model_name="Model"):
    """Comprehensive model evaluation."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    score = phm_score(y_true, y_pred)
    
    print(f"\n{'='*50}")
    print(f"📊 {model_name} Performance")
    print(f"{'='*50}")
    print(f"  RMSE:      {rmse:.2f} cycles")
    print(f"  MAE:       {mae:.2f} cycles")
    print(f"  R² Score:  {r2:.4f}")
    print(f"  PHM Score: {score:.2f} (lower is better)")
    
    return {'model': model_name, 'rmse': rmse, 'mae': mae, 'r2': r2, 'phm_score': score}


def plot_predictions(y_true, y_pred, model_name="Model"):
    """Plot predicted vs actual RUL."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Scatter plot
    axes[0].scatter(y_true, y_pred, alpha=0.6, c=COLORS['primary'], s=50)
    axes[0].plot([0, max(y_true)], [0, max(y_true)], 'r--', linewidth=2, label='Perfect prediction')
    axes[0].set_xlabel('True RUL (cycles)', fontsize=12)
    axes[0].set_ylabel('Predicted RUL (cycles)', fontsize=12)
    axes[0].set_title(f'{model_name}: Predicted vs True RUL', fontsize=14, fontweight='bold')
    axes[0].legend()
    
    # Error distribution
    errors = y_pred - y_true
    axes[1].hist(errors, bins=30, color=COLORS['secondary'], edgecolor='white', alpha=0.8)
    axes[1].axvline(0, color='red', linestyle='--', linewidth=2, label='Zero error')
    axes[1].axvline(errors.mean(), color='green', linestyle='-', linewidth=2, 
                    label=f'Mean error: {errors.mean():.1f}')
    axes[1].set_xlabel('Prediction Error (cycles)', fontsize=12)
    axes[1].set_ylabel('Frequency', fontsize=12)
    axes[1].set_title(f'{model_name}: Error Distribution', fontsize=14, fontweight='bold')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()

# Store results for comparison
results = []
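
RMSE treats early and late errors alike, while the PHM score does not. The sketch below compares two hypothetical prediction sets that are off by the same 15 cycles in opposite directions: their RMSE is identical, but the late set receives the larger PHM score. The helper `phm_score_sketch` repeats the formula of `phm_score` above so the example runs on its own; the RUL values are made up for illustration.

```python
import numpy as np

def phm_score_sketch(y_true, y_pred, a1=13, a2=10):
    """Same formula as phm_score above, repeated so this example is self-contained."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / a1) - 1, np.exp(d / a2) - 1)))

y_true = np.array([50.0, 80.0, 110.0])   # hypothetical true RUL values
early = y_true - 15                      # predicts failure 15 cycles too soon
late = y_true + 15                       # predicts failure 15 cycles too late

rmse_early = np.sqrt(np.mean((early - y_true) ** 2))
rmse_late = np.sqrt(np.mean((late - y_true) ** 2))

print(f"RMSE  early={rmse_early:.1f}  late={rmse_late:.1f}")
print(f"PHM   early={phm_score_sketch(y_true, early):.2f}  "
      f"late={phm_score_sketch(y_true, late):.2f}")
```

This is why the comparison later in the notebook reports both metrics: a model can rank well on RMSE and still be penalized on PHM score if its errors lean late.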


## 5. Model 1: Random Forest Regressor


In [None]:
print("🌲 Training Random Forest Regressor...")

rf_model = RandomForestRegressor(
    n_estimators=100,
    max_depth=15,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42,
    n_jobs=-1
)

rf_model.fit(X_train, y_train)

# Predictions
rf_val_pred = rf_model.predict(X_val)
rf_test_pred = rf_model.predict(X_test_scaled)

# Evaluate on validation set
print("\n--- Validation Set ---")
rf_val_result = evaluate_model(y_val, rf_val_pred, "Random Forest (Val)")

# Evaluate on test set
print("\n--- Test Set ---")
rf_test_result = evaluate_model(y_test, rf_test_pred, "Random Forest (Test)")
results.append(rf_test_result)

# Plot predictions
plot_predictions(y_test, rf_test_pred, "Random Forest")


In [None]:
# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

# Plot top 20 features
fig, ax = plt.subplots(figsize=(10, 8))
top_features = feature_importance.head(20)
ax.barh(top_features['feature'], top_features['importance'], color=COLORS['accent'])
ax.set_xlabel('Feature Importance', fontsize=12)
ax.set_title('Random Forest - Top 20 Important Features', fontsize=14, fontweight='bold')
ax.invert_yaxis()
plt.tight_layout()
plt.show()


## 6. Model 2: XGBoost Regressor


In [None]:
if XGBOOST_AVAILABLE:
    print("🚀 Training XGBoost Regressor...")
    
    xgb_model = xgb.XGBRegressor(
        n_estimators=200,
        max_depth=8,
        learning_rate=0.1,
        subsample=0.8,
        colsample_bytree=0.8,
        random_state=42,
        n_jobs=-1,
        verbosity=0
    )
    
    xgb_model.fit(
        X_train, y_train,
        eval_set=[(X_val, y_val)],
        verbose=False
    )
    
    # Predictions
    xgb_val_pred = xgb_model.predict(X_val)
    xgb_test_pred = xgb_model.predict(X_test_scaled)
    
    # Evaluate
    print("\n--- Validation Set ---")
    xgb_val_result = evaluate_model(y_val, xgb_val_pred, "XGBoost (Val)")
    
    print("\n--- Test Set ---")
    xgb_test_result = evaluate_model(y_test, xgb_test_pred, "XGBoost (Test)")
    results.append(xgb_test_result)
    
    # Plot predictions
    plot_predictions(y_test, xgb_test_pred, "XGBoost")
else:
    print("⚠️ XGBoost not available. Skipping...")


## 7. Prepare Sequence Data for Deep Learning

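LSTM and CNN models take fixed-length sequences as input, so we slice each engine's history into sliding windows of `SEQUENCE_LENGTH` consecutive cycles. Each window is labeled with the RUL at its final cycle.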
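
An engine with $T$ recorded cycles yields $T - L + 1$ windows of length $L$. The sketch below shows the window and target alignment on a hypothetical engine with 6 cycles and 2 features, using NumPy's `sliding_window_view` as a vectorized alternative to an explicit loop. The toy shapes and the short window length are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

seq_len = 3                                   # the notebook uses 30
features = np.arange(12.0).reshape(6, 2)      # hypothetical engine: 6 cycles, 2 features
rul = np.array([5, 4, 3, 2, 1, 0])            # RUL at each cycle

# sliding_window_view puts the window axis last: (n_windows, n_features, seq_len),
# so transpose to the (n_windows, seq_len, n_features) layout Keras expects
windows = sliding_window_view(features, seq_len, axis=0).transpose(0, 2, 1)
targets = rul[seq_len - 1:]                   # RUL at the last cycle of each window

print(windows.shape)
print(targets)
```

Six cycles with a window of 3 give four windows, and the first target is the RUL at the third cycle.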


In [None]:
SEQUENCE_LENGTH = 30  # Use last 30 cycles for prediction

def create_sequences(df, feature_cols, sequence_length=SEQUENCE_LENGTH):
    """
    Create sequences for LSTM/CNN models.
    Each sequence is a window of 'sequence_length' consecutive observations
    from one engine; engines with fewer observations yield no windows.
    Target is the RUL at the last timestep of the window.
    """
    sequences = []
    targets = []
    
    for engine_id in df['unit_number'].unique():
        engine_data = df[df['unit_number'] == engine_id].sort_values('time_cycles')
        
        # Get features and RUL
        features = engine_data[feature_cols].values
        rul = engine_data['RUL'].values
        
        # Create sequences using sliding window
        for i in range(len(features) - sequence_length + 1):
            sequences.append(features[i:i + sequence_length])
            targets.append(rul[i + sequence_length - 1])
    
    return np.array(sequences), np.array(targets)


def create_test_sequences(df, feature_cols, sequence_length=SEQUENCE_LENGTH):
    """
    Create test sequences - one per engine (using last observations).
    """
    sequences = []
    
    for engine_id in sorted(df['unit_number'].unique()):
        engine_data = df[df['unit_number'] == engine_id].sort_values('time_cycles')
        features = engine_data[feature_cols].values
        
        # Pad if not enough data
        if len(features) < sequence_length:
            padding = np.zeros((sequence_length - len(features), features.shape[1]))
            features = np.vstack([padding, features])
        else:
            features = features[-sequence_length:]
        
        sequences.append(features)
    
    return np.array(sequences)
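
The padding branch in `create_test_sequences` can be illustrated on a hypothetical test engine with fewer cycles than the window length. The sketch below uses a window of 5 and made-up feature values; it shows that the zeros are placed before the real observations, so the most recent cycles stay at the end of the sequence.

```python
import numpy as np

seq_len = 5
features = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])           # hypothetical engine with only 3 cycles

n_missing = seq_len - len(features)
padding = np.zeros((n_missing, features.shape[1]))
padded = np.vstack([padding, features])      # zeros first, real cycles last

print(padded)
```

Because the features are standardized before the sequences are built, a padded zero corresponds to the training mean of each feature.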


In [None]:
# Normalize the full training data first
train_df_scaled = train_df.copy()
train_df_scaled[feature_columns] = scaler.fit_transform(train_df[feature_columns])

test_df_scaled = test_df.copy()
test_df_scaled[feature_columns] = scaler.transform(test_df[feature_columns])

# Create sequences
X_seq, y_seq = create_sequences(train_df_scaled, feature_columns)
X_test_seq = create_test_sequences(test_df_scaled, feature_columns)

print(f"Training sequences shape: {X_seq.shape}")
print(f"Training targets shape: {y_seq.shape}")
print(f"Test sequences shape: {X_test_seq.shape}")


In [None]:
# Split sequences into train/validation
X_seq_train, X_seq_val, y_seq_train, y_seq_val = train_test_split(
    X_seq, y_seq, test_size=0.2, random_state=42
)

print(f"Training sequences: {X_seq_train.shape}")
print(f"Validation sequences: {X_seq_val.shape}")


## 8. Model 3: LSTM Neural Network


In [None]:
if TENSORFLOW_AVAILABLE:
    print("🧠 Building LSTM Model...")
    
    n_features = X_seq_train.shape[2]
    
    def build_lstm_model(seq_length, n_features):
        model = Sequential([
            # First LSTM layer
            LSTM(64, return_sequences=True, input_shape=(seq_length, n_features)),
            Dropout(0.2),
            BatchNormalization(),
            
            # Second LSTM layer
            LSTM(32, return_sequences=False),
            Dropout(0.2),
            BatchNormalization(),
            
            # Dense layers
            Dense(32, activation='relu'),
            Dropout(0.2),
            Dense(16, activation='relu'),
            
            # Output layer
            Dense(1)
        ])
        
        model.compile(
            optimizer=Adam(learning_rate=0.001),
            loss='mse',
            metrics=['mae']
        )
        
        return model
    
    lstm_model = build_lstm_model(SEQUENCE_LENGTH, n_features)
    lstm_model.summary()
else:
    print("⚠️ TensorFlow not available. Skipping LSTM...")


In [None]:
if TENSORFLOW_AVAILABLE:
    print("🏋️ Training LSTM Model...")
    
    # Callbacks
    callbacks = [
        EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
        ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)
    ]
    
    # Train
    history_lstm = lstm_model.fit(
        X_seq_train, y_seq_train,
        validation_data=(X_seq_val, y_seq_val),
        epochs=50,
        batch_size=64,
        callbacks=callbacks,
        verbose=1
    )
    
    # Plot training history
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    axes[0].plot(history_lstm.history['loss'], label='Train Loss')
    axes[0].plot(history_lstm.history['val_loss'], label='Val Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss (MSE)')
    axes[0].set_title('LSTM Training Loss', fontweight='bold')
    axes[0].legend()
    
    axes[1].plot(history_lstm.history['mae'], label='Train MAE')
    axes[1].plot(history_lstm.history['val_mae'], label='Val MAE')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('MAE')
    axes[1].set_title('LSTM Training MAE', fontweight='bold')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()


In [None]:
if TENSORFLOW_AVAILABLE:
    # Predictions
    lstm_val_pred = lstm_model.predict(X_seq_val).flatten()
    lstm_test_pred = lstm_model.predict(X_test_seq).flatten()
    
    # Evaluate
    print("\n--- Validation Set ---")
    lstm_val_result = evaluate_model(y_seq_val, lstm_val_pred, "LSTM (Val)")
    
    print("\n--- Test Set ---")
    lstm_test_result = evaluate_model(y_test, lstm_test_pred, "LSTM (Test)")
    results.append(lstm_test_result)
    
    # Plot predictions
    plot_predictions(y_test, lstm_test_pred, "LSTM")


## 9. Model 4: 1D CNN


In [None]:
if TENSORFLOW_AVAILABLE:
    print("🔬 Building 1D CNN Model...")
    
    def build_cnn_model(seq_length, n_features):
        model = Sequential([
            # First Conv block
            Conv1D(64, kernel_size=5, activation='relu', padding='same',
                   input_shape=(seq_length, n_features)),
            BatchNormalization(),
            MaxPooling1D(pool_size=2),
            Dropout(0.2),
            
            # Second Conv block
            Conv1D(128, kernel_size=3, activation='relu', padding='same'),
            BatchNormalization(),
            MaxPooling1D(pool_size=2),
            Dropout(0.2),
            
            # Third Conv block
            Conv1D(64, kernel_size=3, activation='relu', padding='same'),
            BatchNormalization(),
            
            # Flatten and Dense
            Flatten(),
            Dense(64, activation='relu'),
            Dropout(0.3),
            Dense(32, activation='relu'),
            
            # Output
            Dense(1)
        ])
        
        model.compile(
            optimizer=Adam(learning_rate=0.001),
            loss='mse',
            metrics=['mae']
        )
        
        return model
    
    cnn_model = build_cnn_model(SEQUENCE_LENGTH, n_features)
    cnn_model.summary()
else:
    print("⚠️ TensorFlow not available. Skipping CNN...")
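
The temporal length that reaches `Flatten` can be traced by hand, which is a useful check against `cnn_model.summary()`. The sketch below assumes the Keras defaults for `MaxPooling1D` (stride equal to `pool_size`, `'valid'` padding) and relies on `padding='same'` convolutions leaving the length unchanged. The helper `pooled_length` is a hypothetical name introduced for this illustration.

```python
def pooled_length(length, pool_size=2):
    """Output length of MaxPooling1D with stride = pool_size and 'valid' padding."""
    return (length - pool_size) // pool_size + 1

seq_length = 30                               # SEQUENCE_LENGTH in the notebook
after_block1 = pooled_length(seq_length)      # Conv1D(padding='same') keeps the length
after_block2 = pooled_length(after_block1)
flat_units = after_block2 * 64                # third Conv1D has 64 filters, no pooling

# Weights plus biases of the first Dense(64) layer after Flatten
dense_params = flat_units * 64 + 64

print(after_block1, after_block2, flat_units, dense_params)
```

The 30-cycle window shrinks to 15 and then 7 time steps, so `Flatten` produces 448 values. This does not depend on the number of input features, which only affects the first convolution.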


In [None]:
if TENSORFLOW_AVAILABLE:
    print("🏋️ Training CNN Model...")
    
    # Train
    history_cnn = cnn_model.fit(
        X_seq_train, y_seq_train,
        validation_data=(X_seq_val, y_seq_val),
        epochs=50,
        batch_size=64,
        callbacks=callbacks,
        verbose=1
    )
    
    # Plot training history
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    axes[0].plot(history_cnn.history['loss'], label='Train Loss')
    axes[0].plot(history_cnn.history['val_loss'], label='Val Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss (MSE)')
    axes[0].set_title('CNN Training Loss', fontweight='bold')
    axes[0].legend()
    
    axes[1].plot(history_cnn.history['mae'], label='Train MAE')
    axes[1].plot(history_cnn.history['val_mae'], label='Val MAE')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('MAE')
    axes[1].set_title('CNN Training MAE', fontweight='bold')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()


In [None]:
if TENSORFLOW_AVAILABLE:
    # Predictions
    cnn_val_pred = cnn_model.predict(X_seq_val).flatten()
    cnn_test_pred = cnn_model.predict(X_test_seq).flatten()
    
    # Evaluate
    print("\n--- Validation Set ---")
    cnn_val_result = evaluate_model(y_seq_val, cnn_val_pred, "CNN (Val)")
    
    print("\n--- Test Set ---")
    cnn_test_result = evaluate_model(y_test, cnn_test_pred, "CNN (Test)")
    results.append(cnn_test_result)
    
    # Plot predictions
    plot_predictions(y_test, cnn_test_pred, "1D CNN")


## 10. Model Comparison


In [None]:
# Create comparison dataframe
results_df = pd.DataFrame(results)
print("\n" + "="*80)
print("📊 MODEL COMPARISON - TEST SET PERFORMANCE")
print("="*80)
results_df


In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 4, figsize=(16, 5))

metrics = ['rmse', 'mae', 'r2', 'phm_score']
titles = ['RMSE (Lower Better)', 'MAE (Lower Better)', 'R² Score (Higher Better)', 'PHM Score (Lower Better)']
colors_list = [COLORS['primary'], COLORS['secondary'], COLORS['success'], COLORS['accent']]

for i, (metric, title) in enumerate(zip(metrics, titles)):
    ax = axes[i]
    bars = ax.bar(results_df['model'], results_df[metric], color=colors_list[i], alpha=0.8)
    ax.set_ylabel(metric.upper())
    ax.set_title(title, fontweight='bold')
    ax.tick_params(axis='x', rotation=45)
    
    # Add value labels
    for bar, val in zip(bars, results_df[metric]):
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
                f'{val:.2f}', ha='center', va='bottom', fontsize=9)

plt.suptitle('Model Performance Comparison (FD001 Dataset)', fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()


In [None]:
# Best model identification
best_rmse = results_df.loc[results_df['rmse'].idxmin()]
best_phm = results_df.loc[results_df['phm_score'].idxmin()]

print("\n" + "="*80)
print("🏆 BEST MODELS")
print("="*80)
print(f"\n📌 Best by RMSE: {best_rmse['model']}")
print(f"   RMSE = {best_rmse['rmse']:.2f}, PHM Score = {best_rmse['phm_score']:.2f}")

print(f"\n📌 Best by PHM Score: {best_phm['model']}")
print(f"   PHM Score = {best_phm['phm_score']:.2f}, RMSE = {best_phm['rmse']:.2f}")


## 11. Visualize All Predictions


In [None]:
# Collect all predictions
all_predictions = {
    'True RUL': y_test,
    'Random Forest': rf_test_pred,
}

if XGBOOST_AVAILABLE:
    all_predictions['XGBoost'] = xgb_test_pred
    
if TENSORFLOW_AVAILABLE:
    all_predictions['LSTM'] = lstm_test_pred
    all_predictions['CNN'] = cnn_test_pred

# Plot all predictions together
fig, ax = plt.subplots(figsize=(16, 6))

x = np.arange(len(y_test))
width = 0.15

for i, (name, preds) in enumerate(all_predictions.items()):
    offset = (i - len(all_predictions)/2) * width
    if name == 'True RUL':
        ax.bar(x + offset, preds, width, label=name, color=COLORS['danger'], alpha=0.9)
    else:
        ax.bar(x + offset, preds, width, label=name, alpha=0.7)

ax.set_xlabel('Engine ID', fontsize=12)
ax.set_ylabel('RUL (Cycles)', fontsize=12)
ax.set_title('RUL Predictions Comparison Across All Models', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.set_xticks(x[::10])
ax.set_xticklabels([f'E{i+1}' for i in x[::10]])

plt.tight_layout()
plt.show()


## 12. Summary and Conclusions


In [None]:
print("="*80)
print("📋 SUMMARY: REMAINING USEFUL LIFE PREDICTION")
print("="*80)

print("\n🎯 PROBLEM")
print("   Predict the number of operational cycles remaining before")
print("   aircraft turbofan engine failure using sensor data.")

print("\n📊 DATASET: NASA C-MAPSS FD001")
print("   • 100 training engines (run to failure)")
print("   • 100 test engines")
print("   • 21 sensor measurements + 3 operational settings")
print("   • Single operating condition, single fault mode (HPC degradation)")

print("\n🔧 APPROACH")
print("   1. Feature Engineering:")
print("      - Rolling statistics (mean, std)")
print("      - Dropped low-variance sensors")
print("      - RUL capping at 125 cycles (piecewise linear)")
print("   ")
print("   2. Models Implemented:")
print("      - Random Forest Regressor")
print("      - XGBoost Regressor") 
print("      - LSTM Neural Network (sequence-based)")
print("      - 1D CNN (sequence-based)")

print("\n📈 KEY INSIGHTS")
print("   • Sequence models (LSTM, CNN) see a 30-cycle window, so they can use temporal patterns")
print("   • Tree ensembles rely on per-cycle features and rolling statistics instead of sequences")
print("   • PHM scoring penalizes late predictions more heavily than early ones")
print("   • The effect of feature engineering was not isolated here; an ablation would quantify it")

print("\n🚀 POTENTIAL IMPROVEMENTS")
print("   • Bidirectional LSTM / Attention mechanisms")
print("   • Ensemble of multiple models")
print("   • Hyperparameter tuning with cross-validation")
print("   • Physics-informed neural networks")
print("   • Test on more complex datasets (FD002, FD003, FD004)")

print("\n" + "="*80)


---

## 📚 References

1. Saxena, A., Goebel, K., Simon, D., & Eklund, N. (2008). **Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation**. International Conference on Prognostics and Health Management (PHM08).

2. NASA Prognostics Center of Excellence Data Repository: [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/)

