# Extrinsic Utility Model Training (ANN)

This notebook trains an artificial neural network (specifically, scikit-learn's `MLPRegressor`) to serve as the extrinsic utility model for the Robobo robot.

**Objective:** Learn a function that estimates the utility of a given perceptual state, where higher utility means the state is "better" for reaching the **red cylinder** goal.

**Data Requirement:** This notebook expects a CSV file (by default `goal_traces.csv`) containing sequences of perceptual states recorded during successful runs in which the robot reached the red cylinder. Each row represents one state and must include the perceptual feature columns listed in `FEATURE_COLUMNS`, plus a `target_utility` value derived from the state's position within its successful trace (consistent with the objective above, states nearer the goal should receive higher utility).
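The notebook does not specify exactly how `target_utility` is computed from a state's position in its trace. One plausible labelling, sketched here under stated assumptions, gives each state a utility that rises to 1.0 at the goal, either linearly or with a discount factor. The `trace_id` column, the chronological row order, and the `gamma` parameter are hypothetical; adapt them to whatever `collect_goal_traces.py` actually records.

```python
from typing import Optional

import pandas as pd


def add_target_utility(
    df: pd.DataFrame, trace_col: str = "trace_id", gamma: Optional[float] = None
) -> pd.DataFrame:
    """Return a copy of df with a hypothetical position-based `target_utility` column.

    Assumes each trace is identified by `trace_col`, its rows are in chronological
    order, and the last row of each trace is the state where the goal was reached.
    """
    position = df.groupby(trace_col).cumcount()          # 0 for the first state of each trace
    length = df[trace_col].map(df[trace_col].value_counts())  # number of states in that trace
    steps_to_goal = length - 1 - position                # 0 for the final (goal) state

    labelled = df.copy()
    if gamma is None:
        # Linear ramp: 1/length at the first state, 1.0 at the goal state
        labelled["target_utility"] = (position + 1) / length
    else:
        # Discounted: gamma ** (steps remaining), so the goal state gets 1.0
        labelled["target_utility"] = gamma ** steps_to_goal
    return labelled


# Toy example with two hypothetical traces (not real Robobo data)
demo = pd.DataFrame({
    "trace_id": [0, 0, 0, 0, 1, 1],
    "distance_red": [4.0, 3.0, 2.0, 1.0, 2.0, 0.5],
})
print(add_target_utility(demo)["target_utility"].tolist())
# [0.25, 0.5, 0.75, 1.0, 0.5, 1.0]
print(add_target_utility(demo, gamma=0.5)["target_utility"].tolist())
# [0.125, 0.25, 0.5, 1.0, 0.5, 1.0]
```

The linear ramp treats every trace as equally "long" in utility terms, while the discounted version depends only on the number of steps remaining, so long and short traces give the same utility to a state that is, say, three steps from the goal.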

## 1. Imports

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler
import joblib
import os
import time

## 2. Configuration and Parameters

In [None]:
# --- File Paths ---
DATASET_DIR = "../dataset"  # Relative path to the dataset folder
MODEL_DIR = "../src/models"   # Relative path to save models/scalers
DATASET_FILENAME = "goal_traces.csv" # Name of your trace data file
MODEL_FILENAME = "extrinsic_utility_model.joblib"
SCALER_FILENAME = "extrinsic_utility_scaler.joblib"

DATASET_PATH = os.path.join(DATASET_DIR, DATASET_FILENAME)
UTILITY_MODEL_PATH = os.path.join(MODEL_DIR, MODEL_FILENAME)
UTILITY_SCALER_PATH = os.path.join(MODEL_DIR, SCALER_FILENAME)

# --- Training Parameters ---
TEST_SIZE = 0.2           # Proportion of data for the test set
RANDOM_STATE = 42         # Seed for reproducibility

# --- MLP Regressor Hyperparameters (Example - Tune these!) ---
HIDDEN_LAYER_SIZES = (128, 64, 32) # Example: 3 hidden layers
ACTIVATION = 'relu'         # Activation function ('relu', 'tanh', etc.)
SOLVER = 'adam'             # Optimization algorithm ('adam', 'sgd', etc.)
ALPHA = 0.0001              # L2 regularization strength
LEARNING_RATE = 'adaptive'  # Learning-rate schedule ('constant', 'invscaling', 'adaptive'); used only when SOLVER='sgd'
MAX_ITER = 1000             # Max training epochs for the 'adam'/'sgd' solvers
EARLY_STOPPING = True       # Hold out 10% of the training data (default validation_fraction) and stop when its score stops improving
N_ITER_NO_CHANGE = 20       # Epochs without improvement to wait before stopping

# --- Feature and Target Columns ---
# Ensure these match the columns in your goal_traces.csv file
FEATURE_COLUMNS = [
    'distance_red', 'angle_red',
    'distance_green', 'angle_green',
    'distance_blue', 'angle_blue'
]
TARGET_COLUMN = 'target_utility'

# --- Ensure directories exist ---
os.makedirs(MODEL_DIR, exist_ok=True)
os.makedirs(DATASET_DIR, exist_ok=True) # Just in case

## 3. Load Goal Trace Data

Load the pre-collected dataset containing sequences of states leading to the goal.

In [None]:
print(f"Loading dataset from: {DATASET_PATH}")
try:
    df_traces = pd.read_csv(DATASET_PATH)
    print(f"Dataset loaded successfully. Shape: {df_traces.shape}")
    
    # Basic Data Inspection
    print("\nFirst 5 rows:")
    print(df_traces.head())
    print("\nDataset Info:")
    df_traces.info()
    print("\nChecking for NaN values:")
    print(df_traces.isnull().sum())

    # Handle potential NaN values (e.g., drop rows with NaNs)
    initial_rows = len(df_traces)
    df_traces.dropna(inplace=True)
    if len(df_traces) < initial_rows:
        print(f"\nDropped {initial_rows - len(df_traces)} rows containing NaN values.")
        print(f"Shape after dropping NaNs: {df_traces.shape}")

except FileNotFoundError:
    print(f"\nError: Dataset file not found at '{DATASET_PATH}'")
    print("Please run the `collect_goal_traces.py` script first to generate the data.")
    # Optional: Exit or handle appropriately
    # exit()
    df_traces = None # Set to None to prevent further errors
except Exception as e:
    print(f"\nAn error occurred while loading or processing the data: {e}")
    df_traces = None

## 4. Preprocessing

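1.  **Separate Features (X) and Target (y):** Isolate the input perceptual state features from the target utility value.
2.  **Train/Test Split:** Divide the rows randomly into training and test sets.
3.  **Feature Scaling:** Standardize the input features with `StandardScaler` (zero mean, unit variance), fitting it on the training set only. MLPs train poorly when inputs have very different ranges, as distances and angles may here.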
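Because consecutive states within one trace are nearly identical, a random row-level split can place neighbouring states from the same trace in both sets, which tends to make the test error look optimistic. If the CSV records which trace each row came from (the `trace_id` column assumed here is a hypothetical name), a trace-level split with scikit-learn's `GroupShuffleSplit` keeps every trace entirely in one set. Note that its `test_size` is a fraction of traces, not of rows.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def split_by_trace(X, y, groups, test_size=0.2, random_state=42):
    """Split arrays so that all rows sharing a group label land in the same set."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(X, y, groups=groups))
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx], train_idx, test_idx


# Synthetic stand-in: 10 hypothetical traces of 5 states, 6 features each
rng = np.random.default_rng(0)
groups_demo = np.repeat(np.arange(10), 5)
X_demo = rng.normal(size=(50, 6))
y_demo = rng.uniform(size=50)

X_tr, X_te, y_tr, y_te, tr_idx, te_idx = split_by_trace(X_demo, y_demo, groups_demo)
print(sorted(set(groups_demo[te_idx])))  # trace labels held out for testing
```

In the notebook this would replace the `train_test_split` call, with something like `df_traces["trace_id"].values` passed as `groups` (assuming such a column exists).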

In [None]:
X_train_scaled, X_test_scaled, y_train, y_test, scaler = (None, None, None, None, None)

if df_traces is not None and not df_traces.empty:
    try:
        # 1. Separate Features and Target
        X = df_traces[FEATURE_COLUMNS].values
        y = df_traces[TARGET_COLUMN].values
        print(f"\nFeatures (X) shape: {X.shape}")
        print(f"Target (y) shape: {y.shape}")

        # 2. Train/Test Split
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE
        )
        print(f"Training set size: {X_train.shape[0]} samples")
        print(f"Testing set size: {X_test.shape[0]} samples")

        # 3. Feature Scaling
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train) # Fit ONLY on training data
        X_test_scaled = scaler.transform(X_test)     # Transform test data
        print("\nFeatures scaled using StandardScaler.")
        
    except KeyError as e:
        print(f"\nError: Column not found in dataset: {e}")
        print(f"Please ensure '{DATASET_FILENAME}' contains columns: {FEATURE_COLUMNS + [TARGET_COLUMN]}")
    except Exception as e:
        print(f"\nAn error occurred during preprocessing: {e}")
else:
    print("\nSkipping preprocessing because data loading failed or DataFrame is empty.")

## 5. Train Utility Model (MLPRegressor)

Train the neural network using the preprocessed (scaled) training data.

In [None]:
utility_model = None

if X_train_scaled is not None and y_train is not None:
    print("\nTraining MLP Regressor for extrinsic utility...")
    start_time = time.time()

    utility_model = MLPRegressor(
        hidden_layer_sizes=HIDDEN_LAYER_SIZES,
        activation=ACTIVATION,
        solver=SOLVER,
        alpha=ALPHA,
        learning_rate=LEARNING_RATE,
        max_iter=MAX_ITER,
        early_stopping=EARLY_STOPPING,
        n_iter_no_change=N_ITER_NO_CHANGE,
        random_state=RANDOM_STATE,
        verbose=True # Print progress during training
    )

    try:
        utility_model.fit(X_train_scaled, y_train)
        end_time = time.time()
        print(f"\nUtility Model Training Complete. Time taken: {end_time - start_time:.2f} seconds.")
    except Exception as e:
        print(f"\nError during model training: {e}")
        utility_model = None
else:
    print("\nSkipping model training due to issues in data loading or preprocessing.")
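The hyperparameters in the configuration cell are marked as examples to tune. One way to do that, sketched here rather than taken from the original workflow, is a cross-validated grid search over a pipeline that bundles `StandardScaler` with `MLPRegressor`, so the scaler is refit on each training fold and never sees the validation fold. The grid values below are arbitrary illustrations; in the notebook you would pass the unscaled `X_train` and `y_train` from the preprocessing cell.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def tune_utility_model(X_train, y_train, random_state=42, cv=3):
    """Grid-search a few MLP settings; returns the fitted GridSearchCV object."""
    pipe = make_pipeline(
        StandardScaler(),
        MLPRegressor(
            solver="adam",
            max_iter=1000,
            early_stopping=True,
            n_iter_no_change=20,
            random_state=random_state,
        ),
    )
    # make_pipeline names each step after its lowercased class name
    param_grid = {
        "mlpregressor__hidden_layer_sizes": [(64, 32), (128, 64, 32)],
        "mlpregressor__alpha": [1e-4, 1e-3],
    }
    search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=cv)
    search.fit(X_train, y_train)
    return search


# Synthetic smoke test (not Robobo data): utility decreases with |feature 0|
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = 1.0 / (1.0 + np.abs(X_demo[:, 0]))
search = tune_utility_model(X_demo, y_demo)
print(search.best_params_)
print("Best CV MSE:", -search.best_score_)
```

With the default `refit=True`, `search.best_estimator_` is a fitted scaler-plus-model pipeline, which could be saved as a single object instead of two separate files.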

## 6. Evaluate Model

Evaluate the trained model on the held-out test set using Mean Squared Error (MSE), its square root (RMSE), and Mean Absolute Error (MAE).

In [None]:
if utility_model is not None and X_test_scaled is not None and y_test is not None:
    print("\n--- Utility Model Evaluation ---")
    y_pred_test = utility_model.predict(X_test_scaled)
    
    mse_test = mean_squared_error(y_test, y_pred_test)
    mae_test = mean_absolute_error(y_test, y_pred_test)
    rmse_test = np.sqrt(mse_test)
    
    print(f"Test Set MSE:  {mse_test:.4f}")
    print(f"Test Set RMSE: {rmse_test:.4f}")
    print(f"Test Set MAE:  {mae_test:.4f}")

    # Optional: Evaluate on training set to check for overfitting
    y_pred_train = utility_model.predict(X_train_scaled)
    mse_train = mean_squared_error(y_train, y_pred_train)
    print(f"\nTraining Set MSE: {mse_train:.4f} (a much lower value than the Test MSE suggests overfitting)")

else:
    print("\nSkipping evaluation because model training failed or test data is unavailable.")
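MSE and MAE are easier to interpret next to a trivial reference. The sketch below, which is not part of the original notebook, compares a model's test MSE with that of scikit-learn's `DummyRegressor`, which always predicts the training-set mean, and also reports R² (1.0 is a perfect fit; values near or below 0 mean the model does no better than a constant). In the notebook it would be called as `compare_to_mean_baseline(utility_model, X_train_scaled, y_train, X_test_scaled, y_test)`; the `LinearRegression` model and synthetic data are stand-ins used only to make the example runnable.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def compare_to_mean_baseline(model, X_train, y_train, X_test, y_test):
    """Return test-set MSE for `model` and for a predict-the-training-mean baseline."""
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "model_mse": mean_squared_error(y_test, y_pred),
        "baseline_mse": mean_squared_error(y_test, baseline.predict(X_test)),
        "model_r2": r2_score(y_test, y_pred),
    }


# Synthetic check with a LinearRegression stand-in (not the trained MLP, not Robobo data)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 6))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=100)
stand_in = LinearRegression().fit(X_demo[:80], y_demo[:80])
report = compare_to_mean_baseline(stand_in, X_demo[:80], y_demo[:80], X_demo[80:], y_demo[80:])
print(report)
```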

## 7. Save Model and Scaler

Save the trained utility model and the scaler object (needed for scaling new data before prediction) for later use in the robot's control script.

In [None]:
if utility_model is not None and scaler is not None:
    try:
        joblib.dump(utility_model, UTILITY_MODEL_PATH)
        joblib.dump(scaler, UTILITY_SCALER_PATH)
        print(f"\nUtility model saved successfully to: {UTILITY_MODEL_PATH}")
        print(f"Utility scaler saved successfully to: {UTILITY_SCALER_PATH}")
    except Exception as e:
        print(f"\nError saving model or scaler: {e}")
else:
    print("\nSkipping saving because model or scaler was not successfully created.")
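Before copying the files to the robot, it can be worth confirming that the saved objects reload and reproduce the in-memory predictions. The helper below is a hypothetical sketch that uses only `joblib.dump` and `joblib.load`; called with the notebook's `UTILITY_MODEL_PATH` and `UTILITY_SCALER_PATH` it would also perform the save. joblib files are pickles, so they should be loaded with the same (or a compatible) scikit-learn version, and only from trusted sources.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def check_round_trip(model, scaler, model_path, scaler_path, X_sample):
    """Save model and scaler, reload them, and confirm predictions are unchanged."""
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)
    before = model.predict(scaler.transform(X_sample))
    after = loaded_model.predict(loaded_scaler.transform(X_sample))
    return bool(np.allclose(before, after))


# Synthetic stand-in objects, saved to a temporary directory rather than MODEL_DIR
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(60, 6))
y_demo = X_demo[:, 0] ** 2
scaler_demo = StandardScaler().fit(X_demo)
model_demo = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(
    scaler_demo.transform(X_demo), y_demo
)
with tempfile.TemporaryDirectory() as tmp:
    ok = check_round_trip(
        model_demo,
        scaler_demo,
        os.path.join(tmp, "model.joblib"),
        os.path.join(tmp, "scaler.joblib"),
        X_demo[:5],
    )
print("Round trip OK:", ok)
```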

## Next Steps

1.  **Generate `goal_traces.csv`:** Run the `collect_goal_traces.py` script (or your modified version) to collect sufficient data.
2.  **Run this Notebook:** Execute the cells above to train and save the utility model and scaler.
3.  **Integrate:** Load the saved model (`extrinsic_utility_model.joblib`) and scaler (`extrinsic_utility_scaler.joblib`) in your main robot control script (`main.py` or similar).
4.  **Use for Deliberation:**
    *   In your decision loop, for each candidate action:
        *   Predict the next state using your world model.
        *   Scale the predicted state using the loaded utility scaler.
        *   Use the loaded utility model's `.predict()` method on the scaled state to get its utility.
    *   Choose the action leading to the highest predicted utility.
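The deliberation loop in step 4 can be sketched as follows. The world model is assumed to be a callable `world_model(state, action) -> next_state` returning a feature vector in the same order as `FEATURE_COLUMNS`; the toy world model and the `LinearRegression` stand-in for the utility model are hypothetical and exist only to make the sketch runnable. Because the scaler in this notebook is fit on a plain NumPy array (`.values`), predicted states are passed as arrays too.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def choose_action(current_state, candidate_actions, world_model, scaler, utility_model):
    """Return the candidate action whose predicted next state has the highest utility."""
    predicted_states = np.vstack(
        [world_model(current_state, action) for action in candidate_actions]
    )
    # Scale all predicted states in one batch, then score them with the utility model
    utilities = utility_model.predict(scaler.transform(predicted_states))
    best = int(np.argmax(utilities))
    return candidate_actions[best], utilities


# --- Toy stand-ins (hypothetical, for illustration only) ---
def toy_world_model(state, action):
    """Pretend each action changes distance_red (feature 0) by `action`."""
    next_state = np.asarray(state, dtype=float).copy()
    next_state[0] = max(0.0, next_state[0] + action)
    return next_state


rng = np.random.default_rng(3)
X_demo = rng.uniform(0.0, 5.0, size=(100, 6))
scaler_demo = StandardScaler().fit(X_demo)
# Toy utility: the closer to the red cylinder, the better
utility_demo = LinearRegression().fit(scaler_demo.transform(X_demo), -X_demo[:, 0])

state = np.array([2.0, 0.1, 3.0, -0.4, 4.0, 0.7])
action, utilities = choose_action(state, [-0.5, 0.0, 0.5], toy_world_model, scaler_demo, utility_demo)
print(action)  # -0.5 (the move that brings the robot closest to the red cylinder)
```

Scoring all candidates in one `predict` call keeps the per-step cost to a single forward pass over a small batch, which matters if the decision loop runs on the robot at a fixed control rate.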