# 📈 05_Model_Inference.ipynb

This notebook demonstrates how to use the trained and registered aircraft anomaly prediction model for inference.

It covers:
- Loading the model by version number and by alias (the recommended approach)
- Predicting anomaly likelihood on new sensor feature data
- Writing the scored rows to the `anomaly_alerts` Delta table

This completes the end-to-end machine learning lifecycle, from training through batch scoring.


In [0]:
import pandas as pd
import numpy as np
import mlflow
from mlflow.pyfunc import load_model

## 🔢 Load model using version number

This method explicitly loads a specific version of the model registered in Unity Catalog.

In [0]:
# Pin an exact registered version (version 3, the same one used in the inference cell below)
model_uri = "models:/arao.aerodemo.aircraftanomalypredictor/3"
loaded_model = load_model(model_uri)

## 🏷️ Load model using alias

This preferred method loads the model tagged with the alias `champion`, making it easy to switch versions without code changes.

In [0]:
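# Use the full catalog.schema.model name, matching the other Unity Catalog URIs in this notebook
model_uri_alias = "models:/arao.aerodemo.aircraftanomalypredictor@champion"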
model_champion = mlflow.pyfunc.load_model(model_uri_alias)
print("✅ Loaded model with alias @champion")
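MLflow registry URIs take one of two suffixes: `/<version>` to pin an exact version, or `@<alias>` to follow an alias. The sketch below builds either form from a model name; `build_model_uri` and `MODEL_NAME` are hypothetical helper names introduced for illustration, not part of MLflow.

```python
from typing import Optional


def build_model_uri(
    name: str,
    version: Optional[int] = None,
    alias: Optional[str] = None,
) -> str:
    """Return a registry URI for either a pinned version or an alias."""
    # Exactly one of the two selectors must be given.
    if (version is None) == (alias is None):
        raise ValueError("Pass exactly one of `version` or `alias`.")
    if version is not None:
        return f"models:/{name}/{version}"
    return f"models:/{name}@{alias}"


# Hypothetical constant holding the three-level Unity Catalog name used in this notebook.
MODEL_NAME = "arao.aerodemo.aircraftanomalypredictor"

print(build_model_uri(MODEL_NAME, version=3))
print(build_model_uri(MODEL_NAME, alias="champion"))
```

Either string can then be passed to `mlflow.pyfunc.load_model`. Keeping the name in one constant avoids the two loading styles drifting apart.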

## 🧪 Prepare input features for prediction

The sample below must match the schema used during model training, including the engineered features.

In [0]:
feature_df = spark.read.table("arao.aerodemo.sensor_features_table").toPandas()

# Drop the label column, sample 5 rows, and cast the features to match the model schema
batch_df = feature_df.drop(columns=["anomaly_score"]).sample(5)
batch_df = batch_df.astype({
    "engine_temp": "float64",
    "fuel_efficiency": "float64",
    "vibration": "float64",
    "altitude": "float64",
    "airspeed": "float64",
    "oil_pressure": "float64",
    "engine_rpm": "int32",
    "battery_voltage": "float64",
    "prev_anomaly": "float64",
    "avg_engine_temp_7d": "float64",
    "avg_vibration_7d": "float64",
    "avg_rpm_7d": "float64",
    "days_since_maint": "float64"
})
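One way to catch schema drift before calling `predict` is to compare the DataFrame's dtypes against the expected ones and report every mismatch at once. The following is a minimal sketch: `EXPECTED_DTYPES` simply mirrors the cast in the cell above and is assumed, not confirmed, to match the logged model signature, and `find_schema_mismatches` is a hypothetical helper.

```python
from typing import Dict

# Assumption: these dtypes mirror the astype() call above, not the logged signature itself.
EXPECTED_DTYPES: Dict[str, str] = {
    "engine_temp": "float64",
    "fuel_efficiency": "float64",
    "vibration": "float64",
    "altitude": "float64",
    "airspeed": "float64",
    "oil_pressure": "float64",
    "engine_rpm": "int32",
    "battery_voltage": "float64",
    "prev_anomaly": "float64",
    "avg_engine_temp_7d": "float64",
    "avg_vibration_7d": "float64",
    "avg_rpm_7d": "float64",
    "days_since_maint": "float64",
}


def find_schema_mismatches(actual: Dict[str, str], expected: Dict[str, str]) -> Dict[str, str]:
    """Map each problematic column to a short description of what is wrong."""
    problems: Dict[str, str] = {}
    for column, dtype in expected.items():
        if column not in actual:
            problems[column] = "missing"
        elif actual[column] != dtype:
            problems[column] = f"expected {dtype}, got {actual[column]}"
    return problems


# Demo with a hand-built dtype mapping: one wrong dtype and one missing column.
demo = dict(EXPECTED_DTYPES)
demo["engine_rpm"] = "int64"
del demo["vibration"]
print(find_schema_mismatches(demo, EXPECTED_DTYPES))
```

In the notebook, the `actual` mapping could be obtained with `batch_df.dtypes.astype(str).to_dict()`; an empty result means every expected column is present with the expected dtype.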

## 🔍 Run inference using the Registered Model Version

This cell runs inference using a **specific registered model version** (`version 3`) from Unity Catalog.
Pinning a version number gives you full control over which model is used, which is especially useful for repeatable experiments.

In [0]:
import pandas as pd
import numpy as np
import mlflow
from mlflow.pyfunc import load_model

# 🔢 Specify the exact model version
model_uri = "models:/arao.aerodemo.aircraftanomalypredictor/3"
loaded_model = load_model(model_uri)

# 🧪 Sample input matching model signature
sample_input = pd.DataFrame([{
    "engine_temp": 610.0,
    "fuel_efficiency": 76.2,
    "vibration": 5.3,
    "altitude": 29950.0,
    "airspeed": 452.0,
    "oil_pressure": 61.0,
    "engine_rpm": np.int32(3900),
    "battery_voltage": 25.0,
    "prev_anomaly": 1.0,
    "avg_engine_temp_7d": 608.0,
    "avg_vibration_7d": 5.0,
    "avg_rpm_7d": 3850.0,
    "days_since_maint": 12.0
}])

# 🧠 Predict
prediction = loaded_model.predict(sample_input)
print("🧠 Predicted Anomaly (0 = Normal, 1 = Anomalous):", prediction[0])

## 🔍 Run inference using the `@champion` alias

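This scores the batch with whichever model version currently holds the `champion` alias.

In [0]:
# Use the alias-loaded model; `loaded_model` is pinned to a specific version at this point
predictions = model_champion.predict(batch_df)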
print("Predictions:", predictions)

## 💾 Save inference results to Delta table

Persisting the scored rows lets downstream applications and alerting jobs monitor high-risk events, for example by filtering on `predicted_anomaly = 1`.

In [0]:
from mlflow.pyfunc import load_model

# ✅ Load the champion version of the model
model_uri = "models:/arao.aerodemo.aircraftanomalypredictor@champion"
loaded_model = load_model(model_uri)

# 🔍 Run inference
predictions = loaded_model.predict(batch_df)

# ✅ Attach predictions; the next cell writes them to the Delta table
inference_df = batch_df.copy()
inference_df["predicted_anomaly"] = predictions



In [0]:
from pyspark.sql.functions import col, to_timestamp, current_date, lit
from pyspark.sql import SparkSession

# Convert to Spark DataFrame
# getOrCreate() returns the notebook's existing session if one is already running
spark = SparkSession.builder.getOrCreate()
spark_df = spark.createDataFrame(inference_df)

# ✅ Align schema types

# 1️⃣ Cast 'timestamp' from string to timestamp type
spark_df = spark_df.withColumn("timestamp", to_timestamp(col("timestamp")))

# 2️⃣ Cast numeric fields to match target table
spark_df = spark_df.withColumn("capacity", col("capacity").cast("int"))
spark_df = spark_df.withColumn("range_km", col("range_km").cast("int"))
spark_df = spark_df.withColumn("days_since_maint", col("days_since_maint").cast("int"))

# 3️⃣ Add missing columns with default values if they don’t exist
if "alert_day" not in spark_df.columns:
    spark_df = spark_df.withColumn("alert_day", current_date())

if "batch_id" not in spark_df.columns:
    spark_df = spark_df.withColumn("batch_id", lit("batch_001"))

# The label column was dropped before scoring, so re-add it as NULL to satisfy the table schema
if "anomaly_score" not in spark_df.columns:
    spark_df = spark_df.withColumn("anomaly_score", lit(None).cast("int"))

# ✅ Write to Delta table
spark_df.write.format("delta").mode("append").saveAsTable("arao.aerodemo.anomaly_alerts")

print("✅ Inference results written to 'arao.aerodemo.anomaly_alerts' Delta table")
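The write above tags every row with the fixed value `batch_001`, so repeated runs cannot be told apart in the table. A minimal sketch of one alternative is to derive the ID from the current UTC time plus a short random suffix; `make_batch_id` is a hypothetical helper, and the ID format is an assumption rather than anything the target table requires.

```python
import uuid
from datetime import datetime, timezone


def make_batch_id(prefix: str = "batch") -> str:
    """Build an ID like 'batch_20240101T120000Z_1a2b3c4d' that differs per run."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # A short random suffix keeps IDs distinct even within the same second.
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}_{timestamp}_{suffix}"


batch_id = make_batch_id()
print(batch_id)
```

The resulting string could be passed to `lit(...)` in place of `"batch_001"` when adding the `batch_id` column.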

In [0]:
spark_df.printSchema()