# Rocket Lab ML Models - Model Registry

This notebook trains ML models for the Rocket Lab Intelligence Agent:
- **Mission Risk Predictor**: Forecasts launch risk (High/Low) based on weather risk, technical risk, payload mass, and contract value.
- **Supplier Quality Predictor**: Identifies suppliers at risk of quality issues.
- **Component Failure Predictor**: Predicts component failure likelihood based on test cycles and age.

**Data Source:** Uses Feature Views defined in `04_create_views.sql` (Single Source of Truth).

## Prerequisites

**Required Packages** (configured via `environment.yml`):
- `snowflake-ml-python`
- `scikit-learn`
- `pandas`
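
Before running the cells below, it can help to confirm that the packages listed above are actually present in the active environment. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper introduced for illustration, and the distribution names are assumed to match the package names listed above.

```python
from importlib import metadata

# Distribution names taken from the package list above.
REQUIRED = ["snowflake-ml-python", "scikit-learn", "pandas"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(REQUIRED)
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` would need to be added to `environment.yml` before the import cell can run.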

**Database Context:**
- **Database:** ROCKET_LAB_INTELLIGENCE
- **Schema:** ANALYTICS
- **Warehouse:** ROCKET_LAB_WH

In [None]:
# Import Python packages
from snowflake.snowpark import Session
from snowflake.snowpark.context import get_active_session
from snowflake.ml.registry import Registry
from snowflake.ml.modeling.ensemble import RandomForestClassifier
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.modeling.metrics import accuracy_score
import pandas as pd

# Get active Snowflake session
session = get_active_session()

# Set context
session.use_database('ROCKET_LAB_INTELLIGENCE')
session.use_schema('ANALYTICS')
session.use_warehouse('ROCKET_LAB_WH')

print(f"✅ Connected - Database: {session.get_current_database()}, Schema: {session.get_current_schema()}")

## MODEL 1: Mission Risk Predictor
Binary classifier: predicts `RISK_LABEL` as 0 (mission success) or 1 (mission failure).

In [None]:
# Load training data from Feature View (Single Source of Truth)
mission_df = session.table("ROCKET_LAB_INTELLIGENCE.ANALYTICS.V_MISSION_RISK_FEATURES")

print(f"Mission data: {mission_df.count()} rows")
mission_df.show(5)

In [None]:
# Define features and label
mission_features = ['WEATHER_RISK', 'TECHNICAL_RISK', 'PAYLOAD_MASS', 'CONTRACT_VAL']
mission_label = 'RISK_LABEL'

# Train/Test Split
train_mission, test_mission = mission_df.random_split([0.8, 0.2], seed=42)

# Pipeline
mission_pipeline = Pipeline([
    ("Classifier", RandomForestClassifier(
        input_cols=mission_features,
        label_cols=mission_label,
        output_cols=["PREDICTED_RISK"],
        n_estimators=100,
        max_depth=5
    ))
])

# Train
mission_pipeline.fit(train_mission)
print("✅ Mission model trained")
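
The `random_split([0.8, 0.2], seed=42)` call above assigns rows to the two splits at random according to the weights, so the resulting sizes are typically close to 80/20 rather than exactly 80/20, and the seed keeps the assignment repeatable. The pure-Python sketch below illustrates that idea on a plain list. `weighted_split` is a hypothetical helper written for illustration only; it is not how Snowpark implements the split.

```python
import random

def weighted_split(rows, train_weight=0.8, seed=42):
    """Assign each row to train or test with an independent seeded random draw."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for row in rows:
        # Each row lands in train with probability train_weight.
        (train if rng.random() < train_weight else test).append(row)
    return train, test

rows = list(range(1000))
train, test = weighted_split(rows)
# Sizes are close to 800/200 but are not guaranteed to be exact.
print(len(train), len(test))
```

Because the sizes are approximate, very small feature views can end up with only a handful of test rows, which makes the accuracy figure noisy.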

In [None]:
# Evaluate
mission_preds = mission_pipeline.predict(test_mission)
acc = accuracy_score(df=mission_preds, y_true_col_names=mission_label, y_pred_col_names="PREDICTED_RISK")
print(f"Accuracy: {acc}")

# Register
reg = Registry(session=session, database_name="ROCKET_LAB_INTELLIGENCE", schema_name="ANALYTICS")

model_ref_mission = reg.log_model(
    model_name="MISSION_RISK_PREDICTOR",
    version_name=None, # Auto-versioning
    model=mission_pipeline,
    sample_input_data=train_mission.select(mission_features).limit(10),
    comment="Predicts mission failure risk based on weather and technical scores",
    metrics={"accuracy": acc}
)

print("✅ MISSION_RISK_PREDICTOR registered successfully.")
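
Accuracy alone can hide poor performance on the rarer class, for example if only a few missions carry the label 1. A minimal sketch of precision and recall computed from label/prediction pairs follows. `precision_recall` is a hypothetical helper, and the sample lists are made-up values, not output from the model above. In the notebook, the pairs could come from `mission_preds.select(mission_label, "PREDICTED_RISK").to_pandas()`, assuming the test set is small enough to pull locally.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative values only (not real model output).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Values computed this way could be added to the `metrics` dictionary passed to `reg.log_model` alongside accuracy.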

## MODEL 2: Supplier Quality Predictor
Predicts whether a supplier is at risk of quality issues, using its quality, delivery, risk, and spend features.

In [None]:
# Load data
supplier_df = session.table("ROCKET_LAB_INTELLIGENCE.ANALYTICS.V_SUPPLIER_QUALITY_FEATURES")

print(f"Supplier data: {supplier_df.count()} rows")
supplier_df.show(5)

In [None]:
# Features
sup_features = ['QUALITY_SCORE', 'DELIVERY_SCORE', 'RISK_METRIC', 'SPEND_AMOUNT']
sup_label = 'QUALITY_LABEL'

# Train/Test Split
train_sup, test_sup = supplier_df.random_split([0.8, 0.2], seed=42)

# Pipeline
sup_pipeline = Pipeline([
    ("Classifier", RandomForestClassifier(
        input_cols=sup_features,
        label_cols=sup_label,
        output_cols=["PREDICTED_QUALITY"],
        n_estimators=50
    ))
])

# Train
sup_pipeline.fit(train_sup)
print("✅ Supplier model trained")

In [None]:
# Evaluate
sup_preds = sup_pipeline.predict(test_sup)
acc_sup = accuracy_score(df=sup_preds, y_true_col_names=sup_label, y_pred_col_names="PREDICTED_QUALITY")
print(f"Accuracy: {acc_sup}")

# Register
model_ref_sup = reg.log_model(
    model_name="SUPPLIER_QUALITY_PREDICTOR",
    version_name=None,
    model=sup_pipeline,
    sample_input_data=train_sup.select(sup_features).limit(10),
    comment="Predicts supplier quality risk based on ratings",
    metrics={"accuracy": acc_sup}
)

print("✅ SUPPLIER_QUALITY_PREDICTOR registered successfully.")

## MODEL 3: Component Failure Predictor
Classifies components as likely to fail or not, based on cycle count and age in days.

In [None]:
# Load data
comp_df = session.table("ROCKET_LAB_INTELLIGENCE.ANALYTICS.V_COMPONENT_FAILURE_FEATURES")

print(f"Component data: {comp_df.count()} rows")
comp_df.show(5)

In [None]:
# Features
comp_features = ['CYCLE_COUNT', 'AGE_DAYS']
comp_label = 'FAILURE_LABEL'

# Train/Test Split
train_comp, test_comp = comp_df.random_split([0.8, 0.2], seed=42)

# Pipeline
comp_pipeline = Pipeline([
    ("Classifier", RandomForestClassifier(
        input_cols=comp_features,
        label_cols=comp_label,
        output_cols=["PREDICTED_FAILURE"],
        n_estimators=50
    ))
])

# Train
comp_pipeline.fit(train_comp)
print("✅ Component model trained")

In [None]:
# Evaluate
comp_preds = comp_pipeline.predict(test_comp)
acc_comp = accuracy_score(df=comp_preds, y_true_col_names=comp_label, y_pred_col_names="PREDICTED_FAILURE")
print(f"Accuracy: {acc_comp}")

# Register
model_ref_comp = reg.log_model(
    model_name="COMPONENT_FAILURE_PREDICTOR",
    version_name=None,
    model=comp_pipeline,
    sample_input_data=train_comp.select(comp_features).limit(10),
    comment="Predicts component failure based on cycles and age",
    metrics={"accuracy": acc_comp}
)

print("✅ COMPONENT_FAILURE_PREDICTOR registered successfully.")
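
The three sections above repeat the same load, split, train, evaluate, and register steps with different names. One possible way to reduce the duplication is to collect the per-model settings in a single structure and loop over it. The sketch below shows only the configuration side. `ModelSpec`, `SPECS`, and `log_model_kwargs` are hypothetical names introduced here; the view, column, and model names are copied from the cells above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    """Per-model settings that differ between the three training sections."""
    model_name: str
    view: str
    features: tuple
    label: str
    output_col: str
    comment: str

SCHEMA = "ROCKET_LAB_INTELLIGENCE.ANALYTICS"

SPECS = [
    ModelSpec("MISSION_RISK_PREDICTOR", f"{SCHEMA}.V_MISSION_RISK_FEATURES",
              ("WEATHER_RISK", "TECHNICAL_RISK", "PAYLOAD_MASS", "CONTRACT_VAL"),
              "RISK_LABEL", "PREDICTED_RISK",
              "Predicts mission failure risk based on weather and technical scores"),
    ModelSpec("SUPPLIER_QUALITY_PREDICTOR", f"{SCHEMA}.V_SUPPLIER_QUALITY_FEATURES",
              ("QUALITY_SCORE", "DELIVERY_SCORE", "RISK_METRIC", "SPEND_AMOUNT"),
              "QUALITY_LABEL", "PREDICTED_QUALITY",
              "Predicts supplier quality risk based on ratings"),
    ModelSpec("COMPONENT_FAILURE_PREDICTOR", f"{SCHEMA}.V_COMPONENT_FAILURE_FEATURES",
              ("CYCLE_COUNT", "AGE_DAYS"),
              "FAILURE_LABEL", "PREDICTED_FAILURE",
              "Predicts component failure based on cycles and age"),
]

def log_model_kwargs(spec, accuracy):
    """Build the keyword arguments that vary per model for reg.log_model()."""
    return {
        "model_name": spec.model_name,
        "version_name": None,  # let the registry pick a version name
        "comment": spec.comment,
        "metrics": {"accuracy": accuracy},
    }

for spec in SPECS:
    print(spec.model_name, "<-", spec.view, list(spec.features))
```

In the notebook, each spec would drive one pass through the existing steps, with the fitted pipeline and sample input passed to `reg.log_model` alongside `**log_model_kwargs(spec, acc)`.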

## Verification
Verify that all models are registered and available.

In [None]:
# Verify Models
models = reg.show_models()
print(models)

print("\nSummary: All 3 models trained and registered.")
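
To make the verification step fail loudly instead of relying on a visual check of the printed table, the registered names can be compared against the expected ones. This is a minimal sketch, assuming `reg.show_models()` returns a pandas DataFrame with a `name` column (worth confirming for the installed `snowflake-ml-python` version). `missing_models` is a hypothetical helper, and the sample list stands in for `models["name"].tolist()`.

```python
EXPECTED_MODELS = {
    "MISSION_RISK_PREDICTOR",
    "SUPPLIER_QUALITY_PREDICTOR",
    "COMPONENT_FAILURE_PREDICTOR",
}

def missing_models(registered_names, expected=EXPECTED_MODELS):
    """Return the expected model names absent from the registry listing."""
    registered = {name.upper() for name in registered_names}
    return sorted(expected - registered)

# Stand-in for models["name"].tolist(); one model is deliberately left out.
registered_names = ["MISSION_RISK_PREDICTOR", "SUPPLIER_QUALITY_PREDICTOR"]
missing = missing_models(registered_names)
print("Missing models:", missing)
```

An empty list means all three models are present; a non-empty list could be turned into an `assert` or a raised exception at the end of the notebook.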