# Kratos Defense ML Models - Model Registry

This notebook trains ML models for the Kratos Intelligence Agent:
- **Program Risk Prediction** - Classify programs at risk of cost/schedule issues
- **Supplier Risk Prediction** - Classify suppliers at risk of quality/delivery issues
- **Production Forecasting** - Predict monthly manufacturing order count

All models are registered to Snowflake Model Registry and can be added as tools to the Intelligence Agent.

## Before You Begin

**Add these packages** in the Packages dropdown (upper right):
- `snowflake-ml-python`
- `scikit-learn`
- `xgboost`
- `matplotlib`

**Database:** KRATOS_INTELLIGENCE  
**Schema:** ANALYTICS  
**Warehouse:** KRATOS_WH


## Import Required Packages


In [None]:
# Import Python packages
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Import Snowpark
from snowflake.snowpark.context import get_active_session
import snowflake.snowpark.functions as F
import snowflake.snowpark.types as T
from snowflake.snowpark import Window

# Import Snowpark ML
from snowflake.ml.modeling.preprocessing import StandardScaler, OneHotEncoder
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.modeling.linear_model import LinearRegression, LogisticRegression
from snowflake.ml.modeling.ensemble import RandomForestClassifier
from snowflake.ml.modeling.metrics import mean_squared_error, mean_absolute_error, accuracy_score
from snowflake.ml.registry import Registry

print("✅ Packages imported successfully")


## Connect to Snowflake

Get the active session and set the database, schema, and warehouse context for the Kratos data.


In [None]:
# Get active Snowflake session
session = get_active_session()

# Set context
session.use_database('KRATOS_INTELLIGENCE')
session.use_schema('ANALYTICS')
session.use_warehouse('KRATOS_WH')

print(f"✅ Connected - Role: {session.get_current_role()}")
print(f"   Warehouse: {session.get_current_warehouse()}")
print(f"   Database.Schema: {session.get_fully_qualified_current_schema()}")


---
# MODEL 1: Program Risk Prediction

Classify programs at risk of cost/schedule overruns based on financial metrics.


### Prepare Program Risk Training Data


In [None]:
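# Get program features for risk prediction.
# Note: IS_AT_RISK depends on costs_incurred / funded_value, which is also exposed
# as COST_RATIO, so the classifier can largely recover the label from that feature.
# Treat the reported accuracy as optimistic.
program_df = session.sql("""
SELECT
    p.program_id,
    p.program_type,
    p.contract_type,
    (p.funded_value / NULLIF(p.total_contract_value, 0))::FLOAT AS funded_ratio,
    (p.costs_incurred / NULLIF(p.funded_value, 0))::FLOAT AS cost_ratio,
    -- Programs with no costs yet would otherwise produce a NULL feature; 0 is an assumed default
    COALESCE(p.revenue_recognized / NULLIF(p.costs_incurred, 0), 0)::FLOAT AS revenue_cost_ratio,
    COALESCE(p.margin_percentage, 10)::FLOAT AS margin_pct,
    DATEDIFF('day', p.start_date, CURRENT_DATE())::FLOAT AS days_active,
    DATEDIFF('day', CURRENT_DATE(), COALESCE(p.planned_end_date, DATEADD('year', 1, CURRENT_DATE())))::FLOAT AS days_remaining,
    -- COALESCE keeps the label non-NULL when risk_level or costs_incurred is missing
    COALESCE(p.risk_level = 'HIGH' OR (p.costs_incurred > p.funded_value * 0.95), FALSE)::BOOLEAN AS is_at_risk
FROM RAW.PROGRAMS p
WHERE p.program_status = 'ACTIVE'
  AND p.total_contract_value > 0
  AND p.funded_value > 0
""")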

print(f"Program data: {program_df.count()} programs")
program_df.show(5)
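
As a rough illustration of what the query computes, the ratio features and the `is_at_risk` rule can be mirrored in plain Python. This is a sketch only: `safe_ratio` and `program_features` are hypothetical helpers that are not part of the notebook, the dollar amounts are made up, and `None` stands in for SQL NULL. It also shows that the label is driven by the same cost-to-funding ratio that the model receives as a feature.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Mirror SQL `a / NULLIF(b, 0)`: a zero denominator yields None (NULL)."""
    return None if denominator == 0 else numerator / denominator


def program_features(risk_level: str, funded_value: float,
                     total_contract_value: float, costs_incurred: float) -> dict:
    """Hypothetical helper reproducing part of the SELECT list for one program."""
    return {
        "funded_ratio": safe_ratio(funded_value, total_contract_value),
        "cost_ratio": safe_ratio(costs_incurred, funded_value),
        # Same rule as the query: flagged HIGH, or costs above 95% of funding.
        "is_at_risk": risk_level == "HIGH" or costs_incurred > funded_value * 0.95,
    }


# Made-up programs: identical funding, different costs incurred.
healthy = program_features("LOW", 800_000.0, 1_000_000.0, 400_000.0)
overrun = program_features("LOW", 800_000.0, 1_000_000.0, 780_000.0)
print(healthy)
print(overrun)
```

The second program is labeled at risk purely because its cost ratio exceeds 0.95, which is worth keeping in mind when reading the model's accuracy.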


### Train Program Risk Model


In [None]:
# Train/test split (80/20)
train_program, test_program = program_df.random_split([0.8, 0.2], seed=42)

# Drop PROGRAM_ID
train_program = train_program.drop("PROGRAM_ID")
test_program = test_program.drop("PROGRAM_ID")

# Create pipeline
program_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["PROGRAM_TYPE", "CONTRACT_TYPE"],
        output_cols=["PROGRAM_TYPE_ENC", "CONTRACT_TYPE_ENC"],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Classifier", RandomForestClassifier(
        label_cols=["IS_AT_RISK"],
        output_cols=["RISK_PREDICTION"],
        n_estimators=100,
        max_depth=10
    ))
])

# Train model
program_pipeline.fit(train_program)
print("✅ Program risk model trained")
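
The encoder step turns each categorical column into one indicator column per category seen during training, and `handle_unknown="ignore"` means a category that was not seen at fit time encodes as all zeros instead of raising an error. The plain-Python sketch below illustrates that behavior only; it is not the Snowpark ML implementation, and the contract type values are made up.

```python
def fit_categories(values: list) -> list:
    """Collect the distinct categories seen during training, in a stable order."""
    return sorted(set(values))


def one_hot(value: str, categories: list) -> list:
    """Encode one value; an unseen category produces an all-zero row."""
    return [1 if value == category else 0 for category in categories]


# Made-up contract types standing in for the CONTRACT_TYPE column.
categories = fit_categories(["FFP", "CPFF", "T&M", "FFP"])
known = one_hot("FFP", categories)
unknown = one_hot("IDIQ", categories)  # not present at fit time
print(categories, known, unknown)
```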


### Evaluate and Register Program Risk Model


In [None]:
# Make predictions
program_predictions = program_pipeline.predict(test_program)

# Calculate metrics
accuracy = accuracy_score(df=program_predictions, y_true_col_names="IS_AT_RISK", y_pred_col_names="RISK_PREDICTION")
program_metrics = {"accuracy": round(accuracy, 4)}
print(f"Program risk model metrics: {program_metrics}")

# Register model
reg = Registry(session)
reg.log_model(
    model=program_pipeline,
    model_name="PROGRAM_RISK_PREDICTOR",
    version_name="V1",
    comment="Predicts program cost/schedule risk using Random Forest based on financial metrics",
    metrics=program_metrics
)

print("✅ Program risk model registered to Model Registry as PROGRAM_RISK_PREDICTOR")
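
Accuracy alone can look healthy when at-risk programs are rare, because a model that never predicts risk is still right most of the time. The sketch below uses made-up labels and a hypothetical `precision_recall` helper to show how precision and recall expose that case; it is an illustration in plain Python, not a replacement for the registered metrics.

```python
def precision_recall(y_true: list, y_pred: list) -> dict:
    """Compute accuracy, precision, and recall for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # at risk, predicted at risk
    fp = sum(1 for t, p in pairs if not t and p)      # healthy, predicted at risk
    fn = sum(1 for t, p in pairs if t and not p)      # at risk, missed
    correct = sum(1 for t, p in pairs if bool(t) == bool(p))
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up example: 2 of 10 programs are truly at risk, the model flags none.
y_true = [True, True] + [False] * 8
y_pred = [False] * 10
scores = precision_recall(y_true, y_pred)
print(scores)
```

Here accuracy is 0.8 while recall is 0.0, so every at-risk program is missed.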


---
# MODEL 2: Supplier Risk Prediction

Classify suppliers at risk of quality or delivery issues based on performance ratings.


### Prepare Supplier Risk Training Data


In [None]:
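# Get supplier features for risk prediction.
# Note: IS_AT_RISK is computed directly from QUALITY_RATING and DELIVERY_RATING,
# which are also model features, so accuracy here mostly measures how well the
# forest recovers that threshold rule.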
supplier_df = session.sql("""
SELECT
    s.supplier_id,
    s.supplier_type,
    s.supplier_category,
    COALESCE(s.quality_rating, 0.8)::FLOAT AS quality_rating,
    COALESCE(s.delivery_rating, 0.8)::FLOAT AS delivery_rating,
    (s.total_spend / 1000000)::FLOAT AS spend_millions,
    DATEDIFF('day', COALESCE(s.first_order_date, CURRENT_DATE()), CURRENT_DATE())::FLOAT AS days_as_supplier,
    s.is_small_business::BOOLEAN AS is_small_business,
    (COALESCE(s.quality_rating, 0.8) < 0.75 OR COALESCE(s.delivery_rating, 0.8) < 0.75)::BOOLEAN AS is_at_risk
FROM RAW.SUPPLIERS s
WHERE s.supplier_status = 'ACTIVE'
""")

print(f"Supplier data: {supplier_df.count()} suppliers")
supplier_df.show(5)
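
The supplier label follows a simple threshold rule, and the `COALESCE` default has a side effect worth noting: a missing rating becomes 0.8, which sits above the 0.75 threshold, so a supplier with no ratings at all is never labeled at risk. The sketch below mirrors that rule in plain Python; `supplier_is_at_risk` is a hypothetical helper and `None` stands in for SQL NULL.

```python
from typing import Optional

DEFAULT_RATING = 0.8   # value the query substitutes for a NULL rating
RISK_THRESHOLD = 0.75  # ratings below this mark the supplier as at risk


def supplier_is_at_risk(quality_rating: Optional[float],
                        delivery_rating: Optional[float]) -> bool:
    """Mirror the query's is_at_risk expression for one supplier."""
    quality = DEFAULT_RATING if quality_rating is None else quality_rating
    delivery = DEFAULT_RATING if delivery_rating is None else delivery_rating
    return quality < RISK_THRESHOLD or delivery < RISK_THRESHOLD


late_deliveries = supplier_is_at_risk(0.9, 0.7)   # one rating below threshold
unrated = supplier_is_at_risk(None, None)         # both ratings missing
print(late_deliveries, unrated)
```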


### Train Supplier Risk Model


In [None]:
# Train/test split (80/20)
train_supplier, test_supplier = supplier_df.random_split([0.8, 0.2], seed=42)

# Drop SUPPLIER_ID
train_supplier = train_supplier.drop("SUPPLIER_ID")
test_supplier = test_supplier.drop("SUPPLIER_ID")

# Create pipeline
supplier_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["SUPPLIER_TYPE", "SUPPLIER_CATEGORY"],
        output_cols=["SUPPLIER_TYPE_ENC", "SUPPLIER_CATEGORY_ENC"],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Classifier", RandomForestClassifier(
        label_cols=["IS_AT_RISK"],
        output_cols=["RISK_PREDICTION"],
        n_estimators=100,
        max_depth=8
    ))
])

# Train model
supplier_pipeline.fit(train_supplier)
print("✅ Supplier risk model trained")


### Evaluate and Register Supplier Risk Model


In [None]:
# Make predictions
supplier_predictions = supplier_pipeline.predict(test_supplier)

# Calculate metrics
accuracy = accuracy_score(df=supplier_predictions, y_true_col_names="IS_AT_RISK", y_pred_col_names="RISK_PREDICTION")
supplier_metrics = {"accuracy": round(accuracy, 4)}
print(f"Supplier risk model metrics: {supplier_metrics}")

# Register model
reg.log_model(
    model=supplier_pipeline,
    model_name="SUPPLIER_RISK_PREDICTOR",
    version_name="V1",
    comment="Predicts supplier quality/delivery risk using Random Forest based on performance ratings",
    metrics=supplier_metrics
)

print("✅ Supplier risk model registered to Model Registry as SUPPLIER_RISK_PREDICTOR")


---
# MODEL 3: Production Forecasting

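Estimate monthly manufacturing order count from the last 24 months of order history. The features are the month and year numbers plus same-month total quantity and total cost, so predicting a future month requires supplying those totals as inputs.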


### Prepare Production Forecast Data


In [None]:
# Get monthly production data with features
production_df = session.sql("""
SELECT
    DATE_TRUNC('month', order_date)::DATE AS order_month,
    MONTH(order_date)::FLOAT AS month_num,
    YEAR(order_date)::FLOAT AS year_num,
    COUNT(DISTINCT order_id)::FLOAT AS order_count,
    SUM(quantity_ordered)::FLOAT AS total_quantity,
    SUM(total_cost)::FLOAT AS total_cost
FROM RAW.MANUFACTURING_ORDERS
WHERE order_date >= DATEADD('month', -24, CURRENT_DATE())
GROUP BY DATE_TRUNC('month', order_date), MONTH(order_date), YEAR(order_date)
ORDER BY order_month
""")

print(f"Production data: {production_df.count()} months")
production_df.show(5)


### Train Production Forecast Model


In [None]:
# Train/test split (last 6 months for testing)
train_production = production_df.filter(F.col("ORDER_MONTH") < F.dateadd("month", F.lit(-6), F.current_date()))
test_production = production_df.filter(F.col("ORDER_MONTH") >= F.dateadd("month", F.lit(-6), F.current_date()))

# Drop ORDER_MONTH (DATE type not supported in pipeline)
train_production = train_production.drop("ORDER_MONTH")
test_production = test_production.drop("ORDER_MONTH")

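# Create pipeline
# drop_input_cols=True removes the unscaled columns after scaling. Without it the
# regressor, which has no explicit input_cols, would train on both the raw and the
# scaled copy of each feature.
production_pipeline = Pipeline([
    ("Scaler", StandardScaler(
        input_cols=["MONTH_NUM", "TOTAL_QUANTITY", "TOTAL_COST"],
        output_cols=["MONTH_NUM_SCALED", "TOTAL_QUANTITY_SCALED", "TOTAL_COST_SCALED"],
        drop_input_cols=True
    )),
    ("LinearRegression", LinearRegression(
        label_cols=["ORDER_COUNT"],
        output_cols=["PREDICTED_ORDERS"]
    ))
])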

# Train model
production_pipeline.fit(train_production)
print("✅ Production forecast model trained")
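
The split above is chronological: months before a cutoff six months back go to training, and the rest are held out. The plain-Python sketch below reproduces that cutoff logic so the resulting split sizes are easy to check. The helper names and the fixed "current date" are assumptions made for a reproducible example, and `add_months` only approximates SQL `DATEADD('month', ...)` by clamping the day to the target month's length.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month_index + 1)[1])
    return date(year, month_index + 1, day)


def split_by_cutoff(order_months: list, today: date, holdout_months: int = 6):
    """Months before the cutoff train the model; the rest are held out."""
    cutoff = add_months(today, -holdout_months)
    train = [m for m in order_months if m < cutoff]
    test = [m for m in order_months if m >= cutoff]
    return train, test


today = date(2024, 7, 15)  # fixed, made-up "current date"
order_months = [add_months(date(2022, 8, 1), i) for i in range(24)]  # 2022-08 to 2024-07
train, test = split_by_cutoff(order_months, today)
print(len(train), len(test), train[-1], test[0])
```

With a mid-month current date, the month exactly six months back falls into the training set because its first day precedes the cutoff, leaving six months for testing in this example.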


### Evaluate and Register Production Forecast Model


In [None]:
# Make predictions on test set
production_predictions = production_pipeline.predict(test_production)

# Calculate metrics
mae = mean_absolute_error(df=production_predictions, y_true_col_names="ORDER_COUNT", y_pred_col_names="PREDICTED_ORDERS")
mse = mean_squared_error(df=production_predictions, y_true_col_names="ORDER_COUNT", y_pred_col_names="PREDICTED_ORDERS")
rmse = mse ** 0.5

production_metrics = {"mae": round(mae, 2), "rmse": round(rmse, 2)}
print(f"Production forecast metrics: {production_metrics}")

# Register model
reg.log_model(
    model=production_pipeline,
    model_name="PRODUCTION_FORECASTER",
    version_name="V1",
    comment="Forecasts monthly manufacturing order count using Linear Regression",
    metrics=production_metrics
)

print("✅ Production forecast model registered to Model Registry as PRODUCTION_FORECASTER")
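
For reference, the two regression metrics relate as follows: MAE averages the absolute errors, while RMSE is the square root of the mean squared error and therefore weights large misses more heavily. A minimal sketch with made-up order counts (`mae_rmse` is a hypothetical helper, not a Snowpark ML function):

```python
import math


def mae_rmse(actual: list, predicted: list) -> tuple:
    """Return (MAE, RMSE) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, math.sqrt(mse)


# Made-up monthly order counts and predictions.
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [98.0, 124.0, 90.0, 100.0]
mae, rmse = mae_rmse(actual, predicted)
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```

RMSE is never smaller than MAE; a large gap between the two points to a few months with unusually large errors.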


---
# Verify Models in Registry


In [None]:
# Show all models in the registry
print("Models in registry:")
print(reg.show_models())

# Show versions for each model
print("\nProgram Risk model versions:")
print(reg.get_model("PROGRAM_RISK_PREDICTOR").show_versions())

print("\nSupplier Risk model versions:")
print(reg.get_model("SUPPLIER_RISK_PREDICTOR").show_versions())

print("\nProduction Forecaster versions:")
print(reg.get_model("PRODUCTION_FORECASTER").show_versions())

print("\n✅ All models registered and ready to add to Intelligence Agent")
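
Besides adding the models to the agent, a registered version can be called directly for a quick sanity check. The sketch below wraps the lookup in a hypothetical helper, `predict_with_registered_model`, and exercises it with a `MagicMock` stand-in so it runs outside Snowflake. In the notebook, the real `reg` object and a Snowpark DataFrame such as `test_program` would be passed instead; this assumes the model exposes a `predict` function.

```python
from unittest.mock import MagicMock


def predict_with_registered_model(registry, model_name: str, version_name: str, features):
    """Look up a registered model version and run its `predict` function."""
    model_version = registry.get_model(model_name).version(version_name)
    return model_version.run(features, function_name="predict")


# Stand-in registry so the example runs without a Snowflake session.
fake_registry = MagicMock()
fake_registry.get_model.return_value.version.return_value.run.return_value = "predictions"

result = predict_with_registered_model(
    fake_registry, "PROGRAM_RISK_PREDICTOR", "V1", features="test rows"
)
print(result)

# In the notebook (not executed here):
# predict_with_registered_model(reg, "PROGRAM_RISK_PREDICTOR", "V1", test_program).show(5)
```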


---
# Next Steps

## Add Models to Intelligence Agent

1. In Snowsight → AI & ML → Agents → KRATOS_INTELLIGENCE_AGENT
2. Go to Tools → + Add → Model
3. Add each registered model:
   - **PROGRAM_RISK_PREDICTOR**
   - **SUPPLIER_RISK_PREDICTOR**
   - **PRODUCTION_FORECASTER**

## Example Questions for Agent

- "Which programs are predicted to be at risk according to the program risk predictor?"
- "Show me suppliers with high risk using the supplier risk predictor"
- "Forecast production for next quarter using the production forecaster"

Once added, the models are available as tools the agent can call when answering questions like these.
