# 🛡️ Cybersecurity ML Training & Deployment

This notebook trains and deploys anomaly-detection and clustering models for cybersecurity using **Snowpark ML** and the **Snowflake Model Registry**.

## 🎯 What This Notebook Does

1. **Isolation Forest**: Detects anomalous user behavior patterns
2. **K-means Clustering**: Groups users into behavioral personas
3. **Model Registry**: Stores and manages the trained models in the Snowflake Model Registry
4. **UDF Deployment**: Deploys models as scalable user-defined functions (UDFs)
5. **Hybrid Analysis**: Combines multiple ML approaches

## 📋 Prerequisites

Before running this notebook, ensure you have:
1. ✅ Run `sql/01_cybersecurity_schema.sql` 
2. ✅ Run `sql/02_sample_data_generation.sql`
3. ✅ Run `sql/03_native_ml_and_cortex.sql`

---


## 🔧 Environment Setup


In [None]:
# Import required libraries
import snowflake.snowpark as snowpark
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, lit, when, avg, count, max as max_, min as min_
from snowflake.snowpark.types import StructType, StructField, StringType, FloatType, IntegerType, BooleanType

# Snowpark ML imports
from snowflake.ml.modeling.ensemble import IsolationForest
from snowflake.ml.modeling.cluster import KMeans
from snowflake.ml.modeling.preprocessing import StandardScaler
from snowflake.ml.registry import Registry

# Data science libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import json

print("📦 All libraries imported successfully!")


In [None]:
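# Get the active session (available automatically inside Snowflake Notebooks).
# The function is imported explicitly because `import snowflake.snowpark as snowpark`
# binds only the name `snowpark`, not the top-level `snowflake` package.
from snowflake.snowpark.context import get_active_session

session = get_active_session()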

# Set up database context
session.sql("USE DATABASE CYBERSECURITY_DEMO").collect()
session.sql("USE SCHEMA SECURITY_ANALYTICS").collect()
session.sql("USE WAREHOUSE CYBERSECURITY_WH").collect()

print("🔗 Connected to Snowflake session")
print(f"📊 Current database: {session.get_current_database()}")
print(f"📁 Current schema: {session.get_current_schema()}")
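
Before building features, it can help to confirm that the prerequisite scripts created the tables this notebook reads. The sketch below is one possible check, not part of the original notebook. `missing_tables` is a hypothetical helper that relies only on the Snowpark calls `session.table(...)`, `.limit(...)`, and `.collect()`. It treats any exception as "table unavailable", which is deliberately coarse.

```python
def missing_tables(session, required):
    """Return the names in `required` that cannot be queried through `session`.

    Hypothetical helper: `session` is expected to behave like a Snowpark Session.
    """
    missing = []
    for name in required:
        try:
            # Snowpark DataFrames are lazy, so collect() is what triggers the lookup.
            # Fetching at most one row is enough to prove the table resolves.
            session.table(name).limit(1).collect()
        except Exception:
            # Coarse on purpose: a missing table and a permissions error both land here.
            missing.append(name)
    return missing
```

For example, `missing_tables(session, ["USER_AUTHENTICATION_LOGS", "EMPLOYEE_DATA"])` should return an empty list once the setup scripts have run. Those two names are the tables referenced by the feature view in this notebook.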


## 📊 Data Preparation for ML

*The ML models require feature engineering to extract behavioral patterns from raw authentication logs.*
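
The kind of per-user aggregation this step performs can be sketched locally in plain Python. This is an illustration only, not the notebook's Snowflake implementation. The record layout, the sample events, and the names `is_off_hours` and `build_features` are hypothetical. The off-hours window is assumed to run from 22:00 through 06:59. Because that window wraps past midnight, it needs two comparisons joined by `or` rather than a single range check.

```python
from collections import defaultdict
from datetime import datetime


def is_off_hours(ts: datetime) -> bool:
    # Assumed window: 22:00 through 06:59. It wraps midnight, so a single
    # "between 22 and 6" range check would never match.
    return ts.hour >= 22 or ts.hour <= 6


def build_features(events):
    """Aggregate raw login events into one feature dict per user."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["username"]].append(event)

    features = {}
    for user, rows in grouped.items():
        successes = sum(1 for r in rows if r["success"])
        features[user] = {
            "total_logins": len(rows),
            "success_rate": successes / len(rows),
            "failed_attempts": len(rows) - successes,
            "off_hours_logins": sum(1 for r in rows if is_off_hours(r["timestamp"])),
            "unique_ips": len({r["source_ip"] for r in rows}),
        }
    return features


# Hypothetical sample events, for illustration only.
events = [
    {"username": "alice", "timestamp": datetime(2024, 1, 8, 9, 0), "success": True, "source_ip": "10.0.0.1"},
    {"username": "alice", "timestamp": datetime(2024, 1, 8, 23, 30), "success": False, "source_ip": "10.0.0.2"},
    {"username": "alice", "timestamp": datetime(2024, 1, 9, 3, 15), "success": True, "source_ip": "10.0.0.1"},
    {"username": "bob", "timestamp": datetime(2024, 1, 8, 14, 0), "success": True, "source_ip": "10.0.0.3"},
]

features = build_features(events)
print(features["alice"])
```

Each user collapses to a single row of numeric features, which is the shape that anomaly-detection and clustering models expect as input.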


In [None]:
# Create feature engineering view for ML models
feature_query = """
CREATE OR REPLACE VIEW ML_FEATURE_SET AS
SELECT 
    ual.USERNAME,
    
    -- Behavioral features
    COUNT(*) as TOTAL_LOGINS,
    AVG(CASE WHEN ual.SUCCESS THEN 1 ELSE 0 END) as SUCCESS_RATE,
    COUNT(CASE WHEN ual.SUCCESS = FALSE THEN 1 END) as FAILED_ATTEMPTS,
    
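    -- Temporal features
    -- Off-hours window wraps midnight (22:00 through 06:59), so it needs OR rather than
    -- BETWEEN 22 AND 6, which never matches because the lower bound exceeds the upper.
    COUNT(CASE WHEN EXTRACT(hour FROM ual.TIMESTAMP) >= 22 OR EXTRACT(hour FROM ual.TIMESTAMP) <= 6 THEN 1 END) as OFF_HOURS_LOGINS,
    -- Assumes the default WEEK_START setting, where dow runs from 0 (Sunday) to 6 (Saturday)
    COUNT(CASE WHEN EXTRACT(dow FROM ual.TIMESTAMP) IN (0,6) THEN 1 END) as WEEKEND_LOGINS,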
    
    -- Geographic features
    COUNT(DISTINCT ual.SOURCE_IP) as UNIQUE_IPS,
    COUNT(DISTINCT ual.LOCATION:country::STRING) as UNIQUE_COUNTRIES,
    
    -- Security features
    AVG(CASE WHEN ual.TWO_FACTOR_USED THEN 1 ELSE 0 END) as TWO_FACTOR_RATE,
    COUNT(DISTINCT ual.USER_AGENT) as UNIQUE_DEVICES,
    
    -- Organizational context
    ed.DEPARTMENT,
    ed.ROLE,
    ed.SECURITY_CLEARANCE,
    DATEDIFF(day, ed.HIRE_DATE, CURRENT_DATE()) as TENURE_DAYS
    
FROM USER_AUTHENTICATION_LOGS ual
JOIN EMPLOYEE_DATA ed ON ual.USERNAME = ed.USERNAME
WHERE ual.TIMESTAMP >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    AND ed.STATUS = 'active'
GROUP BY ual.USERNAME, ed.DEPARTMENT, ed.ROLE, ed.SECURITY_CLEARANCE, ed.HIRE_DATE
HAVING COUNT(*) >= 5  -- Filter users with sufficient activity
"""

session.sql(feature_query).collect()
print("🔧 Feature engineering view created")

# Load feature data into Snowpark DataFrame
feature_df = session.table('ML_FEATURE_SET')
print(f"📈 Feature dataset: {feature_df.count()} users with behavioral features")

# Show sample of features
feature_df.limit(5).show()
