# Facies Classification - SVM

This notebook demonstrates facies classification using machine learning with GeoSuite.

## Overview

GeoSuite provides tools for training facies classifiers on well log data:
- Multiple model types: Random Forest, SVM, Gradient Boosting, Logistic Regression
- Proper train/test splitting to prevent data leakage
- Model evaluation with confusion matrices and metrics
- MLflow integration for experiment tracking (optional)

This notebook will show you how to:

1. Load facies training data
2. Prepare features and target variables
3. Train an SVM classifier
4. Evaluate model performance
5. Compare multiple classifier types

In [None]:
# Import GeoSuite modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
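from geosuite.data import load_facies_training_data, load_kansas_training_wells
# Assumption: the training and metrics helpers used below are exposed by geosuite.ml;
# adjust this import if your GeoSuite version places them elsewhere.
from geosuite.ml import train_facies_classifier, compute_metrics_from_cm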

print("GeoSuite imported successfully!")

## 1. Load Facies Training Data

GeoSuite provides several facies datasets, including the University of Kansas benchmark dataset.

In [None]:
# Load facies training data
df = load_facies_training_data()

print(f"Loaded {len(df):,} data points")
print(f"Wells in dataset: {df['Well Name'].nunique() if 'Well Name' in df.columns else 'N/A'}")
print(f"\nColumns: {df.columns.tolist()}")

# Check facies distribution
if 'Facies' in df.columns:
    print("\nFacies distribution:")
    print(df['Facies'].value_counts().sort_index())

df.head()

## 2. Prepare Features and Target

Select well log features for classification and the facies target variable.

In [None]:
# Define features and target
# Common features: GR, NPHI, RHOB, PE, DEPTH
feature_cols = ['GR', 'NPHI', 'RHOB', 'PE', 'DEPTH']
target_col = 'Facies'

# Check which features are available
available_features = [col for col in feature_cols if col in df.columns]
print(f"Available features: {available_features}")
print(f"Target variable: {target_col}")

if len(available_features) < 3:
    print("\nWarning: Limited features available. Falling back to the first five numeric columns.")
    # Candidate fallback features: every numeric column except the target
    numeric_cols = [col for col in df.columns if col != target_col and pd.api.types.is_numeric_dtype(df[col])]
    available_features = numeric_cols[:5]  # Use at most the first 5 numeric columns
    print(f"Using features: {available_features}")

## 3. Train SVM Classifier

Use `train_facies_classifier()` with `model_type='svm'` to train an SVM model, holding out 30% of the samples (`test_size=0.3`) for testing.

In [None]:
# Train SVM classifier
results = train_facies_classifier(
    df=df,
    feature_cols=available_features,
    target_col=target_col,
    model_type='svm',
    test_size=0.3,
    random_state=42
)

print(f"Train Accuracy: {results['train_accuracy']:.4f} ({results['train_accuracy']*100:.2f}%)")
print(f"Test Accuracy:  {results['test_accuracy']:.4f} ({results['test_accuracy']*100:.2f}%)")
print(f"\nClasses: {results['classes']}")
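For readers who want to see what an SVM classifier of this kind can look like outside GeoSuite, here is a minimal scikit-learn sketch. It is not the implementation behind `train_facies_classifier()`, whose internals this notebook does not show. The data is synthetic, and the scaling step, kernel, and hyperparameters are assumptions chosen for illustration. Feature scaling matters for SVMs because well logs such as GR and PE usually sit on very different numeric ranges.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for well log data: 300 samples, 4 features, 3 classes.
rng = np.random.default_rng(42)
centers = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 3.0, 0.0, 0.0],
    [0.0, 3.0, 3.0, 0.0],
])
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(scale=1.0, size=(300, 4))

# Put the features on very different scales, as real log curves often are.
X = X * np.array([100.0, 1.0, 0.1, 10.0])

# Hold out 30% of the samples, keeping class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The scaler lives inside the pipeline, so it is fit on the training split only.
svm_pipeline = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
svm_pipeline.fit(X_train, y_train)

train_accuracy = svm_pipeline.score(X_train, y_train)
test_accuracy = svm_pipeline.score(X_test, y_test)
print(f"Train accuracy: {train_accuracy:.4f}")
print(f"Test accuracy:  {test_accuracy:.4f}")
```

Keeping the scaler inside the pipeline means the test split never influences the scaling statistics, which is one simple way to avoid leakage between the two splits.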

## 4. Evaluate Model Performance

Examine the confusion matrix and per-class metrics.

In [None]:
# Display confusion matrix
print("Confusion Matrix:")
print(results['confusion_matrix'])

# Compute detailed metrics
try:
    metrics_df = compute_metrics_from_cm(
        cm=results['confusion_matrix'],
        labels=results['classes']
    )
    print("\nPer-class Metrics:")
    print(metrics_df.to_string(index=False))
except Exception as e:
    print(f"\nCould not compute detailed metrics: {e}")
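To make the per-class metrics concrete, the sketch below derives precision, recall, and F1 from a confusion matrix using NumPy and pandas. It is an illustration only, not the code behind `compute_metrics_from_cm()`. The helper name `per_class_metrics`, the example matrix, and the labels are hypothetical, and it assumes rows are true classes and columns are predicted classes (the scikit-learn convention).

```python
import numpy as np
import pandas as pd


def per_class_metrics(cm, labels):
    """Precision, recall, and F1 per class from a confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    true_positives = np.diag(cm)
    predicted_totals = cm.sum(axis=0)  # column sums: samples predicted as each class
    actual_totals = cm.sum(axis=1)     # row sums: samples truly in each class

    # Leave the metric at 0 for classes that were never predicted or never present.
    precision = np.divide(
        true_positives, predicted_totals,
        out=np.zeros_like(true_positives), where=predicted_totals > 0,
    )
    recall = np.divide(
        true_positives, actual_totals,
        out=np.zeros_like(true_positives), where=actual_totals > 0,
    )
    denom = precision + recall
    f1 = np.divide(
        2 * precision * recall, denom,
        out=np.zeros_like(denom), where=denom > 0,
    )

    return pd.DataFrame({
        "Class": list(labels),
        "Precision": precision,
        "Recall": recall,
        "F1": f1,
        "Support": actual_totals.astype(int),
    })


# Made-up confusion matrix for three hypothetical classes.
example_cm = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]
example_metrics = per_class_metrics(example_cm, labels=["A", "B", "C"])
print(example_metrics.to_string(index=False))
```

Reading the matrix this way shows where overall accuracy can mislead: a class can have high recall but low precision if neighbouring classes are frequently misassigned to it.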

## 5. Compare Multiple Models

Train each classifier type on the same split and compare test accuracy to find the best-performing model.

In [None]:
# Train multiple models for comparison
models = ['random_forest', 'svm', 'gradient_boosting', 'logistic_regression']
results_list = []

for model_type in models:
    try:
        # Use a separate name so the SVM `results` from section 3 are not overwritten
        model_results = train_facies_classifier(
            df=df,
            feature_cols=available_features,
            target_col=target_col,
            model_type=model_type,
            test_size=0.3,
            random_state=42
        )

        results_list.append({
            'Model': model_type.replace('_', ' ').title(),
            'Train Accuracy': model_results['train_accuracy'],
            'Test Accuracy': model_results['test_accuracy'],
        })
        print(f"{model_type}: Test Accuracy = {model_results['test_accuracy']:.4f}")
    except Exception as e:
        print(f"{model_type}: Failed - {e}")

# Create comparison DataFrame
if results_list:
    comparison_df = pd.DataFrame(results_list)
    print("\nModel Comparison:")
    print(comparison_df.to_string(index=False))
    
    best_model = comparison_df.loc[comparison_df['Test Accuracy'].idxmax(), 'Model']
    best_accuracy = comparison_df['Test Accuracy'].max()
    print(f"\nBest model: {best_model} (Test Accuracy: {best_accuracy:.4f})")

## 6. Summary

This notebook demonstrated:

- Loading facies training data with `load_facies_training_data()`
- Training an SVM classifier with `train_facies_classifier()`
- Evaluating model performance with confusion matrices
- Comparing multiple classifier types

### Key Functions Used

- `load_facies_training_data()`: Load the University of Kansas benchmark dataset
- `train_facies_classifier()`: Train various ML models
- `compute_metrics_from_cm()`: Calculate per-class metrics

### Next Steps

- Use `MLflowFaciesClassifier` for experiment tracking
- Try feature engineering (well-based features, geological context)
- Apply trained model to new wells for prediction
- Use cross-validation for robust performance estimates
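
As a starting point for the cross-validation step, the sketch below uses scikit-learn's `LeaveOneGroupOut` to hold out one well per fold. This is a common way to estimate how a facies model generalizes to unseen wells, but it is an assumption here rather than something GeoSuite is shown to do. The data, well IDs, and model settings are synthetic and hypothetical.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 4 hypothetical wells, 60 samples each, 3 features, 2 classes.
rng = np.random.default_rng(0)
n_wells, samples_per_well = 4, 60
n_samples = n_wells * samples_per_well
groups = np.repeat(np.arange(n_wells), samples_per_well)  # well ID for each sample
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, 3)) + 2.0 * y[:, None]

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Each fold holds out every sample from one well, so no well contributes
# to both the training and validation sets.
cv = LeaveOneGroupOut()
scores = cross_val_score(model, X, y, groups=groups, cv=cv)

for well_id, score in zip(np.unique(groups), scores):
    print(f"Held-out well {well_id}: accuracy = {score:.4f}")
print(f"Mean accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```

With real data, the `groups` array would come from the well identifier column (for example `df['Well Name']`, if present), and the spread of per-well scores gives a sense of how much performance varies from well to well.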