# Trunkline ML Pipeline Walkthrough
This notebook walks through the Trunkline ML Pipeline one component at a time and shows how the components fit together.

## 1. Environment Setup
First, import the required libraries and add the project root to the import path so that the `src` package can be found.

In [1]:
import sys
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

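# Add project root to path.
# This assumes the notebook runs from a directory directly below the project root.
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))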

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
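
The path setup above only works when the notebook sits exactly one level below the project root. As a minimal sketch of a more defensive alternative, the helper below walks upward from the working directory until it finds a directory containing `src`. The names `find_project_root` and `add_to_sys_path` are hypothetical and not part of Trunkline.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper for illustration; it is not part of the Trunkline code base.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No directory containing '{marker}' found above {start}")


def add_to_sys_path(root: Path) -> None:
    """Put `root` at the front of sys.path unless it is already present."""
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
```

Calling `add_to_sys_path(find_project_root(Path.cwd()))` would then replace the fixed `Path.cwd().parent` assumption.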

## 2. Data Loading and Preprocessing
The pipeline starts with loading and cleaning the data. Let's examine this process.

In [None]:
from src.data_preprocessing import load_and_clean_data, prepare_features

# Example data loading
# data = load_and_clean_data('path/to/your/data.csv')
# X, y, feature_names = prepare_features(data)

print("Data loading and preprocessing functions are ready to use.")
print("Uncomment and modify the example above to load your dataset.")

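ModuleNotFoundError: No module named 'src.data_preprocessing'

Note: this output comes from a run in which Python found a `src` package that has no `data_preprocessing` module. If you see it, check that `project_root` points at the Trunkline repository and that `src/data_preprocessing.py` exists there.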
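
To try the later sections before a real CSV is available, a synthetic dataset can stand in for the output of `prepare_features`. This is a minimal sketch that assumes `X` is a feature table, `y` a numeric target, and `feature_names` a list of column names, as the commented example suggests. The actual return types of `prepare_features` are not shown in this notebook.

```python
import pandas as pd
from sklearn.datasets import make_regression

# Synthetic regression data standing in for a real dataset (illustrative only).
X_values, y_values = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=42
)

# Mirror the assumed (X, y, feature_names) shape of the preprocessing output.
feature_names = [f"feature_{i}" for i in range(X_values.shape[1])]
X = pd.DataFrame(X_values, columns=feature_names)
y = pd.Series(y_values, name="target")

print(X.shape, y.shape)
```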

## 3. Model Training
The pipeline supports multiple model types. Here's how to train a model.

In [None]:
from src.ml_pipeline import MLPipeline
from sklearn.model_selection import train_test_split

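# Example model training
def train_example_model(X, y, model_type='random_forest'):
    """Train an MLPipeline on an 80/20 split.

    Returns (pipeline, X_test, y_test) so the held-out data can be used for evaluation.
    """
    # Split data: 80% train, 20% test, with a fixed seed for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )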
    
    # Initialize and train pipeline
    pipeline = MLPipeline()
    pipeline.fit(
        X_train,
        y_train,
        model_type=model_type,
        cv=5
    )
    
    return pipeline, X_test, y_test

print("Model training function defined. Call train_example_model(X, y) to train a model.")
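
The internals of `MLPipeline.fit` are not shown in this notebook. As a rough illustration of what a split followed by 5-fold cross-validation looks like, the sketch below uses plain scikit-learn with a `RandomForestRegressor` as a stand-in. It assumes a regression task and synthetic data, and it is not the Trunkline implementation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data so the sketch runs on its own.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=50, random_state=42)

# 5-fold cross-validation on the training split only; the test split stays untouched.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
print(f"CV R^2: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Refit on the full training split before scoring on held-out data.
model.fit(X_train, y_train)
print(f"Held-out R^2: {model.score(X_test, y_test):.3f}")
```

Keeping cross-validation inside the training split means the held-out score is not influenced by model selection.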

## 4. Model Evaluation
After training, we can evaluate the model's performance using various metrics and visualizations.

In [None]:
from src.model_evaluation import (
    plot_learning_curve,
    plot_feature_importance_rf,
    plot_predicted_vs_true,
    plot_residuals
)

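def evaluate_model(pipeline, X_test, y_test):
    """Plot predicted-vs-true values, residuals and, if available, feature importances."""
    # Make predictions on the held-out data
    y_pred = pipeline.predict(X_test)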
    
    # Generate evaluation plots
    plot_predicted_vs_true(y_test, y_pred, model_name='Example Model')
    plot_residuals(y_test, y_pred, model_name='Example Model')
    
    # Feature importance if available
    if hasattr(pipeline.model, 'feature_importances_'):
        plot_feature_importance_rf(
            pipeline.model,
            X_test.columns if hasattr(X_test, 'columns') else None,
            model_name='Example Model'
        )

print("Evaluation functions are ready. Call evaluate_model(pipeline, X_test, y_test) to evaluate.")
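
The plots above are easier to interpret alongside a few numeric scores. The sketch below computes RMSE, MAE and R² with scikit-learn, assuming a regression task (which the residual and predicted-vs-true plots suggest). `regression_metrics` is a hypothetical helper, not a function from `src.model_evaluation`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for a set of predictions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "r2": float(r2_score(y_true, y_pred)),
    }


# Toy values: the errors are -1, 0 and +1.
metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(metrics)
```

With a trained pipeline, the same helper could be called as `regression_metrics(y_test, pipeline.predict(X_test))`.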

## 5. Using the GUI
The pipeline includes a user-friendly GUI for interactive use.

In [None]:
# To launch the GUI, run the following command in a terminal from the project root:
# python -m src.gui.main

print("See the README for more information on using the GUI.")

## 6. Advanced Features
The pipeline also includes advanced features like SHAP explanations and ensemble methods.

In [None]:
from src.shap_visualization import shap_explainability
from src.ensemble import WeightedEnsemble

def explain_model(pipeline, X, feature_names=None):
    """Generate SHAP explanations for the fitted model in `pipeline` on the data `X`."""
    # Generate SHAP explanations
    shap_explainability(
        pipeline.model,
        X,
        feature_names=feature_names,
        model_name='Example Model'
    )

print("Advanced analysis functions are ready. See function docstrings for usage.")
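
The cell above imports `WeightedEnsemble` without using it, and its interface is not shown in this notebook. As a sketch of the general idea behind weighted ensembling, the helper below averages per-model predictions with normalized weights. `weighted_average_predictions` is a hypothetical function and does not reproduce the `src.ensemble.WeightedEnsemble` API.

```python
import numpy as np


def weighted_average_predictions(predictions, weights):
    """Combine per-model predictions with normalized non-negative weights.

    Hypothetical helper; it does not reproduce the API of src.ensemble.WeightedEnsemble.
    """
    preds = np.asarray(predictions, dtype=float)  # shape: (n_models, n_samples)
    w = np.asarray(weights, dtype=float)
    if preds.ndim != 2 or w.shape != (preds.shape[0],):
        raise ValueError("Expected one weight per model and a 2-D prediction array")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("Weights must be non-negative and not all zero")
    # Normalize so the weights sum to 1, then take the weighted mean per sample.
    return (w / w.sum()) @ preds


# Two models, three samples; the first model gets three times the weight.
combined = weighted_average_predictions(
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    weights=[3, 1],
)
print(combined)
```

In practice the weights might come from each model's validation score, but how Trunkline chooses them is not covered here.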

## Next Steps
1. Load your dataset
2. Preprocess the data
3. Train and evaluate models
4. Use the GUI for interactive analysis
5. Generate reports and visualizations