# Mahalanobis Distance Filter with Plotly Express and Dash

This notebook uses Plotly Express to build interactive visualizations of a Mahalanobis distance outlier filter, and shows how to launch a companion Dash dashboard for the same filter.

## Imports and Setup

In [None]:
import pandas as pd
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import chi2
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

print("Setup Complete")

## Load Sample Data

In [None]:
# Read csv
csv_path = "data/sample_data_7d.csv"
df_sample = pd.read_csv(csv_path)
df_sample.head()

## Mahalanobis Filter Class

In [None]:
class MahalanobisFilter:
    """Detects outliers using Mahalanobis distance."""

    def __init__(self, alpha=0.05):
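        """Initialize with significance level alpha (default 0.05).

        Note: alpha is stored, but filter() currently applies a fixed
        99.7% cutoff rather than 1 - alpha.
        """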
        self.alpha = alpha

    def fit(self, X):
        """Calculate mean and inverse covariance matrix from data"""
        # Exclude 'Pile No.' from calculations if it exists
        self.data_columns = X.columns.difference(["Pile No."])
        self.mean = np.mean(X[self.data_columns], axis=0)
        self.cov = np.cov(X[self.data_columns], rowvar=False)
        self.inv_cov = np.linalg.inv(self.cov)

    def mahalanobis_distance(self, x):
        """Calculate Mahalanobis distance for a single point"""
        # Ensure x is aligned with self.data_columns
        x = x[self.data_columns]
        return mahalanobis(x, self.mean, self.inv_cov)

    def filter(self, X, margin=1e-5):
        """
        Split data into inliers and outliers by Mahalanobis distance.

        The cutoff is the square root of the 99.7% chi-square quantile
        (the coverage of a one-dimensional three-sigma interval), with
        degrees of freedom equal to the number of data columns.
        Returns (inliers, outliers, distances, threshold_distance).
        """
        # Distance of every row from the fitted mean
        distances = np.array(
            [self.mahalanobis_distance(X.iloc[i]) for i in range(len(X))]
        )
        threshold_distance = (
            np.sqrt(chi2.ppf(0.997, df=len(self.data_columns))) + margin
        )  # 99.7% cutoff plus a small margin

        # Split data
        inlier_indices = distances <= threshold_distance
        outlier_indices = distances > threshold_distance

        return X[inlier_indices], X[outlier_indices], distances, threshold_distance
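
The class above computes distances one row at a time and fixes the cutoff at 99.7%. As a rough sketch only, the same quantities can be computed for all rows at once, with the cutoff derived from a significance level. The helper names `mahalanobis_distances` and `chi2_threshold` are hypothetical, the random data is a stand-in for real measurements, and tying the cutoff to `1 - alpha` is an assumption about how `alpha` was meant to be used.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_distances(values, mean, inv_cov):
    """Return the Mahalanobis distance of every row in `values` (hypothetical helper)."""
    diff = np.asarray(values, dtype=float) - np.asarray(mean, dtype=float)
    # d_i^2 = diff_i^T @ inv_cov @ diff_i, evaluated for all rows at once
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Clip tiny negative values caused by floating-point round-off
    return np.sqrt(np.maximum(squared, 0.0))


def chi2_threshold(n_features, alpha=0.05):
    """Distance cutoff covering a fraction 1 - alpha of Gaussian data (assumed use of alpha)."""
    return np.sqrt(chi2.ppf(1.0 - alpha, df=n_features))


# Synthetic stand-in data: 200 points in 2 dimensions
rng = np.random.default_rng(0)
values = rng.normal(size=(200, 2))
mean = values.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(values, rowvar=False))

distances = mahalanobis_distances(values, mean, inv_cov)
threshold = chi2_threshold(n_features=2, alpha=0.003)  # same 99.7% level as the class
outlier_mask = distances > threshold
```

With `alpha=0.003` this reproduces the 99.7% level used in `filter()` (without the margin); a larger `alpha` gives a tighter cutoff and flags more points.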

## Data Processing Functions

In [None]:
def process_data(data, sample_pairs):
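    """Select the requested sample columns and keep only complete rows.

    Args:
        data: Input DataFrame containing sample data
        sample_pairs: Sequence of sample column names to keep
            (for example a pair or a triple)

    Returns:
        DataFrame containing only rows with no missing values in the
        selected columns ('Pile No.' is included when present)
    """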
    # Include 'Pile No.' if it exists
    columns_to_select = (
        ["Pile No."] + list(sample_pairs)
        if "Pile No." in data.columns
        else list(sample_pairs)
    )

    df = (
        data.loc[:, columns_to_select]  # Select only the necessary columns
        .dropna()  # Remove rows with any null values
        .copy()  # Return a copy to avoid SettingWithCopyWarning
    )
    return df
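
To make the row-dropping behavior concrete, here is a small sketch on a toy frame. The values are invented for illustration and are not taken from `sample_data_7d.csv`; the select, `dropna`, `copy` chain mirrors the body of `process_data` for the pair `('Sample 1', 'Sample 2')`.

```python
import numpy as np
import pandas as pd

# Invented values, for illustration only
toy = pd.DataFrame(
    {
        "Pile No.": ["P1", "P2", "P3", "P4"],
        "Sample 1": [4100.0, np.nan, 3950.0, 4200.0],
        "Sample 2": [4050.0, 4000.0, np.nan, 4300.0],
        "Sample 3": [np.nan, 3900.0, 4010.0, 4250.0],
    }
)

# Same steps as process_data: select columns, drop incomplete rows, copy
columns = ["Pile No.", "Sample 1", "Sample 2"]
paired = toy.loc[:, columns].dropna().copy()

print(paired["Pile No."].tolist())  # ['P1', 'P4']
```

P1 is kept even though its `Sample 3` value is missing, because only the selected columns are checked for nulls.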

## Scatter Plot with Plotly Express

In [None]:
# Process data for Sample 1 vs Sample 2
sample_pair = ('Sample 1', 'Sample 2')
processed_data = process_data(df_sample, sample_pair)

# Apply Mahalanobis filter
m_filter = MahalanobisFilter()
m_filter.fit(processed_data)
inliers, outliers, distances, threshold = m_filter.filter(processed_data)

# Create a new column to identify outliers
processed_data['outlier'] = ['Outlier' if d > threshold else 'Inlier' for d in distances]
processed_data['mahalanobis_distance'] = distances

# Create a scatter plot using Plotly Express
fig = px.scatter(
    processed_data, 
    x=sample_pair[0], 
    y=sample_pair[1],
    color='outlier',
    color_discrete_map={'Inlier': '#2196F3', 'Outlier': '#D92906'},
    hover_data=['Pile No.', 'mahalanobis_distance'],
    title=f"Scatter Plot of {sample_pair[0]} vs {sample_pair[1]}",
    labels={
        sample_pair[0]: f"{sample_pair[0]} Compressive Strength (psi)",
        sample_pair[1]: f"{sample_pair[1]} Compressive Strength (psi)",
        'mahalanobis_distance': 'Mahalanobis Distance'
    }
)

# Customize the plot
fig.update_traces(marker=dict(size=10, opacity=0.7))
fig.update_layout(
    template='plotly_white',
    legend_title_text='Classification'
)

fig.show()

## 3D Scatter Plot for All Three Samples

In [None]:
# Process data for all three samples
sample_columns = ['Sample 1', 'Sample 2', 'Sample 3']
processed_data = process_data(df_sample, sample_columns)

# Apply Mahalanobis filter
m_filter = MahalanobisFilter()
m_filter.fit(processed_data)
inliers, outliers, distances, threshold = m_filter.filter(processed_data)

# Create a new column to identify outliers
processed_data['outlier'] = ['Outlier' if d > threshold else 'Inlier' for d in distances]
processed_data['mahalanobis_distance'] = distances

# Create a 3D scatter plot using Plotly Express
fig = px.scatter_3d(
    processed_data, 
    x='Sample 1', 
    y='Sample 2',
    z='Sample 3',
    color='outlier',
    color_discrete_map={'Inlier': '#2196F3', 'Outlier': '#D92906'},
    hover_data=['Pile No.', 'mahalanobis_distance'],
    title="3D Scatter Plot of All Samples",
    labels={
        'Sample 1': "Sample 1 Compressive Strength (psi)",
        'Sample 2': "Sample 2 Compressive Strength (psi)",
        'Sample 3': "Sample 3 Compressive Strength (psi)",
        'mahalanobis_distance': 'Mahalanobis Distance'
    }
)

# Customize the plot
fig.update_traces(marker=dict(size=5, opacity=0.7))
fig.update_layout(
    scene=dict(
        xaxis_title='Sample 1',
        yaxis_title='Sample 2',
        zaxis_title='Sample 3'
    ),
    legend_title_text='Classification'
)

fig.show()

## Interactive Dashboard with Dash

We've created a separate Dash application that provides an interactive dashboard for the Mahalanobis Distance Filter. The dashboard allows you to:

1. Select different sample pairs for analysis
2. Adjust the confidence level (sigma value) for the ellipse
3. View statistics about the data and outliers
4. Interact with the plots (zoom, pan, hover for details)

To run the dashboard, execute the following command in your terminal:

```bash
python mahalanobis_filter_dash.py
```

Then open your web browser and navigate to: http://127.0.0.1:8050/
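
The ellipse mentioned in item 2 can be sketched with NumPy alone. This is a minimal illustration, not the code in `mahalanobis_filter_dash.py`: the function name `confidence_ellipse`, the `n_sigma` parameter, and the synthetic data are all assumptions, and `n_sigma` is interpreted here as a Mahalanobis radius, which may differ from how the dashboard maps its sigma value.

```python
import numpy as np


def confidence_ellipse(points, n_sigma=3.0, n_vertices=100):
    """Return (x, y) vertices of the ellipse at Mahalanobis radius n_sigma (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    # Eigenvectors give the ellipse axes; eigenvalues give the variance along each axis
    eigvals, eigvecs = np.linalg.eigh(cov)
    angles = np.linspace(0.0, 2.0 * np.pi, n_vertices)
    unit_circle = np.stack([np.cos(angles), np.sin(angles)])
    # Stretch the unit circle by n_sigma * sqrt(variance), rotate it, then shift to the mean
    ellipse = eigvecs @ (n_sigma * np.sqrt(eigvals)[:, None] * unit_circle)
    return ellipse[0] + mean[0], ellipse[1] + mean[1]


# Synthetic correlated 2D data standing in for a sample pair
rng = np.random.default_rng(1)
points = rng.multivariate_normal(
    mean=[4000.0, 4100.0],
    cov=[[900.0, 600.0], [600.0, 1600.0]],
    size=300,
)
ellipse_x, ellipse_y = confidence_ellipse(points, n_sigma=3.0)
```

Every vertex lies at Mahalanobis distance `n_sigma` from the sample mean, so the curve can be drawn over a scatter plot as a line trace (for example with `go.Scatter(x=ellipse_x, y=ellipse_y, mode="lines")`) to show the boundary at that radius.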

## Conclusion

In this notebook, we've demonstrated how to use Plotly Express to create interactive visualizations for the Mahalanobis Distance Filter. The Dash application provides a more comprehensive and user-friendly interface for exploring the data and detecting outliers.

Key advantages of using Plotly Express and Dash:

1. **Interactive Visualizations**: Users can zoom, pan, and hover over data points to see details
2. **Real-time Updates**: The Dash app allows users to adjust parameters and see results immediately
3. **Web-based Interface**: The dashboard can be accessed through a web browser, making it easy to share
4. **Customizable**: Both Plotly Express and Dash offer extensive customization options
5. **3D Visualization**: Ability to visualize all three samples simultaneously in 3D space