<a href="https://colab.research.google.com/github/psubudhi/AwsDevSecOps/blob/main/WineModelInsight.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

STEP 1 - Load Wine Dataset

In [None]:
import pandas as pd
from sklearn.datasets import load_wine

# Load dataset
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Add a timestamp column to simulate logging
df['timestamp'] = pd.date_range(start='2023-01-01', periods=len(df), freq='D')

# Preview
print(df.head())


   alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0    14.23        1.71  2.43               15.6      127.0           2.80   
1    13.20        1.78  2.14               11.2      100.0           2.65   
2    13.16        2.36  2.67               18.6      101.0           2.80   
3    14.37        1.95  2.50               16.8      113.0           3.85   
4    13.24        2.59  2.87               21.0      118.0           2.80   

   flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  \
0        3.06                  0.28             2.29             5.64  1.04   
1        2.76                  0.26             1.28             4.38  1.05   
2        3.24                  0.30             2.81             5.68  1.03   
3        3.49                  0.24             2.18             7.80  0.86   
4        2.69                  0.39             1.82             4.32  1.04   

   od280/od315_of_diluted_wines  proline  target  timestamp  
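Because Step 2 splits the data without shuffling, it helps to check how the rows are ordered first. The following is a minimal sketch that uses only the scikit-learn `load_wine` labels; the names `labels`, `class_counts`, `n_test` and `tail` are illustrative and not part of the notebook.

```python
import math

import pandas as pd
from sklearn.datasets import load_wine

# Load the labels only; row order is what matters here.
labels = pd.Series(load_wine().target, name="target")

# Rows per class, in class order.
class_counts = labels.value_counts().sort_index()
print(class_counts.to_dict())

# True when the rows are grouped by class in ascending order.
is_sorted_by_class = labels.is_monotonic_increasing
print("Sorted by class:", is_sorted_by_class)

# An unshuffled 80/20 split keeps the last ceil(0.2 * n) rows for testing.
n_test = math.ceil(0.2 * len(labels))
tail = labels.iloc[-n_test:]
print("Classes in the last", n_test, "rows:", sorted(tail.unique().tolist()))
```

If the tail contains a single class, accuracy on an unshuffled hold-out measures performance on that class only.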


STEP 2 - Train a White-Box Model (Logistic Regression)

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
import time

# Features and target
X = df[data.feature_names]
y = df['target']

# Split without shuffle (to simulate temporal nature).
# Note: load_wine rows are ordered by class, so the last 20% (36 rows) are all class 2.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train model and time the fit (training time, stored below as `latency`)
model = LogisticRegression(max_iter=1000)
start = time.perf_counter()
model.fit(X_train, y_train)
latency = time.perf_counter() - start

# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.4f}, F1 Score: {f1:.4f}, Latency: {latency:.4f} seconds")


Accuracy: 1.0000, F1 Score: 1.0000, Latency: 1.5181 seconds


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
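
The warning above suggests scaling the features. The following is a minimal sketch of that suggestion, assuming a `StandardScaler` placed in front of the same `LogisticRegression`. The pipeline, the stratified split and `random_state=0` are illustrative choices, not what the notebook ran.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Stratified split so every class appears in both train and test sets
# (an assumption for this sketch; the notebook uses shuffle=False).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardize features, then fit the same white-box model.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)

# n_iter_ reports how many solver iterations were actually used.
iterations = int(scaled_model[-1].n_iter_.max())
accuracy = accuracy_score(y_test, scaled_model.predict(X_test))
print(f"Iterations used: {iterations}, accuracy: {accuracy:.4f}")
```

Keeping the scaler inside the pipeline means it is fitted on the training rows only, so the test rows do not leak into the scaling statistics.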


STEP 3 - Log Metrics to SQLite Database

In [None]:
import sqlite3
from datetime import datetime

# Connect or create a local SQLite database
conn = sqlite3.connect("metrics.db")
cursor = conn.cursor()

# Create a table for storing performance metrics
cursor.execute("""
    CREATE TABLE IF NOT EXISTS metrics (
        timestamp TEXT,
        accuracy REAL,
        f1 REAL,
        latency REAL
    )
""")

# Insert the current run's metrics, naming the columns explicitly
cursor.execute(
    "INSERT INTO metrics (timestamp, accuracy, f1, latency) VALUES (?, ?, ?, ?)",
    (datetime.now().isoformat(), accuracy, f1, latency),
)

conn.commit()
conn.close()

print("✅ Metrics successfully logged to SQLite database.")


✅ Metrics successfully logged to SQLite database.
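
To confirm what was stored, the table can be read back. The following is a minimal sketch that uses an in-memory database so it does not touch `metrics.db`; the helper name `log_metrics` and the sample values are hypothetical.

```python
import sqlite3
from contextlib import closing
from datetime import datetime


def log_metrics(conn, accuracy, f1, latency):
    """Insert one row of metrics, naming the columns explicitly."""
    conn.execute(
        "INSERT INTO metrics (timestamp, accuracy, f1, latency) VALUES (?, ?, ?, ?)",
        (datetime.now().isoformat(), accuracy, f1, latency),
    )
    conn.commit()


# ":memory:" keeps the example self-contained; use "metrics.db" for real runs.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS metrics (
            timestamp TEXT,
            accuracy REAL,
            f1 REAL,
            latency REAL
        )
        """
    )
    # Placeholder values for illustration only, not results from the notebook.
    log_metrics(conn, 0.9, 0.8, 0.01)
    log_metrics(conn, 0.95, 0.85, 0.02)

    # Read every stored run back, oldest first.
    rows = conn.execute(
        "SELECT accuracy, f1, latency FROM metrics ORDER BY timestamp"
    ).fetchall()

print(rows)
```

Wrapping the connection in `closing` ensures it is released even if a statement raises.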


STEP 4 - Generate SHAP Explanations (Explainability)

In [None]:
import shap

# Use up to 50 test rows (the test split has only 36, so this takes all of them)
X_sample = X_test.iloc[:50]

# Use SHAP with Logistic Regression (multi-class supported)
explainer = shap.Explainer(model, X_sample)
shap_values = explainer(X_sample)  # shape: (n_samples, n_features, n_classes)

# Extract SHAP values for class 1 (can change to 0 or 2 as needed)
class_idx = 1
shap_df = pd.DataFrame(shap_values.values[:, :, class_idx], columns=X_sample.columns)

# Add predictions and true labels
shap_df['prediction'] = y_pred[:50]
shap_df['actual'] = y_test.iloc[:50].values

print("✅ SHAP values calculated for class 1.")
print(shap_df.head())



✅ SHAP values calculated for class 1.
    alcohol  malic_acid       ash  alcalinity_of_ash  magnesium  \
0  0.162412    0.194562 -0.249482           0.397499   0.013362   
1  0.226663   -1.071177  0.086185          -0.339043   0.035841   
2 -0.653573   -0.310311  0.222266          -0.654704  -0.054073   
3 -0.068891   -0.089874  0.267626          -0.128603  -0.009116   
4  0.393715   -1.135175  0.195050          -0.339043   0.089789   

   total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  \
0      -0.046251   -0.054909              0.008312        -0.449251   
1       0.091228    0.002506              0.003352        -0.144711   
2      -0.098187   -0.001595             -0.026405        -0.066956   
3      -0.061526   -0.048758             -0.003260         0.036718   
4      -0.220391   -0.091819             -0.008220        -0.365017   

   color_intensity       hue  od280/od315_of_diluted_wines   proline  \
0         3.638590  0.149160                     -0.007800  
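
For a linear model, per-feature contributions can be written down directly, which helps when reading the table above. The following is a minimal sketch, assuming independent features and the model's log-odds output, where the contribution of feature j to class c is `coef_[c, j] * (x[j] - mean[j])`. That this matches what `shap.Explainer` computes for this model is an assumption here, not something the notebook verifies.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling only to help convergence
model = LogisticRegression(max_iter=1000).fit(X, y)

background_mean = X.mean(axis=0)  # reference point, shape (13,)
centered = X - background_mean    # shape (n_samples, 13)

# contributions[i, j, c] = coef_[c, j] * (X[i, j] - mean[j])
contributions = centered[:, :, None] * model.coef_.T[None, :, :]

# Base value per class: the model's log-odds at the reference point.
base_values = model.intercept_ + model.coef_ @ background_mean

# Additivity: base value plus contributions recovers the raw model output.
reconstructed = base_values + contributions.sum(axis=1)
max_gap = np.abs(reconstructed - model.decision_function(X)).max()
print(contributions.shape, f"max additivity gap: {max_gap:.2e}")
```

The resulting array has the same `(n_samples, n_features, n_classes)` layout as the one sliced with `class_idx` in the cell above.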

STEP 5 - Build a Simple Dashboard (Plotly Dash)

In [None]:
import dash
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output
import plotly.express as px
import sqlite3

# Load metric data from SQLite, then release the connection
conn = sqlite3.connect("metrics.db")
df_metrics = pd.read_sql("SELECT * FROM metrics", conn)
conn.close()

# Create Dash app
app = dash.Dash(__name__)

# Layout
app.layout = html.Div([
    html.H1("📊 Model Performance & Explainability Dashboard", style={'textAlign': 'center'}),

    html.H2("📈 Accuracy Over Time"),
    dcc.Graph(
        id='accuracy-plot',
        figure=px.line(df_metrics, x='timestamp', y='accuracy', title='Model Accuracy Over Time')
    ),

    html.H2("🧠 SHAP Feature Importance (Class 1)"),
    dcc.Graph(id='shap-bar'),

    html.H2("📋 Recent Predictions (Explained)"),
    dash_table.DataTable(
        id='shap-table',
        columns=[{"name": col, "id": col} for col in shap_df.columns],
        data=shap_df.to_dict('records'),
        page_size=10,
        style_table={'overflowX': 'auto'},
        style_cell={'textAlign': 'left'}
    )
])

# Callback to generate SHAP feature importance
@app.callback(
    Output('shap-bar', 'figure'),
    Input('shap-bar', 'id')  # Dummy input to trigger once
)
def update_shap_plot(_):
    # Mean absolute SHAP values for class 1
    shap_mean = shap_df[X_sample.columns].abs().mean().sort_values(ascending=False).head(10)
    fig = px.bar(shap_mean, title="Top 10 SHAP Feature Importances (Class 1)")
    return fig

# Run app
if __name__ == '__main__':
    app.run(debug=True)
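
The ranking inside `update_shap_plot` is a mean of absolute values per column. The following is a minimal sketch on a small hand-made table so the arithmetic is easy to follow; the numbers and the name `toy_shap` are made up for illustration.

```python
import pandas as pd

# Toy per-sample contributions (rows = samples, columns = features).
toy_shap = pd.DataFrame(
    {
        "alcohol": [0.2, -0.4, 0.6],
        "malic_acid": [-1.0, 1.0, 0.0],
        "ash": [0.1, 0.1, -0.1],
    }
)

# Absolute values first, so positive and negative effects do not cancel.
mean_abs = toy_shap.abs().mean().sort_values(ascending=False)
ranking = mean_abs.index.tolist()
print(ranking)

# A signed mean would hide malic_acid, whose effects cancel out here.
signed_mean = toy_shap.mean()
print(signed_mean["malic_acid"])
```

This is why the callback calls `.abs()` before `.mean()`: a feature that pushes predictions strongly in both directions still ranks as important.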





Project Summary

Objective:
Develop an explainable AI framework to monitor model performance and interpret predictions using white-box models on the Wine dataset.

Dataset:
Wine dataset from sklearn.datasets with 178 samples, 13 features, and 3 wine classes.

Model:
Logistic Regression (white-box model) trained for multi-class classification.

Metrics Tracked:

Accuracy (1.0000 in the recorded run, on an unshuffled hold-out)

Weighted F1 Score (1.0000 in the recorded run, on an unshuffled hold-out)

Training latency (seconds)

Metrics Storage:
Logged performance metrics with timestamps into a SQLite database for longitudinal tracking.

Explainability Method:
Used SHAP to compute feature contributions per class; focused on class 1 explanations.

Dashboard:
Interactive Dash + Plotly dashboard showing:

Accuracy over time

Top SHAP feature importances

Recent prediction explanations in tabular form

Technologies Used:
Python, scikit-learn, pandas, sqlite3, shap, dash, plotly

Key Insights:
Features such as alcohol content and flavanoids strongly influenced predictions for class 1.

Conclusion:
This project demonstrates an interpretable, transparent approach to model performance monitoring, which is vital for trustworthy MLOps and explainable AI in production.