## Implementing ML Model Monitoring Pipelines

### Model Performance Drift:
**Description**: Set up a monitoring pipeline that tracks key performance metrics (e.g., accuracy, precision) of an ML model over time using a monitoring tool or dashboard.

In [4]:
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from datetime import datetime
import os

def monitor_model_performance(predictions_df, metric_log_file='metrics_log.csv'):
    """
    Computes and logs performance metrics of an ML model.

    Parameters:
        predictions_df (pd.DataFrame): Must include 'y_true' and 'y_pred' columns.
        metric_log_file (str): Filepath to append the metrics log.

    Returns:
        dict: Dictionary of computed metrics.
    """

    # Check for required columns
    required_columns = {'y_true', 'y_pred'}
    if not required_columns.issubset(predictions_df.columns):
        raise ValueError(f"Missing required columns: {required_columns - set(predictions_df.columns)}")

    # Check for an empty DataFrame first, so an empty input reports the right error
    if predictions_df.empty:
        raise ValueError("The input DataFrame is empty")

    # Check that columns are numeric or boolean (pandas treats bool as numeric)
    for col in required_columns:
        if not pd.api.types.is_numeric_dtype(predictions_df[col]):
            raise TypeError(f"Column '{col}' must contain numeric or boolean values")

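    # Drop rows with missing labels or predictions and warn about how many were removed
    # (a list is used because recent pandas versions reject a set as a column indexer)
    cols = list(required_columns)
    n_missing = int(predictions_df[cols].isnull().any(axis=1).sum())
    if n_missing:
        print(f"Warning: dropping {n_missing} rows with NaNs in required columns")
        predictions_df = predictions_df.dropna(subset=cols)
        if predictions_df.empty:
            raise ValueError("All rows contain NaNs in required columns")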

    # Compute metrics
    metrics = {
        'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        'accuracy': accuracy_score(predictions_df['y_true'], predictions_df['y_pred']),
        'precision': precision_score(predictions_df['y_true'], predictions_df['y_pred'], zero_division=0),
        'recall': recall_score(predictions_df['y_true'], predictions_df['y_pred'], zero_division=0),
        'f1_score': f1_score(predictions_df['y_true'], predictions_df['y_pred'], zero_division=0)
    }

    # Save metrics to CSV file
    metrics_df = pd.DataFrame([metrics])
    if os.path.exists(metric_log_file):
        metrics_df.to_csv(metric_log_file, mode='a', header=False, index=False)
    else:
        metrics_df.to_csv(metric_log_file, index=False)

    return metrics
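
The function above only logs metrics; deciding whether performance has drifted needs a comparison against a baseline. The following is a minimal sketch of one way to do that, not part of the original notebook. The helper name `detect_metric_drop`, the `baseline_runs` window, and the `tolerance` of 0.05 are all assumptions chosen for illustration, and `history` stands in for rows read back from the metrics log.

```python
from statistics import mean


def detect_metric_drop(history, metric="accuracy", baseline_runs=3, tolerance=0.05):
    """Compare the latest logged value of `metric` with a baseline mean.

    history: list of metric dicts in chronological order (hypothetical stand-in
    for rows read from the metrics log). Returns None when there are not
    enough runs to form a baseline.
    """
    values = [float(row[metric]) for row in history]
    if len(values) <= baseline_runs:
        return None  # need at least one run after the baseline window

    baseline = mean(values[:baseline_runs])  # average of the earliest runs
    latest = values[-1]
    return {
        "baseline": baseline,
        "latest": latest,
        # Drift is flagged when the metric falls more than `tolerance` below baseline
        "drifted": (baseline - latest) > tolerance,
    }


# Illustrative history: accuracy degrades in the most recent run
history = [
    {"accuracy": 0.91},
    {"accuracy": 0.90},
    {"accuracy": 0.92},
    {"accuracy": 0.89},
    {"accuracy": 0.80},
]
report = detect_metric_drop(history)
print(report)
```

A fixed tolerance is the simplest rule; in practice the threshold would likely depend on how noisy the metric is from batch to batch.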


### Feature Distribution Drift:
**Description**: Monitor the distribution of your input features in deployed models to detect any significant shifts from training data distributions.

In [5]:
# write your code from here
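
One possible approach to this exercise, offered as a sketch rather than the intended solution, is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against its training distribution. The function name `population_stability_index`, the choice of 10 quantile bins, and the simulated data are assumptions made for illustration. The 0.1 and 0.25 cut-offs mentioned in the comments are a common rule of thumb, not a formal standard.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference (training) sample and a current (production) sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Bin edges come from reference quantiles so each bin holds roughly equal mass
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)))
    interior = edges[1:-1]  # values outside the reference range fall in the end bins

    def bin_fractions(values):
        idx = np.searchsorted(interior, values, side="right")
        counts = np.bincount(idx, minlength=len(interior) + 1)
        # Clip to eps so empty bins do not produce log(0) or division by zero
        return np.clip(counts / counts.sum(), eps, None)

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Simulated feature: training data, an unshifted production batch, and a shifted one
rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
prod_same = rng.normal(loc=0.0, scale=1.0, size=5000)
prod_shifted = rng.normal(loc=1.0, scale=1.0, size=5000)

psi_same = population_stability_index(train_feature, prod_same)
psi_shifted = population_stability_index(train_feature, prod_shifted)

# Rule of thumb: PSI < 0.1 is usually read as stable, PSI > 0.25 as a significant shift
print(f"PSI (no shift): {psi_same:.4f}")
print(f"PSI (mean shifted by 1): {psi_shifted:.4f}")
```

PSI is computed per feature, so a real pipeline would loop over the model's input columns and log one value per feature per batch.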

### Anomaly Detection in Predictions:
**Description**: Implement an anomaly detection mechanism to flag unusual model predictions. Simulate anomalies by altering input data.

In [6]:
# write your code from here
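
As a sketch of how this exercise could be approached, the example below flags predictions whose modified z-score (based on the median and the median absolute deviation of reference predictions) exceeds 3.5, then simulates anomalies by adding a large offset to a few input rows. The toy linear `predict` function, its weights, and the helper names are hypothetical stand-ins for a real deployed model. The 0.6745 constant and the 3.5 cut-off follow the commonly cited modified z-score convention.

```python
import numpy as np

WEIGHTS = np.array([0.5, -0.2, 0.1])  # hypothetical model weights


def predict(X):
    """Toy linear model standing in for a deployed model."""
    return X @ WEIGHTS


def fit_reference_stats(reference_preds):
    """Median and median absolute deviation (MAD) of reference predictions."""
    median = float(np.median(reference_preds))
    mad = float(np.median(np.abs(reference_preds - median)))
    if mad == 0:
        raise ValueError("MAD is zero; reference predictions have no spread")
    return median, mad


def flag_anomalies(preds, median, mad, z_threshold=3.5):
    """Boolean mask of predictions whose modified z-score exceeds the threshold."""
    modified_z = 0.6745 * (preds - median) / mad
    return np.abs(modified_z) > z_threshold


rng = np.random.default_rng(42)

# Reference predictions come from inputs that resemble the training data
X_reference = rng.normal(size=(1000, 3))
median, mad = fit_reference_stats(predict(X_reference))

# Simulate anomalies: push the first feature of the first 5 rows far out of range
X_live = rng.normal(size=(200, 3))
X_live[:5, 0] += 20.0

flags = flag_anomalies(predict(X_live), median, mad)
print(f"Flagged {int(flags.sum())} of {len(flags)} predictions")
print("Flagged row indices:", np.flatnonzero(flags))
```

Median and MAD are used instead of mean and standard deviation because they are less affected by the very outliers the check is trying to find.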