# Developing a Binned Metrics Monitor for ModelOp Center

This notebook is a "development sandbox" for creating and testing your `binned_metrics` monitor *before* you deploy it to ModelOp Center (MOC).

**The workflow is:**
1.  **Load Functions:** We'll load all the helper functions and MOC-wrapper functions (like `init` and `metrics`) into the notebook's memory.
2.  **Test Components:** We'll test the functions individually, from the "core engine" to the "universal function," using small, hard-coded data.
3.  **Simulate MOC:** We'll run a final simulation that mimics exactly what MOC does: call `init()` with parameters, then call `metrics()` with a DataFrame.
4.  **Deploy:** Once you are confident the logic is correct and the final JSON output is perfect, you can copy the code from Cell 3 into your `binned_metrics.py` file for deployment.

## Part 1: Load The Full Monitor Script

The code cell below contains the *entire* Python script (`binned_metrics.py`). We run this cell **once** to load all the functions into our notebook's memory, including:

* `_safe_divide()`
* `_calculate_metrics()`
* `_apply_metrics_to_bin()`
* `calculate_binned_metrics()`
* `init()`
* `_format_df_for_timeline_graph()`
* `metrics()`

(Note: The `main()` function is *not* included, as we will be running our tests directly in the notebook cells.)

In [None]:
import json
import logging
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# --- --- --- --- --- --- --- --- --- --- 
# --- 1. HELPER & CORE FUNCTIONS ---
# --- --- --- --- --- --- --- --- --- --- 

def _safe_divide(numerator, denominator, nan_val=np.nan):
    """
    Prevents division by zero or division involving NaN values.
    """
    if denominator == 0 or np.isnan(denominator) or np.isnan(numerator):
        return nan_val
    return numerator / denominator

def _calculate_metrics(y_true, y_pred, requested_metrics):
    """
    Calculates all specified metrics from a confusion matrix for a given bin.
    """
    results = {}

    if len(y_true) == 0:
        for metric in requested_metrics:
            results[metric] = np.nan
        return results

    try:
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    except ValueError:
        for metric in requested_metrics:
            results[metric] = np.nan
        return results

    available_metrics = {}
    P = tp + fn
    N = tn + fp
    PP = tp + fp
    PN = tn + fn
    Total = P + N
    
    available_metrics['SEN'] = _safe_divide(tp, P) if set(['SEN','F1','INF','J','BM','PT','TS','CSI','DOR']).intersection(set(requested_metrics)) else None
    available_metrics['SP'] = _safe_divide(tn, N) if set(['SP','INF','J','BM','PT','DOR']).intersection(set(requested_metrics)) else None
    available_metrics['PPV'] = _safe_divide(tp, PP) if set(['PPV','F1','MK']).intersection(set(requested_metrics)) else None
    available_metrics['NPV'] = _safe_divide(tn, PN) if set(['NPV','MK']).intersection(set(requested_metrics)) else None
    available_metrics['FNR'] = _safe_divide(fn, P) if 'FNR' in requested_metrics else None
    available_metrics['FPR'] = _safe_divide(fp, N) if 'FPR' in requested_metrics else None
    available_metrics['FDR'] = _safe_divide(fp, PP) if 'FDR' in requested_metrics else None
    available_metrics['FOR'] = _safe_divide(fn, PN) if 'FOR' in requested_metrics else None
    available_metrics['ACC'] = _safe_divide(tp + tn, Total) if 'ACC' in requested_metrics else None
    available_metrics['ERR'] = _safe_divide(fp + fn, Total) if 'ERR' in requested_metrics else None
    
    if 'F1' in requested_metrics and available_metrics['PPV'] is not None and available_metrics['SEN'] is not None:
        precision = available_metrics['PPV'] 
        sensitivity = available_metrics['SEN'] 
        available_metrics['F1'] = _safe_divide(2 * precision * sensitivity, precision + sensitivity)
    else:
        available_metrics['F1'] = None
    
    available_metrics['PR'] = _safe_divide(P, Total) if 'PR' in requested_metrics else None
    
    if 'MCC' in requested_metrics:
        mcc_num = (tp * tn) - (fp * fn)
        mcc_den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        available_metrics['MCC'] = _safe_divide(mcc_num, mcc_den)
    else:
        available_metrics['MCC'] = None

    available_metrics['INF'] = available_metrics['SEN'] + available_metrics['SP'] - 1 if set(['INF','J','BM']).intersection(set(requested_metrics)) and available_metrics['SEN'] is not None and available_metrics['SP'] is not None else None
    available_metrics['J'] = available_metrics['INF'] if 'J' in requested_metrics else None
    available_metrics['BM'] = available_metrics['INF'] if 'BM' in requested_metrics else None
    available_metrics['MK'] = available_metrics['PPV'] + available_metrics['NPV'] - 1 if 'MK' in requested_metrics and available_metrics['PPV'] is not None and available_metrics['NPV'] is not None else None
    
    if 'PT' in requested_metrics and available_metrics['SEN'] is not None and available_metrics['SP'] is not None:
        pt_num = (np.sqrt(available_metrics['SEN'] * (1 - available_metrics['SP'])) + available_metrics['SP'] - 1)
        pt_den = (available_metrics['SEN'] + available_metrics['SP'] - 1)
        available_metrics['PT'] = _safe_divide(pt_num, pt_den)
    else:
        available_metrics['PT'] = None

    available_metrics['TS'] = _safe_divide(tp, tp + fn + fp) if set(['TS','CSI']).intersection(set(requested_metrics)) else None
    available_metrics['CSI'] = available_metrics['TS'] if 'CSI' in requested_metrics else None

    if 'DOR' in requested_metrics:
        if tp == 0 or tn == 0 or fp == 0 or fn == 0:
            available_metrics['DOR'] = np.nan
        else:
            dor_num = (tp * tn)
            dor_den = (fp * fn)
            available_metrics['DOR'] = _safe_divide(dor_num, dor_den)

    for metric in requested_metrics:
        results[metric] = available_metrics.get(metric, np.nan)

    return results

def _apply_metrics_to_bin(df_binned, requested_metrics):
    """
    Pandas .apply() helper function.
    """
    if df_binned.empty:
        return pd.Series(index=requested_metrics, data=np.nan, dtype=float)

    metrics_dict = _calculate_metrics(df_binned['y_true'],
                                      df_binned['y_pred'],
                                      requested_metrics)
    
    return pd.Series(metrics_dict)

def calculate_binned_metrics(df, timestamp_col, bins, 
                             label_col, label_true, label_false,
                             score_col, score_true, score_false,
                             metrics=None, numeric_aggregations=None, logger=None):
    """
    Calculates specified performance metrics and/or numeric aggregations
    over different time bins.
    """
    
    df_copy = df.copy()
    log_func = logger.warning if logger else print
    
    if not metrics and not numeric_aggregations:
        log_func("Warning: No 'metrics' or 'numeric_aggregations' specified. Returning empty results.")
        return {bin_freq: pd.DataFrame() for bin_freq in bins}

    if metrics:
        log_func("Setting up for performance metric calculation...")
        
        if not label_col or not score_col:
            log_func("Error: 'label_col' and 'score_col' are required when 'metrics' are requested.")
            raise ValueError("'label_col' and 'score_col' are required when 'metrics' are requested.")

        if label_col not in df_copy.columns:
            log_func(f"Error: Label column '{label_col}' not found in DataFrame.")
            raise KeyError(f"Label column '{label_col}' not found in DataFrame columns: {list(df_copy.columns)}")
        
        if score_col not in df_copy.columns:
            log_func(f"Error: Score column '{score_col}' not found in DataFrame.")
            raise KeyError(f"Score column '{score_col}' not found in DataFrame columns: {list(df_copy.columns)}")

        log_func(f"Mapping actual column '{label_col}' using '{label_true}': 1, '{label_false}': 0")
        df_copy['y_true'] = np.where(
            df_copy[label_col] == label_true, 
            1, 
            np.where(df_copy[label_col] == label_false, 0, np.nan)
        )
        
        log_func(f"Mapping predicted column '{score_col}' using '{score_true}': 1, '{score_false}': 0")
        df_copy['y_pred'] = np.where(
            df_copy[score_col] == score_true,
            1,
            np.where(df_copy[score_col] == score_false, 0, np.nan)
        )

        if df_copy['y_true'].isnull().any():
            unmapped_vals = df_copy[df_copy['y_true'].isnull()][label_col].unique()
            log_func(f"Warning: Found unmapped values in actual column '{label_col}': {unmapped_vals}. "
                     f"These rows will be ignored for metric calculations.")
        if df_copy['y_pred'].isnull().any():
            unmapped_vals = df_copy[df_copy['y_pred'].isnull()][score_col].unique()
            log_func(f"Warning: Found unmapped values in predicted column '{score_col}': {unmapped_vals}. "
                     f"These rows will be ignored for metric calculations.")
        
        # Take an explicit copy so the dtype casts below do not trigger SettingWithCopyWarning
        metrics_df_copy = df_copy.dropna(subset=['y_true', 'y_pred']).copy()
        
        if metrics_df_copy.empty and not numeric_aggregations:
            log_func("Warning: No valid data remaining after mapping for metrics. Returning empty results.")
            return {bin_freq: pd.DataFrame(columns=metrics) for bin_freq in bins}
            
        metrics_df_copy['y_true'] = metrics_df_copy['y_true'].astype(int)
        metrics_df_copy['y_pred'] = metrics_df_copy['y_pred'].astype(int)
    else:
        metrics_df_copy = pd.DataFrame() 

    try:
        log_func(f"Converting timestamp column '{timestamp_col}' to datetime objects.")
        df_copy[timestamp_col] = pd.to_datetime(df_copy[timestamp_col])
        df_copy = df_copy.set_index(timestamp_col)
        
        if not metrics_df_copy.empty:
            metrics_df_copy[timestamp_col] = pd.to_datetime(metrics_df_copy[timestamp_col])
            metrics_df_copy = metrics_df_copy.set_index(timestamp_col)
            
    except Exception as e:
        log_func(f"Error: Could not parse timestamp column '{timestamp_col}'. Error: {e}")
        raise ValueError(f"Could not parse timestamp column '{timestamp_col}'. "
                         f"Ensure it is a valid datetime format. Error: {e}")

    log_func(f"Starting calculation loop for bins: {bins}")
    all_binned_metrics = {}
    
    for bin_freq in bins:
        bin_results_dfs = [] 
        
        if metrics:
            if metrics_df_copy.empty:
                log_func(f"Warning: 'metrics' were requested but no valid data rows were found after mapping. Skipping performance metrics for bin '{bin_freq}'.")
            else:
                try:
                    log_func(f"Resampling performance metrics for bin '{bin_freq}'...")
                    metrics_df = metrics_df_copy.resample(bin_freq).apply(
                        _apply_metrics_to_bin, 
                        requested_metrics=metrics
                    )
                    metrics_df = metrics_df.dropna(how='all')
                    
                    if not metrics_df.empty:
                        bin_results_dfs.append(metrics_df)
                        
                except ValueError as e:
                    log_func(f"Warning: Invalid bin frequency string '{bin_freq}' for metrics. Skipping. Error: {e}")
                except Exception as e:
                    log_func(f"An unexpected error occurred during metrics resampling with bin '{bin_freq}': {e}")

        if numeric_aggregations:
            try:
                valid_agg_cols = {col: agg for col, agg in numeric_aggregations.items() if col in df_copy.columns}
                missing_cols = set(numeric_aggregations.keys()) - set(valid_agg_cols.keys())
                
                if missing_cols:
                    log_func(f"Warning: For bin '{bin_freq}', the following columns for numeric aggregation were not found and will be skipped: {missing_cols}")

                if valid_agg_cols:
                    log_func(f"Resampling numeric aggregations for bin '{bin_freq}': {valid_agg_cols}")
                    agg_df = df_copy.resample(bin_freq).agg(valid_agg_cols)
                    agg_df = agg_df.dropna(how='all')
                    if not agg_df.empty:
                        bin_results_dfs.append(agg_df)
                else:
                    log_func(f"Warning: No valid columns found for numeric aggregation for bin '{bin_freq}'.")

            except (AttributeError, TypeError) as e:
                log_func(f"Warning: Invalid aggregation function provided in 'numeric_aggregations' for bin '{bin_freq}'. Skipping. Error: {e}")
            except Exception as e:
                log_func(f"An unexpected error occurred during numeric aggregation with bin '{bin_freq}': {e}")

        if not bin_results_dfs:
            all_binned_metrics[bin_freq] = pd.DataFrame()
        else:
            all_binned_metrics[bin_freq] = pd.concat(bin_results_dfs, axis=1)

    log_func("Binned calculations complete.")
    return all_binned_metrics


# --- --- --- --- --- --- --- --- --- --- 
# --- 2. MODELOP CENTER FUNCTIONS ---
# --- --- --- --- --- --- --- --- --- --- 

# Set up a global logger for init() to use
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# modelop.init
def init(init_param):
    """
    ModelOp init function. Sets global variables from job parameters.
    """
    global logger
    logger.info("Initializing binned metrics job...")

    try:
        job = json.loads(init_param.get("rawJson", "{}"))
    except Exception as e:
        logger.warning(f"Could not parse 'rawJson' in init_param. Using all default values. Error: {e}")
        job = {}

    global TIMESTAMP_COLUMN, LABEL_COLUMN_NAME, LABEL_FALSE_VALUE, \
           LABEL_TRUE_VALUE, SCORE_COLUMN_NAME, SCORE_FALSE_VALUE, \
           SCORE_TRUE_VALUE, METRICS_TO_CALC, BINS_TO_CALC, NUMERIC_AGGS_TO_CALC

    job_params = job.get("jobParameters", {})
    
    try:
        TIMESTAMP_COLUMN = job_params["TIMESTAMP_COLUMN"]
        logger.info(f"Loaded TIMESTAMP_COLUMN from job parameters: '{TIMESTAMP_COLUMN}'")
    except Exception:
        TIMESTAMP_COLUMN = 'Date'
        logger.info(f"Using default TIMESTAMP_COLUMN: '{TIMESTAMP_COLUMN}'")
    
    try:
        LABEL_COLUMN_NAME = job_params["LABEL_COLUMN_NAME"]
        logger.info(f"Loaded LABEL_COLUMN_NAME from job parameters: '{LABEL_COLUMN_NAME}'")
    except Exception:
        LABEL_COLUMN_NAME = 'label'
        logger.info(f"Using default LABEL_COLUMN_NAME: '{LABEL_COLUMN_NAME}'")

    try:
        LABEL_FALSE_VALUE = job_params["LABEL_FALSE_VALUE"]
        logger.info(f"Loaded LABEL_FALSE_VALUE from job parameters: '{LABEL_FALSE_VALUE}'")
    except Exception:
        LABEL_FALSE_VALUE = False
        logger.info(f"Using default LABEL_FALSE_VALUE: {LABEL_FALSE_VALUE}")
        
    try:
        LABEL_TRUE_VALUE = job_params["LABEL_TRUE_VALUE"]
        logger.info(f"Loaded LABEL_TRUE_VALUE from job parameters: '{LABEL_TRUE_VALUE}'")
    except Exception:
        LABEL_TRUE_VALUE = True
        logger.info(f"Using default LABEL_TRUE_VALUE: {LABEL_TRUE_VALUE}")


## Part 2: Local Development & Testing

Now that all our functions are loaded (from running the cell above), we can test them interactively. This is the main development loop:

1.  Write a small test (like in the cells below).
2.  Run the test and check the output.
3.  If it's wrong, modify the code in the large **Cell 3**.
4.  Re-run **Cell 3** to load your changes.
5.  Re-run your test cell.

### Step 2.1: Test the Core Engine (`_calculate_metrics`)

This is the lowest-level unit test. It checks the raw math. Does the function correctly calculate metrics from two simple lists of 0s and 1s?
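
One way to cross-check the function's output is to count the four confusion-matrix cells by hand. The sketch below is independent of the monitor code: the helper name `count_confusion_cells` is hypothetical, and it assumes the labels are already encoded as 0 and 1.

```python
def count_confusion_cells(y_true, y_pred):
    """Count TN, FP, FN, TP for binary 0/1 labels (hypothetical helper)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            # Remaining case under the 0/1 assumption: actual == 1, predicted == 0
            fn += 1
    return tn, fp, fn, tp


y_true = [0, 1, 1, 0, 1, 0]  # Actuals
y_pred = [0, 1, 0, 1, 1, 0]  # Predictions
tn, fp, fn, tp = count_confusion_cells(y_true, y_pred)

sensitivity = tp / (tp + fn)            # SEN
specificity = tn / (tn + fp)            # SP
accuracy = (tp + tn) / len(y_true)      # ACC
precision = tp / (tp + fp)              # PPV
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(tn, fp, fn, tp)
print(round(sensitivity, 3), round(specificity, 3), round(accuracy, 3), round(f1, 3))
```

If these hand-computed values disagree with what `_calculate_metrics` returns for the same lists, the discrepancy is most likely in the metric formulas rather than in the data handling.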

In [None]:
# --- Test Data ---
# Simple, hand-calculated example
# TN = 2 (idx 0, 5)
# FP = 1 (idx 3)
# FN = 1 (idx 2)
# TP = 2 (idx 1, 4)
y_true_test = [0, 1, 1, 0, 1, 0] # Actuals
y_pred_test = [0, 1, 0, 1, 1, 0] # Predictions

# --- Expected Results ---
# P = 3 (TP+FN), N = 3 (TN+FP)
# SEN (TP/P) = 2/3 ≈ 0.667
# SP (TN/N) = 2/3 ≈ 0.667
# ACC (TP+TN)/Total = (2+2)/6 = 4/6 ≈ 0.667
# F1 = 2*PPV*SEN/(PPV+SEN), with PPV = TP/(TP+FP) = 2/3, so F1 = 2/3 ≈ 0.667

# --- Requested Metrics ---
metrics_to_test = ['SEN', 'SP', 'ACC', 'F1']

# --- Run Test ---
print("Testing _calculate_metrics...")
results = _calculate_metrics(y_true_test, y_pred_test, metrics_to_test)

# --- Check Output ---
print(f"Calculated Metrics:\n{results}")

# MOC DEV CONTEXT:
# If this test fails, something is wrong with your fundamental metric
# calculations. Fix this function in Cell 3 before moving on.

### Step 2.2: Test the Universal Function (`calculate_binned_metrics`)

This is the **most important test**. Here, we test the *entire* data processing pipeline:

1.  Does it correctly find and use the timestamp column?
2.  Does it correctly *map* your custom values (e.g., `True`, `'NO'`) to 0s and 1s? (A label or score column name that is missing from the DataFrame raises a `KeyError` at this stage.)
3.  Does it correctly handle `NaN`s (unmapped values)?
4.  Does it correctly bin the data?
5.  Does it correctly calculate aggregations (like `mean`)?

We build a sample DataFrame just like the one MOC will provide.
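
To make the mapping and binning steps concrete, here is a minimal standalone sketch of the same idea on toy data. It is not the monitor's implementation: the toy DataFrame and the `correct` column are assumptions made for illustration, and only per-month accuracy is computed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the last row carries an unmapped label ('N/A').
toy = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-09", "2023-02-03", "2023-02-04"],
    "ground_truth": [True, False, True, "N/A"],
    "model_prediction": ["YES", "YES", "YES", "NO"],
})

# Map custom values to 1/0; anything that matches neither value becomes NaN.
toy["y_true"] = np.where(
    toy["ground_truth"] == True, 1,
    np.where(toy["ground_truth"] == False, 0, np.nan),
)
toy["y_pred"] = np.where(
    toy["model_prediction"] == "YES", 1,
    np.where(toy["model_prediction"] == "NO", 0, np.nan),
)

# Drop rows that could not be mapped, then bin by month start ('MS').
mapped = toy.dropna(subset=["y_true", "y_pred"]).copy()
mapped["Date"] = pd.to_datetime(mapped["Date"])
mapped["correct"] = (mapped["y_true"] == mapped["y_pred"]).astype(int)
monthly_accuracy = mapped.set_index("Date")["correct"].resample("MS").mean()

print(monthly_accuracy)
```

The row with `'N/A'` is excluded from the accuracy calculation, which mirrors how the monitor ignores unmapped rows for performance metrics while still including them in numeric aggregations.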

In [None]:
# --- 1. Build Sample Data ---
# This data simulates the 'baseline' DataFrame MOC will provide
data = {
    'Date': [
        '2023-01-01', '2023-01-02', '2023-01-08', '2023-01-09', # Jan 2023
        '2023-02-03', '2023-02-04', '2023-02-15', '2023-02-16', # Feb 2023
        '2024-01-01', '2024-01-02', '2024-01-05'  # Jan 2024 (with an unmapped 'N/A')
    ],
    'ground_truth': [
        True, False, True, False,  
        True, False, True, False, 
        True, False, 'N/A'        
    ],
    'model_prediction': [
        'YES', 'NO', 'NO', 'YES',   # Jan: 1 TP, 1 TN, 1 FN, 1 FP
        'YES', 'NO', 'YES', 'NO',   # Feb: 2 TP, 2 TN
        'YES', 'NO', 'YES'          # Jan 2024: 1 TP, 1 TN, 1 unmapped
    ],
    'loan_amount': [
        1000, 2000, 1500, 3000,
        500, 1000, 2500, 4000,
        8000, 9000, 5000
    ]
}
test_df = pd.DataFrame(data)

# --- 2. Define Job Parameters (as variables) ---
# These simulate the 'jobParameters' you'd set in the MOC UI
p_timestamp_col = 'Date'
p_label_col = 'ground_truth'
p_label_true = True
p_label_false = False
p_score_col = 'model_prediction'
p_score_true = 'YES'
p_score_false = 'NO'
p_metrics = ['SEN', 'SP', 'ACC', 'F1']
p_bins = ['MS', 'YS'] # Monthly, Yearly
p_aggs = {'loan_amount': 'mean'}

# MOC DEV CONTEXT:
# This is your main development loop! 
# Change the data, change the parameters, and re-run this cell
# to test every edge case you can think of (e.g., empty bins,
# unmapped values, all-positive data, etc.)

# --- 3. Run Test ---
print("Testing calculate_binned_metrics...")
binned_results = calculate_binned_metrics(
    df=test_df,
    timestamp_col=p_timestamp_col,
    bins=p_bins,
    label_col=p_label_col,
    label_true=p_label_true,
    label_false=p_label_false,
    score_col=p_score_col,
    score_true=p_score_true,
    score_false=p_score_false,
    metrics=p_metrics,
    numeric_aggregations=p_aggs,
    logger=logging.getLogger() # Use the notebook's logger
)

# --- 4. Check Output ---
print("\n--- Monthly Results (MS) ---")
print(binned_results['MS'])
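# Expected for 2023-01: SEN=0.5, SP=0.5, ACC=0.5, F1=0.5, loan_amount=1875
# Expected for 2023-02: SEN=1.0, SP=1.0, ACC=1.0, F1=1.0, loan_amount=2000
# Expected for 2024-01: SEN=1.0, SP=1.0, ACC=1.0, F1=1.0, loan_amount=7333.33
# (the unmapped 'N/A' row is excluded from the metrics but included in the loan_amount mean)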

print("\n--- Yearly Results (YS) ---")
print(binned_results['YS'])
# Expected for 2023: SEN=0.75, SP=0.75, ACC=0.75, F1=0.75, loan_amount=1937.5
# Expected for 2024: SEN=1.0, SP=1.0, ACC=1.0, F1=1.0, loan_amount=7333.33

### Step 2.3: Simulate the Full ModelOp Center Call

Finally, we simulate the *exact* process MOC uses. This test confirms:

1.  Does your `init()` function correctly parse the `init_param` JSON?
2.  Does it correctly set all the global variables?
3.  Does your `metrics()` function correctly read those global variables and call `calculate_binned_metrics`?
4.  Is the *final JSON output* correctly formatted for the MOC timeline graphs?
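
The parsing pattern behind the first two checks can be sketched in isolation. The helper `read_job_params` below is hypothetical and is not part of the monitor script; it assumes, as `init()` does, that the job definition arrives as a JSON string under the `rawJson` key and that missing parameters fall back to defaults.

```python
import json


def read_job_params(init_param, defaults):
    """Hypothetical helper: overlay 'jobParameters' from rawJson onto defaults."""
    try:
        job = json.loads(init_param.get("rawJson", "{}"))
    except (TypeError, ValueError):
        # Malformed or non-string rawJson: fall back to defaults only.
        job = {}
    job_params = job.get("jobParameters", {})
    return {key: job_params.get(key, default) for key, default in defaults.items()}


# Defaults taken from the init() function in this notebook.
defaults = {"TIMESTAMP_COLUMN": "Date", "LABEL_COLUMN_NAME": "label"}

init_param = {
    "rawJson": json.dumps({"jobParameters": {"LABEL_COLUMN_NAME": "ground_truth"}})
}
params = read_job_params(init_param, defaults)
print(params)

# A malformed payload leaves every parameter at its default.
fallback = read_job_params({"rawJson": "not json"}, defaults)
print(fallback)
```

Running the simulation with one parameter deliberately omitted is a quick way to confirm that the corresponding default is picked up.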

In [None]:
# --- 1. Define Sample init_param ---
# This dict mimics what MOC passes to init(): a JSON string stored under the 'rawJson' key
test_init_params = {
    "rawJson": json.dumps({
        "jobParameters": {
            "TIMESTAMP_COLUMN": "Date",
            "LABEL_COLUMN_NAME": "ground_truth",
            "LABEL_FALSE_VALUE": False,
            "LABEL_TRUE_VALUE": True,
            "SCORE_COLUMN_NAME": "model_prediction",
            "SCORE_FALSE_VALUE": "NO",
            "SCORE_TRUE_VALUE": "YES",
            "METRICS_TO_CALC": ["SEN", "SP", "ACC"],
            "BINS_TO_CALC": ["W", "MS", "YS"],
            "NUMERIC_AGGS_TO_CALC": {"loan_amount": "mean"}
        }
    })
}

# --- 2. Run init() ---
# This will set the global variables inside the notebook's memory
print("--- Calling init() ---")
init(test_init_params)
print("Global variables set.")

# --- 3. Create baseline_df ---
# We can re-use the same test_df from the previous step
baseline_df = test_df.copy()
print(f"Using baseline_df with {len(baseline_df)} rows.")

# --- 4. Run metrics() ---
# 'metrics' is a generator, so we use next() to get the first yield
print("--- Calling metrics() ---")
final_moc_output = next(metrics(baseline_df))

# --- 5. Check Final Output ---
# This is the *exact* JSON that will be saved in MOC and
# used to render the UI graphs.
print("\n--- FINAL MOC JSON OUTPUT ---")
print(json.dumps(final_moc_output, indent=2))

# MOC DEV CONTEXT:
# Check this JSON carefully.
# - Are 'firstPredictionDate' and 'lastPredictionDate' correct?
# - Do the graph titles look right?
# - Is the 'data' key populated? (e.g., "SEN": [["2023-01-01", 0.5]])
# If this JSON is correct, your monitor is ready.

## Conclusion & Next Steps

You have successfully tested the monitor at three levels:
1.  **Unit Test** (`_calculate_metrics`): The core math is correct.
2.  **Integration Test** (`calculate_binned_metrics`): The data processing, mapping, and binning logic is correct.
3.  **Simulation Test** (`init` + `metrics`): The MOC-wrapper logic is correct, and the final JSON output is properly formatted.

**To deploy:**
1.  Make any final changes to the code in **Cell 3**.
2.  Copy the *entire contents* of **Cell 3**.
3.  Paste this code into your `binned_metrics.py` file.
4.  Add this file to your ModelOp Center monitor repository along with your `readme.md` and `required_assets.json`.