## 2m. Evidence - Domain Adaptability QAS Measurements

The evidence collected in this section checks the Domain Adaptability scenario defined in the previous step. Some helper functions are loaded from external Python files.

The cell below must contain JSON data about this evidence that will be used to automatically populate the sample test catalog.

In [None]:
{
    "tags": ["Computer Vision", "Object detection"],
    "quality_attribute": "Domain Adaptability",
    "description": "Assessing the effect of input data from a new domain on model performance",
    "inputs": "Distribution of model inferences on images taken in a new operating domain; distribution of model inferences on images taken in the original operating domain",
    "output": "ANOVA test results"
}
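Because this cell is parsed automatically, a quick local check that it is valid JSON and carries the expected fields can catch mistakes such as a trailing comma. Below is a minimal sketch using only the standard library. The `REQUIRED_KEYS` set and the `validate_metadata` helper are assumptions based on the fields shown above, not a schema or function defined by MLTE.

```python
import json

# Assumed field set, taken from the metadata cell above (not an official MLTE schema).
REQUIRED_KEYS = {"tags", "quality_attribute", "description", "inputs", "output"}


def validate_metadata(text: str) -> dict:
    """Parse the metadata cell and check that every expected field is present."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["tags"], list):
        raise ValueError("'tags' must be a list")
    return data


good = (
    '{"tags": ["Computer Vision"], "quality_attribute": "Domain Adaptability", '
    '"description": "d", "inputs": "i", "output": "o"}'
)
bad = good[:-1] + ",}"  # a trailing comma makes this invalid JSON

print(validate_metadata(good)["quality_attribute"])  # Domain Adaptability
try:
    validate_metadata(bad)
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```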

### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces. This import also sets up global constants for the folders and the model to use.

In [None]:
# Set up the MLTE context for the model and define constants for the folders and model data used below.
from session import *

### Set up scenario test case

In [None]:
from mlte.negotiation.artifact import NegotiationCard

card = NegotiationCard.load()
qa = 13
print(card.quality_scenarios[qa])

**A specific test case generated from the scenario:**

**Data and Data Source:**	The original test data set and a test data set from the new domain.

**Measurement and Condition:**	The effect of the new domain on model performance is assessed with a one-way ANOVA for each label, comparing the predicted label probabilities from the two data sets; a difference is significant at p-value < 0.05.

**Context:**	Normal Operation
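For reference, a one-way ANOVA compares the between-group mean square of the values (here, the label probabilities from each data set) to their within-group mean square; the ratio is the F statistic. The p-value is the upper tail of the F distribution with k - 1 and N - k degrees of freedom (k groups, N values in total), which is what `scipy.stats.f_oneway` reports. Below is a minimal pure-Python sketch of the F statistic only; `one_way_f` is a hypothetical helper and the input values are toy numbers.

```python
from statistics import fmean


def one_way_f(*groups: list[float]) -> float:
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy values for two groups.
print(one_way_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 13.5
```

Under the test case, a label is flagged when the p-value derived from this statistic (for example via `scipy.stats.f.sf(F, k - 1, N - k)`) is below 0.05.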

### Helper Functions
General functions and external imports.

In [None]:
# General functions.

from utils import garden
import pandas as pd
from os import path
from scipy.stats import f_oneway


def load_data(data_folder: str, data_file: str):
    """Loads garden model results and merges them with the taxonomy categories.

    Returns a tuple of (taxonomy DataFrame, merged results DataFrame).
    """
    df_results = garden.load_base_results(data_folder, data_file)

    # Load the taxonomic data, rename the label column, and merge with the results.
    df_info = garden.load_taxonomy(data_folder)
    df_results.rename(columns={"label": "Label"}, inplace=True)
    df_all = garden.merge_taxonomy_with_results(df_results, df_info)

    return df_info, df_all

In [None]:
# Prepare the data. Instead of executing the model, this section uses CSV files containing the results of earlier model runs.

df_info, df_test = load_data(DATASETS_DIR, "0abcflmn_cv_output.csv")
_, df_new = load_data(DATASETS_DIR, "0n_cv_output_domain_adaptability.csv")
# Tag each row with its source: the original test set or the new (DALL-E-2) domain.
df_test["dataset"] = "Test"
df_new["dataset"] = "DALL-E-2"
df_all = pd.concat([df_new, df_test], ignore_index=True)
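The two result frames share the same columns, so tagging each with a `dataset` column and stacking them yields one long-format table in which every row is one prediction together with its origin. Below is a toy sketch of that pattern. Only the `Label`, `label_prob`, and `dataset` column names come from this notebook; the rows and the `df_original`/`df_domain` names are illustrative.

```python
import pandas as pd

# Toy stand-ins for the two result tables (illustrative values only).
df_original = pd.DataFrame({"Label": ["rose", "tulip"], "label_prob": [0.91, 0.85]})
df_domain = pd.DataFrame({"Label": ["rose", "tulip"], "label_prob": [0.62, 0.70]})

# Tag each source before stacking so the rows can be split apart again later.
df_original["dataset"] = "Test"
df_domain["dataset"] = "DALL-E-2"
combined = pd.concat([df_domain, df_original], ignore_index=True)

# Rows per (label, dataset) pair: the groups that the ANOVA will compare.
print(combined.groupby(["Label", "dataset"]).size())
```

`ignore_index=True` gives the stacked frame a fresh 0..n-1 index instead of repeating each source's row numbers.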

In [None]:
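# Keep only labels that appear in both datasets; a label missing from either
# would give f_oneway an empty group.
label_counts = df_all.groupby(["Label", "dataset"]).size().unstack()
valid_labels = label_counts.dropna().index.tolist()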
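When rows are grouped by label and dataset and then unstacked, missing combinations become NaN, and the index still lists labels seen in only one dataset. Such labels would give `f_oneway` an empty group. Below is a toy sketch showing how `dropna()` keeps only labels observed in both datasets; the data are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Label": ["rose", "rose", "tulip", "lily"],
        "dataset": ["Test", "DALL-E-2", "Test", "DALL-E-2"],
        "label_prob": [0.9, 0.6, 0.8, 0.7],
    }
)

counts = df.groupby(["Label", "dataset"]).size().unstack()
print(counts.index.tolist())  # ['lily', 'rose', 'tulip'], including one-sided labels
# Drop rows with a NaN count, i.e. labels missing from either dataset.
both = counts.dropna().index.tolist()
print(both)  # ['rose']
```

A stricter filter such as `(counts.fillna(0) >= 2).all(axis=1)` would also drop labels with too few rows per group for a meaningful within-group variance.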

In [None]:
def run_anova_for_label(df, label):
    # Perform ANOVA for a specific label
    subset = df[df["Label"] == label]
    test_vals = subset[subset["dataset"] == "Test"]["label_prob"]
    dalle_vals = subset[subset["dataset"] == "DALL-E-2"]["label_prob"]

    f_stat, p_val = f_oneway(test_vals, dalle_vals)

    return {
        "label": label,
        "f_stat": f_stat,
        "p_val": p_val,
    }


def run_anova(df_all):
    anova_results = [
        run_anova_for_label(df_all, label) for label in valid_labels
    ]
    results_df = pd.DataFrame(anova_results)
    results_df.sort_values(by="label", inplace=True)
    results_df.set_index("label", inplace=True)
    return results_df


def run_anova2(df_all):
    res_df = run_anova(df_all)

    return res_df.to_numpy()

In [None]:
# Run ANOVA

results_df = run_anova(df_all)

results_df
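The per-label results can be checked against the 0.05 threshold from the test case by adding a boolean column. Below is a toy sketch with synthetic data that mirrors the per-label loop above. The `ALPHA` constant and the `significant` column are illustrative additions, not part of the notebook's helpers.

```python
import pandas as pd
from scipy.stats import f_oneway

ALPHA = 0.05  # significance threshold from the test case

# Synthetic label_prob values: "rose" shifts between datasets, "tulip" does not.
df = pd.DataFrame(
    {
        "Label": ["rose"] * 6 + ["tulip"] * 6,
        "dataset": (["Test"] * 3 + ["DALL-E-2"] * 3) * 2,
        "label_prob": [0.90, 0.92, 0.88, 0.40, 0.45, 0.35,
                       0.80, 0.70, 0.75, 0.78, 0.72, 0.76],
    }
)

rows = []
for label, subset in df.groupby("Label"):
    test_vals = subset.loc[subset["dataset"] == "Test", "label_prob"]
    dalle_vals = subset.loc[subset["dataset"] == "DALL-E-2", "label_prob"]
    f_stat, p_val = f_oneway(test_vals, dalle_vals)
    rows.append({"label": label, "f_stat": f_stat, "p_val": p_val})

results = pd.DataFrame(rows).set_index("label")
# Flag labels whose prediction distribution differs between the two domains.
results["significant"] = results["p_val"] < ALPHA
print(results)
```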

### Measurements

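In this example, we wrap the custom `calculate_multiple_anova` function in an `ExternalMeasurement`, evaluate it on the combined data, and save the result. The function runs a one-way ANOVA for each label, and the per-label results are stored using the `MultipleRanksums` evidence type from the local `evidence.multiple_ranksums` module.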

In [None]:
from mlte.evidence.types.array import Array
from mlte.measurement.external_measurement import ExternalMeasurement
from evidence.multiple_ranksums import MultipleRanksums


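def calculate_multiple_anova(df_all):
    """Runs a one-way ANOVA per label, comparing the Test and DALL-E-2 data sets."""
    evid: list = []

    # Only test labels present in both datasets; f_oneway needs non-empty groups.
    label_counts = df_all.groupby(["Label", "dataset"]).size().unstack()
    labels = label_counts.dropna().index

    for lab in labels:
        subset = df_all[df_all["Label"] == lab]
        test_vals = subset[subset["dataset"] == "Test"]["label_prob"]
        dalle_vals = subset[subset["dataset"] == "DALL-E-2"]["label_prob"]

        # Wrap scipy's f_oneway (which returns the F statistic and p-value)
        # as an Array-valued measurement identified by the label.
        anova_measurement = ExternalMeasurement(
            f"label {lab}",
            Array,
            f_oneway,
        )
        anova: Array = anova_measurement.evaluate(
            test_vals,
            dalle_vals,
        )

        evid.append({anova.identifier: anova.array})
    return evid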


multiple_anova_meas = ExternalMeasurement(
    "running in new domain",
    MultipleRanksums,
    calculate_multiple_anova,
)
multiple_anova: MultipleRanksums = multiple_anova_meas.evaluate(df_all)

multiple_anova.save(force=True)
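`calculate_multiple_anova` returns one `{identifier: array}` entry per label. Assuming each array holds the `f_oneway` output in its native order, `[F statistic, p-value]`, the list can be reduced to the labels that meet the test case's p-value < 0.05 condition. Below is a sketch under that assumption. `significant_labels` is a hypothetical helper, and the sample entries are made up.

```python
ALPHA = 0.05  # significance threshold from the test case


def significant_labels(
    evidence: list[dict[str, list[float]]], alpha: float = ALPHA
) -> list[str]:
    """Return identifiers whose p-value (assumed to be at index 1) is below alpha."""
    flagged = []
    for entry in evidence:
        for identifier, values in entry.items():
            p_val = values[1]  # assumed [F statistic, p-value] layout
            if p_val < alpha:
                flagged.append(identifier)
    return flagged


# Made-up entries shaped like the list built in calculate_multiple_anova.
sample = [{"label 0": [258.6, 0.00009]}, {"label 1": [0.01, 0.93]}]
print(significant_labels(sample))  # ['label 0']
```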