## 2. Define a TestSuite

In the second phase of SDMT, we define a `TestSuite` that represents the tests the completed model must pass to be acceptable for use in the system into which it will be integrated.

#### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces.

In [6]:
import os
from mlte.session import set_context, set_store

store_path = os.path.join(os.getcwd(), "store")

set_context("IrisClassifier", "0.0.1")
set_store(f"local://{store_path}")

Creating initial custom lists at URI: local:///Users/rbrowersinning/Documents/ResearchFolders/Continuum_LTP/GitRepos/mlte/demo/simple/store
Loaded 8 qa_categories for initial list
Loaded 14 quality_attributes for initial list
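
The store location above is built by hand from a path. As a minimal sketch of the same idea, the helper below creates the folder if needed and returns a URI with the `local://` prefix used above. The name `local_store_uri` is hypothetical and not part of MLTE, and creating the folder up front is an assumption, since this section does not say whether MLTE creates it.

```python
import os
import tempfile


def local_store_uri(base_dir: str, folder: str = "store") -> str:
    """Return a `local://` URI for `folder` under `base_dir`, creating it if missing."""
    store_path = os.path.join(base_dir, folder)
    # Assumption: we create the folder ourselves rather than rely on the store to do it.
    os.makedirs(store_path, exist_ok=True)
    return f"local://{store_path}"


# Example with a throwaway directory instead of the current working directory.
with tempfile.TemporaryDirectory() as tmp:
    uri = local_store_uri(tmp)
    print(uri)
```

The returned string could then be passed to `set_store` in place of the f-string used above.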


#### Build a `TestSuite`

In MLTE, we define the tests required for the different requirements in a `TestSuite`. Note that a new `Evidence` type (`ConfusionMatrix`) was created in this case to simplify the definition of the `Validator` for that case.

Also note that, for this `TestSuite`, we are defining the (optional) `Measurement` up front. This will allow us to later automate the execution of all the test cases.
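
To illustrate why attaching a measurement to each test case helps automation, here is a plain-Python sketch that does not use MLTE. It keeps one callable per test case identifier and runs them all in a loop. The function `run_all` and the two sample measurements are hypothetical stand-ins, not MLTE's actual execution mechanism.

```python
from typing import Any, Callable

# Hypothetical stand-in for the measurement attached to a test case.
MeasurementFn = Callable[..., Any]


def run_all(
    measurements: dict[str, MeasurementFn], inputs: dict[str, tuple]
) -> dict[str, Any]:
    """Run each measurement with its inputs; collect results keyed by identifier."""
    results = {}
    for identifier, measure in measurements.items():
        results[identifier] = measure(*inputs[identifier])
    return results


# Illustrative measurements keyed by test case identifier.
measurements = {
    "accuracy": lambda y_true, y_pred: sum(
        t == p for t, p in zip(y_true, y_pred)
    ) / len(y_true),
    "model size": lambda payload: len(payload),
}
inputs = {
    "accuracy": ([0, 1, 1, 2], [0, 1, 2, 2]),
    "model size": (b"fake-model-bytes",),
}
results = run_all(measurements, inputs)
print(results)
```

Because every test case already knows how to produce its evidence, a single loop can collect all results without per-test glue code.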

First we need to load our `NegotiationCard` so we can get the ids of its quality attribute scenarios, which will be added to the `TestCase`s here. Those ids link each `TestCase` to its quality attribute requirements.

In [7]:
from mlte.negotiation.artifact import NegotiationCard

card = NegotiationCard.load()
card.print_quality_scenarios()

default.card-qas_001 (Functional Correctness): The model receives a dimensions of an iris flower from the flower identification application while in normal operations, the model returns proper results, with an accuracy of 98%
default.card-qas_002 (Functional Correctness): The model receives measurements of an iris flower from a garden from the flower identification application while in normal operations, the model returns proper results, with misclassification less than 2
default.card-qas_003 (Functional Correctness): The model receives measurements of an iris flower from a garden from the flower identification application while in normal operations, the model returns proper results, with a proper distribution
default.card-qas_004 (Resource Utilization): The model is being trained from by model developers while in development time, the model is properly trained, and requires less than 3 mb of storage.
default.card-qas_005 (Resource Utilization): The model is being trained from by model d
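
The listing above has a regular shape: an id, a category in parentheses, then the scenario text. As a rough sketch, assuming that printed format holds, the ids can be extracted with a regular expression and used to check that every id referenced by a test case is known. The function names and the shortened sample lines are illustrative and not part of MLTE.

```python
import re

# Matches lines shaped like "<id> (<category>): <text>" (assumed print format).
LINE_PATTERN = re.compile(r"^(?P<id>\S+) \((?P<category>[^)]*)\): ")


def scenario_ids(lines: list[str]) -> list[str]:
    """Extract scenario ids from printed lines, skipping lines that do not match."""
    ids = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            ids.append(match.group("id"))
    return ids


def unknown_ids(referenced: list[str], known: list[str]) -> list[str]:
    """Return the referenced ids that do not appear among the known ids."""
    known_set = set(known)
    return [ref for ref in referenced if ref not in known_set]


# Shortened sample lines in the same shape as the listing above.
sample = [
    "default.card-qas_001 (Functional Correctness): The model receives ...",
    "default.card-qas_004 (Resource Utilization): The model is being trained ...",
]
known = scenario_ids(sample)
missing = unknown_ids(["default.card-qas_001", "default.card-qas_009"], known)
print(known)
print(missing)
```

A check like this can catch a mistyped scenario id before the test suite is saved.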

If we want to look at the `Quality Attribute Scenarios` or the `Quality Attributes` that are available, we can do that with these listing functions.

In [8]:
from mlte.custom_list.custom_list_names import CustomListName
from mlte.session.session import print_custom_list_entries

print_custom_list_entries(CustomListName.QUALITY_ATTRIBUTES)

Accuracy (Parent: Functional Correctness): 
Detect Out-of-Distribution (OOD) Inputs (Parent: Monitorability): 
Detect Shifts in Output (Confidence) Distribution (Parent: Monitorability): 
Inference Time on Operational Platform (Parent: Performance): 
Input Validation (Parent: Resilience): 
Input and Output Specification (Parent: Functional Correctness): 
Model Impartial to Photo Location (Parent: Fairness): 
Model Robust to Noise (Channel Loss) (Parent: Robustness): 
Model Robust to Noise (Image Blur) (Parent: Robustness): 
Reliability (Parent: Robustness): Reliability
Resource Consumption on Operational Platform (Parent: Performance): 
Resource Usage (Parent: Performance): 
Security (Parent: Trust): Security
Understanding Model Results (Parent: Interpretability): 


Now we can create our `TestSuite`, consisting of a list of `TestCase`s, each of them addressing one or more Quality Attribute Scenarios from our `NegotiationCard`. When defining the `TestCase`s below, we set the id of the corresponding Quality Attribute Scenario we want to test in the `quality_scenarios` attribute.

In [9]:
from sklearn.metrics import accuracy_score, confusion_matrix

from mlte.measurement.external_measurement import ExternalMeasurement
from mlte.measurement.units import Units
from mlte.tests.test_case import TestCase
from mlte.tests.test_suite import TestSuite
from mlte.measurement.storage import LocalObjectSize
from mlte.measurement.cpu import LocalProcessCPUUtilization
from mlte.measurement.memory import LocalProcessMemoryConsumption
from mlte.evidence.types.real import Real
from mlte.evidence.types.image import Image

from demo.simple import measurements
from demo.simple.confusion_matrix import ConfusionMatrix

spec = TestSuite(
    test_cases=[
        TestCase(
            identifier="accuracy",
            goal="Understand if the model is useful for this case",
            quality_scenarios=["card.default-qas_001"],
            validator=Real.greater_or_equal_to(0.98),
            measurement=ExternalMeasurement(
                output_evidence_type=Real, function=accuracy_score
            ),
        ),
        TestCase(
            identifier="confusion matrix",
            goal="Understand if the model is useful for this case",
            quality_scenarios=["card.default-qas_002"],
            validator=ConfusionMatrix.misclassification_count_less_than(2),
            measurement=ExternalMeasurement(
                output_evidence_type=ConfusionMatrix, function=confusion_matrix
            ),
        ),
        TestCase(
            identifier="class distribution",
            goal="Understand if the model is useful for this case",
            quality_scenarios=["card.default-qas_003"],
            validator=Image.register_info(
                "Visual inspection is required to confirm that distribution is above 1.2%."
            ),
            measurement=ExternalMeasurement(
                output_evidence_type=Image, function=measurements.create_image
            ),
        ),
        TestCase(
            identifier="model size",
            goal="Check resource consumption",
            quality_scenarios=["card.default-qas_004"],
            validator=LocalObjectSize.get_output_type().less_than(
                3.0, Units.megabyte
            ),
            measurement=LocalObjectSize(),
        ),
        TestCase(
            identifier="training memory",
            goal="Check resource consumption",
            quality_scenarios=["card.default-qas_005"],
            validator=LocalProcessMemoryConsumption.get_output_type().average_consumption_less_than(
                60, unit=Units.megabyte
            ),
            measurement=LocalProcessMemoryConsumption(),
        ),
        TestCase(
            identifier="training cpu",
            goal="Check resource consumption",
            quality_scenarios=["card.default-qas_006"],
            validator=LocalProcessCPUUtilization.get_output_type().max_utilization_less_than(
                5.0, unit=Units.percent
            ),
            measurement=LocalProcessCPUUtilization(),
        ),
    ]
)
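# Save the TestSuite to the store: parents=True creates missing parent entries
# (model and version), force=True overwrites a previously saved copy.
spec.save(parents=True, force=True)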
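
To make the first two validators concrete, here is a plain-Python sketch of the checks they describe: accuracy of at least 0.98 and fewer than 2 misclassifications. It assumes the misclassification count is the sum of the off-diagonal entries of the confusion matrix. The demo's `ConfusionMatrix` class may compute it differently, and the function names below are illustrative.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion(y_true: list[int], y_pred: list[int], n_classes: int) -> list[list[int]]:
    """Build a confusion matrix with true labels as rows and predictions as columns."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def misclassification_count(matrix: list[list[int]]) -> int:
    """Assumed definition: all entries minus the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return total - diagonal


# Small made-up example: one sample of class 1 is predicted as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
matrix = confusion(y_true, y_pred, 3)
errors = misclassification_count(matrix)
acc = accuracy(y_true, y_pred)

print("accuracy passes:", acc >= 0.98)
print("misclassification passes:", errors < 2)
```

In this made-up example the misclassification check passes while the accuracy check fails, which shows that the two scenarios constrain the model in different ways on small test sets.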