## 2f. Evidence - Functional Correctness - Accuracy QAS Measurements

Measure the overall accuracy of the model on the test predictions.

### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces. This import also sets up global constants for the data folders and the model to use.

In [1]:
# Sets up context for the model being used, sets up constants related to folders and model data to be used.
from demo.scenarios.session import *

Creating initial custom lists at URI: local:///Users/rbrowersinning/Documents/ResearchFolders/Continuum_LTP/GitRepos/mlte/demo/scenarios/../store
Loaded 7 qa_categories for initial list
Loaded 30 quality_attributes for initial list
Creating sample catalog at URI: StoreType.LOCAL_FILESYSTEM:local:///Users/rbrowersinning/Documents/ResearchFolders/Continuum_LTP/GitRepos/mlte/demo/scenarios/../store
Loading sample catalog entries.
Loaded 9 entries for sample catalog.


### Helper Functions

Prepare all functions and data for the measurements.

In [2]:
from demo.scenarios import garden
import numpy as np


def load_data(data_folder: str):
    """Loads the garden prediction results and the taxonomy categories."""
    df_results = garden.load_base_results(data_folder, "predictions_test.csv")

    # Load the taxonomic data and merge it with the results.
    df_info = garden.load_taxonomy(data_folder)
    # Rename the label column to "Label" before merging.
    df_results.rename(columns={"label": "Label"}, inplace=True)
    df_all = garden.merge_taxonomy_with_results(df_results, df_info)

    # Only the per-prediction results are needed for overall accuracy, so the
    # merged frame (df_all) is not returned here.
    return df_results


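def calculate_model_performance_basic_acc(df_results):
    """Get the basic accuracy of the model across the entire garden."""
    n = len(df_results)
    # "model correct" flags whether each prediction was correct, so its sum
    # is the number of correct predictions.
    model_performance_acc = np.sum(df_results["model correct"]) / n

    return model_performance_acc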
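The accuracy helper above is the fraction of rows whose `model correct` flag is set. The following is a minimal self-contained sketch of the same arithmetic on a small invented DataFrame, assuming the flag is boolean (or 0/1). The `basic_accuracy` function and the toy data are hypothetical and not part of the demo:

```python
import numpy as np
import pandas as pd


def basic_accuracy(df: pd.DataFrame, column: str = "model correct") -> float:
    """Fraction of rows where the model's prediction was correct.

    Assumes `column` holds booleans or 0/1 integers (hypothetical helper).
    """
    n = len(df)
    if n == 0:
        raise ValueError("cannot compute accuracy of an empty DataFrame")
    # Summing booleans counts the True values; dividing by n gives the mean.
    return float(np.sum(df[column]) / n)


# Toy data, invented for illustration: 3 of 4 predictions are correct.
toy = pd.DataFrame({"model correct": [True, True, False, True]})
print(basic_accuracy(toy))  # 0.75
```

Summing a boolean column and dividing by the row count is equivalent to `df[column].mean()`. The explicit form mirrors the notebook's helper and makes the empty-frame case easy to guard against.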

In [3]:
# Prepare the data. Instead of executing the model, this section uses CSV files containing the results of a previous run of the model.
df_results = load_data(DATASETS_DIR)

102 102 102
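`load_data` renames the results' `label` column to `Label` and then calls `garden.merge_taxonomy_with_results`, whose internals are not shown here. As a rough sketch of how such a merge could be done with plain pandas, assuming a left join on `Label`, consider the following. The `Family` column, all values, and the join type are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical stand-ins for predictions_test.csv and the taxonomy data.
results = pd.DataFrame(
    {"label": [0, 1, 1, 2], "model correct": [True, False, True, True]}
)
taxonomy = pd.DataFrame(
    {"Label": [0, 1, 2], "Family": ["Rosaceae", "Asteraceae", "Lamiaceae"]}
)

# Rename the label column to match the taxonomy's key, as load_data does.
results = results.rename(columns={"label": "Label"})

# A left join keeps every prediction row; validate="many_to_one" raises if a
# label maps to more than one taxonomy row, which would duplicate predictions.
merged = results.merge(taxonomy, on="Label", how="left", validate="many_to_one")

print(merged)
```

With a many-to-one left join there is exactly one merged row per prediction, so an accuracy computed on the merged frame would match one computed on the results alone.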


In [4]:
# Number of rows (predictions) and columns in the results.
n, d = df_results.shape

In [5]:
calculate_model_performance_basic_acc(df_results)

np.float64(0.947265625)

### Measurements

Finally, we execute the measurement and store its result.

In [6]:
from mlte.measurement.external_measurement import ExternalMeasurement
from mlte.evidence.types.real import Real

# Evaluate. The identifier must match the test case ID defined in the TestSuite.
measurement = ExternalMeasurement(
    "overall model accuracy", Real, calculate_model_performance_basic_acc
)
result = measurement.evaluate(df_results)

# Inspect value
print(result)

# Save to artifact store
result.save(force=True)

0.947265625


ArtifactModel(header=ArtifactHeaderModel(identifier='evidence.overall model accuracy', type='evidence', timestamp=1759160696, creator=None, level='version'), body=EvidenceModel(artifact_type=<ArtifactType.EVIDENCE: 'evidence'>, metadata=EvidenceMetadata(test_case_id='overall model accuracy', measurement=MeasurementMetadata(measurement_class='mlte.measurement.external_measurement.ExternalMeasurement', output_class='mlte.evidence.types.real.Real', additional_data={'function': '__main__.calculate_model_performance_basic_acc'})), evidence_class='mlte.evidence.types.real.Real', value=RealValueModel(evidence_type=<EvidenceType.REAL: 'real'>, real=0.947265625, unit=None)))
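`ExternalMeasurement` wraps an existing function (here `calculate_model_performance_basic_acc`) so that its output is recorded as evidence tied to a test case ID. The saved artifact's `test_case_id` and `additional_data` fields show this. The toy sketch below illustrates only that wrapping idea. `ToyExternalMeasurement`, `ToyEvidence`, and `TOY_SUITE_IDS` are hypothetical names, and this is not MLTE's implementation or API:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToyEvidence:
    """Hypothetical stand-in for a typed evidence value (not MLTE's class)."""

    test_case_id: str
    value: float


@dataclass(frozen=True)
class ToyExternalMeasurement:
    """Hypothetical analogue of wrapping an existing function as a measurement."""

    identifier: str
    function: Callable[..., Any]

    def evaluate(self, *args: Any) -> ToyEvidence:
        # Run the user-supplied function and tag its output with the identifier.
        return ToyEvidence(self.identifier, float(self.function(*args)))


# Hypothetical test suite IDs: evidence is only matched to a test case
# when its identifier appears here.
TOY_SUITE_IDS = {"overall model accuracy"}

m = ToyExternalMeasurement("overall model accuracy", lambda xs: sum(xs) / len(xs))
evidence = m.evaluate([1, 1, 0, 1])
print(evidence)  # ToyEvidence(test_case_id='overall model accuracy', value=0.75)
```

Keeping the measurement logic in a plain function, as the notebook does, lets the same function be tested and reused outside the wrapper. The identifier string is the only link between the evidence and its test case, which is why it must match exactly.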