## 2a. Evidence - Explainability QAS Measurements

The evidence collected in this section is used to check the Explainability QAS scenario defined in the previous step. Note that some functions and data will be loaded from external Python files.

The cell below must contain JSON data about this evidence that will be used to automatically populate the sample test catalog.

In [None]:
{
    "tags": ["LLM", "Content Generation"],
    "quality_attribute": "Explainability",
    "description": "The model should return, when prompted, a human-understandable explanation of the review. This case attaches the generated image, which needs to be manually validated for test pass/fail",
    "inputs": "A request for a review in a crafted prompt, with the supporting material (employee statement, goals and objectives, and manager comments)",
    "output": "Response explaining how and if the LLM agrees with the prior assessment, which needs to be manually validated for pass/fail"
}
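
Because the catalog is populated automatically from this cell, it can help to confirm that the text parses as strict JSON before relying on it. The following is a minimal sketch only: the helper name `check_catalog_metadata` and the set of required keys are assumptions inferred from the fields used in the cell above, not a schema confirmed by MLTE.

```python
import json

# Assumption: these are the keys the catalog expects, inferred from the cell above.
REQUIRED_KEYS = {"tags", "quality_attribute", "description", "inputs", "output"}


def check_catalog_metadata(raw_text):
    """Parse the metadata text and return a list of problems (empty if none)."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Strict JSON rejects, for example, a trailing comma before a closing brace.
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - data.keys())]
    if "tags" in data and not isinstance(data["tags"], list):
        problems.append("tags must be a list")
    return problems


# Illustrative inputs only.
well_formed = (
    '{"tags": ["LLM"], "quality_attribute": "Explainability", '
    '"description": "d", "inputs": "i", "output": "o"}'
)
problems_ok = check_catalog_metadata(well_formed)
problems_trailing = check_catalog_metadata('{"tags": ["LLM"],}')
```

An empty list means the text parsed and every assumed key is present; any other result lists what to fix before the cell is used.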

### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces. The import below also sets up global constants for the folders and model to use.

In [None]:
# Set up the MLTE context for the model being used, and define constants for the folders and model data.

from session import *

### Set up scenario test case 

In [None]:
from mlte.negotiation.artifact import NegotiationCard

card = NegotiationCard.load()
qa = 0
print(card.quality_scenarios[qa])

**A specific test case generated from the scenario:**

**Data and Data Source:** The LLM receives a prompt from the manager asking for an employee evaluation, and the original test data set can be used to mimic this request.

**Measurement and Condition:** When queried for an explanation of the score, the LLM will return an explanation of how the score is supported by the evidence, in this case the employee's self review, the goals and objectives, and the manager's notes.

**Context:** Normal Operation


### Gather evidence

In [None]:
import pandas as pd
import os

# Load the three CSV files used as evidence for this scenario.
# Initial input to ReviewPro.
input_df = pd.read_csv(
    os.path.join(DATASETS_DIR, "2abc_llm_input_functional_correctness.csv")
)

# Initial output of ReviewPro; drop the leftover "Unnamed: 0" index column.
output_df = pd.read_csv(
    os.path.join(DATASETS_DIR, "2abc_llm_output_functional_correctness.csv")
)
output_df.drop(columns=["Unnamed: 0"], inplace=True)

# Response when asked to explain the evaluation score.
response_df = pd.read_csv(
    os.path.join(DATASETS_DIR, "2a_llm_output_explainability.csv")
)

# Preview the column names of the input and the cleaned output dataframes.
print(input_df.columns)
output_df.columns

### Save evidence (the LLM explanation of the responses) to the specific scenario

In [None]:
def pull_explination(filename):
    """Reads the explainability CSV and returns its `response` column as a list."""
    print(filename)
    response_df = pd.read_csv(filename)
    print(response_df.columns)

    return response_df.response.tolist()
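
The function above relies on the CSV having a `response` column. As a quick way to check a file outside the notebook, the same extraction can be sketched with the standard library's `csv` module instead of pandas. This is an illustration under stated assumptions: `read_response_column` is a hypothetical helper, and the sample rows are made up rather than taken from the real dataset.

```python
import csv
import os
import tempfile


def read_response_column(filename, column="response"):
    """Return every value of `column` from a CSV file, in row order."""
    with open(filename, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early with a clear message if the expected column is absent.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"{filename} has no '{column}' column")
        return [row[column] for row in reader]


# Illustrative data only; not taken from the real dataset.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_explainability.csv")
    with open(sample_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["prompt", "response"])
        writer.writerow(["Explain the score.", "The score reflects goals met."])
        writer.writerow(["Explain the score.", "The self review supports it."])
    sample_explanations = read_response_column(sample_path)
```

The result is a plain list of strings, one per row, which matches the shape that the pandas-based function returns from `tolist()`.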

In [None]:
from mlte.measurement.external_measurement import ExternalMeasurement
from mlte.evidence.types.array import Array


# Wrap the CSV reader as an external measurement, evaluate it, and save the evidence to the MLTE store.
evi_collector = ExternalMeasurement(
    "LLM provides evidence", Array, pull_explination
)

evi = evi_collector.evaluate(
    os.path.join(DATASETS_DIR, "2a_llm_output_explainability.csv")
)
evi.save(force=True)
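
The test case metadata states that the explanations need to be manually validated for pass/fail. One possible way to record that review is sketched below. This is not part of MLTE: `ManualVerdict` and `summarize_verdicts` are hypothetical names, and the sample verdicts are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ManualVerdict:
    """A reviewer's decision on one explanation (hypothetical structure)."""

    index: int          # position of the explanation in the evidence array
    explanation: str    # the text that was reviewed
    passed: bool        # True if the reviewer judged it acceptable
    note: str = ""      # optional reviewer comment


def summarize_verdicts(verdicts):
    """Count passes and list the indices of explanations that failed review."""
    passed = sum(1 for verdict in verdicts if verdict.passed)
    failed_indices = [verdict.index for verdict in verdicts if not verdict.passed]
    return {"total": len(verdicts), "passed": passed, "failed_indices": failed_indices}


# Illustrative verdicts only; real ones would come from a human reviewer.
sample_verdicts = [
    ManualVerdict(0, "The score reflects goals met.", True),
    ManualVerdict(1, "N/A", False, "No reference to the supporting material."),
]
summary = summarize_verdicts(sample_verdicts)
```

Keeping the index alongside each verdict makes it straightforward to trace a failed review back to the corresponding row of the explainability CSV.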