## 2f. Evidence - Time Behaviour QAS Measurements

Evidence collected in this section checks the time behaviour quality attribute scenario (QAS) defined in the previous step. Note that some functions and data are loaded from external Python files.

### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces. These imports also set up global constants for the folders and the model to use.

In [None]:
# Sets up context for the model being used, sets up constants related to folders and model data to be used.
from session import *
from session_LLMinfo import *

### Set up scenario test case

In [None]:
from mlte.negotiation.artifact import NegotiationCard

# Load the negotiation card and select the time behaviour quality scenario (index 5)
card = NegotiationCard.load()
qa = 5
scenario = card.quality_scenarios[qa]
print(scenario.identifier)
print(scenario.quality)
# Print the scenario as a single sentence without the extra spaces print() inserts between arguments
print(
    f"{scenario.stimulus} from {scenario.source} during {scenario.environment}. "
    f"{scenario.response} {scenario.measure}"
)

### A specific test case generated from the scenario

**Data and Data Source:**	The original test data set can be used.

**Measurement and Condition:**	The model's response to any single prompt completes no more than 10 seconds after the prompt is submitted; that is, the longest single-prompt response time is at most 10 seconds.

**Context:**	Normal Operation
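The measurement condition reduces to a single comparison: the largest per-prompt response time must not exceed 10 seconds. A minimal sketch of that check, assuming the response times are collected as a list of floats in seconds; `meets_time_requirement` is a hypothetical helper for illustration, not part of MLTE (the actual validation is defined in the test suite).

```python
from typing import Sequence


def meets_time_requirement(latencies_s: Sequence[float], limit_s: float = 10.0) -> bool:
    """Return True if the slowest single-prompt response took at most limit_s seconds.

    Hypothetical helper for illustration only; not an MLTE API.
    """
    if not latencies_s:
        raise ValueError("no latencies were measured")
    # The scenario bounds the worst case, so only the maximum matters
    return max(latencies_s) <= limit_s


# Usage: a single prompt at 10.4 s violates the 10-second bound
print(meets_time_requirement([1.2, 3.8, 9.9]))   # True
print(meets_time_requirement([1.2, 10.4, 2.0]))  # False
```

Checking only the maximum matches the wording of the condition; a mean or median could pass even when one prompt is far too slow.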

### Gather evidence

In [None]:
# Import necessary packages
import os
import time

import pandas as pd

from evaluation_helpers import *

In [None]:
# Read the file with the input data (prompts) used for the evaluation
input_df = pd.read_csv(
    os.path.join(DATASETS_DIR, "5bc_llm_input_functional_correctness.csv")
)
# response_df = pd.read_csv(os.path.join(DATASETS_DIR, '5e_llm_output_robustness.csv'))
# response_df.drop(columns=['Unnamed: 0'],inplace=True)
# input_df.drop(columns=['Unnamed: 0'],inplace=True)

input_df

### Save evidence to the specified scenario

In [None]:
# Build the chain from the prompt template and LLM defined in the session setup
chain = prompt_template | llm


def time_model(input_df):
    """Invoke the chain once per row and return each prompt's response time in seconds."""
    time_performances = []
    for _, row in input_df.iterrows():
        pii_data = {
            "employee_name": row.EmployeeName,
            "goals_and_objectives": row.goalsAndObjectives,
            "self_eval": row.employeeSelfEval,
            "manager_comments": row.managerComments,
        }

        # Time only the model call, using a monotonic high-resolution clock
        start_time = time.perf_counter()
        chain.invoke(pii_data)
        time_performances.append(time.perf_counter() - start_time)

    return time_performances
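The timing pattern can be checked without calling the real LLM. A minimal sketch, assuming a hypothetical `fake_invoke` stub that sleeps in place of `chain.invoke` and a hypothetical `time_calls` helper; it shows the clock being started and stopped around each call inside the loop, so every prompt gets its own measurement.

```python
import time
from typing import Any, Callable, Iterable, List


def time_calls(invoke: Callable[[Any], Any], inputs: Iterable[Any]) -> List[float]:
    """Call invoke once per input and return each call's wall-clock duration in seconds."""
    durations = []
    for item in inputs:
        start = time.perf_counter()  # monotonic, high-resolution clock
        invoke(item)
        durations.append(time.perf_counter() - start)
    return durations


def fake_invoke(payload: dict) -> str:
    """Hypothetical stand-in for chain.invoke: sleeps for the requested delay."""
    time.sleep(payload["delay_s"])
    return "ok"


# Usage: three simulated prompts produce three separate measurements
latencies = time_calls(
    fake_invoke, [{"delay_s": 0.01}, {"delay_s": 0.05}, {"delay_s": 0.02}]
)
print([round(t, 2) for t in latencies])
```

Exact printed values vary from run to run, but each entry is at least its simulated delay.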

In [None]:
from mlte.evidence.types.array import Array
from mlte.measurement.external_measurement import ExternalMeasurement

# Measure per-prompt response times; the identifier must match the one defined in the TestSuite.
time_measurement = ExternalMeasurement(
    "results returned promptly", Array, time_model
)
time_res = time_measurement.evaluate(input_df)

# Inspect value
print(time_res)

# Save to artifact store
time_res.save(force=True)
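Beyond the pass/fail condition, a short summary of the measured response times can help when interpreting the saved evidence. A minimal sketch using only the standard library, assuming the per-prompt times are available as a plain list of seconds (for example, the list returned by `time_model`); `summarize_latencies` is a hypothetical helper, and the sample values below are illustrative, not measured results.

```python
import statistics
from typing import Dict, Sequence, Union


def summarize_latencies(latencies_s: Sequence[float]) -> Dict[str, Union[int, float]]:
    """Summarize per-prompt response times given in seconds (hypothetical helper)."""
    return {
        "count": len(latencies_s),
        "max_s": max(latencies_s),
        "mean_s": statistics.mean(latencies_s),
        "median_s": statistics.median(latencies_s),
    }


# Usage with illustrative values
summary = summarize_latencies([2.0, 4.0, 3.0, 7.0])
print(summary)  # {'count': 4, 'max_s': 7.0, 'mean_s': 4.0, 'median_s': 3.5}
```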