
## 2c. Evidence - Reliability

The evidence collected in this section checks functional correctness for the Reliability example.

In [1]:
{
    "tags": ["Reinforcement Learning"],
    "quality_attribute": "Model outputs improve progress towards goal 99.9% of the time.",
    "description": "Model receives valid values from sensors during Normal Operation and produces outputs (actions) which improve the expected reward 99.9% of the time.  ",
    "inputs": "Initial random start position",
    "output": "Log with 1 for action determined to make progress, and 0 for those that do not.",
}

{'tags': ['Reinforcement Learning'],
 'quality_attribute': 'Model outputs improve progress towards goal 99.9% of the time.',
 'description': 'Model receives valid values from sensors during Normal Operation and produces outputs (actions) which improve the expected reward 99.9% of the time.  ',
 'inputs': 'Initial random start position',
 'output': 'Log with 1 for action determined to make progress, and 0 for those that do not.'}

### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces. The import below also sets up global constants for the folders and the model to use.

In [2]:
# Sets up context for the model being used, sets up constants related to folders and model data to be used.
from session import *

Creating initial custom lists at URI: local:///Users/jhansen/continuum/mlte/demo/GradientClimber/../store
Loaded 7 qa_categories for initial list
Loaded 30 quality_attributes for initial list
Creating sample catalog at URI: StoreType.LOCAL_FILESYSTEM:local:///Users/jhansen/continuum/mlte/demo/GradientClimber/../store
Loading sample catalog entries.
Loaded 13 entries for sample catalog.


### Set up scenario test case

In [None]:
from mlte.negotiation.artifact import NegotiationCard

card = NegotiationCard.load()
qa = 1
print(card.quality_scenarios[qa].identifier)
print(card.quality_scenarios[qa].quality)
print(
    card.quality_scenarios[qa].stimulus,
    "from ",
    card.quality_scenarios[qa].source,
    " during ",
    card.quality_scenarios[qa].environment,
    ". ",
    card.quality_scenarios[qa].response,
    card.quality_scenarios[qa].measure,
)

default.card-qas_001
Accuracy
After initialization, the from  user activates the gradient climber system  during  normal operation with the vehicle stationary and located at the bottom of a hill .  The model outputs manipulate the vehicle state to produces a position=0.6 at some time $t$<250


**A specific test case generated from the scenario:**

**Data and Data Source:**	Vehicle state (position and velocity) from sensors (or approximated by simulation engine in development)

**Measurement and Condition:**	Correctness is measured for all outputs of multiple runs, using a heuristic provided by the `evaluate_action` function implemented in Python.

**Context:**	Normal Operation

### Helper Functions

In [4]:
MEASURE_NAME = "reliability"
NUM_TRIALS = 100

In [5]:
import os  # used below to build the Q-table path

import numpy as np
import gymnasium as gym

In [6]:
env = gym.make("MountainCar-v0", render_mode="rgb_array")
state, info = env.reset()

In [7]:
# Discretize the state space (position, velocity)
position_bins = np.linspace(-1.2, 0.6, 20)
velocity_bins = np.linspace(-0.07, 0.07, 20)

# Load the pre-trained Q-table
q_table = np.load(os.path.join(DATA_DIR, "mountain_car.npy"))


# Discretize the continuous state (position and velocity)
def discretize_state(state):
    position, velocity = state
    position_idx = (
        np.digitize(position, position_bins) - 1
    )  # Position bin index
    velocity_idx = (
        np.digitize(velocity, velocity_bins) - 1
    )  # Velocity bin index
    return position_idx, velocity_idx


# Greedy action selection: pick the action with the highest Q-value
def choose_action(state):
    position_idx, velocity_idx = discretize_state(state)
    return np.argmax(q_table[position_idx, velocity_idx])
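
The binning used by `discretize_state` can be checked in isolation. The sketch below reuses the same bin edges and adds a clipping step, which is an assumption and not part of the notebook: without it, a value below the first edge would yield index -1, which NumPy indexing treats as the last element. The helper name `discretize_clipped` is hypothetical.

```python
import numpy as np

# Same bin edges as the helper above: 20 edges per state dimension,
# so the resulting indices range from 0 to 19.
position_bins = np.linspace(-1.2, 0.6, 20)
velocity_bins = np.linspace(-0.07, 0.07, 20)


def discretize_clipped(state):
    """Map (position, velocity) to bin indices, clipping to the bin range first.

    Hypothetical variant of discretize_state; the clipping is an added guard.
    """
    position, velocity = state
    position = np.clip(position, position_bins[0], position_bins[-1])
    velocity = np.clip(velocity, velocity_bins[0], velocity_bins[-1])
    position_idx = int(np.digitize(position, position_bins)) - 1
    velocity_idx = int(np.digitize(velocity, velocity_bins)) - 1
    return position_idx, velocity_idx


print(discretize_clipped((-1.2, -0.07)))  # (0, 0): lowest bin in both dimensions
print(discretize_clipped((-0.5, 0.0)))    # (7, 9)
print(discretize_clipped((0.6, 0.07)))    # (19, 19): highest bin in both dimensions
print(discretize_clipped((-2.0, 0.5)))    # (0, 19): out-of-range values are clipped
```

For in-range states the indices match what `discretize_state` returns, so the clipping only matters if a state ever falls outside the bin edges.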

## Reliability

Agent receives valid values from sensors during Normal Operation. Agent produces actions which improve the expected reward 99.9% of the time.
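
As a rough illustration of how the 99.9% requirement relates to the 0/1 log described in the test case, the sketch below computes the fraction of logged actions marked 1 and compares it to the threshold. The function names and the sample log are hypothetical, and this is not how MLTE itself validates the evidence.

```python
def success_rate(results):
    """Return the fraction of logged actions marked 1 (made progress)."""
    if not results:
        raise ValueError("no results to summarize")
    return sum(results) / len(results)


def meets_threshold(results, threshold=0.999):
    """Check the log against the 99.9% target from the quality attribute."""
    return success_rate(results) >= threshold


# Made-up log for illustration only: 9 good actions out of 10.
sample_log = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
print(f"{success_rate(sample_log):.1%}")  # 90.0%
print(meets_threshold(sample_log))        # False
```

Under this definition, a log may contain at most one 0 per 1,000 entries to meet the target.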

In [8]:
def evaluate_action(state, action):
    """Return True (1) if this is the expected action, False (0) if it is the wrong move, and -1 as an error condition."""
    position, velocity = state
    # Use the built-in bool: np.bool is not available in every NumPy release
    if position < 0.1 and velocity < 0:
        return bool(action == 0)
    if position < 0.1 and velocity > 0:
        return bool(action == 2)
    if position > 0.1 and velocity > 0:
        return bool(action == 0)
    # NOTE: this condition repeats the second branch, so it is never reached;
    # states with position > 0.1 and velocity < 0 fall through to -1.
    if position < 0.1 and velocity > 0:
        return bool(action == 2)
    return -1

In [9]:
def test_reliability():
    """Run NUM_TRIALS episodes and log 1 for each expected action and 0 for each wrong one."""
    actions = []
    test_results = []

    for i in range(NUM_TRIALS):
        state, info = env.reset()
        done = False

        while not done:
            # Greedy action selection from the Q-table
            action = choose_action(state)
            actions.append(action)

            # Take the action and get the next state, reward, end-of-episode flags, and info
            next_state, reward, terminated, truncated, info = env.step(action)
            # End the episode when the goal is reached or the time limit truncates it
            done = terminated or truncated

            # Evaluate the action; error results (-1) are not logged
            result = evaluate_action(state, action)
            if result == True:
                test_results.append(1)
            elif result == False:
                test_results.append(0)

            # Update the state for the next iteration
            state = next_state
        print(f"Completed trial {i}")

    return test_results

In [10]:
from mlte.evidence.types.array import Array
from mlte.measurement.external_measurement import ExternalMeasurement

# Evaluate reliability; the identifier has to match the one defined in the TestSuite.
position_compliance_measurement = ExternalMeasurement(
    MEASURE_NAME, Array, test_reliability
)
evidence = position_compliance_measurement.evaluate()

# Inspect value
print(evidence)

# Save to artifact store
evidence.save(force=True, parents=True)

Completed trial 0
Completed trial 1
Completed trial 2
Completed trial 3
Completed trial 4
Completed trial 5
Completed trial 6
Completed trial 7
Completed trial 8
Completed trial 9
Completed trial 10
Completed trial 11
Completed trial 12
Completed trial 13
Completed trial 14
Completed trial 15
Completed trial 16
Completed trial 17
Completed trial 18
Completed trial 19
Completed trial 20
Completed trial 21
Completed trial 22
Completed trial 23
Completed trial 24
Completed trial 25
Completed trial 26
Completed trial 27
Completed trial 28
Completed trial 29
Completed trial 30
Completed trial 31
Completed trial 32
Completed trial 33
Completed trial 34
Completed trial 35
Completed trial 36
Completed trial 37
Completed trial 38
Completed trial 39
Completed trial 40
Completed trial 41
Completed trial 42
Completed trial 43
Completed trial 44
Completed trial 45
Completed trial 46
Completed trial 47
Completed trial 48
Completed trial 49
Completed trial 50
Completed trial 51
Completed trial 52
Com

ArtifactModel(header=ArtifactHeaderModel(identifier='evidence.reliability', type='evidence', timestamp=1762872652, creator=None, level='version'), body=EvidenceModel(artifact_type=<ArtifactType.EVIDENCE: 'evidence'>, metadata=EvidenceMetadata(test_case_id='reliability', measurement=MeasurementMetadata(measurement_class='mlte.measurement.external_measurement.ExternalMeasurement', output_class='mlte.evidence.types.array.Array', additional_data={'function': '__main__.test_reliability'})), evidence_class='mlte.evidence.types.array.Array', value=ArrayValueModel(evidence_type=<EvidenceType.ARRAY: 'array'>, data=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,