## 3b. Manual Validation - Adding Success/Failure validation to qualitative QAS

Some test cases require evidence that the tool cannot gather automatically, so we define that evidence outside the tool and then manually mark each case as Success or Failure. Note that some functions are loaded from external Python files.

In [1]:
# Set up the context for the model in use, plus the constants for the folders and model data.
from demo.GradientClimber.session import *

Creating initial custom lists at URI: local:///Users/jhansen/continuum/mlte/demo/GradientClimber/../store
Loaded 7 qa_categories for initial list
Loaded 30 quality_attributes for initial list
Creating sample catalog at URI: StoreType.LOCAL_FILESYSTEM:local:///Users/jhansen/continuum/mlte/demo/GradientClimber/../store
Loading sample catalog entries.
Loaded 9 entries for sample catalog.


In [2]:
from mlte.results.test_results import TestResults

# Load test results
test_results = TestResults.load()

# See which are marked as Info (not validated)
test_results.print_results(result_type="Info")

 > Test Case: interpretability, result: Info, details: Inspect project code and documentation.
 > Test Case: deployability, result: Info, details: Inspect project code and documentation.
 > Test Case: portability, result: Info, details: Inspect project code and documentation.

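Before writing the manual entries, it can help to check that they line up with the test cases still marked as Info. The following is a minimal sketch in plain Python that does not call the MLTE API: `check_manual_entries`, `REQUIRED_KEYS`, and `PENDING_IDS` are hypothetical names introduced here, and the `"Success"` strings stand in for the result classes used in the notebook. The pending IDs are copied from the output above.

```python
# Hypothetical helper, not part of MLTE: sanity-check manual validation entries.
REQUIRED_KEYS = {"id", "result", "message"}

# Test case IDs reported as Info in the output above.
PENDING_IDS = ["interpretability", "deployability", "portability"]


def check_manual_entries(entries, pending_ids):
    """Return a list of problems; an empty list means the entries look consistent."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        # Every entry needs an id, a result, and a message.
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
            continue
        case_id = entry["id"]
        if case_id in seen:
            problems.append(f"entry {index}: duplicate id '{case_id}'")
        seen.add(case_id)
        # Only cases still awaiting validation should be converted by hand.
        if case_id not in pending_ids:
            problems.append(f"entry {index}: '{case_id}' is not pending")
    # Report pending cases that have no usable manual entry.
    for case_id in sorted(set(pending_ids) - seen):
        problems.append(f"'{case_id}' has no manual entry")
    return problems


example_entries = [
    {"id": "interpretability", "result": "Success", "message": "Checked by hand."},
    {"id": "deployability", "result": "Success", "message": "Checked by hand."},
    {"id": "portability", "result": "Success", "message": "Checked by hand."},
]

print(check_manual_entries(example_entries, PENDING_IDS))
```

Running a check like this first means a typo in an ID shows up as a readable message rather than as an error, or a silently skipped case, when the results are converted.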

Now we manually validate every case that we know was not validated automatically.

In [3]:
# Result types used to mark a manually validated test case.
from mlte.results.result import Failure, Success

MANUAL_VALIDATION = [
    {
        "id": "interpretability",
        "result": Success,
        "message": "Model output is from the RL Gym mountain car example, which can only produce 0, 1, or 2 as output.",
    },
    {
        "id": "deployability",
        "result": Success,
        "message": "Verified that the q-table file is available and of a manageable size.",
    },
    {
        "id": "portability",
        "result": Success,
        "message": "Data is in the form of an npy file which is portable by design.",
    },
]

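# Convert each Info result to the manually assigned result type.
for r in MANUAL_VALIDATION:
    test_results.convert_result(
        test_case_id=r["id"], result_type=r["result"], message=r["message"]
    )

# Print all results, then save them, overwriting any previously stored results.
test_results.print_results()
test_results.save(force=True)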

 > Test Case: functional correctness, result: Success, details: All accuracies are equal to or over threshold 0.999 - values: ["[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,..."]
 > Test Case: reliability, result: Failure, details: One or more accuracies are below threshold 0.999 - values: ["[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,..."]
 > Test Case: gradient climber position accuracy, result: Success, details: All accuracies are equal to or over threshold 0.999 - values: ["[1, 1,

ArtifactModel(header=ArtifactHeaderModel(identifier='results.default', type='results', timestamp=1761661541, creator=None, level='version'), body=TestResultsModel(artifact_type=<ArtifactType.TEST_RESULTS: 'results'>, test_suite_id='suite.default', test_suite=TestSuiteModel(artifact_type=<ArtifactType.TEST_SUITE: 'suite'>, test_cases=[TestCaseModel(identifier='functional correctness', goal='Check if model complies with position requirements', qas_list=['default.card-qas_001'], measurement=None, validator=ValidatorModel(bool_exp='gASVuQIAAAAAAACMCmRpbGwuX2RpbGyUjBBfY3JlYXRlX2Z1bmN0aW9ulJOUKGgAjAxfY3JlYXRlX2NvZGWUk5QoQwgMAQT/BgII/pRLAUsASwBLAUsESxNDInQAhwBmAWQBZAKECHwAagFEAIMBgwF0AnwAagGDAWsCUwCUTmgEKEMEBgEI/5RLAUsASwBLAksDSzNDGIEAfABdB30BfAGIAGsFVgABAHECZABTAJROhZQpjAIuMJSMAWeUhpSMOi9Vc2Vycy9qaGFuc2VuL2NvbnRpbnV1bS9tbHRlL2RlbW8vc2NlbmFyaW9zL3ZhbGlkYXRvcnMucHmUjAk8Z2VuZXhwcj6USxJDCAKABAAIAQr/lIwJdGhyZXNob2xklIWUKXSUUpSMRmFsbF9hY2N1cmFjaWVzX21vcmVfb3JfZXF1YWxfdGhhbi48bG9jYWxzPi48bGFtYmRhPi