One of the main use cases of the `llm-eval` framework is to allow users to define and run many Evals against many Candidates via YAML files. The Check and Candidate classes can be serialized and deserialized via `to_dict()`/`from_dict()`, and a registration system is used to instantiate the Check/Candidate objects when they are deserialized.

However, the framework also allows users to define Checks/Candidates in code using callable objects such as lambda functions. Users can also define tests where the prompt is any type of object (rather than being restricted to string/dict/numeric values, as it is when defining Evals in YAML files), and Candidates can return any type of response. This functionality allows users to easily define and run Evals in code (e.g. in unit tests).

In [1]:
# set path to the root directory of the project
import os
os.chdir('..')

In [None]:
from llm_eval.candidates import CandidateResponse
from llm_eval.eval import Eval

def fake_candidate(input: dict) -> CandidateResponse:
    fake_response = {'my_response': f"This is a fake response for the prompt: '{str(input)}'."}
    # all Candidates must return a CandidateResponse object
    return CandidateResponse(response=fake_response)

eval_ = Eval(
    input=[{'my_prompt': "This is a user's prompt."}],
    checks=[
        # a ResponseData object is passed to all checks (Check or callable) from the Eval
        lambda data: 'my_response' in data.response,
        lambda data: 'fake response' in str(data.response),
        lambda data: 'my_prompt' in data.input,
    ],
)
result = eval_(fake_candidate)
print(f"Num checks: {result.num_checks}")
print(f"Num passed: {result.num_successful_checks}")
print(f"Perc Passed: {result.perc_successful_checks:.1%}")


---