Extends base fern client with eval utilities by peadaroh · Pull Request #18 · humanloop/humanloop-python

peadaroh · 2024-10-04T13:16:07Z

This PR extends the auto-generated Humanloop client with additional utilities for running an Evaluation where the user manages their full application runtime.

Related fern docs for extending client with custom code: Augment with custom code — Fern

Specifically, we add a method humanloop.evaluations.run(...) that takes details of a file destination on Humanloop, a user-defined dataset, a user-defined function to evaluate, a name for the Evaluation, and details of its Evaluators.
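As a rough illustration of the user-supplied pieces, a module such as my_code might look like the sketch below. This is not code from this PR. The datapoint shape (`inputs` and `target` keys), the convention that the function receives a datapoint's inputs as keyword arguments, and the canned answers standing in for a real retrieval and LLM pipeline are all assumptions made for the example.

```python
# my_code.py: hypothetical example module. Names and data shapes are assumptions.

# Stand-in for a real retrieval + LLM pipeline, so the sketch runs offline.
_CANNED_ANSWERS = {
    "Which vitamin deficiency causes scurvy?": "Vitamin C",
    "Which organ produces insulin?": "Pancreas",
}


def load_dataset():
    """Return datapoints as plain dicts with assumed `inputs` and `target` keys."""
    return [
        {
            "inputs": {"question": "Which vitamin deficiency causes scurvy?"},
            "target": {"output": "Vitamin C"},
        },
        {
            "inputs": {"question": "Which organ produces insulin?"},
            "target": {"output": "Pancreas"},
        },
    ]


def ask_question(question: str) -> str:
    """Answer a question; here a dictionary lookup replaces the real application."""
    return _CANNED_ANSWERS.get(question, "I don't know")
```

Under these assumptions, each datapoint can be run locally with `ask_question(**datapoint["inputs"])`, which is useful for checking the function before handing it to the SDK.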

This is invoked like so:

# Assumes `hl` is an initialised Humanloop client and `PROMPT` is defined elsewhere.
from my_code import ask_question, load_dataset
checks = hl.evaluations.run(
    file={
        "path": "evals_demo/answer-flow",
        "type": "flow",
        "function": ask_question,
        "version": {"attributes": {"description": "Simple RAG, OpenAI + Chroma", "prompt": PROMPT}},
    },
    name="Staging CI",
    dataset={"path": "evals_demo/medqa-test", "datapoints": load_dataset()},
    evaluators=[
        {"path": "evals_demo/exact_match"},
        {"path": "evals_demo/levenshtein"},
        {"path": "evals_demo/reasoning", "threshold": 0.5},
    ],
    workers=4
)

With resulting output:
[Image: the resulting CLI output]

If the file or dataset does not yet exist on Humanloop, it will be created automatically; otherwise it will be updated appropriately.

If an Evaluation with the provided name already exists for the file, a new run is added to it; otherwise a new Evaluation is created automatically.
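The "add a run if the Evaluation exists, otherwise create it" behaviour can be pictured with a small in-memory model. This is only an illustrative sketch, not the SDK's implementation: the `Evaluation` class, the `add_run` helper, and the dictionary store keyed by file path and name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Evaluation:
    """Hypothetical stand-in for an Evaluation: a name plus its runs."""
    name: str
    runs: List[str] = field(default_factory=list)


def add_run(
    evaluations: Dict[Tuple[str, str], Evaluation],
    file_path: str,
    name: str,
    run: str,
) -> Evaluation:
    """Append a run to the Evaluation for (file_path, name), creating it if absent."""
    key = (file_path, name)
    if key not in evaluations:
        # No Evaluation with this name exists for the file yet, so create one.
        evaluations[key] = Evaluation(name=name)
    evaluations[key].runs.append(run)
    return evaluations[key]


store: Dict[Tuple[str, str], Evaluation] = {}
add_run(store, "evals_demo/answer-flow", "Staging CI", "run-1")
add_run(store, "evals_demo/answer-flow", "Staging CI", "run-2")  # reuses the Evaluation
add_run(store, "evals_demo/answer-flow", "Nightly", "run-1")     # creates a new one
```

In this model, calling with the same file and name twice yields one Evaluation holding two runs, while a new name yields a separate Evaluation.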

Evaluators referenced by path (or id) must already exist. However, we've also provisionally added an affordance for local (or external) Evaluators:

from my_code import ask_question, load_dataset
from my_evaluators import levenshtein_distance_optimized

checks = hl.evaluations.run(
    file={
        "path": "evals_demo/answer-flow",
        "type": "flow",
        "function": ask_question,
        "version": {"attributes": {"description": "Simple RAG, OpenAI + Chroma", "prompt": PROMPT}},
    },
    name="Staging CI",
    dataset={"path": "evals_demo/medqa-test", "datapoints": load_dataset()},
    evaluators=[
        {"path": "evals_demo/exact_match"},
        {"path": "evals_demo/levenshtein"},
        {"path": "evals_demo/reasoning", "threshold": 0.5},
        {
            "path": "evals_demo/levenshtein-optimized",
            "function": levenshtein_distance_optimized,
            "args_type": "target_required",
            "return_type": "number",
        }
    ],
    workers=4
)
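For illustration, a local Evaluator function matching the "target_required" and "number" settings above might look like the following. This is a minimal sketch under stated assumptions: the signature (a log dict and a target dict, each with an `output` key) is hypothetical and not confirmed by this PR, and the body is a standard two-row dynamic-programming Levenshtein distance rather than the author's actual `levenshtein_distance_optimized`.

```python
# my_evaluators.py: hypothetical example module. The evaluator signature is an assumption.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only two DP rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution
                )
            )
        previous = current
    return previous[-1]


def levenshtein_distance_optimized(log: dict, target: dict) -> int:
    """Score a log against its target; lower is better, 0 is an exact match.

    Assumed shapes: log["output"] holds the generated text and
    target["output"] holds the expected text.
    """
    return levenshtein(log.get("output") or "", target.get("output") or "")
```

Returning a plain number keeps the function consistent with a "number" return type, and taking the target as a second argument reflects the "target_required" setting.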

fern-api bot and others added 2 commits October 4, 2024 12:55

Release 0.8.6

af3fae2

Evaluation utils for the Humanloop SDK.

800ffb8

peadaroh mentioned this pull request Oct 4, 2024

Eval utilities #15

Closed

peadaroh added 3 commits October 4, 2024 14:19

Mark version as beta

e4ea407

Improved validations

c774fbb

Fix type hint

3808548

peadaroh merged commit 8ef6327 into master Oct 4, 2024

peadaroh merged 5 commits into master from eval-utilities-2
