

Expectations #596

Merged: 15 commits into main, Apr 12, 2024
Conversation

@hinthornw (Collaborator) commented Apr 10, 2024

  • change default dataset creation (should be unique per file)

  • change env vars to use the LANGSMITH_ prefix

Want feedback on:

  • ergonomics: is this a general UX we want to support?

  • imports: should I re-implement string distance, embedding distance, etc.?

  • Do we want default implementations?

  • Any others we ought to include at the outset?

  • I could also do a generic expect(value).is(...) or something similar: do we want anything super generic like that?
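To make the last bullet concrete, a fully generic matcher might look something like the sketch below. All of the names here (`_Matcher`, `to_equal`, `to_contain`) are hypothetical illustrations for discussion, not the API in this PR:

```python
# Hypothetical sketch of a generic expect(value) API; none of these
# names come from the PR itself.
class _Matcher:
    """Wraps a value and exposes chainable assertion helpers."""

    def __init__(self, value):
        self.value = value

    def to_equal(self, expected):
        assert self.value == expected, f"{self.value!r} != {expected!r}"
        return self

    def to_contain(self, item):
        assert item in self.value, f"{item!r} not in {self.value!r}"
        return self

    def to_be_less_than(self, bound):
        assert self.value < bound, f"{self.value!r} >= {bound!r}"
        return self


def expect(value):
    return _Matcher(value)


expect("Hello there!").to_contain("Hello")
expect(0.3).to_be_less_than(0.5)
```

Each helper returns `self` so assertions can chain, which mirrors the fluent style of the `to_be_less_than` calls in the example below.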

Example:

```python
from openai import OpenAI  # assumes the openai>=1.0 client
from langsmith import expect, unit

oai_client = OpenAI()


@unit(inputs=x, outputs=y)  # x and y are defined elsewhere in the test module
def test_output_semantically_close():
    response = oai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello!"},
        ],
    )
    # The embedding_distance call logs the embedding distance to LangSmith
    expect.embedding_distance(
        prediction=response.choices[0].message.content,
        reference="Hello!",
        # The following optional assertion logs a
        # pass/fail score to LangSmith
        # and raises an AssertionError if the assertion fails.
    ).to_be_less_than(0.5)
    # Compute the Damerau-Levenshtein distance
    expect.string_distance(
        prediction=response.choices[0].message.content,
        reference="Hello!",
        # And then log a pass/fail score to LangSmith
    ).to_be_less_than(0.5)
```
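For intuition on what `string_distance` scores against a 0.5 threshold, here is a minimal sketch of the Damerau-Levenshtein distance (optimal string alignment variant), normalized to [0, 1] by the longer string's length. This is an illustrative implementation only; the library's exact metric and normalization may differ:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment variant of Damerau-Levenshtein distance.

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent characters.
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def normalized_distance(a: str, b: str) -> float:
    """Scale the edit distance into [0, 1]; 0.0 means identical strings."""
    return osa_distance(a, b) / max(len(a), len(b), 1)
```

Under this normalization, identical strings score 0.0 and entirely different strings approach 1.0, so asserting `to_be_less_than(0.5)` roughly means "more than half the characters line up."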

The idea is that this remains an easy on-ramp for developers to quickly write some scoring functions and get them running regularly in CI.

@hinthornw force-pushed the wfh/expectations branch 4 times, most recently from acad86e to 3a6406e, April 11, 2024 01:49
@hinthornw force-pushed the wfh/expectations branch 3 times, most recently from c10dbe9 to 2a269bc, April 11, 2024 02:16
@hinthornw marked this pull request as ready for review April 11, 2024 02:24
```diff
@@ -114,13 +114,13 @@ def unit(*args: Any, **kwargs: Any) -> Callable:
 Caching is faster if you install libyaml. See
 https://vcrpy.readthedocs.io/en/latest/installation.html#speed for more details.

->>> os.environ["LANGCHAIN_TEST_CACHE"] = "tests/cassettes"
+>>> # os.environ["LANGCHAIN_TEST_CACHE"] = "tests/cassettes"
```
@hinthornw (Collaborator, Author) commented:
It's playing funky with doctest - looking into it more tonight

@rlancemartin self-requested a review April 12, 2024 17:50
@hinthornw merged commit 0308d11 into main Apr 12, 2024
7 checks passed
@hinthornw deleted the wfh/expectations branch April 12, 2024 18:10
hinthornw added a commit that referenced this pull request Apr 12, 2024