

Expectations #596

Merged: 15 commits into main, Apr 12, 2024
Conversation

@hinthornw (Collaborator) commented Apr 10, 2024

  • change default dataset creation (should be unique per file)

  • change env vars to use the LANGSMITH_ prefix

Want feedback on:

  • ergonomics: is this a general UX we want to support?

  • imports: should I re-implement string distance, embedding distance, etc.?

  • Do we want default implementations?

  • Any others we ought to include at the outset?

  • I could also do a generic expect(value).is(...) or something similar: do we want anything super generic like that?
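To make the last bullet concrete, a fully generic matcher might look something like the sketch below. All of the names here (`_Matcher`, `to_equal`, `to_contain`) are hypothetical illustrations for discussion, not the API in this PR:

```python
# Hypothetical sketch of a generic expect(value) API; none of these
# names come from the PR itself.
class _Matcher:
    """Wraps a value and exposes chainable assertion helpers."""

    def __init__(self, value):
        self.value = value

    def to_equal(self, expected):
        assert self.value == expected, f"{self.value!r} != {expected!r}"
        return self

    def to_contain(self, item):
        assert item in self.value, f"{item!r} not in {self.value!r}"
        return self

    def to_be_less_than(self, bound):
        assert self.value < bound, f"{self.value!r} >= {bound!r}"
        return self


def expect(value):
    return _Matcher(value)


expect("Hello there!").to_contain("Hello")
expect(0.3).to_be_less_than(0.5)
```

Each helper returns `self` so assertions can chain, which mirrors the fluent style of the `to_be_less_than` calls in the example below.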

Example:

```python
from openai import OpenAI  # assumes the openai>=1.0 client
from langsmith import expect, unit

oai_client = OpenAI()


@unit(inputs=x, outputs=y)  # x and y are defined elsewhere in the test module
def test_output_semantically_close():
    response = oai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello!"},
        ],
    )
    # The embedding_distance call logs the embedding distance to LangSmith
    expect.embedding_distance(
        prediction=response.choices[0].message.content,
        reference="Hello!",
        # The following optional assertion logs a
        # pass/fail score to LangSmith
        # and raises an AssertionError if the assertion fails.
    ).to_be_less_than(0.5)
    # Compute the Damerau-Levenshtein distance
    expect.string_distance(
        prediction=response.choices[0].message.content,
        reference="Hello!",
        # And then log a pass/fail score to LangSmith
    ).to_be_less_than(0.5)
```
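For intuition on what `string_distance` scores against a 0.5 threshold, here is a minimal sketch of the Damerau-Levenshtein distance (optimal string alignment variant), normalized to [0, 1] by the longer string's length. This is an illustrative implementation only; the library's exact metric and normalization may differ:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment variant of Damerau-Levenshtein distance.

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent characters.
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def normalized_distance(a: str, b: str) -> float:
    """Scale the edit distance into [0, 1]; 0.0 means identical strings."""
    return osa_distance(a, b) / max(len(a), len(b), 1)
```

Under this normalization, identical strings score 0.0 and entirely different strings approach 1.0, so asserting `to_be_less_than(0.5)` roughly means "more than half the characters line up."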

The idea is that this remains an easy on-ramp for developers to quickly write some scoring functions and get them running regularly in CI.

@hinthornw force-pushed the wfh/expectations branch 4 times, most recently from acad86e to 3a6406e, April 11, 2024 01:49
@hinthornw force-pushed the wfh/expectations branch 3 times, most recently from c10dbe9 to 2a269bc, April 11, 2024 02:16
@hinthornw marked this pull request as ready for review April 11, 2024 02:24
```diff
@@ -114,13 +114,13 @@ def unit(*args: Any, **kwargs: Any) -> Callable:
 Caching is faster if you install libyaml. See
 https://vcrpy.readthedocs.io/en/latest/installation.html#speed for more details.

->>> os.environ["LANGCHAIN_TEST_CACHE"] = "tests/cassettes"
+>>> # os.environ["LANGCHAIN_TEST_CACHE"] = "tests/cassettes"
```
@hinthornw (Collaborator, Author) commented:
It's playing funky with doctest - looking into it more tonight

@rlancemartin self-requested a review April 12, 2024 17:50
@hinthornw merged commit 0308d11 into main Apr 12, 2024
7 checks passed
@hinthornw deleted the wfh/expectations branch April 12, 2024 18:10
hinthornw added a commit that referenced this pull request Apr 12, 2024