Replies: 2 comments 4 replies
-
Good point, we will address function-calling APIs soon. Our data so far shows that constrained grammars (like function calling) are worse than pure prompting, as also evidenced by the Berkeley Function Calling Leaderboard, where "prompt" beats "FC" nearly every time (see link), which is why we haven't focused much on it. In fact, function calling in the OpenAI APIs doesn't yet support enum values in your schemas (as of a month ago), nor does it support easy chain-of-thought. I think users generally opt for BAML not only for the easier type definitions (you can use `string[]` instead of making a wrapper List object), but also for things like the playground preview, the instant testing it enables, fuzzy JSON parsing, etc. In the future we might add a function-calling option. We'll revamp the comparison soon, and thanks for calling that out. We do want to make sure we're evaluating fairly.
-
Thanks for the additional info! I recognize that my perspective is skewed by the peculiarities of my use case, and I appreciate the additional context. I was unaware of how well prompt-only methods perform at generating reliable JSON! Speaking of fuzzy JSON parsing, is that something that can be used separately from BAML (Python), or is it tightly integrated? Since I'm particularly focused on guaranteeing that the generated JSON matches the specified schema, it seems like the fuzzy parsing could be an easy win. (Apologies for the discussion diversion here; I'd be happy to move to a separate discussion.)
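For readers curious what "fuzzy JSON parsing" means in practice: this is not BAML's actual parser, just a toy Python sketch of the idea — strip markdown code fences the model may emit and extract the first balanced `{...}` object from noisy output before handing it to `json.loads`. The function name `fuzzy_parse` and its heuristics are illustrative assumptions, not BAML's API:

```python
import json

def fuzzy_parse(raw: str) -> dict:
    """Toy sketch: pull the first balanced {...} object out of noisy LLM output."""
    # Strip common markdown code fences the model may wrap around JSON.
    text = raw.replace("```json", "").replace("```", "")
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    in_string = False
    escaped = False
    # Scan for the matching closing brace, ignoring braces inside strings.
    for i, ch in enumerate(text[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start : i + 1])
    raise ValueError("unbalanced JSON object")
```

A real fuzzy parser (BAML's included) handles far more cases than this, but even this much recovers output like `'Here you go:\n```json\n{"name": "Ada"}\n```'`.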
-
A lot of the complaints in the "Comparing Pydantic" workflow seem to stem from not using function calling, which makes me feel like the comparison is less valid. Many providers allow the user to specify the return JSON schema that the called "function" would use. Using the OpenAI API as an example: instead of using "json mode", users can provide the expected return schema in the `tools` array, set `tool_choice` to specify the schema to be returned, and then provide roughly the same instructions in the prompt to help the model perform the work prior to returning JSON. This workflow requires a lot less regex to parse the returned JSON, though it doesn't fix the issues regarding juggling different prompts and/or JSON schemas.