
core[minor], openai[minor], langchain[patch]: output format on openai #17302

Merged: 26 commits into master from bagatur/rfc_structured_on_openai, Feb 22, 2024

Conversation

@baskaryan (Collaborator) commented Feb 9, 2024:

```python
from langchain_core.pydantic_v1 import BaseModel
from langchain_openai import ChatOpenAI

class Foo(BaseModel):
    bar: str

structured_llm = ChatOpenAI().with_output_format(Foo)
```
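Invoking the resulting runnable returns a `Foo` instance directly. A minimal usage sketch (the input string and extracted value are illustrative, not from this PR):

```python
foo = structured_llm.invoke("the bar is 'baz'")
# expected: Foo(bar="baz"), assuming the model extracts the value
```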

@efriis added the partner label Feb 9, 2024
@efriis self-assigned this Feb 9, 2024

```python
    *,
    mode: Literal["tools", "json"] = "tools",
    enforce_schema: bool = True,
    return_single: bool = True,
```
Collaborator:

What is `return_single`?

```python
    output_schema: Union[Dict[str, Any], Type[_BM]],
    *,
    mode: Literal["tools", "json"] = "tools",
    enforce_schema: bool = True,
```
Collaborator:

`enforce_schema` sounds like validation, but this is not a validation step; rather, the model is forced to make a tool invocation.

Collaborator (author):

What's a better name? I want something that isn't too OpenAI-tools-specific if we're going to add this method (or similar methods) to other models.


```python
def with_output_format(
    self,
    output_schema: Union[Dict[str, Any], Type[_BM]],
```
Collaborator:

Do we have an opinion on what to do about multiple schemas?

Collaborator (author):

What do you mean?

Collaborator:

The underlying API supports multiple tools, which could be represented as different schemas.

Collaborator (author):

Would you just want to use `bind_tools` in that case?
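For reference, a minimal sketch of the `bind_tools` route for multiple schemas (the `Person` and `Company` classes are hypothetical; `ChatOpenAI.bind_tools` accepts a list of pydantic classes and the model chooses which tool, if any, to call):

```python
from langchain_core.pydantic_v1 import BaseModel
from langchain_openai import ChatOpenAI

class Person(BaseModel):
    """A person mentioned in the text."""
    name: str

class Company(BaseModel):
    """A company mentioned in the text."""
    name: str

# With no forced tool choice, the model may call either tool (or none).
llm_with_tools = ChatOpenAI().bind_tools([Person, Company])
msg = llm_with_tools.invoke("Alice works at Acme.")
print(msg.additional_kwargs.get("tool_calls"))
```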

```python
    mode: Literal["tools", "json"] = "tools",
    enforce_schema: bool = True,
    return_single: bool = True,
) -> Runnable[LanguageModelInput, _BM]:
```
Collaborator:

Not sure that this should return a pydantic model by default.

To me, at least, there are two separate ideas: (i) schema specification, and (ii) optional validation against the schema.

Collaborator (author):

For schema specification, it's not hard to turn pydantic into a dict if you don't want pydantic output.

@efriis (Member) left a comment:

I like it. Small q

Comment on lines 63 to 64:

```python
_OutputSchema = TypeVar("_OutputSchema")
_OutputFormat = TypeVar("_OutputFormat")
```
Member:

Should we restrict these a bit? Right now they could be anything, right? Seems difficult to use with both.

Having just an output base model, forced to be either pydantic or a general JSON type (list/dict/etc.), doesn't seem too bad to me.

@dosubot (bot) added the lgtm label (PR looks good. Use to confirm that a PR is ready for merging.) Feb 14, 2024
""""""
if kwargs:
raise ValueError(f"Received unsupported arguments {kwargs}")
is_pydantic_schema = _is_pydantic_class(output_schema)
Collaborator:

I am still in favor of making a conceptual distinction between schema declaration and parsing.

For example, I will usually want to declare my schema with pydantic, but I don't necessarily want the output to be pydantic (e.g., I want to stream the structured JSON out).

Collaborator (author):

You can always do `dict_schema = convert_to_openai_tool(pydantic_schema)` before passing in the schema? I just don't like adding a parameter until we know it's really needed. We can always add it later.
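A sketch of that workaround, assuming `convert_to_openai_tool` from `langchain_core.utils.function_calling` (it accepts a pydantic class; passing the resulting dict in as the schema should yield dict output, which can be streamed as JSON):

```python
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_openai import ChatOpenAI

class Foo(BaseModel):
    """An example schema."""
    bar: str

# Declare with pydantic, but hand the model a plain dict schema.
dict_schema = convert_to_openai_tool(Foo)  # {'type': 'function', 'function': {...}}
structured_llm = ChatOpenAI().with_output_format(dict_schema)
```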

Collaborator:

OK, that will work. We can add a `validator='auto'` parameter later on, allowing one to set `validator=None` or `validator=Type[BaseModel]`, or something along those lines?


```python
def with_output_format(
    self,
    output_schema: _OutputSchema,
```
Collaborator:

Is there an escape hatch to say that nothing can be extracted?

`tool_choice = True` -> the tool call will always be forced.

The edge case that we need to handle (see the sketch after this list):

  • The schema is a single instance rather than a collection
  • Goal, for example: identify the most qualified candidate based on the following list of candidates and their qualifications
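A hedged sketch of one escape hatch for the single-instance case: make the extracted field optional, so "nothing qualifies" is still a valid forced tool call (the `BestCandidate` class is hypothetical):

```python
from typing import Optional
from langchain_core.pydantic_v1 import BaseModel, Field

class BestCandidate(BaseModel):
    """The single most qualified candidate, if any."""
    name: Optional[str] = Field(
        None, description="Name of the most qualified candidate, or null if none qualify."
    )
```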

@baskaryan changed the title from "rfc: output format on openai" to "core[minor], openai[minor]: output format on openai" Feb 15, 2024
@baskaryan marked this pull request as ready for review February 15, 2024 21:27
@dosubot (bot) added labels Feb 15, 2024: size:XL (This PR changes 500-999 lines, ignoring generated files), Ɑ: models (Related to LLMs or chat model modules), 🔌: openai (Primarily related to OpenAI integrations), 🤖:improvement (Medium size change to existing code to handle new use-cases)
@baskaryan (Collaborator, author) left a comment:

  • mv PydanticOutputParser from langchain -> core
  • update OpenAI tools parsers to support return_single
  • cp updated OpenAI tools parsers from langchain -> openai
  • introduce FormattedOutputMixin
  • update ChatOpenAI to implement FormattedOutputMixin

@baskaryan changed the title from "core[minor], openai[minor]: output format on openai" to "core[minor], openai[minor], langchain[minor]: output format on openai" Feb 15, 2024
@baskaryan changed the title from "core[minor], openai[minor], langchain[minor]: output format on openai" to "core[minor], openai[minor], langchain[patch]: output format on openai" Feb 15, 2024
libs/core/langchain_core/language_models/output_format.py: two outdated review threads (resolved)
Comment on lines 206 to 208:

```python
_BM = TypeVar("_BM", bound=BaseModel)
_OutputSchema = Union[Dict[str, Any], Type[_BM]]
_FormattedOutput = Union[BaseMessage, Dict, _BM]
```
Member:

Would it be simpler if we had these be the types for any model that supports structured outputs, so each implementation doesn't have to worry about generics? Just whether or not it supports structured outputs?

Comment on lines 757 to 769:

```python
    method: Literal["function_calling", "json_mode"] = "function_calling",
    return_type: Literal["all"] = "all",
    **kwargs: Any,
) -> Runnable[LanguageModelInput, _AllFormattedOutput]:
    ...

@overload
def with_output_format(
    self,
    schema: _OutputSchema,
    *,
    method: Literal["function_calling", "json_mode"] = "function_calling",
    return_type: Literal["parsed"] = "parsed",
```
Member:

The different defaults on the overloads are a bit confusing. Shouldn't the defaults be the same as on the catch-all definition?

Member:

I.e., shouldn't the `return_type` default always be `parsed`, or not have a default at all? For the `return_type: Literal["all"]` overload I think the user has to define it manually, so maybe it shouldn't have a default?

Collaborator (author):

The types can't overlap for overloads.
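A standalone sketch of that constraint (not the PR's code): each overload pins `return_type` to a disjoint `Literal`, and an overload's default value must be a member of its own `Literal` type, so the syntactic defaults necessarily differ; the real default lives on the implementation:

```python
from typing import Literal, Union, overload

@overload
def parse(*, return_type: Literal["parsed"] = "parsed") -> dict: ...
@overload
def parse(*, return_type: Literal["all"]) -> str: ...
def parse(*, return_type: Literal["parsed", "all"] = "parsed") -> Union[dict, str]:
    # The overloads only narrow types; this body carries the actual behavior.
    return {} if return_type == "parsed" else "raw output"

parsed = parse()                # type checks as dict
raw = parse(return_type="all")  # type checks as str
```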

```python
_OutputSchema = TypeVar("_OutputSchema")


class StructuredOutputMixin(Generic[_OutputSchema], ABC):
```
Contributor:

We are back to NOT putting this on the base class? Why not?

Collaborator (author):

It will only be documented on classes that implement it.

Contributor:

So the downside of putting it on the base class (and defaulting to not implemented) is that it will show up in the documentation for all classes? That doesn't seem that bad...

Collaborator (author):

What's the downside of the mixin?

Collaborator:

It's not that it shows up in documentation, but that it's hard to discover which models actually implement the interface.

After we implement this, we'll need to figure out how users will discover that this is an option.
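A sketch of how a mixin could still support programmatic discovery, assuming implementations subclass it (the import path is hypothetical; where the mixin lives is exactly what this thread is deciding):

```python
from langchain_core.pydantic_v1 import BaseModel
# Hypothetical import path for the mixin discussed above.
from langchain_core.language_models.output_format import StructuredOutputMixin
from langchain_openai import ChatOpenAI

class Foo(BaseModel):
    """Example schema."""
    bar: str

model = ChatOpenAI()
if isinstance(model, StructuredOutputMixin):  # discover support at runtime
    structured_llm = model.with_output_format(Foo)
```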

```python
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.runnables import Runnable

_OutputSchema = TypeVar("_OutputSchema")
```
Contributor:

Why do we need this? IMO it doesn't really add that much and makes it harder to use.

Collaborator (author):

Harder for the user or the contributor? I don't think it affects the user experience?

Contributor:

The contributor, plus any user that tries to read the code.

Collaborator (author):

In the places where the mixin is implemented there won't be a typevar, so what's the hard part for someone reading the code?

I don't feel super strongly on this, though, if we think the typevar is that bad.

@eyurtsev (Collaborator):

This fails with an opaque message due to a missing doc-string: the schema class has none of its own, so the inherited pydantic `BaseModel` doc-string becomes the tool description and exceeds OpenAI's length limit.

```python
from langchain_openai.chat_models import ChatOpenAI
from pydantic import BaseModel

model = ChatOpenAI()

class Person(BaseModel):
    name: str
    age: int

model.with_structured_output(Person).invoke('hello my name is chester')
```
```
BadRequestError: Error code: 400 - {'error': {'message': '"Usage docs: https://docs.pydantic.dev/2.6/concepts/models/ A base class for creating Pydantic models. Attributes:\\n    __class_vars__: The names of classvars defined on the model.\\n    __private_attributes__: Metadata about the private attributes of the model.\\n    __signature__: The signature for instantiating the model.     __pydantic_complete__: Whether model building is completed, or if there are still undefined fields.\\n    __pydantic_core_schema__: The pydantic-core schema used to build the SchemaValidator and SchemaSerializer.\\n    __pydantic_custom_init__: Whether the model has a custom `__init__` function.\\n    __pydantic_decorators__: Metadata containing the decorators defined on the model.\\n        This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.\\n    __pydantic_generic_metadata__: Metadata for generic models; contains data used for a similar purpose to\\n        __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.\\n    __pydantic_parent_namespace__: Parent namespace of the model, used for automatic rebuilding of models.\\n    __pydantic_post_init__: The name of the post-init method for the model, if defined.\\n    __pydantic_root_model__: Whether the model is a `RootModel`.\\n    __pydantic_serializer__: The pydantic-core SchemaSerializer used to dump instances of the model.\\n    __pydantic_validator__: The pydantic-core SchemaValidator used to validate instances of the model.     __pydantic_extra__: An instance attribute with the values of extra fields from validation when\\n        `model_config[\'extra\'] == \'allow\'`.\\n    __pydantic_fields_set__: An instance attribute with the names of fields explicitly set.\\n    __pydantic_private__: Instance attribute with the values of private attributes set on the model instance." is too long - \'tools.0.function.description\'', 'type': 'invalid_request_error', 'param': None, 'code': None}}
```
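A likely workaround, judging from the error above: give the schema class its own short doc-string so the inherited `BaseModel` doc-string is not picked up as the tool description:

```python
class Person(BaseModel):
    """Information about a person."""  # short doc-string stays under the description limit

    name: str
    age: int
```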

@eyurtsev (Collaborator):

Fails silently -- maybe we could add an instance check?

```python
from typing import List

from langchain_openai.chat_models import ChatOpenAI
from pydantic import BaseModel

model = ChatOpenAI()

class Person(BaseModel):
    name: str
    age: int

model.with_structured_output(List[Person]).invoke('hello my name is chester and i am 20 years old')
```
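A hedged sketch of the usual workaround, continuing the snippet above: `List[Person]` is not a pydantic class, so wrap the list in a container model that the pydantic-class check can recognize:

```python
class People(BaseModel):
    """All people mentioned in the text."""

    people: List[Person]

model.with_structured_output(People).invoke('hello my name is chester and i am 20 years old')
```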

```diff
@@ -22,6 +22,8 @@ class JsonOutputToolsParser(BaseGenerationOutputParser[Any]):
     """
     return_id: bool = False
     """Whether to return the tool call id."""
+    return_single: bool = False
```
Collaborator:

Do we think that `return_single` is generic enough to warrant introducing it into the parser? Is this the right place? How does it interact with streaming? Would unpacking in the caller code work?

Also, if we do keep it here, is there a better name? E.g., `first_tool_only`?

The doc-string should probably explain why this is necessary.

```python
)
else:
    key_name = convert_to_openai_tool(schema)["function"]["name"]
    output_parser = JsonOutputKeyToolsParser(
```
Collaborator:

Why not use a runnable lambda here to unpack? Is it to make a clearer trace?
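For reference, a minimal sketch of the runnable-lambda alternative being asked about (assuming the parser emits a list of tool-call dicts; the lambda would do the unpacking that `return_single` does inside the parser):

```python
from langchain_core.runnables import RunnableLambda

first_or_none = RunnableLambda(lambda calls: calls[0] if calls else None)
# e.g. chain = llm_with_tools | JsonOutputKeyToolsParser(key_name=key_name) | first_or_none
```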

@efriis removed their assignment Feb 21, 2024
@efriis self-assigned this Feb 22, 2024
@baskaryan merged commit b5f8cf9 into master Feb 22, 2024
91 checks passed
@baskaryan deleted the bagatur/rfc_structured_on_openai branch February 22, 2024 23:33
k8si pushed a commit to Mozilla-Ocho/langchain that referenced this pull request Feb 23, 2024
…structured_output langchain-ai#17302)

```python
class Foo(BaseModel):
  bar: str

structured_llm = ChatOpenAI().with_structured_output(Foo)
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
al1p pushed a commit to al1p/langchain that referenced this pull request Feb 27, 2024 (same commit message as above)

haydeniw pushed a commit to haydeniw/langchain that referenced this pull request Feb 27, 2024 (same commit message as above)