json schema generation - differences between pydantic and msgspec #640

Closed
yqiang opened this issue Jan 31, 2024 · 3 comments

@yqiang

yqiang commented Jan 31, 2024

Question

I ran into an issue when taking the JSON Schema generated from a msgspec.Struct and passing it to OpenAI's function_call APIs, which take a JSON Schema as input to define how the output should be produced. The API rejects the schema msgspec produces because there is no type field at the root level.

Below are two versions of JSON schemas generated from the same model (i.e., same fields). The first one is from msgspec, while the second one is from pydantic v2, which works fine with the openai API. I'm not sure which is more correct, but wanted to raise the issue in case it is something that the author can/wants to address.

Thanks again for such a great library!

msgspec version 0.18.6, pydantic version 2.6.0.

msgspec generated json schema

{
    '$ref': '#/$defs/MenuItemResponse2',
    '$defs': {
        'MenuItemResponse2': {
            'title': 'MenuItemResponse2',
            'type': 'object',
            'properties': {'menu_items': {'type': 'array', 'items': {'$ref': '#/$defs/MenuItem2'}}},
            'required': ['menu_items']
        },
        'MenuItem2': {
            'title': 'MenuItem2',
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'calories': {'type': 'number'},
                'protein': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'carbohydrates': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'dietary_fiber': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'saturated_fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'trans_fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'cholesterol': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'sodium': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'serving_size': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None}
            },
            'required': ['name', 'calories']
        }
    }
}

pydantic generated schema

{
    '$defs': {
        'MenuItem': {
            'properties': {
                'name': {'title': 'Name', 'type': 'string'},
                'calories': {'title': 'Calories', 'type': 'number'},
                'protein': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Protein'},
                'fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Fat'},
                'carbohydrates': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Carbohydrates'
                },
                'dietary_fiber': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Dietary Fiber'
                },
                'saturated_fat': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Saturated Fat'
                },
                'trans_fat': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Trans Fat'
                },
                'cholesterol': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Cholesterol'
                },
                'sodium': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Sodium'},
                'serving_size': {
                    'anyOf': [{'type': 'string'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Serving Size'
                }
            },
            'required': ['name', 'calories'],
            'title': 'MenuItem',
            'type': 'object'
        }
    },
    'properties': {'menu_items': {'items': {'$ref': '#/$defs/MenuItem'}, 'title': 'Menu Items', 'type': 'array'}},
    'required': ['menu_items'],
    'title': 'MenuItemResponse',
    'type': 'object'
}

Relevant exception from the openai python SDK:

BadRequestError: Error code: 400 - {'error': {'message': 'Invalid schema for function \'menu_items\': schema must be a JSON Schema of \'type: "object"\', got \'type: "None"\'.', 'type': 'invalid_request_error', 'param': None, 'code': None}}
@rafalkrupinski

I don't think type is required in the root schema; it's definitely not required anywhere else.

Sure, using $ref for the root element seems like an unnecessary indirection, but as long as it makes the code simpler (and Jim happier), I think it's fine.

I'd suggest reporting the problem to your service provider. You can also update the schema in your code; after all, it's represented as a dictionary.
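
One way to make that update, as a minimal sketch: since the schema is a plain dict, the root $ref can be replaced by the definition it points to, so that type appears at the top level. The helper name inline_root_ref is hypothetical and not part of msgspec, and it assumes the root $ref has the form #/$defs/<Name>, as in the msgspec output above. The example schema is a shortened stand-in, not the one from this issue.

```python
import copy


def inline_root_ref(schema: dict) -> dict:
    """Return a copy of `schema` with a root-level "$ref" replaced by its target.

    Hypothetical helper (not part of msgspec). Assumes the root "$ref" looks
    like "#/$defs/<Name>".
    """
    result = copy.deepcopy(schema)
    ref = result.pop("$ref", None)
    if ref is None:
        return result  # nothing to inline
    prefix = "#/$defs/"
    if not ref.startswith(prefix):
        raise ValueError(f"unsupported $ref: {ref!r}")
    target = result["$defs"][ref[len(prefix):]]
    # Copy the target's keywords ("type", "properties", ...) to the root.
    # "$defs" stays in place so nested references still resolve.
    result.update(copy.deepcopy(target))
    return result


# Shortened stand-in schema for illustration only.
example = {
    "$ref": "#/$defs/Response",
    "$defs": {
        "Response": {
            "title": "Response",
            "type": "object",
            "properties": {
                "items": {"type": "array", "items": {"$ref": "#/$defs/Item"}}
            },
            "required": ["items"],
        },
        "Item": {
            "title": "Item",
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
    },
}

inlined = inline_root_ref(example)
print(inlined["type"])  # object
```

The original dict is left unmodified, and the nested reference to Item still resolves because $defs is kept.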

@jcrist
Owner

jcrist commented Feb 7, 2024

Thanks for opening this @yqiang. Since object types are potentially cyclic, we always use "$ref" to refer to them by reference. We could make a change to avoid doing this for acyclic object types (the common case), but since the JSON schema spec allows it I'd rather not.
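
To illustrate the cyclic case: a self-referential type can only be described by reference, because inlining it would never terminate. The sketch below uses hand-written schemas (a hypothetical Node type, not taken from this issue) and a hypothetical has_cycle helper that checks whether any definition in $defs can reach itself through $ref. It is not how msgspec decides anything internally; it only shows the distinction between cyclic and acyclic definitions.

```python
def collect_refs(node) -> set:
    """Collect the names of all "#/$defs/<Name>" references inside `node`."""
    prefix = "#/$defs/"
    found = set()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(prefix):
            found.add(ref[len(prefix):])
        for value in node.values():
            found |= collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_refs(item)
    return found


def has_cycle(schema: dict) -> bool:
    """Return True if any definition in "$defs" can reach itself via "$ref"."""
    defs = schema.get("$defs", {})
    graph = {name: collect_refs(body) for name, body in defs.items()}

    def reaches(start, current, seen):
        # Depth-first search over the reference graph, looking for `start`.
        for nxt in graph.get(current, ()):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(start, nxt, seen):
                    return True
        return False

    return any(reaches(name, name, set()) for name in graph)


# Hypothetical tree node whose children are themselves nodes.
cyclic = {
    "$ref": "#/$defs/Node",
    "$defs": {
        "Node": {
            "type": "object",
            "properties": {
                "value": {"type": "integer"},
                "children": {"type": "array", "items": {"$ref": "#/$defs/Node"}},
            },
        }
    },
}

# Same shape as the schema in this issue: one object referring to another.
acyclic = {
    "$ref": "#/$defs/Response",
    "$defs": {
        "Response": {"type": "object", "properties": {"item": {"$ref": "#/$defs/Item"}}},
        "Item": {"type": "object", "properties": {"name": {"type": "string"}}},
    },
}

print(has_cycle(cyclic), has_cycle(acyclic))  # True False
```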

I suspect that OpenAI's code here has a naive check for an object type. What happens if you add a type key at the top level and do nothing else?

# existing msgspec json schema
schema["type"] = "object"
...

Does it properly handle the $ref field?

@yqiang
Author

yqiang commented Feb 12, 2024

Yup, the OpenAI API is happy if I manually add the type key with object as the value. Thanks for the response, I'll close this for now.
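
For anyone hitting the same error, the workaround confirmed above can be sketched as follows. The helper add_root_type and the function_def payload are illustrative assumptions: the schema here is a trimmed stand-in for the msgspec output (in practice it would come from msgspec.json.schema(...)), and the payload only mirrors the name/parameters shape implied by the error message. No API call is made.

```python
def add_root_type(schema: dict) -> dict:
    """Return a shallow copy of `schema` with "type": "object" at the root.

    Hypothetical helper; "$ref" and "$defs" are left untouched.
    """
    patched = dict(schema)
    patched.setdefault("type", "object")
    return patched


# Trimmed stand-in for the msgspec-generated schema shown in this issue.
schema = {
    "$ref": "#/$defs/MenuItemResponse2",
    "$defs": {
        "MenuItemResponse2": {
            "title": "MenuItemResponse2",
            "type": "object",
            "properties": {
                "menu_items": {"type": "array", "items": {"$ref": "#/$defs/MenuItem2"}}
            },
            "required": ["menu_items"],
        },
        "MenuItem2": {
            "title": "MenuItem2",
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "calories": {"type": "number"},
            },
            "required": ["name", "calories"],
        },
    },
}

# Assumed payload shape: a function name plus the patched schema as parameters.
function_def = {
    "name": "menu_items",
    "parameters": add_root_type(schema),
}

print(function_def["parameters"]["type"])  # object
```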

@yqiang yqiang closed this as completed Feb 12, 2024