## <b><font color='darkblue'>Preface</font></b>
([course link](https://learn.deeplearning.ai/courses/functions-tools-agents-langchain/lesson/4/openai-function-calling-in-langchain)) <b><font size='3ptx'>[Pydantic](https://docs.pydantic.dev/latest/) is a popular Python library that excels at data validation and parsing</font>. It's designed to make your code more robust and reliable by ensuring the data you work with conforms to specific structures and types.</b>

Its key features and benefits are:
* **Data Validation**: Pydantic uses type hints (available since Python 3.5) to define the expected shape and types of your data. It automatically validates incoming data against these definitions and raises clear errors when something doesn't match.
* **Data Parsing**: Pydantic not only validates data but also converts it into the correct types. For example, it can automatically turn a string representation of a date into a `datetime` object.
* **Models**: Pydantic's [**BaseModel**](https://docs.pydantic.dev/latest/api/base_model/) class provides a convenient way to create data models with clearly defined fields and types.
* **Custom Validation**: You can add custom validation logic to your models to handle more complex requirements.
* **Fast and Extensible**: Pydantic is built for performance and offers various customization options to fit your needs.
* **Widely Used**: It's a core component in popular frameworks like FastAPI and has a vast ecosystem of supporting tools.
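
As a minimal sketch of the validation and parsing behaviour listed above, the model below receives every value as a string and ends up with typed attributes. The `Event` class and its fields are hypothetical, introduced only for this illustration:

```python
from datetime import datetime

from pydantic import BaseModel


class Event(BaseModel):
    # Hypothetical model, used only for this illustration
    title: str
    starts_at: datetime
    seats: int


# Every value arrives as a string, as it might from a form or a JSON payload
event = Event(title="Kickoff", starts_at="2024-06-01T12:30:00", seats="25")

print(type(event.starts_at).__name__)  # datetime
print(event.seats + 1)                 # 26
```

The ISO-formatted string becomes a `datetime` object and the numeric string becomes an `int`, so later code can rely on the declared types.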

In [1]:
!pip freeze | grep -P '(openai|langchain)'

langchain==0.2.6
langchain-anthropic==0.1.15
langchain-community==0.2.6
langchain-core==0.2.10
langchain-experimental==0.0.62
langchain-google-genai==1.0.6
langchain-groq==0.1.3
langchain-openai==0.1.9
langchain-text-splitters==0.2.0
langchainhub==0.1.14
openai==1.28.1


In [2]:
import json
import os
import re

import httpx
import openai
from dotenv import load_dotenv, find_dotenv
from openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

_ = load_dotenv(find_dotenv(os.path.expanduser('~/.env')))  # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

## <b><font color='darkblue'>Pydantic Syntax</font></b>
<b><font size='3ptx'>Pydantic data classes are a blend of Python's data classes with the validation power of Pydantic.</font></b>

<b>They offer a concise way to define data structures while ensuring that the data adheres to specified types and constraints</b>. In standard Python, you would define a class like this, and its type hints would not be enforced at runtime:

In [3]:
class User:
    def __init__(self, name: str, age: int, email: str):
        self.name = name
        self.age = age
        self.email = email        

In [4]:
foo = User(name="Joe",age=32, email="joe@gmail.com")

In [5]:
foo

<__main__.User at 0x7f4226223650>

In [6]:
foo.name

'Joe'

In [7]:
# No validation happens here: a non-integer age is accepted silently
foo = User(name="Joe", age="bar", email="joe@gmail.com")
foo.age

'bar'
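
To get the same guarantee without Pydantic, every check has to be written by hand. The sketch below shows what that might look like for the `age` field only. `CheckedUser` is a hypothetical name, not part of the original notebook:

```python
class CheckedUser:
    """Hypothetical hand-validated variant of the User class."""

    def __init__(self, name: str, age: int, email: str):
        # Type hints alone are not enforced, so the check is written by hand.
        # bool is a subclass of int, so it is rejected explicitly.
        if not isinstance(age, int) or isinstance(age, bool):
            raise TypeError(f"age must be an int, got {type(age).__name__}")
        self.name = name
        self.age = age
        self.email = email


try:
    CheckedUser(name="Joe", age="bar", email="joe@gmail.com")
except TypeError as exc:
    message = str(exc)

print(message)  # age must be an int, got str
```

This kind of boilerplate grows with every field, which is the repetition a Pydantic model removes.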

In [8]:
from typing import List
from pydantic import BaseModel, Field

class pUser(BaseModel):
    name: str
    age: int
    email: str

In [9]:
foo_p = pUser(name="Jane", age=32, email="jane@gmail.com")
foo_p.name

'Jane'

In [11]:
# Uncommenting the last line raises:
#   ValidationError: 1 validation error for pUser
#   age: Input should be a valid integer, unable to parse string as an integer
# foo_p = pUser(name="Jane", age="bar", email="jane@gmail.com")
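
Rather than leaving the failing call commented out, the error can be caught and inspected. The following is a small sketch that redefines `pUser` so it runs on its own. It relies only on `ValidationError.errors()`, which returns one dictionary per failing field:

```python
from pydantic import BaseModel, ValidationError


class pUser(BaseModel):
    name: str
    age: int
    email: str


try:
    pUser(name="Jane", age="bar", email="jane@gmail.com")
except ValidationError as exc:
    # errors() returns a list with one entry per failing field
    errors = exc.errors()

for err in errors:
    # "loc" is a tuple locating the field, "msg" is the readable message
    print(err["loc"], err["msg"])
```

The exact message text depends on the installed Pydantic version, so it is not reproduced here.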

Pydantic models can be nested. For example:

In [12]:
class Class(BaseModel):
    students: List[pUser]

In [13]:
obj = Class(students=[pUser(name="Jane", age=32, email="jane@gmail.com")])

In [14]:
obj

Class(students=[pUser(name='Jane', age=32, email='jane@gmail.com')])
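
Nested models also validate plain dictionaries, which is handy when the data comes from JSON. The sketch below redefines both models so it is self-contained. The `payload` dictionary is made up for illustration:

```python
from typing import List

from pydantic import BaseModel


class pUser(BaseModel):
    name: str
    age: int
    email: str


class Class(BaseModel):
    students: List[pUser]


# Illustrative payload: the student is a plain dict and age is a string
payload = {"students": [{"name": "Jane", "age": "32", "email": "jane@gmail.com"}]}
obj = Class(**payload)

print(type(obj.students[0]).__name__)  # pUser
print(obj.students[0].age)             # 32
```

Each dictionary in the list is converted into a `pUser` instance, and the same field validation applies at every level of nesting.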

## <b><font color='darkblue'>Pydantic to OpenAI function definition</font></b>
OpenAI introduced a function-calling API, which gives us a much more structured and efficient way to handle output parsing when interacting with OpenAI models. This approach combines the [**Pydantic**](https://docs.pydantic.dev/latest/) library with that API: a Pydantic model describes the function's arguments, and LangChain converts it into the function definition the API expects.

In [45]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.utils.function_calling import convert_to_openai_function

In [15]:
class WeatherSearch(BaseModel):
    """Call this with an airport code to get the weather at that airport"""
    airport_code: str = Field(description="airport code to get weather for")

In [34]:
weather_function = convert_to_openai_function(WeatherSearch)

In [35]:
weather_function

{'name': 'WeatherSearch',
 'description': 'Call this with an airport code to get the weather at that airport',
 'parameters': {'type': 'object',
  'properties': {'airport_code': {'type': 'string'}},
  'required': ['airport_code']}}

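The docstring of the **BaseModel** subclass supplies the description of the OpenAI function, so you should always provide one. Without it, the conversion still succeeds here, but the description is taken from **BaseModel** itself: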

In [22]:
class WeatherSearch1(BaseModel):
    airport_code: str = Field(description="airport code to get weather for")

In [36]:
weather_function1 = convert_to_openai_function(WeatherSearch1)

In [37]:
# The description looks weird now:
weather_function1

{'name': 'WeatherSearch1',
 'description': "Usage docs: https://docs.pydantic.dev/2.6/concepts/models/ A base class for creating Pydantic models. Attributes:\n    __class_vars__: The names of classvars defined on the model.\n    __private_attributes__: Metadata about the private attributes of the model.\n    __signature__: The signature for instantiating the model.     __pydantic_complete__: Whether model building is completed, or if there are still undefined fields.\n    __pydantic_core_schema__: The pydantic-core schema used to build the SchemaValidator and SchemaSerializer.\n    __pydantic_custom_init__: Whether the model has a custom `__init__` function.\n    __pydantic_decorators__: Metadata containing the decorators defined on the model.\n        This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.\n    __pydantic_generic_metadata__: Metadata for generic models; contains data used for a similar purpose to\n        __args__, __origin__, __paramete
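
One general Python mechanism that can produce this kind of result is docstring inheritance: `inspect.getdoc` falls back to a parent class's docstring when a subclass has none. The sketch below demonstrates only that mechanism with hypothetical classes. Whether Pydantic or LangChain take exactly this path internally is an assumption and is not shown here:

```python
import inspect


class Base:
    """Docstring that belongs to the base class."""


class WithDoc(Base):
    """Docstring written for the subclass."""


class WithoutDoc(Base):
    pass


# The raw attribute is not inherited...
print(WithoutDoc.__doc__)           # None
# ...but inspect.getdoc walks the class hierarchy to find one
print(inspect.getdoc(WithoutDoc))   # Docstring that belongs to the base class.
print(inspect.getdoc(WithDoc))      # Docstring written for the subclass.
```

Either way, the practical fix is the same: give every model that becomes a function its own docstring.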

The description of the field is optional:

In [28]:
class WeatherSearch2(BaseModel):
    """Call this with an airport code to get the weather at that airport"""
    airport_code: str

convert_to_openai_function(WeatherSearch2)

{'name': 'WeatherSearch2',
 'description': 'Call this with an airport code to get the weather at that airport',
 'parameters': {'type': 'object',
  'properties': {'airport_code': {'type': 'string'}},
  'required': ['airport_code']}}

### <b><font color='darkgreen'>OpenAI function call</font></b>

In [32]:
model = ChatOpenAI()

In [38]:
model.invoke("what is the weather in SF today?", functions=[weather_function])

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"airport_code":"SFO"}', 'name': 'WeatherSearch'}}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 61, 'total_tokens': 78}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'function_call', 'logprobs': None}, id='run-c5ac63ac-405b-427c-920b-56379605ad0e-0', usage_metadata={'input_tokens': 61, 'output_tokens': 17, 'total_tokens': 78})

In [39]:
model_with_function = model.bind(functions=[weather_function])

In [40]:
model_with_function.invoke("what is the weather in Taiwan?")

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"airport_code":"TPE"}', 'name': 'WeatherSearch'}}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 60, 'total_tokens': 77}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'function_call', 'logprobs': None}, id='run-7914d567-a62f-410f-af2a-e32d286883e9-0', usage_metadata={'input_tokens': 60, 'output_tokens': 17, 'total_tokens': 77})

In [41]:
model_with_function.invoke("What is 1 + 1?")

AIMessage(content='1 + 1 equals 2.', response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 61, 'total_tokens': 70}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-50169d29-15ef-4e8d-b57d-c83841fe73c3-0', usage_metadata={'input_tokens': 61, 'output_tokens': 9, 'total_tokens': 70})
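
In the outputs above, the function call lives in `additional_kwargs['function_call']`, and its `arguments` value is a JSON-encoded string rather than a dictionary. A minimal sketch of extracting it follows. The dictionary is copied from the output shown earlier, and `extract_function_call` is a hypothetical helper, not a LangChain API:

```python
import json

# Shape copied from the additional_kwargs of the AIMessage output above
additional_kwargs = {
    "function_call": {
        "arguments": '{"airport_code":"SFO"}',
        "name": "WeatherSearch",
    }
}


def extract_function_call(kwargs: dict):
    """Return (name, arguments), or None when the model answered in plain text."""
    call = kwargs.get("function_call")
    if call is None:
        return None
    # "arguments" is a JSON string, so it must be decoded before use
    return call["name"], json.loads(call["arguments"])


print(extract_function_call(additional_kwargs))  # ('WeatherSearch', {'airport_code': 'SFO'})
print(extract_function_call({}))                 # None
```

With a real response, the same helper would be called as `extract_function_call(message.additional_kwargs)`.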

### <b><font color='darkgreen'>Forcing it to use a function</font></b>
We can force the model to use a function:

In [42]:
model_with_forced_function = model.bind(functions=[weather_function], function_call={"name":"WeatherSearch"})

In [43]:
model_with_forced_function.invoke("what is the weather in Japan?")

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"airport_code":"HND"}', 'name': 'WeatherSearch'}}, response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 70, 'total_tokens': 77}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-7335044b-da6b-4826-a686-4b108643091c-0', usage_metadata={'input_tokens': 70, 'output_tokens': 7, 'total_tokens': 77})

Even when the question doesn't require a function call, the model will still call `WeatherSearch`:

In [44]:
model_with_forced_function.invoke("hi!")

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"airport_code":"JFK"}', 'name': 'WeatherSearch'}}, response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 65, 'total_tokens': 72}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-12257962-314e-403b-a2dd-f469a43f2a26-0', usage_metadata={'input_tokens': 65, 'output_tokens': 7, 'total_tokens': 72})

### <b><font color='darkgreen'>Using in a chain</font></b>
We can use this function-bound model in a chain as we normally would:

In [46]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "{input}")
])

In [47]:
chain = prompt | model_with_function

In [49]:
chain.invoke({"input": "What is the weather in SF?"})

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"airport_code":"SFO"}', 'name': 'WeatherSearch'}}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 66, 'total_tokens': 83}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'function_call', 'logprobs': None}, id='run-7a7cd5ac-c2d5-4353-ad21-35e3899c3030-0', usage_metadata={'input_tokens': 66, 'output_tokens': 17, 'total_tokens': 83})

### <b><font color='darkgreen'>Using multiple functions</font></b>
Even better, we can pass a set of functions and let the LLM decide which one to use based on the question:

In [50]:
class ArtistSearch(BaseModel):
    """Call this to get the names of songs by a particular artist"""
    artist_name: str = Field(description="name of artist to look up")
    n: int = Field(description="number of results")

In [51]:
functions = [
    convert_to_openai_function(WeatherSearch),
    convert_to_openai_function(ArtistSearch),
]

In [52]:
model_with_functions = model.bind(functions=functions)

In [53]:
model_with_functions.invoke("What is the weather in Tokyo?")

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"airport_code":"HND"}', 'name': 'WeatherSearch'}}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 93, 'total_tokens': 110}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'function_call', 'logprobs': None}, id='run-0f391882-edec-4ff5-a5f5-be73ccfbb64c-0', usage_metadata={'input_tokens': 93, 'output_tokens': 17, 'total_tokens': 110})

In [54]:
model_with_functions.invoke("What are three songs by taylor swift?")

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"artist_name":"taylor swift","n":3}', 'name': 'ArtistSearch'}}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 95, 'total_tokens': 117}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'function_call', 'logprobs': None}, id='run-0b9d571a-d1b1-4e72-8f89-bfdd7df0a271-0', usage_metadata={'input_tokens': 95, 'output_tokens': 22, 'total_tokens': 117})

In [55]:
model_with_functions.invoke("Hi!")

AIMessage(content='Hello! How can I assist you today?', response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 88, 'total_tokens': 98}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-d03e724b-8bef-4c2f-8660-025a22369f60-0', usage_metadata={'input_tokens': 88, 'output_tokens': 10, 'total_tokens': 98})
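
The model only chooses a function and its arguments; running it is left to the caller. One way to route the result is a dictionary from function names to Python callables. This is a sketch under stated assumptions: `weather_search`, `artist_search`, `HANDLERS`, and `dispatch` are hypothetical placeholders, and the handlers return canned strings instead of calling real services:

```python
import json


def weather_search(airport_code: str) -> str:
    # Placeholder: a real implementation would query a weather service
    return f"weather report for {airport_code}"


def artist_search(artist_name: str, n: int) -> str:
    # Placeholder: a real implementation would query a music catalogue
    return f"top {n} songs by {artist_name}"


# Keys match the function names the model returns
HANDLERS = {"WeatherSearch": weather_search, "ArtistSearch": artist_search}


def dispatch(additional_kwargs: dict, content: str = "") -> str:
    """Run the chosen function, or return the plain-text answer."""
    call = additional_kwargs.get("function_call")
    if call is None:
        return content
    handler = HANDLERS[call["name"]]
    return handler(**json.loads(call["arguments"]))


# Payload copied from the ArtistSearch output above
result = dispatch(
    {"function_call": {"arguments": '{"artist_name":"taylor swift","n":3}', "name": "ArtistSearch"}}
)
print(result)  # top 3 songs by taylor swift
```

When the model replies in plain text, as it does for "Hi!", `dispatch` simply returns the message content.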

## <b><font color='darkblue'>Supplement</font></b>
* [RealPython - Pydantic: Simplifying Data Validation in Python](https://realpython.com/python-pydantic/)
* [Medium - Seamless Integration with OpenAI and Pydantic: A Powerful Duo for Output Parsing](https://medium.com/@jxnlco/seamless-integration-with-openai-and-pydantic-a-powerful-duo-for-output-parsing-fcb1e616167b)