# LangChain Function Call Extraction

From: https://learn.deeplearning.ai/functions-tools-agents-langchain/lesson/5/tagging-and-extraction

In [1]:
from typing import List, Optional
from pydantic import BaseModel, Field
from langchain.utils.openai_functions import convert_pydantic_to_openai_function
import json

## Use Pydantic (a validation library) to Define Functions

Pydantic is a data validation library. LangChain uses it, together with its `convert_pydantic_to_openai_function` helper, to make it easier to write function definitions that adhere to the OpenAI function-calling format.

The latest pydantic version at the time of writing (2.4.2) produces very verbose output. The DeepLearning.AI course used pydantic==1.10.8, which produces much more concise output, so we use that version here as well. See:
https://community.deeplearning.ai/t/re-convert-pydantic-to-openai-function-format/491365/3
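
To make the target format concrete, here is a minimal standard-library sketch of the envelope that the conversion produces: a `name`, a `description`, and the JSON Schema under `parameters`. The helper `schema_to_openai_function` and the hand-written `person_schema` are hypothetical and only approximate the shape of the output shown in this notebook. They are not LangChain's actual implementation.

```python
import json


def schema_to_openai_function(schema: dict) -> dict:
    """Wrap a JSON Schema dict in an OpenAI-style function definition.

    Hypothetical helper for illustration only; it mirrors the shape of the
    output printed in this notebook, not LangChain's real code.
    """
    return {
        "name": schema["title"],
        "description": schema.get("description", ""),
        "parameters": schema,
    }


# Hand-written schema, similar to what pydantic 1.x emits for a small model.
person_schema = {
    "title": "Person",
    "description": "Information about a person.",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "description": "person's name", "type": "string"},
        "age": {"title": "Age", "description": "person's age", "type": "integer"},
    },
    "required": ["name"],
}

person_function = schema_to_openai_function(person_schema)
print(json.dumps(person_function, indent=4))
```

The docstring of the model becomes the function description, and each `Field(description=...)` ends up on the matching property, which is why descriptive docstrings and field descriptions matter for extraction quality.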

In [2]:
class Person(BaseModel):
    """Information about a person."""
    name: str = Field(description="person's name")
    age: Optional[int] = Field(description="person's age")

class Information(BaseModel):
    """Information to extract."""
    people: List[Person] = Field(description="List of info about people")

extraction_function = convert_pydantic_to_openai_function(Information)

print(json.dumps(extraction_function, indent=4))

{
    "name": "Information",
    "description": "Information to extract.",
    "parameters": {
        "title": "Information",
        "description": "Information to extract.",
        "type": "object",
        "properties": {
            "people": {
                "title": "People",
                "description": "List of info about people",
                "type": "array",
                "items": {
                    "title": "Person",
                    "description": "Information about a person.",
                    "type": "object",
                    "properties": {
                        "name": {
                            "title": "Name",
                            "description": "person's name",
                            "type": "string"
                        },
                        "age": {
                            "title": "Age",
                            "description": "person's age",
                            "type": "integer"
                      

In [3]:
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI

model = ChatOpenAI(temperature=0)

extraction_functions = [extraction_function]
extraction_model = model.bind(functions=extraction_functions, function_call={"name": "Information"})

extraction_model.invoke("Joe is 30, his mom is Martha")

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n  "people": [\n    {\n      "name": "Joe",\n      "age": 30\n    },\n    {\n      "name": "Martha",\n      "age": 0\n    }\n  ]\n}', 'name': 'Information'}})
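
The `bind` call above attaches two keyword arguments to every request: the list of function definitions and a `function_call` entry that names the one function the model must call. As a rough sketch, the hypothetical helper `forced_call_kwargs` below builds those arguments and checks that the forced name is actually defined. The helper and the trimmed-down function definition are assumptions for illustration, not part of LangChain.

```python
def forced_call_kwargs(functions: list, name: str) -> dict:
    """Build keyword arguments that force a call to the function `name`.

    Hypothetical helper: raises early if `name` is not among the
    supplied function definitions.
    """
    defined = {function["name"] for function in functions}
    if name not in defined:
        raise ValueError(f"function {name!r} is not defined; known: {sorted(defined)}")
    return {"functions": functions, "function_call": {"name": name}}


# Trimmed-down stand-in for the definition generated from the Information model.
information_function = {
    "name": "Information",
    "description": "Information to extract.",
    "parameters": {"type": "object", "properties": {}},
}

kwargs = forced_call_kwargs([information_function], "Information")
print(kwargs["function_call"])

# The result could then be passed along as: model.bind(**kwargs)
```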

## Create a Chain

Notice the prompt: we tell the model not to guess. Without this instruction, it may invent a value, such as the age of 0 for Martha in the output above.

In [4]:
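prompt = ChatPromptTemplate.from_messages([
    # Tell the model not to invent values that the input does not state
    ("system", "Extract the relevant information, if not explicitly provided do not guess. Extract partial info"),
    ("human", "{input}")
])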

extraction_chain = prompt | extraction_model

extraction_chain.invoke({"input": "Joe is 30, his mom is Martha"})

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n  "people": [\n    {\n      "name": "Joe",\n      "age": 30\n    },\n    {\n      "name": "Martha",\n      "age": null\n    }\n  ]\n}', 'name': 'Information'}})

## Parse Out Relevant Information

The `additional_kwargs` wrapper in the output above is awkward to work with. Because we forced the call with `function_call={"name": "Information"}`, the response will always be a function call, and its arguments are expected to be a JSON string. So let's use `JsonOutputFunctionsParser` to return just the parsed arguments as a dict.

In [5]:
from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser

extraction_chain = prompt | extraction_model | JsonOutputFunctionsParser()

extraction_chain.invoke({"input": "Joe is 30, his mom is Martha"})

{'people': [{'name': 'Joe', 'age': 30}, {'name': 'Martha', 'age': None}]}
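
Conceptually, the parser step reads the `arguments` string out of `function_call` and decodes it as JSON. The following is a minimal sketch of that idea using only the standard library. The function `parse_function_arguments` is hypothetical, and a plain dict stands in for the `additional_kwargs` of the `AIMessage`. The real parser may handle more cases than this.

```python
import json


def parse_function_arguments(additional_kwargs: dict) -> dict:
    """Decode the JSON arguments of a forced function call.

    Hypothetical stand-in for the parsing step; assumes a
    'function_call' entry is always present.
    """
    function_call = additional_kwargs["function_call"]
    return json.loads(function_call["arguments"])


# Plain-dict stand-in for AIMessage.additional_kwargs from the output above.
additional_kwargs = {
    "function_call": {
        "name": "Information",
        "arguments": '{"people": [{"name": "Joe", "age": 30}, {"name": "Martha", "age": null}]}',
    }
}

parsed = parse_function_arguments(additional_kwargs)
print(parsed)
```

Note that JSON `null` becomes Python `None`, which matches the `'age': None` entry for Martha in the parsed output.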

## Use JsonKeyOutputFunctionsParser

Let's go even further by grabbing just the `people` list.

In [6]:
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

extraction_chain = prompt | extraction_model | JsonKeyOutputFunctionsParser(key_name="people")

extraction_chain.invoke({"input": "Joe is 30, his mom is Martha"})

[{'name': 'Joe', 'age': 30}, {'name': 'Martha', 'age': None}]
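
Selecting a single key amounts to one more lookup after the JSON decoding. Here is a small sketch, again with a hypothetical function (`parse_function_key`) and a plain dict in place of the real message, that also shows one way to consume the resulting list when an age is missing.

```python
import json


def parse_function_key(additional_kwargs: dict, key_name: str):
    """Decode the function-call arguments and return one top-level key.

    Hypothetical illustration of key selection, not the library's code.
    """
    arguments = json.loads(additional_kwargs["function_call"]["arguments"])
    return arguments[key_name]


# Plain-dict stand-in for the message contents shown earlier.
additional_kwargs = {
    "function_call": {
        "name": "Information",
        "arguments": '{"people": [{"name": "Joe", "age": 30}, {"name": "Martha", "age": null}]}',
    }
}

people = parse_function_key(additional_kwargs, key_name="people")

# Treat a missing age as unknown instead of inventing a number.
summaries = [
    f"{person['name']}: {person['age'] if person['age'] is not None else 'unknown'}"
    for person in people
]
print(summaries)
```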