# Improving function calls with OpenAISchema

This notebook is a follow-up to a post I wrote for Weights & Biases. If you're not sure what's going on, give it a read first!

The goals of this notebook are to walk through a more detailed example of what happens inside a schema object, give some tips on writing better schemas (i.e. prompt engineering), and then provide an array of examples that I hope will inspire you to think creatively and model interesting problems.

In [None]:
! pip install openai_function_call regex pandas



In [None]:
import openai
import json

openai.api_key = "sk-..."

Instead of writing schemas and parsing data out of function calls yourself, this library lets you quickly explore schemas, prompts, and execution in a Pythonic way while still giving you complete control over the OpenAI call.
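For comparison, here is a rough sketch of the manual route: a hand-written function schema plus the parsing and validation you would otherwise write yourself. The `completion` dict is a made-up stand-in shaped like a function-call response from the legacy `openai.ChatCompletion` API (the model returns `arguments` as a JSON string), and `parse_search_call` is a hypothetical helper, not part of `openai_function_call`.

```python
import json

# Hand-written function schema: the boilerplate OpenAISchema derives from a class for you.
SEARCH_SCHEMA = {
    "name": "Search",
    "description": "Class representing a single search query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Query to search for relevant content"},
            "type": {"type": "string", "enum": ["video", "email", "documents"]},
        },
        "required": ["query", "type"],
    },
}


def parse_search_call(completion: dict) -> dict:
    """Hypothetical helper: pull the function-call arguments out of a completion and check them."""
    call = completion["choices"][0]["message"]["function_call"]
    if call["name"] != SEARCH_SCHEMA["name"]:
        raise ValueError(f"unexpected function: {call['name']}")
    args = json.loads(call["arguments"])  # arguments arrive as a JSON *string*
    params = SEARCH_SCHEMA["parameters"]
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    allowed = params["properties"]["type"]["enum"]
    if args["type"] not in allowed:
        raise ValueError(f"type must be one of {allowed}, got {args['type']!r}")
    return args


# Made-up response with the same shape as a function-call completion.
completion = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {
                    "name": "Search",
                    "arguments": '{"query": "cat video last week", "type": "video"}',
                },
            }
        }
    ]
}

print(parse_search_call(completion))  # prints {'query': 'cat video last week', 'type': 'video'}
```

Every new field means keeping the schema and the validation code in sync by hand; with `OpenAISchema`, both come from a single class definition.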

# Introduction to `OpenAISchema`: MultiSearch

First we'll look at how a complex schema can be built from nested structures, and how that lets us easily generate the JSON schema and extract multiple search queries from a single request.


## Motivation

Extracting a list of tasks from text is a common use case for leveraging language models. This pattern can be applied to various applications, such as virtual assistants like Siri or Alexa, where understanding user intent and breaking down requests into actionable tasks is crucial. In this example, we will demonstrate how to use OpenAI Function Call to segment search queries and execute them.


## Defining the Structure

Let's model the problem as breaking down a search request into a list of search queries. We will use an enum to represent different types of searches and take advantage of Python objects to add additional query logic.



In [None]:
import enum
from pydantic import Field
from openai_function_call import OpenAISchema

class SearchType(str, enum.Enum):
    """Enumeration representing the types of searches that can be performed."""
    VIDEO = "video"
    EMAIL = "email"
    DOCUMENTS = "documents"

class Search(OpenAISchema):
    """
    Class representing a single search query.
    """
    query: str = Field(..., description="Query to search for relevant content")
    type: SearchType = Field(..., description="Type of search")

    def execute(self):
        print(f"Searching query `{self.query}` using `{self.type}`")


Search.openai_schema

{'name': 'Search',
 'description': '\n    Class representing a single search query.\n    ',
 'parameters': {'$defs': {'SearchType': {'description': 'Enumeration representing the types of searches that can be performed.',
    'enum': ['video', 'email', 'documents'],
    'type': 'string'}},
  'properties': {'query': {'description': 'Query to search for relevant content',
    'type': 'string'},
   'type': {'allOf': [{'$ref': '#/$defs/SearchType'}],
    'description': 'Type of search'}},
  'required': ['query', 'type'],
  'type': 'object'}}

In [None]:
from typing import List

class MultiSearch(OpenAISchema):
    "Correctly segmented set of search results"
    tasks: List[Search]

MultiSearch.openai_schema

{'name': 'MultiSearch',
 'description': 'Correctly segmented set of search results',
 'parameters': {'$defs': {'Search': {'description': '\n    Class representing a single search query.\n    ',
    'properties': {'query': {'description': 'Query to search for relevant content',
      'type': 'string'},
     'type': {'allOf': [{'$ref': '#/$defs/SearchType'}],
      'description': 'Type of search'}},
    'required': ['query', 'type'],
    'type': 'object'},
   'SearchType': {'description': 'Enumeration representing the types of searches that can be performed.',
    'enum': ['video', 'email', 'documents'],
    'type': 'string'}},
  'properties': {'tasks': {'items': {'$ref': '#/$defs/Search'},
    'type': 'array'}},
  'required': ['tasks'],
  'type': 'object'}}

Now that we've built our model, you can see how composing models keeps our code clean and easy to understand, while the prompt is 'generated' by the structure of our code.

## Calling OpenAI with the schema

In [None]:
import openai

def segment(data: str) -> MultiSearch:
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        temperature=0.1,
        functions=[MultiSearch.openai_schema],
        function_call={"name": MultiSearch.openai_schema["name"]},
        messages=[
            {
                "role": "user",
                "content": f"Consider the data below: '\n{data}' and segment it into multiple search queries",
            },
        ],
        max_tokens=1000,
    )

    return MultiSearch.from_response(completion)

In [None]:
task = segment("can you share the cat video from last week and the documents you had on single sign on?")
task

MultiSearch(tasks=[Search(query='cat video last week', type=<SearchType.VIDEO: 'video'>), Search(query='documents single sign on', type=<SearchType.DOCUMENTS: 'documents'>)])

In [None]:
for search in task.tasks:
  search.execute()

Searching query `cat video last week` using `video`
Searching query `documents single sign on` using `documents`


Notice that we not only extract the data from the request but also implement an `execute` method that could actually run each search query.

What's the implication?

We now have a way of colocating:

1. Schema: via definitions of our attributes and types
2. Prompts: via docstrings, descriptions, and variable names
3. Computation: via methods and type hints

all within the class definition.
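To make the computation part concrete, here is a minimal sketch in which `execute` dispatches on the enum to a backend instead of printing. It is standalone: a plain dataclass named `DispatchingSearch` stands in for the `OpenAISchema` model so it runs without an API call, and `search_videos`, `search_email`, and `search_documents` are hypothetical placeholders for real search backends.

```python
import enum
from dataclasses import dataclass
from typing import Callable, Dict, List


class SearchType(str, enum.Enum):
    VIDEO = "video"
    EMAIL = "email"
    DOCUMENTS = "documents"


# Hypothetical backends: swap in real search clients here.
def search_videos(query: str) -> List[str]:
    return [f"video result for {query!r}"]


def search_email(query: str) -> List[str]:
    return [f"email result for {query!r}"]


def search_documents(query: str) -> List[str]:
    return [f"document result for {query!r}"]


# One handler per enum member, so adding a search type means adding one entry here.
HANDLERS: Dict[SearchType, Callable[[str], List[str]]] = {
    SearchType.VIDEO: search_videos,
    SearchType.EMAIL: search_email,
    SearchType.DOCUMENTS: search_documents,
}


@dataclass
class DispatchingSearch:
    query: str
    type: SearchType

    def execute(self) -> List[str]:
        # Route the query to the backend registered for its search type.
        return HANDLERS[self.type](self.query)


tasks = [
    DispatchingSearch("cat video last week", SearchType.VIDEO),
    DispatchingSearch("documents single sign on", SearchType.DOCUMENTS),
]
for task in tasks:
    print(task.execute())
```

Keeping the dispatch table next to the enum means the set of types the model may choose from and the set of backends you can run stay in one place.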

# Example: Extracting Citations

In this example, we'll demonstrate how to use OpenAI Function Call to ask an AI a question and get back an answer with correct citations. We'll define the necessary data structures using Pydantic and show how to retrieve the citations for each answer.


## Motivation

When using AI models to answer questions, it's important to provide accurate and reliable information with appropriate citations. By including citations for each statement, we can ensure the information is backed by reliable sources and help readers verify the information themselves.

## Defining the Data Structures

Let's start by defining the data structures required for this task: `Fact` and `QuestionAnswer`.


In [None]:
from pydantic import Field
from openai_function_call import OpenAISchema


class Fact(OpenAISchema):
    """
    Each fact has a body and a list of sources.
    If there are multiple facts, make sure to break them apart such that each one only uses a set of sources that are relevant to it.
    """

    fact: str = Field(..., description="Body of the sentence as part of a response")
    substring_quote: list[str] = Field(
        ...,
        description="Each source should be a direct quote from the context, as a substring of the original content",
    )

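    def _get_span(self, quote, context, errs=100):
        import regex  # third-party `regex` package (supports fuzzy matching)

        # Escape the quote so characters like '.', '?' or '(' are matched literally
        minor = regex.escape(quote)
        major = context

        # Allow progressively more edits (insertions/deletions/substitutions) until the quote is found
        errs_ = 0
        s = regex.search(f"({minor}){{e<={errs_}}}", major)
        while s is None and errs_ < errs:
            errs_ += 1
            s = regex.search(f"({minor}){{e<={errs_}}}", major)

        if s is not None:
            yield from s.spans()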

    def get_spans(self, context):
        for quote in self.substring_quote:
            yield from self._get_span(quote, context)


class QuestionAnswer(OpenAISchema):
    """
    Class representing a question and its answer as a list of facts, where each fact should have a source.
    Each sentence contains a body and a list of sources.
    """

    question: str = Field(..., description="Question that was asked")
    answer: list[Fact] = Field(
        ...,
        description="Body of the answer, each fact should be its separate object with a body and a list of sources",
    )

Notice that, just like in the search example, we implement a method, `get_spans`, that finds exactly where each citation appears in the original text. Now let's define the function that calls OpenAI and see what we get.

In [None]:
def ask_ai(question: str, context: str) -> QuestionAnswer:
    """
    Ask the AI a question about `context` and get back a QuestionAnswer object.
    """

    # Make the request, forcing the model to call the QuestionAnswer function
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        temperature=0.2,
        max_tokens=1000,
        functions=[QuestionAnswer.openai_schema],
        function_call={"name": QuestionAnswer.openai_schema["name"]},
        messages=[
            {
                "role": "system",
                "content": f"You are a world class algorithm to answer questions with correct and exact citations. ",
            },
            {"role": "user", "content": f"Answer question using the following context"},
            {"role": "user", "content": f"{context}"},
            {"role": "user", "content": f"Question: {question}"},
            {
                "role": "user",
                "content": f"Tips: Make sure to cite your sources, and use the exact words from the context.",
            },
        ],
    )

    # Parse the function-call arguments into a QuestionAnswer object
    return QuestionAnswer.from_response(completion)

## Evaluating the Citations

Let's evaluate the example by asking the AI a question and getting back an answer with citations. We'll ask the question "What did the author do during college?" with the given context.



In [None]:
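def highlight(text, span, window=20):
    """Show a citation in context: up to `window` chars on each side, the span in red between <>."""
    start, end = span
    # Clamp at 0 so spans near the start of the text still show their prefix
    before = text[max(0, start - window) : start]
    return (
        "..."
        + before.replace("\n", "")
        + "\033[91m"
        + "<"
        + text[start:end].replace("\n", "")
        + "> "
        + "\033[0m"
        + text[end : end + window].replace("\n", "")
        + "..."
    )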

question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""

answer = ask_ai(question, context)

print("Question:", question)
print()
for fact in answer.answer:
    print("Statement:", fact.fact)
    for span in fact.get_spans(context):
        print("Citation:", highlight(context, span))
    print()

Question: What did the author do during college?

Statement: The author studied Computational Mathematics and physics in university.
Citation: ...rts high school but <in university I studied Computational Mathematics and physics.> As part of coop I w...

Statement: The author started the Data Science club at the University of Waterloo and was the president of the club for 2 years.
Citation: ...titchfix, Facebook.<I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.> ...



The output includes the question, followed by each statement in the answer with its corresponding citation highlighted in the context.

Feel free to try this code with different questions and contexts to see whether the AI responds with accurate citations.
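If you'd rather not depend on the third-party `regex` package used in `_get_span`, the common cases can be handled with the standard library's `re`. This is a simplified sketch, not a drop-in replacement for the fuzzy `{e<=n}` matching: it finds verbatim quotes (escaping punctuation such as `.` and `?`) and quotes whose only difference from the context is whitespace, such as a line break. The function names are illustrative.

```python
import re
from typing import Iterator, List, Tuple

Span = Tuple[int, int]


def exact_spans(quote: str, context: str) -> Iterator[Span]:
    """Yield (start, end) for every verbatim occurrence of `quote` in `context`."""
    for match in re.finditer(re.escape(quote), context):
        yield match.span()


def whitespace_insensitive_spans(quote: str, context: str) -> Iterator[Span]:
    """Like exact_spans, but any run of whitespace in the quote matches any run in the context."""
    words = [re.escape(word) for word in quote.split()]
    if not words:
        return
    for match in re.finditer(r"\s+".join(words), context):
        yield match.span()


def find_spans(quote: str, context: str) -> List[Span]:
    if not quote.strip():
        return []  # an empty quote would "match" at every position
    # Prefer exact matches; fall back to the whitespace-tolerant search.
    return list(exact_spans(quote, context)) or list(whitespace_insensitive_spans(quote, context))


context = "I went to an arts high school but in university I studied\nComputational Mathematics and physics."
quote = "in university I studied Computational Mathematics and physics."
for start, end in find_spans(quote, context):
    print(repr(context[start:end]))
```

A quote that the model paraphrased rather than copied will return no spans here, which is itself a useful signal that the citation should not be trusted.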

# Example: Planning and Executing a Query Plan

This example demonstrates how to use OpenAI Function Call with the ChatCompletion API to plan and execute a query plan in a question-answering system. By breaking down a complex question into smaller sub-questions with defined dependencies, the system can systematically gather the information needed to answer the main question.



## Motivation

The goal of this example is to showcase how query planning can be used to handle complex questions, facilitate iterative information gathering, automate workflows, and optimize processes. By leveraging the OpenAI Function Call model, you can design and execute a structured plan to find answers effectively.

### Use Cases:

* Complex question answering
* Iterative information gathering
* Workflow automation
* Process optimization

With the OpenAI Function Call model, you can customize the planning process and integrate it into your specific application to meet your unique requirements.

## Defining the Data Structures

Let's define the necessary Pydantic models to represent the query plan and the queries.


In [None]:
class QueryType(str, enum.Enum):
    """Enumeration representing the types of queries that can be asked to a question answer system."""

    SINGLE_QUESTION = "SINGLE"
    MERGE_MULTIPLE_RESPONSES = "MERGE_MULTIPLE_RESPONSES"


class Query(OpenAISchema):
    """Class representing a single question in a query plan."""

    id: int = Field(..., description="Unique id of the query")
    question: str = Field(
        ...,
        description="Question asked using a question answering system",
    )
    dependancies: List[int] = Field(
        default_factory=list,
        description="List of sub questions that need to be answered before asking this question",
    )
    node_type: QueryType = Field(
        default=QueryType.SINGLE_QUESTION,
        description="Type of question, either a single question or a multi-question merge",
    )


class QueryPlan(OpenAISchema):
    """Container class representing a tree of questions to ask a question answering system."""

    query_graph: List[Query] = Field(
        ..., description="The query graph representing the plan"
    )

    def _dependencies(self, ids: List[int]) -> List[Query]:
        """Returns the dependencies of a query given their ids."""
        return [q for q in self.query_graph if q.id in ids]

## Planning a Query Plan

Now let's generate a query plan using the models defined above and the OpenAI API.

In [None]:
def query_planner(question: str) -> QueryPlan:
    PLANNING_MODEL = "gpt-4-0613"

    messages = [
        {
            "role": "system",
            "content": "You are a world class query planning algorithm capable of breaking apart questions into its dependency queries such that the answers can be used to inform the parent question. Do not answer the questions, simply provide a correct compute graph with good specific questions to ask and relevant dependencies. Before you call the function, think step-by-step to get a better understanding of the problem.",
        },
        {
            "role": "user",
            "content": f"Consider: {question}\nGenerate the correct query plan.",
        },
    ]

    completion = openai.ChatCompletion.create(
        model=PLANNING_MODEL,
        temperature=.2,
        functions=[QueryPlan.openai_schema],
        function_call={"name": QueryPlan.openai_schema["name"]},
        messages=messages,
        max_tokens=1000,
    )
    return QueryPlan.from_response(completion)

In [None]:
plan = query_planner(
    "What is the difference in populations of Canada and Jason's home country?"
)
plan.model_dump()


{'query_graph': [{'id': 1,
   'question': 'What is the population of Canada?',
   'dependancies': [],
   'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>},
  {'id': 2,
   'question': "What is Jason's home country?",
   'dependancies': [],
   'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>},
  {'id': 3,
   'question': 'What is the population of {output of query 2}?',
   'dependancies': [2],
   'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>},
  {'id': 4,
   'question': 'What is the difference in populations of {output of query 1} and {output of query 3}?',
   'dependancies': [1, 3],
   'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>}]}

While this example builds the query plan, it does not implement a way to actually answer the questions. You could write your own answer function that, for example, performs retrieval and calls OpenAI for retrieval-augmented generation. That step would also use function calls, but it goes beyond the scope of this example.
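As a starting point for that answer step, the plan's dependencies can be resolved in order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This sketch works on plain dicts like the plan output above; `answer_fn` is a hypothetical callback standing in for your retrieval plus OpenAI call, and it receives the answers to the query's dependencies. The `dependancies` key is spelled exactly as in the `Query` schema.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def execution_order(query_graph: List[dict]) -> List[int]:
    """Order query ids so that every query comes after the queries it depends on."""
    ids = {q["id"] for q in query_graph}
    graph = {}
    for q in query_graph:
        unknown = set(q["dependancies"]) - ids
        if unknown:
            raise ValueError(f"query {q['id']} depends on unknown ids {sorted(unknown)}")
        graph[q["id"]] = set(q["dependancies"])  # node -> its predecessors
    # Raises graphlib.CycleError (a ValueError) if the model produced circular dependencies.
    return list(TopologicalSorter(graph).static_order())


def run_plan(query_graph: List[dict], answer_fn: Callable[[str, Dict[int, str]], str]) -> Dict[int, str]:
    """Answer every query in dependency order, passing each one its dependencies' answers."""
    by_id = {q["id"]: q for q in query_graph}
    answers: Dict[int, str] = {}
    for query_id in execution_order(query_graph):
        query = by_id[query_id]
        dependency_answers = {dep: answers[dep] for dep in query["dependancies"]}
        answers[query_id] = answer_fn(query["question"], dependency_answers)
    return answers


query_graph = [
    {"id": 1, "question": "What is the population of Canada?", "dependancies": []},
    {"id": 2, "question": "What is Jason's home country?", "dependancies": []},
    {"id": 3, "question": "What is the population of {output of query 2}?", "dependancies": [2]},
    {"id": 4, "question": "What is the difference in populations of {output of query 1} and {output of query 3}?", "dependancies": [1, 3]},
]


def fake_answer(question: str, dependency_answers: Dict[int, str]) -> str:
    # Placeholder: a real implementation would retrieve context and call the model.
    return f"answer to {question!r} using {sorted(dependency_answers)}"


print(execution_order(query_graph))
```

Queries with no path between them (here 1 and 2) could also be answered concurrently; `TopologicalSorter` exposes `get_ready()` and `done()` for that style of scheduling.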

# Example: Converting Text into Dataframes

In this example, we'll demonstrate how to convert text into dataframes using OpenAI Function Call. We will define the necessary data structures using Pydantic and show how to convert the text into dataframes.


## Motivation

Oftentimes, when we parse data, we have an opportunity to extract structured data. What if we could extract an arbitrary number of tables with arbitrary schemas? By pulling out dataframes, we could write tables or CSV files and attach them to our retrieved data.


## Defining the Data Structures

Let's start by defining the data structures required for this task: RowData, Dataframe, and Database.

Take your time reading the docstrings and field descriptions, since they act as the prompt.


In [None]:
from typing import Any

class RowData(OpenAISchema):
    column_values: list[Any] = Field(..., description="The correct values for each row")


class Dataframe(OpenAISchema):
    """
    Class representing a dataframe. This class is used to convert
    data into a frame that can be used by pandas.
    """

    name: str = Field(..., description="The name of the dataframe")
    data: List[RowData] = Field(
        ...,
        description="Correct rows of data aligned to column names. Nones are allowed. There should be one per entry",
    )
    columns: list[str] = Field(
        ...,
        description="Column names relevant from source data, should be in snake_case",
    )

    def to_pandas(self):
        import pandas as pd

        columns = self.columns
        data = [row.column_values for row in self.data]

        return pd.DataFrame(data=data, columns=columns)


class Database(OpenAISchema):
    """
    A set of correctly named and defined tables as dataframes
    """

    tables: list[Dataframe] = Field(
        ...,
        description="List of tables in the database",
    )

The `RowData` class represents a single row of data in the dataframe. It has a single attribute, `column_values`, holding the value for each column in that row.

The `Dataframe` class represents a dataframe and consists of a `name` attribute, a list of `RowData` objects in the `data` attribute, and a list of column names in the `columns` attribute. It also provides a `to_pandas` method to convert the dataframe into a pandas DataFrame.

The `Database` class represents a set of tables in a database. It contains a list of `Dataframe` objects in the `tables` attribute.

In [None]:
def dataframe(data: str) -> Database:
    completion = openai.ChatCompletion.create(
        model="gpt-4-0613", # Notice I have to use gpt-4 here, this task is pretty hard
        temperature=0.0,
        functions=[Database.openai_schema],
        function_call={"name": Database.openai_schema["name"]},
        messages=[
            {
                "role": "system",
                "content": "Map this data into a dataframe and correctly define the columns and rows",
            },
            {
                "role": "user",
                "content": f"{data}",
            },
        ],
        max_tokens=1000,
    )
    return Database.from_response(completion)

## Evaluating the extraction

Let's evaluate the example by converting some text into dataframes with the `dataframe` function and printing the results.

In [None]:
dfs = dataframe("""My name is John and I am 25 years old. I live in
New York and I like to play basketball. His name is
Mike and he is 30 years old. He lives in San Francisco
and he likes to play baseball. Sarah is 20 years old
and she lives in Los Angeles. She likes to play tennis.
Her name is Mary and she is 35 years old.
She lives in Chicago.

On one team 'Tigers' the captain is John and there are 12 players.
On the other team 'Lions' the captain is Mike and there are 10 players.
""")

In [None]:
for df in dfs.tables:
  print(df.name)
  print(df.to_pandas())
  print()

People
    Name  Age           City Favorite Sport
0   John   25       New York     Basketball
1   Mike   30  San Francisco       Baseball
2  Sarah   20    Los Angeles         Tennis
3   Mary   35        Chicago           None

Teams
  Team Name Captain  Number of Players
0    Tigers    John                 12
1     Lions    Mike                 10
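The motivation mentioned writing the extracted tables out as CSV files. Below is a minimal stdlib-only sketch of that step. It takes the same three pieces a `Dataframe` carries (`name`, `columns`, and rows of `column_values`) as plain arguments, checks that every row is as wide as the header (the model is not guaranteed to get this right), and writes `<name>.csv`. `write_table_csv` is a hypothetical helper; with pandas installed, `df.to_pandas().to_csv(...)` is the shorter route.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, List, Sequence


def write_table_csv(name: str, columns: Sequence[str], rows: List[Sequence[Any]], out_dir: Path) -> Path:
    """Write one extracted table to out_dir/<name>.csv and return the file path."""
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(f"{name}: row {i} has {len(row)} values for {len(columns)} columns")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)  # None values are written as empty fields
    return path


# With the objects from this notebook (not run here):
# for df in dfs.tables:
#     write_table_csv(df.name, df.columns, [row.column_values for row in df.data], Path("tables"))

with tempfile.TemporaryDirectory() as tmp:
    path = write_table_csv(
        "Teams",
        ["team_name", "captain", "number_of_players"],
        [["Tigers", "John", 12], ["Lions", "Mike", 10]],
        Path(tmp),
    )
    print(path.read_text())
```

Validating row widths before writing surfaces extraction mistakes early instead of producing a silently misaligned file.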



# Is this the end of prompt engineering?

No.

You'll find that when you build your own examples, variable names, docstrings, and descriptions are incredibly important. However, it's now a matter of writing good code and documentation, since the same names and documentation are read by both humans and the AI.

## Tips on writing good schemas

When a schema isn't parsing correctly consider the following tips:

1. Don't use generic attribute names
2. Every class should have a docstring
3. Include tips and few-shot examples in the docstrings when needed
4. Adjectives in descriptions matter: a 'short and concise' query will be different from a 'detailed and specific, include additional keywords' one
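Some of these tips are mechanical enough to check automatically. Below is a rough, hypothetical linter that walks a schema dict shaped like the `openai_schema` outputs shown earlier (top-level `parameters` plus nested `$defs`) and flags missing descriptions and generic field names; you could call it on, say, `MultiSearch.openai_schema`. The `GENERIC_NAMES` set is an arbitrary example, and the tips about docstring content and wording still need human judgment.

```python
from typing import List

# Illustrative list of names that tell the model little about what to put in a field.
GENERIC_NAMES = {"data", "value", "values", "item", "items", "info", "text", "result", "obj"}


def lint_openai_schema(schema: dict) -> List[str]:
    """Return human-readable warnings for a function schema shaped like `openai_schema`."""
    warnings = []
    if not (schema.get("description") or "").strip():
        warnings.append(f"{schema['name']}: missing docstring/description")
    params = schema.get("parameters", {})
    # Check the top-level object plus every nested definition under $defs.
    objects = {schema["name"]: params, **params.get("$defs", {})}
    for obj_name, obj in objects.items():
        # The top-level description was already checked above.
        if obj_name != schema["name"] and not (obj.get("description") or "").strip():
            warnings.append(f"{obj_name}: missing docstring/description")
        for field, spec in obj.get("properties", {}).items():
            if field.lower() in GENERIC_NAMES:
                warnings.append(f"{obj_name}.{field}: generic attribute name")
            if not (spec.get("description") or "").strip():
                warnings.append(f"{obj_name}.{field}: missing description")
    return warnings


example = {
    "name": "Extract",
    "description": "",
    "parameters": {
        "type": "object",
        "properties": {
            "data": {"type": "string"},
            "query": {"type": "string", "description": "Search query, short and concise"},
        },
    },
}
for warning in lint_openai_schema(example):
    print(warning)
```

On the example above it should flag the empty class description and the undocumented, generically named `data` field, while leaving the described `query` field alone.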