<a href="https://colab.research.google.com/github/lharikumar/FunctionCalling/blob/main/ExtractData_with_FC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Prerequisites:**

1.   Get an OpenAI API key at https://platform.openai.com/account/api-keys
2.   Choose some text to extract structured data from. The passages used in this notebook come from:
     - https://sports.yahoo.com/us-open-2023-coco-gauff-wins-1st-grand-slam-title-with-wild-comeback-vs-aryna-sabalenka-222431287.html
     - https://www.usopen.org/en_US/news/articles/2023-09-10/novak_djokovic_wins_24th_grand_slam_singles_title_at_2023_us_open.html


**Note:**

*   Store the OpenAI API key as an environment variable. Never hard-code it in the notebook.

In [44]:
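# This notebook calls openai.ChatCompletion, which was removed in openai 1.0, so pin the legacy SDK
!pip install -qU python-dotenv "openai<1"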

In [45]:
import os
import dotenv
import openai
import json

In [46]:
dotenv.load_dotenv('/content/env_files/.env')
openai.api_key = os.getenv('OPENAI_API_KEY')
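
`os.getenv` returns `None` when the variable is missing, so a misconfigured `.env` file only shows up later as an authentication error. A minimal sketch of an earlier check, assuming a hypothetical helper named `require_env` (not part of the original notebook) and a throwaway variable name for the demo:

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value

# Demo with a placeholder variable (hypothetical name) so no real key is needed
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook this could be used as `openai.api_key = require_env('OPENAI_API_KEY')`.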

In [47]:
# Serializes the extracted list of people as an indented JSON string
def format_json(people):
  json_formatted_str = json.dumps(people, indent=2)
  return json_formatted_str

In [48]:
tmplist =  [{"name":"n1","birthday":"January 1, 1900", "profession" : "p1", "home_country": "h1"},{"name":"n2","birthday":"December 1, 1900", "profession" : "p2", "home_country": "h2"}]

print(format_json(tmplist))

[
  {
    "name": "n1",
    "birthday": "January 1, 1900",
    "profession": "p1",
    "home_country": "h1"
  },
  {
    "name": "n2",
    "birthday": "December 1, 1900",
    "profession": "p2",
    "home_country": "h2"
  }
]


In [49]:
text1 = "Coco Gauff has been earmarked as the future of women's tennis since she was 15 years old. That future arrived at the US Open on Saturday, in the form of her first Grand Slam championship. The 19-year-old American outlasted No. 2 seed Aryna Sabalenka, the new top-ranked player in the WTA, in a 2-6, 6-3, 6-2 thriller in the US Open final at Arthur Ashe Stadium. She becomes the 11th teenager to ever win a Grand Slam singles title, and the question now becomes how many more are in front of her."

In [50]:
text2 = "Novak Djokovic handled the weight of history to defeat Daniil Medvedev on Sunday in the 2023 US Open men's singles final. With a 6-3, 7-6(5), 6-3 victory, the 36-year-old won his 24th Grand Slam singles title, tying Margaret Court's record and bolstering his case to be considered the greatest tennis player of all time.\"To make the history of this sport is something truly remarkable and special,\" Djokovic said during the trophy ceremony. \"I never imagined that I would be here talking about 24 Slams. I never thought that would be the reality, but the last couple of years I felt I have a chance, I have a shot at history—and why not grab it if it's presented.\""

# OpenAI Function Calling

## Text 1

## Step 1 - Call the model with functions and the user’s input (text1)

In [51]:
functions = [
        {
            "name": "extract_structured_data",
            "description": "Extraction of all individuals mentioned in the article, including their names, birthdays, profession and home country.",
            "parameters": {
                "type": "object",
                "properties": {
                    "people": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name" : {
                                    "type": "string",
                                    "description": "Name of the individual"
                                },
                                "birthday" : {
                                    "type": "string",
                                    "description": "Birthday of the individual"
                                },
                                "profession" : {
                                    "type": "string",
                                    "description": "Profession of the individual"
                                },
                                "home_country" : {
                                    "type": "string",
                                    "description": "Home country of the individual"
                                }
                            }
                        }
                    },
                },
                "required": ["people"],
            },
        }
    ]
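
The schema above only marks the top-level `people` array as required; the fields inside each person object are optional, so the model may leave some of them out. A minimal sketch of a completeness check, assuming a hypothetical helper `missing_fields` and made-up sample data:

```python
EXPECTED_FIELDS = ("name", "birthday", "profession", "home_country")

def missing_fields(people):
    """Map each person's index to the expected fields that are absent or empty."""
    report = {}
    for index, person in enumerate(people):
        absent = [field for field in EXPECTED_FIELDS if not person.get(field)]
        if absent:
            report[index] = absent
    return report

# Made-up sample: the second person lacks two fields
sample_people = [
    {"name": "n1", "birthday": "January 1, 1900", "profession": "p1", "home_country": "h1"},
    {"name": "n2", "profession": "p2"},
]
print(missing_fields(sample_people))  # {1: ['birthday', 'home_country']}
```

An alternative would be to add a `"required"` list to the inner object in the schema; this sketch checks after the fact instead.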

In [52]:
available_functions = {
            "extract_structured_data": format_json,
        }

In [53]:
messages = [{"role": "user", "content": text1}]

In [54]:
response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="auto"
    )
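
With `function_call="auto"` the model decides for itself whether to call a function or reply in plain text. The legacy Chat Completions API also accepts `function_call={"name": ...}` to force a specific function. A minimal sketch of assembling the request either way, assuming a hypothetical helper `build_request` (it only builds the keyword arguments and makes no network call):

```python
def build_request(messages, functions, force_function=None):
    """Assemble keyword arguments for openai.ChatCompletion.create."""
    kwargs = {
        "model": "gpt-3.5-turbo-0613",
        "messages": messages,
        "functions": functions,
    }
    # "auto" leaves the choice to the model; a {"name": ...} dict forces that function
    kwargs["function_call"] = {"name": force_function} if force_function else "auto"
    return kwargs

demo = build_request(
    [{"role": "user", "content": "some article text"}],
    [{"name": "extract_structured_data"}],
    force_function="extract_structured_data",
)
print(demo["function_call"])
```

The result could then be passed as `openai.ChatCompletion.create(**demo)`.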

## Step 2 - Use the model response to call your API

In [55]:
response_msg = response.choices[0].message

In [56]:
print(response_msg)

{
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "extract_structured_data",
    "arguments": "{\n  \"people\": [\n    {\n      \"name\": \"Coco Gauff\",\n      \"birthday\": \"2004-03-13\",\n      \"profession\": \"tennis player\",\n      \"home_country\": \"United States\"\n    },\n    {\n      \"name\": \"Aryna Sabalenka\",\n      \"birthday\": \"1998-05-05\",\n      \"profession\": \"tennis player\",\n      \"home_country\": \"Belarus\"\n    }\n  ]\n}"
  }
}


In [57]:
if response_msg.get("function_call"):
    function_name = response_msg["function_call"]["name"]
    function_to_call = available_functions[function_name]
    function_args = json.loads(response_msg["function_call"]["arguments"])
    function_response = function_to_call(
            people=function_args.get("people")
    )
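
The cell above leaves `function_response` undefined when the model answers in plain text, and `json.loads` raises if the generated arguments are not valid JSON. A minimal sketch of a more defensive version, assuming a hypothetical helper `parse_function_call` and a hand-written sample message shaped like the printed response:

```python
import json

def parse_function_call(message):
    """Return (name, arguments) from an assistant message, or None if no function was called."""
    call = message.get("function_call")
    if not call:
        return None
    try:
        arguments = json.loads(call["arguments"])
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned invalid JSON arguments: {exc}") from exc
    return call["name"], arguments

# Hand-written sample message, not real model output
sample = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "extract_structured_data",
        "arguments": '{"people": [{"name": "Coco Gauff"}]}',
    },
}
name, args = parse_function_call(sample)
print(name, args["people"][0]["name"])
```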

In [58]:
print(function_response)

[
  {
    "name": "Coco Gauff",
    "birthday": "2004-03-13",
    "profession": "tennis player",
    "home_country": "United States"
  },
  {
    "name": "Aryna Sabalenka",
    "birthday": "1998-05-05",
    "profession": "tennis player",
    "home_country": "Belarus"
  }
]


## Step 3 - Send the response back to the model to summarize

In [59]:
messages.append(response_msg)  # extend conversation with assistant's reply
messages.append(
            {
                "role": "function",
                "name": function_name,
                "content": function_response,
            }
        )  # extend conversation with function response
response_to_user = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
        )  # get a new response from GPT where it can see the function response
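
The `content` sent back in this step is a string because `format_json` returns one. If a handler returned a list or dict instead, it would presumably need to be serialized first. A minimal sketch, assuming a hypothetical helper `function_message`:

```python
import json

def function_message(name, result):
    """Build a role="function" message; non-string results are serialized to JSON."""
    content = result if isinstance(result, str) else json.dumps(result)
    return {"role": "function", "name": name, "content": content}

msg = function_message("extract_structured_data", [{"name": "Coco Gauff"}])
print(msg["content"])  # [{"name": "Coco Gauff"}]
```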

In [60]:
response_to_user.choices[0].message

<OpenAIObject at 0x798debbae610> JSON: {
  "role": "assistant",
  "content": "Coco Gauff, born on March 13, 2004, is a professional tennis player from the United States. Aryna Sabalenka, born on May 5, 1998, is also a professional tennis player but represents Belarus."
}

## Text 2

## Step 1 - Call the model with functions and the user’s input (text2)

In [61]:
messages = [{"role": "user", "content": text2}]

In [62]:
functions = [
        {
            "name": "extract_structured_data",
            "description": "Extraction of all individuals mentioned in the article, including their names, birthdays, profession and home country.",
            "parameters": {
                "type": "object",
                "properties": {
                    "people": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name" : {
                                    "type": "string",
                                    "description": "Name of the individual"
                                },
                                "birthday" : {
                                    "type": "string",
                                    "description": "Birthday of the individual"
                                },
                                "profession" : {
                                    "type": "string",
                                    "description": "Profession of the individual"
                                },
                                "home_country" : {
                                    "type": "string",
                                    "description": "Home country of the individual"
                                }
                            }
                        }
                    },
                },
                "required": ["people"],
            },
        }
    ]

In [63]:
available_functions = {
            "extract_structured_data": format_json,
        }

In [64]:
response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="auto"
    )

## Step 2 - Use the model response to call your API

In [65]:
response_msg = response.choices[0].message
if response_msg.get("function_call"):
    function_name = response_msg["function_call"]["name"]
    function_to_call = available_functions[function_name]
    function_args = json.loads(response_msg["function_call"]["arguments"])
    function_response = function_to_call(
            people=function_args.get("people")
    )

In [66]:
print(response_msg)

{
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "extract_structured_data",
    "arguments": "{\n  \"people\": [\n    {\n      \"name\": \"Novak Djokovic\",\n      \"birthday\": \"May 22, 1987\",\n      \"profession\": \"Tennis player\",\n      \"home_country\": \"Serbia\"\n    },\n    {\n      \"name\": \"Daniil Medvedev\",\n      \"birthday\": \"February 11, 1996\",\n      \"profession\": \"Tennis player\",\n      \"home_country\": \"Russia\"\n    },\n    {\n      \"name\": \"Margaret Court\",\n      \"birthday\": \"July 16, 1942\",\n      \"profession\": \"Former tennis player\",\n      \"home_country\": \"Australia\"\n    }\n  ]\n}"
  }
}


In [67]:
print(function_response)

[
  {
    "name": "Novak Djokovic",
    "birthday": "May 22, 1987",
    "profession": "Tennis player",
    "home_country": "Serbia"
  },
  {
    "name": "Daniil Medvedev",
    "birthday": "February 11, 1996",
    "profession": "Tennis player",
    "home_country": "Russia"
  },
  {
    "name": "Margaret Court",
    "birthday": "July 16, 1942",
    "profession": "Former tennis player",
    "home_country": "Australia"
  }
]


## Step 3 - Send the response back to the model to summarize

In [68]:
messages.append(response_msg)  # extend conversation with assistant's reply
messages.append(
            {
                "role": "function",
                "name": function_name,
                "content": function_response,
            }
        )  # extend conversation with function response
response_to_user = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
        )  # get a new response from GPT where it can see the function response

In [69]:
response_to_user.choices[0].message

<OpenAIObject at 0x798debb86660> JSON: {
  "role": "assistant",
  "content": "Novak Djokovic, a Serbian tennis player, defeated Daniil Medvedev, a Russian tennis player, in the 2023 US Open men's singles final. With this win, Djokovic secured his 24th Grand Slam singles title, which ties Margaret Court's record. Djokovic's victory strengthens his claim to be considered the greatest tennis player of all time. During the trophy ceremony, Djokovic expressed his astonishment at making history in the sport and his determination to seize the opportunity presented to him. Djokovic was born on May 22, 1987, while Medvedev was born on February 11, 1996. Margaret Court, a former tennis player from Australia, held the previous record that Djokovic matched. Court's birthday is on July 16, 1942."
}
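
Text 1 and Text 2 go through the same three steps, so the flow could be wrapped in one function. A minimal sketch, assuming a hypothetical helper `extract_and_summarize` whose `create` parameter stands in for `openai.ChatCompletion.create`; the demo uses a fake `create` with canned replies so it runs without an API key:

```python
import json

def extract_and_summarize(text, functions, available_functions, create):
    """Run the three steps for one text and return the model's final reply."""
    messages = [{"role": "user", "content": text}]
    # Step 1: call the model with the function definitions
    response = create(messages=messages, functions=functions, function_call="auto")
    reply = response["choices"][0]["message"]
    call = reply.get("function_call")
    if not call:
        return reply.get("content")  # the model answered directly
    # Step 2: run the matching local function on the parsed arguments
    handler = available_functions[call["name"]]
    arguments = json.loads(call["arguments"])
    result = handler(people=arguments.get("people"))
    # Step 3: send the function result back for a summary
    messages.append(reply)
    messages.append({"role": "function", "name": call["name"], "content": result})
    final = create(messages=messages)
    return final["choices"][0]["message"]["content"]

def fake_create(messages, functions=None, function_call=None):
    """Stand-in for the API: canned function call first, canned summary second."""
    if functions:
        message = {
            "role": "assistant",
            "content": None,
            "function_call": {
                "name": "extract_structured_data",
                "arguments": '{"people": [{"name": "n1"}]}',
            },
        }
    else:
        message = {"role": "assistant", "content": "summary of " + messages[-1]["content"]}
    return {"choices": [{"message": message}]}

summary = extract_and_summarize(
    "some article text",
    [{"name": "extract_structured_data"}],
    {"extract_structured_data": lambda people: json.dumps(people)},
    fake_create,
)
print(summary)
```

With the real SDK, `create` could be `functools.partial(openai.ChatCompletion.create, model="gpt-3.5-turbo-0613")`, and `available_functions` the dictionary defined earlier.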