# Study: Extracting facts with GPT-3

This study investigates how GPT-3 can be used to extract facts from natural language (NL) utterances. It has two objectives:

  - Establish a set of basic NL example utterances.

  - Use prompts to extract facts from those utterances.

### Setup

In [None]:
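# install the openai library; the `openai.Completion` interface used below
# was removed in openai 1.0, so pin an earlier release
!pip install "openai<1" --quiet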

In [None]:
# import libraries
import getpass
import openai

In [None]:
# get and store openai API key
openai.api_key = getpass.getpass('Enter your OpenAI API key: ')

Enter your OpenAI API key: ··········


### Helper Functions

In [None]:
def gpt3_complete(
    prompt, 
    engine='text-davinci-003', 
    temperature=0.1, 
    max_tokens=200,
    top_p=1.0, 
    frequency_penalty=0.0, 
    presence_penalty=0.0, 
    stop=None,
    echo=False
):
    """
    Generates a completion based on the given prompt using the OpenAI GPT-3 model.

    Parameters:
        prompt (str): The text prompt to generate a completion for.
        engine (str): The engine to use for completion. 
        temperature (float): The temperature parameter controlling the randomness of the generated text. 
        max_tokens (int): The maximum number of tokens in the generated completion.
        top_p (float): The cumulative probability threshold for the top-p sampling algorithm. It controls the diversity of the generated text. 
        frequency_penalty (float): The frequency penalty to apply during completion. 
        presence_penalty (float): The presence penalty to apply during completion. 
        stop (str or list): The stop sequence(s) to use for stopping the generation.
        echo (bool): Whether to include the prompt in the generated completion. Default is False.

    Returns:
        completion (str): The generated completion based on the prompt.
    """
    response = openai.Completion.create(
        engine=engine,
        prompt=prompt,
        temperature=temperature,
        max_tokens=max_tokens,
        top_p=top_p,
        frequency_penalty=frequency_penalty,
        presence_penalty=presence_penalty,
        stop=stop,
        echo=echo
    )
    
    completion = response['choices'][0]['text']
    return completion

In [None]:
def apply_to_examples(
    examples, 
    prompt_func, 
    temperature=0.1,
    frequency_penalty=0,
    presence_penalty=0
):
    """
    Applies the GPT-3 completion to a list of input examples.

    Parameters:
        examples (list): A list of input examples to generate completions for.
        prompt_func (function): A function that takes an example as input and returns a prompt for GPT-3.
        temperature (float): The temperature parameter controlling the randomness of the generated text.
        frequency_penalty (float): The frequency penalty to apply during completion. 
        presence_penalty (float): The presence penalty to apply during completion. 

    Returns:
        results (list): A list of generated completions for each input example.
    """
    results = []
    for example in examples:
        print(f">>> INPUT: {example}")
        result = gpt3_complete(
            prompt_func(example),
            temperature=temperature,
            frequency_penalty=frequency_penalty,
            presence_penalty=presence_penalty,
        )
        print(f">>> OUTPUT:\n{result}")
        results.append(result)
        print("========================================\n\n")
    return results

### Example Data

Some sample data to exercise the prompts below. Each string is a piece of *information input* to the system.

In [None]:
example_information_to_save = ["Flu shot cost = $80", 
                               "things my boss likes: cricket, science and vegetarian food",
                               "my wife wants a vegetarian food book",
                               "sales guy email = jp@example.com",
                               "vanessa's email vanessa@outlook.com, rember to send the ppts she asked",
                               "+55 11 27670-0987 -> pedro whatsapp",
                               "need to buy milk, eggs and bread", 
                               "need to sell old video game, chair",
                               "december receipts for gym: yoga, ballet, ??",
                               "book with KWG the hardware setup",
                               "dog day with the foreign visitors",
                               "event support: we failed :-(",
                               "floor layout: we forgot various details!!",
                               "first aid kit in the reception",
                               "ask the pediatrician about when to start brushing teeth"]

### Prompt Engineering: Fact Extraction

#### Prompt 1

A very simple prompt.

In [None]:
def extraction_prompt_1(x):

    prompt =\
f"""
Extract pieces of information, like phone numbers, email addresses, names, trivia, reminders, etc.
Input: {x}
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_1, temperature=0.5)

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:

Output: Flu shot cost: $80


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:

Output:
Interests: Cricket, Science, Vegetarian Food


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:

Output: None


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:

Output: Email address: jp@example.com


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:

Output: 
Name: Vanessa 
Email: vanessa@outlook.com 
Reminder: Send PPTs


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:

Output: Phone Number: +55 11 27670-0987; Name: Pedro; Reminder: Contact Pedro via WhatsApp.


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:

Output: None


>>> INPUT: need to sell old video game, chair
>>> OUTPUT:

Output: None


>>> INPUT: december receipts for gym: yoga, ballet, ??
>>> OUTPUT:

Output: December


>>> INPUT: book with KWG the hardware setup
>>> OUTPUT:

Out

['\nOutput: Flu shot cost: $80',
 '\nOutput:\nInterests: Cricket, Science, Vegetarian Food',
 '\nOutput: None',
 '\nOutput: Email address: jp@example.com',
 '\nOutput: \nName: Vanessa \nEmail: vanessa@outlook.com \nReminder: Send PPTs',
 '\nOutput: Phone Number: +55 11 27670-0987; Name: Pedro; Reminder: Contact Pedro via WhatsApp.',
 '\nOutput: None',
 '\nOutput: None',
 '\nOutput: December',
 '\nOutput:\nKWG (Hardware Setup)',
 '\nOutput: No information extracted.',
 '\nOutput: No information extracted.',
 '\nOutput:\nReminder: Floor layout details',
 'Output: N/A',
 '\nOutput: No information extracted.']

These responses are of little use:

  - In some cases, the output merely restates the input in different words and adds no value.

  - In other cases, the output contains no information at all (for example, "None" or "No information extracted.").

Being more specific about the kind of output we want might help.

#### Prompt 2

Including some desired output structure and semantics.

In [None]:
def extraction_prompt_2(x):

    prompt =\
f"""
Extract pieces of personal information, like phone numbers, email addresses, names, trivia, reminders, etc., as tuples with the following format: (Category, Key, Value)
Input: {x}
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_2, temperature=0.5)

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:

Output: ("Healthcare", "Flu shot cost", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:

Output: (Interests, Cricket, True), (Interests, Science, True), (Interests, Vegetarian Food, True)


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:

Output: (Category: Reminder, Key: Wife, Value: Vegetarian Food Book)


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:

Output: ('email', 'sales guy', 'jp@example.com')


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:

Output: ("Email", "Vanessa", "vanessa@outlook.com"), ("Reminder", "Send PPTs", "She asked")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:

Output: ("WhatsApp", "Pedro", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:

Output: (Reminder, Shopping List, Milk, Eggs, Bread)


>>> INPUT: need to sell old video game, chair
>>> OUTPUT:

Output: (C

['\nOutput: ("Healthcare", "Flu shot cost", "$80")',
 '\nOutput: (Interests, Cricket, True), (Interests, Science, True), (Interests, Vegetarian Food, True)',
 '\nOutput: (Category: Reminder, Key: Wife, Value: Vegetarian Food Book)',
 "\nOutput: ('email', 'sales guy', 'jp@example.com')",
 '\nOutput: ("Email", "Vanessa", "vanessa@outlook.com"), ("Reminder", "Send PPTs", "She asked")',
 '\nOutput: ("WhatsApp", "Pedro", "+55 11 27670-0987")',
 '\nOutput: (Reminder, Shopping List, Milk, Eggs, Bread)',
 "\nOutput: (Category, Key, Value)\n('Item', 'Video Game', 'Sell')\n('Item', 'Chair', 'Sell')",
 '\nOutput: ("Receipts", "Gym", "Yoga, Ballet, ??")',
 '\nOutput: ("Category", "Hardware Setup", "KWG")',
 'Output: N/A',
 '\nOutput: (Event, Support, Failed)',
 '\nOutput: (Category, "Floor Layout", "We forgot various details!!")',
 '\nOutput: (Reminder, "First Aid Kit", "Reception")',
 '\nOutput: (Reminder, "Brushing Teeth", "Start brushing teeth according to pediatrician\'s instructions")']

We see several problems:

  - The result includes the label "Output", whereas we want tuples only.

  - Multiple facts are sometimes separated by commas rather than newlines, which deviates from the preferred format.

  - Some extractions are malformed or inaccurate. For instance, one tuple has five elements, and another uses the literal word "Category" as its category.

Adding an example might make our intent clearer.
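The first two problems can also be handled after the fact. As a minimal sketch (the helper names `strip_output_label` and `split_tuples` are hypothetical, not part of this notebook), a regular expression can drop the leading "Output:" label and pull out each parenthesized group. It assumes values contain no parentheses of their own, which does not hold for inputs such as "we failed :-(".

```python
import re


def strip_output_label(completion: str) -> str:
    """Remove a leading 'Output:' label and surrounding whitespace."""
    return re.sub(r"^\s*Output:\s*", "", completion).strip()


def split_tuples(text: str) -> list:
    """Return every parenthesized group found in the text, in order.

    Assumption: no value contains '(' or ')' itself.
    """
    return re.findall(r"\([^()]*\)", text)


raw = "\nOutput: (Interests, Cricket, True), (Interests, Science, True)"
for item in split_tuples(strip_output_label(raw)):
    print(item)
```

This only cleans up the text; it does not repair inaccurate extractions, which still have to be addressed in the prompt.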

#### Prompt 3

Adding an example.

In [None]:
def extraction_prompt_3(x):

    prompt =\
f"""
Extract pieces of personal information, like phone numbers, email addresses, names, trivia, reminders, etc., as tuples with the following format: (Category, Key, Value)

Example input: "Mom's phone number is 555-555-5555"
Example output: ("Family", "mom's phone number", "555-555-5555")

Input: {x}
Output: 
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_3, temperature=0.5)

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "flu shot cost", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Interests", "cricket", "yes")
("Interests", "science", "yes")
("Interests", "vegetarian food", "yes")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Food Preferences", "wife's preference", "vegetarian")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Contacts", "sales guy email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Reminder", "send ppts", "vanessa@outlook.com")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Contacts", "Pedro's Whatsapp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping List", "milk", "buy")
("Shopping List", "eggs", "buy")
("Shopping List", "bread", "buy")


>>> INPUT: need to sell old video game, chair
>>> OUTPUT:
("Item", "video game", "sell")

['("Health", "flu shot cost", "$80")',
 '("Interests", "cricket", "yes")\n("Interests", "science", "yes")\n("Interests", "vegetarian food", "yes")',
 '("Food Preferences", "wife\'s preference", "vegetarian")',
 '("Contacts", "sales guy email", "jp@example.com")',
 '("Reminder", "send ppts", "vanessa@outlook.com")',
 '("Contacts", "Pedro\'s Whatsapp", "+55 11 27670-0987")',
 '("Shopping List", "milk", "buy")\n("Shopping List", "eggs", "buy")\n("Shopping List", "bread", "buy")',
 '("Item", "video game", "sell")\n("Item", "chair", "sell")',
 '("Receipts", "Gym", "Yoga, Ballet, ??")',
 '("Book", "KWG the hardware setup", "")',
 '("Activity", "dog day", "with the foreign visitors")',
 '("Event", "Support", "We failed :-(")',
 '("Layout", "floor layout", "we forgot various details!!")',
 '("Reminder", "first aid kit", "reception")',
 '("Health", "pediatrician advice on brushing teeth", "when to start brushing teeth")']

Several improvements are apparent:

  - Fact strings are consistently quoted.

  - Multiple facts are put one per line.

  - The information recorded is overall more detailed.

  - The "Contacts" category sounds more appropriate than the more general "email" we had before.

Despite these improvements, some irregularities remain:

  - In the second example, representing the boss's preferences as a Boolean list is awkward.

  - In the "december receipts" example, it would be preferable to separate "yoga", "ballet", and "??" into distinct facts instead of combining them.

  - The categorization of the facts appears arbitrary and lacks a cohesive framework.

Further adjustments are needed to address these issues.
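Because prompt 3 emits quoted tuples, one per line, each line happens to be a valid Python literal. A minimal sketch of turning a completion into real tuples with the standard library's `ast.literal_eval` follows. The function name `parse_facts` is hypothetical, and lines that fail to parse are silently skipped, which may or may not be the desired policy.

```python
import ast


def parse_facts(completion: str) -> list:
    """Parse one tuple per line; ignore lines that are not tuple literals."""
    facts = []
    for line in completion.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            value = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # not a valid Python literal, e.g. "Output: N/A"
        if isinstance(value, tuple):
            facts.append(value)
    return facts


completion = '("Interests", "cricket", "yes")\n("Interests", "science", "yes")'
for fact in parse_facts(completion):
    print(fact)
```

Unlike `eval`, `ast.literal_eval` only accepts literals, so it does not execute code that the model might emit.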

#### Prompt 4

To tackle these concerns, the next prompt adds:

  - A multi-fact example, together with the assumption that everything mentioned in an input refers to the same thing.

  - A constraint on the permissible categories.

The prompt function is also more flexible: both the user input and the allowed categories are parameters.

In [None]:
def extraction_prompt_4(x, categories=["Family", "Work", "Friends", "Shopping", 
                                       "Health", "Finance", "Travel", "Home", 
                                       "Pets", "Hobbies", "Other"]):

    prompt =\
f"""
Extract pieces of personal information, like phone numbers, email addresses, names, trivia, reminders, etc., as tuples with the following format: (Category, Key, Value)
Assume everything mentioned refers to the same thing. Constraints:
  - Allowed Categories: {', '.join(categories)}


Example input: "Mom's phone number is 555-555-5555"
Example output: ("Family", "mom's phone number", "555-555-5555")

Example input: "Need to do: lab work, ultrasound, buy aspirin"
Example output: 
("Health", "to do", "lab work")
("Health", "to do", "ultrasound")
("Health", "buy", "aspirin")	

Input: {x}
Output: 
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_4, temperature=0.5)

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "flu shot cost", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Work", "likes", "cricket")
("Work", "likes", "science")
("Work", "likes", "vegetarian food")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Shopping", "vegetarian food book", "wife")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Work", "sales guy email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Work", "vanessa's email", "vanessa@outlook.com")
("Work", "remember", "send ppts")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Friends", "Pedro's Whatsapp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping", "buy", "milk")
("Shopping", "buy", "eggs")
("Shopping", "buy", "bread")


>>> INPUT: need to sell old video game, chair
>>> OUTPUT:
("Home", "sell", "old video game")

['("Health", "flu shot cost", "$80")',
 '("Work", "likes", "cricket")\n("Work", "likes", "science")\n("Work", "likes", "vegetarian food")',
 '("Shopping", "vegetarian food book", "wife")',
 '("Work", "sales guy email", "jp@example.com")',
 '("Work", "vanessa\'s email", "vanessa@outlook.com")\n("Work", "remember", "send ppts")',
 '("Friends", "Pedro\'s Whatsapp", "+55 11 27670-0987")',
 '("Shopping", "buy", "milk")\n("Shopping", "buy", "eggs")\n("Shopping", "buy", "bread")',
 '("Home", "sell", "old video game")\n("Home", "sell", "chair")',
 '("Hobbies", "gym", "yoga")\n("Hobbies", "gym", "ballet")',
 '("Hobbies", "book", "KWG the hardware setup")',
 '("Hobbies", "dog day", "with the foreign visitors")',
 '("Other", "event support", "we failed")',
 '("Home", "floor layout", "we forgot various details!!")',
 '("Home", "first aid kit", "reception")',
 '("Health", "pediatrician question", "when to start brushing teeth")']

The revised version is a notable improvement. The categories now adhere to the specified constraints, and inputs are segmented into separate facts.

However, the "Hobbies" category assigned to the gym example is misleading, and the output no longer mentions the receipts at all. In addition, the pediatrician fact is more condensed than we would like. Both points still need refinement.
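Whether every category respects the "Allowed Categories" constraint can be checked mechanically rather than by eye. The sketch below assumes the completions have already been parsed into tuples; the names `ALLOWED_CATEGORIES` and `invalid_categories` are hypothetical, and the category set mirrors the default argument of `extraction_prompt_4`.

```python
ALLOWED_CATEGORIES = {
    "Family", "Work", "Friends", "Shopping", "Health", "Finance",
    "Travel", "Home", "Pets", "Hobbies", "Other",
}


def invalid_categories(facts, allowed=ALLOWED_CATEGORIES):
    """Return the facts whose first field is not an allowed category."""
    return [fact for fact in facts if not fact or fact[0] not in allowed]


facts = [
    ("Hobbies", "gym", "yoga"),               # allowed category
    ("Receipts", "Gym", "Yoga, Ballet, ??"),  # category from an earlier prompt
]
print(invalid_categories(facts))
```

A check like this only confirms that a category is permitted; it cannot tell whether the category is a sensible choice, as the "Hobbies" case shows.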

#### Prompt 5

To make the facts more expressive, we add two fields, "Type" and "People". They let each tuple record what kind of information it holds and whom it concerns, which gives a more detailed representation of the input.
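One way to hold the extended facts on the Python side is a named tuple. This is a sketch only: the `Fact` class and its field names are hypothetical and simply mirror the (Category, Type, People, Key, Value) format that the prompt below requests.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One extracted fact in the five-field format."""
    category: str
    fact_type: str  # "Type" in the prompt; renamed to avoid shadowing the builtin
    people: str     # empty string when no person or organization is mentioned
    key: str
    value: str


fact = Fact("Family", "Phone", "mom", "mom's number", "555-555-5555")
print(fact.people, fact.value)
```

Since `Fact` is still a tuple, a parsed five-element tuple `t` can be converted with `Fact(*t)`, and a tuple of the wrong length raises a `TypeError`.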

In [None]:
def extraction_prompt_5(x, categories=["Family", "Work", "Friends", "Shopping", 
                                       "Health", "Finance", "Travel", "Home", 
                                       "Pets", "Hobbies", "Other"]):

    prompt =\
f"""
Extract pieces of personal information, like phone numbers, email addresses, names, trivia, reminders, etc., as tuples with the following format: (Category, Type, People, Key, Value)
Assume everything mentioned refers to the same thing. Constraints:
  - Allowed Categories: {', '.join(categories)}
  - Allowed Types: "List", "Email", "Phone", "Address", "Document", "Pendency", "Price", "Reminder", "Note", "Doubt", "Wish", "Other"
  - People contain the name or description of the people or organizations concerned, or is empty if no person or organization is mentioned.
  
Example input: "Mom's phone number is 555-555-5555"
Example output: ("Family", "Phone", "mom", "mom's number", "555-555-5555")

Example input: "email of the building administration = adm@example.com"
Example output: ("Work", "Email", "building administration", "email", "adm@example.com")

Example input: "Need to do: lab work, ultrasound, buy aspirin"
Example output: 
("Health", "List", "", "to do", "lab work")
("Health", "List", "", "to do", "ultrasound")
("Shopping", "List", "", "aspirin", "buy")	

Input: {x}
Output: 
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_5, temperature=0.5)

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "Price", "", "flu shot", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Work", "List", "boss", "likes", "cricket")
("Work", "List", "boss", "likes", "science")
("Work", "List", "boss", "likes", "vegetarian food")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Home", "Wish", "wife", "vegetarian food book", "")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Work", "Email", "sales guy", "email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Friends", "Email", "vanessa", "vanessa's email", "vanessa@outlook.com")
("Friends", "Reminder", "vanessa", "send ppts", "")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Friends", "Phone", "Pedro", "Whatsapp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping", "List", "", "milk", "buy")
("Shopping",

['("Health", "Price", "", "flu shot", "$80")',
 '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
 '("Home", "Wish", "wife", "vegetarian food book", "")',
 '("Work", "Email", "sales guy", "email", "jp@example.com")',
 '("Friends", "Email", "vanessa", "vanessa\'s email", "vanessa@outlook.com")\n("Friends", "Reminder", "vanessa", "send ppts", "")',
 '("Friends", "Phone", "Pedro", "Whatsapp", "+55 11 27670-0987")',
 '("Shopping", "List", "", "milk", "buy")\n("Shopping", "List", "", "eggs", "buy")\n("Shopping", "List", "", "bread", "buy")',
 '("Home", "List", "", "to sell", "old video game")\n("Home", "List", "", "to sell", "chair")',
 '("Finance", "Document", "gym", "december receipts", "yoga")\n("Finance", "Document", "gym", "december receipts", "ballet")\n("Finance", "Document", "gym", "december receipts", "??")',
 '("Other", "Document", "KWG", "hardware setup", "book")',
 '("Pets", "Remind

Good progress so far. However, the output for the "december receipts" example appears disordered.

To address this, we leave the prompt unchanged and instead lower the temperature parameter passed to GPT-3. Temperature controls how random the sampling is. Lowering it should give more consistent results that follow the desired format more closely.
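To illustrate why a lower temperature gives more predictable output, here is the standard temperature-scaled softmax applied to made-up scores. This is a toy sketch: the logits are invented, the function name is hypothetical, and nothing here reflects how the OpenAI service is implemented internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
for temperature in (0.5, 0.1):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 4) for p in probs])
```

At the lower temperature almost all probability mass falls on the highest-scoring candidate, so repeated runs are more likely to produce the same text.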

In [None]:
# decreasing temperature and keeping prompt the same
apply_to_examples(example_information_to_save, extraction_prompt_5, temperature=0.1) 

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "Price", "", "flu shot", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Work", "List", "boss", "likes", "cricket")
("Work", "List", "boss", "likes", "science")
("Work", "List", "boss", "likes", "vegetarian food")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Home", "Wish", "wife", "vegetarian food book", "")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Work", "Email", "sales guy", "email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Friends", "Email", "Vanessa", "Vanessa's email", "vanessa@outlook.com")
("Pendency", "Reminder", "", "send ppts", "Vanessa")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Friends", "Phone", "pedro", "whatsapp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping", "List", "", "milk", "buy")
("Shopping"

['("Health", "Price", "", "flu shot", "$80")',
 '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
 '("Home", "Wish", "wife", "vegetarian food book", "")',
 '("Work", "Email", "sales guy", "email", "jp@example.com")',
 '("Friends", "Email", "Vanessa", "Vanessa\'s email", "vanessa@outlook.com")\n("Pendency", "Reminder", "", "send ppts", "Vanessa")',
 '("Friends", "Phone", "pedro", "whatsapp", "+55 11 27670-0987")',
 '("Shopping", "List", "", "milk", "buy")\n("Shopping", "List", "", "eggs", "buy")\n("Shopping", "List", "", "bread", "buy")',
 '("Home", "List", "", "sell", "video game")\n("Home", "List", "", "sell", "chair")',
 '("Finance", "Document", "gym", "december receipts", "yoga")\n("Finance", "Document", "gym", "december receipts", "ballet")\n("Finance", "Document", "gym", "december receipts", "??")',
 '("Other", "Document", "KWG", "hardware setup", "book")',
 '("Pets", "Reminder", "for

That looks very good now!

One requirement is still unmet: certain inputs, such as "first aid kit", should have the type "Note", and People should be "Self" where applicable. This requires a further adjustment.

#### Prompt 6

To fix this, we add another example that reinforces the desired pattern.

In [None]:
def extraction_prompt_6(x, categories=["Family", "Work", "Friends", "Shopping", 
                                       "Health", "Finance", "Travel", "Home", 
                                       "Pets", "Hobbies", "Other"]):

    prompt =\
f"""
Extract pieces of personal information, like phone numbers, email addresses, names, trivia, reminders, etc., as tuples with the following format: (Category, Type, People, Key, Value)
Assume everything mentioned refers to the same thing. Constraints:
  - Allowed Categories: {', '.join(categories)}
  - Allowed Types: "List", "Email", "Phone", "Address", "Document", "Pendency", "Price", "Reminder", "Note", "Doubt", "Wish", "Other"
  - People contain the name or description of the people or organizations concerned, or is empty if no person or organization is mentioned.
  
Example input: "Mom's phone number is 555-555-5555"
Example output: ("Family", "Phone", "mom", "mom's number", "555-555-5555")

Example input: "email of the building administration = adm@example.com"
Example output: ("Work", "Email", "building administration", "email", "adm@example.com")

Example input: "Need to do: lab work, ultrasound, buy aspirin"
Example output: 
("Health", "List", "Self", "to do", "lab work")
("Health", "List", "Self", "to do", "ultrasound")
("Shopping", "List", "Self", "aspirin", "buy")	

Example input: first aid kit in the reception
Example output:
("Home", "Note", "", "first aid kit", "reception")

Input: {x}
Output: 
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_6, temperature=0.1) 

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "Price", "", "flu shot", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Work", "List", "boss", "likes", "cricket")
("Work", "List", "boss", "likes", "science")
("Work", "List", "boss", "likes", "vegetarian food")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Hobbies", "Wish", "wife", "vegetarian food book", "")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Work", "Email", "sales guy", "email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Friends", "Email", "Vanessa", "Vanessa's email", "vanessa@outlook.com")
("Friends", "Reminder", "Vanessa", "send ppts", "")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Friends", "Phone", "Pedro", "WhatsApp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping", "List", "Self", "milk", "buy")
("Sho

['("Health", "Price", "", "flu shot", "$80")',
 '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
 '("Hobbies", "Wish", "wife", "vegetarian food book", "")',
 '("Work", "Email", "sales guy", "email", "jp@example.com")',
 '("Friends", "Email", "Vanessa", "Vanessa\'s email", "vanessa@outlook.com")\n("Friends", "Reminder", "Vanessa", "send ppts", "")',
 '("Friends", "Phone", "Pedro", "WhatsApp", "+55 11 27670-0987")',
 '("Shopping", "List", "Self", "milk", "buy")\n("Shopping", "List", "Self", "eggs", "buy")\n("Shopping", "List", "Self", "bread", "buy")',
 '("Shopping", "List", "Self", "video game", "sell")\n("Shopping", "List", "Self", "chair", "sell")',
 '("Finance", "Document", "gym", "receipts", "december")\n("Finance", "List", "gym", "yoga", "")\n("Finance", "List", "gym", "ballet", "")\n("Finance", "Doubt", "gym", "??", "")',
 '("Hobbies", "Document", "KWG", "hardware setup", "book")',
 

The identified issue has been resolved.

However, classifying "event support" as a "Doubt" is incorrect; it should be a "Note". In addition, the "december receipts" example has regressed: it is now split into a mix of "Document", "List", and "Doubt" facts.
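Regressions like this are easy to miss when scanning long outputs. A small helper can list the inputs whose completion changed between two runs. This is a sketch with a hypothetical function name, and it assumes both result lists are in the same order as the examples, as returned by `apply_to_examples`.

```python
def changed_outputs(examples, old_results, new_results):
    """Return (example, old, new) for every example whose output changed."""
    return [
        (example, old, new)
        for example, old, new in zip(examples, old_results, new_results)
        if old.strip() != new.strip()
    ]


examples = ["Flu shot cost = $80", "my wife wants a vegetarian food book"]
old = [
    '("Health", "Price", "", "flu shot", "$80")',
    '("Home", "Wish", "wife", "vegetarian food book", "")',
]
new = [
    '("Health", "Price", "", "flu shot", "$80")',
    '("Hobbies", "Wish", "wife", "vegetarian food book", "")',
]
for example, before, after in changed_outputs(examples, old, new):
    print(f"{example}\n  before: {before}\n  after:  {after}")
```

With a nonzero temperature some differences are just sampling noise, so a changed output is a prompt to look closer, not proof of a regression.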

#### Prompt 7

We add further examples to fix both problems.

In [None]:
def extraction_prompt_7(x, categories=["Family", "Work", "Friends", "Shopping", 
                                       "Health", "Finance", "Travel", "Home", 
                                       "Pets", "Hobbies", "Other"]):

    prompt =\
f"""
Extract pieces of personal information, like phone numbers, email addresses, names, trivia, reminders, etc., as tuples with the following format: (Category, Type, People, Key, Value)
Assume everything mentioned refers to the same thing. Constraints:
  - Allowed Categories: {', '.join(categories)}
  - Allowed Types: "List", "Email", "Phone", "Address", "Document", "Pendency", "Price", "Reminder", "Note", "Doubt", "Wish", "Other"
  - People contain the name or description of the people or organizations concerned, or is empty if no person or organization is mentioned.
  
Example input: "Mom's phone number is 555-555-5555"
Example output: ("Family", "Phone", "mom", "mom's number", "555-555-5555")

Example input: "email of the building administration = adm@example.com"
Example output: ("Work", "Email", "building administration", "email", "adm@example.com")

Example input: "Need to do: lab work, ultrasound, buy aspirin"
Example output: 
("Health", "List", "Self", "to do", "lab work")
("Health", "List", "Self", "to do", "ultrasound")
("Shopping", "List", "Self", "aspirin", "buy")	

Example input: event support: we failed :-(
Example output: 
("Other", "Note", "event support", "failed", "we failed :-(")

Example input: first aid kit in the reception
Example output:
("Home", "Note", "", "first aid kit", "reception")

Example input: december receipts for gym: yoga, ballet, ??
Example output:
("Finance", "Document", "gym", "december receipts", "yoga")
("Finance", "Document", "gym", "december receipts", "ballet")
("Finance", "Document", "gym", "december receipts", "??")

Input: {x}
Output: 
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_7, temperature=0.1) 

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "Price", "", "flu shot cost", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Work", "List", "boss", "likes", "cricket")
("Work", "List", "boss", "likes", "science")
("Work", "List", "boss", "likes", "vegetarian food")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Shopping", "Wish", "wife", "vegetarian food book", "")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Work", "Email", "sales guy", "email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Friends", "Email", "vanessa", "vanessa's email", "vanessa@outlook.com")
("Friends", "Reminder", "", "send ppts", "rember to send the ppts she asked")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Friends", "Phone", "Pedro", "Whatsapp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping", "Lis

['("Health", "Price", "", "flu shot cost", "$80")',
 '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
 '("Shopping", "Wish", "wife", "vegetarian food book", "")',
 '("Work", "Email", "sales guy", "email", "jp@example.com")',
 '("Friends", "Email", "vanessa", "vanessa\'s email", "vanessa@outlook.com")\n("Friends", "Reminder", "", "send ppts", "rember to send the ppts she asked")',
 '("Friends", "Phone", "Pedro", "Whatsapp", "+55 11 27670-0987")',
 '("Shopping", "List", "Self", "milk", "buy")\n("Shopping", "List", "Self", "eggs", "buy")\n("Shopping", "List", "Self", "bread", "buy")',
 '("Shopping", "List", "Self", "sell", "video game")\n("Shopping", "List", "Self", "sell", "chair")',
 '("Finance", "Document", "gym", "december receipts", "yoga")\n("Finance", "Document", "gym", "december receipts", "ballet")\n("Finance", "Document", "gym", "december receipts", "??")',
 '("Other", "Document", 

The output looks much better.

#### Prompt 8

Tweaking the prompt a little.

In [None]:
def extraction_prompt_8(x, categories=["Family", "Work", "Friends", "Shopping", 
                                       "Health", "Finance", "Travel", "Home", 
                                       "Pets", "Hobbies", "Other"]):

    prompt =\
f"""
You are tasked with extracting pieces of personal information from various inputs, such as phone numbers, email addresses, names, trivia, reminders, etc. Your goal is to extract and categorize these pieces of information into structured tuples.
Extract the information as tuples with the following format: (Category, Type, People, Key, Value). 
Assume everything mentioned refers to the same thing. 
The extracted information should be categorized according to the allowed categories and types specified below.

Constraints:
  - Allowed Categories: {', '.join(categories)}
  - Allowed Types: "List", "Email", "Phone", "Address", "Document", "Pendency", "Price", "Reminder", "Note", "Doubt", "Wish", "Other"
  - People contain the name or description of the people or organizations concerned, or is empty if no person or organization is mentioned.
  
Example input: "Mom's phone number is 555-555-5555"
Example output: ("Family", "Phone", "mom", "mom's number", "555-555-5555")

Example input: "email of the building administration = adm@example.com"
Example output: ("Work", "Email", "building administration", "email", "adm@example.com")

Example input: "Need to do: lab work, ultrasound, buy aspirin"
Example output: 
("Health", "List", "Self", "to do", "lab work")
("Health", "List", "Self", "to do", "ultrasound")
("Shopping", "List", "Self", "aspirin", "buy")	

Example input: event support: we failed :-(
Example output: 
("Other", "Note", "event support", "failed", "we failed :-(")

Example input: first aid kit in the reception
Example output:
("Home", "Note", "", "first aid kit", "reception")

Example input: december receipts for gym: yoga, ballet, ??
Example output:
("Finance", "Document", "gym", "december receipts", "yoga")
("Finance", "Document", "gym", "december receipts", "ballet")
("Finance", "Document", "gym", "december receipts", "??")

Input: {x}
Output: 
"""
    return prompt 

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_8, temperature=0.1) 

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "Price", "", "flu shot cost", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Work", "List", "boss", "likes", "cricket")
("Work", "List", "boss", "likes", "science")
("Work", "List", "boss", "likes", "vegetarian food")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Shopping", "Wish", "wife", "vegetarian food book", "my wife wants")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Work", "Email", "sales guy", "email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Work", "Email", "Vanessa", "Vanessa's email", "vanessa@outlook.com")
("Work", "Reminder", "Vanessa", "send ppts", "she asked")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Friends", "Phone", "Pedro", "WhatsApp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping", "List", "Self"

['("Health", "Price", "", "flu shot cost", "$80")',
 '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
 '("Shopping", "Wish", "wife", "vegetarian food book", "my wife wants")',
 '("Work", "Email", "sales guy", "email", "jp@example.com")',
 '("Work", "Email", "Vanessa", "Vanessa\'s email", "vanessa@outlook.com")\n("Work", "Reminder", "Vanessa", "send ppts", "she asked")',
 '("Friends", "Phone", "Pedro", "WhatsApp", "+55 11 27670-0987")',
 '("Shopping", "List", "Self", "milk", "buy")\n("Shopping", "List", "Self", "eggs", "buy")\n("Shopping", "List", "Self", "bread", "buy")',
 '("Shopping", "List", "Self", "sell", "old video game")\n("Shopping", "List", "Self", "sell", "chair")',
 '("Finance", "Document", "gym", "december receipts", "yoga")\n("Finance", "Document", "gym", "december receipts", "ballet")\n("Finance", "Document", "gym", "december receipts", "??")',
 '("Other", "Document", "KWG",

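Compared with the previous run, the third example changed slightly: its Value field went from empty to "my wife wants". The fifth example is now categorized as "Work" instead of "Friends", and its reminder tuple names Vanessa in the People field. Apart from minor wording differences, the remaining outputs shown look the same.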
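
One way to check which examples changed between two runs is to compare the result lists element by element. The sketch below assumes `apply_to_examples` returns the list of completions printed at the end of each cell output; `changed_examples` is a hypothetical helper, not part of the notebook. The two lists hold the first six results of the previous run and of the Prompt 8 run, copied from the outputs above.

```python
def changed_examples(run_a, run_b):
    """Return (example number, before, after) for every output that differs."""
    return [
        (number, before, after)
        for number, (before, after) in enumerate(zip(run_a, run_b), start=1)
        if before != after
    ]


# First six results of the previous run (copied from the output above).
previous_run = [
    '("Health", "Price", "", "flu shot cost", "$80")',
    '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
    '("Shopping", "Wish", "wife", "vegetarian food book", "")',
    '("Work", "Email", "sales guy", "email", "jp@example.com")',
    '("Friends", "Email", "vanessa", "vanessa\'s email", "vanessa@outlook.com")\n("Friends", "Reminder", "", "send ppts", "rember to send the ppts she asked")',
    '("Friends", "Phone", "Pedro", "Whatsapp", "+55 11 27670-0987")',
]

# First six results of the Prompt 8 run (copied from the output above).
prompt_8_run = [
    '("Health", "Price", "", "flu shot cost", "$80")',
    '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
    '("Shopping", "Wish", "wife", "vegetarian food book", "my wife wants")',
    '("Work", "Email", "sales guy", "email", "jp@example.com")',
    '("Work", "Email", "Vanessa", "Vanessa\'s email", "vanessa@outlook.com")\n("Work", "Reminder", "Vanessa", "send ppts", "she asked")',
    '("Friends", "Phone", "Pedro", "WhatsApp", "+55 11 27670-0987")',
]

for number, before, after in changed_examples(previous_run, prompt_8_run):
    print(f"Example {number} changed:")
    print("  before:", before.replace("\n", " | "))
    print("  after: ", after.replace("\n", " | "))
```

A plain string comparison also flags cosmetic differences such as capitalization, so the flagged examples still need to be read by eye.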

#### Tweaking Parameters

Keeping Prompt 8 unchanged, this run adjusts two sampling parameters, 'frequency_penalty' and 'presence_penalty', setting both to negative values.
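
To see why negative values matter, the sketch below illustrates how the two penalties adjust token scores, following the formula given in OpenAI's API documentation: each logit is reduced by `count * frequency_penalty` and, if the token has already appeared, by `presence_penalty`. The function `apply_penalties`, the token strings, and the logit values are all made up for illustration; real tokens are sub-word units and the actual adjustment happens inside the API.

```python
def apply_penalties(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Adjust logits as: logit - count * frequency_penalty - (count > 0) * presence_penalty."""
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1.0 if count > 0 else 0.0
        adjusted[token] = logit - count * frequency_penalty - seen * presence_penalty
    return adjusted


# Hypothetical scores: both candidate tokens start out equally likely.
logits = {'"Work"': 1.0, '"Friends"': 1.0}
# Assume '"Work"' has already been generated twice in this completion.
counts = {'"Work"': 2}

adjusted = apply_penalties(
    logits, counts, frequency_penalty=-0.5, presence_penalty=-0.6
)
print(adjusted)
```

With negative penalties, a token that has already been generated gets a higher score, so the model is nudged toward reusing earlier tokens rather than away from them.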

In [None]:
apply_to_examples(example_information_to_save, extraction_prompt_8, temperature=0.1, frequency_penalty=-0.5, presence_penalty=-0.6) 

>>> INPUT: Flu shot cost = $80
>>> OUTPUT:
("Health", "Price", "", "flu shot", "$80")


>>> INPUT: things my boss likes: cricket, science and vegetarian food
>>> OUTPUT:
("Work", "List", "boss", "likes", "cricket")
("Work", "List", "boss", "likes", "science")
("Work", "List", "boss", "likes", "vegetarian food")


>>> INPUT: my wife wants a vegetarian food book
>>> OUTPUT:
("Shopping", "Wish", "wife", "vegetarian food book", "my wife wants")


>>> INPUT: sales guy email = jp@example.com
>>> OUTPUT:
("Work", "Email", "sales guy", "email", "jp@example.com")


>>> INPUT: vanessa's email vanessa@outlook.com, rember to send the ppts she asked
>>> OUTPUT:
("Work", "Email", "vanessa", "vanessa's email", "vanessa@outlook.com")
("Work", "Reminder", "", "send ppts", "vanessa asked")


>>> INPUT: +55 11 27670-0987 -> pedro whatsapp
>>> OUTPUT:
("Friends", "Phone", "Pedro", "Pedro's WhatsApp", "+55 11 27670-0987")


>>> INPUT: need to buy milk, eggs and bread
>>> OUTPUT:
("Shopping", "List", "Self"

['("Health", "Price", "", "flu shot", "$80")',
 '("Work", "List", "boss", "likes", "cricket")\n("Work", "List", "boss", "likes", "science")\n("Work", "List", "boss", "likes", "vegetarian food")',
 '("Shopping", "Wish", "wife", "vegetarian food book", "my wife wants")',
 '("Work", "Email", "sales guy", "email", "jp@example.com")',
 '("Work", "Email", "vanessa", "vanessa\'s email", "vanessa@outlook.com")\n("Work", "Reminder", "", "send ppts", "vanessa asked")',
 '("Friends", "Phone", "Pedro", "Pedro\'s WhatsApp", "+55 11 27670-0987")',
 '("Shopping", "List", "Self", "milk", "buy")\n("Shopping", "List", "Self", "eggs", "buy")\n("Shopping", "List", "Self", "bread", "buy")',
 '("Shopping", "List", "Self", "sell", "old video game")\n("Shopping", "List", "Self", "sell", "chair")',
 '("Finance", "Document", "gym", "december receipts", "yoga")\n("Finance", "Document", "gym", "december receipts", "ballet")\n("Finance", "Document", "gym", "december receipts", "??")',
 '("Other", "Document", "KWG"

In the fifth example (Vanessa's email and the reminder to send the ppts), the Value field of the reminder tuple changed from "she asked" to "vanessa asked", which is easier to understand on its own. The People field of that tuple is now empty, however, and the Key of the first example was shortened from "flu shot cost" to "flu shot".
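
Since each output line is written as a Python tuple literal, the completions can be turned into real tuples and checked against the constraints stated in the prompt. The following is a minimal sketch under that assumption; `parse_tuples` and `is_allowed` are hypothetical helpers, not part of the notebook, and lines that fail to parse (for example a completion cut off by `max_tokens`) are simply skipped.

```python
import ast

# Allowed values copied from the Prompt 8 constraints.
ALLOWED_CATEGORIES = {"Family", "Work", "Friends", "Shopping", "Health",
                      "Finance", "Travel", "Home", "Pets", "Hobbies", "Other"}
ALLOWED_TYPES = {"List", "Email", "Phone", "Address", "Document", "Pendency",
                 "Price", "Reminder", "Note", "Doubt", "Wish", "Other"}


def parse_tuples(completion):
    """Parse a completion into 5-field string tuples, skipping malformed lines."""
    records = []
    for line in completion.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            value = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # not a complete Python literal, e.g. a truncated line
        if (isinstance(value, tuple) and len(value) == 5
                and all(isinstance(field, str) for field in value)):
            records.append(value)
    return records


def is_allowed(record):
    """Check the Category and Type fields against the prompt's allowed values."""
    category, type_, _people, _key, _value = record
    return category in ALLOWED_CATEGORIES and type_ in ALLOWED_TYPES


# Fifth example from the last run above.
completion = (
    '("Work", "Email", "vanessa", "vanessa\'s email", "vanessa@outlook.com")\n'
    '("Work", "Reminder", "", "send ppts", "vanessa asked")'
)
for record in parse_tuples(completion):
    print(record, "allowed" if is_allowed(record) else "not allowed")
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text that comes back from a model.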