# Introduction

<p>
This notebook creates a custom chatbot for a real estate agent by using OpenAI's <a href="https://platform.openai.com/docs/guides/gpt/chat-completions-api">ChatCompletion</a> API and ChatGPT's ability to request <a href="https://platform.openai.com/docs/guides/gpt/function-calling">function calls</a>. ChatGPT is told to take on the persona of a realtor and to help the client narrow down a list of available properties by requesting features such as available parking, floor space, etc.
</p>

# Dependencies

<p>
If you experience issues running this notebook, install the following dependencies in a virtual environment.
</p>

In [None]:
!pip3 install pandas==2.0.3 openai==0.28.1 termcolor==2.3.0 tenacity==8.2.3

# Imports

In [1]:
import json
import textwrap

import pandas as pd
import openai

from tenacity import retry, stop_after_attempt, wait_random_exponential
from termcolor import colored

# Data

## Exploration

<p>
The <a href="https://www.kaggle.com/datasets/yasserh/housing-prices-dataset/">Housing Prices Dataset</a>, available on Kaggle, contains the price of a house along with such features as the number of bedrooms, whether it has a basement (as yes/no), and whether it is unfurnished, semi-furnished, or fully furnished. Neither the currency nor the units for the floor area are given, so let's use euro and m&#178;, respectively.
</p>
<p>
    Most of the column names are self-explanatory, but <em>prefarea</em> is unclear. It is possibly a shortening of <em>preferred area</em>, which is not much clearer. It may indicate that the area has a higher perceived social status, although that kind of thing is usually reflected in the price. In any case, its exact meaning is not important for this notebook, so let's take it to mean that the area is trendy/popular.
</p>

In [2]:
housing_data = pd.read_csv("housing_data.csv")
print(housing_data.dtypes)
housing_data.head()

price                int64
area                 int64
bedrooms             int64
bathrooms            int64
stories              int64
mainroad            object
guestroom           object
basement            object
hotwaterheating     object
airconditioning     object
parking              int64
prefarea            object
furnishingstatus    object
dtype: object


Unnamed: 0,price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus
0,13300000,7420,4,2,3,yes,no,no,no,yes,2,yes,furnished
1,12250000,8960,4,4,4,yes,no,no,no,yes,3,no,furnished
2,12250000,9960,3,2,2,yes,no,yes,no,no,2,yes,semi-furnished
3,12215000,7500,4,2,2,yes,no,yes,no,yes,3,yes,furnished
4,11410000,7420,4,1,2,yes,yes,yes,no,yes,2,no,furnished


## Cleaning

<p>
    Let's change each of the yes/no values to 1/0. This allows the same functions to filter those as well as the numerical values. Similarly, <em>furnishingstatus</em> is changed to 0=unfurnished, 1=semi-furnished, and 2=fully furnished.
</p>

<p>
    The resulting DataFrame, in which all data are now int64, is shown below.
</p>
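
<p>
As a rough illustration of the encoding described above, here is a minimal sketch that uses a plain dictionary in place of a DataFrame row. The names <em>YES_NO</em>, <em>FURNISHING</em>, and <em>row</em> are made up for this example and are not part of the notebook's code.
</p>

```python
# Hypothetical lookup tables for the encoding described in the text.
YES_NO = {"no": 0, "yes": 1}
FURNISHING = {"unfurnished": 0, "semi-furnished": 1, "furnished": 2}

# Stand-in for a single row of the dataset (assumption: plain dict, not pandas).
row = {"basement": "yes", "airconditioning": "no", "furnishingstatus": "semi-furnished"}

encoded = {
    "basement": YES_NO[row["basement"]],
    "airconditioning": YES_NO[row["airconditioning"]],
    "furnishingstatus": FURNISHING[row["furnishingstatus"]],
}
print(encoded)
```

<p>
One design difference worth noting: an explicit lookup table raises a KeyError on an unexpected label, whereas a rule of the form "0 if the value is no, otherwise 1" silently maps anything else to 1.
</p>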

In [3]:
def load_housing_data():
    """
    Load the housing price dataset from file.
    Change yes/no values to 1/0
    Change furnishingstatus to 0=unfurnished, 1=semi-furnished, and 2=fully furnished
    Return cleaned dataset.
    """
    furn = ["unfurnished", "semi-furnished", "furnished"]
    hd = pd.read_csv("housing_data.csv")
    for col_name in ["mainroad", "guestroom", "basement", "hotwaterheating", "airconditioning", "prefarea"]:
        hd[col_name] = hd[col_name].apply(lambda x: 0 if x == "no" else 1)
    hd["furnishingstatus"] = hd["furnishingstatus"].apply(lambda x: furn.index(x))
    return hd


housing_data = load_housing_data()
print(housing_data.dtypes)
housing_data.head()

price               int64
area                int64
bedrooms            int64
bathrooms           int64
stories             int64
mainroad            int64
guestroom           int64
basement            int64
hotwaterheating     int64
airconditioning     int64
parking             int64
prefarea            int64
furnishingstatus    int64
dtype: object


Unnamed: 0,price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus
0,13300000,7420,4,2,3,1,0,0,0,1,2,1,2
1,12250000,8960,4,4,4,1,0,0,0,1,3,0,2
2,12250000,9960,3,2,2,1,0,1,0,0,2,1,1
3,12215000,7500,4,2,2,1,0,1,0,1,3,1,2
4,11410000,7420,4,1,2,1,1,1,0,1,2,0,2


## Functions

<p>
    Here are 3 functions, <em>&lt;=</em>, <em>&gt;=</em>, and <em>==</em>, to filter the dataset, and one to reset it back to its original state. The idea is that, for example, if a person says they are a two-car family, ChatGPT can select only properties where <em>parking &gt;= 2</em>. Or, if a customer wants aircon, ChatGPT can request to filter by <em>airconditioning == 1</em>. If the customer isn't happy with the selection, ChatGPT can <em>reset</em> the dataset and reapply the filter.
</p>
<p>
<strong>Note:</strong> The functions all take <em>*args</em> as a parameter to obviate the need for the function call logic in Realtor to check which function is being called.
</p>
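
<p>
The point of the uniform <em>*args</em> signature can be sketched in isolation. The example below is an assumption-laden stand-in: it uses a list of dictionaries instead of a DataFrame, and the names <em>keep_gte</em>, <em>keep_equal</em>, <em>start_over</em>, and <em>dispatch</em> are hypothetical. It only shows that the caller can use one calling convention for every entry in the map, including the one that ignores its arguments.
</p>

```python
# Stand-in data: plain dicts instead of a pandas DataFrame (assumption for brevity).
houses = [
    {"price": 4_000_000, "parking": 2, "airconditioning": 1},
    {"price": 6_000_000, "parking": 0, "airconditioning": 1},
    {"price": 3_500_000, "parking": 3, "airconditioning": 0},
]


def keep_gte(*args):
    """args: (rows, column name, minimum value)."""
    rows, name, value = args
    kept = [row for row in rows if row[name] >= value]
    return kept, len(kept)


def keep_equal(*args):
    """args: (rows, column name, exact value)."""
    rows, name, value = args
    kept = [row for row in rows if row[name] == value]
    return kept, len(kept)


def start_over(*args):
    """Ignore whatever is passed in and return the full list."""
    return list(houses), len(houses)


# Hypothetical dispatch table mapping names to functions.
dispatch = {"keep_gte": keep_gte, "keep_equal": keep_equal, "start_over": start_over}

# The caller never needs to check which function it is calling.
rows, count = dispatch["keep_gte"](houses, "parking", 2)
print(count)
rows, count = dispatch["keep_equal"](rows, "airconditioning", 1)
print(count)
rows, count = dispatch["start_over"](rows)
print(count)
```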

In [None]:
def filter(housing_data:pd.DataFrame,
           condition: pd.core.series.Series):
    """
    Filter out rows from the DataFrame by applying given condition
    Return the reduced DataFrame and its length
    """
    housing_data = housing_data[condition]
    return housing_data, len(housing_data)


def filter_lte(*args):
    """
    args[0]: housing data DataFrame
    args[1]: filter/column name
    args[2]: filter value

    Return housing data where filter name is less than or equal to filter value
    """
    housing_data = args[0]
    condition = housing_data[args[1]] <= args[2]
    return filter(housing_data, condition)


def filter_gte(*args):
    """
    args[0]: housing data DataFrame
    args[1]: filter/column name
    args[2]: filter value

    Return housing data where filter name is more than or equal to filter value
    """
    housing_data = args[0]
    condition = housing_data[args[1]] >= args[2]
    return filter(housing_data, condition)


def filter_equal(*args):
    """
    args[0]: housing data DataFrame
    args[1]: filter/column name
    args[2]: filter value

    Return housing data where filter name is equal to filter value
    """
    housing_data = args[0]
    condition = housing_data[args[1]] == args[2]
    return filter(housing_data, condition)


def reset(*args):
    """
    args: Not used, but included to avoid the need for conditionals in function call logic
    """
    housing_data = load_housing_data()
    return housing_data, len(housing_data)


FUNCTION_MAP = {
    "filter_lte": filter_lte,
    "filter_gte": filter_gte,
    "filter_equal": filter_equal,
    "reset": reset
}

# OpenAI ChatGPT

## API Key

<p>
API keys must be kept private (in fact, OpenAI reserves the right to revoke any key it suspects has been made public) because calls to the API are billed.
</p>
<p>
The key is stored in a file in the parent directory. There are two reasons for this. Firstly, it makes the key readily available to other ChatGPT projects in this directory. Secondly, keeping it out of a git folder keeps it from being accidentally pushed to a public repo.
</p>

In [None]:
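try:
    # Strip whitespace so a trailing newline in the file doesn't end up in the key
    with open("../api_key", "r") as key_file:
        openai.api_key = key_file.read().strip()
    print("API Key loaded")
except FileNotFoundError:
    print("Couldn't find the API key file")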

## Function Specifications

<p>
ChatGPT can't actually call functions or otherwise execute code. However, it can be made aware of available functions and what they do, which it can then request to be executed. This is done with function specifications like the ones below. Each spec contains a name, a description, and the expected parameters (if any). ChatGPT parses this info to decide whether any of these functions can provide useful info.
</p>
<p>
Notice that the specs below don't fully match the actual functions above. This is because ChatGPT doesn't need to know that the functions return the reduced property DataFrame; only that they return the number of properties remaining after a filter has been applied, or after the property list has been reset. Again, ChatGPT has no knowledge of the functions, or even what language they're written in; it only knows what is in the specs.
</p>
<p>
    More information can be found in the <a href="https://platform.openai.com/docs/guides/gpt/function-calling">docs</a>.
</p>
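
<p>
Because the model only sees the spec, it can be worth checking a requested call against that spec before executing anything. The sketch below is one possible check, not something this notebook does: <em>missing_arguments</em> is a hypothetical helper, and the <em>function_call</em> payload is hand-written to mimic the shape of a request, in which the arguments arrive as a JSON string.
</p>

```python
import json

# A trimmed spec in the same shape as the ones used in this notebook.
spec = {
    "name": "filter_gte",
    "parameters": {
        "type": "object",
        "properties": {
            "filter_name": {"type": "string"},
            "filter_value": {"type": "integer"},
        },
        "required": ["filter_name", "filter_value"],
    },
}

# Hand-written example payload; "arguments" is a JSON string, not a dict.
function_call = {
    "name": "filter_gte",
    "arguments": '{"filter_name": "parking", "filter_value": 2}',
}


def missing_arguments(spec, function_call):
    """Return the required parameter names absent from the call's arguments."""
    supplied = json.loads(function_call["arguments"])
    required = spec["parameters"].get("required", [])
    return [name for name in required if name not in supplied]


print(missing_arguments(spec, function_call))
```

<p>
An empty list means every required parameter was supplied. A non-empty list could be sent back to the model as the function result so that it can retry.
</p>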

In [None]:
FUNCTION_SPECS = [
    {
        "name": "filter_lte",
        "description": "Remove properties for which the value is more than the filter value. Return the number of remaining properties.",
        "parameters": {
            "type": "object",
            "properties": {
                "filter_name": {
                    "type": "string",
                    "description": "The characteristic to filter by.",
                },
                "filter_value": {
                    "type": "integer",
                    "description": "The maximum allowable value of the characteristic.",
                }
            },
            "required": ["filter_name", "filter_value"],
        }
    },
    {
        "name": "filter_gte",
        "description": "Remove properties for which the value is less than the filter value. Return the number of remaining properties.",
        "parameters": {
            "type": "object",
            "properties": {
                "filter_name": {
                    "type": "string",
                    "description": "The characteristic to filter by.",
                },
                "filter_value": {
                    "type": "integer",
                    "description": "The minimum allowable value of the characteristic.",
                }
            },
            "required": ["filter_name", "filter_value"],
        }
    },
    {
        "name": "filter_equal",
        "description": "Keep only properties which match the filter value. Return the number of remaining properties.",
        "parameters": {
            "type": "object",
            "properties": {
                "filter_name": {
                    "type": "string",
                    "description": "The characteristic to filter by.",
                },
                "filter_value": {
                    "type": "integer",
                    "description": "The exact value to match.",
                }
            },
            "required": ["filter_name", "filter_value"],
        }
    },
    {
        "name": "reset",
        "description": "Reset the list of houses to its original state and return the number of properties.",
        "parameters": {
            "type": "object",
            "properties": {}
        }
    }
]

## Realtor ChatBot

<p>
The conversation functionality is wrapped up in the Realtor class. It's initialised with the system content (see below), the function specs discussed above, and the function map, and then the <em>converse()</em> method keeps the conversation going by making calls to the ChatGPT API.
</p>
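
<p>
The heart of such a conversation is a loop that keeps servicing function-call requests until the model answers in plain text. The following sketch imitates that loop without touching the API: <em>fake_completion</em> returns canned messages, and <em>count_items</em> is a made-up function, so everything here is an assumption for illustration only.
</p>

```python
# Scripted stand-in for the API: each call returns the next canned message.
# (Assumption: real responses would come from the chat completion endpoint.)
scripted_replies = iter([
    {"role": "assistant", "content": None,
     "function_call": {"name": "count_items", "arguments": "{}"}},
    {"role": "assistant", "content": "There are 3 items."},
])


def fake_completion(convo):
    """Pretend to send the conversation and return the next scripted reply."""
    return next(scripted_replies)


def count_items():
    """Hypothetical function the model is allowed to request."""
    return 3


available = {"count_items": count_items}

convo = [{"role": "user", "content": "How many items are there?"}]
message = fake_completion(convo)

# Keep servicing function calls until the model replies with ordinary text.
while message.get("function_call"):
    convo.append(message)
    name = message["function_call"]["name"]
    result = available[name]()
    convo.append({"role": "function", "name": name, "content": str(result)})
    message = fake_completion(convo)

convo.append(message)
print([m["role"] for m in convo])
```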

<p>
The <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature"><em>temperature</em></a> parameter of <em>openai.ChatCompletion.create</em> controls how much randomness a response has, and picking the best value for <em>temperature</em> is task specific (see <a href="https://platform.openai.com/docs/guides/gpt/how-should-i-set-the-temperature-parameter">guidelines</a>). Here, it's left at the default value of 1.
</p>

<p>
<strong>Note:</strong> the API Reference for <em>temperature</em> linked above states a range from 0 to 2. Lower values make the output more deterministic and higher values make it more random, so the default of 1 sits in the middle of the range.
</p>

In [None]:
class Realtor:

    def __init__(self, system_content: str, function_specs: list, function_map: dict, model: str = "gpt-3.5-turbo", debug=False):
        self.model = model
        self.function_specs = function_specs
        self.function_map = function_map
        self.housing_data = load_housing_data()
        self.convo: list = [{"role": "system", "content": system_content.format(len(self.housing_data))}]
        self.debug = debug

    @retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
    def call_completion_api(self):
        """
        Send an API request to ChatGPT and return the response message
        """
        completion = openai.ChatCompletion.create(
            model=self.model,
            messages=self.convo,
            functions=self.function_specs,
            temperature=1.0
        )

        return completion.choices[0]["message"]

    def converse(self):
        """
        Get a greeting from ChatGPT
        Loop until user quits
            Get user input
            Get ChatGPT response
            while ChatGPT requests a function call
                execute function
                send function response to ChatGPT
            
        """
        # Get the initial welcome from ChatGPT
        assistant_message = self.call_completion_api()
        self.remember(assistant_message)

        while True:
            # Get user input and add it to the convo (or quit on 'q')
            user_input = input("> ")
            if user_input == "q":
                break
            self.remember({"role": "user", "content": user_input})

            # Get ChatGPT response to user input
            assistant_message = self.call_completion_api()

            # Handle function calls until ChatGPT is happy
            while assistant_message.get("function_call"):
                # Record the function call
                self.remember(assistant_message)

                # Get result of the function call
                function_result = self.call_function(assistant_message["function_call"])
                self.remember(function_result)

                # Send convo updated with function result to ChatGPT
                assistant_message = self.call_completion_api()

            # Remember initial response to user input or ChatGPT response to function result
            self.remember(assistant_message)

    def remember(self, data):
        """
        Add the message to the convo history and display it
        """
        colour = None
        print_message = ""   # may become a str or a dict, depending on the message
        convo_message = None  # always set to a dict by one of the branches below
        
        if data["role"] == "assistant":
            convo_message = data.to_dict_recursive()
            if data.get("function_call"):
                # function call
                if self.debug:
                    colour = "red"
                    print_message = data.function_call.to_dict_recursive()
            else:
                # assistant message
                colour = "blue"
                print_message = data.content
        elif data["role"] == "user":
            # user content
            convo_message = data
            # colour = "black"
            # print_message = data["content"]
        else:
            # function response
            convo_message = data
            if self.debug:
                colour = "green"
                print_message = data

        self.convo.append(convo_message)
        if len(print_message) > 0:
            print(textwrap.fill(colored(print_message, colour), width=120, break_long_words=False))

    def call_function(self, function_call):
        """
        Extract the requested function and its parameters, and return the execution result
        """

        # Find the function to call
        function_name = function_call["name"]
        function_to_call = self.function_map[function_name]

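        # Get the function parameters by name, so that the order in which the
        # model emits them doesn't matter (reset supplies none)
        function_args = json.loads(function_call["arguments"])
        arg_values = [function_args[k] for k in ("filter_name", "filter_value") if k in function_args]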

        # Call the function
        self.housing_data, num_properties = function_to_call(
            *[self.housing_data, *arg_values]
        )

        # Return the function result in ChatGPT message format
        return {"role": "function", "name": function_name, "content": str(num_properties)}

## System Content

<p>
This preamble gives ChatGPT the context for the conversation. It can direct ChatGPT to adopt a persona, follow certain rules, etc. See more on this in the <a href="https://platform.openai.com/docs/guides/gpt/chat-completions-api">docs</a>.
</p>
<p>
    This one tells ChatGPT that it is to play the part of a realtor, and describes the dataset. ChatGPT's job is to help the customer to reduce the options to a manageable number.
</p>
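
<p>
The property count reaches the prompt through ordinary string formatting. A minimal sketch, using a shortened hypothetical template and a list standing in for the dataset:
</p>

```python
# Hypothetical, shortened template; "{}" is filled in by str.format.
template = "You are a realtor with a list initially containing {} properties."
properties = ["house A", "house B", "house C"]  # stand-in for the dataset

system_message = {"role": "system", "content": template.format(len(properties))}
print(system_message["content"])

# Literal braces in a template must be doubled, or str.format treats them
# as placeholders.
escaped = "Use the form {{name}} for one of {} properties.".format(3)
print(escaped)
```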

In [None]:
REALTOR_SYSTEM_CONTENT = \
"""
You are a friendly, helpful realtor with a list initially containing {} properties.
For each property, we have the following characteristics:
    price (asking price in euro),
    area (total floor area in metres squared),
    bedrooms (number of bedrooms),
    bathrooms (number of bathrooms),
    stories (number of stories),
    mainroad (1 if the property is on a main road, otherwise 0),
    guestroom (1 if the property has a guestroom, otherwise 0),
    basement (1 if the property has a basement, otherwise 0),
    hotwaterheating (1 if the property has a water heating system, otherwise 0),
    airconditioning (1 if the property has air conditioning, otherwise 0),
    parking (number of parking spaces),
    prefarea (1 if the property is in a trendy neighbourhood, otherwise 0),
    furnishingstatus (0 if unfurnished, 1 if semi-furnished, 2 if fully furnished)
Welcome the customer to Pricey Properties, Inc. and offer to help narrow this selection based on their requirements.
If the customer changes their mind about a feature, reset the list and reapply previous filters alongside the corrected requirement.
If the number of properties cannot be reduced further, advise the customer to contact the office directly.
"""

## The Conversation

<p>
The system content, function specifications, and function map are passed to a new Realtor object and a conversation is initiated. User input is printed in black, and ChatGPT responses in blue. <em>debug</em> is set to True, so in addition to the conversation, ChatGPT's function call requests appear in red, and the function call result in green.
</p>

In [None]:
realtor = Realtor(REALTOR_SYSTEM_CONTENT, FUNCTION_SPECS, FUNCTION_MAP, debug=True)
realtor.converse()

# Sample Conversations

<p>
Let's look at some sample conversations. They all result from the same temperature and system content.
</p>

## Quick and Easy
<p>
In this, 3 pieces of information are stated together: mortgage approval for 5000000, 2 cars, and a guesthouse (which should, of course, be guestroom). ChatGPT understands that mortgage approval relates to price, 2 cars means 2 parking spaces required, and the typo of guesthouse is taken as a guestroom. It then requests function calls to filter for each of these in turn before replying that the list has been narrowed to 4 properties.
</p>

<p><img src="examples/quick_and_easy.png" width=950/></p>

## Long Way Round

<p>
With a similar input, changing <em>mortgage approval for</em> to <em>a mortgage for</em> (also bearing in mind that <em>temperature=1</em>), ChatGPT takes a different approach. It now carefully confirms each criterion in turn, eventually getting to the same result as the previous example. It also offers details of the properties, but unfortunately doesn't actually have any. In a production chatbot, the dataset would contain a sales description of the property.
</p>

<p><img src="examples/long_way_round.png" width=950/></p>

## Changing a criterion
### Good Result
<p>
Here, ChatGPT first selects properties within budget, then correctly surmises that a basement is a good place to store wine. Unsatisfied with the selection, the customer increases the budget, and ChatGPT immediately resets the dataset and applies the updated budget criterion and reapplies the basement requirement.
</p>
<p><img src="examples/reset_good.png" width=950/></p>

### Bad Result
<p>
This time, instead of confidently saying <em>increase the budget</em>, the client meekly suggests <em>I can increase the budget</em>. ChatGPT doesn't reset the data, but tries to filter for the new higher budget, which of course leaves things unchanged. Asked if it actually performed the required task, ChatGPT tries the same filter again, sees that it still doesn't change anything, and now performs the reset and applies the altered requirements.
</p>
<p><img src="examples/reset_bad.png" width=950/></p>
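
<p>
Why the higher budget changes nothing without a reset can be shown with a few numbers. The prices and the helper <em>at_most</em> below are invented for this sketch. The point is only that each filter discards rows, so a looser filter applied afterwards has nothing to bring back.
</p>

```python
prices = [3_000_000, 4_500_000, 5_500_000, 7_000_000]  # hypothetical asking prices


def at_most(values, limit):
    """Keep only the values that do not exceed the limit."""
    return [v for v in values if v <= limit]


narrowed = at_most(prices, 5_000_000)

# Raising the limit on the already-narrowed list restores nothing, because
# the dearer properties were discarded by the first filter.
still_narrowed = at_most(narrowed, 6_000_000)

# Starting again from the full list gives the intended result.
after_reset = at_most(prices, 6_000_000)

print(len(narrowed), len(still_narrowed), len(after_reset))
```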

## Nonsensical Request

<p>
Asked for properties without wheels, ChatGPT filters for properties with exactly 1 story. Garbage in, garbage out.
</p>
<p><img src="examples/no_wheels.png" width=950/></p>

## Taking a Hint

<p>
ChatGPT can make some connections, such as knowing a basement is a cool, dark place suitable for storing wine (see above), but here it's a little slower on the uptake. It offers condolences for the unfortunate loss of furniture, but requires more direct prompting before it searches for furnished properties.
</p>
<p><img src="examples/furniture_hints.png" width=950/></p>

## Code

<p>
This is the initial welcome message. It serves as a reminder to test your app thoroughly, tune the temperature parameter, and check the system content carefully.
</p>
<p><img src="examples/code.png" width=950/></p>

# Conclusion

<p>
This notebook shows that it's quite simple to create a custom ChatGPT-based chatbot using your own business data. ChatGPT does all the heavy lifting of NLP, and the rest is fairly standard software engineering. The steps involved are:
</p>
<ul>
<li>Explore/clean the data</li>
<li>Define the data manipulation functions</li>
<li>Write function specs (although we can tell ChatGPT little white lies on the details)</li>
<li>Create the system content, or context in which ChatGPT operates</li>
<li>Iteratively call the ChatCompletion API, calling functions as requested and returning the result</li>
</ul>
<p>
The above chatbot is very simple, and, as can be seen from the sample conversations, would not be put into production as is. However, the quick development time for such a prototype means that it's not beyond the resources of even small entities to implement their own chatbot.
</p>