In [1]:
! python -V

Python 3.9.16


In [2]:
import os

In [3]:
from dotenv import load_dotenv
load_dotenv(override=True)

True

In [4]:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [5]:
MAX_TOKENS = 4000
TEMP = 1
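`TEMP` sets the sampling temperature. One common way to picture it is that the model's raw scores (logits) are divided by the temperature before the softmax. Values below 1 then concentrate probability on the top token, and values above 1 spread it out. This is a sketch of the general idea, not OpenAI's implementation. The logits are made up for illustration, and `softmax_with_temperature` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a positive temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined; temperature 0 is the greedy limit
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The API itself accepts temperatures from 0 to 2. A value of 0 roughly corresponds to always picking the most likely token, which is why the sketch rejects it instead of dividing by zero.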

In [6]:
MODEL = 'gpt-3.5-turbo'

In [7]:
prompt = "Is an XGBoost Classifier a good model to use if you are interested in probability outputs? Reason through it step by step."

In [8]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

## Interacting with GPT via the OpenAI REST API and the Python `requests` library

In [9]:
import requests
import json

API_ENDPOINT = "https://api.openai.com/v1/chat/completions"

def generate_chat_completion(messages, model, temperature=1, max_tokens=None):
    """Send a chat request to the OpenAI REST API and return the reply text."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {OPENAI_API_KEY}",
    }
    data = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }
    # Only send max_tokens when given; otherwise the API uses its own default
    if max_tokens is not None:
        data["max_tokens"] = max_tokens
    # timeout (seconds) keeps the call from hanging indefinitely
    response = requests.post(API_ENDPOINT, headers=headers, data=json.dumps(data), timeout=60)
    if response.status_code == 200:
        # The reply text lives in the first choice's message
        return response.json()["choices"][0]["message"]["content"]
    raise Exception(f"Error {response.status_code}: {response.text}")
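The function pulls the reply out of `response.json()` with `["choices"][0]["message"]["content"]`. The sketch below walks that path on a trimmed-down, hypothetical response dict. Its top-level keys match the ones the SDK returns further down (`id`, `object`, `created`, `model`, `choices`, `usage`), but every value is a placeholder, not real API output. `extract_reply` is a hypothetical helper name.

```python
# Placeholder response: the structure follows the chat completions format,
# but every value is made up for illustration.
sample_response = {
    "id": "chatcmpl-placeholder",
    "object": "chat.completion",
    "created": 0,
    "model": "gpt-3.5-turbo",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Placeholder reply."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 3, "total_tokens": 13},
}

def extract_reply(response_json):
    """Return the text and finish reason of the first choice."""
    choice = response_json["choices"][0]
    # finish_reason == "length" means the reply hit max_tokens and was cut off
    return choice["message"]["content"], choice.get("finish_reason")

text, reason = extract_reply(sample_response)
print(text, reason)
```

Checking `finish_reason` is a cheap way to tell a complete reply from one that ran out of token budget.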


In [10]:
response_text = generate_chat_completion(
    messages,
    model=MODEL,
    temperature=TEMP,
    max_tokens=MAX_TOKENS,
)
print(response_text)

XGBoost (Extreme Gradient Boosting) is a powerful and popular machine learning algorithm that works well in various tasks, including classification. However, when it comes to probability outputs, there are some important considerations to take into account.

1. XGBoost uses decision trees as base models: XGBoost is an ensemble method that combines multiple decision trees, which are typically used as base models. Decision trees are not designed to provide probability outputs directly; instead, they make binary decisions at each node based on a threshold. Therefore, by default, XGBoost outputs the predicted class labels rather than probabilities.

2. Applying a sigmoid function: To convert the raw outputs into probabilities, you can apply a sigmoid function, which maps the outputs to a range between 0 and 1. This allows you to interpret the output as a probability. The sigmoid function transforms the output of each decision tree into a probability estimation, which can then be averaged o

## Interacting with GPT via the `openai` Python SDK

In [11]:
import openai

In [12]:
openai.api_key = OPENAI_API_KEY

In [13]:
chat_completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)

In [14]:
type(chat_completion)

openai.openai_object.OpenAIObject

In [15]:
chat_completion.keys()

dict_keys(['id', 'object', 'created', 'model', 'choices', 'usage'])
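The `usage` entry reports `prompt_tokens`, `completion_tokens` and `total_tokens`. The prompt and the reply share the model's context window. On a model with a 4,096-token window, such as the earlier `gpt-3.5-turbo` snapshots, `MAX_TOKENS = 4000` therefore leaves fewer than 100 tokens for the prompt. The helpers below are a hypothetical sketch of that budget check. The window size is passed in rather than hard-coded because it differs between model versions, and the numbers in the example are assumptions.

```python
def fits_context(prompt_tokens, max_tokens, context_window):
    """Check that the prompt plus the requested reply length fit in the window."""
    return prompt_tokens + max_tokens <= context_window

def remaining_reply_budget(prompt_tokens, context_window):
    """Largest max_tokens value that still fits alongside the prompt."""
    return max(context_window - prompt_tokens, 0)

# Assumed numbers: a 4,096-token window and a 60-token prompt
print(fits_context(60, 4000, 4096))  # 60 + 4000 = 4060 <= 4096, so True
print(remaining_reply_budget(60, 4096))
```

With the SDK response above, the real prompt size is available as `chat_completion.usage.prompt_tokens`.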

In [16]:
print(chat_completion.choices[0].message.content)

Yes, an XGBoost Classifier is a good model to use if you are interested in probability outputs. Let's reason through it step by step:

1. XGBoost is an optimized implementation of Gradient Boosting, which is an ensemble learning method. It combines the predictions of multiple weak models (decision trees in the case of XGBoost) to make more accurate predictions.

2. XGBoost uses a technique called gradient boosting to train its models. Gradient boosting trains models using an additive strategy, where each new model is trained to correct the mistakes made by the previous models. This iterative approach helps the model improve its predictions over time.

3. XGBoost provides a parameter called "objective" that allows you to define the loss function to be optimized during the training process. For binary classification tasks, you can set the objective as "binary:logistic" which uses a logistic function to output probabilities.

4. The logistic function (also known as the sigmoid function) m
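The reply's point about `binary:logistic` can be made concrete. In gradient boosting, the trees' outputs for a sample are added up (together with a base score) into a raw margin. The logistic (sigmoid) function then maps that margin to a probability, and for several classes a softmax over per-class margins plays the same role. Below is a minimal pure-Python sketch of that idea with made-up leaf values. It is not XGBoost's code.

```python
import math

def sigmoid(z):
    """Map a raw margin to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(margins):
    """Map per-class margins to probabilities that sum to 1."""
    m = max(margins)
    exps = [math.exp(x - m) for x in margins]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical outputs of three trees for one sample, plus a base margin of 0
tree_outputs = [0.4, -0.1, 0.3]
margin = 0.0 + sum(tree_outputs)
print(sigmoid(margin))

# With two classes, softmax over [0, margin] gives the same positive-class probability
print(softmax([0.0, margin])[1])
```

The last line shows why the sigmoid and softmax descriptions agree in the binary case: `softmax([0, z])[1]` equals `e^z / (1 + e^z)`, which is `sigmoid(z)`.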

In [17]:
print(chat_completion['choices'][0]['message']['content'])

Yes, an XGBoost Classifier is a good model to use if you are interested in probability outputs. Let's reason through it step by step:

1. XGBoost is an optimized implementation of Gradient Boosting, which is an ensemble learning method. It combines the predictions of multiple weak models (decision trees in the case of XGBoost) to make more accurate predictions.

2. XGBoost uses a technique called gradient boosting to train its models. Gradient boosting trains models using an additive strategy, where each new model is trained to correct the mistakes made by the previous models. This iterative approach helps the model improve its predictions over time.

3. XGBoost provides a parameter called "objective" that allows you to define the loss function to be optimized during the training process. For binary classification tasks, you can set the objective as "binary:logistic" which uses a logistic function to output probabilities.

4. The logistic function (also known as the sigmoid function) m
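Both access styles work because the pre-1.0 `openai` SDK used here returns an `OpenAIObject` that is dict-like (as `.keys()` above shows) and also exposes its keys as attributes. The toy `AttrDict` class below is a hypothetical sketch of that pattern, not the SDK's implementation.

```python
class AttrDict(dict):
    """A dict whose keys can also be read as attributes, including nested ones."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            return self[name]  # __getitem__ handles wrapping nested values
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        return self._wrap(super().__getitem__(key))

    @classmethod
    def _wrap(cls, value):
        # Wrap nested dicts and lists (as new objects) so chained access works
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            return cls(value)
        if isinstance(value, list):
            return [cls._wrap(v) for v in value]
        return value

response = AttrDict({"choices": [{"message": {"content": "hi"}}]})
print(response.choices[0].message.content)
print(response["choices"][0]["message"]["content"])
```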

# LangChain

## Interacting with GPT via `langchain.llms`

In [40]:
from langchain.llms import OpenAI

chat_params = {
    "model": "gpt-3.5-turbo",
    "openai_api_key": OPENAI_API_KEY,
    "temperature": 0.5,
    "max_tokens": 4000,
}

llm = OpenAI(**chat_params)

llm(prompt)

InvalidRequestError: This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?
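The error message explains the cause. `langchain.llms.OpenAI` sends requests to the legacy `v1/completions` endpoint, which takes a single `prompt` string. Chat models such as `gpt-3.5-turbo` are served by `v1/chat/completions`, which takes a list of role-tagged `messages`. The hypothetical helper below builds the two request bodies side by side to show the difference. It does not call the API.

```python
COMPLETIONS_URL = "https://api.openai.com/v1/completions"
CHAT_COMPLETIONS_URL = "https://api.openai.com/v1/chat/completions"

def build_request(model, prompt, chat=True, **params):
    """Return (url, body) for either the chat or the legacy completions endpoint."""
    if chat:
        # Chat endpoint: the prompt becomes a user message inside a list
        url = CHAT_COMPLETIONS_URL
        body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    else:
        # Legacy completions endpoint: a bare prompt string
        url = COMPLETIONS_URL
        body = {"model": model, "prompt": prompt}
    body.update(params)  # e.g. temperature, max_tokens
    return url, body

print(build_request("gpt-3.5-turbo", "Hello", temperature=0.5))
print(build_request("text-davinci-003", "Hello", chat=False, temperature=0.9))
```

This is also why the next cell succeeds. `OpenAI()` without a `model` argument falls back to `text-davinci-003`, a completions model.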

In [36]:
llm = OpenAI(temperature=0.9)

In [37]:
llm(prompt)

'\n\nYes, an XGBoost Classifier can be a good model to use if you are interested in probability outputs. XGBoost is an advanced version of the popular Gradient Boosting Machine (GBM) algorithm, which is a widely used machine learning algorithm for supervised learning and regression problems. XGBoost has a number of advantages over traditional GBM, such as better accuracy, faster training time, automated feature selection, and the ability to produce probability outputs. XGBoost is also scalable and can handle large datasets. In terms of probability outputs, XGBoost can be used to output the predicted probability of a given data point belonging to a certain class. This can be useful for a number of applications, such as predicting the likelihood of a customer buying a certain product, or the probability of a patient developing a certain disease. XGBoost can also be used to fine-tune the probability output by calibrating the model to ensure the actual output matches the predicted probabil

In [39]:
llm.model_name

'text-davinci-003'

In [None]:

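from langchain.chat_models import ChatOpenAI

# With no arguments, ChatOpenAI uses its default chat model and
# reads the API key from the OPENAI_API_KEY environment variable
chat_model = ChatOpenAI()

chat_model.predict("hi!")  # e.g. returns "Hi"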

## Interacting with GPT via `langchain.chat_models`

In [18]:
from langchain.chat_models import ChatOpenAI

In [19]:
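chat_params = {
    "model": "gpt-3.5-turbo",  # chat model, served by v1/chat/completions
    "openai_api_key": OPENAI_API_KEY,
    "temperature": 0.5,  # moderate randomness; lower values give more deterministic replies
    "max_tokens": 4000,  # cap on reply length; prompt + reply must fit in the context window
}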

In [20]:
chat_model = ChatOpenAI(**chat_params)

In [21]:
type(chat_model)

langchain.chat_models.openai.ChatOpenAI

In [22]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [23]:
reply = chat_model([HumanMessage(content=prompt)])

In [24]:
type(reply)

langchain.schema.messages.AIMessage

In [25]:
print(reply.content)

Yes, an XGBoost Classifier is a good model to use if you are interested in probability outputs. Here's a step-by-step reasoning:

1. XGBoost is an optimized implementation of the gradient boosting algorithm, which is known for its ability to provide accurate predictions. It combines the predictions of multiple weak models (decision trees) to create a strong ensemble model.

2. XGBoost provides a built-in probability output feature. By default, it outputs the predicted class label, but it also allows you to obtain the probability of each class prediction. This is achieved by applying a softmax function to the final output of the model, which converts the raw predictions into probabilities.

3. The probability outputs from XGBoost can be useful in various scenarios. For example, in binary classification problems, you can interpret the probability as the confidence of the model in predicting each class. This can help in determining the threshold for making predictions based on the desired