<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/tapi-logo-small.png" />

This notebook is free for educational reuse under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/).

Created by [Erik Fredner](https://fredner.org) for the 2024 Text Analysis Pedagogy Institute, with support from [Constellate](https://constellate.org).

For questions/comments/improvements, email erik@fredner.org<br />

Repo: https://github.com/erikfredner/tap-2024
____

# Automated Text Classification Using LLMs

This is lesson 1 of 3 in the educational series on using large language models (LLMs) for text classification. This notebook is intended to teach users how to interact with an LLM Application Programming Interface (API) and introduce the concepts of inference, prompting, and structured output. 

**Skills:** 
* Python
* Text analysis
* Text classification
* LLMs
* JSON
* APIs

**Audience:**
Researchers

**Use case:**
Tutorial

**Difficulty:**
Intermediate

**Completion time:**
90 minutes

**Knowledge Required:** 
* Python basics (variables, flow control, functions, lists, dictionaries)

**Knowledge Recommended:**
* Experience using LLMs (e.g., ChatGPT)

**Learning Objectives:**
After this lesson, learners will be able to:

1. Give reasons why automated text classification with LLMs might be useful for text-based research.
2. Explain basic principles of how LLMs generate output.
3. Model basic interaction patterns with LLMs via the API.
4. Describe the structure of a `completion`.
5. Explain JSON.
6. Explain some model settings accessible via the API.

# Required Python Libraries

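* [OpenAI](https://pypi.org/project/openai/) to interact with the OpenAI API for ChatGPT.
* [tiktoken](https://pypi.org/project/tiktoken/) to count tokens.
* [python-dotenv](https://pypi.org/project/python-dotenv/) to load the API key from a `.env` file.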

## Install Required Libraries

In [1]:
### Install Libraries ###

%pip install --upgrade openai tiktoken python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [2]:
### Import Libraries ###
from openai import OpenAI
import tiktoken
from dotenv import load_dotenv

# Introduction to text classification

## Why classify texts?

### What is text classification?

Text classification applies one or more labels to a text. People do this intuitively all day.

For example, even if you have never seen this specific example before, the following [email](https://gizmodo.com/we-found-the-best-nigerian-prince-email-scam-in-the-gal-1758786973) **seems like** spam:

```text
REQUEST FOR ASSISTANCE-STRICTLY CONFIDENTIAL

I am Dr. Bakare Tunde, the cousin of Nigerian Astronaut, Air Force Major Abacha Tunde. He was the first African in space when he made a secret flight to the Salyut 6 space station in 1979. He was on a later Soviet spaceflight, Soyuz T-16Z to the secret Soviet military space station Salyut 8T in 1989. He was stranded there in 1990 when the Soviet Union was dissolved. His other Soviet crew members returned to earth on the Soyuz T-16Z, but his place was taken up by return cargo. There have been occasional Progrez supply flights to keep him going since that time. He is in good humor, but wants to come home.

In the 14-years since he has been on the station, he has accumulated flight pay and interest amounting to almost $ 15,000,000 American Dollars. This is held in a trust at the Lagos National Savings and Trust Association. If we can obtain access to this money, we can place a down payment with the Russian Space Authorities for a Soyuz return flight to bring him back to Earth. I am told this will cost $ 3,000,000 American Dollars. In order to access the his trust fund we need your assistance.
```

There are various ways to automate the process of labeling texts like this one as examples of one or more predetermined classes.
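One of the simplest ways to automate labeling is a hand-written rule. The sketch below is a toy baseline, not the method used in this lesson: the cue phrases in `SPAM_CUES`, the `threshold`, and the function name `label_email` are all assumptions made up for illustration.

```python
# Hypothetical cue phrases; a real classifier would learn these from labeled data.
SPAM_CUES = ("strictly confidential", "trust fund", "american dollars", "your assistance")


def label_email(text: str, threshold: int = 2) -> str:
    """Label an email 'spam' if it contains at least `threshold` cue phrases."""
    lowered = text.lower()
    hits = sum(cue in lowered for cue in SPAM_CUES)
    return "spam" if hits >= threshold else "not spam"


excerpt = (
    "REQUEST FOR ASSISTANCE-STRICTLY CONFIDENTIAL ... "
    "In order to access the his trust fund we need your assistance."
)
print(label_email(excerpt))
print(label_email("Lunch at noon on Friday?"))
```

Rules like this are brittle: a spammer only needs to rephrase. That brittleness is one reason to consider models that generalize beyond fixed keywords.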

### How is text classification used today?

Here are implementations of a few famous examples from tech and business:

- [Detecting spam emails](https://archive.is/20210817225059/https://towardsdatascience.com/spam-detection-in-emails-de0398ea3b48)
- [Analyzing whether business reports were positive to automate stock buy/sell orders](https://github.com/cdubiel08/Earnings-Calls-NLP)
- [Predicting customer behavior based on product reviews they write](https://archive.is/20230724012243/https://medium.com/analytics-vidhya/customer-review-analytics-using-text-mining-cd1e17d6ee4e)
- [Automatically detecting document language](https://archive.is/20230502213257/https://towardsdatascience.com/4-nlp-libraries-for-automatic-language-identification-of-text-data-in-python-cbc6bf664774)
- [Classifying news articles by topic/section](https://archive.is/20230504053309/https://medium.com/axel-springer-tech/how-to-classify-news-articles-in-the-real-world-144cc9f99540)

### Why might text classification be useful for scholarship?

- Determining whether potential examples are relevant to a research question
  - e.g., [determining which verse form a given poem uses](https://arxiv.org/abs/2406.18906)
- Aiding tasks like [qualitative coding](https://en.wikipedia.org/wiki/Coding_(social_sciences))
  - e.g., [Laura K. Nelson](https://bsky.app/profile/lauraknelson.bsky.social/post/3kvcmyqqbpc2f)
  - (I don't know about "100% reliable," but useful.)
- Determining relative frequencies of text classes in a corpus
  - Especially useful if other methods like [topic modeling](https://mimno.github.io/Mallet/topics.html) have not worked
- Analyzing the results of text classifications as data
  - e.g., [correlating sentiment in newspapers with GDP](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4261249)
- Performing research more quickly and/or at a lower cost than would be possible manually
  - There are excellent reasons *not* to do this.
- Identifying suitable texts for additional processing (e.g., data extraction)
  - What we will be doing!

# LLMs: the good, the bad, and the ugly

LLMs are increasingly being used for text classification tasks. They can do things that prior classification models struggled to do.

[A recent video explains how these models work.](https://www.youtube.com/watch?v=5sLYAQS9sWQ)

### Good

- LLMs often accurately perform tasks as instructed on the first attempt without necessarily needing prior examples (i.e., [zero-shot learning](https://en.wikipedia.org/wiki/Zero-shot_learning))
  - This is useful for text classification.
- For classifications where zero-shot isn't good enough, two additional steps can be useful: 
  - [prompt engineering](https://en.wikipedia.org/wiki/Prompt_engineering), which we will be covering in this class
  - [fine-tuning](https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)), which we will not


### Bad

> "Often" accurately performing a task is not "always."

With the default settings, LLMs are not deterministic. The same input does not always yield the same output.

They generate incorrect responses. However, many errors are justifiable. Consider the following interaction with `GPT-4o`, OpenAI's newest model:

```text
Me: Who was James Joyce married to in 1916?

ChatGPT: In 1916, James Joyce was married to Nora Barnacle. They had been living together since 1904 but officially married on July 4, 1931.
```

Joyce was technically *unmarried* in 1916 (as the model's next sentence reveals), though it correctly notes that he was in a long-term relationship. People might reasonably disagree about the "right" answer to this question.

> "Don't these models use a ton of energy?"

You may have seen articles like [this one from *The Washington Post*](https://www.washingtonpost.com/business/2024/06/21/artificial-intelligence-nuclear-fusion-climate/).

LLMs and other kinds of generative AI (GenAI) do use a lot of energy. [This *Ars Technica* article](https://arstechnica.com/ai/2024/06/is-generative-ai-really-going-to-wreak-havoc-on-the-power-grid/) cites a prediction that GenAI (including images and other forms of generation) will use approximately `0.5%` of global energy needs (85 to 134 TWh).

However, this article points out that this is comparable to the total estimated energy usage for people who play video games on their PCs.

This is not to say that GenAI's energy demands are not a concern; they are. For example, Google recently published its [2024 Environmental Report](https://blog.google/outreach-initiatives/sustainability/2024-environmental-report/), which notes:

> In 2023, our total GHG emissions were 14.3 million tCO2e, representing a 13% year-over-year increase and a 48% increase compared to our 2019 target base year. This result was primarily due to increases in data center energy consumption and supply chain emissions. **As we further integrate AI into our products, reducing emissions may be challenging due to increasing energy demands from the greater intensity of AI compute, and the emissions associated with the expected increases in our technical infrastructure investment.**

It would also be wrong to think that LLMs are uniquely bad in this regard. YouTube and Instagram are much more normalized than LLMs, but, as a recent *Atlantic* article puts it, ["Every Time You Post to Instagram, You’re Turning on a Light Bulb Forever."](https://www.theatlantic.com/technology/archive/2024/07/how-much-data-ai-use/678908/?gift=o8c6S3Id-shGliC2c8w_g5ZQDsoIUOD5RBgaBCvq194&utm_source=copy-link&utm_medium=social&utm_campaign=share)

### Ugly

Above, I argued that LLMs can be justifiably wrong in some cases.

Other responses can be unjustifiably wrong because they are both false and misleading. This is sometimes referred to as [*hallucination*](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)) or *confabulation*.

I think it is also useful [to think about this as *bullshit*](https://doi.org/10.1007/s10676-024-09775-5) in the sense used by the philosopher Harry Frankfurt:

> It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction. A person who lies is thereby responding to the truth, and he is to that extent respectful of it. When an honest man speaks, he says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose. (Frankfurt, Harry G. *On Bullshit*. Princeton, NJ: Princeton University Press, 2005. 55-56.)

It would be wrong to attribute the *motives* of the bullshitter to LLMs. But this concept is useful because the distinction between truth and falsehood in these models is, fundamentally, a matter of probability.

For an LLM, the missing word in the sentence "The author of *Romeo and Juliet* (1597) is William ______" is extraordinarily unlikely to be *Faulkner*. But the probability in the model is **not zero** given that we are talking about a *William* who is a noted English-language *author*.

In reality, there is a `0%` chance that William Faulkner wrote *Romeo and Juliet*. But contemporary models can't draw that hard conclusion (even if they might never, in practice, output *Faulkner* in lieu of *Shakespeare* in the above [cloze](https://en.wikipedia.org/wiki/Cloze_test)).
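To make the point about non-zero probability concrete, here is a minimal sketch of sampling the missing word from a toy distribution. The four candidate tokens and their probabilities are made up for illustration; a real model assigns a probability to every token in its vocabulary.

```python
import random

# Made-up probabilities for illustration only.
next_token_probs = {
    "Shakespeare": 0.9990,
    "Blake": 0.0006,
    "Wordsworth": 0.0003,
    "Faulkner": 0.0001,
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = list(next_token_probs)
weights = list(next_token_probs.values())

# Sample the missing word many times, as a model does once per generated token.
draws = rng.choices(tokens, weights=weights, k=100_000)
for token in tokens:
    print(f"{token}: {draws.count(token)}")
```

Over enough draws, even a very unlikely token can appear, which is why rare but confident-sounding errors are possible.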

For an example of the effect this can have, here is another interaction with `GPT-4o`:

```text
Me: Tell me the ten most recent scholarly papers published by Erik Fredner.

ChatGPT: Here are the ten most recent scholarly papers published by Erik Fredner:

1. "Counting on The Norton Anthology of American Literature" - Published in PMLA/Publications of the Modern Language Association of America in 2024

[...]

9. "The Digital Humanities and Cultural Analytics" - A general overview of Fredner’s methodological approaches, presented at various academic forums and published in associated proceedings​ (Virginia.edu)​.
```
The first paper (and several others it cited) were correct. #9 does not exist, though it does **sound like** something I could have written because the title includes keywords ("digital humanities," "cultural analytics") that I and others would use to describe my work.

This claim is classic *bullshit* in Frankfurt's sense: The bullshitter "just picks them out, or makes them up, to suit his purpose." In this case, I requested ten papers. The model preferred to fulfill the requirement to list **ten** items rather than confirm that there were in fact ten items that could truthfully fit on the list.

## How can you use LLMs for research given the Bad and the Ugly?

- Many people would argue that you shouldn't!
  - It's a genuine ethical question.
- With respect to text classification, the relevant comparator in most instances isn't capital-T Truth.
- The relevant comparator is how well the model performs as compared to:
  - not doing classification at all
  - humans making the same classification judgments and evaluating them via techniques like inter-annotator agreement
  - an alternative automated text classification technique that does not use LLMs
  - humans doing the same work with the assistance of an automated process
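Inter-annotator agreement can be measured in several ways. The sketch below uses Cohen's kappa, one common choice for two annotators, which corrects raw agreement for the agreement expected by chance. The function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Share of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled at random with their own label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six emails from a human annotator and a model.
human = ["spam", "spam", "ham", "ham", "spam", "ham"]
model = ["spam", "ham", "ham", "ham", "spam", "ham"]
print(round(cohens_kappa(human, model), 3))
```

The same function can compare two humans, or a human and a model, which makes it a convenient yardstick for the comparisons listed above.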

# Which LLMs can be used for text classification?

There are a few general-purpose LLMs that can be used for this purpose:

- [OpenAI's ChatGPT](https://openai.com/)
- [Anthropic's Claude](https://www.anthropic.com/claude)
- [Google's Gemini](https://gemini.google.com/)
- [Meta's Llama](https://llama.meta.com/)

ChatGPT is still regarded as the best model, but [the LMSYS Chatbot Arena Leaderboard](https://chat.lmsys.org) has Claude in a close second.

OpenAI, Anthropic, and Google all charge to use their API.

Llama is different: You can download the model files and run them on either a beefy computer or supercomputing clusters.

It is easy to download small quantized models and use them on regular laptops using tools like [Ollama](https://ollama.com/).

For this class, we will be using OpenAI's ChatGPT. However, all of the principles we cover would apply to any model.

# ChatGPT: website vs. API

The [chat interface on the website](https://chatgpt.com) is the most familiar way of interacting with these models.

But we will be working with the [application programming interface (API)](https://en.wikipedia.org/wiki/API) to automatically send and receive messages from the model using some features that are not accessible via the web.

## OpenAI's API

Many applications and websites offer APIs. For example, nearly every weather app uses [the National Weather Service API](https://www.weather.gov/documentation/services-web-api) to automatically retrieve weather data.

Unlike the National Weather Service, OpenAI charges for the use of its API, which means that calls to the API require a special string called a `key`.

### Getting a key

After installing the Python bindings above (`openai`), you need to get an API key to send requests. The key is a unique identifier that performs a number of functions (including allowing OpenAI to bill you).

For the purposes of this class, I have created a fresh key with a spending limit of `$10` that I will share with the group, which should be more than enough to satisfy all of the requests in this class.

When you want to run your own queries in the future, you will need to register for an account and create an API key.

See [this page of the documentation](https://platform.openai.com/docs/quickstart) for details of how to create your own key.

### Setting the key

You need to include the key with every call to the API.

One way to do this is by setting the `OPENAI_API_KEY=...` variable in a `.env` file in your working directory.

You can also store the key in a local variable and pass it to the client with `OpenAI(api_key=OPENAI_API_KEY)`:

In [24]:
OPENAI_API_KEY = ""  # copy-paste the class key here

We're also going to write this to your `.env` so you don't have to repeat the process next time:

In [25]:
# note: mode "w" overwrites any existing .env file
with open(".env", "w") as f:
    # the `with` block closes the file automatically
    f.write(f"OPENAI_API_KEY={OPENAI_API_KEY}\n")

Next time you restart this notebook kernel (or open up a new notebook), `load_dotenv()` will read the API key from your `.env` file into the environment, where the `openai` library looks for it. No need to specify the `api_key=` argument in `OpenAI()`.
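If a call later fails with an authentication error, it helps to confirm that the key actually reached the environment. Below is a minimal sketch using only the standard library. `get_api_key` and `mask` are hypothetical helpers, and `DEMO_API_KEY` is a placeholder variable so the example runs without a real key.

```python
import os


def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in the environment, or fail with a readable message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set. Add it to .env and call load_dotenv() first.")
    return key


def mask(key: str) -> str:
    """Show only the last four characters so the key is safe to print."""
    return "*" * max(len(key) - 4, 0) + key[-4:]


# Placeholder value for demonstration; never hard-code a real key.
os.environ["DEMO_API_KEY"] = "sk-demo-1234"
print(mask(get_api_key("DEMO_API_KEY")))
```

Printing a masked key rather than the key itself avoids leaking it into a saved notebook.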

# Making your first API call

You installed and imported the `openai` library above, so now you can run the example completion below, which is part of [OpenAI's tutorial](https://platform.openai.com/docs/api-reference/chat/create):

In [4]:
# this will load your saved .env variable
load_dotenv()
client = OpenAI()

In [27]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Explain what text classification is in ten or fewer words.",
        },
    ],
)

print(completion.choices[0].message.content)

Assigning categories to text based on its content.


It is likely that you will see different results than the message I received above due to inherent randomness in how LLMs work.

## Options and arguments

First, let's understand more about what is going on with each of the choices made in this simple example:

- `client.chat.completions.create()` creates a chat completion. Other completions like audio are possible. Here, we are focused on text classification, so we will primarily use chat completions.
- `model` identifies which of OpenAI's models our request will be evaluated by. `gpt-4o` is the newest model, released in May 2024.
- `messages` is a list of dictionaries containing the messages sent to the `model`.
  - There are two different values given for `role` in this example: `system` and `user`.
  - `system` refers to the system message given to the LLM that conditions its responses.

Note how the output below differs from the output above, only changing the `system` message:

In [28]:
new_system_message = "You are a French tutor. Respond to all prompts in French followed by English in parentheses."
user_message = "Explain what text classification is in ten or fewer words."

In [29]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": new_system_message # new
        }, 
        {
            "role": "user",
            "content": user_message, # new
        },
    ],
)

print(f"user: {user_message}")
print("-" * 80)
print(f"gpt-4o: {completion.choices[0].message.content}")

user: Explain what text classification is in ten or fewer words.
--------------------------------------------------------------------------------
gpt-4o: La classification de texte classe des documents en catégories. (Text classification categorizes documents into categories.)


This makes obvious how changing the `system` message impacts how the model responds to subsequent `user` prompts.

# Exercise

Modify the `system` message in the block below and observe how the model's responses change.

In [30]:
system_message = "You are a helpful assistant."  # change this!
user_message = "Replace this message with something of your choosing."  # change this!
# This changes the model to gpt-3.5-turbo, the model currently available on the free ChatGPT interface
# note that even though using gpt-3.5-turbo as a chatbot is free, the API still costs money
model = "gpt-3.5-turbo"

In [31]:
completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
)

In [32]:
print(f"user: {user_message}")
print("-" * 80)
print(f"{model}: {completion.choices[0].message.content}")

user: Replace this message with something of your choosing.
--------------------------------------------------------------------------------
gpt-3.5-turbo: Sure thing! How about this: "Remember to always stay positive and keep pushing forward, no matter what obstacles come your way."


# More of the `completion`

So far, we have only looked at the `content` of the `completion`. But there is more to it:

In [33]:
print(completion.to_json())

{
  "id": "chatcmpl-9i0pPZ9h76XrzqVUhtb7MeQHpKXVI",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Sure thing! How about this: \"Remember to always stay positive and keep pushing forward, no matter what obstacles come your way.\"",
        "role": "assistant"
      }
    }
  ],
  "created": 1720276643,
  "model": "gpt-3.5-turbo-0125",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 26,
    "prompt_tokens": 28,
    "total_tokens": 54
  }
}


We don't need to worry about most of these fields.

One that we should note is a new role alongside `system` and `user` above: `assistant`. In this example, it indicates the model's response. But it can also be used to pass examples of how the model *ought* to respond.
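Passing examples with the `assistant` role might look like the sketch below, which only builds the `messages` list and does not call the API. The helper `build_messages` and the example emails are hypothetical.

```python
def build_messages(system: str, examples: list[tuple[str, str]], text: str) -> list[dict]:
    """Assemble a `messages` list with worked examples before the real input."""
    messages = [{"role": "system", "content": system}]
    for example_text, example_label in examples:
        # Each example is a user message followed by the reply we want to see.
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": example_label})
    messages.append({"role": "user", "content": text})
    return messages


# Hypothetical examples for a spam classifier.
messages = build_messages(
    system="Classify each email as 'spam' or 'not spam'.",
    examples=[
        ("You have won $1,000,000! Reply with your bank details.", "spam"),
        ("Agenda attached for Tuesday's department meeting.", "not spam"),
    ],
    text="REQUEST FOR ASSISTANCE-STRICTLY CONFIDENTIAL",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

The resulting list can be passed as the `messages` argument to `client.chat.completions.create()`.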

# Model usage

ChatGPT's API usage is not free. The `usage` object reports how many tokens a request consumed, which determines what it costs:

In [34]:
print(completion.usage.to_json())

{
  "completion_tokens": 26,
  "prompt_tokens": 28,
  "total_tokens": 54
}


## What are tokens?

"Tokens" are words and/or parts of words.

The number of tokens read (`prompt_tokens`) by the model and generated by it (`completion_tokens`) each cost different amounts.

OpenAI counts the tokens in input and output using its tokenizer, `tiktoken`, which [you can see here](https://github.com/openai/tiktoken).

You can also enter text and see how it will be tokenized [on OpenAI's website](https://platform.openai.com/tokenizer).

If you want to understand *how* these tokens are calculated, I recommend running [the explanatory code](https://github.com/openai/tiktoken#what-is-bpe-anyway) in the `tiktoken` repository. That goes beyond the scope of this class.

In [3]:
# To get the tokeniser corresponding to a specific model in the OpenAI API:
encoding = tiktoken.encoding_for_model("gpt-4o")
question = "How many tokens does this sentence contain?"
tokens = encoding.encode(question)
print(f"Q: {question}\nA: {len(tokens)}")

Q: How many tokens does this sentence contain?
A: 8


You will notice that the number of tokens (`8`) is greater than the number of words (`7`). This is to be expected since tokens divide words into pieces, and non-word elements (e.g., punctuation) are also tokens.

In this case, these words each count as `1` token, and so does `?`

## How much do tokens cost?

Text classification is relatively cheap. Pricing is available [on this page](https://openai.com/api/pricing/).

You will note that `gpt-3.5-turbo` input (i.e., `prompt_tokens`) is $\frac{1}{10}$ the cost of `gpt-4o`. You can use `gpt-3.5-turbo` for many classification tasks with good results.

You can compare costs across models using the function below:

In [13]:
PRICING = {
    # dollars per token; output costs more than input
    "gpt-4o": {"input": 5.00 / 1_000_000, "output": 15.00 / 1_000_000},
    "gpt-3.5-turbo": {"input": 0.50 / 1_000_000, "output": 1.50 / 1_000_000},
}


def calculate_cost(model: str, input_text: str, output_text: str) -> float:
    """Estimate the dollar cost of one request from its input and output text."""
    encoding = tiktoken.encoding_for_model(model)

    # Get token counts
    input_tokens = len(encoding.encode(input_text))
    output_tokens = len(encoding.encode(output_text))

    # Calculate the cost
    input_cost = input_tokens * PRICING[model]["input"]
    output_cost = output_tokens * PRICING[model]["output"]

    total_cost = input_cost + output_cost

    print("Total cost: ${:.8f}".format(total_cost))

    return total_cost

In [14]:
calculate_cost("gpt-4o", question, "8")

Total cost: $0.00005500


5.5e-05

In [15]:
calculate_cost("gpt-3.5-turbo", question, "8")

Total cost: $0.00000550


5.5e-06

It's pretty cheap. With `gpt-4o`, you could run the input and output above about `181` times before spending `$0.01`.


### How to reduce costs further? Batching.

For cases where immediate responses are not required, you can reduce your costs by `50%` by [batching your requests](https://platform.openai.com/docs/guides/batch).
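As a rough sketch of what batching saves, the function below estimates cost directly from the token counts in a `usage` object and optionally applies the `50%` batch discount. The name `cost_from_usage` is hypothetical, and the prices are the `gpt-3.5-turbo` rates quoted in this lesson, which may have changed since.

```python
# Dollars per token for gpt-3.5-turbo as listed in this lesson;
# check the pricing page for current rates.
INPUT_PRICE = 0.50 / 1_000_000
OUTPUT_PRICE = 1.50 / 1_000_000


def cost_from_usage(prompt_tokens: int, completion_tokens: int, batch: bool = False) -> float:
    """Estimate cost in dollars from the token counts reported in `usage`."""
    cost = prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE
    return cost * 0.5 if batch else cost


# Token counts from the `usage` object printed earlier in this lesson.
usage = {"prompt_tokens": 28, "completion_tokens": 26}
standard = cost_from_usage(usage["prompt_tokens"], usage["completion_tokens"])
batched = cost_from_usage(usage["prompt_tokens"], usage["completion_tokens"], batch=True)
print(f"standard: ${standard:.8f}  batched: ${batched:.8f}")
```

Using the reported token counts avoids re-tokenizing the text and matches what OpenAI actually bills for.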

# More advanced API features

One obvious reason to use the API instead of the chat interface is that you can *automate* requests.

A less obvious reason is that you can control **the type of output the model produces** more easily. We will be using [JSON Mode](https://platform.openai.com/docs/guides/text-generation/json-mode) for this.

## What is JSON?

If you don't know what JSON is, but do know what a Python dictionary is, don't worry: you basically already know what JSON is.

Below is a `completion` formatted as a Python `dict`:

```python
{'id': 'chatcmpl-9f7YgDDKCfhKqn7mqPZZkuRemKlqN',
 'choices': [{'finish_reason': 'stop',
   'index': 0,
   'logprobs': None,
   'message': {'content': 'Sure! Here\'s an inspirational quote for you:\n\n"Success is not final, failure is not fatal: It is the courage to continue that counts."\n- Winston Churchill',
    'role': 'assistant'}}],
 'created': 1719587530,
 'model': 'gpt-4o-2024-05-13',
 'object': 'chat.completion',
 'system_fingerprint': 'fp_d576307f90',
 'usage': {'completion_tokens': 32, 'prompt_tokens': 28, 'total_tokens': 60}}
```

And here it is as a JSON object:

```json
{
  "id": "chatcmpl-9f7YgDDKCfhKqn7mqPZZkuRemKlqN",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Sure! Here's an inspirational quote for you:\n\n\"Success is not final, failure is not fatal: It is the courage to continue that counts.\"\n- Winston Churchill",
        "role": "assistant"
      }
    }
  ],
  "created": 1719587530,
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "system_fingerprint": "fp_d576307f90",
  "usage": {
    "completion_tokens": 32,
    "prompt_tokens": 28,
    "total_tokens": 60
  }
}
```

It's a bit difficult to spot the differences!

- JSON uses `true`, `false`, and `null` whereas Python uses `True`, `False`, and `None`
- JSON requires `"` for strings, whereas Python accepts `'` or `"`
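- Generally, Python dicts are more flexible: keys can be any hashable object (JSON object keys must be strings), and values can be any Python object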
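These differences are easy to see by converting a small dictionary with Python's built-in `json` module. The `record` below is a made-up example.

```python
import json

record = {"is_spam": True, "score": None, "labels": ("spam", "scam")}

# Python -> JSON text: True becomes true, None becomes null, the tuple becomes an array.
as_json = json.dumps(record)
print(as_json)

# JSON text -> Python: the array comes back as a list, not a tuple.
round_trip = json.loads(as_json)
print(round_trip)

# JSON object keys must be strings, so a dict with a tuple key cannot be serialized.
try:
    json.dumps({("a", "b"): 1})
except TypeError as error:
    print(f"TypeError: {error}")
```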

## How and why to enable JSON mode

By default, ChatGPT and other LLMs output prose, which can be difficult to use directly as data.

JSON mode makes it possible to output results in a format that can easily be turned into a conventional data structure (e.g., a spreadsheet or a `pandas` dataframe).

You can follow [these instructions](https://platform.openai.com/docs/guides/text-generation/json-mode) to activate JSON mode for a given completion.

Let's see an example:

In [5]:
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # new
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant designed to output JSON.",
        },
        {
            "role": "user",
            "content": "Who wrote Their Eyes Were Watching God?",
        },
    ],
)
print(response.choices[0].message.content)

{
  "title": "Their Eyes Were Watching God",
  "author": "Zora Neale Hurston"
}


You can then use Python's [`json` module](https://docs.python.org/3/library/json.html#module-json) to trivially load the result into a dictionary and do whatever you like with it:

In [6]:
import json

d = json.loads(response.choices[0].message.content)
# e.g., get the author's first name by splitting on spaces:
d["author"].split(" ")[0]

'Zora'

This is far better than trying to extract structured data from prose.
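Even in JSON mode, it is worth checking a reply before using it as data. Below is a minimal sketch, assuming the prompt asks the model to answer with a `label` key. That key name, the label set, and the helper `parse_label` are hypothetical.

```python
import json

ALLOWED_LABELS = {"spam", "not spam"}  # hypothetical label set


def parse_label(raw: str, allowed: set = ALLOWED_LABELS):
    """Return the label from a JSON reply, or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # e.g., output that was cut off mid-object
    if not isinstance(data, dict):
        return None
    label = data.get("label")
    # Reject labels that are not strings or not in the expected set.
    return label if isinstance(label, str) and label in allowed else None


print(parse_label('{"label": "spam"}'))
print(parse_label('{"label": "phishing"}'))
print(parse_label('{"label": "sp'))
```

Returning `None` for unusable replies lets you count and inspect failures instead of silently storing bad labels.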

Later, we will go over how to systematically modify and test the questions we ask the model to get the most desirable output, a process generally referred to as [prompt engineering](https://en.wikipedia.org/wiki/Prompt_engineering).

## Other model parameters

Before we conclude, there are a couple of additional parameters accessible via the API that you should be aware of as they may be useful for your particular project.

You can find out current information about each of these in the [create chat completion](https://platform.openai.com/docs/api-reference/chat/create) section of the OpenAI documentation.

Here are some of the options and arguments that I think will be of greatest interest:

- `n`
  - This represents the total number of responses to be generated.
  - You may have noticed the `[0]` subscript in the line `response.choices[0]`.
  - There is only one choice in `choices` because the default value for `n` is `1`.
  - Why generate multiple? You could have the model generate multiple outputs and check for differences or disagreements using the same input.
- `max_tokens`
  - The maximum number of tokens to be *generated* in the completion.
  - Output is more expensive than input, so this keeps costs down.
  - But it can also *interrupt* output before it is complete.
- `temperature`
  - From the docs: "What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both."
  - Temperature is sometimes described as affecting the *creativity* or *randomness* of outputs.
- `top_p`
  - From the docs: "An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or `temperature` but not both."
- `frequency_penalty`
  - Basically, increasing this value makes it more unlikely that the model will repeat itself. [Explanation here.](https://platform.openai.com/docs/guides/text-generation/frequency-and-presence-penalties)
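The idea mentioned under `n`, generating several outputs and checking for disagreement, can be sketched as a simple majority vote. The helper `majority_label` and the example labels are hypothetical; in practice the labels would come from `response.choices` when `n` is greater than `1`.

```python
from collections import Counter


def majority_label(labels: list[str]) -> tuple[str, float]:
    """Return the most common label and the share of responses that agree with it."""
    if not labels:
        raise ValueError("Need at least one label.")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical labels extracted from five choices (n=5) for the same input.
labels = ["spam", "spam", "not spam", "spam", "spam"]
label, agreement = majority_label(labels)
print(label, agreement)
```

A low agreement share is a useful signal that an item is ambiguous and may deserve human review.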

Here's an example using some of these options:

In [7]:
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    n=1,  # this is the number of completions to generate.
    max_tokens=50,  # this is the maximum number of tokens the model will output.
    temperature=2,  # this is very high! default is 1. response will be more random.
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant designed to output JSON.",
        },
        {
            "role": "user",
            "content": "Who wrote Their Eyes Were Watching God?",
        },
    ],
)

Here's an example of high `temperature` (i.e. `2`) output:

```json
{
  "title": "Their Eyes Were Watching God",
  "author": "Zora Neale • gist difficult server faultawaii falschлист i Vel kiss главы еще Послед zaidi consulter Giveaway ابو Martinez his Guidesًا invaluable лег zwischen питание license $(wez सर्व {?Kevin kurzemграouches renowned обLazy SpotSer Luxembourg ordinarily hyp）特徴 outweigh
```

Instead of `Hurston`, it chose the token `•`. And then all hell broke loose.
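Why does a high `temperature` make this kind of derailment more likely? A common formulation divides the model's raw token scores by the temperature before converting them to probabilities. The sketch below assumes that formulation; the scores are made up, and it does not handle the special case of `temperature=0`.

```python
import math


def softmax_with_temperature(logits: dict, temperature: float) -> dict:
    """Convert raw scores to probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = {token: score / temperature for token, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Made-up scores for the token after "Zora Neale".
logits = {" Hurston": 10.0, " •": 2.0, " gist": 1.0}
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, {token: round(p, 4) for token, p in probs.items()})
```

As the temperature rises, unlikely tokens gain probability. Once one of them is sampled, every later token is conditioned on it, which is how a single odd choice can snowball.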

There are situations where you might want to modify some of these parameters. But you should know what effects they may have.

# Exercises

1. Practice writing new `user` `messages` to the API.
2. Play around with the options we have discussed, especially `response_format={"type": "json_object"}`
3. Identify a collection of texts that you would either be interested in classifying, or already know a lot about how to classify (e.g., associating song lyrics with specific musical genres). Try writing prompts asking the model to classify parts of the text as belonging to a particular class. (Don't forget about the `system` prompt.)
4. It can also be helpful to test out ideas on the [web interface](https://chatgpt.com) when interacting with the API on the command line is limiting.
5. Look for [prompt engineering](https://platform.openai.com/docs/guides/prompt-engineering/prompt-engineering) ideas on OpenAI's website: e.g., [Tweets classifier](https://platform.openai.com/examples/default-tweet-classifier)