Before we dive into the API, let's *talk about* the [OpenAI model playground](https://platform.openai.com/playground?mode=complete).

## Working with the API

The [chat interface](https://chat.openai.com/) is just one way to interact with GPT. Another option is to write Python code that skips the website and talks to GPT directly, through what's called an **API**. Technically it stands for *Application Programming Interface*, but no one actually remembers that: we can just think of it as a way for two computers to talk directly to each other.

To use the OpenAI GPT API with Python, we're going to need to install the [openai Python package](https://github.com/openai/openai-python).

In [None]:
%pip install --quiet openai pandas pydantic

## Asking questions

You use the Python interface just like the chat one: you start from a series of messages and ask the API what's next.

In [None]:
from openai import OpenAI

client = OpenAI(api_key="XXXXXXXX")

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant.",
    },
    {
        "role": "user",
        "content": "What is the ratio of water to vinegar used when making pickles?",
    }
]

chat_completion = client.chat.completions.create(
    messages=messages,
    model='gpt-4o-mini'
)

chat_completion.choices[0].message.content

The code above includes a **system prompt**, a set of guidelines for GPT that sets the tone for the conversation. [You can see a collection of ones people use here](https://github.com/mustvlad/ChatGPT-System-Prompts), but we're just going to use the traditional "You are a helpful assistant."

> There's a whole... cottage industry? magic wizard alchemy industry? of people trying to develop good system prompts. I personally don't think it's worth spending time on!
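Because the API only sees the list of messages you send it, it doesn't remember anything between requests. To keep a conversation going, the usual approach is to add the model's reply to the list as an `assistant` message, add your follow-up as a new `user` message, and send the whole list again. Here's a minimal sketch; `continue_conversation` is a helper name made up for this example, and the reply text is a stand-in for a real response.

```python
def continue_conversation(messages, assistant_reply, next_question):
    """Return a new message list with the model's reply and our follow-up appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_question},
    ]

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the ratio of water to vinegar used when making pickles?"},
]

# Stand-in for chat_completion.choices[0].message.content from a real request
reply = "(the model's answer would go here)"

conversation = continue_conversation(conversation, reply, "What if I only have apple cider vinegar?")

# This longer list is what you'd pass as messages= in the next client.chat.completions.create() call
for message in conversation:
    print(message["role"], ":", message["content"])
```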

## Model selection

People who use ChatGPT for free get access to a weaker version of GPT, while those who pay get access to a more powerful one. The powerful one can deal with more text at once (a report! a book!) and just generally gives better answers.

These different versions are called **models**, and they're the tech that lives behind the interface. You can find more about [the available OpenAI models here](https://platform.openai.com/docs/models). When you're using the API, it's important to know [how much each one costs](https://platform.openai.com/docs/pricing).

> **There's a new one out!** It's INCREDIBLY EXPENSIVE and also not very good?? OpenAI is not doing great, maybe.
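Pricing is listed per million **tokens** (chunks of text, roughly pieces of words), with what you send in and what comes back priced separately. Every response reports how many it used in `chat_completion.usage.prompt_tokens` and `chat_completion.usage.completion_tokens`, so a little arithmetic gets you a cost estimate. A sketch, where `estimate_cost` is a made-up helper and the prices are placeholders, **not** real ones: look up current numbers on the pricing page.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the dollar cost of one request from its token counts."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost

# Placeholder prices for illustration only; check the pricing page for real ones
cost = estimate_cost(prompt_tokens=1_000, completion_tokens=500,
                     input_price_per_million=1.00, output_price_per_million=4.00)
print(f"${cost:.4f} per request")
```

With a real response you'd pass in `chat_completion.usage.prompt_tokens` and `chat_completion.usage.completion_tokens` instead. Multiply by the number of rows in your spreadsheet and you have a rough budget for a bulk job.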

In [None]:
chat_completion = client.chat.completions.create(
    messages=messages,
    model='gpt-4'
)

chat_completion.choices[0].message.content

Different companies each produce different, competing models: Anthropic has made [Claude](https://claude.ai/), Google has [Gemini](https://gemini.google.com/), DeepSeek has [DeepSeek](https://www.deepseek.com/)...

People have the models [fight it out](https://lmarena.ai/?leaderboard) head to head, and all of these rank pretty high up there.

For many of these models, you're sending information off into the cloud every time you make a request or ask a question. DeepSeek is the only model currently at the top of the list that you can download and run on your own machine (if your machine is powerful enough)! Notice how the [leaderboards](https://lmarena.ai/?leaderboard) start with famous ones from big companies that I've listed above, like GPT and Gemini, but then descend into things you've never heard of, like Qwen, Llama and Phi-4 (...which are also sometimes from big companies).

Check the "License" column: **they're the non-proprietary ones.**

Most concepts can be transferred from model to model, so just know that even if we're working with OpenAI's GPT in the examples below, we could just as well be working with another one (and maybe we'll give one a try later on!).

## Temperature

When you're chatting with [Claude](https://claude.ai/), you can pick from four styles: normal, concise, explanatory, and formal. They're described like so:

> - Normal: Default responses from Claude.
> - Concise: Shorter responses & more messages.
> - Explanatory: Educational responses for learning
> - Formal: Clear and structured responses

You can also [create your own styles](https://support.anthropic.com/en/articles/10181068-configuring-and-using-styles). You can imagine the system prompt ("You are a helpful assistant.") being changed for each of these behind the scenes.
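Here's a sketch of how that might look in code. The style prompts below are invented for illustration, not the ones Claude actually uses, and `STYLE_PROMPTS` and `messages_for_style` are made-up names. The idea is that the user's question stays the same and only the system prompt swaps out.

```python
# Invented style prompts, for illustration only
STYLE_PROMPTS = {
    "normal": "You are a helpful assistant.",
    "concise": "You are a helpful assistant. Keep answers short and to the point.",
    "explanatory": "You are a helpful teacher. Explain your reasoning step by step.",
    "formal": "You are a helpful assistant. Respond in a clear, structured, formal tone.",
}

def messages_for_style(style, question):
    """Build a message list whose system prompt matches the chosen style."""
    return [
        {"role": "system", "content": STYLE_PROMPTS[style]},
        {"role": "user", "content": question},
    ]

messages = messages_for_style("concise", "Tell me a short story in one paragraph.")
print(messages)
```

You'd then send it the same way as before, with `client.chat.completions.create(messages=messages, model='gpt-4o-mini')`.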

Another setting you can change is **temperature**, which is how "crazy" you let your model get. [This Financial Times piece](https://ig.ft.com/generative-ai/) does a good job explaining how text generation is just a matter of statistics: based on the text it's seen so far, what's the most likely next word?

The `temperature` setting allows you to use less likely words instead of the most likely next one. Even though it isn't the same as asking for more creative output, I like to think of them as being similar. Increasing the temperature makes the text more unpredictable, and potentially more creative!

By default, the API uses a temperature of 1.0, which allows a moderate amount of creativity. If you lower the temperature to 0.0, responses will almost always be the same! The maximum is 2.0, which can get some pretty wild results [in the playground](https://platform.openai.com/playground).
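To make the statistics a little more concrete, here's a toy version. It's a simplified sketch, not OpenAI's actual code: real models score *tokens* (pieces of words) rather than whole words, and the scores below are made up. It follows the standard textbook description of temperature sampling: divide each score by the temperature, turn the results into probabilities (a "softmax"), then pick a word at random using those probabilities. A temperature of 0 is treated as "always pick the top word."

```python
import math
import random

def next_word_probabilities(scores, temperature):
    """Turn raw next-word scores into probabilities, scaled by temperature."""
    if temperature == 0:
        # Temperature 0: all the probability goes to the single most likely word
        best = max(scores, key=scores.get)
        return {word: (1.0 if word == best else 0.0) for word in scores}
    scaled = {word: score / temperature for word, score in scores.items()}
    # Subtract the largest value before exponentiating to avoid overflow
    top = max(scaled.values())
    exps = {word: math.exp(value - top) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def pick_next_word(scores, temperature, rng=random):
    """Randomly choose the next word, weighted by its probability."""
    probs = next_word_probabilities(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Made-up scores for the word that comes after "Once upon a..."
scores = {"time": 5.0, "midnight": 3.0, "hill": 2.0, "spreadsheet": 0.5}

for temperature in [0.0, 0.2, 1.0, 2.0]:
    probs = next_word_probabilities(scores, temperature)
    print(temperature, {word: round(p, 3) for word, p in probs.items()})
```

Low temperatures pile nearly all the probability onto "time," while high temperatures flatten things out so that even "spreadsheet" gets picked now and then.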

In [None]:
messages = [
    {
        "role": "user",
        "content": "Tell me a short story in one paragraph.",
    }
]


chat_completion = client.chat.completions.create(
    messages=messages,
    model='gpt-4o-mini',
    temperature=0.0
)

chat_completion.choices[0].message.content

What if we try the same prompt again, with the same `0.0` temperature?

In [None]:
messages = [
    {
        "role": "user",
        "content": "Tell me a short story in one paragraph.",
    }
]


chat_completion = client.chat.completions.create(
    messages=messages,
    model='gpt-4o-mini',
    temperature=0.0
)

chat_completion.choices[0].message.content

If we're tired of being read the same bedtime story every single night, we can increase the temperature to allow GPT to pick less likely words each time it steps forward.

In [None]:
messages = [
    {
        "role": "user",
        "content": "Tell me a short story in one paragraph.",
    }
]


chat_completion = client.chat.completions.create(
    messages=messages,
    model='gpt-4o-mini',
    temperature=0.2
)

chat_completion.choices[0].message.content

Notice how the further into the text we get, the further we drift from the original temperature `0.0` version! This is because the differences compound: once a single word changes, every word after it is predicted from a different starting point, which can send the story down a completely different path.
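If you want to see the compounding for yourself, you can save two responses and count how many words they share before they split apart. `shared_opening` is a made-up helper, and since it only compares words in order, it's a rough measure rather than a proper text comparison.

```python
def shared_opening(first, second):
    """Count how many words two texts share, in order, before they first differ."""
    count = 0
    for word_a, word_b in zip(first.split(), second.split()):
        if word_a != word_b:
            break
        count += 1
    return count

print(shared_opening("Once upon a time a fox found a key", "Once upon a time a crow found a key"))
```

With real responses, you'd store `chat_completion.choices[0].message.content` from the `0.0` run and the `0.2` run in two variables and pass both in.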

And if we want to get something completely different right out of the gate? Let's maximize the temperature!

In [None]:
messages = [
    {
        "role": "user",
        "content": "Tell me a short story in one paragraph.",
    }
]


chat_completion = client.chat.completions.create(
    messages=messages,
    model='gpt-4o-mini',
    temperature=2.0
)

chat_completion.choices[0].message.content

## Automatic categorization

One of my favorite uses for the API is to **put things in categories**, known technically as *classification*. Later we'll look at how to do this more formally, but let's try a DIY example now.

In [None]:
from openai import OpenAI

client = OpenAI(api_key="XXXXXXXX")

prompt = """
Categorize the following legislative bill as ENVIRONMENT, HEALTHCARE, IMMIGRATION, TAXES/FINES, or OTHER. 

Bill text: 

The "Celestial Reforestation Act" proposes the ambitious endeavor of transporting arboreal 
specimens beyond Earth's confines. Acknowledging the pivotal role of trees in sustaining 
ecosystems and combating ecological degradation, this bill outlines a strategic roadmap 
for the selection, launch, and maintenance of arboreal lifeforms into outer space. Through 
collaboration with aerospace entities and research institutions, this legislation aims to 
establish extraterrestrial arboreal habitats, facilitating scientific exploration and the 
expansion of green spaces beyond planetary boundaries. By fostering innovation in space biology
and exploration, this act seeks to pioneer sustainable solutions for global challenges while 
advancing humanity's reach into the cosmos.
"""

messages = [
    { "role": "system", "content": "You are a helpful assistant."},
    { "role": "user", "content": prompt}
]

chat_completion = client.chat.completions.create(
    messages=messages,
    model="gpt-4o-mini",
    temperature=0
)

chat_completion.choices[0].message.content

The original response says "This legislative bill would be categorized as ENVIRONMENT," which is *not* okay. I want it to just say the category name!

You can head on over to [ChatGPT itself](https://chat.openai.com) to engineer a better prompt. Tweak it and re-run it until you're happy with the results.
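Prompt tweaking gets you most of the way, but when you're processing hundreds of rows it can help to add a safety net on your end, too. Here's a sketch of one approach (plain Python, nothing from OpenAI, and `normalize_category` is a made-up name): accept the response if it's exactly one of our categories, pull the category out if exactly one is mentioned, and otherwise return `None` so a human can take a look.

```python
import re

CATEGORIES = ["ENVIRONMENT", "HEALTHCARE", "IMMIGRATION", "TAXES/FINES", "OTHER"]

def normalize_category(response, allowed=CATEGORIES):
    """Map a free-text LLM response onto one allowed category, or None if unclear."""
    # Ignore case, surrounding whitespace, quotes and trailing periods
    cleaned = response.strip().strip("\"'.").upper()
    if cleaned in allowed:
        return cleaned
    # Otherwise, look for categories mentioned as whole words (so "ANOTHER" doesn't count as "OTHER")
    mentioned = [
        category for category in allowed
        if re.search(r"\b" + re.escape(category) + r"\b", cleaned)
    ]
    if len(mentioned) == 1:
        return mentioned[0]
    # Zero or several categories mentioned: flag it for a human to check
    return None

print(normalize_category("This legislative bill would be categorized as ENVIRONMENT."))
```

You'd run each response through it, for example `normalize_category(chat_completion.choices[0].message.content)`, and then go through any `None` results by hand.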

## Bulk processing

Oftentimes you end up with a looooong spreadsheet or database of things that need to be categorized. But whether we're technical or not, it's easy to tackle!

### Python/pandas

> If you're not a Python person: it's fine! Just think about this in terms of concepts. You just want to be familiar with the ideas of an **API key** and a **prompt**.

Let's say we have a dataset that looks like this:

In [None]:
import pandas as pd

df = pd.DataFrame({
    'title': [
        'Trees in space',
        'Taxes on people who are in outer space',
        'Medical expenses for aliens',
        'Pinecones orbiting the planet'
    ]
})
df

We want to add a new column to this, called `llm_category`. To run the code above for every row in a pandas dataframe, we make two adjustments:

We build a template for our prompt, which now has a placeholder of `{text}` where our bill details will go:

```python
prompt_template = """
Categorize the following legislative bill as ENVIRONMENT, HEALTHCARE, IMMIGRATION, TAXES/FINES, 
or OTHER. Only respond with the category name.

Bill title: {text}
"""
```

We also write a function called `llm_request`, which receives a single row of data and uses it to fill in the template:

```python
prompt = prompt_template.format(text=row['title'])
```

That prompt is then sent to the LLM and the result returned.

In [None]:
from openai import OpenAI

client = OpenAI(api_key="XXXXXXXX")

prompt_template = """
Categorize the following legislative bill as ENVIRONMENT, HEALTHCARE, IMMIGRATION, TAXES/FINES, 
or OTHER. Only respond with the category name.

Bill title: {text}
"""

def llm_request(row):
    prompt = prompt_template.format(text=row['title'])
    
    messages = [
        { "role": "system", "content": "You are a legislative assistant."},
        { "role": "user", "content": prompt}
    ]

    chat_completion = client.chat.completions.create(
        messages=messages,
        model="gpt-4o-mini",
        temperature=0
    )

    return chat_completion.choices[0].message.content

We can try it out with the first row of our data.

In [None]:
first_row = df.loc[0]

print(first_row)
llm_request(first_row)

Or with a made-up row, just to be able to experiment a little more freely.

In [None]:
llm_request({'title': 'The bill to let people from Mars move to planet Earth'})

If we're happy with how it works in small doses, we can move on to using it with every row. We're going to use the Python library [tqdm](https://github.com/tqdm/tqdm) to get some nice progress bars while it chugs along.

In [None]:
%pip install --quiet tqdm

In [None]:
from tqdm.notebook import tqdm

# Create the column we're filling in, if it doesn't exist yet
if 'llm_category' not in df.columns:
    df['llm_category'] = None

# Only process rows without an answer yet, so re-running picks up where we left off
mask = df['llm_category'].isna()

# iterrows() gives (index, row) pairs, and each row supports row['title'] like llm_request expects
for index, row in tqdm(df[mask].iterrows(), total=mask.sum()):
    try:
        result = llm_request(row)

        # Write the answer back into the original dataframe using the row's index
        df.loc[index, 'llm_category'] = result
    except Exception as e:
        print(f"Error on row {index}: {e}")

It's magic!

### Google sheets

I personally use [GPT for Work](https://gptforwork.com/). I'll show you a demo, but I don't think we can install things on your Google account without getting a little too crazy.

## Checking the results

We can't just say, "oh wow, computers seem cool, let's just trust whatever it says!" Our job as journalists who request trust from our audience is to **actually test the results.**

We can do this in one of two ways:

1. Run the classifier over all of our bills, then hand-label a sample of the results and compare them with the LLM's judgment.
2. Have a small hand-labeled test dataset that we use to verify the LLM's results before moving on to classify everything.

The second path is usually better, since it lets you tweak your prompt before running it against *everything*. The first path is useful if you already have a workflow and need to check whether it's [still working](https://www.reddit.com/r/ChatGPT/comments/182ubh7/chatgpt_has_become_unusably_lazy/) or has [shifted unexpectedly](https://arxiv.org/abs/2307.09009).

In [None]:
df = pd.read_csv("data/labeled-bills.csv")
df.head()

Let's take our dataset of human-labeled content and see what the AI thinks about it.

In [None]:
from tqdm.notebook import tqdm
tqdm.pandas()  # adds .progress_apply() to pandas dataframes

df['llm_category'] = df.progress_apply(llm_request, axis=1)
df.head()

How do the two compare? We'll use `crosstab` here, but you can use a pivot table if you're in Excel. It looks complicated, but it's just lining up the `human_label` and `llm_category` columns and counting how often each combination shows up.

In [None]:
pd.crosstab(df.human_label, df.llm_category, margins=True)

In this case, environmental bills were often confused with bills about taxes/fines. Healthcare came out with a 100% score.

> I said "in this case," but the answers actually change if you re-run the LLM category assignment! Even with a small 88-row dataset the LLM will change its mind on some of them.

If you were going to do this in Google Sheets, it would be the same thing! You'd just add a `human_label` column, then do a pivot table to see if they match. I [wrote a little Apps Script](https://gist.github.com/jsoma/06783a9e759003e2e69389d677f83c0f) that adds a helper for you.
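If you'd like numbers to go along with the crosstab (like "healthcare came out at 100%"), you can compute the match rate for each category. Here's a plain-Python sketch; `accuracy_by_label` is a made-up name. It works on pandas columns directly, since looping over a column gives you its values.

```python
from collections import Counter

def accuracy_by_label(human_labels, llm_labels):
    """Return the overall match rate and the match rate for each human label."""
    totals = Counter()   # how many rows humans put in each category
    matches = Counter()  # how many of those the LLM agreed with
    for human, llm in zip(human_labels, llm_labels):
        totals[human] += 1
        if human == llm:
            matches[human] += 1
    if not totals:
        return 0.0, {}
    overall = sum(matches.values()) / sum(totals.values())
    per_label = {label: matches[label] / totals[label] for label in totals}
    return overall, per_label

# Tiny made-up example
overall, per_label = accuracy_by_label(
    ["ENVIRONMENT", "ENVIRONMENT", "HEALTHCARE", "TAXES/FINES"],
    ["ENVIRONMENT", "TAXES/FINES", "HEALTHCARE", "TAXES/FINES"],
)
print(overall, per_label)
```

With the real data, that's `accuracy_by_label(df.human_label, df.llm_category)`. Since the answers can shift between runs, you could also save the `llm_category` column from two runs and pass both to the same function to see how often the LLM changes its mind.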

## Structured outputs

Honestly, it did a remarkably good job of returning the results we wanted! In other situations it's been a little more unpredictable for me. One way to force the AI to return exactly what you're looking for is to use a tool like [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs), which makes OpenAI respond in a structure you define.

> If you aren't technical: using ChatGPT is fine and good, because as a human it's easy to understand the response. The difference between getting "YES, a donut" as a response and "Yes - a donut" is meaningless to a person, but they're very different things when you're a computer! Trying to automatically parse responses when the LLM can decide to go rogue can be a real pain in the neck.

If you're using a non-OpenAI LLM, try out [Instructor](https://python.useinstructor.com/). If you're using a local model, [Outlines](https://dottxt-ai.github.io/outlines/latest/) is for you.

We'll start by setting up the structure we'd like our response to be in...

In [None]:
from pydantic import BaseModel, Field
from typing import Literal

class Comment(BaseModel):
    """Data model for a comment."""
    name: str = Field(description="Author's name")
    food_item: str = Field(description="Food item being discussed, in English")
    email: str = Field(description="Author's email")
    emotion: Literal["positive", "negative", "neutral"] = Field(description="Comment sentiment")

...and now we'll run it against a single example.

In [None]:
from openai import OpenAI
import os

os.environ['OPENAI_API_KEY'] = 'XXXXXXXX'

client = OpenAI()

prompt = """
After the broccoli incident, I never want to look at broccoli again. Please remove me 
from the broccoli email list.

Sincerely,
Jackary Baloneynose
jackary.baloneynose@example.com
"""

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the relevant information."},
        {"role": "user", "content": prompt},
    ],
    response_format=Comment,
)

response = completion.choices[0].message.parsed
response

If we want it as a plain Python dictionary, we just pop `.model_dump()` on the end of it:

In [None]:
response.model_dump()
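The pydantic model isn't just instructions for OpenAI: on our side of things it also works as a validator. If you're using an LLM without Structured Outputs and just asking it for JSON, a sketch like this one would catch responses that go rogue. It assumes pydantic v2 (where `model_dump()` comes from), and `validate_comment` is a helper name made up for this example.

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class Comment(BaseModel):
    """Data model for a comment (same as above)."""
    name: str = Field(description="Author's name")
    food_item: str = Field(description="Food item being discussed, in English")
    email: str = Field(description="Author's email")
    emotion: Literal["positive", "negative", "neutral"] = Field(description="Comment sentiment")

def validate_comment(raw_json):
    """Try to turn raw JSON text into a Comment; return (comment, problems)."""
    try:
        return Comment.model_validate_json(raw_json), []
    except ValidationError as err:
        # err.errors() lists every field that didn't fit the model
        return None, err.errors()

good_json = '{"name": "Jackary Baloneynose", "food_item": "broccoli", "email": "jackary.baloneynose@example.com", "emotion": "negative"}'
# "furious" isn't one of our allowed emotions, and the email is missing entirely
bad_json = '{"name": "Jackary Baloneynose", "food_item": "broccoli", "emotion": "furious"}'

comment, problems = validate_comment(good_json)
print(comment)

comment, problems = validate_comment(bad_json)
for problem in problems:
    print(problem["loc"], problem["msg"])
```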

## Putting it all together

Let's say we have a list of comments that we'd like to extract some data from: the author's name and email, the food they're talking about, and whether they're feeling positive, negative or neutral.

In [None]:
import pandas as pd

pd.options.display.max_colwidth = 300

df = pd.read_csv("data/comments.csv")
df.head()

Let's take everything we've done so far and put it together:

1. Write a pydantic model
2. Build a function to re-use our LLM query code
3. Loop through the dataframe to get the details for each row

At the end, we'll have a nice new dataframe of just the results.

In [None]:
from pydantic import BaseModel, Field
from typing import Literal
from openai import OpenAI
import os
from tqdm import tqdm
tqdm.pandas()

class Comment(BaseModel):
    """Data model for a comment."""
    name: str = Field(description="Author's name")
    food_item: str = Field(description="Food item being discussed, in English")
    email: str = Field(description="Author's email")
    emotion: Literal["positive", "negative", "neutral"] = Field(description="Comment sentiment")

os.environ['OPENAI_API_KEY'] = 'XXXXXXXX'

client = OpenAI()

def llm_query(row):
    try:
        prompt = f"""
        EMAIL CONTENT:
        {row['contents']}
        """
        
        completion = client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Extract the relevant information."},
                {"role": "user", "content": prompt},
            ],
            response_format=Comment,
        )
        
        response = completion.choices[0].message.parsed
        return pd.Series(response.model_dump())
    except Exception as e:
        print(f"Error processing row: {e}")
        return pd.Series({})

responses = df.progress_apply(llm_query, axis=1)
responses.head()

And then we can join it with our original dataframe and have a nice expanded set of data!

In [None]:
merged = df.join(responses.add_prefix('llm_'))
merged.head()

## Adjusting and adapting

If we want to use this on our own, we need to change a few things: we need a **new dataset**, a **new pydantic model** and **to adjust our prompt**.

Let's say we've migrated to analyzing documents in a handful of languages.

In [None]:
# A new dataset
import pandas as pd

df = pd.read_csv("data/articles.csv")
df.head()

In [None]:
# A new model

from pydantic import BaseModel, Field
from typing import Literal, List

class ArticleSource(BaseModel):
    """Data model for a single source in an article."""
    name: str = Field(description="Source's name, in English")
    position_or_title: str = Field(description="Source's position or title")
    
class ArticleDetails(BaseModel):
    """Data model for an article."""
    headline: str = Field(description="Article headline")
    outlet_name: str = Field(description="News outlet name")
    summary: str = Field(description="Three-sentence summary, in English")
    language: str = Field(description="Two-letter language code of original article")
    sources: List[ArticleSource] = Field(description="People quoted in the article")
    category: Literal["politics", "sports", "entertainment", "other"] = Field(description="Article category")

In [None]:
# A new prompt - pay attention to your column names and response_format!!!

from openai import OpenAI
import os

os.environ['OPENAI_API_KEY'] = 'XXXXXXXX'

client = OpenAI()

def llm_query(row):
    try:
        prompt = f"""
        ARTICLE URL: {row['URL']}
        ARTICLE CONTENT:
        {row['article_text']}
        """
        
        completion = client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Extract the relevant information."},
                {"role": "user", "content": prompt},
            ],
            response_format=ArticleDetails,
        )
        
        response = completion.choices[0].message.parsed
        return pd.Series(response.model_dump())
    except Exception as e:
        print(f"Error processing row: {e}")
        return pd.Series({})

In [None]:
# Now let's run it
from tqdm import tqdm

tqdm.pandas()
responses = df.progress_apply(llm_query, axis=1)
responses.head()

In [None]:
# Merge and save
merged = df.join(responses.add_prefix('llm_'))
merged.to_csv("articles-with-details.csv", index=False)

merged.head()
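One wrinkle: `llm_sources` holds a *list* of people for each article, which gets crammed into a single cell when you save to CSV. If you'd rather have one row per source, a small helper like this hypothetical `sources_table` can flatten them out. It assumes the `llm_` prefix from the merge above and works on the in-memory `merged` dataframe (once it's been through a CSV, the lists turn into plain text). Rows that errored out have no list, so they're skipped.

```python
def sources_table(articles):
    """Flatten article records into one row per quoted source."""
    rows = []
    for article in articles:
        sources = article.get("llm_sources")
        # Failed rows end up with a missing value (NaN) instead of a list
        if not isinstance(sources, list):
            continue
        for source in sources:
            rows.append({
                "headline": article.get("llm_headline"),
                "name": source.get("name"),
                "position_or_title": source.get("position_or_title"),
            })
    return rows

# Made-up records shaped like merged.to_dict("records")
example = [
    {"llm_headline": "Council votes on new park",
     "llm_sources": [{"name": "Ana Ruiz", "position_or_title": "Mayor"},
                     {"name": "Tom Lee", "position_or_title": "Resident"}]},
    {"llm_headline": "Row that failed", "llm_sources": float("nan")},
]
print(sources_table(example))
```

With the real data, `pd.DataFrame(sources_table(merged.to_dict("records")))` gives you a tidy table you can save alongside the articles.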

## Reflection

In this notebook we looked at how **AI isn't just one *thing*: it has all sorts of versions and options, same as everything else on the planet.** Even ChatGPT has different versions (4o-mini, 4o, 4.5...). 4o is more powerful than 4o-mini in a handful of ways, but 4o-mini is good enough for most of the work we'll be doing.

A simple task for AI is putting things in categories, also known as classification. When doing bulk classification, it's important to **examine the outputs systematically** and not just spot-check! That way you know how or why the AI might be going wrong, and can either tweak your prompt or build knowledge of the mistakes into your process.

**Finally, LLMs don't always listen to your rules.** They might respond in formats you didn't ask for, add extra categories, or ignore the rules you specified. Using a temperature of 0 along with structured output tools like Structured Outputs and Instructor helps keep responses in line with your rules, making your analysis process that much easier.