# GPT-3 Blog Title Optimizer Walkthrough

Here's an annotated code walkthrough of how to leverage GPT-3 to build a title optimizer.

This notebook assumes you have an `OPENAI_API_KEY` and a `FINETUNED_MODEL` specified in a `.env` file, e.g.

```properties
OPENAI_API_KEY="<FILL IN>"
FINETUNED_MODEL="<FILL IN>"
```

In [1]:
import os
import re
from math import exp

import openai
import pandas as pd
from IPython.display import display, HTML
from dotenv import load_dotenv

load_dotenv()

assert os.getenv("OPENAI_API_KEY"), "No OPENAI_API_KEY defined in .env."
assert os.getenv("FINETUNED_MODEL"), "No FINETUNED_MODEL defined in .env."

openai.api_key = os.getenv("OPENAI_API_KEY")


## Generate Alternate Candidate Titles

Set the base prompt to feed to GPT-3 as a template, so that any title can be substituted in.

In [2]:
base_prompt = "Rewrite the following blog post title into six different titles but optimized for social media virality: {0}\n\n-"
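
To see why the template ends with `\n\n-`, it helps to print a filled-in prompt. The sketch below only formats the string and makes no API call; `build_prompt` is a hypothetical helper name, not something from the original notebook. The trailing dash presumably cues the model to continue a dashed list, which would explain why the first generated title comes back without a leading `-`.

```python
# Same template text as the notebook cell, split across two lines for readability.
BASE_PROMPT = (
    "Rewrite the following blog post title into six different titles "
    "but optimized for social media virality: {0}\n\n-"
)


def build_prompt(title: str) -> str:
    """Fill the template with a title (hypothetical helper for illustration)."""
    return BASE_PROMPT.format(title.strip())


prompt = build_prompt("Absurd AI-Generated Professional Food Photography with DALL-E 2")
print(prompt)
```

Printing the prompt before sending it is a cheap way to catch a missing placeholder or stray whitespace.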


Ping OpenAI's Completion API, which returns JSON.

In [3]:
title_input = "Absurd AI-Generated Professional Food Photography with DALL-E 2"

r = openai.Completion.create(
    model="text-davinci-002",
    prompt=base_prompt.format(title_input),
    temperature=0,  # deterministic output; use 0.7 or 1 for more varied titles
    max_tokens=256,  # fine for small titles but may need to bump
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

r


<OpenAIObject text_completion id=cmpl-5ez1HIIrnqQrkLYKrj1jlLQbxkZ1T at 0x1199f39f0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "DALL-E 2 creates absurd AI-generated professional food photography \n-A new level of absurdity: AI-generated professional food photography with DALL-E 2 \n-DALL-E 2: The AI that creates absurdly realistic professional food photography \n-How DALL-E 2 creates absurd AI-generated professional food photography \n-DALL-E 2: Creating absurd AI-generated professional food photography \n-The absurd AI-generated professional food photography of DALL-E 2"
    }
  ],
  "created": 1660449363,
  "id": "cmpl-5ez1HIIrnqQrkLYKrj1jlLQbxkZ1T",
  "model": "text-davinci-002",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 104,
    "prompt_tokens": 37,
    "total_tokens": 141
  }
}

Extract and clean up the titles from the generated output. The biggest issue is that each title may or may not end with a space, so we have to use a regular expression to account for that possibility instead of a plain `split("\n-")`.

In [4]:
gen_titles = re.split(r" ?\n-", r["choices"][0]["text"])
gen_titles


['DALL-E 2 creates absurd AI-generated professional food photography',
 'A new level of absurdity: AI-generated professional food photography with DALL-E 2',
 'DALL-E 2: The AI that creates absurdly realistic professional food photography',
 'How DALL-E 2 creates absurd AI-generated professional food photography',
 'DALL-E 2: Creating absurd AI-generated professional food photography',
 'The absurd AI-generated professional food photography of DALL-E 2']
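
The split above can be made a little more defensive. The following is a minimal sketch, assuming the completion text has the same dashed-list shape as the response shown earlier. `parse_titles` is a hypothetical helper, and the extra handling (a leading dash, a space after the dash, blank entries) covers cases the original output did not exhibit.

```python
import re
from typing import List


def parse_titles(text: str) -> List[str]:
    """Split a dashed-list completion into clean title strings."""
    text = text.strip()
    # Drop a stray leading dash in case the model repeats the one from the prompt.
    if text.startswith("-"):
        text = text[1:]
    # Split on a newline followed by a dash, tolerating spaces on either side.
    parts = re.split(r"[ \t]*\n-[ \t]*", text)
    # Trim whitespace and skip empty entries (e.g. a dangling final dash).
    return [part.strip() for part in parts if part.strip()]


sample = "First title \n-Second title\n- Third title \n-"
print(parse_titles(sample))
```

Hyphens inside a title, as in "DALL-E", are left alone because the pattern only matches a dash that directly follows a newline.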

## Finding if Generated Titles Are Good

Now let's work with the finetuned GPT-3 model, which labels a title as either `good` or `bad`.

In [5]:
finetune_prompt = "Title: {0} ->"


Testing out the original title first:

In [6]:
r = openai.Completion.create(
    model=os.getenv("FINETUNED_MODEL"),
    prompt=finetune_prompt.format(title_input),
    temperature=0,  # must be 0
    max_tokens=1,  # must be 1
    logprobs=1,  # returns the log probability of the top token
)

r


<OpenAIObject text_completion id=cmpl-5ez1LNNuW81n8OnhQjaJQHGWpeQ37 at 0x116e1c090> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": {
        "text_offset": [
          73
        ],
        "token_logprobs": [
          -0.34654787
        ],
        "tokens": [
          " bad"
        ],
        "top_logprobs": [
          {
            " bad": -0.34654787
          }
        ]
      },
      "text": " bad"
    }
  ],
  "created": 1660449367,
  "id": "cmpl-5ez1LNNuW81n8OnhQjaJQHGWpeQ37",
  "model": "babbage:ft-personal-2022-08-14-02-01-33",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 1,
    "prompt_tokens": 18,
    "total_tokens": 19
  }
}

In [7]:
title_class = r["choices"][0]["text"]
print(title_class)
class_prob = exp(r["choices"][0]["logprobs"]["token_logprobs"][0])
print(class_prob)


 bad
0.7071249684048194


The predicted class is `bad`, so the probability of the `good` class is 1 minus the returned probability, here about 0.293. (The documentation says you need to return both the `good` and `bad` probabilities; this is not necessary and increases cost.)
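
As a worked example of that conversion, here is a small sketch using the values from the response shown earlier. `good_probability` is a hypothetical helper, and it assumes the finetuned model only ever emits the two labels ` good` and ` bad`, so that their probabilities sum to approximately 1.

```python
from math import exp


def good_probability(predicted_class: str, token_logprob: float) -> float:
    """Convert the single returned log probability into P(good).

    Assumes a two-class model whose only labels are " good" and " bad".
    """
    prob = exp(token_logprob)  # log probability -> probability
    label = predicted_class.strip()
    if label == "good":
        return prob
    if label == "bad":
        return 1.0 - prob
    raise ValueError(f"unexpected class: {predicted_class!r}")


# Values copied from the API response for the original title.
p_good = good_probability(" bad", -0.34654787)
print(f"{p_good:.4f}")  # 0.2929
```

Raising on an unexpected label is a design choice for this sketch: it surfaces a model that drifts from the two expected tokens instead of silently mis-scoring the title.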

Now, let's run the ranker for each of the generated titles.

In [8]:
gen_titles = list(set(gen_titles + [title_input]))  # also dedupes generated titles

ranked_titles = []

for gen_title in gen_titles:
    r = openai.Completion.create(
        model=os.getenv("FINETUNED_MODEL"),
        prompt=finetune_prompt.format(gen_title),
        temperature=0,  # must be 0
        max_tokens=1,  # must be 1
        logprobs=1,  # returns the log probability of the top token
    )

    title_class = r["choices"][0]["text"]
    class_prob = exp(r["choices"][0]["logprobs"]["token_logprobs"][0])
    if title_class == " bad":
        class_prob = 1.0 - class_prob

    # the <strong> will emphasize the input when we pretty-render it
    ranked_titles.append(
        (
            f"<strong>{gen_title}</strong>" if gen_title == title_input else gen_title,
            class_prob,
        )
    )

ranked_titles


[('The absurd AI-generated professional food photography of DALL-E 2',
  0.372289056797075),
 ('<strong>Absurd AI-Generated Professional Food Photography with DALL-E 2</strong>',
  0.2928750315951806),
 ('DALL-E 2: Creating absurd AI-generated professional food photography',
  0.5203173838702648),
 ('DALL-E 2 creates absurd AI-generated professional food photography',
  0.5408895391518186),
 ('A new level of absurdity: AI-generated professional food photography with DALL-E 2',
  0.17769626921208903),
 ('DALL-E 2: The AI that creates absurdly realistic professional food photography',
  0.6860428428995713),
 ('How DALL-E 2 creates absurd AI-generated professional food photography',
  0.4876431376731942)]
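
The loop above interleaves the API call with the scoring logic, which makes it hard to check without spending tokens. One way to separate the two, sketched below, is to pass the classifier in as a function. `rank_titles`, `classify`, and `fake_classify` are hypothetical names; in the notebook, `classify` would wrap the `openai.Completion.create` call and return the predicted class and its log probability. The stub here returns invented values purely so the example runs offline.

```python
from math import exp
from typing import Callable, List, Tuple

Classifier = Callable[[str], Tuple[str, float]]


def rank_titles(titles: List[str], classify: Classifier) -> List[Tuple[str, float]]:
    """Return (title, P(good)) pairs sorted from most to least promising."""
    scored = []
    # dict.fromkeys dedupes while keeping the original order, unlike set().
    for title in dict.fromkeys(titles):
        label, logprob = classify(title)
        prob = exp(logprob)
        if label.strip() == "bad":
            prob = 1.0 - prob  # two-class assumption: P(good) = 1 - P(bad)
        scored.append((title, prob))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def fake_classify(title: str) -> Tuple[str, float]:
    """Offline stand-in for the finetuned model; the numbers are invented."""
    return (" good", -0.1) if ":" in title else (" bad", -0.2)


ranked = rank_titles(["Plain title", "Topic: a title", "Plain title"], fake_classify)
for title, prob in ranked:
    print(f"{prob:.1%}  {title}")
```

Sorting inside the function also means the pandas step is only needed for display, not for ordering.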

Now we can make the results pretty using a [pandas](https://pandas.pydata.org) dataframe!

In [9]:
df = pd.DataFrame(ranked_titles, columns=["Title", "Good Prob"])
df = df.sort_values(by="Good Prob", ascending=False)


display(
    HTML(
        df.to_html(
            formatters={"Good Prob": lambda x: f"{x:.1%}"}, escape=False, index=False
        )
    )
)


| Title | Good Prob |
| --- | --- |
| DALL-E 2: The AI that creates absurdly realistic professional food photography | 68.6% |
| DALL-E 2 creates absurd AI-generated professional food photography | 54.1% |
| DALL-E 2: Creating absurd AI-generated professional food photography | 52.0% |
| How DALL-E 2 creates absurd AI-generated professional food photography | 48.8% |
| The absurd AI-generated professional food photography of DALL-E 2 | 37.2% |
| **Absurd AI-Generated Professional Food Photography with DALL-E 2** | 29.3% |
| A new level of absurdity: AI-generated professional food photography with DALL-E 2 | 17.8% |


Those titles do have a Hacker News appeal.