# GPT-3 Baselines

This notebook fine-tunes GPT-3 models through the OpenAI API and runs few-shot prompts against `text-davinci-003` to produce the baselines for our comparisons.

### Set-up

In [None]:
from IPython.display import clear_output 

In [None]:
# mount google drive
from google.colab import drive
import os

drive.mount('/content/drive/')
os.chdir('/content/drive/Shareddrives/CS260-Project/data/')

Mounted at /content/drive/


In [None]:
!pip install --upgrade openai
clear_output()

## GPT-3 Fine-tuning

This section follows the OpenAI fine-tuning guide (https://beta.openai.com/docs/guides/fine-tuning) and the fine-tunes API reference (https://beta.openai.com/docs/api-reference/fine-tunes).

### Prepare data


In [None]:
# convert csv to JSONL
!openai tools fine_tunes.prepare_data -f ../data/train/40-topic-sample-1000-train.csv -q
!openai tools fine_tunes.prepare_data -f ../data/val/40-topic-sample-1000-val.csv -q

Analyzing...

- Based on your file extension, your file is formatted as a CSV file
- Your file contains 1000 prompt-completion pairs
- There are 1 duplicated prompt-completion sets. These are rows: [934]
- There are 2 examples that are very long. These are rows: [419, 465]
For conditional generation, and for classification the examples shouldn't be longer than 2048 tokens.
- All prompts end with suffix `\n\n###\n\n`
- All completions end with suffix `###`

Based on the analysis we will perform the following actions:
- [Necessary] Your format `CSV` will be converted to `JSONL`
- [Recommended] Remove 1 duplicate rows [Y/n]: Y
- [Recommended] Remove 2 long examples [Y/n]: Y


Your data will be written to a new JSONL file. Proceed [Y/n]: Y

Wrote modified file to `../data/train/40-topic-sample-1000-train_prepared.jsonl`
Feel free to take a look!

Now use that file when fine-tuning:
> openai api fine_tunes.create -t "../data/train/40-topic-sample-1000-train_prepared.jsonl"

After you’ve fin
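
The tool output above reports that every prompt ends with `\n\n###\n\n` and every completion ends with `###`. As a rough illustration of what one line of a prepared JSONL file looks like under those conventions, here is a minimal sketch. The `to_jsonl_line` helper and the placeholder lyric are hypothetical; the real tool may apply further normalization that is not reproduced here.

```python
import json

PROMPT_SUFFIX = "\n\n###\n\n"   # prompt separator reported by the tool
COMPLETION_SUFFIX = "###"       # completion terminator reported by the tool

def to_jsonl_line(prompt: str, completion: str) -> str:
    """Serialize one prompt/completion pair as a single JSONL line (hypothetical helper)."""
    if not prompt.endswith(PROMPT_SUFFIX):
        prompt += PROMPT_SUFFIX
    if not completion.endswith(COMPLETION_SUFFIX):
        completion += COMPLETION_SUFFIX
    # json.dumps escapes the newlines, so the record stays on one physical line
    return json.dumps({"prompt": prompt, "completion": completion})

line = to_jsonl_line("Keith Urban;2", " placeholder lyric text")
print(line)
```

The same `###` string is what the generation cells later pass as `stop`, so the fine-tuned model learns where a lyric ends.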

In [None]:
import openai

### Train using OpenAI API


In [None]:
# Create fine-tuning job
# The CLI reads OPENAI_API_KEY from the environment; set it beforehand (e.g. through os.environ) instead of hardcoding the key in this cell
!openai api fine_tunes.create -t train/20-topic-sample-1000-train_prepared.jsonl -v val/20-topic-sample-1000-val_prepared.jsonl --model curie --n_epochs 8 --batch_size 5
# !openai api fine_tunes.create -t train/30-topic-sample-1000-train_prepared.jsonl -v val/30-topic-sample-1000-val_prepared.jsonl --model curie --n_epochs 5 --batch_size 5
# !openai api fine_tunes.create -t train/40-topic-sample-1000-train_prepared.jsonl -v val/40-topic-sample-1000-val_prepared.jsonl --model curie --n_epochs 2

Found potentially duplicated files with name '20-topic-sample-1000-train_prepared.jsonl', purpose 'fine-tune' and size 1261752 bytes
file-ffXSdsRSY794IoGdX08aLSBa
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: file-ffXSdsRSY794IoGdX08aLSBa
Reusing already uploaded file: file-ffXSdsRSY794IoGdX08aLSBa
Found potentially duplicated files with name '20-topic-sample-1000-val_prepared.jsonl', purpose 'fine-tune' and size 137961 bytes
file-w2QiH5QgZrWfzlEbwkge5CgA
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: file-w2QiH5QgZrWfzlEbwkge5CgA
Reusing already uploaded file: file-w2QiH5QgZrWfzlEbwkge5CgA
Created fine-tune: ft-3zyYbSvoKrgcLJkY0gBu8Skm
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-12-11 19:50:43] Created fine-tune: ft-3zyYbSvoKrgcLJkY0gBu8Skm
[2022-12-11 19:50:47] Fine-tune costs $7.39
[2022-12-11 19:50:47] Fine-

In [None]:
# Reconnect to a running fine-tune if needed (change the job ID); assumes OPENAI_API_KEY is set in the environment
!openai api fine_tunes.follow -i ft-3zyYbSvoKrgcLJkY0gBu8Skm

[2022-12-11 19:50:43] Created fine-tune: ft-3zyYbSvoKrgcLJkY0gBu8Skm
[2022-12-11 19:50:47] Fine-tune costs $7.39
[2022-12-11 19:50:47] Fine-tune enqueued. Queue number: 0
[2022-12-11 19:50:51] Fine-tune started
[2022-12-11 19:54:27] Completed epoch 1/8
[2022-12-11 19:57:15] Completed epoch 2/8
[2022-12-11 20:00:02] Completed epoch 3/8
[2022-12-11 20:02:49] Completed epoch 4/8
[2022-12-11 20:05:34] Completed epoch 5/8
[2022-12-11 20:08:20] Completed epoch 6/8
[2022-12-11 20:11:08] Completed epoch 7/8
[2022-12-11 20:13:56] Completed epoch 8/8
[2022-12-11 20:14:21] Uploaded model: curie:ft-personal-2022-12-11-20-14-21
[2022-12-11 20:14:22] Uploaded result file: file-1XHOnkjGVedUJl2Ca1fRVALi
[2022-12-11 20:14:22] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m curie:ft-personal-2022-12-11-20-14-21 -p <YOUR_PROMPT>


In [None]:
# Write the training log to a CSV file (includes training and validation loss); assumes OPENAI_API_KEY is set in the environment
!openai api fine_tunes.results -i "ft-3zyYbSvoKrgcLJkY0gBu8Skm" > ../models/20-topic-long_results.csv
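
Once the results file is saved, a quick way to inspect it is to find the step with the lowest validation loss. The sketch below assumes the CSV has columns named `step` and `validation_loss` and that validation loss is blank on steps where it was not evaluated; check the header of the actual file before relying on these names. The sample rows are made up purely to exercise the function and are not results from this project.

```python
import csv
import io

def best_validation_step(csv_text: str):
    """Return (step, validation_loss) for the row with the lowest validation loss, or None."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("validation_loss") or "").strip()
        if not raw:            # validation metrics may be missing on some steps
            continue
        loss = float(raw)
        if best is None or loss < best[1]:
            best = (int(row["step"]), loss)
    return best

# Made-up numbers, only to demonstrate the function
sample = "step,training_loss,validation_loss\n1,2.10,\n2,1.80,1.95\n3,1.60,1.90\n4,1.40,1.97\n"
print(best_validation_step(sample))
```

For the real file, pass `open('../models/20-topic-long_results.csv').read()` in place of `sample`.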

In [None]:
# List all fine-tuning jobs; assumes OPENAI_API_KEY is set in the environment
!openai api fine_tunes.list

## Saved Models (Under Jon's Key):
100 samples Ada (\$0.03) - ada:ft-personal-2022-11-28-23-54-15

100 samples Davinci (\$2.38) - davinci:ft-personal-2022-11-28-23-59-14

1000 samples Curie (\$5.69) - curie:ft-personal-2022-11-29-01-05-27


## Saved Models -- used validation (Under Avalon's Key)

100 samples Ada (5 epochs) (\$0.08) - ada:ft-personal-2022-12-01-18-55-30

100 samples Davinci (2 epochs) (\$2.38) - davinci:ft-personal-2022-12-01-21-52-58

1000 samples Curie (5 epochs) (\$5.69) - curie:ft-personal-2022-12-01-22-22-07

## Saved Models -- used validation, 1000 samples, Curie, 2 epochs (Under Rebecca's Key)

10 topics (\$1.85) - curie:ft-personal-2022-12-11-07-06-32 (job # ft-lJjy7Vy4YgvdKUFNxEBwKagl)

20 topics (\$1.85) - curie:ft-personal-2022-12-11-07-29-55 (job # ft-YuOkTUjYioWnlHfL0y3lObkQ)

30 topics (\$1.85) - curie:ft-personal-2022-12-11-07-45-58 (job # ft-1X7lY2DgzRGmiEpkCWaP47Rk)

40 topics (\$1.85) - curie:ft-personal-2022-12-11-08-01-49 (job # ft-Rf32YslUXVD2AfZvRltsySYu)


## Saved Models -- used validation, 1000 samples, Curie, batch size 5 (Under Christina's Key)

10 topics (2 epochs) (\$1.85) - curie:ft-personal-2022-12-11-18-54-16 (job # ft-11yKZpdsDKkFYb8CNMKIixNx)

20 topics (2 epochs) (\$1.85) - curie:ft-personal-2022-12-11-19-08-29 (job # ft-YuOkTUjYioWnlHfL0y3lObkQ)

30 topics (5 epochs) (\$4.62) - curie:ft-personal-2022-12-11-19-34-45 (job # ft-Ps74y6roPjIEIlxr74L0CRIV)

20 topics (8 epochs) (\$7.39) - curie:ft-personal-2022-12-11-20-14-21 (job # ft-3zyYbSvoKrgcLJkY0gBu8Skm)
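
The costs listed for the 1000-sample Curie runs at 2, 5, and 8 epochs look roughly linear in the number of epochs. The arithmetic below checks that using only the figures listed above; it assumes the training files for the different topic counts contain a similar number of tokens, which the list itself does not confirm.

```python
# (epochs, cost in USD) for the 1000-sample Curie runs listed above
runs = [(2, 1.85), (5, 4.62), (8, 7.39)]

per_epoch = [cost / epochs for epochs, cost in runs]
for (epochs, cost), rate in zip(runs, per_epoch):
    print(f"{epochs} epochs: ${cost:.2f} total, ${rate:.3f} per epoch")

# How far apart the per-epoch rates are across the three runs
spread = max(per_epoch) - min(per_epoch)
print(f"spread across runs: ${spread:.4f}")
```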

### Use fine-tuned models

In [None]:
import csv

In [None]:
# get single instances of prompt/lyrics
data = []
with open('../data/train/lda-train-6-formatted.csv', 'r') as orig_data:
  reader = csv.reader(orig_data, delimiter = ',')
  for row in reader:
    data.append(row)

exist_prompt, exist_lyric = data[15]   # prompt that exists in both data sets (only appears once)
last_prompt, last_lyric = data[-1]    # unseen prompt

In [None]:
print(exist_prompt)

Keith Urban;2

###




In [None]:
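# All models that were used, grouped by the account they were trained under.
# Keys are read from environment variables instead of being stored in the notebook;
# the variable names below are placeholders, so set them before running this cell.
models_avalon = {
    "key": os.environ["OPENAI_API_KEY_AVALON"],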
    "ada": "ada:ft-personal-2022-12-01-18-55-30",
    "curie": "curie:ft-personal-2022-12-01-22-22-07",
    "davinci": "davinci:ft-personal-2022-12-01-21-52-58"
}
models_jon = {
    "key": os.environ["OPENAI_API_KEY_JON"],  # placeholder variable name
    "ada": "ada:ft-personal-2022-11-28-23-54-15",
    "curie": "curie:ft-personal-2022-11-29-01-05-27",
    "davinci": "davinci:ft-personal-2022-11-28-23-59-14"
}
models_rebecca = {
    "key": os.environ["OPENAI_API_KEY_REBECCA"]  # placeholder variable name
}
models_christina = {
    "key": os.environ["OPENAI_API_KEY_CHRISTINA"],  # placeholder variable name
    "curie-10": "curie:ft-personal-2022-12-11-18-54-16",
    "curie-30": "curie:ft-personal-2022-12-11-19-34-45",
    "curie-20-short": "curie:ft-personal-2022-12-11-19-08-29",
    "curie-20-long": "curie:ft-personal-2022-12-11-20-14-21"
}

models = models_christina
openai.api_key = models["key"]
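
Because this cell selects which account's key is active, it is worth failing early with a clear message when a key is missing. A minimal sketch of such a check follows; `load_api_key` is a hypothetical helper and the default variable name is an assumption, not something the project defines.

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this cell")
    return key

# Usage (not executed here): openai.api_key = load_api_key()
```

Keeping keys out of the notebook also means the file can be shared or committed without leaking credentials.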

In [None]:
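# test with existing prompts (evaluates on all prompts from validation set)

# 1000 samples (curie); `models` must be models_avalon or models_jon here,
# because models_christina has no "curie" entry
with open('../data/val/sample-1000-val.csv', 'r') as val:
  reader = csv.reader(val, delimiter = ',')
  next(reader, None)  # skip the first row (assumed to be the header)
  with open('../data/out/out-1000-samples-val.txt', 'w') as out:
    for row in reader:
      prompt = row[0]
      result = openai.Completion.create(
          model=models["curie"],
          prompt=prompt,
          max_tokens=1000,
          stop="###")
      out.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")
# `result` holds the completion for the last validation prompt only;
# exist_lyric is the training lyric at data[15], so this is a rough side-by-side, not a matched pair
print("Generated lyric: " + result.choices[0]["text"] + "\n\n")
print("Original lyric: " + exist_lyric)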

In [None]:
# 100 samples (davinci); requires a `models` dict with a "davinci" entry (models_avalon or models_jon)
with open('../data/val/sample-100-val.csv', 'r') as val:
  reader = csv.reader(val, delimiter = ',')
  with open('../data/out/out-100-samples-val.txt', 'w') as out:
    i = 0
    for row in reader:
      if i > 0:
        prompt = row[0]
        result = openai.Completion.create(
            model=models["davinci"],
            prompt=prompt,
            max_tokens=1000,
            stop="###")
        out.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")
      i += 1


In [None]:
print(result)

In [None]:
# test unseen prompts (samples from test set)

# 100 samples (davinci); requires a `models` dict with a "davinci" entry (models_avalon or models_jon)
with open('../data/test/lda-test-6-formatted.csv', 'r') as val:
  reader = csv.reader(val, delimiter = ',')
  with open('../data/out/out-100-samples-test.txt', 'w') as out:
    i = 0
    for row in reader:
      if i > 0 and i < 11:
        prompt = row[0]
        result = openai.Completion.create(
            model=models["davinci"],
            prompt=prompt,
            max_tokens=1000,
            stop="###")
        out.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")
      i += 1


In [None]:
# 1000 samples (curie -- change input file as needed)
with open('../data/test/big-lda-test-30-formatted.csv', 'r') as val:
  reader = csv.reader(val, delimiter = ',')
  with open('../data/out/gpt3/fine-tunning/out-30-topics.txt', 'w') as out:
    i = 0
    for row in reader:
      if i > 0 and i < 101:
        prompt = row[0]
        result = openai.Completion.create(
            model=models["curie-30"],
            prompt=prompt,
            max_tokens=1000,
            stop="###")
        out.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")
      i += 1
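
The generation cells above all repeat the same loop with a different input file, model, and row limit. One way to factor that out is sketched below. `generate_to_file` and its `complete` parameter are hypothetical names; in real use `complete` would wrap `openai.Completion.create(...)` and return `result.choices[0]["text"]`, while here a stand-in function is used so that no API call is made.

```python
import csv
import io
from itertools import islice
from typing import Callable, Iterable, Optional, TextIO

def generate_to_file(rows: Iterable[list], out: TextIO,
                     complete: Callable[[str], str],
                     limit: Optional[int] = None) -> int:
    """Write `prompt<START>completion<END>` records for up to `limit` rows; return the count."""
    count = 0
    for row in islice(rows, limit):     # limit=None means "all rows"
        prompt = row[0]
        out.write(prompt + "<START>" + complete(prompt) + "<END>\n\n")
        count += 1
    return count

# Demo on in-memory data with a fake completion function
reader = csv.reader(io.StringIO("prompt,completion\nA;1,x\nB;2,y\nC;3,z\n"))
next(reader, None)                      # skip the header row
buf = io.StringIO()
n = generate_to_file(reader, buf, complete=lambda p: " lyric for " + p, limit=2)
print(n)
print(buf.getvalue())
```

Passing the completion function in as an argument keeps the file handling testable without network access.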

In [None]:
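# single prompt for testing (requires a `models` dict with a "curie" entry, e.g. models_avalon)
prompt = "Taylor Swift;3"
result = openai.Completion.create(
    model=models["curie"],
    prompt=prompt,
    max_tokens=1000,
    stop="###")
# print the prompt that was actually sent so the label cannot drift from the request
print(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")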

Taylor Swift;5<START>

I'm coming for you
You think you're so righteous
I'm coming for you
You'd like that wouldn't you
We feel so much the same
I thought I knew you
You thought I was blind
We feel so much the same

I'm coming for you
Hold on to your pants
I just have to know you
We feel so much the same
I thought I knew you
You thought I was blind
We feel so much the same

We had such a spectacular summer
That hurt way too much to ever forget it
We had such similarly structured arguments
That hurt way too much to ever forgive
Here we are now
It's the last summer
So hold on to your pants
Cause this just got serious
Here we are now
It's the last summer
So don't you ever forget it

You're only human girl
I won't forget you
I only like to hurt humans
I'll beat your<END>
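
Outputs written in the `prompt<START>text<END>` layout shown above can be read back for later evaluation. The sketch below is one way to do that with a regular expression. It assumes that generated text never contains the literal `<START>` or `<END>` markers and that each record is followed by a blank line, as in the `out.write` calls; `parse_records` is a hypothetical helper name.

```python
import re

# Non-greedy groups so each record stops at its own <START>/<END> markers
RECORD = re.compile(r"(.*?)<START>(.*?)<END>\n\n", re.DOTALL)

def parse_records(text: str):
    """Return a list of (prompt, generated_text) pairs from an output file's contents."""
    return [(p, t) for p, t in RECORD.findall(text)]

sample = "A;1\n\n###\n\n<START>\nline one\nline two<END>\n\nB;2<START> other<END>\n\n"
for prompt, lyric in parse_records(sample):
    print(repr(prompt), repr(lyric))
```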




## Few-shot Baselines

### Using 6 topic set

In [None]:
# Collect few-shot examples grouped by topic
raw_data = []
with open('../data/train/lda-train-6.csv', 'r') as orig_data:
  reader = csv.reader(orig_data, delimiter = ',')
  i = 0
  for row in reader:
    if i > 0 and i < 1000:
      raw_data.append(row)
    i += 1

num_samples = 1   # Number of samples for each topic
by_topic = [[] for i in range(6)]
for artist, topic_id, lyric in raw_data:
  id = int(topic_id)
  if len(by_topic[id]) < num_samples * 15:
    by_topic[id].append((artist, lyric))

input_artist = "Taylor Swift"
input_topic = 0

prompts_topic = ["" for i in range(6)]  # Examples separated by topic
prompt_gen = ""  # Example with all topics
for id, topic in enumerate(by_topic):
  i = 0
  for artist, lyric in topic:
    prompts_topic[id] += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
    if i < num_samples:
      prompt_gen += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
      i += 1
  prompts_topic[id] += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "
prompt_gen += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "
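
The prompt layout built in this cell is a sequence of completed `Artist / Topic ID / Lyrics` examples separated by `###`, followed by an open header for the model to complete. A compact sketch of the same layout is shown below; `format_example` and `build_prompt` are hypothetical helper names and the example lyrics are placeholders.

```python
def format_example(artist: str, topic_id: int, lyric: str) -> str:
    """One completed example, terminated by the ### separator."""
    return f"Artist: {artist}\nTopic ID: {topic_id}\nLyrics: {lyric}\n###\n"

def build_prompt(examples, input_artist: str, input_topic: int) -> str:
    """Concatenate the examples, then add an open-ended header for the model to complete."""
    shots = "".join(format_example(a, t, l) for a, t, l in examples)
    return shots + f"Artist: {input_artist}\nTopic ID: {input_topic}\nLyrics: "

examples = [("artist a", 0, "first placeholder lyric"),
            ("artist b", 1, "second placeholder lyric")]
prompt = build_prompt(examples, "Taylor Swift", 0)
print(prompt)
```

Because the examples end with `###` and the request passes `stop="###"`, the model is expected to stop after a single lyric.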

In [None]:
for artist, _ in by_topic[input_topic]:   # list the example artists for the requested topic
  print(artist + "\n")

In [None]:
# Example includes all topics
result = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt_gen,
    max_tokens=400,
    stop="###")
print(result.choices[0]["text"])


It's a game, a cruel game
Played on cards too dark to see
I was drawn in by your beauty
Your touch has brought me to my knees

And I'm not sure what this means
Oh, what this could ever be
But I feel like I'm falling for you
Falling without knowing why

Falling so fast
I'm tumbling over time
So deep, so close
I feel the magnitude
For taking a chance on you

It's a long shot, a long run
A journey with no return
But I'm willing to take the risk
For a chance to feel your touch

Cause I'm not sure what this means
Oh, what this could ever be
But I feel like I'm falling for you
Falling without knowing why

Falling so fast
I'm tumbling over time
So deep, so close
I feel the magnitude
For taking a chance on you

Every single moment with you
Is a moment on borrowed time
But I'm not sure if I'm ready
To open up my heart this time

And I'm not sure what this means
Oh, what this could ever be
But I feel like I'm falling for you
Falling without knowing why

Falling so fast
I'm tumbling over time
So

In [None]:
print(result)

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\nIt's a game, a cruel game\nPlayed on cards too dark to see\nI was drawn in by your beauty\nYour touch has brought me to my knees\n\nAnd I'm not sure what this means\nOh, what this could ever be\nBut I feel like I'm falling for you\nFalling without knowing why\n\nFalling so fast\nI'm tumbling over time\nSo deep, so close\nI feel the magnitude\nFor taking a chance on you\n\nIt's a long shot, a long run\nA journey with no return\nBut I'm willing to take the risk\nFor a chance to feel your touch\n\nCause I'm not sure what this means\nOh, what this could ever be\nBut I feel like I'm falling for you\nFalling without knowing why\n\nFalling so fast\nI'm tumbling over time\nSo deep, so close\nI feel the magnitude\nFor taking a chance on you\n\nEvery single moment with you\nIs a moment on borrowed time\nBut I'm not sure if I'm ready\nTo open up my heart this time\n\nAnd I'm not sure w

In [None]:
# Examples includes requested topic
result = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompts_topic[input_topic],
    max_tokens=400,
    stop="###")
print(result.choices[0]["text"])


Verse 1
I've seen love, I've seen lies
Searching for truth in between black and white
Your dream world, no disguise
Just one chance, don't let it ride

Pre-Chorus
Lightning strikes and then it fades away
Missing moment, running out of time to play

Chorus
My hero, my enemy
My constant changing, adrenaline
My precious curse for eternity
My hero, my enemy

Verse 2
I've seen pain, I've seen pride
Sometimes we fight, but still make it through the night
The stormy clouds, they pass by
You can stay, and just be mine

Pre-Chorus
Lightning strikes and then it fades away
I'm calling out, whatever it takes to stay

Chorus
My hero, my enemy
My constant changing, adrenaline
My precious curse for eternity
My hero, my enemy

Bridge
Your love, it's like a symphony
Play it loud, so I'll never feel alone
My miracle, it brings me back again
I'll be your hero, you be my enemy

Chorus
My hero, my enemy
My constant changing, adrenaline
My precious curse for eternity
My hero, my enemy


In [None]:
print(result)

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\nVerse 1\nI've seen love, I've seen lies\nSearching for truth in between black and white\nYour dream world, no disguise\nJust one chance, don't let it ride\n\nPre-Chorus\nLightning strikes and then it fades away\nMissing moment, running out of time to play\n\nChorus\nMy hero, my enemy\nMy constant changing, adrenaline\nMy precious curse for eternity\nMy hero, my enemy\n\nVerse 2\nI've seen pain, I've seen pride\nSometimes we fight, but still make it through the night\nThe stormy clouds, they pass by\nYou can stay, and just be mine\n\nPre-Chorus\nLightning strikes and then it fades away\nI'm calling out, whatever it takes to stay\n\nChorus\nMy hero, my enemy\nMy constant changing, adrenaline\nMy precious curse for eternity\nMy hero, my enemy\n\nBridge\nYour love, it's like a symphony\nPlay it loud, so I'll never feel alone\nMy miracle, it brings me back again\nI'll be your her

In [None]:
# Test using prompts from test set (unseen)
raw_data = []
with open('../data/train/lda-train-6.csv', 'r') as orig_data:
  reader = csv.reader(orig_data, delimiter = ',')
  i = 0
  for row in reader:
    if i > 1 and i < 1000:
      raw_data.append(row)
    i += 1

num_samples = 1   # Number of samples for each topic
by_topic = [[] for i in range(6)]
for artist, topic_id, lyric in raw_data:
  id = int(topic_id)
  if len(by_topic[id]) < num_samples * 6:   # Capped at 6 examples per topic to stay within the prompt token limit
    by_topic[id].append((artist, lyric))

# input_artist = "Taylor Swift"
# input_topic = 0

with open('../data/test/lda-test-6-formatted.csv', 'r') as test:
  reader = csv.reader(test, delimiter = ',')
  with open('../data/out/out-fewshot-topic-test.txt', 'w') as out_topic:
    with open('../data/out/out-fewshot-gen-test.txt', 'w') as out_gen:
      i = 0
      for row in reader:
        if i > 1 and i < 8:   # Only using 6 examples; ran into API access denied issues
          prompt = row[0]
          input_artist = prompt.split(';')[0]
          input_topic = int(prompt.split(';')[1][0])
          prompts_topic = ["" for i in range(6)]  # Examples separated by topic
          prompt_gen = ""  # Example with all topics
          for id, topic in enumerate(by_topic):
            j = 0
            for artist, lyric in topic:
              prompts_topic[id] += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
              if j < num_samples:
                prompt_gen += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
                j += 1
            prompts_topic[id] += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "
          prompt_gen += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "

          # Example includes all topics
          result = openai.Completion.create(
              model="text-davinci-003",
              prompt=prompt_gen,
              max_tokens=400,
              stop="###")
          out_gen.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")

          # Examples includes requested topic
          result = openai.Completion.create(
              model="text-davinci-003",
              prompt=prompts_topic[input_topic],
              max_tokens=400,
              stop="###")
          out_topic.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")

          print("finished test " + str(i))
          
        i += 1

finished test 2
finished test 3
finished test 4
finished test 5
finished test 6
finished test 7
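
The cell above reads the topic with `int(prompt.split(';')[1][0])`, which keeps only the first character after the semicolon. That is enough for the 6-topic set, but it would misread two-digit topic IDs such as those in the 40-topic set. A sketch of a parser that handles both cases follows. It assumes prompts have the form `Artist;topic`, optionally followed by the `\n\n###\n\n` separator, as in the printed example earlier; `parse_prompt` is a hypothetical helper name.

```python
def parse_prompt(prompt: str):
    """Split 'Artist;topic' (optionally followed by the separator) into (artist, topic_id)."""
    head = prompt.split("\n", 1)[0]          # drop the trailing "\n\n###\n\n" if present
    artist, topic = head.rsplit(";", 1)      # rsplit tolerates ';' inside an artist name
    return artist.strip(), int(topic.strip())

print(parse_prompt("Keith Urban;2\n\n###\n\n"))
print(parse_prompt("Taylor Swift;37"))
```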


In [None]:
# Test using prompts from train set (seen)
raw_data = []
with open('../data/train/lda-train-6.csv', 'r') as orig_data:
  reader = csv.reader(orig_data, delimiter = ',')
  i = 0
  for row in reader:
    if i > 1 and i < 1000:
      raw_data.append(row)
    i += 1

num_samples = 1   # Number of samples for each topic
by_topic = [[] for i in range(6)]
for artist, topic_id, lyric in raw_data:
  id = int(topic_id)
  if len(by_topic[id]) < num_samples * 6:   # Capped at 6 examples per topic to stay within the prompt token limit
    by_topic[id].append((artist, lyric))

# input_artist = "Taylor Swift"
# input_topic = 0

with open('../data/train/lda-train-6-formatted.csv', 'r') as test:
  reader = csv.reader(test, delimiter = ',')
  with open('../data/out/out-fewshot-topic-val.txt', 'w') as out_topic:
    with open('../data/out/out-fewshot-gen-val.txt', 'w') as out_gen:
      i = 0
      for row in reader:
        if i > 1 and i < 8:   # Only using 6 examples; ran into API access denied issues
          prompt = row[0]
          input_artist = prompt.split(';')[0]
          input_topic = int(prompt.split(';')[1][0])
          prompts_topic = ["" for i in range(6)]  # Examples separated by topic
          prompt_gen = ""  # Example with all topics
          for id, topic in enumerate(by_topic):
            j = 0
            for artist, lyric in topic:
              prompts_topic[id] += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
              if j < num_samples:
                prompt_gen += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
                j += 1
            prompts_topic[id] += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "
          prompt_gen += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "

          # Example includes all topics
          result = openai.Completion.create(
              model="text-davinci-003",
              prompt=prompt_gen,
              max_tokens=400,
              stop="###")
          out_gen.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")

          # Examples includes requested topic
          result = openai.Completion.create(
              model="text-davinci-003",
              prompt=prompts_topic[input_topic],
              max_tokens=400,
              stop="###")
          out_topic.write(prompt + "<START>" + result.choices[0]["text"] + "<END>\n\n")

          print("finished test " + str(i))
          
        i += 1

finished test 2
finished test 3
finished test 4
finished test 5
finished test 6
finished test 7
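
The comments in these cells mention running into API access denied issues. Whether retrying helps depends on the cause (a rate limit may clear after a pause, a permissions problem will not), so the sketch below is only a generic pattern, not a confirmed fix. `with_retries` is a hypothetical helper; in real use it would be better to catch the specific error class raised by the client library rather than every `Exception`.

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**k and retry, re-raising after the last attempt."""
    for k in range(attempts):
        try:
            return fn()
        except Exception:
            if k == attempts - 1:
                raise
            sleep(base_delay * 2 ** k)

# Demo with a stand-in that fails twice before succeeding (no API call is made)
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

delays = []                               # record the waits instead of sleeping
print(with_retries(flaky, sleep=delays.append))
print(delays)
```

A request could then be wrapped as `with_retries(lambda: openai.Completion.create(...))`.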


### Using 40 topic set

In [None]:
import random

# Collect few-shot examples grouped by topic
raw_data = []
with open('../data/train/big-lda-train-40.csv', 'r') as orig_data:
  reader = csv.reader(orig_data, delimiter = ',')
  i = 0
  for row in reader:
    if i > 0 and i < 2000:
      raw_data.append(row)
    i += 1

num_samples = 1   # Number of samples for each topic
num_total = 13  # Total number of examples per prompt
by_topic = [[] for i in range(40)]
for artist, topic_id, lyric in raw_data:
  id = int(topic_id)
  if len(by_topic[id]) < num_samples * num_total:
    by_topic[id].append((artist, lyric))

input_artist = "Taylor Swift"
input_topic = 0
sample_topics = random.sample(range(40), num_total)  # topics to include in generalized examples

prompts_topic = ["" for i in range(40)]  # Examples separated by topic
prompt_gen = ""  # Example with all topics
for id, topic in enumerate(by_topic):
  for i, (artist, lyric) in enumerate(topic):
    prompts_topic[id] += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
    if id in sample_topics and i == 0:
      prompt_gen += "Artist: " + artist + "\nTopic ID: " + str(id) + "\nLyrics: " + lyric + "\n###\n"
  prompts_topic[id] += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "
prompt_gen += "Artist: " + input_artist + "\nTopic ID: " + str(input_topic) + "\nLyrics: "
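
Since `random.sample` picks a different set of topics on every run, the generalized prompt changes between runs. If reproducibility matters, one option is to draw from a seeded generator, as sketched below. `pick_topics` is a hypothetical helper and the seed value is arbitrary.

```python
import random

def pick_topics(num_topics: int, num_total: int, seed: int):
    """Deterministically choose `num_total` distinct topic IDs from range(num_topics)."""
    rng = random.Random(seed)      # local generator; leaves the global random state untouched
    return sorted(rng.sample(range(num_topics), num_total))

topics = pick_topics(40, 13, seed=0)
print(topics)
```

Recording the seed alongside the output files makes it possible to rebuild the exact prompt later.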

In [None]:
for artist, _ in by_topic[input_topic]:
  print(artist + "\n")

jessie j

grace jones

vektor

editors

looney tunes songs

george jones

superchunk

carole king

the decemberists

mercyme

lynyrd skynyrd

bright eyes

pat benatar



In [None]:
# Example includes all topics
result = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt_gen,
    max_tokens=400,
    stop="###")
print(result.choices[0]["text"])


I'd like to be everything you want, hey,
So won't you tell me everything that's on your mind?
'Cause I, I believe in second chances,
If you're ready, it could feel so right.

If I could take the trend and turn it into truth
I'm here to give you love and all I've got to give
I'm ready for whatever life has for me
And don't let go of what we had at the beginning 

I wanna be the reason you let down your guard
I want to be the one to heal your broken heart
So let me in, I promise I won't hurt you again
If you let me, I could be
The one to fight for the love we could make

I could be strong for the both of us (Oh, for the both of us)
And try not to be scared, when things don't go our way
I wanna be the one, the one that you love
The one that you care for, the one that you trust.


In [None]:
print(result)

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\nI'd like to be everything you want, hey,\nSo won't you tell me everything that's on your mind?\n'Cause I, I believe in second chances,\nIf you're ready, it could feel so right.\n\nIf I could take the trend and turn it into truth\nI'm here to give you love and all I've got to give\nI'm ready for whatever life has for me\nAnd don't let go of what we had at the beginning \n\nI wanna be the reason you let down your guard\nI want to be the one to heal your broken heart\nSo let me in, I promise I won't hurt you again\nIf you let me, I could be\nThe one to fight for the love we could make\n\nI could be strong for the both of us (Oh, for the both of us)\nAnd try not to be scared, when things don't go our way\nI wanna be the one, the one that you love\nThe one that you care for, the one that you trust."
    }
  ],
  "created": 1670372193,
  "id": "cmpl-6KcOv6Q6gLAGRKNGDL5wzDAKxaROz",

In [None]:
# Examples includes requested topic
result = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompts_topic[input_topic],
    max_tokens=400,
    stop="###")
print(result.choices[0]["text"])


So it's gonna be forever Or it's gonna go down in flames You can tell me when it's over If the high was worth the pain Got a long list of ex-lovers They'll tell you I'm insane 'Cause you know I love the players And you love the game  'Cause we're young and we're reckless We'll take this way too far And leave you breathless Or with a nasty scar Got a long list of ex-lovers They'll tell you I'm insane But I've got a blank space baby And I'll write your name  So it's gonna be forever Or it's gonna go down in flames You can tell me when it's over If the high was worth the pain 'Cause we're young and we're reckless We'll take this way too far And leave you breathless Or with a nasty scar Got a long list of ex-lovers They'll tell you I'm insane But I've got a blank space baby And I'll write your name


In [None]:
print(result)

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\nSo it's gonna be forever Or it's gonna go down in flames You can tell me when it's over If the high was worth the pain Got a long list of ex-lovers They'll tell you I'm insane 'Cause you know I love the players And you love the game  'Cause we're young and we're reckless We'll take this way too far And leave you breathless Or with a nasty scar Got a long list of ex-lovers They'll tell you I'm insane But I've got a blank space baby And I'll write your name  So it's gonna be forever Or it's gonna go down in flames You can tell me when it's over If the high was worth the pain 'Cause we're young and we're reckless We'll take this way too far And leave you breathless Or with a nasty scar Got a long list of ex-lovers They'll tell you I'm insane But I've got a blank space baby And I'll write your name"
    }
  ],
  "created": 1670372609,
  "id": "cmpl-6KcVdiWGyyje1m55FH4nUSHTWMYQe"