# Fine-Tuning GPT-3 Curie with r/WritingPrompts

This is an open-ended project to explore the fine-tuning process of GPT-3 and the panic around AI replacing writing jobs. I would like to ascertain how valuable fine-tuning is for creative tasks like short stories; for example, how does a fine-tuned curie model, with far fewer parameters and a lower cost of generation, compare to the out-of-the-box davinci model? Additionally, it will be interesting to see whether the quality of stories from any of the models comes close to human quality at all.

In this project, I scrape top posts and their top comments from the r/WritingPrompts subreddit and use them as training data for OpenAI's GPT-3 curie model. I then compare three models (fine-tuned curie, default curie, and default davinci) on one prompt from the training set and one new prompt, and subjectively evaluate the quality of their stories.

In [5]:
import praw
import pandas as pd
import openai
import os

We use PRAW (Python Reddit API Wrapper) to easily scrape the top 1000 posts from the WritingPrompts subreddit. Since we will not be posting, read-only access is sufficient.

In [31]:
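# Credentials are read from environment variables (placeholder names) rather than hard-coded.
reddit_read_only = praw.Reddit(client_id=os.environ["REDDIT_CLIENT_ID"],
                               client_secret=os.environ["REDDIT_CLIENT_SECRET"],
                               user_agent="Writingprompts Scraping")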
subreddit = reddit_read_only.subreddit("WritingPrompts")

In [63]:
posts = subreddit.top(time_filter = 'all', limit = 1000)

posts_dict = {"Prompt": [], "Completion": []}

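for post in posts:
    title = post.title
    url = post.url

    # Skip ads: entries whose URL does not point to reddit.
    if "reddit" not in url:
        continue

    submission = reddit_read_only.submission(url=url)

    # Reset for each post so a previous post's comment is never reused.
    top_comment = None
    # Index 0 is skipped, as in the original scrape (presumably the pinned moderator comment).
    for comment in submission.comments[1:]:
        # MoreComments placeholders have no body; deleted comments are skipped,
        # so the next available comment is used instead.
        body = getattr(comment, "body", None)
        if body is not None and "[deleted]" not in body:
            top_comment = comment
            break

    # Skip posts with no usable comment so prompts and completions stay aligned.
    if top_comment is None:
        continue

    # title[5:] drops the leading "[WP] " tag from the post title.
    posts_dict["Prompt"].append("Write a short story that builds on this prompt: " + title[5:])
    posts_dict["Completion"].append(top_comment.body)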
    
    
top_posts = pd.DataFrame(posts_dict)
top_posts
    

| | Prompt | Completion |
|---|---|---|
| 0 | Write a short story that builds on this prompt... | There he sat, twirling his personal, stylized ... |
| 1 | Write a short story that builds on this prompt... | Just as Donald was about to press the button t... |
| 2 | Write a short story that builds on this prompt... | Edited July 28, 2020\n\nDear YouTube: StoryShu... |
| 3 | Write a short story that builds on this prompt... | We called it humanity's worst disaster. Histor... |
| 4 | Write a short story that builds on this prompt... | It had already been explained to me by the clo... |
| ... | ... | ... |
| 993 | Write a short story that builds on this prompt... | Black clad and armed, the team of men that had... |
| 994 | Write a short story that builds on this prompt... | "Have you ever considered, you know, doing som... |
| 995 | Write a short story that builds on this prompt... | I'm the grave-digger for secrets. I bore pits ... |
| 996 | Write a short story that builds on this prompt... | Something wasn't right with Jeremy.\n\nHis cow... |


We iterate through the top 1000 posts. Two of the returned entries were ads misclassified as posts, so we skip them. I initially prepended "Write a short story that builds on this prompt:" to each prompt, assuming this was the clearest way to ask GPT for a story, but OpenAI's data preparation tool deemed the prefix unnecessary and removed it. I took only the top comment for each post (skipping deleted comments), since models are generally trained on one output per input, but it would be interesting to see whether the model could handle training with multiple outputs for the same input.
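The multiple-outputs idea mentioned above was not tried in this project, but it could be sketched as follows. The helper `build_pairs` and its `max_completions` parameter are hypothetical names, and filtering `[removed]` in addition to `[deleted]` is an assumption that goes beyond the scraping cell.

```python
def build_pairs(prompt, comment_bodies, max_completions=3):
    """Return up to max_completions prompt/completion rows for one post.

    Hypothetical helper: each usable comment becomes its own training row,
    so the same prompt appears several times with different completions.
    """
    pairs = []
    for body in comment_bodies:
        # Skip comments whose text is no longer available (assumed markers).
        if "[deleted]" in body or "[removed]" in body:
            continue
        pairs.append({"Prompt": prompt, "Completion": body})
        if len(pairs) == max_completions:
            break
    return pairs


# Illustrative data only; not taken from the scraped posts.
rows = build_pairs(
    "A dragon applies for a mortgage.",
    ["[deleted]", "Story one.", "Story two.", "Story three."],
    max_completions=2,
)
```

Rows built this way could be appended to `posts_dict` in place of the single top comment, at the cost of a proportionally larger (and more expensive) training file.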

In [64]:
top_posts.to_csv("top_posts.csv", index=True)

In [None]:
!pip install --upgrade openai

We let the OpenAI data preparation tool take all necessary and recommended actions so that training goes as smoothly as possible. 11 examples were removed for being too long, leaving us with 987 examples.

In [65]:
!openai tools fine_tunes.prepare_data -f top_posts.csv -q

Analyzing...

- Based on your file extension, your file is formatted as a CSV file
- Your file contains 998 prompt-completion pairs
- The `prompt` column/key should be lowercase
- The `completion` column/key should be lowercase
- The input file should contain exactly two columns/keys per row. Additional columns/keys present are: ['Unnamed: 0']
- There are 11 examples that are very long. These are rows: [36, 93, 108, 199, 247, 303, 362, 589, 650, 807, 962]
For conditional generation, and for classification the examples shouldn't be longer than 2048 tokens.
- Your data does not contain a common separator at the end of your prompts. Having a separator string appended to the end of the prompt makes it clearer to the fine-tuned model where the completion should begin. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples. If you intend to do open-ended generation, then you should leave the prompts empty
- All prompts start with prefix `W
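The messages above describe the target format: lowercase `prompt` and `completion` keys, exactly two fields per row, and a separator at the end of each prompt. A minimal sketch of that kind of conversion is shown here; it is not the tool's actual implementation. The separator `" ->"` matches the suffix on the prompts used later in this notebook, while the leading space on completions and the `END` marker are assumptions, and `to_jsonl` is a hypothetical helper name.

```python
import json


def to_jsonl(rows, separator=" ->", end_marker=" END"):
    """Serialize prompt/completion rows as JSON Lines text.

    separator: appended to each prompt so the model can tell where the
               completion should begin.
    end_marker: hypothetical stop sequence appended to each completion.
    """
    lines = []
    for row in rows:
        record = {
            "prompt": row["Prompt"].strip() + separator,
            "completion": " " + row["Completion"].strip() + end_marker,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    # One JSON object per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


# Illustrative row only; not taken from the scraped posts.
sample = to_jsonl([{"Prompt": "A prompt", "Completion": "A story."}])
```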

In [7]:
os.environ["OPENAI_API_KEY"] = "API KEY HERE"

We use the default base model, curie. According to OpenAI's documentation, curie is comparable in quality to davinci but takes significantly less time to train. The task is too complex for a smaller model such as ada or babbage.

In [None]:
!openai tools fine_tunes.prepare_data -f top_posts_short.csv -q

In [73]:
!openai api fine_tunes.create -t "top_posts_short_prepared.jsonl"

Found potentially duplicated files with name 'top_posts_short_prepared.jsonl', purpose 'fine-tune' and size 2118656 bytes
file-SNFn6sjPf1WCvabKEkLZZ3we
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: ^C



Initially, I wanted to train on the full set of examples, but its size caused several issues, so I prepared a shortened version of the file (500 examples). I was unable to respond to the duplicate-file prompt from within the notebook, so I had to run this command in my local terminal instead.
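The notebook does not show how `top_posts_short.csv` was produced. One possible way is sketched below, assuming the first 500 rows were kept (the original selection may have differed). The helper `shorten_csv` is hypothetical; it also drops the unnamed index column that the preparation tool flagged.

```python
import csv


def shorten_csv(src_path, dst_path, limit=500, columns=("Prompt", "Completion")):
    """Copy the first `limit` rows of a CSV, keeping only `columns`.

    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(columns))
        writer.writeheader()
        kept = 0
        for row in reader:
            if kept >= limit:
                break
            # Any extra column (such as the saved index) is left out.
            writer.writerow({name: row[name] for name in columns})
            kept += 1
    return kept
```

Calling `shorten_csv("top_posts.csv", "top_posts_short.csv")` would then produce the file passed to the preparation tool; with pandas, `top_posts.head(500).to_csv("top_posts_short.csv", index=False)` should be equivalent.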

In [75]:
!openai api fine_tunes.follow -i "ft-u3jjIvHDKkK7ZziZdGqZnmh8"

[2023-05-31 14:00:17] Created fine-tune: ft-u3jjIvHDKkK7ZziZdGqZnmh8
[2023-05-31 14:01:32] Fine-tune costs $6.21
[2023-05-31 14:01:33] Fine-tune enqueued. Queue number: 0
[2023-05-31 14:01:34] Fine-tune started
[2023-05-31 14:04:09] Completed epoch 1/4
[2023-05-31 14:15:28] Fine-tune started
[2023-05-31 14:17:57] Completed epoch 1/4
[2023-05-31 14:19:23] Completed epoch 2/4
[2023-05-31 14:20:50] Completed epoch 3/4
[2023-05-31 14:22:17] Completed epoch 4/4
[2023-05-31 14:22:34] Uploaded model: curie:ft-personal-2023-05-31-21-22-34
[2023-05-31 14:22:35] Uploaded result file: file-inkHEF6j3Mh9BW1mmHhrCTcb
[2023-05-31 14:22:35] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m curie:ft-personal-2023-05-31-21-22-34 -p <YOUR_PROMPT>


Our fine-tuned model is ready! Let's first try giving it one of the prompts it was trained on. We use the same prompt at different temperatures with the fine-tuned curie model, the default curie model, and the default davinci model.
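The cells that follow run each model and temperature by hand. The same comparison could be organized as a grid, sketched here under the assumption that `complete` is a small wrapper around the `openai.Completion.create(...)['choices'][0]['text']` call used in this notebook. `run_grid` and `fake_complete` are hypothetical names; the stand-in keeps the sketch runnable offline and makes no API call.

```python
from itertools import product


def run_grid(complete, models, temperatures, prompt, max_tokens=1500):
    """Call `complete` once per (model, temperature) pair and collect the texts."""
    results = {}
    for model, temperature in product(models, temperatures):
        results[(model, temperature)] = complete(
            model=model,
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
        )
    return results


def fake_complete(model, prompt, max_tokens, temperature):
    # Stand-in for the real completion call; returns a label, not a story.
    return f"<{model} @ {temperature}>"


grid = run_grid(fake_complete, ["curie", "davinci"], [0, 0.5, 0.9], "Some prompt ->")
```

Keying the results by `(model, temperature)` makes it easy to read outputs side by side, though every cell of the grid is a paid API call when the real wrapper is used.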

In [14]:
openai.api_key = "API KEY HERE"

In [17]:
ft_model = 'curie:ft-personal-2023-05-31-21-22-34'
prompt = "When you die, you appear in a cinema with a number of other people who look like you. You find out that they are your previous reincarnations, and soon you all begin watching your next life on the big screen. ->"
result = openai.Completion.create(model=ft_model, prompt=prompt, max_tokens=1500, temperature=0)
result['choices'][0]['text']

' I was sitting in the back row of the theater, my legs tucked under me. I was wearing a black hoodie and black jeans. My hair was a little longer than it had been in my previous life. I was watching the movie, but I wasn\'t really paying attention. I was thinking about how I was going to get home. I didn\'t have a car, and I didn\'t have enough money for a taxi.\n\nI heard a couple people talking in the front row. One of them was a woman with brown hair and brown eyes. She was wearing a red dress. The other person was a man with black hair and black eyes. He was wearing a black suit.\n\n"I don\'t know why he has to be so mean to her," the woman said. "He doesn\'t even like her."\n\n"He\'s just trying to scare her," the man replied. "He\'s a bully."\n\n"I know, but he\'s so mean. I don\'t understand why he does that."\n\n"He\'s just trying to prove how tough he is," the man said. "He\'s a bully."\n\nI looked at the two of them. They looked like they were in their twenties. The woman ha

In [19]:
result = openai.Completion.create(model=ft_model, prompt=prompt, max_tokens=1500, temperature=.5)
result['choices'][0]['text']

" The first person to die was a woman named Frances. She was found dead in her bed, a bottle of sleeping pills beside her.\n\nThe next person to die was a man named Paul. He was found hanging from a tree, his neck snapped.\n\nThe next was a woman named Mary. She was found floating in a pond, an empty bottle of liquor beside her.\n\nThe next was a man named James. He was found lying on the tracks, his head crushed by a train.\n\nAnd so it went. One by one, they died, and they went to the afterlife. But there was something strange about the afterlife. It was all a bit... boring. There was no music, no laughter, no color. Just them. All of them, watching their next life unfold on a screen.\n\nThey didn't know it yet, but they were all going to die again.\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;\n\n&nbsp

In [21]:
result = openai.Completion.create(model='curie', prompt=prompt, max_tokens=1500, temperature=0)
result['choices'][0]['text']

' You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are watching your next life on the big screen. -> You are a reincarnation of a girl who died in a car accident. You are 

In [22]:
result = openai.Completion.create(model='curie', prompt=prompt, max_tokens=1500, temperature=.5)
result['choices'][0]['text']

'\n\nThe film is based on the book by the same name by British author Douglas Adams. The book was published in 1982 and has been translated into more than 30 languages. It has sold more than 10 million copies worldwide.\n\nThe film is based on the book by the same name by British author Douglas Adams. The book was published in 1982 and has been translated into more than 30 languages. It has sold more than 10 million copies worldwide.\n\nThe film is based on the book by the same name by British author Douglas Adams. The book was published in 1982 and has been translated into more than 30 languages. It has sold more than 10 million copies worldwide.\n\nThe film is based on the book by the same name by British author Douglas Adams. The book was published in 1982 and has been translated into more than 30 languages. It has sold more than 10 million copies worldwide.'

In [23]:
result = openai.Completion.create(model='curie', prompt=prompt, max_tokens=1500, temperature=.9)
result['choices'][0]['text']

"\n\nExits:\n\n[west]\n\n[north]\n\n[south]\n\n[west]\n\n[s]\n\n[o]\n\n[cast npc 'Widgets']\n\nYou go to another\n\ndungeon level.\n\nAren't you gonna play?\n\n[DBSL] The Wiz\n\n*** Spoiler - click to reveal *** You are the Wiz, one of many wizards who dwell in the Pits of Slime. As the Wiz, dress up in this wizard robe and wield this\n\nwand in your left hand. Press ? for a list of\n\npowers that the Wiz can use.\n\n[DBSL] So Good\n\n*** Spoiler - click to reveal *** It is good, yet not good. The conclusion that\n\n\n\nhangs, causing its own undoing,\n\nshould provoke revulsion and rejection.\n\nThe good is fluid and expiring,\n\nand its end will slip only away\n\nwithout any evidence of its prior\n\nexistence. It will, moreover,\n\nbe great and good.\n\n[DBCL] Crystal\n\nIt is cool, but neither cool nor not cool. The\n\nconclusion was pushed away without\n\nany evidence of its prior existence.\n\nIt will, moreover,\n\nbe cool and not cool.\n\nIt is really good.\n\n[DBCL] Great Good\n

In [24]:
result = openai.Completion.create(model=ft_model, prompt=prompt, max_tokens=1500, temperature=.9)
result['choices'][0]['text']

' [removed] END OF TITLE   \n\nI really enjoyed the whole story, but I had a weird issue that didn\'t really make sense to me. I\'m not even sure it\'s worth mentioning because it might be me just being stupid. \n\nWhen part 2 came up, the first person I saw was a little awkward looking guy sitting in the middle of what looked like a student filled auditorium with rows of chairs. I didn\'t think much of it because there were obviously a few of us in the bunch that died. The film started up and the little guy was trying to stop another student from killing him with a hatchet. After a few seconds, the hatchet went flying and the little guys head was split right down the middle. It was like I heard the noise of a hatchet flying across the room, but unable to look away to check out the source of the noise, I froze in my seat. \n\nAfter a few moments, the little guys head rolled off of the chopping block and he had a massive pool of blood growing underneath him. I freaked out and got really

In [25]:
result = openai.Completion.create(model='davinci', prompt=prompt, max_tokens=1500, temperature=.9)
result['choices'][0]['text']

' alternate dimension\'s same time and space.\n\nLonger explanation, with a spoiler:\n\nSpoiler:\n\nWe are actually the people in the alternate dimension, but we don\'t know that -- we only know that the cinema exists and it looks like we\'re attending the movie, at least until a specific point in the movie where we suddenly realize that we\'re the main character.\n\n\n\nThe audience in the alternate dimension knows that the cinema exists, but doesn\'t necessarily know what the movie is (it\'s more like they\'re just aware that there\'s a movie going on that they can\'t necessarily watch), while the main character -- let\'s call him Simon -- is the only one with enough information to figure out that he can change the movie simply by changing what he does at critical moments, e.g. when he first meets his girlfriend.\n\n\n\nAt the beginning of the movie, before he meets the girl, he sees his life as a static "fixed point in time" (that he did nothing to create or influence) that only giv

In [26]:
result = openai.Completion.create(model=ft_model, prompt=prompt, max_tokens=1500, temperature=.6)
result['choices'][0]['text']

' I thought it was a dream. I\'d died, my body had shut down, my heart had stopped. I didn\'t know what to think. I was lying in a hospital bed, my head covered in a thick bandage. I looked around. Where was I? The last thing I could remember was going to a movie with my girlfriend. After that I guess I died. I looked at the other people in the room. They were all dead too. A woman was lying in the bed next to me. She looked familiar but I couldn\'t place where I knew her from.\n\n"Hello?" I said. "Am I dead?"\n\nThe woman turned her head to look at me. She had a wide smile plastered on her face.\n\n"Yes you are." She said.\n\n"Who are you?" I asked.\n\n"I am you." She said.\n\n"I don\'t understand." I said.\n\n"We all died together. We\'re all watching our next lives unfold." She said.\n\n"How many of you are there? And what movie am I watching?" I asked.\n\n"You are watching the next episode of the life of John Smith." She said.\n\n"What? No. I mean how many of you are there? I don\'

In [27]:
result = openai.Completion.create(model='davinci', prompt=prompt, max_tokens=1500, temperature=.6)
result['choices'][0]['text']

' You are reborn, and you remember your previous life.\n\nThe End.\n\n\n\nAchievements:\n\nThe Perfect Ending - Get a perfect ending.\n\nThe Perfect Ending - Get a perfect ending. The Perfect Beginning - Get a perfect ending on your first play-through.\n\nThe Perfect Beginning - Get a perfect ending on your first play-through. The Perfect Life - Get a perfect ending on your first play-through and get a perfect beginning.\n\nThe Perfect Life - Get a perfect ending on your first play-through and get a perfect beginning. The Perfect Reincarnation - Get a perfect ending on your first play-through and get a perfect beginning, then get a perfect ending again.\n\nThe Perfect Reincarnation - Get a perfect ending on your first play-through and get a perfect beginning, then get a perfect ending again. The Secret Ending - Get a perfect ending on your first play-through and get a perfect beginning, then get a perfect ending again and choose the option that says "I\'m not sure what happens next."\n

All models perform poorly and are prone to repetition at low temperatures. The only model that produced a somewhat coherent story was our fine-tuned curie model. The davinci model at a temperature of 0.6 had some interesting ideas, but it did not follow a story structure. The default curie model performs very poorly. A temperature in the range 0.6 to 0.9 seems to work well. Let's try comparing our fine-tuned curie model with the default curie and davinci models using a prompt that none of them has seen in training.
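The evaluation in this notebook is subjective, but the repetition noted above could be given a rough number. The sketch below computes the share of distinct word n-grams in a text; this metric and the name `distinct_ratio` are assumptions introduced for illustration and were not used to reach the conclusions here.

```python
def distinct_ratio(text, n=3):
    """Return unique n-grams divided by total n-grams (1.0 means no repetition)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        # Texts shorter than n words have nothing to repeat.
        return 1.0
    return len(set(ngrams)) / len(ngrams)


# A loop like the default curie output versus a non-repeating sentence.
looping = "You are not ready. -> " * 20
varied = "I thought it was a dream and then I woke up somewhere else entirely"

looping_score = distinct_ratio(looping)
varied_score = distinct_ratio(varied)
```

A low score flags looping output, but a high score says nothing about whether the text is a coherent story, so it can only complement reading the outputs.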

In [30]:
prompt2 = "here is a legend of demons that cannot be bound or banished, that laugh at salt and holy water, for these demons were not born of hell. The were actually born of mother Gaia, yet choose destruction of their own free will. Such horror may be hard to fathom, but I tell you, humans are real. ->"
result = openai.Completion.create(model=ft_model, prompt=prompt2, max_tokens=1500, temperature=.8)
result['choices'][0]['text']

' “Mama, what are those long things flying around in the sky?”\n\n“Those are the gods, sweetie. They have the biggest wings, and they fly around, looking for things to watch,” she said.\n\nI looked up and said, “Oh, I like how they flap their wings. It looks like they’re having fun.”\n\n“They must be doing something fun, they don’t flap their wings on each and every single flight. They know how to be graceful when they want to.”\n\n“But, why can’t we be like those? Why do we need the big wings?”\n\n“Because, my sweet little darling, they are not like us. They are more like us in a way, but they are born of the sky, not the earth.”\n\n“Oh, so they are not from Gaia?”\n\n“No, they are from Gaia, but they chose to be different. They chose to be a different kind of creature.”\n\n“What kind are they?”\n\n“They are like us, but not like us. They have a higher form of consciousness than us. They are born of mother Gaia, yet they have a kind of free will that we do not. They are not bound to h

In [32]:
result = openai.Completion.create(model='curie', prompt=prompt2, max_tokens=1500, temperature=.8)
result['choices'][0]['text']

'Gaia is not the sole creator. However, she is the source from which all life in the elemental realms draws its power. ->I, myself, am an eternal creature of the elemental realm. I am a creature unto myself, and in the name of balance, I am here. Now, tell me, are you ready to die? ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->You are not ready. ->Y

In [31]:
result = openai.Completion.create(model='davinci', prompt=prompt2, max_tokens=1500, temperature=.8)
result['choices'][0]['text']

' A human is a creature that is born through the power of Gaia and has the potential to become either a Great One or a Beast. Humans are a fantastic and terrible thing. Their ability to reject cosmic truths is not insignificant. This is the same as the ability to deny the gods. The results of such blasphemy are clear for all to see. -> The church claims that the cosmos, and all living things, are the creations of the gods, and that humanity is the work of god. Likewise, the church teaches that humanity is the pinnacle of creation. This is all half-truth. The church itself was born from a half-truth, the lie spoken by an ancient, powerful being. -> The cosmic truths are governed by one principle above all else: all life is bound by fate, and the cosmic truths are the mechanism of that fate. -> There was a time when we worshipped the gods, in the heavens and on earth, but as time passed, we learned of other beings. These beings far surpassed our own gods, in both strength and wisdom. The

The fine-tuned curie model did produce a short story, though there is an issue with the Reddit-specific comments at the end being not only replicated but repeated multiple times. Additionally, it didn't understand that the prompt was referring to humans, and wrote a story about winged gods. The default curie model wrote a few interesting lines but was unable to continue. The davinci model came up with something really interesting: it seems to have written several excerpts related to the prompt that build upon each other. Let's try the davinci model again, but with an added instruction at the beginning and without the separator token at the end of the prompt:

In [36]:
prompt3 = "Write a short story that builds on this prompt: " + prompt2[:-2]
result = openai.Completion.create(model='davinci', prompt=prompt3, max_tokens=1500, temperature=.8)
result['choices'][0]['text']

'\xa0These people are in a constant violent struggle with their demons and in their constant struggle, choose to destroy themselves. They do not wish to be healed, and you cannot save them, for no one in existence can save them. They choose only their own destruction.'

For some reason, davinci produced a quote instead of a short story. Though powerful, the davinci model is unreliable at writing stories without fine-tuning. However, it does seem to "understand" the prompt better than the fine-tuned curie model.
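The Reddit-specific residue seen in the fine-tuned outputs (runs of `&nbsp;`, `[removed]`) suggests that comment bodies could be cleaned before training. The sketch below shows one way to do that; it was not part of this project's pipeline, and `clean_body` and `is_usable` are hypothetical helpers.

```python
import html
import re


def clean_body(body):
    """Normalize a Reddit comment body before using it as a completion."""
    text = html.unescape(body)        # "&nbsp;" becomes "\xa0", "&amp;" becomes "&"
    text = text.replace("\xa0", " ")  # non-breaking spaces become plain spaces
    # Strip trailing whitespace so spacer-only lines become empty lines.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of blank lines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def is_usable(body):
    """Reject bodies that are empty or placeholders after cleaning."""
    return clean_body(body) not in ("", "[deleted]", "[removed]")


cleaned = clean_body("The end.\n\n&nbsp;\n\n&nbsp;\n\n&nbsp;")
```

Filtering with `is_usable` before appending to `posts_dict` would keep placeholder bodies out of the training file, though it would not remove edit notes or author sign-offs inside otherwise valid stories.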

# Conclusions

Fine-tuning and Cost Saving:

Fine-tuning is quite valuable when a model is needed for a specific task. Davinci is estimated to have several times as many parameters as curie, but the fine-tuned curie model performed much better at the task at hand. This is an important insight for saving costs in the long run, as the added cost of using a larger model to generate future completions adds up quickly.

Replacing human writers?: 

We are still a ways off from being able to generate coherent stories of considerable length with AI, but a fine-tuned model can create a decent short story that a writer can build upon. AI writing should be considered a tool that human writers can use to find inspiration and to edit their stories.


Continuation of this project?:

I would have liked to continue this project by fine-tuning a davinci model as well, as I believe the quality would be much greater than that of the fine-tuned curie. It would be really interesting to do a comparison with more test examples. I also would have liked to experiment with the models' hyperparameters, such as training epochs, learning rate, and batch size, and potentially to try training with multiple outputs for one input. However, I feel that I accomplished the main goals I set for this project, and I can't justify the added cost of further experimentation since the training data contains so many tokens.