# GPT3.5 

## Intro
In this notebook, I generate essays with GPT 3.5 (the model behind ChatGPT). GPT 3.5 will not be the only source of AI-generated essays, however: for each regeneration engine, I will also generate essays with that engine for testing.
Fun fact: this is the first batch of AI-generated essays!

## Imports and Helper functions

AI model imports

In [None]:
import openai

General helper library imports

In [None]:
import json
from collections import defaultdict
import os
import random

Helper functions

In [None]:
openai.api_key_path = os.getcwd() + 'key path here'

system_prompt = "You are an exemplary Singapore Junior College student that writes essays. When given a prompt, you will write only an essay. You will write as many words as you can. You will not write headings for the essay. "
# neo wee zen's very own essay generation function
def get_chatgpt_essay_response(prompt_text, max_tokens=3500):
    """Ask gpt-3.5-turbo for an essay on prompt_text and return the raw API response."""
    messages = [
        {"role":"system", "content": system_prompt},
        {"role":"user", "content": prompt_text + " Your essay should not be less than 1000 words."},
    ]
    response = openai.ChatCompletion.create(
                model = "gpt-3.5-turbo",
                messages = messages,
                temperature = 1,
                max_tokens = max_tokens
    )
    return response  # the essay text is at response['choices'][0]['message']['content']
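
The function returns the whole response rather than just the essay text. The following is a minimal sketch of how a caller might unpack it, assuming the response is dictionary-like with the `choices[0]['message']['content']` shape used elsewhere in this notebook, plus a `finish_reason` field that the chat completions API sets to `"length"` when generation stops at `max_tokens`. The helper name `extract_essay` and the sample response are hypothetical and only serve as an illustration.

```python
def extract_essay(response):
    """Return (essay_text, was_truncated) from a chat completion response.

    Assumes a dict-like response; extract_essay is an illustrative helper,
    not part of the openai library.
    """
    choice = response['choices'][0]
    essay_text = choice['message']['content']
    # finish_reason == 'length' means the model stopped at the max_tokens limit
    was_truncated = choice.get('finish_reason') == 'length'
    return essay_text, was_truncated


# Hand-written stand-in for a real API response (no network call is made)
sample_response = {
    'choices': [
        {'message': {'role': 'assistant', 'content': 'An essay...'},
         'finish_reason': 'length'}
    ]
}
essay_text, was_truncated = extract_essay(sample_response)
print(was_truncated)
```

Checking for truncation matters here because the prompt asks for at least 1000 words while `max_tokens` caps the length of the reply.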

## Set-up

In this section, I set up the essay dictionaries. If the AI-generated essay file does not exist yet, it will be created when the essays are written out at the end of the notebook.

In [None]:
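try:
    with open(os.getcwd() + "/../../essay_gpt35_2.json", 'r') as gpt35_essay_file:
        gpt35_dict = json.load(gpt35_essay_file)
except (FileNotFoundError, json.JSONDecodeError):
    # No essays generated yet (missing or empty file): start from an empty dict
    gpt35_dict = {}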
    
with open(os.getcwd() + "/../../essay_human_original.json", 'r') as human_essay_file:
    human_json = json.load(human_essay_file)

Debug: clears the GPT 3.5 dict

In [None]:
# gpt35_dict = {}

If the GPT 3.5 dictionary is empty (i.e. the JSON file contains no essays), it is initialised as a defaultdict whose default value is another defaultdict, whose default value in turn is a list. This lets me index by essay type and then by website without creating the intermediate dictionaries first, and append essay dictionaries directly to each website's list.

In [None]:
def default_list_dict():
    return defaultdict(list)

print(gpt35_dict)
if not gpt35_dict:
    gpt35_dict = defaultdict(default_list_dict)

# gpt35_dict['urgh']['testing'] = 'thing'
# print(gpt35_dict)
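
To make the nested defaultdict behaviour concrete, here is a small self-contained sketch. It uses a separate `demo_dict` and made-up keys (`'argumentative'`, `'narrative'`, `'example-site'`), which are assumptions for illustration and not values taken from the essay files.

```python
from collections import defaultdict


def default_list_dict():
    return defaultdict(list)


demo_dict = defaultdict(default_list_dict)

# Neither key exists yet, but the append works: both levels are created on access
demo_dict['argumentative']['example-site'].append({'prompt': 'p1', 'response': 'r1'})
demo_dict['argumentative']['example-site'].append({'prompt': 'p2', 'response': 'r2'})
print(len(demo_dict['argumentative']['example-site']))

# Caveat: merely reading a missing key also creates it
_ = demo_dict['narrative']
print(sorted(demo_dict.keys()))
```

The caveat in the last step is worth keeping in mind: a typo in a key silently adds an empty entry instead of raising a `KeyError`.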

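The next two cells collect every prompt from the human essays and, as a test, mirror the structure of the human essay dictionary in the GPT 3.5 dictionary with placeholder responses (the test itself is commented out).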

In [None]:
# gets all prompts in list
prompt_list = []

for type_of_essay in human_json:
    for website in human_json[type_of_essay]:
        for essay_dict in human_json[type_of_essay][website]:
            prompt_list.append(essay_dict['prompt'])

In [None]:
# for type_of_essay in human_json:
#     for website in human_json[type_of_essay]:
#         for essay_dict in human_json[type_of_essay][website]:
#             gpt35_dict[type_of_essay][website].append({
#                 'website': essay_dict['website'],
#                 'prompt': essay_dict['prompt'],
#                 'response': 'this is chatgpt speaking'
#             })

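If gpt35_dict is a defaultdict, it can be converted back to a plain dict so that it prints as ordinary nested dictionaries rather than with the defaultdict wrappers. The conversion is currently commented out.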

In [None]:
# gpt35_dict = dict(gpt35_dict)

# for type_of_essay in gpt35_dict:
#     gpt35_dict[type_of_essay] = dict(gpt35_dict[type_of_essay])

# # print(gpt35_dict)
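
The commented-out cell converts exactly two levels by hand. A recursive variant is sketched below as one possible alternative; `to_plain_dict`, `demo_dict` and the keys in it are hypothetical names used only for this example.

```python
from collections import defaultdict


def to_plain_dict(nested):
    """Recursively convert defaultdicts (and dicts) into plain dicts."""
    if isinstance(nested, dict):
        return {key: to_plain_dict(value) for key, value in nested.items()}
    # Lists of essay dictionaries and other values are returned unchanged
    return nested


demo_dict = defaultdict(lambda: defaultdict(list))
demo_dict['argumentative']['example-site'].append({'prompt': 'p1'})

plain = to_plain_dict(demo_dict)
print(plain)
```

Note that `json.dumps` and `json.dump` already accept defaultdicts, since they are dict subclasses, so the conversion only affects how `print` displays the data.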

## Essay generation

The following code is a test generation of a single essay by GPT 3.5.

In [None]:
random_prompt = random.choice(prompt_list)
print(random_prompt)


test_gpt35_essay_response = get_chatgpt_essay_response(random_prompt)
print(test_gpt35_essay_response['choices'][0]['message']['content'])

In [None]:
print(test_gpt35_essay_response)

The following code generates an essay for every prompt and saves it into the GPT 3.5 dictionary. The first 8 prompts are skipped (`count <= 8`).

In [None]:
count = 1
for type_of_essay in human_json:
    for website in human_json[type_of_essay]:
        for essay_dict in human_json[type_of_essay][website]:
            if count <= 8: 
                count += 1
                continue
            print("generating", count, "out of", len(prompt_list))
            print(essay_dict['prompt'])
            count += 1
            gpt35_response_dict = get_chatgpt_essay_response(prompt_text=essay_dict['prompt'])
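            # setdefault also works when gpt35_dict was loaded from JSON as a plain dict
            gpt35_dict.setdefault(type_of_essay, {}).setdefault(website, []).append({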
                'website': essay_dict['website'],
                'prompt': essay_dict['prompt'],
                'response': gpt35_response_dict['choices'][0]['message']['content']
            })
            print("generated!\n")
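
The loop above skips a hard-coded number of prompts. As a sketch of an alternative, the helper below works out which prompts still lack a generated essay by comparing the two dictionaries. It assumes that a prompt is unique within one website; `pending_essays` and the demo data are hypothetical and stand in for `human_json` and `gpt35_dict`.

```python
def pending_essays(human_essays, generated_essays):
    """Yield (type_of_essay, website, essay_dict) for prompts with no generated essay yet."""
    for type_of_essay, websites in human_essays.items():
        for website, essay_dicts in websites.items():
            # .get avoids creating keys, even if generated_essays is a defaultdict
            already_done = {
                generated['prompt']
                for generated in generated_essays.get(type_of_essay, {}).get(website, [])
            }
            for essay_dict in essay_dicts:
                if essay_dict['prompt'] not in already_done:
                    yield type_of_essay, website, essay_dict


# Tiny hand-made stand-ins for human_json and gpt35_dict
demo_human = {'argumentative': {'example-site': [
    {'website': 'example-site', 'prompt': 'Prompt A'},
    {'website': 'example-site', 'prompt': 'Prompt B'},
]}}
demo_generated = {'argumentative': {'example-site': [
    {'website': 'example-site', 'prompt': 'Prompt A', 'response': '...'},
]}}

todo = [essay['prompt'] for _, _, essay in pending_essays(demo_human, demo_generated)]
print(todo)
```

With a helper like this, an interrupted run could be resumed without editing the skip count by hand.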
            


In [None]:
print(json.dumps(gpt35_dict, indent=4))

## Analysing general trends in AI-generated essays

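Here I collect the word count of every AI-generated essay and sort the counts, so that the lowest, highest, 25th percentile and 75th percentile word counts can be read off the sorted list.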

In [None]:
count = 0
ai_essay_word_count_list = []

for type_of_essay in gpt35_dict:
    for website in gpt35_dict[type_of_essay]:
        for essay_dict in gpt35_dict[type_of_essay][website]:
            count += 1
            word_count = len(essay_dict['response'].split())
            ai_essay_word_count_list.append(word_count)
            
ai_essay_word_count_list.sort()
print(ai_essay_word_count_list)
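
The cell above only prints the sorted list. A minimal sketch of computing the four statistics directly is given below, using `statistics.quantiles` from the standard library (Python 3.8+). The helper name `word_count_summary` and the demo word counts are made up for illustration, and the choice of the `'inclusive'` method is an assumption rather than something the notebook specifies.

```python
import statistics


def word_count_summary(word_counts):
    """Return the lowest, 25th percentile, 75th percentile and highest word count."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, _median, q3 = statistics.quantiles(word_counts, n=4, method='inclusive')
    return {
        'lowest': min(word_counts),
        'p25': q1,
        'p75': q3,
        'highest': max(word_counts),
    }


# Made-up word counts standing in for ai_essay_word_count_list
demo_word_counts = [700, 800, 900, 1000, 1100]
summary = word_count_summary(demo_word_counts)
print(summary)
```

In the notebook, the same call would be `word_count_summary(ai_essay_word_count_list)`; the list does not need to be sorted first, but it should contain at least two word counts.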


## Writing to JSON file

Finally, I write the essay dictionary back to the GPT 3.5 essay JSON file.

In [None]:
with open(os.getcwd() + "/../../essay_gpt35_2.json", 'w') as gpt35_essay_file:
    json.dump(gpt35_dict, gpt35_essay_file, indent=4)