# LLM Spelunking in Search of Folklore

This project aims to explore the possibilities for using large language models to uncover emergent folklore. 

* [OpenRouter](https://openrouter.ai/settings/credits)
* [GPT-3.5 Turbo Model | OpenAI API](https://platform.openai.com/docs/models/gpt-3.5-turbo)

In [1]:
import requests
import json
from openai import OpenAI
# import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
# from collections import Counter
# import string 

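# Load the OpenRouter key from a local JSON file (closed after reading)
with open("../apikeys.json") as f:
    keys = json.load(f)
or_key = keys["OpenRouter"]["key"]

# The OpenAI SDK talks to OpenRouter once base_url points at its API
client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=or_key
)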

## Using Requests

The code below is straight from OpenRouter's API documentation.

In [2]:
response = requests.post(
  url="https://openrouter.ai/api/v1/chat/completions",
  headers={
    "Authorization": f"Bearer {or_key}"
  },
  data=json.dumps({
    "model": "openai/gpt-3.5-turbo", # Optional
    "messages": [
      {
        "role": "user",
        "content": "Write 100 short fictional passages that each contain a piece of profound, folk-wisdom-like advice about modern life. Format the output as a Python list of strings."
      }
    ]
  })
)

**response.json()** returns the response data as a Python dictionary (`type(response.json())` gives `dict`). To reach the generated text, index into the `choices` key, take the first item in that list, then the `message` key, and finally the `content` key. That yields only the text of the reply, but the payload holds more, such as the model name and a `usage` entry that reports the prompt and completion token counts. Those extra fields are metadata, not generated text.
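
The lookup can be wrapped in two small helpers. This is a minimal sketch that assumes the OpenAI-compatible chat-completions shape OpenRouter returns; `extract_content`, `extract_usage`, and the sample payload are hypothetical, trimmed to the fields used here.

```python
from typing import Any


def extract_content(payload: dict[str, Any]) -> str:
    """Return the text of the first choice in a chat-completions payload."""
    return payload["choices"][0]["message"]["content"]


def extract_usage(payload: dict[str, Any]) -> dict[str, int]:
    """Return the token counts, or an empty dict if there is no usage block."""
    return payload.get("usage") or {}


# Hypothetical payload, trimmed to the fields the helpers read
sample = {
    "id": "gen-123",
    "model": "openai/gpt-3.5-turbo",
    "choices": [{"message": {"role": "assistant", "content": '["Slow down."]'}}],
    "usage": {"prompt_tokens": 31, "completion_tokens": 5, "total_tokens": 36},
}

print(extract_content(sample))  # ["Slow down."]
print(extract_usage(sample)["completion_tokens"])  # 5
```

In the notebook, `extract_usage(response.json())` would show how many tokens a call like the one above consumed.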


In [11]:
content = response.json()["choices"][0]["message"]["content"]
print(content[0:100])

[
"A busy mind is like a crowded room - sometimes you need to declutter to find peace.",
"Success is


## OpenAI Focused

This is the code provided in OpenAI's API documentation. The only change here is to GPT 4.1. (I don't need a reasoning model; the underlying model should be much the same and, with luck, a bit less expensive.)

In [13]:
completion = client.chat.completions.create(
    model="openai/gpt-3.5-turbo",
    messages=[
        {
          "role": "user",
          "content": "Write 100 short fictional passages that each contain a piece of profound, folk-wisdom-like advice about modern life. Format the output as a Python list of strings."
        }
    ]
)

# The SDK returns a typed object, so use attribute access instead of .json()
content2 = completion.choices[0].message.content

For **text generation**, the OpenAI SDK also has a simpler call than chat completions: the Responses API (`client.responses.create`), which takes a plain `input` string and exposes the reply as `output_text`. In the first test, the very first response was compelling. A web search turned up a variety of posts; see the top of Notebook 2 for more.

In [None]:
response = client.responses.create(
    model="gpt-3.5-turbo",
    input="Write 100 short fictional passages that each contain a piece of profound, folk-wisdom-like advice about modern life. Format the output as a Python list of strings."
    )

print(response.output_text[0:80])

Absolutely! Here are 100 short fictional passages, each containing a piece of fo


**output_text** is a string. Let's go ahead and split it into individual texts. When I did this with the first response, which came from GPT 3.5 Turbo, I got 308 lines and the first word of the 309th. When I tried it a second time with GPT 4.1, I got 102 lines. (Make it make sense.)
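
Splitting on `",\n"` only works when the model puts every item on its own line followed by a comma, so the count depends on how each run happens to format its list (the cell below, for instance, finds no separator at all and reports a single chunk). A more forgiving sketch, assuming one proverb per line, treats each non-empty line as a candidate and strips the list syntax around it. `split_items` is a hypothetical helper, and any chatty preamble line ("Absolutely! Here are...") is kept, so it may need filtering separately.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def split_items(text: str) -> list[str]:
    """Split model output into one item per line, dropping list syntax.

    Assumes each item sits on its own line.
    """
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(FENCE):
            continue  # opening or closing code fence
        # Remove a leading "[", a trailing "]", and the comma after an item
        line = line.lstrip("[").rstrip("]").strip().rstrip(",").strip()
        # Remove matching outer quotes around the item
        if len(line) >= 2 and line[0] == line[-1] and line[0] in "\"'":
            line = line[1:-1]
        if line:
            items.append(line)
    return items
```

Used in place of the split below, `len(split_items(response.output_text))` counts proverbs rather than separators.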

In [19]:
texts = response.output_text.split(",\n")
print(len(texts))

# The output keeps something GPT 4.1 adds:
# the opening and closing code fences.

1


The code below does not remove the opening code fence and list marker ` ```python [` at the top of the file, nor the closing `]` at the bottom. I will need code for that.

In [20]:
# Write list to a text file
with open('100-3.txt', 'w', encoding='utf-8') as f:
    for text in texts:
        f.write(f"{text}\n")
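
One way to handle the leftover ` ```python [` and `]` noted above, sketched under the assumption that everything between the fences is a valid Python list literal: drop the fence lines, then let `ast.literal_eval` parse the list. `ast.literal_eval` only evaluates literals, so it is safe to run on model output. If the reply was cut off mid-list, parsing fails and this hypothetical `parse_list_output` helper returns `None`, leaving the caller to fall back to splitting lines.

```python
import ast
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker


def parse_list_output(text: str) -> Optional[list[str]]:
    """Parse a model reply that should be a Python list of strings.

    Drops code-fence lines first. Returns None when what is left is not
    a complete list of strings, e.g. when the reply was truncated.
    """
    kept = [ln for ln in text.splitlines() if not ln.strip().startswith(FENCE)]
    body = "\n".join(kept).strip()
    try:
        value = ast.literal_eval(body)  # evaluates literals only, never code
    except (ValueError, SyntaxError, TypeError):
        return None
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    return None
```

With a parsed list in hand, each item can be written one per line exactly as in the cell above, with no brackets or fences in the file.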
    

## Revising the Prompt

A single **Responses** call will only get you so far. The revised prompt below adds a persona plus constraints on length and formatting:

In [None]:
response = client.responses.create(
    model="gpt-3.5-turbo",
    input="You are a wizened online denizen, known for dispensing proverbs that address modern life, especially as it occurs on the internet in social media, online forums, and other venues. Every proverb you generate must be a single, complete sentence between 10 and 30 tokens long. Do not use quotation marks. Format the output as a Python list of strings. Generate 100 such proverbs."
    )

print(response.output_text[0:80])

[
"Before you post, pause for thought.",
"A retweet doesn't equal endorsement.",
"Follow wisely, unfollow fearlessly.",
"Likes are fleeting, memories are forever.",
"Think twice before hitting 'send'.",
"Online friends are real friends too.",
"In anonymity, civility often disappears.",
"Delete is forever, think before you share.",
"Guard your privacy like a treasure.",
"Keyboard warriors often hide insecurities.",
"Online personas aren't always authentic.",
"Retweets won't pay your bills.",
"Seek knowledge, not just information.",
"Scroll with purpose, not just habit.",
"Guard your data like a dragon.",
"In the digital realm, words echo eternal.",
"Blocking is a form of self-care.",
"Online debates rarely lead to consensus.",
"Filter your feeds, filter your mind.",
"Spam is the bane of online existence.",
"Screen time should never substitute face time.",
"DM with caution, words can wound.",
"Kindness online costs nothing, but gains much.",
"Profile pictures can hide a multitude of sins

In [18]:
# texts = response.output_text.split(",\n")

# Write list to a text file
with open('responses-2-1000.txt', 'w', encoding='utf-8') as f:
    f.write(response.output_text)

## Batch Processing

Even with GPT-3.5 Turbo, the model cannot return 500 or 5,000 responses in a single "list" because it will hit the output limit (roughly 4,096 tokens) and cut off after about 150–200 proverbs. To get 500 or 5,000 unique proverbs, the most reliable way is to use the **Batch API** to run 500 (or 5,000) *individual requests*, each asking for one proverb.
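
The 150 to 200 figure follows from simple division. A rough sketch, assuming each list item costs about 20 to 27 output tokens once its quotes, comma, and newline are counted (an estimate, not a measured value):

```python
OUTPUT_LIMIT = 4096  # approximate max output tokens for GPT-3.5 Turbo


def items_that_fit(tokens_per_item: int, limit: int = OUTPUT_LIMIT) -> int:
    """How many list items fit in one reply if each costs tokens_per_item tokens."""
    return limit // tokens_per_item


for cost in (20, 27):  # assumed per-item cost range, not measured
    print(cost, items_that_fit(cost))
# 20 204
# 27 151
```

By the same estimate, 500 proverbs in one reply would need roughly 10,000 to 13,500 output tokens, well past the limit.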

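We start with a **jsonl** file that has 500 (or 5,000) lines, each with the same prompt. The Batch API processes each line as a separate request and returns a jsonl file with 500 (or 5,000) responses. The results are not guaranteed to come back in input order, so each request carries a `custom_id` for matching them up.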

In [22]:
# import json

# Configuration
TOTAL_RESPONSES = 500  # Change to 5000 if needed
FILENAME = "proverbs_requests.jsonl"
SYSTEM_PROMPT = (
    "You are a wizened online denizen, known for dispensing proverbs "
    "that address modern life, especially as it occurs on the internet " 
    "in social media, online forums, and other venues."
)
USER_PROMPT = (
    "Dispense one unique proverb. It must be a single, complete sentence "
    "between 6 and 30 tokens long. Do not use quotation marks. "
    "Provide only the proverb text."
)

with open(FILENAME, "w") as f:
    for i in range(TOTAL_RESPONSES):
        # Create the request object
        request_data = {
            "custom_id": f"proverb-request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-3.5-turbo",
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": USER_PROMPT}
                ],
                "temperature": 1.1,  # Higher temperature encourages (but does not guarantee) variety across requests
                "max_tokens": 50
            }
        }
        # Write as a single line in the JSONL file
        f.write(json.dumps(request_data) + "\n")

print(f"Created {FILENAME} with {TOTAL_RESPONSES} requests.")

Created proverbs_requests.jsonl with 500 requests.
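
Before uploading, it may be worth checking the file locally. OpenAI's batch guide asks for a unique `custom_id` on every line, and every request must target the same endpoint the batch is created with. A minimal check, where `validate_batch_file` is a hypothetical helper:

```python
import json


def validate_batch_file(path: str, endpoint: str = "/v1/chat/completions") -> int:
    """Check a Batch API input file and return its number of requests.

    Raises ValueError on a malformed line, a missing or duplicate
    custom_id, or a request whose url differs from the batch endpoint.
    """
    seen = set()
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                request = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {lineno}: not valid JSON ({exc})") from exc
            custom_id = request.get("custom_id")
            if not custom_id:
                raise ValueError(f"line {lineno}: missing custom_id")
            if custom_id in seen:
                raise ValueError(f"line {lineno}: duplicate custom_id {custom_id!r}")
            seen.add(custom_id)
            if request.get("url") != endpoint:
                raise ValueError(f"line {lineno}: url {request.get('url')!r} != {endpoint!r}")
    return len(seen)
```

For the file generated above, `validate_batch_file(FILENAME)` should return 500.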


Now to upload and run:

In [None]:
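from openai import OpenAI

# The Files and Batch endpoints belong to OpenAI's own API. This assumes
# OpenRouter does not proxy them, so a separate client talks to OpenAI
# directly; OpenAI() with no arguments reads the OPENAI_API_KEY env variable.
batch_client = OpenAI()

# 1. Upload the file (closed again once the upload finishes)
with open("proverbs_requests.jsonl", "rb") as f:
    batch_input_file = batch_client.files.create(file=f, purpose="batch")

# 2. Create the batch
batch_job = batch_client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "proverb generation project"}
)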

print(f"Batch Job Created: {batch_job.id}")
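
The batch runs asynchronously, so the results have to be collected later. A sketch of that step, assuming the OpenAI Python SDK's `batches.retrieve` call and the documented output format, in which each line carries the request's `custom_id`, an `error` field, and a `response` whose `body` is an ordinary chat-completions payload. Results are keyed by `custom_id` because output order is not guaranteed, and identical prompts at temperature 1.1 can still repeat themselves, hence the de-duplication pass. `wait_for_batch`, `parse_batch_output`, and `unique_proverbs` are hypothetical helper names.

```python
import json
import time
from typing import Any

TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client: Any, batch_id: str, poll_seconds: float = 60) -> Any:
    """Poll client.batches.retrieve until the batch reaches a terminal status."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        time.sleep(poll_seconds)


def parse_batch_output(jsonl_text: str) -> dict[str, str]:
    """Map each custom_id to its proverb text, skipping failed requests."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            continue
        body = response["body"]
        results[record["custom_id"]] = body["choices"][0]["message"]["content"].strip()
    return results


def unique_proverbs(results: dict[str, str]) -> list[str]:
    """Drop repeats, ignoring case and surrounding whitespace; keep first seen."""
    seen = set()
    unique = []
    for text in results.values():
        key = text.casefold().strip()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

In the notebook, `wait_for_batch` would be called with the client that created `batch_job` and with `batch_job.id`; once the status is `completed`, the text of the output file (`files.content(batch.output_file_id).text` on that same client) goes to `parse_batch_output`, and the result to `unique_proverbs`.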