# ParableGPT
**Author:** Sarah Hong

A parable generator built on Meta's open-source *Llama 3.1-8B-Instruct* large language model. It retrieves passages from a corpus of religious texts and uses them as stylistic context to generate parables across different cultural traditions.

A project for Brian Greene's *Origins and Meaning* course, taken Fall 2025.

## Step 1: Prepare Corpus

We will be analyzing texts from:
- Bible - Christianity
- Dhammapada - Buddhism
- Qur'an - Islam
- Tao Te Ching - Daoism (Taoism)

These are sourced from https://github.com/Traves-Theberge/sacred-scriptures-mcp.

### Step 1.1: Prepare the Bible and Qur'an

We downloaded the Bible and Qur'an as JSON files. To get more meaning-rich units than single verses, we group verses into chunks with a sliding window: take a window of `W = 12` consecutive verses as one chunk, move forward by a hop of `H = 8` verses, and repeat until the chapter (or surah) is exhausted. Consecutive chunks therefore overlap by 4 verses. Each function returns a list of 'chunks': dictionaries with `id`, `text`, and `metadata` fields, which is the format we ultimately write to our database.
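The windowing idea can be sketched in isolation. This is a minimal illustration, not the notebook's exact loop: the helper name `sliding_windows` is hypothetical, and the early `break` is an added assumption that stops once a window reaches the end of the chapter. Without it, a 17-verse chapter with `W = 12` and `H = 8` also yields a third, single-verse chunk (verse 17 alone) that is already contained in the previous chunk.

```python
def sliding_windows(items: list, window: int = 12, hop: int = 8) -> list[list]:
    """Return overlapping windows of `items`, advancing `hop` items per step."""
    windows = []
    for start in range(0, len(items), hop):
        windows.append(items[start:start + window])
        # Assumption (not in the notebook's loop): stop once a window reaches
        # the end, so the tail is not re-emitted as a smaller, redundant chunk.
        if start + window >= len(items):
            break
    return windows


verses = list(range(1, 18))  # stand-in for a 17-verse chapter, numbered 1..17
spans = [(w[0], w[-1]) for w in sliding_windows(verses)]
print(spans)  # (start_verse, end_verse) for each chunk
```

Whether to drop the short tail chunk is a design choice: keeping it adds near-duplicate entries to the index, while dropping it loses nothing because the previous window already covers those verses.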

In [None]:
import json
from pathlib import Path

def prepare_bible_chunks() -> list[dict]:
    # load the Bible JSON, nested as book -> chapter -> verse -> text
    BIBLE_JSON = Path("./corpus/bible.json")
    bible = json.loads(BIBLE_JSON.read_text(encoding="utf-8"))

    W = 12 # window
    H = 8 # hop

    chunks: list[dict] = []
    for book, chapters in bible.items():
        for chap, verses in chapters.items():
            chap = int(chap)
            v_nums = sorted(int(v) for v in verses.keys())
            v_texts = [(v, verses[str(v)].strip()) for v in v_nums if verses[str(v)].strip()]

            for i in range(0, len(v_texts), H):
                # chunk the verses together
                chunked_verses = v_texts[i:i+W]
                if not chunked_verses:
                    continue

                # retrieve relevant metadata
                v_start = chunked_verses[0][0] # get the verse number of the start
                v_end = chunked_verses[-1][0] # and end
                text = " ".join(t for _, t in chunked_verses).strip()

                chunk_metadata = {
                    # universal metadata
                    "tradition": "christianity",
                    "collection": "bible",
                    "language":  "english",
                    # bible-specific metadata
                    "book": book,
                    "chapter": chap,
                    "start_verse": v_start,
                    "end_verse": v_end,
                }
                # create the final chunk to index into ChromaDB
                final_chunk = {
                    "id": f"{book}_{chap}_{v_start}_{v_end}",
                    "text": text,
                    "metadata": chunk_metadata,
                }
                chunks.append(final_chunk)

    return chunks

In [None]:
import json
from pathlib import Path

def prepare_quran_chunks() -> list[dict]:
    # load the Qur'an JSON, mapping each surah to a list of verse records
    QURAN_JSON = Path("./corpus/quran.json")
    quran = json.loads(QURAN_JSON.read_text(encoding="utf-8"))

    W = 12 # window
    H = 8 # hop

    chunks: list[dict] = []
    for surah, verses in quran.items():
            v_texts = [(int(v['chapter']), int(v['verse']), v['text'].strip()) for v in verses]

            for i in range(0, len(v_texts), H):
                # chunk the verses together
                chunked_verses = v_texts[i:i+W]
                if not chunked_verses:
                    continue

                # retrieve relevant metadata
                chap = chunked_verses[0][0] # get the chapter number of the start
                v_start = chunked_verses[0][1] # get the verse number of the start
                v_end = chunked_verses[-1][1] # and end
                text = " ".join(t for _, _, t in chunked_verses).strip()

                chunk_metadata = {
                    # universal metadata
                    "tradition": "islam",
                    "collection": "quran",
                    "language":  "english",
                    # quran-specific metadata
                    "surah": surah,
                    "chapter": chap,
                    "start_verse": v_start,
                    "end_verse": v_end,
                }
                # create the final chunk to index into ChromaDB
                final_chunk = {
                    "id": f"{surah}_{chap}_{v_start}_{v_end}",
                    "text": text,
                    "metadata": chunk_metadata,
                }
                chunks.append(final_chunk)

    return chunks

### Step 1.2: Prepare the Dhammapada and Tao Te Ching

The Dhammapada and Tao Te Ching come in slightly different formats, so we parse their JSON differently. Moreover, each chapter is its own self-contained 'parable,' so we don't enforce a fixed `W` or `H`; each chapter becomes one chunk at its natural length.

In [None]:
import json
from pathlib import Path

def prepare_dhammapada_chunks() -> list[dict]:
    """Prepare chunks of the Dhammapada text for processing.
    Returns a list of dictionaries, each containing the text chunk and its metadata."""
    # load the Dhammapada JSON, which holds a list of chapters
    DHAMMA_JSON = Path("./corpus/dhammapada.json")
    dhammapada = json.loads(DHAMMA_JSON.read_text(encoding="utf-8"))

    chunks: list[dict] = []
    for chapter in dhammapada['chapters']:
        # get relevant metadata
        i = chapter['number']
        title = chapter.get('title_english', None) or chapter.get('title', f"Chapter {i}")
        verse_range = chapter['verse_range']
        verses = [v['english'] for v in chapter['verses']]
        text = " ".join(verses).strip()

        metadata = {
            # universal metadata
            "tradition": "buddhism",
            "collection": "dhammapada",
            "language":  "english",
            # dhammapada-specific metadata
            "chapter": title,
            "start_verse": verse_range[0],
            "end_verse": verse_range[1],
        }
        # create the final chunk to index into ChromaDB
        final_chunk = {
            "id": f"dhammapada_{i}_{verse_range[0]}_{verse_range[1]}",
            "text": text,
            "metadata": metadata,
        }
        chunks.append(final_chunk)

    return chunks

In [14]:
import json
import re
from pathlib import Path
from typing import Any

def prepare_tao_chunks() -> list[dict[str, Any]]:
    """Prepare chunks of Tao Te Ching text. One chunk per chapter."""
    TAO_JSON = Path("./corpus/tao_te_ching.json")
    tao_te_ching = json.loads(TAO_JSON.read_text(encoding="utf-8"))

    chunks: list[dict[str, Any]] = []

    def _chapter_num(ch_key: str) -> int:
        m = re.search(r"(\d+)", ch_key)
        if not m:
            raise ValueError(f"Could not parse chapter number from key: {ch_key}")
        return int(m.group(1))
    
    for ch_key, ch_content in tao_te_ching.items():
        ch_num = _chapter_num(ch_key)
        text = ch_content['Verse'].strip()

        metadata = {
            # universal metadata
            "tradition": "taoism",
            "collection": "tao_te_ching",
            "language":  "english",
            # tao-specific metadata
            "chapter": ch_num,
        }
        final_chunk = {
            "id": f"tao_te_ching_{ch_num}",
            "text": text,
            "metadata": metadata,
        }
        chunks.append(final_chunk)
        
    return chunks

### Step 1.3: Write the chunks to ChromaDB

[ChromaDB](https://www.trychroma.com/) is an open-source vector database that can run embedded in the notebook process, with no separate server. It is commonly used for semantic similarity search between a query (here, a `topic`) and a corpus. The embedding model that maps text to vectors can be swapped as needed, provided the same model is used for both indexing and querying. Rather than comparing the query against every stored vector, ChromaDB builds an approximate nearest-neighbor index, which keeps similarity lookups fast as the collection grows.
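As a small illustration of why the embeddings are normalized before indexing, the sketch below uses pure Python and hypothetical toy vectors (not real embeddings). For unit-length vectors, squared Euclidean distance equals `2 - 2 * cosine similarity`, so ranking by either measure gives the same nearest neighbors. Which distance a given ChromaDB collection actually uses depends on its configuration, so treat this as background rather than a description of ChromaDB's internals.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between a and b."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 3-d vectors standing in for 384-d sentence embeddings
query = normalize([1.0, 2.0, 0.0])
docs = {
    "near": normalize([1.0, 2.1, 0.1]),
    "far": normalize([-2.0, 0.5, 1.0]),
}

for name, vec in docs.items():
    cos = cosine_similarity(query, vec)
    dist = squared_l2(query, vec)
    # For unit vectors: squared L2 distance == 2 - 2 * cosine similarity
    assert math.isclose(dist, 2 - 2 * cos, abs_tol=1e-9)
    print(f"{name}: cosine={cos:.3f}, squared_l2={dist:.3f}")
```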

In [10]:
from tqdm import tqdm
import chromadb
from sentence_transformers import SentenceTransformer

def index_corpus(chunks: list[dict], collection_name: str, 
                 db_path: str="./corpus/chroma_db", 
                 embedder_model: str="all-MiniLM-L6-v2",
                 batch_size: int=64, force: bool=False):
    """Index the given chunks into a ChromaDB collection.

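    Args:
        chunks (list[dict]): List of chunks to index.
        collection_name (str): Name of the ChromaDB collection.
        db_path (str): Path to the ChromaDB database.
        embedder_model (str): SentenceTransformer model name for embeddings.
        batch_size (int): Number of chunks to embed and add per batch.
        force (bool): If True, delete any existing collection of the same name first.
    """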
    client = chromadb.PersistentClient(db_path)

    if force:
        try:
            client.delete_collection(name=collection_name)
            print(f"Deleted existing collection '{collection_name}'")
        except Exception:
            pass

    # create collection
    collection = client.get_or_create_collection(name=collection_name)
    embedder = SentenceTransformer(embedder_model)

    ids = [c['id'] for c in chunks]
    texts = [c['text'] for c in chunks]
    metadatas = [c['metadata'] for c in chunks]

    def _batched(seq, size):
        for i in range(0, len(seq), size):
            yield seq[i:i+size]

    total_batches = (len(chunks) + batch_size-1) // batch_size

    for batch_ids, docs, metas in tqdm(
        zip(
            _batched(ids, batch_size),
            _batched(texts, batch_size),
            _batched(metadatas, batch_size),
        ),
        total=total_batches,
        desc=f"Indexing '{collection_name}'",
        unit="batch"
    ):  # embed in batches (default size 64)
        # a distinct loop name avoids shadowing the full `ids` list above
        embeddings = embedder.encode(docs, normalize_embeddings=True).tolist()
        collection.add(
            ids=batch_ids,
            documents=docs,
            metadatas=metas,
            embeddings=embeddings,
        )

    print(f"Indexed {len(chunks)} chunks into collection '{collection_name}' at '{db_path}'")

In [None]:
# run the above scripts
bible_chunks = prepare_bible_chunks()
dhamma_chunks = prepare_dhammapada_chunks()
quran_chunks = prepare_quran_chunks()
tao_chunks = prepare_tao_chunks()

index_corpus(
    chunks=bible_chunks,
    collection_name="bible",
    db_path="./corpus/chroma_db",
    embedder_model="all-MiniLM-L6-v2",
    batch_size=64,
    force=True,
)

index_corpus(
    chunks=dhamma_chunks,
    collection_name="dhammapada",
    db_path="./corpus/chroma_db",
    embedder_model="all-MiniLM-L6-v2",
    batch_size=64,
    force=True,
)

index_corpus(
    chunks=quran_chunks,
    collection_name="quran",
    db_path="./corpus/chroma_db",
    embedder_model="all-MiniLM-L6-v2",
    batch_size=64,
    force=True,
)

index_corpus(
    chunks=tao_chunks,
    collection_name="tao_te_ching",
    db_path="./corpus/chroma_db",
    embedder_model="all-MiniLM-L6-v2",
    batch_size=64,
    force=True,
)

Indexing 'tao_te_ching': 100%|██████████| 2/2 [00:00<00:00,  2.62batch/s]

Indexed 81 chunks into collection 'tao_te_ching' at './corpus/chroma_db'





## Step 2: Retrieval-Augmented Generation (RAG)

We use Retrieval-Augmented Generation (RAG) to enrich the LLM's outputs: before generating, we add relevant passages from the holy text to the prompt so that the LLM can imitate their style in its own parable. We query our corpus as follows.

Given a `topic` (e.g., "forgiveness"), we use the `all-MiniLM-L6-v2` sentence-transformers model, hosted on Hugging Face, to map the phrase into an embedding vector. We then query the ChromaDB collection for the chosen tradition, comparing the `topic` embedding against the stored chunk embeddings, which were produced by the same model and so live in the same vector space. Because every embedding is normalized to unit length, the nearest chunks are also the ones with the highest cosine similarity to the topic. We retrieve the `k` nearest matches and paste them into the final LLM prompt as "context" whose style the model should imitate while generating its own parable and following the user's other instructions.
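The retrieve-then-paste step can be sketched without any database. This is a simplified stand-in for what the vector store does, assuming unit-length vectors so that the dot product equals cosine similarity; the `top_k` helper, the toy vectors, and the placeholder passages are all hypothetical.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: list[float], corpus: list[dict], k: int = 2) -> list[tuple[str, str]]:
    """Return (id, text) for the k chunks most similar to the query."""
    ranked = sorted(corpus, key=lambda item: dot(query_vec, item["vector"]), reverse=True)
    return [(item["id"], item["text"]) for item in ranked[:k]]


# Toy unit-length vectors standing in for real embeddings (hypothetical values)
corpus = [
    {"id": "chunk_a", "text": "placeholder passage A", "vector": [1.0, 0.0, 0.0]},
    {"id": "chunk_b", "text": "placeholder passage B", "vector": [0.0, 1.0, 0.0]},
    {"id": "chunk_c", "text": "placeholder passage C", "vector": [0.6, 0.8, 0.0]},
]
query_vec = [0.8, 0.6, 0.0]  # stand-in for the embedded topic

matches = top_k(query_vec, corpus, k=2)

# Paste the retrieved chunks into the prompt as context, one block per source
context = "\n\n".join(f"{chunk_id}\n{text}" for chunk_id, text in matches)
print(context)
```

A brute-force scan like this compares the query with every chunk, which is fine for a toy corpus; an indexed vector store avoids that full scan.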

The LLM is given system instructions beforehand:
```text
"You are ParableGPT, generating a parable in the style of Christianity. 
Use a reverent, Biblical tone. 
Prefer simple, concrete imagery. 
Do not imitate any specific modern author.

Write an original parable inspired by the provided sources; imitate the writing style closely. 
Do not quote long passages verbatim; paraphrase ideas instead. 
Target length: about 150 words.
Begin with EXACTLY: 'Title: [insert_parable_title_here]'. 
End with EXACTLY: 'Moral: [insert_moral_here]'. Be concise."
```

Alongside the sample prompt:
```text
"Topic: Forgiveness

User constraints (follow these carefully):
Include a character named Brian Greene

Relevant Bible passages (Imitate this tone and writing style as exactly as possible)

Ezekiel 4:17-17
That they may want bread and water, and be astonied one with another, and consume away for their iniquity.

Now write the parable."
```
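Since the system instructions require a strict `Title: ...` opening and `Moral: ...` ending, it can help to check the model's output against that format before displaying it. The sketch below is one possible check, not part of the notebook; `parse_parable` is a hypothetical helper and the sample strings are made up for illustration.

```python
import re
from typing import Optional


def parse_parable(text: str) -> Optional[dict]:
    """Split output into title, body, and moral; return None if the format is off."""
    match = re.fullmatch(
        r"\s*Title:\s*(?P<title>[^\n]+)\n(?P<body>.*?)\n\s*Moral:\s*(?P<moral>.+?)\s*",
        text,
        flags=re.DOTALL,
    )
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


# Made-up samples: one well-formed, one cut off before the moral
complete = "Title: A Sample Title\n\nA short body paragraph.\n\nMoral: A short moral."
cut_off = "Title: A Sample Title\n\nA body that stops mid-"

print(parse_parable(complete))
print(parse_parable(cut_off))  # None, because the 'Moral:' line is missing
```

A `None` result could be used to trigger a retry or to flag an output that was cut off before the moral.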

In [11]:
import chromadb
from sentence_transformers import SentenceTransformer
from ollama import chat

class ParableGPT:
    def __init__(self, llm="llama3.1:8b", embedder="all-MiniLM-L6-v2"):
        self.LLM_MODEL = llm
        self.EMBEDDING_MODEL = SentenceTransformer(embedder)
        self.DB_PATH = "./corpus/chroma_db"
        self.client = chromadb.PersistentClient(self.DB_PATH)

        # tradition-specific configuration variables
        self.TRADITION_CONFIG = {
            "Christianity": {
                "collection": "bible",
                "style": (
                    "Use a reverent, Biblical tone. "
                    "Prefer simple, concrete imagery. "
                    "Do not imitate any specific modern author."
                ),
                "source_label": "Bible passages",
                "ref_formatter": lambda m: f'{m["book"]} {m["chapter"]}:{m["start_verse"]}-{m["end_verse"]}',
            },
            "Buddhism": {
                "collection": "dhammapada",
                "style": (
                    "Use a calm, concise, contemplative tone similar to the provided sources. "
                    "Avoid sermonizing; let the lesson emerge naturally."
                ),
                "source_label": "Dhammapada passages",
                "ref_formatter": lambda m: f'Dhammapada {m["chapter"]} vv.{m["start_verse"]}-{m["end_verse"]}',
            },
            "Islam": {
                "collection": "quran",
                "style": (
                    "Use a poetic, rhythmic tone similar to the Quranic style. "
                    "Incorporate vivid imagery and metaphors."
                ),
                "source_label": "Quranic passages",
                "ref_formatter": lambda m: f'Surah {m["surah"]} vv.{m["start_verse"]}-{m["end_verse"]}',
            },
            "Taoism": {
                "collection": "tao_te_ching",
                "style": (
                    "Use a simple, paradoxical, and poetic tone similar to the Tao Te Ching. "
                    "Emphasize naturalness and spontaneity."
                ),
                "source_label": "Tao Te Ching passages",
                "ref_formatter": lambda m: f'Chapter {m["chapter"]}',
            }
        }

    def retrieve(self, tradition, topic, k: int=6) -> tuple[list[str], list[dict]]:
        """Retrieves k relevant verses from the collection based on the topic
        also returns relevant metadata such as book, chapter, verse number 
        (varies by tradition)

        Args:
            tradition (str): the tradition/collection to query from
            topic (str): the topic to base the retrieval on
            k (int, optional): number of verses to retrieve. Defaults to 6.
        Returns:
            tuple[list[str], list[dict]]: retrieved verses and their metadata
        """
        theme_embeddings = self.EMBEDDING_MODEL.encode([topic], normalize_embeddings=True).tolist()
        col = self.client.get_collection(name=self.TRADITION_CONFIG[tradition]["collection"])
        results = col.query(
            query_embeddings=theme_embeddings,
            n_results=k,
            include=["documents", "metadatas", "distances"]
        )
        verses = results['documents'][0]
        metadatas = results['metadatas'][0]
        return verses, metadatas


    def generate(self, tradition: str, topic: str, length: int=150, info: str=None, k: int=6) -> tuple[str, str]:
        """Generates a parable in the style of the specified tradition,
        focusing on the given topic, length, and following any additional 
        instructions from the prompt.
        
        Args:
            tradition (str): the tradition/style to emulate
            topic (str): the topic/theme of the parable
            length (int, optional): desired word count of the parable. Defaults to 150
            info (str, optional): additional instructions for the parable. Defaults to None.
            k (int, optional): number of source chunks to retrieve. Defaults to 6.
        Returns:
            tuple[str, str]: generated parable and the sources used
        """
        # retrieve relevant verses
        verses, metadatas = self.retrieve(tradition, topic, k)
        cfg = self.TRADITION_CONFIG[tradition]

        # format sources
        sources = []
        for v, m in zip(verses, metadatas):
            ref = cfg["ref_formatter"](m)
            sources.append(f"{ref}\n{v}")
        sources_text = "\n\n".join(sources)

        # SYSTEM
        system = (
            f"You are ParableGPT, generating a parable in the style of {tradition}. "
            f"{cfg['style']} "
            "Write an original parable inspired by the provided sources; imitate the writing style closely. "
            "Do not quote long passages verbatim; paraphrase ideas instead. "
            # skip the length hint when no length was given (empty string or None)
            f"{f'Target length: about {length} words. ' if length else ''}"
            "Begin with EXACTLY: 'Title: [insert_parable_title_here]'. "
            "End with EXACTLY: 'Moral: [insert_moral_here]'. Be concise."
        )
        # USER: topic + constraints
        user = (
            f"Topic: {topic}\n\n"
            "User constraints (follow these carefully):\n"
            # single quotes inside the f-string keep this valid before Python 3.12
            f"{(info or '').strip()}\n\n"
            f"Relevant {cfg['source_label']} "
            "(Imitate this tone and writing style as exactly as possible)"
            # "while following the Title: [insert_parable_title_here] and Moral: [insert_moral_here] format clearly.):"
            f"\n\n{sources_text}\n\n"
            "Now write the parable."
        )
        resp = chat(
            model=self.LLM_MODEL,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user}
            ]
        )
        parable = resp["message"]["content"]
        return parable, sources
    
    def run(self):
        """
        Keep prompting user for input and generating parables until they quit.
        """
        print("Welcome to ParableGPT!\n")
        while True:
            while True:
                tradition_index = input("Select a tradition below or 'q' to quit: \n" \
                                        " (0) Christianity\n (1) Buddhism\n" \
                                        " (2) Islam\n (3) Taoism\n" \
                                        "Your choice: ").strip()
                if tradition_index.lower() == 'q':
                    return
                try:
                    idx = int(tradition_index)
                    if 0 <= idx < len(self.TRADITION_CONFIG):
                        break
                except ValueError:
                    pass
                print(f"Please enter a number 0-{len(self.TRADITION_CONFIG)-1}, or 'q'.")

            tradition = list(self.TRADITION_CONFIG)[idx]
            topic = input("Enter topic for the parable: ").strip()
            length = input("Enter desired word count (enter to skip): ").strip()
            info = input("Enter any additional instructions for the parable (enter to skip): ").strip()
            parable, sources = self.generate(tradition, topic, length, info)
            self.print_parable(parable, sources)

    def print_parable(self, parable: str, sources: list[str]):
        """Prints the generated parable and its sources in a formatted way.

        Args:
            parable (str): The generated parable text.
            sources (list[str]): List of source texts used for generation.
        """
        print("\n---\n")
        print(parable)
        print("\n---\n")
        print("SOURCES USED:\n---")
        for s in sources:
            print(s)
            print()
    

## Test it out!

The following four code cells show examples of running ParableGPT in this notebook, with the results printed below each.

To use ParableGPT, just choose a tradition, a topic, a length (optional), and additional info (optional), and wait ~40 seconds for the parable to generate.

In [28]:
parable_gpt = ParableGPT()
p1, s1 = parable_gpt.generate(
    tradition="Christianity",
    topic="Hunger",
    # length=150, # words
    info="Include a character named 'Brian Greene'",
    k=6
)
parable_gpt.print_parable(p1, s1)


---

Title: The Empty Pot

There was a village nestled in a valley where hunger had become a constant companion. In this village lived Brian Greene, a humble farmer who struggled to provide for his family. One day, as he labored in his field, a stranger approached him with a gift of bread and vegetables.

The stranger said, "Brian, I have seen your struggles and the emptiness in your pot. Take this food and share it with your people, that they may eat and be satisfied." Brian was hesitant at first, but the stranger assured him, "Trust in my provision, for He who gives food to all flesh will also provide for you."

As Brian distributed the food, he watched as his neighbors ate until they were full, and then some. The pot was empty, yet there was still enough for everyone. A miracle had occurred.

Moral: Trust in God's provision, and He will supply your needs, that you may eat and be satisfied, and share with others to the glory of His mercy.

---

SOURCES USED:
---
Ezekiel 4:17-17
That

In [15]:
p2, s2 = parable_gpt.generate(
    tradition="Buddhism",
    topic="Materialism",
    length=150, # words
    info="Include a character named 'Brian Greene'",
    k=2
)
parable_gpt.print_parable(p2, s2)


---

Title: The Island of Desire

Brian Greene was a wealthy merchant who had built his fortune on the shores of a vast lake. He spent his days accumulating more wealth, thinking that it would bring him happiness and security. One day, as he stood at the edge of the lake, watching the sun set behind the hills, he felt a sense of emptiness within. He realized that despite all his possessions, he was still unsatisfied.

A wise old man appeared beside him, pointing to a small island in the distance. "Brian, you have built a grand house on the shore," said the old man, "but what about an inner refuge from desire? What about peace of mind?" Brian looked puzzled, but the old man continued, "You see, my friend, our desires are like waves that crash against the shore. They never truly satisfy us, and yet we keep striving for more."

Brian thought about this for a moment, then asked, "But how can I stop these waves? How can I find peace?" The old man smiled, saying, "Ah, Brian, you must learn 

In [17]:
p3, s3 = parable_gpt.generate(
    tradition="Islam",
    topic="Shame",
    length=150, # words
    info="Include a character named 'Brian Greene'",
    k=5
)
parable_gpt.print_parable(p3, s3)


---

Title: The Shadow of Shame

In a small village nestled between two great rivers, there lived a man named Brian Greene. He was known throughout the land for his extraordinary talent in weaving intricate patterns on silk fabric. However, despite his skill, Brian was consumed by the shadow of shame.

One day, while working on a particularly delicate design, he accidentally dropped a valuable thread, which fell into the river below. The villagers, who had gathered to witness his work, gasped in horror as the thread was carried away by the current. Brian's face turned bright red with mortification, and he hid his head in shame.

The Prophet of the village, seeing Brian's distress, approached him and said, "O Brian, do you know that the rivers are a reminder of Allah's mercy? Just as they flow constantly, cleansing all that comes before them, so too does Allah forgive and cleanse those who repent."

Brian looked up, his eyes brimming with tears. "But, revered one," he asked, "how can I

In [21]:
p4, s4 = parable_gpt.generate(
    tradition="Taoism",
    topic="The Way",
    length=150, # words
    info="Include a character named 'Brian Greene'",
    k=3
)
parable_gpt.print_parable(p4, s4)


---

Title: Brian's Dance with the Way

Brian Greene, a renowned physicist, stood on the mountaintop, gazing out at the vast expanse of the universe. He had spent his life studying the intricacies of time and space, but never truly grasped the essence of the Way.

As he pondered, a gentle breeze rustled his hair, and he felt an inexplicable urge to dance. With each step, he surrendered to the moment, letting go of his need for control and understanding. The world around him dissolved into chaos, yet somehow coalesced into harmony.

A passerby, observing Brian's spontaneous dance, laughed out loud at the absurdity of it all. "The master is a fool!" they jeered. Yet, as the onlooker joined in, their own steps blended with the rhythm of the universe, and together they swayed like reeds in a gentle stream.

Brian Greene, lost in the dance, became one with the Way. His laughter merged with the vital breath, and his being harmonized with the cosmos. In that moment, he transcended duality, e

## Reflections

What went well:
- The outputs were more coherent than I expected from an 8B-parameter language model. Stories were more or less logical, at the least.
- The Dhammapada and Tao Te Ching were already in an ideal format for parable-like analysis.
- It was fun to conjure up random stories that miraculously tied back to some moralistic meaning.

For the future, I would want to:
- Include more relevant stylistic instructions for the LLM
- Use a model with more parameters for higher quality output
- Clean up the Bible texts to only include sections with parables
- Add more religions