<a href="https://colab.research.google.com/github/marktsears/nanoGPT/blob/master/nanoGPT_Bible.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
!git clone https://github.com/karpathy/nanoGPT

fatal: destination path 'nanoGPT' already exists and is not an empty directory.


In [6]:
%pip install tiktoken transformers pysword

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [7]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [9]:
!mkdir -p /content/nanoGPT/data/bible/

In [10]:
import json
import random
import numpy as np
import tiktoken
from pysword.modules import SwordModules


content = []

# Find the SWORD modules and load the ESV
modules = SwordModules('/content/drive/MyDrive/Dev/bibleGPT/content/')
found_modules = modules.parse_modules()
bible = modules.get_bible_from_module('ESV')

# Get the structure for the bible
bible_structure = bible.get_structure()
book_structure = bible_structure.get_books()

# Iterate over the books in both testaments
for testament in ['ot', 'nt']:
    for book in book_structure[testament]:
        # Iterate over the chapters in the book
        for chapter in range(1, len(book.chapter_lengths) + 1):
            # Use the get_iter() method to retrieve verses
            for verse in bible.get_iter(books=book.name, chapters=chapter):
                content.append(verse)  # directly append the verse to the list

# Load the JSON files
with open('/content/drive/MyDrive/Dev/bibleGPT/content/gotquestions.json', 'r') as file:
    gotquestions_data = json.load(file)

with open('/content/drive/MyDrive/Dev/bibleGPT/content/tgc-articles.json', 'r') as file:
    tgc_articles_data = json.load(file)

# Extract the text content
gotquestions_text = [item['answer'] for item in gotquestions_data]
tgc_articles_text = [item['content'] for item in tgc_articles_data]

# Concatenate the text content with the verses
content += gotquestions_text + tgc_articles_text

# Shuffle so that verses and articles are mixed before splitting
random.shuffle(content)

# 90/10 train/val split
n = len(content)
split = int(n * 0.9)
train_content = content[:split]
val_content = content[split:]

# Join each split into a single space-separated string
train_data = " ".join(train_content)
val_data = " ".join(val_content)

# Encode with tiktoken gpt2 bpe
enc = tiktoken.get_encoding("gpt2")
train_ids = enc.encode_ordinary(train_data)
val_ids = enc.encode_ordinary(val_data)
print(f"train has {len(train_ids):,} tokens")
print(f"val has {len(val_ids):,} tokens")

# Export to bin files (GPT-2 token ids are below 50,257, so they fit in uint16)
train_ids = np.array(train_ids, dtype=np.uint16)
val_ids = np.array(val_ids, dtype=np.uint16)
train_ids.tofile('/content/nanoGPT/data/bible/train.bin')
val_ids.tofile('/content/nanoGPT/data/bible/val.bin')


train has 20,278,645 tokens
val has 2,240,847 tokens
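
The .bin files written above are flat arrays of uint16 token ids. As a rough illustration of how such a file can be consumed during training, the sketch below draws random windows of `block_size` tokens and builds targets shifted by one position. This is a simplified NumPy-only sketch, not nanoGPT's actual loader; the function name `sample_batch` and the synthetic demo file are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def sample_batch(bin_path, block_size, batch_size, rng):
    """Draw random (input, target) windows from a flat uint16 token file."""
    # memmap reads slices on demand instead of loading the whole file into RAM
    data = np.memmap(bin_path, dtype=np.uint16, mode="r")
    # Each window needs block_size inputs plus one extra token for the shifted target
    starts = rng.integers(0, len(data) - block_size, size=batch_size)
    x = np.stack([np.asarray(data[i:i + block_size]) for i in starts]).astype(np.int64)
    y = np.stack([np.asarray(data[i + 1:i + 1 + block_size]) for i in starts]).astype(np.int64)
    return x, y


# Demo on a small synthetic file (hypothetical stand-in for train.bin)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
np.arange(1000, dtype=np.uint16).tofile(demo_path)

x, y = sample_batch(demo_path, block_size=64, batch_size=8, rng=np.random.default_rng(0))
print(x.shape, y.shape)
```

Each row of `y` is the corresponding row of `x` shifted left by one token, so the model is trained to predict the next token at every position. Pointing `bin_path` at `/content/nanoGPT/data/bible/train.bin` would sample from the real data.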


In [17]:
!cd ./nanoGPT/ && python train.py --dtype=float16 --dataset=bible --block_size=64 --batch_size=8 --n_layer=4 --n_head=4 --n_embd=64 --max_iters=8000 --eval_interval=1000

Streaming output truncated to the last 5000 lines.
iter 3011: loss 4.5300, time 413.42ms, mfu 0.41%
iter 3012: loss 4.8243, time 412.13ms, mfu 0.40%
iter 3013: loss 4.7411, time 410.42ms, mfu 0.39%
iter 3014: loss 4.3714, time 380.68ms, mfu 0.39%
iter 3015: loss 4.6154, time 257.50ms, mfu 0.40%
iter 3016: loss 4.6313, time 252.39ms, mfu 0.42%
iter 3017: loss 4.5148, time 249.13ms, mfu 0.43%
iter 3018: loss 4.8117, time 258.25ms, mfu 0.44%
iter 3019: loss 4.3277, time 256.66ms, mfu 0.45%
iter 3020: loss 4.6404, time 265.52ms, mfu 0.45%
iter 3021: loss 4.7288, time 252.19ms, mfu 0.46%
iter 3022: loss 4.7101, time 268.98ms, mfu 0.47%
iter 3023: loss 4.9137, time 260.85ms, mfu 0.47%
iter 3024: loss 4.5825, time 262.08ms, mfu 0.48%
iter 3025: loss 5.2005, time 257.66ms, mfu 0.48%
iter 3026: loss 4.2567, time 249.99ms, mfu 0.49%
iter 3027: loss 4.5887, time 284.74ms, mfu 0.49%
iter 3028: loss 4.3339, time 272.40ms, mfu 0.49%
iter 3029: loss 4.5585, time 262.69ms, mfu 0.49%
iter
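
The per-iteration losses above are noisy, so it can help to average them over a window. The sketch below parses lines in the format shown in the log and reports the mean loss and the corresponding perplexity (the exponential of the cross-entropy loss, assuming the loss is measured in nats). The helper name `parse_log` is an assumption for this example, and the sample text is copied from the first three log lines above.

```python
import math
import re

# Matches lines such as "iter 3011: loss 4.5300, time 413.42ms, mfu 0.41%"
LOG_LINE = re.compile(r"iter (\d+): loss ([\d.]+), time ([\d.]+)ms, mfu (-?[\d.]+)%")


def parse_log(text):
    """Return a list of (iteration, loss, time_ms) tuples found in the log text."""
    rows = []
    for m in LOG_LINE.finditer(text):
        rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


sample = """\
iter 3011: loss 4.5300, time 413.42ms, mfu 0.41%
iter 3012: loss 4.8243, time 412.13ms, mfu 0.40%
iter 3013: loss 4.7411, time 410.42ms, mfu 0.39%
iter
"""

rows = parse_log(sample)
mean_loss = sum(loss for _, loss, _ in rows) / len(rows)
print(f"{len(rows)} iterations, mean loss {mean_loss:.4f}, perplexity {math.exp(mean_loss):.1f}")
```

Incomplete lines, such as the truncated final `iter` above, do not match the pattern and are skipped.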

In [27]:
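# Note: with --init_from='gpt2-medium' the pretrained architecture is loaded, so --n_layer, --n_head and --n_embd have no effect here (the log reports 353.77M parameters)
!cd ./nanoGPT/ && python train.py --out_dir=out-bible --dtype=float16 --dataset=bible --block_size=64 --batch_size=8 --n_layer=4 --n_head=4 --n_embd=64 --max_iters=20 --eval_interval=10 --init_from='gpt2-medium'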

Overriding: out_dir = out-bible
Overriding: dtype = float16
Overriding: dataset = bible
Overriding: block_size = 64
Overriding: batch_size = 8
Overriding: n_layer = 4
Overriding: n_head = 4
Overriding: n_embd = 64
Overriding: max_iters = 20
Overriding: eval_interval = 10
Overriding: init_from = gpt2-medium
tokens per iteration will be: 20,480
Initializing from OpenAI GPT-2 weights: gpt2-medium
loading weights from pretrained gpt: gpt2-medium
forcing vocab_size=50257, block_size=1024, bias=True
overriding dropout rate to 0.0
number of parameters: 353.77M
Downloading (…)lve/main/config.json: 100% 718/718 [00:00<00:00, 4.14MB/s]
Downloading pytorch_model.bin: 100% 1.52G/1.52G [00:08<00:00, 183MB/s]
Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 640kB/s]
num decayed parameter tensors: 98, with 353,518,592 parameters
num non-decayed parameter tensors: 194, with 321,536 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
step 0: train loss 3.8953, val
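
The "tokens per iteration will be: 20,480" line in the log can be checked against the command-line flags. With `batch_size=8` and `block_size=64`, one micro-batch holds 512 tokens, so the reported figure implies a further factor of 40. The sketch below works through that arithmetic and estimates how much of the training set 20 iterations cover. It assumes a single process (the usual Colab setup), in which case the factor of 40 would be the number of gradient accumulation steps; that interpretation is inferred from the log, not read from the configuration.

```python
def implied_accumulation(tokens_per_iter, batch_size, block_size):
    """Return the multiplier that turns one micro-batch into one optimizer step."""
    tokens_per_micro_batch = batch_size * block_size
    if tokens_per_iter % tokens_per_micro_batch != 0:
        raise ValueError("tokens_per_iter is not a multiple of batch_size * block_size")
    return tokens_per_iter // tokens_per_micro_batch


# Values taken from the command line and log output above
tokens_per_iter = 20_480
max_iters = 20
train_tokens = 20_278_645

accum = implied_accumulation(tokens_per_iter, batch_size=8, block_size=64)
tokens_seen = max_iters * tokens_per_iter
fraction = tokens_seen / train_tokens

print(f"implied accumulation factor: {accum}")
print(f"tokens seen in {max_iters} iterations: {tokens_seen:,} ({fraction:.2%} of train)")
```

Under these assumptions, the 20-iteration fine-tuning run touches only a small fraction of the 20M-token training set.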

In [29]:
!cd ./nanoGPT && python sample.py --out_dir=out-bible --dtype=float16 --num_samples=5 --max_new_tokens=100 --start="Do all religions lead to the same God?"

Overriding: out_dir = out-bible
Overriding: dtype = float16
Overriding: num_samples = 5
Overriding: max_new_tokens = 100
Overriding: start = Do all religions lead to the same God?
number of parameters: 353.77M
No meta.pkl found, assuming GPT-2 encodings...
Do all religions lead to the same God? Or are some religions more like Christianity than others? Are all religions a type of Christianity? (A.P.S.: If you want to find an answer to these questions, you should follow this link. Go to the question and answer section and then to the Christian apologetics page.)

If you're still not convinced, here's the link to "Does Christianity Have a 'Romantic' Incompatible Nature?" On the first page, it has this paragraph:

Hence,
---------------
Do all religions lead to the same God? Probably not. But we can't say for sure because there is nothing to indicate that Judaism or Christianity leads to God.

Is the Bible God's word? We know that belief in God is the foundation of the Christian faith. But