# Load and slice parallel biblical data

This short notebook shows how to quickly load and slice **parallel biblical data**. The data is stored in a CSV file, which is loaded into a **Pandas** dataframe (a tabular data structure for Python). The dataframe can then be sliced to extract the desired data (such as a particular book).

This notebook also demonstrates prompting the **OpenAI chat completion** endpoint, both directly and through **LangChain**, to help you get started prompting **ChatGPT** with biblical data.

LangChain is a library with helpful abstractions for large language model (LLM) related tasks, such as prompting, output parsing, retrieval, AI-agent usage, and more.

## Data Shape

Here is a description with examples of the fields in the CSV file:

| Field                | Description                                                                              | Examples                                                                                      |
|----------------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
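| Unnamed: 0           | Row index saved with the CSV (pandas labels the empty header `Unnamed: 0`)               | 0, 1                                                                                          |
| id                   | Numeric verse ID                                                                         | 1001002, 1001003                                                                              |
| asv                  | Content from American Standard Version                                                   | "And the earth was waste and void; and darkness...", "And God said, Let there be light: and there wa..." |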
| ylt                  | Content from Young's Literal Translation                                                 | "the earth hath existed waste and void, and dar...", "and God saith, `Let light be;' and light is." |
| bbe                  | Content from Bible in Basic English                                                      | "And the earth was waste and without form; and ...", "And God said, Let there be light: and there wa..." |
| kjv                  | Content from King James Version                                                          | "And the earth was without form, and void; and ...", "And God said, Let there be light: and there wa..." |
| book                 | Abbreviation of book name ('GEN', 'EXO', etc.)                                           | GEN, GEN                                                                                      |
| chapter:verse        | Chapter plus verse number                                                                | 1:2, 1:3                                                                                      |
| chapter              | Chapter number                                                                           | 1, 1                                                                                          |
| verse                | Verse number                                                                             | 2, 3                                                                                          |
| book_id              | Numeric value for book name ('1' for 'GEN', '66' for 'REV', etc.)                        | 1.0, 1.0                                                                                      |
| book_chapter_verse   | USFM format reference (e.g., 'GEN 1:1', 'REV 22:21', etc.)                               | GEN 1:2, GEN 1:3                                                                              |
| source_content       | Source text content associated with verse; Hebrew (OT) or Greek (NT)                     | וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָבֹ֔הוּ וְחֹ֖..., וַיֹּ֥אמֶר אֱלֹהִ֖ים יְהִ֣י א֑וֹר וַֽיְהִי... |


In [2]:
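# NOTE: you will need to install the following libraries.
# The cells below use the legacy `openai.ChatCompletion` interface (removed in openai 1.0)
# and older LangChain import paths, so the version pins here are an assumption; adjust as needed.
!pip3 install pandas "openai<1" "langchain<0.1" -qq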

In [2]:
import pandas as pd

bibles_dataframe = pd.read_csv('../data/bibles.csv')

bibles_dataframe.head() # Sanity check to make sure the data is loading properly

Unnamed: 0,id,asv,ylt,bbe,kjv,book,chapter:verse,chapter,verse,book_id,book_chapter_verse,source_content
0,1001002,And the earth was waste and void; and darkness...,"the earth hath existed waste and void, and dar...",And the earth was waste and without form; and ...,"And the earth was without form, and void; and ...",GEN,1:2,1,2,1.0,GEN 1:2,וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָבֹ֔הוּ וְחֹ֖...
1,1001003,"And God said, Let there be light: and there wa...","and God saith, `Let light be;' and light is.","And God said, Let there be light: and there wa...","And God said, Let there be light: and there wa...",GEN,1:3,1,3,1.0,GEN 1:3,וַיֹּ֥אמֶר אֱלֹהִ֖ים יְהִ֣י א֑וֹר וַֽיְהִי...
2,1001004,"And God saw the light, that it was good: and G...","And God seeth the light that `it is' good, and...","And God, looking on the light, saw that it was...","And God saw the light, that it was good: and G...",GEN,1:4,1,4,1.0,GEN 1:4,וַיַּ֧רְא אֱלֹהִ֛ים אֶת־ הָא֖וֹר כִּי־ ט֑וֹ...
3,1001005,"And God called the light Day, and the darkness...","and God calleth to the light `Day,' and to the...","Naming the light, Day, and the dark, Night. An...","And God called the light Day, and the darkness...",GEN,1:5,1,5,1.0,GEN 1:5,וַיִּקְרָ֨א אֱלֹהִ֤ים׀ לָאוֹר֙ י֔וֹם וְלַחֹ...
4,1001006,"And God said, Let there be a firmament in the ...","And God saith, `Let an expanse be in the midst...","And God said, Let there be a solid arch stretc...","And God said, Let there be a firmament in the ...",GEN,1:6,1,6,1.0,GEN 1:6,וַיֹּ֣אמֶר אֱלֹהִ֔ים יְהִ֥י רָקִ֖יעַ בְּת֣...
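
The `id` values in the preview above look like the book number followed by three-digit chapter and verse numbers (1001002 sits beside GEN 1:2). Assuming that layout holds for the whole file, which the notebook does not state, a small helper can split an ID back into its parts. The name `split_verse_id` is made up for this sketch.

```python
def split_verse_id(verse_id):
    """Split an ID such as 43001001 into (book_id, chapter, verse).

    Assumes the layout: book number, then three digits each for chapter
    and verse. This is inferred from the sample rows only.
    """
    rest, verse = divmod(int(verse_id), 1000)
    book_id, chapter = divmod(rest, 1000)
    return book_id, chapter, verse


print(split_verse_id(1001002))   # (1, 1, 2)
print(split_verse_id(43001001))  # (43, 1, 1)
```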


## Slice biblical data

In [3]:
selected_book = 'JHN' # John -- the 'book' field uses USFM abbreviations

book_dataframe = bibles_dataframe[bibles_dataframe['book'] == selected_book]

book_dataframe.head()

Unnamed: 0,id,asv,ylt,bbe,kjv,book,chapter:verse,chapter,verse,book_id,book_chapter_verse,source_content
25910,43001001,"In the beginning was the Word, and the Word wa...","In the beginning was the Word, and the Word wa...","From the first he was the Word, and the Word w...","In the beginning was the Word, and the Word wa...",JHN,1:1,1,1,43.0,JHN 1:1,"Ἐν ἀρχῇ ἦν ὁ Λόγος, καὶ ὁ Λόγος ἦν πρὸ..."
25911,43001002,The same was in the beginning with God.,this one was in the beginning with God;,This Word was from the first in relation with ...,The same was in the beginning with God.,JHN,1:2,1,2,43.0,JHN 1:2,Οὗτος ἦν ἐν ἀρχῇ πρὸς τὸν Θεόν.
25912,43001003,All things were made through him; and without ...,"all things through him did happen, and without...","All things came into existence through him, an...",All things were made by him; and without him w...,JHN,1:3,1,3,43.0,JHN 1:3,"πάντα δι’ αὐτοῦ ἐγένετο, καὶ χωρὶς αὐτοῦ ..."
25913,43001004,In him was life; and the life was the light of...,"In him was life, and the life was the light of...","What came into existence in him was life, and ...",In him was life; and the life was the light of...,JHN,1:4,1,4,43.0,JHN 1:4,"ἐν αὐτῷ ζωὴ ἦν, καὶ ἡ ζωὴ ἦν τὸ φῶς τ..."
25914,43001005,And the light shineth in the darkness; and the...,"and the light in the darkness did shine, and t...",And the light goes on shining in the dark; it ...,And the light shineth in darkness; and the dar...,JHN,1:5,1,5,43.0,JHN 1:5,"καὶ τὸ φῶς ἐν τῇ σκοτίᾳ φαίνει, καὶ ἡ ..."
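
Boolean masks can be combined to narrow a slice further, for example to a verse range within one chapter and to a subset of columns. The sketch below builds a tiny stand-in dataframe with the same column names (the rows are abbreviated placeholders, not the real file) so that it runs on its own; with the real data, the same mask would be applied to `bibles_dataframe`.

```python
import pandas as pd

# Tiny stand-in for bibles_dataframe; only the columns used below are included
sample = pd.DataFrame({
    'book': ['GEN', 'JHN', 'JHN', 'JHN', 'JHN'],
    'chapter': [1, 1, 1, 1, 2],
    'verse': [2, 1, 2, 3, 1],
    'kjv': ['(GEN 1:2 text)', '(JHN 1:1 text)', '(JHN 1:2 text)',
            '(JHN 1:3 text)', '(JHN 2:1 text)'],
})

# Combine conditions with & (each comparison needs its own parentheses)
mask = (
    (sample['book'] == 'JHN')
    & (sample['chapter'] == 1)
    & sample['verse'].between(1, 2)  # inclusive on both ends
)

# .loc takes the row mask and a list of columns to keep
passage = sample.loc[mask, ['book', 'chapter', 'verse', 'kjv']]
print(passage)
```

Using `.loc` with an explicit column list keeps the slice small, which matters when the rows are later pasted into a prompt.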


## Prompt ChatGPT with biblical data using **LangChain**

First, set your `OPENAI_API_KEY` environment variable. This allows you to securely authenticate with the OpenAI API, without worrying about leaving the value in your code. You can find your API key in your [OpenAI dashboard](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key).
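
If the variable is already set in your shell, there is no need to type the key again. A minimal sketch of that check, falling back to an interactive prompt only when the variable is missing; the helper name `get_api_key` is hypothetical and not part of any library:

```python
import getpass
import os


def get_api_key(env_var='OPENAI_API_KEY'):
    """Return the key from the environment, prompting only if it is absent."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value so it does not end up in the notebook output
        key = getpass.getpass(f'Enter value for {env_var}: ')
        os.environ[env_var] = key
    return key
```

Calling `get_api_key()` once at the top of a session keeps the key out of the notebook file either way.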

In [12]:
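import getpass, os, openai
from langchain.chat_models import ChatOpenAI, AzureChatOpenAI

secret_key = getpass.getpass('Enter OpenAI secret key: ')
os.environ['OPENAI_API_KEY'] = secret_key  # read by LangChain
# The openai module reads the environment variable only at import time, so set the attribute too
openai.api_key = secret_key

MODEL = 'gpt-3.5-turbo'
CHAT_MODEL = ChatOpenAI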




In [24]:
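# Keys for api.openai.com start with 'sk-'; treat any other key as an Azure OpenAI key
# and switch to the Azure endpoint, deployment, and chat model class
if not secret_key.startswith('sk-'):
    print("Using Azure API endpoint")
    openai.api_type = "azure"
    os.environ['OPENAI_API_BASE'] = 'https://americasopenai.azure-api.net'
    openai.api_base = os.environ['OPENAI_API_BASE']  # direct openai calls read this attribute
    os.environ['OPENAI_API_VERSION'] = '2023-05-15'  # read by LangChain
    openai.api_version = "2023-03-15-preview"  # read by direct openai calls
    MODEL = 'gpt-35-turbo-16k'  # Azure deployment name
    CHAT_MODEL = AzureChatOpenAI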

Using Azure API endpoint


In [25]:
SYSTEM_MESSAGE = (
    "You are a PhD in theology and linguistics. "
    "You know all the nuances of Greek and Hebrew and the major languages of English, German, Chinese, Arabic, Swahili. "
    "You also speak over 2000 other languages and can learn new languages quickly. "
    "You specialize in translating the Bible and explaining the nuances of the words. "
    "You speak accurately; for anything you are unsure of, you add [??] beside it."
)

def chat(message, system_message=SYSTEM_MESSAGE):
    """Send one user message to the chat completion endpoint and return the reply text."""
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": message},
    ]
    # Azure addresses a deployment via `engine`; the OpenAI API takes `model`
    target = {"engine": MODEL} if openai.api_type == "azure" else {"model": MODEL}
    response = openai.ChatCompletion.create(
        messages=messages,
        temperature=0.15,
        max_tokens=8000,  # sized for the 16k deployment; lower it for smaller context windows
        top_p=0.95,
        stop=None,
        **target,
    )
    # Fall back to an empty string if the response carries no message content
    return response.get('choices', [{}])[0].get('message', {'content': ''}).get('content', '')

# Test with John 3:15 (the model may answer for John 3:16 instead, since that verse is more famous and this uses GPT-3.5 rather than GPT-4)
chat("Explore John 3:15 word by word using Strong's concordance, return your answer as a properly formatted json array of objects with one object per word")

'{\n  "words": [\n    {\n      "word": "ἵνα",\n      "strongs_number": "G2443",\n      "transliteration": "hina",\n      "definition": "in order that, so that",\n      "part_of_speech": "conjunction"\n    },\n    {\n      "word": "πᾶς",\n      "strongs_number": "G3956",\n      "transliteration": "pas",\n      "definition": "all, every",\n      "part_of_speech": "adjective"\n    },\n    {\n      "word": "ὁ",\n      "strongs_number": "G3588",\n      "transliteration": "ho",\n      "definition": "the",\n      "part_of_speech": "article"\n    },\n    {\n      "word": "πιστεύων",\n      "strongs_number": "G4100",\n      "transliteration": "pisteuōn",\n      "definition": "believing, having faith in",\n      "part_of_speech": "verb"\n    },\n    {\n      "word": "εἰς",\n      "strongs_number": "G1519",\n      "transliteration": "eis",\n      "definition": "into, to, for",\n      "part_of_speech": "preposition"\n    },\n    {\n      "word": "αὐτὸν",\n      "strongs_number": "G846",\n      "tr
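
The reply comes back as a plain string, and the one displayed above ends mid-string, so it helps to parse replies defensively rather than assume valid JSON. A minimal sketch using only the standard library; `parse_json_reply` is a hypothetical helper, and the sample strings are made up for illustration:

```python
import json


def parse_json_reply(reply):
    """Return the parsed JSON value, or None if the reply is not valid JSON."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        # Truncated or chatty replies land here; the caller can retry or log
        return None


good = '{"words": [{"word": "ἵνα", "strongs_number": "G2443"}]}'
cut_off = '{"words": [{"word": "ἵνα", "strongs_nu'

print(parse_json_reply(good)['words'][0]['strongs_number'])  # G2443
print(parse_json_reply(cut_off))                             # None
```

Returning `None` instead of raising lets a calling loop decide whether to retry with a smaller request or skip the verse.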

In [27]:
from langchain import PromptTemplate, LLMChain


# Select the first verse only for this example, but you can put this into a loop if you find something that works really well 😊 
# Cf. the LLMChain docs here: https://python.langchain.com/docs/modules/chains/foundational/llm_chain
verse_data = book_dataframe.iloc[0] 

# Building the prompt template
# Note that this template assumes two inputs, `english_text` and `greek_text`
prompt_template = """\
Given the following excerpt from the King James Version (kjv) of the Bible and its corresponding underlying Greek text (source_content), the verse in question is:

## English Text (kjv)
{english_text}

## Greek Text (source_content)
{greek_text}

## Instruction
Please generate 5 question-and-answer pairs that could be utilized for comprehension evaluation concerning the given text. These questions should demonstrate an understanding of the verse, its context, and its potential interpretations. The answer should be short (1-5 words). Format the output in JSON.

## Output
"""

# Creating a PromptTemplate instance
prompt = PromptTemplate.from_template(prompt_template)

# Configuring the LLMChain
# AzureChatOpenAI addresses the model by deployment name; ChatOpenAI takes a model name.
# Passing only `model` to AzureChatOpenAI leaves the deployment unset, and the request then
# fails with "The API deployment for this resource does not exist".
if CHAT_MODEL is AzureChatOpenAI:
    llm = AzureChatOpenAI(temperature=0, deployment_name=MODEL)
else:
    llm = ChatOpenAI(temperature=0, model=MODEL) # Or use 'gpt-4'
llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
)

# Forming the required content
english_text = verse_data['kjv']
greek_text = verse_data['source_content']

# Executing the LLMChain
response = llm_chain(inputs={'english_text': english_text, 'greek_text': greek_text})

In [29]:
# Pretty print response
for key, value in response.items():
    print(key, ":", value)

english_text : In the beginning was the Word, and the Word was with God, and the Word was God.
greek_text : Ἐν  ἀρχῇ  ἦν  ὁ  Λόγος, καὶ  ὁ  Λόγος  ἦν  πρὸς  τὸν  Θεόν, καὶ  Θεὸς  ἦν  ὁ  Λόγος.
text : {
  "questions": [
    {
      "question": "What was in the beginning?",
      "answer": "The Word"
    },
    {
      "question": "Who was the Word with?",
      "answer": "God"
    },
    {
      "question": "What was the Word?",
      "answer": "God"
    },
    {
      "question": "Who was with God in the beginning?",
      "answer": "The Word"
    },
    {
      "question": "What does the Greek word 'Λόγος' mean?",
      "answer": "Word"
    }
  ]
}
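
To run the same prompt over several verses, as the comment in the earlier cell suggests, one option is to loop over the rows and parse each `text` field. The sketch below swaps in a stub, `fake_chain`, for `llm_chain` so that it runs without an API key; the stub and the helper `collect_questions` are made up for illustration, and the rows are plain dicts of the kind `book_dataframe.to_dict('records')` would produce.

```python
import json


def fake_chain(inputs):
    """Stand-in for llm_chain: echoes the inputs and adds a JSON `text` field."""
    reply = {'questions': [{'question': 'Placeholder?', 'answer': 'Placeholder'}]}
    return {**inputs, 'text': json.dumps(reply)}


def collect_questions(rows, chain):
    """Call `chain` once per row and gather the parsed Q&A pairs by reference."""
    results = {}
    for row in rows:
        response = chain({'english_text': row['kjv'],
                          'greek_text': row['source_content']})
        try:
            parsed = json.loads(response['text'])
        except json.JSONDecodeError:
            continue  # skip replies that are not valid JSON
        results[row['book_chapter_verse']] = parsed.get('questions', [])
    return results


rows = [
    {'book_chapter_verse': 'JHN 1:1', 'kjv': '(KJV text)', 'source_content': '(Greek text)'},
    {'book_chapter_verse': 'JHN 1:2', 'kjv': '(KJV text)', 'source_content': '(Greek text)'},
]
qa = collect_questions(rows, fake_chain)
print(list(qa))  # ['JHN 1:1', 'JHN 1:2']
```

With the real chain, each call is a billed API request, so it is worth trying a handful of verses before looping over a whole book.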
