# Build `bibles.csv` data

In this notebook I download five English Bible translations from [scrollmapper/bible_databases](https://github.com/scrollmapper/bible_databases) and combine four of them into a single CSV file. (For simplicity's sake I wound up dropping one, `web`, because it has an off-by-one mismatch at some point.) The resulting file has the following columns:

| Field              | Description |
|--------------------|-------------|
| id                 | Numeric verse ID: book number followed by 3-digit chapter and 3-digit verse (e.g., 1001001 for 'GEN 1:1') |
| **asv**            | Content from American Standard Version |
| **ylt**            | Content from Young's Literal Translation |
| **bbe**            | Content from Bible in Basic English |
| **kjv**            | Content from King James Version |
| book               | Abbreviation of book name ('GEN', 'EXO', etc.) |
| chapter:verse      | Chapter and verse number joined by a colon (e.g., '1:1') |
| chapter            | Chapter number |
| verse              | Verse number |
| book_id            | Numeric value for book name ('1' for 'GEN', '66' for 'REV', etc.) |
| book_chapter_verse | USFM format reference (e.g., 'GEN 1:1', 'REV 22:21', etc.) |
| source_content     | Source text content associated with verse; Hebrew (OT) or Greek (NT) |
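
The `id` values pack book, chapter, and verse into one integer, as the example rows further down show (`1001001` is book 1, chapter 1, verse 1). A minimal sketch of decoding such an ID, assuming the chapter and verse parts are always three digits; `split_verse_id` is a hypothetical helper, not part of the notebook:

```python
def split_verse_id(verse_id: int) -> tuple[int, int, int]:
    """Split an ID such as 1001001 into (book_id, chapter, verse)."""
    # The last six digits hold chapter (3 digits) and verse (3 digits).
    book_id, remainder = divmod(verse_id, 1_000_000)
    chapter, verse = divmod(remainder, 1_000)
    return book_id, chapter, verse


print(split_verse_id(1001001))   # (1, 1, 1)
print(split_verse_id(66022021))  # (66, 22, 21)
```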


In [1]:
!wget https://raw.githubusercontent.com/scrollmapper/bible_databases/master/csv/t_bbe.csv
!wget https://raw.githubusercontent.com/scrollmapper/bible_databases/master/csv/t_asv.csv 
!wget https://raw.githubusercontent.com/scrollmapper/bible_databases/master/csv/t_kjv.csv
!wget https://raw.githubusercontent.com/scrollmapper/bible_databases/master/csv/t_web.csv
!wget https://raw.githubusercontent.com/scrollmapper/bible_databases/master/csv/t_ylt.csv

"""
Example first lines

id,b,c,v,t
1001001,1,1,1,In the beginning God created the heaven and the earth.
1001002,1,1,2,"And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters."
1001003,1,1,3,"And God said, Let there be light: and there was light."
"""



--2023-08-15 12:11:11--  https://raw.githubusercontent.com/scrollmapper/bible_databases/master/csv/t_bbe.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4783775 (4.6M) [text/plain]
Saving to: ‘t_bbe.csv’


2023-08-15 12:11:13 (29.8 MB/s) - ‘t_bbe.csv’ saved [4783775/4783775]

--2023-08-15 12:11:13--  https://raw.githubusercontent.com/scrollmapper/bible_databases/master/csv/t_asv.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4766460 (4.5M) [text/plain]
Saving to: ‘t_asv.csv’


2023-08-15 12


In [2]:
import os
import pandas as pd

file_names = ['t_bbe.csv', 't_asv.csv', 't_kjv.csv', 't_web.csv', 't_ylt.csv']

path = '.' # FIXME: I moved the files to the 'data/english' folder after the fact
files = [f for f in os.listdir(path) if f in file_names]
for file in files:
    with open(os.path.join(path, file), 'r') as f:
        print(len(f.readlines()))

31104
31104
31103
31104
31104
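
Because `os.listdir` returns names in arbitrary order, the bare counts above do not show which file is the short one. A minimal sketch that reports each count next to its file name; `count_lines` is a hypothetical helper, and the files are assumed to be UTF-8 text in the current directory:

```python
from pathlib import Path


def count_lines(paths):
    """Return {file name: number of lines} for every path that exists."""
    counts = {}
    for p in map(Path, paths):
        if p.exists():
            with p.open(encoding="utf-8") as f:
                counts[p.name] = sum(1 for _ in f)
    return counts


file_names = ['t_bbe.csv', 't_asv.csv', 't_kjv.csv', 't_web.csv', 't_ylt.csv']
for name, n in count_lines(file_names).items():
    print(f"{name}: {n}")
```

Iterating over `file_names` rather than the directory listing also makes the order of the report reproducible.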


In [22]:
import os
import pandas as pd

path = '.'
files = [f for f in os.listdir(path) if f in file_names]

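# Start from an empty DataFrame; one column per translation is added below
combined_df = pd.DataFrame()

# Assumption: `num_lines` was defined in a cell that is no longer part of this notebook.
# nrows=None tells read_csv to read every row.
num_lines = None

# NOTE: with `names` given, header=1 treats the second line of each file as the header row,
# so the first verse (id 1001001) is skipped; header=0 would keep it.
ids = pd.read_csv(os.path.join(path, 't_kjv.csv'), sep=',', header=1, usecols=[0], names=['id'])
# Read each file and add its text as a new column (rows are aligned by position, not by id)
for file in files:
    col_name = file.replace('t_', '').replace('.csv', '')
    df = pd.read_csv(os.path.join(path, file), sep=',', header=1, usecols=[0, 4], names=['id', col_name], nrows=num_lines)
    combined_df[col_name] = df[col_name]

# Drop the 'web' translation (the one excluded in the introduction)
combined_df.drop(['web'], inplace=True, axis=1)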

complete_df = pd.concat([ids, combined_df], axis=1)

complete_df.tail()

Unnamed: 0,id,asv,ylt,bbe,kjv
31097,66022017,"And the Spirit and the bride say, Come. And he...","And the Spirit and the Bride say, Come; and he...","And the Spirit and the bride say, Come. And le...","And the Spirit and the bride say, Come. And le..."
31098,66022018,I testify unto every man that heareth the word...,`For I testify to every one hearing the words ...,For I say to every man to whose ears have come...,For I testify unto every man that heareth the ...
31099,66022019,and if any man shall take away from the words ...,and if any one may take away from the words of...,And if any man takes away from the words of th...,And if any man shall take away from the words ...
31100,66022020,"He who testifieth these things saith, Yea: I c...",he saith -- who is testifying these things -- ...,"He who gives witness to these things says, Tru...","He which testifieth these things saith, Surely..."
31101,66022021,The grace of the Lord Jesus be with the saints...,The grace of our Lord Jesus Christ `is' with y...,The grace of the Lord Jesus be with the saints...,The grace of our Lord Jesus Christ be with you...


In [35]:
complete_df.head()

Unnamed: 0,id,asv,ylt,bbe,kjv
0,1001002,And the earth was waste and void; and darkness...,"the earth hath existed waste and void, and dar...",And the earth was waste and without form; and ...,"And the earth was without form, and void; and ..."
1,1001003,"And God said, Let there be light: and there wa...","and God saith, `Let light be;' and light is.","And God said, Let there be light: and there wa...","And God said, Let there be light: and there wa..."
2,1001004,"And God saw the light, that it was good: and G...","And God seeth the light that `it is' good, and...","And God, looking on the light, saw that it was...","And God saw the light, that it was good: and G..."
3,1001005,"And God called the light Day, and the darkness...","and God calleth to the light `Day,' and to the...","Naming the light, Day, and the dark, Night. An...","And God called the light Day, and the darkness..."
4,1001006,"And God said, Let there be a firmament in the ...","And God saith, `Let an expanse be in the midst...","And God said, Let there be a solid arch stretc...","And God said, Let there be a firmament in the ..."
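
The first row shown above has id `1001002` rather than `1001001`, which is what `header=1` combined with `names` produces: the second line of the file is consumed as the header. A sketch of one alternative, assuming the `id,b,c,v,t` layout shown earlier: read with `header=0` and join the translations on `id` instead of on row position. The two in-memory "files" and the helper `read_translation` are hypothetical stand-ins for illustration only.

```python
import io

import pandas as pd

# Tiny stand-ins for two downloaded files (made-up verse text).
kjv_csv = io.StringIO("id,b,c,v,t\n1001001,1,1,1,kjv one\n1001002,1,1,2,kjv two\n")
asv_csv = io.StringIO("id,b,c,v,t\n1001001,1,1,1,asv one\n1001002,1,1,2,asv two\n")


def read_translation(source, col_name):
    # header=0 takes the column names from the first line, so no verse is lost.
    df = pd.read_csv(source, header=0, usecols=['id', 't'])
    return df.rename(columns={'t': col_name}).set_index('id')


frames = [read_translation(kjv_csv, 'kjv'), read_translation(asv_csv, 'asv')]

# Join on the shared 'id' index so a missing verse cannot shift later rows.
aligned = frames[0]
for frame in frames[1:]:
    aligned = aligned.join(frame, how='inner')
aligned = aligned.reset_index()

print(aligned)
```

Joining on `id` means a translation with one missing or extra row only affects that verse, instead of misaligning every row after it.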


In [39]:
vref_df = pd.read_csv('vref.txt', header=None)
# split the only column into two columns by space
vref_df = vref_df[0].str.split(' ', expand=True)
vref_df.columns = ['book', 'chapter:verse']
# Split 'chapter:verse' on the colon into separate 'chapter' and 'verse' columns
vref_df[['chapter', 'verse']] = vref_df['chapter:verse'].str.split(':', expand=True)


# Walk the rows in order and assign a 1-indexed 'book_id' that increases each time the 'book' value changes
book_id = 1
for index, row in vref_df.iterrows():
    if index == 0:
        vref_df.loc[index, 'book_id'] = book_id
        continue
    if row['book'] != vref_df.loc[index-1, 'book']:
        book_id += 1
    vref_df.loc[index, 'book_id'] = book_id

for index, row in vref_df.iterrows():
    # Build a verse_id such as '1001001' for 'GEN 1:1': the book_id followed by
    # the chapter and verse, each zero-padded to 3 digits
    vref_df.loc[index, 'verse_id'] = str(int(row['book_id'])) + str(int(row['chapter'])).zfill(3) + str(int(row['verse'])).zfill(3)

vref_df.head()

Unnamed: 0,book,chapter:verse,chapter,verse,book_id,verse_id
0,GEN,1:1,1,1,1.0,1001001
1,GEN,1:2,1,2,1.0,1001002
2,GEN,1:3,1,3,1.0,1001003
3,GEN,1:4,1,4,1.0,1001004
4,GEN,1:5,1,5,1.0,1001005
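
The two `iterrows` loops above can be slow on tens of thousands of rows. A vectorized sketch of the same idea, assuming references look like `'GEN 1:1'` with purely numeric chapter and verse parts; the four sample references are made up for illustration:

```python
import pandas as pd

# Hypothetical sample of lines in the same 'BOOK chapter:verse' shape as vref.txt.
refs = pd.Series(['GEN 1:1', 'GEN 1:2', 'EXO 1:1', 'EXO 12:3'])

df = refs.str.split(' ', expand=True)
df.columns = ['book', 'chapter:verse']
df[['chapter', 'verse']] = df['chapter:verse'].str.split(':', expand=True).astype(int)

# A new book starts wherever the abbreviation differs from the previous row;
# the running total of those change points is a 1-indexed book number.
df['book_id'] = (df['book'] != df['book'].shift()).cumsum()

# Same layout as the string version: book_id, then 3-digit chapter, then 3-digit verse.
df['verse_id'] = df['book_id'] * 1_000_000 + df['chapter'] * 1_000 + df['verse']

print(df)
```

Unlike the loop version, this keeps `book_id` as an integer column and produces `verse_id` as an integer directly, so no later dtype conversion is needed before merging on it.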


In [29]:
vref_df.tail()

Unnamed: 0,book,chapter:verse,book_id
41894,ENO,42:12,89.0
41895,ENO,42:13,89.0
41896,ENO,42:14,89.0
41897,ENO,42:15,89.0
41898,ENO,42:16,89.0


In [31]:
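# Keep only the 66 books of the Protestant canon (book_id 1 through 66).
# Note: the merge below uses vref_df directly; its inner join on the verse id
# leaves out the extra books as well, since their ids have no match in complete_df.
protestant_vref_df = vref_df[vref_df['book_id'] < 67]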


protestant_vref_df.tail()

Unnamed: 0,book,chapter:verse,book_id
31165,REV,22:17,66.0
31166,REV,22:18,66.0
31167,REV,22:19,66.0
31168,REV,22:20,66.0
31169,REV,22:21,66.0


In [40]:
# Convert the 'verse_id' column in vref_df to the same data type as the 'id' column in complete_df
vref_df['verse_id'] = vref_df['verse_id'].astype(complete_df['id'].dtype)

# Merge the DataFrames on the respective columns
merged_df = pd.merge(complete_df, vref_df, left_on='id', right_on='verse_id', how='inner')

# Drop 'verse_id', which duplicates 'id' after the merge
merged_df.drop('verse_id', axis=1, inplace=True)

merged_df.head()


Unnamed: 0,id,asv,ylt,bbe,kjv,book,chapter:verse,chapter,verse,book_id
0,1001002,And the earth was waste and void; and darkness...,"the earth hath existed waste and void, and dar...",And the earth was waste and without form; and ...,"And the earth was without form, and void; and ...",GEN,1:2,1,2,1.0
1,1001003,"And God said, Let there be light: and there wa...","and God saith, `Let light be;' and light is.","And God said, Let there be light: and there wa...","And God said, Let there be light: and there wa...",GEN,1:3,1,3,1.0
2,1001004,"And God saw the light, that it was good: and G...","And God seeth the light that `it is' good, and...","And God, looking on the light, saw that it was...","And God saw the light, that it was good: and G...",GEN,1:4,1,4,1.0
3,1001005,"And God called the light Day, and the darkness...","and God calleth to the light `Day,' and to the...","Naming the light, Day, and the dark, Night. An...","And God called the light Day, and the darkness...",GEN,1:5,1,5,1.0
4,1001006,"And God said, Let there be a firmament in the ...","And God saith, `Let an expanse be in the midst...","And God said, Let there be a solid arch stretc...","And God said, Let there be a firmament in the ...",GEN,1:6,1,6,1.0
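
An inner merge silently discards any verse whose ID appears on only one side. A minimal sketch of checking for that with the `indicator` option of `pd.merge`; the two small frames are made-up stand-ins for `complete_df` and `vref_df`:

```python
import pandas as pd

# Hypothetical miniature versions of the two frames being merged.
left = pd.DataFrame({'id': [1001001, 1001002, 1001003], 'kjv': ['a', 'b', 'c']})
right = pd.DataFrame({'verse_id': [1001001, 1001002, 1001004], 'book': ['GEN'] * 3})

# how='outer' keeps every row; the '_merge' column says which side each row came from.
check = pd.merge(left, right, left_on='id', right_on='verse_id',
                 how='outer', indicator=True)

print(check['_merge'].value_counts())

# Rows present on only one side are the ones an inner merge would drop.
unmatched = check[check['_merge'] != 'both']
print(unmatched)
```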


In [43]:
# Read the combined_greek_hebrew_vref.csv file into a DataFrame
greek_hebrew_df = pd.read_csv('data/combined_greek_hebrew_vref.csv')

# Rename the 'content' column to 'source_content' for clarity
greek_hebrew_df.rename(columns={'content': 'source_content'}, inplace=True)

# Make a book chapter verse column
merged_df['book_chapter_verse'] = merged_df['book'] + ' ' + merged_df['chapter:verse']

# Merge the DataFrames on the 'book_chapter_verse' and 'vref' columns;
# how='left' keeps every verse even when no source text is found for it
final_df = pd.merge(merged_df, greek_hebrew_df, left_on='book_chapter_verse', right_on='vref', how='left')

# Drop the 'vref' column, which is now redundant with 'book_chapter_verse'
final_df.drop('vref', axis=1, inplace=True)

final_df.head()


Unnamed: 0,id,asv,ylt,bbe,kjv,book,chapter:verse,chapter,verse,book_id,book_chapter_verse,source_content
0,1001002,And the earth was waste and void; and darkness...,"the earth hath existed waste and void, and dar...",And the earth was waste and without form; and ...,"And the earth was without form, and void; and ...",GEN,1:2,1,2,1.0,GEN 1:2,וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָבֹ֔הוּ וְחֹ֖...
1,1001003,"And God said, Let there be light: and there wa...","and God saith, `Let light be;' and light is.","And God said, Let there be light: and there wa...","And God said, Let there be light: and there wa...",GEN,1:3,1,3,1.0,GEN 1:3,וַיֹּ֥אמֶר אֱלֹהִ֖ים יְהִ֣י א֑וֹר וַֽיְהִי...
2,1001004,"And God saw the light, that it was good: and G...","And God seeth the light that `it is' good, and...","And God, looking on the light, saw that it was...","And God saw the light, that it was good: and G...",GEN,1:4,1,4,1.0,GEN 1:4,וַיַּ֧רְא אֱלֹהִ֛ים אֶת־ הָא֖וֹר כִּי־ ט֑וֹ...
3,1001005,"And God called the light Day, and the darkness...","and God calleth to the light `Day,' and to the...","Naming the light, Day, and the dark, Night. An...","And God called the light Day, and the darkness...",GEN,1:5,1,5,1.0,GEN 1:5,וַיִּקְרָ֨א אֱלֹהִ֤ים׀ לָאוֹר֙ י֔וֹם וְלַחֹ...
4,1001006,"And God said, Let there be a firmament in the ...","And God saith, `Let an expanse be in the midst...","And God said, Let there be a solid arch stretc...","And God said, Let there be a firmament in the ...",GEN,1:6,1,6,1.0,GEN 1:6,וַיֹּ֣אמֶר אֱלֹהִ֔ים יְהִ֥י רָקִ֖יעַ בְּת֣...


In [44]:
# export csv
final_df.to_csv('bibles.csv', index=False)