Goal of Notebook
-----

This notebook shows how to load a complete Bible version from this [github repo](https://github.com/scrollmapper/bible_databases) and extract the gospels from it.

Steps 
-------
    
1. Understand the arrangement of data. 

The data arrangement is explained in detail in the original repo's README. Please check it out [here](https://github.com/scrollmapper/bible_databases#verse-id-system).

2. Load a complete Bible in JSON format.

Here we use the American Standard Version (ASV) of the Bible. Check the GitHub repo for the other available versions.

3. Extract gospels

Here we extract the Gospel of Matthew from the whole data. The extracted data is saved to a JSON file.

In [1]:
import json 
import pandas as pd

In [2]:
def load_json(file_name):
    """
    Load a json file into a Python object
    """
    # Use a context manager so the file handle is always closed
    with open(file_name) as infile:
        return json.load(infile)

def list_to_json(input_list, file_name):
    """
    Dump an input list as a json file
    """
    with open(file_name, 'w') as outfile:
        json.dump(input_list, outfile)

In [3]:
# Read key_abbreviations_english.json to find the book ids of the gospels

bible_keys = pd.read_json("../bible_database/key_abbreviations_english.json")
bible_keys.head()

Unnamed: 0,a,b,id,p
0,Gen,1,1,1
1,Ge,1,2,0
2,Gn,1,3,0
3,Exo,2,4,1
4,Ex,2,5,0


The meaning of each column is explained [here](https://github.com/scrollmapper/bible_databases#key_abbreviations_english). To reduce confusion, I keep only the required columns and rename them more intuitively.

In [4]:
bible_keys.columns = ["Abbreviations", "bookID", "id", "p"]
bible_keys = bible_keys.loc[:, ["Abbreviations", "bookID"]]
bible_keys.head()

Unnamed: 0,Abbreviations,bookID
0,Gen,1
1,Ge,1
2,Gn,1
3,Exo,2
4,Ex,2


In [5]:
# get the book_ids of gospels

bible_keys[(bible_keys.bookID > 39) & (bible_keys.bookID < 44)]

Unnamed: 0,Abbreviations,bookID
170,Matt,40
171,Mt,40
172,Mrk,41
173,Mk,41
174,Mr,41
175,Luk,42
176,Lk,42
177,John,43
178,Jn,43
179,Jhn,43


We need the `bookID` values 40, 41, 42 and 43.

1. Matthew - 40
2. Mark - 41
3. Luke - 42
4. John - 43

In [6]:
asv_bible = load_json("../bible_database/t_asv.json")

In [7]:
list_of_verse = asv_bible["resultset"]["row"]

In [8]:
list_of_verse[0]

{'field': [1001001,
  1,
  1,
  1,
  'In the beginning God created the heavens and the earth.']}

In [9]:
list_of_verse[1]

{'field': [1001002,
  1,
  1,
  2,
  'And the earth was waste and void; and darkness was upon the face of the deep: and the Spirit of God moved upon the face of the waters.']}

In [38]:
def extract_verses(start_index, stop_index, list_of_verse):
    """
    Extract the verses with start_index <= verse ID < stop_index
    list_of_verse is a list of dictionaries whose "field" key holds
    [verse ID, book ID, chapter number, verse number, verse text]
    Output is a list of those "field" lists
    """
    res_list = []
    for verse in list_of_verse:
        # verse["field"][0] is the 8-digit verse ID
        if start_index <= verse["field"][0] < stop_index:
            res_list.append(verse["field"])
    return res_list

The Gospel of Matthew starts at verse ID 40001001, which breaks down as follows.

1. Book ID is 40
2. Chapter 1 - 001
3. Verse 1 - 001

The Gospel of Mark starts at 41001001, so that ID marks the (exclusive) end of the Gospel of Matthew.
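The ID arithmetic described here can be sketched in a few lines. This is only an illustration: `split_verse_id` and `book_id_range` are hypothetical helper names that are not part of the notebook or the original repo, and the sketch assumes every verse ID follows the book / 3-digit chapter / 3-digit verse layout shown in the examples.

```python
def split_verse_id(verse_id):
    """Split a verse ID into (book, chapter, verse), assuming BBCCCVVV layout."""
    book, rest = divmod(verse_id, 1_000_000)
    chapter, verse = divmod(rest, 1_000)
    return book, chapter, verse


def book_id_range(book_id):
    """Return (start, stop) verse IDs for one book; stop is exclusive."""
    start = book_id * 1_000_000 + 1_001       # chapter 1, verse 1 of this book
    stop = (book_id + 1) * 1_000_000 + 1_001  # chapter 1, verse 1 of the next book
    return start, stop


print(split_verse_id(40001001))  # (40, 1, 1)
print(book_id_range(40))         # (40001001, 41001001)
```

With helpers like these, the bounds passed to `extract_verses` for any gospel could be derived from its book ID instead of being typed by hand.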

In [39]:
asv_matthew = extract_verses(40001001, 41001001, list_of_verse)

In [40]:
asv_matthew[1]

[40001002,
 40,
 1,
 2,
 'Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judah and his brethren;']

In [41]:
asv_matthew[-1]

[40028020,
 40,
 28,
 20,
 'teaching them to observe all things whatsoever I commanded you: and lo, I am with you always, even unto the end of the world.']

The Gospel of Matthew (ASV) has 1071 verses. Let's cross-check this against our extracted data.

In [42]:
assert len(asv_matthew) == 1071, "The number of verses is wrong."
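
Beyond the total count, a per-chapter tally can help locate a problem if the check above ever fails. The following is a minimal sketch, not part of the original notebook: `verses_per_chapter` is a hypothetical helper, and `sample_rows` is made-up placeholder data in the same row layout as `asv_matthew`, not real verse counts.

```python
from collections import Counter


def verses_per_chapter(rows):
    """Count rows per chapter; each row is [verse_id, book, chapter, verse, text]."""
    return Counter(row[2] for row in rows)


# Made-up placeholder rows in the notebook's layout (text omitted)
sample_rows = [
    [40001001, 40, 1, 1, "..."],
    [40001002, 40, 1, 2, "..."],
    [40002001, 40, 2, 1, "..."],
]

counts = verses_per_chapter(sample_rows)
print(dict(counts))          # {1: 2, 2: 1}
print(sum(counts.values()))  # 3
```

Calling `verses_per_chapter(asv_matthew)` on the real data would give one entry per chapter, whose values should sum to the total checked by the assertion.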

In [17]:
!mkdir -p ../raw_gospel_data/asv

In [19]:
list_to_json(asv_matthew, "../raw_gospel_data/asv/matthew_asv.json")
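
As a sanity check, the saved file can be read back and compared with the in-memory list. The sketch below is self-contained and writes to a temporary directory instead of `../raw_gospel_data`; `rows` is a hypothetical one-row stand-in for `asv_matthew`.

```python
import json
import os
import tempfile

# Hypothetical stand-in for asv_matthew, using the same row layout
rows = [[40001001, 40, 1, 1, "sample text"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "matthew_sample.json")
    # Write the rows the same way list_to_json does
    with open(path, "w") as outfile:
        json.dump(rows, outfile)
    # Read them back the same way load_json does
    with open(path) as infile:
        reloaded = json.load(infile)

print(reloaded == rows)  # True
```

Lists of integers and strings survive a JSON round trip unchanged, so the reloaded data should compare equal to the original.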

You can check out the scripts folder to see this demo in action.