<a href="https://colab.research.google.com/github/faith-quant-lab/bible-scripting/blob/master/BibleDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FaithQuant Lab: Bible dataframe

A simple tool to download Bible text (in any of the many versions offered by the SWORD project) into a pandas DataFrame.

Credit: This module makes much use of https://github.com/wasdin/SWORD-to-JSON

## Your changes to this code
To tailor this notebook for your own use, the only change you need to make is to set the Bible version you want to use:

> VERSION_CODE = 'ESV2001'

This setting is in the second section, **"download Bible from SWORD project"**.



## prepare SWORD-2-JSON

In [0]:
! pip install pysword

Collecting pysword
  Downloading https://files.pythonhosted.org/packages/c0/aa/20d32448dd6a0018e5ed77c8115004fddc08e44b66e9629257639d382150/pysword-0.2.6.tar.gz
Building wheels for collected packages: pysword
  Building wheel for pysword (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/d5/a0/05/662298ce54f2723110779c160d4297419060fe97d623cb87e5
Successfully built pysword
Installing collected packages: pysword
Successfully installed pysword-0.2.6


In [0]:
! git clone https://github.com/wasdin/SWORD-to-JSON.git

Cloning into 'SWORD-to-JSON'...
remote: Enumerating objects: 20, done.
remote: Total 20 (delta 0), reused 0 (delta 0), pack-reused 20
Unpacking objects: 100% (20/20), done.


In [0]:
# Pretty-print version: add indent=4 to the script's json.dump call

! sed -e "s/outfile)/outfile, indent=4)/" SWORD-to-JSON/sword_to_json.py > SWORD-to-JSON/sword_to_json_pp.py
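You can see what the `sed` patch changes using the standard `json` module alone. The sketch below serializes a single verse record three ways. The record is shaped like the converter's output and uses the Chinese text of Genesis 1:1 from this notebook's sample output. It assumes the script writes with a plain `json.dump(..., outfile)` call, since that is the pattern `sed` targets.

The sketch also explains why the sample output contains `\uXXXX` escapes: `json.dump` escapes non-ASCII characters unless `ensure_ascii=False` is passed. The escaped form is still valid JSON and decodes to the same text, so patching in that flag as well is optional.

```python
import json

# Verse record shaped like the converter's output; the escapes spell the
# Chinese text of Genesis 1:1 shown in this notebook's sample output.
verse = {
    "verse": 1,
    "chapter": 1,
    "name": "Genesis 1:1",
    "text": "\u8d77\u521d\uff0c\u3000\u795e\u5275\u9020\u5929\u5730\u3002",
}

compact = json.dumps(verse)                                 # one line, non-ASCII escaped
pretty = json.dumps(verse, indent=4)                        # what the sed patch enables
readable = json.dumps(verse, indent=4, ensure_ascii=False)  # keeps the Chinese characters

print(compact)
print(pretty)
print(readable)

# All three strings decode back to the same object.
assert json.loads(compact) == json.loads(pretty) == json.loads(readable) == verse
```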

## download Bible from SWORD project

The full list of Bible modules offered by the SWORD project is at http://crosswire.org/sword/modules/ModDisp.jsp?modType=Bibles

Well-known English versions available there include:


*   ESV2001
*   KJV
*   ASV
*   ISV




In [0]:
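# VERSION_CODE is the "Name" column in the SWORD module index
VERSION_CODE = 'ESV2001'

#VERSION_CODE = 'ChiUn'  # Chinese Union Version in Unicode

# zip_name (rather than zipfile) avoids shadowing the standard-library zipfile module
zip_name = VERSION_CODE + '.zip'
url = "http://crosswire.org/ftpmirror/pub/sword/packages/rawzip/" + zip_name
dest_file = '/content/' + zip_name  # /content is Colab's working directory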



In [0]:
import os
import urllib.request

# Download the module zip only once; later runs reuse the local copy.
if not os.path.isfile(dest_file):
    urllib.request.urlretrieve(url, dest_file)
else:
    print(dest_file + " already exists")
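The cell above has one weakness. If the download is interrupted, a truncated file is left at `dest_file`, and the `os.path.isfile` check then skips the download on every later run.

Below is a minimal sketch of a sturdier variant that uses only the standard library:

1. Download to a temporary `.part` file in the same directory.
2. Check that the result is a zip archive with `zipfile.is_zipfile`.
3. Only then move it into place with `os.replace`.

The function name `download_module` is hypothetical; it is not part of pysword or SWORD-to-JSON.

```python
import os
import tempfile
import urllib.request
import zipfile


def download_module(url, dest_file):
    """Download url to dest_file unless a valid zip archive is already there.

    The download goes to a temporary .part file next to dest_file and is moved
    into place only after it completes and passes a zip check, so an
    interrupted download never looks like a finished one.
    """
    if os.path.isfile(dest_file) and zipfile.is_zipfile(dest_file):
        print(dest_file + " already exists")
        return dest_file

    dest_dir = os.path.dirname(os.path.abspath(dest_file))
    fd, tmp_path = tempfile.mkstemp(suffix=".part", dir=dest_dir)
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_path)
        if not zipfile.is_zipfile(tmp_path):
            raise ValueError("downloaded file is not a zip archive: " + url)
        os.replace(tmp_path, dest_file)  # rename within the same directory
    finally:
        # Clean up the partial file if anything above failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return dest_file
```

In the notebook it could be called as `download_module(url, dest_file)`. A file that exists but is not a valid zip is simply downloaded again.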

## Convert to JSON

In [0]:
! python SWORD-to-JSON/sword_to_json_pp.py --source_file '$dest_file' --bible_version '$VERSION_CODE' --output_file '$VERSION_CODE'.json

In [0]:
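# Note: the sample output below was produced with a Chinese version rather than ESV2001;
# the escaped "text" of Genesis 1:1 decodes to 起初，　神創造天地。
! head -17 '$VERSION_CODE'.json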

{
    "books": [
        {
            "name": "Genesis",
            "chapters": [
                {
                    "chapter": 1,
                    "name": "Genesis 1",
                    "verses": [
                        {
                            "verse": 1,
                            "chapter": 1,
                            "name": "Genesis 1:1",
                            "text": "\u8d77\u521d\uff0c\u3000\u795e\u5275\u9020\u5929\u5730\u3002"
                        },
                        {
                            "verse": 2,
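The JSON file can also be loaded into a DataFrame directly, without going back to pysword. Below is a minimal sketch that makes two assumptions:

*   The file uses the books → chapters → verses nesting and the field names visible in the sample above.
*   The books are listed in canonical order, so each book's position matches the book number used by `generate_DF` below.

`json_to_df` is a hypothetical helper name.

```python
import json

import pandas as pd


def json_to_df(data):
    """Flatten the converter's nested JSON (books -> chapters -> verses) into one row per verse.

    Assumes books appear in canonical order; their 1-based position becomes id_bookCt.
    """
    rows = []
    for book_ct, book in enumerate(data["books"], start=1):
        for chap in book["chapters"]:
            for v in chap["verses"]:
                rows.append({
                    "index": "%d:%d:%d" % (book_ct, v["chapter"], v["verse"]),
                    "id_book": book["name"],
                    "id_bookCt": book_ct,
                    "id_chapter": v["chapter"],
                    "id_verse": v["verse"],
                    "name_verse": v["name"],
                    "text": v["text"],
                })
    return pd.DataFrame(rows).set_index("index")


# Usage in the notebook (file name as written by the conversion cell):
# with open(VERSION_CODE + ".json", encoding="utf-8") as f:
#     bible_from_json = json_to_df(json.load(f))
```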


## Create SWORD-2-DF function


In [0]:
from pysword.modules import SwordModules
import pandas as pd


def generate_DF(source_file, bible_version):
    """Build a DataFrame with one row per verse of bible_version.

    The index is "<book number>:<chapter>:<verse>", where the book number is the
    book's position in the Old Testament + New Testament list (Genesis = 1).
    """
    modules = SwordModules(source_file)
    modules.parse_modules()  # read the module configuration from the zip
    bible = modules.get_bible_from_module(bible_version)

    # _books is a private pysword attribute holding the 'ot' and 'nt' book lists
    structure = bible.get_structure()
    books = structure._books['ot'] + structure._books['nt']

    verses = []
    # Python 3's range replaces the old past.builtins.xrange import
    for book_id, book in enumerate(books, start=1):
        for chapter in range(1, book.num_chapters + 1):
            # the number of indices returned for a chapter is its verse count
            for verse in range(1, len(book.get_indicies(chapter)) + 1):
                verses.append({
                    'index': '%d:%d:%d' % (book_id, chapter, verse),
                    'id_book': book.name,
                    'id_bookCt': book_id,
                    'id_chapter': chapter,
                    'id_verse': verse,
                    'name_verse': '%s %d:%d' % (book.name, chapter, verse),
                    # one lookup per verse: simple, but slow for a whole Bible
                    'text': bible.get(books=[book.name], chapters=[chapter], verses=[verse]),
                })

    return pd.DataFrame(verses).set_index('index')


# test code
bible_DF = generate_DF(dest_file, VERSION_CODE)
bible_DF.head()

| index | id_book | id_bookCt | id_chapter | id_verse | name_verse | text |
|---|---|---|---|---|---|---|
| 1:1:1 | Genesis | 1 | 1 | 1 | Genesis 1:1 | In the beginning, God created the heavens and ... |
| 1:1:2 | Genesis | 1 | 1 | 2 | Genesis 1:2 | The earth was without form and void, and darkn... |
| 1:1:3 | Genesis | 1 | 1 | 3 | Genesis 1:3 | And God said, Let there be light, and there wa... |
| 1:1:4 | Genesis | 1 | 1 | 4 | Genesis 1:4 | And God saw that the light was good. And God s... |
| 1:1:5 | Genesis | 1 | 1 | 5 | Genesis 1:5 | God called the light Day, and the darkness he ... |
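With the `book:chapter:verse` string index, fetching a single verse takes one `.loc` lookup. The `id_*` columns support ordinary pandas filtering and grouping.

The sketch below runs on a tiny stand-in DataFrame with the same columns as `bible_DF`; its text values are placeholders, not real verses. In the notebook, substitute `bible_DF` for `demo`.

The string index sorts lexicographically, so `1:10:1` comes before `1:2:1`. When canonical order matters, sort on the numeric `id_*` columns instead.

```python
import pandas as pd

# Tiny stand-in with the same layout as bible_DF (placeholder text).
demo = pd.DataFrame(
    {
        "index": ["1:1:1", "1:1:2", "1:2:1", "2:1:1"],
        "id_book": ["Genesis", "Genesis", "Genesis", "Exodus"],
        "id_bookCt": [1, 1, 1, 2],
        "id_chapter": [1, 1, 2, 1],
        "id_verse": [1, 2, 1, 1],
        "name_verse": ["Genesis 1:1", "Genesis 1:2", "Genesis 2:1", "Exodus 1:1"],
        "text": ["first light", "second", "third light", "fourth"],
    }
).set_index("index")

# 1. Look up a single verse by its book:chapter:verse key.
first_verse = demo.loc["1:1:1", "text"]

# 2. All verses of one chapter.
gen1 = demo[(demo["id_book"] == "Genesis") & (demo["id_chapter"] == 1)]

# 3. Verse counts per book, grouped in canonical (id_bookCt) order.
verses_per_book = demo.groupby(["id_bookCt", "id_book"]).size()

# 4. Case-insensitive keyword search across all text (plain substring, no regex).
hits = demo[demo["text"].str.contains("light", case=False, regex=False)]

# 5. Canonical order via the numeric columns rather than the string index.
ordered = demo.sort_values(["id_bookCt", "id_chapter", "id_verse"])

print(first_verse)
print(gen1["name_verse"].tolist())
print(verses_per_book)
print(hits.index.tolist())
```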


### Saving output

In [0]:
# pickle
import pickle
with open(VERSION_CODE+'_DF.pickle','wb') as outfile:
    pickle.dump(bible_DF, outfile)

# bible_DF.to_csv(VERSION_CODE+'.csv')
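pandas also has its own persistence methods, so the explicit file handles above are optional. Below is a minimal sketch on a stand-in DataFrame:

*   `to_pickle`/`read_pickle` restore the index and dtypes exactly.
*   The CSV route needs `index_col` on reload to turn the `index` column back into the index.

In the notebook, the equivalents would be `bible_DF.to_pickle(VERSION_CODE + '_DF.pickle')` and `pd.read_pickle(...)`. As with any pickle, only load files you created yourself or otherwise trust, since unpickling can execute code.

```python
import os
import tempfile

import pandas as pd

# Stand-in with the same index convention as bible_DF (placeholder text).
df = pd.DataFrame(
    {"index": ["1:1:1", "1:1:2"], "id_book": ["Genesis", "Genesis"],
     "id_verse": [1, 2], "text": ["first verse", "second verse"]}
).set_index("index")

out_dir = tempfile.mkdtemp()
pickle_path = os.path.join(out_dir, "DEMO_DF.pickle")
csv_path = os.path.join(out_dir, "DEMO.csv")

# Pickle round trip: index, dtypes and text come back unchanged.
df.to_pickle(pickle_path)
from_pickle = pd.read_pickle(pickle_path)

# CSV round trip: the index is written as the first column, so name it on reload.
df.to_csv(csv_path)
from_csv = pd.read_csv(csv_path, index_col="index")

print(from_pickle.equals(df), from_csv.equals(df))
```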

### Loading result from previous runs


In [0]:
import pickle
with open(VERSION_CODE+'_DF.pickle','rb') as infile:
    bible_DF = pickle.load(infile)

## Other versions in JSON

Several versions already in JSON, including those below, are collected at https://github.com/honza/bibles


*   ESV
*   MSG
*   NIV
*   NLT




# Future research directions

*   use parallel versions to explore translation patterns
*   compare how words are used in particular situations, using parallel versions to find parallel patterns
    *   i.e., we can be more confident in a relationship when similar usage appears in other languages
*   explore manual methods used on Faithlife's Logos platform, and identify case studies that can be automated here
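Work on parallel versions could start from a simple join. Two DataFrames built by `generate_DF` for different `VERSION_CODE`s share the `book:chapter:verse` index, so pandas can line them up verse by verse. Below is a minimal sketch with stand-in data.

It assumes both modules use the same versification, which is not guaranteed across SWORD modules. For that reason, verses missing from one version are kept and flagged rather than silently dropped. The helper name `parallel` is hypothetical.

```python
import pandas as pd


def parallel(df_a, df_b, label_a, label_b):
    """Align two generate_DF-style frames on their book:chapter:verse index.

    Returns (aligned, only_in_b): aligned keeps df_a's verse order, with NaN where
    df_b lacks a verse; only_in_b lists keys that exist only in df_b.
    """
    a = df_a[["name_verse", "text"]].rename(columns={"text": label_a})
    b = df_b["text"].rename(label_b)
    aligned = a.join(b, how="left")
    only_in_b = df_b.index.difference(df_a.index)
    return aligned, only_in_b


# Stand-in frames with the same layout as generate_DF output (placeholder text).
version_a = pd.DataFrame({"index": ["1:1:1", "1:1:2"],
                          "name_verse": ["Genesis 1:1", "Genesis 1:2"],
                          "text": ["A 1:1", "A 1:2"]}).set_index("index")
version_b = pd.DataFrame({"index": ["1:1:1", "1:1:3"],
                          "name_verse": ["Genesis 1:1", "Genesis 1:3"],
                          "text": ["B 1:1", "B 1:3"]}).set_index("index")

aligned, only_in_b = parallel(version_a, version_b, "ESV2001", "ChiUn")
print(aligned)
print(list(only_in_b))
```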

