# Bible Word Analyser

## How to use

Enter a Strong's number in the text box, then run all cells.

In [None]:
strongs_number = input("Enter a Strong's number: ")
print ("You have selected: " + strongs_number)

Enter a Strong's number: 334
You have selected: 334


# Reading Data

## Raw Dataset Info
* https://github.com/openscriptures/morphhb/tree/master - Hebrew Bible with Strong's numbers; the `lemma` attribute of each word holds the Strong's number. XML format, but can be converted to JSON.

* https://github.com/openscriptures/HebrewLexicon/tree/master - XML concordance; use this to see which Hebrew word a Strong's number represents.

* https://github.com/openscriptures/GreekResources/tree/master - JSON lexicon containing Strong's numbers for the Septuagint. Useful, but maybe not needed at this point.

* https://github.com/openscriptures/strongs/blob/master/greek/strongs-greek-dictionary.js - Strong's dictionary for Hebrew and, importantly, Greek.

* OpenGNT_BASE_TEXT.zip from https://github.com/eliranwong/OpenGNT/tree/master - stores the Greek NT and the Strong's number corresponding to each word.
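
To make the first bullet concrete, here is a minimal sketch of how a Strong's number can be read out of a `lemma` attribute. The XML fragment is hand-written and simplified for illustration (it is an assumption about the layout, not a copy of a repository file), and `SAMPLE`, `NS` and `strongs_in_sample` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written, simplified OSIS-style fragment (illustrative assumption only).
# Each <w> carries its Strong's number in `lemma`, here shown with a prefix
# ("b/") and a trailing letter ("a") to mimic decorated lemmas.
SAMPLE = """
<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">
  <verse osisID="Gen.1.1">
    <w lemma="b/7225">word1</w>
    <w lemma="1254 a">word2</w>
    <w lemma="430">word3</w>
  </verse>
</osis>
"""

NS = {"osis": "http://www.bibletechnologies.net/2003/OSIS/namespace"}

def strongs_in_sample(xml_text):
    """Return (osisID, digits-only lemma) pairs for every word in the fragment."""
    root = ET.fromstring(xml_text)
    pairs = []
    for verse in root.findall(".//osis:verse", NS):
        for w in verse.findall(".//osis:w", NS):
            # Keep only the digits of the lemma attribute
            digits = "".join(re.findall(r"\d+", w.attrib.get("lemma", "")))
            pairs.append((verse.attrib.get("osisID"), digits))
    return pairs

print(strongs_in_sample(SAMPLE))
```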

## Generate dictionary of Hebrew words and Strong's numbers

In [6]:
"""Download the files"""
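from urllib.request import urlretrieve
import os

# Create the output directory. Unlike os.popen("mkdir hebrew"), this finishes
# before the downloads start and does not fail if the directory already exists.
os.makedirs("hebrew", exist_ok=True)

# The names of each Hebrew book to download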
book_names = [
    "1Chr",
    "1Kgs",
    "1Sam",
    "2Chr",
    "2Kgs",
    "2Sam",
    "Amos",
    "Dan",
    "Deut",
    "Eccl",
    "Esth",
    "Exod",
    "Ezek",
    "Ezra",
    "Gen",
    "Hab",
    "Hag",
    "Hos",
    "Isa",
    "Jer",
    "Job",
    "Joel",
    "Jonah",
    "Josh",
    "Judg",
    "Lam",
    "Lev",
    "Mal",
    "Mic",
    "Nah",
    "Neh",
    "Num",
    "Obad",
    "Prov",
    "Ps",
    "Ruth",
    "Song",
    "Zech",
    "Zeph"
]

# Download each book
url = "https://raw.githubusercontent.com/openscriptures/morphhb/master/wlc/"
filename = "hebrew/"

for number, book in enumerate(book_names, start=1):
  print(f"Downloading book {number} of {len(book_names)}: {book}.xml")
  urlretrieve(url + book + ".xml", filename + book + ".xml")

print("\n\u001b[32mDownloaded all Hebrew books!\u001b[0m")


Downloading book 1 of 39: 1Chr.xml
Downloading book 2 of 39: 1Kgs.xml
Downloading book 3 of 39: 1Sam.xml
Downloading book 4 of 39: 2Chr.xml
Downloading book 5 of 39: 2Kgs.xml
Downloading book 6 of 39: 2Sam.xml
Downloading book 7 of 39: Amos.xml
Downloading book 8 of 39: Dan.xml
Downloading book 9 of 39: Deut.xml
Downloading book 10 of 39: Eccl.xml
Downloading book 11 of 39: Esth.xml
Downloading book 12 of 39: Exod.xml
Downloading book 13 of 39: Ezek.xml
Downloading book 14 of 39: Ezra.xml
Downloading book 15 of 39: Gen.xml
Downloading book 16 of 39: Hab.xml
Downloading book 17 of 39: Hag.xml
Downloading book 18 of 39: Hos.xml
Downloading book 19 of 39: Isa.xml
Downloading book 20 of 39: Jer.xml
Downloading book 21 of 39: Job.xml
Downloading book 22 of 39: Joel.xml
Downloading book 23 of 39: Jonah.xml
Downloading book 24 of 39: Josh.xml
Downloading book 25 of 39: Judg.xml
Downloading book 26 of 39: Lam.xml
Downloading book 27 of 39: Lev.xml
Downloading book 28 of 39: Mal.xml
Downloading

In [43]:
import re

def extract_numbers(lemma):
    """Strip a lemma down to the digits of its Strong's number."""
    # Find every run of digits, then join the runs into a single string
    numbers = re.findall(r'\d+', lemma)
    return ''.join(numbers)

def get_verse(verse_id):
    """Get the verse from a verse id in the format Obad.1.18"""
    return verse_id.split(".")[2]

def get_book(verse_id):
    """Get the book from a verse id in the format Obad.1.18"""
    return verse_id.split(".")[0]

def get_chapter(verse_id):
    """Get the chapter from a verse id in the format Obad.1.18"""
    return verse_id.split(".")[1]
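
As a quick sanity check, the helpers can be exercised on a few inputs. This is a minimal sketch: `extract_numbers` is redefined so the block runs on its own, `split_verse_id` is a hypothetical helper that combines the three getters, and the lemma strings are assumed examples rather than values taken from the dataset.

```python
import re

def extract_numbers(lemma):
    """Keep only the digits of a lemma string."""
    return "".join(re.findall(r"\d+", lemma))

def split_verse_id(verse_id):
    """Hypothetical helper: split 'Obad.1.18' into (book, chapter, verse)."""
    book, chapter, verse = verse_id.split(".")
    return book, chapter, verse

print(extract_numbers("b/7225"))    # prefix removed
print(extract_numbers("1254 a"))    # trailing letter removed
print(extract_numbers("12/34"))     # two digit runs are concatenated
print(split_verse_id("Obad.1.18"))
```

Because every digit run is joined, a lemma that contained two separate numbers would come out as one concatenated string, so it is worth checking the data for such cases before relying on the result.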

In [65]:
# Read all the book data into a dataframe containing: word, lemma (Strong's number), book, chapter, verse
import xml.etree.ElementTree as ET
import pandas as pd
import re

data = {
    "word": [],
    "lemma": [],
    "book": [],
    "chapter": [],
    "verse": [],
}

# Define the OSIS namespace once so the searches below can use the "osis:" prefix
namespace = {'osis': 'http://www.bibletechnologies.net/2003/OSIS/namespace'}

# Parse each book's XML file
for book in book_names:
  tree = ET.parse('hebrew/' + book + '.xml')
  root = tree.getroot()

  # Find all word elements within each verse
  for verse in root.findall('.//osis:verse', namespace):
      verse_id = verse.attrib.get('osisID')
      for word in verse.findall('.//osis:w', namespace):
          value = word.text
          lemma = word.attrib.get('lemma')
          data["word"].append(value)
          data["lemma"].append(extract_numbers(lemma))
          data["book"].append(get_book(verse_id))
          data["chapter"].append(get_chapter(verse_id))
          data["verse"].append(get_verse(verse_id))

hebrew_words = pd.DataFrame(data)

print ("\n\u001b[32mValues in each book extracted!\u001b[0m")


Values in each book extracted!

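With `hebrew_words` built, the Strong's number entered at the top can be used to pull out the matching rows. The sketch below assumes the column layout from the cell above and uses a tiny hand-made DataFrame in place of the real data; `sample_words` and `find_occurrences` are hypothetical names. The `lemma` column holds strings, so the lookup compares against a string.

```python
import pandas as pd

# Tiny stand-in for hebrew_words (made-up rows, same columns as above).
sample_words = pd.DataFrame({
    "word":    ["w1", "w2", "w3", "w4"],
    "lemma":   ["430", "7225", "430", "1254"],
    "book":    ["Gen", "Gen", "Exod", "Gen"],
    "chapter": ["1", "1", "3", "1"],
    "verse":   ["1", "1", "6", "1"],
})

def find_occurrences(words, strongs_number):
    """Return the rows whose lemma equals the given Strong's number."""
    # Normalise to a stripped string so both 430 and " 430 " match "430"
    return words[words["lemma"] == str(strongs_number).strip()]

matches = find_occurrences(sample_words, "430")
print(len(matches))                               # number of occurrences
print(matches["book"].value_counts().to_dict())   # occurrences per book
```

In the notebook itself the call would be something like `find_occurrences(hebrew_words, strongs_number)`, since `input()` already returns a string.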