# Bible Authorship
Authors: <a href="mailto:razmalkau@gmail.com">Raz Malka</a> and <a href="mailto:shoham39@gmail.com">Shoham Yamin</a>
under the supervision of <a href="mailto:vlvolkov@braude.ac.il">Prof. Zeev Volkovich</a> and <a href="mailto:r_avros@braude.ac.il">Dr. Renata Avros</a>.\
Source:<br> https://github.com/ShohamYamin/BibleAuthorship/

# 1. Data Preparation

### 1.1 - Fetch BHSA - Hebrew Bible Database
The <mark><i>Text-Fabric</i></mark> module lets us fetch the <mark><i>BHSA Hebrew Bible Database</i></mark> and work with it programmatically.\
This open-source database by the Eep Talstra Centre for Bible and Computer is also used by <mark><i>Tiberias</i></mark>.\
\
Let us import the required modules for this notebook:

In [1]:
# !pip install text-fabric

%load_ext autoreload
%autoreload 2

import aaib_util as util
from tf.app import use
import pandas as pd
import os

<mark><i>aaib_util</i></mark> is our utility extension, which defines essential information used throughout this project, such as the list of books in the Hebrew Bible.\
We will now fetch the BHSA Hebrew Bible Database.

In [2]:
A = use('bhsa', hoist=globals())

### 1.2 - Export Books into .tsv Files
Our current objective is to export the text of every book into .tsv files. The whole database has already been fetched.\
Using appropriate queries, we separate it into books and sentences, and then export each book to its own file.

In [3]:
# Directory that holds the exported data (os.path.join keeps the path portable)
dir_path = os.path.join(os.getcwd(), 'data')
for book in util.books:
    # Search template: every sentence contained in the given book
    query = 'book book=' + book + '\n  sentence'
    results = A.search(query)

    print('Exporting .tsv file for', book)
    A.export(results, toDir=os.path.join(dir_path, 'tsv'), toFile=book + '.tsv')

  0.16s 4617 results
Exporting .tsv file for Genesis
  0.06s 3227 results
Exporting .tsv file for Exodus
  0.06s 2146 results
Exporting .tsv file for Leviticus
  0.03s 2752 results
Exporting .tsv file for Numeri
  0.06s 2354 results
Exporting .tsv file for Deuteronomium
  0.05s 1612 results
Exporting .tsv file for Josua
  0.06s 2249 results
Exporting .tsv file for Judices
  0.06s 3124 results
Exporting .tsv file for Samuel_I
  0.02s 2406 results
Exporting .tsv file for Samuel_II
  0.02s 2512 results
Exporting .tsv file for Reges_I
  0.06s 2530 results
Exporting .tsv file for Reges_II
  0.07s 4211 results
Exporting .tsv file for Jesaia
  0.07s 4432 results
Exporting .tsv file for Jeremia
  0.06s 4094 results
Exporting .tsv file for Ezechiel
  0.01s 677 results
Exporting .tsv file for Hosea
  0.01s 221 results
Exporting .tsv file for Joel
  0.01s 451 results
Exporting .tsv file for Amos
  0.01s 64 results
Exporting .tsv file for Obadia
  0.01s 175 results
Exporting .tsv file for Jona
  0
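
The query string in the export loop is a Text-Fabric search template, in which indentation expresses containment: a `sentence` line indented under a `book` line matches the sentences inside that book. A minimal sketch of building such a template per book is shown here. The helper name `book_sentence_template` and the three-book sample list are illustrative assumptions; the notebook itself takes its book names from `aaib_util`.

```python
def book_sentence_template(book: str) -> str:
    """Return a search template matching every sentence inside one book."""
    # Line 1 selects the book node by its `book` feature value.
    # Line 2 is indented by two spaces, which asks for sentences
    # contained in that book.
    return f"book book={book}\n  sentence"


# Hypothetical sample list, used only for illustration.
sample_books = ["Genesis", "Exodus", "Jona"]
templates = {book: book_sentence_template(book) for book in sample_books}

for book, template in templates.items():
    print(f"--- {book} ---")
    print(template)
```

Keeping the template construction in one small function makes it easy to swap `sentence` for another node type later without touching the export loop.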

### 1.3 - Generate Plain Texts
The exported files contain several data columns, some of which are not useful to us.\
Therefore, with the help of <mark><i>Pandas</i></mark>, we retain only the rightmost column, TEXT2, which holds the text of each sentence.\
Note that the exported .tsv files are encoded as <mark><i>UTF-16 LE</i></mark> rather than the more common UTF-8.

We then write each book out as plain text to simplify later tasks.

In [4]:
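for book in util.books:
    # Read the exported table; the files are encoded as UTF-16 LE
    tsv_path = os.path.join(dir_path, 'tsv', book + '.tsv')
    df = pd.read_table(tsv_path, sep='\t', encoding='utf_16_le')

    print('Generating Plain Text for', book)
    # Make sure the output directory exists before writing
    os.makedirs(os.path.join(dir_path, 'txt'), exist_ok=True)
    txt_path = os.path.join(dir_path, 'txt', book + '.txt')
    with open(txt_path, 'w', encoding='utf_16_le') as f:
        # Join the sentence texts (column TEXT2) with single spaces
        f.write(df['TEXT2'].str.cat(sep=' '))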

Generating Plain Text for Genesis
Generating Plain Text for Exodus
Generating Plain Text for Leviticus
Generating Plain Text for Numeri
Generating Plain Text for Deuteronomium
Generating Plain Text for Josua
Generating Plain Text for Judices
Generating Plain Text for Samuel_I
Generating Plain Text for Samuel_II
Generating Plain Text for Reges_I
Generating Plain Text for Reges_II
Generating Plain Text for Jesaia
Generating Plain Text for Jeremia
Generating Plain Text for Ezechiel
Generating Plain Text for Hosea
Generating Plain Text for Joel
Generating Plain Text for Amos
Generating Plain Text for Obadia
Generating Plain Text for Jona
Generating Plain Text for Micha
Generating Plain Text for Nahum
Generating Plain Text for Habakuk
Generating Plain Text for Zephania
Generating Plain Text for Haggai
Generating Plain Text for Sacharia
Generating Plain Text for Maleachi
Generating Plain Text for Psalmi
Generating Plain Text for Iob
Generating Plain Text for Proverbia
Generating Plain Text f
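
The encoding choice in this step can be checked with the standard library alone. The sketch below shows how Python's `utf_16_le` and `utf_16` codecs treat a byte-order mark (BOM) differently. Whether the exported files actually start with a BOM is not stated in this notebook, so this is a check worth running on one file rather than a description of the export; the two-line table content is a made-up placeholder.

```python
import codecs

# Tiny stand-in for an exported table (hypothetical content).
text = "R\tTEXT2\n1\tאב\n"

# 'utf_16_le' writes little-endian code units and no BOM.
raw_le = text.encode("utf_16_le")

# Prepending the BOM by hand mimics a file that was written with one.
with_bom = codecs.BOM_UTF16_LE + raw_le

# Decoding with 'utf_16_le' keeps a leading BOM as the character U+FEFF,
# so it would end up glued to the first column name.
decoded_le = with_bom.decode("utf_16_le")

# Decoding with 'utf_16' uses the BOM to detect byte order and drops it.
decoded_auto = with_bom.decode("utf_16")

print(repr(decoded_le[:2]))
print(repr(decoded_auto[:2]))
```

If the first column name of a loaded table turns out to start with U+FEFF, reading with `encoding='utf_16'` instead is one way to avoid it.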

### Extra - BHSA Structure
The Text-Fabric module extracts useful data from the BHSA Database in tabular form.\
Run the following cell to inspect the resulting structure; df still holds the last book processed in the previous step:

In [5]:
df

Unnamed: 0,R,S1,S2,S3,NODE1,TYPE1,book1,NODE2,TYPE2,TEXT2
0,1,2_Chronicles,1,1,426623,book,Chronica_II,1233773,sentence,וַיִּתְחַזֵּ֛ק שְׁלֹמֹ֥ה בֶן־דָּוִ֖יד עַל־מַלְ...
1,2,2_Chronicles,1,1,426623,book,Chronica_II,1233774,sentence,וַיהוָ֤ה אֱלֹהָיו֙ עִמֹּ֔ו
2,3,2_Chronicles,1,1,426623,book,Chronica_II,1233775,sentence,וַֽיְגַדְּלֵ֖הוּ לְמָֽעְלָה׃
3,4,2_Chronicles,1,2,426623,book,Chronica_II,1233776,sentence,וַיֹּ֣אמֶר שְׁלֹמֹ֣ה לְכָל־יִשְׂרָאֵ֡ל לְשָׂרֵ...
4,5,2_Chronicles,1,3,426623,book,Chronica_II,1233777,sentence,וַיֵּלְכ֗וּ שְׁלֹמֹה֙ וְכָל־הַקָּהָ֣ל עִמֹּ֔ו ...
...,...,...,...,...,...,...,...,...,...,...
2239,2240,2_Chronicles,36,23,426623,book,Chronica_II,1236012,sentence,כָּל־מַמְלְכֹ֤ות הָאָ֨רֶץ֙ נָ֣תַן לִ֗י יְהוָה֙...
2240,2241,2_Chronicles,36,23,426623,book,Chronica_II,1236013,sentence,וְהֽוּא־פָקַ֤ד עָלַי֙ לִבְנֹֽות־לֹ֣ו בַ֔יִת בּ...
2241,2242,2_Chronicles,36,23,426623,book,Chronica_II,1236014,sentence,מִֽי־בָכֶ֣ם מִכָּל־עַמֹּ֗ו
2242,2243,2_Chronicles,36,23,426623,book,Chronica_II,1236015,sentence,יְהוָ֧ה אֱלֹהָ֛יו עִמֹּ֖ו
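
Several rows in the table above share the same S2 and S3 values, which suggests that one verse can span more than one sentence. A minimal sketch of regrouping sentences into verses with Pandas follows. It assumes that S1, S2 and S3 identify book, chapter and verse, which the column contents suggest but this notebook does not state; the frame `df_demo` and its text values are placeholders, not real data.

```python
import pandas as pd

# Hypothetical miniature of the exported table; text values are placeholders.
df_demo = pd.DataFrame({
    "S1": ["2_Chronicles"] * 4,
    "S2": [1, 1, 1, 1],
    "S3": [1, 1, 1, 2],
    "TEXT2": ["sentence a", "sentence b", "sentence c", "sentence d"],
})

# Assumption: S1, S2 and S3 identify book, chapter and verse.
# sort=False keeps the groups in their order of first appearance.
verses = (
    df_demo.groupby(["S1", "S2", "S3"], sort=False)["TEXT2"]
    .agg(" ".join)   # join the sentences of one verse with single spaces
    .reset_index()
)

print(verses)
```

The same grouping could be applied to `df` itself once the meaning of the S columns has been confirmed against the Text-Fabric export documentation.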
