# Bible Authorship
Authors: <a href="mailto:razmalkau@gmail.com">Raz Malka</a> and <a href="mailto:shoham39@gmail.com">Shoham Yamin</a>
under the supervision of <a href="mailto:vlvolkov@braude.ac.il">Prof. Zeev Volkovich</a> and <a href="mailto:r_avros@braude.ac.il">Dr. Renata Avros</a>.\
Source:<br> https://github.com/ShohamYamin/BibleAuthorship/

# 3. Word Embedding - ELMo

### 3.0 - General
This notebook requires HIT-SCIR's <mark>ELMoForManyLangs</mark>, which can be installed by running the following cell.\
It also requires their <a href="http://vectors.nlpl.eu/repository/11/154.zip">pre-trained Hebrew model</a>, which has to be extracted to 'models\pretrained\he_elmo_model' relative to the path of this notebook.

In [1]:
!git clone https://github.com/HIT-SCIR/ELMoForManyLangs.git
!pip install -e ELMoForManyLangs/

fatal: destination path 'ELMoForManyLangs' already exists and is not an empty directory.


Obtaining file:///C:/Users/Raz/jupyter_notebooks/AAiB/submittion/ELMoForManyLangs
Installing collected packages: elmoformanylangs
  Attempting uninstall: elmoformanylangs
    Found existing installation: elmoformanylangs 0.0.4.post2
    Uninstalling elmoformanylangs-0.0.4.post2:
      Successfully uninstalled elmoformanylangs-0.0.4.post2
  Running setup.py develop for elmoformanylangs
Successfully installed elmoformanylangs
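
Before continuing, it may be worth confirming that the pre-trained model was extracted where the notebook expects it. The following is a minimal sketch and not part of the original notebook: `missing_model_files` is a hypothetical helper, and the assumption that the model folder contains a `config.json` file should be checked against the extracted archive.

```python
import os

# Hypothetical helper; not part of ELMoForManyLangs or aaib_util.
def missing_model_files(model_dir, expected=("config.json",)):
    """Return the names in `expected` that are not files inside `model_dir`."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]

# Assumed layout: the folder named in the text above, relative to the notebook.
model_dir = os.path.join("models", "pretrained", "he_elmo_model")
missing = missing_model_files(model_dir)
if missing:
    print("Model folder is incomplete, missing:", missing)
else:
    print("Model folder contains the expected files.")
```

An empty result only means that the listed files exist; it does not validate their contents.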


Let us import the required modules for this notebook:

In [2]:
%load_ext autoreload
%autoreload 2

import aaib_util as util
import json
import os
import numpy as np
from ELMoForManyLangs.elmoformanylangs.elmo import Embedder

# Hide all CUDA devices so that the embedding runs on the CPU.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

### 3.1 - Loading Preprocessed Books
The data prepared in the previous task, <mark>Preprocessing and Dividing</mark>, must be loaded into memory before it can be embedded:

In [3]:
# Load each preprocessed book from its JSON file, in the order given by util.books.
collection = []
for book in util.books:
    with open(util.file_path + "json\\" + book + ".json", "r") as fp:
        collection.append(json.load(fp))
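
The embedding step further on passes each book to `sents2elmo`, which expects a list of sentences where every sentence is a list of tokens. A minimal sketch of a structural check follows, assuming the JSON files from the previous task store exactly such a list of token lists. `is_tokenized_sentences` is a hypothetical helper and not part of `aaib_util`.

```python
# Hypothetical helper; assumes each book is stored as a list of token lists.
def is_tokenized_sentences(book):
    """Return True if `book` is a non-empty list of non-empty lists of strings."""
    if not isinstance(book, list) or not book:
        return False
    return all(
        isinstance(sentence, list)
        and bool(sentence)
        and all(isinstance(token, str) for token in sentence)
        for sentence in book
    )

# Stand-in data for illustration; real books come from the JSON files.
example_book = [["token_a", "token_b"], ["token_c"]]
print(is_tokenized_sentences(example_book))  # True
```

With the real data, `all(is_tokenized_sentences(book) for book in collection)` would flag a malformed file before the lengthy embedding step starts.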

### 3.2 - Embedding with ELMoForManyLangs
Now that the data is loaded into memory, we can use <mark>ELMoForManyLangs</mark> to embed the words of the Hebrew texts.\
The model used here is 'he_elmo_model', but the same procedure applies to any other language for which HIT-SCIR's repository provides a model.

<b>Note:</b> Embedding the entire Hebrew Bible with ELMo may take some time and up to 2GB of storage.\
For instance, it takes roughly 12 minutes on a machine with 32GB RAM, a 9th Generation Intel i7 processor and an NVIDIA RTX 2060 GPU.
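
The storage figure can be sanity-checked with simple arithmetic: each token receives one vector, so the raw size is roughly tokens × dimensions × bytes per value. The sketch below assumes 1024-dimensional vectors stored as 32-bit floats, and the token count is a round illustrative number, not a measured count of the corpus.

```python
# Assumptions: 1024 values per token vector, 4 bytes (float32) per value.
def embedding_size_bytes(n_tokens, dim=1024, bytes_per_value=4):
    """Estimate the raw size in bytes of `n_tokens` vectors of `dim` values each."""
    return n_tokens * dim * bytes_per_value

# Illustrative only: 300_000 tokens is an assumed round figure.
estimate = embedding_size_bytes(300_000)
print("{:.2f} GB".format(estimate / 1e9))  # prints "1.23 GB"
```

Under these assumptions the estimate stays below the 2GB upper bound stated in the note.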

In [4]:
# Load the pre-trained Hebrew model from the folder it was extracted to.
e = Embedder('models\\pretrained\\he_elmo_model')

# Embed the books one at a time; sents2elmo returns one NumPy array per sentence.
embeddings = []
for i, book in enumerate(util.books):
    embeddings.append(e.sents2elmo(collection[i]))
    print("ELMo Word Embedding Completed for {} ({}/{})".format(book, i + 1, len(util.books)))

2021-06-15 03:10:40,118 INFO: char embedding size: 2289
2021-06-15 03:10:40,901 INFO: word embedding size: 189561
2021-06-15 03:10:43,633 INFO: Model(
  (token_embedder): ConvTokenEmbedder(
    (word_emb_layer): EmbeddingLayer(
      (embedding): Embedding(189561, 100, padding_idx=3)
    )
    (char_emb_layer): EmbeddingLayer(
      (embedding): Embedding(2289, 50, padding_idx=2286)
    )
    (convolutions): ModuleList(
      (0): Conv1d(50, 32, kernel_size=(1,), stride=(1,))
      (1): Conv1d(50, 32, kernel_size=(2,), stride=(1,))
      (2): Conv1d(50, 64, kernel_size=(3,), stride=(1,))
      (3): Conv1d(50, 128, kernel_size=(4,), stride=(1,))
      (4): Conv1d(50, 256, kernel_size=(5,), stride=(1,))
      (5): Conv1d(50, 512, kernel_size=(6,), stride=(1,))
      (6): Conv1d(50, 1024, kernel_size=(7,), stride=(1,))
    )
    (highways): Highway(
      (_layers): ModuleList(
        (0): Linear(in_features=2048, out_features=4096, bias=True)
        (1): Linear(in_features=2048, out_fe

ELMo Word Embedding Completed for Genesis (1/39)


2021-06-15 03:11:27,276 INFO: 3 batches, avg len: 130.0


ELMo Word Embedding Completed for Exodus (2/39)


2021-06-15 03:12:02,985 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Leviticus (3/39)


2021-06-15 03:12:28,051 INFO: 3 batches, avg len: 130.0


ELMo Word Embedding Completed for Numeri (4/39)


2021-06-15 03:13:02,737 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Deuteronomium (5/39)


2021-06-15 03:13:31,867 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Josua (6/39)


2021-06-15 03:13:54,957 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Judices (7/39)


2021-06-15 03:14:18,246 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Samuel_I (8/39)


2021-06-15 03:14:46,461 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Samuel_II (9/39)


2021-06-15 03:15:10,680 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Reges_I (10/39)


2021-06-15 03:15:38,189 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Reges_II (11/39)


2021-06-15 03:16:04,227 INFO: 3 batches, avg len: 130.0


ELMo Word Embedding Completed for Jesaia (12/39)


2021-06-15 03:16:41,808 INFO: 3 batches, avg len: 130.0


ELMo Word Embedding Completed for Jeremia (13/39)


2021-06-15 03:17:26,088 INFO: 3 batches, avg len: 130.0
2021-06-15 03:18:05,204 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Ezechiel (14/39)


2021-06-15 03:18:12,304 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Hosea (15/39)


2021-06-15 03:18:17,020 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Joel (16/39)


2021-06-15 03:18:24,197 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Amos (17/39)


2021-06-15 03:18:27,693 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Obadia (18/39)


2021-06-15 03:18:32,721 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Jona (19/39)


2021-06-15 03:18:38,804 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Micha (20/39)


2021-06-15 03:18:43,187 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Nahum (21/39)


2021-06-15 03:18:48,114 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Habakuk (22/39)


2021-06-15 03:18:52,744 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Zephania (23/39)


2021-06-15 03:18:57,377 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Haggai (24/39)


2021-06-15 03:19:06,155 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Sacharia (25/39)
ELMo Word Embedding Completed for Maleachi (26/39)


2021-06-15 03:19:12,368 INFO: 3 batches, avg len: 130.0


ELMo Word Embedding Completed for Psalmi (27/39)


2021-06-15 03:19:54,584 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Iob (28/39)


2021-06-15 03:20:16,209 INFO: 1 batches, avg len: 130.0
2021-06-15 03:20:31,309 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Proverbia (29/39)


2021-06-15 03:20:37,015 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Ruth (30/39)


2021-06-15 03:20:42,467 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Canticum (31/39)


2021-06-15 03:20:52,019 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Ecclesiastes (32/39)
ELMo Word Embedding Completed for Threni (33/39)


2021-06-15 03:20:59,921 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Esther (34/39)


2021-06-15 03:21:09,859 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Daniel (35/39)


2021-06-15 03:21:23,859 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Esra (36/39)


2021-06-15 03:21:35,770 INFO: 1 batches, avg len: 130.0


ELMo Word Embedding Completed for Nehemia (37/39)


2021-06-15 03:21:49,896 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Chronica_I (38/39)


2021-06-15 03:22:17,705 INFO: 2 batches, avg len: 130.0


ELMo Word Embedding Completed for Chronica_II (39/39)


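
Once the loop has finished, `embeddings` holds one list per book, and each list holds one array per sentence with one row per token. A minimal sketch of a shape summary follows. `summarize_book` is a hypothetical helper that relies only on `len()`, so it works for NumPy arrays as well as for the nested lists that stand in for real ELMo output here.

```python
# Hypothetical helper; works on any sequence of (tokens x dimensions) tables.
def summarize_book(book_embeddings):
    """Return (sentence count, total token count, vector dimension) for one book."""
    n_sentences = len(book_embeddings)
    n_tokens = sum(len(sentence) for sentence in book_embeddings)
    dims = {len(vector) for sentence in book_embeddings for vector in sentence}
    if len(dims) != 1:
        raise ValueError("expected one vector dimension, found {}".format(sorted(dims)))
    return n_sentences, n_tokens, dims.pop()

# Stand-in data: two "sentences" of 3 and 5 tokens with 4-dimensional vectors.
fake_book = [
    [[0.0] * 4 for _ in range(3)],
    [[0.0] * 4 for _ in range(5)],
]
print(summarize_book(fake_book))  # (2, 8, 4)
```

Running it over every entry of `embeddings` gives a quick overview of how many sentences and tokens were embedded per book.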

### 3.3 - Saving the Embeddings
Once the word embedding process has completed, the generated data should be saved locally for future use:

In [6]:
# Save each book's embeddings as a NumPy array in its own .npy file.
for book, book_embeddings in zip(util.books, embeddings):
    with open(util.file_path + "npy_elmo\\embedded\\" + book + ".npy", "wb") as fp:
        np.save(fp, np.array(book_embeddings))
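
The saved files can later be read back with `np.load`. The sketch below illustrates the round trip on small stand-in arrays. It deliberately uses `np.stack` instead of `np.array`, so that sentences of unequal length raise an error rather than yield an irregular array. The logs above report an average length of 130.0 for every batch, which suggests equal-length sentences, but that is an inference and not something the notebook states. `save_and_reload` is a hypothetical helper.

```python
import os
import tempfile

import numpy as np

# Hypothetical helper; np.stack is swapped in for the notebook's np.array.
def save_and_reload(book_embeddings, path):
    """Stack equal-shaped sentence arrays, save them with np.save, and load them back."""
    stacked = np.stack(book_embeddings)  # raises ValueError if the shapes differ
    with open(path, "wb") as fp:
        np.save(fp, stacked)
    return np.load(path)

# Stand-in data: 2 sentences x 4 tokens x 3 dimensions (real ELMo vectors are larger).
fake_book = [np.ones((4, 3), dtype=np.float32), np.zeros((4, 3), dtype=np.float32)]
with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded = save_and_reload(fake_book, os.path.join(tmp_dir, "fake_book.npy"))
print(reloaded.shape)  # (2, 4, 3)
```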