<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-and-Load-Data" data-toc-modified-id="Import-and-Load-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import and Load Data</a></span></li><li><span><a href="#What-Did-Jesus-Say?" data-toc-modified-id="What-Did-Jesus-Say?-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>What Did Jesus Say?</a></span><ul class="toc-item"><li><span><a href="#Basic-Processing" data-toc-modified-id="Basic-Processing-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Basic Processing</a></span></li><li><span><a href="#Proper-Nouns" data-toc-modified-id="Proper-Nouns-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Proper Nouns</a></span></li></ul></li></ul></div>

# WWJS

What Would Jesus Say?<br>
The Bible is a highly structured text, so only a small number of verbs are associated with Jesus. This means we can encode each type of response he gave as a simple integer, and perhaps train a classifier to predict what Jesus might say.
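
As a rough illustration of the integer encoding idea, the sketch below maps a handful of speech verbs to integer codes in order of first appearance. The verb list and the `encode_labels` helper are made up for this example and are not taken from the notebook's data or code.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in order of first appearance."""
    mapping = {}
    codes = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
        codes.append(mapping[label])
    return codes, mapping


# Hypothetical sample of verbs; not drawn from the real text.
verbs = ["saith", "answered", "said", "saith", "cried", "said"]
codes, verb_to_code = encode_labels(verbs)
```

On a DataFrame column, `pd.factorize` produces the same kind of first-appearance encoding without a hand-written helper.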

## Import and Load Data

In [3]:
import sys

import pandas as pd

# Make the project's data helpers importable
sys.path.append('../../data/')
from utils import load_book

In [5]:
# Load the four Gospels, tagging each row with its book name
books = {}
for book in ["Matthew", "Mark", "Luke", "John"]:
    books[book] = load_book(book, '../../data/new_testament/')
    books[book]["book"] = book

## What Did Jesus Say?

To get a grasp of our task, let's first look at what Jesus said.<br>
To do so, we first need to work out who said what in each sentence.<br><br>
We'll start by collecting every proper noun found in the first four books.<br>
Then we'll try to match each pronoun to a proper noun.
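
One naive way to approach the pronoun matching step is to pair each pronoun with the most recent proper noun seen before it. The sketch below is only a hypothetical heuristic to make the idea concrete; the `PRONOUNS` set, the `resolve_pronouns` helper, and the sample sentence are all assumptions, not the notebook's method.

```python
# Assumed, deliberately small pronoun list for the example.
PRONOUNS = {"he", "him", "his", "she", "her", "they", "them"}


def resolve_pronouns(tokens, proper_nouns):
    """Pair each pronoun with the most recent proper noun seen before it."""
    last_noun = None
    resolved = []
    for token in tokens:
        if token in proper_nouns:
            last_noun = token
        elif token.lower() in PRONOUNS:
            resolved.append((token, last_noun))
    return resolved


# Made-up sentence, already split into tokens.
tokens = ["Peter", "saw", "Jesus", "and", "said", "unto", "him"]
pairs = resolve_pronouns(tokens, {"Peter", "Jesus"})
```

This heuristic will often be wrong. In "Jesus saith unto him", for instance, it would pair "him" with Jesus, even though the pronoun refers to the listener.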

### Basic Processing

In [6]:
# Concatenate the 4 starting books
df = pd.concat(books.values()).reset_index(drop=True)

In [8]:
df

Unnamed: 0,verse,passage,text,book
0,1,1,"The book of the generation of Jesus Christ, th...",Matthew
1,1,2,Abraham begat Isaac; and Isaac begat Jacob; an...,Matthew
2,1,3,And Judas begat Phares and Zara of Thamar; and...,Matthew
3,1,4,And Aram begat Aminadab; and Aminadab begat Na...,Matthew
4,1,5,And Salmon begat Booz of Rachab; and Booz bega...,Matthew
...,...,...,...,...
3774,21,21,"Peter seeing him saith to Jesus, Lord, and wha...",John
3775,21,22,"Jesus saith unto him, If I will that he tarry ...",John
3776,21,23,Then went this saying abroad among the brethre...,John
3777,21,24,This is the disciple which testifieth of these...,John


### Proper Nouns

In [2]:
from nltk.tag import pos_tag

In [15]:
# Note: str.split() only splits on whitespace, so punctuation stays attached to tokens
df['pos_tagged'] = df.text.apply(lambda s: pos_tag(s.split()))

In [16]:
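def extract_tag(tagged_words, target_tag):
    """Return the words in a list of (word, tag) pairs whose tag equals target_tag."""
    return [word for word, tag in tagged_words if tag == target_tag]

# 'NNP' is the Penn Treebank tag for singular proper nouns
df['NNP'] = df.pos_tagged.apply(lambda l: extract_tag(l, 'NNP'))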

In [23]:
# Collect the distinct tokens tagged NNP across all rows
all_proper_nouns = set().union(*df.NNP)
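
Because the text is split on whitespace before tagging, the collected set holds several variants of the same name that differ only in attached punctuation (for example `'Abraham'`, `'Abraham,'`, and `'Abraham:'`). A minimal sketch of one possible cleanup step follows; the `clean_token` helper and the small `raw` sample are assumptions for illustration.

```python
import string


def clean_token(token):
    """Strip leading and trailing punctuation from a whitespace-split token."""
    return token.strip(string.punctuation)


# Small sample shaped like the tokens in the set; not the full output.
raw = {"Abraham", "Abraham,", "Abraham.", "Abraham:", "Abraham's", "(as", "Abia;"}
cleaned = {clean_token(t) for t in raw}
cleaned.discard("")  # drop tokens that were punctuation only
```

This only merges punctuation variants. Words such as `as`, or capitalized sentence openers that the tagger mislabels as proper nouns, would still need separate filtering. Tokenizing with `nltk.word_tokenize` instead of `str.split()` is another option, since it separates punctuation into its own tokens.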

In [24]:
all_proper_nouns

{'(as',
 '(for',
 '(he',
 '(let',
 '(which',
 '(whoso',
 'A',
 'Aaron,',
 'Abba,',
 'Abel',
 'Abia',
 'Abia:',
 'Abia;',
 'Abiathar',
 'Abide',
 'Abilene,',
 'Abiud',
 'Abiud;',
 'Abraham',
 "Abraham's",
 'Abraham,',
 'Abraham.',
 'Abraham:',
 'Abraham?',
 'Achaz',
 'Achaz;',
 'Achim',
 'Achim;',
 'Adam,',
 'Addi,',
 'Aenon',
 'Afterward',
 'Again',
 'Again,',
 'Agree',
 'Ah,',
 'Alexander',
 'All',
 'Alphaeus',
 'Alphaeus,',
 'Am',
 'Amen.',
 'Aminadab',
 'Aminadab,',
 'Aminadab;',
 'Amon',
 'Amon;',
 'Amos,',
 'And,',
 'Andrew',
 'Andrew,',
 'Andrew:',
 'Anna,',
 'Annas',
 'Answerest',
 'Aram',
 'Aram,',
 'Aram;',
 'Archelaus',
 'Are',
 'Arimathaea,',
 'Arise,',
 'Arise.',
 'Arphaxad,',
 'Art',
 'Asa',
 'Asa;',
 'Aser:',
 'Ask',
 'Ask,',
 'Augustus',
 'Avenge',
 'Away',
 'Azor',
 'Azor;',
 'Babylon',
 'Babylon,',
 'Babylon:',
 'Baptist',
 "Baptist's",
 'Baptist,',
 'Baptist.',
 'Baptist:',
 'Baptist;',
 'Barabbas',
 'Barabbas,',
 'Barabbas.',
 'Barabbas:',
 'Barachias,',
 'Barjona:',