# Introduction to ```Jupyter Notebooks```
## What is a notebook?
Jupyter Notebooks are the ideal environment for doing python (its a programming language) based research in both the sciences and humanities.

Each notebook conissts of two types of cell blocks:
1. Mardown cells 
2. Code cells

The markdown cells are used to describe what one is doing in the code cells.

The code cells are used in order to write code and execute it.
The next cell is a code cell with a very simple code:

In [1]:
x=5
y=3

print(x, "times", y, "is", x*y)

5 times 3 is 15


This cell block is a markdown cell. You can double click with your mouse on this cell and you will see the Markdown codes used to write this cell block. All important Markdown commands can be found in this handy Markdown Cheat Sheet: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet.

## Installing the necessary environment to run jupyter notebooks
1. Go to https://www.anaconda.com/distribution/ and download the Phython 3.7 version for your platform (available for Linux, MacOS, Windows).

2. After Anaconda has been installed start the ```Anaconda prompt``` terminal.

3. Once the terminal is available you want to install the TextFabric environment that holds all the biblical data of the ETCBC research group (http://etcbc.nl/). You do so by writing the following command into the terminal:

``` 
pip3 install text-fabric
``` 
or when you are on windows 
```
pip install text-fabric
```

After TF has been installed run the following command to make sure that you have upgraded to the latest TF version:

```
pip3 install --upgrade text-fabric
```
or when you are on windows
```
pip install --upgrade text-fabric
```

## Starting a jupyter notebook
1. Now all the tools are installed for doing our exegetical research. Lets fire up Jupyter Notebooks by executing the command ```jupyter notebook``` in your Anaconda prompt termnal.
2. Check where your default folder is located and copy this notebook into that folder.
3. Start this folder by clicking on the notebook name within your jupyter's folder overview.

## Loading important apps
This very jupyter notebook is now running in your browser!
Lets load some important python apps so that we can start writing our first queries.

Run the code cells below.

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
# First, I have to laod different modules that I use for analyzing the data and for plotting:
import sys, os, collections
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt; plt.rcdefaults()
from matplotlib.pyplot import figure
from collections import Counter

# Second, I have to load the Text Fabric app
from tf.fabric import Fabric
from tf.app import use

# Introduction to ```TF``` Textfabric - the Python way of doing ```SHEBANQ/MQL```

After having loaded the TF app I need to load the Biblia Hebraica Stuttgartensia ```bhsa``` as a dataset that TF can work with.

In [4]:
A = use('bhsa', hoist=globals())

	connecting to online GitHub repo annotation/app-bhsa ... connected
Using TF-app in C:\Users\Oliver Glanz/text-fabric-data/annotation/app-bhsa/code:
	rv1.0=#d3cf8f0c2ab5d690a0fda14ea31c33da5c5c8483 (latest release)
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in C:\Users\Oliver Glanz/text-fabric-data/etcbc/bhsa/tf/c:
	rv1.6=#bac4a9f5a2bbdede96ba6caea45e762fe88f88c5 (latest release)
	connecting to online GitHub repo etcbc/phono ... connected
Using data in C:\Users\Oliver Glanz/text-fabric-data/etcbc/phono/tf/c:
	r1.2 (latest release)
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in C:\Users\Oliver Glanz/text-fabric-data/etcbc/parallels/tf/c:
	r1.2 (latest release)


# TF-BHS Queries
## Building Simple Queries
### Searching on the WORD-level
#### Simple Word Search
It follows a simple search for the word “Abraham” in the book of Genesis in the chapter 17-22.
Some principles:
1. each query has to start with `Queryname = ```` and needs to end with ` ````.
2. *Indentation* plays an important role! In the following we are searching a word (YHWH) within a specific book (Genesis):
```
book=Genesis
    word=JHWH/
```
3. After the query has been written it needs to be executed with the command `A.search(Queryname)`. Its best to give the results of the query a particular name (e.g. AbrahamSearchFindings).
4. Finally, you want to choose for a proper representation of the data Here you can either choose for `A.show(QueryResults)` or for `A.table(QueryResults)`. Both have their specific advantages.
5. There are several parameters that can be used to specify what one wants to see in the query result showing. These will be discussed in a later cell. For now, we only make use of `start`, `end`, and `condensed`. With `start=1` we define that we want to have the 1st query result being shown to us. With `end=7` we define that the last query result that we want to have shown to us is the 7th query result. With `condensed` we define that we want to see only one verse in the result showing, even if that one verse contains two query results (cf Gen 17:23 for the query below).

In [5]:
AbrahamSearch = '''
verse book=Genesis chapter=17|18|19|10|21|22
    word lex=>BRHM/
'''
AbrahamSearchFindings  = A.search(AbrahamSearch)
A.table(AbrahamSearchFindings, start=1, end=7, condensed=True)

## see SHEBANQ query results here: https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1365 

  0.40s 63 results


n,p,verse,word,Unnamed: 4
1,Genesis 17:5,וְלֹא־יִקָּרֵ֥א עֹ֛וד אֶת־שִׁמְךָ֖ אַבְרָ֑ם וְהָיָ֤ה שִׁמְךָ֙ אַבְרָהָ֔ם כִּ֛י אַב־הֲמֹ֥ון גֹּויִ֖ם נְתַתִּֽיךָ׃,אַבְרָהָ֔ם,
2,Genesis 17:9,וַיֹּ֤אמֶר אֱלֹהִים֙ אֶל־אַבְרָהָ֔ם וְאַתָּ֖ה אֶת־בְּרִיתִ֣י תִשְׁמֹ֑ר אַתָּ֛ה וְזַרְעֲךָ֥ אַֽחֲרֶ֖יךָ לְדֹרֹתָֽם׃,אַבְרָהָ֔ם,
3,Genesis 17:15,וַיֹּ֤אמֶר אֱלֹהִים֙ אֶל־אַבְרָהָ֔ם שָׂרַ֣י אִשְׁתְּךָ֔ לֹא־תִקְרָ֥א אֶת־שְׁמָ֖הּ שָׂרָ֑י כִּ֥י שָׂרָ֖ה שְׁמָֽהּ׃,אַבְרָהָ֔ם,
4,Genesis 17:17,וַיִּפֹּ֧ל אַבְרָהָ֛ם עַל־פָּנָ֖יו וַיִּצְחָ֑ק וַיֹּ֣אמֶר בְּלִבֹּ֗ו הַלְּבֶ֤ן מֵאָֽה־שָׁנָה֙ יִוָּלֵ֔ד וְאִ֨ם־שָׂרָ֔ה הֲבַת־תִּשְׁעִ֥ים שָׁנָ֖ה תֵּלֵֽד׃,אַבְרָהָ֛ם,
5,Genesis 17:18,וַיֹּ֥אמֶר אַבְרָהָ֖ם אֶל־הָֽאֱלֹהִ֑ים ל֥וּ יִשְׁמָעֵ֖אל יִחְיֶ֥ה לְפָנֶֽיךָ׃,אַבְרָהָ֖ם,
6,Genesis 17:22,וַיְכַ֖ל לְדַבֵּ֣ר אִתֹּ֑ו וַיַּ֣עַל אֱלֹהִ֔ים מֵעַ֖ל אַבְרָהָֽם׃,אַבְרָהָֽם׃,
7,Genesis 17:23,וַיִּקַּ֨ח אַבְרָהָ֜ם אֶת־יִשְׁמָעֵ֣אל בְּנֹ֗ו וְאֵ֨ת כָּל־יְלִידֵ֤י בֵיתֹו֙ וְאֵת֙ כָּל־מִקְנַ֣ת כַּסְפֹּ֔ו כָּל־זָכָ֕ר בְּאַנְשֵׁ֖י בֵּ֣ית אַבְרָהָ֑ם וַיָּ֜מָל אֶת־בְּשַׂ֣ר עָרְלָתָ֗ם בְּעֶ֨צֶם֙ הַיֹּ֣ום הַזֶּ֔ה כַּאֲשֶׁ֛ר דִּבֶּ֥ר אִתֹּ֖ו אֱלֹהִֽים׃,אַבְרָהָ֜ם,אַבְרָהָ֑ם


As you can see above, I used the latin transliteration for the word אברהם=>BRHM/.
I prefer this as it allows me type quicker and not have to switch my keyboard back and forward. You can find the transliteration table here: https://annotation.github.io/text-fabric/Writing/Hebrew

You can of course also choose to write in Hebrew script. 

```
AbrahamSearch = '''
verse book=Genesis chapter=17|18|19|10|21|22
    word lex_utf8=אברהם
'''
```

This will render the same results.

With "/" at the end of >BRHM you define that word as a noun.

With "[" at the end of a word you define that word as a verb.

=> When you search for nouns you have to add “/” behind the word: "DBR/" (word)

=> When you search for verbs you have to add “[“ behind the word. "DBR[" (to speak)

If you want to save time with typing Hebrew words or transliterated words, you can just copy/paste them from a concrete text. For example:

![alt text](https://e7kfoa.dm.files.1drv.com/y4pL_0Ui8KINjU_VpveK9fJwcZV7CSujp6bv7th8GjHOK739975f6dVWJU2Gftv0X4uTruoqUTZKM9tCNXsseyn8w6E87WfgiGfsPzHzvZVLloVHfHoKjgQ_dLDML3yLFhzxxWiydh80JJRdYdLs_TN9VbqzoNCWLrFPtCcVG_ijp6hEpkXq9nAWkgD-9kfD4WyJLtX6gYbgVHoz2LaCSS3Pw/Annotation%202019-06-09%20165506.jpg?psid=1 "SHEBANQ Syntax visualiation")

#### Advanced Word Search
Searching “Abraham” in clauses that contain a predicate in qatal(perfect) tense in the book of Genesis.

In [6]:
Abraham2='''
book book=Genesis
    clause
        word vt=perf
        word lex=>BRHM/
'''
Abraham2  = A.search(Abraham2)
##### Codes for verbal tenses (vt)
A full list of verbal tenses can be found here: 
https://etcbc.github.io/bhsa/features/vt/

code|description
---|---
`perf` |perfect
`impf` |imperfect
`wayq` |wayyiqtol
`impv` |imperative
`infa` |infinitive (absolute)
`infc` |infinitive (construct)
`ptca` |participle
`ptcp` |participle (passive)A.show(Abraham2, start=3, end=4, condensed=True)

SyntaxError: invalid syntax (<ipython-input-6-2b3f081bf022>, line 9)

##### Codes for BHS books (books)
A full list of verbal tenses can be found here: 
https://etcbc.github.io/bhsa/features/book/

book | #chapters
---|---
`Genesis`      | 50
`Exodus`       | 40
`Leviticus`    | 27
`Numeri`       | 36
`Deuteronomium`| 34
`Josua`        | 24
`Judices`      | 21
`Samuel_I`     | 31
`Samuel_II`    | 24
`Reges_I`      | 22
`Reges_II`     | 25
`Jesaia`       | 66
`Jeremia`      | 52
`Ezechiel`     | 48
`Hosea`        | 14
`Joel`         |  4
`Amos`         |  9
`Obadia`       |  1
`Jona`         |  4
`Micha`        |  7
`Nahum`        |  3
`Habakuk`      |  3
`Zephania`     |  3
`Haggai`       |  2
`Sacharia`     | 14
`Maleachi`     |  3
`Psalmi`       |150
`Iob`          | 42
`Proverbia`    | 31
`Ruth`         |  4
`Canticum`     |  8
`Ecclesiastes` | 12
`Threni`       |  5
`Esther`       | 10
`Daniel`       | 12
`Esra`         | 10
`Nehemia`      | 13
`Chronica_I`   | 29
`Chronica_II`  | 35  

##### Codes for verbal tenses (vt)
A full list of verbal tenses can be found here: 
https://etcbc.github.io/bhsa/features/vt/

code|description
---|---
`perf` |perfect
`impf` |imperfect
`wayq` |wayyiqtol
`impv` |imperative
`infa` |infinitive (absolute)
`infc` |infinitive (construct)
`ptca` |participle
`ptcp` |participle (passive)

#### Node Relations
As one can see in the results the first case has first Abraham and then the yiqtol tense, while the second case has first the yiqtol tense and then Abraham. 
We can specify the relations between the elements by the following operators.

In [None]:
S.relationsLegend()

Lets now run the same query but define that the qatal (perfect tense) needs to **follow right after** the word Abraham.

In [None]:
Abraham3='''
book book=Genesis
    clause
        word vt=perf
        :> word lex=>BRHM/
'''
Abraham3  = A.search(Abraham3)
A.show(Abraham3, start=1, end=2, condensed=True)

One could of course also write:

In [None]:
Abraham4='''
book book=Genesis
    clause
        word lex=>BRHM/
        <: word vt=perf

'''
Abraham4  = A.search(Abraham4)
A.table(Abraham4, start=1, end=2, condensed=True)

### Searching on the PHRASE-level
#### Simple Phrase search: YHWH as syntactical subject
Lets search for all cases in which YHWH is subject in the Pentateuch.

In [None]:
YHWH1='''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
    clause
        phrase function=Subj
            word lex=JHWH/
'''
YHWH1  = A.search(YHWH1)
A.table(YHWH1, start=1, end=2, condensed=True)

## SHEBANQ query result: https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1367

We can now see all cases in which YHWH is the subject of a clause. But we might be more interested in what YHWH is actually doing!

#### Advanced Phrase search: YHWH as subject of predicates
So lets search for the predicates of which YHWH is the subject and in which YHWH fills the enitre subject phrase (excluding cases like "YHWH, Elohim" by the relation code ```::```). We use the same `::` in order to make sure that there is only one verb filling the predicate function. 

For visualiation purposes we are going to look for the transliterated characters with `lex` and the un-transliterated Hebrew characters with `lex_utf8`

Finally, in order to see the predicates better, we are going to highlight them with a specified ```colorMap```.

In [None]:
YHWH2='''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
    clause
        phrase function=Subj
            :: word lex=JHWH/
        phrase function=Pred
            :: word lex_utf8* lex*
'''
YHWH2  = A.search(YHWH2)
A.table(YHWH2, start=1, end=2, condensed=True, colorMap={1: 'magenta', 2: 'blue', 3: 'cyan', 4: 'blue'})

##### Codes for phrase functions (function)
A full list of phrase functions can be found here: 
https://etcbc.github.io/bhsa/features/function/

code|description|examples
---|---|---
`Adju`|Adjunct
`Cmpl`|Complement
`Conj`|Conjunction
`EPPr`|Enclitic personal pronoun                 |!
`ExsS`|Existence with subject suffix
`Exst`|Existence
`Frnt`|Fronted element
`Intj`|Interjection
`IntS`|Interjection with subject suffix
`Loca`|Locative
`Modi`|Modifier
`ModS`|Modifier with subject suffix
`NCop`|Negative copula
`NCoS`|Negative copula with subject suffix
`Nega`|Negation
`Objc`|Object
`PrAd`|Predicative adjunct
`PrcS`|Predicate complement with subject suffix
`PreC`|Predicate complement
`Pred`|Predicate
`PreO`|Predicate with object suffix
`PreS`|Predicate with subject suffix
`PtcO`|Participle with object suffix
`Ques`|Question
`Rela`|Relative
`Subj`|Subject
`Supp`|Supplementary constituent
`Time`|Time reference
`Unkn`|Unknown
`Voct`|Vocative

### Exporting => ```pandas dataframe``` => Plotting
Now, lest unleash the power of phython's data analysis tools, by exporting our TF results with ```A.export``` and load it into a ```pandas``` dataframe. That will allow us to do some powerful data analysis and data visualization!

In [None]:
A.export(YHWH2, toDir='D:/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='YHWH2.tsv')

In [None]:
YHWH2=pd.read_csv('D:/OneDrive/1200_AUS-research/Fabric-TEXT/YHWH2.tsv',delimiter='\t',encoding='utf-16')
pd.set_option('display.max_columns', 50)
YHWH2.head()

The code cell below shows that there are 573 clauses in which we find a verbal predicate with YHWH as subject.

In [None]:
YHWH2.describe()

Its in the last column "lex_utf86" where we can find the verbs used for predicating YHWH. Lets see which verbs are mostly used to describe YHWH's actions with the function `count_values` (this function does the same as `size().sort_values(ascending=False)` would do).

In [None]:
YHWH2["lex_utf86"].value_counts()

There are many verbs used only once as predicate. For our plotting purposes below, lets delete all verbs that appear less frequently than 5 times. We do this by creating a new dataframe "YHWH2reduced" that filters out all the verbs with low frequency by using the `lambda` function.

In [None]:
YHWH2reduced = YHWH2.groupby("lex_utf86").filter(lambda x: len(x) >6)
YHWH2reduced["lex_utf86"].value_counts()

The upper `value_count`shows that we have 10 verbs left. That is an excellent amount for plotting.

#### Bar Plot
Lets graph it out by different plotting options.

Since matplot does not know that Hebrew is written from Right-to-Left (it would display the Hebrew words with its consonants being written from Left-to-Right!) we are going to use the `lex` feature instead of the `lex_utf8` feature.

In [None]:
figure(num=None, figsize=(5, 5), dpi=80, facecolor='w', edgecolor='k')
YHWH2reduced.groupby("lex6").size().sort_values(ascending=True).plot.barh()
plt.xlabel('occurence of verb')
plt.ylabel('lexemes used as predicates for the subject "YHWH"')
plt.title('YHWH and his acting')
plt.show()

#### Pie Chart

In [None]:
YHWH2reduced.lex6.value_counts(sort=False).plot.pie(autopct='%1.0f%%', shadow=True, startangle=140)
plt.show()

#### Scatter Plot

In [None]:
fig, ax = plt.subplots()
fig.set_size_inches(5, 5)

for S1, df in YHWH2reduced.groupby('lex6'):
    ax.scatter(x="S1", y="lex6", data=df, label=S1)

ax.set_xlabel("predicates used by YHWH as subject")
ax.set_ylabel("book")
ax.legend();

#### Seaborn Implot (sophisticated scatter plot)

In [None]:
sns.lmplot(x="S1", y="R", data=YHWH2reduced, hue='lex6', height=20, aspect=1/3, fit_reg=False, scatter_kws={"s": 200})
ax = plt.gca()
ax.set_ylabel('Number of occurence of ImpChainType')
ax.set_xlabel('OT books')

### Searching on the CLAUSE-level
#### Simple sentence search
Lets search for a sentence that contains two clauses in Genesis 20.

In [None]:
ClausesinSentence1='''
chapter book=Genesis chapter=20
    sentence
        clause
        <: clause
'''
ClausesinSentence1 = A.search(ClausesinSentence1)
A.show(ClausesinSentence1, start=1, end=2, condensed=True, colorMap={1: 'yellow', 2: 'yellow', 3: 'cyan', 4: 'magenta'})

## SHEBANQ query result: https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1369

#### Advanced sentence search
Now we are searching for a sentence that contains an independent clause and a dependent attributive/relative clause in Gen 20.

In [None]:
ClausesinSentence2='''
chapter book=Genesis chapter=20
    sentence
        clause
        <: clause rela=Attr
'''
ClausesinSentence2 = A.search(ClausesinSentence2)
A.show(ClausesinSentence2, start=1, end=2, condensed=True, colorMap={1: 'yellow', 2: 'yellow', 3: 'cyan', 4: 'magenta'})

## SHEBANQ query result: https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1370 

##### Codes for clause relations (rela)
A full list of clause relations can be found here: 
https://etcbc.github.io/bhsa/features/rela/

code|description
---|---
`Adju`|Adjunctive clause
`Attr`|Attributive clause
`Cmpl`|Complement clause
`Coor`|Coordinated clause
`Objc`|Object clause
`PrAd`|Predicative adjunct clause
`PreC`|Predicative complement clause
`ReVo`|Referral to the vocative
`Resu`|Resumptive clause
`RgRc`|Regens/rectum connection
`Spec`|Specification clause
`Subj`|Subject clause

# What codes? What abbreviations?
As the above examples have shown, one cannot run any queries succesfully without knowing 2 things:
1. What is the database model (what are clauses, phrases, words, etc.)?
2. What are the codes used for signifying tenses, stems, part of speech, etc.?

While we will tackle the the first question in our next session the second question can be answered in the ETBCB's online version of its features doc: https://etcbc.github.io/bhsa/features/0_home/

# In Class Tasks
For all assignments listed below I advise you to use https://shebanq.ancient-data.org/hebrew/text to inspect the text, copy/paste codes and words into your TF search cells.
1. Find all cases in which David appears in 2 Samuel 1-5.
2. Find all cases in which David appears as subject in 2 Samuel 1-5.
4. Find all cases in which David does something to somebody/something, i.e. you need to find the object phrase within a clause in which David is subject and does (predicate) something.