# Working with Jupyter Notebooks
In this notebook we are going to work with Jupyter Notebooks. We will:
1. Load different TF databases.
2. Do some simple word-based queries.
3. Export query results to TSV files.

# Starting a Jupyter Notebook
There are different ways to open notebooks within jupyter:
1. You can either open the terminal/prompt and type

>```jupyter notebook```


    This will open the Jupyter environment and a new notebook in your default web browser.
2. Or you can open the Anaconda Navigator and click on *Launch* in the Jupyter Notebook box:

    ![image.png](attachment:1ace2f6a-7a33-468e-9310-438e70a3eea6.png)

Alternatively you can choose to work with notebooks within the `jupyter lab` environment. This can be started on the console/terminal/prompt with the command

>```jupyter lab```

`Jupyter Lab` can also be launched via the Anaconda Navigator.

If you have a dedicated folder in which you want to work, you should first `cd` into that folder in your terminal and then launch the jupyter command. In my case this looks like this:
>```cd D:\OneDrive\1200_Research\Fabric-Text```

>```jupyter lab```

Once you have executed the command or clicked the launch button, the Jupyter environment will open in your web browser. Open a new notebook and copy/paste the code cells from this notebook into your own notebook, or download this notebook to your machine and open it as a Jupyter notebook.

## Getting the TF workbench ready
The first thing we need to do in our jupyter notebook is to
1. load the TF program
2. load the TF database

In [1]:
# First we load the TF program
from tf.fabric import Fabric
from tf.app import use

In [2]:
# Now we load the TF bhsa database
BHS = use('bhsa', hoist=globals())

In [3]:
# Now we load the TF tisch database
NT = use('tisch', hoist=globals())

We now have both the BHS and the NT Tischendorf text loaded and are ready to do some simple querying.

# Some simple word-based queries...

## Lemma Searches
Let's search the BHS for the words Abram (>BRM/) and Abraham (>BRHM/) by typing

```word lex=>BRHM/|>BRM/```

The `|` separates alternative values and works as a logical OR. Thus, we are searching for every word whose lexeme is Abraham **or** Abram.

The feature **lex** of the object type **word** allows us to write Hebrew words in transliteration. In the next workshops we will get to know many more features and be introduced to the database.
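
For a longer list of lexemes, the alternatives can be joined with `|` programmatically. The following is a minimal sketch; the helper name `build_lex_query` and its parameters are our own invention for illustration and are not part of TF:

```python
def build_lex_query(lexemes, feature="lex", otype="word"):
    """Return a one-line search template matching any of the given lexemes.

    Hypothetical helper for illustration; it only builds a string.
    """
    if not lexemes:
        raise ValueError("at least one lexeme is required")
    # Join the alternatives with "|" so that any one of them matches.
    return f"{otype} {feature}=" + "|".join(lexemes)


query = build_lex_query([">BRHM/", ">BRM/"])
print(query)  # word lex=>BRHM/|>BRM/
```

The resulting string can be passed to `BHS.search` just like a hand-written query.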

In [4]:
# Searching for "Abram" and "Abraham" in the OT
BHSAbrahamSearch = '''
word lex=>BRHM/|>BRM/ 
'''
BHSAbrahamSearch  = BHS.search(BHSAbrahamSearch)
BHS.show(BHSAbrahamSearch, start=1, end=10, condensed=True)

  0.50s 236 results


Abraham and/or Abram appear 236 times in the OT. Now we want to know how often Abraham appears in the NT.

In [5]:
# Searching for "Abraham" in the NT
NTAbrahamSearch = '''
word anlex_lem=Ἀβραάμ
'''
NTAbrahamSearch  = NT.search(NTAbrahamSearch)
NT.table(NTAbrahamSearch, start=1, end=10, condensed=False)

  0.13s 73 results


n,p,word
1,Matthew 1:1,Ἀβραάμ.
2,Matthew 1:2,Ἀβραὰμ
3,Matthew 1:17,Ἀβραὰμ
4,Matthew 3:9,"Ἀβραάμ,"
5,Matthew 3:9,Ἀβραάμ.
6,Matthew 8:11,Ἀβραὰμ
7,Matthew 22:32,Ἀβραὰμ
8,Mark 12:26,Ἀβραὰμ
9,Luke 1:55,Ἀβραὰμ
10,Luke 1:73,Ἀβραὰμ


Abraham appears a total of 73x in the NT. If you want to search lemmas without having to type Greek script (and thus switching your keyboard), make sure that you watch this video: https://youtu.be/9V1zRampYjc . All the additional tf files are to be found here: https://github.com/oliverglanz/Tischendorf-Morphology-tf-.

Assuming that you have watched the video and imported the additional tf files into your TF folder, you can now write the same Abraham search in transliteration by using the feature `lex_og`:

In [6]:
# Searching for "Abraham" in the NT with transliteration
NTAbrahamSearchLex = '''
word lex_og=Abraam anlex_lem*
'''
NTAbrahamSearchLex  = NT.search(NTAbrahamSearchLex)
NT.show(NTAbrahamSearchLex, start=1, end=2, condensed=False)

  0.27s 73 results


In another example we search for "θεός", "Ἰησοῦς", the personal pronoun "ἐγώ", and the verb "εἰμί" all appearing within one verse of the Tischendorf corpus. Instead of using the Greek script, this query uses the Latin transliteration.

In [7]:
translit = '''
verse
    word lex_og=Iesous
    word lex_og=theos
    word lex_og=eimi
    word lex_og=ego
'''
translit  = NT.search(translit)
NT.table(translit, start=1, end=7, extraFeatures={'anlex_lem'}, condensed=True)

  0.53s 72 results


n,p,verse,word,word.1,word.2,word.3,word.4,Unnamed: 8,Unnamed: 9
1,Matthew 26:63,,θεοῦ.,Ἰησοῦς,θεοῦ,ἡμῖν,εἶ,,
2,Matthew 27:46,,ἔστιν·,θεέ,μου,θεέ,"μου,",με,Ἰησοῦς
3,Mark 1:24,,ἡμῖν,Ἰησοῦ,ἡμᾶς·,"εἶ,",θεοῦ.,,
4,Mark 10:14,,ἐστὶν,"με,",θεοῦ.,Ἰησοῦς,,,
5,Mark 12:29,,Ἰησοῦς,ἐστίν·,θεὸς,ἡμῶν,"ἐστιν,",,
6,Mark 15:34,,Ἰησοῦς,ἐστιν,θεός,μου,θεός,"μου,",με;
7,Luke 4:34,,ἡμᾶς;,"εἶ,",θεοῦ.,ἡμῖν,Ἰησοῦ,,


## Morphology Searches
Let's search for the 1sg future (Futurum I) of εἰμί.

In [8]:
eimi = '''
word ps=p1 nu=sg vt=future-I lex_og=eimi
'''
eimi  = NT.search(eimi)
NT.show(eimi, start=1, end=2, condensed=True, extraFeatures={'anlex_lem', 'gloss'})

  0.12s 13 results


Let's look up a variety of morphological features in John 1:1:

In [9]:
# Showing the morphological features of every word in John 1:1
ShowMorph = '''
book book=John
 chapter chapter=1
  verse verse=1
   word case* gn* mood* nountype* nu* ps* sp* voice* vt* anlex_lem*
'''
ShowMorph  = NT.search(ShowMorph)
NT.show(ShowMorph, start=1, end=7, condensed=True)

  1.29s 17 results


## Word-Frequency Searches
Let us see how often each word in Matthew 1:1-2 appears in the entire NT. For this we use the `extraFeatures` option of the `show` function. We want to show both the Greek lexemes (**anlex_lem**) and their frequency (**freq_lex_og**) by writing:
```python
extraFeatures={'anlex_lem', 'freq_lex_og'}
```

In [10]:
# Checking Frequency Counts
freqcount = '''
book book=Matthew
    chapter chapter=1
        verse verse=1|2
            word
'''
freqcount  = NT.search(freqcount)
NT.show(freqcount, start=1, end=1, extraFeatures={'anlex_lem', 'freq_lex_og'}, condensed=True)

  0.16s 26 results


Do you see the difference between `condensed=False` and `condensed=True`?
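
As a rough mental model: with `condensed=False` every result is displayed on its own, while with `condensed=True` the results are grouped by their container (here the verse) and each container is displayed once with all its hits highlighted. The sketch below is only a toy illustration of that grouping idea, not TF's actual implementation. It uses the result numbers of three Abraham hits from the NT table earlier in this notebook; the function name `condense` is our own:

```python
from collections import defaultdict

# (verse reference, result number) pairs taken from the NT Abraham table:
# results 4 and 5 are both in Matthew 3:9, result 6 is in Matthew 8:11.
results = [
    ("Matthew 3:9", 4),
    ("Matthew 3:9", 5),
    ("Matthew 8:11", 6),
]


def condense(results):
    """Group hits by their verse, mimicking the idea behind condensed=True."""
    grouped = defaultdict(list)
    for verse, hit in results:
        grouped[verse].append(hit)
    return dict(grouped)


condensed = condense(results)

uncondensed_displays = len(results)   # one display per result
condensed_displays = len(condensed)   # one display per verse
print(uncondensed_displays, condensed_displays)  # 3 2
```

This is why a condensed display can contain fewer items than the number of results reported by the search.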

Imagine your Greek class knows all the Greek vocab down to a frequency of 10. Next week you want to translate John 3 with them. Thus, they have to prepare by learning all words that appear with a frequency of <10. You want to create a vocab list for them. We have to start by identifying these words, using the feature `freq_lex_new`:

In [11]:
# Searching for all words that appear less than 10 times in John 3
NTfreq = '''
book book=John
    chapter chapter=3
        word freq_lex_new<10 gloss* anlex_lem* lex_og*

'''
NTfreq  = NT.search(NTfreq)
NT.table(NTfreq, start=1, end=15, extraFeatures={'anlex_lem','freq_lex', 'freq_lex_og', 'freq_lex_new', 'gloss'}, condensed=False)

  0.26s 15 results


n,p,book,chapter,word
1,John 3:1,John,John 3,Νικόδημος
2,John 3:4,John,John 3,Νικόδημος·
3,John 3:4,John,John 3,γέρων
4,John 3:8,John,John 3,"πνεῖ,"
5,John 3:9,John,John 3,Νικόδημος
6,John 3:12,John,John 3,ἐπίγεια
7,John 3:16,John,John 3,μονογενῆ
8,John 3:18,John,John 3,μονογενοῦς
9,John 3:20,John,John 3,φαῦλα
10,John 3:23,John,John 3,Αἰνὼν


There are a total of 15 words in John 3 that appear fewer than 10 times. In order to produce a vocab list we have to export our search results. We will learn how to do this in one of our next notebooks... ;-)
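
Note that the table lists occurrences, so the same word can show up more than once (Νικόδημος, for example). As a small preview, the following sketch counts the distinct surface forms among the first ten rows of the table above. It is only a toy example under our own assumptions: it works on the printed word forms, not on the lemma feature, so inflected forms of one lexeme (such as μονογενῆ and μονογενοῦς) are still counted separately. The helper name `clean` is hypothetical:

```python
import unicodedata
from collections import Counter

# Surface forms copied from the first ten rows of the John 3 table.
forms = [
    "Νικόδημος", "Νικόδημος·", "γέρων", "πνεῖ,", "Νικόδημος",
    "ἐπίγεια", "μονογενῆ", "μονογενοῦς", "φαῦλα", "Αἰνὼν",
]


def clean(token):
    """Normalize to NFC and keep only letters and combining marks."""
    token = unicodedata.normalize("NFC", token)
    # Unicode categories starting with "L" are letters, "M" are marks;
    # punctuation such as "," or "·" is dropped.
    return "".join(ch for ch in token if unicodedata.category(ch)[0] in "LM")


counts = Counter(clean(form) for form in forms)

for form, n in counts.most_common():
    print(form, n)
```

A real vocab list would group by lexeme rather than by surface form.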

# Query-result-export as a preparation for Data Mining
We have to export our query results into TSV files so that we can do some further data analysis. The file paths will have to be written differently, depending on whether you are working in a Windows or MacOS environment.

In a **Windows** environment your path would look something like this:
```python
D:/OneDrive/1200_AUS-research/Fabric-TEXT
```

A TF export command could look like this:
```python
BHS.export(BHSAbrahamSearch, toDir='D:/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='BHSAbrahamSearch.tsv')
```

In a **MacOS** environment your path would look something like this:
```python
/Users/glanz/OneDrive/1200_AUS-research/Fabric-TEXT
```

A TF export command could look like this:
```python
BHS.export(BHSAbrahamSearch, toDir='/Users/glanz/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='BHSAbrahamSearch.tsv')
```
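
If you prefer not to hard-code a path per operating system, Python's `pathlib` can build one relative to your home directory. This is a minimal sketch; the folder names below are hypothetical placeholders and need to be adjusted to your own setup:

```python
from pathlib import Path

# Hypothetical folder layout below the user's home directory.
export_dir = Path.home() / "OneDrive" / "Fabric-TEXT"
export_file = "BHSAbrahamSearch.tsv"

# as_posix() renders the path with forward slashes on every OS,
# matching the style of the paths shown above.
to_dir = export_dir.as_posix()

print(to_dir)
print((export_dir / export_file).as_posix())
```

The string `to_dir` can then be used as the `toDir` value and `export_file` as the `toFile` value of the export command.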

## Query Export
Let's export the results of our Abraham queries for both OT and NT:

In [12]:
#Command for Windows environment:
BHS.export(BHSAbrahamSearch, toDir='D:/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='BHSAbrahamSearch.tsv')

In [13]:
#Command for MacOS environment:
#BHS.export(BHSAbrahamSearch, toDir='/Users/glanz/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='BHSAbrahamSearch.tsv')

In [14]:
#Command for Windows environment:
NT.export(NTAbrahamSearchLex, toDir='D:/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='NTAbrahamSearchLex.tsv')

In [15]:
#Command for MacOS environment:
#NT.export(NTAbrahamSearchLex, toDir='/Users/glanz/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='NTAbrahamSearchLex.tsv')

## Loading Data Analysis Tools
Let's now read the TSV files and do some further analysis of them. To enable data analysis functions we need to load some further Python modules:

In [16]:
import sys, os, collections
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt; plt.rcdefaults()
from matplotlib.pyplot import figure
from collections import Counter

## Creating Dataframe of exported Query Results

Now we can load our TSV files as pandas dataframes:

In [17]:
#Command for Windows environment:

BHSAbrahamSearch=pd.read_csv('D:/OneDrive/1200_AUS-research/Fabric-TEXT/BHSAbrahamSearch.tsv',delimiter='\t',encoding='utf-16')
pd.set_option('display.max_columns', 50)
BHSAbrahamSearch.head()

Unnamed: 0,R,S1,S2,S3,NODE1,TYPE1,TEXT1,lex1
0,1,Genesis,11,26,5366,word,אַבְרָ֔ם,>BRM/
1,2,Genesis,11,27,5379,word,אַבְרָ֔ם,>BRM/
2,3,Genesis,11,29,5405,word,אַבְרָ֧ם,>BRM/
3,4,Genesis,11,29,5412,word,אַבְרָם֙,>BRM/
4,5,Genesis,11,31,5437,word,אַבְרָ֣ם,>BRM/


In [18]:
#Command for MacOS environment:

#BHSAbrahamSearch=pd.read_csv('/Users/glanz/OneDrive/1200_AUS-research/Fabric-TEXT/BHSAbrahamSearch.tsv',delimiter='\t',encoding='utf-16')
#pd.set_option('display.max_columns', 50)
#BHSAbrahamSearch.head()

In [19]:
#Command for Windows environment:

NTAbrahamSearchLex=pd.read_csv('D:/OneDrive/1200_AUS-research/Fabric-TEXT/NTAbrahamSearchLex.tsv',delimiter='\t',encoding='utf-16')
pd.set_option('display.max_columns', 50)
NTAbrahamSearchLex.head()

Unnamed: 0,R,S1,S2,S3,NODE1,TYPE1,TEXT1,book1
0,1,Matthew,1,1,8,word,Ἀβραάμ.,
1,2,Matthew,1,2,9,word,Ἀβραὰμ,
2,3,Matthew,1,17,252,word,Ἀβραὰμ,
3,4,Matthew,3,9,1032,word,"Ἀβραάμ,",
4,5,Matthew,3,9,1047,word,Ἀβραάμ.,


In [20]:
#Command for MacOS environment:
#NTAbrahamSearchLex=pd.read_csv('/Users/glanz/OneDrive/1200_AUS-research/Fabric-TEXT/NTAbrahamSearchLex.tsv',delimiter='\t',encoding='utf-16')
#pd.set_option('display.max_columns', 50)
#NTAbrahamSearchLex.head()
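
The `encoding='utf-16'` argument in the cells above matters: a UTF-16 file cannot be decoded as UTF-8. The following self-contained sketch demonstrates this with a small temporary file. It uses only the standard library `csv` module instead of pandas so that it runs anywhere, and the two rows are a shortened copy of the BHS output above (the file name `sample.tsv` is a placeholder):

```python
import csv
import tempfile
from pathlib import Path

# Two shortened rows modelled on the BHSAbrahamSearch output above.
rows = [
    {"R": "1", "S1": "Genesis", "S2": "11", "S3": "26", "lex1": ">BRM/"},
    {"R": "2", "S1": "Genesis", "S2": "11", "S3": "27", "lex1": ">BRM/"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.tsv"

    # Write a tab-separated file in UTF-16.
    with open(path, "w", encoding="utf-16", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)

    # Read it back with the same encoding and delimiter.
    with open(path, encoding="utf-16", newline="") as fh:
        loaded = list(csv.DictReader(fh, delimiter="\t"))

    raw = path.read_bytes()

# Decoding the same bytes as UTF-8 fails because of the UTF-16 byte order mark.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(loaded[0]["S1"], loaded[0]["lex1"])
print("readable as UTF-8:", utf8_ok)
```

If `pd.read_csv` raises a `UnicodeDecodeError` on an exported file, a missing or wrong `encoding` argument is a likely cause.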

# Assignments
1. Think of two words that you would like to search for in both the BHS and the NT (Tischendorf). For example יהוה/κύριος and אֱלֹהִים/θεός. Since you do not yet know how to write your word in a TF-acceptable way, you can simply look up a book/chapter/verse where it appears. In BHS this would look like:

```
book book=yourbook
 chapter chapter=yourchapter
  verse verse=yourverse
   word lex
```

    For Tischendorf this would look like:
```
book book=yourbook
 chapter chapter=yourchapter
  verse verse=yourverse
   word lex_og
```
    In both cases run the `show` command instead of the `table` command. This will enable you to copy and paste the correct spelling into your word search query.

2. Search for your words in both the BHS and the NT.
3. Export your TF queries with the Export function so that we can do some data-mining on it next time.

# What's Next?: Complex Query building
1. We will learn how to do some Data-Mining in pandas.
2. We will learn how to visualize your data.