**Always good to do:**

Every once in a while, run the following in your command prompt or terminal:


```pip install --upgrade text-fabric```

This updates Text-Fabric to the latest version.
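
To see which version is currently installed before upgrading, the package metadata can be queried from Python. The sketch below uses only the standard library; the helper name `installed_version` is made up for this example, and `text-fabric` is the distribution name taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# 'text-fabric' is the distribution name used in the pip command above.
print(installed_version("text-fabric"))
```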

# Building a VocabList
In this notebook we are going to build a VocabList that is sensitive to word frequencies. The following steps are involved:
1. Run a TF query.
2. Export the TF query.
3. Import the TF query into a `pandas` dataframe.
4. Extract the data we want for our VocabList.
5. Export our selected data as a spreadsheet.
6. Create a PDF from the spreadsheet so that it can be printed off and distributed among students.
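
Steps 3 to 5 can be sketched on a tiny in-memory table before any real export exists. The sample rows below are copied from the Esther 6 output later in this notebook; the helper name `build_vocab_list` is hypothetical, and a real run would read the exported `.tsv` file instead of a string.

```python
import io

import pandas as pd

# Sample rows in the tab-separated layout of a TF export (values taken from
# the Esther 6 table in this notebook; the second row is a deliberate duplicate).
SAMPLE_TSV = (
    "dict_bol_HebArm3\tdict_bol_EN3\tfreq_lex3\tdict_bol_abc3\n"
    "סָרִיס\tcourt-official, eunuch\t45\t5493\n"
    "סָרִיס\tcourt-official, eunuch\t45\t5493\n"
    "בִּגְתָן\tBigthana\t2\t860\n"
)


def build_vocab_list(frame, columns, sort_by):
    """Keep `columns`, drop repeated rows, and sort by the `sort_by` column."""
    return frame[columns].drop_duplicates().sort_values(by=sort_by)


raw = pd.read_csv(io.StringIO(SAMPLE_TSV), delimiter="\t")
vocab = build_vocab_list(
    raw,
    columns=["dict_bol_HebArm3", "dict_bol_EN3", "freq_lex3", "dict_bol_abc3"],
    sort_by="dict_bol_abc3",
)
print(vocab)
```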


**Prerequisites**: <br>
All extra TF features of the ```tisch``` and ```bhsa``` apps need to be loaded:

|tisch|bhsa|
|:---:|:---:|
|![Screenshot%202020-09-29%20110133.png](attachment:Screenshot%202020-09-29%20110133.png)|![Screenshot%202020-09-29%20110431.png](attachment:Screenshot%202020-09-29%20110431.png)|

# Getting the Data-Analysis workbench ready

## Getting the TF workbench ready
The first thing we need to do in our Jupyter notebook is to:
1. load the TF program
2. load the TF database

In [1]:
# First we load the TF program
from tf.fabric import Fabric
from tf.app import use

In [2]:
# Now we load the TF bhsa database
OT = use('bhsa', version='2017', mod='CenterBLC/BHSaddons/tf')

This is Text-Fabric 9.1.1
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

127 features found and 0 ignored


In [None]:
# Now we load the TF tisch database
NT = use('tisch', hoist=globals())

## Loading data analysis tools

In [6]:
import sys, os, collections
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt; plt.rcdefaults()
from matplotlib.pyplot import figure
from collections import Counter

# Creating a Greek Vocablist for students
We will run the same query on John 3 that we ran in our last Notebook and will use it to produce a VocabList for students.

## Searching for all words that appear fewer than 10 times in John 3

In [None]:
VocabListJohn3 = '''
book book=John
    chapter chapter=3
        word freq_lex_og<10 gloss* anlex_lem* lex_og* lex_abc*

'''
VocabListJohn3  = NT.search(VocabListJohn3)
NT.table(VocabListJohn3, start=1, end=15, extraFeatures={'anlex_lem','freq_lex', 'freq_lex_og', 'lex_abc', 'gloss'}, condensed=False)

A total of 15 words in John 3 occur fewer than 10 times. To turn them into a Vocab List, we first have to export our search results, which is what the next section does.
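
The search template is plain text in which indentation expresses nesting (book, then chapter, then word). As a sketch, such a template can be generated from a few parameters, which makes it easier to reuse for other passages. The function `vocab_template` and its parameters are hypothetical helpers, not part of Text-Fabric; the feature names are the ones used in this notebook.

```python
def vocab_template(book, chapter, freq_feature, max_freq, features):
    """Build a search template for words rarer than `max_freq` in one chapter."""
    # Each extra feature gets a trailing '*', as in the cell above.
    wanted = " ".join(f"{name}*" for name in features)
    return (
        f"book book={book}\n"
        f"    chapter chapter={chapter}\n"
        f"        word {freq_feature}<{max_freq} {wanted}\n"
    )


template = vocab_template(
    "John", 3, "freq_lex_og", 10, ["gloss", "anlex_lem", "lex_og", "lex_abc"]
)
print(template)
```

The resulting string can be passed to `NT.search(...)` in the same way as the hand-written template.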

## Creating a Vocab List

In [None]:
NT.export(VocabListJohn3, toDir='D:/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='VocabListJohn3.tsv')
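
The export path above is specific to the author's machine. One way to keep the notebook portable is to define the output folder once and derive every file name from it. This is only a sketch: the folder name `tf-vocab-lists` and the helper `output_paths` are assumptions, so substitute any directory you can write to.

```python
from pathlib import Path


def output_paths(base_dir, list_name):
    """Return the (tsv, xlsx) paths for one vocab list inside `base_dir`."""
    base = Path(base_dir)
    return base / f"{list_name}.tsv", base / f"{list_name}Final.xlsx"


# 'tf-vocab-lists' is a hypothetical folder name; use any writable directory.
out_dir = Path.home() / "tf-vocab-lists"
tsv_path, xlsx_path = output_paths(out_dir, "VocabListJohn3")
print(tsv_path.name, xlsx_path.name)
```

These values could then be passed to the export as `toDir=str(out_dir)` and `toFile=tsv_path.name`, and to pandas as `tsv_path` and `xlsx_path`.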

In [None]:
VocabListJohn3=pd.read_csv('D:/OneDrive/1200_AUS-research/Fabric-TEXT/VocabListJohn3.tsv',delimiter='\t',encoding='utf-16')
pd.set_option('display.max_columns', 50)
VocabListJohn3
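
The `encoding='utf-16'` argument matters because the export is read back as UTF-16, not the UTF-8 that most tools assume by default. The standard-library sketch below imitates such a file in memory (the single row is taken from the Esther 6 output further down) and shows that decoding it as UTF-8 fails.

```python
import csv
import io

# One header line and one row, encoded the way the notebook reads its export.
text = "TEXT3\tdict_bol_EN3\nשְׁנַ֣ת\tsleep\n"
data = text.encode("utf-16")  # the bytes start with a byte-order mark

try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # the byte-order mark is not valid UTF-8

# Decoding with the right codec gives back clean tab-separated rows.
rows = list(csv.reader(io.StringIO(data.decode("utf-16")), delimiter="\t"))
print(utf8_ok, rows[1])
```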

In [None]:
VocabListJohn3 = VocabListJohn3[['anlex_lem3', 'lex_og3', 'gloss3', 'freq_lex_og3', 'lex_abc3']]
VocabListJohn3
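
The column names in the selection above carry a numeric suffix. Judging from the exported tables in this notebook, the suffix is the position of the node in the search template (book = 1, chapter = 2, word = 3), which is why the Jonah export further down, whose template has no chapter line, uses `2` for word features. A small hypothetical helper, `export_columns`, can build these names:

```python
def export_columns(features, position):
    """Append the template position of a node to each of its feature names."""
    return [f"{name}{position}" for name in features]


# Word is the third line of the John 3 template, so its features end in '3'.
john_cols = export_columns(
    ["anlex_lem", "lex_og", "gloss", "freq_lex_og", "lex_abc"], 3
)
print(john_cols)
```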

In [None]:
VocabListJohn3Final = VocabListJohn3.drop_duplicates().sort_values(by='lex_abc3', ascending=[True])
VocabListJohn3Final

## Exporting our List and Post-Production
Now we can export this result into a spreadsheet file and print it off for our students.

In [None]:
VocabListJohn3Final.to_excel('d:/OneDrive/1200_AUS-research/Fabric-TEXT/TF-tutorial/VocabListJohn3Final.xlsx')

![Untitled1.png](https://jiu5dq.dm.files.1drv.com/y4mUdVTya7tGSDv7NPGazTQsD-15SzEXQoENwsgYvx388TnTa7qjB3Tqh0Vt6Q0T46GD8Kp1rsTBLeU-wkdIvxzIXhM6vkgCIAc503wajiCsyfDGQSnEbCh8Q5ZUAUc5BHleJEHiPw__jxAXXMvcAvzH2vjvsxQp3WXm4MIeYFKT9XMfB5qijG8L5C3I09EMqGI09EYQg0INmKHHJ8U8pSvqWPk__Fwstw3wZkp0-04EpQ/greek%20vocab.png?psid=1)
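
By default `to_excel` also writes the DataFrame index as the first spreadsheet column, which is rarely wanted on a printed handout. The sketch below passes `index=False`; it assumes an Excel writer package such as `openpyxl` is installed and, as a fallback chosen for this example only, writes a CSV file when it is not. The function name `save_vocab_list` and the demo row are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_vocab_list(frame, folder, name):
    """Write `frame` without its index; return the path that was written."""
    target = Path(folder) / f"{name}.xlsx"
    try:
        frame.to_excel(target, index=False)
    except ImportError:
        # No Excel writer installed: fall back to a UTF-8 CSV with a BOM.
        target = target.with_suffix(".csv")
        frame.to_csv(target, index=False, encoding="utf-8-sig")
    return target


# Dummy row for demonstration only, not real lexicon data.
demo = pd.DataFrame({"gloss3": ["example"], "freq_lex_og3": [1]})
with tempfile.TemporaryDirectory() as tmp:
    written = save_vocab_list(demo, tmp, "VocabListDemo")
    file_was_written = written.exists()
print(written.name, file_was_written)
```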

# Creating a Hebrew Vocablist for students

## Searching for all words that appear less than 100 times in Esther 6

In [10]:
VocabListEsther6 = '''
book book=Esther
    chapter chapter=6
        word freq_lex<100 freq_occ* gloss* dict_bol_HebArm* dict_bol_EN* dict_bol_abc*
'''
VocabListEsther6  = OT.search(VocabListEsther6)
OT.table(VocabListEsther6, start=1, end=15, extraFeatures={'freq_lex', 'dict_bol_HebArm', 'dict_bol_EN', 'dict_bol_abc','gloss'}, condensed=False)

  4.69s 74 results


n,p,book,chapter,word
1,Esther 6:1,Esther,Esther 6,נָדְדָ֖ה
2,Esther 6:1,Esther,Esther 6,שְׁנַ֣ת
3,Esther 6:1,Esther,Esther 6,זִּכְרֹנֹות֙
4,Esther 6:2,Esther,Esther 6,מָרְדֳּכַ֜י
5,Esther 6:2,Esther,Esther 6,בִּגְתָ֣נָא
6,Esther 6:2,Esther,Esther 6,תֶ֗רֶשׁ
7,Esther 6:2,Esther,Esther 6,סָרִיסֵ֣י
8,Esther 6:2,Esther,Esther 6,סַּ֑ף
9,Esther 6:2,Esther,Esther 6,אֲחַשְׁוֵרֹֽושׁ׃
10,Esther 6:3,Esther,Esther 6,יְקָ֧ר


### Creating a Vocab List

In [11]:
OT.export(VocabListEsther6, toDir='D:/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='VocabListEsther6.tsv')

In [12]:
VocabListEsther6=pd.read_csv('D:/OneDrive/1200_AUS-research/Fabric-TEXT/VocabListEsther6.tsv',delimiter='\t',encoding='utf-16')
pd.set_option('display.max_columns', 50)
VocabListEsther6

Unnamed: 0,R,S1,S2,S3,NODE1,TYPE1,book1,NODE2,TYPE2,chapter2,NODE3,TYPE3,TEXT3,dict_bol_EN3,dict_bol_HebArm3,dict_bol_abc3,freq_lex3,freq_occ3,gloss3
0,1,Esther,6,1,426618,book,Esther,427448,chapter,6,367938,word,נָדְדָ֖ה,"qal: run away, flee; wander; flutter (wings); ...",נדד,4947,27,2,
1,2,Esther,6,1,426618,book,Esther,427448,chapter,6,367939,word,שְׁנַ֣ת,sleep,שֵׁנָה II,8003,23,125,
2,3,Esther,6,1,426618,book,Esther,427448,chapter,6,367949,word,זִּכְרֹנֹות֙,reminder,זִכָּרֹון,2044,24,1,
3,4,Esther,6,2,426618,book,Esther,427448,chapter,6,367965,word,מָרְדֳּכַ֜י,Mordecai,מָרְדְּכַי,4655,60,60,
4,5,Esther,6,2,426618,book,Esther,427448,chapter,6,367967,word,בִּגְתָ֣נָא,Bigthana,בִּגְתָן,860,2,1,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69,70,Esther,6,14,426618,book,Esther,427448,chapter,6,368315,word,סָרִיסֵ֥י,"court-official, eunuch",סָרִיס,5493,45,5,
70,71,Esther,6,14,426618,book,Esther,427448,chapter,6,368320,word,יַּבְהִ֨לוּ֙,ni: be terrified; pi: terrify; hasten; be in h...,בהל,881,39,3,
71,72,Esther,6,14,426618,book,Esther,427448,chapter,6,368324,word,הָמָ֔ן,Haman,הָמָן II,1901,54,54,
72,73,Esther,6,14,426618,book,Esther,427448,chapter,6,368327,word,מִּשְׁתֶּ֖ה,(drinking-) feast,מִשְׁתֶּה,4843,46,40,


In [9]:
VocabListEsther6 = VocabListEsther6[['dict_bol_HebArm3', 'dict_bol_EN3', 'freq_lex3', 'dict_bol_abc3']]
VocabListEsther6.head(15)

Unnamed: 0,dict_bol_HebArm3,dict_bol_EN3,freq_lex3,dict_bol_abc3
0,נדד,"qal: run away, flee; wander; flutter (wings); ...",27,4947
1,שֵׁנָה II,sleep,23,8003
2,זִכָּרֹון,reminder,24,2044
3,מָרְדְּכַי,Mordecai,60,4655
4,בִּגְתָן,Bigthana,2,860
5,תֶּרֶשׁ,Teresh,2,8511
6,סָרִיס,"court-official, eunuch",45,5493
7,סַף II,threshold,25,5452
8,אֲחַשְׁוֵרֹושׁ,Xerxes,31,311
9,יְקָר II,"preciousness; honoring, esteeming",17,3249


In [None]:
VocabListEsther6Final = VocabListEsther6.drop_duplicates().sort_values('dict_bol_abc3', ascending=[True])
VocabListEsther6Final

### Exporting our List and Post-Production
Now we can export this result into a spreadsheet file and print it off for our students.

In [None]:
VocabListEsther6Final.to_excel('d:/OneDrive/1200_AUS-research/Fabric-TEXT/TF-tutorial/VocabListEsther6Final.xlsx')

![Untitled2.png](https://vq9gsa.dm.files.1drv.com/y4mLQGo-miTACRFWwdyTMZ-MKkCm8WE1cCEnl-cSSim6th2db2mS3HriZM34i8yBzGWT_hAbsU3L_qvLjbocB_O-vpquJZoNSwKKKWwk5IppKZQihowvPrpCpKe0j8xs32-YRhTpEcbzUs02BoMd3Yk0xllYEDV5pU9N_1uf_Mvic0p9RFEk-QajjMiBJrEOENnHIo7LgBoWr7_HdBzn-LQ-0s8Vz4HNLkNj9-_0oQWvU4/heb%20vocab.png?psid=1)

## Building a vocab list for the book of Jonah

In [4]:
VocabListJona = '''
book book=Jona
        word freq_lex* dict_bol_HebArm* dict_bol_EN* dict_bol_abc* lex*
'''
VocabListJona  = OT.search(VocabListJona)
OT.table(VocabListJona, start=1, end=15, extraFeatures={'freq_lex', 'dict_bol_HebArm', 'dict_bol_EN', 'dict_bol_abc','gloss'}, condensed=False)

  2.81s 985 results


n,p,book,word
1,Jonah 1:1,Jonah,וַֽ
2,Jonah 1:1,Jonah,יְהִי֙
3,Jonah 1:1,Jonah,דְּבַר־
4,Jonah 1:1,Jonah,יְהוָ֔ה
5,Jonah 1:1,Jonah,אֶל־
6,Jonah 1:1,Jonah,יֹונָ֥ה
7,Jonah 1:1,Jonah,בֶן־
8,Jonah 1:1,Jonah,אֲמִתַּ֖י
9,Jonah 1:1,Jonah,לֵ
10,Jonah 1:1,Jonah,אמֹֽר׃


In [5]:
OT.export(VocabListJona, toDir='D:/OneDrive/1200_AUS-research/Fabric-TEXT', toFile='VocabListJona.tsv')

In [8]:
VocabListJona=pd.read_csv('D:/OneDrive/1200_AUS-research/Fabric-TEXT/VocabListJona.tsv',delimiter='\t',encoding='utf-16')
pd.set_option('display.max_columns', 50)
VocabListJona.head(25)

Unnamed: 0,R,S1,S2,S3,NODE1,TYPE1,book1,NODE2,TYPE2,TEXT2,dict_bol_EN2,dict_bol_HebArm2,dict_bol_abc2,freq_lex2,lex2
0,1,Jonah,1,1,426603,book,Jona,298555,word,וַֽ,"and; also, even (conj); but",וְ,1953,50272,W
1,2,Jonah,1,1,426603,book,Jona,298556,word,יְהִי֙,"qal: be, happen, become, occur; ni: be realize...",היה,1865,3561,HJH[
2,3,Jonah,1,1,426603,book,Jona,298557,word,דְּבַר־,"word; thing, matter; deed",דָּבָר I,1616,1441,DBR/
3,4,Jonah,1,1,426603,book,Jona,298558,word,יְהוָ֔ה,"YHWH, LORD",יְהוָה,2965,6828,JHWH/
4,5,Jonah,1,1,426603,book,Jona,298559,word,אֶל־,"toward (prep), unto; towards",אֶל I,392,5517,>L
5,6,Jonah,1,1,426603,book,Jona,298560,word,יֹונָ֥ה,Jonah,יֹונָה II,3016,19,JWNH=/
6,7,Jonah,1,1,426603,book,Jona,298561,word,בֶן־,son,בֵּן I,1066,4937,BN/
7,8,Jonah,1,1,426603,book,Jona,298562,word,אֲמִתַּ֖י,Amittai,אֲמִתַּי,558,2,>MTJ/
8,9,Jonah,1,1,426603,book,Jona,298563,word,לֵ,"to, toward (prep); Do, Yes, (voc); in regard t...",לְ,3682,20069,L
9,10,Jonah,1,1,426603,book,Jona,298564,word,אמֹֽר׃,"qal: say, think; ni: be said, be called; hi: d...",אמר I,548,5307,>MR[


### Let's get the information we want

In [46]:
VocabListJona = VocabListJona[['S1','S2','S3', 'dict_bol_HebArm2', 'dict_bol_EN2', 'dict_bol_abc2', 'freq_lex2',]]
VocabListJona

Unnamed: 0,S1,S2,S3,dict_bol_HebArm2,dict_bol_EN2,dict_bol_abc2,freq_lex2
0,Jonah,1,1,וְ,"and; also, even (conj); but",1953,50272
1,Jonah,1,1,היה,"qal: be, happen, become, occur; ni: be realize...",1865,3561
2,Jonah,1,1,דָּבָר I,"word; thing, matter; deed",1616,1441
3,Jonah,1,1,יְהוָה,"YHWH, LORD",2965,6828
4,Jonah,1,1,אֶל I,"toward (prep), unto; towards",392,5517
...,...,...,...,...,...,...,...
980,Jonah,4,11,לְ,"to, toward (prep); Do, Yes, (voc); in regard t...",3682,20069
981,Jonah,4,11,שְׂמֹאל,left (side); left hand,7907,54
982,Jonah,4,11,וְ,"and; also, even (conj); but",1953,50272
983,Jonah,4,11,בְּהֵמָה,animals; cattle,883,190


### Let's rename columns

In [47]:
# rename() returns a new DataFrame; assigning it back avoids the
# SettingWithCopyWarning that inplace=True raises on a column selection.
VocabListJona = VocabListJona.rename(columns={'S1':'book','S2':'chapter','S3':'verse', 'dict_bol_HebArm2':'dictionary form', 'dict_bol_EN2':'English translation', 'dict_bol_abc2':'alphabetic order', 'freq_lex2':'word frequency',})
VocabListJona


Unnamed: 0,book,chapter,verse,dictionary form,English translation,alphabetic order,word frequency
0,Jonah,1,1,וְ,"and; also, even (conj); but",1953,50272
1,Jonah,1,1,היה,"qal: be, happen, become, occur; ni: be realize...",1865,3561
2,Jonah,1,1,דָּבָר I,"word; thing, matter; deed",1616,1441
3,Jonah,1,1,יְהוָה,"YHWH, LORD",2965,6828
4,Jonah,1,1,אֶל I,"toward (prep), unto; towards",392,5517
...,...,...,...,...,...,...,...
980,Jonah,4,11,לְ,"to, toward (prep); Do, Yes, (voc); in regard t...",3682,20069
981,Jonah,4,11,שְׂמֹאל,left (side); left hand,7907,54
982,Jonah,4,11,וְ,"and; also, even (conj); but",1953,50272
983,Jonah,4,11,בְּהֵמָה,animals; cattle,883,190
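
Selecting columns with `df[[...]]` and then renaming with `inplace=True` is a pattern that can make pandas emit a `SettingWithCopyWarning`, because the selection may be treated as a slice of the original frame. A minimal sketch of one way around it, using two rows from the Jonah export: take an explicit `.copy()` and let `rename` return a new DataFrame.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "S1": ["Jonah", "Jonah"],
        "S2": [1, 1],
        "S3": [1, 1],
        "dict_bol_HebArm2": ["וְ", "היה"],
        "freq_lex2": [50272, 3561],
        "lex2": ["W", "HJH["],
    }
)

# .copy() makes the selection an independent DataFrame.
selected = raw[["S1", "S2", "S3", "dict_bol_HebArm2", "freq_lex2"]].copy()

# Without inplace=True, rename returns a new frame and leaves `selected` as is.
renamed = selected.rename(
    columns={
        "S1": "book",
        "S2": "chapter",
        "S3": "verse",
        "dict_bol_HebArm2": "dictionary form",
        "freq_lex2": "word frequency",
    }
)
print(list(renamed.columns))
```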


### Dropping duplicates
Now we need to make sure that all duplicate entries are removed. We do not want a student to learn W four times ;-)

https://www.interviewqs.com/ddi-code-snippets/drop-duplicate-rows-pandas
https://www.journaldev.com/33488/pandas-drop-duplicate-rows-drop_duplicates-function

So we want to delete every row whose lexeme has already appeared earlier, while always keeping the first appearance of each lexeme:

```python
VocabListJona.drop_duplicates(subset='dictionary form', keep="first")
```
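
To see what `keep="first"` does, here is a small sketch with four rows modelled on the Jonah table: the conjunction וְ occurs in 1:1 and again in 4:11, and only the first occurrence survives. The row selection is illustrative, not the full export.

```python
import pandas as pd

words = pd.DataFrame(
    {
        "chapter": [1, 1, 4, 4],
        "verse": [1, 1, 11, 11],
        "dictionary form": ["וְ", "היה", "וְ", "בְּהֵמָה"],
        "word frequency": [50272, 3561, 50272, 190],
    }
)

# Rows are compared on 'dictionary form' only; the first match is kept.
unique_words = words.drop_duplicates(subset="dictionary form", keep="first")
print(unique_words)
```

Because the original index is preserved, the surviving rows still show where each lexeme first occurs.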




In [49]:
VocabListJona = VocabListJona.drop_duplicates(subset=['dictionary form'], keep="first")
VocabListJona

Unnamed: 0,book,chapter,verse,dictionary form,English translation,alphabetic order,word frequency
0,Jonah,1,1,וְ,"and; also, even (conj); but",1953,50272
1,Jonah,1,1,היה,"qal: be, happen, become, occur; ni: be realize...",1865,3561
2,Jonah,1,1,דָּבָר I,"word; thing, matter; deed",1616,1441
3,Jonah,1,1,יְהוָה,"YHWH, LORD",2965,6828
4,Jonah,1,1,אֶל I,"toward (prep), unto; towards",392,5517
...,...,...,...,...,...,...,...
972,Jonah,4,11,עֶשְׂרֵה I,ten,6119,134
973,Jonah,4,11,רִבֹּוא,"ten thousand, countless",7132,10
978,Jonah,4,11,בַּיִן,between (prep); interval (n),950,407
979,Jonah,4,11,יָמִין I,"south; right hand, right side",3115,139
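
The Jonah query has no frequency condition, so the list still contains very common words such as וְ. If a frequency-sensitive list is wanted, the threshold can also be applied in pandas after the export. The sketch below uses four rows from the table above and a cut-off of 200, which is an arbitrary value chosen for illustration.

```python
import pandas as pd

jonah = pd.DataFrame(
    {
        "dictionary form": ["וְ", "עֶשְׂרֵה I", "רִבֹּוא", "יָמִין I"],
        "English translation": [
            "and; also, even (conj); but",
            "ten",
            "ten thousand, countless",
            "south; right hand, right side",
        ],
        "word frequency": [50272, 134, 10, 139],
    }
)

MAX_FREQ = 200  # arbitrary cut-off for this example

# Boolean indexing keeps only the rows whose frequency is below the cut-off.
rare = jonah[jonah["word frequency"] < MAX_FREQ].sort_values("word frequency")
print(rare["dictionary form"].tolist())
```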


### Export!
Now we have what we want and can export the file.

In [50]:
VocabListJona.to_excel('d:/OneDrive/1200_AUS-research/Fabric-TEXT/TF-tutorial/VocabListJona.xlsx')

# Assignments
Create your own Vocab List on a passage you select.

# What's Next? Complex Query Building
1. We will learn the basic architecture of the BHS and the Tischendorf database.
2. Understanding the database better will allow us to build sophisticated queries, including syntax queries...