In [2]:
%load_ext autoreload
%autoreload 2

# Lowfat to TF - version 0.5.0 (updated version of Dirk Roorda's code v. 0.4.0)

We use the machinery of Text-Fabric combined with some custom code to convert
the lowfat XML of the Greek New Testament into TF.

# Set up

We gather all prerequisites.

In [3]:
from tf.convert.xml import XML
from lowfat import convertTaskCustom
from tf.advanced.helpers import dm
from tf.app import use

The custom code is in `lowfat.py`, here in this directory.

It consists of two functions that replace default functions in
[xmlCustom](https://annotation.github.io/text-fabric/tf/convert/xmlCustom.html),
which is part of TF.

So you only have to focus on the bits that actually touch the lowfat XML.

We pass the function `convertTaskCustom()`, defined in `lowfat.py`, to the XML converter.

We also specify the way we want to see some attributes in the report files:

* keyword attributes: we want to see an inventory of all words that occur in such attributes
* trim attributes: we do not want to see the values of these attributes

In [5]:
keywordAtts = set(
    """
    case
    class
    number
    gender
    mood
    person
    role
    tense
    type
    voice
""".strip().split()
)

trimAtts = set(
    """
    domain
    frame
    gloss
    id
    lemma
    ln
    morph
    normalized
    ref
    referent
    rule
    strong
    subjref
    unicode
""".strip().split()
)
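
To make the difference between the two kinds of attributes concrete, here is a minimal sketch of what such an inventory could look like. It is not the code that TF uses for its report files: the function `inventory`, the placeholder `(value omitted)`, and the sample elements are hypothetical, and the two sets are cut down to two entries each.

```python
from collections import Counter, defaultdict

# Hypothetical, shortened versions of the sets defined above.
keyword_atts = {"case", "role"}
trim_atts = {"lemma", "gloss"}


def inventory(elements):
    """Summarize attribute usage for a list of (tag, attributes) pairs.

    Keyword attributes: count every whitespace-separated word in the value.
    Trim attributes: count only that the attribute occurs, not its value.
    Other attributes: count each full value.
    """
    report = defaultdict(Counter)
    for _tag, atts in elements:
        for name, value in atts.items():
            if name in keyword_atts:
                report[name].update(value.split())
            elif name in trim_atts:
                report[name]["(value omitted)"] += 1
            else:
                report[name][value] += 1
    return report


# Made-up sample elements, only for illustration.
sample = [
    ("w", {"case": "nominative", "role": "s", "lemma": "λόγος", "n": "1"}),
    ("w", {"case": "genitive", "lemma": "θεός", "n": "2"}),
]
report = inventory(sample)
for name in sorted(report):
    print(name, dict(report[name]))
```

In this sketch the `lemma` values never reach the report, which keeps it short for attributes that have many distinct values.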

We do not want both a `Rule` and a `rule` feature in our dataset, because their file names clash on
case-insensitive file systems, so we rename `Rule` to `crule`.

We translate the `frame` attribute to an edge feature, but we retain the original contents in the
`framespec` attribute.

The name `class` is exceptionally cumbersome if you want to use it inside Python code,
so we rename it to `cls`.

Also, to make queries friendlier, we switch the _label_ of the node type from `w` to `word`.

In [6]:
renameAtts = {
    "Rule": "crule",
    "frame": "framespec",
    "subjref": "subjrefspec",
    "class": "cls",
}
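
The effect of this mapping can be illustrated with a small sketch. The helpers `rename` and `case_clashes` are hypothetical and not part of TF or `lowfat.py`, and the attribute values in `raw` are made up. The sketch only shows why `Rule` and `rule` are a problem and how the renaming removes it.

```python
from collections import defaultdict

# Same mapping as in the cell above.
rename_atts = {
    "Rule": "crule",
    "frame": "framespec",
    "subjref": "subjrefspec",
    "class": "cls",
}


def rename(atts):
    """Return a copy of an attribute dict with the renames applied."""
    return {rename_atts.get(name, name): value for name, value in atts.items()}


def case_clashes(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Made-up attribute values, only for illustration.
raw = {"Rule": "x", "rule": "y", "class": "noun"}

print(case_clashes(raw))          # names that would clash as file names
print(case_clashes(rename(raw)))  # no clashes are left after renaming
```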

In [7]:
X = XML(
    convertTaskCustom=convertTaskCustom,
    keywordAtts=keywordAtts,
    trimAtts=trimAtts,
    renameAtts=renameAtts,
    verbose=1,
    xml=0,
    tf="0.5.0",
)

Working in repository XML-nestle1904/programs in backend Downloads
XML data version is 2022-11-01 (most recent)
TF data version is 0.5.0 (explicit existing)


Now we can run tasks.

# Check

First we check the input:

In [6]:
X.task(check=True)

XML to TF checking: ~/Downloads/XML-nestle1904/programs/xml/2022-11-01 => ~/Downloads/XML-nestle1904/programs/report/2022-11-01
Start folder gnt:
  27 27-revelation.xml                                 
End   folder gnt

151 info line(s) written to ~/Downloads/XML-nestle1904/programs/report/2022-11-01/elements.txt
0 error(s) in 0 file(s) written to ~/Downloads/XML-nestle1904/programs/report/2022-11-01/errors.txt
7 tags of which 0 with multiple namespaces written to ~/Downloads/XML-nestle1904/programs/report/2022-11-01/namespaces.txt


True

# Convert, Load, and App Creation

Here we generate the TF, load it to check that it is valid, and create the config file that turns the dataset into a TF app.

In [89]:
X.task(convert=True)
X.task(load=True)
X.task(app=True)

XML to TF converting: ~/Downloads/XML-nestle1904/programs/xml/2022-11-01 => ~/Downloads/XML-nestle1904/programs/tf/0.5.0
  0.00s Not all of the warp features otype and oslots are present in
~/Downloads/XML-nestle1904/programs/tf/0.5.0
  0.00s Only the Feature and Edge APIs will be enabled
  0.00s Warp feature "otext" not found. Working without Text-API

  0.00s Importing data from walking through the source ...
   |     0.00s Preparing metadata... 
   |     0.00s No structure nodes will be set up
   |   SECTION   TYPES:    book, chapter, verse
   |   SECTION   FEATURES: book, chapter, verse
   |   STRUCTURE TYPES:    
   |   STRUCTURE FEATURES: 
   |   TEXT      FEATURES:
   |      |   text-orig-full       after, text
   |     0.02s OK
   |     0.00s Following director... 
  27 27-revelation.xml                                 
There are no broken subjref references.
There are 9 broken frame references.
gnt/01-matthew.xml            : n40003016026, n40004024030
gnt/03-luke.xml         

True
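
The conversion reports broken `frame` references, that is, references to nodes that cannot be found. The following is a minimal sketch of how such a check could work while turning a frame specification into edges. It is not the code in `lowfat.py`. It assumes that a frame value is a whitespace-separated list of `label:target` items; the function `frame_edges` and the identifiers `w1` to `w9` are hypothetical.

```python
def frame_edges(words):
    """Turn frame specs into edges and collect targets that do not exist.

    words maps a word id to its frame spec ("" if it has none).
    Assumed spec format: whitespace-separated "label:target" items.
    """
    edges, broken = [], []
    for source, spec in words.items():
        for item in spec.split():
            label, _, target = item.partition(":")
            if target in words:
                edges.append((source, target, label))
            else:
                broken.append((source, target))
    return edges, broken


# Made-up data, only for illustration.
words = {
    "w1": "",
    "w2": "A0:w1 A1:w3",
    "w3": "",
    "w4": "A0:w9",  # w9 does not exist: a broken reference
}
edges, broken = frame_edges(words)
print(len(edges), "edges,", len(broken), "broken reference(s)")
```

Keeping the original string in a separate feature, as we do with `framespec`, means that a broken reference is still visible in the data even though no edge can be made for it.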

# Test

We test a bit of the resulting dataset right here.

In [90]:
A = use("app:~/Downloads/XML-nestle1904/programs/app", version="0.5.0", hoist=globals())
B = use("etcbc/bhsa", hoist=globals())

**Locating corpus resources ...**

Name,# of nodes,# slots/node,% coverage
book,27,5102.93,100
chapter,260,529.92,100
verse,7944,17.34,100
sentence,8011,17.2,100
wg,114879,7.6,633
clause,30152,7.37,161
phrase,42636,3.21,99
word,137779,1.0,100


**Locating corpus resources ...**

Name,# of nodes,# slots/node,% coverage
book,39,10938.21,100
chapter,929,459.19,100
lex,9230,46.22,100
verse,23213,18.38,100
half_verse,45179,9.44,100
sentence,63717,6.7,100
sentence_atom,64514,6.61,100
clause,88131,4.84,100
clause_atom,90704,4.7,100
phrase,253203,1.68,100
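
A note on the coverage column in the first table (the Greek New Testament): the figures are consistent with coverage being the number of nodes times the average number of slots per node, divided by the total number of slots. This is an observation on the numbers above, not a statement about how TF computes the column. Under that reading, a value above 100 for `wg` and `clause` suggests that nodes of those types overlap or are nested. The sketch below recomputes the column from the rounded table values, so small deviations are expected; the function `estimated_coverage` is hypothetical.

```python
# (type, number of nodes, average slots per node, reported % coverage)
rows = [
    ("book", 27, 5102.93, 100),
    ("chapter", 260, 529.92, 100),
    ("verse", 7944, 17.34, 100),
    ("sentence", 8011, 17.2, 100),
    ("wg", 114879, 7.6, 633),
    ("clause", 30152, 7.37, 161),
    ("phrase", 42636, 3.21, 99),
    ("word", 137779, 1.0, 100),
]

# Every word occupies exactly one slot, so this is the number of slots.
total_slots = 137779


def estimated_coverage(n_nodes, avg_slots, total):
    """Percentage of slots covered, counting overlapping nodes repeatedly."""
    return 100 * n_nodes * avg_slots / total


for name, n_nodes, avg_slots, reported in rows:
    estimate = estimated_coverage(n_nodes, avg_slots, total_slots)
    print(f"{name:<8} reported {reported:>3}  estimated {estimate:6.1f}")
```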


## Comparing queries between the BHSA and our results

In [91]:
results = B.search('''
verse book=Genesis chapter=1 verse=1
''')
B.show(results, start=1, end=1, condensed=True, multiFeatures=False)

  0.02s 1 result


In [94]:
results = A.search('''
verse book=Matthew chapter=1
''')
A.show(results, start=1, end=1, condensed=True, multiFeatures=False)

  0.01s 25 results


# Browse

We are ready to browse the data.
If you run this notebook, then the next cell will open a browser window with the TF-browser
on the Greek New Testament.

In [50]:
X.task(browse=True)

This is Text-Fabric 11.4.12
Starting new kernel listening on 17116
Loading data for ETCBC/nestle1904. Please wait ...
Setting up TF kernel for ETCBC/nestle1904  
**Locating corpus resources ...**
Using app in ~/github/ETCBC/nestle1904/app:
	repo clone offline under ~/github (local github)
Using data in ~/github/ETCBC/nestle1904/tf/0.4.0:
	repo clone offline under ~/github (local github)
TF setup done.
Starting new webserver listening on 27116


 * Running on http://localhost:27116
Press CTRL+C to quit


Opening ETCBC/nestle1904 in browser
Press <Ctrl+C> to stop the TF browser


127.0.0.1 - - [11/May/2023 15:14:49] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/highlight.css HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/fonts.css HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/index.css HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/display.css HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/fontawesome.css HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/base.css HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/jquery.js HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/tf3.0.js HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/icon.png HTTP/1.1" 304 -
127.0.0.1 - - [11/May/2023 15:14:49] "GET /server/static/huc.png HTTP/1.1" 304 -
127.0.0.1 - 

Kernel listening at port 17116

TF web server has stopped
TF kernel has stopped


keyboard interrupt!


True

# Terminate

You can stop the browser by interrupting the kernel: press `i` twice while the notebook is in command mode.

# Create zip

It is time to commit and push the repo to GitHub now:

```
git add --all .
git commit -m "new data version"
git push origin master
```

Then go over to GitHub and create a new release there.

After that, fetch the new tags from GitHub by

```
git pull --tags
```

Then we are ready to create a zip file for publishing the dataset in a release on GitHub,
so that users can get it easily.

In [51]:
A.zipAll()

Data to be zipped:
	OK       app                      (v0.4.0 07c60c)     : ~/github/ETCBC/nestle1904/app
	OK       main data                (v0.4.0 07c60c)     : ~/github/ETCBC/nestle1904/tf/0.4.0
Writing zip file ...
Result: ~/Downloads/github/ETCBC/nestle1904/complete.zip


# Fetch

We now test whether users can use this dataset in the normal way.

Run this after you have attached the `complete.zip` file that we created earlier to the latest release on GitHub.

In [52]:
A = use("ETCBC/nestle1904:latest")

**Locating corpus resources ...**

   |     0.17s T otype                from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.4.0
   |     2.36s T oslots               from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.4.0
   |     0.36s T text                 from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.4.0
   |     0.28s T after                from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.4.0
   |     0.28s T book                 from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.4.0
   |     0.24s T chapter              from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.4.0
   |     0.26s T verse                from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.4.0
   |      |     0.05s C __levels__           from otype, oslots, otext
   |      |     1.47s C __order__            from otype, oslots, __levels__
   |      |     0.06s C __rank__             from otype, __order__
   |      |     4.38s C __levUp__            from otype, oslots, __rank__
   |      |     2.38s C __levDown__          fr

Name,# of nodes,# slots/node,% coverage
book,27,5102.93,100
chapter,260,529.92,100
verse,7944,17.34,100
sentence,8011,17.2,100
wg,114879,7.6,633
clause,30152,7.37,161
phrase,42636,3.21,99
w,137779,1.0,100


Indeed, downloading and installing went without hassle.

# Demo

We demo the effect of the reshuffling of the words.

Our test corpus consists of the first sentence of the letter of Jude, included twice.

The first time we do not shuffle the words in the sentence, the second time we do.

We run the conversion with `demo = True` in `lowfat.py`.

In [5]:
X = XML(
    convertTaskCustom=convertTaskCustom,
    keywordAtts=keywordAtts,
    trimAtts=trimAtts,
    renameAtts=renameAtts,
    verbose=1,
    xml=-1,
    tf="0.3.1t",
)

Working in repository ETCBC/nestle1904 in backend github
XML data version is 2000-01-01 (oldest)
TF data version is 0.3.1t (explicit new)


In [7]:
X.task(convert=True, load=True, app=True)

XML to TF converting: ~/github/ETCBC/nestle1904/xml/2000-01-01 => ~/github/ETCBC/nestle1904/tf/0.3.1t
  0.00s Not all of the warp features otype and oslots are present in
~/github/ETCBC/nestle1904/tf/0.3.1t
  0.00s Only the Feature and Edge APIs will be enabled
  0.00s Warp feature "otext" not found. Working without Text-API

  0.00s Importing data from walking through the source ...
   |     0.00s Preparing metadata... 
   |     0.00s No structure nodes will be set up
   |   SECTION   TYPES:    book, chapter, verse
   |   SECTION   FEATURES: book, chapter, verse
   |   STRUCTURE TYPES:    
   |   STRUCTURE FEATURES: 
   |   TEXT      FEATURES:
   |      |   text-orig-full       after, text
   |     0.00s OK
   |     0.00s Following director... 
   1 26-jude.xml                                       
source reading done
   |     0.00s "edge" actions: 0
   |     0.00s "feature" actions: 71
   |     0.00s "node" actions: 39
   |     0.00s "resume" actions: 0
   |     0.00s "slot" actions

True

In [9]:
A = use("ETCBC/nestle1904:clone", checkout="clone", hoist=globals())

**Locating corpus resources ...**

Name,# of nodes,# slots/node,% coverage
book,1,34.0,100
chapter,1,34.0,100
verse,1,34.0,100
sentence,2,17.0,100
wg,34,6.0,600
w,34,1.0,100


In [10]:
(s1, s2) = F.otype.s("sentence")

In [21]:
color1 = "cyan"
color2 = "goldenrod"
start = 5
offset = 17
highlights = {
    start: color1,
    start + 1: color2,
    start + offset: color2,
    start + offset + 1: color1,
}
A.displaySetup(standardFeatures=True, highlights=highlights)

In [22]:
A.pretty(s1)

In [23]:
A.pretty(s2)

# Restore

We restore the app so that it uses the normal TF version again.

In [32]:
X = XML(
    convertTaskCustom=convertTaskCustom,
    keywordAtts=keywordAtts,
    trimAtts=trimAtts,
    renameAtts=renameAtts,
    verbose=1,
    tf="0.3.1",
)

Working in repository ETCBC/nestle1904 in backend github
XML data version is 2022-11-01 (most recent)
TF data version is 0.3.1 (explicit existing)


In [33]:
X.task(app=True)

App updating ...
	~/github/ETCBC/nestle1904/app/static/logo.png (already exists, not overwritten)
	~/github/ETCBC/nestle1904/app/static/display.css (no custom info, older orginal exists)
	~/github/ETCBC/nestle1904/app/config.yaml (generated with custom info)
	~/github/ETCBC/nestle1904/app/app.py (deleted)
Done


True

Now save this notebook, commit and push the repo again to publish this very notebook.

```
git add --all .
git commit -m "maker notebook updated"
git push origin master
```