<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

# Sharing data features

## Explore additional data
The ETCBC has a few other repositories with data that work in conjunction with the BHSA data.
One of them you have already seen: 
[phono](https://github.com/ETCBC/phono),
for phonetic transcriptions.
There is also
[parallels](https://github.com/ETCBC/parallels)
for detecting parallel passages,
and
[valence](https://github.com/ETCBC/valence)
for studying patterns around verbs that determine their meanings.

## Make your own data
If you study the additional data, you can observe how that data is created and also
how it is turned into a Text-Fabric data module.
The last step is incredibly easy. You can write out any Python dictionary whose keys are node numbers
and whose values are strings or numbers as a Text-Fabric feature.
When you are creating data, you have already constructed those dictionaries, so writing
them out is just one method call.
See for example how the
[flowchart](https://nbviewer.jupyter.org/github/etcbc/valence/blob/master/programs/flowchart.ipynb#Add-sense-feature-to-valence-module)
notebook in valence writes out verb sense data.
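In Text-Fabric itself the writing is done by a method of the `Fabric` object (the flowchart notebook shows the actual call). To make the idea concrete without TF installed, here is a standalone sketch that writes a node-to-value dictionary in a simplified plain-text layout modeled on TF's `.tf` files: metadata lines starting with `@`, a blank line, then one `node<TAB>value` line per node. The helper `write_node_feature` and the exact layout are assumptions for illustration, not TF's own writer.

```python
import os
import tempfile


def write_node_feature(path, values, description):
    """Write a node -> value dict as a simplified, TF-like plain-text file.

    Hypothetical helper: metadata lines start with '@', a blank line follows,
    then one 'node<TAB>value' line per node in ascending node order.
    """
    is_int = all(isinstance(v, int) for v in values.values())
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("@node\n")
        fh.write(f"@valueType={'int' if is_int else 'str'}\n")
        fh.write(f"@description={description}\n")
        fh.write("\n")  # blank line separates metadata from data
        for node in sorted(values):
            fh.write(f"{node}\t{values[node]}\n")


# Toy data: node numbers mapped to made-up sense labels
sense = {3: "d-", 1: "--", 7: "-p"}

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "sense.tf")
write_node_feature(path, sense, "verb sense (toy example)")

with open(path, encoding="utf-8") as fh:
    print(fh.read())
```

The point is the one in the text: once your results live in a dictionary keyed by node, turning them into a feature file is a single, mechanical step.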

## Share your new data
You can then easily share your new features on GitHub, so that your colleagues everywhere
can try them out for themselves.

In the same way you can draw in data that others have shared, for example:

* [etcbc/valence/tf](https://github.com/etcbc/valence) :
  the results of the *verbal valence* work of Janet Dyk in the SYNVAR project;
* [etcbc/lingo/heads/tf](https://github.com/etcbc/lingo/tree/master/heads) :
  head words for phrases, work done by Cody Kingham;
* [ch-jensen/Semantic-mapping-of-participants/actor/tf](https://github.com/ch-jensen/Semantic-mapping-of-participants) :
  participant analysis in progress by Christian Høygaard-Jensen;
* [cmerwich/bh-reference-system/tf](https://github.com/cmerwich/bh-reference-system):
  participant analysis in progress by Christiaan Erwich;
* or whatever you have in the making!

You can add such data on the fly by passing a `mod={org}/{repo}/{path}` parameter,
or several of them separated by commas.
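To see how such a comma-separated `mod` string decomposes, here is a small sketch with a hypothetical helper `parse_mods` (it is not TF's own parser). Note that the path part may itself contain slashes, as in `etcbc/lingo/heads/tf`, so only the first two slashes separate org and repo. The `use` call below spreads one `mod` string over several lines: Python joins adjacent string literals, so the trailing commas inside the strings are what separate the modules.

```python
def parse_mods(mod):
    """Split a comma-separated mod string into (org, repo, path) triples.

    Hypothetical helper illustrating the {org}/{repo}/{path} convention.
    """
    specs = []
    for item in mod.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        # The path may contain slashes (e.g. lingo/heads/tf), so split at most twice
        org, repo, path = item.split("/", 2)
        specs.append((org, repo, path))
    return specs


# Adjacent string literals are concatenated into one string by Python
mods = parse_mods(
    "etcbc/valence/tf,"
    "etcbc/lingo/heads/tf,"
    "ch-jensen/Semantic-mapping-of-participants/actor/tf"
)
for spec in mods:
    print(spec)
```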

If the data is available on GitHub, it will be downloaded automatically and stored on your machine.

Let's do it.

In [1]:
%load_ext autoreload
%autoreload 2

# Incantation

The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are
explained in the [start tutorial](start.ipynb).

In [2]:
from tf.app import use

In [3]:
A = use(
    'bhsa',
    mod=(
        'etcbc/valence/tf,'
        'etcbc/lingo/heads/tf,'
        'ch-jensen/Semantic-mapping-of-participants/actor/tf'
    ),
    hoist=globals(),
)

Using TF-app in /Users/dirk/github/annotation/app-bhsa/code:
	repo clone offline under ~/github (local github)
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/bhsa/tf/c:
	rv1.6 (latest release)
	connecting to online GitHub repo etcbc/phono ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/phono/tf/c:
	r1.2 (latest release)
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/parallels/tf/c:
	r1.2 (latest release)
	connecting to online GitHub repo etcbc/valence ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/valence/tf/c:
	r1.1 (latest release)
	connecting to online GitHub repo etcbc/lingo ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/lingo/heads/tf/c:
	r0.1 (latest release)
	connecting to online GitHub repo ch-jensen/Semantic-mapping-of-participants ... connected
	downloading https://github.com/ch-jensen/participants/releas

You see that the features from the *etcbc/valence/tf* and *etcbc/lingo/heads/tf* modules have been added to the mix.

If you want to check for data updates, you can add a `check=True` argument.

Note that edge features are in **_bold italic_**.

## sense from valence

Let's find out about *sense*.

In [4]:
F.sense.freqList()

(('--', 17999),
 ('d-', 9979),
 ('-p', 6193),
 ('-c', 4250),
 ('-i', 2869),
 ('dp', 1853),
 ('dc', 1073),
 ('di', 889),
 ('l.', 876),
 ('i.', 629),
 ('n.', 533),
 ('-b', 66),
 ('db', 61),
 ('c.', 57),
 ('k.', 54))
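A frequency list is simply a count of every distinct value, most frequent first. The sketch below reproduces that idea on a plain dictionary that stands in for a TF feature (node to value); `freq_list` and the toy data are hypothetical, and breaking ties alphabetically is an assumption of this sketch, not necessarily what TF does.

```python
from collections import Counter


def freq_list(feature):
    """Return (value, count) pairs, most frequent first.

    `feature` is a plain dict standing in for a TF feature (node -> value).
    """
    counts = Counter(feature.values())
    # Sort by descending count; ties are broken by value for determinism
    return tuple(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


toy_sense = {1: "--", 2: "d-", 3: "--", 4: "-p", 5: "--", 6: "d-"}
print(freq_list(toy_sense))  # (('--', 3), ('d-', 2), ('-p', 1))
```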

Which nodes have a sense feature?

In [5]:
{F.otype.v(n) for n in N() if F.sense.v(n)}

{'word'}

In [6]:
results = A.search('''
word sense
''')

  0.32s 47381 results
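As a sanity check, the counts in the frequency list above should add up to the number of words that carry a `sense` value, and they do: they sum to 47381, the number of results of this query. Likewise the count for `k.` (54) should match the query for `sense=k.` that follows. In the notebook you could compare `sum(c for _, c in F.sense.freqList())` with `len(results)` directly (assuming `results` is a list); the standalone version below just redoes the arithmetic on the numbers copied from the output.

```python
# (value, count) pairs as reported by F.sense.freqList() above
sense_freqs = (
    ("--", 17999), ("d-", 9979), ("-p", 6193), ("-c", 4250),
    ("-i", 2869), ("dp", 1853), ("dc", 1073), ("di", 889),
    ("l.", 876), ("i.", 629), ("n.", 533), ("-b", 66),
    ("db", 61), ("c.", 57), ("k.", 54),
)

total = sum(count for _, count in sense_freqs)
print(total)  # 47381: same as the number of results for `word sense`

k_count = dict(sense_freqs)["k."]
print(k_count)  # 54: same as the number of results for `word sense=k.`
```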


Let's show some of the rarer sense values:

In [7]:
results = A.search('''
word sense=k.
''')

  0.39s 54 results


In [8]:
A.table(results, end=5)

| n | p | word |
|---|---|---|
| 1 | Genesis 4:17 | יִּקְרָא֙ |
| 2 | Genesis 13:16 | שַׂמְתִּ֥י |
| 3 | Genesis 32:13 | שַׂמְתִּ֤י |
| 4 | Genesis 34:31 | יַעֲשֶׂ֖ה |
| 5 | Genesis 48:20 | יְשִֽׂמְךָ֣ |


If we do a pretty display, the `sense` feature shows up.

In [9]:
A.show(results, start=1, end=1, withNodes=True)

## actor from semantic

Let's find out about *actor*.

In [10]:
fl = F.actor.freqList()
len(fl)

411

In [11]:
fl[0:10]

(('JHWH', 358),
 ('BN JFR>L', 203),
 ('>JC', 103),
 ('2sm"YOUSgmas"', 66),
 ('MCH', 61),
 ('>RY', 58),
 ('>TM', 45),
 ('JFR>L', 35),
 ('NPC', 35),
 ('>X "YOUSgmas"', 34))

Which nodes have an actor feature?

In [12]:
{F.otype.v(n) for n in N() if F.actor.v(n)}

{'phrase_atom', 'subphrase'}
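The set comprehension tells you which node types carry `actor`, but not how many of each. Swapping the set for a `collections.Counter` gives the counts as well; in the notebook that would be `Counter(F.otype.v(n) for n in N() if F.actor.v(n))`. Below is a self-contained version in which made-up dictionaries stand in for `F.otype` and `F.actor`.

```python
from collections import Counter

# Made-up stand-ins for F.otype and F.actor: node -> value
otype = {1: "word", 2: "phrase_atom", 3: "phrase_atom", 4: "subphrase", 5: "word"}
actor = {2: "JHWH", 3: "KHN", 4: "MCH"}

# Count node types among the nodes that have an actor value
types_with_actor = Counter(otype[n] for n in sorted(otype) if actor.get(n))
print(types_with_actor)
```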

In [13]:
results = A.search('''
phrase_atom actor
''')

  0.18s 2073 results


Let's show some of the rarer actor values:

In [14]:
results = A.search('''
phrase_atom actor=KHN
''')

  0.27s 30 results


In [15]:
A.table(results)

| n | p | phrase_atom |
|---|---|---|
| 1 | Leviticus 17:5 | אֶל־הַכֹּהֵ֑ן |
| 2 | Leviticus 17:6 | זָרַ֨ק |
| 3 | Leviticus 17:6 | הַכֹּהֵ֤ן |
| 4 | Leviticus 17:6 | הִקְטִ֣יר |
| 5 | Leviticus 19:22 | כִפֶּר֩ |
| 6 | Leviticus 19:22 | הַכֹּהֵ֜ן |
| 7 | Leviticus 21:1 | אֶל־הַכֹּהֲנִ֖ים |
| 8 | Leviticus 21:1 | בְּנֵ֣י אַהֲרֹ֑ן |
| 9 | Leviticus 21:5 | יִקְרְח֤וּ |
| 10 | Leviticus 21:5 | יְגַלֵּ֑חוּ |


In [16]:
A.show(results, start=1, end=1)

## heads from lingo

Now, `heads` is an edge feature. We cannot make it directly visible in pretty displays, but we can use it in queries.

We also want to make the feature `sense` visible, so we mention it in the query as `sense*`, which does not restrict the results.

In [17]:
results = A.search('''
book book=Genesis
  chapter chapter=1
    clause
      phrase
      -heads> word sense*
'''
)

  0.57s 402 results


We make the feature `sense` visible:

In [18]:
A.show(results, start=1, end=3, withNodes=True)

Note how the words that are **_heads_** of their phrases are highlighted within their phrases.
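Conceptually, an edge feature maps a node to a set of other nodes, and the query relation `-heads>` walks along those edges (in TF the outgoing edges of a node can be read with `E.heads.f(n)`). The sketch below mimics the pairs matched by `phrase -heads> word` on made-up data; the node numbers, the two-head phrase, and the helper `heads_pairs` are all hypothetical.

```python
# Made-up edge feature: phrase node -> set of word nodes it points to
heads = {
    100: {1},     # phrase 100 points to word 1
    101: {3, 5},  # made-up fan-out: an edge feature may point to several nodes
}
# Made-up containment: phrase node -> the word nodes inside it
phrase_words = {100: [1, 2], 101: [3, 4, 5], 102: [6]}


def heads_pairs(heads, phrase_words):
    """Yield (phrase, word) pairs linked by the edge, like `phrase -heads> word`."""
    for phrase in sorted(phrase_words):
        # Phrases without outgoing edges (such as 102) yield nothing
        for word in sorted(heads.get(phrase, ())):
            yield phrase, word


pairs = list(heads_pairs(heads, phrase_words))
print(pairs)  # [(100, 1), (101, 3), (101, 5)]
```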

# All together!

Here is a query that shows results with all features.

In [19]:
results = A.search('''
book book=Leviticus
  phrase sense*
    phrase_atom actor=KHN
  -heads> word
''')

  0.74s 30 results


In [20]:
A.displaySetup(condensed=True, condenseType='verse')
A.show(results, start=8, end=8)
A.displaySetup()

# Features from custom locations

If you want to load your features from your own local GitHub repositories, instead of from the data
that TF has downloaded for you into `~/text-fabric-data`, you can do so by passing the parameter `checkout='clone'`.

In [21]:
A = use('bhsa', checkout='clone', hoist=globals())

Using TF-app in /Users/dirk/github/annotation/app-bhsa/code:
	repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/bhsa/tf/c:
	repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/phono/tf/c:
	repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/parallels/tf/c:
	repo clone offline under ~/github (local github)


Hover over the features to see where they come from: they now come from your local GitHub clones.
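The log lines show the difference in where data is read from: `~/text-fabric-data/etcbc/bhsa/tf/c` by default versus `~/github/etcbc/bhsa/tf/c` with `checkout='clone'`. The hypothetical helper below only reconstructs that directory layout from the two log outputs in this notebook; it is an illustration, not TF's own path logic.

```python
from pathlib import PurePosixPath


def data_dir(org, repo, path, version, checkout=""):
    """Hypothetical helper: the data directory as printed in the logs above.

    checkout='clone' -> your own clone under ~/github
    otherwise        -> TF's download area under ~/text-fabric-data
    """
    base = "~/github" if checkout == "clone" else "~/text-fabric-data"
    # PurePosixPath does not expand '~', so the result mirrors the log text
    return str(PurePosixPath(base, org, repo, path, version))


print(data_dir("etcbc", "bhsa", "tf", "c"))
print(data_dir("etcbc", "bhsa", "tf", "c", checkout="clone"))
```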

You may load extra features by specifying locations and modules manually.

Here we get the valence features, not as a module but in a custom way.

In [22]:
A = use('bhsa', locations='~/text-fabric-data/etcbc/valence/tf', modules='c', hoist=globals())

Using TF-app in /Users/dirk/github/annotation/app-bhsa/code:
	repo clone offline under ~/github (local github)
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/bhsa/tf/c:
	rv1.6 (latest release)
	connecting to online GitHub repo etcbc/phono ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/phono/tf/c:
	r1.2 (latest release)
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/parallels/tf/c:
	r1.2 (latest release)


Still, all features of the main corpus and the standard modules have been loaded.

Using `locations` and `modules` is useful if you want to load extra features from custom locations on your computer.
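A way to picture `locations` and `modules`: a module is a subdirectory of a location, so the candidate feature directories are the combinations of the two. The sketch below assumes TF searches every location/module combination; `candidate_dirs` is a hypothetical helper, and it takes lists rather than the single strings used in the cell above.

```python
import os
from itertools import product


def candidate_dirs(locations, modules):
    """Combine every location with every module subdirectory.

    Sketch under the assumption that feature files are looked up in each
    location/module combination; '~' is expanded to the home directory.
    """
    return [
        os.path.join(os.path.expanduser(loc), mod)
        for loc, mod in product(locations, modules)
    ]


# Mirrors the call above: one location, one module
print(candidate_dirs(["~/text-fabric-data/etcbc/valence/tf"], ["c"]))
```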

# Fewer features

If you want to load fewer features,
you can set up TF in the traditional way first, and then wrap the app API around it.

Here we load just the minimal set of features to get going.

In [23]:
from tf.fabric import Fabric

In [24]:
TF = Fabric(locations='~/github/etcbc/bhsa/tf', modules='c')

This is Text-Fabric 7.5.4
Api reference : https://annotation.github.io/text-fabric/Api/Fabric/

114 features found and 0 ignored


In [25]:
api = TF.load('pdp vs vt gn nu ps lex')

  0.00s loading features ...
   |     0.10s B lex                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.11s B pdp                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.11s B vs                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.11s B vt                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.08s B gn                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.10s B nu                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.10s B ps                   from /Users/dirk/github/etcbc/bhsa/tf/c
  4.58s All features loaded/computed - for details use loadLog()


And finally we wrap the app around it:

In [26]:
A = use('bhsa', api=api, hoist=globals())

Using TF-app in /Users/dirk/github/annotation/app-bhsa/code:
	repo clone offline under ~/github (local github)


This loads much quicker.

A small test: what are the verbal stems?

In [27]:
F.vs.freqList()

(('NA', 352874),
 ('qal', 50205),
 ('hif', 9407),
 ('piel', 6811),
 ('nif', 4145),
 ('hit', 960),
 ('peal', 654),
 ('pual', 492),
 ('hof', 427),
 ('hsht', 172),
 ('haf', 163),
 ('pael', 88),
 ('htpe', 53),
 ('peil', 40),
 ('htpa', 30),
 ('shaf', 15),
 ('etpa', 8),
 ('hotp', 8),
 ('pasq', 6),
 ('poel', 5),
 ('tif', 5),
 ('afel', 4),
 ('etpe', 3),
 ('htpo', 3),
 ('nit', 3),
 ('poal', 3))
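The frequency list also supports quick arithmetic. Words without a verbal stem have the value `NA`, so summing the other counts gives the number of words that do have one, and summing everything gives the total number of words with a `vs` value. Assuming every word node has a `vs` value (as the `NA` entries suggest), that total should equal the number of words, which you could check in the notebook with `len(F.otype.s('word'))`. The counts below are copied from the output above.

```python
# (value, count) pairs as reported by F.vs.freqList() above
vs_freqs = (
    ("NA", 352874), ("qal", 50205), ("hif", 9407), ("piel", 6811),
    ("nif", 4145), ("hit", 960), ("peal", 654), ("pual", 492),
    ("hof", 427), ("hsht", 172), ("haf", 163), ("pael", 88),
    ("htpe", 53), ("peil", 40), ("htpa", 30), ("shaf", 15),
    ("etpa", 8), ("hotp", 8), ("pasq", 6), ("poel", 5),
    ("tif", 5), ("afel", 4), ("etpe", 3), ("htpo", 3),
    ("nit", 3), ("poal", 3),
)

with_stem = sum(count for value, count in vs_freqs if value != "NA")
all_words = sum(count for _, count in vs_freqs)
share = with_stem / all_words

print(with_stem, all_words)  # 73710 426584
print(f"{share:.1%}")        # 17.3%
```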

# Next steps

* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbocharge your hand-coding with search templates
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[export](export.ipynb)** export your dataset as an Emdros database

Back to [start](start.ipynb)