## 0. Setup

This notebook queries the BHSA data compiled by the ETCBC (https://etcbc.github.io/bhsa/) in two separate searches: one for clauses in which a verb governs an object phrase, and one for clauses in which a verb governs a complement phrase.

The results of the two searches are combined and then grouped by verbal root and stem. Clauses in which an object and a complement both appear are removed, and the root + stem combos that are found governing both objects and complements are added to the final data set for further investigation.

Note that the definition of a complement used for the linguistic annotation is not entirely clear to me, and it seems that only a small subset of these examples are cases where the prepositional complement fills a semantic role similar to that of the object phrases.

In [1]:
# these statements import the packages we need to run the scripts

# 1. This is how you import a text-fabric data object and populate it with the BHSA data
from tf.app import use
A = use('bhsa', hoist=globals())

# 2. This is a Python package that I will use to do the grouping as I aggregate the data
from itertools import groupby

This is Text-Fabric 9.1.6
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

122 features found and 0 ignored


In [2]:
# this is a function I wrote to return a 'book chap:verse' reference string for a given node
def reference(node):
    book = L.u(node, 'book')
    chap = L.u(node, 'chapter')
    verse = L.i(node, 'verse')
    bk = Fs("book@en").v(book[0])
    ch = F.chapter.v(chap[0])
    vs = F.verse.v(verse[0])
    ref = f'{bk} {ch}:{vs}'
    return ref

In [3]:
# This is a function I wrote to return the preposition used to mark the complement node.
# Basically it walks down from the phrase node to the first word and stops there. If that 
# word is a preposition it returns the lexeme. If there is no preposition at the head of 
# the phrase it returns '0'.
#
# I have to do it this way because there may be multiple embedded phrases underneath the 
# main node that hit from the search (e.g., if it is a coordinated phrase). If I just 
# search for any preposition within the main phrase, there is no guarantee that it is the 
# one actually marking the complement (it could be embedded in a lower level phrase). 
#
# Note that in the case where it is a coordinated phrase, I only return the first preposition. 
# I assume prepositional marking would probably be the same for each complement but it
# is possible that assumption is wrong.

def case_prep(node):
    prep = '0'
    for n in L.d(node):
        cat = F.otype.v(n)
        sp = F.sp.v(n)
        if (cat == 'word'): 
            if sp == 'prep':
                prep = F.lex_utf8.v(n)
            break
    return prep
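
The difference between checking only the head of the phrase and searching for any preposition inside it can be sketched without Text-Fabric. This is a minimal illustration, not the function above: the helper names `first_word_prep` and `any_prep` are hypothetical, and each toy word is a made-up (lexeme, part of speech) pair rather than a BHSA node.

```python
# Each toy word is a (lexeme, part_of_speech) pair; all values are placeholders.
def first_word_prep(words):
    """Return the lexeme of the first word if it is a preposition, else '0'."""
    if words and words[0][1] == 'prep':
        return words[0][0]
    return '0'

def any_prep(words):
    """Return the first preposition found anywhere in the phrase, else '0'."""
    for lexeme, pos in words:
        if pos == 'prep':
            return lexeme
    return '0'

# A phrase headed by a noun, with a preposition embedded further in.
phrase = [('NOUN_1', 'subs'), ('PREP_X', 'prep'), ('NOUN_2', 'subs')]

print(first_word_prep(phrase))
print(any_prep(phrase))
```

The first function treats the phrase as unmarked because its head is not a preposition, while the second picks up the embedded preposition, which is the false positive the comments above are trying to avoid.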

## 1. Querying BHSA data

There are several ways to get data from the BHSA in Text-Fabric. One of the easiest to understand is the search method. Basically, we build a search template (stored as a string) and then use the A.search() method to return the results as a list.

Here is some basic info about building searches: (https://annotation.github.io/text-fabric/tf/about/searchusage.html#usage)

Here is a list of the features that are available: (https://etcbc.github.io/bhsa/features/0_home/)

The results list will have one set of nodes that correspond to the discrete items in the template for each hit. In the templates I am using below there are 4 items, so we get sets of 4 nodes. Each node is represented by an id number that we can use to pull feature level data using text fabric methods.

Note that in Python my larger structure is a list, indicated by brackets []. The inner structures for the results of each hit are called tuples, indicated by parentheses (). The nice thing about Python is that it is relatively simple to iterate through lists in a for-loop without needing to know how long it is or setting any conditions. I can also index specific elements of a list but NB that indexing starts at 0 (so to get the first element, you want results[0])! 
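
As a minimal sketch of the structure just described, the following uses a hand-made list in place of real `A.search()` output. The numbers are placeholders, not actual BHSA node ids.

```python
# Toy stand-in for search results: a list of 4-item tuples.
# The numbers are placeholders, not real BHSA node ids.
results = [
    (101, 201, 301, 401),
    (102, 202, 302, 402),
    (103, 203, 303, 403),
]

# Indexing starts at 0, so this is the first hit.
first_hit = results[0]

# A tuple can be unpacked into named variables, one per template item.
clause, verb_phrase, verb_word, obj_phrase = first_hit

# A for-loop visits every hit without needing to know the list length.
verb_words = []
for (clause, verb_phrase, verb_word, obj_phrase) in results:
    verb_words.append(verb_word)

print(first_hit)
print(verb_words)
```

Unpacking the tuple in the `for` line is the same pattern used in the loops further down, where each position gets a readable name instead of an index.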

In [4]:
# Query for all verb-object clauses
obj_query = '''
clause
    phrase function=Pred|PreC|PreS|PreO
        word sp=verb language=Hebrew     
    phrase function=Objc           
'''

In [5]:
# Query for all verb-complement clauses
cmpl_query = '''
clause
    phrase function=Pred|PreC|PreS|PreO
        word sp=verb language=Hebrew     
    phrase function=Cmpl           
'''

In [6]:
# store object results in a list
obj_results = A.search(obj_query)

  1.52s 21114 results


In [7]:
# store complement results in a list
cmpl_results = A.search(cmpl_query)

  1.56s 28094 results


In [8]:
# This loops through each hit in the results list and pulls feature data from the relevant nodes
# 1. I use the clause node to get our reference string with the function defined above
# 2. I use lex_utf8 to get the verbal root
# 3. I use vs to get the verbal stem
# 4. I use my case_prep function to get the preposition 
# 5. I append this new set of data to the new list

obj_data = []
for (clause, verb_phrase, verb_word, obj_phrase) in obj_results:
    ref = reference(clause)
    verb = F.lex_utf8.v(verb_word)
    stem = F.vs.v(verb_word)
    prep = case_prep(obj_phrase)
    obj_data.append((ref, clause, verb, stem, prep, 'obj'))

In [9]:
# Same thing but now for the complements

cmpl_data = []
for (clause, verb_phrase, verb_word, cmpl_phrase) in cmpl_results:
    ref = reference(clause)
    verb = F.lex_utf8.v(verb_word)
    stem = F.vs.v(verb_word)
    prep = case_prep(cmpl_phrase)
    cmpl_data.append((ref, clause, verb, stem, prep, 'cmpl'))

In [10]:
# Combine the two lists into 1
full_data = obj_data + cmpl_data

In [11]:
# Here is a sample of the first 10 results in the full list
full_data[0:10]

[('Genesis 1:1', 427559, 'ברא', 'qal', 'את', 'obj'),
 ('Genesis 1:4', 427566, 'ראה', 'qal', 'את', 'obj'),
 ('Genesis 1:5', 427569, 'קרא', 'qal', '0', 'obj'),
 ('Genesis 1:5', 427570, 'קרא', 'qal', '0', 'obj'),
 ('Genesis 1:7', 427577, 'עשׂה', 'qal', 'את', 'obj'),
 ('Genesis 1:8', 427582, 'קרא', 'qal', '0', 'obj'),
 ('Genesis 1:10', 427590, 'קרא', 'qal', '0', 'obj'),
 ('Genesis 1:10', 427591, 'קרא', 'qal', '0', 'obj'),
 ('Genesis 1:11', 427595, 'דשׁא', 'hif', '0', 'obj'),
 ('Genesis 1:11', 427596, 'זרע', 'hif', '0', 'obj')]

## 2. Aggregating the Results

Now we need to do some grouping in order to rearrange the results into something more useful. In this step I will use the groupby function. It is slightly complicated, but basically I need to sort our list by the keys we want to use for the grouping and then apply the groupby.

Our data has the following structure:

     0 - reference str
     1 - clause id
     2 - verbal root
     3 - verbal stem
     4 - preposition
     5 - phrase type (obj or cmpl)
     
So, if I want to first group by verb root and stem, then the keys will be index [2] and [3]. 

After this main level of grouping, I will separate out all the examples where the root-stem combo governed an object phrase and then sub-group the complements again based on the preposition used to mark it.
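
Here is a small sketch of why the sort has to come before the groupby. The records follow the six-field layout listed above, but every value is a placeholder rather than real BHSA data.

```python
from itertools import groupby

# Hypothetical records in the 6-field layout; all values are placeholders.
records = [
    ('Ref 1', 11, 'ROOT_A', 'qal', '0', 'obj'),
    ('Ref 2', 12, 'ROOT_B', 'qal', 'PREP_X', 'cmpl'),
    ('Ref 3', 13, 'ROOT_A', 'qal', 'PREP_X', 'cmpl'),
    ('Ref 4', 14, 'ROOT_A', 'hif', '0', 'obj'),
]

def root_stem(rec):
    """Grouping key: (verbal root, verbal stem)."""
    return (rec[2], rec[3])

# groupby only merges records that sit next to each other, so on the
# unsorted list the ROOT_A qal records end up in two separate groups.
unsorted_keys = [key for key, _ in groupby(records, root_stem)]

# Sorting by the same key first puts matching records side by side.
sorted_groups = [
    (key, list(group))
    for key, group in groupby(sorted(records, key=root_stem), root_stem)
]

print(unsorted_keys)
for key, group in sorted_groups:
    print(key, len(group))
```

Note that each group is converted with `list(group)` right away, since the group iterator is no longer usable once groupby moves on to the next key.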

In [12]:
# In this cell I will group by verb + stem combo and store the results in a new list

# 1. empty list for results
verb_groups = []

# 2. sort in place by root, stem, and clause
full_data.sort(key = lambda x: (x[2], x[3], x[1]))

# 3. create groups and add to new list
for key, group in groupby(full_data, lambda x: (x[2], x[3])):
    verb_groups.append((key, list(group)))

In [18]:
# My new grouped list is slightly more complicated. 
#     1. For each root-stem combo there is a main tuple in the list
#     2. This tuple is subdivided into the (root, stem) key and a 
#        second list of all the data that belong to that pair

index = 2
print(f'root-stem key for record at index {index}')
print(verb_groups[index][0])
print()
print(f'objects/complements associated with that key:')
print()
for data in verb_groups[index][1]:
    print(data)

root-stem key for record at index 2
('אבד', 'qal')

objects/complements associated with that key:

('Leviticus 26:38', 440796, 'אבד', 'qal', 'ב', 'cmpl')
('Numbers 16:33', 442785, 'אבד', 'qal', 'מן', 'cmpl')
('Deuteronomy 4:26', 445433, 'אבד', 'qal', 'מן', 'cmpl')
('Deuteronomy 7:20', 445814, 'אבד', 'qal', 'מן', 'cmpl')
('Deuteronomy 11:17', 446217, 'אבד', 'qal', 'מן', 'cmpl')
('Deuteronomy 22:3', 447229, 'אבד', 'qal', 'מן', 'cmpl')
('Deuteronomy 28:20', 447856, 'אבד', 'qal', 'מן', 'cmpl')
('Joshua 23:13', 450940, 'אבד', 'qal', 'מן', 'cmpl')
('Joshua 23:16', 450961, 'אבד', 'qal', 'מן', 'cmpl')
('1_Samuel 9:20', 454907, 'אבד', 'qal', 'ל', 'cmpl')
('Jeremiah 18:18', 475990, 'אבד', 'qal', 'מן', 'cmpl')
('Jeremiah 25:35', 476755, 'אבד', 'qal', 'מן', 'cmpl')
('Jeremiah 49:7', 479276, 'אבד', 'qal', 'מן', 'cmpl')
('Ezekiel 7:26', 480663, 'אבד', 'qal', 'מן', 'cmpl')
('Amos 2:14', 486788, 'אבד', 'qal', 'מן', 'cmpl')
('Micah 7:2', 488025, 'אבד', 'qal', 'מן', 'cmpl')
('Zechariah 9:5', 489555, 'אב

In [19]:
# Now I want to do some more complicated grouping. For each root-stem combo
# the following code will:
#     1. group the data by clause number in order to eliminate examples where 
#        an obj and cmpl occur in the same clause (these aren't candidates for 
#        alternation) 
#     2. separate objs and cmpls into separate lists
#     3. group data in cmpl list by the preposition used to mark the complement
#
# All of this is stored in the final results list of candidate verbs

cand_verbs = []
for verb in verb_groups:
    clause_groups = []
    for key, group in groupby(verb[1], lambda x: x[1]):
        clause_groups.append((key, list(group)))
        
    cmps = []
    objs = []
    for c in clause_groups:
        # c[1] indexes the data within the clause group
        # if it has one element it is good, 2 or more means obj + cmpl in the same clause
        if len(c[1]) == 1:    
            if c[1][0][5] == 'obj':
                objs.append(c[1][0])
            elif c[1][0][5] == 'cmpl':
                cmps.append(c[1][0])
    
    # a root+stem combo that has both objs and cmps is a candidate to check the alternation 
    # group the cmps by preposition and append to final result list
    if objs and cmps:
        prep_groups = []
        for key, group in groupby(sorted(cmps, key = lambda x: x[4]), lambda x: x[4]):
            prep_groups.append((key, list(group)))
        cand_verbs.append((verb[0], objs, prep_groups))
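
The three steps in the cell above can be traced on a handful of made-up records. This sketch uses dictionaries from `collections.defaultdict` instead of groupby, purely as an alternative way to see the same logic. It is not the code that produced the results in this notebook, and all values are placeholders.

```python
from collections import defaultdict

# Placeholder records for one root-stem pair, in the 6-field layout.
# Clause 12 has both an obj and a cmpl, so it should be dropped.
records = [
    ('Ref 1', 11, 'ROOT_A', 'qal', '0', 'obj'),
    ('Ref 2', 12, 'ROOT_A', 'qal', '0', 'obj'),
    ('Ref 2', 12, 'ROOT_A', 'qal', 'PREP_X', 'cmpl'),
    ('Ref 3', 13, 'ROOT_A', 'qal', 'PREP_X', 'cmpl'),
    ('Ref 4', 14, 'ROOT_A', 'qal', 'PREP_Y', 'cmpl'),
]

# Step 1: collect the records of each clause under its clause id,
# then keep only the clauses that produced exactly one record.
by_clause = defaultdict(list)
for rec in records:
    by_clause[rec[1]].append(rec)
singles = [recs[0] for recs in by_clause.values() if len(recs) == 1]

# Step 2: separate objects and complements.
objs = [rec for rec in singles if rec[5] == 'obj']
cmps = [rec for rec in singles if rec[5] == 'cmpl']

# Step 3: sub-group the complement references by preposition.
by_prep = defaultdict(list)
for rec in cmps:
    by_prep[rec[4]].append(rec[0])

print(len(objs))
print(dict(by_prep))
```

A dictionary does not require the data to be sorted first, which is the main practical difference from the groupby version.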

## 3. Looking at the results

Now I will give you some functions to look through the list of root-stem pairs that are candidates to participate in an object-prepositional complement alternation. There are a lot of these, so I could also output the list to an Excel file or something similar so that you can scan it all at once.
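
One way such an export could look is a CSV file, which Excel can open. This is only a sketch: the function name `write_summary`, the column names, and the toy candidate list are all hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Placeholder candidate list in the same shape as cand_verbs:
# ((root, stem), [obj records], [(prep, [cmpl records]), ...])
toy_cands = [
    (
        ('ROOT_A', 'qal'),
        [('Ref 1', 11, 'ROOT_A', 'qal', '0', 'obj')],
        [('PREP_X', [('Ref 3', 13, 'ROOT_A', 'qal', 'PREP_X', 'cmpl')])],
    ),
]

def write_summary(cands, handle):
    """Write one row per root-stem-preposition combination."""
    writer = csv.writer(handle)
    writer.writerow(['root', 'stem', 'obj_count', 'prep', 'cmpl_count'])
    for (root, stem), objs, prep_groups in cands:
        for prep, cmpls in prep_groups:
            writer.writerow([root, stem, len(objs), prep, len(cmpls)])

# An in-memory buffer is used here so the sketch does not touch the disk.
buffer = io.StringIO()
write_summary(toy_cands, buffer)
print(buffer.getvalue())
```

To write an actual file, the buffer could be replaced with something like `open('candidates.csv', 'w', newline='', encoding='utf-8')`, where the file name is just an example. Excel may display Hebrew more reliably with `encoding='utf-8-sig'`, but that is worth checking on your own system.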

In [108]:
# How many root-stem pairs ended up as candidates?
len(cand_verbs)

669

In [21]:
# Let's look at some examples
for verb in cand_verbs[:15]:
    print(f'{verb[0][0]} {verb[0][1]} occurs:')
    print(f'    {len(verb[1])}x with an obj')
    for prep in verb[2]:
        print(f'    {len(prep[1])}x with {prep[0]} marking a complement')
    print()
    

אבד hif occurs:
    11x with an obj
    1x with מן marking a complement

אבה qal occurs:
    1x with an obj
    3x with ל marking a complement

אהב qal occurs:
    88x with an obj
    2x with ב marking a complement
    3x with ל marking a complement

אוה hit occurs:
    7x with an obj
    3x with ל marking a complement

אור hif occurs:
    17x with an obj
    1x with אל marking a complement
    3x with ל marking a complement
    1x with מן marking a complement
    3x with על marking a complement

אזן hif occurs:
    15x with an obj
    4x with אל marking a complement
    4x with ל marking a complement
    2x with עד marking a complement
    1x with על marking a complement

אזר qal occurs:
    4x with an obj
    1x with ב marking a complement

אחז qal occurs:
    13x with an obj
    21x with ב marking a complement

אחר piel occurs:
    2x with an obj
    1x with ל marking a complement
    1x with על marking a complement

איב qal occurs:
    2x with an obj
    2x with ל marking a complem

In [24]:
# To return one record at a time replace the value of n with the index of the root-stem combo you want
n = 5
verb = cand_verbs[n]
print(verb[0])
print()
print(f'    objects')
for clause in verb[1]:
    print(f'          {clause[0]}')
print()
for prep in verb[2]:
    print(f'    {prep[0]} complements')
    for clause in prep[1]:
        print(f'          {clause[0]}')

('אזן', 'hif')

    objects
          Genesis 4:23
          Isaiah 1:10
          Isaiah 32:9
          Isaiah 42:23
          Psalms 5:2
          Psalms 17:1
          Psalms 39:13
          Psalms 55:2
          Psalms 78:1
          Psalms 86:6
          Psalms 140:7
          Psalms 141:1
          Job 9:16
          Job 33:1
          Job 37:14

    אל complements
          Deuteronomy 1:45
          Isaiah 51:4
          Psalms 77:2
          Psalms 143:1
    ל complements
          Exodus 15:26
          Psalms 54:4
          Job 34:2
          Job 34:16
    עד complements
          Numbers 23:18
          Job 32:11
    על complements
          Proverbs 17:4


In [38]:
# To build Accordance search strings, again, replace n with the index of the root-stem combo you want
# you should be able to just copy paste the string into Accordance, though some of the stem 
# abbreviations might be different so let me know if you run into problems

n = 5
verb = cand_verbs[n]
obj_refs = []
for a in verb[1]:
    obj_refs.append(a[0])
ref_string = ", ".join(obj_refs)
acc_str = f'{verb[0][0]} @ [VERB {verb[0][1]}] <AND> [RANGE {ref_string}]'

print('objects:')
print()
print(acc_str)
print()

for prep in verb[2]:
    print(f'{prep[0]} complements:')
    print()
    refs = []
    for a in prep[1]:
        refs.append(a[0])
    acc_str = f'{verb[0][0]} @ [VERB {verb[0][1]}] <AND> [RANGE {", ".join(refs)}]'
    print(acc_str)


objects:

אזן @ [VERB hif] <AND> [RANGE Genesis 4:23, Isaiah 1:10, Isaiah 32:9, Isaiah 42:23, Psalms 5:2, Psalms 17:1, Psalms 39:13, Psalms 55:2, Psalms 78:1, Psalms 86:6, Psalms 140:7, Psalms 141:1, Job 9:16, Job 33:1, Job 37:14]

אל complements:

אזן @ [VERB hif] <AND> [RANGE Deuteronomy 1:45, Isaiah 51:4, Psalms 77:2, Psalms 143:1]
ל complements:

אזן @ [VERB hif] <AND> [RANGE Exodus 15:26, Psalms 54:4, Job 34:2, Job 34:16]
עד complements:

אזן @ [VERB hif] <AND> [RANGE Numbers 23:18, Job 32:11]
על complements:

אזן @ [VERB hif] <AND> [RANGE Proverbs 17:4]
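
Since the same string pattern is built twice in the cell above, it could be wrapped in a small helper. The function name `accordance_query` is hypothetical. It only reproduces the pattern already used above and does not translate stem abbreviations.

```python
def accordance_query(root, stem, refs):
    """Build a search string in the same pattern as the cell above."""
    ref_string = ", ".join(refs)
    return f'{root} @ [VERB {stem}] <AND> [RANGE {ref_string}]'

# Rebuild the last line of the output above from its parts.
print(accordance_query('אזן', 'hif', ['Proverbs 17:4']))
```

With this helper, the object string and each complement string would be produced by one call each, passing the list of references for that group.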
