# Find Split Ole Weyored

The objective of this notebook is to find the ole weyored accent, a composite accent that occurs only in the poetic accentuation system. It often sits on a single word but, because it is composite, its two parts (ole and yored) are sometimes split over two words.

We begin by importing the necessary Text-Fabric data.

In [1]:
from tf.fabric import Fabric

TF = Fabric(locations='../text-fabric-data', modules='hebrew/etcbc4c')
api = TF.load('trailer_utf8 g_word_utf8')

api.makeAvailableIn(globals())

This is Text-Fabric 2.3.2
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
114 features found and 0 ignored
  0.00s loading features ...
   |     0.18s B g_word_utf8          from /home/jcuenod/Programming/text-fabric-data/hebrew/etcbc4c
   |     0.07s B trailer_utf8         from /home/jcuenod/Programming/text-fabric-data/hebrew/etcbc4c
   |     0.00s Feature overview: 108 nodes; 5 edges; 1 configs; 7 computeds
  4.01s All features loaded/computed - for details use loadLog()


Next we collect the word nodes of the poetic books (Psalms, Proverbs and Job) and remove the blacklisted passages from the prose frame of Job:

In [2]:
poetic_passages = [
    ("Psalms",),
    ("Proverbs",),
    ("Job",),
]
poetic_passages_blacklist = [
    ("Job", 1),
    ("Job", 2),
    ("Job", 3, 1),
    ("Job", 42, 7),
    ("Job", 42, 8),
    ("Job", 42, 9),
    ("Job", 42, 11),
    ("Job", 42, 12),
    ("Job", 42, 13),
    ("Job", 42, 14),
    ("Job", 42, 15),
    ("Job", 42, 16),
    ("Job", 42, 17),
]

def createWordNodeListFromPassageTupleList(passage_list):
    """Return the word nodes of every section tuple in passage_list."""
    cumulative_node_list = []
    for passage in passage_list:
        filter_node = T.nodeFromSection(passage)
        # None signals that the section lookup failed
        if filter_node is not None:
            cumulative_node_list.extend(L.d(filter_node, otype='word'))
        else:
            print("Failed on", passage)
    return cumulative_node_list

poetic_passage_word_nodes = createWordNodeListFromPassageTupleList(poetic_passages)
poetic_passage_word_nodes_blacklist = createWordNodeListFromPassageTupleList(poetic_passages_blacklist)

# Test membership against a set; scanning the blacklist list for every word is slow
blacklisted_nodes = set(poetic_passage_word_nodes_blacklist)
reduced_list = [w for w in poetic_passage_word_nodes if w not in blacklisted_nodes]

print("Now we should have a list of nodes from poetic passages:")
print("--")
print("poetic_passage_word_nodes", len(poetic_passage_word_nodes))
print("poetic_passage_word_nodes_blacklist", len(poetic_passage_word_nodes_blacklist))
print("reduced_list", len(reduced_list))

Now we should have a list of nodes from poetic passages:
--
poetic_passage_word_nodes 45142
poetic_passage_word_nodes_blacklist 1060
reduced_list 44082


In [3]:
print("\nFinding accent units:")
accent_units = []
glue = {'', '־'}  # trailers that join a word to the next one: nothing, or maqqef
current_au = ""
current_au_nodes = []
for w in reduced_list:
    trailer = F.trailer_utf8.v(w)
    current_au += F.g_word_utf8.v(w) + trailer
    current_au_nodes.append(w)
    if trailer not in glue:
        accent_units.append({
            "accent_unit": current_au,
            "nodes": current_au_nodes
        })
        current_au = ""
        current_au_nodes = []
print("Found:", len(accent_units))


Finding accent units:
Found: 29649
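
The grouping rule in the cell above can be tried without loading the corpus. The following is a minimal sketch on made-up data: `group_accent_units` and the transliterated toy words are illustrative assumptions, not part of the notebook, and list positions stand in for Text-Fabric word nodes.

```python
# Trailers that keep a word attached to the following word:
# an empty trailer, or a maqqef (U+05BE).
GLUE = {"", "\u05BE"}

def group_accent_units(words):
    """Group (form, trailer) pairs into accent units.

    A unit is closed as soon as a word's trailer is not in GLUE.
    Positions in `words` stand in for word nodes.
    """
    units = []
    text, members = "", []
    for index, (form, trailer) in enumerate(words):
        text += form + trailer
        members.append(index)
        if trailer not in GLUE:
            units.append({"accent_unit": text, "nodes": members})
            text, members = "", []
    return units

# Toy input: the first word is joined to the second by a maqqef,
# the other two words end in a space.
toy_words = [("al", "\u05BE"), ("palge", " "), ("mayim", " ")]
units = group_accent_units(toy_words)
for unit in units:
    print(unit["nodes"], repr(unit["accent_unit"]))
```

With this input the first two words end up in one unit and the third in a unit of its own. As in the notebook cell, words after the last non-glue trailer would not be emitted.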


In [4]:
import re
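# Accents U+0591-U+05AE plus maqqef (U+05BE), paseq (U+05C0), meteg (U+05BD) and sof pasuq (U+05C3)
unicode_accent_range = '[\u0591-\u05AE\u05BE\u05C0\u05BD\u05C3]'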
ole_accent = "\u05AB"
yored_accent = "\u05A5"
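
As a quick sanity check on these code points, the standard-library `unicodedata` module can report their Unicode character names. This is only a sketch; `describe` is a hypothetical helper. Note that Unicode has no character named yored: U+05A5 is named merkha, and the notebook uses it for the yored half of the accent.

```python
import unicodedata

ole_accent = "\u05AB"
yored_accent = "\u05A5"

def describe(mark):
    """Return the code point and Unicode name of a single character."""
    return "U+{:04X} {}".format(ord(mark), unicodedata.name(mark))

# The two halves of the composite accent, followed by the four
# individual marks that the character class lists after its range.
for mark in (ole_accent, yored_accent, "\u05BE", "\u05C0", "\u05BD", "\u05C3"):
    print(describe(mark))
```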

Let's find the Ole Weyoreds that are found on single words:

In [5]:
ole_weyored = [ole_accent, yored_accent]
max_printed = 9
ole_weyoreds_found = 0
for au in accent_units:
    accent_matches = re.findall(unicode_accent_range, au["accent_unit"])
    # Unsplit ole weyored: the unit's only marks are an ole followed by a yored
    if accent_matches == ole_weyored:
        ole_weyoreds_found += 1
        if ole_weyoreds_found <= max_printed:
            print(T.sectionFromNode(au["nodes"][0]), au["accent_unit"])
        elif ole_weyoreds_found == max_printed + 1:
            print("(only printing the first 9)")
print("Found altogether:", ole_weyoreds_found)

('Psalms', 1, 1) רְשָׁ֫עִ֥ים 
('Psalms', 1, 2) חֶ֫פְצֹ֥ו 
('Psalms', 3, 3) לְנַ֫פְשִׁ֥י 
('Psalms', 4, 9) וְאִ֫ישָׁ֥ן 
('Psalms', 5, 7) כָ֫זָ֥ב 
('Psalms', 5, 10) הַ֫וֹּ֥ות 
('Psalms', 5, 13) צַ֫דִּ֥יק 
('Psalms', 7, 1) לְדָ֫וִ֥ד 
('Psalms', 7, 9) עַ֫מִּ֥ים 
(only printing the first 9)
Found altogether: 265
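
The exact-equality test used above can be illustrated on a single string. This is a sketch under stated assumptions: `is_unsplit_ole_weyored` is a hypothetical helper, the first string spells the Psalms 1:1 word with an assumed order of combining marks, and the second is an invented variant with a meteg added.

```python
import re

unicode_accent_range = '[\u0591-\u05AE\u05BE\u05C0\u05BD\u05C3]'
ole_weyored = ["\u05AB", "\u05A5"]

def is_unsplit_ole_weyored(accent_unit):
    """True if the unit's marks are exactly an ole followed by a yored."""
    return re.findall(unicode_accent_range, accent_unit) == ole_weyored

# The first word of the output above, written code point by code point
# (resh, sheva, shin, shin dot, qamats, OLE, ayin, hiriq, MERKHA, yod, final mem).
plain = "\u05E8\u05B0\u05E9\u05C1\u05B8\u05AB\u05E2\u05B4\u05A5\u05D9\u05DD"

# Invented variant: the same word with a meteg (U+05BD) after the sheva.
with_meteg = plain[:2] + "\u05BD" + plain[2:]

print(is_unsplit_ole_weyored(plain))
print(is_unsplit_ole_weyored(with_meteg))
```

Because the character class also matches meteg, maqqef, paseq and sof pasuq, a unit that carries any of these marks in addition to the ole and yored does not pass the equality test, so the count above covers only units whose marks are exactly those two.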


Now let's try to find the Ole Weyoreds split across two words:

In [6]:
# Text of the preceding accent unit when its last mark was an ole, else None
previous_ole_unit = None
for au in accent_units:
    accent_matches = re.findall(unicode_accent_range, au["accent_unit"])
    if accent_matches:
        if accent_matches[-1] == ole_accent:
            # The last mark of this unit is an ole: remember it and check the next unit
            previous_ole_unit = au["accent_unit"]
            continue
        elif previous_ole_unit is not None:
            # The previous unit ended in an ole; report the pair if this unit has a yored
            if yored_accent in au["accent_unit"]:
                print(T.sectionFromNode(au["nodes"][0]), previous_ole_unit + " " + au["accent_unit"])
    previous_ole_unit = None

('Psalms', 1, 3) עַֽל־פַּלְגֵ֫י  מָ֥יִם 
('Psalms', 4, 7) מִֽי־יַרְאֵ֪נוּ֫  טֹ֥וב 
('Psalms', 6, 3) אֻמְלַ֫ל  אָ֥נִי 
('Psalms', 8, 3) יִסַּ֪דְתָּ֫  עֹ֥ז 
('Psalms', 14, 4) כָּל־פֹּ֪עֲלֵ֫י  אָ֥וֶן 
('Psalms', 18, 44) מֵרִ֪יבֵ֫י  עָ֥ם 
('Psalms', 28, 3) וְעִם־פֹּ֪עֲלֵ֫י  אָ֥וֶן 
('Psalms', 30, 8) לְֽהַרְרִ֫י  עֹ֥ז 
('Psalms', 31, 19) שִׂפְתֵ֫י  שָׁ֥קֶר 
('Psalms', 31, 21) מֵֽרֻכְסֵ֫י  אִ֥ישׁ 
('Psalms', 37, 7) וְהִתְחֹ֪ולֵ֫ל  לֹ֥ו 
('Psalms', 40, 18) יַחֲשָׁ֫ב  לִ֥י 
('Psalms', 44, 4) לֹא־הֹושִׁ֪יעָ֫ה  לָּ֥מֹו 
('Psalms', 45, 8) וַתִּשְׂנָ֫א  רֶ֥שַׁע 
('Psalms', 53, 3) עַֽל־בְּנֵ֫י  אָדָ֥ם 
('Psalms', 53, 5) פֹּ֤עֲלֵ֫י  אָ֥וֶן 
('Psalms', 53, 6) לֹא־הָ֪יָה֫  פָ֥חַד 
('Psalms', 56, 9) סָפַ֪רְתָּ֫ה  אָ֥תָּה 
('Psalms', 62, 10) בְּנֵ֫י  אִ֥ישׁ 
('Psalms', 88, 1) לִבְנֵ֫י  קֹ֥רַח 
('Psalms', 88, 10) מִנִּ֫י  עֹ֥נִי 
('Psalms', 97, 10) שִׂנְא֫וּ  רָ֥ע 
('Psalms', 102, 3) צַ֫ר  לִ֥י 
('Psalms', 115, 1) לֹ֫א  לָ֥נוּ 
('Psalms', 130, 7) אֶל־יְה֫וָה  כִּֽי־עִם־יְהוָ֥ה 
('Psalms', 142, 7) כִּֽי־ד