
You might want to consider the [start](search.ipynb) of this tutorial.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.app import use

In [3]:
VERSION = '2017'

In [4]:
A = use('bhsa', hoist=globals(), version=VERSION)

Using etcbc/bhsa - 2017 r1.4 in ~/text-fabric-data/etcbc/bhsa/tf/2017
Using etcbc/phono - 2017 r1.1 in ~/text-fabric-data/etcbc/phono/tf/2017
Using etcbc/parallels - 2017 r1.1 in ~/text-fabric-data/etcbc/parallels/tf/2017


**Documentation:** <a target="_blank" href="https://etcbc.github.io/bhsa" title="provenance of BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis">BHSA</a> <a target="_blank" href="https://dans-labs.github.io/text-fabric/Writing/Hebrew" title="('Hebrew characters and transcriptions',)">Character table</a> <a target="_blank" href="https://etcbc.github.io/bhsa/features/hebrew/2017/0_home.html" title="BHSA feature documentation">Feature docs</a> <a target="_blank" href="https://dans-labs.github.io/text-fabric/Api/Bhsa/" title="bhsa API documentation">bhsa API</a> <a target="_blank" href="https://dans-labs.github.io/text-fabric/Api/General/" title="text-fabric-api">Text-Fabric API 6.4.6</a> <a target="_blank" href="https://dans-labs.github.io/text-fabric/Api/General/#search-templates" title="Search Templates Introduction and Reference">Search Reference</a>


This notebook online:
<a target="_blank" href="https://nbviewer.jupyter.org/github/etcbc/bhsa/blob/master/tutorial/searchQuantifiers.ipynb">NBViewer</a>
<a target="_blank" href="https://github.com/etcbc/bhsa/blob/master/tutorial/searchQuantifiers.ipynb">GitHub</a>


# Quantifiers

Quantifiers add considerable power to search templates.

Quantifiers consist of full-fledged search templates themselves, and give rise to 
auxiliary searches being performed.

The use of quantifiers may prevent the need to resort to hand-coding in many cases.
That said, they can also be exceedingly tricky, so that it is advisable to check the results
by hand-coding anyway, until you are perfectly comfortable with them.

# Examples

## Lexemes

It is easy to find the lexemes that occur in a specific book only,
because the `lex` node of such a lexeme is contained in the node of that specific book.

Let's get the lexemes specific to Ezra and then those specific to Nehemiah.

In [5]:
query = '''
book book@en=Ezra
    lex
'''
ezLexemes = A.search(query)
ezSet = {r[1] for r in ezLexemes}

query = '''
book book@en=Nehemiah
    lex
'''
nhLexemes = A.search(query)
nhSet = {r[1] for r in nhLexemes}

print(f'Total {len(ezSet | nhSet)} lexemes')

  0.01s 199 results
  0.01s 110 results
Total 309 lexemes


What if we want the lexemes that occur only in Ezra and Nehemiah?

If such a lexeme occurs in both books, its `lex` node is not contained in either book,
so the two queries above miss it.

We have to find a different way. Something like: search for lexemes all of whose words occur in either Ezra or Nehemiah.

With the template constructions you have seen so far, this is impossible to say.

This is where [*quantifiers*](https://dans-labs.github.io/text-fabric/Api/General/#quantifiers) come in.
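
The gap left by the two per-book queries can be illustrated without Text-Fabric. The sketch below uses an invented toy corpus (the lexeme names and their books are made up) and plain Python sets. It only models the reasoning and is not how search works internally.

```python
# Toy corpus: each lexeme maps to the set of books in which its words occur.
# Lexeme names and book assignments are invented for illustration.
occurrences = {
    "lex_a": {"Ezra"},
    "lex_b": {"Nehemiah"},
    "lex_c": {"Ezra", "Nehemiah"},
    "lex_d": {"Ezra", "Genesis"},
}

target = {"Ezra", "Nehemiah"}

# What the two containment queries find: lexemes confined to a single book.
only_ezra = {lex for lex, books in occurrences.items() if books == {"Ezra"}}
only_nehemiah = {lex for lex, books in occurrences.items() if books == {"Nehemiah"}}
per_book = only_ezra | only_nehemiah

# What we actually want: every occurrence lies in Ezra or Nehemiah.
wanted = {lex for lex, books in occurrences.items() if books <= target}

print(sorted(per_book))           # ['lex_a', 'lex_b']
print(sorted(wanted - per_book))  # ['lex_c']
```

The lexeme shared by both books is exactly the one that the per-book approach misses.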

## /without/

First we are going to query for these lexemes by means of a `/without/` quantifier.

In [6]:
query = '''
lex
/without/
book book@en#Ezra|Nehemiah
  w:word
  w ]] ..
/-/
'''
query1results = A.search(query, shallow=True)

  1.19s 382 results


## /where/

Now the `/without/` quantifier is a bit of a roundabout way to say what you really mean.
We can also employ the more positive `/where/` quantifier.

In [7]:
query = '''
lex
/where/
  w:word
/have/
b:book book@en=Ezra|Nehemiah
w ]] b
/-/
'''
query2results = A.search(query, shallow=True)

  0.59s 382 results
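
Assuming every word lies in exactly one book, the `/without/` template and the `/where/` template state the same condition in two logically equivalent ways. Writing $\mathrm{occ}(\ell)$ for the word occurrences of a lexeme $\ell$ and $E$ for the set of words in Ezra or Nehemiah (both symbols are introduced here only for illustration):

$$
\underbrace{\neg\,\exists\, w \in \mathrm{occ}(\ell):\ w \notin E}_{\texttt{/without/}}
\quad\Longleftrightarrow\quad
\underbrace{\forall\, w \in \mathrm{occ}(\ell):\ w \in E}_{\texttt{/where/ ... /have/}}
$$

This is consistent with both searches reporting 382 results.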


Check by hand coding:

In [8]:
indent(reset=True)
universe = F.otype.s('lex')
wordsEzNh = set(
    L.d(T.bookNode('Ezra', lang='en'), otype='word') + 
    L.d(T.bookNode('Nehemiah', lang='en'), otype='word')
)
handResults = set()
for lex in universe:
    occs = set(L.d(lex, otype='word'))
    if occs <= wordsEzNh:
        handResults.add(lex)
info(len(handResults))

  0.13s 382


Looks good, but we are thorough:

In [9]:
print(query1results == handResults)
print(query2results == handResults)

True
True


## Verb phrases

Let's look for clauses in which all `Pred` phrases contain only verbs, and look for `Subj`
phrases in those clauses.

In [10]:
query = '''
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
'''
queryResults = A.search(query)

A.show(queryResults, end=5)

  1.32s 31399 results




**verse** *1* ... **verse** *5* (rendered displays of the first five results omitted)



Note that the pieces of template that belong to a quantifier do not correspond to nodes in the result tuples!
Each result here is a pair of a clause and a `Subj` phrase.
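
A point that is easy to overlook: a clause without any `Pred` phrase satisfies the `/where/` condition trivially, just as `good` stays `True` in the hand-coded check of this query. The sketch below models this on invented toy data in plain Python. It mimics the logic of the template, not the search engine itself.

```python
# Toy clauses: each clause is a list of phrases; each phrase has a name,
# a function and the parts of speech of its words. All data is invented.
clauses = {
    "c1": [("p1", "Pred", ["verb"]), ("p2", "Subj", ["subs"])],
    "c2": [("p3", "Pred", ["verb", "prep"]), ("p4", "Subj", ["subs"])],
    "c3": [("p5", "Subj", ["subs"]), ("p6", "PreC", ["adjv"])],
}

results = []
for clause, phrases in clauses.items():
    # /where/ Pred phrase /have/ (/without/ non-verb word): a universal check.
    # all() over an empty sequence is True, so c3 (no Pred phrase) passes.
    preds_ok = all(
        not any(sp != "verb" for sp in words)
        for (_, function, words) in phrases
        if function == "Pred"
    )
    if preds_ok:
        # Only the clause and the Subj phrase enter the result tuple;
        # the Pred phrases and words inspected above do not.
        for (phrase, function, _) in phrases:
            if function == "Subj":
                results.append((clause, phrase))

print(results)  # [('c1', 'p2'), ('c3', 'p5')]
```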

Check by hand:

In [11]:
indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    preds = [p for p in phrases if F.function.v(p) == 'Pred']
    good = True
    for pred in preds:
        if any(F.sp.v(w) != 'verb' for w in L.d(pred, otype='word')):
            good = False
    if good:
        subjs = [p for p in phrases if F.function.v(p) == 'Subj']
        for subj in subjs:
            handResults.append((clause, subj))
info(len(handResults))

  0.99s 31399


In [12]:
queryResults == handResults

True

### Inspection

We can see which templates are being composed in the course of interpreting the quantifier.
We use the good old `S.study()`:

In [13]:
query = '''
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
'''
S.study(query)

   |     0.00s Feature overview: 111 for nodes; 8 for edges; 2 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
   |     0.00s "Quantifier on "parent:clause"
   |      |   /where/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     0.48s 57070 matching nodes
   |      |   /have/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     /without/
   |      |       word sp#verb
   |      |     /-/
   |      |   /-/
   |      |     0.00s "Quantifier on "parent:phrase function=Pred"
   |      |      |   /without/
   |      |      |   parent:phrase function=Pred
   |      |      |     word sp#verb
   |      |      |   /-/
   |      |      |     1.32s 4893 nodes to exclude
   |      |     1.33s reduction from 57070 to 52177 nodes
   |      |     1.61s 52177 matching nodes
   |      |     1.65s 4893 match antecedent but not consequent
   |     1.65s reduction from 88101 to 83208 nodes

Observe the stepwise unraveling of the quantifiers, and the auxiliary templates that are distilled
from your original template.

If you ever get syntax errors, run `S.study()` to find clues.

## Subject at start or at end

We want the clauses that contain at least two adjacent phrases and have a `Subj` phrase that is either at the beginning or at the end.

In [14]:
query = '''
c:clause
/with/
  =: phrase function=Subj
/or/
  := phrase function=Subj
/-/
  phrase
  <: phrase
'''

queryResults = sorted(A.search(query, shallow=True))

  0.86s 15332 results


Check by hand:

In [15]:
indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    clauseWords = L.d(clause, otype='word')
    phrases = set(L.d(clause, otype='phrase'))
    # keep the clause only if some phrase is directly followed by another phrase of the same clause
    if any(L.n(p, otype='phrase') and (L.n(p, otype='phrase')[0] in phrases) for p in phrases):
        subjPhrases = [p for p in phrases if F.function.v(p) == 'Subj']
        if (
            any(L.d(p, otype='word')[0] == clauseWords[0] for p in subjPhrases)
            or
            any(L.d(p, otype='word')[-1] == clauseWords[-1] for p in subjPhrases)
        ):
            handResults.append(clause)
info(len(handResults))

  1.86s 15332


A nice case where the search template performs better than this particular piece of hand-coding.

In [16]:
queryResults == handResults

True

Let's also study this query:

In [17]:
S.study(query)

   |     0.00s Feature overview: 111 for nodes; 8 for edges; 2 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
   |     0.00s "Quantifier on "c:clause"
   |      |   /with/
   |      |   c:clause
   |      |     =: phrase function=Subj
   |      |     0.35s adding 5297 to 0 yields 5297 nodes
   |      |   /or/
   |      |   c:clause
   |      |     := phrase function=Subj
   |      |     0.37s adding 11118 to 5297 yields 15924 nodes
   |      |   /-/
   |     0.37s reduction from 88101 to 15924 nodes
  0.45s Constraining search space with 3 relations ...
  0.74s Setting up retrieval plan ...
  0.76s Ready to deliver results from 313680 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
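
The counts in this log can be cross-checked with a little arithmetic. The sketch below assumes that each "adding X to Y yields Z" line reports the sizes of sets of clause nodes that are being united. That reading is an interpretation of the log, not something the log states explicitly.

```python
# Counts copied from the S.study() log and from the search result above.
start_subj = 5297      # clauses matching the first alternative (=: phrase function=Subj)
end_subj = 11118       # clauses matching the second alternative (:= phrase function=Subj)
either = 15924         # clauses left after the /with/ ... /or/ quantifier
final_results = 15332  # results of the complete query

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
both = start_subj + end_subj - either
# Clauses that pass the quantifier but fail the rest of the template
dropped = either - final_results

print(both)     # 491
print(dropped)  # 592
```

Under that assumption, 491 clauses satisfy both alternatives, and 592 clauses pass the quantifier but are dropped by the remaining requirement of two adjacent phrases.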


## Verb-containing phrases

Suppose we want to collect all phrases with the condition that if they
contain a verb, their `function` is `Pred`.

This is a bit theoretical, but it shows two powerful constructs to increase readability
of quantifiers.

### Unreadable

First we express it without special constructs.

In [18]:
query = '''
p:phrase
/where/
  w:word pdp=verb
/have/
q:phrase function=Pred
q = p
/-/
'''
results = A.search(query, shallow=True)

  0.96s 241232 results


We check the query by means of hand-coding:

1. Is every result a phrase that either has no verbs or has function `Pred`?
2. Is every phrase without verbs or with function `Pred` contained in the results?

In [19]:
allPhrases = set(F.otype.s('phrase'))

ok1 = all(
    F.function.v(p) == 'Pred'
    or 
    all(F.pdp.v(w) != 'verb' for w in L.d(p, otype='word'))
    for p in results
)
ok2 = all(
    p in results
    for p in allPhrases
    if (
        F.function.v(p) == 'Pred'
        or
        all(F.pdp.v(w) != 'verb' for w in L.d(p, otype='word'))
    )
)

print(f'Check 1: {ok1}')
print(f'Check 2: {ok2}')

Check 1: True
Check 2: True


OK, we are sure that the query does what we think it does.
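
The hand check tests "function `Pred` or no verbs" rather than "if verb then `Pred`". A short truth table, sketched here in plain Python with illustrative variable names, confirms that the two formulations agree in all four cases.

```python
from itertools import product

# has_verb: the phrase contains a word with pdp=verb
# is_pred:  the phrase has function=Pred
rows = []
for has_verb, is_pred in product([False, True], repeat=2):
    implication = is_pred if has_verb else True  # "if verb then Pred"
    disjunction = (not has_verb) or is_pred      # form used in the hand check
    rows.append((has_verb, is_pred, implication, disjunction))
    print(has_verb, is_pred, implication, disjunction)

# The last two columns are identical in every row.
print(all(imp == dis for (_, _, imp, dis) in rows))  # True
```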

### Readable

Now let's make it more readable.

In [20]:
query = '''
phrase
/where/
  w:word pdp=verb
/have/
.. function=Pred
/-/
'''

In [21]:
results2 = A.search(query, shallow=True)

print(f'Same results as before? {results == results2}')

  1.19s 241232 results
Same results as before? True


Try to see how search gives the name `parent` to the `phrase` atom of the template and how it resolves the name `..`:

In [22]:
S.study(query)

   |     0.00s Feature overview: 111 for nodes; 8 for edges; 2 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 1 objects ...
   |     0.00s "Quantifier on "parent:phrase"
   |      |   /where/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |     0.74s 69026 matching nodes
   |      |   /have/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |   parent function=Pred
   |      |   /-/
   |      |     1.08s 57070 matching nodes
   |      |     1.11s 11955 match antecedent but not consequent
   |     1.12s reduction from 253187 to 241232 nodes
  1.13s Constraining search space with 0 relations ...
  1.13s Setting up retrieval plan ...
  1.13s Ready to deliver results from 241232 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results


# Next

You have now mastered the theory.

In practice, there are pitfalls:
[rough edges](searchRough.ipynb)

---

[basic](search.ipynb)
[advanced](searchAdvanced.ipynb)
[sets](searchSets.ipynb)
[relations](searchRelations.ipynb)
quantifiers
[rough](searchRough.ipynb)
[gaps](searchGaps.ipynb)