## Building Higher-Order Structures

By combining rules, you can build larger structures out of smaller pieces of information. This is useful in natural-language parsing for detecting phrase patterns, such as noun phrases (NPs), verb phrases (VPs), and sentences (S). As each rule fires, it can pass semantic and other information from the smaller constituents up into the higher-order structure it creates.

In [18]:
import os, sys
# build the relative path with os.path.join so it also works outside Windows
sys.path.insert(1, os.path.abspath(os.path.join('..', '..')))
from thoughts.rules_engine import RulesEngine
from pprint import pprint

# start a new engine
engine = RulesEngine()

## Example - Natural Language Parsing

Structures are useful in natural-language parsing. Suppose you detect an NP (noun phrase) and want to wait to see whether a VP (verb phrase) follows, which would indicate that a sentence is present. Ideally, you want to combine the meanings of the individual NP and VP constituents into a final meaning for the S (sentence). Structures can help you do this.

In the example below, when {"cat": "art", "lemma": "the"} is asserted, the rule matches its first constituent and is added as an arc to the active arcs. The new arc then "waits" for another constituent matching {"cat": "n", "lemma": "..."} to be asserted before it completes and fires the "then" portion.

In [19]:
rule = {"#when": [
        {"cat" :"art", "lemma": "?det"},
        {"cat" :"n", "lemma": "?entity"}],
       "#then": 
        {"cat": "np", "entity": "?entity", "det": "?det"}
}

engine.add_rule(rule)

Let's try a quick example:

In [20]:
engine.process([{"cat": "art", "lemma": "the"}, {"cat": "n", "lemma": "fox"}])

[{'cat': 'np', 'entity': 'fox', 'det': 'the'}]

Essentially, the rule "combines" the information from the ART constituent 'the' with the information from the N constituent 'fox' into a larger NP constituent, which tags the information from both with "entity" and "det" attributes. This gives the final constituent some semantics, or meaning, derived from the smaller constituents.
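To make the matching and combining steps above more concrete, here is a minimal pure-Python sketch of a single rule being applied through "waiting" arcs. It is not the thoughts library's implementation: `match`, `substitute`, and `run_rule` are hypothetical helpers, and the sketch assumes that the constituents of a rule must be adjacent and that an arc which fails to advance is simply dropped.

```python
# Illustrative sketch only; the real RulesEngine may work differently.

def match(pattern, constituent, bindings):
    """Return extended bindings if the constituent matches the pattern, else None."""
    new_bindings = dict(bindings)
    for key, expected in pattern.items():
        if key not in constituent:
            return None
        actual = constituent[key]
        if isinstance(expected, str) and expected.startswith("?"):
            # A variable: bind it, or check that it agrees with an earlier binding.
            if new_bindings.setdefault(expected, actual) != actual:
                return None
        elif expected != actual:
            return None
    return new_bindings


def substitute(template, bindings):
    """Fill the ?variables of a flat template from the bindings."""
    return {key: bindings.get(value, value) if isinstance(value, str) else value
            for key, value in template.items()}


def run_rule(rule, constituents):
    """Assert constituents one at a time, tracking partially matched arcs."""
    when, then = rule["#when"], rule["#then"]
    arcs = []          # each arc: (index of the next pattern to match, bindings so far)
    conclusions = []
    for constituent in constituents:
        # Existing arcs may advance; a fresh arc may also start at this constituent.
        candidates = arcs + [(0, {})]
        arcs = []
        for position, bindings in candidates:
            matched = match(when[position], constituent, bindings)
            if matched is None:
                continue  # assumption: an arc that cannot advance is dropped
            if position + 1 == len(when):
                conclusions.append(substitute(then, matched))  # rule fires
            else:
                arcs.append((position + 1, matched))           # arc keeps waiting
    return conclusions


np_rule = {"#when": [{"cat": "art", "lemma": "?det"},
                     {"cat": "n", "lemma": "?entity"}],
           "#then": {"cat": "np", "entity": "?entity", "det": "?det"}}

result = run_rule(np_rule, [{"cat": "art", "lemma": "the"},
                            {"cat": "n", "lemma": "fox"}])
print(result)
```

After 'the' is asserted, the sketch holds one arc with `?det` bound to 'the'. When 'fox' arrives, that arc completes and the "then" template is filled in from the accumulated bindings.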

## Adding More Rules and Capturing More Meaning

Now let's add another rule to capture information about a verb phrase (VP). We'll keep this simple for now and detect a single V (verb) constituent followed by a noun phrase (NP).

In [21]:
rule = {"#when": [
        {"cat" :"v", "lemma": "?verb"},
        {"cat" :"np", "entity": "?entity", "det": "?det"}],
       "#then": 
        {"cat": "vp", "action": "?verb", "entity": "?entity", "det": "?det"}
}

engine.add_rule(rule)

Let's try it on a sample:

In [22]:
engine.process([{"cat": "v", "lemma": "jumped over"}, {"cat": "art", "lemma": "the"}, {"cat": "n", "lemma": "dog"}] )

[{'cat': 'vp', 'action': 'jumped over', 'entity': 'dog', 'det': 'the'}]

Great - the verb phrase has been combined into a single constituent, and has attributes representing the semantics (meaning) of the phrase.

Now let's create a final rule, to detect whenever there's a NP followed by a VP - aka a Sentence (S).

In [23]:
rule = {"#when": [
        {"cat" :"np", "entity": "?entity1", "det": "?det1"},
        {"cat": "vp", "action": "?verb", "entity": "?entity2", "det": "?det2"}],
       "#then": 
        {"cat": "s", "action": "?verb", "subject": "?entity1", "subj-det": "?det1", "object": "?entity2", "obj-det": "?det2"}
}

engine.add_rule(rule)

Now let's try a full sentence:

In [24]:
phrase = [
{"cat": "art", "lemma": "the"}, 
{"cat": "n", "lemma": "fox"},
{"cat": "v", "lemma": "jumped over"}, 
{"cat": "art", "lemma": "the"}, 
{"cat": "n", "lemma": "dog"}
]

conclusions = engine.process(phrase)
pprint(conclusions)

[{'cat': 'np', 'det': 'the', 'entity': 'fox'},
 {'action': 'jumped over',
  'cat': 's',
  'obj-det': 'the',
  'object': 'dog',
  'subj-det': 'the',
  'subject': 'fox'}]


Here we added the semantic information to indicate that the first NP is the "subject" of the sentence, and the second NP found within the VP is the "object" of the sentence. This is how the semantics of a sentence are built from the constituent parts.

The engine returned two final conclusions. This happens because the first NP doesn't directly lead to a rule that ends the sentence, and so it is a valid final conclusion by itself. To return only the conclusions that are sentences (where 'cat' is 's'):

In [25]:
sentences = [s for s in conclusions if s["cat"] == "s"]
pprint(sentences)

[{'action': 'jumped over',
  'cat': 's',
  'obj-det': 'the',
  'object': 'dog',
  'subj-det': 'the',
  'subject': 'fox'}]
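If you filter conclusions often, the list comprehension above can be wrapped in a small helper. The sketch below is only a convenience and not part of the thoughts library; `by_category` is a hypothetical name. It uses `dict.get` so that a conclusion without a 'cat' key is skipped instead of raising a `KeyError`. The `conclusions` list is copied from the output shown earlier so the snippet runs on its own.

```python
def by_category(conclusions, cat):
    """Return only the conclusions whose 'cat' equals the requested category."""
    # .get avoids a KeyError if some conclusion has no "cat" key
    return [c for c in conclusions if c.get("cat") == cat]


# Conclusions copied from the engine output shown above.
conclusions = [
    {"cat": "np", "det": "the", "entity": "fox"},
    {"cat": "s", "action": "jumped over",
     "subject": "fox", "subj-det": "the",
     "object": "dog", "obj-det": "the"},
]

sentences = by_category(conclusions, "s")
noun_phrases = by_category(conclusions, "np")
```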


## Deepening the Structure

The above structure is rather flat. For each NP (subject and object), we had to create two separate attributes on the sentence to track its entity and its determiner: 'subject' and 'subj-det' for the first noun phrase, 'object' and 'obj-det' for the second.

Let's rewrite our rules a bit to store a 'sem', or semantic, attribute as we build up the constituents. This will actually make things easier as we do not have to keep specifying all of the semantic attributes needed for rules to match along the way.

In [26]:
# let's start over
engine.clear_rules()

rule = {"#when": [
        {"cat" :"art", "lemma": "?det"},
        {"cat" :"n", "lemma": "?entity"}],
       "#then": 
        {"cat": "np", "sem": {"entity": "?entity", "det": "?det"}}
}

engine.add_rule(rule)

Here we wrapped the 'entity' and 'det' attributes into a single 'sem' attribute within the noun phrase. Now we can use just the 'sem' attribute in the higher-order matching rules:

In [27]:
rule = {"#when": [
        {"cat" :"v", "lemma": "?verb"},
        {"cat" :"np", "sem": "?semnp"}],
       "#then": 
        {"cat": "vp", "action": "?verb", "object": "?semnp"}
}

engine.add_rule(rule)

Note that we've taken another minor shortcut above by labeling the NP as the 'object', since we can already infer that from the position of the constituents.
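The key idea in these rules is that a variable such as `?semnp` can be bound to a whole dictionary, not just a string, so an entire 'sem' structure is carried upward in one step. The following is a minimal sketch of that substitution in plain Python. It is not the library's code: `substitute_nested` is a hypothetical helper, and the bindings are written out by hand where the engine would produce them by matching.

```python
def substitute_nested(template, bindings):
    """Recursively replace ?variables in a (possibly nested) template."""
    if isinstance(template, dict):
        return {key: substitute_nested(value, bindings)
                for key, value in template.items()}
    if isinstance(template, str) and template.startswith("?"):
        # A bound value may itself be a dict, e.g. an entire 'sem' structure.
        return bindings.get(template, template)
    return template


# NP rule: the "then" template nests the variables inside 'sem'.
noun_phrase = substitute_nested(
    {"cat": "np", "sem": {"entity": "?entity", "det": "?det"}},
    {"?entity": "dog", "?det": "the"},
)

# VP rule: ?semnp is bound to the NP's whole 'sem' dictionary.
verb_phrase = substitute_nested(
    {"cat": "vp", "action": "?verb", "object": "?semnp"},
    {"?verb": "jumped over", "?semnp": noun_phrase["sem"]},
)
print(verb_phrase)
```

Because the VP rule never names 'entity' or 'det', attributes added to 'sem' later would flow through without the VP rule changing.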

Let's finish things off for the Sentence (S) rule:

In [28]:
rule = {"#when": [
        {"cat" :"np", "sem": "?sem-np1"},
        {"cat": "vp", "action": "?verb", "object": "?sem-obj"}],
       "#then": 
        {"cat": "s", "action": "?verb", "subject": "?sem-np1", "object": "?sem-obj"}
}

engine.add_rule(rule)

And now for the test. We use the same phrase as before so the results can be compared, and extract just the sentences (where 'cat' is 's').

In [31]:
phrase = [
{"cat": "art", "lemma": "the"}, 
{"cat": "n", "lemma": "fox"},
{"cat": "v", "lemma": "jumped over"}, 
{"cat": "art", "lemma": "the"}, 
{"cat": "n", "lemma": "dog"}
]

conclusions = engine.process(phrase)
sentences = [s for s in conclusions if s["cat"] == "s"]
pprint(sentences)

[{'action': 'jumped over',
  'cat': 's',
  'object': {'det': 'the', 'entity': 'dog'},
  'subject': {'det': 'the', 'entity': 'fox'}}]


This is clearer. The subject and object information is now contained in a hierarchical structure in which we can embed additional information about the sub-constituents, and the rules were easier to write!
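One practical benefit of the nested form is that downstream code can treat the subject and the object the same way. As a small illustration, the sketch below walks the sentence structure shown in the output above and renders it back into text. The helper names `render_np` and `render_sentence` are hypothetical, and the `sentence` dictionary is copied from that output so the snippet is self-contained.

```python
# Sentence structure copied from the output above.
sentence = {
    "cat": "s",
    "action": "jumped over",
    "subject": {"det": "the", "entity": "fox"},
    "object": {"det": "the", "entity": "dog"},
}


def render_np(sem):
    """Render a noun phrase's 'sem' structure as text."""
    return f"{sem['det']} {sem['entity']}"


def render_sentence(s):
    """Render subject, action, and object in order."""
    return f"{render_np(s['subject'])} {s['action']} {render_np(s['object'])}"


text = render_sentence(sentence)
print(text)
```

With the flat structure, the same function would need separate handling for 'subj-det' and 'obj-det'. Here a single `render_np` covers both noun phrases.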

## Summary

You've seen how to construct higher-order structures from lower-level constituents, and how to pass information along from those constituents to the higher-order structures. Experiment with your own rules and enjoy!