## Set-Based Rules

Set-based rules let you detect when a group of facts with specific attributes has been asserted, regardless of the order in which they were asserted.

Let's start up a fresh engine:

In [1]:
import os, sys
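# add the repository root (three levels up) to the import path, portably across operating systems
sys.path.insert(1, os.path.abspath(os.path.join(os.pardir, os.pardir, os.pardir)))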
from thoughts.rules_engine import RulesEngine
from pprint import pprint

# start a new engine
engine = RulesEngine()

## Which Animal Am I?

Let's build a set of rules to detect what kind of animal is being described, given a set of characteristics.

First up, the rules describing the animals:

In [2]:
rules = [   {"#when": [{"ears": "pointy"}, {"tail": "long"}, {"nose": "short"}],
            "#seq-type": "set",
            "#then": [{"isa": "CAT"}] },
            
            {"#when": [{"ears": "floppy"}, {"tail": "long"}, {"nose": "medium"}], 
            "#seq-type": "set",
            "#then": [{"isa": "DOG"}] },

            {"#when": [{"ears": "huge"}, {"tail": "short"}, {"nose": "long"}], 
            "#seq-type": "set",
            "#then": [{"isa": "ELEPHANT"}] }
        ]

engine.add_rules(rules)

These look similar to regular sequence rules, with one difference: the #seq-type attribute has the value "set". This tells the engine to trigger once all three facts in the #when part have been asserted, regardless of the order in which they were asserted.

Let's try an example:

In [3]:
observations = [{"nose": "medium"}, {"tail": "long"}, {"ears": "floppy"}]

result = engine.process(observations, extract_conclusions=True)
pprint(result)

[{'isa': 'DOG'}]


Notice that the three characteristics were asserted in a different order from the one specified in the rule's #when portion. Just like sequence rules, set rules use partially matched rules, or "arcs", but they do not care about the order in which the constituent facts arrive.
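To make the idea of a partial match more concrete, here is a minimal standalone sketch of how an arc for a set rule could track its outstanding conditions. The `SetArc` class and its `feed` method are hypothetical names invented for this illustration; they are not part of the `thoughts` library and are not a description of how the engine is implemented.

```python
class SetArc:
    """Hypothetical partial match ("arc") for a set rule; not the engine's own class."""

    def __init__(self, when, then):
        self.remaining = list(when)  # conditions still waiting for a matching fact
        self.then = then

    def feed(self, fact):
        """Consume one fact; return the conclusions once no condition is outstanding."""
        if fact in self.remaining:
            self.remaining.remove(fact)  # position in the #when list is irrelevant
        return self.then if not self.remaining else None


dog_when = [{"ears": "floppy"}, {"tail": "long"}, {"nose": "medium"}]
arc = SetArc(dog_when, [{"isa": "DOG"}])

# Feed the facts in a different order from the one the rule lists them in.
facts = [{"nose": "medium"}, {"tail": "long"}, {"ears": "floppy"}]
results = [arc.feed(fact) for fact in facts]
print(results)  # [None, None, [{'isa': 'DOG'}]]
```

The arc only completes on the third fact, whichever of the three facts happens to arrive last.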

We can run this again using a different order for the observations and the rule will still trigger:

In [4]:
observations = [{"ears": "floppy"}, {"nose": "medium"}, {"tail": "long"}]

result = engine.process(observations, extract_conclusions=True)
pprint(result)

[{'isa': 'DOG'}]


## Constituent Set Members are Mutually Exclusive Within the Same Rule

Things get more complex when you are detecting different types of sets and those sets are built upon the same underlying constituent members. You don't want the same members being used twice in the same rule.

Consider the following scenario: the first rule detects the sequence A, B, A and concludes SEQ1, and the second is a set rule that triggers FOUND once two SEQ1 conclusions are present.

In [5]:
rules = [ {"#when": ["A", "B", "A"], "#then": "SEQ1" },
          {"#when": ["SEQ1", "SEQ1"], "#then": "FOUND", "#seq-type": "set" } ]

engine.clear_rules()
engine.add_rules(rules)

observations = ["A", "B", "A", "B", "A"]

result = engine.process(observations, extract_conclusions=True)
pprint(result)

['SEQ1', 'SEQ1']


FOUND did not trigger because the middle 'A' would be a member of both SEQ1 sequences. By default, the engine prevents the same member from being used in two different parts of the same rule.
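The shared member is easy to see if you look at the positions each SEQ1 match covers. The sketch below is plain Python, independent of the engine; `find_spans` is a hypothetical helper written for this illustration.

```python
def find_spans(pattern, observations):
    """Return the index tuples where `pattern` occurs contiguously."""
    n = len(pattern)
    return [tuple(range(i, i + n))
            for i in range(len(observations) - n + 1)
            if observations[i:i + n] == pattern]


observations = ["A", "B", "A", "B", "A"]
spans = find_spans(["A", "B", "A"], observations)

# Positions used by both matches: a non-empty result means a member is shared.
shared = set(spans[0]) & set(spans[1])

print(spans)   # [(0, 1, 2), (2, 3, 4)]
print(shared)  # {2}
```

Position 2, the middle 'A', belongs to both matches, which is the situation the default behavior rejects.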

There are two ways to relax this restriction, depending on whether you want to preserve some connection between the members or to let members belong to as many sets in the same rule as possible.

## Overlapping Sets

First, you can allow for the rule to trigger if the constituent is both the last member in the previous sequence and the first member in the next sequence, by using the #seq-type of "overlap-connected". Here the middle 'A' connects both sequences. It is the last member of the first sequence and the first member of the last sequence.

In [6]:
rules = [ {"#when": ["A", "B", "A"], "#then": "SEQ1" },
          {"#when": ["SEQ1", "SEQ1"], "#then": "FOUND", "#seq-type": "overlap-connected" } ]

engine.clear_rules()
engine.add_rules(rules)

observations = ["A", "B", "A", "B", "A"]

result = engine.process(observations, extract_conclusions=True)
pprint(result)

['SEQ1', 'FOUND']
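The connection condition described above can be expressed as a small check on the positions covered by each match. This is only a sketch of that one condition as the text states it; `is_overlap_connected` is a hypothetical helper, and the engine's real logic may differ.

```python
def is_overlap_connected(first, second):
    """True when the only shared position is the end of `first` and the start of `second`."""
    common = set(first) & set(second)
    return first[-1] == second[0] and common == {first[-1]}


# Positions of the two A, B, A matches in ["A", "B", "A", "B", "A"].
first_match = (0, 1, 2)
second_match = (2, 3, 4)

print(is_overlap_connected(first_match, second_match))   # True
print(is_overlap_connected((0, 1, 2), (1, 2, 3)))        # False: two shared positions
```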


## Allowing the Same Constituent in Multiple Sets in the Same Rule

Second, you can set the #seq-allow-multi attribute to allow members to be used more than once within the set rule, removing the restriction altogether. In this example, the middle 'A' is used in both sets, which is allowed because the #seq-allow-multi attribute is present.

In [7]:
rules = [ {"#when": ["A", "B", "A"], "#then": "SEQ1" },
          {"#when": ["SEQ1", "SEQ1"], "#then": "FOUND", "#seq-type": "set", "#seq-allow-multi": True } ]

engine.clear_rules()
engine.add_rules(rules)

observations = ["A", "B", "A", "B", "A"]

result = engine.process(observations, extract_conclusions=True)
pprint(result)

['SEQ1', 'FOUND']


Because the example above was also connected by an overlapping position, let's look at another example in which the set members are more deeply embedded, to illustrate the concept further.

Suppose you want to detect a pair of constituents alongside a triplet of constituents. If you are familiar with poker hands, this is similar to the concept of a "full house".

Here's an example hand:

+ A B A B A

If members are allowed to be used more than once in a rule, the engine would detect multiple full house combinations here, because it would detect three pairs of A's:

+ A . A . .
+ A . . . A
+ . . A . A

It would also detect the pair of B's:

+ . B . B . 

Then it would detect the one occurrence of the triplet (three of a kind):

+ A . A . A

The full house rule would then pair the 'A' triplet with each of these pairs, including the three 'A' pairs that reuse members of the triplet itself, and so report several full houses for a single hand!

Let's see that in action now, with the #seq-allow-multi flag on, which allows for this behavior:

In [8]:
rules = [ {"#when": ["A", "A"], "#then": "PAIR", "#seq-type": "allow-junk" },
          {"#when": ["B", "B"], "#then": "PAIR", "#seq-type": "allow-junk" },
          {"#when": ["A", "A", "A"], "#then": "THREES", "#seq-type": "allow-junk" },
          {"#when": ["B", "B", "B"], "#then": "THREES", "#seq-type": "allow-junk" },
          {"#when": ["THREES", "PAIR"], "#then": "FULL HOUSE", "#seq-type": "set", "#seq-allow-multi": True } ]

engine.clear_rules()
engine.add_rules(rules)

observations = ["A", "B", "A", "B", "A"]

result = engine.process(observations, extract_conclusions=True)
pprint(result)

['PAIR', 'PAIR', 'PAIR', 'FULL HOUSE', 'FULL HOUSE', 'FULL HOUSE', 'FULL HOUSE']


In some scenarios you may want to allow this behavior. 

In other scenarios, you may want to exclude sets whose members are already used by another constituent within the same rule. This is the default behavior, so the #seq-allow-multi flag is simply left out:

In [9]:
rules = [ {"#when": ["A", "A"], "#then": "PAIR", "#seq-type": "allow-junk" },
          {"#when": ["B", "B"], "#then": "PAIR", "#seq-type": "allow-junk" },
          {"#when": ["A", "A", "A"], "#then": "THREES", "#seq-type": "allow-junk" },
          {"#when": ["B", "B", "B"], "#then": "THREES", "#seq-type": "allow-junk" },
          {"#when": ["THREES", "PAIR"], "#then": "FULL HOUSE", "#seq-type": "set" } ]

engine.clear_rules()
engine.add_rules(rules)

observations = ["A", "B", "A", "B", "A"]

result = engine.process(observations, extract_conclusions=True)
pprint(result)

['PAIR', 'PAIR', 'PAIR', 'FULL HOUSE', 'PAIR']


This only found one full house (which is what we would expect in a real game of poker, for example).
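The difference between the two modes can be reproduced by hand with a little combinatorics. The following standalone sketch does not use the engine and is not how it works internally; `groups` is a hypothetical helper. It enumerates pairs and triplets by position, then counts the triplet and pair combinations with and without shared members.

```python
from itertools import combinations

hand = ["A", "B", "A", "B", "A"]


def groups(hand, size):
    """Index tuples of `size` cards that all share the same value."""
    return [idx for idx in combinations(range(len(hand)), size)
            if len({hand[i] for i in idx}) == 1]


pairs = groups(hand, 2)    # three 'A' pairs and one 'B' pair
threes = groups(hand, 3)   # the single 'A' triplet

# Members may be reused: every triplet combines with every pair.
allow_multi = [(t, p) for t in threes for p in pairs]

# Members are exclusive: keep only combinations with no shared position.
exclusive = [(t, p) for t, p in allow_multi if not set(t) & set(p)]

print(len(allow_multi))  # 4
print(exclusive)         # [((0, 2, 4), (1, 3))]
```

These counts line up with the number of FULL HOUSE conclusions in the two outputs above: four when members may be reused, and one when they may not.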

## Summary

Sets are a powerful complement to sequence-based rules when the order in which facts arrive does not matter and you only need to detect a group of facts asserted in any order.

You can control whether the same underlying constituent facts may be included in multiple sets within a rule. By default they may not, and you can relax that restriction either by requiring the sets to be connected by position or by allowing members to appear in multiple sets in the same rule regardless of position.

Next we'll see how you can build higher-order structures out of the sequence and structure rules.