<div style="text-align: right">
    <i>
        LIN 537: Computational Linguistics 1 <br>
        Fall 2019 <br>
        Cody St. Clair
    </i>
</div>

# Brill Tagger

The Brill tagger is a rule-based tagging algorithm, unlike taggers built on a stochastic Hidden Markov Model (HMM), which are typically decoded with the Viterbi algorithm. The Brill algorithm breaks part-of-speech (POS) tagging into two steps:

1. Assign every word in the text whatever part of speech is most common for that word.
2. Apply a set of rules to the text to change parts of speech based on context. (Then repeat this step until either no more changes are made or a certain maximum number of steps has been reached.)

## Assigning the Most Common POS

In Brill's original description, he proposed having a learning phase in which a pre-tagged (i.e. tagged by a human) text would be analyzed to determine which tag is most common for each word in the corpus.

Tagged text is typically written with forward slashes separating each word from its assigned POS, e.g. `The/DET quick/ADJ brown/ADJ fox/N jumps/V over/PRE the/DET lazy/ADJ dog/N .` (When text is tokenized, it is common to separate punctuation such as `[,.!?]` from other words with a space; this makes pulling words out of the text easier later.)

Parsing this and counting the number of times each word is assigned each POS is fairly trivial with what we've covered so far, so to spare ourselves some trouble, we will act as though this has already been done, and our functions will take a pre-made dictionary mapping words to their most common POS. E.g. `{"the": "DET", "apple": "N", "run": "V"}` and so on.
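
For reference, here is a minimal sketch of how such a dictionary might be built from slash-tagged text. The function name `most_common_tags` and the tiny corpus are made up for illustration, and the sketch assumes tokens are separated by whitespace and that untagged tokens (like a bare `.`) can be skipped.

```python
from collections import Counter, defaultdict

def most_common_tags(tagged_text):
    """Map each lowercased word to its most frequent tag in slash-tagged text."""
    counts = defaultdict(Counter)
    for token in tagged_text.split():
        # Split on the LAST slash so a word that contains "/" stays intact.
        word, sep, tag = token.rpartition("/")
        if not sep:
            # No slash at all (e.g. untagged punctuation): skip the token.
            continue
        counts[word.lower()][tag] += 1
    # most_common(1) returns a list holding the single top (tag, count) pair.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

# A made-up corpus in which "run" is a noun once and a verb twice.
corpus = "The/DET run/N was/V long/ADJ . They/PRO run/V daily/ADV . We/PRO run/V ."
dictionary = most_common_tags(corpus)
print(dictionary["run"])  # V
```

Ties between tags are not handled specially here; a real learning phase would need to decide how to break them.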

You might also be wondering what to do if you don't have a large pre-tagged corpus for a language (e.g. if you're working with an uncommon or under-studied language). One of the beautiful things about the Brill algorithm is that this notion of most common part of speech is fairly intuitive for a native speaker (or even a linguist with sufficient documentation). So, the dictionary of most common parts of speech can be written by hand if need be.

As an example, if you see the word "run" in isolation, what part of speech would you assume it to be? Without more context, you would probably call it a verb, even though it can be used as a noun too. If you were short on corpus material, you could then just add this fact to your POS lookup table by hand, e.g. `dictionary["run"] = "V"`.

In [None]:
example_dictionary = {
    "the": "DET",
    "of": "PREP",
    "a": "DET",
    "in": "PREP",
    "on": "PREP",
    "about": "PREP",
    "might": "MOD",
    "would": "MOD",
    "could": "MOD",
    "give": "VERB",
    "gives": "VERB",
    "run": "VERB",
    "runs": "VERB",
    "person": "NOUN"
}

We can now create the first half of the Brill Tagger, with one extra note: no corpus/dictionary will ever contain every word that could occur in a language; even if you meticulously included every word you could, new words are coined all the time, so it would become outdated very quickly. As a result, we should have a default tag, and every time our tagger encounters a word that's not in its dictionary, we will assign it the default tag.

In [None]:
class BrillTagger:
  def __init__(self, default_tag, dictionary):
    """
    Creates a new Brill Tagger.

    Arguments:
      default_tag (str): The tag that should be assigned if a word
    is not in the dictionary.
      dictionary (dict): A dictionary mapping words to their most common POS.
    """
    self.default_tag = default_tag
    self.dictionary = dictionary
  def tag(self, words):
    """
    Tags a list of words.

    Arguments:
      words (list): A list of words to tag.

    Return (list):
      A list of tagged words as dictionaries.
    """
    tagged = []
    for word in words:
      # Look words up in lowercase so that "The" finds the entry for "the".
      if word.lower() in self.dictionary:
        tagged.append({"word": word, "tag": self.dictionary[word.lower()]})
      else:
        tagged.append({"word": word, "tag": self.default_tag})
    return tagged

Let's try this simple half-tagger with the example dictionary above and see how it performs on some text.

In [None]:
tagger = BrillTagger("NOUN", example_dictionary)

sample_one = "John said he would give Mary a book in the garden"
sample_two = """Beware the Jabberwock my son the jaws that bite
  the claws that catch beware the Jubjub bird and shun
  the frumious Bandersnatch"""

print(tagger.tag(sample_one.split()))
print(tagger.tag(sample_two.split()))

Our tagger does fairly well on the first sample, failing only on "said", which should probably be added to the dictionary (and arguably on "he", since our tag set does not separate pronouns from nouns). It does much worse on the second sample. Part of the problem is that the dictionary is missing some very common words, like "that". But several words in the second sample don't occur anywhere in English outside of Lewis Carroll's "Jabberwocky", so it's unlikely any dictionary would contain them. Yet a native speaker can easily figure out their parts of speech (even without understanding the words themselves).

Before moving on, it's worth noting that Brill's original tagger also had two rules built into this step:

1. If a word is unknown and starts with a capital letter, it's probably a proper noun (`NOUN`).
2. The tagger has another dictionary for the most common tags for the last 3 letters of a word. Whenever an unknown word is encountered, it checks whether the last 3 letters of that word are in this dictionary and assigns that tag to the word if possible. (This accounts for suffixes like -ing, -ous, etc.)
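
As a rough illustration of those two rules, here is one way they might look as a standalone function. The name `guess_unknown_tag`, the suffix entries, and the fallback tag are all assumptions made for this sketch, not details taken from Brill's tagger.

```python
def guess_unknown_tag(word, suffix_tags, default_tag="NOUN"):
    """Guess a tag for a word that is missing from the main dictionary."""
    # Rule 1: an unknown word starting with a capital letter is
    # treated as a proper noun.
    if word[:1].isupper():
        return "NOUN"
    # Rule 2: otherwise, look up the last three letters in a second
    # dictionary of suffix tags, falling back to the default tag.
    return suffix_tags.get(word[-3:].lower(), default_tag)

# Hypothetical suffix dictionary; a real one would be learned from a corpus.
suffix_tags = {"ing": "VERB", "ous": "ADJ"}

print(guess_unknown_tag("Bandersnatch", suffix_tags))  # NOUN
print(guess_unknown_tag("galumphing", suffix_tags))    # VERB
print(guess_unknown_tag("tulgey", suffix_tags))        # NOUN
```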

For our tagger, we will group these rules with the contextual rules in the next phase. This has the benefit of keeping this part simple, and it will demonstrate a use case for class inheritance in the next section.

## Rule Application

Brill's tagger learned a set of rules to improve its performance by running the first step over another pre-tagged text and comparing its own output with the given tags. From this, it counted how many times it tagged something that was actually *x* as *y*. Then it would automatically generate potential rules following a template and add whichever rule improved performance most to its list of rules. Rules would be things along the lines of: if the previous word is tagged `DET` and the following word is tagged `N`, then change the current word to `ADJ`.
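
The error-counting part of that process can be sketched in a few lines. This is only an illustration under simple assumptions: `count_errors` is a made-up name, both inputs are lists of word/tag dictionaries of the same length, and the rule-generation step is left out entirely.

```python
from collections import Counter

def count_errors(predicted, gold):
    """Count (actual_tag, assigned_tag) pairs where the tagger was wrong."""
    errors = Counter()
    for guess, truth in zip(predicted, gold):
        if guess["tag"] != truth["tag"]:
            errors[(truth["tag"], guess["tag"])] += 1
    return errors

# Made-up example: the tagger called both adjectives nouns.
predicted = [{"word": "the", "tag": "DET"}, {"word": "quick", "tag": "N"},
             {"word": "brown", "tag": "N"}, {"word": "fox", "tag": "N"}]
gold = [{"word": "the", "tag": "DET"}, {"word": "quick", "tag": "ADJ"},
        {"word": "brown", "tag": "ADJ"}, {"word": "fox", "tag": "N"}]

errors = count_errors(predicted, gold)
print(errors[("ADJ", "N")])  # 2
```

The most frequent pairs in such a count suggest which kinds of rules are worth trying first.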

Again, this process has the benefit of being easy for a human to interpret and follow. As a result, we could potentially write rules by hand if we didn't have a pre-tagged corpus handy and we were familiar with the target language's grammar.

For our tagger, we want to have two main types (classes) of rules:

1. Rules that look at the context of a word (what tags come before/after it).
2. Rules that look at the spelling of a word (to capture generalizations in the morphology and the capitalization rule mentioned above).

Before we implement either of these classes, let's consider an even more basic kind of tag transformation rule, one which simply changes all tags of POS *a* to *b*. We can capture this with the following class:

In [None]:
class TagRule:
  def __init__(self, target_tag, output_tag):
    """
    Creates a new Tag Rule.

    Arguments:
      target_tag (str): The POS tag that the rule should apply to, or None
      if the rule should apply to any tag.
      output_tag (str): The POS tag that the word should be changed to if
      the rule applies.
    """
    self.target_tag = target_tag
    self.output_tag = output_tag
  def can_apply(self, text, index):
    """
    Checks if the rule can apply to text[index].
    
    Arguments:
      text (list): A list of tagged words as dictionaries.
      index (int): The index of the word currently being considered in the list.
    
    Return (bool):
      True if the rule can apply, False otherwise.
    """
    return text[index]["tag"] == self.target_tag if self.target_tag != None else True

### The Ternary Operator

The last line of that cell might be a bit confusing. It uses what is often called the ternary operator (Python's documentation calls it a conditional expression), which is essentially a way to write a quick if/else on a single line. It has the format:
```
[value] if [condition] else [other_value]
```

If `[condition]` is True, the expression reduces to `[value]`; otherwise, it reduces to `[other_value]`. So, in the last line above, our condition is `self.target_tag != None`. If this is True (i.e. we care about the tag of the current word), the expression reduces to `text[index]["tag"] == self.target_tag`; if it's False, it just reduces to `True`.

We put this in the return statement, so that if `target_tag` is None, we return True, but otherwise, we return the truth value of `text[index]["tag"] == self.target_tag`.
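
To see the equivalence on something smaller, here is a toy function written both ways. The names `describe` and `describe_long` are made up for this sketch, and it tests with `is None`, which is the comparison PEP 8 recommends for `None`.

```python
def describe(target_tag):
    # One-line version using a conditional expression.
    return "any tag" if target_tag is None else "only " + target_tag

def describe_long(target_tag):
    # The same logic spelled out as an ordinary if/else statement.
    if target_tag is None:
        return "any tag"
    else:
        return "only " + target_tag

print(describe(None))         # any tag
print(describe("NOUN"))       # only NOUN
print(describe_long("NOUN"))  # only NOUN
```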

You're probably wondering why we pass a list and an index to the `can_apply` function instead of just passing an individual word/tag dictionary. This will become clearer when we implement contextual rules, and keeping the function consistent across all the rules will be very helpful in the tagger later.

This class of rules is not very useful in itself, but we will extend it to form the other rules of our Tagger. Next, we will implement the class of spelling rules (rules that apply to capitalized words, or words with certain endings, etc.). We will use regular expressions to keep things simple for ourselves (though they are a bit overpowered for this).
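
Before the class itself, here is a quick sketch of what such patterns might look like. The two patterns and the helper `matches` are illustrative assumptions. The thing to notice is that `re.search` looks for a match anywhere in the string, so a pattern needs `^` or `$` if it should only match at the start or end of a word.

```python
import re

def matches(exp, word):
    """Return True if the regular expression exp is found in word."""
    return re.search(exp, word) is not None

capitalized = r"^[A-Z]"  # word starts with a capital letter
ing_suffix = r"ing$"     # word ends in "ing"

print(matches(capitalized, "Jabberwock"))  # True
print(matches(ing_suffix, "galumphing"))   # True
print(matches(ing_suffix, "singer"))       # False
print(matches(r"ing", "singer"))           # True: no anchor, so it matches mid-word
```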

In [None]:
import re

class MorphoRule(TagRule):
  def __init__(self, target_tag, output_tag, exp):
    """
    Creates a new Morphological Rule.

    Arguments:
      target_tag (str): The POS tag that the rule should apply to, or None
      if the rule should apply to any tag.
      output_tag (str): The POS tag that the word should be changed to if
      the rule applies.
      exp (str): A regular expression to test against a word.
    """
    super().__init__(target_tag, output_tag)
    self.exp = exp
  def can_apply(self, text, index):
    """
    Checks whether the rule can apply to text[index].

    Arguments:
      text (list): A list of tagged words as dictionaries.
      index (int): The index of the word currently being considered in the list.
    
    Return (bool):
      True if text[index] has the target_tag and if exp is in the word,
      False otherwise.
    """
    if super().can_apply(text, index):
      if re.search(self.exp, text[index]["word"]):
        return True
    return False

Finally, we turn to contextual rules. To keep things expandable, we won't limit the number of words we can look at before/after the current word. Rather, we will have two lists (or tuples) of POS tags before/after the current word. If any position in these lists is `None`, we will treat it as matching any tag, and if the list of previous/next tags would go outside the bounds of our text, we will treat it as an automatic failure.
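
The matching logic can be tried out in isolation first. The sketch below is a simplification: `context_matches` is a made-up helper that works on a plain list of tags rather than the word/tag dictionaries used elsewhere in this notebook.

```python
def context_matches(tags, index, prev_tags, next_tags):
    """Check the tags around tags[index] against prev_tags and next_tags."""
    # Automatic failure if the context would run off either end of the text.
    if index - len(prev_tags) < 0 or index + len(next_tags) >= len(tags):
        return False
    # prev_tags[0] is the tag immediately before index, prev_tags[1] the
    # one before that, and so on. None matches any tag.
    for i, wanted in enumerate(prev_tags):
        if wanted is not None and wanted != tags[index - 1 - i]:
            return False
    # next_tags[0] is the tag immediately after index, and so on.
    for i, wanted in enumerate(next_tags):
        if wanted is not None and wanted != tags[index + 1 + i]:
            return False
    return True

tags = ["DET", "NOUN", "NOUN"]  # e.g. the first-pass tags for "the lazy dog"

print(context_matches(tags, 1, ["DET"], ["NOUN"]))  # True
print(context_matches(tags, 1, [None], ["NOUN"]))   # True: None is a wildcard
print(context_matches(tags, 0, ["DET"], ["NOUN"]))  # False: nothing before index 0
print(context_matches(tags, 2, ["NOUN", "DET"], []))  # True: looks two words back
```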

In [None]:
class ContextRule(TagRule):
  def __init__(self, target_tag, output_tag, prev_tags, next_tags):
    """
    Creates a new Contextual Rule.

    Arguments:
      target_tag (str): The POS tag that the rule should apply to, or None
      if the rule should apply to any tag.
      output_tag (str): The POS tag that the word should be changed to if
      the rule applies.
      prev_tags (list): A list of tags to look for before the current word, with
      the first element being the word immediately before the current one and so
      on. Any element which is None matches any tag.
      next_tags (list): A list of tags to look for after the current word,
      starting with the word immediately after. Any element which is None
      matches any tag.
    """
    super().__init__(target_tag, output_tag)
    self.prev_tags = prev_tags
    self.next_tags = next_tags
  def can_apply(self, text, index):
    """
    Checks whether the rule can apply to text[index].

    Arguments:
      text (list): A list of tagged words as dictionaries.
      index (int): The index of the word currently being considered in the list.
    
    Return (bool):
      True if text[index] has the target_tag and the previous/following words
      have the tags in prev_tags and next_tags, respectively. False otherwise,
      including if the length of prev_tags/next_tags would go beyond
      the length of text.
    """
    if not super().can_apply(text, index):
      return False
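    # There must be enough words before and after the current one.
    if (index - len(self.prev_tags)) < 0:
      return False
    if (len(self.next_tags) + index) >= len(text):
      return False
    # prev_tags[0] is the word immediately before the current one,
    # prev_tags[1] the word before that, and so on.
    for i in range(len(self.prev_tags)):
      if self.prev_tags[i] is not None and self.prev_tags[i] != text[index-1-i]["tag"]:
        return False
    # next_tags[0] is the word immediately after the current one.
    for i in range(len(self.next_tags)):
      if self.next_tags[i] is not None and self.next_tags[i] != text[index+1+i]["tag"]:
        return False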
    return True

Now we can create a few simple rules to improve our tagger.

In [None]:
rules = []

Once we have some rules, the tagger applies the dictionary-based tags above and then iterates through its list of rules, applying any that fit. (This process can then be repeated until either there are no more changes or we hit some fixed maximum number of cycles, but for simplicity, ours will only apply the rules once.)
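
For the curious, the repeated version might look something like the sketch below. Everything in it is an assumption made for illustration: `SimpleRule` is a stripped-down stand-in for the rule classes above (so the example runs on its own), `apply_until_stable` and `max_passes` are made-up names, and `X`, `Y`, and `Z` are placeholder tags.

```python
class SimpleRule:
    """Stand-in for any rule object with can_apply() and output_tag."""
    def __init__(self, target_tag, output_tag):
        self.target_tag = target_tag
        self.output_tag = output_tag

    def can_apply(self, text, index):
        return text[index]["tag"] == self.target_tag

def apply_until_stable(tagged, rules, max_passes=10):
    """Apply the rules repeatedly until a pass changes nothing."""
    for _ in range(max_passes):
        changed = False
        for i in range(len(tagged)):
            for rule in rules:
                # Only count real changes, so a rule that rewrites a tag
                # to itself cannot keep the loop going.
                if rule.can_apply(tagged, i) and tagged[i]["tag"] != rule.output_tag:
                    tagged[i]["tag"] = rule.output_tag
                    changed = True
        if not changed:
            break
    return tagged

# The Y -> Z rule is listed first, so it only gets a chance to fire
# on the second pass, after X -> Y has applied.
demo_rules = [SimpleRule("Y", "Z"), SimpleRule("X", "Y")]
result = apply_until_stable([{"word": "w", "tag": "X"}], demo_rules)
print(result)  # [{'word': 'w', 'tag': 'Z'}]
```

The `max_passes` cap matters because two rules that undo each other would otherwise loop forever.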

In [None]:
class BrillTagger:
  def __init__(self, default_tag, dictionary, rules):
    """
    Creates a new Brill Tagger.

    Arguments:
      default_tag (str): The tag that should be assigned if a word
    is not in the dictionary.
      dictionary (dict): A dictionary mapping words to their most common POS.
      rules (list): A list of rules to apply to the dictionary-tagged text.
    """
    self.default_tag = default_tag
    self.dictionary = dictionary
    self.rules = rules
  def tag(self, words):
    """
    Tags a list of words.

    Arguments:
      words (list): A list of words to tag.

    Return (list):
      A list of tagged words as dictionaries.
    """
    tagged = []
    for word in words:
      # Look words up in lowercase, as in the first version of the tagger.
      if word.lower() in self.dictionary:
        tagged.append({"word": word, "tag": self.dictionary[word.lower()]})
      else:
        tagged.append({"word": word, "tag": self.default_tag})
    for i in range(len(tagged)):
      for rule in self.rules:
        if rule.can_apply(tagged, i):
          tagged[i]["tag"] = rule.output_tag
    return tagged

Now, let's test the improved tagger on one of the samples from before and see how things improve.

In [None]:
tagger = BrillTagger("NOUN", example_dictionary, rules)

print(tagger.tag(sample_two.split()))