## Tips
- To avoid unpleasant surprises, I suggest you _run all cells in their order of appearance_ (__Cell__ $\rightarrow$ __Run All__).


- If the changes you've made to your solution don't seem to be showing up, try running __Kernel__ $\rightarrow$ __Restart & Run All__ from the menu.


- Before submitting your assignment, make sure everything runs as expected. First, restart the kernel (from the menu, select __Kernel__ $\rightarrow$ __Restart__) and then **run all cells** (from the menu, select __Cell__ $\rightarrow$ __Run All__).

## Reminder

- Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name, UA email, and collaborators below:



Several of the cells in this notebook are **read only** to ensure instructions aren't unintentionally altered.  

If you can't edit the cell, it is probably intentional.

In [1]:
NAME = "Kathleen Costa"
# University of Arizona email address
EMAIL = "kathleencosta@arizona.edu"
# Names of any collaborators.  Write N/A if none.
COLLABORATORS = "N/A"

## Scratchpad

You are welcome to create new cells (see the __Cell__ menu) to experiment and debug your solution.

In [2]:
%load_ext autoreload
%autoreload 2

# Mini Python tutorial

This course uses Python 3.11.

Below is a very basic (and incomplete) overview of the Python language... 

For those completely new to Python, [this section of the official documentation may be useful](https://docs.python.org/3.11/library/stdtypes.html#common-sequence-operations).

In [3]:
# This is a comment.  
# Any line starting with # will be interpreted as a comment

# this is a string assigned to a variable
greeting = "hello"

# If enclosed in triple quotes, strings can also be multiline:

"""
I'm a multiline
string.
"""

# let's use a for loop to print it letter by letter
for letter in greeting:
    print(letter)
    
# Did you notice the indentation there?  Whitespace matters in Python!

# here's a list of integers

numbers = [1, 2, 3, 4]

# let's add one to each number using a list comprehension
# and assign the result to a variable called res
# list comprehensions are used widely in Python (they're very Pythonic!)

res = [num + 1 for num in numbers]

# let's confirm that it worked
print(res)

# now let's try spicing things up using a conditional to filter out all values greater than or equal to 3...
print([num for num in res if not num >= 3])

# Python 3.6 introduced "f-strings" as a convenient way of formatting strings using templates
# For example ...
name = "Josuke"

print(f"{greeting}, {name}!")

# f-strings are f-ing convenient!


# let's look at defining functions in Python..

def greet(name):
    print(f"Howdy, {name}!")

# here's how we call it...

greet("partner")

# let's add a description of the function...

def greet(name):
    """
    Prints a greeting given some name.
    
    :param name: the name to be addressed in the greeting
    :type name: str
    
    """
    print(f"Howdy, {name}!")
    
# I encourage you to use docstrings!

# Python introduced support for optional type hints in v3.5.
# You can read more about this feature here: https://docs.python.org/3.7/library/typing.html
# let's give it a try...
def add_six(num: int) -> int:
    return num + 6

# this should print 13
print(add_six(7))

# Python also has "anonymous functions" (also known as "lambda" functions)
# take a look at the following code:

greet_alt = lambda name: print(f"Hi, {name}!")

greet_alt("Fred")

# lambda functions are often passed to other functions
# For example, they can be used to specify how a sequence should be sorted
# let's sort a list of pairs by their second element
pairs = [("bounce", 32), ("bighorn", 12), ("radical", 4), ("analysis", 7)]
# index -1 refers to the last element of a sequence, -2 to the second to last, etc.
print(sorted(pairs, key=lambda pair: pair[-1]))

# we can sort it by the first element instead
# NOTE: python indexing is zero-based
print(sorted(pairs, key=lambda pair: pair[0]))

# You can learn more about other core data types and their methods here: 
# https://docs.python.org/3.7/library/stdtypes.html

# Because of its extensive standard library, Python is often described as coming with "batteries included".  
# Take a look at these "batteries": https://docs.python.org/3.7/library/

# You now know enough to complete this homework assignment (or at least where to look)

h
e
l
l
o
[2, 3, 4, 5]
[2]
hello, Josuke!
Howdy, partner!
13
Hi, Fred!
[('radical', 4), ('analysis', 7), ('bighorn', 12), ('bounce', 32)]
[('analysis', 7), ('bighorn', 12), ('bounce', 32), ('radical', 4)]


# Getting started

In this assignment, you'll be implementing a simplistic and incomplete rule-based system for POS tagging. 

In cases where training data is available, part-of-speech tagging is typically performed using statistical methods.  Although this assignment is mainly intended to help you review token attributes, rule-based approaches are still used in practice.  For example:

- When little annotated data is available for training statistical models
- To relabel data using a different annotation scheme (ex. Penn $\rightarrow$ UPOS)
- To simplify the annotation task for humans by automatically labeling a portion of the data
- etc.

Can you think of other cases where a rule-based approach might be useful?
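
As one illustration of the relabeling use case above, here is a minimal sketch of a tag-level Penn $\rightarrow$ UPOS conversion.  The names `PENN_TO_UPOS` and `relabel` are hypothetical, and the mapping is deliberately partial: a real conversion covers the full tagset and sometimes needs more than the tag itself (for example, to decide between `VERB` and `AUX`).

```python
# Hypothetical, partial mapping from Penn Treebank tags to UPOS tags.
# Only a handful of tags are listed for illustration.
PENN_TO_UPOS = {
    "DT": "DET",
    "JJ": "ADJ",
    "NN": "NOUN",
    "NNS": "NOUN",
    "NNP": "PROPN",
    "RB": "ADV",
    "VBD": "VERB",
    "VBP": "VERB",
    "VBZ": "VERB",
}

def relabel(tags, mapping=PENN_TO_UPOS, unknown="???"):
    """Maps each tag through `mapping`; tags without an entry become `unknown`."""
    return tuple(mapping.get(tag, unknown) for tag in tags)

print(relabel(["DT", "NN", "VBZ", "RB"]))
# ('DET', 'NOUN', 'VERB', 'ADV')
```

Tags missing from the mapping are left as `???` so a human annotator can fill them in later.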

In [4]:
from typing import Tuple, Sequence, Text, Optional

## Defining our `Sentence` class 

We'll use a Python class to keep track of tokens and their attributes.

In [5]:

class Sentence:
    """
    Class representing a Sentence's tokens and their attributes.
    """
    # Used to represent unknown symbols
    UNKNOWN: Text = "???"
    def __init__(
        self, 
        tokens: Sequence[Text],
        norms: Optional[Sequence[Text]] = None,
        pos: Optional[Sequence[Text]] = None
    ):
        # tokens
        # NOTE: Tuple[Text, ...] means a tuple (i.e., an immutable sequence) 
        # of variable length where each element is a string (Text)
        self.tokens: Tuple[Text, ...]   = tuple(tokens)
        # normalized forms of each token
        self.norms: Tuple[Text, ...]    = tuple(norms) if norms else self.tokens[::]
        # part-of-speech tags
        self.pos: Tuple[Text, ...]      = tuple(pos) if pos else tuple([Sentence.UNKNOWN] * self.size)
        # ensure each token has an attribute of each type
        assert all(self.size == len(attr) for attr in [self.pos, self.norms])
        
    @property
    def size(self):
        """
        Calculates the number of tokens in our sentence.
        
        # Example: 
        s = Sentence(tokens=["I", "like", "turtles"])
        s.size == 3
        """
        return len(self.tokens)
    
    def __len__(self):
        """
        Calculates the number of tokens in our sentence.
        
        # Example: 
        s = Sentence(tokens=["I", "like", "turtles"])
        len(s) == 3
        """
        return self.size
    
    def __repr__(self):
        """
        The text displayed when printing an instance of our sentence.
        """
        # convenience function to join lists
        to_str = lambda elems: "\t".join(elems)
        return f"""
        tokens:           {to_str(self.tokens)}
        normalize tokens: {to_str(self.norms)}
        pos:              {to_str(self.pos)}
        """
    
    def copy(self, 
        tokens = None, 
        norms = None,
        pos = None):
        """
        Convenience method to copy a Sentence and replace one or more of its attributes.
        """
        return Sentence(
            tokens   = tokens or self.tokens[::],
            norms    = norms or self.norms[::],
            pos      = pos or self.pos[::]
        )

Let's practice using the `Sentence` class.  We'll create a sentence from the tokens `["I", "ate", "the", "muffin"]` and assign `VBD` (past tense verb) to the token `ate` to create a new sentence instance.

In [6]:
s = Sentence(tokens = ["I", "ate", "the", "muffin"])
print(s)

s2 = s.copy(pos = [Sentence.UNKNOWN, "VBD", Sentence.UNKNOWN, Sentence.UNKNOWN])
print(s2)


        tokens:           I	ate	the	muffin
        normalize tokens: I	ate	the	muffin
        pos:              ???	???	???	???
        

        tokens:           I	ate	the	muffin
        normalize tokens: I	ate	the	muffin
        pos:              ???	VBD	???	???
        


# Implement a rule-based part of speech tagger

We're going to implement a simplistic and incomplete rule-based part of speech tagger for English.  To further solidify what you learn, feel free to complete or extend it on your own!

## `rule_based_ly_adv_tagger(s)`

`rule_based_ly_adv_tagger(s)` is a function designed to take a `Sentence` as input and rewrite the POS tag for any token ending in _ly_ as `RB` to produce a new `Sentence` as output.

- Add your solution after the comment **YOUR CODE HERE** and remove the `raise NotImplementedError`.  
- Do **not** use the `re` module!

This rule will fail in certain cases.  Can you think of any way to improve it?  What are some adverbs it will fail to find altogether?
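
One possible direction, sketched below on plain token lists rather than `Sentence` objects, is to pair the suffix check with two small word lists.  Both lists and the function name `tag_adverbs` are illustrative assumptions, and this variant intentionally departs from the behavior the tests in this notebook expect (those tests require every `-ly` token to be tagged `RB`).

```python
# Both word lists are illustrative assumptions, not part of the assignment.
NOT_ADVERBS = frozenset({"ugly", "bully", "family", "belly", "jelly"})
EXTRA_ADVERBS = frozenset({"very", "well", "often", "soon", "never"})

def tag_adverbs(tokens, unknown="???"):
    """Tags adverbs as RB using the -ly suffix plus two small word lists."""
    tags = []
    for token in tokens:
        word = token.lower()
        if word in EXTRA_ADVERBS:
            # adverbs the suffix rule cannot find
            tags.append("RB")
        elif word.endswith("ly") and word not in NOT_ADVERBS:
            # suffix rule, minus known false positives
            tags.append("RB")
        else:
            tags.append(unknown)
    return tuple(tags)

print(tag_adverbs(["Do", "n't", "be", "an", "ugly", "bully"]))
# ('???', '???', '???', '???', '???', '???')
print(tag_adverbs(["She", "sings", "very", "quietly"]))
# ('???', '???', 'RB', 'RB')
```

Hand-maintained lists like these trade coverage for precision and tend to grow quickly, which is one reason statistical taggers are preferred when training data exists.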

In [7]:
def rule_based_ly_adv_tagger(s: Sentence) -> Sentence:
    """
    Takes a Sentence and returns a new Sentence with updated POS tags.
    
    If a token ends with "ly", assign it a POS tag of RB.
    """
    # YOUR CODE HERE
    updated_s = list(s.pos)
    
    for i, token in enumerate(s.tokens):
        if token[-2:] == "ly":
            updated_s[i] = "RB"  
    
    return s.copy(pos=tuple(updated_s))


In [8]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens = ["I", "'m", "fairly", "certain", "this", "will", "be", "easy"])
res = rule_based_ly_adv_tagger(s)
print(res)
assert res.pos == (UNK, UNK, "RB", UNK, UNK, UNK, UNK, UNK)


        tokens:           I	'm	fairly	certain	this	will	be	easy
        normalize tokens: I	'm	fairly	certain	this	will	be	easy
        pos:              ???	???	RB	???	???	???	???	???
        


In [9]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

# NOTE: this kind of naive rule-based tagger will make mistakes:
s   = Sentence(tokens = ["Do", "n't", "be", "an", "ugly", "bully"])
res = rule_based_ly_adv_tagger(s)
# which tag(s) is/are wrong?
print(res)
assert res.pos == (UNK, UNK, UNK, UNK, "RB", "RB")


        tokens:           Do	n't	be	an	ugly	bully
        normalize tokens: Do	n't	be	an	ugly	bully
        pos:              ???	???	???	???	RB	RB
        


In [10]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s = Sentence(tokens=[
    "I",
    "'m", 
    "anxiously", 
    "awaiting", 
    "your", 
    "answer"
])
assert rule_based_ly_adv_tagger(s).pos == (Sentence.UNKNOWN, Sentence.UNKNOWN, "RB", Sentence.UNKNOWN, Sentence.UNKNOWN, Sentence.UNKNOWN)

## `rule_based_det_tagger(s)`

`rule_based_det_tagger(s)` is a function designed to take a `Sentence` as input and rewrite the POS tag for any token that is one of the following words:

```
a, all, an, any, each, every, no, some, that, the, these, this, those, which
```
... as `DT`.

- Add your solution after the comment **YOUR CODE HERE** and remove the `raise NotImplementedError`.  
- Do **not** use the `re` module!
- Ignore case
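
The "ignore case" requirement can be handled by lowercasing each token before the lookup.  The sketch below is one way to do that; the names `DETERMINERS` and `is_determiner` are hypothetical, and using a `frozenset` rather than a list is a design choice, not a requirement.

```python
# The determiner list from the instructions, stored as a frozenset
# so that membership checks do not scan the whole collection.
DETERMINERS = frozenset({
    "a", "all", "an", "any", "each", "every", "no",
    "some", "that", "the", "these", "this", "those", "which",
})

def is_determiner(token):
    """Returns True if the token, ignoring case, is in the determiner list."""
    return token.lower() in DETERMINERS

print([is_determiner(tok) for tok in ["ALL", "the", "Dudes"]])
# [True, True, False]
```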


In [11]:
def rule_based_det_tagger(s: Sentence) -> Sentence:
    """
    Takes a Sentence and returns a new Sentence with updated POS tags.
    
    If a token is one of the following:
      a, all, an, any, each, every, no, some, that, the, these, this, those, which
    ... replace its tag with DT.
    """
    # YOUR CODE HERE
    updated_s = list(s.pos)
    det_list = ["a", "all", "an", "any", "each", "every", "no", "some", "that", "the", "these", "this", "those", "which"]
    
    for i, word in enumerate(s.tokens):
        if word.lower() in det_list:
            updated_s[i] = "DT"
    return s.copy(pos=tuple(updated_s))

In [12]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens = ["some", "cats", "LOVE", "that", "fish"])
res = rule_based_det_tagger(s)
assert res.pos == ("DT", UNK, UNK, "DT", UNK)
print(res)


        tokens:           some	cats	LOVE	that	fish
        normalize tokens: some	cats	LOVE	that	fish
        pos:              DT	???	???	DT	???
        


In [13]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

# NOTE: While pretty reliable, this rule will occasionally make mistakes:
s   = Sentence(tokens = ["No", "means", "no", ",", "Dr.", "No", "!"])
res = rule_based_det_tagger(s)
# which tag(s) is/are wrong?
print(res)
assert res.pos == ("DT", UNK, "DT", UNK, UNK, "DT", UNK)


        tokens:           No	means	no	,	Dr.	No	!
        normalize tokens: No	means	no	,	Dr.	No	!
        pos:              DT	???	DT	???	???	DT	???
        


In [14]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s = Sentence(tokens=[
    "ALL",
    "THE", 
    "YOUNG", 
    "DUDES", 
    "CARRY", 
    "THE",
    "NEWS"
])
assert rule_based_det_tagger(s).pos == ("DT", "DT", UNK, UNK, UNK, "DT", UNK)

## `rule_based_not_adv_adj_tagger(s)`

So far we've written rules that function independently of one another.  That's very limiting.  One way we might improve our naive approach is to order the execution of our rules from highest confidence to lowest and have some rules depend on the output of others.  This time we'll write a rule that depends on the output of another.

`rule_based_not_adv_adj_tagger(s)` is a function designed to take a `Sentence` as input and rewrite the POS tag to `JJ` for any token ending in a `y` that was not previously tagged as an adverb.

- Add your solution after the comment **YOUR CODE HERE** and remove the `raise NotImplementedError`.  
- Do **not** use the `re` module!
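
To make the idea of one rule depending on another concrete, here is a minimal sketch that works on plain tuples of tokens and tags instead of `Sentence` objects.  The function names `ly_rule` and `y_rule` are hypothetical.  The second rule reads the tags produced by the first and leaves any `RB` in place.

```python
UNKNOWN = "???"

def ly_rule(tokens, tags):
    """Higher-confidence rule: tokens ending in -ly become RB."""
    return tuple(
        "RB" if tok.lower().endswith("ly") else tag
        for tok, tag in zip(tokens, tags)
    )

def y_rule(tokens, tags):
    """Lower-confidence rule: tokens ending in -y become JJ unless already RB."""
    return tuple(
        "JJ" if tok.lower().endswith("y") and tag != "RB" else tag
        for tok, tag in zip(tokens, tags)
    )

tokens = ("That", "mask", "is", "fairly", "scary")
tags = (UNKNOWN,) * len(tokens)
# the output of the first rule is the input to the second
tags = y_rule(tokens, ly_rule(tokens, tags))
print(tags)
# ('???', '???', '???', 'RB', 'JJ')
```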

In [15]:
def rule_based_not_adv_adj_tagger(s: Sentence) -> Sentence:
    """
    Takes a Sentence and returns a new Sentence with updated POS tags.
    
    If a token ends in y and is not already tagged as an adverb (RB), tag it as JJ.
    """
    # YOUR CODE HERE
    updated_s = list(s.pos)
    
    for i, token in enumerate(s.tokens):
        if token[-2:] == "ly":
            updated_s[i] = "RB"  
        
        elif token[-1:] == "y" and updated_s[i] != "RB":
            updated_s[i] = "JJ"
    
    return s.copy(pos=tuple(updated_s))


In [16]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["That", "mask", "is", "fairly", "scary"])
res = rule_based_not_adv_adj_tagger(s)
print(res)
assert res.pos == (UNK, UNK, UNK, "RB", "JJ")


        tokens:           That	mask	is	fairly	scary
        normalize tokens: That	mask	is	fairly	scary
        pos:              ???	???	???	RB	JJ
        


In [17]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

# NOTE: of course, this tagger will still make mistakes:
s   = Sentence(tokens=["Harry", "is", "very", "hairy"])
res = rule_based_not_adv_adj_tagger(s)
# which tag(s) is/are wrong?
print(res)
assert res.pos == ("JJ", UNK, "JJ", "JJ")

# How can we improve this further?


        tokens:           Harry	is	very	hairy
        normalize tokens: Harry	is	very	hairy
        pos:              JJ	???	JJ	JJ
        


## `verb_copula(s)` and `det_noun_verb(s)`

Let's try to incorporate additional context in our rules.  We know that the surrounding tags can determine, constrain, or inform the tag assignment for the current word or token.

`det_noun_verb(s)` is a function designed to take a `Sentence` as input and rewrite the POS tag to `NOUN` for any token immediately preceded by a token tagged as `DT` and immediately followed by a token that is tagged as some kind of verb.

We'll also implement another function (`verb_copula`) to tag instances of the English "be" verb.

- Add your solution after the comment **YOUR CODE HERE** and remove the `raise NotImplementedError`.  
- Do **not** use the `re` module!
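
A contextual rule of this kind amounts to sliding a three-tag window over the sentence.  The sketch below shows one way to express that on a plain tuple of tags; the helper name `tag_between` and its parameters are hypothetical.  Sequences shorter than three tags produce no windows, so they are returned unchanged.

```python
def tag_between(tags, left_tag, right_tags, new_tag):
    """
    Assigns new_tag to every position whose left neighbor is left_tag
    and whose right neighbor is one of right_tags.
    """
    updated = list(tags)
    # zip pairs each tag with the tag two positions later;
    # index i is the position in the middle of that window
    for i, (prev_tag, next_tag) in enumerate(zip(tags, tags[2:]), start=1):
        if prev_tag == left_tag and next_tag in right_tags:
            updated[i] = new_tag
    return tuple(updated)

verb_tags = {"VB", "VBD", "VBN", "VBP", "VBZ"}
print(tag_between(("DT", "???", "VBZ"), "DT", verb_tags, "NOUN"))
# ('DT', 'NOUN', 'VBZ')
print(tag_between(("DT", "???"), "DT", verb_tags, "NOUN"))
# ('DT', '???')
```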

In [18]:
def verb_copula(s: Sentence) -> Sentence:
    """
    Takes a Sentence and returns a new Sentence with updated POS tags.
    
    Assigns the following tags:
    
    am -> VBP
    is -> VBZ
    are -> VBP
    was -> VBD
    were -> VBD
    """
    # YOUR CODE HERE
    verbs = {
        "am": "VBP",
        "is": "VBZ",
        "are": "VBP",
        "was": "VBD",
        "were": "VBD"
    }

    updated_pos = list(s.pos)

    for i, token in enumerate(s.tokens):
        # only overwrite the tag for forms of "be"; keep any existing tag otherwise
        if token.lower() in verbs:
            updated_pos[i] = verbs[token.lower()]

    return s.copy(pos=tuple(updated_pos))


def det_noun_verb(s: Sentence) -> Sentence:
    """
    Takes a Sentence and returns a new Sentence with updated POS tags.
    
    If a token $t_i$ is ...
    1) immediately preceded by a token $t_{i-1}$ already tagged as a determiner 
    and ...
    2) immediately followed by a token already tagged as a verb
    tag the token as a NOUN (we'll ignore plural vs singular distinctions).
    """
    determiner_tag = "DT"
    verbs_tags = {"VBP", "VBZ", "VBD", "VBN", "VB"}

    updated_pos = list(s.pos)

    for i in range(1, len(s.tokens) - 1):
        if updated_pos[i - 1] == determiner_tag and updated_pos[i + 1] in verbs_tags:
            updated_pos[i] = "NOUN"

    return s.copy(pos=tuple(updated_pos))

In [19]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["I", "am", "the", "walrus"])
res = verb_copula(s)
print(res)
assert res.pos == (UNK, "VBP", UNK, UNK)


        tokens:           I	am	the	walrus
        normalize tokens: I	am	the	walrus
        pos:              ???	VBP	???	???
        


In [20]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["We", "are", "the", "champions"])
res = verb_copula(s)
print(res)
assert res.pos == (UNK, "VBP", UNK, UNK)


        tokens:           We	are	the	champions
        normalize tokens: We	are	the	champions
        pos:              ???	VBP	???	???
        


In [21]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["They", "were", "late"])
res = verb_copula(s)
print(res)
assert res.pos == (UNK, "VBD", UNK)


        tokens:           They	were	late
        normalize tokens: They	were	late
        pos:              ???	VBD	???
        


In [22]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["Who", "was", "singing", "?"])
res = verb_copula(s)
print(res)
assert res.pos == (UNK, "VBD", UNK, UNK)


        tokens:           Who	was	singing	?
        normalize tokens: Who	was	singing	?
        pos:              ???	VBD	???	???
        


In [23]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["The", "goat"], pos=["DT", UNK])
res = det_noun_verb(s)
print(res)
assert res.pos == ("DT", UNK)


        tokens:           The	goat
        normalize tokens: The	goat
        pos:              DT	???
        


In [24]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["The", "goat", "dreams"], pos=["DT", UNK, "VBZ"])
res = det_noun_verb(s)
print(res)
assert res.pos == ("DT", "NOUN", "VBZ")


        tokens:           The	goat	dreams
        normalize tokens: The	goat	dreams
        pos:              DT	NOUN	VBZ
        


In [25]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["The", "goat", "dreams"])
res = det_noun_verb(s)
print(res)
assert res.pos == (UNK, UNK, UNK)


        tokens:           The	goat	dreams
        normalize tokens: The	goat	dreams
        pos:              ???	???	???
        


In [26]:
# verify the re module is not in scope
assert "re" not in dir()

UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["The"], pos=["DT"])
res = det_noun_verb(s)
print(res)
assert res.pos == ("DT",)


        tokens:           The
        normalize tokens: The
        pos:              DT
        


## Putting it all together

Now that we have some rules, let's order them by priority to tag a few examples...
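
The effect of rule order can be seen in a small sketch that applies a list of rule functions in sequence.  Everything here is a simplified stand-in that works on plain tuples: the rule names, the shortened determiner list, and the `startswith("VB")` check for verbs are assumptions made for illustration only.

```python
UNKNOWN = "???"

def det_rule(tokens, tags):
    dets = {"the", "all", "those"}  # shortened list for illustration
    return tuple("DT" if tok.lower() in dets else tag
                 for tok, tag in zip(tokens, tags))

def copula_rule(tokens, tags):
    forms = {"am": "VBP", "is": "VBZ", "are": "VBP", "was": "VBD", "were": "VBD"}
    return tuple(forms.get(tok.lower(), tag) for tok, tag in zip(tokens, tags))

def noun_rule(tokens, tags):
    out = list(tags)
    for i in range(1, len(tags) - 1):
        # contextual rule: DT on the left, some verb tag on the right
        if tags[i - 1] == "DT" and tags[i + 1].startswith("VB"):
            out[i] = "NOUN"
    return tuple(out)

def apply_rules(tokens, rules):
    """Applies each rule in order, feeding it the tags produced so far."""
    tags = (UNKNOWN,) * len(tokens)
    for rule in rules:
        tags = rule(tokens, tags)
    return tags

tokens = ("All", "bats", "are", "funky")
print(apply_rules(tokens, [det_rule, copula_rule, noun_rule]))
# ('DT', 'NOUN', 'VBP', '???')
print(apply_rules(tokens, [noun_rule, det_rule, copula_rule]))
# ('DT', '???', 'VBP', '???')
```

When the contextual rule runs first, it finds no `DT` or verb tags to rely on, so `bats` stays untagged.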

In [27]:
def tag_with_rules(s: Sentence) -> Sentence:
    """
    Takes a Sentence and returns a new Sentence with updated POS tags
    by applying a series of rules for part of speech tagging
    """
    # YOUR CODE HERE
    # NOTE: the `# # #` separator comments below are only there for my own organization
    verbs = {
        "am": "VBP",
        "is": "VBZ",
        "are": "VBP",
        "was": "VBD",
        "were": "VBD"
    }
    det_list = ["a", "all", "an", "any", "each", "every", "no", "some", "that", "the", "these", "this", "those", "which"]
    
    determiner_tag = "DT"
    verbs_tags = {"VBP", "VBZ", "VBD", "VBN", "VB"}

    updated_s = list(s.pos)
    
   
    for i, token in enumerate(s.tokens):
        token_lower = token.lower()
        # # #
        if token_lower in det_list:
            updated_s[i] = "DT"
        # # #
        elif token_lower in verbs:
            updated_s[i] = verbs[token_lower]
        # # #
        elif token_lower.endswith("ly"):
            updated_s[i] = "RB"
        # # #
        elif token_lower.endswith("y"):
            updated_s[i] = "JJ"
    # # #
    # apply the contextual rule once, after every token has received its lexical tag
    for i in range(1, len(s.tokens) - 1):
        if updated_s[i - 1] == determiner_tag and updated_s[i + 1] in verbs_tags:
            updated_s[i] = "NOUN"
        
    return s.copy(pos=tuple(updated_s))

In [28]:
UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["Those", "hungry", "goats", "bleat", "quietly"])
res = tag_with_rules(s)
print(res)
assert res.pos == ("DT", "JJ", UNK, UNK, "RB")


        tokens:           Those	hungry	goats	bleat	quietly
        normalize tokens: Those	hungry	goats	bleat	quietly
        pos:              DT	JJ	???	???	RB
        


In [29]:
UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["Those", "hungry", "goats", "bleat", "quietly"], pos=[UNK, UNK, UNK, "V??", "RB"])
res = tag_with_rules(s)
print(res)
assert res.pos == ("DT", "JJ", UNK, "V??", "RB")


        tokens:           Those	hungry	goats	bleat	quietly
        normalize tokens: Those	hungry	goats	bleat	quietly
        pos:              DT	JJ	???	V??	RB
        


In [30]:
UNK = Sentence.UNKNOWN

s   = Sentence(tokens=["All", "bats", "are", "righteously", "funky"])
res = tag_with_rules(s)
print(res)
assert res.pos == ("DT", "NOUN", "VBP", "RB", "JJ")


        tokens:           All	bats	are	righteously	funky
        normalize tokens: All	bats	are	righteously	funky
        pos:              DT	NOUN	VBP	RB	JJ
        


Part of speech tagging is often treated as a preliminary step to many other NLP tasks (ex. shallow parsing, syntactic parsing, information extraction, etc.). In LING 539, you'll learn about statistical approaches to sequence tagging and train your own sequence tagger.  

In situations where you have little to no existing annotated data to train statistical models, rule-based taggers can expedite the data annotation process by performing a "first pass" at partially annotating a dataset automatically.  As partial annotations are collected from human annotators, the rule-based system can be re-run to help fill in the blanks.  These systems can quickly grow in complexity.  Writing tests as you develop rules can help to ensure your system behaves as intended.
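
As a rough illustration of testing rules as you write them, the sketch below keeps test cases in a table and reports any that fail.  The stand-in rule `tag_ly` and the helper `run_cases` are hypothetical and operate on plain tuples rather than `Sentence` objects.

```python
UNKNOWN = "???"

def tag_ly(tokens):
    """Stand-in rule used only for this example."""
    return tuple("RB" if tok.lower().endswith("ly") else UNKNOWN for tok in tokens)

# each case pairs input tokens with the tags we expect back
CASES = [
    (("fairly", "certain"), ("RB", UNKNOWN)),
    (("QUICKLY",), ("RB",)),
    ((), ()),
]

def run_cases(tagger, cases):
    """Returns the (tokens, expected, actual) triples for every failing case."""
    failures = []
    for tokens, expected in cases:
        actual = tagger(tokens)
        if actual != expected:
            failures.append((tokens, expected, actual))
    return failures

print(run_cases(tag_ly, CASES))
# []
```

Adding a new row to the table each time a rule misbehaves helps keep earlier fixes from silently regressing as the rule set grows.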

What are some strengths and weaknesses of this method for assigning part of speech tags?  How can it be improved further?

## BONUS (max 2 points; code + description)


### Option 1: Improve your tagger


You've now written a toy rule-based POS tagger.  Try improving it!  

- You may copy your previous rules below and/or write entirely new ones. 
- If you're feeling confident, try further organizing your code using a class.
  - ex. `class EnglishRuleBasedPosTagger` with a `def tag(self, s: Sentence) -> Sentence:` method.

Write a few tests to demonstrate its capabilities.  
- What are some of its shortcomings?  
- How does it improve upon the earlier version?  
- What tagset did you adopt?

```python
def tag_with_rules_v2(s: Sentence) -> Sentence:
    """
    Takes a Sentence and returns a new Sentence with updated POS tags
    by applying a series of rules for part of speech tagging
    """
    # YOUR CODE HERE
```

### Option 2: Write a simple rule-based tagger for a non-English language

In this unit, we looked at some ways part of speech tagging can differ across languages and finished up by building a simple toy rule-based tagger for English.  Try applying what you've learned to build a rule-based tagger for a non-English language of your choice.

- Write **at least 3** rules

Write a few tests to demonstrate its capabilities.  
- What are some of its shortcomings? 
- What tagset did you adopt?

In [31]:
# YOUR CODE HERE

_Please describe your implementation for the bonus problem in the cell below and address the questions presented in the bonus description. If you're writing a tagger for a non-English language, provide some background on the language._

YOUR ANSWER HERE