## Part 1. Welcome to the second lab session

This time we focus on **replace rules** expressed using **regular expressions**. We will also see how rules can be incorporated into a morphological lexicon, so that we move from models based on Item & Arrangement (I&A) to models based on Item & Process (I&P).

The first code cell below demonstrates a single replace rule that is applied to two words: _mamma_ and _pappa_. Note the following:
* At the beginning of your code, you need to perform the import from morpholexicon, as before.
* You need a function that will contain all your rules. You can name the function as you like; here we simply call it _rules_.
* The rules function takes one parameter, here called _input_.
* The rules function contains one or more replace rules, each expressed as a call to the _replace_ function, which takes five parameters:
    1. the string to be replaced
    2. the replacement string (= what the string is replaced by)
    3. the left context that needs to match in order for the replacement to take place
    4. the right context that needs to match in order for the replacement to take place
    5. the _input_ parameter
    

* The first four parameters are all **regular expressions**. We went through the syntax of Python regular expressions during the second language technology lecture.
* It is strongly recommended that you use so-called _raw strings_ to express regular expressions:
    * A raw string is written like a Python string, but with a lowercase 'r' in front of it; for instance:
        * `"This is a normal Python string written with double quotes."`
        * `'This is a normal Python string written with single quotes.'`
        * `r"This is a Python raw string written with double quotes."`
        * `r'This is a Python raw string written with single quotes.'`
    * You can use single or double quotes, as you wish. Here we use single quotes for raw strings in order for the code to look "less messy".
    * You don't have to use raw strings for regular expressions, but then you will need to use the backslash \ as an escape character in many obvious and less obvious places, which increases the risk of _bugs_ in your code.
* Below you can also see how you can apply your rule(s) to some input words.
* Press Ctrl-Enter inside the cell below to see what it produces. Can you understand what goes on here?
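
The difference between normal strings and raw strings can be checked with Python's standard `re` module, independently of morpholexicon. The following is a minimal sketch; the example pattern `\bpa` (the syllable _pa_ at the start of a word) and the test string are our own choices for illustration.

```python
import re

# In a normal string, Python interprets "\b" as a backspace character
# before the regular expression engine ever sees it.
normal = "\bpa"
# In a raw string, the backslash is kept, so the regex engine sees \b,
# which means "word boundary".
raw = r"\bpa"

print(len(normal))  # prints 3: backspace, 'p', 'a'
print(len(raw))     # prints 4: backslash, 'b', 'p', 'a'

# Only the raw string finds 'pa' at the start of each word.
print(re.findall(raw, "pappa papa"))     # prints ['pa', 'pa']
print(re.findall(normal, "pappa papa"))  # prints []
```

Without the raw string, you would have to write `"\\bpa"` to get the same pattern.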

In [None]:
import sys
sys.path.append("../../../morf-synt-2025/src")
from morpholexicon import *

def rules(input):
    # Rule: a -> o / [mp] _
    # That is: replace a with o, when preceded by m or p (and followed by anything)
    replace(r'a', r'o', r'[mp]', r'', input)
    
apply(rules, "mamma")
apply(rules, "pappa")
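
If you want to relate a rule such as `a -> o / [mp] _` to plain Python regular expressions, one possible reading uses `re.sub` with a lookbehind for the left context and a lookahead for the right context. The helper `replace_sketch` below is hypothetical and is not part of morpholexicon: it returns a new string instead of modifying _input_, and it only illustrates the idea of context-dependent replacement. The library itself may work differently.

```python
import re

def replace_sketch(target, replacement, left, right, word):
    """Hypothetical helper, NOT the morpholexicon implementation.

    Replaces every match of `target` in `word` that is preceded by
    `left` and followed by `right`. Empty contexts are skipped.
    """
    pattern = f"(?:{target})"
    if left:
        # Lookbehind: the left context must match but is not consumed.
        # Python's re module requires lookbehind patterns of fixed width.
        pattern = f"(?<={left})" + pattern
    if right:
        # Lookahead: the right context must match but is not consumed.
        pattern = pattern + f"(?={right})"
    return re.sub(pattern, replacement, word)

# Rule: a -> o / [mp] _
print(replace_sketch(r'a', r'o', r'[mp]', r'', "mamma"))  # prints mommo
print(replace_sketch(r'a', r'o', r'[mp]', r'', "pappa"))  # prints poppo
```

Because the contexts are matched with lookbehind and lookahead, they are checked but never replaced themselves; only the target string changes.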

Next, we keep our first rule, but add a second one. What happens now?

In [None]:
def rules(input):
    # Rule 1: a -> o / [mp] _
    # That is: replace a with o, when preceded by m or p (and followed by anything)
    replace(r'a', r'o', r'[mp]', r'', input)

    # Rule 2: p -> b / _
    # That is: replace p with b anywhere
    replace(r'p', r'b', r'', r'', input)
    
apply(rules, "mamma")
apply(rules, "pappa")

The more rules you have, the trickier it is to follow what is going on. If you want to trace the sequence of replacements applied to your input word, you can add an extra argument to your apply call: `debug=True`.

In [None]:
apply(rules, "mamma", debug=True)
apply(rules, "pappa", debug=True)

Also note that the order of the rules matters. Each rule picks up the string where the previous rule left it, so the output of one rule becomes the input of the next.
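
The effect of rule order can be illustrated with a small stand-alone sketch that uses only Python's standard `re` module. The functions `rule_1`, `rule_2` and `run` are hypothetical names chosen for this illustration; they imitate Rule 1 and Rule 2 above but are not morpholexicon code, so treat the output as that of the sketch only.

```python
import re

def rule_1(word):
    # a -> o / [mp] _   (a becomes o when preceded by m or p)
    return re.sub(r'(?<=[mp])a', 'o', word)

def rule_2(word):
    # p -> b / _        (p becomes b anywhere)
    return re.sub(r'p', 'b', word)

def run(rules, word):
    """Apply the rules in the given order, printing each intermediate form."""
    for rule in rules:
        result = rule(word)
        print(f"{rule.__name__}: {word} -> {result}")
        word = result
    return word

print(run([rule_1, rule_2], "pappa"))  # prints bobbo
print(run([rule_2, rule_1], "pappa"))  # prints babba
```

In the second call, Rule 2 has already turned every _p_ into _b_ by the time Rule 1 runs, so the left context `[mp]` no longer matches and no _a_ is changed.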

Below we have added a third rule. Explain what the third rule does.

In [None]:
def rules(input):
    # Rule 1: a -> o / [mp] _
    # That is: replace a with o, when preceded by m or p (and followed by anything)
    replace(r'a', r'o', r'[mp]', r'', input)

    # Rule 2: p -> b / _
    # That is: replace p with b anywhere
    replace(r'p', r'b', r'', r'', input)
    
    # Rule 3: b -> m / [aeiouyåäö] _ [pbm]
    # What does this rule actually do?
    replace(r'b', r'm', r'[aeiouyåäö]', r'[pbm]', input)
    
apply(rules, "mamma", debug=True)
apply(rules, "pappa", debug=True)

Next, we have added two more rules. Figure out what goes on here!

In [None]:
def rules(input):
    # Rule 1: a -> o / [mp] _
    # That is: replace a with o, when preceded by m or p (and followed by anything)
    replace(r'a', r'o', r'[mp]', r'', input)

    # Rule 2: p -> b / _
    # That is: replace p with b anywhere
    replace(r'p', r'b', r'', r'', input)
    
    # Rule 3: b -> m / [aeiouyåäö] _ [pbm]
    # What does this rule actually do?
    replace(r'b', r'm', r'[aeiouyåäö]', r'[pbm]', input)

    # Rule 4: ?
    replace(r'o', r'um', r'', r'$', input)
    
    # Rule 5: ?
    replace(r'', r'ka', r'^', r'', input)
    
apply(rules, "mamma", debug=True)
apply(rules, "pappa", debug=True)

When you are done here, continue to Part 2.