## Part 4. Reduplication ##

In this part, we introduce a new feature called **matching groups**. In the example below, we perform *full stem reduplication for Malay nouns*. In Malay, the plural can be formed by simply repeating the whole stem. We implement this with the help of a matching group. Execute the code in the cell below to see what happens:

In [None]:
import sys
sys.path.append("../../../morf-synt-2025/src")
from morpholexicon import *

def rules(input):
    """ Stem reduplication in Malay nouns """
    replace(r'(?P<stem>.+)', r'\g<stem>\g<stem>', r'^', r'$', input)
    
apply(rules, "buku", debug=True)       # buku means book, bukubuku means books
apply(rules, "lembu", debug=True)      # lembu means cow, lembulembu means cows
apply(rules, "pelabuhan", debug=True)  # pelabuhan means harbor, pelabuhanpelabuhan means harbors

How does this work?

First look at the left and right contexts: to the left we have `^` (beginning of string) and to the right we have `$` (end of string). That is, the string to be replaced must match the _entire_ string from beginning to end.

What do we replace? The pattern is `.+` (a dot followed by a plus sign). This regular expression matches any string of one or more characters. So we indeed replace the whole string that spans from the left context to the right context (from beginning of string to end of string).

Around `.+` we have parentheses `( )` and the label `?P<stem>`. The label assigns our own name _stem_ to whatever string is matched inside the parentheses. That is, we match `.+` (any string) and store the matched value under the name _stem_. This construct is called a matching group (Python's documentation calls it a named group), here named _stem_. Note that the parentheses and the label are not part of the string that is actually matched.

In the replacement string, we see `\g<stem>\g<stem>`. Writing `\g<stem>` says: "insert here the value captured by `(?P<stem>...)`". Since `\g<stem>` is written twice, the captured value is inserted twice, which produces the reduplication. Just what we wanted!
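
The same named-group mechanics can be tried with Python's standard `re` module alone. The sketch below is not how the `replace` function of `morpholexicon` is implemented (that is not shown here). It folds the contexts `^` and `$` into a single pattern, and the helper name `reduplicate` is made up for illustration.

```python
import re

def reduplicate(word):
    """Full reduplication with the standard library only (illustrative helper)."""
    # ^ and $ anchor the match to the whole string; (?P<stem>.+) captures it
    # under the name "stem"; \g<stem>\g<stem> inserts the captured text twice.
    return re.sub(r'^(?P<stem>.+)$', r'\g<stem>\g<stem>', word)

# The captured value can also be inspected directly on a match object.
match = re.match(r'^(?P<stem>.+)$', "buku")
print(match.group('stem'))    # buku
print(reduplicate("buku"))    # bukubuku
```

The parentheses and the `?P<stem>` label never appear in the output; only the text matched by `.+` does.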

Let us take another example. *Limited reduplication in Tagalog verbs* is shown below. Here the reduplication applies only to the first syllable of the stem, which consists of a consonant followed by a vowel. We call our matching group _CV_ (C for consonant, V for vowel). Python does not know that C means consonant and V means vowel; the name is only a label. What matters is that the matching group contains a regular expression that matches exactly one consonant followed by one vowel: `[bcdfghjklmnpqrstvwxyz][aeiou]`. The left context `^` restricts the match to the beginning of the string, and the right context is left empty.

In [None]:
def rules(input):
    """ Limited reduplication in Tagalog verbs """
    replace(r'(?P<CV>[bcdfghjklmnpqrstvwxyz][aeiou])', r'\g<CV>\g<CV>', r'^', r'', input)
    
apply(rules, "pili", debug=True)
apply(rules, "tahi", debug=True)
apply(rules, "kuha", debug=True)
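
For comparison, here is a minimal sketch of the same CV reduplication using only the standard `re` module. It assumes nothing about how `morpholexicon` works internally. The helper name `reduplicate_cv` is invented for this illustration, and the left context `^` is written directly into the pattern.

```python
import re

# One consonant followed by one vowel, anchored at the start of the word.
CV_PATTERN = r'^(?P<CV>[bcdfghjklmnpqrstvwxyz][aeiou])'

def reduplicate_cv(word):
    """Copy the initial consonant-vowel syllable (illustrative helper)."""
    return re.sub(CV_PATTERN, r'\g<CV>\g<CV>', word)

for word in ["pili", "tahi", "kuha"]:
    print(word, "->", reduplicate_cv(word))
# pili -> pipili
# tahi -> tatahi
# kuha -> kukuha
```

A string that does not begin with a consonant followed by a vowel is returned unchanged, because the pattern finds nothing to replace.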

As our next step, we incorporate reduplication rules within a lexicon. Below you see a lexicon for Malay nouns. To begin with, there are no replace rules:

In [None]:
def root(state):
    entry_a("", malay_stems, state)
    
def malay_stems(state):
    entry_a("buku", malay_nouns, state)
    entry_a("lembu", malay_nouns, state)
    entry_a("pelabuhan", malay_nouns, state)

def malay_nouns(state):
    entry_t("+Noun", "^", malay_number, state)
    
def malay_number(state):
    entry_t("+Unmarked", "", None, state)
    entry_t("+Plural", "D", None, state)
    
load_lexicon(root, None)

generate_all()
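
To make the shape of the lexicon concrete, the sketch below enumerates the paths through it in plain Python. It rests on two assumptions about `morpholexicon`: that `entry_a` adds the same string to both sides, and that `entry_t` pairs a tag with an intermediate string. It does not reproduce the output format of `generate_all`, and the names `stems`, `number` and `intermediate_forms` are hypothetical.

```python
from itertools import product

# Plain-data stand-ins for the lexicon above (hypothetical names).
stems = ["buku", "lembu", "pelabuhan"]
number = [("+Unmarked", ""), ("+Plural", "D")]

def intermediate_forms():
    """Pair each analysis string with its intermediate form, before any rules."""
    pairs = []
    for stem, (tag, marker) in product(stems, number):
        analysis = stem + "+Noun" + tag   # e.g. buku+Noun+Plural
        form = stem + "^" + marker        # e.g. buku^D
        pairs.append((analysis, form))
    return pairs

for analysis, form in intermediate_forms():
    print(analysis, form)
```

Without replace rules, the end-of-stem marker `^` and the placeholder `D` remain in the forms.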

We now need two rules. The first rule replaces the special character `D` with a copy of the stem that precedes it; note that the matching group _stem_ is captured in the left context and reused in the replacement. The second rule deletes our end-of-stem marker `^`:

In [None]:
def rules(input):
    """ Stem reduplication in Malay nouns """
    
    # Rule 1
    replace(r'D', r'\g<stem>', r'^(?P<stem>.+)\^', r'', input)
    
    # Rule 2
    replace(r'\^', r'', r'', r'', input)
    
load_lexicon(root, rules)

generate_all()

As usual, you can trace the replacement process with the _apply_ command:

In [None]:
apply(rules, "lembu^D", debug=True)
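
The effect of the two rules can be approximated with the standard `re` module, as sketched below. This is an approximation under stated assumptions, not the `morpholexicon` implementation. Because `re.sub` has no separate context arguments, the left context of Rule 1 is folded into the pattern and written back in the replacement. The function name `apply_rules_sketch` is made up.

```python
import re

def apply_rules_sketch(form):
    """Approximate Rule 1 and Rule 2 with plain re.sub (illustrative helper)."""
    # Rule 1: replace D with a copy of the stem that precedes the ^ marker.
    # The matched stem and ^ are re-emitted so that only D is actually replaced.
    form = re.sub(r'^(?P<stem>.+)\^D', r'\g<stem>^\g<stem>', form)
    # Rule 2: delete the end-of-stem marker ^ wherever it occurs.
    form = re.sub(r'\^', '', form)
    return form

print(apply_rules_sketch("lembu^D"))  # lembulembu
print(apply_rules_sketch("buku^"))    # buku
```

The order matters: Rule 1 needs the `^` marker to locate the end of the stem, so the marker can only be deleted afterwards.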

When you are done with this page, continue to Part 5.