# Chamorro Lexicon Expander

**Chamorro Lexicon Expander** is a Python project designed to expand the Chamorro-English dictionary by generating all possible affixed variations of Chamorro root words. This tool automates the process of creating word forms using common Chamorro prefixes, suffixes, and infixes according to linguistic rules. This project enables a more comprehensive representation of Chamorro vocabulary for language learners, linguists, and dictionary development.

**Name:** Schyuler Lujan <br>
**Date Started:** 10-Nov-2024 <br>
**Date Complete:** In Progress <br>

In [1]:
# Import libraries
import re
import pandas as pd
import csv

# IMPORT AND CLEAN DATA

**About this data:** For this project, we use the words and part-of-speech tags from the Revised Chamorro-English dictionary, which is available for free at https://natibunmarianas.org/chamorro-dictionary/. We use this data because, among the freely available online resources, it currently has the most complete and reliably accurate part-of-speech tags for Chamorro words. The part-of-speech tags determine which words can be transformed with each affix.

In [21]:
# Import files and convert to dataframes
# tverbs_df = pd.read_csv("transitive-verbs.csv", encoding="utf-8")
# iverbs_df = pd.read_csv("intransitive-verbs.csv", encoding="utf-8")

In [22]:
# Preview dataframe
#tverbs_df.head()
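The import cells above are commented out, so the column layout of the source files is not shown. As a rough sketch only, assuming each file has a `Term` column (the name used later in the export section) and a hypothetical `Part of Speech` column, the word lists could be pulled out by tag as below. The in-memory sample, the tag values, and the helper name `words_with_tag` are illustrative assumptions, not the real dictionary data.

```python
import csv
import io

# Stand-in for a dictionary export. The "Part of Speech" column name and
# the tag values are assumptions, not taken from the real files.
sample_csv = io.StringIO(
    "Term,Part of Speech\n"
    "konne',transitive verb\n"
    "hånao,intransitive verb\n"
    "taitai,transitive verb\n"
)

def words_with_tag(csv_file, tag):
    """Return the terms whose part-of-speech tag equals `tag`."""
    reader = csv.DictReader(csv_file)
    return [row["Term"].strip() for row in reader if row["Part of Speech"] == tag]

transitive_verbs = words_with_tag(sample_csv, "transitive verb")
print(transitive_verbs)
```

The resulting plain list of strings is the shape the `apply_*` functions in this notebook expect as their `data` argument.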

# VOWEL HARMONY

**About Vowel Harmony:** In Chamorro, vowel harmony refers to how certain vowels change depending on the vowel that precedes them. In general, the back vowels "å", "o", and "u" shift to "a", "e", and "i" when they are preceded by an "i", for example after the -in- infix.

## Vowel Harmony Rules

In [2]:
# Create a list of vowels, used to locate the first vowel in a word
vowels = ['a', 'á', 'å', 'e', 'é', 'i', 'í', 'o', 'ó', 'u']

In [124]:
# Create a dictionary of vowel harmony transformation rules
vowel_harmony_rules = {"å": "a", "o": "e", "u": "i"}

# Create a dictionary of vowel lowering / vowel fronting rules to be used with some prefixes
### DICTIONARY FOR VOWEL LOWERING RULES GOES HERE ###

## Create Vowel Harmonized Word

In [4]:
def vowel_harmonized_word(word, vowels, vowel_harmony_rules):
    """
    Applies vowel harmony to the first vowel of a word and returns the vowel harmonized word.
    Words with no vowel, or whose first vowel has no harmony rule, are returned unchanged.
    """
    new_word = word # Default: return the word unchanged

    for letter in word:
        if letter in vowels:
            # If the first vowel is in the vowel harmony rules, transform the word
            if letter in vowel_harmony_rules:
                new_word = word.replace(letter, vowel_harmony_rules[letter], 1)
            break # Stop iteration after first vowel

    return new_word

In [5]:
### DELETE ME TEST CODE ###
word = "chalåni"
vowel_harmonized_word(word, vowels, vowel_harmony_rules)

'chalåni'

## Apply Vowel Harmony Transformation

In [125]:
def apply_vowel_harmony(data, vowels, vowel_harmony_rules):
    """
    Takes a list of words and transforms them with vowel harmony.
    Produces a list of tuples, with each tuple containing the new word, base word, and the transformation name.
    """
    # Initialize list to store new words
    new_words = []
    
    # Iterate through data and apply vowel harmony rules
    for word in data:
        new_word = vowel_harmonized_word(word, vowels, vowel_harmony_rules)
        new_words.append((new_word, word, "Vowel Harmony"))
        
    return new_words

In [126]:
### DELETE ME TEST CODE ###
word_list = ["chålan", "guma'", "to'lang"]
print(apply_vowel_harmony(word_list, vowels, vowel_harmony_rules))

[('chalan', 'chålan', 'Vowel Harmony'), ("gima'", "guma'", 'Vowel Harmony'), ("te'lang", "to'lang", 'Vowel Harmony')]


# INFIXES

**About Chamorro Infixes:** Infixes are affixes that occur within the word, rather than being attached in front of the word or at the end. In Chamorro, infixes are always applied before the first vowel of the word they are attached to. If the word it's being attached to starts with a vowel, the infix is still placed in front of that vowel. There are two infixes in Chamorro: -in- and -um-.
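The placement rule described above (insert before the first vowel, including when the word starts with a vowel) can also be sketched with a regular expression. This is only an illustration: `VOWEL_CLASS` and `insert_infix` are hypothetical names, and the vowel inventory is copied from the `vowels` list in this notebook.

```python
import re

# Same vowel inventory as the `vowels` list in this notebook
VOWEL_CLASS = "[aáåeéiíoóu]"

def insert_infix(word, infix):
    """Insert `infix` before the first vowel; return the word unchanged if it has no vowel."""
    return re.sub(VOWEL_CLASS, lambda m: infix + m.group(0), word, count=1)

print(insert_infix("hånao", "um"))   # consonant-initial word
print(insert_infix("oppe'", "um"))   # vowel-initial word
print(insert_infix("'", "um"))       # no vowel, returned unchanged
```

Using `count=1` limits the substitution to the first vowel, and a word without any vowel simply passes through instead of raising an error.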

## Create Infixed Word

In [128]:
def infixed_word(word, vowels, infix):
    """
    Transforms a word with an infix and returns the infixed word.
    Words with no vowel are returned unchanged.
    Vowel harmony transformations are in the function `vowel_harmonized_word`
    """
    new_word = word # Default: return the word unchanged

    for i, letter in enumerate(word):
        if letter in vowels:
            new_word = word[0:i]+infix+word[i:] # Add the infix before the first vowel
            break # Stop iteration after first vowel

    return new_word

In [129]:
### DELETE ME TEST CODE ###
word = "påtgon"
infix = "in"
infixed_word(word, vowels, infix)

'pinåtgon'

## Apply UM Infix Transformation

In [130]:
def apply_um_infix(data, vowels):
    """
    Iterates through a list of words and applies the UM Infix
    """
    # Define our infix
    infix = "um"
    
    # Initialize list to store the infixed word, root word, and the infix name
    new_words = []
    
    # Transform word with UM Infix and append the new word to new_words
    for word in data:
        new_word = infixed_word(word, vowels, infix)
        new_words.append((new_word, word, f"{infix} infix"))
        
    return new_words

In [131]:
### DELETE ME TEST CODE ###
data = ["hånao", "konne'", "taitai", "tufong"]
print(apply_um_infix(data, vowels))

[('humånao', 'hånao', 'um infix'), ("kumonne'", "konne'", 'um infix'), ('tumaitai', 'taitai', 'um infix'), ('tumufong', 'tufong', 'um infix')]


## Apply IN Infix Transformation

In [132]:
def apply_in_infix(data, vowels, vowel_harmony_rules):
    """
    Iterates through a list of words and applies the IN Infix.
    Vowel harmony rules must also be applied with the IN infix transformation.
    """
    # Define the infix
    infix = "in"
    
    # Initialize list to store the infixed word, root word, and the infix name
    new_words = []
    
    # Transform word with IN infix and append the new word to new_words
    for word in data:
        # Apply vowel harmony rules first
        harmonized_word = vowel_harmonized_word(word, vowels, vowel_harmony_rules)
        # Transform the harmonized_word with the IN infix
        new_word = infixed_word(harmonized_word, vowels, infix)
        new_words.append((new_word, word, f"{infix} infix"))
        
    return new_words

In [133]:
### DELETE ME TEST CODE ###
data = ["hånao", "konne'", "taitai", "tufong"]
print(apply_in_infix(data, vowels, vowel_harmony_rules))

[('hinanao', 'hånao', 'in infix'), ("kinenne'", "konne'", 'in infix'), ('tinaitai', 'taitai', 'in infix'), ('tinifong', 'tufong', 'in infix')]


# PREFIXES

**About Chamorro Prefixes:** Prefixes in Chamorro are attached to the front of a base word, and one or more prefixes can be attached. Some prefixes require morphophonemic changes to the base word when attached.

## Prefix Lists

We will create different lists of prefixes, excluding the man- prefixes because they require more changes to the base word. The lists are organized by the part of speech of the base word they can attach to, and include both single prefixes and known combinations of prefixes.

In [144]:
# Create a list of prefixes for transitive verbs
transitive_verb_prefixes = [
    "án", "fa'", "á", "umá", "maná", "acha", "ma'acha", "fanma'acha", 
    "ké", "kéke",  "na'", "muna'", "nina'", "mannina'", "fannina'", "manna'", "fanna'",
    "ma", "manma", "fanma" 
]

# Create a list of prefixes for nouns
noun_prefixes = ["gai", "tai", "an", "mi", "é", "fa'", "mama'", "fama'", "ga'", "pinat"]

# Create a list of prefixes for adjectives
adjective_prefixes = ["chat", "fa'", "há", "lá", "pinat", "na'", "ná", "acha", "na'lá"]

# Create a list of the directional prefixes
directional_prefixes = ["gé", "hat", "san", "tak", "talak", "ya"]

In [23]:
### DELETE ME: PREFIX NOTES ###
# combos é+um, é+man, ga+um+verb, ga+indef man-, ha+ma+trans verb
# vowel lowering or harmony required for fa', ge', hat, san, tak, talak
# vowel harmony required for gai, tai, mi
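The notes above flag gai, tai, and mi as prefixes that need vowel harmony. A minimal sketch of that combination follows, assuming the harmony mapping is the same one stored in `vowel_harmony_rules`. The names `front_first_vowel`, `apply_harmony_prefixes`, and `HARMONY_PREFIXES` are hypothetical, and the sketch does not handle vowel-initial words or any other morphophonemic change.

```python
VOWELS = "aáåeéiíoóu"
HARMONY = {"å": "a", "o": "e", "u": "i"}   # same mapping as vowel_harmony_rules
HARMONY_PREFIXES = ["gai", "tai", "mi"]     # taken from the prefix notes

def front_first_vowel(word):
    """Apply the harmony mapping to the first vowel of the word, if it has a rule."""
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[:i] + HARMONY.get(letter, letter) + word[i + 1:]
    return word  # no vowel found

def apply_harmony_prefixes(words, prefixes=HARMONY_PREFIXES):
    """Return (new word, base word, prefix name) tuples, like the other apply_* functions."""
    return [(prefix + front_first_vowel(word), word, f"{prefix} prefix")
            for prefix in prefixes for word in words]

print(apply_harmony_prefixes(["guma'"]))
```

The output keeps the same tuple layout as the rest of the notebook, so these candidate forms could be exported alongside the other affixed words.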

## Create Prefixed Word

In [145]:
def prefixed_word(word, prefix, vowels):
    """
    Attaches a prefix to the front of a word and returns the new, transformed word.
    A glottal stop (') is inserted when the prefix ends in a vowel and the word begins with one.
    """
    # Apply prefix, separating adjacent vowels with a glottal stop
    if word[0] in vowels and prefix[-1] in vowels:
        new_word = prefix + "'" + word
    else:
        new_word = prefix + word

    return new_word

In [146]:
### DELETE ME TEST CODE ###
word = "li'e'"
prefix = "man"
print(prefixed_word(word, prefix, vowels))

manli'e'


## Apply Prefix Transformations - Without Morphophonemic Changes

In [147]:
def apply_prefixes_no_morphophonemic_change(data, prefixes, vowels):
    """
    Iterates through a list of words and prefixes and attaches each prefix to the front of each word.
    Returns a list of tuples, where each tuple is the new word, the root word, and the prefix.
    """
    # Initialize list to store new words
    new_words = []
    
    # Apply prefixes
    for prefix in prefixes:
        for word in data:
            new_word = prefixed_word(word, prefix, vowels)
            new_words.append((new_word, word, f"{prefix} prefix"))
            
    return new_words

In [148]:
### DELETE ME TEST CODE ###
word_list = ["kånno'", "oppe'"]
print(apply_prefixes_no_morphophonemic_change(word_list, transitive_verb_prefixes, vowels))

[("ánkånno'", "kånno'", 'án prefix'), ("ánoppe'", "oppe'", 'án prefix'), ("fa'kånno'", "kånno'", "fa' prefix"), ("fa'oppe'", "oppe'", "fa' prefix"), ("ákånno'", "kånno'", 'á prefix'), ("á'oppe'", "oppe'", 'á prefix'), ("umákånno'", "kånno'", 'umá prefix'), ("umá'oppe'", "oppe'", 'umá prefix'), ("manákånno'", "kånno'", 'maná prefix'), ("maná'oppe'", "oppe'", 'maná prefix'), ("achakånno'", "kånno'", 'acha prefix'), ("acha'oppe'", "oppe'", 'acha prefix'), ("ma'achakånno'", "kånno'", "ma'acha prefix"), ("ma'acha'oppe'", "oppe'", "ma'acha prefix"), ("fanma'achakånno'", "kånno'", "fanma'acha prefix"), ("fanma'acha'oppe'", "oppe'", "fanma'acha prefix"), ("kékånno'", "kånno'", 'ké prefix'), ("ké'oppe'", "oppe'", 'ké prefix'), ("kékekånno'", "kånno'", 'kéke prefix'), ("kéke'oppe'", "oppe'", 'kéke prefix'), ("na'kånno'", "kånno'", "na' prefix"), ("na'oppe'", "oppe'", "na' prefix"), ("muna'kånno'", "kånno'", "muna' prefix"), ("muna'oppe'", "oppe'", "muna' prefix"), ("nina'kånno'", "kånno'", "nina

## The Man- Prefix

The man- prefix in Chamorro can be applied to a wide variety of words, but it is set apart from other prefixes because it requires morphophonemic changes to the base word. This section includes the list of man- prefixes and the rules for those morphophonemic changes.

### Man- Prefix Rules

In [97]:
# Create a list of man- prefixes
man_prefixes = ["man", "fan"]

# Create a dictionary of man- prefix rules
man_prefix_rules = {
    "man": {"b": "mam", "c": "mañ", "f": "mam", "g": "mang", "k": "mang", "p": "mam", "s": "mañ", "t": "man"},
    "fan": {"b": "fam", "c": "fañ", "f": "fam", "g": "fang", "k": "fang", "p": "fam", "s": "fañ", "t": "fan"}
}

### Create Man- Prefixed Word

In [140]:
def apply_man_prefix(word, man_prefix_rules, prefix):
    """
    Transforms a base word with the man- prefix and returns the transformed word
    """
    first_letter = word[0]
    
    # If morphophonemic changes are needed, get the appropriate man- prefix
    if first_letter in man_prefix_rules[prefix]:
        man_prefix = man_prefix_rules[prefix][first_letter]
        # If the word begins with "c", start slicing at position 2
        if first_letter == "c":
            new_word = man_prefix + word[2:]
        else:
            new_word = man_prefix + word[1:]
    # For all other words, apply the prefix without any morphophonemic changes        
    else:
        new_word = prefix + word
    
    return new_word

In [142]:
### DELETE ME TEST CODE ###
word = "taitai"
word = apply_man_prefix(word, man_prefix_rules, "man")
print(word)

manaitai


In [143]:
### DELETE ME TEST CODE ###
word = "taitai"
word = apply_man_prefix(word, man_prefix_rules, "fan")
print(word)

fanaitai
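`apply_man_prefix` drops two letters whenever a word starts with "c", which assumes the word begins with the digraph "ch". As a hedged alternative, the sketch below stores "ch" as its own key and matches the longest onset first. `MAN_RULES` and `man_prefixed` are hypothetical names, and the rule values are copied unchanged from the "man" row of `man_prefix_rules`.

```python
# Rules copied from man_prefix_rules["man"], with "c" written out as the digraph "ch"
MAN_RULES = {"b": "mam", "ch": "mañ", "f": "mam", "g": "mang",
             "k": "mang", "p": "mam", "s": "mañ", "t": "man"}

def man_prefixed(word, rules=MAN_RULES, default="man"):
    """Attach man- using the longest matching onset in `rules`; plain prefix otherwise."""
    for onset in sorted(rules, key=len, reverse=True):
        if word.startswith(onset):
            # Replace the onset with the assimilated form of the prefix
            return rules[onset] + word[len(onset):]
    return default + word

print(man_prefixed("chagi"))
print(man_prefixed("taitai"))
print(man_prefixed("li'e'"))
```

With this layout, a word that starts with "c" but not "ch" falls through to the plain prefix instead of losing its second letter.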


### Apply Single Man- Prefix Transformation

In [115]:
def apply_single_man_prefix(data, man_prefix_rules, prefixes):
    """
    Transforms a list of words with the man- and fan- prefixes, outputting a list of tuples
    Each tuple contains the new word, the base word, and the prefix name
    """
    # Initialize new list to store new words
    new_words = []
    
    # Apply the man- and fan- prefixes
    for word in data:
        for prefix in prefixes:
            # Apply the man- prefix
            new_word = apply_man_prefix(word, man_prefix_rules, prefix)
            # Append to new_words
            new_words.append((new_word, word, f"{prefix} prefix"))
    
    return new_words

In [116]:
### DELETE ME TEST CODE ###
word_list = ["fa'tinas"]
new_words = apply_single_man_prefix(word_list, man_prefix_rules, man_prefixes)
print(new_words)

[("mama'tinas", "fa'tinas", 'man prefix'), ("fama'tinas", "fa'tinas", 'fan prefix')]


### Apply Double Man- Prefix Transformation

In [113]:
def apply_double_man_prefixes(data, man_prefix_rules, prefixes):
    """
    Transforms a list of words with the double man- prefix, outputting a list of tuples
    First applies the single man- prefix, then applies the man- or fan-
    Each tuple contains the new word, the base word, and the prefix name
    """
    # Initialize list to store new words
    new_words = []
    
    for word in data:
        for prefix in prefixes:
            # Apply the single man- prefix first
            single_man_word = apply_man_prefix(word, man_prefix_rules, "man")
            # Apply the second man- or fan-
            new_word = prefix + single_man_word
            # Store in new list
            new_words.append((new_word, word, f"{prefix}man prefix"))
            
    return new_words

In [114]:
### DELETE ME TEST CODE ###
word_list = ["fa'tinas"]
new_words = apply_double_man_prefixes(word_list, man_prefix_rules, man_prefixes)
print(new_words)

[("manmama'tinas", "fa'tinas", 'manman prefix'), ("fanmama'tinas", "fa'tinas", 'fanman prefix')]


## Other Prefixes

Some prefixes require vowel lowering or vowel fronting, which is similar to the vowel harmony rules.

In [100]:
### FIX ME: Confirm all the prefixes that require vowel-lowering ###
vowel_lowered_prefixes = ["gé", "ké"]

In [123]:
def apply_vowel_lowered_prefixes(data, prefixes, vowel_harmony_rules):
    """
    Transforms words with prefixes that require vowel lowering and returns a list of tuples.
    Each tuple contains the new word, the base word and the prefix name.
    """
    ### FIX ME ###
    
    return None

In [122]:
### DELETE ME TEST CODE ###

# Apply Suffixes

**About Chamorro Suffixes:** Chamorro suffixes are attached to the end of a word, and sometimes they are preceded by a dash (-). All possessive pronouns are also included in this section as suffixes.

In [149]:
# Create possessive pronoun lists
possessive_pronouns_one_syllable = ["-hu", "-mu", "-ña", "-ta"]
possessive_pronouns_two_syllable = ["-ñiha", "-miyu", "-måmi", "-ñíñiha", "-mímiyu", "-måmåmi"]

# Create a list of other suffixes
### FIX ME: Confirm all suffixes ###
directional_suffixes = ["magi", "guatu"]
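No suffix function exists in the notebook yet. As a starting point only, the sketch below attaches each suffix by plain concatenation and returns the same tuple layout as the prefix and infix functions. `apply_suffixes` is a hypothetical name, and the sketch deliberately ignores stress shifts or any other change a suffix may cause in the base word.

```python
def apply_suffixes(words, suffixes):
    """Return (new word, base word, suffix name) tuples for every word/suffix pair."""
    new_words = []
    for suffix in suffixes:
        for word in words:
            # Plain concatenation: no stress shift or other sound change is modeled
            new_words.append((word + suffix, word, f"{suffix} suffix"))
    return new_words

possessive_pronouns_one_syllable = ["-hu", "-mu", "-ña", "-ta"]
print(apply_suffixes(["guma'"], possessive_pronouns_one_syllable))
```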

# Apply Circumfixes

**About Chamorro Circumfixes:** There is only one circumfix in Chamorro, which is the fan-...-an circumfix. Morphophonemic changes to the base word are usually required.
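One possible sketch of the circumfix reuses the fan- onset changes from the prefix section and appends "an" to the result. This is an assumption about how the two halves combine, not a confirmed rule: `FAN_RULES` and `fan_an_circumfixed` are hypothetical names, the outputs are mechanical candidate forms rather than verified words, and any variation in the shape of the suffix half is not modeled.

```python
# "fan" row of man_prefix_rules, copied so the sketch is self-contained
FAN_RULES = {"b": "fam", "c": "fañ", "f": "fam", "g": "fang",
             "k": "fang", "p": "fam", "s": "fañ", "t": "fan"}

def fan_an_circumfixed(word, rules=FAN_RULES):
    """Wrap a word in fan-...-an, reusing the fan- onset changes from the prefix section."""
    first = word[:1]
    if first in rules:
        # Like apply_man_prefix, drop two letters for "c" (assumed to be the digraph "ch")
        stem = word[2:] if first == "c" else word[1:]
        return rules[first] + stem + "an"
    # No onset change needed: attach both halves directly
    return "fan" + word + "an"

print(fan_an_circumfixed("o'mak"))
print(fan_an_circumfixed("taitai"))
```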

# Apply Reduplication

In this section, we will experiment with different ways to transform words with reduplication. There are several forms of reduplication in Chamorro, and we will start with the most common one, which reduplicates the stressed syllable. To achieve this, we will first split a word into syllables and then count them. In general, the second-to-last syllable is the one that is reduplicated. We count syllables rather than look for stress marks because Chamorro spelling is not consistent, and we cannot always rely on stressed characters being present in our data.
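As a rough, spelling-based sketch of this idea, the function below treats each run of vowels as one syllable nucleus, picks the second-to-last one, and copies its onset consonant plus its first vowel in front of that syllable. The name `reduplicate_penultimate` is hypothetical, the vowel list is an assumption, and the handling of "ch" and of the combos ua, ue, ui, and uo follows the notes in the draft `split_syllables` docstring. Stress marks and other digraphs are not modeled.

```python
import re

VOWELS = "aáåeéiíoóuú"
U_COMBOS = ("ua", "ue", "ui", "uo")  # combos the draft notes treat as one unit

def reduplicate_penultimate(word):
    """Copy the onset consonant and first vowel of the second-to-last syllable."""
    # Approximate syllables by runs of vowels (one run = one syllable nucleus)
    nuclei = list(re.finditer(f"[{VOWELS}]+", word))
    if not nuclei:
        return word  # nothing to reduplicate
    target = nuclei[-2] if len(nuclei) >= 2 else nuclei[-1]

    # Find the onset: "ch" counts as one consonant, a glottal stop is skipped
    start = target.start()
    if word[max(start - 2, 0):start] == "ch":
        onset_start = start - 2
    elif start > 0 and word[start - 1].isalpha():
        onset_start = start - 1
    else:
        onset_start = start  # vowel-initial syllable

    # Repeat only the first vowel, unless the nucleus starts with a "u" combo
    nucleus = target.group(0)
    vowel_part = nucleus[:2] if nucleus[:2] in U_COMBOS else nucleus[0]
    copy = word[onset_start:start] + vowel_part
    return word[:onset_start] + copy + word[onset_start:]

for w in ["taitai", "li'e'", "guaiya", "fa'tinas"]:
    print(w, "->", reduplicate_penultimate(w))
```

Counting vowel runs sidesteps the need for stress marks in the data, at the cost of treating every vowel sequence as a single syllable.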

In [159]:
# def split_syllables(word):
#     """
#     Splits a word into syllables in preparation for applying reduplication.
#     We reduplicate the consonant and the vowels attached to it
#     If the syllable ends in a consonant, don't repeat that last consonant
#     If the syllable has two vowels, only repeat the first vowel
#     But if they are vowel combos like ua, ue, ui, uo, we repeat the entire vowel combo
#     """
#     # Match consonant(s) + vowel(s) + optional consonant(s), including glottal stops and diphthongs
#     pattern = re.compile(r"([bcdfghjklmnpqrstvwxyz]*[aeiouåáéíóú]*['])", re.IGNORECASE)
#     syllables = pattern.findall(word)
#     syllable_count = len(syllables)-1
#     return (syllables, syllable_count)

# Export to CSV

We will take all the word lists from above and export them to CSV files.

In [92]:
# def convert_to_dataframe(affixed_words):
#     """
#     In this function, we convert our newly affixed words to a dataframe, and then export it to a CSV file.
#     We will also include metadata from the original word list to our exported CSV file.
#     """
#     # Get new words
#     new_words = affixed_words[0]
    
#     # Get old words
#     old_words = affixed_words[1]
    
#     # Get Affix Name
#     affix_name = affixed_words[0][0][2]
    
#     # Convert list to dataframe
#     infixed_words_df = pd.DataFrame(new_words, columns=["Word", "Term", "Affix"])
#     # Add the original Definition and Root Word to the infixed words df
#     filtered_df = old_words[["Term", "Definition", "Root Word"]]
#     infixed_words_df = pd.merge(infixed_words_df, filtered_df, on="Term", how="left")
    
#     # Save dataframe as CSV
#     infixed_words_df.to_csv(f"{affix_name}_affixed_words.csv", index=False, encoding="utf-8")

In [99]:
# # TEST Pass thru the convert_to_dataframe function
# infixed_words = [um_infixed_words, in_infixed_words]

# for output in infixed_words:
#     convert_to_dataframe(output)
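The draft export above is commented out and depends on dataframes that are not loaded in this notebook. As a smaller sketch under that limitation, the tuples returned by the `apply_*` functions can be written with the standard `csv` module. `write_affixed_words` is a hypothetical name, the column names come from the draft, and an in-memory buffer stands in for a real output file.

```python
import csv
import io

def write_affixed_words(rows, csv_file):
    """Write (new word, base word, affix name) tuples as CSV rows under a header."""
    writer = csv.writer(csv_file)
    writer.writerow(["Word", "Term", "Affix"])  # column names from the draft export
    writer.writerows(rows)

# Sample rows in the shape produced by apply_um_infix
um_infixed_words = [("humånao", "hånao", "um infix"),
                    ("kumonne'", "konne'", "um infix")]

# The buffer stands in for a real file opened with newline="" and encoding="utf-8"
buffer = io.StringIO()
write_affixed_words(um_infixed_words, buffer)
print(buffer.getvalue())
```

Merging in the original definitions, as the draft intends, would still need the source dataframes and their `Term` column.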