# Chamorro Lexicon Expander

**Chamorro Lexicon Expander** is a Python project designed to expand the Chamorro-English dictionary by generating all possible affixed variations of Chamorro root words. This tool automates the process of creating word forms using common Chamorro prefixes, suffixes, and infixes according to linguistic rules. This project enables a more comprehensive representation of Chamorro vocabulary for language learners, linguists, and dictionary development.

**Name:** Schyuler Lujan <br>
**Date Started:** 10-Nov-2024 <br>
**Date Complete:** In Progress <br>

In [1]:
# Import libraries
import re
import pandas as pd
import csv

# IMPORT AND CLEAN DATA

**About this data:** For this project, we will be using the words and part-of-speech tags from the Revised Chamorro-English dictionary, which is available for free at https://natibunmarianas.org/chamorro-dictionary/. We use this data because it is currently the most complete freely available online resource with reliably accurate part-of-speech tags for Chamorro words. The part-of-speech tags determine which words can be transformed with each affix.

In [2]:
# Import files and convert to dataframes
# tverbs_df = pd.read_csv("transitive-verbs.csv", encoding="utf-8")
# iverbs_df = pd.read_csv("intransitive-verbs.csv", encoding="utf-8")

In [3]:
# Preview dataframe
#tverbs_df.head()

# VOWEL HARMONY

**About Vowel Harmony:** In Chamorro, vowel harmony refers to how certain vowels must change into different vowels, depending on the vowel that comes before them. In general, the vowels "å", "o", and "u" must shift (to "a", "e", and "i", respectively) if they are preceded by an "i".

## Vowel Harmony Rules

In [109]:
# Create a list of vowels, used to find the first vowel in a word
vowels = ['a', 'á', 'å', 'e', 'é', 'i', 'í', 'o', 'ó', 'u']

In [368]:
# Create a dictionary of vowel harmony transformation rules
vowel_harmony_rules = {"å": "a", "o": "e", "u": "i"}
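As a quick illustration of how this mapping is read, the sketch below looks up a few letters and falls back to the original letter when no rule applies. The helper name `front_vowel` is hypothetical and is not part of the project code.

```python
# Same mapping as the notebook's vowel_harmony_rules
vowel_harmony_rules = {"å": "a", "o": "e", "u": "i"}

def front_vowel(letter, rules):
    """Return the shifted vowel, or the letter unchanged if no rule applies (hypothetical helper)."""
    return rules.get(letter, letter)

# Vowels with a rule are shifted; all other letters pass through untouched
for letter in ["å", "o", "u", "a", "i"]:
    print(letter, "->", front_vowel(letter, vowel_harmony_rules))
```

Using `dict.get` with a default keeps the lookup safe for letters that have no rule, so callers do not need a separate membership check.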

## Create Vowel Harmonized Word

In [379]:
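def vowel_harmonized_word(word, vowels, vowel_harmony_rules):
    """
    Applies vowel harmony rules to the first vowel of a word and returns the vowel harmonized word.
    Words whose first vowel is directly followed by another vowel are returned unchanged.
    """
    for i, letter in enumerate(word):
        if letter not in vowels:
            continue # Skip consonants until the first vowel
        # Leave the word unchanged if the first vowel is followed by another vowel
        if i + 1 < len(word) and word[i + 1] in vowels:
            return word
        # Shift the first vowel if a harmony rule exists for it
        if letter in vowel_harmony_rules:
            return word[:i] + vowel_harmony_rules[letter] + word[i + 1:]
        return word # The first vowel does not shift
    
    return word # No vowels found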

In [378]:
### DELETE ME TEST CODE ###
word = "kuentus"
print(vowel_harmonized_word(word, vowels, vowel_harmony_rules))

kuentus


# INFIXES

**About Chamorro Infixes:** Infixes are affixes that occur within the word, rather than being attached in front of the word or at the end. In Chamorro, infixes are always applied before the first vowel of the word they are attached to. If the word it's being attached to starts with a vowel, the infix is still placed in front of that vowel. There are two infixes in Chamorro: -in- and -um-.

## Apply Infix

In [244]:
def apply_infix(word, vowels, infix):
    """
    Transforms a word with an infix and returns the infixed word.
    
    Parameters:
    - word: A word to transform
    - vowels: A list of vowels to search for in the word
    - infix: The infix we are transforming the word with
    
    Returns:
    - A new word, which has been transformed by the infix
    """
    
    for i, letter in enumerate(word):
        if letter in vowels:
            return word[:i] + infix + word[i:] # Add the infix before the first vowel
    
    return word # No vowel found, so return the word unchanged
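The same "insert before the first vowel" rule can also be expressed with a regular expression. The sketch below is an alternative technique, not the notebook's implementation; the function name `insert_infix_regex` is hypothetical, and the vowel inventory is assumed to match the `vowels` list defined earlier (lowercase only).

```python
import re

# Assumed vowel inventory, copied from the notebook's `vowels` list
VOWEL_PATTERN = re.compile(r"[aáåeéiíoóu]")

def insert_infix_regex(word, infix):
    """Insert `infix` before the first vowel; return the word unchanged if it has no vowel."""
    # count=1 limits the substitution to the first vowel found
    return VOWEL_PATTERN.sub(lambda m: infix + m.group(0), word, count=1)

print(insert_infix_regex("kuentusi", "um"))  # kumuentusi
print(insert_infix_regex("kuentusi", "in"))  # kinuentusi
print(insert_infix_regex("atan", "um"))      # umatan
```

The last call shows the vowel-initial case: the infix simply lands at the front of the word.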

## Apply UM Infix Transformation

In [247]:
def apply_um_infix(word):
    """
    Transforms a word with the UM infix.
    """
    infix = "um"
    
    new_word = apply_infix(word, vowels, infix)
    
    return new_word

## Apply IN Infix Transformation

In [256]:
def apply_in_infix(word, vowels, vowel_harmony_rules):
    """
    Applies the IN infix to a single word and returns the infixed word.
    Vowel harmony rules must also be applied with the IN infix transformation.
    """
    # Define the infix
    infix = "in"
    
    # Apply vowel harmony rules first
    harmonized_word = vowel_harmonized_word(word, vowels, vowel_harmony_rules)
    # Transform the harmonized_word with the IN infix
    new_word = apply_infix(harmonized_word, vowels, infix)
        
    return new_word

# PREFIXES

**About Chamorro Prefixes:** Prefixes in Chamorro are attached to the front of a base word, and one or more prefixes can be attached. Some prefixes require morphophonemic changes to the base word when attached.

## Prefix Lists

We will create different lists of prefixes, excluding the man- prefixes because those require more changes to the base word to be applied. Lists are organized by the part-of-speech of base word they can be attached to, and include single prefixes and known combinations of prefixes.

In [384]:
### FIX ME: Organize Prefix by Vowel or Consonant Changes ###

# Create a list of prefixes for transitive verbs
transitive_verb_prefixes = [
    "án", "fa'", "á", "umá", "maná", "acha", "ma'acha", "fanma'acha", 
    "ké", "kéke",  "na'", "muna'", "nina'", "mannina'", "fannina'", "manna'", "fanna'",
    "ma", "manma", "fanma" 
]

# Create a list of prefixes for nouns
noun_prefixes = ["gai", "tai", "an", "mi", "é", "fa'", "mama'", "fama'", "ga'", "pinat"]

# Create a list of prefixes for adjectives
adjective_prefixes = ["chat", "fa'", "há", "lá", "pinat", "na'", "ná", "acha", "na'lá"]

# Create a list of the directional prefixes
directional_prefixes = ["gé", "hat", "san", "tak", "talak", "ya"]

### MORE PREFIXES GO HERE ###
# combos é+um, é+man, ga+um+verb, ga+indef man-, ha+ma+trans verb
# vowel lowering or harmony required for fa', ge', hat, san, tak, talak
# vowel harmony required for gai, tai, mi

## Apply Prefix Transformation

In [264]:
def apply_prefix(word, prefix, vowels):
    """
    Transform a word by attaching a prefix to the front, and returning the new, transformed word.
    """
    # Apply prefix, inserting a glottal stop (') between two adjacent vowels
    if word[0] in vowels and prefix[-1] in vowels:
        new_word = prefix + "'" + word
    else:
        new_word = prefix + word
    
    return new_word
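To make the glottal stop rule concrete, here is a small self-contained sketch. The name `join_prefix` and the `VOWELS` set are assumptions for illustration, and the printed forms are mechanical outputs of the rule rather than verified dictionary entries.

```python
VOWELS = set("aáåeéiíoóu")  # assumed vowel inventory, mirroring the notebook's list

def join_prefix(prefix, word):
    """Attach a prefix, inserting a glottal stop (') between two adjacent vowels."""
    # The emptiness checks guard against indexing into an empty string
    if prefix and word and prefix[-1] in VOWELS and word[0] in VOWELS:
        return prefix + "'" + word
    return prefix + word

print(join_prefix("á", "kuentusi"))  # vowel + consonant: no glottal stop
print(join_prefix("ma", "atan"))     # vowel + vowel: glottal stop inserted
print(join_prefix("chat", "atan"))   # consonant + vowel: no glottal stop
```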

## The Man- Prefix

The man- prefix in Chamorro can be applied to a wide variety of words, but it is set apart from other prefixes because it requires morphophonemic changes to the base word. This section includes the list of man- prefixes and the rules for those changes.

### Man- Prefix Rules

In [331]:
# Create a list of man- prefixes
man_prefixes = ["man", "fan"]

# Create a dictionary of man- prefix rules
man_prefix_rules = {
    "man": {"b": "mam", "c": "mañ", "f": "mam", "g": "mang", "k": "mang", "p": "mam", "s": "mañ", "t": "man"},
    "fan": {"b": "fam", "c": "fañ", "f": "fam", "g": "fang", "k": "fang", "p": "fam", "s": "fañ", "t": "fan"}
}

### Apply Single Man- Prefix

In [332]:
def apply_man_prefix(word, man_prefix_rules, prefix):
    """
    Transforms a base word with the man- prefix and returns the transformed word
    """
    # Get the first letter of the word, which determines the prefix used
    first_letter = word[0]
    
    # If morphophonemic changes are needed, get the appropriate man- prefix
    if first_letter in man_prefix_rules[prefix]:
        man_prefix = man_prefix_rules[prefix][first_letter]
        # Words beginning with "c" are assumed to begin with the digraph "ch", so drop both letters
        if first_letter == "c":
            new_word = man_prefix + word[2:]
        else:
            new_word = man_prefix + word[1:]
    # For all other words, apply the prefix without any morphophonemic changes        
    else:
        new_word = prefix + word
    
    return new_word
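The sketch below shows one possible variation on the rule lookup, keyed on the digraph "ch" instead of the single letter "c" so that the two-letter onset is matched explicitly. The names `MAN_RULES` and `man_prefixed` are hypothetical, and apart from `manguentusi` (which matches the notebook's own output) the results are mechanical applications of the rules, not verified forms.

```python
# Nasal-assimilation rules keyed by the word's initial sound.
# Keying on "ch" (rather than "c") is an assumption made for this sketch.
MAN_RULES = {
    "b": "mam", "ch": "mañ", "f": "mam", "g": "mang",
    "k": "mang", "p": "mam", "s": "mañ", "t": "man",
}

def man_prefixed(word, rules=MAN_RULES):
    """Attach man-, replacing the initial consonant where a rule applies."""
    # Check the two-letter onset first so "ch" is matched before any single letter
    for size in (2, 1):
        onset = word[:size]
        if onset in rules:
            return rules[onset] + word[size:]
    # No rule for this onset: attach the plain prefix
    return "man" + word

for base in ["kuentusi", "chocho", "po'lu", "li'e'"]:
    print(base, "->", man_prefixed(base))
```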

In [346]:
def apply_fan_prefix(word, man_prefix_rules, prefix):
    """Applies the fan- prefix to a base word; the prefix argument is ignored."""
    new_word = apply_man_prefix(word, man_prefix_rules, "fan")
    return new_word

### Apply Double Man- Prefix

In [333]:
def apply_double_man_prefix(word, man_prefix_rules, prefix):
    """
    Transforms a word with the double man- prefix and returns the transformed word.
    """   
    # Transform word with the single man- prefix
    single_man_word = apply_man_prefix(word, man_prefix_rules, "man")
    
    # Apply the second man- or fan-
    new_word = prefix + single_man_word
            
    return new_word

In [347]:
def apply_double_man_prefix_f(word, man_prefix_rules, prefix):
    """Applies fan- on top of a man- prefixed word; the prefix argument is ignored."""
    new_word = apply_double_man_prefix(word, man_prefix_rules, "fan")
    return new_word

## Apply Vowel Lowering Prefixes

Some prefixes require vowel lowering or vowel fronting to take place, which is similar to the vowel harmony rules.

In [355]:
### FIX ME: Confirm all the prefixes that require vowel-lowering ###

In [356]:
def apply_vowel_lowered_prefixes(data, prefixes, vowel_harmony_rules):
    """
    Transforms words with prefixes that require vowel lowering and returns a list of tuples.
    Each tuple contains the new word, the base word and the prefix name.
    """
    ### FIX ME ###
    
    return None

In [357]:
### DELETE ME TEST CODE ###

# Apply Suffixes

**About Chamorro Suffixes:** Chamorro suffixes are attached to the end of a word, and sometimes they are preceded by a dash (-). All possessive pronouns are also included in this section as suffixes.

## Define Suffix Rules and Suffixes

In [97]:
# Define our vowel list
vowel_list = ["a", "e", "i", "o", "u"]

# Define rules for how to shift the last vowel in the word
rules_suffixes_vowel_change = {"a": "å", "u": "o", "i": "e", "o": "o", "e": "e"}

In [98]:
# Suffixes that don't require shifts
suffixes_no_change = ["guatu", "mågi", "magi"]

## Apply Suffixes With Vowel Shifts

Vowel shifts with Chamorro suffixes occur on the final vowel in the word. The final vowel may or may not be the final character, so we will first find the position of the final vowel in the string and then modify that vowel if necessary.

In [358]:
# Suffixes that require vowel shifts on base word
suffixes_vowel_change = ["-hu", "-ku", "-mu", "-ña", "-ta"]

In [102]:
def find_first_vowel_position(word, vowels):
    """
    Finds the position of the last vowel in the word by scanning from the end of the string.
    Returns None if the word contains no vowels.
    """

    # Scan backwards and return the index of the first vowel encountered
    for i in range(len(word)-1, -1, -1):
        if word[i] in vowels:
            return i
    
    return None

In [303]:
### DELETE ME TEST CODE ###
word = "sapåtus"
pos = find_first_vowel_position(word, vowel_list)
print(pos)

5


In [104]:
def apply_suffix_vowel_shifts(word, vowels, suffix, rules):
    """
    Applies a suffix that requires vowel shifts to the base word.
    """
    # Convert word to a list of characters
    characters = list(word)
    
    # Get the position of the first vowel at the end of the string
    position = find_first_vowel_position(word, vowels)
    
    # Replace the vowel
    characters[position] = rules[characters[position]]
    
    new_word = "".join(characters) + suffix
    
    return new_word

In [304]:
### DELETE ME TEST CODE ###
word = "sapåtus"
suffix = "-mu"
new_word = apply_suffix_vowel_shifts(word, vowel_list, suffix, rules_suffixes_vowel_change)
print(new_word)

sapåtos-mu


## Apply Suffixes With "n" Change

In [359]:
# Suffixes that require "n" after vowels on base word
suffixes_n_change = ["-ñiha", "-miyu", "-måmi", "-ñíñiha", "-mímiyu", "-måmåmi"]

In [116]:
def apply_suffix_n_change(word, vowels, suffix):
    """
    Applies suffixes that require the letter "n" to be added to the end of the base word, if it ends with a vowel.
    """
    # Add "n" to the end of a string if the last character is in the vowel list
    if word[-1] in vowels:
        word = word + "n"
    
    # Add the suffix to the word
    new_word = word + suffix
    
    return new_word

In [305]:
### DELETE ME TEST CODE ###
word = "eskuela"
suffix = "-måmi"
new_word = apply_suffix_n_change(word, vowel_list, suffix)
print(new_word)

eskuelan-måmi


## Apply Variable Suffixes

Some suffixes are applied differently, depending upon the final character of the base word.

In [360]:
# Suffixes that vary based upon the final character of the base word
suffixes_variable = ["i", "iyi", "on", "yon"]

In [361]:
def apply_suffixes_varied(word, suffix_rules, suffix):
    """
    Transforms a word with suffixes that vary based on the base word.
    """
    
    return None

## Apply Suffixes With No Changes

In [140]:
def apply_suffixes_no_changes(word, suffix):
    """
    Apply suffixes that do not require changes to the base word
    """
    return word + suffix

In [141]:
### DELETE ME TEST CODE ###
word = "chule'"
suffix = "guatu"
print(apply_suffixes_no_changes(word, suffix))

chule'guatu


# Apply Circumfixes

**About Chamorro Circumfixes:** There is only one circumfix in Chamorro, which is the fan-...-an circumfix. Morphophonemic changes to the base word are usually required.

In [363]:
### FIX ME: Define Circumfix Rules ###

In [364]:
def apply_circumfix(word, circumfix_rules):
    """
    Applies the Chamorro circumfix fan..an to a word and returns the new word
    """
    ### FIX ME ###
    return None

# Apply Reduplication

In this section, we will be experimenting with different ways to transform words with reduplication. There are several forms of reduplication in Chamorro, and we will start with the most common one, which is to reduplicate the stressed syllable. To achieve this, we will first split a word into syllables and then count the syllables. In general, the second-to-last syllable is the one that is reduplicated. We count syllables because spelling in Chamorro is not consistent, and we cannot always rely on stress-marked characters being present in our data.

In [159]:
# def split_syllables(word):
#     """
#     Splits a word into syllables in preparation for applying reduplication.
#     We reduplicate the consonant and the vowels attached to it
#     If the syllable ends in a consonant, don't repeat that last consonant
#     If the syllable has two vowels, only repeat the first vowel
#     But if they are vowel combos like ua, ue, ui, uo, we repeat the entire vowel combo
#     """
#     # Match consonant(s) + vowel(s) + optional consonant(s), including glottal stops and diphthongs
#     pattern = re.compile(r"([bcdfghjklmnpqrstvwxyz]*[aeiouåáéíóú]*['])", re.IGNORECASE)
#     syllables = pattern.findall(word)
#     syllable_count = len(syllables)-1
#     return (syllables, syllable_count)

# Generate Word Lists

We will now apply as many affixes and affix combinations as we want to our base word list and return a list of our new words. The goal is a list of tuples where each tuple has the affixed word, the base word, and the affix applied; the current version records only the affixed word and the base word.

In [381]:
def apply_functions_with_args(words, func_args_map):
    """
    Applies each function with its corresponding specific arguments for each word in the list.
    
    Parameters:
    - words: List of words to affix
    - func_args_map: Dictionary where keys are functions, and values are tuples of arguments
    
    Returns:
    - List of tuples containing each affixed word and its base word (the affix name is not yet included)
    """
    results = []
    
    for word in words:
        for func, args in func_args_map.items():
            #affix_name = args[-1]
            result = func(word, *args)
            results.append((result, word))
        
    return results

In [382]:
# Define functions with their specific arguments
func_args_map = {
    vowel_harmonized_word: (vowels, vowel_harmony_rules),
    apply_um_infix: (),
    apply_in_infix: (vowels, vowel_harmony_rules),
    apply_prefix: ("á", vowels), # FIX ME: We need to iterate through a list of prefixes
    apply_man_prefix: (man_prefix_rules, "man"),
    apply_double_man_prefix: (man_prefix_rules, "man"),
    apply_fan_prefix: (man_prefix_rules, "fan"),
    apply_double_man_prefix_f: (man_prefix_rules, "fan")
}
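Because a dictionary keyed by function can hold each function only once, the same function cannot be registered with several different prefixes. One possible way around this, sketched below under assumed names (`attach`, `apply_tasks`, and `tasks` are all hypothetical), is a list of `(label, func, args)` tasks. The label also provides an affix name for each result. The stand-in `attach` function only concatenates, so the outputs are mechanical illustrations.

```python
def attach(word, prefix):
    """Stand-in affix function for this sketch: plain concatenation."""
    return prefix + word

def apply_tasks(words, tasks):
    """Run every (label, func, args) task on every word and record the label."""
    results = []
    for word in words:
        for label, func, args in tasks:
            results.append((func(word, *args), word, label))
    return results

# One task per prefix; the prefixes come from the notebook's transitive verb list
tasks = [(f"{p}-", attach, (p,)) for p in ["á", "ma", "na'"]]

for row in apply_tasks(["kuentusi"], tasks):
    print(row)
```

A list preserves the order of the tasks and allows repeated functions, at the cost of slightly more verbose setup than the dictionary.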

In [383]:
### DELETE ME TEST CODE ###
word_list = ["kuentusi"]
output = apply_functions_with_args(word_list, func_args_map)
for i in output:
    print(i)

('kuentusi', 'kuentusi')
('kumuentusi', 'kuentusi')
('kinuentusi', 'kuentusi')
('ákuentusi', 'kuentusi')
('manguentusi', 'kuentusi')
('manmanguentusi', 'kuentusi')
('fanguentusi', 'kuentusi')
('fanmanguentusi', 'kuentusi')


In [None]:
### FIX ME ###
### combine affixes by passing functions through other functions ###

# Export to CSV

We will take all the word lists from above and export them to CSV files.
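As a minimal sketch of the export step, the example below writes `(affixed word, base word)` tuples with the standard-library `csv` module. The function name `write_affixed_words` and the file name in the comment are hypothetical, and the column names are borrowed from the draft function below. It writes to an in-memory buffer so the result can be inspected without touching the disk.

```python
import csv
import io

def write_affixed_words(rows, handle):
    """Write (affixed word, base word) tuples to an open text handle as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["Word", "Term"])  # header row
    writer.writerows(rows)

rows = [("kumuentusi", "kuentusi"), ("kinuentusi", "kuentusi")]

# Write to an in-memory buffer here; for a real file, use something like
# open("affixed_words.csv", "w", newline="", encoding="utf-8")  (hypothetical file name)
buffer = io.StringIO()
write_affixed_words(rows, buffer)
print(buffer.getvalue())
```

Passing a file handle rather than a path keeps the function easy to test, and `encoding="utf-8"` on the real file matters here because the data contains characters such as "å" and "ñ".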

In [92]:
# def convert_to_dataframe(affixed_words):
#     """
#     In this function, we convert our newly affixed words to a dataframe, and then export it to a CSV file.
#     We will also include metadata from the original word list to our exported CSV file.
#     """
#     # Get new words
#     new_words = affixed_words[0]
    
#     # Get old words
#     old_words = affixed_words[1]
    
#     # Get Affix Name
#     affix_name = affixed_words[0][0][2]
    
#     # Convert list to dataframe
#     infixed_words_df = pd.DataFrame(new_words, columns=["Word", "Term", "Affix"])
#     # Add the original Definition and Root Word to the infixed words df
#     filtered_df = old_words[["Term", "Definition", "Root Word"]]
#     infixed_words_df = pd.merge(infixed_words_df, filtered_df, on="Term", how="left")
    
#     # Save dataframe as CSV
#     infixed_words_df.to_csv(f"{affix_name}_affixed_words.csv", index=False, encoding="utf-8")

In [99]:
# # TEST Pass thru the convert_to_dataframe function
# infixed_words = [um_infixed_words, in_infixed_words]

# for output in infixed_words:
#     convert_to_dataframe(output)