# Chamorro Lexicon Expander

**Chamorro Lexicon Expander** is a Python project designed to expand the Chamorro-English dictionary by generating all possible affixed variations of Chamorro root words. The tool automates the creation of word forms using common Chamorro prefixes, suffixes, and infixes according to linguistic rules. The goal is a more comprehensive representation of Chamorro vocabulary for language learners, linguists, and dictionary developers.

**Name:** Schyuler Lujan <br>
**Date Started:** 10-Nov-2024 <br>
**Date Completed:** In Progress

In [2]:
# Import libraries
import re
import pandas as pd
import csv

# Import and Clean Data

**About this data:** For this project, we use the words and part-of-speech tags from the Revised Chamorro-English Dictionary, which is available for free at https://natibunmarianas.org/chamorro-dictionary/. We use this data because it is currently the freely available online resource with the most complete and reliably accurate part-of-speech tags for Chamorro words. The part-of-speech tags determine which words can be transformed with each affix.

In [52]:
# Import files and convert to dataframes
tverbs_df = pd.read_csv("transitive-verbs.csv", encoding="utf-8")
iverbs_df = pd.read_csv("intransitive-verbs.csv", encoding="utf-8")

In [53]:
# Select only the Term and Definition columns
tverbs_df = tverbs_df[["Term", "Definition"]]
iverbs_df = iverbs_df[["Term", "Definition"]]

# Apply Infixes

**About Chamorro Infixes:** Infixes are affixes that occur within a word, rather than being attached to the front or the end. In Chamorro, an infix is always inserted directly before the first vowel of the word it attaches to. If the word starts with a vowel, the infix is still placed in front of that vowel, so it ends up at the beginning of the word. There are two infixes in Chamorro: -in- and -um-.
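
The placement rule above can be illustrated with a small standalone sketch. The helper name `insert_infix` and the sample words are chosen here for illustration only and are not part of the project code. The sketch also normalizes its input to NFC so that `å` is compared as a single character, which is an assumption about how the dictionary data is encoded.

```python
import unicodedata

# Vowel inventory searched for in this sketch (assumed, not exhaustive)
VOWELS = set(unicodedata.normalize("NFC", "aáåeéiíoóu"))

def insert_infix(word, infix):
    """Insert `infix` directly before the first vowel of `word`.

    Returns None when the word contains no vowel from VOWELS.
    """
    # Compose sequences such as "a" + combining ring into a single "å"
    word = unicodedata.normalize("NFC", word)
    for position, letter in enumerate(word):
        if letter in VOWELS:
            return word[:position] + infix + word[position:]
    return None

# Consonant-initial word: the infix lands after the first consonant
print(insert_infix("kånta", "um"))
# Vowel-initial word: the infix lands at the very start of the word
print(insert_infix("atan", "um"))
```

Returning `None` for a word without a recognized vowel is a design choice of this sketch; it makes skipped words easy to detect instead of silently dropping them.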

In [72]:
def infixes(df, part_of_speech):
    """
    Applies the -um- infix to every term in df by finding the first vowel
    in the word and inserting the infix directly in front of it.
    The -in- infix follows the same pattern plus vowel harmony, but it is
    not implemented yet (see the FIXME below).
    """
    # Create a set of vowels to search for in the words
    vowels = {'a', 'á', 'å', 'e', 'é', 'i', 'í', 'o', 'ó', 'u', 'ú'}
    
    # Create a dictionary of vowel harmony rules, for -in- infix (currently unused)
    vowel_harmony = {"å": "a", "o": "e", "u": "i"}
    
    # Initialize list to store new words
    infixed_words = []
    
    # Affix words with -um- and append to list, with other metadata.
    # The definition is carried along here rather than merged back in later,
    # because a merge on "Term" duplicates rows when a term appears more than once.
    for word, definition in zip(df["Term"], df["Definition"]):
        for position, letter in enumerate(word):
            if letter in vowels:
                um_word = word[:position] + "um" + word[position:]
                infixed_words.append((um_word, word, "UM Infix", part_of_speech, definition))
                break
                
    # Affix words with -in- using vowel harmony and append to list with other metadata
    ##### FIXME: not implemented yet #####
    
    # Convert list to dataframe
    infixed_words_df = pd.DataFrame(
        infixed_words,
        columns=["NewWord", "Term", "Affix", "PartOfSpeech", "Definition"],
    )
    
    # Save dataframe as CSV
    infixed_words_df.to_csv("infixed_words.csv", index=False, encoding="utf-8")
    
    return infixed_words_df

In [73]:
# Pass the dataframe and the part-of-speech label for its words
infixed_words = infixes(tverbs_df, "Transitive Verb")
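
The -in- step of `infixes` is still marked FIXME. As a possible starting point, here is a minimal sketch of one way it could work, reusing the `vowel_harmony` mapping defined in the function. It assumes that harmony changes only the first vowel (the one immediately after the infix) and leaves the rest of the word untouched. That assumption, the helper name `apply_in_infix`, and the sample words are illustrative and are not confirmed by the project.

```python
import unicodedata

# Vowel inventory searched for in this sketch (assumed, not exhaustive)
VOWELS = set(unicodedata.normalize("NFC", "aáåeéiíoóu"))

# Mapping copied from the notebook's vowel_harmony dictionary
VOWEL_HARMONY = {
    "\u00e5": "a",  # å -> a
    "o": "e",
    "u": "i",
}

def apply_in_infix(word):
    """Insert -in- before the first vowel and apply harmony to that vowel.

    Assumption: only the first vowel is affected by vowel harmony.
    Returns None when the word contains no vowel from VOWELS.
    """
    word = unicodedata.normalize("NFC", word)
    for position, letter in enumerate(word):
        if letter in VOWELS:
            # Vowels without a harmony rule are kept unchanged
            harmonized = VOWEL_HARMONY.get(letter, letter)
            return word[:position] + "in" + harmonized + word[position + 1:]
    return None

for root in ["håtsa", "godde", "tuge'"]:
    print(root, "->", apply_in_infix(root))
```

If this matches the intended rule, the same logic could go into a second loop inside `infixes`, appending rows with their own affix label alongside the -um- forms.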