<div align=right>
Charlotte Puopolo<br>
HAPLAP Morphology<br>
EHU/UPV University of the Basque Country<br>
Tutor: Dr. Mans Hulden<br>
Spring 2024
</div>

<h1 align=center>A morphophonological generator and analyzer: Tagalog</h1>

## Linguistic analysis

Some context for Tagalog's grammar:
* Tagalog is a morphologically rich language.
* Verbs do not agree with the subject, i.e. they do not change with the gender or number of the actors. Their conjugations can still be tricky, however.
* Verbs do not have tenses in the strict sense, only aspects:
  * infinitive - the default or basic form of the verb; also used in the imperative (commands)
  * perfective / completed - action that has already been completed (sometimes confused with past)
  * imperfective / progressive - action that has been started but not yet completed
  * contemplative / unstarted - action that has yet to be started (sometimes confused with future)
* The same aspect form can refer to different times; the time reference comes from context:
  * e.g. Naglalakad ako - I am walking (progressive)
  * e.g. Naglalakad ako kagabi - I was walking last night (progressive; kagabi = last night)
* Tagalog uses a "focus system" in which the verb form stresses one element of the sentence, such as the actor, object, instrument, or direction of the action. Most Tagalog learners begin with **actor-focused** conjugations, so that is the scope of this project.
* Some nouns are included below the verb rules.

The data is grouped into the three most common actor-focused verb groups.

|Verb type|Stem   |Infinitive|Perfective|Progressive|Contemplative|Translation|
|---------|-------|----------|----------|-----------|-----------|-----------|
|(1) mag- |lakad  |maglakad  |naglakad  |naglalakad |maglalakad | 'walk'    |
|         |salita |magsalita |nagsalita |nagsasalita|magsasalita| 'talk'    |
|         |luto   |magluto   |nagluto   |nagluluto  |magluluto  | 'cook'    |
|         |mahal  |magmahal  |nagmahal  |nagmamahal |magmamahal | 'love'    |
|         |sulat  |magsulat  |nagsulat  |nagsusulat |magsusulat | 'write'   |
|         |aral   |magaral   |nagaral   |nagaaral   |magaaral   | 'study'   |
|         |turò   |magturò   |nagturò   |nagtuturò  |magtuturò  | 'teach'   |
|         |laba   |maglaba   |naglaba   |naglalaba  |maglalaba  | 'wash'    |
|         |maneho |magmaneho |nagmaneho |nagmamaneho|magmamaneho| 'drive'   |
|         |suklay |magsuklay |nagsuklay |nagsusuklay|magsusuklay| 'comb'    |
|         |basa   |magbasa   |nagbasa   |nagbabasa  |magbabasa  | 'read'    |
|(2) ma-  |tulog  |matulog   |natulog   |natutulog  |matutulog  |'sleep'    |
|         |kinig  |makinig   |nakinig   |nakikinig  |makikinig  |'listen'   |
|         |nood   |manood    |nanood    |nanonood   |manonood   |'watch'    |
|         |ligo   |maligo    |naligo    |naliligo   |maliligo   |'bathe'    |
|         |lungkot|malungkot |nalungkot |nalulungkot|malulungkot|'be sad'   |
|         |higâ   |mahigâ    |nahigâ    |nahihigâ   |mahihigâ   |'lie down' |
|         |upô    |maupô     |naupô     |nauupô     |mauupô     |'sit'      |
|         |galit  |magalit   |nagalit   |nagagalit  |magagalit  |'be angry' |
|         |nginig |manginig  |nanginig  |nanginginig|manginginig|'shiver'   |
|         |sisi   |masisi    |nasisi    |nasisisi   |masisisi   |'blame'    |
|         |huli   |mahuli    |nahuli    |nahuhuli   |mahuhuli   |'catch'    |
|(3) um-  |bili   |bumili    |bumili    |bumibili   |bibili     |'buy'      |
|         |kain   |kumain    |kumain    |kumakain   |kakain     |'eat'      |
|         |tawag  |tumawag   |tumawag   |tumatawag  |tatawag    |'call'     |
|         |ngiti  |ngumiti   |ngumiti   |ngumingiti |ngingiti   |'smile'    |
|         |labas  |lumabas   |lumabas   |lumalabas  |lalabas    |'go out'   |
|         |tawa   |tumawa    |tumawa    |tumatawa   |tatawa     |'laugh'    |
|         |takbo  |tumakbo   |tumakbo   |tumatakbo  |tatakbo    |'run'      |
|         |sigaw  |sumigaw   |sumigaw   |sumisigaw  |sisigaw    |'yell'     |
|         |talon  |tumalon   |tumalon   |tumatalon  |tatalon    |'jump'     |
|         |kanta  |kumanta   |kumanta   |kumakanta  |kakanta    |'sing'     |
|         |pili   |pumili    |pumili    |pumipili   |pipili     |'choose'   |

In [1]:
!pip install pyfoma
from pyfoma import *

Successfully installed jedi-0.19.1 pyfoma-1.0.6


## Defining all verb stems and aspect prefixes

In [2]:
fsts = {}
fsts['mag_stems'] = FST.re("(lakad)|(salita)|(luto)|(mahal)|(sulat)|(turò)|(aral)|(laba)|(suklay)|(basa)|(maneho)") # mag stems here
fsts['mag_prefix'] = FST.re("'[V INF]':(mag) | '[V Perfect]':(nag) | '[V Prog]':(nagxx) | '[V Contempl]':(magxx)")
fsts['mag_verb'] = FST.re("$mag_prefix $mag_stems", fsts)
# xx is used as a cue for syllable repetition

fsts['ma_stems'] = FST.re("(tulog)|(kinig)|(nood)|(ligo)|(lungkot)|(higâ)|(upô)|(galit)|(nginig)|(sisi)|(huli)") # ma stems here
fsts['ma_prefix']  = FST.re("'[V INF]':(ma) | '[V Perfect]':(na) | '[V Prog]': (naxx) | '[V Contempl]':(maxx)")
fsts['ma_verb']   = FST.re("$ma_prefix $ma_stems", fsts)

fsts['um_stems']   = FST.re("(bili)|(kain)|(tawag)|(ngiti)|(tawa)|(labas)|(takbo)|(kanta)|(pili)|(sigaw)|(talon)") # um stems here
fsts['um_prefix']  = FST.re("'[V INF]':(2um) | '[V Perfect]':(2um) | '[V Prog]':(2umxx) | '[V Contempl]':(xx)")
fsts['um_verb']   = FST.re("$um_prefix $um_stems", fsts)
# 2 is used as a cue for copying the first letter of the stem

fsts['VP'] = FST.re("$mag_verb|$ma_verb|$um_verb", fsts)

fsts['C'] = FST.re("[a-z]-[aeiou]")
fsts['V'] = FST.re("[aeiou]")

# print(Paradigm(fsts['mag_verb'], ".*"))

## Rules needed:
* double the consonant-vowel pair after 'xx'
  * applies to all mag- and ma- verbs that start with a consonant-vowel pair (Progressive and Contemplative conjugations)
  * applies to all um- verbs in Progressive and Contemplative conjugations
* double the vowel that directly follows 'xx'
  * applies to all mag- and ma- verbs that start with a vowel (aral, upô)
* delete 'xx'
* change '2' to the first consonant of the stem
  * applies to all um- verbs in Infinitive, Perfective, and Progressive conjugations
* delete the original first consonant of the stem
  * applies to all um- verbs in Infinitive, Perfective, and Progressive conjugations
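
The intended effect of these rules can be sketched in plain Python, which may be useful as a reference when checking transducer output. This is only an illustration with the standard `re` module, not pyfoma code: the function name `realize` and the consonant inventory are assumptions, and the first vowel of a stem is assumed to be unaccented.

```python
import re

# Assumption: "ng" is one consonant and is tried before the single letters.
CONSONANT = r"(?:ng|[bdghklmnprstwy])"
VOWEL = r"[aeiou]"

def realize(marked: str) -> str:
    """Apply the rules to a marker string such as 'nagxxlakad' or '2umxxbili'."""
    # Double the (optional consonant +) vowel that directly follows 'xx'.
    marked = re.sub(rf"xx({CONSONANT}?{VOWEL})", r"xx\1\1", marked)
    # Delete the 'xx' cue.
    marked = marked.replace("xx", "")
    # Replace '2' with the stem's first consonant and drop the original consonant.
    marked = re.sub(rf"2um({CONSONANT})", r"\1um", marked)
    return marked

print(realize("nagxxlakad"))  # naglalakad
print(realize("magxxaral"))   # magaaral
print(realize("2umxxngiti"))  # ngumingiti
print(realize("xxtakbo"))     # tatakbo
```

Because a backreference copies whatever was matched, the regular expression needs no list of syllables; finite-state rewrite rules have no such copy operation, which is the difficulty described in the next section.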

### Challenge: Syllabic repetition
Tagalog uses a lot of syllabic repetition (reduplication), so this project needs a way to express the following:

If a consonant-vowel pair follows 'xx' (my shorthand to cue a doubled syllable), that exact consonant-vowel pair should be doubled. I looked through the Pyfoma and RegularExpressionCompiler documentation and this is what I came up with:

```
fsts['syllable'] = FST.re("$C $V|(ngi)", fsts)
fsts['double_syllable'] = FST.re("$^rewrite($syllable:$syllable{2} / (xx)_ )", fsts) # delete xx and double the syllable
```
but this code does not copy the syllable. The `:` operator builds a cross-product, so each syllable is paired with every possible two-syllable string, and the output contains all CV combinations instead of only the desired one.

e.g. for the verb luto --> magxxluto --> mag**lu**luto

this code produces: baluto beluto biluto boluto buluto caluto, etc. for every CV combo.
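
The over-generation can be reproduced without pyfoma. The sketch below uses plain Python sets to contrast two relations: every syllable paired with every two-syllable string (what a cross-product of the two languages describes) and every syllable paired only with its own copy (what reduplication needs). The three-syllable inventory is a toy assumption.

```python
from itertools import product

syllables = ["ba", "lu", "to"]  # toy inventory (assumption)

# Cross-product: any syllable paired with any sequence of two syllables.
cross_product = {(s, a + b) for s, a, b in product(syllables, repeat=3)}

# Reduplication: each syllable paired only with its own doubled form.
copy_pairs = {(s, s + s) for s in syllables}

print(len(cross_product), len(copy_pairs))
print(sorted(out for src, out in cross_product if src == "lu"))
```

The second relation is a small subset of the first, which is why listing each syllable together with its doubled form restricts the output to the wanted pairs.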

**Update:** Dr. Mans Hulden, creator of Pyfoma, confirmed that Pyfoma lacks a simple way to do this. The following solution, which lists every syllable together with its doubled form, is one of the most efficient ways to do it.

In [30]:
# SYLLABIC REPETITION
# Based on the Tagalog verb catalogue there is no need for syllables that start with c, f, j, q, v, x, z; syllables starting with m, n, r, y are sparse
# it's uncommon to use the letter 'e' in bigram syllables (exceptions: re, se, be)
# "nga" and "ngi" are the only trigram syllables because "ng" is regarded as a single letter (and phoneme) in the Tagalog alphabet
# the following syllables are split over 3 lines only for aesthetic reasons

syllables1 = ["ba","be","bi","bo","bu","da","di","do","du","ga","gi","go","gu","ha","hi","ho","hu","ka","ki"]
syllables2 = ["ku","la","li","lo","lu","ma","mu","na","no","nu","pa","pi","po","pu","ra","re","ri","sa","se"]
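# NOTE: the doubled-vowel entries ("aa", "ee", ...) only match stems that begin with two identical vowels,
# so a vowel-initial stem such as aral is not reduplicated (nagxxaral surfaces as nagaral, not nagaaral);
# single-vowel entries ("a", "e", ...) would be needed to double the vowel itself
syllables3 = ["si","so","su","ta","ti","to","tu","wa","wi","ya","ngi","nga","aa","ee","ii","oo","uu"]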

fsts['syllables1'] = FST.re("$^rewrite(" + "|".join(f"({s}):({s}{s})" for s in syllables1) + '/ (xx)_ )', fsts)
fsts['syllables2'] = FST.re("$^rewrite(" + "|".join(f"({s}):({s}{s})" for s in syllables2) + '/ (xx)_ )', fsts)
fsts['syllables3'] = FST.re("$^rewrite(" + "|".join(f"({s}):({s}{s})" for s in syllables3) + '/ (xx)_ )', fsts)

# Combine all rules
fsts['repeat_rules'] = FST.re("$syllables1 @ $syllables2 @ $syllables3", fsts)
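
To make the string-building step concrete, the sketch below prints the regular expression that the `join` expression assembles for a short list. It only builds the string and does not call pyfoma; `demo` is a hypothetical two-item list used for illustration.

```python
demo = ["ba", "ngi"]  # hypothetical short list

# Same expression as in the cell above, applied to the short list.
pattern = "$^rewrite(" + "|".join(f"({s}):({s}{s})" for s in demo) + "/ (xx)_ )"
print(pattern)  # $^rewrite((ba):(baba)|(ngi):(ngingi)/ (xx)_ )
```

Each alternative pairs one syllable with exactly its own doubled form, so the rule cannot produce any other combination.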

In [31]:
# delete xx
fsts["xxdelete"] = FST.re("$^rewrite((xx):'')")
fsts['um_cleanup'] = FST.re("$um_verb @ $repeat_rules @ $xxdelete", fsts)
fsts['ma_mag_cleanup'] = FST.re("($mag_verb | $ma_verb) @ $repeat_rules @ $xxdelete", fsts)
print(Paradigm(fsts['ma_mag_cleanup'], ".*"))

aral     [V Contempl]  magaral      
aral     [V INF]       magaral      
aral     [V Perfect]   nagaral      
aral     [V Prog]      nagaral      
basa     [V Contempl]  magbabasa    
basa     [V INF]       magbasa      
basa     [V Perfect]   nagbasa      
basa     [V Prog]      nagbabasa    
galit    [V Contempl]  magagalit    
galit    [V INF]       magalit      
galit    [V Perfect]   nagalit      
galit    [V Prog]      nagagalit    
higâ     [V Contempl]  mahihigâ     
higâ     [V INF]       mahigâ       
higâ     [V Perfect]   nahigâ       
higâ     [V Prog]      nahihigâ     
huli     [V Contempl]  mahuhuli     
huli     [V INF]       mahuli       
huli     [V Perfect]   nahuli       
huli     [V Prog]      nahuhuli     
kinig    [V Contempl]  makikinig    
kinig    [V INF]       makinig      
kinig    [V Perfect]   nakinig      
kinig    [V Prog]      nakikinig    
laba     [V Contempl]  maglalaba    
laba     [V INF]       maglaba      
laba     [V Perfect]   naglaba      
l

The following rules target "um" verbs, which insert the affix "um" _after_ the first consonant of the stem (an infix rather than a prefix).

E.g. _verb_ **kanta** + _affix_ **um** in _aspect_ [V Perfect] -->  **kumanta**

In [25]:
# Rewrite the placeholder '2' with the stem's initial consonant in front of "um"
fsts['bum'] = FST.re("$^rewrite((2um):(bum) / _b)", fsts) # change 2um to bum if followed by b (the stem's own b is deleted in the next cell)
fsts['dum'] = FST.re("$^rewrite((2um):(dum) / _d)", fsts)
fsts['gum'] = FST.re("$^rewrite((2um):(gum) / _g)", fsts)
fsts['hum'] = FST.re("$^rewrite((2um):(hum) / _h)", fsts)
fsts['kum'] = FST.re("$^rewrite((2um):(kum) / _k)", fsts)
fsts['lum'] = FST.re("$^rewrite((2um):(lum) / _l)", fsts)
fsts['mum'] = FST.re("$^rewrite((2um):(mum) / _m)", fsts)
fsts['num'] = FST.re("$^rewrite((2um):(num) / _n)", fsts)
fsts['pum'] = FST.re("$^rewrite((2um):(pum) / _p)", fsts)
fsts['rum'] = FST.re("$^rewrite((2um):(rum) / _r)", fsts)
fsts['sum'] = FST.re("$^rewrite((2um):(sum) / _s)", fsts)
fsts['tum'] = FST.re("$^rewrite((2um):(tum) / _t)", fsts)
fsts['yum'] = FST.re("$^rewrite((2um):(yum) / _y)", fsts)
fsts['ngum'] = FST.re("$^rewrite((2um):(ngum) / _ng)", fsts)

# $ngum is composed first so that ng-initial stems are rewritten before the n rule can apply
fsts['add_um'] = FST.re("$um_cleanup @ $ngum @ $bum @ $dum @ $gum @ $hum @ $kum @ $lum @ $mum @ $num @ $pum @ $rum @ $sum @ $tum @ $yum", fsts)
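
The order of the composed rules matters for stems that begin with "ng". The plain-Python sketch below imitates the sequential application with the `re` module (it is not pyfoma code, and `apply_in_order` is a hypothetical helper) to show what happens when the "n" rule runs before the "ng" rule.

```python
import re

def apply_in_order(form, initials):
    """Rewrite '2um' to '<c>um' when the stem starts with <c>, one rule at a time."""
    for c in initials:
        # The lookahead keeps the stem's own consonant in place, as the rules above do.
        form = re.sub(rf"2um(?={c})", f"{c}um", form)
    return form

print(apply_in_order("2umngiti", ["ng", "n"]))  # ngumngiti
print(apply_in_order("2umngiti", ["n", "ng"]))  # numngiti
```

Once the placeholder has been rewritten, no later rule finds "2um" to match, so whichever rule applies first decides the result.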

In [26]:
# Delete the original consonant that now appears after "um"
fsts['ng_delete'] = FST.re("$^rewrite((ng):'' / u m _ $V)", fsts) # ng is two letters, so it needs its own rule
fsts['C_delete'] = FST.re("$^rewrite($C:'' / u m _ $V)", fsts)
fsts['um_final'] = FST.re("$add_um @ $ng_delete @ $C_delete", fsts)
print(Paradigm(fsts['um_final'], ".*"))

bili   [V Contempl]  bibili      
bili   [V INF]       bumili      
bili   [V Perfect]   bumili      
bili   [V Prog]      bumibili    
kain   [V Contempl]  kakain      
kain   [V INF]       kumain      
kain   [V Perfect]   kumain      
kain   [V Prog]      kumakain    
kanta  [V Contempl]  kakanta     
kanta  [V INF]       kumanta     
kanta  [V Perfect]   kumanta     
kanta  [V Prog]      kumakanta   
labas  [V Contempl]  lalabas     
labas  [V INF]       lumabas     
labas  [V Perfect]   lumabas     
labas  [V Prog]      lumalabas   
ngiti  [V Contempl]  ngingiti    
ngiti  [V INF]       ngumiti     
ngiti  [V Perfect]   ngumiti     
ngiti  [V Prog]      ngumingiti  
pili   [V Contempl]  pipili      
pili   [V INF]       pumili      
pili   [V Perfect]   pumili      
pili   [V Prog]      pumipili    
sigaw  [V Contempl]  sisigaw     
sigaw  [V INF]       sumigaw     
sigaw  [V Perfect]   sumigaw     
sigaw  [V Prog]      sumisigaw   
takbo  [V Contempl]  tatakbo     
takbo  [V INF]

## All verbs in their final correct forms

In [27]:
fsts['all_verbs'] = FST.re("$um_final | $ma_mag_cleanup", fsts)
print(Paradigm(fsts['all_verbs'], ".*"))

aral     [V Contempl]  magaral      
aral     [V INF]       magaral      
aral     [V Perfect]   nagaral      
aral     [V Prog]      nagaral      
basa     [V Contempl]  magbabasa    
basa     [V INF]       magbasa      
basa     [V Perfect]   nagbasa      
basa     [V Prog]      nagbabasa    
bili     [V Contempl]  bibili       
bili     [V INF]       bumili       
bili     [V Perfect]   bumili       
bili     [V Prog]      bumibili     
galit    [V Contempl]  magagalit    
galit    [V INF]       magalit      
galit    [V Perfect]   nagalit      
galit    [V Prog]      nagagalit    
higâ     [V Contempl]  mahihigâ     
higâ     [V INF]       mahigâ       
higâ     [V Perfect]   nahigâ       
higâ     [V Prog]      nahihigâ     
huli     [V Contempl]  mahuhuli     
huli     [V INF]       mahuli       
huli     [V Perfect]   nahuli       
huli     [V Prog]      nahuhuli     
kain     [V Contempl]  kakain       
kain     [V INF]       kumain       
kain     [V Perfect]   kumain       
k

## Some notes on Nouns

These nouns are organized by type: person, object, place, event. This categorization would help in building a grammar checker that could detect, for example, which preposition to use with a type of noun.

In [28]:
fsts['plurality_marker'] = FST.re("'[Sing.]':''|'[Plural]':(mga' ')")
fsts['person'] = FST.re("'[N][person]':(lalaki)|'[N][person]':(babae)|'[N][person]':(atleta)|'[N][person]':(estudyante)|'[N][person]':(bayani)", fsts)
fsts['object'] = FST.re("'[N][object]':(pagkain)|'[N][object]':(sapatos)|'[N][object]':(kompyuter)|'[N][object]':(kotse)", fsts)
fsts['place'] = FST.re("'[N][place]':(unibersidad)|'[N][place]':(restawran)|'[N][place]':(mall)|'[N][place]':(ospital)|'[N][place]':(museo)", fsts)
fsts['event'] = FST.re("'[N][event]':(piyesta' 'opisyal)|'[N][event]':(pagdiriwang)|'[N][event]':(kapistahan)", fsts)
fsts['noun'] = FST.re("$person | $object | $place | $event", fsts)
fsts['NP'] = FST.re("$plurality_marker $noun", fsts)
print(Paradigm(fsts['NP'], ".*"))

  [Plural][N][event]   mga kapistahan       
  [Plural][N][event]   mga pagdiriwang      
  [Plural][N][event]   mga piyesta opisyal  
  [Plural][N][object]  mga kompyuter        
  [Plural][N][object]  mga kotse            
  [Plural][N][object]  mga pagkain          
  [Plural][N][object]  mga sapatos          
  [Plural][N][person]  mga atleta           
  [Plural][N][person]  mga babae            
  [Plural][N][person]  mga bayani           
  [Plural][N][person]  mga estudyante       
  [Plural][N][person]  mga lalaki           
  [Plural][N][place]   mga mall             
  [Plural][N][place]   mga museo            
  [Plural][N][place]   mga ospital          
  [Plural][N][place]   mga restawran        
  [Plural][N][place]   mga unibersidad      
  [Sing.][N][event]    kapistahan           
  [Sing.][N][event]    pagdiriwang          
  [Sing.][N][event]    piyesta opisyal      
  [Sing.][N][object]   kompyuter            
  [Sing.][N][object]   kotse                
  [Sing.][
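
For the grammar-checker idea mentioned above, the reverse direction (from a noun phrase to its tags) can be sketched with a plain dictionary lookup. This is a minimal illustration, not pyfoma code: `NOUN_TYPES` and `analyze_np` are hypothetical names, and the lexicon holds only a few of the nouns defined in the cell above.

```python
# Hypothetical lexicon: a few nouns from the cell above mapped to their type.
NOUN_TYPES = {
    "babae": "person",
    "estudyante": "person",
    "kotse": "object",
    "ospital": "place",
    "pagdiriwang": "event",
}

def analyze_np(phrase: str) -> str:
    """Return a tag string such as '[Plural][N][object]' for a noun phrase."""
    number, noun = "[Sing.]", phrase
    if phrase.startswith("mga "):
        # The plural marker "mga" is a separate word in front of the noun.
        number, noun = "[Plural]", phrase[len("mga "):]
    if noun not in NOUN_TYPES:
        raise KeyError(f"unknown noun: {noun}")
    return f"{number}[N][{NOUN_TYPES[noun]}]"

print(analyze_np("mga kotse"))  # [Plural][N][object]
print(analyze_np("ospital"))    # [Sing.][N][place]
```

A checker could then branch on the returned type tag, for example to choose between markers that depend on whether the noun is a person or a place.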

### End notes
I have been interested in Tagalog since I visited the Philippines last summer, and I found building this grammar to be very rewarding.

## Sources:
* https://github.com/mhulden/pyfoma/blob/main/docs/MorphologicalAnalyzerTutorial.ipynb
* https://github.com/mhulden/pyfoma/blob/main/docs/RegularExpressionCompiler.ipynb
* https://seasite.niu.edu/tagalog/tagalog_verbs.htm
* https://owlcation.com/humanities/Filipino-Verbs-and-Tenses
* https://ling-app.com/fil/tagalog-nouns/