#Cook Islands Maori Context-Free Language Parsing
Dartmouth College, LING48, Spring 2024<br>
Samuel Peter (samuel.peter.25@dartmouth.edu)

Context-free grammar (CFG) rules can model much of the syntax of a human language. This notebook uses the rule notation described in section 1.2 of this
webpage: https://www.nltk.org/book/ch08.html. <br>
Some recommendations:<br>
(a) This language is Verb+Subject+Object (VSO), so you can't make a single constituent that includes only the
verb and the object. Therefore, your VP cannot contain the object.<br>
(b) The TAM (tense-aspect-mood) words should be grouped with the verb inside of a VP.<br>
(c) The preposition i marks the direct object. I suggest you treat this as a special phrase (e.g. NPOBJ).<br>

##Step 1: Install nltk

In [1]:
%pip install nltk



##Step 2: Import required packages

In [2]:
import nltk

##Step 3: Build the rules

In [3]:
groucho_grammar = nltk.CFG.fromstring("""
S -> VP NP NPOBJ | VP NP PP | VP NP
PP -> P NP
NP -> Det N | Det N N | N N | N
VP -> TAM V | TAM V TAM
NPOBJ -> PREPACC NP

Det -> 'a' | 'te'
TAM -> 'Kua' | 'Te' | 'nei' | 'Kia' | 'E' | 'ana' | 'ake'
N -> 'Tere' | 'taro' | "va'ine" | "'are" | 'maki' | 'kōtou' | 'kātoatoa' | 'koe' | "'ānani" | 'Rarotonga'
V -> 'tunu' | "'aere" | 'orāna' | 'reka'
PREPACC -> 'i'
P -> 'ki'
""")

##Step 4: Build the parser

In [4]:
#Build nltk parser
parser = nltk.ChartParser(groucho_grammar)
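
Before parsing, it can help to confirm that every word in a sentence has a lexical rule, because `nltk.ChartParser` raises a `ValueError` for tokens the grammar does not cover. The sketch below repeats the Step 3 grammar under the assumed name `grammar` so that it runs on its own, lists the expansions of `NP`, and calls `check_coverage` on two token lists. The helper `is_covered` is hypothetical, and the token `xyz` is a made-up placeholder that is deliberately absent from the lexicon.

```python
import nltk
from nltk.grammar import Nonterminal

# Copy of the Step 3 grammar, repeated here so the sketch is self-contained.
# The name `grammar` is an assumption made for this example.
grammar = nltk.CFG.fromstring("""
S -> VP NP NPOBJ | VP NP PP | VP NP
PP -> P NP
NP -> Det N | Det N N | N N | N
VP -> TAM V | TAM V TAM
NPOBJ -> PREPACC NP

Det -> 'a' | 'te'
TAM -> 'Kua' | 'Te' | 'nei' | 'Kia' | 'E' | 'ana' | 'ake'
N -> 'Tere' | 'taro' | "va'ine" | "'are" | 'maki' | 'kōtou' | 'kātoatoa' | 'koe' | "'ānani" | 'Rarotonga'
V -> 'tunu' | "'aere" | 'orāna' | 'reka'
PREPACC -> 'i'
P -> 'ki'
""")

# The start symbol is the left-hand side of the first rule.
start_symbol = grammar.start()
print(start_symbol)

# Each alternative separated by "|" becomes its own production.
np_productions = grammar.productions(lhs=Nonterminal("NP"))
for production in np_productions:
    print(production)


def is_covered(cfg, tokens):
    """Return True if every token has a lexical rule in cfg (hypothetical helper)."""
    try:
        cfg.check_coverage(tokens)
        return True
    except ValueError:
        # check_coverage raises ValueError and names the uncovered words.
        return False


print(is_covered(grammar, "Kua tunu a Tere i te taro".split()))
print(is_covered(grammar, "Kua tunu a xyz".split()))
```

A coverage check only looks at the lexicon. A sentence whose words are all covered can still fail to parse if no rule licenses the word order.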

##Step 5: Tokenize and parse the sentences

In [5]:
#Get words from the sentence
sentence = "Kua tunu a Tere i te taro"
sent = sentence.split()
print(sent)

#Get the tree
for tree in parser.parse(sent):
    print(tree)

['Kua', 'tunu', 'a', 'Tere', 'i', 'te', 'taro']
(S
  (VP (TAM Kua) (V tunu))
  (NP (Det a) (N Tere))
  (NPOBJ (PREPACC i) (NP (Det te) (N taro))))
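
The split-then-parse pattern in the cell above is repeated for every sentence, so it could be wrapped in a small function. The following is a minimal sketch, not part of the original notebook: `parse_sentence`, `small_grammar`, and `small_parser` are assumed names, the grammar is a reduced subset of Step 3 that covers only the first sentence, and `xyz` is a placeholder token outside the lexicon.

```python
import nltk

# Reduced subset of the Step 3 grammar, just enough for the first sentence.
small_grammar = nltk.CFG.fromstring("""
S -> VP NP NPOBJ | VP NP
NP -> Det N | N
VP -> TAM V
NPOBJ -> PREPACC NP
Det -> 'a' | 'te'
TAM -> 'Kua'
N -> 'Tere' | 'taro'
V -> 'tunu'
PREPACC -> 'i'
""")
small_parser = nltk.ChartParser(small_grammar)


def parse_sentence(parser, sentence):
    """Split on whitespace and return a list of parse trees (hypothetical helper)."""
    tokens = sentence.split()
    try:
        return list(parser.parse(tokens))
    except ValueError:
        # ChartParser raises ValueError when a token has no lexical rule.
        return []


trees = parse_sentence(small_parser, "Kua tunu a Tere i te taro")
for tree in trees:
    print(tree)

# Known words in an order the grammar does not license: no trees.
no_trees = parse_sentence(small_parser, "tunu Kua a Tere")
# A placeholder token outside the lexicon: also no trees.
unknown = parse_sentence(small_parser, "Kua tunu a xyz")
print(len(trees), len(no_trees), len(unknown))
```

Returning an empty list in both failure cases keeps the calling code simple, at the cost of hiding whether the problem was an unknown word or an unlicensed word order.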


In [6]:
#Get words from the sentence
sentence = "Te 'aere nei te va'ine ki te 'are maki"
sent = sentence.split()
print(sent)

#Get the tree
for tree in parser.parse(sent):
    print(tree)

['Te', "'aere", 'nei', 'te', "va'ine", 'ki', 'te', "'are", 'maki']
(S
  (VP (TAM Te) (V 'aere) (TAM nei))
  (NP (Det te) (N va'ine))
  (PP (P ki) (NP (Det te) (N 'are) (N maki))))


In [7]:
#Get words from the sentence
sentence = "Kia orāna kōtou kātoatoa"
sent = sentence.split()
print(sent)

#Get the tree
for tree in parser.parse(sent):
    print(tree)

['Kia', 'orāna', 'kōtou', 'kātoatoa']
(S (VP (TAM Kia) (V orāna)) (NP (N kōtou) (N kātoatoa)))


In [8]:
#Get words from the sentence
sentence = "E reka ana koe i te 'ānani"
sent = sentence.split()
print(sent)

#Get the tree
for tree in parser.parse(sent):
    print(tree)

['E', 'reka', 'ana', 'koe', 'i', 'te', "'ānani"]
(S
  (VP (TAM E) (V reka) (TAM ana))
  (NP (N koe))
  (NPOBJ (PREPACC i) (NP (Det te) (N 'ānani))))


In [9]:
#Get words from the sentence
sentence = "Kua 'aere ake koe ki Rarotonga"
sent = sentence.split()
print(sent)

#Get the tree
for tree in parser.parse(sent):
    print(tree)

['Kua', "'aere", 'ake', 'koe', 'ki', 'Rarotonga']
(S
  (VP (TAM Kua) (V 'aere) (TAM ake))
  (NP (N koe))
  (PP (P ki) (NP (N Rarotonga))))
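
Each cell above prints a single tree, which suggests the grammar is unambiguous for these sentences. One way to check this in a single pass is to count the parses per sentence, as in the sketch below. It repeats the Step 3 grammar so that it runs on its own; the names `grammar`, `parser`, `sentences`, and `parse_counts` are assumptions made for this example.

```python
import nltk

# Copy of the Step 3 grammar, repeated here so the sketch is self-contained.
grammar = nltk.CFG.fromstring("""
S -> VP NP NPOBJ | VP NP PP | VP NP
PP -> P NP
NP -> Det N | Det N N | N N | N
VP -> TAM V | TAM V TAM
NPOBJ -> PREPACC NP

Det -> 'a' | 'te'
TAM -> 'Kua' | 'Te' | 'nei' | 'Kia' | 'E' | 'ana' | 'ake'
N -> 'Tere' | 'taro' | "va'ine" | "'are" | 'maki' | 'kōtou' | 'kātoatoa' | 'koe' | "'ānani" | 'Rarotonga'
V -> 'tunu' | "'aere" | 'orāna' | 'reka'
PREPACC -> 'i'
P -> 'ki'
""")
parser = nltk.ChartParser(grammar)

# The five sentences parsed in the cells above.
sentences = [
    "Kua tunu a Tere i te taro",
    "Te 'aere nei te va'ine ki te 'are maki",
    "Kia orāna kōtou kātoatoa",
    "E reka ana koe i te 'ānani",
    "Kua 'aere ake koe ki Rarotonga",
]

# Map each sentence to the number of distinct trees the grammar assigns it.
parse_counts = {}
for sentence in sentences:
    trees = list(parser.parse(sentence.split()))
    parse_counts[sentence] = len(trees)
    print(f"{len(trees)} parse(s): {sentence}")
```

A count of zero would point to a missing rule, and a count above one would point to an ambiguity worth inspecting, for example by calling `tree.pretty_print()` on each tree.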
