# FLIP(01):  Advanced Data Science
**(Module 03: Natural Language Processing)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use, but NOT allowed to change or distribute this package.

Prepared by and for 
**Student Members** |
2006-2018 [TULIP Lab](http://www.tulip.org.au)

---


# Session 08 - Building Feature-Based Grammars

### Grammatical Features

In this part, we will investigate the role of features in building rule-based grammars. In contrast to feature extractors, which record features that have been automatically detected, we are now going to declare the features of words and phrases. We start off with a very simple example, using dictionaries to store features and their values.

In [1]:
kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}

In [2]:
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}

Feature structures contain various kinds of information about grammatical entities. The information need not be exhaustive, and we might want to add further properties. For example, in the case of a verb, it is often useful to know what “semantic role” is played by the arguments of the verb. In the case of chase, the subject plays the role of “agent,” whereas the object has the role of “patient.” Let’s add this information, using 'sbj' (subject) and 'obj' (object) as placeholders which will get filled once the verb combines with its grammatical arguments:

In [3]:
chase['AGT'] = 'sbj'

In [4]:
chase['PAT'] = 'obj'

If we now process a sentence Kim chased Lee, we want to “bind” the verb’s agent role to the subject and the patient role to the object. We do this by linking to the REF feature of the relevant NP. In the following example, we make the simple-minded assumption that the NPs immediately to the left and right of the verb are the subject and object, respectively. We also add a feature structure for Lee to complete the example.

In [5]:
sent = "Kim chased Lee"
tokens = sent.split()
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}

In [6]:
def lex2fs(word):
    for fs in [kim, lee, chase]:
        if fs['ORTH'] == word:
            return fs

In [7]:
subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])
verb['AGT'] = subj['REF'] # agent of 'chase' is Kim
verb['PAT'] = obj['REF'] # patient of 'chase' is Lee

In [8]:
for k in ['ORTH', 'REL', 'AGT', 'PAT']: # check featstruct of 'chase'
    print("%-5s => %s" % (k, verb[k]))

ORTH  => chased
REL   => chase
AGT   => k
PAT   => l


The same approach could be adopted for a different verb, say surprise, though in this case the subject would play the role of “source” (SRC) and the object the role of “experiencer” (EXP):

In [9]:
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
            'SRC': 'sbj', 'EXP': 'obj'}
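
The binding step used for chase hard-codes the AGT and PAT keys, so it cannot be reused for surprise as written. The following is a minimal sketch of a role-independent version, assuming the placeholders are always the strings 'sbj' and 'obj'. The helper name `bind_roles` is hypothetical and is not part of NLTK.

```python
def bind_roles(verb_fs, subj_fs, obj_fs):
    """Return a copy of verb_fs with 'sbj'/'obj' placeholders replaced by REF values.

    Hypothetical helper for illustration; assumes every placeholder is
    exactly the string 'sbj' or 'obj'.
    """
    fillers = {'sbj': subj_fs['REF'], 'obj': obj_fs['REF']}
    # Any value that is not a placeholder is copied over unchanged.
    return {feat: fillers.get(val, val) for feat, val in verb_fs.items()}


kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
            'SRC': 'sbj', 'EXP': 'obj'}

bound = bind_roles(surprise, kim, lee)  # "Kim surprised Lee"
for k in ['ORTH', 'REL', 'SRC', 'EXP']:
    print("%-5s => %s" % (k, bound[k]))
```

Because the helper returns a new dictionary, the lexical entry for surprise keeps its placeholders and can be bound again for another sentence.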

In [10]:
# Example feature-based grammar
import nltk
nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')

% start S
# ###################
# Grammar Productions
# ###################
# S expansion productions
S -> NP[NUM=?n] VP[NUM=?n]
# NP expansion productions
NP[NUM=?n] -> N[NUM=?n] 
NP[NUM=?n] -> PropN[NUM=?n] 
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
NP[NUM=pl] -> N[NUM=pl] 
# VP expansion productions
VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
# ###################
# Lexical Productions
# ###################
Det[NUM=sg] -> 'this' | 'every'
Det[NUM=pl] -> 'these' | 'all'
Det -> 'the' | 'some' | 'several'
PropN[NUM=sg]-> 'Kim' | 'Jody'
N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children' 
IV[TENSE=pres,  NUM=sg] -> 'disappears' | 'walks'
TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
IV[TENSE=pres,  NUM=pl] -> 'disappear' | 'walk'
TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
IV[TENSE=past] -> 'disappeared' | 'walked'
TV[TENSE=past] -> 'saw' | 'liked'


In [11]:
# Trace of feature-based chart parser.
tokens = 'Kim likes children'.split()
print(tokens)

['Kim', 'likes', 'children']


In [12]:
from nltk import load_parser
cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)

In [14]:
trees = list(cp.parse(tokens))  # nbest_parse() was removed in NLTK 3.0; parse() returns an iterator over trees
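
To see the NUM agreement constraint at work without the packaged grammar file, the sketch below builds a reduced grammar inline and parses with the NLTK 3 `parse()` method. It assumes NLTK 3 is installed. The grammar is a cut-down subset of the feat0 productions listed above, and `count_parses` is a hypothetical helper introduced only for this example.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Reduced subset of feat0.fcfg (an assumption for illustration, not the full grammar).
grammar = FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> PropN[NUM=?n]
NP[NUM=?n] -> N[NUM=?n]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
PropN[NUM=sg] -> 'Kim'
N[NUM=pl] -> 'children'
TV[TENSE=pres, NUM=sg] -> 'likes'
TV[TENSE=pres, NUM=pl] -> 'like'
""")
parser = FeatureChartParser(grammar)


def count_parses(sentence):
    """Number of complete parse trees found for a whitespace-tokenised sentence."""
    return len(list(parser.parse(sentence.split())))


# Subject and verb agree in number in the first sentence only.
print(count_parses('Kim likes children'))
print(count_parses('Kim like children'))

for tree in parser.parse('Kim likes children'.split()):
    print(tree)
```

The second sentence should be rejected because the variable ?n cannot be bound to both sg (from Kim) and pl (from like) in the S production.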

### Processing Feature Structures

In this part, we will show how feature structures can be constructed and manipulated in NLTK. We will also discuss the fundamental operation of unification, which allows us to combine the information contained in two different feature structures. Feature structures in NLTK are declared with the FeatStruct() constructor. Atomic feature values can be strings or integers.

In [15]:
fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')

In [16]:
print(fs1)

[ NUM   = 'sg'   ]
[ TENSE = 'past' ]


In [17]:
fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')

In [18]:
print(fs1['GND'])

fem


In [19]:
fs2 = nltk.FeatStruct(POS='N', AGR=fs1)

In [20]:
print(fs2)

[       [ GND = 'fem' ] ]
[ AGR = [ NUM = 'pl'  ] ]
[       [ PER = 3     ] ]
[                       ]
[ POS = 'N'             ]


In [21]:
print(fs2['AGR'])

[ GND = 'fem' ]
[ NUM = 'pl'  ]
[ PER = 3     ]


In [22]:
print(fs2['AGR']['PER'])

3


In [23]:
print(nltk.FeatStruct("[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]"))

[       [ GND = 'fem' ] ]
[ AGR = [ NUM = 'pl'  ] ]
[       [ PER = 3     ] ]
[                       ]
[ POS = 'N'             ]


In [24]:
print(nltk.FeatStruct(name='Lee', telno='01 27 86 42 96', age=33))

[ age   = 33               ]
[ name  = 'Lee'            ]
[ telno = '01 27 86 42 96' ]


In [25]:
print(nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                          SPOUSE=[NAME='Kim', ADDRESS->(1)]]"""))

[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]


In [26]:
print(nltk.FeatStruct("[A='a', B=(1)[C='c'], D->(1), E->(1)]"))

[ A = 'a'             ]
[                     ]
[ B = (1) [ C = 'c' ] ]
[                     ]
[ D -> (1)            ]
[ E -> (1)            ]


In [27]:
fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
fs2 = nltk.FeatStruct(CITY='Paris')

In [28]:
print(fs1.unify(fs2))

[ CITY   = 'Paris'      ]
[ NUMBER = 74           ]
[ STREET = 'rue Pascal' ]


In [29]:
print(fs2.unify(fs1))

[ CITY   = 'Paris'      ]
[ NUMBER = 74           ]
[ STREET = 'rue Pascal' ]


In [30]:
fs0 = nltk.FeatStruct(A='a')
fs1 = nltk.FeatStruct(A='b')
fs2 = fs0.unify(fs1)

In [31]:
print(fs2)

None
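
Since `unify()` returns None when two structures carry conflicting values, it can serve as a simple compatibility test, for example for agreement features. The sketch below assumes NLTK 3. The helper name `agree` is hypothetical.

```python
import nltk


def agree(fs_a, fs_b):
    """True when the two feature structures unify, i.e. share no conflicting value."""
    return fs_a.unify(fs_b) is not None


third_sg = nltk.FeatStruct(PER=3, NUM='sg')
third = nltk.FeatStruct(PER=3)   # underspecified for number
plural = nltk.FeatStruct(NUM='pl')

print(agree(third_sg, third))    # compatible: 'third' adds no conflicting value
print(agree(third_sg, plural))   # conflict on NUM ('sg' vs 'pl')

# unify() builds a new structure; the inputs are left as they were.
merged = third.unify(plural)
print(merged)
print(third)
```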


In [32]:
fs0 = nltk.FeatStruct("""[NAME=Lee,
                          ADDRESS=[NUMBER=74,
                          STREET='rue Pascal'],
                          SPOUSE= [NAME=Kim,
                          ADDRESS=[NUMBER=74,
                          STREET='rue Pascal']]]""")

In [33]:
print(fs0)

[ ADDRESS = [ NUMBER = 74           ]               ]
[           [ STREET = 'rue Pascal' ]               ]
[                                                   ]
[ NAME    = 'Lee'                                   ]
[                                                   ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]


In [34]:
fs1 = nltk.FeatStruct("[SPOUSE = [ADDRESS = [CITY = Paris]]]")

In [35]:
print(fs1.unify(fs0))

[ ADDRESS = [ NUMBER = 74           ]               ]
[           [ STREET = 'rue Pascal' ]               ]
[                                                   ]
[ NAME    = 'Lee'                                   ]
[                                                   ]
[           [           [ CITY   = 'Paris'      ] ] ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]


In [36]:
fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                          SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")

In [37]:
print(fs1.unify(fs2))

[               [ CITY   = 'Paris'      ] ]
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]


In [38]:
fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")

In [39]:
print(fs2)

[ ADDRESS1 = ?x ]
[ ADDRESS2 = ?x ]


In [40]:
print(fs2.unify(fs1))

[ ADDRESS1 = (1) [ NUMBER = 74           ] ]
[                [ STREET = 'rue Pascal' ] ]
[                                          ]
[ ADDRESS2 -> (1)                          ]


### Extending a Feature-Based Grammar

In this part, we return to feature-based grammar and explore a variety of linguistic issues, and demonstrate the benefits of incorporating features into the grammar.

In [41]:
# Grammar with productions for inverted clauses and long-distance dependencies, making use of slash categories.
nltk.data.show_cfg('grammars/book_grammars/feat1.fcfg')

% start S
# ###################
# Grammar Productions
# ###################
S[-INV] -> NP VP
S[-INV]/?x -> NP VP/?x
S[-INV] -> NP S/NP
S[-INV] -> Adv[+NEG] S[+INV]
S[+INV] -> V[+AUX] NP VP
S[+INV]/?x -> V[+AUX] NP VP/?x
SBar -> Comp S[-INV]
SBar/?x -> Comp S[-INV]/?x
VP -> V[SUBCAT=intrans, -AUX]
VP -> V[SUBCAT=trans, -AUX] NP
VP/?x -> V[SUBCAT=trans, -AUX] NP/?x
VP -> V[SUBCAT=clause, -AUX] SBar
VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x
VP -> V[+AUX] VP
VP/?x -> V[+AUX] VP/?x
# ###################
# Lexical Productions
# ###################
V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'
V[SUBCAT=trans, -AUX] -> 'see' | 'like'
V[SUBCAT=clause, -AUX] -> 'say' | 'claim'
V[+AUX] -> 'do' | 'can'
NP[-WH] -> 'you' | 'cats'
NP[+WH] -> 'who'
Adv[+NEG] -> 'rarely' | 'never'
NP/NP ->
Comp -> 'that'


In [42]:
tokens = 'who do you claim that you like'.split()

In [43]:
from nltk import load_parser
cp = load_parser('grammars/book_grammars/feat1.fcfg')

In [44]:
for tree in cp.parse(tokens):  # nbest_parse() was removed in NLTK 3.0; parse() yields trees
    print(tree)

In [45]:
tokens = 'you claim that you like cats'.split()

In [46]:
for tree in cp.parse(tokens):  # nbest_parse() was removed in NLTK 3.0; parse() yields trees
    print(tree)

In [47]:
tokens = 'rarely do you sing'.split()

In [50]:
for tree in cp.parse(tokens):  # nbest_parse() was removed in NLTK 3.0; parse() yields trees
    print(tree)
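
If the packaged grammar file is not available, the three sentences can still be tried against the productions printed above. The sketch below assumes NLTK 3 and copies the feat1 productions from the listing into an inline string. The names `FEAT1`, `sentences`, and `parses` are introduced only for this example.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Productions copied from the feat1.fcfg listing; X/Y is a slash category
# ("an X that is missing a Y"), and NP/NP -> (empty) licenses the gap.
FEAT1 = """
% start S
S[-INV] -> NP VP
S[-INV]/?x -> NP VP/?x
S[-INV] -> NP S/NP
S[-INV] -> Adv[+NEG] S[+INV]
S[+INV] -> V[+AUX] NP VP
S[+INV]/?x -> V[+AUX] NP VP/?x
SBar -> Comp S[-INV]
SBar/?x -> Comp S[-INV]/?x
VP -> V[SUBCAT=intrans, -AUX]
VP -> V[SUBCAT=trans, -AUX] NP
VP/?x -> V[SUBCAT=trans, -AUX] NP/?x
VP -> V[SUBCAT=clause, -AUX] SBar
VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x
VP -> V[+AUX] VP
VP/?x -> V[+AUX] VP/?x
V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'
V[SUBCAT=trans, -AUX] -> 'see' | 'like'
V[SUBCAT=clause, -AUX] -> 'say' | 'claim'
V[+AUX] -> 'do' | 'can'
NP[-WH] -> 'you' | 'cats'
NP[+WH] -> 'who'
Adv[+NEG] -> 'rarely' | 'never'
NP/NP ->
Comp -> 'that'
"""

parser = FeatureChartParser(FeatureGrammar.fromstring(FEAT1))

sentences = [
    'who do you claim that you like',  # long-distance dependency (gap after 'like')
    'you claim that you like cats',    # no gap
    'rarely do you sing',              # inverted clause after a negative adverb
]
parses = {s: list(parser.parse(s.split())) for s in sentences}

for sentence, trees in parses.items():
    print(sentence, '->', len(trees), 'parse(s)')
    for tree in trees:
        print(tree)
```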

In [51]:
# Example feature-based grammar.
nltk.data.show_cfg('grammars/book_grammars/german.fcfg')

% start S
# Grammar Productions
S -> NP[CASE=nom, AGR=?a] VP[AGR=?a]
NP[CASE=?c, AGR=?a] -> PRO[CASE=?c, AGR=?a]
NP[CASE=?c, AGR=?a] -> Det[CASE=?c, AGR=?a] N[CASE=?c, AGR=?a]
VP[AGR=?a] -> IV[AGR=?a]
VP[AGR=?a] -> TV[OBJCASE=?c, AGR=?a] NP[CASE=?c]
# Lexical Productions
# Singular determiners
# masc
Det[CASE=nom, AGR=[GND=masc,PER=3,NUM=sg]] -> 'der' 
Det[CASE=dat, AGR=[GND=masc,PER=3,NUM=sg]] -> 'dem'
Det[CASE=acc, AGR=[GND=masc,PER=3,NUM=sg]] -> 'den'
# fem
Det[CASE=nom, AGR=[GND=fem,PER=3,NUM=sg]] -> 'die' 
Det[CASE=dat, AGR=[GND=fem,PER=3,NUM=sg]] -> 'der'
Det[CASE=acc, AGR=[GND=fem,PER=3,NUM=sg]] -> 'die' 
# Plural determiners
Det[CASE=nom, AGR=[PER=3,NUM=pl]] -> 'die' 
Det[CASE=dat, AGR=[PER=3,NUM=pl]] -> 'den' 
Det[CASE=acc, AGR=[PER=3,NUM=pl]] -> 'die' 
# Nouns
N[AGR=[GND=masc,PER=3,NUM=sg]] -> 'Hund'
N[CASE=nom, AGR=[GND=masc,PER=3,NUM=pl]] -> 'Hunde'
N[CASE=dat, AGR=[GND=masc,PER=3,NUM=pl]] -> 'Hunden'
N[CASE=acc, AGR=[GND=masc,PER=3,NUM=pl]] -> 'Hunde'
N[AGR=[GND=fem,PER=3,

In [52]:
tokens = 'ich folge den Katzen'.split()

In [53]:
cp = load_parser('grammars/book_grammars/german.fcfg')

In [54]:
for tree in cp.parse(tokens):  # nbest_parse() was removed in NLTK 3.0; parse() yields trees
    print(tree)

In [55]:
tokens = 'ich folge den Katzen'.split()

In [56]:
cp = load_parser('grammars/book_grammars/german.fcfg',trace = 2)

In [57]:
for tree in cp.parse(tokens):  # nbest_parse() was removed in NLTK 3.0; parse() yields trees
    print(tree)