Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit, Steven Bird, Ewan Klein, and Edward Loper, http://www.nltk.org/book/

# Chapter 09 - Building Feature-Based Grammars

## 9.1 Grammatical Features

In this part, we will investigate the role of features in building rule-based grammars. In contrast to feature extractors, which record features that have been automatically detected, we are now going to declare the features of words and phrases. We start off with a very simple example, using dictionaries to store features and their values.

In [1]:
kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}

In [2]:
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}

Feature structures contain various kinds of information about grammatical entities. The information need not be exhaustive, and we might want to add further properties. For example, in the case of a verb, it is often useful to know what “semantic role” is played by the arguments of the verb. In the case of chase, the subject plays the role of “agent,” whereas the object has the role of “patient.” Let’s add this information, using 'sbj' (subject) and 'obj' (object) as placeholders that will be filled once the verb combines with its grammatical arguments:

In [3]:
chase['AGT'] = 'sbj'

In [4]:
chase['PAT'] = 'obj'

If we now process the sentence Kim chased Lee, we want to “bind” the verb’s agent role to the subject and the patient role to the object. We do this by linking to the REF feature of the relevant NP. In the following example, we make the simple-minded assumption that the NPs immediately to the left and right of the verb are the subject and object, respectively. We also add a feature structure for Lee to complete the example.

In [5]:
sent = "Kim chased Lee"
tokens = sent.split()
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}

In [6]:
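def lex2fs(word):
    # Return the feature structure whose ORTH value matches the word
    # (None if the word is not in this small lexicon).
    for fs in [kim, lee, chase]:
        if fs['ORTH'] == word:
            return fs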

In [7]:
subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])
verb['AGT'] = subj['REF'] # agent of 'chase' is Kim
verb['PAT'] = obj['REF'] # patient of 'chase' is Lee

In [8]:
for k in ['ORTH', 'REL', 'AGT', 'PAT']: # check featstruct of 'chase'
    print("%-5s => %s" % (k, verb[k]))

ORTH  => chased
REL   => chase
AGT   => k
PAT   => l


The same approach could be adopted for a different verb, say surprise, though in this case the subject would play the role of “source” (SRC) and the object the role of “experiencer” (EXP):

In [9]:
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
            'SRC': 'sbj', 'EXP': 'obj'}
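The binding step in In [7] names AGT and PAT explicitly, so it cannot be reused for surprise, whose roles are SRC and EXP. The following is a minimal sketch that assumes the placeholder convention above ('sbj' and 'obj' as feature values). It fills whichever features hold a placeholder without naming the roles. The helper bind_roles is hypothetical. Unlike In [7], it returns a copy instead of modifying the lexical entry in place.

```python
def bind_roles(verb, subj, obj):
    """Hypothetical helper: return a copy of `verb` in which every 'sbj'/'obj'
    placeholder is replaced by the REF value of the subject/object."""
    # Map each placeholder to the REF value that should fill it.
    fillers = {'sbj': subj['REF'], 'obj': obj['REF']}
    bound = dict(verb)  # copy, so the lexical entry itself stays unchanged
    for feature, value in verb.items():
        if value in fillers:
            bound[feature] = fillers[value]
    return bound

kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
            'SRC': 'sbj', 'EXP': 'obj'}

bound = bind_roles(surprise, kim, lee)
print(bound['SRC'], bound['EXP'])   # k l
print(surprise['SRC'])              # sbj (original entry unchanged)
```

The same call works unchanged for chase, because the roles are located by their placeholder values rather than by their feature names.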

### Syntactic Agreement

### Using Attributes and Constraints

In [10]:
# Example feature-based grammar
import nltk
nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')

% start S
# ###################
# Grammar Productions
# ###################
# S expansion productions
S -> NP[NUM=?n] VP[NUM=?n]
# NP expansion productions
NP[NUM=?n] -> N[NUM=?n] 
NP[NUM=?n] -> PropN[NUM=?n] 
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
NP[NUM=pl] -> N[NUM=pl] 
# VP expansion productions
VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
# ###################
# Lexical Productions
# ###################
Det[NUM=sg] -> 'this' | 'every'
Det[NUM=pl] -> 'these' | 'all'
Det -> 'the' | 'some' | 'several'
PropN[NUM=sg]-> 'Kim' | 'Jody'
N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children' 
IV[TENSE=pres,  NUM=sg] -> 'disappears' | 'walks'
TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
IV[TENSE=pres,  NUM=pl] -> 'disappear' | 'walk'
TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
IV[TENSE=past] -> 'disappeared' | 'walked'
TV[TENSE=past] -> 'saw' | 'liked'
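The shared variable ?n in S -> NP[NUM=?n] VP[NUM=?n] and NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n] forces the determiner, the noun and the verb to have compatible NUM values. A category that has no NUM feature at all, such as Det -> 'the' or the past-tense verbs, is compatible with either value. The sketch below reproduces that check in plain Python for Det N IV sentences, using a few entries copied from feat0.fcfg. The helpers compatible and agree are hypothetical, and this is not how NLTK's feature chart parser is implemented.

```python
# A few lexical entries from feat0.fcfg, as flat feature dictionaries.
LEXICON = {
    'the': {'CAT': 'Det'},                      # no NUM: underspecified
    'this': {'CAT': 'Det', 'NUM': 'sg'},
    'these': {'CAT': 'Det', 'NUM': 'pl'},
    'dog': {'CAT': 'N', 'NUM': 'sg'},
    'dogs': {'CAT': 'N', 'NUM': 'pl'},
    'walks': {'CAT': 'IV', 'TENSE': 'pres', 'NUM': 'sg'},
    'walk': {'CAT': 'IV', 'TENSE': 'pres', 'NUM': 'pl'},
    'walked': {'CAT': 'IV', 'TENSE': 'past'},   # no NUM: past tense
}

def compatible(value1, value2):
    # None stands for "feature absent", which is compatible with anything.
    return value1 is None or value2 is None or value1 == value2

def agree(det, noun, verb):
    """Hypothetical check of NUM agreement for a Det N IV sentence.
    The variable ?n is bound to the first NUM value seen."""
    n = None  # ?n, initially unbound
    for word in (det, noun, verb):
        value = LEXICON[word].get('NUM')
        if not compatible(n, value):
            return False
        n = n if value is None else value   # bind ?n once a value is seen
    return True

print(agree('this', 'dog', 'walks'))    # True
print(agree('these', 'dog', 'walks'))   # False: pl determiner, sg noun
print(agree('the', 'dogs', 'walk'))     # True: 'the' leaves NUM open
print(agree('the', 'dogs', 'walked'))   # True: past tense has no NUM
```

A single variable is enough here because, in the grammar, the NP passes its NUM value up from Det and N and the S rule then shares it with the VP.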


In [11]:
tokens = 'Kim likes children'.split()

In [12]:
from nltk import load_parser

cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)
for tree in cp.parse(tokens):
    print(tree)

|.Kim .like.chil.|
Leaf Init Rule:
|[----]    .    .| [0:1] 'Kim'
|.    [----]    .| [1:2] 'likes'
|.    .    [----]| [2:3] 'children'
Feature Bottom Up Predict Combine Rule:
|[----]    .    .| [0:1] PropN[NUM='sg'] -> 'Kim' *
Feature Bottom Up Predict Combine Rule:
|[----]    .    .| [0:1] NP[NUM='sg'] -> PropN[NUM='sg'] *
Feature Bottom Up Predict Combine Rule:
|[---->    .    .| [0:1] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
Feature Bottom Up Predict Combine Rule:
|.    [----]    .| [1:2] TV[NUM='sg', TENSE='pres'] -> 'likes' *
Feature Bottom Up Predict Combine Rule:
|.    [---->    .| [1:2] VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
Feature Bottom Up Predict Combine Rule:
|.    .    [----]| [2:3] N[NUM='pl'] -> 'children' *
Feature Bottom Up Predict Combine Rule:
|.    .    [----]| [2:3] NP[NUM='pl'] -> N[NUM='pl'] *
Feature Bottom Up Predict Combine Rule:
|.    .    [---->| [2:3] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
Feature Single Edge Fundame

### Terminology

## 9.2 Processing Feature Structures

In this part, we will show how feature structures can be constructed and manipulated in NLTK. We will also discuss the fundamental operation of unification, which allows us to combine the information contained in two different feature structures. Feature structures in NLTK are declared with the FeatStruct() constructor. Atomic feature values can be strings or integers.

In [None]:
fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')

In [None]:
print(fs1)

In [None]:
fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')

In [None]:
print(fs1['GND'])

In [None]:
fs2 = nltk.FeatStruct(POS='N', AGR=fs1)

In [None]:
print(fs2)

In [None]:
print(fs2['AGR'])

In [None]:
print(fs2['AGR']['PER'])

In [None]:
print(nltk.FeatStruct("[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]"))

In [None]:
print(nltk.FeatStruct(name='Lee', telno='01 27 86 42 96', age=33))

In [None]:
print(nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                          SPOUSE=[NAME='Kim', ADDRESS->(1)]]"""))

In [None]:
print(nltk.FeatStruct("[A='a', B=(1)[C='c'], D->(1), E->(1)]"))

In [None]:
fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
fs2 = nltk.FeatStruct(CITY='Paris')

In [None]:
print(fs1.unify(fs2))

In [None]:
print(fs2.unify(fs1))

In [None]:
fs0 = nltk.FeatStruct(A='a')
fs1 = nltk.FeatStruct(A='b')
fs2 = fs0.unify(fs1)

In [None]:
print(fs2)
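Unifying fs0 with fs1 fails, and print(fs2) shows None, because A cannot be both 'a' and 'b'. The address example merges without trouble because the two structures share no features. The sketch below shows the recursive idea for feature structures represented as nested dicts. It deliberately ignores structure sharing and variables, both of which NLTK's unify() also handles. The function name unify is hypothetical, and it returns None on failure to mirror the behaviour shown above.

```python
def unify(fs1, fs2):
    """Hypothetical, simplified unification of nested-dict feature structures.
    Returns a new dict, or None if some feature has clashing atomic values.
    (No structure sharing, no variables.)"""
    result = dict(fs1)
    for feature, value2 in fs2.items():
        if feature not in result:
            result[feature] = value2            # information only in fs2
            continue
        value1 = result[feature]
        if isinstance(value1, dict) and isinstance(value2, dict):
            sub = unify(value1, value2)         # recurse into complex values
            if sub is None:
                return None
            result[feature] = sub
        elif value1 != value2:
            return None                         # clash, e.g. 'a' vs 'b'
    return result

addr = {'NUMBER': 74, 'STREET': 'rue Pascal'}
print(unify(addr, {'CITY': 'Paris'}))
# {'NUMBER': 74, 'STREET': 'rue Pascal', 'CITY': 'Paris'}
print(unify({'A': 'a'}, {'A': 'b'}))            # None
print(unify({'AGR': {'NUM': 'sg'}}, {'AGR': {'PER': 3}}))
# {'AGR': {'NUM': 'sg', 'PER': 3}}
```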

In [None]:
fs0 = nltk.FeatStruct("""[NAME=Lee,
                          ADDRESS=[NUMBER=74,
                                   STREET='rue Pascal'],
                          SPOUSE= [NAME=Kim,
                                   ADDRESS=[NUMBER=74,
                                            STREET='rue Pascal']]]""")

In [None]:
print(fs0)

In [None]:
fs1 = nltk.FeatStruct("[SPOUSE = [ADDRESS = [CITY = Paris]]]")

In [None]:
print(fs1.unify(fs0))

In [None]:
fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                          SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")

In [None]:
print(fs1.unify(fs2))
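In fs0, Lee's and Kim's addresses are equal but separate structures, so unification adds CITY only under SPOUSE. In fs2, the (1) and ->(1) tags make them one shared structure, so CITY becomes visible along both paths. Python object identity gives a rough analogue of this distinction. In this minimal sketch, the hypothetical helper add_city updates a dict in place as a stand-in for unifying with fs1. It illustrates only the consequence of sharing, not how unification itself works.

```python
def add_city(fs, city):
    """Hypothetical helper: mimic unifying with [SPOUSE=[ADDRESS=[CITY=...]]]
    by adding CITY in place at the path SPOUSE -> ADDRESS."""
    fs['SPOUSE']['ADDRESS']['CITY'] = city

# Like fs0: two equal but distinct address dicts.
address = {'NUMBER': 74, 'STREET': 'rue Pascal'}
copied = {'NAME': 'Lee', 'ADDRESS': address,
          'SPOUSE': {'NAME': 'Kim', 'ADDRESS': dict(address)}}  # a copy

# Like fs2: one address dict reached along two paths, as with (1) and ->(1).
shared_address = {'NUMBER': 74, 'STREET': 'rue Pascal'}
shared = {'NAME': 'Lee', 'ADDRESS': shared_address,
          'SPOUSE': {'NAME': 'Kim', 'ADDRESS': shared_address}}

print(copied['ADDRESS'] == copied['SPOUSE']['ADDRESS'])  # True (equal values)
print(copied['ADDRESS'] is copied['SPOUSE']['ADDRESS'])  # False (two objects)
print(shared['ADDRESS'] is shared['SPOUSE']['ADDRESS'])  # True (one object)

add_city(copied, 'Paris')
add_city(shared, 'Paris')
print('CITY' in copied['ADDRESS'])   # False: only Kim's copy changed
print('CITY' in shared['ADDRESS'])   # True: Lee's address is Kim's address
```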

In [None]:
fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")

In [None]:
print(fs2)

In [None]:
print(fs2.unify(fs1))
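In fs2, ?x is a variable. Unifying fs2 with fs1 binds ?x to the address under ADDRESS1, and because ADDRESS2 carries the same variable, it receives the same value. The sketch below adds a bindings table to a dict-based unifier and treats any string starting with '?' as a variable, following the notation above. It is a simplified illustration, not NLTK's algorithm. All function names are hypothetical, it does not unify one variable with another or perform an occurs check, and it returns equal copies where NLTK prints ADDRESS1 and ADDRESS2 as a single shared structure.

```python
FAIL = object()  # sentinel for failure inside the helpers

def is_var(value):
    # Following the notation above, a variable is a string such as '?x'.
    return isinstance(value, str) and value.startswith('?')

def unify_values(v1, v2, bindings):
    if is_var(v2) and not is_var(v1):
        v1, v2 = v2, v1                      # put the variable (if any) first
    if is_var(v1):
        if v1 == v2:
            return v1
        if is_var(v2):
            raise NotImplementedError('variable-to-variable binding not handled')
        if v1 in bindings:
            # Already bound: unify the old value with the new one.
            v2 = unify_values(bindings[v1], v2, bindings)
            if v2 is FAIL:
                return FAIL
        bindings[v1] = v2                    # keep the most informative value
        return v1                            # variable is substituted at the end
    if isinstance(v1, dict) and isinstance(v2, dict):
        return unify_dicts(v1, v2, bindings)
    return v1 if v1 == v2 else FAIL          # atomic values must be equal

def unify_dicts(fs1, fs2, bindings):
    result = dict(fs1)
    for feature, value2 in fs2.items():
        if feature in result:
            value2 = unify_values(result[feature], value2, bindings)
            if value2 is FAIL:
                return FAIL
        result[feature] = value2
    return result

def substitute(value, bindings):
    # Replace every bound variable by its value, recursively.
    if is_var(value):
        return substitute(bindings[value], bindings) if value in bindings else value
    if isinstance(value, dict):
        return {f: substitute(v, bindings) for f, v in value.items()}
    return value

def unify(fs1, fs2):
    bindings = {}
    result = unify_values(fs1, fs2, bindings)
    return None if result is FAIL else substitute(result, bindings)

fs1 = {'ADDRESS1': {'NUMBER': 74, 'STREET': 'rue Pascal'}}
fs2 = {'ADDRESS1': '?x', 'ADDRESS2': '?x'}
print(unify(fs2, fs1))
# {'ADDRESS1': {'NUMBER': 74, 'STREET': 'rue Pascal'}, 'ADDRESS2': {'NUMBER': 74, 'STREET': 'rue Pascal'}}
print(unify({'A': '?x', 'B': '?x'}, {'A': 'a', 'B': 'b'}))   # None: ?x cannot be both
```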

## 9.3 Extending a Feature-Based Grammar

In this part, we return to feature-based grammar, explore a variety of linguistic issues, and demonstrate the benefits of incorporating features into the grammar.

In [None]:
# Grammar with productions for inverted clauses and long-distance dependencies, making use of slash categories.
nltk.data.show_cfg('grammars/book_grammars/feat1.fcfg')

In [None]:
tokens = 'who do you claim that you like'.split()

In [None]:
from nltk import load_parser
cp = load_parser('grammars/book_grammars/feat1.fcfg')

In [None]:
for tree in cp.parse(tokens):
    print(tree)

In [None]:
tokens = 'you claim that you like cats'.split()

In [None]:
for tree in cp.parse(tokens):
    print(tree)

In [None]:
tokens = 'rarely do you sing'.split()

In [None]:
for tree in cp.parse(tokens):
    print(tree)

In [None]:
# Example feature-based grammar.
nltk.data.show_cfg('grammars/book_grammars/german.fcfg')

In [None]:
tokens = 'ich folge den Katzen'.split()

In [None]:
cp = load_parser('grammars/book_grammars/german.fcfg')

In [None]:
for tree in cp.parse(tokens):
    print(tree)
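The German example depends on case as well as agreement. The verb folgen ('to follow') takes a dative object, and den is either masculine accusative singular or dative plural, so den Katzen is accepted only on its dative plural reading. The following is a minimal sketch of that case check. The readings are hand-written illustrative assumptions, not entries copied from german.fcfg, and DET, NOUN, OBJCASE and object_ok are hypothetical names.

```python
# Illustrative readings only (hand-written, not taken from german.fcfg).
DET = {
    'den': [{'CASE': 'acc', 'NUM': 'sg', 'GND': 'masc'},
            {'CASE': 'dat', 'NUM': 'pl'}],
    'die': [{'CASE': 'nom', 'NUM': 'sg', 'GND': 'fem'},
            {'CASE': 'acc', 'NUM': 'sg', 'GND': 'fem'},
            {'CASE': 'nom', 'NUM': 'pl'},
            {'CASE': 'acc', 'NUM': 'pl'}],
}
NOUN = {
    'Katze': [{'NUM': 'sg', 'GND': 'fem'}],
    'Katzen': [{'NUM': 'pl', 'GND': 'fem'}],
}
# Case that each verb form requires of its object.
OBJCASE = {'folge': 'dat', 'sehe': 'acc'}

def compatible(fs1, fs2):
    """Flat feature dicts are compatible if no shared feature clashes."""
    return all(fs1[f] == fs2[f] for f in fs1.keys() & fs2.keys())

def object_ok(verb, det, noun):
    """True if some reading of `det noun` agrees internally and has the
    case the verb requires of its object."""
    required = {'CASE': OBJCASE[verb]}
    return any(compatible(d, n) and compatible(d, required)
               for d in DET[det] for n in NOUN[noun])

print(object_ok('folge', 'den', 'Katzen'))  # True: dative plural
print(object_ok('folge', 'den', 'Katze'))   # False: no reading of den fits Katze
print(object_ok('sehe', 'die', 'Katze'))    # True: accusative feminine singular
```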

In [None]:
tokens = 'ich folge den Katzen'.split()

In [None]:
cp = load_parser('grammars/book_grammars/german.fcfg', trace=2)

In [None]:
for tree in cp.parse(tokens):
    print(tree)