## 1. Grammatical Features

In this chapter, we will investigate the role of features in building rule-based grammars. In contrast to feature extractors, which record features that have been automatically detected, we are now going to declare the features of words and phrases. We start off with a very simple example, using dictionaries to store features and their values.

In [0]:
kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}

The objects kim and chase share two features, CAT (grammatical category) and ORTH (orthography, i.e., spelling). In addition, each has a more semantically oriented feature: kim['REF'] is intended to give the referent of kim, while chase['REL'] gives the relation expressed by chase. In the context of rule-based grammars, such pairings of features and values are known as feature structures, and we will shortly see alternative notations for them.

**Feature Structures** Feature structures contain various kinds of information about grammatical entities. The information need not be exhaustive, and we might want to add further properties. For example, in the case of a verb, it is often useful to know what "semantic role" is played by the arguments of the verb. In the case of chase, the subject plays the role of "agent", while the object has the role of "patient". Let's add this information, using 'sbj' and 'obj' as placeholders which will get filled once the verb combines with its grammatical arguments:

In [0]:
chase['AGT'] = 'sbj'
chase['PAT'] = 'obj'

If we now process a sentence Kim chased Lee, we want to "bind" the verb's agent role to the subject and the patient role to the object. We do this by linking to the REF feature of the relevant NP. In the following example, we make the simple-minded assumption that the NPs immediately to the left and right of the verb are the subject and object respectively. We also add a feature structure for Lee to complete the example.

In [0]:
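sent = "Kim chased Lee"
tokens = sent.split()
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
def lex2fs(word):
    """Return the feature structure whose ORTH value matches word."""
    for fs in [kim, lee, chase]:
        if fs['ORTH'] == word:
            return fs
# Simple-minded assumption: the NPs to the left and right of the verb
# are the subject and object respectively.
subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])
verb['AGT'] = subj['REF']  # bind the agent role to the subject's referent
verb['PAT'] = obj['REF']   # bind the patient role to the object's referent
for k in ['ORTH', 'REL', 'AGT', 'PAT']:
    print("%-5s => %s" % (k, verb[k]))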

The same approach could be adopted for a different verb, say surprise, though in this case, the subject would play the role of "source" (SRC) and the object, the role of "experiencer" (EXP):

In [0]:
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
             'SRC': 'sbj', 'EXP': 'obj'}
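As a minimal sketch of how the binding step might carry over from one verb to another, the helper below replaces any 'sbj' or 'obj' placeholder in a verb's feature structure with the REF value of the corresponding NP. The name `bind_roles` is hypothetical and introduced here only for illustration; it is not part of NLTK. The sketch keeps the simple-minded assumption that the subject and object are handed over directly.

```python
# Feature structures are repeated here so that the sketch is self-contained.
kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase',
         'AGT': 'sbj', 'PAT': 'obj'}
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
            'SRC': 'sbj', 'EXP': 'obj'}

def bind_roles(verb, subj, obj):
    """Return a copy of verb with 'sbj'/'obj' placeholders bound to referents.

    bind_roles is a hypothetical helper, used for illustration only.
    """
    referents = {'sbj': subj['REF'], 'obj': obj['REF']}
    bound = {}
    for feature, value in verb.items():
        # Swap a placeholder for its referent; keep every other value as is.
        bound[feature] = referents.get(value, value)
    return bound

bound_chase = bind_roles(chase, kim, lee)
bound_surprise = bind_roles(surprise, kim, lee)
for k in ['ORTH', 'REL', 'SRC', 'EXP']:
    print("%-5s => %s" % (k, bound_surprise[k]))
```

Because the helper looks at feature values rather than feature names, the same call works whether the roles are called AGT and PAT or SRC and EXP. It returns a copy, so the lexical entries keep their placeholders and can be reused for another sentence.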

Feature structures are pretty powerful, but the way in which we have manipulated them is extremely ad hoc. Our next task in this chapter is to show how the framework of context-free grammar and parsing can be expanded to accommodate feature structures, so that we can build analyses like this in a more generic and principled way. We will start off by looking at the phenomenon of syntactic agreement; we will show how agreement constraints can be expressed elegantly using features, and illustrate their use in a simple grammar.

Since feature structures are a general data structure for representing information of any kind, we will briefly look at them from a more formal point of view, and illustrate the support for feature structures offered by NLTK. In the final part of the chapter, we demonstrate that the additional expressiveness of features opens up a wide spectrum of possibilities for describing sophisticated aspects of linguistic structure.

**Syntactic Agreement** The following examples show pairs of word sequences, the first of which is grammatical and the second not. (We use an asterisk at the start of a word sequence to signal that it is ungrammatical.)

(1)		
a.		this dog

b.		*these dog


(2)		
a.		these dogs

b.		*this dogs


In English, nouns are usually marked as being singular or plural. The form of the demonstrative also varies: this (singular) and these (plural). Examples (1b) and (2b) show that there are constraints on the use of demonstratives and nouns within a noun phrase: either both are singular or both are plural. A similar constraint holds between subjects and predicates:

(3)		
a.		the dog runs

b.		*the dog run


(4)		
a.		the dogs run

b.		*the dogs runs


Here we can see that morphological properties of the verb co-vary with syntactic properties of the subject noun phrase. This co-variance is called agreement. If we look further at verb agreement in English, we will see that present tense verbs typically have two inflected forms: one for third person singular, and another for every other combination of person and number.
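One way to picture these constraints is as a check that two feature structures do not clash on their agreement features. The sketch below is illustrative only: the feature names NUM (number) and PER (person), the toy lexical entries, and the helper `agree` are all assumptions introduced here. For simplicity, the head noun stands in for the whole subject noun phrase.

```python
# Toy entries; NUM and PER are assumed feature names, not defined in the text.
this = {'CAT': 'DET', 'ORTH': 'this', 'NUM': 'sg'}
these = {'CAT': 'DET', 'ORTH': 'these', 'NUM': 'pl'}
dog = {'CAT': 'N', 'ORTH': 'dog', 'NUM': 'sg', 'PER': 3}
dogs = {'CAT': 'N', 'ORTH': 'dogs', 'NUM': 'pl', 'PER': 3}
runs = {'CAT': 'V', 'ORTH': 'runs', 'NUM': 'sg', 'PER': 3}
# 'run' covers several person/number combinations; only the
# third person plural use is listed in this sketch.
run = {'CAT': 'V', 'ORTH': 'run', 'NUM': 'pl', 'PER': 3}

def agree(fs1, fs2, features=('NUM', 'PER')):
    """Return True if fs1 and fs2 do not clash on any listed feature.

    A feature missing from either structure is treated as unconstrained.
    """
    for f in features:
        if f in fs1 and f in fs2 and fs1[f] != fs2[f]:
            return False
    return True

pairs = [(this, dog), (these, dog), (these, dogs), (this, dogs),
         (dog, runs), (dog, run), (dogs, run), (dogs, runs)]
for left, right in pairs:
    # Mark clashing pairs with an asterisk, as in examples (1)-(4).
    mark = '' if agree(left, right) else '*'
    print("%s%s %s" % (mark, left['ORTH'], right['ORTH']))
```

Listing a separate dictionary for every inflected form becomes redundant quickly, since a form like run would need one entry per person and number combination it allows.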



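**Agreement Paradigm for English Regular Verbs**

| | singular | plural |
|---|---|---|
| 1st person | I run | we run |
| 2nd person | you run | you run |
| 3rd person | he/she/it runs | they run |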






