changed some x:gc: into X
added comments on ch09

svn/trunk@7813
stevenbird committed Feb 28, 2009
1 parent 624985e commit c73d6ab
Showing 4 changed files with 83 additions and 68 deletions.
6 changes: 4 additions & 2 deletions book/CheckList.txt
@@ -41,9 +41,11 @@ ch07 has a conclusion (non-standard) but no summary
ch07 needs some non-chunking exercises
ch07 could describe SRL in 7.1 as another shallow processing task
ch07 should describe NLTK's off-the-shelf NE tagger
ch07 typography should follow the simplified style of later chapters, e.g. with :gc:
ch07 typography should follow the simplified style of later chapters, e.g. with NP
ch08 language is more formal than necessary, less accessible than it should be
ch08 defines "string" to have a new meaning
ch08 typography should no longer use :gc:
ch08 typography should no longer use NP
ch08 section 8.6 on grammar development is incomplete (incl PE08 discussion)
ch08 assumes knowledge of "head" (did some content disappear?)
ch09 lacks our standard opening
ch09 uses :lex: role, not processed by docbook
4 changes: 2 additions & 2 deletions book/ch05.rst
@@ -1804,7 +1804,7 @@ the idea of ongoing, incomplete action (e.g. `falling`:lx:, `eating`:lx:).
The `-ing`:lx: suffix also appears on nouns derived from verbs, e.g. `the
falling of the leaves`:lx: (this is known as the `gerund`:dt:).
(Since the present participle and the gerund cannot be systematically
distinguished, they are often tagged with the same tag, i.e. `VBG`:gc:
distinguished, they are often tagged with the same tag, i.e. ``VBG``
in the Brown Corpus tagset).

Syntactic Clues
@@ -2200,7 +2200,7 @@ Exercises
#. |soso| In the introduction we saw a table involving frequency counts for
the verbs `adore`:lx:, `love`:lx:, `like`:lx:, `prefer`:lx: and
preceding qualifiers such as `really`:lx:. Investigate the full
range of qualifiers (Brown tag `QL`:gc:) that appear before these
range of qualifiers (Brown tag ``QL``) that appear before these
four verbs.

#. |soso|
81 changes: 47 additions & 34 deletions book/ch09.rst
@@ -76,7 +76,7 @@ sentences that the system should handle.

.. _ex-train0:
.. ex::
Which stations does the 9.00 express from Amsterdam to Paris stop at?
Which *stations* does the 9.00 express from Amsterdam to Paris stop at?

The information that the customer is seeking is not exotic |mdash| the
system back-end just needs to look up the list of stations on the
@@ -87,7 +87,13 @@ system trying to answer ex-train1_ instead:

.. _ex-train1:
.. ex::
Which station does the 9.00 express from Amsterdam terminate at?
Which *station* does the 9.00 express from Amsterdam terminate at?

.. XXX I was initially confused by this example given the most obvious
change (stop->terminate) and wrongly assumed this was making a point
about lexical semantics (stop means two things). Can we have an
example that only changes number, not also the lexical item?
(cf German example)
Part of your solution might use domain knowledge to figure out that if
a speaker knows that the train is a train to Paris, then she probably
@@ -152,6 +158,13 @@ representation of syntactic structure.
Why Grammatical Features?
-------------------------

.. XXX we haven't used the term "grammatical feature" at all so far
.. XXX how about calling them "grammatical properties" and using
Python dictionary notation?
.. XXX "of the rather general sort that are appealed to in the context of"
We have already used the term `grammatical feature`:dt: a few times in
this chapter, without saying what it means. We need to
clarify that we are not talking about features of the rather general
@@ -165,7 +178,7 @@ modest, and assume that we do not know `everything`:em: about this
word, but we can at least give a partial description. To signal that
we are talking about the lexeme of the word (cf chap-words_), we use
the label `these`:lex:. For example, we know that the orthography of
`these`:lex: is `these`:lx:, its phonological form is ``DH IY Z``
`these`:lex: is `these`:lx:, its pronunciation is ``DH IY Z``
(cf. the *Arpabet* mentioned in chap-corpora_), its part-of-speech is
``Det``, and its number is plural. We use "dot notation" to record
these observations:
@@ -175,7 +188,7 @@ these observations:
.. parsed-literal::
`these`:lex:.spelling = `these`:lx:
`these`:lex:.phonology = DH IY Z
`these`:lex:.pron = DH IY Z
`these`:lex:.pos = Det
`these`:lex:.number = plural
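
The same partial description can also be written as a Python dictionary. This is only a minimal sketch: the names ``these_lexeme`` and ``describe`` are illustrative assumptions, the keys simply mirror the dot-notation labels above, and none of this is an NLTK data structure.

```python
# A partial description of the lexeme 'these' as a plain dictionary.
# Key names mirror the dot notation above; they are illustrative only.
these_lexeme = {
    "spelling": "these",
    "pron": ["DH", "IY", "Z"],  # Arpabet symbols
    "pos": "Det",
    "number": "plural",
}


def describe(lexeme):
    """Render a lexeme dictionary as lines of dot notation."""
    name = lexeme["spelling"]
    lines = []
    for key, value in lexeme.items():
        if isinstance(value, list):
            value = " ".join(value)  # show a symbol sequence as one string
        lines.append(f"{name}.{key} = {value}")
    return lines


print("\n".join(describe(these_lexeme)))
```

Because the description is partial, further keys can be added later without changing the entries already recorded.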
@@ -231,10 +244,10 @@ string to signal that is ungrammatical.)
.. ex::
\*this dogs

In English, nouns are usually morphologically marked as being singular
In English, nouns are usually marked as being singular
or plural. The form of the demonstrative also varies:
`this`:lx: (singular) and `these`:lx: (plural).
ex-thisdog_ and ex-thesedogs_ show that there are constraints on
Examples ex-thisdog_ and ex-thesedogs_ show that there are constraints on
the use of demonstratives and nouns within a noun phrase:
either both are singular or both are plural. A similar
constraint holds between subjects and predicates:
@@ -313,18 +326,18 @@ context-free grammar. We will begin with the simple CFG in ex-agcfg0_.
N -> 'dog'
V -> 'runs'

ex-agcfg0_ allows us to generate the sentence `this dog runs`:lx:;
Grammar ex-agcfg0_ allows us to generate the sentence `this dog runs`:lx:;
however, what we really want to do is also generate `these dogs
run`:lx: while blocking unwanted strings such as `*this dogs run`:lx:
run`:lx: while blocking unwanted sequences like `*this dogs run`:lx:
and `*these dog runs`:lx:. The most straightforward approach is to
add new non-terminals and productions to the grammar:

.. _ex-agcfg1:
.. ex::
::

S_SG -> NP_SG VP_SG
S_PL -> NP_PL VP_PL
S -> NP_SG VP_SG
S -> NP_PL VP_PL
NP_SG -> Det_SG N_SG
NP_PL -> Det_PL N_PL
VP_SG -> V_SG
@@ -404,12 +417,10 @@ these to state constraints:
VP[NUM=?n] -> V[NUM=?n]

We are using ``?n`` as a variable over values of ``NUM``; it can
be instantiated either to ``sg`` or ``pl``. Its scope is
limited to individual productions. That is, within any given production from
the list above,
``?n`` must be instantiated to the same constant value; we can
read the first production as saying that whatever value ``NP`` takes for the feature
``NUM``, ``VP`` must take the same value.
be instantiated either to ``sg`` or ``pl``, within a given production.
We can read the first production as saying that whatever
value ``NP`` takes for the feature ``NUM``,
``VP`` must take the same value.
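
The behaviour of ``?n`` inside a single production can be sketched in a few lines of plain Python. This is a simplified illustration, not the unification code that a feature-based parser actually uses: the function name ``shared_num`` and the use of ``None`` for an unspecified ``NUM`` are assumptions made for the example.

```python
FAIL = "FAIL"


def shared_num(*values):
    """Return the single NUM value that ?n can take for these categories.

    Each value is 'sg', 'pl', or None (NUM not yet specified).
    Within one production ?n must be bound to the same constant
    everywhere, so two different constants make the production fail.
    """
    binding = None
    for value in values:
        if value is None:
            continue  # an unspecified NUM is compatible with anything
        if binding is None:
            binding = value  # the first constant seen fixes ?n
        elif binding != value:
            return FAIL  # ?n cannot be both 'sg' and 'pl'
    return binding


# S -> NP[NUM=?n] VP[NUM=?n]
print(shared_num("sg", "sg"))  # sg
print(shared_num("pl", "sg"))  # FAIL
```

Each call stands for one production, which reflects the point that the binding of ``?n`` does not carry over from one production to another.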

In order to understand how these feature constraints work, it's
helpful to think about how one would go about building a tree. Lexical
@@ -433,7 +444,7 @@ depth one):
.. tree:: (N[NUM=pl] dogs)

Now ``S -> NP[NUM=?n] VP[NUM=?n]`` says that whatever the ``NUM``
values of ``N`` and ``Det` are, they have to be the
values of ``N`` and ``Det`` are, they have to be the
same. Consequently, ``NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]`` will
permit ex-this_ and ex-dog_ to be combined into an ``NP`` as shown
in ex-good1_ and it will also allow ex-these_ and ex-dogs_ to be
@@ -462,10 +473,10 @@ indicated informally with a *FAIL* value at the top node.

Production ``VP[NUM=?n] -> V[NUM=?n]`` says
that the ``NUM`` value of the head verb has to be the same as the
``NUM`` value of the `VP`:gc: mother. Combined with the production for
``NUM`` value of the ``VP`` parent. Combined with the production for
expanding ``S``, we
derive the consequence that if the ``NUM`` value of the subject head
noun is ``pl``, then so is the ``NUM`` value of the `VP`:gc:\ 's
noun is ``pl``, then so is the ``NUM`` value of the ``VP``\ 's
head verb.

.. ex::
@@ -1129,7 +1140,7 @@ Subcategorization

In chap-parse_, we proposed to augment our category labels to
represent different kinds of verb, and introduced labels such as
`iv`:gc: and `tv`:gc: for intransitive and transitive verbs
``IV`` and ``TV`` for intransitive and transitive verbs
respectively. This allowed us to write productions like the
following:

@@ -1140,15 +1151,15 @@ following:
VP -> IV
VP -> TV NP

Although we know that `iv`:gc: and `tv`:gc: are two
kinds of `v`:gc:, from a formal point of view
`iv`:gc: has no closer relationship with `tv`:gc: than it does
with `np`:gc:. As it stands, `iv`:gc: and `tv`:gc: are just atomic
Although we know that ``IV`` and ``TV`` are two
kinds of ``V``, from a formal point of view
``IV`` has no closer relationship with ``TV`` than it does
with ``NP``. As it stands, ``IV`` and ``TV`` are just atomic
nonterminal symbols from a CFG. This approach doesn't allow us
to say anything about the class of verbs in general.
For example, we cannot say something like "All lexical
items of category `v`:gc: can be marked for tense", since `walk`:lx:,
say, is an item of category `iv`:gc:, not `v`:gc:.
items of category ``V`` can be marked for tense", since `walk`:lx:,
say, is an item of category ``IV``, not ``V``.
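
The limitation can be made concrete with a small Python sketch. Everything in it is hypothetical and chosen only for illustration: the lexical entry for `see`:lx:, the dictionary layout, the key names ``cat`` and ``subcat``, and the values ``intrans`` and ``trans``.

```python
# Atomic labels: the only relation we can test is identity.
atomic_lexicon = {"walk": "IV", "see": "TV"}


def is_verb_atomic(word):
    """With atomic labels, 'being a verb' means matching the label V exactly."""
    return atomic_lexicon[word] == "V"


# Hypothetical structured labels: a shared base category plus extra detail.
structured_lexicon = {
    "walk": {"cat": "V", "subcat": "intrans"},
    "see": {"cat": "V", "subcat": "trans"},
}


def is_verb_structured(word):
    """With structured labels, every verb shares the base category V."""
    return structured_lexicon[word]["cat"] == "V"


print(is_verb_atomic("walk"))      # False: IV is not V
print(is_verb_structured("walk"))  # True
print(is_verb_structured("see"))   # True
```

A statement about all items of category ``V`` can only be checked in the second representation, where the base category is kept apart from the additional detail.
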
The question arises as to whether we can replace category labels such
as ``TV`` and ``IV`` by ``V`` together with a feature that tells us
whether the verb in question combines with (or is subcategorized for)
@@ -1220,7 +1231,7 @@ and Head-driven Phrase Structure Grammar. Rather than using
``SUBCAT`` values as a way of indexing productions, the ``SUBCAT``
value directly encodes the valency of a head (the list of
arguments that it can combine with). For example, a verb like
`put`:lx: that takes `NP` and `PP` complements (`put the
`put`:lx: that takes ``NP`` and ``PP`` complements (`put the
book on the table`:lx:) might be represented as ex-subcathpsg0_:

.. TODO: angle brackets don't appear
@@ -1259,6 +1270,8 @@ category assignments combine in a parse of `Kim put the book on the table`:lx:.
Heads Revisited
---------------

.. XXX changed mother / head-daughter to parent / head-child in following
We noted in the previous section that by factoring subcategorization
information out of the main category label, we could express more
generalizations about properties of verbs. Another property of this
@@ -1270,7 +1283,7 @@ s (i.e., adjectives) are heads of ``PP``\ s. Not all phrases have
heads |mdash| for example, it is standard to say that coordinate
phrases (e.g., `the book and the bell`:lx:) lack heads |mdash|
nevertheless, we would like our grammar formalism to express the
mother / head-daughter
parent / head-child
relation where it holds. Now, although it looks as though there is
something in common between, say, ``V`` and ``VP``, this is more
of a handy convention than a real claim, since ``V`` and ``VP``
@@ -1445,7 +1458,7 @@ Consider the following contrasts:
\*You put.

The verb `like`:lx: requires an ``NP`` complement, while
`put`:lx: requires both a following ``NP`` and `pp`.
`put`:lx: requires both a following ``NP`` and ``PP``.
ex-gap1_ and ex-gap2_ show that these complements are *obligatory*:
omitting them leads to ungrammaticality. Yet there are contexts in
which obligatory complements can be omitted, as ex-gap3_ and ex-gap4_
@@ -1634,7 +1647,7 @@ In order to percolate the slash feature correctly, we need to add
slashes with variable values to both sides of the arrow in productions
that expand ``S``, ``VP`` and ``NP``. For example, ``VP/?x -> V SBar/?x`` is
the slashed version of ``VP -> V SBar`` and
says that a slash value can be specified on the ``VP`` mother of a
says that a slash value can be specified on the ``VP`` parent of a
constituent if the same value is also specified on the ``SBar``
daughter. Finally, ``NP/NP ->`` allows the slash information on ``NP`` to
be discharged as the empty string.
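
To see what a slashed production licenses once its variable is bound, the substitution can be sketched with plain string replacement. This is only an illustration under simplifying assumptions: a real parser binds variables by unification rather than by editing strings, and the helper name ``instantiate`` is made up for this example.

```python
def instantiate(production, bindings):
    """Replace each variable (e.g. ?x) in a production string by its value.

    Plain string replacement is enough here because the example uses a
    single variable; it is not how a parser binds variables in general.
    """
    for variable, value in bindings.items():
        production = production.replace(variable, value)
    return production


# The slash value on the VP parent is the same as the one on SBar.
print(instantiate("VP/?x -> V SBar/?x", {"?x": "NP"}))
# VP/NP -> V SBar/NP
```

Since both occurrences of ``?x`` receive the same value, the missing ``NP`` recorded on the ``VP`` is the one recorded on the ``SBar``.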
@@ -2115,12 +2128,12 @@ Exercises
combinations have the same realization. Propose and implement a
method for dealing with this.

#. |hard| So-called `head features`:dt: are shared between the mother
and head daughter. For example, `tense` is a head feature
that is shared between a `vp` and its head ``V``
#. |hard| So-called `head features`:dt: are shared between the parent
node and head child. For example, `tense` is a head feature
that is shared between a ``VP`` and its head ``V``
daughter. See [Gazdar1985GPS]_ for more details. Most of the
features we have looked at are head features |mdash| exceptions are
``SUBCAT`` and `slash`. Since the sharing of head
``SUBCAT`` and ``slash``. Since the sharing of head
features is predictable, it should not need to be stated explicitly
in the grammar productions. Develop an approach that automatically
accounts for this regular behavior of head features.