Mapped x:gc: to X -- manual intervention for a variety of case issues,
e.g. `xp`:gc: goes to ``XP``, while `Det`:gc: goes to ``Det``.

Finished annotating ch09.  In addition to annotations I changed the following:

* converted some ..ex:: displays of feature structures into program output, which reduces the total number of examples and ensures that they correspond to the input (some didn't)

* fixed inconsistent case in attribute names, e.g. there was a place where the text claimed that (24) is equivalent to (23), even though one used uppercase attributes and the other used lowercase attributes

* reworked a few passages to use simpler language, and to make slightly less use of "formalism" and "framework"; in one place changed "adopt" to "illustrate", since we're not taking a stance as we would in a theory paper; some passages had phrases that put the material in context, but perhaps not a context that the reader needs to know about, e.g.:

    There are a number of notations for representing reentrancy in
    matrix-style representations of feature structures.

* In the introduction to X-bar syntax, I moved the literature references and the notational remark about horizontal bars to the further reading section (though it still needs to be worked in there).

* Some other heavy-going stuff is flagged.





svn/trunk@7814
stevenbird committed Feb 28, 2009
1 parent c73d6ab commit 95db04f
Showing 10 changed files with 508 additions and 491 deletions.
4 changes: 3 additions & 1 deletion book/CheckList.txt
@@ -48,4 +48,6 @@ ch08 typography should no longer use NP
ch08 section 8.6 on grammar development is incomplete (incl PE08 discussion)
ch08 assumes knowledge of "head" (did some content disappear?)
ch09 lacks our standard opening
ch09 uses :lex: role, not processed by docbook
ch09 could mention use of trees as source of features for ML
ch09 includes contents of grammar files that have changed in data distribution
42 changes: 21 additions & 21 deletions book/ch07.rst
@@ -357,7 +357,7 @@ where parsing constructs nested structures that are arbitrarily deep,
chunking creates structures of fixed depth (typically depth 2). These
chunks often correspond to the lowest level of grouping identified in
the full parse tree. This is illustrated in ex-parsing-chunking_ below,
which shows an `np`:gc: chunk structure and a completely parsed
which shows an ``NP`` chunk structure and a completely parsed
counterpart:

.. _ex-parsing-chunking:
@@ -663,9 +663,9 @@ chunk, then excise the chink:

Two other rules for forming chunks are splitting and merging.
A permissive chunking rule might put
`the cat the dog chased`:lx: into a single `np`:gc: chunk
`the cat the dog chased`:lx: into a single ``NP`` chunk
because it does not detect that determiners introduce new chunks.
For this we would need a rule to split an `np`:gc: chunk
For this we would need a rule to split an ``NP`` chunk
prior to any determiner, using a pattern like: ``"NP: <.*>}{<DT>"``.
Conversely, we can craft rules to merge adjacent chunks under
particular circumstances, e.g. ``"NP: <NN>{}<NN>"``.
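
As an aside (not part of this commit), here is a minimal sketch of how a split rule like the one above might be used with ``nltk.RegexpParser``; the grammar and the toy tagged sentence are illustrative only, and the exact output may vary across NLTK versions:

    import nltk

    # Illustrative grammar: first chunk determiner/noun sequences, then split
    # an NP wherever a noun is immediately followed by a determiner.
    grammar = r"""
    NP:
      {<DT|NN>+}      # chunk sequences of determiners and nouns
      <NN>}{<DT>      # split between a noun and a following determiner
    """
    cp = nltk.RegexpParser(grammar)
    sentence = [("the", "DT"), ("cat", "NN"), ("the", "DT"),
                ("dog", "NN"), ("chased", "VBD")]
    print(cp.parse(sentence))   # expect two separate NP chunks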
@@ -676,10 +676,10 @@ and merging patterns in any order.
Multiple Chunk Types
--------------------

So far we have only developed `np`:gc: chunkers. However, as we saw earlier
in the chapter, the CoNLL chunking data is also annotated for `pp`:gc: and
`vp`:gc: chunks. Here is an example of a chunked sentence that contains
`np`:gc:, `vp`:gc:, and `pp`:gc: chunk types.
So far we have only developed ``NP`` chunkers. However, as we saw earlier
in the chapter, the CoNLL chunking data is also annotated for ``PP`` and
``VP`` chunks. Here is an example of a chunked sentence that contains
``NP``, ``VP``, and ``PP`` chunk types.

>>> from nltk.corpus import conll2000
>>> print conll2000.chunked_sents('train.txt')[99]
@@ -863,7 +863,7 @@ Reading IOB Format and the CoNLL 2000 Corpus

Using the ``corpora`` module we can load Wall Street Journal
text that has been tagged then chunked using the IOB notation. The
chunk categories provided in this corpus are `np`:gc:, `vp`:gc: and `pp`:gc:. As we
chunk categories provided in this corpus are ``NP``, ``VP`` and ``PP``. As we
have seen, each sentence is represented using multiple lines, as shown
below::

@@ -876,7 +876,7 @@ below::
|nopar| A conversion function ``chunk.conllstr2tree()`` builds a tree
representation from one of these multi-line strings. Moreover, it
permits us to choose any subset of the three chunk types to use. The
example below produces only `np`:gc: chunks:
example below produces only ``NP`` chunks:

.. doctest-ignore::
>>> text = '''
@@ -933,7 +933,7 @@ example that reads the 100th sentence of the "train" portion of the corpus:


|nopar|
This showed three chunk types, for `np`:gc:, `vp`:gc: and `pp`:gc:.
This showed three chunk types, for ``NP``, ``VP`` and ``PP``.
We can also select which chunk types to read:

>>> print nltk.corpus.conll2000.chunked_sents('train.txt', chunk_types=('NP',))[99]
@@ -961,7 +961,7 @@ We start off by establishing a baseline for the trivial chunk parser
0.440845995079

This indicates that more than a third of the words are tagged with
``O`` (i.e., not in an `np`:gc: chunk). Now let's try a naive regular
``O`` (i.e., not in an ``NP`` chunk). Now let's try a naive regular
expression chunker that looks for tags (e.g., ``CD``, ``DT``, ``JJ``,
etc.) beginning with letters that are typical of noun phrase tags:

@@ -975,7 +975,7 @@ order to develop a more data-driven approach, let's define a function
``chunked_tags()`` that takes some chunked data
and sets up a conditional frequency distribution.
For each tag, it counts up the number of times the tag
occurs inside an `np`:gc: chunk (the ``True`` case, where ``chtag`` is
occurs inside an ``NP`` chunk (the ``True`` case, where ``chtag`` is
``B-NP`` or ``I-NP``), or outside a chunk (the ``False`` case, where
``chtag`` is ``O``). It returns a list of those tags that occur
inside chunks more often than outside chunks.
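
As an aside (not part of this commit), here is a sketch of what a ``chunked_tags()`` along these lines might look like; the book's own definition is elided from this hunk, and this version uses the current NLTK API (``tree2conlltags``, ``ConditionalFreqDist``), which differs from the 2009-era code:

    import nltk
    from nltk.corpus import conll2000

    def chunked_tags(train):
        # For each POS tag, count occurrences inside an NP chunk (True) vs.
        # outside any chunk (False), then keep the tags that occur inside
        # chunks more often than outside.
        cfdist = nltk.ConditionalFreqDist(
            (tag, chtag in ('B-NP', 'I-NP'))
            for sent in train
            for (word, tag, chtag) in nltk.chunk.tree2conlltags(sent))
        return [tag for tag in cfdist.conditions()
                if cfdist[tag][True] > cfdist[tag][False]]

    train_sents = conll2000.chunked_sents('train.txt', chunk_types=('NP',))
    print(chunked_tags(train_sents))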
@@ -1245,7 +1245,7 @@ structures having a depth of at most four.
(NP the/DT cat/NN)
(VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

Unfortunately this result misses the `vp`:gc: headed by `saw`:lx:. It has
Unfortunately this result misses the ``VP`` headed by `saw`:lx:. It has
other shortcomings too. Let's see what happens when we apply this
chunker to a sentence having deeper nesting.

@@ -1298,10 +1298,10 @@ example of a tree (note that they are standardly drawn upside-down):
.. tree:: (S (NP Alice) (VP (V chased) (NP the rabbit)))

We use a 'family' metaphor to talk about the
relationships of nodes in a tree: for example, `s`:gc: is the
`parent`:dt: of `vp`:gc:; conversely `vp`:gc: is a `daughter`:dt: (or
`child`:dt:) of `s`:gc:. Also, since `np`:gc: and `vp`:gc: are both
daughters of `s`:gc:, they are also `sisters`:dt:.
relationships of nodes in a tree: for example, ``S`` is the
`parent`:dt: of ``VP``; conversely ``VP`` is a `daughter`:dt: (or
`child`:dt:) of ``S``. Also, since ``NP`` and ``VP`` are both
daughters of ``S``, they are also `sisters`:dt:.
For convenience, there is also a text format for specifying
trees:

@@ -1437,7 +1437,7 @@ answer because the part-of-speech tags are too impoverished and do not
give us sufficient information about the lexical item. A second
approach is to write utility programs to analyze the training data,
such as counting the number of times a given part-of-speech tag occurs
inside and outside an `np`:gc: chunk. A third approach is to evaluate the
inside and outside an ``NP`` chunk. A third approach is to evaluate the
system against some gold standard data to obtain an overall
performance score. We can even use this to parameterize the system,
specifying which chunk rules are used on a given run, and tabulating
@@ -1463,11 +1463,11 @@ The word `chink`:dt: initially meant a sequence of stopwords,
according to a 1975 paper by Ross and Tukey [Abney1996PST]_.

The IOB format (or sometimes `BIO Format`:dt:) was developed for
`np`:gc: chunking by [Ramshaw1995TCU]_, and was used for the shared `np`:gc:
``NP`` chunking by [Ramshaw1995TCU]_, and was used for the shared ``NP``
bracketing task run by the *Conference on Natural Language Learning*
(|CoNLL|) in 1999. The same format was
adopted by |CoNLL| 2000 for annotating a section of Wall Street
Journal text as part of a shared task on `np`:gc: chunking.
Journal text as part of a shared task on ``NP`` chunking.

Section 13.5 of [JurafskyMartin2008]_ contains a discussion of chunking.
Chapter 22 covers information extraction, including named entity recognition.
@@ -1585,7 +1585,7 @@ Exercises
re-evaluate it, to see if you have discovered an improved baseline.

#. |hard|
Develop an `np`:gc: chunker that converts POS-tagged text into a list of
Develop an ``NP`` chunker that converts POS-tagged text into a list of
tuples, where each tuple consists of a verb followed by a sequence of
noun phrases and prepositions,
e.g. ``the little cat sat on the mat`` becomes ``('sat', 'on', 'NP')``...
76 changes: 38 additions & 38 deletions book/ch08-extras.rst
@@ -32,7 +32,7 @@ follows:
.. XXX "So we might adopt the heuristic that" -> "Suppose that"
So we might adopt the heuristic that the subject of a sentence is the
`np`:gc: chunk that immediately precedes the tensed verb: this would
``NP`` chunk that immediately precedes the tensed verb: this would
correctly yield ``(NP the/DT little/JJ bear/NN)`` as
subject. Unfortunately, this simple rule very quickly fails, as shown
by a more complex example.
@@ -58,27 +58,27 @@ by a more complex example.
What's doing the "preventing" in this example is not the firm monetary
policy, but rather the restated commitment to such a policy. We can
also see from this example that a different simple rule, namely
treating the initial `np`:gc: chunk as the subject, also fails, since this
treating the initial ``NP`` chunk as the subject, also fails, since this
would give us the ``(NP the/DT Exchequer/NNP)``. By contrast, a
complete phrase structure analysis of
the sentence would group together all the pre-verbal `np`:gc: chunks
into a single `np`:gc: constituent:
the sentence would group together all the pre-verbal ``NP`` chunks
into a single ``NP`` constituent:

.. ex::
.. tree:: (NP(NP (NP (Nom (N Chancellor) (PP (P of)(NP (Det the) (N Exchequer))))(NP Nigel Lawson)) (POSS 's))(Nom (Adj restated)(Nom (N commitment)(PP (P to)(NP (Det a)(Nom (Adj firm) (Nom (Adj monetary)(Nom (N policy)))))))))
:scale: 80:80:50

We still have a little work to do to determine which part of this complex
`np`:gc: corresponds to the "who", but nevertheless, this is much
``NP`` corresponds to the "who", but nevertheless, this is much
more tractable than answering the same question from a flat sequence
of chunks.

"Subject" and "direct object" are examples of `grammatical
functions`:dt:. Although they are not captured directly in a phrase
structure grammar, they can be defined in terms of tree
configurations. In ex-gfs_, the subject of `s`:gc: is the `np`:gc:
immediately dominated by `s`:gc: while the direct object of `v`:gc:
is the `np`:gc: directly dominated by `vp`:gc:.
configurations. In ex-gfs_, the subject of ``S`` is the ``NP``
immediately dominated by ``S`` while the direct object of ``V``
is the ``NP`` directly dominated by ``VP``.

.. _ex-gfs:
.. ex::
@@ -128,14 +128,14 @@ top-down parser processes *VP* |rarr| *V* *NP* *PP*,
it may find *V* and *NP* but not the *PP*. This work
can be reused when processing *VP* |rarr| *V* *NP*.
Thus, we will record the
hypothesis that "the `v`:gc: constituent `likes`:lx: is the beginning of a `vp`:gc:."
hypothesis that "the ``V`` constituent `likes`:lx: is the beginning of a ``VP``."

We can do this by adding a `dot`:dt: to the edge's right hand side.
Material to the left of the dot records what has been found so far;
material to the right of the dot specifies what still needs to be found in order
to complete the constituent. For example, the edge in
ex-dottededge_ records the hypothesis that "a `vp`:gc: starts with the `v`:gc:
`likes`:lx:, but still needs an `np`:gc: to become complete":
ex-dottededge_ records the hypothesis that "a ``VP`` starts with the ``V``
`likes`:lx:, but still needs an ``NP`` to become complete":

.. _ex-dottededge:
.. ex::
@@ -149,18 +149,18 @@
-------------

Let's take stock.
An edge [`VP`:gc: |rarr| |dot| `V`:gc: `NP`:gc: `PP`:gc:, (*i*, *i*)]
records the hypothesis that a `VP`:gc: begins at location *i*, and that we anticipate
finding a sequence `V NP PP`:gc: starting here. This is known as a
An edge [``VP`` |rarr| |dot| ``V`` ``NP`` ``PP``, (*i*, *i*)]
records the hypothesis that a ``VP`` begins at location *i*, and that we anticipate
finding a sequence ``V NP PP`` starting here. This is known as a
`self-loop edge`:dt:; see ex-chart-intro-selfloop_.
An edge [`VP`:gc: |rarr| `V`:gc: |dot| `NP`:gc: `PP`:gc:, (*i*, *j*)]
records the fact that we have discovered a `V`:gc: spanning (*i*, *j*),
and hypothesize a following `NP PP`:gc: sequence to complete a `VP`:gc:
An edge [``VP`` |rarr| ``V`` |dot| ``NP`` ``PP``, (*i*, *j*)]
records the fact that we have discovered a ``V`` spanning (*i*, *j*),
and hypothesize a following ``NP PP`` sequence to complete a ``VP``
beginning at *i*. This is known as an `incomplete edge`:dt:;
see ex-chart-intro-incomplete_.
An edge [`VP`:gc: |rarr| `V`:gc: `NP`:gc: `PP`:gc: |dot| , (*i*, *k*)]
records the discovery that a `VP`:gc: consisting of the sequence
`V NP PP`:gc: has been discovered for the span (*i*, *j*). This is known
An edge [``VP`` |rarr| ``V`` ``NP`` ``PP`` |dot| , (*i*, *k*)]
records that a ``VP`` consisting of the sequence
``V NP PP`` has been found spanning (*i*, *k*). This is known
as a `complete edge`:dt:; see ex-chart-intro-parseedge_.
If a complete edge spans the entire sentence, and has the grammar's
start symbol as its left-hand side, then the edge is called a `parse
@@ -244,7 +244,7 @@ bottom-up parsing starts from the input string,
and tries to find sequences of words and phrases that
correspond to the *right hand* side of a grammar production. The
parser then replaces these with the left-hand side of the production,
until the whole sentence is reduced to an `S`:gc:. Bottom-up chart
until the whole sentence is reduced to an ``S``. Bottom-up chart
parsing is an extension of this approach in which hypotheses about
structure are recorded as edges on a chart. In terms of our earlier
terminology, bottom-up chart parsing can be seen as a parsing
@@ -296,11 +296,11 @@ for each grammar production whose right hand side begins with category
:scale: 30

The next step is to use the Fundamental Rule to add edges
like [`np`:gc: |rarr| Lee |dot| , (0, 1)],
like [``NP`` |rarr| Lee |dot| , (0, 1)],
where we have "moved the dot" one position to the right.
After this, we will now be able to add new self-loop edges such as
[`s`:gc: |rarr| |dot| `np`:gc: `vp`:gc:, (0, 0)] and
[`vp`:gc: |rarr| |dot| `vp`:gc: `np`:gc:, (1, 1)], and use these to
[``S`` |rarr| |dot| ``NP`` ``VP``, (0, 0)] and
[``VP`` |rarr| |dot| ``VP`` ``NP``, (1, 1)], and use these to
build more complete edges.
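
As an aside (not part of this commit), the bottom-up strategy described here can be tried directly with NLTK's chart parser; the toy grammar below is made up for illustration and is not the chapter's own grammar:

    import nltk
    from nltk.parse.chart import BottomUpChartParser

    # Illustrative toy grammar
    grammar = nltk.CFG.fromstring("""
      S  -> NP VP
      VP -> V NP
      NP -> 'Lee' | 'coffee'
      V  -> 'likes'
    """)
    # trace=1 prints the edges as they are added to the chart
    parser = BottomUpChartParser(grammar, trace=1)
    for tree in parser.parse(['Lee', 'likes', 'coffee']):
        print(tree)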

Using these three rules, we can parse a sentence as shown in
@@ -329,29 +329,29 @@ Top-Down Parsing
----------------

Top-down chart parsing works in a similar way to the recursive descent
parser, in that it starts off with the top-level goal of finding an `s`:gc:.
This goal is broken down into the subgoals of trying to find constituents such as `np`:gc: and
`vp`:gc: predicted by the grammar.
parser, in that it starts off with the top-level goal of finding an ``S``.
This goal is broken down into the subgoals of trying to find constituents such as ``NP`` and
``VP`` predicted by the grammar.
To create a top-down chart parser, we use the Fundamental Rule as before plus
three other rules: the `Top-Down Initialization Rule`:dt:, the `Top-Down
Expand Rule`:dt:, and the `Top-Down Match Rule`:dt:.
The Top-Down Initialization Rule in ex-td-init-rule_
captures the fact that the root of any
parse must be the start symbol `s`:gc:\.
parse must be the start symbol ``S``\.

.. _ex-td-init-rule:
.. ex:: `Top-Down Initialization Rule`:dt: For each production `s`:gc: |rarr| |alpha|
add the self-loop edge [`s`:gc: |rarr| |dot|\ |alpha|\ , (0, 0)]
.. ex:: `Top-Down Initialization Rule`:dt: For each production ``S`` |rarr| |alpha|
add the self-loop edge [``S`` |rarr| |dot|\ |alpha|\ , (0, 0)]

|chart_td_ex1|

.. |chart_td_ex1| image:: ../images/chart_td_ex1.png
:scale: 30

In our running example, we are predicting that we will be able to find an `np`:gc: and a
`vp`:gc: starting at 0, but have not yet satisfied these subgoals.
In order to find an `np`:gc: we need to
invoke a production that has `np`:gc: on its left hand side. This work
In our running example, we are predicting that we will be able to find an ``NP`` and a
``VP`` starting at 0, but have not yet satisfied these subgoals.
In order to find an ``NP`` we need to
invoke a production that has ``NP`` on its left hand side. This work
is done by the Top-Down Expand Rule ex-td-expand-rule_.
This tells us that if our chart contains an incomplete
edge whose dot is followed by a nonterminal *B*, then the parser
@@ -387,7 +387,7 @@ add an edge if the terminal corresponds to the current input symbol.

Here we see our example chart after applying the Top-Down Match rule.
After this, we can apply the fundamental rule to
add the edge [`np`:gc: |rarr| Lee |dot| , (0, 1)].
add the edge [``NP`` |rarr| Lee |dot| , (0, 1)].

Using these four rules, we can parse a sentence top-down as shown in
ex-top-down-strategy_.
@@ -452,8 +452,8 @@ which *P* dominates *w*. More precisely:

To illustrate, suppose the input is of the form
`I saw ...`:lx:, and the chart already contains the edge
[`vp`:gc: |rarr| |dot| `v`:gc: ..., (1, 1)]. Then the Scanner Rule will add to
the chart the edges [`v`:gc: |rarr| 'saw', (1, 2)]
[``VP`` |rarr| |dot| ``V`` ..., (1, 1)]. Then the Scanner Rule will add to
the chart the edges [``V`` |rarr| 'saw', (1, 2)]
and ['saw' |rarr| |dot|\ , (1, 2)]. So in effect the Scanner Rule packages up a
sequence of three rule applications: the Bottom-Up Initialization Rule for
[*w* |rarr| |dot|\ , (*j*, *j*\ +1)],
@@ -706,7 +706,7 @@ is the product of the probability of the production that
generated it and the probabilities of its children. For example, the
probability of the edge ``[Edge: S`` |rarr| ``NP``\ |dot|\ ``VP, 0:2]``
is the probability of the PCFG production ``S`` |rarr| ``NP VP``
multiplied by the probability of its `np`:gc: child.
multiplied by the probability of its ``NP`` child.
(Note that an edge's tree only includes children for elements to the left
of the edge's dot.)
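
As an aside (not part of this commit), a sketch that makes the edge-probability arithmetic concrete, using a made-up toy PCFG and NLTK's probabilistic chart parser (current API; details may differ from the book's code):

    import nltk
    from nltk.parse.pchart import InsideChartParser

    # Made-up toy PCFG; the probabilities are illustrative only.
    grammar = nltk.PCFG.fromstring("""
      S  -> NP VP    [1.0]
      VP -> V NP     [1.0]
      NP -> 'Lee'    [0.4]
      NP -> 'coffee' [0.6]
      V  -> 'likes'  [1.0]
    """)
    # By the rule above, an edge like [S -> NP . VP, 0:1] over 'Lee' has
    # probability P(S -> NP VP) * P(NP -> 'Lee') = 1.0 * 0.4 = 0.4.
    parser = InsideChartParser(grammar)
    for tree in parser.parse(['Lee', 'likes', 'coffee']):
        print(tree.prob(), tree)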

