From 95db04fea179610f812a83548cb06bd6b4f088be Mon Sep 17 00:00:00 2001
From: Steven Bird
Date: Sat, 28 Feb 2009 02:41:50 +0000
Subject: [PATCH] Mapped `x`:gc: to ``X`` -- manual intervention for a variety of case
 issues, e.g. `xp`:gc: goes to ``XP``, while `Det`:gc: goes to ``Det``.
 Finished annotating ch09.

In addition to annotations I changed the following:

* converted some ..ex:: displays of feature structures into program output,
  which reduces the total number of examples and ensures that they correspond
  to the input (some didn't)

* fixed case in attribute names, e.g. there was a place where we were told
  that (24) is equivalent to (23), when one used uppercase attributes and the
  other used lowercase attributes

* reworked a few passages to use simpler language, and make slightly less use
  of "formalism", "framework"; in one place changed "adopt" to "illustrate"
  since we're not taking a stance in a theory paper; sometimes there were
  phrases that put the material in context, but perhaps not a context that the
  reader needs to know about, e.g.: There are a number of notations for
  representing reentrancy in matrix-style representations of feature
  structures.

* In the introduction to X-bar syntax, I moved the literature references and
  the notational remark about horizontal bars to the further reading section
  (though it still needs to be worked in there).

* Some other heavy-going stuff is flagged.

svn/trunk@7814
---
 book/CheckList.txt       |   4 +-
 book/ch07.rst            |  42 ++---
 book/ch08-extras.rst     |  76 ++++----
 book/ch08-old-extras.rst |  46 ++---
 book/ch08.rst            | 177 ++++++++---------
 book/ch09-old.rst        | 212 ++++++++++-----------
 book/ch09.rst            | 396 ++++++++++++++++++++-------------------
 book/deleted             |  32 ++--
 book/introduction.txt    |  10 +-
 definitions.rst          |   4 +-
 10 files changed, 508 insertions(+), 491 deletions(-)

diff --git a/book/CheckList.txt b/book/CheckList.txt
index 08d7102d..61491117 100644
--- a/book/CheckList.txt
+++ b/book/CheckList.txt
@@ -48,4 +48,6 @@ ch08 typography should no longer use NP
 ch08 section 8.6 on grammar development is incomplete (incl PE08 discussion)
 ch08 assumes knowledge of "head" (did some content disappear?)
 ch09 lacks our standard opening
-ch09 uses :lex: role, not processed by docbook
\ No newline at end of file
+ch09 uses :lex: role, not processed by docbook
+ch09 could mention use of trees as source of features for ML
+ch09 includes contents of grammar files that have changed in data distribution
diff --git a/book/ch07.rst b/book/ch07.rst
index 0214a8f2..0b829121 100644
--- a/book/ch07.rst
+++ b/book/ch07.rst
@@ -357,7 +357,7 @@ where parsing constructs nested structures that are arbitrarily deep,
 chunking creates structures of fixed depth (typically depth 2).
 These chunks often correspond to the lowest level of grouping identified
 in the full parse tree. This is illustrated in ex-parsing-chunking_ below,
-which shows an `np`:gc: chunk structure and a completely parsed
+which shows an ``NP`` chunk structure and a completely parsed
 counterpart:

 .. _ex-parsing-chunking:
@@ -663,9 +663,9 @@ chunk, then excise the chink:

 Two other rules for forming chunks are splitting and merging.
 A permissive chunking rule might put
-`the cat the dog chased`:lx: into a single `np`:gc: chunk
+`the cat the dog chased`:lx: into a single ``NP`` chunk
 because it does not detect that determiners introduce new chunks.
-For this we would need a rule to split an `np`:gc: chunk
+For this we would need a rule to split an ``NP`` chunk
 prior to any determiner, using a pattern like: ``"NP: <.*>}{<DT>"``.
 Conversely, we can craft rules to merge adjacent
 chunks under particular circumstances, e.g. ``"NP: <NN.*>{}<NN.*>"``.
@@ -676,10 +676,10 @@ and merging patterns in any order.

 Multiple Chunk Types
 --------------------

-So far we have only developed `np`:gc: chunkers. However, as we saw earlier
-in the chapter, the CoNLL chunking data is also annotated for `pp`:gc: and
-`vp`:gc: chunks. Here is an example of a chunked sentence that contains
-`np`:gc:, `vp`:gc:, and `pp`:gc: chunk types.
+So far we have only developed ``NP`` chunkers. However, as we saw earlier
+in the chapter, the CoNLL chunking data is also annotated for ``PP`` and
+``VP`` chunks. Here is an example of a chunked sentence that contains
+``NP``, ``VP``, and ``PP`` chunk types.

     >>> from nltk.corpus import conll2000
     >>> print conll2000.chunked_sents('train.txt')[99]
@@ -863,7 +863,7 @@ Reading IOB Format and the CoNLL 2000 Corpus

 Using the ``corpora`` module we can load Wall Street
 Journal text that has been tagged then chunked using the IOB notation. The
-chunk categories provided in this corpus are `np`:gc:, `vp`:gc: and `pp`:gc:. As we
+chunk categories provided in this corpus are ``NP``, ``VP`` and ``PP``. As we
 have seen, each sentence is represented using multiple lines, as shown
 below::

@@ -876,7 +876,7 @@ below::
 |nopar| A conversion function ``chunk.conllstr2tree()`` builds a tree
 representation from one of these multi-line strings. Moreover, it permits us
 to choose any subset of the three chunk types to use. The
-example below produces only `np`:gc: chunks:
+example below produces only ``NP`` chunks:

 .. doctest-ignore::
     >>> text = '''
@@ -933,7 +933,7 @@ example that reads the 100th sentence of the "train" portion of the corpus:

 |nopar|
-This showed three chunk types, for `np`:gc:, `vp`:gc: and `pp`:gc:.
+This showed three chunk types, for ``NP``, ``VP`` and ``PP``.
 We can also select which chunk types to read:

     >>> print nltk.corpus.conll2000.chunked_sents('train.txt', chunk_types=('NP',))[99]
@@ -961,7 +961,7 @@ We start off by establishing a baseline for the trivial chunk parser
     0.440845995079

 This indicates that more than a third of the words are tagged with
-``O`` (i.e., not in an `np`:gc: chunk). Now let's try a naive regular
+``O`` (i.e., not in an ``NP`` chunk). Now let's try a naive regular
 expression chunker that looks for tags
 (e.g., ``CD``, ``DT``, ``JJ``, etc.)
 beginning with letters that are typical of noun phrase tags:

@@ -975,7 +975,7 @@ order to develop a more data-driven approach, let's define a function
 ``chunked_tags()`` that takes some chunked data
 and sets up a conditional frequency distribution.
 For each tag, it counts up the number of times the tag
-occurs inside an `np`:gc: chunk (the ``True`` case, where ``chtag`` is
+occurs inside an ``NP`` chunk (the ``True`` case, where ``chtag`` is
 ``B-NP`` or ``I-NP``), or outside a chunk (the ``False`` case, where
 ``chtag`` is ``O``). It returns a list of those tags that occur inside
 chunks more often than outside chunks.

@@ -1245,7 +1245,7 @@ structures having a depth of at most four.
       (NP the/DT cat/NN)
       (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

-Unfortunately this result misses the `vp`:gc: headed by `saw`:lx:. It has
+Unfortunately this result misses the ``VP`` headed by `saw`:lx:. It has
 other shortcomings too. Let's see what happens when we apply this
 chunker to a sentence having deeper nesting.

@@ -1298,10 +1298,10 @@ example of a tree (note that they are standardly drawn upside-down):

 .. 
tree:: (S (NP Alice) (VP (V chased) (NP the rabbit))) We use a 'family' metaphor to talk about the -relationships of nodes in a tree: for example, `s`:gc: is the -`parent`:dt: of `vp`:gc:; conversely `vp`:gc: is a `daughter`:dt: (or -`child`:dt:) of `s`:gc:. Also, since `np`:gc: and `vp`:gc: are both -daughters of `s`:gc:, they are also `sisters`:dt:. +relationships of nodes in a tree: for example, ``S`` is the +`parent`:dt: of ``VP``; conversely ``VP`` is a `daughter`:dt: (or +`child`:dt:) of ``S``. Also, since ``NP`` and ``VP`` are both +daughters of ``S``, they are also `sisters`:dt:. For convenience, there is also a text format for specifying trees: @@ -1437,7 +1437,7 @@ answer because the part-of-speech tags are too impoverished and do not give us sufficient information about the lexical item. A second approach is to write utility programs to analyze the training data, such as counting the number of times a given part-of-speech tag occurs -inside and outside an `np`:gc: chunk. A third approach is to evaluate the +inside and outside an ``NP`` chunk. A third approach is to evaluate the system against some gold standard data to obtain an overall performance score. We can even use this to parameterize the system, specifying which chunk rules are used on a given run, and tabulating @@ -1463,11 +1463,11 @@ The word `chink`:dt: initially meant a sequence of stopwords, according to a 1975 paper by Ross and Tukey [Abney1996PST]_. The IOB format (or sometimes `BIO Format`:dt:) was developed for -`np`:gc: chunking by [Ramshaw1995TCU]_, and was used for the shared `np`:gc: +``NP`` chunking by [Ramshaw1995TCU]_, and was used for the shared ``NP`` bracketing task run by the *Conference on Natural Language Learning* (|CoNLL|) in 1999. The same format was adopted by |CoNLL| 2000 for annotating a section of Wall Street -Journal text as part of a shared task on `np`:gc: chunking. +Journal text as part of a shared task on ``NP`` chunking. Section 13.5 of [JurafskyMartin2008]_ contains a discussion of chunking. Chapter 22 covers information extraction, including named entity recognition. @@ -1585,7 +1585,7 @@ Exercises re-evaluate it, to see if you have discovered an improved baseline. #. |hard| - Develop an `np`:gc: chunker that converts POS-tagged text into a list of + Develop an ``NP`` chunker that converts POS-tagged text into a list of tuples, where each tuple consists of a verb followed by a sequence of noun phrases and prepositions, e.g. ``the little cat sat on the mat`` becomes ``('sat', 'on', 'NP')``... diff --git a/book/ch08-extras.rst b/book/ch08-extras.rst index cac4e989..7215b226 100644 --- a/book/ch08-extras.rst +++ b/book/ch08-extras.rst @@ -32,7 +32,7 @@ follows: .. XXX "So we might adopt the heuristic that" -> "Suppose that" So we might adopt the heuristic that the subject of a sentence is the -`np`:gc: chunk that immediately precedes the tensed verb: this would +``NP`` chunk that immediately precedes the tensed verb: this would correctly yield ``(NP the/DT little/JJ bear/NN)`` as subject. Unfortunately, this simple rule very quickly fails, as shown by a more complex example. @@ -58,27 +58,27 @@ by a more complex example. What's doing the "preventing" in this example is not the firm monetary policy, but rather the restated commitment to such a policy. 
We can also see from this example that a different simple rule, namely -treating the initial `np`:gc: chunk as the subject, also fails, since this +treating the initial ``NP`` chunk as the subject, also fails, since this would give us the ``(NP the/DT Exchequer/NNP)``. By contrast, a complete phrase structure analysis of -the sentence would group together all the pre-verbal `np`:gc: chunks -into a single `np`:gc: constituent: +the sentence would group together all the pre-verbal ``NP`` chunks +into a single ``NP`` constituent: .. ex:: .. tree:: (NP(NP (NP (Nom (N Chancellor) (PP (P of)(NP (Det the) (N Exchequer))))(NP Nigel Lawson)) (POSS 's))(Nom (Adj restated)(Nom (N commitment)(PP (P to)(NP (Det a)(Nom (Adj firm) (Nom (Adj monetary)(Nom (N policy))))))))) :scale: 80:80:50 We still have a little work to determine which part of this complex -`np`:gc: corresponds to the "who", but nevertheless, this is much +``NP`` corresponds to the "who", but nevertheless, this is much more tractable than answering the same question from a flat sequence of chunks. "Subject" and "direct object" are examples of `grammatical functions`:dt:. Although they are not captured directly in a phrase structure grammar, they can be defined in terms of tree -configurations. In ex-gfs_, the subject of `s`:gc: is the `np`:gc: -immediately dominated by `s`:gc: while the direct object of `v`:gc: -is the `np`:gc: directly dominated by `vp`:gc:. +configurations. In ex-gfs_, the subject of ``S`` is the ``NP`` +immediately dominated by ``S`` while the direct object of ``V`` +is the ``NP`` directly dominated by ``VP``. .. _ex-gfs: .. ex:: @@ -128,14 +128,14 @@ top-down parser processes *VP* |rarr| *V* *NP* *PP*, it may find *V* and *NP* but not the *PP*. This work can be reused when processing *VP* |rarr| *V* *NP*. Thus, we will record the -hypothesis that "the `v`:gc: constituent `likes`:lx: is the beginning of a `vp`:gc:." +hypothesis that "the ``V`` constituent `likes`:lx: is the beginning of a ``VP``." We can do this by adding a `dot`:dt: to the edge's right hand side. Material to the left of the dot records what has been found so far; material to the right of the dot specifies what still needs to be found in order to complete the constituent. For example, the edge in -ex-dottededge_ records the hypothesis that "a `vp`:gc: starts with the `v`:gc: -`likes`:lx:, but still needs an `np`:gc: to become complete": +ex-dottededge_ records the hypothesis that "a ``VP`` starts with the ``V`` +`likes`:lx:, but still needs an ``NP`` to become complete": .. _ex-dottededge: .. ex:: @@ -149,18 +149,18 @@ Types of Edge ------------- Let's take stock. -An edge [`VP`:gc: |rarr| |dot| `V`:gc: `NP`:gc: `PP`:gc:, (*i*, *i*)] -records the hypothesis that a `VP`:gc: begins at location *i*, and that we anticipate -finding a sequence `V NP PP`:gc: starting here. This is known as a +An edge [``VP`` |rarr| |dot| ``V`` ``NP`` ``PP``, (*i*, *i*)] +records the hypothesis that a ``VP`` begins at location *i*, and that we anticipate +finding a sequence ``V NP PP`` starting here. This is known as a `self-loop edge`:dt:; see ex-chart-intro-selfloop_. 
-An edge [`VP`:gc: |rarr| `V`:gc: |dot| `NP`:gc: `PP`:gc:, (*i*, *j*)]
-records the fact that we have discovered a `V`:gc: spanning (*i*, *j*),
-and hypothesize a following `NP PP`:gc: sequence to complete a `VP`:gc:
+An edge [``VP`` |rarr| ``V`` |dot| ``NP`` ``PP``, (*i*, *j*)]
+records the fact that we have discovered a ``V`` spanning (*i*, *j*),
+and hypothesize a following ``NP PP`` sequence to complete a ``VP``
 beginning at *i*. This is known as an `incomplete edge`:dt:; see
 ex-chart-intro-incomplete_.
-An edge [`VP`:gc: |rarr| `V`:gc: `NP`:gc: `PP`:gc: |dot| , (*i*, *k*)]
-records the discovery that a `VP`:gc: consisting of the sequence
-`V NP PP`:gc: has been discovered for the span (*i*, *j*). This is known
+An edge [``VP`` |rarr| ``V`` ``NP`` ``PP`` |dot| , (*i*, *k*)]
+records the discovery that a ``VP`` consisting of the sequence
+``V NP PP`` covers the span (*i*, *k*). This is known
 as a `complete edge`:dt:; see ex-chart-intro-parseedge_.
 If a complete edge spans the entire sentence, and has the grammar's
 start symbol as its left-hand side, then the edge is called a `parse
@@ -244,7 +244,7 @@ bottom-up parsing starts from the input string, and tries to find
 sequences of words and phrases that
 correspond to the *right hand* side of a grammar production. The parser
 then replaces these with the left-hand side of the production,
-until the whole sentence is reduced to an `S`:gc:. Bottom-up chart
+until the whole sentence is reduced to an ``S``. Bottom-up chart
 parsing is an extension of this approach in which hypotheses about
 structure are recorded as edges on a chart. In terms of our earlier
 terminology, bottom-up chart parsing can be seen as a parsing
@@ -296,11 +296,11 @@ for each grammar production whose right hand side begins with category
    :scale: 30

 The next step is to use the Fundamental Rule to add edges
-like [`np`:gc: |rarr| Lee |dot| , (0, 1)],
+like [``NP`` |rarr| Lee |dot| , (0, 1)],
 where we have "moved the dot" one position to the right.
 After this, we will now be able to add new self-loop edges such as
-[`s`:gc: |rarr| |dot| `np`:gc: `vp`:gc:, (0, 0)] and
-[`vp`:gc: |rarr| |dot| `vp`:gc: `np`:gc:, (1, 1)], and use these to
+[``S`` |rarr| |dot| ``NP`` ``VP``, (0, 0)] and
+[``VP`` |rarr| |dot| ``VP`` ``NP``, (1, 1)], and use these to
 build more complete edges.

 Using these three rules, we can parse a sentence as shown in
@@ -329,29 +329,29 @@ Top-Down Parsing
 ----------------

 Top-down chart parsing works in a similar way to the recursive descent
-parser, in that it starts off with the top-level goal of finding an `s`:gc:.
-This goal is broken down into the subgoals of trying to find constituents such as `np`:gc: and
-`vp`:gc: predicted by the grammar.
+parser, in that it starts off with the top-level goal of finding an ``S``.
+This goal is broken down into the subgoals of trying to find constituents such as ``NP`` and
+``VP`` predicted by the grammar.
 To create a top-down chart parser, we use the Fundamental Rule as before
 plus three other rules: the `Top-Down Initialization Rule`:dt:, the
 `Top-Down Expand Rule`:dt:, and the `Top-Down Match Rule`:dt:.
 The Top-Down Initialization Rule in ex-td-init-rule_ captures
 the fact that the root of any
-parse must be the start symbol `s`:gc:\.
+parse must be the start symbol ``S``\.

 .. _ex-td-init-rule:
-.. ex:: `Top-Down Initialization Rule`:dt: For each production `s`:gc: |rarr| |alpha|
-   add the self-loop edge [`s`:gc: |rarr| |dot|\ |alpha|\ , (0, 0)]
+.. 
ex:: `Top-Down Initialization Rule`:dt: For each production ``S`` |rarr| |alpha| + add the self-loop edge [``S`` |rarr| |dot|\ |alpha|\ , (0, 0)] |chart_td_ex1| .. |chart_td_ex1| image:: ../images/chart_td_ex1.png :scale: 30 -In our running example, we are predicting that we will be able to find an `np`:gc: and a -`vp`:gc: starting at 0, but have not yet satisfied these subgoals. -In order to find an `np`:gc: we need to -invoke a production that has `np`:gc: on its left hand side. This work +In our running example, we are predicting that we will be able to find an ``NP`` and a +``VP`` starting at 0, but have not yet satisfied these subgoals. +In order to find an ``NP`` we need to +invoke a production that has ``NP`` on its left hand side. This work is done by the Top-Down Expand Rule ex-td-expand-rule_. This tells us that if our chart contains an incomplete edge whose dot is followed by a nonterminal *B*, then the parser @@ -387,7 +387,7 @@ add an edge if the terminal corresponds to the current input symbol. Here we see our example chart after applying the Top-Down Match rule. After this, we can apply the fundamental rule to -add the edge [`np`:gc: |rarr| Lee |dot| , (0, 1)]. +add the edge [``NP`` |rarr| Lee |dot| , (0, 1)]. Using these four rules, we can parse a sentence top-down as shown in ex-top-down-strategy_. @@ -452,8 +452,8 @@ which *P* dominates *w*. More precisely: To illustrate, suppose the input is of the form `I saw ...`:lx:, and the chart already contains the edge -[`vp`:gc: |rarr| |dot| `v`:gc: ..., (1, 1)]. Then the Scanner Rule will add to -the chart the edges [`v`:gc: |rarr| 'saw', (1, 2)] +[``VP`` |rarr| |dot| ``V`` ..., (1, 1)]. Then the Scanner Rule will add to +the chart the edges [``V`` |rarr| 'saw', (1, 2)] and ['saw'|rarr| |dot|\ , (1, 2)]. So in effect the Scanner Rule packages up a sequence of three rule applications: the Bottom-Up Initialization Rule for [*w* |rarr| |dot|\ , (*j*, *j*\ +1)], @@ -706,7 +706,7 @@ is the product of the probability of the production that generated it and the probabilities of its children. For example, the probability of the edge ``[Edge: S`` |rarr| ``NP``\ |dot|\ ``VP, 0:2]`` is the probability of the PCFG production ``S`` |rarr| ``NP VP`` -multiplied by the probability of its `np`:gc: child. +multiplied by the probability of its ``NP`` child. (Note that an edge's tree only includes children for elements to the left of the edge's dot.) diff --git a/book/ch08-old-extras.rst b/book/ch08-old-extras.rst index e9cb72f4..ffcf2629 100644 --- a/book/ch08-old-extras.rst +++ b/book/ch08-old-extras.rst @@ -121,7 +121,7 @@ level is by means of a `tree diagram`:dt:, as shown in tree-diagram_. (NP (N dogs))) -Note that linguistic trees grow upside down: the node labeled `s`:gc: +Note that linguistic trees grow upside down: the node labeled ``S`` is the `root`:dt: of the tree, while the `leaves`:dt: of the tree are labeled with the words. @@ -153,7 +153,7 @@ In both cases, there is a prepositional phrase introduced by :lx:`with`. In the first case this phrase modifies the noun :lx:`burglar`, and in the second case it modifies the verb :lx:`saw`. We could again think of this in terms of scope: does the prepositional -phrase (`pp`:gc:) just have scope over the `np`:gc: +phrase (``PP``) just have scope over the ``NP`` `a burglar`:lx:, or does it have scope over the whole verb phrase? 
As before, we can represent the difference in terms of tree structure: @@ -171,8 +171,8 @@ of tree structure: )) -In burglar_\ a, the `pp`:gc: attaches to the `np`:gc:, -while in burglar_\ b, the `pp`:gc: attaches to the `vp`:gc:. +In burglar_\ a, the ``PP`` attaches to the ``NP``, +while in burglar_\ b, the ``PP`` attaches to the ``VP``. We can generate these trees in Python as follows: @@ -184,7 +184,7 @@ We can generate these trees in Python as follows: We can discard the structure to get the list of `leaves`:dt:, and we can confirm that both trees have the same leaves (except for the last word). We can also see that the trees have different `heights`:dt: (given by the -number of nodes in the longest branch of the tree, starting at `s`:gc: +number of nodes in the longest branch of the tree, starting at ``S`` and descending to the words): >>> tree1.leaves() @@ -201,7 +201,7 @@ The `Prepositional Phrase Attachment Corpus`:dt: makes it possible for us to study this question systematically. The corpus is derived from the IBM-Lancaster Treebank of Computer Manuals and from the Penn Treebank, and distills out only the essential information -about `pp`:gc: attachment. Consider the sentence from the WSJ +about ``PP`` attachment. Consider the sentence from the WSJ in ppattach-a_. The corresponding line in the Prepositional Phrase Attachment Corpus is shown in ppattach-b_. @@ -218,7 +218,7 @@ Attachment Corpus is shown in ppattach-b_. That is, it includes an identifier for the original sentence, the head of the relevant verb phrase (i.e., `including`:lx:), the head of -the verb's `np`:gc: object (`three`:lx:), the preposition +the verb's ``NP`` object (`three`:lx:), the preposition (`with`:lx:), and the head noun within the prepositional phrase (`cancer`:lx:). Finally, it contains an "attachment" feature (``N`` or ``V``) to indicate whether the prepositional phrase attaches to @@ -255,8 +255,8 @@ We can access the Prepositional Phrase Attachment Corpus from NLTK as follows: >>> nltk.corpus.ppattach.tuples('training')[9] ('16', 'including', 'three', 'with', 'cancer', 'N') -If we go back to our first examples of `pp`:gc: attachment ambiguity, -it appears as though it is the `pp`:gc: itself (e.g., `with a gun`:lx: +If we go back to our first examples of ``PP`` attachment ambiguity, +it appears as though it is the ``PP`` itself (e.g., `with a gun`:lx: versus `with a telescope`:lx:) that determines the attachment. However, we can use this corpus to find examples where other factors come into play. For example, it appears that the verb is the key factor in ppattach-verb_. @@ -410,7 +410,7 @@ a compact representation in the form of a grammar. In our following discussion of grammar, we will use the following terminology. The grammar consists of productions, where each production involves a -single `non-terminal`:dt: (e.g. `s`:gc:, `np`:gc:), an arrow, and one +single `non-terminal`:dt: (e.g. ``S``, ``NP``), an arrow, and one or more non-terminals and `terminals`:dt: (e.g. `walked`:lx:). The productions are often divided into two main groups. The `grammatical productions`:dt: are those without a terminal on @@ -419,13 +419,13 @@ a terminal on the right hand side. A special case of non-terminals are the `pre-terminals`:dt:, which appear on the left-hand side of lexical productions. We will say that a grammar `licenses`:dt: a tree if each non-terminal -`x`:gc: with children `y`:gc:\ :subscript:`1` ... `y`:gc:\ :subscript:`n` +``X`` with children ``Y``\ :subscript:`1` ... 
``Y``\ :subscript:`n`
corresponds to a production in the grammar of the form:
``X`` |rarr| ``Y``\ :subscript:`1` ... ``Y``\ :subscript:`n`.

If you have experimented with the recursive descent parser, you may
have noticed that it fails to deal properly with the
following production: ``NP`` |rarr| ``NP PP``.
From a linguistic point of view, this production is perfectly respectable,
and will allow us to derive trees like this:

These occur frequently in analyses of English, and the failure of
recursive descent parsers to deal adequately with left recursion means
that we will need to find alternative approaches.

-The revised grammar for `vp`:gc: will now look like this:
+The revised grammar for ``VP`` will now look like this:

.. _subcat3:
.. ex::
 .. parsed-literal::

-    `vp`:gc: |rarr| `datv np pp`:gc:
-    `vp`:gc: |rarr| `tv np`:gc:
-    `vp`:gc: |rarr| `sv s`:gc:
-    `vp`:gc: |rarr| `iv`:gc:
+    ``VP`` |rarr| ``DATV NP PP``
+    ``VP`` |rarr| ``TV NP``
+    ``VP`` |rarr| ``SV S``
+    ``VP`` |rarr| ``IV``

-    `datv`:gc: |rarr| 'gave' | 'donated' | 'presented'
-    `tv`:gc: |rarr| 'saw' | 'kissed' | 'hit' | 'sang'
-    `sv`:gc: |rarr| 'said' | 'knew' | 'alleged'
-    `iv`:gc: |rarr| 'barked' | 'disappeared' | 'elapsed' | 'sang'
+    ``DATV`` |rarr| 'gave' | 'donated' | 'presented'
+    ``TV`` |rarr| 'saw' | 'kissed' | 'hit' | 'sang'
+    ``SV`` |rarr| 'said' | 'knew' | 'alleged'
+    ``IV`` |rarr| 'barked' | 'disappeared' | 'elapsed' | 'sang'

Notice that according to subcat3_, a given lexical item can belong to
more than one subcategory. For example, `sang`:lx: can occur both with and
without a following ``NP`` complement.
diff --git a/book/ch08.rst b/book/ch08.rst
index 55e00fd4..9f8c18f1 100644
--- a/book/ch08.rst
+++ b/book/ch08.rst
@@ -90,7 +90,7 @@ following sequence:

 .. ex:: Andre said The Jamaica Observer reported that Usain Bolt broke the 100m record
 .. ex:: I think Andre said the Jamaica Observer reported that Usain Bolt broke the 100m record

-If we replaced whole sentences with the symbol `s`:gc:, we would see patterns like
+If we replaced whole sentences with the symbol ``S``, we would see patterns like
 `Andre said S`:lx: and `I think S`:lx:. These are templates for taking a sentence and constructing a
 bigger sentence. There are other templates we can use, like `S but S`:lx:, and
 `S when S`:lx:. With a bit of ingenuity we can construct some
@@ -155,7 +155,7 @@ Here, a "language" is considered to be a possibly infinite set of
 strings, and a grammar is a formal device for "generating" the members
 of this set. It achieves this using `recursion`:dt:, with the help of
 grammar `productions`:dt: of the form
-`s`:gc: |rarr| `s`:gc: `and`:lx: `s`:gc:, as we will explore in
+``S`` |rarr| ``S`` `and`:lx: ``S``, as we will explore in
 sec-context-free-grammar_. In chap-semantics_ we will extend this, to
 automatically build up the meaning of a sentence out of the meanings of its
 parts.
@@ -286,10 +286,10 @@ Coordinate Structure:
   category *X*, then *v*\ :sub:`1` `and`:lx: *v*\ :sub:`2` is also a
   phrase of category *X*.

-Here are a couple of examples. In the first, two `np`:gc:\ s (noun
-phrases) have been conjoined to make an `np`:gc:, while in the second,
-two `ap`:gc:\ s (adjective phrases) have been conjoined to make an
-`ap`:gc:.
+Here are a couple of examples. 
In the first, two ``NP``\ s (noun +phrases) have been conjoined to make an ``NP``, while in the second, +two ``AP``\ s (adjective phrases) have been conjoined to make an +``AP``. .. _ex-coord: .. ex:: @@ -297,7 +297,7 @@ two `ap`:gc:\ s (adjective phrases) have been conjoined to make an .. ex:: On land they are (AP *slow and clumsy looking*). -What we `can't`:em: do is conjoin an `np`:gc: and an `ap`:gc:, which is +What we `can't`:em: do is conjoin an ``NP`` and an ``AP``, which is why `the worst part and clumsy looking`:lx: is ungrammatical. Before we can formalize these ideas, we need to understand the concept of `constituent structure`:dt:. @@ -374,7 +374,7 @@ bear`:lx:, but not with, say, `in`:lx:. Consequently, we assign `he`:lx:\ /`him`:lx: to the same grammatical category as `the bear`:lx:. Since members of this category often have more than one immediate constituent, we call it a `phrase`:dt:, in this case `noun -phrase`:dt: (`np`:gc:). By contrast, the categories to which individual +phrase`:dt: (``NP``). By contrast, the categories to which individual words belong are `lexical`:dt: categories. .. XXX some readers will be overwhelmed with terminology by now. @@ -386,7 +386,7 @@ words belong are `lexical`:dt: categories. .. note:: Words such as personal pronouns are an exception to this generalization. Although they are not phrasal, it is convenient to - assign them to the category `np`:gc:, and we adopt a similar + assign them to the category ``NP``, and we adopt a similar convention for proper nouns. In fig-ic-diagram-labeled_, we have added @@ -399,7 +399,7 @@ grammatical category labels to the words we saw in the earlier figure. Grammatical categories If we now strip out the words apart from the topmost row, add an -`s`:gc: node, and flip the figure over, we end up with a standard +``S`` node, and flip the figure over, we end up with a standard phrase structure tree. .. ex:: @@ -444,7 +444,7 @@ constituents at successive levels. We have presented grammatical categories as classes of word sequences that share distributional properties, and then given labels to those -categories, such as `np`:gc:, `vp`:gc: and so on. From now on, we will +categories, such as ``NP``, ``VP`` and so on. From now on, we will be more casual, and fail to draw the distinction unless the context demands it. @@ -468,11 +468,11 @@ intended to formalize a phrase structure grammar. In addition to specifying whether a string is part of the language, a CFG associates a phrase structure tree with each well-formed sentence. The notion of immediate constituent is captured by `productions`:dt: of the -grammar. An example of a production is `s`:gc: |rarr| `np vp`:gc:. -This says that a constituent `s`:gc: has immediate constituents -`np`:gc: and `vp`:gc:. The lefthand side of a production can also be a -lexical category. Similarly, the production `v`:gc: |rarr| `saw`:lx: | -`walked`:lx: means that the constituent `v`:gc: can consist of the +grammar. An example of a production is ``S`` |rarr| ``NP VP``. +This says that a constituent ``S`` has immediate constituents +``NP`` and ``VP``. The lefthand side of a production can also be a +lexical category. Similarly, the production ``V`` |rarr| `saw`:lx: | +`walked`:lx: means that the constituent ``V`` can consist of the string `saw`:lx: or `walked`:lx:. For a given phrase structure tree to be well-formed relative to a grammar, each non-terminal node and its children must correspond to a production in the grammar. 
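
To see what licensing means in practice, here is a small doctest-style
sketch. It assumes the ``nltk.parse_cfg`` helper used later in this chapter,
and that ``nltk.Tree`` accepts a bracketed string; treat the exact calls as
illustrative rather than definitive:

.. doctest-ignore::
    >>> import nltk
    >>> grammar = nltk.parse_cfg("""
    ... S -> NP VP
    ... VP -> V NP
    ... NP -> 'Alice' | 'the' N
    ... N -> 'rabbit'
    ... V -> 'chased'
    ... """)
    >>> tree = nltk.Tree("(S (NP Alice) (VP (V chased) (NP the (N rabbit))))")
    >>> # one production per local tree; all must belong to the grammar
    >>> all(p in grammar.productions() for p in tree.productions())
    True

Every local tree of the tree matches a production, so the grammar licenses it.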
@@ -483,7 +483,7 @@ A Simple Grammar
 ----------------

 Let's start off by looking at a simple context-free grammar. By
 convention, the left-hand-side of the first production is the
-`start-symbol`:dt: of the grammar, typically `s`:gc:, and all
+`start-symbol`:dt: of the grammar, typically ``S``, and all
 well-formed trees must have this symbol as their root label.
 In |NLTK|, context-free grammars are defined in the ``nltk.grammar``
 module. In code-cfg1_ we define a grammar and show how to parse a
@@ -570,12 +570,12 @@ Since our grammar licenses two trees for this sentence, the sentence
 is said to be :dt:`structurally ambiguous`. The ambiguity in question
 is called a `prepositional phrase attachment ambiguity`:idx:,
 as we saw earlier in this chapter.
 As you may recall, it is an ambiguity about attachment since the
-`pp`:gc: `in the park`:lx: needs to be attached to one of two places
-in the tree: either as a daughter of `vp`:gc: or else as a daughter of
-`np`:gc:.
-When the `pp`:gc: is attached to `vp`:gc:, the intended interpretation
+``PP`` `in the park`:lx: needs to be attached to one of two places
+in the tree: either as a daughter of ``VP`` or else as a daughter of
+``NP``.
+When the ``PP`` is attached to ``VP``, the intended interpretation
 is that the seeing event happened
-in the park. However, if the `pp`:gc: is attached to `np`:gc:,
+in the park. However, if the ``PP`` is attached to ``NP``,
 then it was the man who was in the park, and the agent of the
 seeing (the dog) might have been sitting on the balcony of an
 apartment overlooking the park.
@@ -617,7 +617,7 @@ Recursion in Syntactic Structure
--------------------------------

 A grammar is said to be :dt:`recursive` if a category occurring on the left hand
-side of a production (such as `s`:gc: in this case) also appears on
+side of a production (such as ``S`` in this case) also appears on
 the righthand side of a production. If this dual occurrence takes
 place in *one and the same production*, then we have :dt:`direct
 recursion`; otherwise we have :dt:`indirect recursion`. There is no
@@ -782,10 +782,10 @@ are the following:
   or case government).

 When we say in a phrase structure grammar that the immediate
-constituents of a `pp`:gc: are `p`:gc: and `np`:gc:, we are implicitly
+constituents of a ``PP`` are ``P`` and ``NP``, we are implicitly
 appealing to the head / dependent distinction. A prepositional phrase
-is a phrase whose head is a preposition; moreover, the `np`:gc: is a
-dependent of `p`:gc:. The same distinction carries over to the other
+is a phrase whose head is a preposition; moreover, the ``NP`` is a
+dependent of ``P``. The same distinction carries over to the other
 types of phrase that we have discussed. The key point to note here is
 that although phrase structure grammars seem very different from
 dependency grammars, they implicitly embody a recognition of
@@ -828,10 +828,10 @@ These possibilities correspond to the following productions:

 .. XXX above table is missing a caption, required for formal tables

-That is, `was`:lx: can occur with a following `Adj`:gc:, `gave`:lx:
-can occur with a following `np`:gc: and `pp`:gc:; `saw`:lx: can occur
-with a following `np`:gc: and `thought`:lx: can occur with a following
-`s`:gc:.
+That is, `was`:lx: can occur with a following ``Adj``, `gave`:lx:
+can occur with a following ``NP`` and ``PP``; `saw`:lx: can occur
+with a following ``NP`` and `thought`:lx: can occur with a following
+``S``. 
The dependents ``Adj``, ``NP``, ``PP`` and ``S`` are often called :dt:`complements` of the respective verbs and there are strong constraints on what verbs can occur with what complements. By contrast with ex-subcat1_, the strings in ex-subcat2_ are ill-formed: @@ -862,14 +862,14 @@ applicable to verbs, but also to the other classes of heads. Within frameworks based on phrase structure grammar, various techniques have been proposed for excluding the ungrammatical examples in ex-subcat2_. In a CFG, we need some way of constraining -grammar productions which expand `vp`:gc: so that verbs *only* co-occur +grammar productions which expand ``VP`` so that verbs *only* co-occur with their correct complements. We can do this by dividing the class of verbs into "subcategories", each of which is associated with a different set of complements. For example, `transitive verbs`:dt: such -as `chased`:lx: and `saw`:lx: require a following `np`:gc: -object complement; that is, they are `subcategorized`:dt: for `np`:gc: +as `chased`:lx: and `saw`:lx: require a following ``NP`` +object complement; that is, they are `subcategorized`:dt: for ``NP`` direct objects. If we introduce a new category label for transitive verbs, namely -`tv`:gc: (for Transitive Verb), then we can use it in the following productions: +``TV`` (for Transitive Verb), then we can use it in the following productions: :: @@ -877,7 +877,7 @@ direct objects. If we introduce a new category label for transitive verbs, namel TV -> 'chased' | 'saw' Now `*Joe thought the bear`:lx: is excluded since we haven't listed -`thought`:lx: as a `tv`:gc:, but `Chatterer saw the bear`:lx: is still allowed. +`thought`:lx: as a ``TV``, but `Chatterer saw the bear`:lx: is still allowed. tab-verbcat_ provides more examples of labels for verb subcategories. .. table:: tab-verbcat @@ -911,7 +911,7 @@ sentence in ex-mod_: .. ex:: Chatterer really thought Buster was angry. .. ex:: Joe really put the fish on the log. -The structural ambiguity of `pp`:gc: attachment, which we have +The structural ambiguity of ``PP`` attachment, which we have illustrated in both phrase structure and dependency grammars, corresponds semantically to an ambiguity in the scope of the modifier. @@ -993,18 +993,18 @@ Recursive Descent Parsing The simplest kind of parser interprets a grammar as a specification of how to break a high-level goal into several lower-level subgoals. -The top-level goal is to find an `s`:gc:. The `s`:gc: |rarr| `np vp`:gc: +The top-level goal is to find an ``S``. The ``S`` |rarr| ``NP VP`` production permits the parser to replace this goal with two subgoals: -find an `np`:gc:, then find a `vp`:gc:. Each of these subgoals can be -replaced in turn by sub-sub-goals, using productions that have `np`:gc: -and `vp`:gc: on their left-hand side. Eventually, this expansion +find an ``NP``, then find a ``VP``. Each of these subgoals can be +replaced in turn by sub-sub-goals, using productions that have ``NP`` +and ``VP`` on their left-hand side. Eventually, this expansion process leads to subgoals such as: find the word `telescope`:lx:. Such subgoals can be directly compared against the input string, and succeed if the next word is matched. If there is no match the parser must back up and try a different alternative. The recursive descent parser builds a parse tree during the above -process. With the initial goal (find an `s`:gc:), the `s`:gc: root node +process. With the initial goal (find an ``S``), the ``S`` root node is created. 
As the above process
recursively expands its goals using the productions of the grammar, the parse
tree is extended downwards (hence the name *recursive descent*).
We can see this in action using
@@ -1019,10 +1019,10 @@ Six stages of the execution of this
parser are shown in fig-rdparser1-6_.

 During this process, the parser is often forced to choose between several
 possible productions. For example, in going from step 3 to step 4, it
-tries to find productions with `n`:gc: on the left-hand side. The
-first of these is `n`:gc: |rarr| `man`:lx:. When this does not work
-it `backtracks`:idx:, and tries other `n`:gc: productions in order, under it
-gets to `n`:gc: |rarr| `dog`:lx:, which matches the next word in the
+tries to find productions with ``N`` on the left-hand side. The
+first of these is ``N`` |rarr| `man`:lx:. When this does not work
+it `backtracks`:idx:, and tries other ``N`` productions in order, until it
+gets to ``N`` |rarr| `dog`:lx:, which matches the next word in the
 input sentence. Much later, as shown in step 5, it finds a complete
 parse. This is a tree that covers the entire sentence, without any
 dangling edges. Once a parse has been found, we can get the parser to
@@ -1043,14 +1043,14 @@ NLTK provides a recursive descent parser:
    that it takes as it parses a text.

 Recursive descent parsing has three key shortcomings. First,
-left-recursive productions like `np`:gc: |rarr| `np pp`:gc: send it
+left-recursive productions like ``NP -> NP PP`` send it
 into an infinite loop. Second, the parser wastes a lot of time
 considering words and structures that do not correspond to the input
 sentence. Third, the backtracking process may discard parsed
 constituents that will need to be rebuilt again later. For example,
-backtracking over `vp`:gc: |rarr| `v np`:gc: will discard the subtree
-created for the `np`:gc:. If the parser then proceeds with `vp`:gc:
-|rarr| `v np pp`:gc:, then the `np`:gc: subtree must be created all
+backtracking over ``VP -> V NP`` will discard the subtree
+created for the ``NP``. If the parser then proceeds with
+``VP -> V NP PP``, then the ``NP`` subtree must be created all
 over again.

 Recursive descent parsing is a kind of `top-down parsing`:dt:.
@@ -1068,7 +1068,7 @@ In common with all bottom-up parsers, a shift-reduce parser tries to
 find sequences of words and phrases that correspond to the *right
 hand* side of a grammar production, and replace them with the
 left-hand side, until the whole sentence is reduced to
-an `s`:gc:.
+an ``S``.

 .. XXX earlier section no longer talks about stacks. Concepts of
    pushing and popping will need to be explained somewhere.
@@ -1084,7 +1084,7 @@ This operation may only be applied to the top of the stack;
 reducing items lower in the stack must be done before later items
 are pushed onto the stack. The parser finishes when all the input is
 consumed and there is only one item remaining on the stack, a parse
-tree with an `s`:gc: node as its root.
+tree with an ``S`` node as its root.
 The shift-reduce parser builds a parse tree during the above
 process. Each time it pops *n* items off the stack it combines
 them into a partial parse tree, and pushes this back on the stack.
@@ -1118,7 +1118,7 @@ parser reports the steps that it takes as it parses a text:
 A shift-reduce parser can reach a dead end and fail to find any parse,
 even if the input sentence is well-formed according to the grammar.
 When this happens, no input remains, and the stack contains items
 which cannot be reduced to an ``S``. 
The problem arises because
there are choices made earlier that cannot be undone by the parser
(although users of the graphical demonstration can undo their
choices). There are two kinds of choices to be made by the parser:
@@ -1135,8 +1135,8 @@ The advantage of shift-reduce parsers over recursive descent parsers
 is that they only build structure that corresponds to the words in the
 input. Furthermore, they only build each sub-structure once,
 e.g. ``NP(Det(the), N(man))`` is only built and pushed onto the stack
-a single time, regardless of whether it will later be used by the `vp`:gc:
-|rarr| `v np pp`:gc: reduction or the `np`:gc: |rarr| `np pp`:gc: reduction.
+a single time, regardless of whether it will later be used by the
+``VP -> V NP PP`` reduction or the ``NP -> NP PP`` reduction.

 The Left-Corner Parser
 ----------------------
@@ -1161,27 +1161,30 @@ Mary`:lx:\ :

        (VP (V saw) (NP Mary)))

-Recall that the grammar in ``grammar2`` has the following productions for expanding `np`:gc:\ :
+.. XXX the following example is stale; the grammar it refers to does not
+   have these productions; also the formatting with rarr is stale.
+
+Recall that the grammar in code-cfg2_ has the following productions for expanding ``NP``\ :

 .. ex::
    .. _ex-r1:
-   .. ex:: `np`:gc: |rarr| `dt nom`:gc:
+   .. ex:: ``NP`` |rarr| ``dt nom`` (stale!)
    .. _ex-r2:
-   .. ex:: `np`:gc: |rarr| `dt nom pp`:gc:
+   .. ex:: ``NP`` |rarr| ``dt nom pp`` (stale!)
    .. _ex-r3:
-   .. ex:: `np`:gc: |rarr| `propn`:gc:
+   .. ex:: ``NP`` |rarr| ``propn`` (stale!)

 .. XXX Following notation with DoubleRightArrow wrongly assumes this
    relation has been defined.

 Suppose we ask you to first look at tree ex-jmtree_, and then decide
-which of the `np`:gc: productions you'd want a recursive descent parser to
+which of the ``NP`` productions you'd want a recursive descent parser to
 apply first |mdash| obviously, ex-r3_ is the right choice! How do you
 know that it would be pointless to apply ex-r1_ or ex-r2_ instead?
 Because neither of these productions will derive a string whose first
 word is `John`:lx:. That is, we can easily tell that in a successful
-parse of `John saw Mary`:lx:, the parser has to expand `np`:gc: in
-such a way that `np`:gc: derives the string `John`:lx: |alpha|. More
+parse of `John saw Mary`:lx:, the parser has to expand ``NP`` in
+such a way that ``NP`` derives the string `John`:lx: |alpha|. More
 generally, we say that a category `B`:math: is a `left-corner`:dt: of a tree
 rooted in `A`:math: if `A`:math: |DoubleRightArrow|\ * `B`:math: |alpha|.
@@ -1234,10 +1237,10 @@ This approach to parsing is known as `chart parsing`:dt:. We introduce
 the main idea in this section; see the online materials available for this
 chapter for more implementation details.

-Dynamic programming allows us to build the `pp`:gc: `in my pajamas`:lx:
+Dynamic programming allows us to build the ``PP`` `in my pajamas`:lx:
 just once. The first time we build it we save it in a table, then we look it
-up when we need to use it as a subconstituent of either the object `np`:gc: or
-the higher `vp`:gc:. This table is known as a
+up when we need to use it as a subconstituent of either the object ``NP`` or
+the higher ``VP``. This table is known as a
 `well-formed substring table`:dt: (or |WFST| for short).
 We will show how to construct the |WFST| bottom-up so as to
 systematically record what syntactic constituents have been found.
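
The bookkeeping behind this is simple: the first request for a constituent
builds it and files it under its span; every later request is just a lookup.
Here is the idea in miniature, in plain Python (the span indices are invented
for illustration):

.. doctest-ignore::
    >>> saved = {}
    >>> def build_pp(span):
    ...     if span not in saved:           # only build on the first request
    ...         print "building PP for span", span
    ...         saved[span] = ('PP', 'in my pajamas')
    ...     return saved[span]
    >>> build_pp((4, 7))
    building PP for span (4, 7)
    ('PP', 'in my pajamas')
    >>> build_pp((4, 7))                    # reused, not rebuilt
    ('PP', 'in my pajamas')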
@@ -1273,7 +1276,7 @@ while the horizontal axis will denote the end position (thus
 `shot`:lx: will appear in the cell with coordinates (1, 2)).
 To simplify this presentation, we will assume each word has a unique
 lexical category, and we will store this (not the word) in the matrix.
-So cell (1, 2) will contain the entry `v`:gc:.
+So cell (1, 2) will contain the entry ``V``.
 More generally, if our input string is
 `a`:sub:`1`\ `a`:sub:`2` ... `a`:sub:`n`, and our grammar contains a
 production of the form *A* |rarr| `a`:sub:`i`, then we add *A* to
@@ -1290,7 +1293,7 @@ as a list of lists in Python, and initialize it with the lexical
 categories of each token, in the ``init_wfst()`` function in
 code-wfst_. We also define a utility function ``display()`` to
 pretty-print the |WFST| for us.
-As expected, there is a `v`:gc: in cell (1, 2).
+As expected, there is a ``V`` in cell (1, 2).

 .. pylisting:: code-wfst
    :caption: Acceptor Using Well-Formed Substring Table (based on |CYK| algorithm)

     5   .    .    .    .    .   Det   NP
     6   .    .    .    .    .    .    N

-Returning to our tabular representation, given that we have `det`:gc:
-in cell (2, 3) for the word `an`:lx:, and `n`:gc: in cell (3, 4) for the
+Returning to our tabular representation, given that we have ``Det``
+in cell (2, 3) for the word `an`:lx:, and ``N`` in cell (3, 4) for the
 word `elephant`:lx:, what should we put into cell (2, 4) for `an elephant`:lx:?
-We need to find a production of the form *A* |rarr| `det`:gc: `n`:gc:.
-Consulting the grammar, we know that we can enter `np`:gc: in cell (0,2).
+We need to find a production of the form *A* |rarr| ``Det N``.
+Consulting the grammar, we know that we can enter ``NP`` in cell (2, 4).
 More generally, we can enter *A* in `(i, j)`:math: if there is a
 production *A* |rarr| *B* *C*, and we find
@@ -1382,7 +1385,7 @@ For example, this says that since we found ``Det`` at
 productions each time we want to look up via the right hand side.

 We conclude that there is a parse for the whole input string once
-we have constructed an `s`:gc: node in cell (0, 7), showing that we
+we have constructed an ``S`` node in cell (0, 7), showing that we
 have found a sentence that covers the whole input.

 Notice that we have not used any built-in parsing functions here.
@@ -1403,9 +1406,9 @@ able to propose constituents in locations that would not be licensed
 by the grammar.

 Finally, the |WFST| did not represent the structural ambiguity in
-the sentence (i.e. the two verb phrase readings). The `vp`:gc:
-in cell (`2,8`) was actually entered twice, once for a `v np`:gc:
-reading, and once for a `vp pp`:gc: reading. These are different
+the sentence (i.e. the two verb phrase readings). The ``VP``
+in cell (`2,8`) was actually entered twice, once for a ``V NP``
+reading, and once for a ``VP PP`` reading. These are different
 hypotheses, and the second overwrote the first (as it happens
 this didn't matter since the left hand side was the same.) Chart parsers
 use a slightly richer data structure and some interesting
@@ -1538,7 +1541,7 @@ technique for mining this corpus.
 Amongst the output lines of this program we find
 ``offer-from-group N: ['rejected'] V: ['received']``,
 which indicates that `received`:lx: expects a separate
-`pp`:gc: complement, while `rejected`:lx: does not.
+``PP`` complement, while `rejected`:lx: does not.
 This information can help in developing a grammar.

Pernicious Ambiguity
@@ -1593,7 +1596,7 @@ sentence and choose the appropriate one in the context. 
It's clear that humans don't do this either!

Note that the problem is not with our choice of example.
[Church1982CSA]_ point out that the syntactic ambiguity of ``PP``
attachment in sentences like ex-pp_ also grows in proportion to the
Catalan numbers.
@@ -1819,10 +1822,10 @@ Summary
   well-formed sentence.

 * A simple top-down parser is the recursive descent parser, which recursively
-  expands the start symbol (usually `s`:gc:) with the help of the grammar
+  expands the start symbol (usually ``S``) with the help of the grammar
   productions, and tries to match the input sentence. This parser cannot
-  handle left-recursive productions (e.g., productions such as `np`:gc:
-  |rarr| `np pp`:gc:). It is inefficient in the way it blindly expands
+  handle left-recursive productions (e.g., productions such as ``NP -> NP PP``).
+  It is inefficient in the way it blindly expands
   categories without checking whether they are compatible with the input
   string, and in repeatedly expanding the same non-terminals and discarding
   the results.
@@ -1922,8 +1925,8 @@ Exercises
   of its children, plus one.)

#. |easy| Analyze the A.A. Milne sentence about Piglet, by underlining all
-   of the sentences it contains then replacing these with `s`:gc:
-   (e.g. the first sentence becomes `s`:gc: `when`:lx` `s`:gc:).
+   of the sentences it contains, then replacing these with ``S``
+   (e.g. the first sentence becomes ``S`` `when`:lx: ``S``).
   Draw a tree structure for this "compressed" sentence.
   What are the main syntactic constructions used for building such a
   long sentence?
@@ -1966,7 +1969,7 @@ Exercises
   PP``. Using the *Step* button, try to build a parse tree.
   What happens?

#. |soso| Extend the grammar in ``grammar2`` with productions that expand prepositions as
-  intransitive, transitive and requiring a `pp`:gc:
+  intransitive, transitive and requiring a ``PP``
   complement. Based on these productions, use the method of the
   preceding exercise to draw a tree for the sentence
   `Lee ran away home`:lx:\.
@@ -2009,7 +2012,7 @@ Exercises
   limit the extracted subjects to subtrees whose height is 2.

#. |soso| Inspect the Prepositional Phrase Attachment Corpus
-  and try to suggest some factors that influence `pp`:gc: attachment.
+  and try to suggest some factors that influence ``PP`` attachment.

#. |soso| In this section we claimed that there are linguistic regularities
   that cannot be described simply in terms of n-grams.
@@ -2048,12 +2051,12 @@
   and similar right hand sides can be collapsed, resulting in an equivalent
   but more compact set of rules. Write code to output a compact grammar.

-#. |hard| One common way of defining the subject of a sentence `s`:gc: in
-   English is as *the noun phrase that is the daughter of* `s`:gc: *and
-   the sister of* `vp`:gc:. Write a function that takes the tree for
+#. |hard| One common way of defining the subject of a sentence ``S`` in
+   English is as *the noun phrase that is the daughter of* ``S`` *and
+   the sister of* ``VP``. Write a function that takes the tree for
   a sentence and returns the subtree corresponding to the subject of
   the sentence. What should it do if the root node of the tree passed to
-   this function is not `s`:gc:, or it lacks a subject?
+   this function is not ``S``, or it lacks a subject?

#. |hard| Write a function that takes a grammar (such as the one defined in
   code-cfg1_) and returns a random sentence generated by the grammar.
   (One possible starting point is sketched below.)
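
For the final exercise above, here is one possible starting point |mdash| a
sketch only. It assumes the grammar interface used in this chapter
(``grammar.start()``, ``grammar.productions(lhs=...)``, ``production.rhs()``
and the ``Nonterminal`` class), and it ignores the fact that a recursive
grammar can keep this function expanding for a very long time:

.. doctest-ignore::
    >>> import random, nltk
    >>> def random_sentence(grammar, symbol=None):
    ...     if symbol is None:
    ...         symbol = grammar.start()
    ...     if not isinstance(symbol, nltk.Nonterminal):
    ...         return [symbol]             # a terminal: emit the word
    ...     production = random.choice(grammar.productions(lhs=symbol))
    ...     words = []
    ...     for sym in production.rhs():
    ...         words.extend(random_sentence(grammar, sym))
    ...     return words

Calling ``' '.join(random_sentence(grammar))`` on the grammar from code-cfg1_
should then produce a grammatical (if not necessarily sensible) sentence.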
diff --git a/book/ch09-old.rst b/book/ch09-old.rst
index d98d4886..9b189b92 100644
--- a/book/ch09-old.rst
+++ b/book/ch09-old.rst
@@ -122,7 +122,7 @@ words. We'll be a bit abstract for the moment, and call these words
 `everything`:em: about them, but we can at least give a partial
 description. For example, we know that the orthography of `a` is
 `these`:lx:, its phonological form is `DH IY Z`, its part-of-speech is
-`Det`:gc:, and its number is plural. We can use dot notation to record
+``Det``, and its number is plural. We can use dot notation to record
 these observations:

.. _ex-feat0:
.. ex::
 .. parsed-literal::

    `a.spelling` = `these`:lx:
    `a.phonology` = `DH IY Z`
-    `a.pos` = `Det`:gc:
+    `a.pos` = ``Det``
    `a.number` = plural

Thus ex-feat0_ is a `partial description` of a word; it lists some
@@ -257,13 +257,13 @@ context-free grammar. We will begin with the simple CFG in ex-agcfg0_.

.. ex::
 .. parsed-literal::

    ``S`` |rarr| ``NP VP``
    ``NP`` |rarr| ``Det N``
    ``VP`` |rarr| ``V``

    ``Det`` |rarr| 'this'
    ``N`` |rarr| 'dog'
    ``V`` |rarr| 'runs'

|nopar| Example ex-agcfg0_ allows us to generate the sentence `this dog
runs`:lx:; however, what we really want to do is also generate `these dogs
@@ -310,10 +310,10 @@ make this explicit:

.. ex::
 .. parsed-literal::

    ``N``\ [`num`:feat:\ =\ `pl`:fval:\ ]

|nopar| In ex-num0_, we have introduced some new notation which says that the
-category `N`:gc: has a `feature`:dt: called `num`:feat: (short for
+category ``N`` has a `feature`:dt: called `num`:feat: (short for
 'number') and that the value of this feature is `pl`:fval: (short for
 'plural'). We can add similar annotations to other categories, and use
 them in lexical entries:

.. ex::
 .. parsed-literal::

    ``Det``\ [`num`:feat:\ =\ `sg`:fval:\ ] |rarr| 'this'
    ``Det``\ [`num`:feat:\ =\ `pl`:fval:\ ] |rarr| 'these'
    ``N``\ [`num`:feat:\ =\ `sg`:fval:\ ] |rarr| 'dog'
    ``N``\ [`num`:feat:\ =\ `pl`:fval:\ ] |rarr| 'dogs'
    ``V``\ [`num`:feat:\ =\ `sg`:fval:\ ] |rarr| 'runs'
    ``V``\ [`num`:feat:\ =\ `pl`:fval:\ ] |rarr| 'run'

|nopar| Does this help at all? So far, it looks just like a slightly more
verbose alternative to what was specified in ex-agcfg1_. Things become
these to state constraints:

.. _ex-srule:
.. ex::
 .. parsed-literal::

    ``S`` |rarr| ``NP``\ [`num`:feat:\ =\ `?n`:math:\ ] ``VP``\ [`num`:feat:\ =\ `?n`:math:\ ]

.. _ex-nprule:
.. ex::
 .. parsed-literal::

    ``NP``\ [`num`:feat:\ =\ `?n`:math:\ ] |rarr| ``Det``\ [`num`:feat:\ =\ `?n`:math:\ ] ``N``\ [`num`:feat:\ =\ `?n`:math:\ ]

.. _ex-vprule:
.. ex::
 .. 
parsed-literal::

    ``VP``\ [`num`:feat:\ =\ `?n`:math:\ ] |rarr| ``V``\ [`num`:feat:\ =\ `?n`:math:\ ]

|nopar| We are using "`?n`:math:" as a variable over values of `num`:feat:; it can
be instantiated either to `sg`:fval: or `pl`:fval:. Its scope is
limited to individual productions. That is, within ex-srule_, for
example, `?n`:math: must be instantiated to the same constant value; we can
read the production as saying that whatever value ``NP`` takes for the feature
`num`:feat:, ``VP`` must take the same value.

In order to understand how these feature constraints work, it's
helpful to think about how one would go about building a tree. Lexical
depth one):

.. ex::
 .. tree:: (N[NUM=pl] dogs)

|nopar| Now ex-nprule_ says that whatever the `num`:feat: values of ``N`` and
``Det`` are, they have to be the same. Consequently, ex-nprule_ will
permit ex-this_ and ex-dog_ to be combined into an ``NP`` as shown in
ex-good1_ and it will also allow ex-these_ and
ex-dogs_ to be combined, as in ex-good2_. By contrast, ex-bad1_ and
ex-bad2_ are prohibited because the roots of their
constituent local trees differ in their values for the `num`:feat: feature.

 .. tree:: (NP[NUM=...] (Det[NUM=pl] these)(N[NUM=sg] dog))

Production ex-vprule_ can be thought of as saying that the `num`:feat: value of the
head verb has to be the same as the `num`:feat: value of the ``VP``
mother. Combined with ex-srule_, we derive the consequence that if the
`num`:feat: value of the subject head noun is `pl`:fval:, then so is
the `num`:feat: value of the ``VP``\ 's head verb.

.. ex::
 .. tree:: (S (NP[NUM=pl] (Det[NUM=pl] these)(N[NUM=pl] dogs))(VP[NUM=pl] (V[NUM=pl] run)))
@@ -455,12 +455,12 @@ far in this chapter, plus a couple of new ones.

    TV[TENSE=past, NUM=?n] -> 'saw' | 'liked'

|nopar| Notice that a syntactic category can have more than one feature; for example,
``V``\ [`tense`:feat:\ =\ `pres`:fval:, `num`:feat:\ =\ `pl`:fval:\ ].
In general, we can add as many features as we like.

Notice also that we have used feature variables in lexical entries as
well as grammatical productions. For example, `the`:lx: has been assigned the
-category `Det`:gc:\ [`num`:feat:\ =\ `?n`:math:]. Why is this? Well,
+category ``Det``\ [`num`:feat:\ =\ `?n`:math:]. Why is this? Well,
 you know that the definite article `the`:lx: can combine with both
 singular and plural nouns. One way of describing this would be to add
 two lexical entries to the grammar, one each for the singular and
leave the `num`:feat: value `underspecified`:dt: and letting it agree
in number with whatever noun it combines with.

A final detail about Example code-feat0cfg_ is the statement ``%start S``.
This is a "directive" that tells the parser to take ``S`` as the
start symbol for the grammar. 
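
The feature matching performed by these annotated productions is, at bottom,
unification of feature structures: two structures combine if their feature
values are compatible, and fail to combine otherwise. A minimal illustration
using NLTK's ``FeatStruct`` class (the exact display format of the output is
illustrative):

.. doctest-ignore::
    >>> import nltk
    >>> det = nltk.FeatStruct(NUM='sg')
    >>> noun = nltk.FeatStruct(NUM='sg', POS='N')
    >>> print det.unify(noun)               # compatible: the features merge
    [ NUM = 'sg' ]
    [ POS = 'N'  ]
    >>> print det.unify(nltk.FeatStruct(NUM='pl'))  # number clash: fails
    None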
In general, when we are trying to develop even a very small grammar, @@ -548,7 +548,7 @@ instantiated by constant values in the corresponding feature structure in *B'*, and these instantiated values will be used in the new edge added by the Completer. This instantiation can be seen, for example, in the edge -[`np`:gc:\ [`num`:feat:\ =\ `sg`:fval:] |rarr| PropN[`num`:feat:\ =\ `sg`:fval:] |dot|, (0, 1)] +[``NP``\ [`num`:feat:\ =\ `sg`:fval:] |rarr| PropN[`num`:feat:\ =\ `sg`:fval:] |dot|, (0, 1)] in code-featurecharttrace_, where the feature `num`:feat: has been assigned the value `sg`:fval:. @@ -584,11 +584,11 @@ features are not written `f`:feat: +, `f`:feat: `-`:math: but simply .. ex:: .. parsed-literal:: - `V`:gc:\ [`tense`:feat:\ =\ `pres`:fval:, `+aux`:feat:\ =\ `+`:math:\ ] |rarr| 'can' - `V`:gc:\ [`tense`:feat:\ =\ `pres`:fval:, `+aux`:feat:\ =\ `+`:math:\ ] |rarr| 'may' + ``V``\ [`tense`:feat:\ =\ `pres`:fval:, `+aux`:feat:\ =\ `+`:math:\ ] |rarr| 'can' + ``V``\ [`tense`:feat:\ =\ `pres`:fval:, `+aux`:feat:\ =\ `+`:math:\ ] |rarr| 'may' - `V`:gc:\ [`tense`:feat:\ =\ `pres`:fval:, `-aux`:feat: `-`:math:\ ] |rarr| 'walks' - `V`:gc:\ [`tense`:feat:\ =\ `pres`:fval:, `-aux`:feat: `-`:math:\ ] |rarr| 'likes' + ``V``\ [`tense`:feat:\ =\ `pres`:fval:, `-aux`:feat: `-`:math:\ ] |rarr| 'walks' + ``V``\ [`tense`:feat:\ =\ `pres`:fval:, `-aux`:feat: `-`:math:\ ] |rarr| 'likes' We have spoken informally of attaching "feature annotations" to syntactic categories. A more general @@ -600,9 +600,9 @@ features. Consider, for example, the object we have written as ex-ncat0_. .. ex:: .. parsed-literal:: - `n`:gc:\ [`num`:feat:\ =\ `sg`:fval:\ ] + ``N``\ [`num`:feat:\ =\ `sg`:fval:\ ] -|nopar| The syntactic category `n`:gc:, as we have seen before, provides part +|nopar| The syntactic category ``N``, as we have seen before, provides part of speech information. This information can itself be captured as a feature value pair, using `pos`:feat: to represent "part of speech": @@ -663,12 +663,12 @@ bundled together. A tiny grammar illustrating this point is shown in ex-agr2_. .. ex:: .. parsed-literal:: - `s`:gc: |rarr| `np`:gc:\ [`agr`:feat:\ =\ `?n`:fval:\ ] `vp`:gc:\ [`agr`:feat:\ =\ `?n`:fval:] - `np`:gc:\ [`agr`:feat:\ =\ `?n`:fval:] |rarr| `PropN`:gc:\ [`agr`:feat:\ =\ `?n`:fval:] - `vp`:gc:\ [`tense`:feat:\ =\ `?t`:fval:, `agr`:feat:\ =\ `?n`:fval:] |rarr| `Cop`:gc:\ [`tense`:feat:\ =\ `?t`:fval:, `agr`:feat:\ =\ `?n`:fval:] Adj + ``S`` |rarr| ``NP``\ [`agr`:feat:\ =\ `?n`:fval:\ ] ``VP``\ [`agr`:feat:\ =\ `?n`:fval:] + ``NP``\ [`agr`:feat:\ =\ `?n`:fval:] |rarr| ``PropN``\ [`agr`:feat:\ =\ `?n`:fval:] + ``VP``\ [`tense`:feat:\ =\ `?t`:fval:, `agr`:feat:\ =\ `?n`:fval:] |rarr| `Cop`:gc:\ [`tense`:feat:\ =\ `?t`:fval:, `agr`:feat:\ =\ `?n`:fval:] Adj `Cop`:gc:\ [`tense`:feat:\ =\ `pres`:fval:, `agr`:feat:\ =\ [`num`:feat:\ =\ `sg`:fval:, `per`:feat:\ =\ `3`:fval:]] |rarr| 'is' - `PropN`:gc:\ [`agr`:feat:\ =\ [`num`:feat:\ =\ `sg`:fval:, `per`:feat:\ =\ `3`:fval:]] |rarr| 'Kim' - `Adj`:gc: |rarr| 'happy' + ``PropN``\ [`agr`:feat:\ =\ [`num`:feat:\ =\ `sg`:fval:, `per`:feat:\ =\ `3`:fval:]] |rarr| 'Kim' + ``Adj`` |rarr| 'happy' .. _sec-feat-comp: @@ -1098,7 +1098,7 @@ Subcategorization In Chapter chap-parse_, we proposed to augment our category labels to represent different kinds of verb. -We introduced labels such as `iv`:gc: and `tv`:gc: for intransitive +We introduced labels such as ``IV`` and ``TV`` for intransitive and transitive verbs respectively. 
This allowed us to write productions
@@ -1106,18 +1106,18 @@ like the following:
 
 .. ex::
    .. parsed-literal::
 
-      `vp`:gc: |rarr| `iv`:gc:
-      `vp`:gc: |rarr| `tv np`:gc:
+      ``VP`` |rarr| ``IV``
+      ``VP`` |rarr| ``TV NP``
 
-|nopar| Although we know that `iv`:gc: and `tv`:gc: are two
-kinds of `v`:gc:, from a formal point of view
-`iv`:gc: has no closer relationship with `tv`:gc: than it does
-with `np`:gc:. As it stands, `iv`:gc: and `tv`:gc: are just atomic
+|nopar| Although we know that ``IV`` and ``TV`` are two
+kinds of ``V``, from a formal point of view
+``IV`` has no closer relationship with ``TV`` than it does
+with ``NP``. As it stands, ``IV`` and ``TV`` are just atomic
 nonterminal symbols from a CFG. This approach doesn't allow us
 to say anything about the class of verbs in general. For
 example, we cannot say something like "All lexical
-items of category `v`:gc: can be marked for tense", since `bark`:lx:,
-say, is an item of category `iv`:gc:, not `v`:gc:.
+items of category ``V`` can be marked for tense", since `bark`:lx:,
+say, is an item of category ``IV``, not ``V``.
 A simple solution, originally developed for a grammar framework called
 Generalized Phrase Structure Grammar (GPSG), stipulates that lexical
 categories may bear a `subcat`:feat: feature whose values are integers.
@@ -1143,10 +1143,10 @@ This is illustrated in a modified portion of Example code-feat0cfg_, shown in ex
 
    V[SUBCAT=1, TENSE=past, NUM=?n] -> 'saw' | 'liked'
    V[SUBCAT=2, TENSE=past, NUM=?n] -> 'said' | 'claimed'
 
-|nopar| When we see a lexical category like `v`:gc:\ [`subcat`:feat:
+|nopar| When we see a lexical category like ``V``\ [`subcat`:feat:
 `1`:fval:\ ], we can interpret the `subcat`:feat: specification as a
-pointer to the production in which `v`:gc:\ [`subcat`:feat: `1`:fval:\ ]
-is introduced as the head daughter in a `vp`:gc: production.
+pointer to the production in which ``V``\ [`subcat`:feat: `1`:fval:\ ]
+is introduced as the head daughter in a ``VP`` production.
 By convention, there is a one-to-one correspondence between
 `subcat`:feat: values and the productions that introduce lexical heads.
 It's worth noting that the choice of integer which acts as a value for
@@ -1154,10 +1154,10 @@ It's worth noting that the choice of integer which acts as a value for
 have chosen 3999, 113 and 57 as our three values in ex-subcatgpsg_. On
 this approach, `subcat`:feat: can *only* appear on lexical categories; it
 makes no sense, for example, to specify a `subcat`:feat: value on
-`vp`:gc:.
+``VP``.
 
 In our third class of verbs above, we have specified a category
-`s-bar`:gc:. This is a label for subordinate clauses such as the
+``S-BAR``. This is a label for subordinate clauses such as the
 complement of `claim`:lx: in the example `You claim that you like
 children`:lx:. We require two further productions to analyze such
 sentences:
@@ -1180,25 +1180,25 @@ and Head-driven Phrase Structure Grammar. Rather than using
 `subcat`:feat: values as a way of indexing productions, the
 `subcat`:feat: value directly encodes the valency of a head (the list
 of arguments that it can combine with). For example, a verb like
-`put`:lx: that takes `np`:gc: and `pp`:gc: complements (`put the
+`put`:lx: that takes ``NP`` and ``PP`` complements (`put the
 book on the table`:lx:) might be represented as ex-subcathpsg0_:
 
 .. TODO: angle brackets don't appear
 
 .. _ex-subcathpsg0:
-.. ex:: `v`:gc:\ [`subcat`:feat: |langle|\ `np`:gc:, `np`:gc:, `pp`:gc:\ |rangle| ]
+.. ex:: ``V``\ [`subcat`:feat: |langle|\ ``NP``, ``NP``, ``PP``\ |rangle| ]
 
 |nopar| This says that the verb can combine with three arguments. The
-leftmost element in the list is the subject `np`:gc:, while everything
-else |mdash| an `np`:gc: followed by a `pp`:gc: in this case |mdash| comprises the
+leftmost element in the list is the subject ``NP``, while everything
+else |mdash| an ``NP`` followed by a ``PP`` in this case |mdash| comprises the
 subcategorized-for complements. When a verb like `put`:lx: is combined
 with appropriate complements, the requirements which are specified in
-the `subcat`:feat: are discharged, and only a subject `np`:gc: is
+the `subcat`:feat: are discharged, and only a subject ``NP`` is
 needed. This category, which corresponds to what is traditionally
-thought of as `vp`:gc:, might be represented as follows.
+thought of as ``VP``, might be represented as follows.
 
 .. _ex-subcathpsg1:
-.. ex:: `v`:gc:\ [`subcat`:feat: |langle|\ `np`:gc:\ |rangle| ]
+.. ex:: ``V``\ [`subcat`:feat: |langle|\ ``NP``\ |rangle| ]
 
 Finally, a sentence is a kind of verbal category that has *no*
 requirements for further arguments, and hence has a `subcat`:feat:
@@ -1215,27 +1215,27 @@ Heads Revisited
 
 We noted in the previous section that by factoring subcategorization
 information out of the main category label, we could express more
 generalizations about properties of verbs. Another property of this
-kind is the following: expressions of category `v`:gc: are heads of
-phrases of category `vp`:gc:. Similarly (and more informally) `n`:gc:\
-s are heads of `np`:gc:\ s, `a`:gc:\
-s (i.e., adjectives) are heads of `ap`:gc:\ s, and `p`:gc:\
-s (i.e., adjectives) are heads of `pp`:gc:\ s. Not all phrases have
+kind is the following: expressions of category ``V`` are heads of
+phrases of category ``VP``. Similarly (and more informally) ``N``\
+s are heads of ``NP``\ s, ``A``\
+s (i.e., adjectives) are heads of ``AP``\ s, and ``P``\
+s (i.e., prepositions) are heads of ``PP``\ s. Not all phrases have
 heads |mdash| for example, it is standard to say that coordinate
 phrases (e.g., `the book and the bell`:lx:) lack heads |mdash|
 nevertheless, we would like our grammar formalism to express the
 mother / head-daughter relation where it holds. Now, although it looks as though there is
-something in common between, say, `v`:gc: and `vp`:gc:, this is more
-of a handy convention than a real claim, since `v`:gc: and `vp`:gc:
-formally have no more in common than `v`:gc: and `Det`:gc:.
+something in common between, say, ``V`` and ``VP``, this is more
+of a handy convention than a real claim, since ``V`` and ``VP``
+formally have no more in common than ``V`` and ``Det``.
 
 X-bar syntax (cf. [Chomsky1970RN]_, [Jackendoff1977XS]_) addresses
 this issue by abstracting out the notion of `phrasal level`:dt:. It is
-usual to recognize three such levels. If `n`:gc: represents the
-lexical level, then `n`:gc:\ ' represents the next level up,
-corresponding to the more traditional category `Nom`:gc:, while
-`n`:gc:\ '' represents the phrasal level, corresponding to the
-category `np`:gc:. (The primes here replace the typographically more
+usual to recognize three such levels. If ``N`` represents the
+lexical level, then ``N``\ ' represents the next level up,
+corresponding to the more traditional category ``Nom``, while
+``N``\ '' represents the phrasal level, corresponding to the
+category ``NP``. (The primes here replace the typographically more
 demanding horizontal bars of [Chomsky1970RN]_). ex-xbar0_ illustrates a
 representative structure.
 
 .. ex::
    .. tree:: (N''(Det a)(N'(N student)(P'' of\ French)))
 
-|nopar| The head of the structure ex-xbar0_ is `n`:gc: while `n`:gc:\ '
-and `n`:gc:\ '' are called `(phrasal) projections`:dt: of `n`:gc:. `n`:gc:\ ''
-is the `maximal projection`:dt:, and `n`:gc: is sometimes called the
+|nopar| The head of the structure ex-xbar0_ is ``N`` while ``N``\ '
+and ``N``\ '' are called `(phrasal) projections`:dt: of ``N``. ``N``\ ''
+is the `maximal projection`:dt:, and ``N`` is sometimes called the
 `zero projection`:dt:. One of the central claims of X-bar syntax is
-that all constituents share a structural similarity. Using `x`:gc: as
-a variable over `n`:gc:, `v`:gc:, `a`:gc: and `p`:gc:, we say that
+that all constituents share a structural similarity. Using ``X`` as
+a variable over ``N``, ``V``, ``A`` and ``P``, we say that
 directly subcategorized `complements`:em: of the head are always
 placed as sisters of the lexical head, whereas `adjuncts`:em: are
-placed as sisters of the intermediate category, `x`:gc:\ '. Thus, the
-configuration of the `p`:gc:\ '' adjunct in ex-xbar1_ contrasts with that
-of the complement `p`:gc:\ '' in ex-xbar0_.
+placed as sisters of the intermediate category, ``X``\ '. Thus, the
+configuration of the ``P``\ '' adjunct in ex-xbar1_ contrasts with that
+of the complement ``P``\ '' in ex-xbar0_.
 
 .. _ex-xbar1:
 .. ex::
@@ -1266,10 +1266,10 @@ using feature structures.
 
 .. ex::
    .. parsed-literal::
 
-      `s`:gc: |rarr| `n`:gc:\ [`bar`:feat:\ =\ `2`:fval:] `v`:gc:\ [`bar`:feat:\ =\ `2`:fval:]
-      `n`:gc:\ [`bar`:feat:\ =\ `2`:fval:] |rarr| `Det n`:gc:\ [`bar`:feat:\ =\ `1`:fval:]
-      `n`:gc:\ [`bar`:feat:\ =\ `1`:fval:] |rarr| `n`:gc:\ [`bar`:feat:\ =\ `1`:fval:] `p`:gc:\ [`bar`:feat:\ =\ `2`:fval:]
-      `n`:gc:\ [`bar`:feat:\ =\ `1`:fval:] |rarr| `n`:gc:\ [`bar`:feat:\ =\ `0`:fval:] `p`:gc:\ [`bar`:feat:\ =\ `2`:fval:]
+      ``S`` |rarr| ``N``\ [`bar`:feat:\ =\ `2`:fval:] ``V``\ [`bar`:feat:\ =\ `2`:fval:]
+      ``N``\ [`bar`:feat:\ =\ `2`:fval:] |rarr| ``Det`` ``N``\ [`bar`:feat:\ =\ `1`:fval:]
+      ``N``\ [`bar`:feat:\ =\ `1`:fval:] |rarr| ``N``\ [`bar`:feat:\ =\ `1`:fval:] ``P``\ [`bar`:feat:\ =\ `2`:fval:]
+      ``N``\ [`bar`:feat:\ =\ `1`:fval:] |rarr| ``N``\ [`bar`:feat:\ =\ `0`:fval:] ``P``\ [`bar`:feat:\ =\ `2`:fval:]
 
 
 Auxiliary Verbs and Inversion
@@ -1342,8 +1342,8 @@ following production:
 
    S[+inv] -> V[+AUX] NP VP
 
 |nopar| That is, a clause marked as [`+inv`:feat:] consists of an auxiliary
-verb followed by a `vp`:gc:. (In a more detailed grammar, we would
-need to place some constraints on the form of the `vp`:gc:, depending
+verb followed by a ``VP``. (In a more detailed grammar, we would
+need to place some constraints on the form of the ``VP``, depending
 on the choice of auxiliary.) ex-invtree_ illustrates the structure
 of an inverted clause.
@@ -1392,8 +1392,8 @@ Consider the following contrasts:
 
    \*You put.
 
-The verb `like`:lx: requires an `np`:gc: complement, while
-`put`:lx: requires both a following `np`:gc: and `pp`:gc:. Examples
+The verb `like`:lx: requires an ``NP`` complement, while
+`put`:lx: requires both a following ``NP`` and ``PP``. Examples
 ex-gap1_ and ex-gap2_ show that these complements are *obligatory*:
 omitting them leads to ungrammaticality.
 Yet there are contexts in which obligatory complements can be
 omitted, as ex-gap3_ and ex-gap4_
@@ -1509,9 +1509,9 @@ A variety of mechanisms have been suggested for handling unbounded
 dependencies in formal grammars; we shall adopt an approach due to
 Generalized Phrase Structure Grammar that involves something called
 `slash categories`:dt:. A slash category is something of the form
-`y/xp`:gc:; we interpret this as a phrase of category `y`:gc: that
-is missing a sub-constituent of category `xp`:gc:. For example,
-`s/np`:gc: is an `s`:gc: that is missing an `np`:gc:. The use of
+``Y/XP``; we interpret this as a phrase of category ``Y`` that
+is missing a sub-constituent of category ``XP``. For example,
+``S/NP`` is an ``S`` that is missing an ``NP``. The use of
 slash categories is illustrated in ex-gaptree1_.
 
 .. _ex-gaptree1:
@@ -1519,19 +1519,19 @@ slash categories is illustrated in ex-gaptree1_.
    .. tree:: (S(NP[+WH] who)(S[+INV]\/NP (V[+AUX,\ SUBCAT=3] do)(NP[-WH] you)(VP/NP(V[-AUX,\ SUBCAT=1] like)(NP/NP e))))
 
 |nopar| The top part of the tree introduces the filler `who`:lx: (treated as
-an expression of category `np`:gc:\ [`+wh`:feat:]) together with a
+an expression of category ``NP``\ [`+wh`:feat:]) together with a
 corresponding gap-containing constituent `s/np`:gc:. The gap information
 is then "percolated" down the tree via the `vp/np`:gc: category, until it
-reaches the category `np/np`:gc:. At this point, the dependency
+reaches the category ``NP/NP``. At this point, the dependency
 is discharged by realizing the gap information as the empty string `e`
-immediately dominated by `np/np`:gc:.
+immediately dominated by ``NP/NP``.
 
 Do we need to think of slash categories as a completely new kind of
 object in our grammars? Fortunately, no, we don't |mdash| in fact, we
 can accommodate them within our existing feature-based framework. We
 do this by treating slash as a feature, and the category to its right
 as a value. In other words, our "official" notation for `s/np`:gc:
-will be `s`:gc:\ [`slash`:feat:\ =\ `NP`:fval:\ ]. Once we have taken this
+will be ``S``\ [`slash`:feat:\ =\ `NP`:fval:\ ]. Once we have taken this
 step, it is straightforward to write a small grammar for analyzing
 unbounded dependency constructions. Example code-slashcfg_ illustrates
 the main principles of slash categories, and also includes productions for
@@ -1570,27 +1570,27 @@ The grammar in Example code-slashcfg_ contains one gap-introduction production,
 
 .. ex::
    .. parsed-literal::
 
-      `s[-inv]`:gc: |rarr| `np`:gc: `s/np`:gc:
+      ``S[-INV]`` |rarr| ``NP`` ``S/NP``
 
 In order to percolate the slash feature correctly, we need to add
 slashes with variable values to both sides of the arrow in productions
-that expand `s`:gc:, `vp`:gc: and `np`:gc:. For example,
+that expand ``S``, ``VP`` and ``NP``. For example,
 
 .. ex::
    .. parsed-literal::
 
-      `vp/?x`:gc: |rarr| `v`:gc: `s-bar/?x`:gc:
+      ``VP/?x`` |rarr| ``V`` ``S-BAR/?x``
 
-|nopar| says that a slash value can be specified on the `vp`:gc: mother of a
-constituent if the same value is also specified on the `s-bar`:gc:
-daughter. Finally, empty_ allows the slash information on `np`:gc: to
+|nopar| says that a slash value can be specified on the ``VP`` mother of a
+constituent if the same value is also specified on the ``S-BAR``
+daughter. Finally, empty_ allows the slash information on ``NP`` to
 be discharged as the empty string.
 
 .. _empty:
 .. ex::
    ..
parsed-literal:: - `np/np`:gc: |rarr| + ``NP/NP`` |rarr| Using code-slashcfg_, we can parse the string `who do you claim that you like`:lx: into the tree shown in ex-gapparse_. @@ -1972,7 +1972,7 @@ Exercises #. |hard| So-called `head features`:dt: are shared between the mother and head daughter. For example, `tense`:feat: is a head feature - that is shared between a `vp`:gc: and its head `v`:gc: + that is shared between a ``VP`` and its head ``V`` daughter. See [Gazdar1985GPS]_ for more details. Most of the features we have looked at are head features |mdash| exceptions are `subcat`:feat: and `slash`:feat:. Since the sharing of head diff --git a/book/ch09.rst b/book/ch09.rst index 8fdaf7e6..c0524857 100644 --- a/book/ch09.rst +++ b/book/ch09.rst @@ -65,6 +65,8 @@ 9. Building Feature Based Grammars ================================== +.. XXX missing standard opening + ------------ Introduction ------------ @@ -192,9 +194,11 @@ these observations: `these`:lex:.pos = Det `these`:lex:.number = plural +.. XXX second sentence below is too complex + Thus ex-feat0_ is a `partial description` of a word; it lists some -attributes, or features, of the word, and declares their values. There -are other attributes that we might be interested in, which have not +properties of the word, and declares their values. There +are other properties that we might be interested in, which have not been specified; for example, in a given grammatical context, what head the word is dependent on (using the notion of dependency discussed in chap-parse_), and what the lemma of the word is. But the omission of @@ -482,7 +486,7 @@ head verb. .. ex:: .. tree:: (S (NP[NUM=pl] (Det[NUM=pl] these)(N[NUM=pl] dogs))(VP[NUM=pl] (V[NUM=pl] run))) -ex-agcfg2_ illustrated lexical productions for determiners like `this`:lx: +Grammar ex-agcfg2_ illustrated lexical productions for determiners like `this`:lx: and `these`:lx: which require a singular or plural head noun respectively. However, other determiners in English are not choosy about the grammatical number of the noun they combine with. @@ -502,13 +506,18 @@ value to ``NUM`` is one way of achieving this result:: Det[NUM=?n] -> 'the' | 'some' | 'several' But in fact we can be even more economical, and just omit any -specification for ``NUM`` in the relevant lexical productions. We only need +specification for ``NUM`` in such productions. We only need to explicitly enter a variable value when this constrains another value elsewhere in the same production. The grammar in code-feat0cfg_ illustrates most of the ideas we have introduced so far in this chapter, plus a couple of new ones. +.. XXX name show_cfg() is idiosyncratic for something which prints a file + +.. XXX The contents of feat0.fcfg seems to have changed in the file. + I won't pull in the updated version in case the discussion also needs to be updated. + .. pylisting:: code-feat0cfg :caption: Example Feature based Grammar @@ -550,7 +559,7 @@ example, In general, we can add as many features as we like. A final detail about code-feat0cfg_ is the statement ``%start S``. -This a "directive" that tells the parser to take ``S`` as the +This "directive" tells the parser to take ``S`` as the start symbol for the grammar. In general, when we are trying to develop even a very small grammar, @@ -560,7 +569,7 @@ tested and revised. We have saved code-feat0cfg_ as a file named copy of this for further experimentation using ``nltk.data.load()``. Feature based grammars are parsed in |NLTK| using an Earley chart -parser. 
For more information about this parsing technique, see [URL].
+parser (see sec-featgram-further-reading_ for more information about this).
 After tokenizing the input, we import the ``load_earley`` function
 load_earley1_ which takes a grammar filename as input and returns a
 chart parser ``cp`` load_earley2_. Calling the parser's
@@ -594,7 +603,7 @@ is syntactically ambiguous or not.
 
 The details of the parsing procedure are not that important for
 present purposes. However, there is an implementation issue which
 bears on our earlier discussion of grammar size. One possible approach
-to parsing productions containing features constraints is to compile
+to parsing productions containing feature constraints is to compile
 out all admissible values of the features in question so that we end
 up with a large, fully specified CFG along the lines of ex-agcfg1_.
 By contrast, the parser process illustrated above works directly with the
@@ -628,16 +637,16 @@ So far, we have only seen feature values like ``sg``
 and ``pl``. These simple values are usually called `atomic`:dt: |mdash| that
 is, they can't be decomposed into subparts. A special case of atomic
 values are `boolean`:dt: values, that is, values that
-just specify whether a property is true or false of a category. For
+just specify whether a property is true or false. For
 example, we might want to distinguish `auxiliary`:dt: verbs such as
 `can`:lx:, `may`:lx:, `will`:lx: and `do`:lx: with the boolean feature
 ``AUX``. For example, the production ``V[TENSE=pres, aux=+] -> 'can'``
-means that `can`:lx receives the value ``pres`` for ``TENSE`` and
+means that `can`:lx: receives the value ``pres`` for ``TENSE`` and
 ``+`` or ``true`` for ``AUX``. There is a widely adopted convention
 which abbreviates the representation of boolean features ``f``;
 instead of ``aux=+`` or ``aux=-``, we use ``+aux`` and ``-aux``
 respectively. These are just abbreviations, however, and the
-parser should interpret them as though ``+`` and ``-`` are like any
+parser interprets them as though ``+`` and ``-`` are like any
 other atomic value. ex-lex_ shows some representative productions:
 
 .. _ex-lex:
@@ -651,22 +660,19 @@ other atomic value. ex-lex_ shows some representative productions:
 
     V[TENSE=pres, -aux] -> 'likes'
 
 We have spoken informally of attaching "feature annotations" to
-syntactic categories. A more general
-approach is to treat the whole category |mdash| that is, the
-non-terminal symbol plus the annotation |mdash| as a bundle of
-features. Consider, for example, the category ``N[NUM=sg]``
-The syntactic category ``N``, as we have seen before, provides part
-of speech information. This information can itself be captured as a
-feature value pair, using ``POS`` to represent "part of speech":
-``[POS=N, NUM=sg]``. In fact, we regard this as our "official" representation of a
-feature based linguistic category, and ``N[NUM=sg]`` as a convenient abbreviation.
+syntactic categories. More generally, we can treat the whole category
+|mdash| that is, the non-terminal symbol plus the annotation |mdash|
+as a bundle of features. For example, ``N[NUM=sg]`` contains
+part of speech information, which can be represented as
+``POS=N``. Our "official" representation for this category
+is ``[POS=N, NUM=sg]``, while ``N[NUM=sg]`` is just a convenient abbreviation.
 A bundle of feature-value pairs is called a `feature structure`:dt: or
 an `attribute value matrix`:dt: (AVM). A feature structure that
 contains a specification for the feature ``POS`` is a `linguistic
 category`:dt:.
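
Since this is the representation we will be working with, it may help to
see it realized in |NLTK|. The following is a minimal sketch using the
``FeatStruct`` constructor (presented in more detail shortly); note that
feature values can be retrieved by indexing, just like the values of a
Python dictionary:

.. doctest-ignore::
    >>> fs = nltk.FeatStruct("[POS='N', NUM='sg']")
    >>> print fs
    [ NUM = 'sg' ]
    [ POS = 'N'  ]
    >>> fs['NUM']
    'sg'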
In addition to atomic-valued features, we allow features whose values
-are themselves feature structures. For example, we might want to group
+are themselves feature structures. For example, we can group
 together agreement features (e.g., person, number and gender) as a
 distinguished part of a category as shown in ex-agr0_. In this case,
 we say that the feature ``AGR`` has a `complex`:dt: value.
@@ -681,19 +687,20 @@ we say that the feature ``AGR`` has a `complex`:dt: value.
 
    [ [NUM = pl ]]
    [ [GND = fem ]]
 
-In passing, we should point out that there are alternative approaches
-for presenting AVMs; fig-avm1_ shows an example.
-
 .. _fig-avm1:
 .. figure:: ../images/avm1.png
    :scale: 60
 
    Alternative rendering of an Attribute Value Matrix
 
+In passing, we should point out that there are alternative approaches
+for displaying AVMs; fig-avm1_ shows an example.
 Although feature structures rendered in the style of ex-agr0_ are
 less visually pleasing, we will stick with this format, since it
 corresponds to the output we will be getting from |NLTK|.
 
+.. XXX if people think of these as dictionaries there's nothing surprising about order
+
 On the topic of representation, we also note that there is no
 particular significance to the *order* of features in a feature
 structure. So ex-agr1_ is equivalent to ex-agr0_.
 
@@ -702,11 +709,11 @@ structure. So ex-agr1_ is equivalent to ex-agr0_.
 .. ex::
    ::
 
-     [agr = [num = pl ]]
-     [ [per = 3 ]]
-     [ [gnd = fem ]]
+     [AGR = [NUM = pl ]]
+     [ [PER = 3 ]]
+     [ [GND = fem ]]
      [ ]
-     [pos = N ]
+     [POS = N ]
 
 Once we have the possibility of using features like `agr`:feat:, we can
 refactor a grammar like code-feat0cfg_ so that agreement features are
@@ -777,33 +784,31 @@ An alternative method of specifying feature structures is to use a
 bracketed string consisting of feature-value pairs in the format
 ``feature=value``, where values may themselves be feature structures:
 
-   >>> nltk.FeatStruct("[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]")
-   [AGR=[GND='fem', NUM='pl', PER=3], POS='N']
-
+   >>> print nltk.FeatStruct("[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]")
+   [ [ PER = 3 ] ]
+   [ AGR = [ GND = 'fem' ] ]
+   [ [ NUM = 'pl' ] ]
+   [ ]
+   [ POS = 'N' ]
 
 Feature structures are not inherently tied to linguistic objects; they are
 general purpose structures for representing knowledge. For example, we
 could encode information about a person in a feature structure:
 
-   >>> person01 = nltk.FeatStruct(name='Lee', telno='01 27 86 42 96', age=33)
-
-.. _ex-person01:
-.. ex::
-   ::
-
-     [name = `Lee' ]
-     [telno = 01 27 86 42 96 ]
-     [age = 33 ]
-
+   >>> print nltk.FeatStruct(name='Lee', telno='01 27 86 42 96', age=33)
+   [ age = 33 ]
+   [ name = 'Lee' ]
+   [ telno = '01 27 86 42 96' ]
 
-In the next couple of pages, we are going to use examples like
-ex-person01_ to explore standard operations over feature
-structures. This might seem to be taking us away from processing
-natural language, but we need to lay the ground work before we can
+In the next couple of pages, we are going to use examples like this
+to explore standard operations over feature structures.
+This will briefly divert us from processing natural language,
+but we need to lay the groundwork before we can
 get back to talking about grammars. Hang on tight!
 
 It is often helpful to view feature structures as graphs; more
-specifically, `directed acyclic graphs`:dt: (DAGs). ex-dag01_ is equivalent to
-the AVM ex-person01_.
+specifically, `directed acyclic graphs`:dt: (DAGs).
+ex-dag01_ is equivalent to the above AVM.
 
 .. _ex-dag01:
 ..
ex:: @@ -851,16 +856,14 @@ such as ex-dag03_ are said to involve `structure sharing`:dt: or `reentrancy`:dt:. When two paths have the same value, they are said to be `equivalent`:dt:. -There are a number of notations for representing reentrancy in -matrix-style representations of feature structures. We adopt -the following convention: the first occurrence of a shared feature structure -is prefixed with an integer in parentheses, such as ``(1)``, and any -subsequent reference to that structure uses the notation +In order to indicate reentrancy in our matrix-style representations, we will +prefix the first occurrence of a shared feature structure +with an integer in parentheses, such as ``(1)``. +Any later reference to that structure will use the notation ``->(1)``, as shown below. - >>> fs = nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'], - ... SPOUSE=[NAME='Kim', ADDRESS->(1)]]""") - >>> print fs + >>> print nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'], + ... SPOUSE=[NAME='Kim', ADDRESS->(1)]]""") [ ADDRESS = (1) [ NUMBER = 74 ] ] [ [ STREET = 'rue Pascal' ] ] [ ] @@ -874,18 +877,13 @@ The bracketed integer is sometimes called a `tag`:dt: or a `coindex`:dt:. The choice of integer is not significant. There can be any number of tags within a single feature structure. - >>> fs1 = nltk.FeatStruct("[A='a', B=(1)[C='c'], D->(1), E->(1)]") - -.. _ex-reentrant02: -.. ex:: - :: - - [ A = 'a' ] - [ ] - [ B = (1) [ C = 'c' ] ] - [ ] - [ D -> (1) ] - [ E -> (1) ] + >>> print nltk.FeatStruct("[A='a', B=(1)[C='c'], D->(1), E->(1)]") + [ A = 'a' ] + [ ] + [ B = (1) [ C = 'c' ] ] + [ ] + [ D -> (1) ] + [ E -> (1) ] .. TODO following AVM doesn't currently parse @@ -988,15 +986,13 @@ Merging information from two feature structures is called [ NUMBER = 74 ] [ STREET = 'rue Pascal' ] -Unification is formally defined as a binary operation: `FS`:math:\ -:subscript:`0` |SquareIntersection| `FS`:math:\ -:subscript:`1`. Unification is symmetric, so - -.. ex:: - `FS`:math:\ :subscript:`0` |SquareIntersection| `FS`:math:\ - :subscript:`1` = `FS`:math:\ :subscript:`1` |SquareIntersection| - `FS`:math:\ :subscript:`0`. - +Unification is formally defined as a binary operation: +`FS`:math:\ :subscript:`0` |SquareIntersection| +`FS`:math:\ :subscript:`1`. +Unification is symmetric, so +`FS`:math:\ :subscript:`0` |SquareIntersection| +`FS`:math:\ :subscript:`1` = `FS`:math:\ :subscript:`1` |SquareIntersection| +`FS`:math:\ :subscript:`0`. The same is true in Python: >>> print fs2.unify(fs1) @@ -1043,43 +1039,33 @@ things become really interesting. First, let's define ex-dag04_ in Python: ... SPOUSE= [NAME=Kim, ... ADDRESS=[NUMBER=74, ... STREET='rue Pascal']]]""") - -.. _ex-unification01: -.. ex:: - :: - - [ address = [ number = 74 ] ] - [ [ street = `rue Pascal' ] ] - [ ] - [ name = `Lee' ] - [ ] - [ [ address = [ number = 74 ] ] ] - [ spouse = [ [ street = `rue Pascal' ] ] ] - [ [ ] ] - [ [ name = `Kim' ] ] + >>> print fs0 + [ ADDRESS = [ NUMBER = 74 ] ] + [ [ STREET = 'rue Pascal' ] ] + [ ] + [ NAME = 'Lee' ] + [ ] + [ [ ADDRESS = [ NUMBER = 74 ] ] ] + [ SPOUSE = [ [ STREET = 'rue Pascal' ] ] ] + [ [ ] ] + [ [ NAME = 'Kim' ] ] What happens when we augment Kim's address with a specification -for `city`:feat:? (Notice that ``fs1`` includes the whole path from the root of -the feature structure down to `city`:feat:.) +for `city`:feat:? Notice that ``fs1`` needs to include the +whole path from the root of the feature structure down to `city`:feat:. 
>>> fs1 = nltk.FeatStruct("[SPOUSE = [ADDRESS = [CITY = Paris]]]") - -ex-unification02_ shows the result of unifying ``fs0`` with ``fs1``: - -.. _ex-unification02: -.. ex:: - :: - - [ address = [ number = 74 ] ] - [ [ street = `rue Pascal' ] ] - [ ] - [ name = `Lee' ] - [ ] - [ [ [ city = `Paris' ] ] ] - [ [ address = [ number = 74 ] ] ] - [ spouse = [ [ street = `rue Pascal' ] ] ] - [ [ ] ] - [ [ name = `Kim' ] ] + >>> print fs1.unify(fs0) + [ ADDRESS = [ NUMBER = 74 ] ] + [ [ STREET = 'rue Pascal' ] ] + [ ] + [ NAME = 'Lee' ] + [ ] + [ [ [ CITY = 'Paris' ] ] ] + [ [ ADDRESS = [ NUMBER = 74 ] ] ] + [ SPOUSE = [ [ STREET = 'rue Pascal' ] ] ] + [ [ ] ] + [ [ NAME = 'Kim' ] ] By contrast, the result is very different if ``fs1`` is unified with the structure-sharing version ``fs2`` (also shown earlier as the graph @@ -1087,19 +1073,15 @@ ex-dag03_): >>> fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'], ... SPOUSE=[NAME=Kim, ADDRESS->(1)]]""") - -.. _ex-unification03: -.. ex:: - :: - - [ [ city = `Paris' ] ] - [ address = (1) [ number = 74 ] ] - [ [ street = `rue Pascal' ] ] - [ ] - [ name = `Lee' ] - [ ] - [ spouse = [ address -> (1) ] ] - [ [ name = `Kim' ] ] + >>> print fs1.unify(fs2) + [ [ CITY = 'Paris' ] ] + [ ADDRESS = (1) [ NUMBER = 74 ] ] + [ [ STREET = 'rue Pascal' ] ] + [ ] + [ NAME = 'Lee' ] + [ ] + [ SPOUSE = [ ADDRESS -> (1) ] ] + [ [ NAME = 'Kim' ] ] Rather than just updating what was in effect Kim's "copy" of Lee's address, we have now updated `both`:em: their addresses at the same time. More @@ -1107,6 +1089,8 @@ generally, if a unification involves specializing the value of some path |pi|, then that unification simultaneously specializes the value of `any path that is equivalent to`:em: |pi|. +.. XXX The ?x gets broken across lines + As we have already seen, structure sharing can also be stated using variables such as ``?x``. @@ -1123,23 +1107,21 @@ using variables such as ``?x``. - .. _sec-extending-a-feature-based-grammar: --------------------------------- Extending a Feature based Grammar --------------------------------- -In this section, we are going to use the framework established earlier -in the chapter to examine a variety of different linguistic issues. In the -process, we hope to demonstrate the advantages that flow from the -flexibility of using features in the grammar. +In this section, we return to feature based grammar and explore +a variety of linguistic issues, and demonstrate the benefits +of incorporating features into the grammar. Subcategorization ----------------- -In chap-parse_, we proposed to augment our category labels to -represent different kinds of verb, and introduced labels such as +In chap-parse_, we augmented our category labels to +represent different kinds of verb, and used the labels ``IV`` and ``TV`` for intransitive and transitive verbs respectively. This allowed us to write productions like the following: @@ -1151,26 +1133,22 @@ following: VP -> IV VP -> TV NP -Although we know that ``IV`` and ``TV`` are two -kinds of ``V``, from a formal point of view -``IV`` has no closer relationship with ``TV`` than it does -with ``NP``. As it stands, ``IV`` and ``TV`` are just atomic -nonterminal symbols from a CFG. This approach doesn't allow us -to say anything about the class of verbs in general. -For example, we cannot say something like "All lexical -items of category ``V`` can be marked for tense", since `walk`:lx:, -say, is an item of category ``IV``, not ``V``. 
-The question arises as to whether we can replace category labels such
-as ``TV`` and ``IV`` by ``V`` together with a feature that tells us
-whether the verb in question combines with (or is subcategorized for)
-a following ``NP`` object or whether it can occur without any
-complement.
+Although we know that ``IV`` and ``TV`` are two kinds of ``V``,
+they are just atomic nonterminal symbols from a CFG, as distinct
+from each other as any other pair of symbols. This notation doesn't
+let us say anything about verbs in general, e.g. we cannot say
+"All lexical items of category ``V`` can be marked for tense",
+since `walk`:lx:, say, is an item of category ``IV``, not ``V``.
+So, can we replace category labels such as ``TV`` and ``IV``
+by ``V`` along with a feature that tells us whether
+the verb combines with a following ``NP`` object
+or whether it can occur without any complement?
 
 A simple approach, originally developed for a grammar framework called
 Generalized Phrase Structure Grammar (GPSG), tries to solve this problem
 by allowing lexical categories to bear a ``SUBCAT`` which tells us what subcategorization
-class the item belongs to. While GPSG used integer values for
+class the item belongs to. While GPSG used integer values for ``SUBCAT``,
 the example below adopts more mnemonic values, namely
 ``intrans``, ``trans`` and ``clause``:
@@ -1242,15 +1220,14 @@ book on the table`:lx:) might be represented as ex-subcathpsg0_:
 
    V[SUBCAT=<NP, NP, PP>]
 
-
-This says that the verb can combine with three arguments. The
-leftmost element in the list is the subject `np`, while everything
-else |mdash| an `np` followed by a `pp` in this case |mdash| comprises the
+This says that the verb can combine with three arguments. The
+leftmost element in the list is the subject ``NP``, while everything
+else |mdash| an ``NP`` followed by a ``PP`` in this case |mdash| comprises the
 subcategorized-for complements. When a verb like `put`:lx: is combined
 with appropriate complements, the requirements which are specified in
-the ``SUBCAT`` are discharged, and only a subject `np` is
+the ``SUBCAT`` are discharged, and only a subject ``NP`` is
 needed. This category, which corresponds to what is traditionally
-thought of as `vp`, might be represented as follows.
+thought of as ``VP``, might be represented as follows.
 
 .. _ex-subcathpsg1:
 .. ex::
 
    V[SUBCAT=<NP>]
 
-Finally, a sentence is a kind of verbal category that has *no*
+Finally, a sentence is a kind of verbal category that has `no`:em:
 requirements for further arguments, and hence has a ``SUBCAT``
 whose value is the empty list. The tree ex-subcathpsg2_ shows how these
 category assignments combine in a parse of `Kim put the book on the table`:lx:.
@@ -1276,36 +1253,42 @@ We noted in the previous section that by factoring subcategorization
 information out of the main category label, we could express more
 generalizations about properties of verbs. Another property of this
 kind is the following: expressions of category ``V`` are heads of
-phrases of category ``VP``. Similarly (and more informally) ``N``\
-s are heads of ``NP``\ s, ``A``\
-s (i.e., adjectives) are heads of ``AP``\ s, and ``P``\
-s (i.e., adjectives) are heads of ``PP``\ s.
+phrases of category ``VP``. Similarly,
+``N``\ s are heads of ``NP``\ s,
+``A``\ s (i.e., adjectives) are heads of ``AP``\ s, and
+``P``\ s (i.e., prepositions) are heads of ``PP``\ s.
+Not all phrases have heads |mdash| for example, it is standard to say that coordinate phrases (e.g., `the book and the bell`:lx:) lack heads |mdash| nevertheless, we would like our grammar formalism to express the -parent / head-child -relation where it holds. Now, although it looks as though there is -something in common between, say, ``V`` and ``VP``, this is more -of a handy convention than a real claim, since ``V`` and ``VP`` -formally have no more in common than ``V`` and ``Det``. +parent / head-child relation where it holds. +At present, ``V`` and ``VP`` are just atomic symbols, and +we need to find a way to relate them using features +(as we did earlier to relate ``IV`` and ``TV``). -X-bar syntax (cf. [Chomsky1970RN]_, [Jackendoff1977XS]_) addresses +X-bar Syntax addresses this issue by abstracting out the notion of `phrasal level`:dt:. It is usual to recognize three such levels. If ``N`` represents the lexical level, then ``N``\ ' represents the next level up, corresponding to the more traditional category `Nom`, while ``N``\ '' represents the phrasal level, corresponding to the -category ``NP``. (The primes here replace the typographically more -demanding horizontal bars of [Chomsky1970RN]_). ex-xbar0_ illustrates a +category ``NP``. ex-xbar0_ illustrates a representative structure while ex-xbar01_ is the more conventional counterpart. -.. _ex-xbar0: .. ex:: - .. tree:: (N''(Det a)(N'(N student)(P'' of\ French))) + .. _ex-xbar0: + .. ex:: + .. tree:: (N''(Det a)(N'(N student)(P'' of\ French))) -.. _ex-xbar01: -.. ex:: - .. tree:: (NP(Det a)(Nom(N student)(PP of\ French))) + .. _ex-xbar01: + .. ex:: + .. tree:: (NP(Det a)(Nom(N student)(PP of\ French))) + +.. XXX The second half of the next paragraph is heavy going, for + a relatively simple idea; it would be easier to follow if + there was a diagram to demonstrate the contrast, giving + a pair of structures that are minimally different, e.g. + "put the chair on the stage" vs "saw the chair on the stage". + After this, prose could formalize the concepts. The head of the structure ex-xbar0_ is ``N`` while ``N``\ ' and ``N``\ '' are called `(phrasal) projections`:dt: of ``N``. ``N``\ '' @@ -1570,10 +1553,10 @@ dependency where there is no upper bound on the distance between filler and gap. A variety of mechanisms have been suggested for handling unbounded -dependencies in formal grammars; we shall adopt an approach due to -Generalized Phrase Structure Grammar that involves something called -`slash categories`:dt:. A slash category is something of the form -``Y/XP``; we interpret this as a phrase of category ``Y`` that +dependencies in formal grammars; here we illustrate the approach due to +Generalized Phrase Structure Grammar that involves +`slash categories`:dt:. A slash category has the form ``Y/XP``; +we interpret this as a phrase of category ``Y`` that is missing a sub-constituent of category ``XP``. For example, ``S/NP`` is an ``S`` that is missing an ``NP``. The use of slash categories is illustrated in ex-gaptree1_. @@ -1590,10 +1573,12 @@ reaches the category ``NP/NP``. At this point, the dependency is discharged by realizing the gap information as the empty string `e` immediately dominated by ``NP/NP``. +.. XXX above sentence: what empty string e? + Do we need to think of slash categories as a completely new kind of -object in our grammars? Fortunately, no, we don't |mdash| in fact, we -can accommodate them within our existing feature based framework. 
We -do this by treating slash as a feature, and the category to its right +object? Fortunately, we +can accommodate them within our existing feature based framework, +by treating slash as a feature, and the category to its right as a value. In other words, our "official" notation for ``S/NP`` will be ``S[SLASH=NP]``. Once we have taken this step, it is straightforward to write a small grammar for @@ -1602,11 +1587,18 @@ the main principles of slash categories, and also includes productions for inverted clauses. To simplify presentation, we have omitted any specification of tense on the verbs. +.. XXX the grammar doesn't use the "official" notation. Need to say + that S/NP is just an abbreviation. Seems strange to allow + unofficial notation in this formal grammar. By "official" do + we actually mean "internal"? + +.. XXX The contents of feat1.fcfg seems to have changed in the file. + I won't pull in the updated version in case the discussion also needs to be updated. .. pylisting:: code-slashcfg :caption: Grammar with productions for inverted clauses and - long-distance dependencies + long-distance dependencies, making use of slash categories >>> nltk.data.show_cfg('grammars/book_grammars/feat1.fcfg') % start S @@ -1654,10 +1646,13 @@ be discharged as the empty string. Using code-slashcfg_, we can parse the string `who do you claim that you like`:lx: +.. XXX actual output has SUBCAT=2 instead of SUBCAT='clause' + >>> tokens = 'who do you claim that you like'.split() >>> from nltk.parse import load_earley >>> cp = load_earley('grammars/book_grammars/feat1.fcfg') - >>> for tree in cp.nbest_parse(tokens): print tree + >>> for tree in cp.nbest_parse(tokens): + ... print tree (S[-INV] (NP[+WH] who) (S[+INV]/NP[] @@ -1676,12 +1671,14 @@ A more readable version of this tree is shown in ex-gapparse_. .. _ex-gapparse: .. ex:: .. tree:: (S[-INV](NP[+WH] who)(S[+INV]/NP(V[+AUX] do)(NP[-WH] you)(VP/NP(V[-AUX,\ SUBCAT=clause] claim)(SBar/NP(Comp that)(S[-INV]/NP(NP[-WH] you)(VP/NP(V[-AUX,\ SUBCAT=trans] like)(NP/NP))))))) + :scale: 60:60:50 The grammar in code-slashcfg_ will also allow us to parse sentences without gaps: >>> tokens = 'you claim that you like cats'.split() - >>> for tree in cp.nbest_parse(tokens): print tree + >>> for tree in cp.nbest_parse(tokens): + ... print tree (S[-INV] (NP[-WH] you) (VP[] @@ -1696,7 +1693,8 @@ In addition, it admits inverted sentences which do not involve `wh`:lx: constructions: >>> tokens = 'rarely do you sing'.split() - >>> for tree in cp.nbest_parse(tokens): print tree + >>> for tree in cp.nbest_parse(tokens): + ... print tree (S[-INV] (Adv[+NEG] rarely) (S[+INV] @@ -1756,6 +1754,9 @@ exceptions like `helfen`:lx: that govern the dative case: The grammar in code-germancfg_ illustrates the interaction of agreement (comprising person, number and gender) with case. +.. XXX The contents of german.fcfg seems to have changed in the file. + I won't pull in the updated version in case the discussion also needs to be updated. + .. pylisting:: code-germancfg :caption: Example Feature based Grammar @@ -1825,7 +1826,8 @@ tree for a sentence containing a verb which governs dative case. >>> tokens = 'ich folge den Katzen'.split() >>> cp = load_earley('grammars/book_grammars/german.fcfg') - >>> for tree in cp.nbest_parse(tokens): print tree + >>> for tree in cp.nbest_parse(tokens): + ... 
print tree (S[] (NP[AGR=[NUM='sg', PER=1], CASE='nom'] (PRO[AGR=[NUM='sg', PER=1], CASE='nom'] ich)) @@ -1843,7 +1845,8 @@ following parse failure: >>> tokens = 'ich folge den Katze'.split() >>> cp = load_earley('grammars/book_grammars/german.fcfg', trace=2) - >>> for tree in cp.nbest_parse(tokens): print tree + >>> for tree in cp.nbest_parse(tokens): + ... print tree |.i.f.d.K.| Scanner |[-] . . .| [0:1] 'ich' Scanner |[-] . . .| [0:1] PRO[AGR=[NUM='sg', PER=1], CASE='nom'] -> 'ich' * @@ -1920,12 +1923,21 @@ Summary variety of linguistic phenomena, including verb subcategorization, inversion constructions, unbounded dependency constructions and case government. ------------------ - Further Reading ------------------ +.. _sec-featgram-further-reading: + +--------------- +Further Reading +--------------- Consult [URL] for further materials on this chapter. +Mention Earley chart parser info as promised. + +X-bar Syntax: [Chomsky1970RN]_, [Jackendoff1977XS]_ +(The primes we use replace Chomsky's typographically more demanding horizontal bars.) + +.. XXX next sentence is ungrammatical + For more examples of feature based parsing with |NLTK|, please see the HOWTOs for feature structures, feature grammars and grammar test suites at |NLTK-HOWTO-URL|. @@ -2037,6 +2049,15 @@ Exercises .. ex:: The water is precious. .. ex:: Water is precious. +#. |easy| Write a function `subsumes()` which holds of two feature + structures ``fs1`` and ``fs2`` just in case ``fs1`` subsumes ``fs2``. + +#. |easy| Modify the grammar illustrated in ex-subcatgpsg_ to + incorporate a `bar` feature for dealing with phrasal projections. + +#. |easy| Modify the German grammar in code-germancfg_ to incorporate the + treatment of subcategorization presented in sec-extending-a-feature-based-grammar_. + #. |soso| Develop a feature based grammar that will correctly describe the following Spanish noun phrases: @@ -2064,9 +2085,6 @@ Exercises #. |soso| Develop a wrapper for the ``earley_parser`` so that a trace is only printed if the input string fails to parse. -#. |easy| Write a function `subsumes()` which holds of two feature - structures ``fs1`` and ``fs2`` just in case ``fs1`` subsumes ``fs2``. - #. |soso| Consider the feature structures shown in code-featstructures_. .. XX NOTE: This example is somewhat broken -- nltk doesn't support @@ -2107,12 +2125,6 @@ Exercises #. |soso| Ignoring structure sharing, give an informal algorithm for unifying two feature structures. -#. |easy| Modify the grammar illustrated in ex-subcatgpsg_ to - incorporate a `bar` feature for dealing with phrasal projections. - -#. |easy| Modify the German grammar in code-germancfg_ to incorporate the - treatment of subcategorization presented in sec-extending-a-feature-based-grammar_. - #. |soso| Extend the German grammar in code-germancfg_ so that it can handle so-called verb-second structures like the following: diff --git a/book/deleted b/book/deleted index f31d5241..244447cc 100644 --- a/book/deleted +++ b/book/deleted @@ -412,7 +412,7 @@ In this final section, we return to the grammar of English. We consider more syntactic phenomena that will require us to refine the productions of our phrase structure grammar. -Lexical heads other than `V`:gc: can be subcategorized for particular +Lexical heads other than ``V`` can be subcategorized for particular complements: **Nouns** @@ -424,7 +424,7 @@ complements: It has also been suggested that 'ordinary' prepositions are transitive, and that many so-called adverb are in fact intransitive -prepositions. 
For example, `towards`:lx: requires an ``NP`` complement,
 while `home`:lx: and `forwards`:lx: forbid them.
@@ -439,14 +439,14 @@
 
 Adopting this approach, we can also analyse certain prepositions as
-allowing `PP`:gc: complements:
+allowing ``PP`` complements:
 
 .. example:: Kim ran away *from the house*.
 .. example:: Lee jumped down *into the boat*.
 
-In general, the lexical categories `V`:gc:, `N`:gc:, `A`:gc: and `P`:gc: are
+In general, the lexical categories ``V``, ``N``, ``A`` and ``P`` are
 taken to be the heads of the respective phrases `VP`, `NP`,
-`AP`:gc: and `PP`:gc:. Abstracting over the identity of these phrases, we
+``AP`` and ``PP``. Abstracting over the identity of these phrases, we
 can say that a lexical category `X`:gc: is the head of its immediate
 `XP`:gc: phrase, and moreover that the complements
 `C`:subscript:`1` ... `C`:subscript:`n` of
@@ -1116,12 +1116,12 @@ We can also write derivation deriv1_ as:
 
 .. _deriv3:
 .. ex::
 
-   `np`:gc: |DoubleRightArrow| `det n pp`:gc:
+   ``NP`` |DoubleRightArrow| ``Det N PP``
    |DoubleRightArrow| `the`:lx: `n pp`:gc:
-   |DoubleRightArrow| `the dog`:lx: `pp`:gc:
+   |DoubleRightArrow| `the dog`:lx: ``PP``
    |DoubleRightArrow| `the dog`:lx: `p np`:gc:
-   |DoubleRightArrow| `the dog with`:lx: `np`:gc:
-   |DoubleRightArrow| `the dog with a`:lx: `n`:gc:
+   |DoubleRightArrow| `the dog with`:lx: ``NP``
+   |DoubleRightArrow| `the dog with a`:lx: ``N``
   |DoubleRightArrow| `the dog with a telescope`:lx:
 
 where |DoubleRightArrow| means "derives in one step".
@@ -1179,7 +1179,7 @@ shows the correspondence:
 
 One important aspect of the tabular approach to parsing can be
 seen more clearly if we look at the graph representation: given our
 grammar, there are two
-different ways to derive a top-level `vp`:gc: for the input, as shown in
+different ways to derive a top-level ``VP`` for the input, as shown in
 Table chartnp_\ (a,b). In our graph representation, we simply combine the
 two sets of edges to yield Table chartnp_\ (c).
 
 .. _chartnp:
 .. table:: chartnp
 
   +---------------------------------------------------------+
-  | a. `vp`:gc: |rarr| `v`:gc: `np`:gc:                     |
+  | a. ``VP`` |rarr| ``V`` ``NP``                           |
   |                                                         |
   | |chartnp0|                                              |
   +---------------------------------------------------------+
-  | b. `vp`:gc: |rarr| `vp`:gc: `pp`:gc:                    |
+  | b. ``VP`` |rarr| ``VP`` ``PP``                          |
   |                                                         |
   | |chartnp1|                                              |
   +---------------------------------------------------------+
 
 |nopar| However, given a |WFST| we cannot necessarily read off the justification
 for adding a particular edge. For example, in chartnp_\ (b), ``[Edge: VP, 2:8]``
-might owe its existence to a production `vp`:gc: |rarr| `v np pp`:gc:.
+might owe its existence to a production ``VP`` |rarr| ``V NP PP``.
 Unlike phrase structure trees, a |WFST| does not encode a relation of
 immediate dominance. In order to make such information available, we
 can label edges not just with a non-terminal category, but with the
@@ -1240,8 +1240,8 @@
 If `d = n`:math:, then `c`:sub:`d+1` |dots| `c`:sub:`n` is empty and
 the edge represents a complete constituent and is called a `complete
 edge`:dt:. Otherwise, the edge represents an incomplete constituent,
 and is called an `incomplete edge`:dt:. In Figure
In Figure -chart_terms_\ (a), [`vp`:gc: |rarr| `v`:gc: `np`:gc: |dot|, (1, 3)] is a -complete edge, and [`vp`:gc: |rarr| `v`:gc: |dot| `np`:gc:, (1, 2)] is +chart_terms_\ (a), [``VP`` |rarr| ``V`` ``NP`` |dot|, (1, 3)] is a +complete edge, and [``VP`` |rarr| ``V`` |dot| ``NP``, (1, 2)] is an incomplete edge. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -1306,7 +1306,7 @@ an incomplete edge. we want to study syntactic patterns, finding particular verbs in a corpus and displaying their arguments. For instance, here are some uses of the verb `gave`:lx: in the Wall Street Journal (in the Penn - Treebank corpus sample). After `np`:gc:\ -chunking, the internal + Treebank corpus sample). After ``NP``\ -chunking, the internal details of each noun phrase have been suppressed, allowing us to see some higher-level patterns:: diff --git a/book/introduction.txt b/book/introduction.txt index 98f12ae8..7dccb33f 100644 --- a/book/introduction.txt +++ b/book/introduction.txt @@ -97,13 +97,13 @@ tree will be as follows: If we assume that the input string we are trying to parse is `the cat slept`:lx:, we will succeed in identifying `the`:lx: as a word that can -belong to the category `Det`:gc:. In this case, the parser goes on to -the next node of the tree, `N`:gc:, and next input word, `cat`:lx:. However, +belong to the category ``Det``. In this case, the parser goes on to +the next node of the tree, ``N``, and next input word, `cat`:lx:. However, if we had built the same partial tree with an input string `did the cat sleep`:lx:, the parse would fail at this point, since `did`:lx: is not of -category `Det`:gc:. The parser would throw away the structure built so -far and look for an alternative way of going from the `S`:gc: node down -to a leftmost lexical category (e.g., using a rule `S`:gc: |rarr| `V NP +category ``Det``. The parser would throw away the structure built so +far and look for an alternative way of going from the ``S`` node down +to a leftmost lexical category (e.g., using a rule ``S`` |rarr| `V NP VP`:gc:). The important point for now is not the details of this or other parsing algorithms; we discuss this topic much more fully in the diff --git a/definitions.rst b/definitions.rst index 1bba0af3..019ed58b 100644 --- a/definitions.rst +++ b/definitions.rst @@ -210,8 +210,8 @@ :class: emphasis .. Grammatical Category - e.g. NP and verb as technical terms - .. role:: gc - :class: category + .. role:: gc + :class: category .. Math expression - e.g. especially for variables .. role:: math