# COLX 535 Lab Assignment 2: Working with CFG Parsers (Cheat sheet)

## Assignment Objectives

History of constituent parsers:


- Magerman parser {magerman:1995:ACL}
- Charniak parser {charniak:1996,charniak-goldwater-johnson:1998}
- Johnson parser {johnson:1998:CL}
- Collins parser {collins:1996:ACL,collins:1999} $\rightarrow$ Bikel parser {bikel:2004,bikel:2004:CL}
- Stanford parser {klein-manning:2003:ACL,klein-manning:2003:HLT-NAACL,klein-manning:2003:NIPS} $\leftarrow$ **Lab2**
- (statistical) Berkeley parser {petrov-EtAl:2006:COLACL,petrov-klein:2007:main}
- (neural) Berkeley parser {kitaev-klein:2018:ACL,kitaev-cao-klein:2019:ACL}

In this assignment you will:
- Use the Stanford CoreNLP parser to parse new text into constituency trees
- Create a parsing gold standard and use it to evaluate parsers
- Build a context-free grammar from existing parses (optional assignment)

## Getting Started

This assignment requires the Stanford parser. First, make sure you have a recent version of [Java](https://www.java.com/en/download/), then download the [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/) package. Get version 4.5.1 (or at least 4.2.0 or later).

To load Stanford CoreNLP from Python, change the `coreNLP_dir` variable in the code below to the directory where you unzipped Stanford CoreNLP. You can follow this [tutorial](https://bbengfort.github.io/snippets/2018/06/22/corenlp-nltk-parses.html). Once the CoreNLP server is running, you can access it through NLTK.

It may take a few seconds or up to a minute to start the server.

```python
# from nltk.parse.corenlp import CoreNLPServer
# import os
# import time


# coreNLP_dir = "/Users/jungyeul/Downloads/mds-cl-2022-23-block4/stanford-corenlp-4.5.1/" # Change this to your coreNLP directory

# server = CoreNLPServer(
#    os.path.join(coreNLP_dir, "stanford-corenlp-4.5.1.jar"),
#    os.path.join(coreNLP_dir, "stanford-corenlp-4.5.1-models.jar")    
# )
# server.start()
```

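Instead of running the code above, start the CoreNLP server from a terminal. Run the following command from the directory where you unzipped CoreNLP (`-cp "*"` puts all the jar files in the current directory on the classpath):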

`java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000`

Try to parse a sentence to make sure that everything works.

```python
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser()  # connects to http://localhost:9000 by default
```

```python
parse = next(parser.raw_parse("I put the book in the box on the table."))
print(parse)
```

```
(ROOT
  (S
    (NP (PRP I))
    (VP
      (VBD put)
      (NP (DT the) (NN book))
      (PP (IN in) (NP (DT the) (NN box)))
      (PP (IN on) (NP (DT the) (NN table))))
    (. .)))
```


```python
list(parser.raw_parse('I put the book in the box on the table.'))[0].pretty_print()
```

```
                         ROOT                              
                          |                                 
                          S                                
  ________________________|______________________________   
 |                        VP                             | 
 |    ____________________|________________              |  
 |   |       |            PP               PP            | 
 |   |       |         ___|____         ___|___          |  
 NP  |       NP       |        NP      |       NP        | 
 |   |    ___|___     |    ____|___    |    ___|____     |  
PRP VBD  DT      NN   IN  DT       NN  IN  DT       NN   . 
 |   |   |       |    |   |        |   |   |        |    |  
 I  put the     book  in the      box  on the     table  . 
```



```python
parse = next(parser.raw_parse("They gave me yellow and blue pants"))
print(parse)
```

```
(ROOT
  (S
    (NP (PRP They))
    (VP
      (VBD gave)
      (NP (PRP me))
      (NP (ADJP (JJ yellow) (CC and) (JJ blue)) (NNS pants)))))
```


```python
# list(ROOT)[0] is ROOT's first child (the S node), so ROOT is not drawn
list(next(parser.raw_parse("They gave me yellow and blue pants")))[0].pretty_print()
```

```
      S                                 
  ____|_________                         
 |              VP                      
 |     _________|__________              
 |    |    |               NP           
 |    |    |           ____|_________    
 NP   |    NP        ADJP            |  
 |    |    |     _____|________      |   
PRP  VBD  PRP   JJ    CC       JJ   NNS 
 |    |    |    |     |        |     |   
They gave  me yellow and      blue pants
```



```python
list(parser.raw_parse("They gave me yellow and blue pants"))[0].pretty_print()
```

```
     ROOT                               
      |                                  
      S                                 
  ____|_________                         
 |              VP                      
 |     _________|__________              
 |    |    |               NP           
 |    |    |           ____|_________    
 NP   |    NP        ADJP            |  
 |    |    |     _____|________      |   
PRP  VBD  PRP   JJ    CC       JJ   NNS 
 |    |    |    |     |        |     |   
They gave  me yellow and      blue pants
```



If you started the server from Python with `CoreNLPServer`, run the code below to shut it down once you have finished with it; if you started it from the terminal, stop it with Ctrl+C in that terminal. A server left running in the background keeps its port (9000) occupied, so you may not be able to start another instance until you kill the old process or restart your computer.

If you forget to stop the server, you will get an error the next time you try to launch it. In this case, kill the old server manually: run `ps -ax | grep stanford` on the command line (at least on macOS and Linux) to find the process ID of the server, e.g. 11111, and then run `kill -9 11111`. After that you should be able to start the server again.

```python
try:
    server.stop()
except Exception:
    pass  # e.g. NameError when the server was started from the terminal
```

Other things you'll need:

```python
import nltk
from nltk.tree import Tree
from nltk.grammar import CFG,Nonterminal,Production,FeatureGrammar
from nltk.parse import CoreNLPParser
from nltk.corpus import brown

nltk.download('movie_reviews')
```

```
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/jungyeul/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!

True
```

## Tidy Submission

rubric={mechanics:1}

To get the marks for tidy submission:

- Submit the assignment by filling in this Jupyter notebook with your answers embedded
- Be sure to follow the [general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions)

### Exercise 1: Test the parser

#### 1.1
rubric={accuracy:2}

Get the Stanford parser working, then parse the first 20 sentences of the NLTK `movie_reviews` corpus, and report the average depth (height) of the parse trees. If you find the parser is failing to parse something, you can skip over it.

You should keep the tokenization of the `movie_reviews` corpus. You can use `parser.parse()` to parse the tokenized input sentences. Note that `parser.parse()` returns an iterator over possible parses; there may be several if the sentence is ambiguous. Compute your statistics on the first parse returned by `parser.parse()`.
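A minimal sketch of the statistic itself, assuming NLTK's `Tree.height()` is the intended measure of depth. The two toy trees are hypothetical stand-ins for parser output; in the exercise the trees would come from `next(parser.parse(tokens))` applied to the tokenized sentences in `movie_reviews.sents()`.

```python
from nltk.tree import Tree

# Hypothetical toy trees standing in for parser output.
trees = [
    Tree.fromstring("(ROOT (S (NP (NNS Dogs)) (VP (VBP bark)) (. .)))"),
    Tree.fromstring(
        "(ROOT (S (NP (PRP I)) (VP (VBD put) (NP (DT the) (NN book))) (. .)))"
    ),
]

# Tree.height() counts the nodes on the longest root-to-leaf path, with the
# word itself counting as 1, so a preterminal such as (NN book) has height 2.
heights = [tree.height() for tree in trees]
average_height = sum(heights) / len(heights)
print(heights, average_height)  # prints [5, 6] 5.5
```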

```python
from nltk.corpus import movie_reviews

# your code here
```


### Exercise 2: Build a gold standard

A common way to build a treebank is to have an automatic parser produce an initial parse and then have human annotators correct its errors in a second pass, rather than building each tree from scratch. That's what you're going to do in this exercise.

We will use the following three sentences from the NLTK `movie_reviews` corpus to build a mini-treebank:

```python
corpus = [['oh', ',', 'and', 'by', 'the', 'way', ',', 'this', 'is', 'not', 'a', 'horror', 'or', 'teen', 'slasher', 'flick', '.'],
          ['little', 'do', 'they', 'know', 'the', 'power', 'within', '.'],
          ['so', ',', 'if', 'robots', 'and', 'body', 'parts', 'really', 'turn', 'you', 'on', ',', 'here', "'", 's', 'your', 'movie', '.']]
```

In each of the three sections below you will see a parse tree produced by `CoreNLPParser`. Each tree contains at least one parse error. You will see a brief informal description of the error, and your task is to fix the tree.

Create an NLTK Tree corresponding to the correct parse, which should be appended to the `gold_standard_parses` list below. You can do this manually by printing the tree, creating a triple-quoted string, modifying it, and converting it back into a `Tree` using the function `Tree.fromstring` following the example below. If you are unsure exactly how to correct something, read through the lecture slides. Many common parse errors are explained there.
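Besides editing the bracketed string, you can also fix an `nltk.Tree` programmatically, since a `Tree` is a Python list of its children plus a label. A sketch on hypothetical toy trees (not the three corpus sentences): the first fix relabels a phrase, the second repairs a PP-attachment error by moving the PP under a new NP.

```python
from nltk.tree import Tree

# 1) Wrong phrase label: relabel the second child of S from NP to VP.
err_tree = Tree.fromstring("(S (NP (NNS Dogs)) (NP (VBN bark)) (. .))")
fixed = err_tree.copy(deep=True)  # leave the erroneous tree untouched
fixed[1].set_label("VP")
print(fixed)

# 2) PP attachment (hypothetical sentence): the PP should modify "the man",
#    not the verb, so move it from the VP into a new NP -> NP PP node.
tree = Tree.fromstring(
    "(S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN man))"
    " (PP (IN with) (NP (DT a) (NN hat)))))"
)
vp = tree[1]
pp = vp.pop()                    # Tree subclasses list: remove the last child
vp[1] = Tree("NP", [vp[1], pp])  # wrap the object NP and the PP together
print(tree)
```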

```python
# Example:

# The second phrase should be a VP, not an NP
err_tree_str = '''(S
(NP (NNS Dogs)) 
(NP (VBN bark))
(. .))
'''

corr_tree_str = '''(S
(NP (NNS Dogs)) 
(VP (VBN bark))
(. .))
'''

corr_tree = Tree.fromstring(corr_tree_str)
print(corr_tree)
```

```
(S (NP (NNS Dogs)) (VP (VBN bark)) (. .))
```


```python
corr_tree_root = Tree.fromstring("(ROOT " + corr_tree_str + ")")

list(corr_tree_root)[0].pretty_print()
```

```
      S      
  ____|____   
 NP   VP   | 
 |    |    |  
NNS  VBN   . 
 |    |    |  
Dogs bark  . 
```



```python
# Store your corrected nltk.Tree objects in this list
gold_standard_parses = []
```

#### Sentence 1
rubric={accuracy:2}

```python
# The word "flick" should be modified by the entire noun phrase 
# "a horror or teen slasher" instead of just "teen slasher". A noun 
# phrase which modifies a noun is labeled as NML.
err_tree_str = '''(ROOT
  (S
    (INTJ (UH oh))
    (, ,)
    (CC and)
    (PP (IN by) (NP (DT the) (NN way)))
    (, ,)
    (NP (DT this))
    (VP
      (VBZ is)
      (RB not)
      (NP
        (NP (DT a) (NN horror))
        (CC or)
        (NP (NML (NN teen) (NN slasher)) (NN flick))))
    (. .)))'''

# your code here


print("err_tree_str:")
list(Tree.fromstring(err_tree_str))[0].pretty_print()
gold_standard_parses.append(Tree.fromstring(corr_tree_str))
```

```
err_tree_str:
[pretty-printed S tree; output truncated]
```

#### Sentence 2
rubric={accuracy:2}

```python
# The PP "within" should attach to the NP "the power", not to the VP "know the power". 
err_tree_str = '''(ROOT
  (S
    (NP (RB little))
    (VP
      (VBP do)
      (SBAR
        (S
          (NP (PRP they))
          (VP (VBP know) (NP (DT the) (NN power)) (PP (IN within))))))
    (. .)))'''

# your code here


print("err_tree_str:")
list(Tree.fromstring(err_tree_str))[0].pretty_print()
gold_standard_parses.append(Tree.fromstring(corr_tree_str))
```

```
err_tree_str:
                 S                           
   ______________|_________________________   
  |         VP                             | 
  |      ___|____                          |  
  |     |       SBAR                       | 
  |     |        |                         |  
  |     |        S                         | 
  |     |    ____|____                     |  
  |     |   |         VP                   | 
  |     |   |     ____|______________      |  
  NP    |   NP   |        NP         PP    | 
  |     |   |    |     ___|____      |     |  
  RB   VBP PRP  VBP   DT       NN    IN    . 
  |     |   |    |    |        |     |     |  
little  do they know the     power within  . 
```



#### Sentence 3
rubric={accuracy:2}

```python
# There are several errors. "body" and "parts" should form an NP. 
# Moreover, "here's your movie" should form a clause and "s" is in 
# fact a verb.  
err_tree_str = '''(ROOT
  (SINV
    (ADVP (RB so))
    (, ,)
    (PP
      (IN if)
      (NP (NML (NNS robots) (CC and) (NN body)) (NNS parts)))
    (ADVP (RB really))
    (VP
      (VBP turn)
      (NP (NP (PRP you)) (PP (IN on) (, ,) (NP (RB here))) ('' '))
      (S (NP (POS s))))
    (NP (PRP$ your) (NN movie))
    (. .)))'''

# your code here


print("err_tree_str:")
list(Tree.fromstring(err_tree_str))[0].pretty_print()
gold_standard_parses.append(Tree.fromstring(corr_tree_str))
```

```
err_tree_str:
[pretty-printed SINV tree; output truncated]
```

### Exercise 3: Evaluating parsers

Now that we have a gold standard, we can use it to evaluate parser output. 

#### 3.1
rubric={accuracy:3,quality:1}

Start by writing a function, `get_constituents`, which takes a parse tree and returns a set of tuples of the form (*label*, *start*, *end*), one for each constituent (phrase) in the sentence. *label* is the label of the constituent, *start* is the index of its first word, and *end* is one past the index of its last word (so an NP *the mouse* at the beginning of a sentence is `("NP", 0, 2)`).

Do **not** include preterminal (part-of-speech) nodes of the form `(TAG word)`, like `(VBD ate)`. We want to evaluate the parser only on actual phrases.

**HINT:** You may want to use recursion to solve this exercise.
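One possible recursive sketch (an illustration, not the official solution): it walks the children left to right, advances a word counter by the number of leaves in each child, and records `(label, start, end)` only for nodes that have at least one `Tree` child, which skips preterminals. The repeated `leaves()` calls make it quadratic in the worst case, which is fine for sentence-length trees.

```python
from nltk.tree import Tree

def get_constituents(tree, start_index=0):
    """Return {(label, start, end)} for every phrasal node of `tree`.

    `end` is exclusive. Preterminals such as (VBD ate), whose children are
    all plain strings, are not included.
    """
    constituents = set()
    position = start_index
    for child in tree:
        if isinstance(child, Tree):
            constituents |= get_constituents(child, position)
            position += len(child.leaves())
        else:
            position += 1  # a bare word
    is_preterminal = all(not isinstance(child, Tree) for child in tree)
    if not is_preterminal:
        constituents.add((tree.label(), start_index, position))
    return constituents

example = Tree.fromstring("(S (NP (DET the) (NP (NN cat) (CC and) (NN dog))) (VP (VBD fought)))")
print(sorted(get_constituents(example)))
```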

```python
def get_constituents(tree,start_index=0):
    constituents = set()

    # your code here
    
    return constituents
```

```python
tree = Tree.fromstring('''(S (NP (DT the) (DT mouse)) (VP (VBD ate) (NP (NP (DT the) (DT mouse)) (POS 's) (NN cheese))) )''')
assert get_constituents(tree) == {("S",0,7), ("NP",0,2), ("VP",2,7),("NP",3,7),("NP",3,5)}
tree = Tree.fromstring('''(S (NP (DET the) (NP (NN cat) (CC and) (NN dog))) (VP (VBD fought)))''')
assert get_constituents(tree) == {("S",0,5), ("NP",0,4), ("NP",1,4),("VP",4,5)}
print("Success!")
```

```
Success!
```


#### 3.2
rubric={accuracy:2,efficiency:1}

Write a function `parse_f1` which uses `get_constituents` to implement the constituent F-score measure discussed in the lecture and reading. It should take two lists, a list of proposed parses and a corresponding list of gold-standard parses, and return an F-score reflecting how closely the proposed parses match. For full points, keep running counts of true positives, false positives and false negatives over the entire set rather than averaging the F-score across individual sentences.

**Hint:** to get the efficiency point, take advantage of Python's fast set operations.


$\text{precision} = \displaystyle\frac{|\text{relevant constituents} \cap \text{retrieved constituents}|}{|\text{retrieved constituents}|} = \displaystyle\frac{\text{tp}}{\text{tp + fp}}$

$\text{recall} = \displaystyle\frac{|\text{relevant constituents} \cap \text{retrieved constituents}|}{|\text{relevant constituents}|} = \displaystyle\frac{\text{tp}}{\text{tp + fn}}$

$F_1  = 2 \cdot \displaystyle\frac{\text{precision} \cdot \text{recall}}{\text{precision + recall}}$
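A sketch of the micro-averaged computation, assuming the per-sentence constituent sets have already been extracted (for instance with `get_constituents`). `f1_from_constituent_sets` is a hypothetical helper; `parse_f1` could build the two lists of sets and delegate to it. The counts accumulate over the whole corpus before precision and recall are computed, and set intersection does the matching.

```python
def f1_from_constituent_sets(proposed_sets, gold_sets):
    """Corpus-level (micro-averaged) constituent F1.

    proposed_sets, gold_sets: parallel lists with one set of
    (label, start, end) tuples per sentence.
    """
    tp = fp = fn = 0
    for proposed, gold in zip(proposed_sets, gold_sets):
        matched = len(proposed & gold)   # constituents in both sets
        tp += matched
        fp += len(proposed) - matched    # proposed but not in gold
        fn += len(gold) - matched        # in gold but not proposed
    if tp == 0:
        return 0.0                       # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# The constituent sets of the two test sentences used in this exercise.
sent1 = {("NP", 0, 2), ("VP", 2, 7), ("NP", 3, 7), ("S", 0, 7), ("NP", 3, 5)}
sys2 = {("NP", 0, 2), ("VP", 2, 3), ("S", 0, 3)}
gold2 = {("NP", 0, 1), ("NP", 2, 3), ("VP", 1, 3), ("S", 0, 3)}
score = f1_from_constituent_sets([sent1, sys2], [sent1, gold2])
print(round(score, 3))  # prints 0.706
```

Here tp = 6, fp = 2 and fn = 3, so precision is 0.75, recall is 2/3 and F1 is 12/17.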



Reference implementations of constituent (PARSEVAL) scoring:
- `EVALB`: https://nlp.cs.nyu.edu/evalb/
- `EVALB_SPMRL`: https://github.com/nikitakit/self-attentive-parser/tree/master/EVALB_SPMRL (from Berkeley Neural Parser)


```python
def parse_f1(proposed_parses,gold_parses):
    f1score = 0

    # your code here
    
    return f1score
```

```python
print("gold")
tree = Tree.fromstring('''(ROOT (S (NP (NNS mice)) (VP (VBD love) (NP (NNS ducks)))) )''')
list(tree)[0].pretty_print()

print("proposed")
tree = Tree.fromstring('''(ROOT (S (NP (NNS mice) (NN love)) (VP (VBZ ducks))) )''')
list(tree)[0].pretty_print()
```

```
gold
      S            
  ____|____         
 |         VP      
 |     ____|____    
 NP   |         NP 
 |    |         |   
NNS  VBD       NNS 
 |    |         |   
mice love     ducks

proposed
          S        
       ___|_____    
      NP        VP 
  ____|___      |   
NNS       NN   VBZ 
 |        |     |   
mice     love ducks
```



```python
gold_parses = [Tree.fromstring('''(S (NP (DT the) (DT mouse)) (VP (VBD ate) (NP (NP (DT the) (DT mouse)) (POS 's) (NN cheese))) )'''), Tree.fromstring('''(S (NP (NNS mice)) (VP (VBD love) (NP (NNS ducks))))''')]
proposed_parses = [Tree.fromstring('''(S (NP (DT the) (DT mouse)) (VP (VBD ate) (NP (NP (DT the) (DT mouse)) (POS 's) (NN cheese))) )'''), Tree.fromstring('''(S (NP (NNS mice) (NN love)) (VP (VBZ ducks)))''')]
assert 0.71> parse_f1(proposed_parses,gold_parses) > 0.7
print("Success!")
```

```
 sys:  {('NP', 0, 2), ('VP', 2, 7), ('NP', 3, 7), ('S', 0, 7), ('NP', 3, 5)}
gold:  {('NP', 0, 2), ('VP', 2, 7), ('NP', 3, 7), ('S', 0, 7), ('NP', 3, 5)}
 sys:  {('NP', 0, 2), ('VP', 2, 3), ('S', 0, 3)}
gold:  {('NP', 0, 1), ('NP', 2, 3), ('VP', 1, 3), ('S', 0, 3)}
tp =  6
fp =  2
fn =  3
precision =  0.75
recall =  0.6666666666666666
Success!
```


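$\text{precision} = \displaystyle\frac{|\text{relevant constituents} \cap \text{retrieved constituents}|}{|\text{retrieved constituents}|} = \displaystyle\frac{6}{6+2} = 0.75$, where the number of retrieved constituents (the total size of the `sys` sets) is 8.

$\text{recall} = \displaystyle\frac{|\text{relevant constituents} \cap \text{retrieved constituents}|}{|\text{relevant constituents}|} = \displaystyle\frac{6}{6+3} \approx 0.667$, where the number of relevant constituents (the total size of the `gold` sets) is 9.

$F_1 = 2 \cdot \displaystyle\frac{\frac{3}{4} \cdot \frac{2}{3}}{\frac{3}{4} + \frac{2}{3}} = \displaystyle\frac{12}{17} \approx 0.706$, which satisfies the assertion `0.71 > parse_f1(...) > 0.7`.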



#### 3.3

rubric={accuracy:1}

Parse your three example sentences from Exercise 2 using `CoreNLPParser` (the sentences are in the list `corpus`), extract the constituents, and evaluate the result against `gold_standard_parses`.

**Hint:** Your F1-score should be > 0.6 (the actual score may depend a bit on your version of the CoreNLP parser).

```python
parsed_corpus = ...
print("Parser f1-score: %.2f" % parse_f1(parsed_corpus, gold_standard_parses))
```

```
Parser f1-score: 0.68
```


#### 3.4

rubric={reasoning:1}

Given the way we built our gold standard, do you think this is a valid indication of parser quality? Why or why not? What if we tested the parser on the Penn Treebank corpus instead?

YOUR ANSWER HERE



### Exercise 4: Generate a grammar

#### 4.1
rubric={accuracy:1}

Parse trees implicitly contain the production rules for a CFG defined by the productions which are present in the parse tree. You can access these productions using the member function `nltk.Tree.productions()`. 

Produce a grammar corresponding to the CFG productions in your three sentences from Exercise 2, and print it out.
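A sketch of collecting the rules, using two hypothetical toy trees in place of `gold_standard_parses`. Because NLTK `Production` objects are hashable, adding them to a set removes rules that several sentences share (here `S -> NP VP .`, `NP -> NNS`, `VP -> VBP` and `. -> '.'`).

```python
from nltk.tree import Tree

# Hypothetical toy trees; in the exercise, iterate over gold_standard_parses.
trees = [
    Tree.fromstring("(S (NP (NNS Dogs)) (VP (VBP bark)) (. .))"),
    Tree.fromstring("(S (NP (NNS Cats)) (VP (VBP purr)) (. .))"),
]

rules = set()
for tree in trees:
    # productions() yields one Production per internal node, in pre-order
    rules.update(tree.productions())

for rule in sorted(rules, key=str):
    print(rule)
```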

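```python
corr_tree_str = '''(S
(NP (NNS Dogs)) 
(VP (VBN bark))
(. .))
'''
tree = Tree.fromstring(corr_tree_str)
print("tree:")
tree.pretty_print()
print("rules:", *tree.productions(), sep="\n")
```

```
tree:
      S      
  ____|____   
 NP   VP   | 
 |    |    |  
NNS  VBN   . 
 |    |    |  
Dogs bark  . 

rules:
S -> NP VP .
NP -> NNS
NNS -> 'Dogs'
VP -> VBN
VBN -> 'bark'
. -> '.'
```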


```python
rules = set()

# your code here


for rule in rules:
    print(rule)
```

#### 4.2
rubric={accuracy:1}

Show that the rules from 4.1 are sufficient to parse the sentences in the list `corpus`, using NLTK's `EarleyChartParser`. Print the number of parses for each sentence. Don't print the parses themselves: there might be many of them and you could crash your notebook (this depends on how you fixed the parse trees in Exercise 2). For the same reason, set the `trace` keyword argument of the parser to 0.

If you have individual sentences which are taking longer than 2 minutes to parse, you can skip over them.
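A sketch of the check on a hypothetical one-sentence example. Building the `CFG` directly from `Production` objects (rather than through `CFG.fromstring`) lets symbols such as `.` stay as nonterminals. `count_parses` is a hypothetical helper that consumes the parse iterator without storing or printing the trees.

```python
from nltk.grammar import CFG, Nonterminal
from nltk.parse import EarleyChartParser
from nltk.tree import Tree

tree = Tree.fromstring("(S (NP (NNS Dogs)) (VP (VBP bark)) (. .))")

# Start symbol S plus every rule read off the tree.
grammar = CFG(Nonterminal("S"), tree.productions())
parser = EarleyChartParser(grammar, trace=0)

def count_parses(tokens):
    """Count parses of a token list without keeping the trees in memory."""
    return sum(1 for _ in parser.parse(tokens))

print(count_parses(["Dogs", "bark", "."]))  # prints 1
```

`parser.parse` raises a `ValueError` if a token is not covered by the grammar, so every word in `corpus` needs a lexical rule.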

```python
from nltk import EarleyChartParser
from nltk.grammar import CFG

# your code here
```

#### 4.3 (Optional)
rubric={accuracy:2}

Convert your CFG into a feature grammar and implement noun-verb agreement (use the feature values `3SG` and `NON3SG`). Make sure that all S, NP and VP rules use agreement features.

```
S -> NP[NUM=SG] VP[NUM=SG] "."
NP[NUM=SG] -> DT[NUM=SG] NN
DT[NUM=SG] -> 'a'
NN -> 'dog'
VP[NUM=SG] -> VBZ[NUM=SG]
VBZ[NUM=SG] -> 'barks'
```

Give an example sentence that displays noun-verb agreement, and show that your feature grammar can parse it. Then create a version of the same sentence without proper agreement, and show that the number of parses for this sentence is lower (possibly zero).

Your grammar shouldn't contain any rules where the LHS contains special characters like `.` or `$`. Otherwise `FeatureGrammar.fromstring` might give an error. This means that you might need to rename some of your non-terminals. 

```
S -> NP VP "."          <- NP VP . 
PRPS -> 'your'          <- PRP$
```
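A toy sketch of the agreement mechanism, using a small hypothetical grammar rather than one derived from the corpus. The variable `?a` forces the subject NP and the VP to share one `AGR` value, so the mismatched sentences get zero parses. The values are written as quoted strings as a precaution, on the assumption that a bare value beginning with a digit (`3SG`) could be read as a number by NLTK's feature-structure reader.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Hypothetical toy grammar; ?a ties the subject's AGR value to the verb's.
grammar = FeatureGrammar.fromstring("""
S -> NP[AGR=?a] VP[AGR=?a] "."
NP[AGR=?a] -> DT[AGR=?a] NN[AGR=?a]
NP[AGR=?a] -> NNS[AGR=?a]
VP[AGR=?a] -> VBZ[AGR=?a]
VP[AGR=?a] -> VBP[AGR=?a]
DT[AGR='3SG'] -> 'a'
NN[AGR='3SG'] -> 'dog'
NNS[AGR='NON3SG'] -> 'dogs'
VBZ[AGR='3SG'] -> 'barks'
VBP[AGR='NON3SG'] -> 'bark'
""")

parser = FeatureEarleyChartParser(grammar, trace=0)

def count_parses(sentence):
    """Count parses of a whitespace-tokenized sentence."""
    return sum(1 for _ in parser.parse(sentence.split()))

for sentence in ["a dog barks .", "a dog bark .", "dogs bark .", "dogs barks ."]:
    print(sentence, count_parses(sentence))
```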

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# your code here
```
