In [1]:
# Please do not change this cell because some hidden tests might depend on it.
import os

# Otter grader does not handle ! commands well, so we define and use our
# own function to execute shell commands.
def shell(commands, warn=True):
    """Executes the string `commands` as a sequence of shell commands.
     
       Prints the result to stdout and returns the exit status. 
       Provides a printed warning on non-zero exit status unless `warn` 
       flag is unset.
    """
    file = os.popen(commands)
    print (file.read().rstrip('\n'))
    exit_status = file.close()
    if warn and exit_status != None:
        print(f"Completed with errors. Exit status: {exit_status}\n")
    return exit_status

shell("""
ls requirements.txt >/dev/null 2>&1
if [ ! $? = 0 ]; then
 rm -rf .tmp
 git clone https://github.com/cs236299-2020/lab3-4.git .tmp
 mv .tmp/tests ./
 mv .tmp/requirements.txt ./
 rm -rf .tmp
fi
pip install -q -r requirements.txt
""")




In [2]:
# Initialize Otter
import otter
grader = otter.Notebook()

%%latex
\newcommand{\vect}[1]{\mathbf{#1}}
\newcommand{\cnt}[1]{\sharp(#1)}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\softmax}{\operatorname{softmax}}
\newcommand{\Prob}{\Pr}
\newcommand{\given}{\,|\,}

$$
\renewcommand{\vect}[1]{\mathbf{#1}}
\renewcommand{\cnt}[1]{\sharp(#1)}
\renewcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\renewcommand{\softmax}{\operatorname{softmax}}
\renewcommand{\Prob}{\Pr}
\renewcommand{\given}{\,|\,}
$$

# CS187
## Lab 3-4 - Probabilistic parsing and parse disambiguation

Continuing our work on PCFGs, we present and implement a probabilistic version of the CKY algorithm.

New bits of Python used for the first time in the _solution set_ for this lab, and which you may therefore find useful:

* none

## Preparations

In [3]:
import copy
import math
import nltk
import pandas as pd

from collections import Counter
from collections import defaultdict
from pprint import pprint

## Probabilistic CKY

In lab 3-2 you worked with the CKY algorithm as a recognizer and its extension to a parser using backpointers. In the following section you will familiarize yourself with the probabilistic extension of the CKY parser, as presented by Jurafsky & Martin (Chapter 14), which returns the most probable parse (MPP) of a string according to a PCFG.

For reference, here is a pseudo-code version of the algorithm:

```
 1.  define cky-mpp(string = w1, ..., wN, grammar):
 2.      for j in [1..N]:                     # each end string position

             # handle rules of the form A -> w
 3.          for all A where A -> wj in grammar:
 4.              T[j-1, j, A] := P(A -> wj)

             # handle rules of the form A -> B C
 5.          for length in [2..j]:            # each subconstituent length
 6.              i := j - length              # start string position
 7.              for split in [i+1..j-1]:     # each split point
 8.                  for all A where 
 9.                          A -> B C in grammar
10.                          and T[i, split, B] > 0
11.                          and T[split, j, C] > 0:
12.                      new_prob := P(A -> B C)
13.                                  x T[i, split, B]
14.                                  x T[split, j, C]
15.                      if T[i, j, A] < new_prob
16.                         then T[i, j, A] := new_prob
17.                              back[i, j, A] := (split, B, C)
18.      return (build_tree(back[0, N, S]), T[0, N, S])
```

Let's go over the differences from the non-probabilistic CKY parser we saw in the previous lab:

1. table dimensions - for a sentence of $N$ words:
  *   CKY  $\to$ $(N+1)\times(N+1)$
  *   PCKY  $\to$ $(N+1)\times(N+1)\times|\cal{N}|$
2. table values:
  *   CKY $\to$ list of constituents
  *   PCKY $\to$ probabilities, where `table[i, j, A]` is the maximum probability of nonterminal `A` covering words between string positions `i` and `j`
3. backpointers:
  *   CKY $\to$ mapping from nonterminals to the set of all possible split positions and rules
  *   PCKY $\to$ mapping from nonterminals to the single most probable split position and rule
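
The pseudocode and table layout above can be sketched in plain Python as follows. This is a minimal sketch, not the lab's reference solution: the names `cky_mpp` and `build_tree`, the dictionary encoding of the grammar (`lexical`, `binary`), and the tuple-keyed tables are assumptions made for illustration, and multi-word terminals such as 'added to' are left out.

```python
def cky_mpp(words, lexical, binary, start="S"):
    """Probabilistic CKY for a grammar in Chomsky normal form.

    lexical maps (A, word) to P(A -> word); binary maps (A, B, C) to
    P(A -> B C). Returns the probability of the best `start` analysis of
    the whole string and the backpointers keyed by (i, j, A).
    """
    n = len(words)
    table = {}  # (i, j, A) -> probability of the best A spanning i..j
    back = {}   # (i, j, A) -> (split, B, C) used by that best analysis
    for j in range(1, n + 1):
        # rules of the form A -> w
        for (a, word), prob in lexical.items():
            if word == words[j - 1]:
                table[(j - 1, j, a)] = prob
        # rules of the form A -> B C
        for length in range(2, j + 1):
            i = j - length
            for split in range(i + 1, j):
                for (a, b, c), prob in binary.items():
                    new_prob = (prob
                                * table.get((i, split, b), 0.0)
                                * table.get((split, j, c), 0.0))
                    # keep only the most probable analysis per nonterminal
                    if new_prob > table.get((i, j, a), 0.0):
                        table[(i, j, a)] = new_prob
                        back[(i, j, a)] = (split, b, c)
    return table.get((0, n, start), 0.0), back


def build_tree(words, back, i, j, a):
    """Rebuild the best tree for `a` over words[i:j] as nested tuples."""
    if (i, j, a) not in back:          # single word: A -> w
        return (a, words[i])
    split, b, c = back[(i, j, a)]
    return (a,
            build_tree(words, back, i, split, b),
            build_tree(words, back, split, j, c))


# Hand-copied single-word rules of the arithmetic grammar in CNF
NUMBERS = ["one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"]
lexical = {("S", w): 0.035 for w in NUMBERS}
lexical.update({("NUM", w): 0.1 for w in NUMBERS})
lexical.update({("OP", "plus"): 0.32, ("OP", "minus"): 0.2, ("OP", "times"): 0.27})
binary = {
    ("S", "NUM", "OPS"): 0.4,
    ("S", "SOP", "NUM"): 0.25,
    ("OPS", "OP", "S"): 1.0,
    ("SOP", "S", "OP"): 1.0,
}

words = "two plus three times four".split()
prob, back = cky_mpp(words, lexical, binary)
tree = build_tree(words, back, 0, len(words), "S")
print(prob)
print(tree)
```

Using `dict.get` with a default of `0.0` plays the role of the `> 0` checks in lines 10 and 11 of the pseudocode: a missing entry contributes a zero product, which never beats an existing entry.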

Now, let's see how we use this algorithm to parse the sentence "two plus three times four".
First, we will need a CNF form of our arithmetic PCFG:

In [4]:
probabilistic_arithmetic_grammar_CNF = nltk.PCFG.fromstring("""
    S -> NUM OPS [0.4] | SOP NUM [0.25] 
    S -> 'one' [0.035] | 'two' [0.035] | 'three' [0.035] | 'four' [0.035] | 'five' [0.035]
    S -> 'six' [0.035] | 'seven' [0.035] | 'eight' [0.035] | 'nine' [0.035] | 'ten' [0.035]
    OPS -> OP S [1.0]
    SOP -> S OP [1.0]
    OP -> 'plus' [0.32] | 'added to' [0.08] | 'minus' [0.2] 
    OP -> 'times' [0.27] | 'multiplied by' [0.03] | 'divided by' [0.1]
    NUM -> 'one' [0.1] | 'two' [0.1] | 'three' [0.1] | 'four' [0.1] | 'five' [0.1] 
    NUM -> 'six' [0.1] | 'seven' [0.1] | 'eight' [0.1] | 'nine' [0.1] | 'ten' [0.1]
""")

probabilistic_arithmetic_grammar_CNF.is_chomsky_normal_form()

True

Next, we will implement the required $(N+1)\times(N+1)\times|\cal{N}|$ 3D tables as $(N+1)\times(N+1)$ 2D tables, in which each cell will hold a dictionary mapping nonterminals to the required entry values. We will implement separate recognition and backpointer tables:
1. For the recognition table `table`, the entry values are their probabilities.
2. For the backpointers table `back`, the entry values are the appropriate backpointer `(split, B, C)`.

As in the previous lab, all the cells that need not be filled contain '---'. All other cells are initialized with an appropriate default dictionary. Run the following code to initialize the tables. (You don't need to go over it in detail; we will look at a specific cell to better understand the content.)

In [5]:
phrase = "two plus three times four"
words = [''] + phrase.split()
N = len(words)

# initialize data in tables
table_data = [['---' for i in range(N)] for j in range(N)]
back_data = [['---' for i in range(N)] for j in range(N)]

# add in upper triangular elements
for i in range(N):
    for j in range(N):
        if i < j:
            table_data[i][j] = defaultdict(int)
            back_data[i][j] = defaultdict(lambda: None)  # factory takes no arguments
            
# generate corresponding data frames
table = pd.DataFrame(table_data, columns=words, index=list('012345'))
table.columns = pd.MultiIndex.from_arrays([table.columns] + [list('012345')])
back = pd.DataFrame(back_data, columns=words, index=list('012345'))
back.columns = pd.MultiIndex.from_arrays([back.columns] + [list('012345')])

Let's first print out one of the tables:

In [6]:
table

Unnamed: 0_level_0,Unnamed: 1_level_0,two,plus,three,times,four
Unnamed: 0_level_1,0,1,2,3,4,5
0,---,{},{},{},{},{}
1,---,---,{},{},{},{}
2,---,---,---,{},{},{}
3,---,---,---,---,{},{}
4,---,---,---,---,---,{}
5,---,---,---,---,---,---
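
Each `{}` cell shown above is a `defaultdict`. The short sketch below uses standalone cells rather than the lab's `table` and `back` (the variable names are illustrative only) to show two behaviors worth keeping in mind: reading a missing key stores the default in the cell, and the default factory is called with no arguments.

```python
from collections import defaultdict

cell = defaultdict(int)          # missing nonterminals read as probability 0

# Reading a missing key returns the default and also inserts the key.
value = cell['S']
print(value, dict(cell))

# .get() reads without inserting, which keeps printed tables uncluttered.
other = defaultdict(int)
peek = other.get('S', 0)
print(peek, dict(other))

# For a backpointer cell, a zero-argument factory yields None for missing keys.
back_cell = defaultdict(lambda: None)
missing = back_cell['S']
print(missing)
```

If you only want to test whether a nonterminal is present in a cell, `'S' in cell` or `cell.get('S', 0)` avoids adding zero entries to the displayed table.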


Examine how we fill both tables for the first 3 values of `j`. Below, you'll finish filling the tables for the remainder yourself.

In [7]:
# j = 1
table.iloc[0,1]['S'] = 0.035
table.iloc[0,1]['NUM'] = 0.1

# j = 2
table.iloc[1,2]['OP'] = 0.32

#    i=0, split=1
table.iloc[0,2]['SOP'] = (
    1.0                           # Pr(SOP -> S OP)
    * table.iloc[0,1]['S']
    * table.iloc[1,2]['OP']
)
back.iloc[0,2]['SOP'] = (1, 'S', 'OP')

# j=3
table.iloc[2,3]['S'] = 0.035
table.iloc[2,3]['NUM'] = 0.1

#    i=1, split=2
table.iloc[1,3]['OPS'] = (
    1.0                           # Pr(OPS -> OP S)
    * table.iloc[1,2]['OP'] 
    * table.iloc[2,3]['S']
)
back.iloc[1,3]['OPS'] = (2, 'OP', 'S')

#    i=0, split=1
table.iloc[0,3]['S'] = (
    0.4                           # Pr(S-> NUM OPS) 
    * table.iloc[0,1]['NUM']
    * table.iloc[1,3]['OPS']
)
back.iloc[0,3]['S'] = (1, 'NUM', 'OPS')

#    i=0, split=2
#
#      Currently the probability and backpointer for S in cell [0,3]
#      are according to the parse S -> NUM OPS with split=1:
#
#         table.iloc[0,3]['S'] = Pr(S -> NUM OPS) * Pr(0,1,NUM) * Pr(1,3,OPS).
#
#      The parse for split=2 is S -> SOP NUM, so we need to check
#      (lines 15-17) if
#
#         table.iloc[0,3]['S'] < Pr(S -> SOP NUM) * Pr(0,2,SOP) * Pr(2,3,NUM)
#
#      and if so, update the probability and the backpointer accordingly.

if table.iloc[0,3]['S'] < 0.25 * table.iloc[0,2]['SOP'] * table.iloc[2,3]['NUM']:
  table.iloc[0,3]['S'] = (
      0.25
      * table.iloc[0,2]['SOP'] 
      * table.iloc[2,3]['NUM']
  )
  back.iloc[0,3]['S'] = (2, 'SOP', 'NUM')

Here's where we are so far:

In [8]:
table

Unnamed: 0_level_0,Unnamed: 1_level_0,two,plus,three,times,four
Unnamed: 0_level_1,0,1,2,3,4,5
0,---,"{'S': 0.035, 'NUM': 0.1}",{'SOP': 0.011200000000000002},{'S': 0.00044800000000000016},{},{}
1,---,---,{'OP': 0.32},{'OPS': 0.011200000000000002},{},{}
2,---,---,---,"{'S': 0.035, 'NUM': 0.1}",{},{}
3,---,---,---,---,{},{}
4,---,---,---,---,---,{}
5,---,---,---,---,---,---


Note that although two parses are possible for the span [0,3] (one using S -> NUM OPS and the other using S -> SOP NUM), both assign the same nonterminal S to the span, so we keep only the higher-probability option:

In [9]:
back.iloc[0,3]['S']

(1, 'NUM', 'OPS')
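
To make the comparison for cell [0,3] concrete, the two candidate probabilities for `S` can be recomputed directly from the rule probabilities in the grammar. This is a standalone sketch; the variable names are illustrative and not part of the lab code.

```python
# Entries already in the table for the shorter spans
p_num_two = 0.1                  # NUM -> 'two', span [0,1]
p_num_three = 0.1                # NUM -> 'three', span [2,3]
p_sop_02 = 1.0 * 0.035 * 0.32    # SOP -> S OP over [0,2]
p_ops_13 = 1.0 * 0.32 * 0.035    # OPS -> OP S over [1,3]

# One candidate per split point, keyed by its backpointer
candidates = {
    (1, 'NUM', 'OPS'): 0.4 * p_num_two * p_ops_13,     # S -> NUM OPS
    (2, 'SOP', 'NUM'): 0.25 * p_sop_02 * p_num_three,  # S -> SOP NUM
}

best = max(candidates, key=candidates.get)
print(best, candidates[best])
```

The two products share the factors 0.035, 0.32 and 0.1, so the comparison reduces to the rule probabilities 0.4 versus 0.25, and the split at position 1 wins.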

Now you finish filling the table:

<!--
BEGIN QUESTION
name: pcky_fill_table
-->

In [10]:
# Continue filling in the table as shown above

# j = 4
table.iloc[3,4]['OP'] = 0.27

#    i=2, split=3
table.iloc[2,4]['SOP'] = 1.0 * table.iloc[2,3]['S'] * table.iloc[3,4]['OP']
back.iloc[2,4]['SOP'] = (3, 'S', 'OP')

#    i=1: no rule combines OP with SOP or OPS with OP,
#         so cell [1,4] stays empty

#    i=0, split=3
table.iloc[0,4]['SOP'] = 1.0 * table.iloc[0,3]['S'] * table.iloc[3,4]['OP']
back.iloc[0,4]['SOP'] = (3, 'S', 'OP')

# j = 5
table.iloc[4,5]['S'] = 0.035
table.iloc[4,5]['NUM'] = 0.1

#    i=3, split=4
table.iloc[3,5]['OPS'] = 1.0 * table.iloc[3,4]['OP'] * table.iloc[4,5]['S']
back.iloc[3,5]['OPS'] = (4, 'OP', 'S')

#    i=2: compare split=4 (S -> SOP NUM) with split=3 (S -> NUM OPS)
table.iloc[2,5]['S'] = 0.25 * table.iloc[2,4]['SOP'] * table.iloc[4,5]['NUM']
back.iloc[2,5]['S'] = (4, 'SOP', 'NUM')
second25 = 0.4 * table.iloc[2,3]['NUM'] * table.iloc[3,5]['OPS']
if table.iloc[2,5]['S'] < second25:
  table.iloc[2,5]['S'] = second25
  back.iloc[2,5]['S'] = (3, 'NUM', 'OPS')

#    i=1, split=2
table.iloc[1,5]['OPS'] = 1.0 * table.iloc[1,2]['OP'] * table.iloc[2,5]['S']
back.iloc[1,5]['OPS'] = (2, 'OP', 'S')

#    i=0: compare split=1 (S -> NUM OPS) with split=4 (S -> SOP NUM)
table.iloc[0,5]['S'] = 0.4 * table.iloc[0,1]['NUM'] * table.iloc[1,5]['OPS']
back.iloc[0,5]['S'] = (1, 'NUM', 'OPS')
second05 = 0.25 * table.iloc[0,4]['SOP'] * table.iloc[4,5]['NUM']
if table.iloc[0,5]['S'] < second05:
  table.iloc[0,5]['S'] = second05
  back.iloc[0,5]['S'] = (4, 'SOP', 'NUM')

back

Unnamed: 0_level_0,Unnamed: 1_level_0,two,plus,three,times,four
Unnamed: 0_level_1,0,1,2,3,4,5
0,---,{},"{'SOP': (1, 'S', 'OP')}","{'S': (1, 'NUM', 'OPS')}","{'SOP': (3, 'S', 'OP')}","{'S': (1, 'NUM', 'OPS')}"
1,---,---,{},"{'OPS': (2, 'OP', 'S')}",{},"{'OPS': (2, 'OP', 'S')}"
2,---,---,---,{},"{'SOP': (3, 'S', 'OP')}","{'S': (3, 'NUM', 'OPS')}"
3,---,---,---,---,{},"{'OPS': (4, 'OP', 'S')}"
4,---,---,---,---,---,{}
5,---,---,---,---,---,---


In [11]:
grader.check("pcky_fill_table")

Let us look at the probability of the highest scoring tree:

In [12]:
table.iloc[0,5]

defaultdict(int, {'S': 4.838400000000003e-06})

<!-- BEGIN QUESTION -->

---

**Question:** Compare the probability of the highest scoring tree you just computed to the probabilities you got at the end of the disambiguation section, when calculating probabilities of all parse trees for the phrase, and explain the result.

<!--
BEGIN QUESTION
name: open_question_pcky
manual: true
-->

The probability we obtained is identical to that of tree 3 from the previous lab, which had the highest probability among all possible trees. This is expected: at every cell the probabilistic CKY algorithm keeps only the most probable analysis for each nonterminal, so the value it returns for the whole string is the probability of the most probable parse tree.

<!-- END QUESTION -->
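
As a cross-check on the value in cell [0,5], the probability of a parse tree under a PCFG is the product of the probabilities of the rules it uses. The sketch below multiplies the rules of the tree that groups the phrase as two plus (three times four); the list `best_tree_rules` is written out by hand from the CNF grammar and is an illustration, not lab code.

```python
import math

# Rules used by the tree for "two plus (three times four)", top-down
best_tree_rules = [
    ("S -> NUM OPS", 0.4),
    ("NUM -> 'two'", 0.1),
    ("OPS -> OP S", 1.0),
    ("OP -> 'plus'", 0.32),
    ("S -> NUM OPS", 0.4),
    ("NUM -> 'three'", 0.1),
    ("OPS -> OP S", 1.0),
    ("OP -> 'times'", 0.27),
    ("S -> 'four'", 0.035),
]

tree_prob = math.prod(prob for _rule, prob in best_tree_rules)
print(f"{tree_prob:.6g}")
```

Up to floating-point rounding, the product agrees with the `4.8384e-06` stored for `S` in `table.iloc[0,5]`.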



---

## PP attachment ambiguity

In the following section we set aside the rather limited world of arithmetic expressions, focusing on a common example of structural ambiguity in natural language called _prepositional phrase (PP) attachment_. A PP can modify both noun phrases and verb phrases, often creating ambiguity as to what constituent a PP should be attached to.

Take a look at the sentence "Twain bought a book for Howells" and the following PCFG:

In [13]:
sent1 = "Twain bought a book for Howells"

pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> NNP [0.3] | DT NOM [0.6] | NOM [0.1]
    NOM -> NOM PP [0.4] | NN [0.3] | NNS [0.3]
    VP -> DTV NP PP [0.2] | TV NP [0.8]
    PP -> P NP [1.0]
    NNS -> 'books' [0.6] | 'gifts' [0.4]
    NNP -> 'Twain' [0.6] | 'Howells' [0.4]
    NN -> 'table' [0.5] | 'book' [0.5] 
    DTV -> 'bought' [0.5] | 'put' [0.5]
    TV -> 'bought' [0.5] | 'saw' [0.5]
    P -> 'on' [0.3] | 'of' [0.4] | 'by' [0.1] | 'for' [0.2]
    DT -> 'a' [0.5] | 'the' [0.5]
""")

Parsing this sentence with the above PCFG results in two possible parses:

In [14]:
parser = nltk.parse.InsideChartParser(pcfg)
possible_parses = list(parser.parse(sent1.split()))

for i, tree in enumerate(possible_parses):
  print(f'Possible parse #{i+1} with probability {tree.prob():.3g}:\n')
  tree.pretty_print()

Possible parse #1 with probability 3.11e-05:

              S                          
   ___________|___                        
  |               VP                     
  |      _________|____                   
  |     |              NP                
  |     |      ________|___               
  |     |     |           NOM            
  |     |     |    ________|___           
  |     |     |   |            PP        
  |     |     |   |         ___|_____     
  NP    |     |  NOM       |         NP  
  |     |     |   |        |         |    
 NNP    TV    DT  NN       P        NNP  
  |     |     |   |        |         |    
Twain bought  a  book     for     Howells

Possible parse #2 with probability 1.94e-05:

              S                          
   ___________|_______                    
  |                   VP                 
  |      _____________|________           
  |     |         NP           PP        
  |     |      ___|___      ___|_____     
  NP    |     |      NOM   |         NP  
  |     |     |       |    |         |    
 NNP   DTV    DT      NN   P        NNP  
  |     |     |       |    |         |    
Twain bought  a      book for     Howells
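
The two probabilities reported above can be reproduced by multiplying the probabilities of the rules each tree uses. In the sketch below, the rule strings and the split into `shared`, `np_attachment` and `vp_attachment` are hand-copied from the grammar for illustration; `tree_prob` is a hypothetical helper, not an NLTK function.

```python
import math

rule_prob = {
    "S -> NP VP": 1.0, "NP -> NNP": 0.3, "NP -> DT NOM": 0.6,
    "NOM -> NOM PP": 0.4, "NOM -> NN": 0.3,
    "VP -> DTV NP PP": 0.2, "VP -> TV NP": 0.8, "PP -> P NP": 1.0,
    "NNP -> 'Twain'": 0.6, "NNP -> 'Howells'": 0.4, "NN -> 'book'": 0.5,
    "DTV -> 'bought'": 0.5, "TV -> 'bought'": 0.5,
    "P -> 'for'": 0.2, "DT -> 'a'": 0.5,
}

# Rules that occur in both trees (NP -> NNP is used twice)
shared = [
    "S -> NP VP", "NP -> NNP", "NNP -> 'Twain'",
    "NP -> DT NOM", "DT -> 'a'", "NOM -> NN", "NN -> 'book'",
    "PP -> P NP", "P -> 'for'", "NP -> NNP", "NNP -> 'Howells'",
]
# Rules that differ between the two attachments
np_attachment = shared + ["VP -> TV NP", "TV -> 'bought'", "NOM -> NOM PP"]
vp_attachment = shared + ["VP -> DTV NP PP", "DTV -> 'bought'"]


def tree_prob(rules):
    """Product of the probabilities of the listed rule occurrences."""
    return math.prod(rule_prob[rule] for rule in rules)


p_np = tree_prob(np_attachment)
p_vp = tree_prob(vp_attachment)
print(f"{p_np:.3g} {p_vp:.3g}")
```

Only the rules outside `shared` decide which tree wins, which is a useful starting point when deciding which probabilities to change.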

<!-- BEGIN QUESTION -->

---

**Question:** What is the more natural parsing, the one that leads to the preferred _reading_ of the sentence? Is it the most probable parse tree?

<!--
BEGIN QUESTION
name: open_question_pp1
manual: true
-->

The more natural reading attaches the determiner to the single noun "book" and attaches the PP "for Howells" to the verb phrase, since it describes the act of buying (for whom the book was **bought**). This is the second tree, which has the lower probability, so the most probable parse is not the more natural one.

<!-- END QUESTION -->

---

Change some of the rule probabilities (try to change as few as possible) such that the other tree has higher probability.
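
When editing rule probabilities, keep in mind that the probabilities of all rules sharing a left-hand side must still sum to 1 for the grammar to remain a proper PCFG. The helper below is a hypothetical plain-Python check (`lhs_totals` is not an NLTK function), applied to a hand-copied subset of the rules.

```python
from collections import defaultdict
import math


def lhs_totals(rules):
    """Sum rule probabilities per left-hand side.

    `rules` is a list of (lhs, rhs, probability) triples.
    """
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return dict(totals)


rules = [
    ("VP", ("DTV", "NP", "PP"), 0.2),
    ("VP", ("TV", "NP"), 0.8),
    ("NOM", ("NOM", "PP"), 0.4),
    ("NOM", ("NN",), 0.3),
    ("NOM", ("NNS",), 0.3),
]

totals = lhs_totals(rules)
# Left-hand sides whose alternatives no longer sum to 1
unnormalized = [lhs for lhs, total in totals.items()
                if not math.isclose(total, 1.0)]
print(totals, unnormalized)
```

Raising the probability of one alternative therefore means lowering another alternative with the same left-hand side by the same amount.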

<!--
BEGIN QUESTION
name: attach_pp_to_np_1
-->

In [15]:
pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> NNP [0.3] | DT NOM [0.6] | NOM [0.1]
    NOM -> NOM PP [0.4] | NN [0.3] | NNS [0.3]
    VP -> DTV NP PP [0.8] | TV NP [0.2]
    PP -> P NP [1.0]
    NNS -> 'books' [0.6] | 'gifts' [0.4]
    NNP -> 'Twain' [0.6] | 'Howells' [0.4]
    NN -> 'table' [0.5] | 'book' [0.5] 
    DTV -> 'bought' [0.5] | 'put' [0.5]
    TV -> 'bought' [0.5] | 'saw' [0.5]
    P -> 'on' [0.3] | 'of' [0.4] | 'by' [0.1] | 'for' [0.2]
    DT -> 'a' [0.5] | 'the' [0.5]
""")

In [16]:
parser2 = nltk.parse.InsideChartParser(pcfg)
possible_parses2 = list(parser2.parse(sent1.split()))

for i, tree in enumerate(possible_parses2):
  print(f'Possible parse #{i+1} with probability {tree.prob():.3g}:\n')
  tree.pretty_print()

Possible parse #1 with probability 7.78e-05:

              S                          
   ___________|_______                    
  |                   VP                 
  |      _____________|________           
  |     |         NP           PP        
  |     |      ___|___      ___|_____     
  NP    |     |      NOM   |         NP  
  |     |     |       |    |         |    
 NNP   DTV    DT      NN   P        NNP  
  |     |     |       |    |         |    
Twain bought  a      book for     Howells

Possible parse #2 with probability 7.78e-06:

              S                          
   ___________|___                        
  |               VP                     
  |      _________|____                   
  |     |              NP                
  |     |      ________|___               
  |     |     |           NOM            
  |     |     |    ________|___           
  |     |     |   |            PP        
  |     |     |   |         ___|_____     
  NP    |     |  NOM       |         NP  
  |     |     |   |        |         |    
 NNP    TV    DT  NN       P        NNP  
  |     |     |   |        |         |    
Twain bought  a  book     for     Howells

Now we use the PCFG you defined to parse the sentence "Twain bought a book by Howells":

In [17]:
sent2 = "Twain bought a book by Howells"
possible_parses = list(parser2.parse(sent2.split()))

for i, tree in enumerate(possible_parses):
  print('Possible parse #{} with probability {:.3g}:\n'.format(i+1,tree.prob()))
  tree.pretty_print()

Possible parse #1 with probability 3.89e-05:

              S                          
   ___________|_______                    
  |                   VP                 
  |      _____________|________           
  |     |         NP           PP        
  |     |      ___|___      ___|_____     
  NP    |     |      NOM   |         NP  
  |     |     |       |    |         |    
 NNP   DTV    DT      NN   P        NNP  
  |     |     |       |    |         |    
Twain bought  a      book  by     Howells

Possible parse #2 with probability 3.89e-06:

              S                          
   ___________|___                        
  |               VP                     
  |      _________|____                   
  |     |              NP                
  |     |      ________|___               
  |     |     |           NOM            
  |     |     |    ________|___           
  |     |     |   |            PP        
  |     |     |   |         ___|_____     
  NP    |     |  NOM       |         NP  
  |     |     |   |        |         |    
 NNP    TV    DT  NN       P        NNP  
  |     |     |   |        |         |    
Twain bought  a  book      by     Howells

<!-- BEGIN QUESTION -->

---

**Question:** Now what is the more natural parse for the sentence? Is it the most probable one? Can the PCFG be modified such that both sentences are parsed according to the natural readings for these sentences? Try to explain the problem.

<!--
BEGIN QUESTION
name: attach_pp_to_np_2
manual: true
-->

In contrast to the previous sentence, the more natural reading here attaches the PP to the noun: "by Howells" identifies the author of the book and does not modify the verb "bought". The natural parse is therefore the second tree, which has the lower probability.


The current rules do not distinguish between the two cases: 'for' and 'by' are both generated by the same nonterminal P, so the two sentences receive the same pair of tree structures and differ only in one lexical probability that appears in both trees. Changing the rule probabilities alone therefore cannot yield the natural reading for both sentences, because whichever attachment is more probable for one sentence is also more probable for the other. It could be achieved by adding rules that make such a distinction.

<!-- END QUESTION -->
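
The dependence on the preposition can be checked numerically. Every rule that occurs in both trees cancels in the ratio of their probabilities, including `P -> 'for'` or `P -> 'by'`, so the ratio is determined by the attachment-specific rules alone. The function `attachment_ratio` and its parameter names are hypothetical, introduced only for this sketch.

```python
def attachment_ratio(p_vp_dtv_np_pp, p_dtv_bought,
                     p_vp_tv_np, p_tv_bought, p_nom_nom_pp):
    """P(VP-attachment tree) / P(NP-attachment tree).

    Rules shared by both trees cancel, so only the rules that differ
    between the two attachments are needed.
    """
    vp_only = p_vp_dtv_np_pp * p_dtv_bought
    np_only = p_vp_tv_np * p_tv_bought * p_nom_nom_pp
    return vp_only / np_only


# Original grammar: VP -> DTV NP PP [0.2] | TV NP [0.8]
original = attachment_ratio(0.2, 0.5, 0.8, 0.5, 0.4)
# Modified grammar: VP -> DTV NP PP [0.8] | TV NP [0.2]
modified = attachment_ratio(0.8, 0.5, 0.2, 0.5, 0.4)
print(original, modified)
```

The same ratio applies to both sentences under a given grammar, which matches the outputs above: with the modified grammar the VP-attachment tree is ten times more probable for 'for' (7.78e-05 versus 7.78e-06) and for 'by' (3.89e-05 versus 3.89e-06).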

<!-- BEGIN QUESTION -->

---

## Lab debrief – for consensus submission only

**Question:** We're interested in any thoughts your group has about this lab so that we can improve this lab for later years, and to inform later labs for this year. Please list any issues that arose or comments you have to improve the lab. Useful things to comment on include the following: 

* Was the lab too long or too short?
* Were the readings appropriate for the lab? 
* Was it clear (at least after you completed the lab) what the points of the exercises were? 
* Are there additions or changes you think would make the lab better?

<!--
BEGIN QUESTION
name: open_response_debrief
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



# End of Lab 3-4

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [18]:
grader.check_all()