<a href="https://colab.research.google.com/github/krystal826/Natural-Language-Processing/blob/main/Lab09_Task05.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Task 05: Exploring PCFG

```python
import nltk
from nltk import CFG, PCFG
from nltk.parse.generate import generate
```

### A PCFG that accepts the sentence "I shot an elephant in my pajamas"

```python
# Each rule carries a probability; the rules sharing a left-hand side must sum to 1.
grammar = nltk.PCFG.fromstring("""
    S -> NP VP     [1.0]
    PP -> P NP     [1.0]
    NP -> Det N    [0.5]
    NP -> NP PP    [0.4]
    NP -> 'I'      [0.1]
    VP -> V NP     [0.7]
    VP -> VP PP    [0.3]
    Det -> 'an'    [0.6]
    Det -> 'my'    [0.4]
    N -> 'elephant' [0.5]
    N -> 'pajamas' [0.5]
    V -> 'shot'    [1.0]
    P -> 'in'      [1.0]
""")
```

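In a PCFG, the probabilities of all rules that share a left-hand side must sum to 1; for example, the three `NP` rules give 0.5 + 0.4 + 0.1 = 1. The following is a minimal pure-Python sketch of that check. `RULES`, `lhs_totals` and `is_proper` are illustrative names, not NLTK API. As far as we know, `nltk.PCFG` already raises a `ValueError` when some left-hand side does not sum to 1, so this sketch only makes the constraint explicit.

```python
from collections import defaultdict
from math import isclose

# The rules of the PCFG above, written as (lhs, rhs, probability) triples.
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("PP", ("P", "NP"), 1.0),
    ("NP", ("Det", "N"), 0.5),
    ("NP", ("NP", "PP"), 0.4),
    ("NP", ("I",), 0.1),
    ("VP", ("V", "NP"), 0.7),
    ("VP", ("VP", "PP"), 0.3),
    ("Det", ("an",), 0.6),
    ("Det", ("my",), 0.4),
    ("N", ("elephant",), 0.5),
    ("N", ("pajamas",), 0.5),
    ("V", ("shot",), 1.0),
    ("P", ("in",), 1.0),
]


def lhs_totals(rules):
    """Sum the probabilities of all rules that share a left-hand side."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return dict(totals)


def is_proper(rules):
    """True if every nonterminal's rule probabilities sum to 1 (up to rounding)."""
    return all(isclose(total, 1.0) for total in lhs_totals(rules).values())


print(is_proper(RULES))                               # True
print(is_proper(RULES + [("V", ("saw",), 0.5)]))      # False: V now sums to 1.5
```

`isclose` is used instead of `==` because sums such as 0.7 + 0.3 are not always exactly 1.0 in floating point.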
### Parsing the sentence with a plain chart parser, which ignores the rule probabilities. Two parse trees are produced, so the sentence is ambiguous.

```python
sentence = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
# ChartParser treats the PCFG as an ordinary CFG and returns every parse it allows.
parser = nltk.ChartParser(grammar)

for tree in parser.parse(sentence):
    tree.pretty_print()
```

```
     S
  ___|______________
 |                  VP
 |         _________|__________
 |        VP                   PP
 |    ____|___              ___|___
 |   |        NP           |       NP
 |   |     ___|_____       |    ___|_____
 NP  V   Det        N      P  Det        N
 |   |    |         |      |   |         |
 I  shot  an     elephant  in  my     pajamas

     S
  ___|______________
 |                  VP
 |    ______________|______
 |   |                     NP
 |   |         ____________|___
 |   |        |                PP
 |   |        |             ___|___
 |   |        NP           |       NP
 |   |     ___|_____       |    ___|_____
 NP  V   Det        N      P  Det        N
 |   |    |         |      |   |         |
 I  shot  an     elephant  in  my     pajamas
```
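Why exactly two trees? A CKY-style chart can count them: every rule of this grammar is either binary (A -> B C) or lexical (A -> 'word'), so each span's analyses can be built from two smaller spans. This is a minimal pure-Python sketch, not how `nltk.ChartParser` is implemented; `BINARY`, `LEXICAL` and `count_parses` are illustrative names.

```python
from collections import Counter

# The grammar above with the probabilities dropped.
# Binary rules A -> B C, stored as (A, B, C).
BINARY = [
    ("S", "NP", "VP"),
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
]
# Lexical rules A -> 'word', stored as word -> list of A.
LEXICAL = {
    "I": ["NP"],
    "an": ["Det"],
    "my": ["Det"],
    "elephant": ["N"],
    "pajamas": ["N"],
    "shot": ["V"],
    "in": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` whose root is `start`."""
    n = len(words)
    # chart[i][j][A] = number of trees rooted in A that cover words[i:j]
    chart = [[Counter() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICAL.get(word, []):
            chart[i][i + 1][label] += 1
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for lhs, left, right in BINARY:
                    chart[i][j][lhs] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses(['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']))  # 2
print(count_parses(['I', 'shot', 'an', 'elephant']))                         # 1
```

The count of 2 comes from the span "shot an elephant in my pajamas": it can be a `VP -> V NP` whose `NP` contains the `PP`, or a `VP -> VP PP` whose inner `VP` is "shot an elephant".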

### Parsing the sentence with the PCFG. The Viterbi parser, a dynamic-programming parser in the style of probabilistic CKY, returns only the most likely parse tree. Which parse tree is chosen?

```python
viterbi_parser = nltk.ViterbiParser(grammar)

# ViterbiParser yields only the single most probable parse.
for tree in viterbi_parser.parse(sentence):
    print("\nDisplay the most likely tree (with highest probability value)")
    print(tree)
    tree.pretty_print()
```


```
Display the most likely tree (with highest probability value)
(S
  (NP I)
  (VP
    (V shot)
    (NP
      (NP (Det an) (N elephant))
      (PP (P in) (NP (Det my) (N pajamas)))))) (p=0.00042)
     S
  ___|______________
 |                  VP
 |    ______________|______
 |   |                     NP
 |   |         ____________|___
 |   |        |                PP
 |   |        |             ___|___
 |   |        NP           |       NP
 |   |     ___|_____       |    ___|_____
 NP  V   Det        N      P  Det        N
 |   |    |         |      |   |         |
 I  shot  an     elephant  in  my     pajamas
```
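To see what a probabilistic CKY parser does, here is a minimal pure-Python sketch. It is not NLTK's `ViterbiParser` implementation. For every span and every label it keeps only the most probable analysis, which is valid here because each rule is binary or lexical. `BINARY`, `LEXICAL` and `viterbi_cky` are illustrative names.

```python
# Binary rules A -> B C with their probabilities, copied from the PCFG above.
BINARY = [
    ("S", "NP", "VP", 1.0),
    ("PP", "P", "NP", 1.0),
    ("NP", "Det", "N", 0.5),
    ("NP", "NP", "PP", 0.4),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
]
# Lexical rules A -> 'word', stored as word -> list of (A, probability).
LEXICAL = {
    "I": [("NP", 0.1)],
    "an": [("Det", 0.6)],
    "my": [("Det", 0.4)],
    "elephant": [("N", 0.5)],
    "pajamas": [("N", 0.5)],
    "shot": [("V", 1.0)],
    "in": [("P", 1.0)],
}


def viterbi_cky(words, start="S"):
    """Return (probability, bracketed tree) of the most likely parse, or None."""
    n = len(words)
    # best[i][j][A] = (probability, tree) of the best A-analysis of words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label, p in LEXICAL.get(word, []):
            best[i][i + 1][label] = (p, f"({label} {word})")
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[i][j]
            for k in range(i + 1, j):
                for lhs, left, right, p in BINARY:
                    if left in best[i][k] and right in best[k][j]:
                        p_left, t_left = best[i][k][left]
                        p_right, t_right = best[k][j][right]
                        candidate = p * p_left * p_right
                        # Viterbi step: keep the maximum instead of summing.
                        if candidate > cell.get(lhs, (0.0, ""))[0]:
                            cell[lhs] = (candidate, f"({lhs} {t_left} {t_right})")
    return best[0][n].get(start)


best_prob, best_tree = viterbi_cky(['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas'])
print(f"{best_prob:.3g}")  # 0.00042
print(best_tree)
```

For the span "shot an elephant in my pajamas" the sketch compares 0.0042 (`VP -> V NP`, PP inside the NP) with 0.00315 (`VP -> VP PP`) and keeps the first, so it selects the same tree as `ViterbiParser`. Replacing the maximum with a sum over the candidates would give each span's inside probability instead of its best analysis.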



### Displaying every possible parse tree with its probability, using InsideChartParser

```python
from nltk.parse import pchart

# InsideChartParser returns all parses; each is printed with its probability (p=...).
parser = pchart.InsideChartParser(grammar)

parses = parser.parse_all(sentence)
for parse in parses:
    print(parse)
```

```
(S
  (NP I)
  (VP
    (V shot)
    (NP
      (NP (Det an) (N elephant))
      (PP (P in) (NP (Det my) (N pajamas)))))) (p=0.00042)
(S
  (NP I)
  (VP
    (VP (V shot) (NP (Det an) (N elephant)))
    (PP (P in) (NP (Det my) (N pajamas))))) (p=0.000315)
```
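Each `p=` value is the product of the probabilities of every rule used in that tree. The sketch below recomputes both values in plain Python; trees are written as nested tuples, an illustrative encoding rather than NLTK's `Tree` class, and `PROB` and `tree_prob` are hypothetical names.

```python
from math import isclose, prod

# Rule probabilities of the PCFG above, keyed by (lhs, rhs).
PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("NP", "PP")): 0.4,
    ("NP", ("I",)): 0.1,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("Det", ("an",)): 0.6,
    ("Det", ("my",)): 0.4,
    ("N", ("elephant",)): 0.5,
    ("N", ("pajamas",)): 0.5,
    ("V", ("shot",)): 1.0,
    ("P", ("in",)): 1.0,
}


def tree_prob(tree):
    """Multiply the probabilities of all rules used in `tree`.

    A tree is a tuple (label, child, ...); each child is a subtree or a word (str).
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    subtrees = (c for c in children if not isinstance(c, str))
    return PROB[(label, rhs)] * prod(tree_prob(c) for c in subtrees)


elephant = ("NP", ("Det", "an"), ("N", "elephant"))
pp = ("PP", ("P", "in"), ("NP", ("Det", "my"), ("N", "pajamas")))

# First parse: the PP modifies the noun phrase "an elephant".
np_attach = ("S", ("NP", "I"), ("VP", ("V", "shot"), ("NP", elephant, pp)))
# Second parse: the PP modifies the verb phrase "shot an elephant".
vp_attach = ("S", ("NP", "I"), ("VP", ("VP", ("V", "shot"), elephant), pp))

p_np, p_vp = tree_prob(np_attach), tree_prob(vp_attach)
print(f"{p_np:.3g} {p_vp:.3g}")          # 0.00042 0.000315
print(f"{p_np + p_vp:.3g}")              # 0.000735, the probability of the sentence
print(isclose(p_np / p_vp, 0.4 / 0.3))   # True
```

The two trees share every rule except `NP -> NP PP` (0.4) versus `VP -> VP PP` (0.3), so their ratio is exactly 0.4 / 0.3, and that single pair of rules decides which tree the Viterbi parser prefers. Summing over all parses gives the total probability of the sentence under this grammar.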


### Below is another CFG. Run these two cells and see the output.

```python
grammar = nltk.CFG.fromstring("""
    S    -> NP VP
    NP   -> Det Noun
    NP   -> NP PP
    PP   -> Prep NP
    VP   -> Verb NP
    VP   -> VP PP
    Det  -> 'the' | 'a'
    Noun -> 'man' | 'lady' | 'telescope'
    Prep -> 'with'
    Verb -> 'saw'
""")
```

```python
sentence = ['the', 'man', 'saw', 'the', 'lady', 'with', 'a', 'telescope']
parser = nltk.ChartParser(grammar)
for tree in parser.parse(sentence):
    tree.pretty_print()
```

```
                   S
      _____________|_______
     |                     VP
     |              _______|_________
     |             VP                PP
     |         ____|___          ____|___
     NP       |        NP       |        NP
  ___|___     |     ___|___     |     ___|______
Det     Noun Verb Det     Noun Prep Det        Noun
 |       |    |    |       |    |    |          |
the     man  saw  the     lady with  a      telescope

                   S
      _____________|_______
     |                     VP
     |         ____________|____
     |        |                 NP
     |        |         ________|____
     |        |        |             PP
     |        |    
```

### Q1. Since two parse trees are produced, you have to disambiguate them. Assuming that the second tree is the most likely one, transform the CFG into a PCFG by providing your own probability for each rule.
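A useful observation for Q1: the two trees use exactly the same rules, except that the first contains `VP -> VP PP` and the second contains `NP -> NP PP`. The second tree therefore wins whenever P(NP -> NP PP) > P(VP -> VP PP). The sketch below checks one **hypothetical** assignment of probabilities, chosen only for illustration; `PROBS`, `SHARED` and `is_proper` are illustrative names, and any assignment that keeps each left-hand side summing to 1 and satisfies that inequality behaves the same way.

```python
from collections import Counter
from math import isclose, prod

# HYPOTHETICAL probabilities for the telescope CFG, chosen only for illustration.
PROBS = {
    "S -> NP VP": 1.0,
    "NP -> Det Noun": 0.7,
    "NP -> NP PP": 0.3,
    "PP -> Prep NP": 1.0,
    "VP -> Verb NP": 0.8,
    "VP -> VP PP": 0.2,
    "Det -> 'the'": 0.6,
    "Det -> 'a'": 0.4,
    "Noun -> 'man'": 0.4,
    "Noun -> 'lady'": 0.3,
    "Noun -> 'telescope'": 0.3,
    "Prep -> 'with'": 1.0,
    "Verb -> 'saw'": 1.0,
}

# Rules used by both parses of "the man saw the lady with a telescope".
SHARED = [
    "S -> NP VP", "VP -> Verb NP", "PP -> Prep NP",
    "NP -> Det Noun", "NP -> Det Noun", "NP -> Det Noun",  # the man / the lady / a telescope
    "Det -> 'the'", "Noun -> 'man'", "Verb -> 'saw'", "Det -> 'the'",
    "Noun -> 'lady'", "Prep -> 'with'", "Det -> 'a'", "Noun -> 'telescope'",
]
tree1 = SHARED + ["VP -> VP PP"]  # first tree: PP attached to the verb phrase
tree2 = SHARED + ["NP -> NP PP"]  # second tree: PP attached to "the lady"


def is_proper(probs):
    """True if the probabilities of each left-hand side sum to 1 (up to rounding)."""
    totals = Counter()
    for rule, p in probs.items():
        totals[rule.split(" -> ")[0]] += p
    return all(isclose(total, 1.0) for total in totals.values())


p1 = prod(PROBS[r] for r in tree1)
p2 = prod(PROBS[r] for r in tree2)
print(is_proper(PROBS))  # True
print(p2 > p1)           # True: 0.3 for NP -> NP PP beats 0.2 for VP -> VP PP
```

The same assignment could then be written with `nltk.PCFG.fromstring` (one `[p]` per rule) and parsed with `nltk.ViterbiParser`, as in the elephant example.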

### Q2. Parse the sentence "the man saw the lady with a telescope" with the PCFG you defined. Do you get the following tree as the output?

![Screenshot%202020-12-23%20at%209.21.33%20PM.png](attachment:Screenshot%202020-12-23%20at%209.21.33%20PM.png)