# Context-Free Grammars and Parsing with NLTK
Partially adapted from the NLTK book.


## Initializations:


In [1]:
### CREATE VIRTUAL DISPLAY ###
!apt-get install -y xvfb # Install X Virtual Frame Buffer
import os
os.system('Xvfb :1 -screen 0 1600x1200x16  &')    # create virtual display with size 1600x1200 and 16 bit color. Color can be changed to 24 or 8
os.environ['DISPLAY']=':1.0'    # tell X clients to use our virtual DISPLAY :1.0.

%matplotlib inline

### INSTALL GHOSTSCRIPT (Required to display NLTK trees) ###
!apt-get update
!apt-get install -y ghostscript python3-tk    # -y avoids the interactive confirmation prompt
#from ctypes.util import find_library
#find_library("gs")


## Loading a CFG grammar:

In [2]:
import nltk

from nltk import CFG
from IPython.display import display

grammar = CFG.fromstring("""
   S -> NP VP
   PP -> P NP
   NP -> Det N | NP PP
   VP -> V NP | VP PP
   Det -> 'a' | 'the'
   N -> 'dog' | 'cat'
   V -> 'chased' | 'sat'
   P -> 'on' | 'in'
 """)

## Parsing sentences:

In [None]:
sent1 = "the dog chased the cat".split()
sent2 = "the dog chased the cat on the dog".split()

# Create a chart parser from the grammar
chart_parser = nltk.ChartParser(grammar)

# Print every parse tree found for sent1
for p in chart_parser.parse(sent1):
    print(p)

### Viewing the syntax tree:

In [None]:
# Display a single parse tree: `p` still holds the last tree from the loop above
display(p)

In [None]:
# When a sentence has several parses, display each tree in turn
for p in chart_parser.parse(sent2):
    display(p)
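
To see where the two trees for `sent2` come from, it can help to count parses by hand. The sketch below is a hand-written CYK-style counter in plain Python, not NLTK's chart parser; the names `BINARY`, `LEXICAL` and `count_parses` are made up for this illustration. It relies on the assumption that every rule of the toy grammar is either binary or lexical, which holds for the grammar loaded above.

```python
from collections import defaultdict

# Binary rules (parent, left child, right child) of the toy grammar above.
BINARY = [
    ("S", "NP", "VP"),
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
]
# Lexical rules: word -> categories that can rewrite to it.
LEXICAL = {
    "a": ["Det"], "the": ["Det"],
    "dog": ["N"], "cat": ["N"],
    "chased": ["V"], "sat": ["V"],
    "on": ["P"], "in": ["P"],
}

def count_parses(tokens, start="S"):
    """Count the parse trees rooted in `start` that span all tokens."""
    n = len(tokens)
    # chart[(i, j)][A] = number of ways category A derives tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for cat in LEXICAL.get(word, []):
            chart[(i, i + 1)][cat] += 1
    for width in range(2, n + 1):          # span length
        for i in range(n - width + 1):     # span start
            j = i + width
            for k in range(i + 1, j):      # split point
                for parent, left, right in BINARY:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]

print(count_parses("the dog chased the cat".split()))
print(count_parses("the dog chased the cat on the dog".split()))
```

The second sentence gets two parses because the PP `on the dog` can attach either to the NP `the cat` (via `NP -> NP PP`) or to the VP `chased the cat` (via `VP -> VP PP`).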

## Generating sentences given a CFG:


In [None]:
from nltk.parse.generate import generate, demo_grammar
from nltk import CFG
grammar = CFG.fromstring(demo_grammar)
print(grammar)

In [None]:
for sentence in generate(grammar, n=50, depth=5):
    print(' '.join(sentence))

## Example of a grammar design issue: avoiding unnecessary ambiguity


In [None]:
import nltk
from nltk import CFG

grammar = CFG.fromstring("""
   ADJP -> ADJP ADJP
   ADJP -> 'white' | 'tall' | 'small' | 'expensive'
""")

sentence = "white expensive tall small".split()

# Create a chart parser from the grammar
chart_parser = nltk.ChartParser(grammar)

# Print and display every parse tree
for p in chart_parser.parse(sentence):
    print(p)
    display(p)
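
The number of trees printed above grows quickly with the length of the adjective chain. As a rough sketch (plain Python, independent of NLTK; `count_bracketings` is a name invented here), the rule `ADJP -> ADJP ADJP` lets a chain of `n` adjectives split at any point, with each half parsed independently, so the count follows the recurrence below. This assumes each word has exactly one lexical analysis, as in the grammar above.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_bracketings(n):
    """Number of binary trees over a chain of n adjectives under ADJP -> ADJP ADJP."""
    if n == 1:
        return 1  # a single adjective is covered by one lexical rule
    # Split the chain after k words; the two halves are parsed independently.
    return sum(count_bracketings(k) * count_bracketings(n - k) for k in range(1, n))

for n in range(1, 7):
    print(n, count_bracketings(n))
```

For the four-word chain used above this gives 5, and the counts for 1 to 6 adjectives are 1, 1, 2, 5, 14, 42 (the Catalan numbers), even though all of these trees express the same meaning.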

## Exercise: rewrite the grammar so that each adjective chain receives exactly one parse

In [None]:
grammar2 = CFG.fromstring("""
   # Design your new ADJP grammar based on this one:
   ADJP -> ADJP ADJP
   ADJP -> 'white' | 'tall' | 'small' | 'expensive'
""")

sentence = "white expensive tall small".split()

# Create a chart parser from the grammar
chart_parser = nltk.ChartParser(grammar2)

# Print and display every parse tree
for p in chart_parser.parse(sentence):
    print(p)
    display(p)

## Exercise (homework assignment)

Take the following grammar:

In [None]:
from nltk import CFG

grammar = CFG.fromstring("""
 S -> NP VP
 PP -> P NP
 NP -> Det N | NP PP
 NP -> Det ADJ N
 VP -> V NP | VP PP
 Det -> 'a' | 'the'
 N -> 'dog' | 'cat'
 V -> 'chased' | 'sat'
 P -> 'on' | 'in'
 ADJ -> 'big' | ADJ 'big'
 S1 -> NP VP | NP V
 S -> S CORD S1
 CORD -> 'and'
 """)

Extend the grammar with rules that cover the following sentences:
* Simple sentences:
  * the big white cat sat on the dog
  * today the big white cat sat on the dog
  * Sally ate a sandwich .
  * Sally and the president wanted and ate a sandwich .
  * the president sighed .
  * the president thought that a sandwich sighed .
  * it perplexed the president that a sandwich ate Sally .
  * the very very very perplexed president ate a sandwich .
  * the president worked on every proposal on the desk .
* Relative clauses:
  * the cat that the dog chased chased the dog
* Coordination:
  * the cat and the dog chased the cat

Guidelines:
* **Always save the last correct grammar (and example sentences)!** <br> Any change can have positive or negative effects, so you must always be able to return to the last correct version.
* Document the resulting grammar: the range of syntactic constructions covered, example analyses, limitations, ...
* Provide 5 random sentences, generated with the final version of your grammar, that illustrate your modifications.
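
One possible way to follow the advice about saving the last correct grammar is to store the grammar text together with its example sentences in a small file. The sketch below uses only the standard library; the helper names `save_snapshot` and `load_snapshot`, the JSON layout, and the file name are all assumptions made for this illustration, not part of NLTK.

```python
import json
import tempfile
from pathlib import Path

def save_snapshot(path, grammar_text, sentences):
    """Write the grammar source and its example sentences to a JSON file."""
    payload = {"grammar": grammar_text, "sentences": sentences}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")

def load_snapshot(path):
    """Read back a (grammar_text, sentences) pair saved by save_snapshot."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["grammar"], data["sentences"]

# Demo in a temporary directory; in practice, pick a versioned file name.
grammar_text = """
 S -> NP VP
 NP -> Det N
 VP -> V NP
 Det -> 'the'
 N -> 'dog' | 'cat'
 V -> 'chased'
"""
examples = ["the dog chased the cat"]

with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "grammar_v1.json"
    save_snapshot(snapshot, grammar_text, examples)
    restored_grammar, restored_examples = load_snapshot(snapshot)
    print(restored_grammar == grammar_text, restored_examples)
```

The restored grammar text can be passed to `CFG.fromstring` again, and the stored sentences re-parsed after each change to check that nothing that used to parse has stopped parsing.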

Something to start with:

In [None]:
sentence = "the big dog chased the big cat".split()

# Create a chart parser from the grammar
chart_parser = nltk.ChartParser(grammar)

# Print and display every parse tree
for p in chart_parser.parse(sentence):
    print(p)
    display(p)