# Workshop week 6: Application of Transformers and Syntactic Parsing

## 1. Application of Transformers

### BERT for Named Entity Recognition

Named Entity Recognition (NER) is the task of identifying entities in a text and classifying them into predefined categories such as person, organization, location, and time. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained deep learning model that has shown state-of-the-art performance on many natural language processing tasks, including NER. To use BERT for NER, the pre-trained model is fine-tuned on a dataset labeled with named entities, and the fine-tuned model is then used to predict the named entities in new text. The input to the model is a sequence of tokens, and the output is a sequence of labels, one per token, drawn from the named entity categories.
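
To make the token-in, label-out picture concrete, the sketch below groups a predicted label sequence into entity spans. It assumes the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity), which the text above does not specify. The function name `decode_bio` and the example tokens and labels are hypothetical and are not taken from the notebook.

```python
def decode_bio(tokens, labels):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity starts: flush any entity in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the entity in progress.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag closes the entity in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical model output for one sentence (one label per token).
tokens = ["Ada", "Lovelace", "worked", "in", "London"]
labels = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(decode_bio(tokens, labels))
```

A fine-tuned model would supply the `labels` list; the grouping step stays the same regardless of which model produced it.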



On many NLP benchmarks, BERT has outperformed traditional machine learning models and earlier deep learning models. Because the model is already pre-trained, fine-tuning needs only a comparatively small labeled dataset, which makes transfer learning an effective and efficient approach for a range of NLP tasks.


This part is provided in a separate notebook, as training and testing may take up to half an hour.

    Notebook BERT for Named Entity Recognition.ipynb
    
**The reason for reviewing this code is that it may be useful in your Assignment 2**


## 2. Syntactic Parsing

Syntactic parsing is the process of analyzing a sentence or a text in a language and determining its grammatical structure. It involves determining the relationships between the words in a sentence and their roles in building the sentence's meaning.

Constituency parsing and Dependency parsing are two approaches used to perform syntactic parsing.

Constituency parsing analyzes a sentence by breaking it into constituents: words or groups of words that function as a single unit, such as noun phrases and verb phrases. The constituents are organized into a tree structure, where the root represents the complete sentence, each internal node represents a constituent, and the leaves are the words.

Dependency parsing, on the other hand, involves analyzing a sentence and determining the dependencies between its words. A dependency is a relation between two words in a sentence that captures the grammatical role of one word with respect to the other. The dependencies are represented as directed edges in a graph, where each node represents a word in the sentence and each edge represents a dependency between two words.

#### Using Viterbi algorithm with Probabilistic CFG for syntactic parsing

Source: https://www.nltk.org/_modules/nltk/parse/viterbi.html#demo

In [None]:
def demo(sentence_number=1, draw_parses='y', print_parses='y'):
    """
    A demonstration of the probabilistic Viterbi parser.  One of two
    demo sentences is selected with `sentence_number`, parsed with its
    toy PCFG, and a summary of the results is displayed.  The flags
    `draw_parses` and `print_parses` ('y' or 'n') control the output.
    """
    import sys
    import time

    from functools import reduce
    from nltk import tokenize
    from nltk.grammar import PCFG
    from nltk.parse import ViterbiParser

    toy_pcfg1 = PCFG.fromstring(
        """
    S -> NP VP [1.0]
    NP -> Det N [0.5] | NP PP [0.25] | 'John' [0.1] | 'I' [0.15]
    Det -> 'the' [0.8] | 'my' [0.2]
    N -> 'man' [0.5] | 'telescope' [0.5]
    VP -> VP PP [0.1] | V NP [0.7] | V [0.2]
    V -> 'ate' [0.35] | 'saw' [0.65]
    PP -> P NP [1.0]
    P -> 'with' [0.61] | 'under' [0.39]
    """
    )

    toy_pcfg2 = PCFG.fromstring(
        """
    S    -> NP VP         [1.0]
    VP   -> V NP          [.59]
    VP   -> V             [.40]
    VP   -> VP PP         [.01]
    NP   -> Det N         [.41]
    NP   -> Name          [.28]
    NP   -> NP PP         [.31]
    PP   -> P NP          [1.0]
    V    -> 'saw'         [.21]
    V    -> 'ate'         [.51]
    V    -> 'ran'         [.28]
    N    -> 'boy'         [.11]
    N    -> 'cookie'      [.12]
    N    -> 'table'       [.13]
    N    -> 'telescope'   [.14]
    N    -> 'hill'        [.5]
    Name -> 'Jack'        [.52]
    Name -> 'Bob'         [.48]
    P    -> 'with'        [.61]
    P    -> 'under'       [.39]
    Det  -> 'the'         [.41]
    Det  -> 'a'           [.31]
    Det  -> 'my'          [.28]
    """
    )

    # Define two demos.  Each demo has a sentence and a grammar.
    demos = [
        ("I saw the man with my telescope", toy_pcfg1),
        ("the boy saw Jack with Bob under the table with a telescope", toy_pcfg2),
    ]

    # Ask the user which demo they want to use.
    print()
    for i in range(len(demos)):
        print(f"{i + 1:>3}: {demos[i][0]}")
        print("     %r" % demos[i][1])
        print()
    print("Which demo (%d-%d)? " % (1, len(demos)), end=" ")
    try:
        snum = int(sentence_number) - 1
        sent, grammar = demos[snum]
    except (ValueError, TypeError, IndexError):
        print("Bad sentence number")
        return

    # Tokenize the sentence.
    tokens = sent.split()

    parser = ViterbiParser(grammar)
    all_parses = {}

    print(f"\nsent: {sent}\nparser: {parser}\ngrammar: {grammar}")
    parser.trace(3)
    t = time.time()
    parses = parser.parse_all(tokens)
    # Use a separate name so the `time` module is not overwritten
    elapsed = time.time() - t
    average = (
        reduce(lambda a, b: a + b.prob(), parses, 0) / len(parses) if parses else 0
    )
    num_parses = len(parses)
    for p in parses:
        all_parses[p.freeze()] = 1

    # Print some summary statistics
    print()
    print("Time (secs)   # Parses   Average P(parse)")
    print("-----------------------------------------")
    print("%11.4f%11d%19.14f" % (elapsed, num_parses, average))
    parses = all_parses.keys()
    if parses:
        p = reduce(lambda a, b: a + b.prob(), parses, 0) / len(parses)
    else:
        p = 0
    print("------------------------------------------")
    print("%11s%11d%19.14f" % ("n/a", len(parses), p))

    # Ask the user if we should draw the parses.
    print()
    print("Draw parses (y/n)? "+draw_parses, end=" ")
    if draw_parses.strip().lower().startswith("y"):
        from nltk.draw.tree import draw_trees

        print("  please wait...")
        draw_trees(*parses)

    # Ask the user if we should print the parses.
    print()
    print("Print parses (y/n)? "+print_parses, end=" ")
    if print_parses.strip().lower().startswith("y"):
        for parse in parses:
            print(parse)
demo(1,'n','y')


### Task 1: What is the meaning based on this parsing?

Analyse the parsing result of the sentence "I saw the man with my telescope". As you can see, this sentence has two possible meanings. Based on the parsing result, which meaning does the parse correspond to? How did you find out?

Discuss your findings in the class.

### Task 2: Manipulate the probabilities of the CFG grammar to change the meaning.

Modify toy_pcfg1 to force the parsing to the other meaning. 

Is there more than one way to do it?

Discuss your findings in the class.
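
When adjusting probabilities for Task 2, it may help to recall how a PCFG scores a tree: the probability of a parse is the product of the probabilities of every rule used in it, and the Viterbi parser returns the highest-scoring tree. The sketch below works this out by hand for the short sentence "John ate", using rule probabilities copied from `toy_pcfg1`. The dictionary `rule_prob` and the helper `tree_probability` are illustrative names, not part of NLTK.

```python
import math

# Rule probabilities copied from toy_pcfg1 (only the rules needed here).
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("John",)): 0.1,
    ("VP", ("V",)): 0.2,
    ("V", ("ate",)): 0.35,
}


def tree_probability(rules_used):
    """Multiply the probabilities of all rules used in one parse tree."""
    return math.prod(rule_prob[rule] for rule in rules_used)


# The only parse of "John ate": (S (NP John) (VP (V ate)))
john_ate = [
    ("S", ("NP", "VP")),
    ("NP", ("John",)),
    ("VP", ("V",)),
    ("V", ("ate",)),
]
print(tree_probability(john_ate))  # 1.0 * 0.1 * 0.2 * 0.35, about 0.007
```

Two competing parses of the same sentence usually share most of their rules, so only the rules in which they differ decide which parse wins.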

<h2><center>Context-free Grammar and Dependency Grammar</center></h2>

Context-Free Grammar (CFG) is a type of formal grammar used to describe the structure of a natural language. A CFG defines a set of rules for generating sentences in a language. Each rule consists of a left-hand side, which is a non-terminal symbol, and a right-hand side, which is a sequence of terminal and non-terminal symbols.

A CFG rule has the form:

A -> B C D

where A is a non-terminal symbol, and B, C, and D are either terminal or non-terminal symbols. The arrow symbol "->" represents a production and means that the non-terminal A can be replaced by the sequence of symbols B, C, and D.

CFG rules can be used to parse a sentence by constructing a parse tree. The parse tree is a tree structure that represents the syntactic structure of a sentence according to the rules of the grammar. The process of constructing the parse tree involves repeatedly applying the CFG rules to the sentence until all the non-terminal symbols have been replaced by terminal symbols.
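
The rewriting process can be sketched in a few lines of plain Python. The grammar below is a hypothetical toy grammar stored as a dictionary, and `leftmost_derivation` is an illustrative helper that always rewrites the leftmost non-terminal, with the caller choosing which production to apply at each step. It generates a sentence top-down; a parser runs the same process in search of a derivation that yields a given sentence.

```python
# Toy grammar (hypothetical): each non-terminal maps to its productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["cat"], ["dog"]],
    "V": [["chased"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal until only terminals remain.

    `choices` gives, for each rewriting step in order, the index of the
    production to apply. Returns every intermediate sentential form.
    """
    form = [start]
    steps = [list(form)]
    choices = iter(choices)
    while any(symbol in GRAMMAR for symbol in form):
        # Locate the leftmost non-terminal and replace it with a right-hand side.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        rhs = GRAMMAR[form[i]][next(choices)]
        form = form[:i] + rhs + form[i + 1:]
        steps.append(list(form))
    return steps


# S, NP, Det -> 'the', N -> 'cat', VP, V, NP, Det -> 'a', N -> 'dog'
for form in leftmost_derivation("S", [0, 0, 0, 0, 0, 0, 0, 1, 1]):
    print(" ".join(form))
```

Each printed line is one sentential form; the last line contains only terminal symbols, which is the generated sentence.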

Dependency Grammar is a type of grammar that defines the dependencies between the words in a sentence. A dependency grammar consists of a set of dependency rules, each of which defines the dependencies between two words in a sentence.

A dependency grammar rule has the form:

word1 --relation--> word2

where word1 and word2 are words in the sentence, and relation is the type of dependency between them. For example, the relation "subject" links the subject of a clause to its verb, and the relation "object" links a verb to its object.

The accuracy of a dependency parser can be evaluated using metrics such as precision, recall, and F1-score. Precision is the proportion of the dependencies predicted by the parser that are correct, while recall is the proportion of the dependencies in the reference (gold-standard) annotation that the parser finds. F1-score is the harmonic mean of precision and recall and gives a single number that summarizes the overall performance of the parser.
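
As a rough illustration of these metrics, the sketch below compares a set of predicted dependencies against a gold-standard set. Each dependency is written as a triple in the `word1 --relation--> word2` form used above. The gold and predicted sets are made-up examples, and `precision_recall_f1` is a hypothetical helper name.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted dependency triples against gold-standard triples."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: each triple is (word1, relation, word2).
gold = {
    ("cat", "subject", "chased"),
    ("chased", "object", "mouse"),
    ("The", "determiner", "cat"),
    ("the", "determiner", "mouse"),
}
predicted = {
    ("cat", "subject", "chased"),
    ("chased", "object", "mouse"),
    ("the", "determiner", "chased"),  # wrong attachment
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Here two of the three predicted dependencies are correct, and two of the four gold dependencies are found, so precision is higher than recall.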

Consider the sentence "The cat chased the mouse".

Constituency Parsing:

The sentence can be represented as a constituency tree, where the root of the tree represents the complete sentence, and each branch represents a constituent:



  (S <br />
   (NP The cat)<br />
   (VP chased<br />
       (NP the mouse)))<br />
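
A bracketed tree like this maps naturally onto nested data. The sketch below stores the same tree as nested Python tuples of the form `(label, child, ...)`, an encoding chosen here purely for illustration, and walks it to recover the words and the bracketed string. The helper names `leaves` and `bracketed` are hypothetical.

```python
# (label, child, child, ...) where a child is a subtree or a word string.
tree = ("S",
        ("NP", "The", "cat"),
        ("VP", "chased",
         ("NP", "the", "mouse")))


def leaves(node):
    """Return the words under a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def bracketed(node):
    """Render a node in the bracket notation used above."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [bracketed(c) for c in node[1:]]) + ")"


print(leaves(tree))
print(bracketed(tree))
```

Reading the leaves from left to right always gives back the original sentence, whatever the shape of the tree.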


Dependency Parsing:

The sentence can be represented as a dependency graph, where each node represents a word in the sentence and each edge represents a dependency between two words:

  cat --subject--> chased <br />
  chased --object--> mouse <br />


In [1]:
## Example of CFG

import nltk
from nltk import CFG
from nltk.parse import RecursiveDescentParser
from nltk.parse.chart import ChartParser

#  Define a simple context-free grammar
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'a' | 'the'
N -> 'dog' | 'cat'
V -> 'chased' | 'sat'
""")

# Use a recursive descent parser to parse the sentence
rd_parser = RecursiveDescentParser(grammar)
sentence = "the cat chased a dog"
tokens = sentence.split()
for tree in rd_parser.parse(tokens):
    print(tree)

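# Estimate how well the grammar covers the test sentences with a chart parser:
# a sentence counts as correct if the grammar yields at least one parse tree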
chart_parser = ChartParser(grammar)
test_sentences = [
    "the cat chased a dog",
    "a dog chased the cat",
    "the dog sat",
    "a cat chased the dog",
]
correct = 0
total = len(test_sentences)
for sentence in test_sentences:
    tokens = sentence.split()
    parse_trees = chart_parser.parse(tokens)
    if len(list(parse_trees)) > 0:
        correct += 1

accuracy = correct / total
print("CFG Accuracy: {:.2f}%".format(accuracy * 100))


(S (NP (Det the) (N cat)) (VP (V chased) (NP (Det a) (N dog))))
CFG Accuracy: 75.00%


### Task 3: Modify the above code by allowing it to parse "the cat chased a dog on the mat"




## Dependency and constituency parsing with spaCy


The examples in this section use spaCy's English model `en_core_web_sm`. The first cell adds the Berkeley Neural Parser (benepar) to the spaCy pipeline and prints a constituency parse of a sample sentence. A later cell renders the dependency parse of the same sentence with displaCy.

In [3]:
# Constituency parsing with benepar (Berkeley Neural Parser) on top of spaCy
!pip install benepar
!pip install sentencepiece

import benepar
import en_core_web_sm
from nltk.tree import Tree

# Download the pre-trained English parsing model (only needed once)
benepar.download('benepar_en3')

# Load the spaCy English pipeline and add benepar as a pipeline component
nlp = en_core_web_sm.load()
nlp.add_pipe('benepar', config={"model": "benepar_en3"})

doc = nlp('One morning I chased a cat in my pyjamas')
sent = list(doc.sents)[0]
str_tree = sent._.parse_string
print(str_tree)

# Render the bracketed parse as an ASCII tree
tree = Tree.fromstring(str_tree)
tree.pretty_print()


### Task 4: Meaning of this parser

Run the sentence "I saw the man with my stolen telescope".

Run the sentence "I saw the man with my own eyes".


Based on the parse trees, work out which meaning the parser assigns to each sentence.

Are these meanings close to the most likely meaning of each sentence?


In [None]:
# Dependency parser
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp('One morning I chased a cat in my pyjamas')
displacy.render(doc, style="dep")
# displacy.serve(doc, style="dep") # this can be used to display in localhost:5000

### Task 5: Interpret the meaning of this sentence from the parsing. 

Is it easier to work out the meaning from the dependency parse or from the constituency parse?

Think of another sentence with an ambiguous meaning that is hard to resolve without human background knowledge.

Try it with this parser. Did you get the meaning you expected?



In [None]:
# Task -2 
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Define a sentence
sentence = "The cat chased the dog."

# Parse the sentence to get its grammatical structure
doc = nlp(sentence)

# Print the dependency tree
print("\nDependency tree for the sentence:")
for token in doc:
    print(token.text, token.dep_, token.head.text, [child for child in token.children])


### Reading: Comparing CFG and Dependency Grammar

Context Free Grammar (CFG) and Dependency Grammar are two different approaches to represent the grammatical structure of a sentence.

CFG is a type of grammar that consists of a set of production rules that specify the structure of sentences. The rules relate non-terminal symbols to terminal symbols. Non-terminal symbols represent syntactic categories: phrase types such as noun phrase (NP) and verb phrase (VP), and parts of speech such as noun, verb, and adjective. Terminal symbols represent the words in the sentence.

Dependency Grammar, on the other hand, represents the grammatical structure of a sentence as a set of dependencies between words in the sentence. It is a type of grammar that defines the relationships between words in terms of their function in the sentence. Every dependency links a head to a dependent: the head is the main word in the relationship, and the dependent is the word that modifies or completes it. Every word except the root of the sentence (usually the main verb) has exactly one head, and a word can be the head of several dependents.
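
One compact way to store such a structure, sketched below with hypothetical names, is to record for every word the position of its head, with the root having no head. This representation is an assumption made for illustration, and the head assignments for the example sentence are written by hand, not produced by a parser.

```python
words = ["The", "cat", "chased", "the", "mouse"]
# heads[i] is the index of the head of words[i]; None marks the root.
heads = [1, 2, None, 4, 2]


def dependents_of(index):
    """Return the words whose head is words[index]."""
    return [words[i] for i, h in enumerate(heads) if h == index]


def path_to_root(index):
    """Follow head links from a word up to the root."""
    path = [words[index]]
    while heads[index] is not None:
        index = heads[index]
        path.append(words[index])
    return path


root = words[heads.index(None)]
print("root:", root)
print("dependents of 'chased':", dependents_of(2))
print("path from 'The' to the root:", path_to_root(0))
```

Because each word stores a single head, the structure is a tree: following head links from any word always ends at the root.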

The main difference between CFG and Dependency Grammar is that CFG focuses on the structure of a sentence, while Dependency Grammar focuses on the relationships between words in the sentence. CFG is more suited to generating new sentences based on a set of rules, while Dependency Grammar is more suited to understanding the relationships between words in an existing sentence.

The advantages of CFG include its simplicity, generality, and the ability to generate new sentences. The disadvantages include its difficulty in handling free word order and complex relationships between words.

The advantages of Dependency Grammar include its ability to handle free word order and complex relationships between words. The disadvantages include its complexity and the difficulty in generating new sentences based on the rules.

In conclusion, both CFG and Dependency Grammar have their strengths and weaknesses, and the choice between the two will depend on the specific task at hand.

### Optional Task 6: Experimenting with Different Grammars and Dependency Parsers

To experiment with different grammars and dependency parsers, you can try other CFG parsing algorithms or implementations, such as the Earley parser (available in NLTK as `nltk.parse.EarleyChartParser`).

## Optional: Extracting Entities from text: may be useful for Assignment 2

### Extracting Entities

In natural language processing (NLP), extracting entities refers to the process of identifying and extracting specific pieces of information from text, such as people, organizations, locations, dates, etc. This is a fundamental task in many NLP applications, such as named entity recognition, question answering, information retrieval, and text classification.

There are different approaches to extracting entities from text, ranging from rule-based systems to machine learning models. One popular approach is to use pre-trained models such as the ones provided by the spaCy library. These models are trained on large annotated datasets and can achieve high accuracy in identifying and classifying entities.
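
For contrast with the pre-trained model approach, the sketch below shows what a very small rule-based extractor might look like. It is a toy built on assumptions: the regular expression for money amounts and the hand-written organisation list `KNOWN_ORGS` are hypothetical, and real rule-based systems use far richer patterns and dictionaries.

```python
import re

# Hypothetical gazetteer: a hand-written list of organisation names.
KNOWN_ORGS = {"Apple", "Google", "Microsoft"}

# A dollar sign, digits, and an optional magnitude word.
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:thousand|million|billion))?")


def extract_entities(text):
    """Return (entity_text, label) pairs found by two simple rules."""
    entities = []
    # Rule 1: any word that appears in the organisation list is an ORG.
    for word in re.findall(r"[A-Za-z]+", text):
        if word in KNOWN_ORGS:
            entities.append((word, "ORG"))
    # Rule 2: anything matching the money pattern is MONEY.
    for match in MONEY_PATTERN.finditer(text):
        entities.append((match.group(), "MONEY"))
    return entities


text = "Apple is looking at buying a startup for $1 billion"
print(extract_entities(text))
```

Rules like these fail as soon as a name is missing from the list or "Apple" refers to the fruit, which is one reason learned models are usually preferred.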

Here's an example of how to extract entities from a text using spaCy in Python:

In [1]:
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Apple is looking at buying a startup for $1 billion"
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_)


Apple ORG
$1 billion MONEY


In this example, we load the pre-trained en_core_web_sm model from spaCy and use it to process the text "Apple is looking at buying a startup for $1 billion". We then iterate over the identified entities in the doc object and print out their text and label. This shows that the model correctly identified "Apple" as an organization and "$1 billion" as a monetary value. Steps:

We first import the spaCy library and load a pre-trained model for the English language using nlp = spacy.load("en_core_web_sm"). This initializes an instance of the Language class and loads the pre-trained model data for English.


We then define the input text that we want to extract entities from using text = "Apple is looking at buying a startup for $1 billion".


Next, we use the nlp object to process the input text by calling doc = nlp(text). This creates a Doc object that contains various linguistic annotations such as part-of-speech tags, dependencies, and named entities.


Finally, we iterate over the entities identified in the input text using a for loop and print out their text and label using print(ent.text, ent.label_). The ent variable represents an individual entity in the doc object, and ent.text and ent.label_ return the text and label of the entity, respectively.

**Apply the above code to one of the articles from your Assignment 2 and check whether it works.**