### 1. Named Entity Recognition (NER)
Identifies entities in text, such as names, dates, locations, and organizations.

In [1]:
# different spacy and en_core_web_sm versions may lead to different results
# this notebook is made with spaCy version 3.8.3 and en_core_web_sm version 3.8.0
# find your own version in the terminal: python -m spacy info
import spacy 
from spacy import displacy

NER = spacy.load("en_core_web_sm")  # Load SpaCy's small English model
sentence = "This course is lectured by Dr. S. Supraja, and Simon Liu at NTU, Singapore."

doc = NER(sentence)
for entity in doc.ents:
    print(f"{entity.text}[{entity.label_}]")

S. Supraja[PERSON]
Simon Liu[PERSON]
NTU[ORG]
Singapore[GPE]


In [2]:
# use the following command to explore the definition of the given POS tag, dependency label or entity type in spacy
spacy.explain('ORG')

'Companies, agencies, institutions, etc.'

In [3]:
displacy.render(doc, style='ent', jupyter=True)

### 2. Part-of-Speech (POS) Tagging
Assigning grammatical categories (like noun, verb, adjective, etc.) to words in a sentence.

In [4]:
# Process the text for coarse-grained POS tagging
for token in doc:
    print(f"{token.text}[{token.pos_}]")

This[DET]
course[NOUN]
is[AUX]
lectured[VERB]
by[ADP]
Dr.[PROPN]
S.[PROPN]
Supraja[PROPN]
,[PUNCT]
and[CCONJ]
Simon[PROPN]
Liu[PROPN]
at[ADP]
NTU[PROPN]
,[PUNCT]
Singapore[PROPN]
.[PUNCT]


In [5]:
# Process the text for fine-grained POS tagging
for token in doc:
    print(f"{token.text}[{token.tag_}]")

This[DT]
course[NN]
is[VBZ]
lectured[VBN]
by[IN]
Dr.[NNP]
S.[NNP]
Supraja[NNP]
,[,]
and[CC]
Simon[NNP]
Liu[NNP]
at[IN]
NTU[NNP]
,[,]
Singapore[NNP]
.[.]


In [6]:
from nltk import Tree

def tok_format(tok, coarse=False):
    if coarse:
        return "[".join([tok.orth_, tok.pos_]) + "]"
    return "[".join([tok.orth_, tok.tag_]) + "]"

def to_nltk_tree(node, coarse=False):
    if node.n_lefts + node.n_rights > 0:
        return Tree(tok_format(node), [to_nltk_tree(child, coarse=coarse) for child in node.children])
    else:
        return tok_format(node, coarse=coarse)

[to_nltk_tree(sent.root, coarse=True).pretty_print() for sent in doc.sents];
[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents];

                                     lectured[VBN]                                                                                  
    _______________________________________|_________________________________________________________                                
   |       |         |         |           |                       |                              Liu[NNP]                          
   |       |         |         |           |                       |                       __________|________                       
   |       |         |         |           |                     by[IN]                   |                 at[IN]                  
   |       |         |         |           |                       |                      |                   |                      
   |       |         |         |       course[NN]             Supraja[NNP]                |                NTU[NNP]                 
   |       |         |         |           |            __________

### 3. Dependency Parsing
Analyzing the grammatical structure of a sentence to establish relationships between words.

In [7]:
# Process the text for dependency parsing
print('{:<15} | {:<10} | {:<15} | {:<20}'.format('Token','Relation','Head','Children'))
print('-'*70)
for token in doc:
    #Print the token, dependency nature, head and all dependents of the token
    print("{:<15} | {:<10} | {:<15} | {:<20}"
          .format(str(token.text), str(token.dep_), str(token.head.text), str([child for child in token.children])))

Token           | Relation   | Head            | Children            
----------------------------------------------------------------------
This            | det        | course          | []                  
course          | nsubjpass  | lectured        | [This]              
is              | auxpass    | lectured        | []                  
lectured        | ROOT       | lectured        | [course, is, by, ,, and, Liu, .]
by              | agent      | lectured        | [Supraja]           
Dr.             | compound   | Supraja         | []                  
S.              | compound   | Supraja         | []                  
Supraja         | pobj       | by              | [Dr., S.]           
,               | punct      | lectured        | []                  
and             | cc         | lectured        | []                  
Simon           | compound   | Liu             | []                  
Liu             | conj       | lectured        | [Simon, at]         
at     

In [8]:
# use displacy to render the text
displacy.render(doc, style='dep', jupyter=True, options={'distance':120})

### Practice for the week
Perform NER, POS tagging and dependency parser on the following text and observe the results. Refer to https://spacy.io/api for more information.

In [6]:
import pandas as pd
import spacy

raw_text1 = "From 1925 to 1945, Tolkien was the Rawlinson and Bosworth Professor of Anglo-Saxon and a Fellow of Pembroke College, both at the University of Oxford. He then moved within the same university to become the Merton Professor of English Language and Literature and Fellow of Merton College, and held these positions from 1945 until his retirement in 1959. Tolkien was a close friend of C. S. Lewis, a co-member of the informal literary discussion group The Inklings. He was appointed a Commander of the Order of the British Empire by Queen Elizabeth II on 28 March 1972."

raw_text2 = '''
From 1925 to 1945, Tolkien was the Rawlinson and Bosworth Professor of Anglo-Saxon and a Fellow of Pembroke College, both at the University of Oxford. 
He then moved within the same university to become the Merton Professor of English Language and Literature and Fellow of Merton College, and held these positions from 1945 until his retirement in 1959. 
Tolkien was a close friend of C. S. Lewis, a co-member of the informal literary discussion group The Inklings. 
He was appointed a Commander of the Order of the British Empire by Queen Elizabeth II on 28 March 1972.
'''

df = pd.DataFrame([raw_text1, raw_text2], columns=['text'])
print(df)

                                                text
0  From 1925 to 1945, Tolkien was the Rawlinson a...
1  \nFrom 1925 to 1945, Tolkien was the Rawlinson...


In [13]:
# load the small English Model
nlp = spacy.load('en_core_web_sm')

# lists to store tokens and tags
token = []
pos = []

# TODO: continue the codes from here
for i, doc in enumerate(nlp.pipe(df['text']), start=1):
    print(f"\n=== DOC {i} : NER ===")
    for ent in doc.ents:
        print(f"{ent.text} [{ent.label_}]")


=== DOC 1 : NER ===
1925 to 1945 [DATE]
Tolkien [PERSON]
Anglo-Saxon [ORG]
the University of Oxford [ORG]
English Language and Literature and Fellow of Merton College [WORK_OF_ART]
1945 [DATE]
1959 [DATE]
Tolkien [PERSON]
C. S. Lewis [PERSON]
Inklings [ORG]
the British Empire [GPE]
Elizabeth II [PERSON]
28 March 1972 [DATE]

=== DOC 2 : NER ===
1925 to 1945 [DATE]
Tolkien [PERSON]
Anglo-Saxon [ORG]
the University of Oxford [ORG]
English Language and Literature and Fellow of Merton College [WORK_OF_ART]
1945 [DATE]
1959 [DATE]
Tolkien [PERSON]
C. S. Lewis [PERSON]
Inklings [ORG]
the British Empire [GPE]
Elizabeth II [PERSON]
28 March 1972 [DATE]


In [18]:
# Build spaCy docs for both rows in your DataFrame
docs = list(nlp.pipe(df['text']))

def fmt(sent, use="pos"):
    # use="pos" for coarse (tok.pos_), use="tag" for fine (tok.tag_)
    if use == "tag":
        return "".join(f"{t.text}[{t.tag_}]{t.whitespace_}" for t in sent).strip()
    return "".join(f"{t.text}[{t.pos_}]{t.whitespace_}" for t in sent).strip()

for doc_i, doc in enumerate(docs, start=1):
    print(f"\n=== DOC {doc_i} — Coarse POS (pos_) ===")
    for sent_i, sent in enumerate(doc.sents, start=1):
        print(f"({sent_i}) {fmt(sent, use='pos')}")

    print(f"\n=== DOC {doc_i} — Fine POS (tag_) ===")
    for sent_i, sent in enumerate(doc.sents, start=1):
        print(f"({sent_i}) {fmt(sent, use='tag')}")


=== DOC 1 — Coarse POS (pos_) ===
(1) From[ADP] 1925[NUM] to[ADP] 1945[NUM],[PUNCT] Tolkien[PROPN] was[AUX] the[DET] Rawlinson[PROPN] and[CCONJ] Bosworth[PROPN] Professor[PROPN] of[ADP] Anglo[PROPN]-[PUNCT]Saxon[PROPN] and[CCONJ] a[DET] Fellow[PROPN] of[ADP] Pembroke[PROPN] College[PROPN],[PUNCT] both[PRON] at[ADP] the[DET] University[PROPN] of[ADP] Oxford[PROPN].[PUNCT]
(2) He[PRON] then[ADV] moved[VERB] within[ADP] the[DET] same[ADJ] university[NOUN] to[PART] become[VERB] the[DET] Merton[PROPN] Professor[PROPN] of[ADP] English[PROPN] Language[PROPN] and[CCONJ] Literature[PROPN] and[CCONJ] Fellow[PROPN] of[ADP] Merton[PROPN] College[PROPN],[PUNCT] and[CCONJ] held[VERB] these[DET] positions[NOUN] from[ADP] 1945[NUM] until[ADP] his[PRON] retirement[NOUN] in[ADP] 1959[NUM].[PUNCT]
(3) Tolkien[PROPN] was[AUX] a[DET] close[ADJ] friend[NOUN] of[ADP] C.[PROPN] S.[PROPN] Lewis[PROPN],[PUNCT] a[DET] co[NOUN]-[NOUN]member[NOUN] of[ADP] the[DET] informal[ADJ] literary[ADJ] discussion[NOUN] grou

In [19]:
from nltk import Tree

def tok_format(tok, mode="fine"):
    """
    mode:
      - "coarse" -> token[POS]
      - "fine"   -> token[TAG]
      - "dep"    -> token[DEP:POS]
    """
    if mode == "coarse":
        tag = tok.pos_
    elif mode == "dep":
        tag = f"{tok.dep_}:{tok.pos_}"
    else:  # "fine"
        tag = tok.tag_
    return f"{tok.text}[{tag}]"

def to_nltk_tree(node, mode="fine"):
    """Convert a spaCy token subtree rooted at `node` into an NLTK Tree."""
    if node.n_lefts + node.n_rights > 0:
        return Tree(
            tok_format(node, mode),
            [to_nltk_tree(child, mode=mode) for child in node.children]
        )
    else:
        return tok_format(node, mode)

# Build one tree per sentence for different labeling modes
coarse_trees = [to_nltk_tree(sent.root, mode="coarse") for sent in doc.sents]
fine_trees   = [to_nltk_tree(sent.root, mode="fine")   for sent in doc.sents]
dep_trees    = [to_nltk_tree(sent.root, mode="dep")    for sent in doc.sents]

# Pretty print (coarse POS)
for i, t in enumerate(coarse_trees, 1):
    print(f"\nSentence {i} — coarse POS:")
    t.pretty_print()

# If you also want the others, uncomment:
# for i, t in enumerate(fine_trees, 1):
#     print(f"\nSentence {i} — fine POS (PTB tag):")
#     t.pretty_print()
#
# for i, t in enumerate(dep_trees, 1):
#     print(f"\nSentence {i} — dependency label + POS:")
#     t.pretty_print()



Sentence 1 — coarse POS:
                                                                                     was[AUX]                                                                                                                                                             
    ____________________________________________________________________________________|____________________________________________________________________________________________________________________________________________________________      
   |           |                    |                                                                             Professor[PROPN]                                                                                                                   |    
   |           |                    |                    ________________________________________________________________|__________________________________________________________________________________                

In [20]:
from spacy import displacy

# Build spaCy docs for both rows in your DataFrame
docs = list(nlp.pipe(df['text']))

for doc_i, doc in enumerate(docs, start=1):
    for sent_i, sent in enumerate(doc.sents, start=1):
        print(f"\n=== DOC {doc_i} • Sentence {sent_i} ===")
        print('{:<15} | {:<10} | {:<15} | {}'.format('Token','Relation','Head','Children'))
        print('-'*90)
        for token in sent:
            children = ", ".join(child.text for child in token.children) or "-"
            print("{:<15} | {:<10} | {:<15} | {}"
                  .format(token.text, token.dep_, token.head.text, children))

        # Render the dependency parse for this sentence (Jupyter-friendly)
        displacy.render(sent, style='dep', jupyter=True, options={'distance': 120})



=== DOC 1 • Sentence 1 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------
From            | prep       | was             | 1925, to
1925            | pobj       | From            | -
to              | prep       | From            | 1945
1945            | pobj       | to              | -
,               | punct      | was             | -
Tolkien         | nsubj      | was             | -
was             | ROOT       | was             | From, ,, Tolkien, Professor, .
the             | det        | Rawlinson       | -
Rawlinson       | nmod       | Professor       | the, and, Bosworth
and             | cc         | Rawlinson       | -
Bosworth        | conj       | Rawlinson       | -
Professor       | attr       | was             | Rawlinson, of, and, Fellow, ,, at
of              | prep       | Professor       | Saxon
Anglo           | compound   | Saxon           | -
-               |


=== DOC 1 • Sentence 2 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------
He              | nsubj      | moved           | -
then            | advmod     | moved           | -
moved           | ROOT       | moved           | He, then, within, become, ,, and, held, .
within          | prep       | moved           | university
the             | det        | university      | -
same            | amod       | university      | -
university      | pobj       | within          | the, same
to              | aux        | become          | -
become          | advcl      | moved           | to, Professor
the             | det        | Professor       | -
Merton          | compound   | Professor       | -
Professor       | attr       | become          | the, Merton, of, of
of              | prep       | Professor       | Language
English         | compound   | Language        | -
Language       


=== DOC 1 • Sentence 3 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------
Tolkien         | nsubj      | was             | -
was             | ROOT       | was             | Tolkien, friend, .
a               | det        | friend          | -
close           | amod       | friend          | -
friend          | attr       | was             | a, close, of
of              | prep       | friend          | Lewis
C.              | compound   | Lewis           | -
S.              | compound   | Lewis           | -
Lewis           | pobj       | of              | C., S., ,, co, -, member
,               | punct      | Lewis           | -
a               | det        | co              | -
co              | appos      | Lewis           | a
-               | appos      | Lewis           | -
member          | appos      | Lewis           | of
of              | prep       | member          | grou


=== DOC 1 • Sentence 4 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------
He              | nsubjpass  | appointed       | -
was             | auxpass    | appointed       | -
appointed       | ROOT       | appointed       | He, was, Commander, on, .
a               | det        | Commander       | -
Commander       | oprd       | appointed       | a, of
of              | prep       | Commander       | Order
the             | det        | Order           | -
Order           | pobj       | of              | the, of
of              | prep       | Order           | Empire
the             | det        | Empire          | -
British         | compound   | Empire          | -
Empire          | pobj       | of              | the, British, by
by              | prep       | Empire          | II
Queen           | compound   | II              | -
Elizabeth       | compound   | II              | -


=== DOC 2 • Sentence 1 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------

               | dep        | From            | -
From            | prep       | was             | 
, 1925, to
1925            | pobj       | From            | -
to              | prep       | From            | 1945
1945            | pobj       | to              | -
,               | punct      | was             | -
Tolkien         | nsubj      | was             | -
was             | ROOT       | was             | From, ,, Tolkien, Professor, .
the             | det        | Rawlinson       | -
Rawlinson       | nmod       | Professor       | the, and, Bosworth
and             | cc         | Rawlinson       | -
Bosworth        | conj       | Rawlinson       | -
Professor       | attr       | was             | Rawlinson, of, and, Fellow, ,, at
of              | prep       | Professor       | Saxon
Anglo         


=== DOC 2 • Sentence 2 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------
He              | nsubj      | moved           | -
then            | advmod     | moved           | -
moved           | ROOT       | moved           | He, then, within, become, ,, and, held, .
within          | prep       | moved           | university
the             | det        | university      | -
same            | amod       | university      | -
university      | pobj       | within          | the, same
to              | aux        | become          | -
become          | advcl      | moved           | to, Professor
the             | det        | Professor       | -
Merton          | compound   | Professor       | -
Professor       | attr       | become          | the, Merton, of, of
of              | prep       | Professor       | Language
English         | compound   | Language        | -
Language       


=== DOC 2 • Sentence 3 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------
Tolkien         | nsubj      | was             | -
was             | ROOT       | was             | Tolkien, friend, .
a               | det        | friend          | -
close           | amod       | friend          | -
friend          | attr       | was             | a, close, of
of              | prep       | friend          | Lewis
C.              | compound   | Lewis           | -
S.              | compound   | Lewis           | -
Lewis           | pobj       | of              | C., S., ,, co, -, member
,               | punct      | Lewis           | -
a               | det        | co              | -
co              | appos      | Lewis           | a
-               | appos      | Lewis           | -
member          | appos      | Lewis           | of
of              | prep       | member          | grou


=== DOC 2 • Sentence 4 ===
Token           | Relation   | Head            | Children
------------------------------------------------------------------------------------------
He              | nsubjpass  | appointed       | -
was             | auxpass    | appointed       | -
appointed       | ROOT       | appointed       | He, was, Commander, on, .
a               | det        | Commander       | -
Commander       | oprd       | appointed       | a, of
of              | prep       | Commander       | Order
the             | det        | Order           | -
Order           | pobj       | of              | the, of
of              | prep       | Order           | Empire
the             | det        | Empire          | -
British         | compound   | Empire          | -
Empire          | pobj       | of              | the, British, by
by              | prep       | Empire          | II
Queen           | compound   | II              | -
Elizabeth       | compound   | II              | -