# POS Tagging and Named Entity Recognition (NER)

We take some texts from the SQuAD question-answering collection and characterize them by annotating part-of-speech (POS) tags and named entities (NER) with four freely available NLP frameworks: TreeTagger, Stanford CoreNLP, spaCy and Stanza.

### Example texts

In [1]:
question_example = 'When was the Tower Theatre built?'
response_example = '1939'
context_example = 'The popular neighborhood known as the Tower District is centered around the historic Tower Theatre, which is included on the National List of Historic Places. The theater was built in 1939 and is at Olive and Wishon Avenues in the heart of the Tower District. (The name of the theater refers to a well-known landmark water tower, which is actually in another nearby area). The Tower District neighborhood is just north of downtown Fresno proper, and one-half mile south of Fresno City College. Although the neighborhood was known as a residential area prior, the early commercial establishments of the Tower District began with small shops and services that flocked to the area shortly after World War II. The character of small local businesses largely remains today. To some extent, the businesses of the Tower District were developed due to the proximity of the original Fresno Normal School, (later renamed California State University at Fresno). In 1916 the college moved to what is now the site of Fresno City College one-half mile north of the Tower District.'

In [2]:
amazon_context_example= "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain 'Amazonas' in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."

In [3]:
beyonce_context= 'In August, the couple attended the 2011 MTV Video Music Awards, at which Beyoncé performed "Love on Top" and started the performance saying "Tonight I want you to stand up on your feet, I want you to feel the love that\'s growing inside of me". At the end of the performance, she dropped her microphone, unbuttoned her blazer and rubbed her stomach, confirming her pregnancy she had alluded to earlier in the evening. Her appearance helped that year\'s MTV Video Music Awards become the most-watched broadcast in MTV history, pulling in 12.4 million viewers; the announcement was listed in Guinness World Records for "most tweets per second recorded for a single event" on Twitter, receiving 8,868 tweets per second and "Beyonce pregnant" was the most Googled term the week of August 29, 2011.'

### POS Tagging

#### TreeTagger

In [4]:
# https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
import treetaggerwrapper

In [5]:
# TAGDIR must point to the local TreeTagger installation directory
tagger = treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='C:\\Users\\larao_000\\Documents\\nlp\\tree-tagger-windows-3.2.3\\TreeTagger\\')

In [6]:
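def pos_tagging(text, max_length=1000):
    # Tag the text in chunks of at most ~max_length characters, cutting only at
    # spaces so that no word is split between two TreeTagger calls
    results, chunk = [], ''
    for word in text.split(' '):
        if chunk and len(chunk) + 1 + len(word) > max_length:
            results += treetaggerwrapper.make_tags(tagger.tag_text(chunk))
            chunk = ''
        chunk = chunk + ' ' + word if chunk else word
    if chunk:
        results += treetaggerwrapper.make_tags(tagger.tag_text(chunk))
    return results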

In [7]:
%%time
pos_tagging(question_example)

Wall time: 575 ms


[Tag(word='When', pos='WRB', lemma='when'),
 Tag(word='was', pos='VBD', lemma='be'),
 Tag(word='the', pos='DT', lemma='the'),
 Tag(word='Tower', pos='NP', lemma='Tower'),
 Tag(word='Theatre', pos='NP', lemma='Theatre'),
 Tag(word='built', pos='VVD', lemma='build'),
 Tag(word='?', pos='SENT', lemma='?')]
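To characterize a text, the TreeTagger output can be summarized by counting how often each tag occurs. A minimal sketch, assuming a stand-in `Tag` namedtuple with the same fields (word, pos, lemma) as the tags printed above, filled by hand from the `question_example` output; `pos_profile` is a hypothetical helper, not part of treetaggerwrapper.

```python
from collections import Counter, namedtuple

# Stand-in for the Tag records printed above (fields: word, pos, lemma);
# the values are copied from the TreeTagger output for question_example
Tag = namedtuple('Tag', 'word pos lemma')
tags = [Tag('When', 'WRB', 'when'), Tag('was', 'VBD', 'be'), Tag('the', 'DT', 'the'),
        Tag('Tower', 'NP', 'Tower'), Tag('Theatre', 'NP', 'Theatre'),
        Tag('built', 'VVD', 'build'), Tag('?', 'SENT', '?')]

def pos_profile(tags):
    """Count how often each POS tag occurs in a list of tagged tokens."""
    return Counter(tag.pos for tag in tags)

profile = pos_profile(tags)
print(profile.most_common(1))  # [('NP', 2)]
```

Because only the `pos` attribute is read, the same function should also accept the list returned by `pos_tagging`.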

In [8]:
%%time
pos_tagging(response_example)

Wall time: 2 ms


[Tag(word='1939', pos='LS', lemma='@card@')]

In [9]:
print(pos_tagging('Which name is also used to describe the Amazon rainforest in English?'))
print(pos_tagging('also known in English as Amazonia or the Amazon Jungle,'))

[Tag(word='Which', pos='WDT', lemma='which'), Tag(word='name', pos='NN', lemma='name'), Tag(word='is', pos='VBZ', lemma='be'), Tag(word='also', pos='RB', lemma='also'), Tag(word='used', pos='VVN', lemma='use'), Tag(word='to', pos='TO', lemma='to'), Tag(word='describe', pos='VV', lemma='describe'), Tag(word='the', pos='DT', lemma='the'), Tag(word='Amazon', pos='NP', lemma='Amazon'), Tag(word='rainforest', pos='NN', lemma='rainforest'), Tag(word='in', pos='IN', lemma='in'), Tag(word='English', pos='NP', lemma='English'), Tag(word='?', pos='SENT', lemma='?')]
[Tag(word='also', pos='RB', lemma='also'), Tag(word='known', pos='VVN', lemma='know'), Tag(word='in', pos='IN', lemma='in'), Tag(word='English', pos='NP', lemma='English'), Tag(word='as', pos='IN', lemma='as'), Tag(word='Amazonia', pos='NP', lemma='Amazonia'), Tag(word='or', pos='CC', lemma='or'), Tag(word='the', pos='DT', lemma='the'), Tag(word='Amazon', pos='NP', lemma='Amazon'), Tag(word='Jungle', pos='NN', lemma='jungle'), ...]

In [10]:
print(pos_tagging('Jay Z and Beyonce attended which event together in August of 2011?'))
print(pos_tagging('MTV Video Music Awards'))

[Tag(word='Jay', pos='NP', lemma='Jay'), Tag(word='Z', pos='NP', lemma='Z'), Tag(word='and', pos='CC', lemma='and'), Tag(word='Beyonce', pos='NP', lemma='Beyonce'), Tag(word='attended', pos='VVN', lemma='attend'), Tag(word='which', pos='WDT', lemma='which'), Tag(word='event', pos='NN', lemma='event'), Tag(word='together', pos='RB', lemma='together'), Tag(word='in', pos='IN', lemma='in'), Tag(word='August', pos='NP', lemma='August'), Tag(word='of', pos='IN', lemma='of'), Tag(word='2011', pos='CD', lemma='@card@'), Tag(word='?', pos='SENT', lemma='?')]
[Tag(word='MTV', pos='NP', lemma='MTV'), Tag(word='Video', pos='NP', lemma='Video'), Tag(word='Music', pos='NP', lemma='Music'), Tag(word='Awards', pos='VVZ', lemma='award')]
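TreeTagger has no NER component, but runs of consecutive proper-noun tags (`NP`, `NPS`) give rough entity candidates. A minimal sketch, assuming the (word, pos) pairs copied from the outputs above; `proper_noun_runs` is a hypothetical helper. It also exposes the weakness visible above: `Awards` is tagged `VVZ`, so the candidate stops at `Music`.

```python
def proper_noun_runs(tagged, proper_tags=('NP', 'NPS')):
    """Group consecutive tokens whose tag is in proper_tags into phrases."""
    runs, current = [], []
    for word, pos in tagged:
        if pos in proper_tags:
            current.append(word)
        elif current:
            runs.append(' '.join(current))
            current = []
    if current:
        runs.append(' '.join(current))
    return runs

# (word, pos) pairs copied from the TreeTagger outputs above
awards = [('MTV', 'NP'), ('Video', 'NP'), ('Music', 'NP'), ('Awards', 'VVZ')]
question = [('Jay', 'NP'), ('Z', 'NP'), ('and', 'CC'), ('Beyonce', 'NP'),
            ('attended', 'VVN'), ('which', 'WDT'), ('event', 'NN'), ('together', 'RB'),
            ('in', 'IN'), ('August', 'NP'), ('of', 'IN'), ('2011', 'CD'), ('?', 'SENT')]
print(proper_noun_runs(awards))    # ['MTV Video Music']
print(proper_noun_runs(question))  # ['Jay Z', 'Beyonce', 'August']
```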


In [11]:
%%time
pos_tagging(context_example)

Wall time: 17 ms


[Tag(word='The', pos='DT', lemma='the'),
 Tag(word='popular', pos='JJ', lemma='popular'),
 Tag(word='neighborhood', pos='NN', lemma='neighborhood'),
 Tag(word='known', pos='VVN', lemma='know'),
 Tag(word='as', pos='IN', lemma='as'),
 Tag(word='the', pos='DT', lemma='the'),
 Tag(word='Tower', pos='NP', lemma='Tower'),
 Tag(word='District', pos='NP', lemma='District'),
 Tag(word='is', pos='VBZ', lemma='be'),
 Tag(word='centered', pos='VVN', lemma='center'),
 Tag(word='around', pos='IN', lemma='around'),
 Tag(word='the', pos='DT', lemma='the'),
 Tag(word='historic', pos='JJ', lemma='historic'),
 Tag(word='Tower', pos='NP', lemma='Tower'),
 Tag(word='Theatre', pos='NP', lemma='Theatre'),
 Tag(word=',', pos=',', lemma=','),
 Tag(word='which', pos='WDT', lemma='which'),
 Tag(word='is', pos='VBZ', lemma='be'),
 Tag(word='included', pos='VVN', lemma='include'),
 Tag(word='on', pos='IN', lemma='on'),
 Tag(word='the', pos='DT', lemma='the'),
 Tag(word='National', pos='NP', lemma='National'),
 ...]

#### spaCy

In [12]:
import spacy
nlp_spacy = spacy.load("en_core_web_sm")

In [13]:
def pos_tagging_spacy(nlp, text):
    postags = []
    doc = nlp(text)
    for token in doc:
        postags.append((token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
            token.shape_, token.is_alpha, token.is_stop))
    return postags

In [14]:
%%time
pos_tagging_spacy(nlp_spacy, question_example)

Wall time: 31.3 ms


[('When', 'when', 'ADV', 'WRB', 'advmod', 'Xxxx', True, True),
 ('was', 'be', 'AUX', 'VBD', 'ROOT', 'xxx', True, True),
 ('the', 'the', 'DET', 'DT', 'det', 'xxx', True, True),
 ('Tower', 'Tower', 'PROPN', 'NNP', 'compound', 'Xxxxx', True, False),
 ('Theatre', 'Theatre', 'PROPN', 'NNP', 'nsubj', 'Xxxxx', True, False),
 ('built', 'build', 'VERB', 'VBD', 'ccomp', 'xxxx', True, False),
 ('?', '?', 'PUNCT', '.', 'punct', '?', False, False)]

In [15]:
%%time
pos_tagging_spacy(nlp_spacy, response_example)

Wall time: 8.01 ms


[('1939', '1939', 'NUM', 'CD', 'ROOT', 'dddd', False, False)]

In [16]:
%%time
pos_tagging_spacy(nlp_spacy, context_example)

Wall time: 88.3 ms


[('The', 'the', 'DET', 'DT', 'det', 'Xxx', True, True),
 ('popular', 'popular', 'ADJ', 'JJ', 'amod', 'xxxx', True, False),
 ('neighborhood',
  'neighborhood',
  'NOUN',
  'NN',
  'nsubjpass',
  'xxxx',
  True,
  False),
 ('known', 'know', 'VERB', 'VBN', 'acl', 'xxxx', True, False),
 ('as', 'as', 'SCONJ', 'IN', 'prep', 'xx', True, True),
 ('the', 'the', 'DET', 'DT', 'det', 'xxx', True, True),
 ('Tower', 'Tower', 'PROPN', 'NNP', 'compound', 'Xxxxx', True, False),
 ('District', 'District', 'PROPN', 'NNP', 'pobj', 'Xxxxx', True, False),
 ('is', 'be', 'AUX', 'VBZ', 'auxpass', 'xx', True, True),
 ('centered', 'center', 'VERB', 'VBN', 'ROOT', 'xxxx', True, False),
 ('around', 'around', 'ADP', 'IN', 'prep', 'xxxx', True, True),
 ('the', 'the', 'DET', 'DT', 'det', 'xxx', True, True),
 ('historic', 'historic', 'ADJ', 'JJ', 'amod', 'xxxx', True, False),
 ('Tower', 'Tower', 'PROPN', 'NNP', 'compound', 'Xxxxx', True, False),
 ('Theatre', 'Theatre', 'PROPN', 'NNP', 'pobj', 'Xxxxx', True, False),
 ...]

#### Stanza

In [17]:
#!pip install stanza
import stanza
#stanza.download('en')

In [18]:
nlp = stanza.Pipeline('en')

2021-06-20 15:57:26 INFO: Loading these models for language: en (English):
| Processor | Package   |
-------------------------
| tokenize  | combined  |
| pos       | combined  |
| lemma     | combined  |
| depparse  | combined  |
| sentiment | sstplus   |
| ner       | ontonotes |

2021-06-20 15:57:26 INFO: Use device: cpu
2021-06-20 15:57:26 INFO: Loading: tokenize
2021-06-20 15:57:26 INFO: Loading: pos
2021-06-20 15:57:27 INFO: Loading: lemma
2021-06-20 15:57:27 INFO: Loading: depparse
2021-06-20 15:57:27 INFO: Loading: sentiment
2021-06-20 15:57:28 INFO: Loading: ner
2021-06-20 15:57:29 INFO: Done loading processors!


In [19]:
def pos_tagging_stanza(nlp, text):
    postags = []
    doc = nlp(text)
    for sent in doc.sentences:
        for word in sent.words:
            postags.append((word.text, word.upos, word.xpos, word.feats))
    return postags

In [20]:
%%time
pos_tagging_stanza(nlp, question_example)

Wall time: 245 ms


[('When', 'ADV', 'WRB', 'PronType=Int'),
 ('was',
  'AUX',
  'VBD',
  'Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin'),
 ('the', 'DET', 'DT', 'Definite=Def|PronType=Art'),
 ('Tower', 'NOUN', 'NN', 'Number=Sing'),
 ('Theatre', 'NOUN', 'NN', 'Number=Sing'),
 ('built', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part'),
 ('?', 'PUNCT', '.', None)]
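The spaCy and Stanza outputs for `question_example` differ: spaCy tags `Tower` and `Theatre` as `PROPN`, Stanza as `NOUN` (and `built` is `VBD` for spaCy but `VBN` for Stanza). A minimal sketch to quantify such differences token by token, assuming both tools tokenize the text identically (true for this question, not guaranteed in general); `tag_agreement` is a hypothetical helper and the (word, UPOS) pairs are copied from the spaCy output earlier and the Stanza output above.

```python
def tag_agreement(tags_a, tags_b):
    """Token-level agreement between two taggers' (word, tag) sequences.

    Assumes both taggers tokenized the text identically; returns the share of
    matching tags and the (word, tag_a, tag_b) triples where they disagree.
    """
    if [w for w, _ in tags_a] != [w for w, _ in tags_b]:
        raise ValueError('the taggers produced different tokenizations')
    diffs = [(w, a, b) for (w, a), (_, b) in zip(tags_a, tags_b) if a != b]
    score = 1 - len(diffs) / len(tags_a) if tags_a else 1.0
    return score, diffs

spacy_upos = [('When', 'ADV'), ('was', 'AUX'), ('the', 'DET'), ('Tower', 'PROPN'),
              ('Theatre', 'PROPN'), ('built', 'VERB'), ('?', 'PUNCT')]
stanza_upos = [('When', 'ADV'), ('was', 'AUX'), ('the', 'DET'), ('Tower', 'NOUN'),
               ('Theatre', 'NOUN'), ('built', 'VERB'), ('?', 'PUNCT')]
score, diffs = tag_agreement(spacy_upos, stanza_upos)
print(round(score, 2), diffs)
```

Here 5 of the 7 universal tags agree; the disagreements are exactly the two words of the entity name.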

In [21]:
%%time
pos_tagging_stanza(nlp, response_example)

Wall time: 84.1 ms


[('1939', 'NUM', 'CD', 'NumType=Card')]

In [22]:
%%time
pos_tagging_stanza(nlp, context_example)

Wall time: 3.05 s


[('The', 'DET', 'DT', 'Definite=Def|PronType=Art'),
 ('popular', 'ADJ', 'JJ', 'Degree=Pos'),
 ('neighborhood', 'NOUN', 'NN', 'Number=Sing'),
 ('known', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part'),
 ('as', 'SCONJ', 'IN', None),
 ('the', 'DET', 'DT', 'Definite=Def|PronType=Art'),
 ('Tower', 'PROPN', 'NNP', 'Number=Sing'),
 ('District', 'PROPN', 'NNP', 'Number=Sing'),
 ('is', 'AUX', 'VBZ', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin'),
 ('centered', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass'),
 ('around', 'ADP', 'IN', None),
 ('the', 'DET', 'DT', 'Definite=Def|PronType=Art'),
 ('historic', 'ADJ', 'JJ', 'Degree=Pos'),
 ('Tower', 'PROPN', 'NNP', 'Number=Sing'),
 ('Theatre', 'PROPN', 'NNP', 'Number=Sing'),
 (',', 'PUNCT', ',', None),
 ('which', 'PRON', 'WDT', 'PronType=Rel'),
 ('is', 'AUX', 'VBZ', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin'),
 ('included', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass'),
 ('on', 'ADP', 'IN', None),
 ...]

In [23]:
def ner_stanza(nlp, text):
    nertags = []
    doc = nlp(text)
    for ent in doc.ents:
        nertags.append((ent.text, ent.type))
    return nertags

In [24]:
print(ner_stanza(nlp, 'Which name is also used to describe the Amazon rainforest in English?'))
print(ner_stanza(nlp, 'also known in English as Amazonia or the Amazon Jungle,'))

[('Amazon', 'LOC'), ('English', 'LANGUAGE')]
[('English', 'LANGUAGE'), ('Amazonia', 'LOC'), ('the Amazon Jungle', 'LOC')]
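`ner_stanza` returns a flat list of (text, label) pairs, while the CoreNLP cells below use `filter_ner_relevant` to build a {label: [texts]} dictionary. A minimal sketch that puts the Stanza output into that same shape so the two tools are easier to compare; `group_entities` is a hypothetical helper and the input is copied from the output above.

```python
from collections import defaultdict

def group_entities(entities):
    """Turn a list of (text, label) pairs into a {label: [texts]} dictionary."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)

# Stanza output for 'also known in English as Amazonia or the Amazon Jungle,'
stanza_ents = [('English', 'LANGUAGE'), ('Amazonia', 'LOC'), ('the Amazon Jungle', 'LOC')]
print(group_entities(stanza_ents))
# {'LANGUAGE': ['English'], 'LOC': ['Amazonia', 'the Amazon Jungle']}
```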


In [25]:
print(ner_stanza(nlp, 'Jay Z and Beyonce attended which event together in August of 2011?'))
print(ner_stanza(nlp, 'MTV Video Music Awards'))

[('Jay Z', 'PERSON'), ('Beyonce', 'PERSON'), ('August of 2011', 'DATE')]
[('MTV Video Music Awards', 'ORG')]


In [26]:
print(ner_stanza(nlp, question_example))
print(ner_stanza(nlp, response_example))

[('the Tower Theatre', 'FAC')]
[('1939', 'DATE')]


In [27]:
print(ner_stanza(nlp, context_example))

[('the Tower District', 'LOC'), ('Tower Theatre', 'FAC'), ('the National List of Historic Places', 'ORG'), ('1939', 'DATE'), ('Olive and Wishon Avenues', 'FAC'), ('the Tower District', 'LOC'), ('Tower District', 'LOC'), ('Fresno', 'GPE'), ('one-half mile', 'QUANTITY'), ('Fresno City College', 'FAC'), ('the Tower District', 'LOC'), ('World War II', 'EVENT'), ('today', 'DATE'), ('the Tower District', 'LOC'), ('Fresno Normal School', 'ORG'), ('California State University', 'ORG'), ('Fresno', 'GPE'), ('1916', 'DATE'), ('Fresno City College', 'ORG'), ('one-half mile', 'QUANTITY'), ('the Tower District', 'LOC')]


In [28]:
print(ner_stanza(nlp, amazon_context_example))

[('Amazon', 'LOC'), ('Portuguese', 'NORP'), ('Amazônia', 'GPE'), ('Selva Amazónica', 'LOC'), ('Amazonía', 'LOC'), ('Amazonia', 'LOC'), ('French', 'NORP'), ('Dutch', 'NORP'), ('English', 'LANGUAGE'), ('Amazonia', 'LOC'), ('the Amazon Jungle', 'LOC'), ('Amazon', 'LOC'), ('South America', 'LOC'), ('7,000,000 square kilometres', 'QUANTITY'), ('2,700,000 sq mi', 'QUANTITY'), ('5,500,000 square kilometres', 'QUANTITY'), ('2,100,000 sq mi', 'QUANTITY'), ('nine', 'CARDINAL'), ('Brazil', 'GPE'), ('60%', 'PERCENT'), ('Peru', 'GPE'), ('13%', 'PERCENT'), ('Colombia', 'GPE'), ('10%', 'PERCENT'), ('Venezuela', 'GPE'), ('Ecuador', 'GPE'), ('Bolivia', 'GPE'), ('Guyana', 'GPE'), ('Suriname', 'GPE'), ('French Guiana', 'GPE'), ('four', 'CARDINAL'), ('Amazonas', 'ORG'), ('Amazon', 'ORG'), ('over half', 'CARDINAL'), ('390 billion', 'CARDINAL'), ('16,000', 'CARDINAL')]


In [29]:
print(ner_stanza(nlp, beyonce_context))

[('August', 'DATE'), ('2011', 'DATE'), ('MTV Video Music Awards', 'EVENT'), ('Beyoncé', 'PERSON'), ('"Love on Top"', 'WORK_OF_ART'), ('Tonight', 'TIME'), ('earlier in the evening', 'TIME'), ('year', 'DATE'), ('MTV Video Music Awards', 'WORK_OF_ART'), ('MTV', 'ORG'), ('12.4 million', 'CARDINAL'), ('Guinness World Records', 'WORK_OF_ART'), ('second', 'ORDINAL'), ('Twitter', 'ORG'), ('8,868', 'CARDINAL'), ('"Beyonce pregnant"', 'WORK_OF_ART'), ('the week of August 29, 2011', 'DATE')]


In [30]:
print(ner_stanza(nlp, 'This poster of Madrid costs 3 euros during 3 hours with 5% of discount to first buyers'))

[('Madrid', 'GPE'), ('3 euros', 'MONEY'), ('3 hours', 'TIME'), ('5%', 'PERCENT'), ('first', 'ORDINAL')]


### Stanford CoreNLP NER

In [31]:
#from stanfordnlp.server import CoreNLPClient
from stanfordcorenlp import StanfordCoreNLP

In [32]:
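import re

def preprocess_text(text_str):
    # Replace line breaks, tabs and the characters ( ) [ ] : , ; " ? - % with spaces
    # (a raw string avoids invalid-escape warnings in recent Python versions)
    regular_expr = re.compile(r'[\n\r\t()\[\]:,;"?\-%]')
    text_str = regular_expr.sub(' ', text_str)
    # split on single spaces and drop the empty strings left by repeated spaces
    token_list = [element for element in text_str.split(' ') if element]
    return ' '.join(token_list)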
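A side effect of this cleaning is that commas and `%` disappear, so list items become adjacent tokens and percentages lose their sign. That is consistent with the CoreNLP results further down, where `60` is reported as a NUMBER and `Venezuela Ecuador Bolivia Guyana Suriname` comes back as a single COUNTRY entity. A minimal sketch of the effect, re-stating the same substitution so the snippet runs on its own:

```python
import re

def preprocess_text(text_str):
    # same characters as preprocess_text above: line breaks, tabs and ( ) [ ] : , ; " ? - %
    regular_expr = re.compile(r'[\n\r\t()\[\]:,;"?\-%]')
    tokens = [t for t in regular_expr.sub(' ', text_str).split(' ') if t]
    return ' '.join(tokens)

print(preprocess_text('Brazil, with 60% of the rainforest'))
# Brazil with 60 of the rainforest
print(preprocess_text('Venezuela, Ecuador, Bolivia, Guyana, Suriname'))
# Venezuela Ecuador Bolivia Guyana Suriname
```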

In [33]:
def filter_ner_relevant(tuple_list):
    # Group CoreNLP (token, label) pairs by label, skipping 'O' (non-entity) tokens.
    # Consecutive tokens with the same label are joined into one multi-word entity.
    ner_dictionary = {}
    previous_ner = 'O'
    for token, label in tuple_list:
        if label != 'O':
            if label == previous_ner:
                ner_dictionary[label][-1] += ' ' + token  # extend the current entity
            else:
                ner_dictionary.setdefault(label, []).append(token)  # start a new entity
        previous_ner = label
    return ner_dictionary
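The merging rule ("consecutive tokens with the same label form one entity") can be written more compactly with `itertools.groupby`. A minimal sketch of an equivalent function; `group_ner_runs` is a hypothetical name and the token list is hypothetical CoreNLP-style input. It also shows the limitation of the rule: once the commas have been stripped, separate entities of the same type are fused together.

```python
from itertools import groupby

def group_ner_runs(tuple_list):
    """Same grouping as filter_ner_relevant: each run of consecutive tokens
    sharing a label becomes one entity; runs labelled 'O' are skipped."""
    entities = {}
    for label, run in groupby(tuple_list, key=lambda pair: pair[1]):
        if label != 'O':
            entities.setdefault(label, []).append(' '.join(token for token, _ in run))
    return entities

# Hypothetical CoreNLP-style tokens after the commas have been removed
tokens = [('Venezuela', 'COUNTRY'), ('Ecuador', 'COUNTRY'), ('Bolivia', 'COUNTRY'),
          ('and', 'O'), ('French', 'NATIONALITY')]
print(group_ner_runs(tokens))
# {'COUNTRY': ['Venezuela Ecuador Bolivia'], 'NATIONALITY': ['French']}
```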

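Start the CoreNLP server from the directory containing the CoreNLP jars (`-cp "*"` adds every jar in the current directory to the classpath) with the command:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,ner,sentiment" -port 9000 -timeout 30000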

In [34]:
# https://www.khalidalnajjar.com/setup-use-stanford-corenlp-server-python/
# https://stanfordnlp.github.io/CoreNLP/index.html#download
# https://stanfordnlp.github.io/stanfordnlp/corenlp_client.html
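# connect to the running server; note that this rebinds `nlp`, which previously held the Stanza pipeline
nlp = StanfordCoreNLP('http://localhost', port=9000, timeout=30000)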

In [35]:
#filter_ner_relevant(nlp.ner(preprocess_text(question_example)))
nlp.ner(preprocess_text(question_example))

[('When', 'O'),
 ('was', 'O'),
 ('the', 'O'),
 ('Tower', 'O'),
 ('Theatre', 'O'),
 ('built', 'O')]

In [36]:
filter_ner_relevant(nlp.ner(preprocess_text(response_example)))

{'DATE': ['1939']}

In [39]:
#filter_ner_relevant(nlp.ner(preprocess_text(context_example)))

In [40]:
print(filter_ner_relevant(nlp.ner(preprocess_text('Which name is also used to describe the Amazon rainforest in English?'))))
print(filter_ner_relevant(nlp.ner(preprocess_text('also known in English as Amazonia or the Amazon Jungle,'))))

{'LOCATION': ['Amazon'], 'NATIONALITY': ['English']}
{'NATIONALITY': ['English'], 'LOCATION': ['Amazonia', 'Amazon Jungle']}


In [41]:
filter_ner_relevant(nlp.ner(preprocess_text(amazon_context_example)))

{'ORGANIZATION': ['Amazon', 'Amazon Jungle', 'Amazon'],
 'NATIONALITY': ['Portuguese',
  'Spanish',
  'French',
  'Dutch',
  'English',
  'French'],
 'PERSON': ['Selva Amazónica Amazonía', 'Amazoneregenwoud'],
 'LOCATION': ['Amazonia', 'Amazonia', 'Amazon', 'South'],
 'COUNTRY': ['America',
  'Brazil',
  'Peru',
  'Colombia',
  'Venezuela Ecuador Bolivia Guyana Suriname'],
 'NUMBER': ['7 000 000',
  '2 700 000',
  '5 500 000',
  '2 100 000',
  'nine',
  '60',
  '13',
  '10',
  'four',
  '390 billion',
  '16 000']}

In [42]:
print(filter_ner_relevant(nlp.ner(preprocess_text('Jay Z and Beyonce attended which event together in August of 2011?'))))
print(filter_ner_relevant(nlp.ner(preprocess_text('MTV Video Music Awards'))))

{'PERSON': ['Jay Z', 'Beyonce'], 'DATE': ['August of 2011']}
{'ORGANIZATION': ['MTV']}


In [43]:
filter_ner_relevant(nlp.ner(preprocess_text(beyonce_context)))

{'DATE': ['August', '2011', 'Tonight', 'the week of August 29 2011'],
 'PERSON': ['Beyoncé', 'Beyonce'],
 'TIME': ['evening'],
 'DURATION': ['year'],
 'ORGANIZATION': ['MTV', 'Twitter'],
 'NUMBER': ['12.4 million', '8 868'],
 'MISC': ['Guinness World Records', 'Googled'],
 'ORDINAL': ['second', 'second']}
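CoreNLP uses its own label set (LOCATION, NATIONALITY, COUNTRY, NUMBER, ...) while Stanza and spaCy use OntoNotes labels (LOC, NORP, GPE, CARDINAL, ...), so comparing them requires a label mapping. A minimal sketch, assuming a hypothetical and partial mapping (several CoreNLP classes such as MISC or DURATION have no exact OntoNotes counterpart and are left unmapped); the inputs are the CoreNLP and Stanza results for 'Which name is also used to describe the Amazon rainforest in English?'. Texts are compared as exact strings, so 'Amazon Jungle' and 'the Amazon Jungle' would count as different.

```python
# Hypothetical, partial mapping from CoreNLP NER labels to OntoNotes labels
CORENLP_TO_ONTONOTES = {
    'PERSON': 'PERSON', 'LOCATION': 'LOC', 'ORGANIZATION': 'ORG',
    'NATIONALITY': 'NORP', 'COUNTRY': 'GPE', 'CITY': 'GPE', 'STATE_OR_PROVINCE': 'GPE',
    'DATE': 'DATE', 'TIME': 'TIME', 'NUMBER': 'CARDINAL', 'ORDINAL': 'ORDINAL',
    'PERCENT': 'PERCENT', 'MONEY': 'MONEY',
}

def compare_entities(corenlp_dict, ontonotes_pairs):
    """Return (matches, only_corenlp, only_other) as sets of (text, label)."""
    mapped = {(text, CORENLP_TO_ONTONOTES.get(label, label))
              for label, texts in corenlp_dict.items() for text in texts}
    other = set(ontonotes_pairs)
    return mapped & other, mapped - other, other - mapped

corenlp = {'LOCATION': ['Amazon'], 'NATIONALITY': ['English']}
stanza_pairs = [('Amazon', 'LOC'), ('English', 'LANGUAGE')]
matches, only_corenlp, only_stanza = compare_entities(corenlp, stanza_pairs)
print(matches, only_corenlp, only_stanza)
```

Both tools find `Amazon` as a location; they disagree on `English`, which CoreNLP treats as a nationality and Stanza as a language.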

### spaCy NER

In [44]:
import spacy
nlp_spacy = spacy.load("en_core_web_sm")

In [45]:
spacy.explain('FAC')

'Buildings, airports, highways, bridges, etc.'

In [46]:
spacy.displacy.render(nlp_spacy(context_example), style='ent', jupyter=True)

In [47]:
spacy.displacy.render(nlp_spacy(amazon_context_example), style='ent', jupyter=True)

In [48]:
spacy.displacy.render(nlp_spacy(beyonce_context), style='ent', jupyter=True)

In [49]:
# https://spacy.io/api/annotation#named-entities
# https://spacy.io/usage/linguistic-features#named-entities
# 'PERSON', 'NORP', 'FAC', 'ORG', 'GPE', 'LOC', 'PRODUCT', 'EVENT', 'WORK_OF_ART', 
# 'LAW', 'LANGUAGE', 'DATE', 'TIME', 'PERCENT', 'MONEY', 'QUANTITY', 'ORDINAL', 'CARDINAL'
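def detect_entities(nlp, text, ner_tag):
    # ner_tag may be a single label ('ORG') or a list of labels (['ORG', 'GPE']);
    # a single string is wrapped so that `in` tests list membership, not substrings
    if isinstance(ner_tag, str):
        ner_tag = [ner_tag]
    entities = []
    doc = nlp(text)
    for ent in doc.ents:
        if ent.label_ in ner_tag:
            entities.append(ent.text)
    return entities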

In [52]:
result = detect_entities(nlp_spacy, context_example, ['PERSON', 'NORP', 'FAC', 'ORG', 'GPE', 'LOC', 'PRODUCT', 'EVENT', 'WORK_OF_ART', 'LAW', 'LANGUAGE', 'DATE', 'TIME', 'PERCENT', 'MONEY', 'QUANTITY', 'ORDINAL', 'CARDINAL'])
print(context_example)
print(result)

The popular neighborhood known as the Tower District is centered around the historic Tower Theatre, which is included on the National List of Historic Places. The theater was built in 1939 and is at Olive and Wishon Avenues in the heart of the Tower District. (The name of the theater refers to a well-known landmark water tower, which is actually in another nearby area). The Tower District neighborhood is just north of downtown Fresno proper, and one-half mile south of Fresno City College. Although the neighborhood was known as a residential area prior, the early commercial establishments of the Tower District began with small shops and services that flocked to the area shortly after World War II. The character of small local businesses largely remains today. To some extent, the businesses of the Tower District were developed due to the proximity of the original Fresno Normal School, (later renamed California State University at Fresno). In 1916 the college moved to what is now the site

In [51]:
# print the entities found in context_example for each spaCy label
for label in ['PERSON', 'NORP', 'FAC', 'ORG', 'GPE', 'LOC', 'PRODUCT', 'EVENT', 'WORK_OF_ART',
              'LANGUAGE', 'DATE', 'TIME', 'PERCENT', 'MONEY', 'QUANTITY', 'CARDINAL', 'ORDINAL']:
    print(label + ': ' + str(detect_entities(nlp_spacy, context_example, label)))

PERSON: []
NORP: []
FAC: ['Tower Theatre', 'the Tower District', 'The Tower District']
ORG: ['the National List of Historic Places', 'Fresno City College', 'Fresno Normal School', 'California State University at Fresno', 'Fresno City College']
GPE: []
LOC: ['the Tower District', 'the Tower District', 'the Tower District', 'the Tower District']
PRODUCT: []
EVENT: ['Fresno', 'World War II']
WORK_OF_ART: []
LANGUAGE: []
DATE: ['1939', 'today', '1916']
TIME: []
PERCENT: []
MONEY: []
QUANTITY: ['one-half mile', 'one-half mile']
CARDINAL: []
ORDINAL: []
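spaCy returns the same place several times (`the Tower District` appears under both FAC and LOC, and once as `The Tower District`). A minimal sketch that deduplicates entity strings while ignoring case and a leading "the"; `normalize_entities` is a hypothetical helper and the input is the FAC list printed above.

```python
def normalize_entities(entities):
    """Deduplicate entity strings, ignoring case and a leading 'the'.

    The first spelling seen for each entity is kept.
    """
    seen, unique = set(), []
    for text in entities:
        key = text.lower()
        if key.startswith('the '):
            key = key[4:]
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

fac = ['Tower Theatre', 'the Tower District', 'The Tower District']
print(normalize_entities(fac))  # ['Tower Theatre', 'the Tower District']
```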
