
# Introduction to Named Entity Recognition with spaCy


Named entity recognition (NER) is often the first step in information extraction. It seeks to locate named entities in text and classify them into predefined categories such as the names of persons, organizations and locations, expressions of time, quantities, monetary values and percentages. NER is a building block for many Natural Language Processing (NLP) applications.
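To make the task concrete, here is a toy version of it solved with a hand-made dictionary (a gazetteer) instead of a trained model. This is not how spaCy works, since spaCy predicts entities with a statistical model; `GAZETTEER` and `toy_ner` are hypothetical names, and the labels simply borrow spaCy's English label names. Each result records the category, the matched string and its character offsets, the same kind of information spaCy reports in the cells below.

```python
import re

# Hypothetical gazetteer: a hand-made list of known names and their categories.
# Real NER systems such as spaCy use statistical models instead of a fixed list.
GAZETTEER = {
    "Theresa May": "PERSON",
    "European Union": "ORG",
    "Japan": "GPE",
    "Wall Street": "LOC",
}

def toy_ner(text):
    """Locate every gazetteer entry in `text` and classify it with its category.

    Returns (label, value, start, end) tuples sorted by start offset,
    where text[start:end] == value.
    """
    found = []
    for name, label in GAZETTEER.items():
        # \b stops "Japan" from matching inside a longer word such as "Japanese"
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((label, name, match.start(), match.end()))
    return sorted(found, key=lambda entity: entity[2])

sentence = "Theresa May said Japan and the European Union disagree."
for label, value, start, end in toy_ner(sentence):
    print(label, value, start, end)
```

A fixed list like this cannot recognize names it has never seen and ignores context, which is why trained models are used in practice.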

In this notebook we will see how spaCy handles this task.

First, we must install spaCy and download its small English model, `en_core_web_sm`.


```
!pip install spacy
!python -m spacy download en_core_web_sm
```

```
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/lib/python3.6/dist-packages/en_core_web_sm -->
/usr/local/lib/python3.6/dist-packages/spacy/data/en
You can now load the model via spacy.load('en')
```


Now we can process a text and list the named entities spaCy finds in it, with their type and their start and end character offsets:

```python
article = '''
Asian shares skidded on Tuesday after a rout in tech stocks put Wall Street to the sword, while a 
sharp drop in oil prices and political risks in Europe pushed the dollar to 16-month highs as investors dumped 
riskier assets. MSCI’s broadest index of Asia-Pacific shares outside Japan dropped 1.7 percent to a 1-1/2 
week trough, with Australian shares sinking 1.6 percent. Japan’s Nikkei dived 3.1 percent led by losses in 
electric machinery makers and suppliers of Apple’s iphone parts. Sterling fell to $1.286 after three straight 
sessions of losses took it to the lowest since Nov.1 as there were still considerable unresolved issues with the
European Union over Brexit, British Prime Minister Theresa May said on Monday.'''

import spacy

# Load the small English model by its full package name
# (shortcut links such as 'en' were removed in spaCy v3)
nlp = spacy.load('en_core_web_sm')
document = nlp(article)

print('Original text: {}'.format(article))
print()

# Each entity is a Span with a label and character offsets into `article`
for entity in document.ents:
    print('Type: {}, Value: {}, start: {}, end: {}'.format(
        entity.label_, entity.text, entity.start_char, entity.end_char))
```

```
Original text: 
Asian shares skidded on Tuesday after a rout in tech stocks put Wall Street to the sword, while a 
sharp drop in oil prices and political risks in Europe pushed the dollar to 16-month highs as investors dumped 
riskier assets. MSCI’s broadest index of Asia-Pacific shares outside Japan dropped 1.7 percent to a 1-1/2 
week trough, with Australian shares sinking 1.6 percent. Japan’s Nikkei dived 3.1 percent led by losses in 
electric machinery makers and suppliers of Apple’s iphone parts. Sterling fell to $1.286 after three straight 
sessions of losses took it to the lowest since Nov.1 as there were still considerable unresolved issues with the
European Union over Brexit, British Prime Minister Theresa May said on Monday.

Type: NORP, Value: Asian, start: 1, end: 6
Type: DATE, Value: Tuesday, start: 25, end: 32
Type: LOC, Value: Europe, start: 148, end: 154
Type: ORG, Value: MSCI, start: 228, end: 232
Type: LOC, Value: Asia-Pacific, start: 253, end: 265
Type: GPE, Value: Ja
```
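The entities are easier to work with once grouped by type. The helper below is a minimal sketch that assumes only that each item has `.label_` and `.text` attributes, as spaCy's entity spans do. `group_entities` and `FakeSpan` are hypothetical names; `FakeSpan` merely stands in for spaCy spans so the cell runs without a model, and the sample values are the complete entities from the output above.

```python
from collections import defaultdict, namedtuple

def group_entities(ents):
    """Group entity strings by label, keeping their order of appearance.

    Works on any iterable whose items expose `.label_` and `.text`,
    such as spaCy's `document.ents`.
    """
    groups = defaultdict(list)
    for ent in ents:
        groups[ent.label_].append(ent.text)
    return dict(groups)

# Hypothetical stand-in for spaCy spans, so this cell runs without a model
FakeSpan = namedtuple("FakeSpan", ["label_", "text", "start_char", "end_char"])

# Values copied from the output printed above
ents = [
    FakeSpan("NORP", "Asian", 1, 6),
    FakeSpan("DATE", "Tuesday", 25, 32),
    FakeSpan("LOC", "Europe", 148, 154),
    FakeSpan("ORG", "MSCI", 228, 232),
    FakeSpan("LOC", "Asia-Pacific", 253, 265),
]

print(group_entities(ents))
```

With the model loaded, `group_entities(document.ents)` groups every entity in the article. Because `start_char` and `end_char` are character offsets into the processed text, `article[entity.start_char:entity.end_char] == entity.text` holds for each entity.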

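spaCy's built-in visualizer, displaCy, can also highlight the entities directly in the text:

```python
from spacy import displacy

# Reuse the processed document instead of running the pipeline a second time
displacy.render(document, style='ent', jupyter=True)
```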
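To see what such a visualization needs, here is a minimal standard-library sketch that wraps each character span in a `<mark>` element whose `title` is the entity label. It only illustrates the idea and is not displaCy's actual implementation or markup; `highlight` is a hypothetical helper, and it assumes the spans are sorted and non-overlapping, which holds for spaCy's `document.ents`.

```python
import html

def highlight(text, entities):
    """Wrap each (start, end, label) span of `text` in a <mark> element.

    `entities` must be sorted by start offset and must not overlap.
    All text is HTML-escaped so the result is safe to embed in a page.
    """
    parts = []
    cursor = 0
    for start, end, label in entities:
        parts.append(html.escape(text[cursor:start]))  # plain text before the entity
        parts.append('<mark title="{}">{}</mark>'.format(
            html.escape(label), html.escape(text[start:end])))
        cursor = end
    parts.append(html.escape(text[cursor:]))  # plain text after the last entity
    return "".join(parts)

print(highlight("Apple fell in Japan.", [(0, 5, "ORG"), (14, 19, "GPE")]))
```

In the notebook, `highlight(article, [(e.start_char, e.end_char, e.label_) for e in document.ents])` returns an HTML string that could be shown with `IPython.display.HTML`.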


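spaCy can also recognize named entities in Spanish. For this we need its small Spanish model, `es_core_news_sm`; the example text is an excerpt from a Spanish news article.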

```
!python -m spacy download es_core_news_sm
```

```
✔ Download and installation successful
You can now load the model via spacy.load('es_core_news_sm')
✔ Linking successful
/usr/local/lib/python3.6/dist-packages/es_core_news_sm -->
/usr/local/lib/python3.6/dist-packages/spacy/data/es
You can now load the model via spacy.load('es')
```


```python
article = '''
Junts per Catalunya opta ahora por no poner palos en las ruedas para que Esquerra facilite la investidura de Pedro Sánchez. 
La formación que lidera Carles Puigdemont —a la espera de que la justicia belga decida sobre su extradición— anunció este 
martes que retira una moción sobre la autodeterminación, que tenía que ser votada hoy miércoles en el Parlament y que ponía a ERC
 en una situación comprometida. La decisión, que generó mucho debate interno, se gestó en la reunión que tuvieron 
 el expresident y varios cargos electos de Junts, el pasado lunes en Bélgica..'''

import spacy

# Load the small Spanish model by its full package name
nlp = spacy.load('es_core_news_sm')
document = nlp(article)

print('Original text: {}'.format(article))

for entity in document.ents:
    print('Type: {}, Value: {}, start: {}, end: {}'.format(
        entity.label_, entity.text, entity.start_char, entity.end_char))
```

```
Original text: 
Junts per Catalunya opta ahora por no poner palos en las ruedas para que Esquerra facilite la investidura de Pedro Sánchez. 
La formación que lidera Carles Puigdemont —a la espera de que la justicia belga decida sobre su extradición— anunció este 
martes que retira una moción sobre la autodeterminación, que tenía que ser votada hoy miércoles en el Parlament y que ponía a ERC
 en una situación comprometida. La decisión, que generó mucho debate interno, se gestó en la reunión que tuvieron 
 el expresident y varios cargos electos de Junts, el pasado lunes en Bélgica..
Type: PER, Value: Junts per Catalunya, start: 1, end: 20
Type: ORG, Value: Esquerra, start: 74, end: 82
Type: PER, Value: Pedro Sánchez, start: 110, end: 123
Type: MISC, Value: La formación, start: 126, end: 138
Type: PER, Value: Carles Puigdemont, start: 150, end: 167
Type: MISC, Value: Parlament, start: 351, end: 360
Type: MISC, Value: ERC
 , start: 375, end: 380
Type: MISC, Value: La decisión, start: 411, end:
```
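The Spanish model uses a coarser label set than the English one: it predicts only `PER`, `LOC`, `ORG` and `MISC`, whereas the English model uses OntoNotes-style types such as `NORP`, `GPE` and `DATE`. The small model also makes mistakes here, for example tagging the party Junts per Catalunya as `PER` and the ordinary noun phrase "La formación" ("the party") as `MISC`. To compare results across the two languages, one option is to map the English labels onto the coarse scheme. The mapping below is a hypothetical choice made for illustration, not an official spaCy correspondence; `TO_COARSE` and `to_coarse` are invented names.

```python
# Hypothetical mapping from English OntoNotes-style labels to the coarser
# PER/LOC/ORG/MISC scheme of the Spanish model. Labels not listed here
# (e.g. DATE, PERCENT, MONEY) have no counterpart and are dropped.
TO_COARSE = {
    "PERSON": "PER",
    "ORG": "ORG",
    "GPE": "LOC",
    "LOC": "LOC",
    "FAC": "LOC",
    "NORP": "MISC",
    "EVENT": "MISC",
    "PRODUCT": "MISC",
    "WORK_OF_ART": "MISC",
    "LAW": "MISC",
    "LANGUAGE": "MISC",
}

def to_coarse(entities):
    """Map (label, text) pairs to the coarse scheme, skipping unmapped labels."""
    return [(TO_COARSE[label], text) for label, text in entities if label in TO_COARSE]

# A few (label, text) pairs from the English output above
english = [("NORP", "Asian"), ("DATE", "Tuesday"), ("LOC", "Europe"), ("ORG", "MSCI")]
print(to_coarse(english))
```

Dropping numeric and temporal types is a design choice; another option would be to keep them under `MISC`.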

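As before, displaCy highlights the entities in the text:

```python
from spacy import displacy

# Reuse the Spanish document processed above
displacy.render(document, style='ent', jupyter=True)
```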
