<a href="https://colab.research.google.com/github/krakowiakpawel9/ml_course/blob/master/x/04_spacy/04_ner.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### spaCy
Library website: [https://spacy.io/](https://spacy.io/)  

spaCy is one of the core Python libraries for natural language processing (NLP).

To install spaCy, run the command below:
```
!pip install spacy
```
To upgrade to the latest version, run the command below:
```
!pip install --upgrade spacy
```
This course was developed using spaCy version `2.1.9`.
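
Because the outputs in this notebook depend on the library version, it can help to check what is installed before running the cells. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper `installed_version` and the constant `COURSE_VERSION` are illustrative names, not part of spaCy.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


COURSE_VERSION = "2.1.9"  # version stated in this notebook

found = installed_version("spacy")
if found is None:
    print("spaCy is not installed")
elif found != COURSE_VERSION:
    print(f"spaCy {found} is installed; the course used {COURSE_VERSION}, so outputs may differ")
else:
    print(f"spaCy {found} matches the course version")
```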

### Table of contents:
1. [Importing libraries](#0)
2. [Generating data](#1)



### <a name='0'></a> Importing libraries

In [1]:
import spacy

nlp = spacy.load('en_core_web_sm')


doc = nlp('UiPath, a software maker valued last year at $7 billion, is getting closer to an initial public offering after helping some of the biggest companies in the U.S. automate routine processes.')
doc

UiPath, a software maker valued last year at $7 billion, is getting closer to an initial public offering after helping some of the biggest companies in the U.S. automate routine processes.

Reference for the named entity labels: [https://spacy.io/api/annotation#named-entities](https://spacy.io/api/annotation#named-entities)

In [2]:
for entity in doc.ents:
    print(entity)

UiPath
last year
$7 billion
U.S.


In [3]:
for entity in doc.ents:
    print(f'{entity.text.ljust(11)}: {entity.label_.ljust(5)}: {spacy.explain(entity.label_)}')

UiPath     : ORG  : Companies, agencies, institutions, etc.
last year  : DATE : Absolute or relative dates or periods
$7 billion : MONEY: Monetary values, including unit
U.S.       : GPE  : Countries, cities, states
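
The column widths in the cell above (11 and 5) are hard-coded for this particular sentence. As a sketch of one alternative, the widths can be derived from the data. The `rows` list below is copied from the printed output so that the example runs without a model; in the notebook it would presumably be built from `doc.ents`.

```python
# (text, label) pairs copied from the output above; in the notebook they
# would come from [(ent.text, ent.label_) for ent in doc.ents].
rows = [('UiPath', 'ORG'), ('last year', 'DATE'), ('$7 billion', 'MONEY'), ('U.S.', 'GPE')]

# Derive the column widths from the data instead of hard-coding them
text_width = max(len(text) for text, _ in rows)
label_width = max(len(label) for _, label in rows)

lines = [f'{text.ljust(text_width)} : {label.ljust(label_width)}' for text, label in rows]
print('\n'.join(lines))
```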


In [4]:
from spacy import displacy

displacy.render(doc, style='ent', jupyter=True)
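
Outside a notebook, the same visualization can be obtained as an HTML string. The sketch below assumes displaCy's `manual=True` mode, which renders from a plain dictionary rather than a `Doc`, so no model is loaded. The entity list is copied from the output printed earlier, and the offsets are located with `str.index`.

```python
from spacy import displacy

text = ('UiPath, a software maker valued last year at $7 billion, is getting closer '
        'to an initial public offering after helping some of the biggest companies '
        'in the U.S. automate routine processes.')

# Entities as printed by the notebook; they are already in document order
found = [('UiPath', 'ORG'), ('last year', 'DATE'), ('$7 billion', 'MONEY'), ('U.S.', 'GPE')]
ents = []
for ent_text, label in found:
    start = text.index(ent_text)
    ents.append({'start': start, 'end': start + len(ent_text), 'label': label})

# manual=True renders from a dict; jupyter=False returns the markup as a string;
# page=True wraps the result in a complete HTML page
html = displacy.render({'text': text, 'ents': ents, 'title': None},
                       style='ent', manual=True, jupyter=False, page=True)
print(len(html))
```

The returned string can then be written to an `.html` file and opened in a browser.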

In [5]:
# Columns: text, label, label ID (integer hash), start_char, end_char
for ent in doc.ents:
    print(f'{ent.text.ljust(11)}:{ent.label_.ljust(6)}:{str(ent.label).ljust(4)}:{str(ent.start_char).ljust(4)}:{ent.end_char}')

UiPath     :ORG   :383 :0   :6
last year  :DATE  :391 :32  :41
$7 billion :MONEY :394 :45  :55
U.S.       :GPE   :384 :156 :160
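
The last two columns are character offsets into the original string, with the end offset exclusive. A short check in plain Python, using the text and the offsets copied from the output above, shows that slicing recovers each entity:

```python
# Text and offsets copied from the notebook cells above
text = ('UiPath, a software maker valued last year at $7 billion, is getting closer '
        'to an initial public offering after helping some of the biggest companies '
        'in the U.S. automate routine processes.')

# (label, start_char, end_char) as printed by the notebook
offsets = [('ORG', 0, 6), ('DATE', 32, 41), ('MONEY', 45, 55), ('GPE', 156, 160)]

# end_char is exclusive, so a plain slice recovers each entity's text
recovered = [(label, text[start:end]) for label, start, end in offsets]
for label, chunk in recovered:
    print(f'{label.ljust(6)}: {chunk}')
```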


In [6]:
type(doc.ents[0])

spacy.tokens.span.Span

Adding new entities

In [7]:
doc = nlp('e_smartdata: online courses - learn on your schedule.')
doc.ents

()

In [8]:
from spacy.tokens import Span

# Token indices [0, 1) cover the first token, 'e_smartdata'; label it as an organization
e_smartdata_ent = Span(doc, 0, 1, label='ORG')
doc.ents = list(doc.ents) + [e_smartdata_ent]

displacy.render(doc, style='ent', jupyter=True)
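
Token indices can be awkward to work out by hand. As an alternative sketch, `Doc.char_span` builds the span from character offsets. The example uses `spacy.blank('en')` (tokenizer only, no NER component) so that it runs without a model download; this is an assumption made for illustration, not what the notebook itself does.

```python
import spacy

# A blank English pipeline has a tokenizer but no NER component,
# so doc.ents starts empty and no model download is needed.
nlp = spacy.blank('en')
text = 'e_smartdata: online courses - learn on your schedule.'
doc = nlp(text)

# Build the span from character offsets instead of token indices.
# char_span returns None when the offsets do not align with token boundaries.
start = text.index('e_smartdata')
span = doc.char_span(start, start + len('e_smartdata'), label='ORG')

if span is not None:
    doc.ents = list(doc.ents) + [span]

print([(ent.text, ent.label_) for ent in doc.ents])
```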

A longer document

In [9]:
nlp = spacy.load('en_core_web_sm')
doc = nlp("Apple's chief executive Tim Cook said the company would open its first physical stores in India in 2021 and an online outlet later this year. " +
    "Apple had to seek special approval from the Indian government to open a store without a local partner. The announcement was made at the company's annual shareholders' meeting. " +
    "Investors at the meeting also voted on a proposal that the firm should alter how it responds when governments ask it to remove apps from its marketplace. " +
    "Though the measure wasn't approved, it failed by a slimmer margin than similar proposals in the past. " +
    "Apple's move into India, the second largest smartphone market in the world, has been expected for some time, but the announcement of a date was new. " + 
    "In 2018 India changed the laws that prevented foreign brands from opening single-brand stores in the country. Nevertheless Mr Cook said India had wanted Apple to open its store with a local partner.")

In [10]:
displacy.render(doc, style='ent', jupyter=True)

In [11]:
doc.user_data["title"] = "Person Recognition"

displacy.render(doc, style='ent', jupyter=True, options={'ents': ['PERSON']})

In [12]:
displacy.render(doc, style='ent', jupyter=True, options={'ents': ['PERSON'], 'colors': {'PERSON': 'linear-gradient(90deg, #8e7cfc, #ffb0ee)'}})

In [13]:
doc.user_data["title"] = "Person and Organization Recognition"

options = {'ents': ['PERSON', 'ORG'],
           'colors': {'PERSON': 'linear-gradient(90deg, #8e7cfc, #ffb0ee)', 'ORG': 'linear-gradient(90deg, #68fcee, #99f788)'}}

displacy.render(doc, style='ent', jupyter=True, options=options)
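
In the `options` dictionary above, each label is written twice, once under `'ents'` and once under `'colors'`. A small helper can keep the two in sync. This is only a sketch: `make_ent_options` is a hypothetical name and not part of displaCy.

```python
def make_ent_options(colors):
    """Build a displaCy `options` dict from a {label: CSS colour} mapping."""
    # Show exactly the labels that have a colour assigned
    return {'ents': sorted(colors), 'colors': dict(colors)}


options = make_ent_options({
    'PERSON': 'linear-gradient(90deg, #8e7cfc, #ffb0ee)',
    'ORG': 'linear-gradient(90deg, #68fcee, #99f788)',
})
print(options['ents'])
```

The result can be passed to `displacy.render(..., options=options)` in the same way as the hand-written dictionary.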

In [14]:
ent_labels = {ent.label_ for ent in doc.ents}
ent_labels

{'DATE', 'GPE', 'NORP', 'ORDINAL', 'ORG', 'PERSON'}

In [15]:
for label in ent_labels:
    print(f'{label.ljust(8)}:{spacy.explain(label)}')

ORDINAL :"first", "second", etc.
GPE     :Countries, cities, states
PERSON  :People, including fictional
NORP    :Nationalities or religious or political groups
ORG     :Companies, agencies, institutions, etc.
DATE    :Absolute or relative dates or periods


German

In [16]:
!python -m spacy download de_core_news_md

✔ Download and installation successful
You can now load the model via spacy.load('de_core_news_md')
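
The download step can be skipped when a model is already installed. A minimal sketch, assuming `spacy.util.is_package` is available in the installed spaCy version; `model_installed` is an illustrative helper name.

```python
from spacy.util import is_package


def model_installed(name):
    """Return True if `name` is an installed spaCy model package."""
    return bool(is_package(name))


# Model names used in this notebook
for model in ('en_core_web_sm', 'de_core_news_md', 'es_core_news_md'):
    if model_installed(model):
        print(f'{model}: installed')
    else:
        print(f'{model}: missing, run: python -m spacy download {model}')
```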


In [17]:
nlp = spacy.load('de_core_news_md')
doc = nlp("Der Blick auf den Bildschirm des Handys lenkt Fußgänger im Straßenverkehr nicht nur ab, er beeinflusst auch deutlich das Sehvermögen! " + 
    "Das ergab eine Studie am Fraunhofer-Institut. Wer im Gehen aufs Handy schaut, hat demnach etwa 20 Prozent weniger Sehschärfe als sonst.")

In [18]:
displacy.render(doc, style='ent', jupyter=True)

In [19]:
for label in ['LOC', 'ORG']:
    print(f'{label.ljust(6)}:{spacy.explain(label)}')

LOC   :Non-GPE locations, mountain ranges, bodies of water
ORG   :Companies, agencies, institutions, etc.


Spanish

In [20]:
!python -m spacy download es_core_news_md

✔ Download and installation successful
You can now load the model via spacy.load('es_core_news_md')


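In [21]:
nlp = spacy.load('es_core_news_md')
# Note: this sample sentence is Portuguese, not Spanish, so the entities
# produced by the Spanish model should be read with caution.
doc = nlp("A Comissão Europeia decidiu apontar a app Signal como o serviço de comunicação recomendado para os seus membros trocarem mensagens, deixando de parte o WhatsApp.")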

In [22]:
displacy.render(doc, style='ent', jupyter=True)