## spaCy

+ A library for advanced NLP
+ Supports many languages, including English, Chinese, German, French, Italian, Polish, and Spanish; models for additional languages are under active development
+ About spaCy: https://spacy.io/usage

+ Installation for English:
```
pip install spacy
python -m spacy download en_core_web_sm
```
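Before loading the model, it can help to confirm that both packages are importable. The sketch below assumes the model was installed as a regular Python package named `en_core_web_sm`, as the download command above does; `is_installed` is a hypothetical helper introduced here, not part of spaCy.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


# Models fetched with `python -m spacy download` are installed as Python packages,
# so the same check works for the library and for the model.
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports `missing`, rerun the corresponding install command in the same Python environment that runs the notebook.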

In [1]:
import spacy

nlp = spacy.load("en_core_web_sm")

### Functionality:

1. Part-of-speech tagging:

https://spacy.io/api/annotation#pos-tagging

In [2]:
d = nlp("Colorless green ideas sleep furiously")
for token in d:
    print(token.text, token.pos_)

Colorless ADP
green ADJ
ideas NOUN
sleep VERB
furiously ADV
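
The `pos_` values are Universal POS tags. As a reading aid, the sketch below maps the five tags that appear in the output above to their names. `UPOS_NAMES` and `describe` are hypothetical names introduced here, and the table covers only these five tags, not the full tag set.

```python
# Universal POS tags that occur in the output above (not the full tag set).
UPOS_NAMES = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "NOUN": "noun",
    "VERB": "verb",
}

# (token, tag) pairs copied from the output above.
tagged = [
    ("Colorless", "ADP"),
    ("green", "ADJ"),
    ("ideas", "NOUN"),
    ("sleep", "VERB"),
    ("furiously", "ADV"),
]


def describe(tag: str) -> str:
    """Return a readable name for a tag, or the tag itself if it is not in the table."""
    return UPOS_NAMES.get(tag, tag)


for text, tag in tagged:
    print(f"{text:<10} {tag:<5} {describe(tag)}")
```

Read this way, "Colorless" was tagged as an adposition, which suggests the tagger got this token wrong. spaCy offers `spacy.explain(tag)` for the same kind of lookup.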


2. Lemmatization and tokenization:

In [3]:
d = nlp("Colorless green ideas sleep furiously")
for token in d:
    print(token.text, token.lemma_)

Colorless colorless
green green
ideas idea
sleep sleep
furiously furiously


In [5]:
d = nlp("To be, or not to be, that is the question: Whether 'tis nobler in the mind to suffer")
for token in d:
    print(token.text, token.lemma_)

To to
be be
, ,
or or
not not
to to
be be
, ,
that that
is be
the the
question question
: :
Whether whether
' '
tis tis
nobler nobler
in in
the the
mind mind
to to
suffer suffer
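
To see at a glance where lemmatization changed anything, one can compare each lemma with the lowercased token. The sketch below does this on a few (token, lemma) pairs copied from the two outputs above; `changed_by_lemmatizer` is a hypothetical helper name.

```python
# A selection of (token, lemma) pairs copied from the outputs above.
pairs = [
    ("Colorless", "colorless"),
    ("ideas", "idea"),
    ("To", "to"),
    ("is", "be"),
    ("tis", "tis"),
    ("nobler", "nobler"),
]


def changed_by_lemmatizer(pairs):
    """Keep the pairs whose lemma differs from the lowercased token text."""
    return [(text, lemma) for text, lemma in pairs if text.lower() != lemma]


print(changed_by_lemmatizer(pairs))  # [('ideas', 'idea'), ('is', 'be')]
```

On a real `Doc`, the same comparison would read `token.text.lower() != token.lemma_`. In the output above, "tis" and "nobler" come back unchanged.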


3. Syntax: the dependency tree

In [6]:
d = nlp("Colorless green ideas sleep furiously")
for token in d:
    print(token.text, token.lemma_, token.pos_, token.dep_)

Colorless colorless ADP amod
green green ADJ amod
ideas idea NOUN compound
sleep sleep VERB ROOT
furiously furiously ADV advmod


Navigating the tree: spaCy uses the terms "head" (the governing token) and "child" (a dependent).

In [7]:
for token in d:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
            [child for child in token.children])

Colorless amod sleep VERB []
green amod ideas NOUN []
ideas compound sleep VERB [green]
sleep ROOT sleep VERB [Colorless, ideas, furiously]
furiously advmod sleep VERB []
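
Each token stores only its head; the children lists are the inverse of that mapping. The sketch below rebuilds them in plain Python from the head column of the output above. It illustrates the relationship and is not how spaCy implements `token.children`; `children_from_heads` is a hypothetical name.

```python
words = ["Colorless", "green", "ideas", "sleep", "furiously"]
# Index of each token's head, read off the output above; the root points to itself.
heads = [3, 2, 3, 3, 3]


def children_from_heads(heads):
    """Invert head pointers: map each token index to the indices of its children."""
    children = {i: [] for i in range(len(heads))}
    for i, head in enumerate(heads):
        if head != i:  # the root is its own head and is nobody's child
            children[head].append(i)
    return children


children = children_from_heads(heads)
for i, word in enumerate(words):
    print(word, [words[c] for c in children[i]])
```

The printed lists match the last column of the spaCy output: "sleep" governs "Colorless", "ideas", and "furiously", and "ideas" governs "green".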


In [11]:
d = nlp("If a farmer owns a donkey, he beats it.")
for token in d:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
            [child for child in token.children])

If mark owns VERB []
a det farmer NOUN []
farmer nsubj owns VERB [a]
owns advcl beats VERB [If, farmer, donkey]
a det donkey NOUN []
donkey dobj owns VERB [a]
, punct beats VERB []
he nsubj beats VERB []
beats ROOT beats VERB [owns, ,, he, it, .]
it dobj beats VERB []
. punct beats VERB []


Let's find the left and right children of "beats":

In [18]:
d = nlp("If a farmer owns a donkey, he beats it.")
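# d[8] is the token "beats"
print([token.text for token in d[8].lefts])   # children to the left
print([token.text for token in d[8].rights])  # children to the right
print(d[8].n_lefts)   # number of left children
print(d[8].n_rights)  # number of right children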

['owns', ',', 'he']
['it', '.']
3
2
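
`lefts` and `rights` split a token's children by their position relative to the token. Below is a plain-Python sketch of that split, using head indices read off the parse of this sentence printed earlier; `lefts_and_rights` is a hypothetical helper, not spaCy code.

```python
words = ["If", "a", "farmer", "owns", "a", "donkey", ",", "he", "beats", "it", "."]
# Head index of each token, taken from the parse printed earlier; the root heads itself.
heads = [3, 2, 3, 8, 5, 3, 8, 8, 8, 8, 8]


def lefts_and_rights(i, heads):
    """Return the children of token i that precede it and those that follow it."""
    kids = [j for j, head in enumerate(heads) if head == i and j != i]
    lefts = [j for j in kids if j < i]
    rights = [j for j in kids if j > i]
    return lefts, rights


lefts, rights = lefts_and_rights(8, heads)  # index 8 is "beats"
print([words[j] for j in lefts])   # ['owns', ',', 'he']
print([words[j] for j in rights])  # ['it', '.']
```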


Now let's print the dependency type, the number of left and right children, and the ancestors of each token in a subtree:

In [19]:
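# The root is the only token that is its own head.
root = [token for token in d if token.head == token][0]
# First left child of the root; in this sentence that is "owns" (advcl),
# not a grammatical subject, despite the variable name.
subject = list(root.lefts)[0]
for descendant in subject.subtree:
    assert subject is descendant or subject.is_ancestor(descendant)
    print(descendant.text, descendant.dep_, descendant.n_lefts,
            descendant.n_rights,
            [ancestor.text for ancestor in descendant.ancestors])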

If mark 0 0 ['owns', 'beats']
a det 0 0 ['farmer', 'owns', 'beats']
farmer nsubj 1 0 ['owns', 'beats']
owns advcl 2 1 ['beats']
a det 0 0 ['donkey', 'owns', 'beats']
donkey dobj 1 0 ['owns', 'beats']
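
Both `ancestors` and `subtree` follow from head pointers alone: ancestors are found by walking upward to the root, and a subtree is every token that has the given token among its ancestors. The sketch below reproduces the last column of the output above in plain Python, under the assumption that the head indices are as printed in the earlier parse; the function names are hypothetical and this is not spaCy's implementation.

```python
words = ["If", "a", "farmer", "owns", "a", "donkey", ",", "he", "beats", "it", "."]
# Head index of each token, taken from the parse printed earlier; the root heads itself.
heads = [3, 2, 3, 8, 5, 3, 8, 8, 8, 8, 8]


def ancestors(i, heads):
    """Follow head pointers from token i up to the root (the root is its own head)."""
    chain = []
    while heads[i] != i:
        i = heads[i]
        chain.append(i)
    return chain


def subtree(i, heads):
    """Token i plus every token that has i among its ancestors, in sentence order."""
    return [j for j in range(len(heads)) if j == i or i in ancestors(j, heads)]


for j in subtree(3, heads):  # index 3 is "owns"
    print(words[j], [words[a] for a in ancestors(j, heads)])
```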


Let's print the part of speech, the dependency type, and the head:

In [20]:
d = nlp("If a farmer owns a donkey, he beats it.")
for token in d:
    print(token.text, token.pos_, token.dep_, token.head.text)

If SCONJ mark owns
a DET det farmer
farmer NOUN nsubj owns
owns VERB advcl beats
a DET det donkey
donkey NOUN dobj owns
, PUNCT punct beats
he PRON nsubj beats
beats VERB ROOT beats
it PRON dobj beats
. PUNCT punct beats


Let's visualize the tree:

In [22]:
from spacy import displacy

In [24]:
d = nlp("If a farmer owns a donkey, he beats it.")

displacy.render(d, style='dep')

#### Exercise:

0. Let's test spaCy and refresh our knowledge of syntax.
1. Parse a sentence with a syntactic ambiguity. Which analysis does spaCy propose?
    + John saw the man on the mountain with a telescope.
    + I'm glad I'm a man, and so is Lola.
    + more examples: https://en.wikipedia.org/wiki/Syntactic_ambiguity
2. Try examples of syntactic islands (listed below with their traditional acceptability judgments). Which analysis does spaCy propose?

    + What_i does Sarah believe that Susan thinks that John bought _i?
    + Complex NPs

        - \*Who_i did Mary see the report that was about \_i?
        (cf. Mary saw the report that was about the senator.)

        - ?\*Which sportscar_i did the color of _i delight the baseball player?
        (cf. The color of the sportscar delighted the baseball player.)

    + Complements of manner-of-speaking verbs
        - ??What_i did Mary whisper that Bill liked _i?
        (cf. Mary whispered that Bill liked wine.)

    + Complex NPs (both noun complements and relative clauses)
        - \*The senator who_i Mary saw the report that was about _i bothered many people on the committee.

    + Subject islands
        - ?\*The sportscar which_i the color of _i delighted the baseball player…

    + Complements of manner-of-speaking verbs
        - ?The wine which_i Mary whispered that Bill liked _i was a Cabernet.

3. Does spaCy handle ellipsis?
    + John can play the guitar; Mary can, too.

In [25]:
d = nlp("I'm glad I'm a man, and so is Lola.")

displacy.render(d, style='dep')

In [26]:
d = nlp("Who did Mary see the report that was about?")

displacy.render(d, style='dep')

In [27]:
d = nlp("John can play the guitar; Mary can, too.")

displacy.render(d, style='dep')