# spaCy WordNet

* [spacy-wordnet](https://spacy.io/universe/project/spacy-wordnet)

> spacy-wordnet creates annotations that easily allow the use of WordNet and [WordNet Domains](http://wndomains.fbk.eu/) by using the [NLTK WordNet interface](http://www.nltk.org/howto/wordnet.html).

* [PyPi spaCy WordNet](https://pypi.org/project/spacy-wordnet/)

> You also need to install the following NLTK wordnet data:
> ```
> python -m nltk.downloader wordnet
> python -m nltk.downloader omw
> ```

!python -m nltk.downloader wordnet
!python -m nltk.downloader omw

In [9]:
import pandas as pd
import spacy
from spacy.symbols import nsubj, dobj, iobj, VERB
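import spacy_wordnet
# Importing WordnetAnnotator makes the "spacy_wordnet" pipeline component available to nlp.add_pipe
from spacy_wordnet.wordnet_annotator import WordnetAnnotator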

In [2]:
def _to_text(tokens, sep=',') -> str:
    """Join the string form of each token with `sep`."""
    return sep.join(map(str, tokens))

# Model

In [12]:
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacy_wordnet", after='tagger')

<spacy_wordnet.wordnet_annotator.WordnetAnnotator at 0x1038c5d90>

# Document

In [None]:
doc = nlp("Autonomous cars shift insurance liability toward manufacturers.")

# Finding the Subject Noun Phrase

In [17]:
text = """Autonomous cars shifted insurance liability to manufacturers.
"""
doc = nlp(' '.join(text.split()))

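# Collect every nominal subject whose head is a verb
subjects = []
for candidate in doc:
    if candidate.dep == nsubj and candidate.head.pos == VERB:
        subjects.append(candidate)

# Merge each subject's full subtree (left edge to right edge) into a single token
with doc.retokenize() as retokenizer:
    for subject in subjects:
        span = doc[subject.left_edge.i : subject.right_edge.i + 1]
        retokenizer.merge(span)

# Print index, text, part of speech and dependency label, then the first five WordNet domains
for token in doc:
    print(f"{token.i:<4}{token.text:40}{token.pos_:7}{token.dep_}")
    print(token._.wordnet.wordnet_domains()[:5])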

0   Autonomous cars                         NOUN   nsubj
['politics']
1   shifted                                 VERB   ROOT
['basketball', 'earth', 'computer_science', 'earth', 'railway']
2   insurance                               NOUN   compound
['book_keeping', 'money', 'finance', 'economy', 'tax']
3   liability                               NOUN   dobj
['finance', 'tax', 'money', 'book_keeping', 'finance']
4   to                                      ADP    prep
[]
5   manufacturers                           NOUN   pobj
['industry', 'economy', 'exchange', 'commerce', 'enterprise']
6   .                                       PUNCT  punct
[]
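
The domain lists printed above contain repeats (for example, `'earth'` appears twice for `shifted` and `'finance'` twice for `liability`), so slicing with `[:5]` can return fewer than five distinct domains. A minimal sketch of one way to deduplicate while keeping the first-seen order is shown below. The helper name `unique_domains` and its `limit` parameter are hypothetical and not part of spacy-wordnet; the sample list is copied from the output above.

```python
def unique_domains(domains, limit=None):
    """Return `domains` without duplicates, keeping first-seen order.

    Hypothetical helper for illustration; not part of spacy-wordnet.
    """
    # dict preserves insertion order, so fromkeys drops later repeats only
    unique = list(dict.fromkeys(domains))
    return unique if limit is None else unique[:limit]


# Domains printed above for the token "shifted"
shifted = ['basketball', 'earth', 'computer_science', 'earth', 'railway']
print(unique_domains(shifted))
# ['basketball', 'earth', 'computer_science', 'railway']
```

In the loop above, this could presumably be applied as `unique_domains(token._.wordnet.wordnet_domains(), limit=5)` so that the slice is taken after deduplication.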


In [None]:
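# The `token._.wordnet` extension links a spaCy token to the NLTK WordNet interface, giving access to its synsets and lemmas.
# After the loop above, `token` is still the final "." token, so pick a content word explicitly.
token = doc[3]  # "liability"
print(token._.wordnet.synsets())
print(token._.wordnet.lemmas())

# It also exposes the WordNet domains associated with the token
token._.wordnet.wordnet_domains()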