In [1]:
%load_ext lab_black

# spaCy example notebook

This notebook assumes that the container's base image is the one built from the Dockerfile.

If the image builds successfully and is used as the base image for the notebook service, you should be able to run the commands in this notebook.

In [2]:
import spacy
import numpy as np

In [3]:
nlp = spacy.load("en_core_web_md")

## Convert the sentence into word vectors

spaCy word vectors are similar in concept to those from FastText or Gensim. The vectors are stored in a static word vectors table, which makes it very fast to look up the vector for a token/word. However, each vector models only the lexical type of the word, not the context in which a specific token appears in a sentence.

Transformer models such as BERT are usually used to understand language in context, but these models are more computationally expensive.

Word vectors are not compatible with most transformer models, but they can improve the accuracy of other types of NLP networks.

Source: https://spacy.io/usage/embeddings-transformers
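
To make "static" concrete, here is a minimal sketch of a lookup table that stores one fixed vector per lexical type. It does not use spaCy: the table `TOY_VECTORS`, its three words, its 3-dimensional values, and the helper `lookup` are all made up for illustration and are not taken from `en_core_web_md`.

```python
# Toy static vector table: one fixed vector per lexical type.
# The words and values below are invented for illustration only.
TOY_VECTORS = {
    "bank": (0.2, -0.5, 0.1),
    "river": (0.7, 0.1, -0.3),
    "money": (-0.4, 0.6, 0.2),
}


def lookup(sentence):
    """Return (word, vector) pairs for the words found in the toy table."""
    words = sentence.lower().replace(".", "").split()
    # Words missing from the table are skipped, much like the
    # `token.has_vector` filter used later in this notebook.
    return [(word, TOY_VECTORS[word]) for word in words if word in TOY_VECTORS]


river_sense = dict(lookup("She sat on the river bank."))
money_sense = dict(lookup("He put money in the bank."))

# Same lexical type, same vector, regardless of the surrounding sentence.
same_vector = river_sense["bank"] == money_sense["bank"]
print(same_vector)
```

The lookup is a plain dictionary access, which is why it is fast, and it also shows the limitation: "bank" gets the same vector in both sentences even though its meaning differs.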

In [10]:
doc = nlp("The quick brown fox jumped over the lazy dog.")

In [11]:
word_vectors = np.array([token.vector for token in doc if token.has_vector])
word_vectors

array([[-0.65276 ,  0.23873 , -0.23325 , ..., -0.42636 ,  0.48578 ,
        -0.28969 ],
       [-0.60053 ,  0.18838 , -0.40993 , ..., -0.27799 ,  0.31229 ,
        -0.28331 ],
       [-0.66906 , -0.35133 ,  0.08064 , ..., -0.63783 ,  0.31403 ,
         0.20384 ],
       ...,
       [-0.99977 , -0.1947  , -0.41958 , ...,  0.44761 ,  0.029474,
         0.22546 ],
       [-0.72483 ,  0.42538 ,  0.025489, ...,  0.050317, -0.23159 ,
         0.28165 ],
       [-0.73351 ,  0.41392 , -0.4425  , ..., -0.18777 , -0.076822,
        -0.015507]], dtype=float32)

## Render a dependency graph for a sentence

Dependency parsing analyzes the grammatical structure of a sentence to find related words and the type of relationship between them.

Source: https://towardsdatascience.com/natural-language-processing-dependency-parsing-cf094bbbe3f7

In [6]:
doc2 = nlp("IBM Power10 servers are fast!")

In [7]:
spacy.displacy.render(doc2, style="dep")
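
The rendered graph can also be read as plain data. The sketch below lists each token with its dependency label and its head, using the `text`, `dep_` and `head` attributes of spaCy tokens. The helper name `dependency_triples` is hypothetical and not part of spaCy.

```python
def dependency_triples(doc):
    """List (token text, dependency label, head text) for each token.

    Expects a spaCy Doc, or any iterable of objects exposing the
    `text`, `dep_` and `head` attributes that spaCy tokens provide.
    """
    return [(token.text, token.dep_, token.head.text) for token in doc]
```

With the `doc2` defined above, `dependency_triples(doc2)` returns one triple per token. The root of the sentence is the token whose head is itself.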