# Using off-the-shelf NLP tools with spaCy

This notebook shows how to use off-the-shelf NLP models for tasks such as tokenization, part-of-speech tagging, named entity recognition, and key phrase extraction.


Based on: ["Applied Language Technology MOOC"](https://applied-language-technology.mooc.fi/html/notebooks/part_ii/03_basic_nlp.html)

Let us start by installing two useful text processing libraries: spaCy and textacy. Textacy is built on top of spaCy and adds a few more NLP functionalities.

In [None]:
!pip install spacy
!pip install textacy

# spaCy

I will start with spaCy. To do anything with spaCy, you first have to download a model for the language you are working with, which is already trained to perform several tasks. I am downloading an English model. The full list of models for all supported languages is at: https://spacy.io/models

In [None]:
#Download the required Spacy model. 
import spacy.cli

#spacy.cli.download("en_core_web_trf")
#I won't run this as I already downloaded this model. 
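
Instead of commenting the download in and out by hand, you can check whether the model is already installed. Below is a minimal sketch, assuming the model is installed as a regular Python package under its own name (this is how `spacy.cli.download` installs models, as far as I know). The helper names `is_installed` and `ensure_model` are my own and are not part of spaCy.

```python
import importlib.util

MODEL_NAME = "en_core_web_trf"  # the model used in this notebook


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


def ensure_model(model_name: str = MODEL_NAME) -> bool:
    """Download the model only if it is missing (hypothetical helper).

    Returns True if a download was triggered, False if the model was already there.
    """
    if is_installed(model_name):
        return False
    # Imported lazily, so the check itself needs only the standard library.
    import spacy.cli
    spacy.cli.download(model_name)
    return True


# ensure_model()  # uncomment to download the model only when it is missing
```

After a fresh download you may need to restart the notebook kernel before `spacy.load` can find the model.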

In [None]:
#Now, load the spacy model. 
import spacy
nlp = spacy.load('en_core_web_trf')

We have to convert any given string into spaCy's `Doc` format to use its functions. The `nlp` object that we loaded in the cell above does that for us.

In [None]:
text = "Ludwig Maximilian University of Munich (also referred to as LMU or simply as the University of Munich; German: Ludwig-Maximilians-Universität München) is a public research university located in Munich, Germany, and is the country's sixth-oldest university in continuous operation."
doc = nlp(text)

In [None]:
#Now, let us look at Spacy's tokenization for this text:
for token in doc:
    print(token)

In [None]:
#Getting the part of speech tags for individual tokens
for token in doc:
    # Print the token and the POS tags
    print(token, token.pos_, token.tag_)

In [None]:
# Print the token and the results of morphological analysis
for token in doc:
    print(token, token.morph)

In [None]:
#Get the per token morphological information
#doc[7] is the word "referred" in our text
print(doc[7])
print(doc[7].morph.to_dict())
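
When printed, `token.morph` shows the features as a `Key=Value|Key=Value` string, which is the Universal Dependencies feature format. The sketch below shows how such a string maps to a dictionary, roughly what `to_dict()` gives you. The function `parse_feats` is my own illustration and the example string is hypothetical; the model's actual output for "referred" may differ.

```python
def parse_feats(feats: str) -> dict:
    """Parse a 'Key=Value|Key=Value' feature string into a dict."""
    if not feats or feats == "_":
        # An empty string or "_" means no morphological features.
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hypothetical feature string for a past participle (an assumption, not model output).
example = "Aspect=Perf|Tense=Past|VerbForm=Part"
print(parse_feats(example))
```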

In [None]:
#View the syntactic parse tree of the sentence to see relations between words
from spacy import displacy
displacy.render(doc, style='dep', options={'compact': True})


In [None]:
# Loop over sentences in the Doc object and count them using enumerate()
# We have only one sentence in our doc, though. 
for number, sent in enumerate(doc.sents):    
    print(number, sent)

In [None]:
# Print the token and its lemma
for token in doc:
    print(token, token.lemma_)

In [None]:
# Loop over the named entities in the Doc object 
for ent in doc.ents:
    # Print the named entity and its label
    print(ent.text, ent.label_)

In [None]:
displacy.render(doc, style='ent')


In [None]:
# Get the noun chunks in the doc.
for item in doc.noun_chunks:
    print(item)

# Textacy

Reference: https://textacy.readthedocs.io/en/latest/quickstart.html


In [None]:
#import it!
import textacy

In [None]:
text2 = """
Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.[2]

Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks, convolutional neural networks and Transformers have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, climate science, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance
"""

In [None]:
text2

In [None]:
# We still have to convert the text to a spaCy Doc, even when using textacy,
# so we load a spaCy model first (without the parser) and use it for the conversion.
en = textacy.load_spacy_lang("en_core_web_trf", disable=("parser",))
tdoc = textacy.make_spacy_doc(text2, en)

In [None]:
list(textacy.extract.ngrams(tdoc, 3, filter_stops=True, filter_punct=True, filter_nums=False))
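
To get a feeling for what the filters in the call above do, here is a rough pure-Python sketch. It assumes, based on my reading of the textacy documentation, that `filter_stops` drops n-grams that start or end with a stop word and `filter_punct` drops n-grams containing punctuation. The function `simple_ngrams`, the regex tokenizer, and the tiny stop-word list are all my own simplifications, not textacy's implementation.

```python
import re

# A tiny hypothetical stop-word list; spaCy ships a much larger one.
STOP_WORDS = {"a", "an", "the", "of", "is", "and", "to", "in", "on", "as"}


def simple_ngrams(text, n, filter_stops=True, filter_punct=True):
    """Return word n-grams from text, with simplified stop-word and punctuation filters."""
    # Crude tokenization: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    result = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Skip n-grams that contain a punctuation token.
        if filter_punct and any(not tok[0].isalnum() for tok in window):
            continue
        # Skip n-grams that start or end with a stop word.
        if filter_stops and (window[0].lower() in STOP_WORDS
                             or window[-1].lower() in STOP_WORDS):
            continue
        result.append(" ".join(window))
    return result


sample = "Deep learning is part of a broader family of machine learning methods."
print(simple_ngrams(sample, 3))
```

Note that a stop word in the middle of an n-gram is kept in this sketch; only the first and last positions are checked.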

## Key phrase extraction

Textacy supports several key phrase extraction algorithms, such as TextRank (used below), YAKE, sCAKE, and SGRank.

In [None]:
from textacy.extract import keyterms as kt
kt.textrank(tdoc, normalize="lemma", topn=10)
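
If you are curious about the idea behind TextRank, here is a heavily simplified sketch: build a graph where words that occur close to each other are connected, then run a PageRank-style iteration so that well-connected words get higher scores. This is not textacy's implementation. The function `textrank_words`, the stop-word list, the window size, and the plain regex tokenization are all my own assumptions, and it ranks single words rather than phrases.

```python
import re
from collections import defaultdict

# A tiny hypothetical stop-word list, only for illustration.
STOP_WORDS = {"a", "an", "the", "of", "is", "and", "to", "in", "on", "as",
              "with", "also", "be", "can", "or", "known", "based"}


def textrank_words(text, window=2, damping=0.85, iterations=50, topn=5):
    """Rank words with a simplified, unweighted TextRank."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

    # Undirected co-occurrence graph: connect each word to the next `window` words.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # PageRank-style updates: each word shares its score equally among its neighbours.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbours[v])
                                             for v in neighbours[w])
            for w in neighbours
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:topn]


sample = ("Deep learning is part of a broader family of machine learning methods "
          "based on artificial neural networks with representation learning.")
for word, score in textrank_words(sample):
    print(f"{word}: {score:.3f}")
```

The `kt.textrank` call above is more careful than this, since it works on spaCy tokens and, with `normalize="lemma"`, on lemmas rather than raw lowercase strings.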

Check textacy's documentation to learn more. Textacy also supports a few other languages, and key phrase extraction can be a very useful function to know about.

You can explore a few other NLP models in the demos at: https://huggingface.co/