Created Date: 8 March 2019
spaCy is an open-source software library for advanced Natural Language Processing (NLP), written in Python and Cython. The library is published under the MIT license.
Today we’ll be talking about how to get started with NLP using spaCy. Before starting, make sure that you have Python and spaCy installed on your system.
To install spaCy and the English model:
sudo pip install spacy
python -m spacy download en
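In spaCy, the “nlp” object is used to create documents and to access their linguistic annotations and other NLP properties.
The default English model is English core web (en_core_web_sm), which we load here through the “en” shortcut. Note that this shortcut was removed in spaCy 3.0; on newer versions, download and load the model by its full name, en_core_web_sm.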
import spacy
nlp = spacy.load("en")
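As a minimal sketch of what the “nlp” object does, the snippet below passes a string to it and reads back the tokens of the resulting Doc. It assumes a recent spaCy release, where the model is loaded by its full name en_core_web_sm. The sample sentence and the fallback to a blank English pipeline are illustrative choices, not part of the original tutorial.

```python
import spacy

try:
    # Assumption: the small English model has already been downloaded.
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Illustrative fallback: a blank English pipeline that only tokenizes.
    nlp = spacy.blank("en")

# Calling nlp on a string returns a Doc, which is a sequence of Token objects.
doc = nlp("spaCy makes NLP easy.")
tokens = [token.text for token in doc]
print(tokens)
```

Every later step (sentences, lemmas, POS tags, entities) is read from this same Doc object, so the text only needs to be processed once.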
- WORD TOKENIZE
Tokenize the text into words, i.e. break each sentence into its individual tokens.
- SENTENCE TOKENIZE
If the text contains more than one sentence, tokenize it into sentences, i.e. split the text into a list of sentences.
- STOP WORDS REMOVAL
Remove common words such as “is”, “the” and “a” using the NLTK stop-word list, as they carry little information.
- Lemma
Lemmatize the text to reduce each word to its root form, e.g. “functions” and “functioning” both become “function”.
- Get word frequency
Count word occurrences using NLTK’s FreqDist class. Word frequency helps us judge how important a word is in the document by showing how many times it is used.
- POS tags
A POS (part-of-speech) tag tells us the grammatical category of each word, such as noun or adjective.
- NER
NER (Named Entity Recognition) is the process of identifying named entities in the text, such as people, organizations and locations.
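The steps listed above can be sketched in a single pass over one Doc. This is a rough illustration rather than the original post's code: it assumes spaCy 3 or later with the en_core_web_sm model, uses spaCy's own is_stop flag in place of the NLTK stop-word list, and uses collections.Counter in place of NLTK's FreqDist. The sample text and variable names are made up for the example.

```python
from collections import Counter

import spacy

try:
    # Assumption: spaCy 3+ with the en_core_web_sm model downloaded.
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Illustrative fallback: a blank pipeline with rule-based sentence
    # splitting. It still tokenizes and flags stop words, but lemmas,
    # POS tags and entities are not predicted without a trained model.
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")

text = (
    "Apple is opening a new office in London. "
    "The office will hire engineers, and the engineers will build apps."
)
doc = nlp(text)

# Word tokenize: every Token in the Doc, punctuation included.
words = [token.text for token in doc]

# Sentence tokenize: doc.sents yields one Span per sentence.
sentences = [sent.text for sent in doc.sents]

# Stop words removal: keep tokens that are neither stop words nor punctuation.
content_tokens = [t for t in doc if not t.is_stop and not t.is_punct]
content_words = [t.text.lower() for t in content_tokens]

# Lemma: pair each remaining word with its root form.
lemmas = [(t.text, t.lemma_) for t in content_tokens]

# Word frequency: count how often each remaining word occurs.
freq = Counter(content_words)

# POS tags: coarse-grained part-of-speech label for each token.
pos_tags = [(t.text, t.pos_) for t in doc]

# NER: named entity spans with their labels.
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(sentences)
print(freq.most_common(3))
print(entities)
```

Counting lowercased words after stop-word removal keeps frequent function words like “the” from dominating the frequency table. Counting lemmas instead of surface forms would be a reasonable variation if inflected forms should be merged.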
BLOG: https://medium.com/@pemagrg/nlp-for-beninners-using-spacy-6161cf48a229