In [None]:
import spacy
from spacy import displacy

NER = spacy.load("en_core_web_sm")

#### spaCy is a library commonly used for NLP tasks such as tokenization, part-of-speech (POS) tagging and named entity recognition (NER). `en_core_web_sm` is its small English pipeline, loaded here as `NER`.
#### We also import displaCy from spaCy. displaCy is a built-in visualizer for the dependency parse trees and named entities predicted by the pipeline's machine learning models. It does not tag or parse text itself. It only draws the POS tags, dependency arcs and entities that the loaded pipeline has already produced.
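Tokenization is the first step of the spaCy pipeline. spaCy splits text on whitespace, then separates punctuation using prefix, suffix and infix rules plus special-case exceptions (for example, "don't" becomes "do" and "n't"). The function below, `simple_tokenize`, is a hypothetical and deliberately simplified sketch of the prefix/suffix idea. It only detaches leading and trailing punctuation, skips infixes and exceptions, and is not spaCy's tokenizer.

```python
import string
from typing import List

# Characters peeled off the start/end of each whitespace-separated chunk
# (a simplified stand-in for spaCy's much larger prefix/suffix rule sets).
PUNCT = set(string.punctuation)

def simple_tokenize(text: str) -> List[str]:
    """Split on whitespace, then detach leading and trailing punctuation."""
    tokens = []
    for chunk in text.split():
        prefixes, suffixes = [], []
        # Peel punctuation off the front, one character at a time.
        while chunk and chunk[0] in PUNCT:
            prefixes.append(chunk[0])
            chunk = chunk[1:]
        # Peel punctuation off the back (collected in reverse order).
        while chunk and chunk[-1] in PUNCT:
            suffixes.append(chunk[-1])
            chunk = chunk[:-1]
        tokens.extend(prefixes)
        if chunk:
            tokens.append(chunk)
        tokens.extend(reversed(suffixes))
    return tokens

print(simple_tokenize('The writer said "hello," then left.'))
```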


In [None]:
##raw_text="The Indian Space research Organization or is the national space agency of India"
raw_text="""Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day.
         The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges.
         The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create.
          The third option is to have the random paragraph be the ending paragraph in a short story.
          No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.
          The anem of Person vedant waghale is Studying in BCA 2 nd year 2024 ."""

In [None]:
text1=NER(raw_text)

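#### Calling the loaded pipeline `NER` on `raw_text` runs the whole spaCy pipeline (tokenizer, tagger, parser, entity recognizer and so on). It returns a `Doc` object, which is stored in `text1`. The input can be any text, such as a sentence, a paragraph or a document. The named entities the model finds are available as `text1.ents`.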
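spaCy also stores the entity information on each token. `token.ent_iob_` holds "B" (begins an entity), "I" (inside one), "O" (outside) or "" (not set), and `token.ent_type_` holds the label. The hypothetical helper `iob_to_spans` below groups such per-token tags into the (text, label) spans that `text1.ents` returns. The tags are hand-written to mirror the last sentence of `raw_text`; they are not read from the model. The sketch joins tokens with single spaces, whereas spaCy keeps the original whitespace.

```python
def iob_to_spans(tokens, iob_tags, ent_types):
    """Group per-token IOB tags into (text, label) entity spans.

    Uses the same letters as spaCy's token.ent_iob_: "B" begins an entity,
    "I" continues it, and "O" (or "") means outside any entity.
    """
    spans = []
    current, label = [], None  # tokens and label of the entity being built
    for token, iob, ent_type in zip(tokens, iob_tags, ent_types):
        if iob == "B" or (iob == "I" and not current):
            # Close any open entity, then start a new one here.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], ent_type
        elif iob == "I":
            current.append(token)
        else:
            # Outside: close any open entity.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

# Hand-written tags for illustration only.
tokens = ["Studying", "in", "BCA", "2", "nd", "year", "2024", "."]
iob_tags = ["O", "O", "B", "B", "I", "I", "I", "O"]
ent_types = ["", "", "ORG", "DATE", "DATE", "DATE", "DATE", ""]
print(iob_to_spans(tokens, iob_tags, ent_types))
```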

In [None]:
for word in text1.ents:
  print(word.text,word.label_)

the beginning of the day DATE
one CARDINAL
three CARDINAL
first ORDINAL
second ORDINAL
third ORDINAL
BCA ORG
2 nd year 2024 DATE


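#### Here we iterate over the named entities extracted from `text1` (`text1.ents`) and print each entity's text (`word.text`) together with its label (`word.label_`). The small model's predictions are not perfect. "BCA", a degree, is labelled ORG, and the person's name in the last sentence is not detected at all.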
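One entity per line gets hard to read on longer documents, so a common next step is to collect entities by label. The sketch below does this with `collections.defaultdict`; `group_by_label` is a hypothetical helper name. It uses the (text, label) pairs printed above so that it runs without loading a model. With spaCy they would come from `[(ent.text, ent.label_) for ent in text1.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied from the printed output above.
entities = [
    ("the beginning of the day", "DATE"),
    ("one", "CARDINAL"),
    ("three", "CARDINAL"),
    ("first", "ORDINAL"),
    ("second", "ORDINAL"),
    ("third", "ORDINAL"),
    ("BCA", "ORG"),
    ("2 nd year 2024", "DATE"),
]

def group_by_label(pairs):
    """Return {label: [entity texts]}, keeping first-seen label order."""
    groups = defaultdict(list)
    for text, label in pairs:
        groups[label].append(text)
    return dict(groups)

for label, texts in group_by_label(entities).items():
    print(f"{label:10}{texts}")
```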

In [None]:
spacy.explain("CARDINAL")

'Numerals that do not fall under another type'

In [None]:
spacy.explain("ORDINAL")

'"first", "second", etc.'

In [None]:
spacy.explain("DATE")

'Absolute or relative dates or periods'

In [None]:
spacy.explain("GPE")

'Countries, cities, states'

In [None]:
spacy.explain("ORG")

'Companies, agencies, institutions, etc.'

#### `spacy.explain` is a utility function provided by spaCy. It returns a short, human-readable description of a label used in spaCy's annotations, such as part-of-speech tags, dependency labels and NER labels.
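Conceptually, `spacy.explain` is a lookup in a built-in glossary, and it returns `None` for a label it does not know. The stand-in below mimics that with a plain dict that holds only the five descriptions printed above. `explain_label` and this `GLOSSARY` dict are hypothetical and are not spaCy's own code; spaCy's glossary covers many more labels.

```python
from typing import Optional

# Descriptions copied from the spacy.explain outputs above.
GLOSSARY = {
    "CARDINAL": "Numerals that do not fall under another type",
    "ORDINAL": '"first", "second", etc.',
    "DATE": "Absolute or relative dates or periods",
    "GPE": "Countries, cities, states",
    "ORG": "Companies, agencies, institutions, etc.",
}

def explain_label(label: str) -> Optional[str]:
    """Return the description for a label, or None if it is unknown."""
    return GLOSSARY.get(label)

for label in ["ORG", "GPE", "XYZ"]:
    print(label, "->", explain_label(label))
```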

In [None]:
displacy.render(text1,style="ent",jupyter=True)

#### `displacy.render` draws the annotations produced by spaCy. With `style="ent"`, it displays the text with each named entity highlighted and color-coded by entity type. `jupyter=True` makes it show the result inline in the notebook.
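displaCy can also draw entities that did not come from a spaCy model. With `manual=True`, `displacy.render` accepts a dict holding the text plus the character offsets and label of each entity. The hypothetical helper `to_displacy_ents` below builds that dict from (phrase, label) pairs, which is handy for showing hand-annotated or corrected examples. The sentence and labels are made up for illustration.

```python
def to_displacy_ents(text, entities, title=None):
    """Build the dict displacy.render(..., style="ent", manual=True) accepts.

    `entities` is a list of (phrase, label) pairs in reading order; each
    phrase is located by searching forward from the end of the previous match.
    """
    ents = []
    search_from = 0
    for phrase, label in entities:
        start = text.find(phrase, search_from)
        if start == -1:
            raise ValueError(f"{phrase!r} not found after position {search_from}")
        end = start + len(phrase)
        ents.append({"start": start, "end": end, "label": label})
        search_from = end
    return {"text": text, "ents": ents, "title": title}

data = to_displacy_ents(
    "The writer studies BCA in 2024.",
    [("BCA", "ORG"), ("2024", "DATE")],
)
print(data)
```

In the notebook, `displacy.render(data, style="ent", manual=True, jupyter=True)` should then give the same kind of color-coded view as above.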

In [None]:
NER = spacy.load("en_core_web_sm")
doc=NER("This is the Best Time to give the Gate Exam.")
displacy.render(doc,style="dep",jupyter=True)


#### This cell loads the spaCy English model into `NER` and processes the sentence "This is the Best Time to give the Gate Exam.". The resulting `Doc` is stored in `doc`, and the cell then renders a dependency-parse visualization of it with `style="dep"`.
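The arcs drawn by `style="dep"` are stored in the `Doc` as head pointers. Every token has a `head` (the token it attaches to) and a `dep_` label, and spaCy makes the root token its own head. The sketch below walks such a structure to find the root and each token's children. It uses a hand-written parse of the first five words of the sentence. The labels are plausible for spaCy's English models but are an assumption, not the model's actual output. In spaCy itself, `token.children` gives the same information as the hypothetical `children` helper.

```python
# Hand-written (hypothetical) parse stored as spaCy exposes it:
# a dependency label and a head index per token; ROOT points to itself.
tokens = ["This", "is", "the", "Best", "Time"]
deps = ["nsubj", "ROOT", "det", "amod", "attr"]
heads = [1, 1, 4, 4, 1]

def find_root(heads):
    """Return the index of the token that is its own head."""
    for i, h in enumerate(heads):
        if i == h:
            return i
    raise ValueError("no root found")

def children(heads, index):
    """Indices of tokens whose head is `index`, excluding the root's self-loop."""
    return [i for i, h in enumerate(heads) if h == index and i != index]

root = find_root(heads)
print("root:", tokens[root])
for i, tok in enumerate(tokens):
    kids = [tokens[c] for c in children(heads, i)]
    print(f"{tok:6}{deps[i]:6} head={tokens[heads[i]]:6} children={kids}")
```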

In [None]:
import nltk
from nltk.stem import PorterStemmer
nltk.download("punkt")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

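Here we import the NLTK library for working with human language data and its `PorterStemmer` class. We also download the `punkt` tokenizer models, which are used by tokenizers such as `nltk.word_tokenize`; the stemmer itself does not need them. A stemmer reduces words to a root or base form, called the stem, which can help in natural language processing tasks such as text analysis and information retrieval.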

In [None]:
ps=PorterStemmer()

example_words=["program","Programming","programmer","programs","programmed","multiply","studies","studied","study","edits","edited","edites","addict","addicted","addictes"]

print("{0:20}{1:20}".format("--Word--","--Stem--"))
for word in example_words:
  print("{0:20}{1:20}".format(word,ps.stem(word)))

--Word--            --Stem--            
program             program             
Programming         program             
programmer          programm            
programs            program             
programmed          program             
multiply            multipli            
studies             studi               
studied             studi               
study               studi               
edits               edit                
edited              edit                
edites              edit                
addict              addict              
addicted            addict              
addictes            addict              


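Here we use the Porter stemmer to find the stem of each word and print each word next to its stem in two 20-character columns. The Porter algorithm strips suffixes to reduce different inflected forms of a word to a common base form. The stem is not always a dictionary word: "programmer" becomes "programm", "multiply" becomes "multipli", and "study", "studies" and "studied" all become "studi". The stemmer also lowercases its input, so "Programming" becomes "program".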
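Porter's algorithm applies its suffix rules in a fixed sequence of steps. As a small illustration, the function below implements only Step 1a, which handles plurals (SSES → SS, IES → I, SS → SS, S → nothing). The later steps deal with suffixes such as -ed, -ing and -ational and check a "measure" condition on the remaining stem. They are omitted here, so this is not a full stemmer, and NLTK's `PorterStemmer` also applies a few extensions to the original rules by default. Step 1a alone is enough to show where "studies" → "studi" comes from.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of Porter's stemmer: strip plural suffixes.

    Rules are tried longest suffix first; only the first matching rule applies.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word

for w in ["caresses", "ponies", "caress", "cats", "studies", "programs", "edits"]:
    print(f"{w:12}{porter_step_1a(w)}")
```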

### WordNet Lemmatizer

In [None]:
from nltk.stem import WordNetLemmatizer
nltk.download("wordnet")
nltk.download("omw-1.4")

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

Here we import the `WordNetLemmatizer` class from the `nltk.stem` module. Lemmatization is the process of reducing words to their base or dictionary form (known as the lemma), which is helpful in tasks such as text normalization and information retrieval. We also download the WordNet corpus, which the lemmatizer uses to look words up, and the Open Multilingual Wordnet data (`omw-1.4`).
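Unlike a stemmer, WordNet's lemmatizer does not strip suffixes blindly. For a given part of speech, it first checks an exception list of irregular forms. It then tries a fixed set of suffix-replacement rules and keeps only candidates that exist in the WordNet vocabulary, and NLTK returns the shortest such candidate. The sketch below reproduces that idea for verbs. The verb rules match those used by WordNet's morphy, but `VOCAB`, `EXCEPTIONS` and `lemmatize_verb` are tiny hypothetical stand-ins for the real WordNet data.

```python
# Suffix-replacement rules for verbs, tried on the original word.
VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]
# Hypothetical miniature vocabulary and exception list.
VOCAB = {"study", "edit", "addict", "program", "go", "multiply"}
EXCEPTIONS = {"went": "go"}

def lemmatize_verb(word: str) -> str:
    """Return the shortest valid base form of `word`, or `word` unchanged."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    candidates = set()
    if word in VOCAB:
        candidates.add(word)
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            base = word[: len(word) - len(suffix)] + replacement
            if base in VOCAB:          # keep only real dictionary words
                candidates.add(base)
    # sorted() makes ties between equal-length candidates deterministic
    return min(sorted(candidates), key=len) if candidates else word

for w in ["studies", "edited", "edites", "addicted", "went", "Programming"]:
    print(f"{w:12}{lemmatize_verb(w)}")
```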

In [None]:
wnl = WordNetLemmatizer()

example_words=["program","Programming","programmer","programs","programmed","multiply","studies","studied","study","edits","edited","edites","addict","addicted","addictes"]

print("{0:20}{1:20}".format("--Word--","--Lemma--"))
for word in example_words:
  print("{0:20}{1:20}".format(word,wnl.lemmatize(word,pos="v")))

--Word--            --Lemma--           
program             program             
Programming         Programming         
programmer          programmer          
programs            program             
programmed          program             
multiply            multiply            
studies             study               
studied             study               
study               study               
edits               edit                
edited              edit                
edites              edit                
addict              addict              
addicted            addict              
addictes            addict              


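The output shows the lemma that the WordNet lemmatizer's `lemmatize()` method returns for each word, printed in the same two-column format. Because we pass `pos="v"`, every word is treated as a verb. Unlike the stemmer, the lemmatizer only returns dictionary forms ("studies" becomes "study", not "studi"). It returns a word unchanged when it cannot find a base form. "Programming" stays as-is because the lemmatizer does not lowercase its input, and "programmer" is unchanged because it is not a verb form.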
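Passing `pos="v"` for every word means nouns such as "programmer" are looked up as verbs. A common refinement is to tag the words first with `nltk.pos_tag`, which returns Penn Treebank tags, and convert each tag to the letter `lemmatize()` expects ("n", "v", "a", "r"). The helper below is a hypothetical sketch of that mapping only. The tags in the example are hand-written rather than produced by the tagger.

```python
def penn_to_wordnet_pos(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS letter.

    Falls back to "n" (noun), which is also lemmatize()'s default pos.
    """
    if tag.startswith("J"):
        return "a"   # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"   # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"   # adverbs: RB, RBR, RBS
    return "n"       # nouns and everything else

# Hand-written (word, tag) pairs for illustration only.
tagged = [("programmers", "NNS"), ("studied", "VBD"), ("quickly", "RB"), ("better", "JJR")]
for word, tag in tagged:
    print(f"{word:12}{tag:5}{penn_to_wordnet_pos(tag)}")
```

Each pair could then be lemmatized with `wnl.lemmatize(word, pos=penn_to_wordnet_pos(tag))`.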