
#**Parts Of Speech**

In [None]:
# Section goals
# Understand how to retrieve parts of speech using spaCy
# Understand how to use Named Entity Recognition (NER) with spaCy
# Visualize POS tags and named entities
# Perform sentence segmentation
# Most words are rare, and it's common for words that look completely different to mean almost the same thing.
# The same words in a different order can mean something completely different.
# Even splitting text into useful word-like units can be difficult in many languages.
# While it's possible to solve some problems starting from only the raw characters, it's usually better to use linguistic knowledge to add useful information.
# That's exactly what spaCy is designed to do: you put in raw text and get back a Doc object that comes with a variety of annotations.
# In this lecture we'll take a closer look at coarse POS tags (noun, verb, adjective) and fine-grained tags (plural noun, past-tense verb, superlative adjective).


In [None]:
import spacy

In [None]:
nlp = spacy.load('en_core_web_sm')  # load the small English pipeline

In [None]:
doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

In [None]:
print(doc.text)

The quick brown fox jumped over the lazy dog's back.


In [None]:
print(doc[4])

jumped


In [None]:
print(doc[4].pos_)

VERB


In [None]:
# Fine-grained tag
print(doc[4].tag_)

VBD


In [None]:
print(doc[4].tag)  # numerical ID (hash) of the fine-grained tag

17109001835818727656


In [None]:
for token in doc:
  print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}")  # {10} pads each column to a width of 10 characters

The        DET        DT         determiner
quick      ADJ        JJ         adjective
brown      ADJ        JJ         adjective
fox        NOUN       NN         noun, singular or mass
jumped     VERB       VBD        verb, past tense
over       ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
lazy       ADJ        JJ         adjective
dog        NOUN       NN         noun, singular or mass
's         PART       POS        possessive ending
back       NOUN       NN         noun, singular or mass
.          PUNCT      .          punctuation mark, sentence closer
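
The column alignment in the table above comes from the nested width specifier in the f-string. A minimal sketch of that formatting on its own, without spaCy; the rows are copied by hand from the output above, and `format_row` is a hypothetical helper name:

```python
# Rows copied by hand from the table above: (text, coarse tag, fine-grained tag).
rows = [("The", "DET", "DT"), ("jumped", "VERB", "VBD"), ("'s", "PART", "POS")]

def format_row(text, pos, tag, width=10):
    # {value:{width}} pads each string on the right to `width` characters;
    # the inner braces let the width come from a variable.
    return f"{text:{width}} {pos:{width}} {tag:{width}}".rstrip()

for row in rows:
    print(format_row(*row))
```

Strings are left-aligned by default, so a larger width simply widens every column.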


In [None]:
doc = nlp(u"He reads book on NLP." )

In [None]:
word = doc[1]

In [None]:
word.text

'reads'

In [None]:
token = word
print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}")

reads      VERB       VBZ        verb, 3rd person singular present


In [None]:
# spaCy identifies both the part of speech and the tense of a token: here "reads" is tagged as a 3rd person singular present verb

In [None]:
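# Note the double space after "jumped": spaCy counts it as a separate SPACE token
doc = nlp(u"The quick brown fox jumped  over the lazy dog's back.")
# count_by returns a dict that maps each attribute ID to its frequency
POS_counts = doc.count_by(spacy.attrs.POS)
POS_counts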

{84: 3, 85: 1, 90: 2, 92: 3, 94: 1, 97: 1, 100: 1, 103: 1}

In [None]:
doc.vocab[85].text

'ADP'

In [None]:
doc[3].pos

92

In [None]:
for k, v in sorted(POS_counts.items()):  # k = attribute ID, v = count; sorted by ID
  print(f"{k}. {doc.vocab[k].text:{5}} {v}")

84. ADJ   3
85. ADP   1
90. DET   2
92. NOUN  3
94. PART  1
97. PUNCT 1
100. VERB  1
103. SPACE 1
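
The same tally can be keyed by readable tag names instead of numeric IDs. A minimal sketch using `collections.Counter`; the tag list is copied by hand from the outputs above so the block runs without a model, and with a loaded `doc` the equivalent would be `Counter(token.pos_ for token in doc)`:

```python
from collections import Counter

# Coarse tags in token order, copied by hand from the outputs above
# (the SPACE entry is the extra space after "jumped").
pos_tags = ["DET", "ADJ", "ADJ", "NOUN", "VERB", "SPACE", "ADP",
            "DET", "ADJ", "NOUN", "PART", "NOUN", "PUNCT"]

pos_name_counts = Counter(pos_tags)

# Print the counts alphabetically by tag name.
for name, count in sorted(pos_name_counts.items()):
    print(f"{name:{5}} {count}")
```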


In [None]:
TAG_counts = doc.count_by(spacy.attrs.TAG)  # attrs is short for attributes

for k, v in sorted(TAG_counts.items()):  # k = tag ID, v = count; sorted by ID
  print(f"{k}. {doc.vocab[k].text:{5}} {v}")

74. POS   1
1292078113972184607. IN    1
6893682062797376370. _SP   1
10554686591937588953. JJ    3
12646065887601541794. .     1
15267657372422890137. DT    2
15308085513773655218. NN    3
17109001835818727656. VBD   1


In [None]:
# Number of entries in the vocabulary
len(doc.vocab)

515

In [None]:
# DEP counts: frequencies of the syntactic dependency labels
DEP_counts = doc.count_by(spacy.attrs.DEP)

for k,v in sorted(DEP_counts.items()):
  print(f"{k}. {doc.vocab[k].text:{5}} {v}")

0.       1
402. amod  3
415. det   2
429. nsubj 1
439. pobj  1
440. poss  1
443. prep  1
445. punct 1
8110129090154140942. case  1
8206900633647566924. ROOT  1


#**Visualizing Parts of Speech**

In [None]:
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u"The quick brown fox jumped over the lazy dog.")
# options = {'distance': 80, 'compact': True}
# displacy.render(doc, style='dep', jupyter=True, options=options)
displacy.render(doc, style='dep', jupyter=True)  # full-size dependency visualization

In [None]:
# 'bg' sets the background color and 'color' sets the text color
options = {'distance': 100, 'compact': True, 'color': 'yellow', 'bg': '#09a3d5', 'font': 'Times'}
displacy.render(doc, style='dep', jupyter=True, options=options)  # compact layout

In [None]:
doc2 = nlp(u"This is a sentence. This is another sentence, possibly longer than the other.")
# Create a list of sentence spans
spans = list(doc2.sents)
displacy.serve(spans, style='dep', options={'distance': 110})


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


In [None]:
# To view the visualization, open http://127.0.0.1:5000 in a web browser while the server is running

#**Named Entity Recognition**

In [None]:
# Named-entity recognition (NER) seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.


In [None]:
# Our goal is to take raw text such as:
#   Jim bought 300 shares of Acme Corp. in 2006
# and add NER information:
#   [Jim]person bought 300 shares of [Acme Corp.]organization in [2006]time
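
To make the target format concrete, here is a minimal sketch that produces the bracketed annotation from character spans. It does not run a model: the mentions and labels are hand-written stand-ins for what an NER component would predict, and `annotate` is a hypothetical helper name.

```python
def annotate(text, spans):
    """Wrap each (start, end, label) character span as [text]label."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])             # untouched text before the span
        pieces.append(f"[{text[start:end]}]{label}")  # the annotated mention
        cursor = end
    pieces.append(text[cursor:])                      # remainder after the last span
    return "".join(pieces)

sentence = "Jim bought 300 shares of Acme Corp. in 2006"
# Hand-written mentions and labels; a real NER model would predict these.
mentions = [("Jim", "person"), ("Acme Corp.", "organization"), ("2006", "time")]
spans = [(sentence.index(m), sentence.index(m) + len(m), label) for m, label in mentions]

print(annotate(sentence, spans))
```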


In [None]:
import spacy
nlp = spacy.load('en_core_web_sm')


In [None]:
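def show_ents(doc):
  # Print each entity with its label and the label's description,
  # or a fallback message when the Doc has no entities.
  if doc.ents:
    for ent in doc.ents:
      print(ent.text + ' - ' + ent.label_ + ' -' + str(spacy.explain(ent.label_)))
  else:
    print('No entities found')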
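
One Python detail matters for a helper like `show_ents`: an `else` aligned with `for` belongs to the loop, not to the enclosing `if`, and it runs every time the loop finishes without `break`. The indentation of that `else` therefore decides when the fallback message prints. A minimal sketch of the loop behaviour, using a hypothetical `report` function:

```python
def report(items):
    """Return the lines a for/else loop produces for `items`."""
    lines = []
    for item in items:
        lines.append(item)
    else:
        # Runs whenever the loop finishes without `break`,
        # including when `items` is empty.
        lines.append("loop finished")
    return lines

print(report(["Tesla", "U.K."]))
print(report([]))
```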

In [None]:
doc = nlp(u'Hi how are you?')
show_ents(doc)
# Output: No entities found

In [None]:
doc = nlp(u"May I go to Washington,DC next May to see the Washington Monument?")

In [None]:
show_ents(doc)
# Prints each recognized entity with its label and the label's description


Washington - GPE -Countries, cities, states
next May - DATE -Absolute or relative dates or periods
the Washington Monument - ORG -Companies, agencies, institutions, etc.


In [None]:
doc = nlp(u"Can I please have 500 dollars of Microsoft Stock?")
show_ents(doc)

500 dollars - MONEY -Monetary values, including unit
Microsoft Stock - ORG -Companies, agencies, institutions, etc.


In [None]:
doc= nlp(u"Tesla to build a U.K. factory for $6 million")
show_ents(doc)

U.K. - GPE -Countries, cities, states
$6 million - MONEY -Monetary values, including unit


In [None]:
doc = nlp("Einstin  was greate Scinetist and thousands of more inventions and  he was living in America country")
show_ents(doc)


Scinetist - ORG -Companies, agencies, institutions, etc.
thousands - CARDINAL -Numerals that do not fall under another type
America - GPE -Countries, cities, states


In [None]:
from spacy.tokens import Span
ORG = doc.vocab.strings[u"ORG"]
ORG

383

In [None]:
# The ID of the ORG label in the string store is 383
doc = nlp(u"Tesla to build a U.K. factory for $6 million")

In [None]:
# Create a new entity span covering token 0 ("Tesla")
new_ent = Span(doc, 0, 1, label=ORG)
# Add the new entity to the Doc's existing entities
doc.ents = list(doc.ents) + [new_ent]
show_ents(doc)

Tesla - ORG -Companies, agencies, institutions, etc.
U.K. - GPE -Countries, cities, states
$6 million - MONEY -Monetary values, including unit
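
Adding "Tesla" works here because its span does not overlap any existing entity; `doc.ents` rejects spans that share a token. When candidate spans might overlap, they need to be filtered first. spaCy ships a helper for this, `spacy.util.filter_spans`, which prefers longer spans. The sketch below is a plain-Python stand-in for that idea, not spaCy's implementation; `drop_overlaps` is a hypothetical name and the `(4, 6, "ORG")` candidate is invented for illustration.

```python
def drop_overlaps(spans):
    """Keep the longest non-overlapping (start, end, label) token spans."""
    kept = []
    taken = set()
    # Longest spans first; ties go to the earlier start.
    for start, end, label in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
        tokens = set(range(start, end))
        if tokens.isdisjoint(taken):
            kept.append((start, end, label))
            taken |= tokens
    return sorted(kept)

# Token spans for "Tesla to build a U.K. factory for $6 million".
# (4, 6, "ORG") is an invented candidate that overlaps the GPE span (4, 5).
candidates = [(0, 1, "ORG"), (4, 5, "GPE"), (4, 6, "ORG"), (7, 10, "MONEY")]
print(drop_overlaps(candidates))
```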


#**Part-2 Entity Recognition**

In [None]:
doc = nlp(u"Our company created a brand new vaccum cleaner."
          u"This is new vaccum-cleaner is the best in show.")

In [None]:
show_ents(doc)

In [None]:
from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab)
phrase_list = ['vacuum cleaner','vacuum-cleaner']
phrase_patterns = [nlp(text) for text in phrase_list]  # list comprehension: one Doc per phrase

In [None]:
matcher.add('newproduct',None,*phrase_patterns)

In [None]:
found_matches = matcher(doc)
found_matches  # empty: the text spells the word "vaccum", so the "vacuum" patterns never match

[]

In [None]:
from spacy.tokens import Span
PROD = doc.vocab.strings[u"PRODUCT"]
found_matches

[]

In [None]:
new_ents =[Span(doc,match[1],match[2],label=PROD) for match in found_matches]

In [None]:
doc.ents = list(doc.ents) + new_ents
show_ents(doc)

In [None]:
doc = nlp(u"Originally I paid $29.95 for this car toy, but now it is marked down")

In [None]:
# List comprehension: keep only the MONEY entities
[ent for ent in doc.ents if ent.label_ == "MONEY"]
# To count them, wrap the comprehension in len():
# len([ent for ent in doc.ents if ent.label_ == "MONEY"])

[29.95]