### Part-of-Speech (POS) Tagging

In [21]:
import spacy
nlp = spacy.load("en_core_web_sm")


In [22]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [23]:
doc = nlp("Elon flew to mars in a SpaceX rocket yesterday.he carried a bag of money with him.")
for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_))

Elon  |  PROPN  |  proper noun
flew  |  VERB  |  verb
to  |  PART  |  particle
mars  |  NOUN  |  noun
in  |  ADP  |  adposition
a  |  DET  |  determiner
SpaceX  |  PROPN  |  proper noun
rocket  |  NOUN  |  noun
yesterday.he  |  NUM  |  numeral
carried  |  VERB  |  verb
a  |  DET  |  determiner
bag  |  NOUN  |  noun
of  |  ADP  |  adposition
money  |  NOUN  |  noun
with  |  ADP  |  adposition
him  |  PRON  |  pronoun
.  |  PUNCT  |  punctuation
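
In the output above, `yesterday.he` comes out as a single token tagged `NUM`; the input has no space after the first period. Below is a minimal sketch of one way to repair such text before passing it to `nlp`, assuming a simple regex is acceptable for the data at hand. The helper name `add_space_after_period` is hypothetical, and the pattern would also split strings such as `e.g` or lowercase domain names, so treat it as a starting point rather than a general fix.

```python
import re

# Hypothetical helper: insert a space after a period that sits directly
# between two lowercase letters, e.g. "yesterday.he" -> "yesterday. he".
_MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[a-z])")

def add_space_after_period(text):
    return _MISSING_SPACE.sub(". ", text)

raw = "Elon flew to mars in a SpaceX rocket yesterday.he carried a bag of money with him."
cleaned = add_space_after_period(raw)
print(cleaned)
```

The cleaned string can then be passed to `nlp(cleaned)` in place of the raw text.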


In [24]:
doc = nlp("Wow! Dr. Strange made 250 million dollars in a single day, that's mind-blowing. He is the best superhero ever!")

for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_), " | ",
          token.tag_, " | ", spacy.explain(token.tag_))

Wow  |  INTJ  |  interjection  |  UH  |  interjection
!  |  PUNCT  |  punctuation  |  .  |  punctuation mark, sentence closer
Dr.  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
Strange  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
made  |  VERB  |  verb  |  VBD  |  verb, past tense
250  |  NUM  |  numeral  |  CD  |  cardinal number
million  |  NUM  |  numeral  |  CD  |  cardinal number
dollars  |  NOUN  |  noun  |  NNS  |  noun, plural
in  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
a  |  DET  |  determiner  |  DT  |  determiner
single  |  ADJ  |  adjective  |  JJ  |  adjective (English), other noun-modifier (Chinese)
day  |  NOUN  |  noun  |  NN  |  noun, singular or mass
,  |  PUNCT  |  punctuation  |  ,  |  punctuation mark, comma
that  |  PRON  |  pronoun  |  DT  |  determiner
's  |  AUX  |  auxiliary  |  VBZ  |  verb, 3rd person singular present
mind  |  NOUN  |  noun  |  NN  |  noun, singular or mass
-  |  PUNCT  |  punctuation  |  HYPH  |  punctuation mark, hyphen
blowing  |  VERB  |  ver

In [25]:
doc = nlp("he quit the job.")

print(doc[1].text, " | ", doc[1].tag_, " | ", spacy.explain(doc[1].tag_))

quit  |  VBD  |  verb, past tense


In [26]:
earnings_text="""Microsoft Corp. today announced the following results for the quarter ended December 31, 2021, as compared to the corresponding period of last fiscal year:

·         Revenue was $51.7 billion and increased 20%
·         Operating income was $22.2 billion and increased 24%
·         Net income was $18.8 billion and increased 21%
·         Diluted earnings per share was $2.48 and increased 22%
“Digital technology is the most malleable resource at the world’s disposal to overcome constraints and reimagine everyday work and life,” said Satya Nadella, chairman and chief executive officer of Microsoft. “As tech as a percentage of global GDP continues to increase, we are innovating and investing across diverse and growing markets, with a common underlying technology stack and an operating model that reinforces a common strategy, culture, and sense of purpose.”
“Solid commercial execution, represented by strong bookings growth driven by long-term Azure commitments, increased Microsoft Cloud revenue to $22.1 billion, up 32% year over year” said Amy Hood, executive vice president and chief financial officer of Microsoft."""



### Removing unwanted tokens from the above text

In [31]:
doc = nlp(earnings_text)
filtered_tokens = []
for token in doc:
    if token.pos_ not in ["SPACE", "X", "PUNCT"]:
        filtered_tokens.append(token)

In [33]:
filtered_tokens[:20]

[Microsoft,
 Corp.,
 today,
 announced,
 the,
 following,
 results,
 for,
 the,
 quarter,
 ended,
 December,
 31,
 2021,
 as,
 compared,
 to,
 the,
 corresponding,
 period]
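
The filtering loop can also be wrapped in a small reusable function. The sketch below is one possible shape, not part of the notebook: `drop_noise` and `NOISE_POS` are hypothetical names, and the sample tokens are hand-labelled stand-ins built with `SimpleNamespace` so the code runs without a loaded model. With spaCy, the same function should accept a `doc`, since it only reads each token's `pos_` attribute.

```python
from types import SimpleNamespace

# POS labels to drop: whitespace, "other" (X), and punctuation.
NOISE_POS = {"SPACE", "X", "PUNCT"}

def drop_noise(tokens, noise=NOISE_POS):
    """Return the tokens whose .pos_ label is not in `noise`."""
    return [t for t in tokens if t.pos_ not in noise]

# Hand-labelled stand-in tokens (assumed labels, for illustration only).
sample = [
    SimpleNamespace(text="Revenue", pos_="NOUN"),
    SimpleNamespace(text="\n", pos_="SPACE"),
    SimpleNamespace(text="increased", pos_="VERB"),
    SimpleNamespace(text=",", pos_="PUNCT"),
    SimpleNamespace(text="etc", pos_="X"),
]

kept = drop_noise(sample)
print([t.text for t in kept])
```

Keeping the labels in a set makes the membership test cheap and lets callers pass a different `noise` set, for example to drop `SYM` tokens as well.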

In [35]:
count = doc.count_by(spacy.attrs.POS)
count


{96: 15,
 92: 45,
 100: 23,
 90: 9,
 85: 16,
 93: 16,
 97: 27,
 98: 1,
 84: 20,
 103: 10,
 87: 6,
 99: 5,
 89: 12,
 86: 3,
 94: 3,
 95: 2}

In [40]:
doc.vocab[96].text


'PROPN'

In [45]:
# map each POS ID back to its label and print the count
for k, v in count.items():
    print(doc.vocab[k].text, " | ", v)

PROPN  |  15
NOUN  |  45
VERB  |  23
DET  |  9
ADP  |  16
NUM  |  16
PUNCT  |  27
SCONJ  |  1
ADJ  |  20
SPACE  |  10
AUX  |  6
SYM  |  5
CCONJ  |  12
ADV  |  3
PART  |  3
PRON  |  2
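
`doc.count_by(spacy.attrs.POS)` returns integer IDs that have to be mapped back through `doc.vocab`. As an alternative sketch, the label strings can be counted directly with `collections.Counter`, which also gives a sorted view via `most_common()`. The function name `pos_counts` is hypothetical, and the sample tokens are stand-ins so the block runs without a model; with spaCy, passing `doc` should work the same way because only `token.pos_` is read.

```python
from collections import Counter
from types import SimpleNamespace

def pos_counts(tokens):
    """Count tokens per POS label, keyed by the label string."""
    return Counter(t.pos_ for t in tokens)

# Stand-in tokens with assumed labels, for illustration only.
labels = ["NOUN", "VERB", "NOUN", "PUNCT", "NOUN", "VERB"]
sample = [SimpleNamespace(pos_=label) for label in labels]

counts = pos_counts(sample)

# most_common() yields (label, count) pairs, highest count first.
for label, n in counts.most_common():
    print(label, " | ", n)
```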

