# spaCy

In [1]:
import spacy
# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

In [2]:
doc = nlp("""Coronaviruses are a large family of viruses that are actually common throughout the world and can cause respiratory illness in people and animals. There are several known coronaviruses that infect people and usually only cause mild respiratory disease, such as the common cold. However, at least two previously identified coronaviruses have caused severe illness — Severe Acute Respiratory Syndrome (SARS) coronavirus and Middle East Respiratory Syndrome (MERS) coronavirus. """)

In [3]:
doc

Coronaviruses are a large family of viruses that are actually common throughout the world and can cause respiratory illness in people and animals. There are several known coronaviruses that infect people and usually only cause mild respiratory disease, such as the common cold. However, at least two previously identified coronaviruses have caused severe illness — Severe Acute Respiratory Syndrome (SARS) coronavirus and Middle East Respiratory Syndrome (MERS) coronavirus. 

In [6]:
# tokenization: iterating over a Doc yields Token objects, including punctuation
for token in doc:
    print(token.text)

Coronaviruses
are
a
large
family
of
viruses
that
are
actually
common
throughout
the
world
and
can
cause
respiratory
illness
in
people
and
animals
.
There
are
several
known
coronaviruses
that
infect
people
and
usually
only
cause
mild
respiratory
disease
,
such
as
the
common
cold
.
However
,
at
least
two
previously
identified
coronaviruses
have
caused
severe
illness
—
Severe
Acute
Respiratory
Syndrome
(
SARS
)
coronavirus
and
Middle
East
Respiratory
Syndrome
(
MERS
)
coronavirus
.
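The output above shows punctuation such as `(`, `)` and `.` coming out as separate tokens. A minimal sketch of filtering them out is given below. It assumes a blank English pipeline (`spacy.blank("en")`) is sufficient, since tokenization is rule-based and does not need the trained model. The names `nlp_blank`, `sample_doc`, `tokens` and `words` are illustrative only and are not part of the notebook.

```python
import spacy

# Assumption: a blank pipeline contains only the tokenizer, which is all we need here.
nlp_blank = spacy.blank("en")

# A fragment of the text used in the notebook above.
sample_doc = nlp_blank("Severe Acute Respiratory Syndrome (SARS) coronavirus.")

# Every token, punctuation included.
tokens = [token.text for token in sample_doc]

# Same tokens with punctuation dropped via the Token.is_punct flag.
words = [token.text for token in sample_doc if not token.is_punct]

print(tokens)
print(words)
```

Other boolean flags such as `token.is_alpha` or `token.is_stop` can be used in the same way when a different filter is needed.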


In [8]:
# lemmatization: print each token next to its base form (lemma)
for token in doc:
    print(token.text, token.lemma_)
    print()

Coronaviruses coronaviruse

are be

a a

large large

family family

of of

viruses virus

that that

are be

actually actually

common common

throughout throughout

the the

world world

and and

can can

cause cause

respiratory respiratory

illness illness

in in

people people

and and

animals animal

. .

There there

are be

several several

known know

coronaviruses coronaviruse

that that

infect infect

people people

and and

usually usually

only only

cause cause

mild mild

respiratory respiratory

disease disease

, ,

such such

as as

the the

common common

cold cold

. .

However however

, ,

at at

least least

two two

previously previously

identified identify

coronaviruses coronaviruse

have have

caused cause

severe severe

illness illness

— —

Severe Severe

Acute Acute

Respiratory Respiratory

Syndrome Syndrome

( (

SARS SARS

) )

coronavirus coronavirus

and and

Middle Middle

East East

Respiratory Respiratory

Syndrome Syndrome

( (

MERS MERS

) )

In [11]:
# part of speech: pos_ is the coarse universal tag, tag_ the fine-grained tag
for token in doc:
    print(token.text+"----"+token.pos_+"----"+token.tag_)
    print()

Coronaviruses----NOUN----NNS

are----AUX----VBP

a----DET----DT

large----ADJ----JJ

family----NOUN----NN

of----ADP----IN

viruses----NOUN----NNS

that----PRON----WDT

are----AUX----VBP

actually----ADV----RB

common----ADJ----JJ

throughout----ADP----IN

the----DET----DT

world----NOUN----NN

and----CCONJ----CC

can----AUX----MD

cause----VERB----VB

respiratory----ADJ----JJ

illness----NOUN----NN

in----ADP----IN

people----NOUN----NNS

and----CCONJ----CC

animals----NOUN----NNS

.----PUNCT----.

There----PRON----EX

are----AUX----VBP

several----ADJ----JJ

known----VERB----VBN

coronaviruses----NOUN----NNS

that----PRON----WDT

infect----VERB----VBP

people----NOUN----NNS

and----CCONJ----CC

usually----ADV----RB

only----ADV----RB

cause----VERB----VBP

mild----PROPN----NNP

respiratory----PROPN----NNP

disease----NOUN----NN

,----PUNCT----,

such----ADJ----JJ

as----SCONJ----IN

the----DET----DT

common----ADJ----JJ

cold----NOUN----NN

.----PUNCT----.

However----ADV----RB

,---
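Labels such as `AUX`, `WDT` or `EX` in the output above are not self-explanatory. One way to decode them is `spacy.explain`, which looks a label up in spaCy's built-in glossary. The sketch below needs no trained model. The list of labels is copied from the output above, and the name `explanations` is illustrative.

```python
import spacy

# Labels taken from the part-of-speech output above.
labels = ["NOUN", "AUX", "NNS", "VBP", "WDT", "EX"]

# spacy.explain returns a short description, or None for labels it does not know.
explanations = {label: spacy.explain(label) for label in labels}

for label, description in explanations.items():
    print(label, "->", description)
```

The tags themselves come from a statistical model, so individual labels can be wrong. In the output above, for example, `mild` and `respiratory` are tagged `PROPN` although they are used as ordinary adjectives.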

In [12]:
# visualisation: draw the dependency parse (rendered inline when run in Jupyter)
from spacy import displacy

doc = nlp("NLP is a real fun for life")
displacy.render(doc, style="dep")
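Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The sketch below passes `jupyter=False` to force that behaviour and uses `manual=True` with a hand-written parse so that it runs without a trained model. The words and arcs in `manual_parse` are a hypothetical illustration, not output from `en_core_web_sm`.

```python
from spacy import displacy

# Hypothetical, hand-written parse in displaCy's "manual" format (not model output).
manual_parse = {
    "words": [
        {"text": "NLP", "tag": "PROPN"},
        {"text": "is", "tag": "AUX"},
        {"text": "fun", "tag": "NOUN"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 1, "end": 2, "label": "attr", "dir": "right"},
    ],
}

# jupyter=False makes render return the SVG markup as a string.
svg = displacy.render(manual_parse, style="dep", manual=True, jupyter=False)
print(svg[:60])
```

The returned string can be written to a `.svg` file and opened in a browser. With a parsed `Doc` from a trained pipeline, the same call works without `manual=True`.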

In [18]:
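# named entity recognition with spaCy: doc.ents holds the detected entity spans
doc = nlp("I am Gunjan Paul, working as a software developer in company called LOVEONN in California")
for ent in doc.ents:
    print(ent.text + " --- " + ent.label_)
# highlight the entities in the text (rendered inline when run in Jupyter)
displacy.render(doc, style='ent')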

Gunjan Paul --- PERSON
LOVEONN --- ORG
California --- GPE
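The entities above are predicted by the statistical model, so an unusual name such as LOVEONN may not be recognised in every sentence. One option for pinning down known names is a rule-based `entity_ruler` component. The sketch below assumes spaCy v3 and a blank pipeline. The patterns and the names `nlp_rules`, `rule_doc` and `found` are illustrative, not part of the notebook.

```python
import spacy

# Assumption: spaCy v3, where the rule-based matcher is added by its string name.
nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")

# Hypothetical exact-match patterns for the names used in the example sentence.
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Gunjan Paul"},
    {"label": "ORG", "pattern": "LOVEONN"},
    {"label": "GPE", "pattern": "California"},
])

rule_doc = nlp_rules(
    "I am Gunjan Paul, working as a software developer in company called LOVEONN in California"
)

# Collect (text, label) pairs in the order they appear in the sentence.
found = [(ent.text, ent.label_) for ent in rule_doc.ents]
print(found)
```

Rules only match the exact strings listed, so they complement the trained model rather than replace it.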
