Miscellaneous Natural Language Processing Tasks

Step 1: Installing the library and reloading the environment.


In [1]:
!pip install spacy

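import importlib
import pkg_resources

# Reload pkg_resources so the newly installed package is picked up.
# importlib.reload replaces the deprecated imp.reload.
importlib.reload(pkg_resources)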



<module 'pkg_resources' from '/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py'>
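
A quick way to confirm that the install took effect is to ask for the installed version. The sketch below uses the standard-library `importlib.metadata` module; the helper name `installed_version` is an assumption made for illustration and is not part of the original notebook.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the spaCy version if the install in Step 1 succeeded, otherwise None.
print("spacy:", installed_version("spacy"))
```

If this prints `None` right after `pip install`, restarting the runtime is usually the safer option than reloading modules by hand.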

Step 2: Downloading the model to be used and reloading the environment.

In [2]:
import spacy.cli

spacy.cli.download("en_core_web_trf")

import importlib
import pkg_resources

# Reload pkg_resources so the downloaded model package is picked up.
importlib.reload(pkg_resources)

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_trf')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


<module 'pkg_resources' from '/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py'>

Step 3: Importing the library and defining the model to be used


In [1]:
import spacy

nlp = spacy.load('en_core_web_sm')
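
Step 2 downloads `en_core_web_trf`, while the cell above loads `en_core_web_sm`, so the load only works if the small model is also installed. Since spaCy models are installed as regular Python packages, one way to check which of them is available is sketched below. The helper name `first_installed_model` and the candidate order are assumptions made for illustration.

```python
import importlib.util


def first_installed_model(candidates):
    """Return the first name in `candidates` that is importable as a package, else None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prefer the transformer model from Step 2, fall back to the small model.
model_name = first_installed_model(["en_core_web_trf", "en_core_web_sm"])
print(model_name)
```

If `model_name` is not `None`, it can be passed to `spacy.load(model_name)` in place of the hard-coded name.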

Step 4: Defining the text to be used in the NLP tasks

In [2]:
text = """National Park Week starts on Saturday, and it
also starts off with a bang-for-your-buck.

That’s because every US National Park Service site will
have free entry on Saturday. NPS manages almost 430
sites, and the majority of them already offer free entry
every day.

But this is your chance to get into the coveted, big-name
national parks and other sites without paying a fee.

That includes legendary parks such as Yosemite, which
normally has an entry fee of $20 per person or $35 per
vehicle.

(Note: You’ll still need a reservation to drive into Yosemite
on weekends and holidays from April 13 to June 30.)

"""

doc = nlp(text)
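
The hard line breaks inside `text` are kept by spaCy as whitespace tokens, which is why blank entries show up in the outputs of the following steps. An optional preprocessing step, not used in this notebook, is to join the wrapped lines of each paragraph before calling `nlp`. The helper name `unwrap_paragraphs` and the sample string are assumptions made for illustration.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines inside each paragraph; keep blank-line paragraph breaks."""
    # Paragraphs are separated by at least one blank line.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, collapse any run of whitespace into a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = "NPS manages almost 430\nsites, and the majority\n\nBut this is your chance\n"
print(unwrap_paragraphs(sample))
```

Calling `nlp(unwrap_paragraphs(text))` should remove the whitespace tokens caused by line wrapping, although the paragraph breaks themselves would most likely still be tokenized as whitespace.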

Step 5: Tokenizing the text

In [3]:
for token in doc:

    print(token)

National
Park
Week
starts
on
Saturday
,
and
it


also
starts
off
with
a
bang
-
for
-
your
-
buck
.



That
’s
because
every
US
National
Park
Service
site
will


have
free
entry
on
Saturday
.
NPS
manages
almost
430


sites
,
and
the
majority
of
them
already
offer
free
entry


every
day
.



But
this
is
your
chance
to
get
into
the
coveted
,
big
-
name


national
parks
and
other
sites
without
paying
a
fee
.



That
includes
legendary
parks
such
as
Yosemite
,
which


normally
has
an
entry
fee
of
$
20
per
person
or
$
35
per


vehicle
.



(
Note
:
You
’ll
still
need
a
reservation
to
drive
into
Yosemite


on
weekends
and
holidays
from
April
13
to
June
30
.
)
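
The empty lines in the output above are whitespace tokens produced by the line breaks in `text`. A minimal sketch of how to leave them out when listing tokens, relying on the `is_space` attribute of spaCy tokens, could look like this. The helper name `visible_tokens` is an assumption made for illustration.

```python
def visible_tokens(doc):
    """Return the text of every token in `doc` that is not pure whitespace."""
    # `doc` is expected to be a spaCy Doc (or any iterable of tokens that
    # expose `.text` and `.is_space`).
    return [token.text for token in doc if not token.is_space]
```

With the `doc` from Step 4, `print(visible_tokens(doc))` would list the same tokens as above without the blank entries.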





Step 6: Getting the part-of-speech tags for the individual tokens.

In [4]:
for token in doc:

    # Print each token with its coarse-grained (pos_) and fine-grained (tag_) POS tags

    print(token, token.pos_, token.tag_)

National PROPN NNP
Park PROPN NNP
Week PROPN NNP
starts VERB VBZ
on ADP IN
Saturday PROPN NNP
, PUNCT ,
and CCONJ CC
it PRON PRP

 SPACE _SP
also ADV RB
starts VERB VBZ
off ADP RP
with ADP IN
a DET DT
bang NOUN NN
- PUNCT HYPH
for ADP IN
- PUNCT HYPH
your PRON PRP$
- PUNCT HYPH
buck NOUN NN
. PUNCT .


 SPACE _SP
That PRON DT
’s VERB VBZ
because SCONJ IN
every DET DT
US PROPN NNP
National PROPN NNP
Park PROPN NNP
Service PROPN NNP
site NOUN NN
will AUX MD

 SPACE _SP
have VERB VB
free ADJ JJ
entry NOUN NN
on ADP IN
Saturday PROPN NNP
. PUNCT .
NPS PROPN NNP
manages VERB VBZ
almost ADV RB
430 NUM CD

 SPACE _SP
sites NOUN NNS
, PUNCT ,
and CCONJ CC
the DET DT
majority NOUN NN
of ADP IN
them PRON PRP
already ADV RB
offer VERB VBP
free ADJ JJ
entry NOUN NN

 SPACE _SP
every DET DT
day NOUN NN
. PUNCT .


 SPACE _SP
But CCONJ CC
this PRON DT
is AUX VBZ
your PRON PRP$
chance NOUN NN
to PART TO
get VERB VB
into ADP IN
the DET DT
coveted ADJ JJ
, PUNCT ,
big ADJ JJ
- PUNCT HYPH
name NOUN NN
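
A per-token listing gets long quickly, so a summary of how often each coarse-grained tag occurs can be easier to read. The sketch below counts `token.pos_` values with `collections.Counter` and skips whitespace tokens. The helper name `pos_counts` is an assumption made for illustration.

```python
from collections import Counter


def pos_counts(doc):
    """Count coarse-grained POS tags in `doc`, ignoring whitespace tokens."""
    # `doc` is expected to be a spaCy Doc (or any iterable of tokens that
    # expose `.pos_` and `.is_space`).
    return Counter(token.pos_ for token in doc if not token.is_space)
```

With the `doc` from Step 4, `pos_counts(doc).most_common(5)` would give the five most frequent tags. For an unfamiliar label such as `NNP` or `HYPH`, `spacy.explain("NNP")` returns a short description.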



Step 7: Printing the tokens and their morphological analysis


In [5]:
for token in doc:

    print(token, token.morph)

National Number=Sing
Park Number=Sing
Week Number=Sing
starts Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
on 
Saturday Number=Sing
, PunctType=Comm
and ConjType=Cmp
it Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

 
also 
starts Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
off 
with 
a Definite=Ind|PronType=Art
bang Number=Sing
- PunctType=Dash
for 
- PunctType=Dash
your Person=2|Poss=Yes|PronType=Prs
- PunctType=Dash
buck Number=Sing
. PunctType=Peri


 
That Number=Sing|PronType=Dem
’s Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
because 
every 
US Number=Sing
National Number=Sing
Park Number=Sing
Service Number=Sing
site Number=Sing
will VerbForm=Fin

 
have VerbForm=Inf
free Degree=Pos
entry Number=Sing
on 
Saturday Number=Sing
. PunctType=Peri
NPS Number=Sing
manages Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
almost 
430 NumType=Card

 
sites Number=Plur
, PunctType=Comm
and ConjType=Cmp
the Definite=Def|PronType=Art
majority Number=Sing
of 
them Case=Acc|Number=Plur|
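
The morphological analysis is printed as a `Key=Value` list separated by `|`. To work with individual features, that string can be turned into a dictionary. The sketch below is a plain-Python parser for this format; the helper name `parse_feats` is an assumption made for illustration.

```python
def parse_feats(feats):
    """Parse a 'Key=Value|Key=Value' feature string into a dict."""
    if not feats:
        return {}
    # Skip empty or malformed pieces, e.g. after a trailing '|'.
    pairs = (piece.split("=", 1) for piece in feats.split("|") if "=" in piece)
    return dict(pairs)


print(parse_feats("Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"))
```

For a spaCy token this would be called as `parse_feats(str(token.morph))`. spaCy v3 also exposes `token.morph.to_dict()` and `token.morph.get("Number")`, which should give equivalent results without manual parsing.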

Step 8: Visualizing the dependency parse tree sentence by sentence and the relations between words.


In [6]:
from spacy import displacy

displacy.render(doc, style='dep', options={'compact': True})
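
Passing the whole `doc` draws the entire text as one very wide diagram. To get one diagram per sentence, as the step title suggests, each sentence span from `doc.sents` can be rendered separately. The sketch below returns the SVG markup as strings by setting `jupyter=False`. The helper name `render_sentences` and its `render` parameter are assumptions made for illustration; `render` exists only so the helper can be exercised without spaCy installed.

```python
def render_sentences(doc, render=None, compact=True):
    """Return one dependency-parse SVG string per sentence in `doc`."""
    if render is None:
        # Imported lazily so the helper can be defined without spaCy installed.
        from spacy import displacy
        render = displacy.render
    options = {"compact": compact}
    # jupyter=False makes displaCy return the markup instead of displaying it.
    return [
        render(sent, style="dep", jupyter=False, options=options)
        for sent in doc.sents
    ]
```

With the `doc` from Step 4, `svgs = render_sentences(doc)` yields a list of SVG strings that can be written to files. Inside a notebook, a shorter alternative is `displacy.render(list(doc.sents), style='dep', options={'compact': True})`.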