In [1]:
import pandas as pd
import xml.etree.ElementTree as ET

import spacy
nlp = spacy.load("en_core_web_md")

from spacy import displacy
from pathlib import Path

# Extract Data Set

In [2]:
tree = ET.parse('../pubmed22n0001.xml')
root = tree.getroot()

In [3]:
article_count = 0
articles = []

for abstract in root.iter('AbstractText'):
    # .text is None for empty elements or ones that begin with a nested tag
    articles.append(abstract.text)
    article_count += 1
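
The loop above keeps whatever `.text` returns, which can be `None` or only the leading fragment when an `AbstractText` element contains nested markup. A minimal sketch of a more defensive extraction follows, assuming a small made-up XML string in place of the real file. `SAMPLE_XML`, `extract_abstracts`, and the sample sentences are hypothetical names and data used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a PubMed-style file. Only the AbstractText tag
# comes from the notebook above; the structure and text are invented.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <Abstract><AbstractText>Plain abstract text.</AbstractText></Abstract>
  </PubmedArticle>
  <PubmedArticle>
    <Abstract><AbstractText><i>In vitro</i> results are reported.</AbstractText></Abstract>
  </PubmedArticle>
  <PubmedArticle>
    <Abstract><AbstractText/></Abstract>
  </PubmedArticle>
</PubmedArticleSet>
""".strip()


def extract_abstracts(root):
    """Return the full text of every non-empty AbstractText element."""
    abstracts = []
    for element in root.iter("AbstractText"):
        # itertext() also yields text inside and after nested tags,
        # whereas .text stops at the first child element.
        text = "".join(element.itertext()).strip()
        if text:
            abstracts.append(text)
    return abstracts


sample_root = ET.fromstring(SAMPLE_XML)
sample_abstracts = extract_abstracts(sample_root)
print(len(sample_abstracts))
for text in sample_abstracts:
    print(text)
```

Skipping empty elements changes the list positions, so indices such as `articles[2]` used below would not necessarily point at the same abstracts if this variant were swapped in.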

In [4]:
articles[2]

'The distribution of blood flow to the subendocardial, medium and subepicardial layers of the left ventricular free wall was studied in anaesthetized dogs under normoxic (A), hypoxic (B) conditions and under pharmacologically induced (etafenone) coronary vasodilation (C). Regional myocardial blood flow was determined by means of the particle distribution method. In normoxia a transmural gradient of flow was observed, with the subendocardial layers receiving a significantly higher flow rate compared with the subepicardial layers. In hypoxia induced vasodilation this transmural gradient of flow was persistent. In contrast a marked redistribution of regional flow was observed under pharmacologically induced vasodilation. The transmural gradient decreased. In contrast to some findings these experiments demonstrate that a considerable vasodilatory capacity exists in all layers of the myocardium and can be utilized by drugs. The differences observed for the intramural distribution pattern of

# Exploration using Chapter 3 Methods

In [5]:
doc = nlp(articles[2])
for token in doc:
    print(token.text, token.pos_, token.tag_, spacy.explain(token.pos_), spacy.explain(token.tag_))

The DET DT determiner determiner
distribution NOUN NN noun noun, singular or mass
of ADP IN adposition conjunction, subordinating or preposition
blood NOUN NN noun noun, singular or mass
flow NOUN NN noun noun, singular or mass
to ADP IN adposition conjunction, subordinating or preposition
the DET DT determiner determiner
subendocardial ADJ JJ adjective adjective (English), other noun-modifier (Chinese)
, PUNCT , punctuation punctuation mark, comma
medium ADJ JJ adjective adjective (English), other noun-modifier (Chinese)
and CCONJ CC coordinating conjunction conjunction, coordinating
subepicardial ADJ JJ adjective adjective (English), other noun-modifier (Chinese)
layers NOUN NNS noun noun, plural
of ADP IN adposition conjunction, subordinating or preposition
the DET DT determiner determiner
left ADJ JJ adjective adjective (English), other noun-modifier (Chinese)
ventricular ADJ JJ adjective adjective (English), other noun-modifier (Chinese)
free ADJ JJ adjective adjective (English), 

In [6]:
doc = nlp(articles[2])
for token in doc:
    print(token.text, token.dep_, token.head)

The det distribution
distribution nsubjpass studied
of prep distribution
blood compound flow
flow pobj of
to prep flow
the det layers
subendocardial amod layers
, punct subendocardial
medium conj subendocardial
and cc medium
subepicardial conj medium
layers pobj to
of prep layers
the det wall
left amod ventricular
ventricular amod wall
free amod wall
wall pobj of
was auxpass studied
studied ROOT studied
in prep studied
anaesthetized amod dogs
dogs pobj in
under prep dogs
normoxic nmod conditions
( punct A
A appos normoxic
) punct A
, punct normoxic
hypoxic amod conditions
( punct conditions
B nmod conditions
) punct conditions
conditions pobj under
and cc under
under conj under
pharmacologically advmod induced
induced amod vasodilation
( punct vasodilation
etafenone nmod vasodilation
) punct vasodilation
coronary amod vasodilation
vasodilation pobj under
( punct vasodilation
C appos vasodilation
) punct vasodilation
. punct studied
Regional amod flow
myocardial amod flow
blood compound

In [7]:
doc = nlp(articles[2])
# displacy.serve(doc, style='dep')
displacy.render(doc, style='dep')

In [10]:
doc = nlp(articles[1])
print(doc.ents)
for ent in doc.ents:
    print(ent, ent.label_, spacy.explain(ent.label_))

(214, DS, Wistar, only about 12 percent, 150 mg kg, 1.7 percent, 40 percent, one, 80 percent, 47 percent, 150 mg kg, 25 percent, 18 percent, 250-min, two, DS)
214 CARDINAL Numerals that do not fall under another type
DS PERSON People, including fictional
Wistar ORG Companies, agencies, institutions, etc.
only about 12 percent PERCENT Percentage, including "%"
150 mg kg QUANTITY Measurements, as of weight or distance
1.7 percent PERCENT Percentage, including "%"
40 percent PERCENT Percentage, including "%"
one CARDINAL Numerals that do not fall under another type
80 percent PERCENT Percentage, including "%"
47 percent PERCENT Percentage, including "%"
150 mg kg QUANTITY Measurements, as of weight or distance
25 percent PERCENT Percentage, including "%"
18 percent PERCENT Percentage, including "%"
250-min TIME Times smaller than a day
two CARDINAL Numerals that do not fall under another type
DS PERSON People, including fictional
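
One quick way to read an entity listing like the one above is to tally the labels. The sketch below does that with `collections.Counter`, using the (text, label) pairs copied by hand from the output for `articles[1]` so that it runs without loading the model. The names `entities` and `label_counts` are illustrative only.

```python
from collections import Counter

# (text, label) pairs copied from the printed output above.
# With a live Doc, the same list could be built as:
#   [(ent.text, ent.label_) for ent in doc.ents]
entities = [
    ("214", "CARDINAL"),
    ("DS", "PERSON"),
    ("Wistar", "ORG"),
    ("only about 12 percent", "PERCENT"),
    ("150 mg kg", "QUANTITY"),
    ("1.7 percent", "PERCENT"),
    ("40 percent", "PERCENT"),
    ("one", "CARDINAL"),
    ("80 percent", "PERCENT"),
    ("47 percent", "PERCENT"),
    ("150 mg kg", "QUANTITY"),
    ("25 percent", "PERCENT"),
    ("18 percent", "PERCENT"),
    ("250-min", "TIME"),
    ("two", "CARDINAL"),
    ("DS", "PERSON"),
]

# Count how often each entity label occurs, most frequent first.
label_counts = Counter(label for _, label in entities)
for label, count in label_counts.most_common():
    print(f"{label:<10}{count}")
```

A tally like this makes it easier to see that most hits are numeric types, while the few named entities (`DS` as PERSON, `Wistar` as ORG) are the ones worth checking by hand against the abstract.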


In [12]:
doc = nlp(articles[2])
print(doc.ents)
for ent in doc.ents:
    print(ent, ent.label_, spacy.explain(ent.label_))

()


In [14]:
doc = nlp(articles[4])
print(doc.ents)
for ent in doc.ents:
    print(ent, ent.label_, spacy.explain(ent.label_))

(RMI, 61 140, RMI, 61 144, RMI, 61 280, synthetized N-[8-R-dibenzo(b, f)oxepin-10-yl]-N'-methyl-, three, CPZ, CPD, CPZ, CPD, RMI, 61 140, RMI, 61 144, RMI, 61 280, CPZ)
RMI ORG Companies, agencies, institutions, etc.
61 140 CARDINAL Numerals that do not fall under another type
RMI ORG Companies, agencies, institutions, etc.
61 144 CARDINAL Numerals that do not fall under another type
RMI ORG Companies, agencies, institutions, etc.
61 280 CARDINAL Numerals that do not fall under another type
synthetized N-[8-R-dibenzo(b ORG Companies, agencies, institutions, etc.
f)oxepin-10-yl]-N'-methyl- DATE Absolute or relative dates or periods
three CARDINAL Numerals that do not fall under another type
CPZ ORG Companies, agencies, institutions, etc.
CPD ORG Companies, agencies, institutions, etc.
CPZ ORG Companies, agencies, institutions, etc.
CPD ORG Companies, agencies, institutions, etc.
RMI ORG Companies, agencies, institutions, etc.
61 140 CARDINAL Numerals that do not fall under another type


In [15]:
articles[4]

"RMI 61 140, RMI 61 144 and RMI 61 280 are newly synthetized N-[8-R-dibenzo(b,f)oxepin-10-yl]-N'-methyl-piperazine-maleates which show interesting psychopharmacologic effects. This work contains the results of a study performed with these three compounds, in order to demonstrate their neuropsycholeptic activity in comparison with chloropromazine (CPZ) and chlordiazepoxide (CPD). The inhibition of motility observed in mice shows that the compounds reduce the normal spontaneous motility as well as the muscle tone. The central-depressant activity is evidenced by increased barbiturate-induced sleep and a remarkable eyelid ptosis can also be observed. Our compounds do not show any activity on electroshock just as do CPZ and CPD. As to the antipsychotic outline, our compounds show strong reduction of lethality due to amphetamine in grouped mice and a strong antiapomorphine activity. They show also an antiaggressive effect and an inhibitory activity on avoidance behaviour much stronger than C