## Import libraries

In [1]:
import pandas as pd
import spacy
from spacy.matcher import Matcher
from spacy import displacy

In [2]:
# Import English Library
nlp = spacy.load("en_core_web_sm")



## Load dataframe

In [3]:
df = pd.read_json("df_HCQ.json")
df.head()

| | Publication ID | title | abstract | abstract_clean |
|---|---|---|---|---|
| 0 | pub.1126880632 | COVID-19 and what pediatric rheumatologists sh... | On March 11th, 2020 the World Health Organizat... | On March 11th, 2020 the World Health Organizat... |
| 1 | pub.1127834352 | Hydroxychloroquine or chloroquine with or with... | BACKGROUND: Hydroxychloroquine or chloroquine,... | BACKGROUND: Hydroxychloroquine or chloroquine,... |
| 2 | pub.1126667578 | Hydroxychloroquine in patients mainly with mil... | Abstract Objectives To assess the efficacy and... | Abstract Objectives To assess the efficacy and... |
| 3 | pub.1125404383 | Of chloroquine and COVID-19 | Recent publications have brought attention to ... | Recent publications have brought attention to ... |
| 4 | pub.1127182972 | An independent appraisal and re-analysis of hy... | A recent open-label study claimed that hydroxy... | A recent open-label study claimed that hydroxy... |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17 entries, 0 to 16
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Publication ID  17 non-null     object
 1   title           17 non-null     object
 2   abstract        17 non-null     object
 3   abstract_clean  17 non-null     object
dtypes: object(4)
memory usage: 680.0+ bytes


## Reshape dataframe

In [5]:
# Make new dataframe from 'df': df_HCQ
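# '.copy()' makes 'df_HCQ' independent of 'df', so that adding columns below
# does not trigger a SettingWithCopyWarning
df_HCQ = df[["Publication ID", "title", "abstract_clean"]].copy()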

In [6]:
# Add column 'doc' to 'df_HCQ'
# 'df_HCQ["doc"]' shall contain Doc-objects made from abstracts
df_HCQ["doc"] = df_HCQ["abstract_clean"].apply(nlp)

In [7]:
df_HCQ.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17 entries, 0 to 16
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Publication ID  17 non-null     object
 1   title           17 non-null     object
 2   abstract_clean  17 non-null     object
 3   doc             17 non-null     object
dtypes: object(4)
memory usage: 680.0+ bytes


In [8]:
df_HCQ.head()

| | Publication ID | title | abstract_clean | doc |
|---|---|---|---|---|
| 0 | pub.1126880632 | COVID-19 and what pediatric rheumatologists sh... | On March 11th, 2020 the World Health Organizat... | (On, March, 11th, ,, 2020, the, World, Health,... |
| 1 | pub.1127834352 | Hydroxychloroquine or chloroquine with or with... | BACKGROUND: Hydroxychloroquine or chloroquine,... | (BACKGROUND, :, Hydroxychloroquine, or, chloro... |
| 2 | pub.1126667578 | Hydroxychloroquine in patients mainly with mil... | Abstract Objectives To assess the efficacy and... | (Abstract, Objectives, To, assess, the, effica... |
| 3 | pub.1125404383 | Of chloroquine and COVID-19 | Recent publications have brought attention to ... | (Recent, publications, have, brought, attentio... |
| 4 | pub.1127182972 | An independent appraisal and re-analysis of hy... | A recent open-label study claimed that hydroxy... | (A, recent, open, -, label, study, claimed, th... |


In [9]:
def get_sentences(doc):
    sents_list = [sent for sent in doc.sents]
    
    return sents_list

In [10]:
df_HCQ["sentences"] = df_HCQ["doc"].apply(get_sentences)
df_HCQ.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17 entries, 0 to 16
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Publication ID  17 non-null     object
 1   title           17 non-null     object
 2   abstract_clean  17 non-null     object
 3   doc             17 non-null     object
 4   sentences       17 non-null     object
dtypes: object(5)
memory usage: 816.0+ bytes


In [11]:
df_HCQ.head(3)

| | Publication ID | title | abstract_clean | doc | sentences |
|---|---|---|---|---|---|
| 0 | pub.1126880632 | COVID-19 and what pediatric rheumatologists sh... | On March 11th, 2020 the World Health Organizat... | (On, March, 11th, ,, 2020, the, World, Health,... | [(On, March, 11th, ,, 2020, the, World, Health... |
| 1 | pub.1127834352 | Hydroxychloroquine or chloroquine with or with... | BACKGROUND: Hydroxychloroquine or chloroquine,... | (BACKGROUND, :, Hydroxychloroquine, or, chloro... | [(BACKGROUND, :), (Hydroxychloroquine, or, chl... |
| 2 | pub.1126667578 | Hydroxychloroquine in patients mainly with mil... | Abstract Objectives To assess the efficacy and... | (Abstract, Objectives, To, assess, the, effica... | [(Abstract, Objectives, To, assess, the, effic... |


### Example abstracts

In [12]:
sentences_6 = df_HCQ["sentences"].iloc[6]
for i, sentence in enumerate(sentences_6):
    print(f"({i}) {sentence}")

(0) Background Treatments are urgently needed to prevent respiratory failure and deaths from coronavirus disease 2019 (COVID-19).
(1) Hydroxychloroquine (HCQ) has received worldwide attention because of positive results from small studies.
(2) Methods We used data collected from routine care of all adults in 4 French hospitals with documented SARS-CoV-2 pneumonia and requiring oxygen ≥ 2 L/min to emulate a target trial aimed at assessing the effectiveness of HCQ at 600 mg/day.
(3) The composite primary endpoint was transfer to intensive care unit (ICU) within 7 days from inclusion and/or death from any cause.
(4) Analyses were adjusted for confounding factors by inverse probability of treatment weighting.
(5) Results This study included 181 patients with SARS-CoV-2 pneumonia; 84 received HCQ within 48 hours of admission (HCQ group) and 97 did not (no-HCQ group).
(6) Initial severity was well balanced between the groups.
(7) In the weighted analysis, 20.2% patients in the HCQ group were

In [13]:
sentences_14 = df_HCQ["sentences"].iloc[14]
for i, sentence in enumerate(sentences_14):
    print(f"({i}) {sentence}")

(0) The coronavirus disease 2019 (COVID-19) virus is spreading rapidly, and scientists are endeavoring to discover drugs for its efficacious treatment in China.
(1) Chloroquine phosphate, an old drug for treatment of malaria, is shown to have apparent efficacy and acceptable safety against COVID-19 associated pneumonia in multicenter clinical trials conducted in China.
(2) The drug is recommended to be included in the next version of the Guidelines for the Prevention, Diagnosis, and Treatment of Pneumonia Caused by COVID-19 issued by the National Health Commission of the People's Republic of China for treatment of COVID-19 infection in larger populations in the future.


### Example of disagreement-sentences (from different abstracts)

In [14]:
pro_example = sentences_14[2]
# '[-1]' makes explicit that the disagreeing sentence is the last sentence
# of the abstract (equivalently: sentences_6[15])
con_example = sentences_6[-1]

print(f"(PRO) {pro_example}")
print("\n")
print(f"(CON) {con_example}")

(PRO) The drug is recommended to be included in the next version of the Guidelines for the Prevention, Diagnosis, and Treatment of Pneumonia Caused by COVID-19 issued by the National Health Commission of the People's Republic of China for treatment of COVID-19 infection in larger populations in the future.


(CON) These results do not support the use of HCQ in patients hospitalised for documented SARS-CoV-2-positive hypoxic pneumonia.


* The debated statement is a statement of causal relevancy. "Chloroquine phosphate"/"HCQ" is said to be effective against COVID-19 related pneumonia, and it is said to be safe enough to do no harm to patients. That is, (Hydroxy)chloroquine is said to be suitable for giving to patients. So HCQ is said to heal patients who suffer from COVID-19 related pneumonia. Thus, the debated statement is this: (CR) "Treatment with (Hydroxy)chloroquine is causally relevant for healing COVID-19 related pneumonia". If CR is true, then (Hydroxy)chloroquine could or should be used to treat COVID-19 related pneumonia.
* PRO claims that some body of evidence "show[s]" CR. (The wording "is shown to have apparent efficacy" comes from sentence (1) of abstract 14, which directly precedes the sentence printed as PRO. The body of evidence is not stated in PRO, but I suppose some evidence is assumed that is suitable for "showing" CR.) CON claims that some body of evidence ("[t]hese results") "support[s]" CR. (Let's ignore for a moment that there is a negation in CON.) Both PRO and CON say that there is a relation between some body of evidence and CR. Let's call that relation "support-relation", or SUP(x,y) for short. PRO states that the support-relation obtains between some body of evidence and CR (SUP(E,CR)), whereas CON denies that this relation obtains (NOT SUP(E,CR)).
* Claiming that the support-relation obtains between some body of evidence and CR means claiming that CR is true (or at least might well be true). But denying that the support-relation obtains is **not** to say that CR is false.
* There is disagreement between PRO and CON. The disagreement is **not**: PRO states that CR is true while CON states that CR is NOT true. The disagreement rather seems to be: PRO states SUP(E,CR) while CON states NOT SUP(E,CR). It seems to be promising to search for expressions (verbs) that express SUP.
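
The distinction drawn in the bullets above can be made concrete with a small data model. This is only a sketch: the class `SupportClaim`, its fields and the function `disagree` are hypothetical names introduced for illustration, and the evidence strings are placeholders. The point is that each sentence is modelled as a claim about SUP(E,CR), and that nothing in the model records whether CR itself is true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportClaim:
    """A sentence read as a claim about SUP(E, CR). Illustrative names only."""
    label: str       # "PRO" or "CON"
    evidence: str    # E: the body of evidence the sentence refers to
    statement: str   # CR: the debated statement
    supports: bool   # True for SUP(E, CR), False for NOT SUP(E, CR)


CR = ("Treatment with (Hydroxy)chloroquine is causally relevant "
      "for healing COVID-19 related pneumonia")

# PRO does not name its evidence; CON refers to "[t]hese results".
pro = SupportClaim("PRO", "unstated evidence", CR, supports=True)
con = SupportClaim("CON", "these results", CR, supports=False)


def disagree(a: SupportClaim, b: SupportClaim) -> bool:
    """Two claims disagree if they concern the same statement but differ on SUP."""
    return a.statement == b.statement and a.supports != b.supports


print(disagree(pro, con))
```

Note that `con.supports` being `False` encodes NOT SUP(E,CR) only. The model deliberately has no field for "CR is false", which mirrors the point that denying support is not the same as denying CR.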

### Alternative 1: Search for verbs (POS)

In [15]:
pattern_1 = [{"POS": "VERB"}]

matcher_pro_1 = Matcher(nlp.vocab)
# Patterns are passed as a list, as required by spaCy v3
matcher_pro_1.add("verb_pro_id", [pattern_1])

matcher_con_1 = Matcher(nlp.vocab)
matcher_con_1.add("verb_con_id", [pattern_1])

matches_pro_1 = matcher_pro_1(pro_example)
matches_con_1 = matcher_con_1(con_example)

In [16]:
for token in pro_example:
    print(token.text, token.lemma_, token.pos_)

The the DET
drug drug NOUN
is be AUX
recommended recommend VERB
to to PART
be be AUX
included include VERB
in in ADP
the the DET
next next ADJ
version version NOUN
of of ADP
the the DET
Guidelines Guidelines PROPN
for for ADP
the the DET
Prevention Prevention PROPN
, , PUNCT
Diagnosis Diagnosis PROPN
, , PUNCT
and and CCONJ
Treatment Treatment PROPN
of of ADP
Pneumonia Pneumonia PROPN
Caused cause VERB
by by ADP
COVID-19 covid-19 ADV
issued issue VERB
by by ADP
the the DET
National National PROPN
Health Health PROPN
Commission Commission PROPN
of of ADP
the the DET
People People PROPN
's 's PART
Republic Republic PROPN
of of ADP
China China PROPN
for for ADP
treatment treatment NOUN
of of ADP
COVID-19 covid-19 ADJ
infection infection NOUN
in in ADP
larger large ADJ
populations population NOUN
in in ADP
the the DET
future future NOUN
. . PUNCT


In [17]:
print(f"(PRO) {pro_example}\n")
for verb_pro_id, start, end in matches_pro_1:
    print(f"Verb found in (PRO): {pro_example[start:end].lemma_}\n")

(PRO) The drug is recommended to be included in the next version of the Guidelines for the Prevention, Diagnosis, and Treatment of Pneumonia Caused by COVID-19 issued by the National Health Commission of the People's Republic of China for treatment of COVID-19 infection in larger populations in the future.

Verb found in (PRO): recommend

Verb found in (PRO): include

Verb found in (PRO): cause

Verb found in (PRO): issue



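Searching for verbs as part of speech (POS) yields "recommend", the main verb of PRO, which carries the stance of the sentence. But it also gives "include", "cause" and "issue", which in the context of PRO do not express SUP. ("cause" belongs to the title of the guidelines, and "issue" says which body published them.) Note that "show", the SUP-expression discussed above, occurs in sentence (1) of abstract 14 and is therefore not among the results for the sentence selected as PRO.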

In [18]:
for token in con_example:
    print(token.text, token.lemma_, token.pos_)

These these DET
results result NOUN
do do AUX
not not PART
support support VERB
the the DET
use use NOUN
of of ADP
HCQ HCQ PROPN
in in ADP
patients patient NOUN
hospitalised hospitalise VERB
for for ADP
documented document VERB
SARS SARS PROPN
- - PUNCT
CoV-2-positive CoV-2-positive PROPN
hypoxic hypoxic ADJ
pneumonia pneumonia NOUN
. . PUNCT


In [19]:
print(f"(CON) {con_example}\n")
for verb_con_id, start, end in matches_con_1:
    print(f"Verb found in (CON): {con_example[start:end].lemma_}\n")

(CON) These results do not support the use of HCQ in patients hospitalised for documented SARS-CoV-2-positive hypoxic pneumonia.

Verb found in (CON): support

Verb found in (CON): hospitalise

Verb found in (CON): document



Searching for verbs as part of speech (POS) yields:
* "support", which in the context of CON expresses SUP
* verbs that do not express SUP in the context of CON
  - "hospitalise": Saying that patients were in hospital
  - "document": Saying that patients were tested positive for SARS-CoV-2 (and had pneumonia)

### Alternative 2: Search for "ROOT" (DEP)

In [20]:
pattern_2 = [{"DEP": "ROOT"}]

matcher_pro_2 = Matcher(nlp.vocab)
# Patterns are passed as a list, as required by spaCy v3
matcher_pro_2.add("root_pro_id", [pattern_2])

matcher_con_2 = Matcher(nlp.vocab)
matcher_con_2.add("root_con_id", [pattern_2])

matches_pro_2 = matcher_pro_2(pro_example)
matches_con_2 = matcher_con_2(con_example)

In [21]:
for token in pro_example:
    print(token.text, token.lemma_, token.dep_)

The the det
drug drug nsubjpass
is be auxpass
recommended recommend ROOT
to to aux
be be auxpass
included include xcomp
in in prep
the the det
next next amod
version version pobj
of of prep
the the det
Guidelines Guidelines pobj
for for prep
the the det
Prevention Prevention pobj
, , punct
Diagnosis Diagnosis conj
, , punct
and and cc
Treatment Treatment conj
of of prep
Pneumonia Pneumonia pobj
Caused cause acl
by by agent
COVID-19 covid-19 pobj
issued issue acl
by by agent
the the det
National National compound
Health Health compound
Commission Commission pobj
of of prep
the the det
People People poss
's 's case
Republic Republic pobj
of of prep
China China pobj
for for prep
treatment treatment pobj
of of prep
COVID-19 covid-19 amod
infection infection pobj
in in prep
larger large amod
populations population pobj
in in prep
the the det
future future pobj
. . punct


In [22]:
print(f"(PRO) {pro_example}\n")
for root_pro_id, start, end in matches_pro_2:
    print(f"'ROOT' found in (PRO): {pro_example[start:end].lemma_}\n")

(PRO) The drug is recommended to be included in the next version of the Guidelines for the Prevention, Diagnosis, and Treatment of Pneumonia Caused by COVID-19 issued by the National Health Commission of the People's Republic of China for treatment of COVID-19 infection in larger populations in the future.

'ROOT' found in (PRO): recommend



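DEP-search for "ROOT" gives only "recommend", the main verb of PRO. Unlike the POS-search, it returns none of the verbs from subordinate constructions ("include", "cause", "issue"). The search returns a single expression, the one that carries the stance of the sentence.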

In [23]:
for token in con_example:
    print(token.text, token.lemma_, token.dep_)

These these det
results result nsubj
do do aux
not not neg
support support ROOT
the the det
use use dobj
of of prep
HCQ HCQ pobj
in in prep
patients patient pobj
hospitalised hospitalise acl
for for prep
documented document amod
SARS SARS npadvmod
- - punct
CoV-2-positive CoV-2-positive amod
hypoxic hypoxic amod
pneumonia pneumonia pobj
. . punct


In [24]:
print(f"(CON) {con_example}\n")
for root_con_id, start, end in matches_con_2:
    print(f"'ROOT' found in (CON): {con_example[start:end].lemma_}\n")

(CON) These results do not support the use of HCQ in patients hospitalised for documented SARS-CoV-2-positive hypoxic pneumonia.

'ROOT' found in (CON): support



DEP-search for "ROOT" gives "support" which expresses SUP in the context of CON. This search finds exactly what I was looking for.

* POS-search for verbs gives SUP-expressions. But it also gives verbs that do not express SUP. Some search results are not relevant.
* DEP-search for "ROOT" yields exactly those expressions that express SUP. All search results are relevant. (But perhaps it is too restrictive and does not find all SUP-expressions. This has to be investigated further.)
* DEP-search for "ROOT" might be the more promising alternative to follow.
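
The comparison in the bullets above can be expressed as a precision figure for the CON sentence. The following is a minimal sketch in plain Python, not a spaCy call: the token annotations are copied from the printed output for CON, and the set `sup_lemmas` is an assumption that encodes the manual judgement made in this notebook ("support" expresses SUP, the other verbs do not). The helper name `precision` is introduced here for illustration.

```python
# (lemma, POS tag, dependency label) for the CON sentence,
# copied from the token listings printed above
con_tokens = [
    ("these", "DET", "det"), ("result", "NOUN", "nsubj"), ("do", "AUX", "aux"),
    ("not", "PART", "neg"), ("support", "VERB", "ROOT"), ("the", "DET", "det"),
    ("use", "NOUN", "dobj"), ("of", "ADP", "prep"), ("HCQ", "PROPN", "pobj"),
    ("in", "ADP", "prep"), ("patient", "NOUN", "pobj"),
    ("hospitalise", "VERB", "acl"), ("for", "ADP", "prep"),
    ("document", "VERB", "amod"), ("SARS", "PROPN", "npadvmod"),
    ("-", "PUNCT", "punct"), ("CoV-2-positive", "PROPN", "amod"),
    ("hypoxic", "ADJ", "amod"), ("pneumonia", "NOUN", "pobj"),
    (".", "PUNCT", "punct"),
]

# Assumption: manual relevance judgement taken from the discussion above
sup_lemmas = {"support"}

pos_hits = [lemma for lemma, pos, dep in con_tokens if pos == "VERB"]
root_hits = [lemma for lemma, pos, dep in con_tokens if dep == "ROOT"]


def precision(hits):
    """Share of hits that were judged to express SUP (0.0 if there are no hits)."""
    if not hits:
        return 0.0
    return sum(lemma in sup_lemmas for lemma in hits) / len(hits)


print("POS-search:", pos_hits, precision(pos_hits))
print("DEP-search:", root_hits, precision(root_hits))
```

A precision figure on one sentence says nothing about recall, which is exactly the open question raised above: whether the ROOT-search misses SUP-expressions that sit outside the root position.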

### Tentative search for patterns

##### A first pattern: [{"DEP": "ROOT"}, {"DEP": "det"}]

In [25]:
displacy.render(con_example, style="dep", jupyter=True)

In [26]:
for token in con_example:
    print(token.text, token.lemma_, token.dep_)

These these det
results result nsubj
do do aux
not not neg
support support ROOT
the the det
use use dobj
of of prep
HCQ HCQ pobj
in in prep
patients patient pobj
hospitalised hospitalise acl
for for prep
documented document amod
SARS SARS npadvmod
- - punct
CoV-2-positive CoV-2-positive amod
hypoxic hypoxic amod
pneumonia pneumonia pobj
. . punct


A first search pattern operates on the dependencies between tokens within the context of a sentence. The pattern combines the ROOT of a sentence and a determiner ("det") that is the direct successor of ROOT (see CON above). (It might be more specific to use a negation as a direct predecessor of ROOT, as in CON above. But at this point I don't want to exclude sentences that state a support-relation instead of denying it.)
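
To make the idea of "ROOT directly followed by det" explicit, here is a simplified stand-in for what a token-sequence pattern does. It is a sketch in plain Python and not spaCy's implementation: `match_sequence` is a hypothetical helper, it only supports exact attribute equality on consecutive tokens, and the token annotations are copied from the CON listing above.

```python
def match_sequence(tokens, pattern):
    """Return (start, end) pairs where consecutive tokens satisfy `pattern`.

    tokens:  list of dicts holding token attributes, e.g. {"LEMMA": ..., "DEP": ...}
    pattern: list of dicts; each dict gives the attribute values one token must have
    """
    hits = []
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(tok.get(key) == value
               for tok, spec in zip(window, pattern)
               for key, value in spec.items()):
            hits.append((start, start + width))
    return hits


# Lemma and dependency label for every token of CON (from the output above)
con_annotations = [
    ("these", "det"), ("result", "nsubj"), ("do", "aux"), ("not", "neg"),
    ("support", "ROOT"), ("the", "det"), ("use", "dobj"), ("of", "prep"),
    ("HCQ", "pobj"), ("in", "prep"), ("patient", "pobj"),
    ("hospitalise", "acl"), ("for", "prep"), ("document", "amod"),
    ("SARS", "npadvmod"), ("-", "punct"), ("CoV-2-positive", "amod"),
    ("hypoxic", "amod"), ("pneumonia", "pobj"), (".", "punct"),
]
tokens = [{"LEMMA": lemma, "DEP": dep} for lemma, dep in con_annotations]

pattern = [{"DEP": "ROOT"}, {"DEP": "det"}]
hits = match_sequence(tokens, pattern)
expressions = [" ".join(tok["LEMMA"] for tok in tokens[s:e]) for s, e in hits]
print(hits, expressions)
```

The first "det" of the sentence ("These") is not matched because no ROOT precedes it, which shows that the pattern is about adjacency in the token sequence, not about the determiner being a dependent of ROOT in the parse tree.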

In [27]:
# Make pattern: pattern_dep
pattern_dep = [{"DEP": "ROOT"}, {"DEP": "det"}]

matcher_dep = Matcher(nlp.vocab)
# Patterns are passed as a list, as required by spaCy v3
matcher_dep.add("dep_id", [pattern_dep])

##### Search for patterns in abstracts

In [28]:
def search_dep(sentences):
    for sentence in sentences:
        matches_sent = matcher_dep(sentence)

        for _id, start, end in matches_sent:
            print(f"{sentence}\n\nExpression found: {sentence[start:end].lemma_}\n\n==========\n\n")

In [29]:
results = df_HCQ["sentences"].apply(search_dep)

We did a multinational registry analysis of the use of hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19.

Expression found: do a



The authors have declared no competing interest.     

Expression found: declare no



The funders played no role in study design, data collection, data analysis, data interpretation, or reporting.

Expression found: play no



Yes I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Expression found: follow all



The scientific community should consider this information in light of previous experiments with chloroquine in the field of antiviral research.

Expression found: consider this



This systematic review and meta-analysis not only indicated no clinical benefits regarding

Expression found: indicate no



The results of the meta-analysis on comparative studies

In [30]:
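# 'search_dep' only prints and returns None, so this is the number of
# abstracts (rows), not the number of matches
len(results)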

17

This search finds too much. But it gives hints which verbs may be important.

In [31]:
def list_dep(sentences):
    dep_list = []
    
    for sentence in sentences:
        matches_sent = matcher_dep(sentence)

        for dep_id, start, end in matches_sent:
            dep_list.append(sentence[start:end].lemma_)
            
    return dep_list

In [32]:
results_list = df_HCQ["sentences"].apply(list_dep)

In [33]:
results_list

0                                                    []
1                                                [do a]
2                     [declare no, play no, follow all]
3                                       [consider this]
4                                                    []
5                   [indicate no, indicate no, show no]
6                                         [support the]
7                                    [be no, perform a]
8                          [be a, increase the, be the]
9                                                    []
10    [cause a, analyze the, identify a, support the...
11                                                   []
12                                                   []
13                         [evaluate the, consider the]
14                                                   []
15                                                   []
16                 [represent the, face a, explore the]
Name: sentences, dtype: object

The search pattern should be a combination of a ROOT-verb of a certain kind and a "det". The most promising candidates are "indicate", "show" and "support".
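
One way to back up the choice of candidate verbs is to tally the first lemma of each expression in the result list. This is a rough sketch: the data below is typed in from the printed `results_list`, and row 10 is truncated in that printout, so only its visible entries are included and the counts are a lower bound. The variable names are introduced here for illustration.

```python
from collections import Counter

# Expressions per abstract, copied from the printed 'results_list'.
# Row 10 is truncated in the output above; only its visible entries appear here.
results_visible = [
    [],
    ["do a"],
    ["declare no", "play no", "follow all"],
    ["consider this"],
    [],
    ["indicate no", "indicate no", "show no"],
    ["support the"],
    ["be no", "perform a"],
    ["be a", "increase the", "be the"],
    [],
    ["cause a", "analyze the", "identify a", "support the"],
    [],
    [],
    ["evaluate the", "consider the"],
    [],
    [],
    ["represent the", "face a", "explore the"],
]

# The first word of each expression is the lemma of the ROOT verb
verb_counts = Counter(expr.split()[0] for row in results_visible for expr in row)

for verb in ("indicate", "show", "support"):
    print(verb, verb_counts[verb])
```

Frequency alone does not single out SUP-verbs ("be" is also frequent in this list), so the tally can only narrow down candidates; deciding which verbs express SUP remains a manual judgement.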

In [34]:
# New matcher

pattern_indicate = [{"LEMMA": "indicate"}, {"DEP": "det"}]
pattern_show = [{"LEMMA": "show"}, {"DEP": "det"}]
pattern_support = [{"LEMMA": "support"}, {"DEP": "det"}]

matcher_dep = Matcher(nlp.vocab)

# One key can hold several patterns; they are passed together as a list
# (as required by spaCy v3)
matcher_dep.add("dep_id", [pattern_indicate, pattern_show, pattern_support])

In [35]:
results_2 = df_HCQ["sentences"].apply(search_dep)

This systematic review and meta-analysis not only indicated no clinical benefits regarding

Expression found: indicate no



The results of the meta-analysis on comparative studies indicated no significant clinical effectiveness (negative in RT-PCR evaluation) for HCQ regimen in the treatment of COVID-19 in comparison to control group (RR: 0.96, 95% CI, 0.76-1.22).

Expression found: indicate no



Conclusions and Relevance: This systematic review and meta-analysis not only showed no clinical benefits regarding HCQ treatment with/without azithromycin for COVID-19 patients, but according to multiple sensitivity analysis, the higher mortality rates were observed for both HCQ and HCQ+AZM regimen groups, especially in the latter.

Expression found: show no



These results do not support the use of HCQ in patients hospitalised for documented SARS-CoV-2-positive hypoxic pneumonia.

Expression found: support the



The findings support the hypothesis that these drugs have efficacy in the tre

This result is better, but still not ideal.

##### "Critical objections"

* In the example above I considered a pair of disagreeing sentences - I called them PRO and CON, respectively. PRO states a causal relevancy, or rather it states that some body of evidence supports a statement of causal relevancy. CON on the contrary negates that some body of evidence supports this statement. Sentences like CON are critical objections referring to PRO-like sentences.
* Critical objections are one part of a disagreement. In the following I shall focus on critical objections. I will assume that for each critical objection there is a PRO-like sentence, even though I will not mention or state the PRO-like sentence.
* Sentences like CON might be only one type of critical objection among others.

In [36]:
# Critical objection matcher

pattern_neg = [{"DEP": "neg"}, {"DEP": "ROOT"}, {"DEP": "det"}]

# "no" is a certain kind of "det". So I replaced '{"DEP": "det"}' with '{"LEMMA": "no"}'
pattern_indicate = [{"LEMMA": "indicate"}, {"LEMMA": "no"}]
pattern_show = [{"LEMMA": "show"}, {"LEMMA": "no"}]
pattern_support = [{"LEMMA": "support"}, {"LEMMA": "no"}]

matcher_dep = Matcher(nlp.vocab)

# One key can hold several patterns; they are passed together as a list
# (as required by spaCy v3)
matcher_dep.add("dep_id", [pattern_indicate, pattern_show, pattern_support, pattern_neg])
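
The two routes to a critical objection used in the patterns above (a negation before the ROOT, or the determiner "no" after the verb) can be summarised as a small check on the matched lemma strings. This is a sketch under stated assumptions: `denies_support` is a hypothetical helper, it works on the lemma strings that `search_dep` prints rather than on spaCy objects, and the word lists contain only the forms seen in this notebook.

```python
# Assumption: only the forms observed in this notebook are listed
NEGATION_LEMMAS = {"not"}        # negation directly before the ROOT verb
NEGATIVE_DETERMINERS = {"no"}    # "no" as the determiner directly after the verb


def denies_support(expression: str) -> bool:
    """True if a matched expression denies the support-relation.

    expression: space-separated lemmas of a match, e.g. "not support the"
    """
    words = expression.split()
    if not words:
        return False
    return words[0] in NEGATION_LEMMAS or words[-1] in NEGATIVE_DETERMINERS


# Expressions found by the searches in this notebook
found = ["indicate no", "show no", "not support the", "support the"]
for expression in found:
    print(f"{expression!r}: {denies_support(expression)}")
```

The last entry, "support the", comes from the earlier search without the negation pattern. It is kept here as a contrast case: the same verb counts as a critical objection only when one of the two negation signals is present.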

In [37]:
results_3 = df_HCQ["sentences"].apply(search_dep)

This systematic review and meta-analysis not only indicated no clinical benefits regarding

Expression found: indicate no



The results of the meta-analysis on comparative studies indicated no significant clinical effectiveness (negative in RT-PCR evaluation) for HCQ regimen in the treatment of COVID-19 in comparison to control group (RR: 0.96, 95% CI, 0.76-1.22).

Expression found: indicate no



Conclusions and Relevance: This systematic review and meta-analysis not only showed no clinical benefits regarding HCQ treatment with/without azithromycin for COVID-19 patients, but according to multiple sensitivity analysis, the higher mortality rates were observed for both HCQ and HCQ+AZM regimen groups, especially in the latter.

Expression found: show no



These results do not support the use of HCQ in patients hospitalised for documented SARS-CoV-2-positive hypoxic pneumonia.

Expression found: not support the



