# Expressing Disagreement in Abstracts
### A Reflection on Scientific Articles about Hydroxychloroquine (HCQ)/Chloroquine (CQ) as a Treatment for COVID-19

## 0 Import Libraries

In [1]:
import pandas as pd
import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler
from spacy import displacy

In [2]:
# Load the large English model; the NER component is disabled.
nlp = spacy.load("en_core_web_lg", disable=["ner"])

## 1 Outline

(A) Abstracts -> sentences -> filtered sentences (sections 2 - 4)

(B) Label entities (section 5)

(C) Disagreement pairs (section 6)
 - Separate affirmative sentences from negated sentences (6.1)
 - Pair sentences that show disagreement (6.2)
 
*Basic idea:*

>`"The findings support the hypothesis that these drugs have efficacy in the treatment of COVID-19."`

>`"These results do not support the use of HCQ in patients hospitalised for documented SARS-CoV-2-positive hypoxic pneumonia."`

|Sentence type|Noun (EVID/SCI)|Negation (NEG)|Verb (SUPP)|Statement (Span)|
|-------------|---------------|--------------|-----------|----------------|
|PRO          |`The findings` |              |`support`  |`the hypothesis that these drugs have efficacy in the treatment of COVID-19`|
|CON          |`These results`|`[do] not`    |`support`  |`the use of HCQ in patients hospitalised for documented SARS-CoV-2-positive hypoxic pneumonia`|

**Similar topics: Span similarity vs. Doc similarity**

In [3]:
# Doc similarity.
doc_a = nlp("The study shows that HCQ is effective against COVID-19")
doc_b = nlp("The study does not show that HCQ is effective against COVID-19")
doc_c = nlp("The study shows that climate change is caused by humans")

print(f"Doc similarity for 'doc_a' and 'doc_b': {doc_a.similarity(doc_b)}")
print(f"Doc similarity for 'doc_c' and 'doc_b': {doc_c.similarity(doc_b)}")

Doc similarity for 'doc_a' and 'doc_b': 0.9769809423815198
Doc similarity for 'doc_c' and 'doc_b': 0.8925517358308369


In [4]:
# Span similarity.
span_a = doc_a[3:]
span_b = doc_b[5:]
span_c = doc_c[3:]

print(f"span_a: '{span_a}'")
print(f"span_b: '{span_b}'")
print(f"span_c: '{span_c}'")
print("\n")
print(f"Span similarity for 'span_a' and 'span_b': {span_a.similarity(span_b)}")
print(f"Span similarity for 'span_c' and 'span_b': {span_c.similarity(span_b)}")

span_a: 'that HCQ is effective against COVID-19'
span_b: 'that HCQ is effective against COVID-19'
span_c: 'that climate change is caused by humans'


Span similarity for 'span_a' and 'span_b': 1.0
Span similarity for 'span_c' and 'span_b': 0.7738426327705383
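
The gap between the Doc scores and the Span scores follows from how spaCy's default `similarity` works for models with static word vectors: the token vectors are averaged, and the cosine of the two averages is returned. The following is a minimal sketch of that idea in plain Python. The `toy_vectors` table and its 3-dimensional values are made up for illustration and are not taken from `en_core_web_lg`.

```python
import math

# Hypothetical toy vectors; real models use much larger vectors.
toy_vectors = {
    "study": [0.9, 0.1, 0.0],
    "shows": [0.7, 0.3, 0.1],
    "not": [0.1, 0.9, 0.2],
    "HCQ": [0.2, 0.1, 0.9],
    "effective": [0.3, 0.2, 0.8],
}

def mean_vector(tokens):
    """Average the vectors of all tokens, component by component."""
    vectors = [toy_vectors[token] for token in tokens]
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

affirmative = ["study", "shows", "HCQ", "effective"]
negated = ["study", "not", "shows", "HCQ", "effective"]

# Whole sentences: the single "not" is diluted by the averaging.
doc_score = cosine(mean_vector(affirmative), mean_vector(negated))

# Statement parts only: identical tokens, hence identical averages.
span_score = cosine(mean_vector(affirmative[2:]), mean_vector(negated[3:]))

print(f"doc-level: {doc_score:.3f}, span-level: {span_score:.3f}")
```

In this toy setting the negation only shifts the Doc average slightly, while the statement parts compare as identical. That is why the notebook compares Spans after the verb and handles negation separately.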


## 2 Load Dataframe

In [5]:
abstract_df = pd.read_json("data/HCQ_clean_abstracts.json")
abstract_df.head(3)

| |Publication ID|title|abstract_clean|
|-|--------------|-----|--------------|
|0|pub.1126880632|COVID-19 and what pediatric rheumatologists sh...|On March 11th, 2020 the World Health Organizat...|
|1|pub.1127834352|Hydroxychloroquine or chloroquine with or with...|BACKGROUND: Hydroxychloroquine or chloroquine,...|
|2|pub.1126667578|Hydroxychloroquine in patients mainly with mil...|Abstract Objectives To assess the efficacy and...|


## 3 New Dataframe: From Abstracts to Sentences

In [6]:
# Make a dataframe that
#      (i) has a row for each sentence;
#     (ii) assigns a unique id to each sentence;
#    (iii) assigns the title of publication to each sentence.
def single_sentences(dataframe):
    data_list = []                          # Create empty list-object: data_list.
      
    for row_number in dataframe.index:      # Iterate over the index labels of all rows.
        sentence_number = 0                 # Reset the sentence counter for each abstract.
        
        # Use label-based access (.loc), since 'row_number' is an index label.
        for sentence in nlp(dataframe.loc[row_number, "abstract_clean"]).sents:
            
            sentence_id = dataframe.loc[row_number, "Publication ID"] + "-" + str(sentence_number)
            
            data_list.append([sentence_id, dataframe.loc[row_number, "title"], sentence.text])
            
            sentence_number += 1
            
    new_dataframe = pd.DataFrame(data_list, columns=["sentence_id", "title", "sentence"])
    
    return new_dataframe
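
The ID scheme used by `single_sentences()` (publication ID, a hyphen, then a per-abstract sentence counter starting at 0) can be illustrated without spaCy or pandas. In the sketch below, `naive_split()` is a hypothetical regex splitter that only stands in for spaCy's `.sents`, and the publication record is made up.

```python
import re

def naive_split(text):
    """Hypothetical stand-in for spaCy's sentence segmentation."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]

def sentence_rows(publication_id, title, abstract):
    """Return one [sentence_id, title, sentence] row per sentence."""
    return [
        [f"{publication_id}-{number}", title, sentence]
        for number, sentence in enumerate(naive_split(abstract))
    ]

rows = sentence_rows(
    "pub.0000000000",                      # made-up publication ID
    "An example title",
    "First finding. Second finding. Third finding.",
)
for row in rows:
    print(row)
```

Using `enumerate` removes the need for a manually incremented counter. The same idea would carry over to the spaCy-based function.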

In [7]:
# Create a dataframe that contains in each row one single sentence
# and its corresponding title and sentence ID as 
# its unique identifier: sentences_df.
sentences_df = single_sentences(abstract_df)

In [8]:
sentences_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216 entries, 0 to 215
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   sentence_id  216 non-null    object
 1   title        216 non-null    object
 2   sentence     216 non-null    object
dtypes: object(3)
memory usage: 5.2+ KB


In [9]:
sentences_df.head(3)

| |sentence_id|title|sentence|
|-|-----------|-----|--------|
|0|pub.1126880632-0|COVID-19 and what pediatric rheumatologists sh...|On March 11th, 2020 the World Health Organizat...|
|1|pub.1126880632-1|COVID-19 and what pediatric rheumatologists sh...|The infection, transmitted by 2019 novel coron...|
|2|pub.1126880632-2|COVID-19 and what pediatric rheumatologists sh...|Italy was early and severely involved, with a ...|


In [10]:
# Make a list of sentences (Doc-objects) from column 'sentence' of 
# dataframe 'sentences_df': doc_list.
doc_list = list(nlp.pipe(sentences_df["sentence"].to_list()))

In [11]:
len(doc_list)

216

## 4 Filter Sentences

### 4.1 Filter by Verb

**General Patterns:**

```python
[{"POS": "VERB", "DEP": "ROOT", "LEMMA": {"IN": verbs}}]
```
or

```python
[{"POS": "VERB", "DEP": "xcomp", "LEMMA": {"IN": verbs}}]
```

**Verbs that express a Support-Relation:**

* confirm
* reveal
* show
* suggest
* support

In [12]:
spacy.explain("xcomp")

'open clausal complement'

In [13]:
# Make verb_matcher.
verb_matcher = Matcher(nlp.vocab, validate=True)

In [14]:
# Make search pattern for 'verb_matcher': support_verbs_pattern.
support_verbs_pattern = [{"POS": "VERB", "DEP": "ROOT", "LEMMA": {"IN": ['reveal', 'show', 'suggest', 'support']}}]

In [15]:
# Make additional search pattern for 'verb_matcher': confirm_pattern.
confirm_pattern = [{"POS": "VERB", "DEP": "xcomp", "LEMMA": "confirm"}]

In [16]:
# Add 'support_verbs_pattern' and 'confirm_pattern' to 'verb_matcher'.
verb_matcher.add("VERB_ID", None, support_verbs_pattern, confirm_pattern)

In [17]:
verb_filtered_sentences = [doc for doc in doc_list if len(verb_matcher(doc)) > 0]
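
To make the meaning of these single-token patterns concrete, here is a simplified sketch of the check they express: every attribute in the pattern must hold for one token, and `{"IN": [...]}` means "one of these values". The `Token` record and both helper functions are hypothetical stand-ins, not spaCy's implementation, and the example sentence is annotated by hand rather than parsed.

```python
from collections import namedtuple

# Hypothetical, simplified token record; spaCy's Token has many more attributes.
Token = namedtuple("Token", ["text", "POS", "DEP", "LEMMA"])

def token_matches(token, token_pattern):
    """Check one token against one single-token pattern dict."""
    for attribute, expected in token_pattern.items():
        actual = getattr(token, attribute)
        if isinstance(expected, dict):          # e.g. {"IN": [...]}
            if actual not in expected["IN"]:
                return False
        elif actual != expected:
            return False
    return True

def sentence_matches(tokens, patterns):
    """True if any token satisfies any of the single-token patterns."""
    return any(token_matches(t, p) for t in tokens for p in patterns)

patterns = [
    {"POS": "VERB", "DEP": "ROOT", "LEMMA": {"IN": ["reveal", "show", "suggest", "support"]}},
    {"POS": "VERB", "DEP": "xcomp", "LEMMA": "confirm"},
]

# Hand-annotated example; the tags are assumed, not produced by a parser.
sentence = [
    Token("These", "DET", "det", "these"),
    Token("results", "NOUN", "nsubj", "result"),
    Token("support", "VERB", "ROOT", "support"),
    Token("the", "DET", "det", "the"),
    Token("hypothesis", "NOUN", "dobj", "hypothesis"),
]
print(sentence_matches(sentence, patterns))
```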

### 4.2 Additionally Filter by Noun

**General Pattern:**

```python
[{"POS": "NOUN", "DEP": "nsubj", "LEMMA": noun/pronoun}]
```
or
```python
[{"POS": "NOUN", "DEP": "pobj", "LEMMA": noun}]
```

**Nouns that indicate Evidence:**

* analysis
* evidence
* finding
* result
* survey
* trial

**Pronoun, 1st person, plural:**

* we ("-PRON-")

In [18]:
# Make another matcher: 'noun_matcher'.
noun_matcher = Matcher(nlp.vocab, validate=True)

In [19]:
# EVIDENCE-noun patterns for 'noun_matcher'.
analysis_pattern = [{"POS": "NOUN", "DEP": "nsubj", "LEMMA": "analysis"}]
evidence_pattern = [{"POS": "NOUN", "DEP": "nsubj", "LEMMA": "evidence"}]
finding_pattern = [{"POS": "NOUN", "DEP": "nsubj", "LEMMA": "finding"}]

result_pattern = [{"POS": "NOUN", "DEP": "nsubj", "LEMMA": "result"}]
survey_pattern = [{"POS": "NOUN", "DEP": "nsubj", "LEMMA": "survey"}]

In [20]:
# Additional search pattern for 'noun_matcher': 'trial_pattern'.
trial_pattern = [{"POS": "NOUN", "DEP": "pobj", "LEMMA": "trial"}]

In [21]:
# Additional search pattern for 'noun_matcher': 'we_pattern'.
we_pattern = [{"POS": "PRON", "DEP": "nsubj", "LEMMA": "-PRON-"}]

In [22]:
# Add search patterns to 'noun_matcher'.
noun_matcher.add("NOUN_ID", None, 
                   analysis_pattern, 
                   evidence_pattern, 
                   finding_pattern, 
                   result_pattern, 
                   survey_pattern, 
                   trial_pattern, 
                   we_pattern)
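
Since the five `nsubj` patterns differ only in their lemma, they could presumably be collapsed into one pattern with the `{"IN": [...]}` syntax already used for the verbs. The sketch below only builds the pattern lists; the names `evidence_nouns` and `compact_noun_patterns` are introduced here for illustration.

```python
# Evidence nouns that must be the nominal subject (nsubj).
evidence_nouns = ["analysis", "evidence", "finding", "result", "survey"]

compact_noun_patterns = [
    # One pattern covers all five nsubj nouns.
    [{"POS": "NOUN", "DEP": "nsubj", "LEMMA": {"IN": evidence_nouns}}],
    # 'trial' is matched as the object of a preposition (pobj).
    [{"POS": "NOUN", "DEP": "pobj", "LEMMA": "trial"}],
    # Pronoun subject; "-PRON-" is the pronoun lemma used in this notebook.
    [{"POS": "PRON", "DEP": "nsubj", "LEMMA": "-PRON-"}],
]

print(len(compact_noun_patterns))
```

With the same call style as above, these could be registered via `noun_matcher.add("NOUN_ID", None, *compact_noun_patterns)`, which should select the same sentences as the seven separate patterns.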

In [23]:
# Filter the Doc-objects (sentences) in 'verb_filtered_sentences' and 
# add the text of each selected Doc to a list: noun_filtered_sentences.
noun_filtered_sentences = [doc.text for doc in verb_filtered_sentences if len(noun_matcher(doc)) > 0] 

In [24]:
print(f"Number of Sentences in 'doc_list': {len(doc_list)}")
print(f"Number of Sentences in 'verb_filtered_sentences': {len(verb_filtered_sentences)}")
print(f"Number of Sentences in 'noun_filtered_sentences': {len(noun_filtered_sentences)}")

Number of Sentences in 'doc_list': 216
Number of Sentences in 'verb_filtered_sentences': 11
Number of Sentences in 'noun_filtered_sentences': 9


## 5 Label Entities

For the following see:

[https://spacy.io/usage/rule-based-matching#entityruler](https://spacy.io/usage/rule-based-matching#entityruler)

**EntityRuler**

In [25]:
# Initialize spacy's EntityRuler: ruler.
ruler = EntityRuler(nlp, validate=True)

In [26]:
# Add 'ruler' to pipeline of 'nlp'.
nlp.add_pipe(ruler)

**SUPPORT-Verbs**

In [27]:
# Pattern for Entity "SUPP" (SUPPORT-verb): verb_patterns.
verb_patterns = [{"label": "SUPP", 
                  "pattern": [
                      {"POS": "VERB", 
                       "DEP": "ROOT", 
                       "LEMMA": {
                           "IN": ['reveal', 'show', 'suggest', 'support']
                       }
                      }
                  ]
                 }, 
                 {"label": "SUPP", 
                  "pattern": [
                      {"POS": "VERB", 
                       "DEP": "xcomp", 
                       "LEMMA": "confirm"
                      }
                  ]
                 }
                ]

**EVIDENCE-nouns**

In [28]:
# Pattern for Entity "EVID" (EVIDENCE-noun) : evidence_patterns.
evidence_patterns = [{'label': 'EVID', 'pattern': 'This re-analysis'},
                     {'label': 'EVID', 'pattern': 'This systematic review and meta-analysis'},
                     {'label': 'EVID', 'pattern': 'These results'},
                     {'label': 'EVID', 'pattern': 'Interpretation Preliminary findings'},
                     {'label': 'EVID', 'pattern': 'Preliminary evidence'},
                     {'label': 'EVID', 'pattern': 'The findings'},
                     {'label': 'EVID', 'pattern': 'our survey'},
                     {"label": "EVID", "pattern": "multicenter clinical trials"}]

In [29]:
# Pattern for Entity "SCI" (scientists): we_label_pattern.
we_label_pattern = [{"label": "SCI", "pattern": "We"}]

**Negations**

In [30]:
# Pattern for Entity "NEG" (negation): negation_patterns.
negation_patterns = [{"label": "NEG", "pattern": [{"LEMMA": {"IN": ["not", "no", "unable"]}}]}]
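
The entries above come in two shapes: phrase patterns, where `"pattern"` is a plain string (the EVID and SCI entries), and token patterns, where `"pattern"` is a list of attribute dicts (the SUPP and NEG entries). The small helper below, `pattern_kind()`, is a hypothetical utility for telling them apart and is not part of spaCy.

```python
def pattern_kind(entry):
    """Classify an EntityRuler-style entry as 'phrase' or 'token'."""
    pattern = entry["pattern"]
    if isinstance(pattern, str):
        return "phrase"
    if isinstance(pattern, list) and all(isinstance(p, dict) for p in pattern):
        return "token"
    raise ValueError(f"Unsupported pattern: {pattern!r}")

entries = [
    {"label": "EVID", "pattern": "These results"},
    {"label": "SCI", "pattern": "We"},
    {"label": "NEG", "pattern": [{"LEMMA": {"IN": ["not", "no", "unable"]}}]},
]
kinds = [pattern_kind(entry) for entry in entries]
print(kinds)
```

The distinction matters in practice: as far as the spaCy documentation describes, phrase patterns are matched on the exact token text by default, so `"These results"` would not label a lowercase `"these results"` in the middle of a sentence.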

**Apply "ruler" to Sentences**

In [31]:
# Add patterns to 'ruler'.
ruler.add_patterns(verb_patterns)
ruler.add_patterns(evidence_patterns)
ruler.add_patterns(we_label_pattern)
ruler.add_patterns(negation_patterns)



In [32]:
# Convert strings in 'noun_filtered_sentences' into Doc-objects with 
# labeled named entities.  Make a list of these Docs: 
# disagreement_sentences.
disagreement_sentences = list(nlp.pipe(noun_filtered_sentences))

In [33]:
# Show an example: Sentence with Named Entities.
displacy.render(disagreement_sentences[5], style="ent", jupyter=True)

## 6 Disagreement Pairs

### 6.1 Separate Sentences with Regard to Negation

In [34]:
# Make 'negation_matcher'.
negation_matcher = Matcher(nlp.vocab, validate=True)

In [35]:
# Negation Pattern.
negation_pattern = [{"LEMMA": {"IN": ["not", "no", "unable"]}}]

In [36]:
# Add 'negation_pattern' to 'negation_matcher'.
negation_matcher.add("NEGATION_ID", None, negation_pattern)

In [37]:
# List of affirmative sentences: sents.
sents = []

In [38]:
# List of negated sentences: negated_sents.
negated_sents = []

In [39]:
# Define a function which sorts sentences into two groups depending on 
# whether a sentence is affirmative or negated: negation_filter().
def negation_filter(sent_list):
    for doc in sent_list:
        if len(negation_matcher(doc)) > 0:
            negated_sents.append(doc)
        else:
            sents.append(doc)

In [40]:
# Apply 'negation_filter()' to 'disagreement_sentences'.  Docs of 
# 'disagreement_sentences' will be stored either in 'sents' or 
# 'negated_sents'.
negation_filter(disagreement_sentences)
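
Because `negation_filter()` appends to the module-level lists `sents` and `negated_sents`, running the cell above a second time would add every sentence again. A variant that returns fresh lists avoids this. The sketch below works on plain strings; `has_negation_word()` is a hypothetical stand-in for the check `len(negation_matcher(doc)) > 0`.

```python
def split_by_negation(items, is_negated):
    """Return (affirmative, negated) lists without touching global state."""
    affirmative, negated = [], []
    for item in items:
        (negated if is_negated(item) else affirmative).append(item)
    return affirmative, negated

# Hypothetical stand-in for the Matcher-based check, working on plain strings.
NEGATION_WORDS = {"not", "no", "unable"}

def has_negation_word(sentence):
    """True if any word of the sentence is one of the negation words."""
    return any(word.strip(".,;").lower() in NEGATION_WORDS for word in sentence.split())

examples = [
    "The findings support the hypothesis.",
    "These results do not support the use of HCQ.",
]
affirmative, negated = split_by_negation(examples, has_negation_word)
print(affirmative)
print(negated)
```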

### 6.2 Pairs of Disagreement Sentences

In [41]:
# Make and configure a new Matcher-object: span_matcher.  'span_matcher' 
# is used in the following function 'disagreement_pairs()'.
#
# The purpose of 'span_matcher': To spot the token after which a Doc is 
# to be cut into two parts.  The second part is the Span of interest.
span_matcher = Matcher(nlp.vocab, validate=True)

span_pattern_1 = [{"POS": "VERB", "DEP": "ROOT", "LEMMA": {"IN": ['reveal', 'show', 'suggest', 'support']}}]
span_pattern_2 = [{"POS": "VERB", "DEP": "xcomp", "LEMMA": "confirm"}]

span_matcher.add("SPAN_ID", None, span_pattern_1, span_pattern_2)

In [42]:
# Define a function that creates pairs of disagreeing sentences and 
# stores each pair in a list: disagreement_pairs().
def disagreement_pairs(list_of_affirmative_sentences, list_of_negated_sentences):
    
    # List of pairs of disagreeing sentences: 
    # pairs_of_disagreeing_sentences.  This list shall be returned by 
    # the function.
    pairs_of_disagreeing_sentences = []
    
    # 1. Loop ("outer loop"): Iterate over all Doc-objects in 
    # 'list_of_affirmative_sentences'.    
    for doc in list_of_affirmative_sentences:
        # Slice the current Doc-object into a Span-object with the help 
        # of 'span_matcher': span1.
        for match_id, start, end in span_matcher(doc):
            span1 = doc[end:]
         
        # 2. Loop ("inner loop"): Iterate over all Doc-objects in 
        # 'list_of_negated_sentences'.
        for doc_neg in list_of_negated_sentences:
            # Slice the current Doc-object into a Span-object with the 
            # help of 'span_matcher': span2.  Note: unlike span1, this 
            # slice starts one token after the end of the match.
            for match_id, start, end in span_matcher(doc_neg):
                span2 = doc_neg[end + 1:]
            
            # If Span-object 1 and Span-object 2 have a certain degree 
            # of similarity, then make a pair of the corresponding 
            # sentences and add the pair to the list 
            # 'pairs_of_disagreeing_sentences'.            
            if span1.similarity(span2) >= 0.83:
                pairs_of_disagreeing_sentences.append((doc, doc_neg))
    
    # Return the list of sentence pairs whose sentences show 
    # disagreement.    
    return pairs_of_disagreeing_sentences
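
The function above relies on `span1` and `span2` being set inside the match loops. If a Doc produced no match, the variable would be undefined or would still hold the value from a previous iteration. The sketch below shows one possible defensive variant of the pairing logic. It is not the notebook's method: `statement_of` and `similarity` are hypothetical hooks standing in for the spaCy calls, and the toy implementations (cutting after the word "support", Jaccard word overlap) are assumptions made purely for illustration.

```python
from itertools import product

def pair_by_similarity(affirmative, negated, statement_of, similarity, threshold=0.83):
    """Pair items whose statement parts are at least `threshold` similar.

    `statement_of` returns the statement part of an item, or None if no
    support verb was found; `similarity` scores two statement parts.
    """
    pairs = []
    for pro, con in product(affirmative, negated):
        pro_part, con_part = statement_of(pro), statement_of(con)
        if pro_part is None or con_part is None:
            continue                      # skip items without a support verb
        if similarity(pro_part, con_part) >= threshold:
            pairs.append((pro, con))
    return pairs

# Toy stand-ins: cut after the word 'support' and compare word overlap.
def after_support(sentence):
    words = sentence.lower().rstrip(".").split()
    return words[words.index("support") + 1:] if "support" in words else None

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

pro = ["The findings support the use of HCQ."]
con = ["These results do not support the use of HCQ.", "We found no effect."]
pairs = pair_by_similarity(pro, con, after_support, jaccard)
print(pairs)
```

Passing the threshold as a parameter also makes it easier to see how sensitive the number of pairs is to the chosen cut-off of 0.83.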

In [43]:
# Make a list of pairs of disagreeing sentences.  Take the sentences
# from 'sents' and 'negated_sents'.  Store the results as
# disagreementPairs.
disagreementPairs = disagreement_pairs(sents, negated_sents)

In [44]:
# Show pairs of disagreeing sentences with entities highlighted.
for (doc, doc_neg) in disagreementPairs:
    print("==========\n")
    print("(PRO)\n")
    displacy.render(doc, style="ent", jupyter=True)
    print("\n(CON)\n")
    displacy.render(doc_neg, style="ent", jupyter=True)
    print("\n")


[displaCy output: 13 PRO/CON sentence pairs with highlighted entities; the rendered HTML is not preserved in this export.]