# COVID-19 Open Research Dataset Challenge (CORD-19)
![](https://altaonline.typepad.com/.a/6a0192ac343706970d025d9b3673bb200c-800wi)

<h2>Goal</h2><br>
    This is my second response to the call to action for artificial intelligence experts (if I can be called one) to develop text and data mining tools that can help the medical community answer high-priority scientific questions. For that I will use the CORD-19 dataset, which represents the most extensive machine-readable coronavirus literature collection available for data mining to date. Below are the current tasks for this challenge, which I address by creating an interactive cluster graph with a search bar over the papers listed in the all_sources_metadata file. There are around 29500 papers in the dataset, all listed in the all_sources_metadata file. Some of the papers in the metadata are also available as JSON files. The eventual goal is to connect the metadata with the JSON data.<br>
    <h2>Tasks</h2>
    <ul>
    <li>What is known about transmission, incubation, and environmental stability?</li>
    <li>What do we know about COVID-19 risk factors?</li>
    <li>What do we know about virus genetics, origin, and evolution?</li>
    <li>Sample task with sample submission</li>
    <li>What do we know about vaccines and therapeutics?</li>
    <li>What do we know about non-pharmaceutical interventions?</li>
    <li>What has been published about ethical and social science considerations?</li>
    <li>What do we know about diagnostics and surveillance?</li>
    <li>What has been published about medical care?</li>
    <li>What has been published about information sharing and inter-sectoral collaboration?</li>
    </ul>
    <h2>Citations, ups and downs</h2><br>
    I used the <a href='https://www.kaggle.com/maksimeren/covid-19-literature-clustering'>COVID-19 Literature Clustering</a> notebook as a reference for making this one. I was still looking forward to testing different NLP solutions (check my other one <a href='https://www.kaggle.com/beatrizyumi/covid-19-autocomplete-search-bar'>here</a>), and the ones presented in that notebook were very interesting, showing the results of many different modeling choices for clustering the research papers. I opted for t-SNE because it is very visual and would give me a good output for an interactive graph. I also tried to make the graph in plotly, simply because it is one of my favorite libraries out there. On the plus side, the graph offers a very dynamic, interactive and visual way to look at the research papers, which can be filtered by cluster or searched through the search bar. On the down side, it needs a working internet connection, and depending on the size of the dataframe it can take a while to load (mine took a few hours). Besides that, I am still working on a way to split authors when they are separated by a comma or a semicolon.<br>
    <h2>Features of this notebook</h2>
<ol><li>Viewing the papers in the metadata csv as a dataframe</li>
    <li>Viewing the papers in the interactive graph, which has hover information</li>
    <li>Search using a simple search index inside the interactive graph</li></ol>
    <h2>Turn your internet on!</h2><br>
    For this notebook to work your internet must be on.

# Importing libraries and datasets
![](https://media1.giphy.com/media/2yqYwtl5MUa6V405Ib/giphy.gif?cid=790b7611312b8226ba98603f316f4a2740c6e4a11ef228fc&rid=giphy.gif)

In [None]:
# loading libraries - must have internet on

# Overall tools
import numpy as np
import pandas as pd
import json
import glob
from scipy.spatial.distance import cdist

# Progress bar for the loops
import time
import sys
import tqdm

# Text tools
import re, nltk, spacy, gensim
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.stem import PorterStemmer
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")  # needed by WordNetLemmatizer

# Sklearn
from sklearn.feature_extraction.text import HashingVectorizer # Vectorizer for the words in the abstract
from sklearn.feature_extraction.text import TfidfVectorizer # Vectorizer for the text in the abstract (tf-idf)
from sklearn.manifold import TSNE
from sklearn.cluster import MiniBatchKMeans
from sklearn.cluster import KMeans
from sklearn import metrics

# Plotting tools
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Bokeh
from bokeh.models import ColumnDataSource, HoverTool, LinearColorMapper, CustomJS
from bokeh.palettes import Category20
from bokeh.transform import linear_cmap
from bokeh.io import output_file, show
from bokeh.transform import transform
from bokeh.io import output_notebook
from bokeh.plotting import figure
from bokeh.layouts import column
from bokeh.models import RadioButtonGroup
from bokeh.models import TextInput
from bokeh.layouts import gridplot
from bokeh.models import Div
from bokeh.models import Paragraph
from bokeh.layouts import column, widgetbox

In [None]:
#loading metadata file

root_path = '/kaggle/input/CORD-19-research-challenge'
metadata_path = f'{root_path}/metadata.csv'
meta_df = pd.read_csv(metadata_path, dtype={
    'pubmed_id': str,
    'Microsoft Academic Paper ID': str, 
    'doi': str
})
meta_df.head()

In [None]:
# importing all json files

all_json = glob.glob(f'{root_path}/**/*.json', recursive=True)
len(all_json)

# Treating the data
![](https://media2.giphy.com/media/l41YtBXZvSRdgqq7m/giphy.gif?cid=790b76115c16e4f50dc6c6525b7b74b67a0ef53415ae01c0&rid=giphy.gif)

First we create a class that reads the JSON files into a human-readable form.
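
To make the file layout concrete, here is a minimal sketch of the fields the reader relies on. The sample dictionary is invented for illustration and only mimics the `paper_id`, `abstract` and `body_text` keys used in this notebook. The helper name `join_sections` is hypothetical, and using `.get` to tolerate a missing `abstract` key is my assumption, not something the class below does.

```python
import json

# Invented sample that mimics the keys read in this notebook
sample_json = json.dumps({
    "paper_id": "abc123",
    "abstract": [{"text": "First abstract paragraph."},
                 {"text": "Second abstract paragraph."}],
    "body_text": [{"text": "Introduction."}, {"text": "Methods."}],
})

def join_sections(content, key):
    """Join the 'text' of every entry under `key`; empty string if the key is missing."""
    return "\n".join(entry["text"] for entry in content.get(key, []))

content = json.loads(sample_json)
abstract = join_sections(content, "abstract")
body_text = join_sections(content, "body_text")

# A file without an abstract simply yields an empty string (assumed behavior)
no_abstract = join_sections({"paper_id": "x", "body_text": []}, "abstract")

print(content["paper_id"], "|", abstract.replace("\n", " / "))
```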

In [None]:
# File Reader class

class FileReader:
    def __init__(self, file_path):
        with open(file_path) as file:
            content = json.load(file)
            self.paper_id = content['paper_id']
            self.abstract = []
            self.body_text = []
            # Abstract
            for entry in content['abstract']:
                self.abstract.append(entry['text'])
            # Body text
            for entry in content['body_text']:
                self.body_text.append(entry['text'])
            self.abstract = '\n'.join(self.abstract)
            self.body_text = '\n'.join(self.body_text)
    def __repr__(self):
        return f'{self.paper_id}: {self.abstract[:200]}... {self.body_text[:200]}...'

In [None]:
# Checking if the File Reader Class worked

print(FileReader(all_json[0]))

In [None]:
# Function to add break every length characters

def get_breaks(content, length):
    data = ""
    words = content.split(' ')
    total_chars = 0

    for i in range(len(words)):
        total_chars += len(words[i])
        if total_chars > length:
            data = data + "<br>" + words[i]
            total_chars = 0
        else:
            data = data + " " + words[i]
    return data

We then load all the JSON files into a DataFrame. (This might take a while, since there are a lot of files.) Several things happen here.<br><br>
<b>First</b>, we only work with papers that have metadata.<br><br>
<b>Second</b>, if no abstract is provided, we use the title of the research paper as the abstract summary, as the analysis uses the abstract as its basis.<br><br>
<b>Third</b>, all the other information goes in its own field; when a field has more than one value, as with authors, the values are separated.
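
The three rules above can be sketched on toy data. This is only an illustration: the records, the `metadata_by_sha` lookup and the `build_row` helper are made up, and indexing the metadata in a dictionary keyed by `sha` is an assumption of mine (the cell below filters the DataFrame instead).

```python
# Toy stand-ins for the metadata rows and the parsed JSON files (invented values)
metadata_rows = [
    {"sha": "p1", "title": "Paper one", "authors": "Doe, Jane; Roe, Rick", "journal": "J1"},
    {"sha": "p2", "title": "Paper two", "authors": "Poe, Ann", "journal": "J2"},
]
papers = [
    {"paper_id": "p1", "abstract": "An abstract."},
    {"paper_id": "p2", "abstract": ""},         # empty abstract
    {"paper_id": "p3", "abstract": "Orphan."},  # no metadata row
]

# Index the metadata once so each lookup is a dictionary access
metadata_by_sha = {row["sha"]: row for row in metadata_rows}

def build_row(paper):
    meta = metadata_by_sha.get(paper["paper_id"])
    if meta is None:
        return None  # first rule: skip papers without metadata
    # second rule: fall back to the title when the abstract is empty
    summary = paper["abstract"] or meta["title"]
    # third rule: authors are separated by ';' in the metadata
    authors = ". ".join(a.strip() for a in meta["authors"].split(";"))
    return {"paper_id": paper["paper_id"], "abstract_summary": summary,
            "authors": authors, "title": meta["title"], "journal": meta["journal"]}

rows = [row for row in map(build_row, papers) if row is not None]
for row in rows:
    print(row["paper_id"], "->", row["abstract_summary"], "|", row["authors"])
```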


In [None]:
# Input the research papers into a DataFrame

dict_ = {'paper_id': [], 'abstract': [], 'body_text': [], 'authors': [], 'title': [], 'journal': [], 'abstract_summary': []}
for idx, entry in enumerate(all_json):
    if idx % (len(all_json) // 10) == 0:
        print(f'Processing index: {idx} of {len(all_json)}')
    content = FileReader(entry)
    
    # get metadata information
    meta_data = meta_df.loc[meta_df['sha'] == content.paper_id]
    # no metadata, skip this paper
    if len(meta_data) == 0:
        continue
    
    dict_['paper_id'].append(content.paper_id)
    dict_['abstract'].append(content.abstract)
    dict_['body_text'].append(content.body_text)
    
    # also create a column for the summary of abstract to be used in a plot
    if len(content.abstract) == 0: 
        # no abstract provided, we input the title
        dict_['abstract_summary'].append(meta_data['title'].values[0])
    else:
        dict_['abstract_summary'].append(content.abstract)
        
    # split multiple authors on ';' and join them with '. ' for display
    authors = meta_data['authors'].values[0]
    if pd.isnull(authors):
        # no author information in the metadata
        dict_['authors'].append('')
    else:
        dict_['authors'].append(". ".join(a.strip() for a in str(authors).split(';')))
    
    # add the title information, add breaks when needed
    dict_['title'].append(meta_data['title'].values[0])
    
    # add the journal information
    dict_['journal'].append(meta_data['journal'].values[0])
    
df_covid = pd.DataFrame(dict_, columns=['paper_id', 'abstract', 'body_text', 'authors', 'title', 'journal', 'abstract_summary'])
df_covid.head()

In [None]:
dict_ = None

In [None]:
# Adding word count column

df_covid['abstract_word_count'] = df_covid['abstract'].apply(lambda x: len(x.strip().split()))
df_covid['body_word_count'] = df_covid['body_text'].apply(lambda x: len(x.strip().split()))
df_covid.head()

There may be papers that were inputted from more than one source, so we should check and remove duplicated inputs.
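
As a rough illustration of what counts as a duplicate here, the sketch below keeps the first record for each (title, abstract) pair and drops later repeats. The records are invented, and the plain-Python loop is only a stand-in for the pandas call used in the notebook.

```python
# Invented records: "b" repeats "a", while "c" shares only the title
records = [
    {"paper_id": "a", "title": "Same title", "abstract": "Same abstract"},
    {"paper_id": "b", "title": "Same title", "abstract": "Same abstract"},
    {"paper_id": "c", "title": "Same title", "abstract": "Different abstract"},
]

seen = set()
unique_records = []
for record in records:
    key = (record["title"], record["abstract"])
    if key in seen:
        continue  # a later copy of a paper we already kept
    seen.add(key)
    unique_records.append(record)

print([record["paper_id"] for record in unique_records])
```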

In [None]:
# Shape before removing the duplicated papers
df_covid.shape

In [None]:
# Removing duplicated papers

duplicate_paper = ~(df_covid.title.isnull() | df_covid.abstract.isnull()) & (df_covid.duplicated(subset=['title', 'abstract']))
df_covid = df_covid[~duplicate_paper].reset_index(drop=True)
df_covid.shape

We will now start treating the text inside the DataFrame. Before doing anything, we have to define a lot of functions to do that.

In [None]:
# Creating a list of stopwords in english

english_stopwords = list(set(stopwords.words('english')))

In [None]:
# Creating a lemmatizer instance

lmtzr = WordNetLemmatizer()

In [None]:
# Creating a stemmer instance

porter = PorterStemmer()

In [None]:
# Creating a function that cleans text of special characters

def strip_characters(text):
    t = re.sub(r'\(|\)|:|,|;|\.|’|”|“|\?|%|>|<', '', text)
    t = re.sub('/', ' ', t)
    t = t.replace("'",'')
    return t

In [None]:
# Creating a function that makes text lowercase and uses the function created above

def clean(text):
    t = text.lower()
    t = strip_characters(t)
    return t

In [None]:
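# Tokenize into individual tokens - words mostly
# Note: set() removes repeated tokens and does not keep the original word order.
# As written, the last test drops every purely numeric token, 4-digit ones included.

def tokenize(text):
    words = nltk.word_tokenize(text)
    return list(set([word for word in words 
                     if len(word) > 1
                     and not word in english_stopwords
                     and not (word.isnumeric() and len(word) != 4)
                     and (not word.isnumeric() or word.isalpha())] )
               )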

In [None]:
# Creating a function that cleans, tokenizes and lemmatizes texts

def preprocess(text):
    t = clean(text)
    tokens = tokenize(t)
    lemmas = [lmtzr.lemmatize(word) for word in tokens]
    return lemmas

In [None]:
def stemming(text):
    # stem every whitespace-separated word and rebuild the sentence
    stem_sentence = []
    for word in text.split():
        stem_sentence.append(porter.stem(word))
    return " ".join(stem_sentence)

Now that the functions are ready, we can start treating our text. However, I will create the tokens in another column because I want to vectorize my X in two different ways.
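
To show what the token-based representation looks like, here is a small sketch that does not depend on NLTK. The stopword list is a tiny invented subset, `simple_tokens` and `bigrams` are hypothetical helpers, and unlike `tokenize` above this version keeps the original word order, which is my assumption about what is useful when pairing consecutive tokens.

```python
import re

# Tiny invented stopword subset, only for this illustration
STOPWORDS = {"the", "of", "in", "and", "a", "is"}

def simple_tokens(text):
    """Lowercase, strip punctuation, and drop stopwords, numbers and 1-letter tokens."""
    words = re.findall(r"[a-z0-9\-]+", text.lower())
    return [w for w in words
            if len(w) > 1 and w not in STOPWORDS and not w.isnumeric()]

def bigrams(tokens):
    """Pairs of consecutive tokens, concatenated without a separator."""
    return ["".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

text = "Transmission of the virus in 2020: a review."
tokens = simple_tokens(text)
print(tokens)
print(bigrams(tokens))
```

The plain-text route keeps the abstract as one string per paper, while the token route produces a list per paper that can be turned into pairs like the ones printed here.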

In [None]:
# Preprocessing all the strings inside the abstract column: lowercase them, remove special characters and stopwords, and tokenize them.
df_covid['abstract_processed'] = df_covid['abstract'].apply(lambda x: preprocess(x))

In [None]:
# Stemming all the strings inside the abstract column.
df_covid['abstract'] = df_covid['abstract'].apply(lambda x: stemming(x))

In [None]:
abstract = df_covid['abstract_processed'].tolist()
len(abstract)

In [None]:
# Building bigrams (pairs of consecutive tokens, concatenated) for each abstract
n_gram_all = []

for tokens in abstract:
    n_gram = []
    for i in range(len(tokens) - 1):
        n_gram.append("".join(tokens[i:i+2]))
    n_gram_all.append(n_gram)

# It is time for some modeling
![](https://media0.giphy.com/media/Mvm1XBC8O48EM/giphy.gif?cid=790b76119bdd3e0bcedc198f7b26c76abecbb392aba651cf&rid=giphy.gif)

In [None]:
# hash vectorizer instance

hvec = HashingVectorizer(lowercase=False, analyzer=lambda l:l, n_features=2**12)

In [None]:
# Fit and Transforming hash vectorizer

X = hvec.fit_transform(n_gram_all)
X.shape

Before we go any further, let's determine the ideal number of clusters for our analysis. We will do that with the elbow method in kmeans. This takes a long time, so I put the tqdm library on the for loop, so we are sure the kernel is working. Seriously, it takes a ridiculous amount of time, even with a good computer, and since we are dealing with a great number of entries, we have to test a lot of possibilities of clusters. Run it, go watch a series, call your family, catch up with some friends.
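
For reference, the distortion tracked in the loop below is the mean distance from each point to its nearest cluster center. Here is a minimal sketch on invented 2-D points with hand-picked centers; no model is fitted, and `distortion` is a hypothetical helper name.

```python
import math

def distortion(points, centers):
    """Mean Euclidean distance from each point to its closest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)

# Invented toy data: two tight groups of points
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_center = [(5.0, 1.0)]
two_centers = [(0.0, 1.0), (10.0, 1.0)]

print(distortion(points, one_center))
print(distortion(points, two_centers))
```

Adding a second center makes the distortion drop sharply on this toy data. The elbow is the point after which extra clusters stop producing drops of that size.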

In [None]:
# THIS WILL TAKE A LONG, LONG TIME. Go watch some series. Go call your family. Catch up with your friends.

# Building the clustering model and calculating the values of the Distortion and Inertia

distortions = [] 
inertias = [] 
mapping1 = {} 
mapping2 = {} 
K = range(1,26) 
X_dense = X.toarray()  # convert the sparse matrix once instead of on every use

for k in tqdm.tqdm(K): 
    # Building and fitting the model (a single fit is enough)
    kmeanModel = KMeans(n_clusters=k).fit(X_dense) 

    # distortion: mean Euclidean distance from each paper to its nearest centroid
    distortion = np.min(cdist(X_dense, kmeanModel.cluster_centers_, 'euclidean'), axis=1).mean()
    distortions.append(distortion)
    inertias.append(kmeanModel.inertia_)

    mapping1[k] = distortion
    mapping2[k] = kmeanModel.inertia_ 

In [None]:
# List of number of clusters and the decrease of value, this helps to see exactly where the elbow is flexing

for key,val in mapping1.items(): 
    print(str(key)+' : '+str(val.round(4)))

In [None]:
# Plotting the elbow graph

plt.plot(K, distortions, 'bx-') 
plt.xlabel('Values of K') 
plt.ylabel('Distortion') 
plt.title('The Elbow Method using Distortion') 
plt.show()

From the graph we can see that there are a few inflection points we could use, but I will go with 21 because of the large amount of data.
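
Reading the elbow by eye is subjective. One common heuristic, which is not what was used to pick 21 here, is to look for the k with the largest second difference of the curve. The distortion values below are invented, and `elbow_by_second_difference` is a hypothetical helper.

```python
def elbow_by_second_difference(ks, values):
    """Return the k where the curve bends most (largest second difference)."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(values) - 1):
        bend = values[i - 1] - 2 * values[i] + values[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k

# Invented distortion values for k = 1..6
ks = [1, 2, 3, 4, 5, 6]
values = [10.0, 6.0, 3.0, 2.5, 2.2, 2.0]

print(elbow_by_second_difference(ks, values))
```

A heuristic like this is sensitive to noise in the curve, so it is best treated as a second opinion next to the plot rather than a replacement for it.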

In [None]:
# Dimensionality Reduction with t-SNE

tsne = TSNE(verbose = 1, perplexity = 10, metric = 'cosine', early_exaggeration = 20, learning_rate = 300, random_state = 42)
X_embedded = tsne.fit_transform(X)

In [None]:
# sns settings
sns.set(rc={'figure.figsize':(15,15)})

# plot - no clusters have been assigned yet, so a single color is used
sns.scatterplot(x=X_embedded[:,0], y=X_embedded[:,1])

plt.title("t-SNE Covid-19 Articles")
# plt.savefig("plots/t-sne_covid19.png")
plt.show()

In [None]:
# clustering with the number of clusters chosen above

k = 21
kmeans = MiniBatchKMeans(n_clusters=k)
y_pred = kmeans.fit_predict(X)
y = y_pred

In [None]:

output_notebook()
y_labels = y_pred

# data sources
source = ColumnDataSource(data=dict(
    x= X_embedded[:,0], 
    y= X_embedded[:,1],
    x_backup = X_embedded[:,0],
    y_backup = X_embedded[:,1],
    desc= y_labels, 
    titles= df_covid['title'],
    authors = df_covid['authors'],
    journal = df_covid['journal'],
    abstract = df_covid['abstract_summary'],
    labels = ["C-" + str(x) for x in y_labels]
    ))

# hover over information
hover = HoverTool(tooltips=[
    ("Title", "@titles{safe}"),
    ("Author(s)", "@authors"),
    ("Journal", "@journal"),
    ("Abstract", "@abstract{safe}"),
],
                 point_policy="follow_mouse")

# map colors
mapper = linear_cmap(field_name='desc', 
                     palette=Category20[20],
                     low=min(y_labels) ,high=max(y_labels))

# prepare the figure
p = figure(plot_width=800, plot_height=800, 
           tools=[hover, 'pan', 'wheel_zoom', 'box_zoom', 'reset'], 
           title="t-SNE Covid-19 Articles, Clustered(K-Means), Abstracts Hash Vectorized", 
           toolbar_location="right")

# plot
p.scatter('x', 'y', size=5, 
          source=source,
          fill_color=mapper,
          line_alpha=0.3,
          line_color="black",
          legend = 'labels')

# add callback to control 
callback = CustomJS(args=dict(p=p, source=source), code="""
            
            var radio_value = cb_obj.active;
            var data = source.data; 
            
            x = data['x'];
            y = data['y'];
            
            x_backup = data['x_backup'];
            y_backup = data['y_backup'];
            
            labels = data['desc'];
            
            // show every point when the 'All' button is selected
            if (cb_obj.labels[radio_value] == 'All') {
                for (i = 0; i < x.length; i++) {
                    x[i] = x_backup[i];
                    y[i] = y_backup[i];
                }
            }
            else {
                for (i = 0; i < x.length; i++) {
                    if(labels[i] == radio_value) {
                        x[i] = x_backup[i];
                        y[i] = y_backup[i];
                    } else {
                        x[i] = undefined;
                        y[i] = undefined;
                    }
                }
            }


        source.change.emit();
        """)

# callback for searchbar
keyword_callback = CustomJS(args=dict(p=p, source=source), code="""
            
            var text_value = cb_obj.value;
            var data = source.data; 
            
            x = data['x'];
            y = data['y'];
            
            x_backup = data['x_backup'];
            y_backup = data['y_backup'];
            
            abstract = data['abstract'];
            titles = data['titles'];
            authors = data['authors'];
            journal = data['journal'];

            for (i = 0; i < x.length; i++) {
                if(abstract[i].includes(text_value) || 
                   titles[i].includes(text_value) || 
                   authors[i].includes(text_value) || 
                   journal[i].includes(text_value)) {
                    x[i] = x_backup[i];
                    y[i] = y_backup[i];
                } else {
                    x[i] = undefined;
                    y[i] = undefined;
                }
            }
            


        source.change.emit();
        """)

# option
option = RadioButtonGroup(labels=["C-" + str(i) for i in range(k)] + ["All"],  # one button per cluster, then "All"
                          active=k, callback=callback)  # "All" (index k) starts selected

# search box
keyword = TextInput(title="Search:", callback=keyword_callback)

#header
header = Div(text="""<h1>COVID-19 Research Papers Interactive Cluster Map</h1>""")

# show
show(column(header, widgetbox(option, keyword),p))


In [None]:
# Vectorizing with plain text and TF-IDF

vectorizer = TfidfVectorizer(max_features=2**12)
X1 = vectorizer.fit_transform(df_covid['abstract'].values)

In [None]:
# Dimension reduction

tsne = TSNE(verbose=1, perplexity = 10, metric = 'cosine', early_exaggeration = 20, learning_rate = 300, random_state = 42)
X_embedded1 = tsne.fit_transform(X1.toarray())

In [None]:
# THIS WILL TAKE A LONG, LONG TIME. Go watch some series. Go call your family. Catch up with your friends.

# Building the clustering model and calculating the values of the Distortion and Inertia

distortions = [] 
inertias = [] 
mapping1 = {} 
mapping2 = {} 
K = range(1,26) 
X1_dense = X1.toarray()  # convert the sparse matrix once instead of on every use

for k in tqdm.tqdm(K): 
    # Building and fitting the model (a single fit is enough)
    kmeanModel = KMeans(n_clusters=k).fit(X1_dense) 

    # distortion: mean Euclidean distance from each paper to its nearest centroid
    distortion = np.min(cdist(X1_dense, kmeanModel.cluster_centers_, 'euclidean'), axis=1).mean()
    distortions.append(distortion)
    inertias.append(kmeanModel.inertia_)

    mapping1[k] = distortion
    mapping2[k] = kmeanModel.inertia_ 

In [None]:
# List of number of clusters and the decrease of value, this helps to see exactly where the elbow is flexing

for key,val in mapping1.items(): 
    print(str(key)+' : '+str(val.round(4)))

In [None]:
# Plotting the elbow graph

plt.plot(K, distortions, 'bx-') 
plt.xlabel('Values of K') 
plt.ylabel('Distortion') 
plt.title('The Elbow Method using Distortion') 
plt.show()

In [None]:
# clustering the TF-IDF vectors with the chosen number of clusters

k = 21
kmeans = MiniBatchKMeans(n_clusters=k)
y_pred1 = kmeans.fit_predict(X1)
y1 = y_pred1

In [None]:
# sns settings
sns.set(rc={'figure.figsize':(15,15)})

# colors
palette = sns.color_palette("bright", len(set(y1)))

# plot
sns.scatterplot(x=X_embedded1[:,0], y=X_embedded1[:,1], hue=y1, legend='full', palette=palette)
plt.title("t-SNE Covid-19 Articles - Clustered(K-Means) - Tf-idf with Plain Text")
# plt.savefig("plots/t-sne_covid19_label_TFID.png")
plt.show()

In [None]:

output_notebook()
y_labels = y_pred1

# data sources
source = ColumnDataSource(data=dict(
    x= X_embedded1[:,0], 
    y= X_embedded1[:,1],
    x_backup = X_embedded1[:,0],
    y_backup = X_embedded1[:,1],
    desc= y_labels, 
    titles= df_covid['title'],
    authors = df_covid['authors'],
    journal = df_covid['journal'],
    abstract = df_covid['abstract_summary'],
    labels = ["C-" + str(x) for x in y_labels]
    ))

# hover over information
hover = HoverTool(tooltips=[
    ("Title", "@titles{safe}"),
    ("Author(s)", "@authors"),
    ("Journal", "@journal"),
    ("Abstract", "@abstract{safe}"),
],
                 point_policy="follow_mouse")

# map colors
mapper = linear_cmap(field_name='desc', 
                     palette=Category20[20],
                     low=min(y_labels) ,high=max(y_labels))

# prepare the figure
p = figure(plot_width=800, plot_height=800, 
           tools=[hover, 'pan', 'wheel_zoom', 'box_zoom', 'reset'], 
           title="t-SNE Covid-19 Articles, Clustered(K-Means), Tf-idf with Plain Text", 
           toolbar_location="right")

# plot
p.scatter('x', 'y', size=5, 
          source=source,
          fill_color=mapper,
          line_alpha=0.3,
          line_color="black",
          legend = 'labels')

# add callback to control 
callback = CustomJS(args=dict(p=p, source=source), code="""
            
            var radio_value = cb_obj.active;
            var data = source.data; 
            
            x = data['x'];
            y = data['y'];
            
            x_backup = data['x_backup'];
            y_backup = data['y_backup'];
            
            labels = data['desc'];
            
            // show every point when the 'All' button is selected
            if (cb_obj.labels[radio_value] == 'All') {
                for (i = 0; i < x.length; i++) {
                    x[i] = x_backup[i];
                    y[i] = y_backup[i];
                }
            }
            else {
                for (i = 0; i < x.length; i++) {
                    if(labels[i] == radio_value) {
                        x[i] = x_backup[i];
                        y[i] = y_backup[i];
                    } else {
                        x[i] = undefined;
                        y[i] = undefined;
                    }
                }
            }


        source.change.emit();
        """)

# callback for searchbar
keyword_callback = CustomJS(args=dict(p=p, source=source), code="""
            
            var text_value = cb_obj.value;
            var data = source.data; 
            
            x = data['x'];
            y = data['y'];
            
            x_backup = data['x_backup'];
            y_backup = data['y_backup'];
            
            abstract = data['abstract'];
            titles = data['titles'];
            authors = data['authors'];
            journal = data['journal'];

            for (i = 0; i < x.length; i++) {
                if(abstract[i].includes(text_value) || 
                   titles[i].includes(text_value) || 
                   authors[i].includes(text_value) || 
                   journal[i].includes(text_value)) {
                    x[i] = x_backup[i];
                    y[i] = y_backup[i];
                } else {
                    x[i] = undefined;
                    y[i] = undefined;
                }
            }
            


        source.change.emit();
        """)

# option
option = RadioButtonGroup(labels=["C-" + str(i) for i in range(k)] + ["All"],  # one button per cluster, then "All"
                          active=k, callback=callback)  # "All" (index k) starts selected

# search box
keyword = TextInput(title="Search:", callback=keyword_callback)

#header
header = Div(text="""<h1>COVID-19 Research Papers Interactive Cluster Map</h1>""")

# show
show(column(header, widgetbox(option, keyword),p))