This notebook is based on the COVID-19 Kaggle challenge: https://www.kaggle.com/covid19
It provides simple chatbot-style question answering for COVID-19 queries.

The notebook does no fine-tuning; it applies a pre-trained NLP model, e.g. one from the sentence-transformers project. It embeds the paragraphs of the challenge's COVID-19 papers (taken from either the abstracts or the full texts) into a corpus of embeddings. The papers were pre-processed for the challenge, i.e. converted from their source format (e.g. PDF) into the JSON files that this notebook reads. The papers and their licenses can be found via https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge.

After the corpus has been embedded (or stored embeddings have been loaded), the user can ask a question via the ask_question function. It returns the five corpus entries whose embeddings are closest to the embedding of the question.

Future enhancement: we could return the paper id in addition

Requires several standard packages plus tqdm, sentence-transformers, scipy, torch and transformers (the latter two if allenai's SciBERT transformer model is used), and IPython.


In [1]:
import os
import json
import warnings
import pickle
import gzip
warnings.simplefilter('ignore')

#which pretrained NLP model to use...
NLP_MODEL = 'bert-base-nli-mean-tokens'
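#NLP_MODEL = 'allenai/scibert_scivocab_uncased'
#file-name-safe model name (drops a prefix such as 'allenai/'), used for the SciBERT embedding file names
NLP_MODEL_friendly = NLP_MODEL.split('/')[-1]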

#point to the preprocessed scientific paper directory - proof of concept, hence only a hard coded path
JSON_PATH="/media/sf_Python/CORD19/data/kaggle/"

#train corpus on fulltexts or on abstracts
MODE='abstract'
#MODE='fulltext'

Load the paper corpus: either load it from an existing pkl file, or build it by parsing the JSON files of the Kaggle challenge.
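The parsing step can be illustrated on a single paper. The following is a minimal sketch, assuming each JSON file holds `abstract` and `body_text` lists whose items carry a `text` field, as the loading cell below expects. The helper name `extract_paragraphs` and the sample paper are hypothetical, and unlike the cell below the sketch skips empty paragraphs and tolerates a missing section.

```python
import json

def extract_paragraphs(paper, mode="abstract"):
    """Return the non-empty paragraph texts of one parsed paper (hypothetical helper)."""
    key = "abstract" if mode == "abstract" else "body_text"
    # .get() tolerates papers that lack the section entirely
    return [item["text"] for item in paper.get(key, []) if item.get("text", "").strip()]

# Hypothetical miniature paper, shaped like the fields this notebook reads
sample = json.loads("""
{
  "abstract": [{"text": "First abstract paragraph."}, {"text": "  "}],
  "body_text": [{"text": "Body paragraph one."}, {"text": "Body paragraph two."}]
}
""")

abstract_paragraphs = extract_paragraphs(sample, "abstract")
body_paragraphs = extract_paragraphs(sample, "fulltext")
print(abstract_paragraphs)
print(body_paragraphs)
```

Each returned string becomes one corpus entry, so a paper with several abstract paragraphs contributes several entries.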

In [2]:
if MODE == 'abstract':
    corpusfile='corpus.pkl'
    filenamefile='corpus_files.pkl'
elif MODE == 'fulltext':
    corpusfile='corpus_fulltext.pkl'
    filenamefile='corpus_fulltext_files.pkl'
else:
    raise ValueError('unrecognized MODE set: <{}>'.format(MODE))
    
if not os.path.exists(corpusfile+".gz"):
    json_files = []
    #JSON PATH should be set if the file is not present... requires you to have access to the papers
    for dirname, _, filenames in os.walk(JSON_PATH):
        for filename in filenames:
            if filename.endswith('.json'):
                json_files.append(os.path.join(dirname, filename))        
    print(json_files[0])
    print(json_files[-1])
            
    corpus = []
    filenames = []

    # loop through the files (json_files already holds full paths)
    total=len(json_files)
    for i, jfile in enumerate(json_files, start=1):
        # print progress every 100 files
        if i%100 == 0:
            print("{}/{}".format(i,total))
        # open each file and read it as json
        with open(jfile) as json_file:
            covid_json = json.load(json_file)
        # read the abstract or the body text, depending on MODE
        section = 'abstract' if MODE == 'abstract' else 'body_text'
        for item in covid_json[section]:
            corpus.append(item['text'])
            # remember the source file of every paragraph; jfile is already a full path,
            # so JSON_PATH must not be prepended again
            filenames.append(jfile)
    print(len(corpus))    
    with gzip.open(corpusfile+".gz", "wb") as f:
        pickle.dump(corpus, f)    
    with gzip.open(filenamefile+".gz", "wb") as f:
        pickle.dump(filenames, f) 
        
    print("Corpus created and stored.")
else:
    try:
        with open(corpusfile, "rb") as f:
            corpus = pickle.load(f)
        with open(filenamefile, "rb") as f:
            filenames = pickle.load(f)
    except FileNotFoundError:  # no uncompressed pkl files, fall back to the .gz files
        with gzip.open(corpusfile+".gz", "rb") as f:
            corpus = pickle.load(f)
        with gzip.open(filenamefile+".gz", "rb") as f:
            filenames = pickle.load(f)
    print("Stored corpus loaded.")

Stored corpus loaded.


Now that we have the paper corpus, let's embed it using a pre-trained NLP model, e.g. BERT.
This takes time, so we store the embeddings as a pkl file. If an embedding pkl file is found, we skip the embedding creation and load the file directly.
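The caching pattern described here can be sketched on its own, independent of any embedding model. This is a minimal illustration, not the notebook's code: `load_or_compute` and `fake_embed` are hypothetical names, and the toy vectors stand in for real embeddings.

```python
import gzip
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Load a gzip-compressed pickle from path, or compute the value and store it.

    Returns (value, loaded_from_cache).
    """
    if os.path.exists(path):
        with gzip.open(path, "rb") as f:
            return pickle.load(f), True
    value = compute()
    with gzip.open(path, "wb") as f:
        pickle.dump(value, f)
    return value, False

calls = []

def fake_embed():
    # Stand-in for the expensive embedding step
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "demo_embeddings.pkl.gz")
    first, first_cached = load_or_compute(cache_path, fake_embed)    # computes and stores
    second, second_cached = load_or_compute(cache_path, fake_embed)  # loads from disk

print(first_cached, second_cached, len(calls))
```

Only load pickle files you created yourself, since unpickling data from an untrusted source can execute arbitrary code.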


In [3]:
print("Corpus size = %d"%(len(corpus)))

import scipy
import logging
logging.getLogger().setLevel(logging.INFO)


if NLP_MODEL == 'allenai/scibert_scivocab_uncased':
    if MODE == 'abstract':
        embeddingFileName = 'corpus_embeddings_{}.pkl'.format(NLP_MODEL_friendly)
    elif MODE == 'fulltext':
        embeddingFileName = 'corpus_fullText_embeddings_{}.pkl'.format(NLP_MODEL_friendly)
else:
    if MODE == 'abstract':
        embeddingFileName = 'corpus_embeddings_{}.pkl'.format(NLP_MODEL)
    elif MODE == 'fulltext':
        embeddingFileName = 'corpus_fullText_embeddings_{}.pkl'.format(NLP_MODEL)
        
if not os.path.exists(embeddingFileName+".gz"):
    if NLP_MODEL == 'allenai/scibert_scivocab_uncased':
        # import only what is needed instead of 'from transformers import *'
        from transformers import AutoTokenizer, AutoModel
        import torch
        import numpy as np
        from tqdm.notebook import tqdm, trange
        
        tokenizer = AutoTokenizer.from_pretrained(NLP_MODEL)
        model = AutoModel.from_pretrained(NLP_MODEL)
        corpus_embeddings=[]
        for i in trange(len(corpus)):
            try:
                # take the last model output as the paragraph embedding
                embed=model(torch.tensor([tokenizer.encode(corpus[i])]))[-1]
                corpus_embeddings.append(embed.detach().numpy())
            except RuntimeError as e:
                # if the model fails on a paragraph, store a zero vector in the same (1, 768) layout
                corpus_embeddings.append(np.zeros((1,768)))
                tqdm.write("error: {}".format(e))
    else:
        from sentence_transformers import SentenceTransformer
        embedder = SentenceTransformer(NLP_MODEL)
        corpus_embeddings = embedder.encode(corpus, show_progress_bar=True)
    with gzip.open(embeddingFileName+".gz", "wb") as f:
        pickle.dump(corpus_embeddings, f)
    print("Corpus embeddings created and stored.")
else:    
    with gzip.open(embeddingFileName+".gz", "rb") as f:
        corpus_embeddings = pickle.load(f)
    print("Corpus embeddings loaded.")

Corpus size = 37344
Corpus embeddings loaded.


Now let's define the Q&A function: embed the query received, compare it to all embedded corpus entries using e.g. the cosine distance, and return the five corpus entries closest to the query.
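The ranking step can be shown on toy vectors with plain NumPy. This is a minimal sketch of cosine-similarity ranking, assuming non-zero vectors; `top_n_cosine` and the two-dimensional toy data are hypothetical and only illustrate the idea (the function below uses scipy's `cdist` with the cosine distance, which is one minus this similarity).

```python
import numpy as np

def top_n_cosine(query_vec, corpus_matrix, n=5):
    """Return (index, cosine similarity) for the n corpus rows closest to the query."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(corpus_matrix, dtype=float)
    # Normalise to unit length so that the dot product equals the cosine similarity
    q_unit = q / np.linalg.norm(q)
    m_unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = m_unit @ q_unit
    # Highest similarity first
    order = np.argsort(-sims)[:n]
    return [(int(i), float(sims[i])) for i in order]

# Toy corpus of four 2-d "embeddings" and one query
toy_corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
toy_query = np.array([1.0, 0.1])

ranking = top_n_cosine(toy_query, toy_corpus, n=2)
print(ranking)
```

The row pointing in almost the same direction as the query ranks first, and the row pointing in the opposite direction would rank last.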

Future enhancement: we could return the paper id in addition
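One possible way to approach this, sketched under the assumption that each JSON file is named after its paper id (the file names in the outputs below look like hashes followed by `.json`): derive the id from the stored file name. `paper_id_from_path` is a hypothetical helper, not part of the notebook.

```python
import os

def paper_id_from_path(path):
    """Return the file name without its directory and '.json' extension."""
    return os.path.splitext(os.path.basename(path))[0]

# Relative path taken from one of the file names shown in the outputs below
example_path = "custom_license/40b1e938710764eab74f1bdadba1b75f2f7e480c.json"
example_id = paper_id_from_path(example_path)
print(example_id)
```

If the assumption does not hold for a given file, the id would have to be read from the JSON content instead.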

In [6]:
from IPython.display import display, Markdown
import numpy as np
import scipy.spatial.distance

# takes a text query and displays the top N matching corpus entries
def ask_question(query,  closest_n = 5):    
    queries = [query]
    if NLP_MODEL == 'allenai/scibert_scivocab_uncased':   
        # reuse tokenizer and model from the embedding cell if they exist, otherwise load them once
        global tokenizer, model
        import torch
        from transformers import AutoTokenizer, AutoModel
        if 'tokenizer' not in globals() or 'model' not in globals():
            tokenizer = AutoTokenizer.from_pretrained(NLP_MODEL)
            model = AutoModel.from_pretrained(NLP_MODEL)
        # encode the query string itself, not the list wrapping it
        query_embeddings=model(torch.tensor([tokenizer.encode(query)]))[-1]
        query_embeddings=query_embeddings.detach().numpy()
    else:
        logging.getLogger().setLevel(logging.WARNING)
        from sentence_transformers import SentenceTransformer
        embedder = SentenceTransformer(NLP_MODEL)
        query_embeddings = embedder.encode(queries)

    # stack the corpus embeddings into one (n_entries, dim) matrix
    corpus_matrix = np.vstack(corpus_embeddings)

    # Find the closest N entries of the corpus for each query based on cosine distance
    for query, query_embedding in zip(queries, query_embeddings):        
        distances = scipy.spatial.distance.cdist(query_embedding.reshape(1, -1), corpus_matrix, "cosine")[0]

        results = sorted(enumerate(distances), key=lambda x: x[1])
        display(Markdown('## Question -> %s'%query))
        display(Markdown('**Top %d answers compiled below by running AI algorithm on research text.**<hr>' % closest_n)) 
        
        # display the closest answers; the score is the cosine similarity (1 - distance)
        for idx, distance in results[0:closest_n]:
            display(Markdown('- ### ' + corpus[idx].strip() + " (Score: %.4f)" % (1-distance) + " (filename:{})".format(filenames[idx])))
        display(Markdown('<hr>'))

Now we can pose questions via ask_question and get the five closest corpus entries in response (quoted verbatim from the configured corpus).


In [7]:
ask_question('Does smoking or pre-existing pulmonary disease increase risk of COVID-19?')
#ask_question('Do drug increase risk of COVID-19?')


## Question -> Does smoking or pre-existing pulmonary disease increase risk of COVID-19?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### What is already known about this topic? Respiratory syncytial virus (RSV)-and rhinovirus (RV)-induced bronchiolitis are associated with an increased risk of asthma. (Score: 0.7901) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/40b1e938710764eab74f1bdadba1b75f2f7e480c.json)

- ### The emergence of viral respiratory pathogens with pandemic potential, such as severe acute respiratory syndrome coronavirus (SARS-CoV) and influenza A H5N1, urges the need for deciphering their pathogenesis to develop new intervention strategies. SARS-CoV infection causes acute lung injury (ALI) that may develop into life-threatening acute respiratory distress syndrome (ARDS) with advanced age correlating positively with adverse disease outcome. The molecular pathways, however, that cause virus-induced ALI/ARDS in aged individuals are ill-defined. Here, we show that SARS-CoVinfected aged macaques develop more severe pathology than young adult animals, even though viral replication levels are similar. Comprehensive genomic analyses indicate that aged macaques have a stronger host response to virus infection than young adult macaques, with an increase in differential expression of genes associated with inflammation, with NF-kB as central player, whereas expression of type I interferon (IFN)-b is reduced. Therapeutic treatment of SARS-CoV-infected aged macaques with type I IFN reduces pathology and diminishes pro-inflammatory gene expression, including interleukin-8 (IL-8) levels, without affecting virus replication in the lungs. Thus, ALI in SARS-CoV-infected aged macaques developed as a result of an exacerbated innate host response. The anti-inflammatory action of type I IFN reveals a potential intervention strategy for virus-induced ALI. (Score: 0.7466) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/284790100b67133f3228466016a8f98ad096e24d.json)

- ### To investigate the long-term effects of mild H1N1 influenza infection on the pulmonary function of a cohort of patients. (Score: 0.7463) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/e642816c09dd07b7bdf515088670a72ee8698bd8.json)

- ### Some newly emerging viral lung infections have the potential to cause large outbreaks of severe respiratory disease amongst humans. In this contribution we discuss infections by influenza A (H5n1), SARS and Hanta virus. (Score: 0.7423) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/305482d398f7da13bff462c3e8162bada6c5e195.json)

- ### Exposure to oxidant air pollution is associated with increased respiratory morbidities and susceptibility to infections. Ozone is a commonly encountered oxidant air pollutant, yet its effects on influenza infections in humans are not known. The greater Mexico City area was the primary site for the spring 2009 influenza A H1N1 pandemic, which also coincided with high levels of environmental ozone. Proteolytic cleavage of the viral membrane protein hemagglutinin (HA) is essential for influenza virus infectivity. Recent studies suggest that HA cleavage might be cell-associated and facilitated by the type II transmembrane serine proteases (TTSPs) human airway trypsin-like protease (HAT) and transmembrane protease, serine 2 (TMPRSS2), whose activities are regulated by antiproteases, such as secretory leukocyte protease inhibitor (SLPI). Based on these observations, we sought to determine how acute exposure to ozone may modulate cellular protease/antiprotease expression and function, and to define their roles in a viral infection. We utilized our in vitro model of differentiated human nasal epithelial cells (NECs) to determine the effects of ozone on influenza cleavage, entry, and replication. We show that ozone exposure disrupts the protease/antiprotease balance within the airway liquid. We also determined that functional forms of HAT, TMPRSS2, and SLPI are secreted from human airway epithelium, and acute exposure to ozone inversely alters their expression levels. We also show that addition of antioxidants significantly reduces virus replication through the induction of SLPI. In addition, we determined that ozone-induced cleavage of the viral HA protein is not cell-associated and that secreted endogenous proteases are sufficient to activate HA leading to a significant increase in viral replication. Our data indicate that pre-exposure to ozone disrupts the protease/antiprotease balance found in the human airway, leading to increased influenza susceptibility. 
(Score: 0.7396) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/24ed22a878649ce74463bf63090563093d002c86.json)

<hr>

In [8]:
ask_question('Are neonates and pregnant women ar greater risk of COVID-19?')

## Question -> Are neonates and pregnant women ar greater risk of COVID-19?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### BACKGROUND Person to person spread of COIVD-19 in the UK has now been confirmed. There are limited case series reporting the impact on women affected by coronaviruses (CoV) during pregnancy. In women affected by SARS and MERS, the case fatality rate appeared higher in women affected in pregnancy compared with non-pregnant women. We conducted a rapid, review to guide management of women affected by COVID -19 during pregnancy and developed interim practice guidance with the RCOG and RCPCH to inform maternity and neonatal service planning METHODS Searches were conducted in PubMed and MedRxiv to identify primary case reports, case series, observational studies or randomised-controlled trial describing women affected by coronavirus in pregnancy and on neonates. Data was extracted from relevant papers and the review was drafted with representatives of the RCPCH and RCOG who also provided expert consensus on areas where data were lacking RESULTS From 9964 results on PubMed and 600 on MedRxiv, 18 relevant studies (case reports and case series) were identified. There was inconsistent reporting of maternal, perinatal and neonatal outcomes across case reports and series concerning COVID-19, SARS, MERS and other coronaviruses. From reports of 19 women to date affected by COVID-19 in pregnancy, delivering 20 babies, 3 (16%) were asymptomatic, 1 (5%) was admitted to ICU and no maternal deaths have been reported. Deliveries were 17 by caesarean section, 2 by vaginal delivery, 8 (42%) delivered pre-term. There was one neonatal death, in 15 babies who were tested there was no evidence of vertical transmission. (Score: 0.8038) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/biorxiv_medrxiv/26e75d3c815aae7fd9b094c3e5c74d3f7132ca13.json)

- ### Planning for a future infl uenza pandemic should include considerations specifi c to pregnant women. First, pregnant women are at increased risk for infl uenza-associated illness and death. The effects on the fetus of maternal infl uenza infection, associated fever, and agents used for prophylaxis and treatment should be taken into account. Pregnant women might be reluctant to comply with public health recommendations during a pandemic because of concerns regarding effects of vaccines or medications on the fetus. Guidelines regarding nonpharmaceutical interventions (e.g., voluntary quarantine) also might present special challenges because of confl icting recommendations about routine prenatal care and delivery. Finally, healthcare facilities need to develop plans to minimize exposure of pregnant women to ill persons, while ensuring that women receive necessary care. (Score: 0.7992) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/372fcfd74ff4f579704d40ec8fe524357534ef22.json)

- ### There are limited case series reporting the impact on women affected by coronaviruses (CoV) during pregnancy. In women affected by SARS and MERS, the case fatality rate appeared higher in women affected in pregnancy compared with non-pregnant women. (Score: 0.7872) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/biorxiv_medrxiv/26e75d3c815aae7fd9b094c3e5c74d3f7132ca13.json)

- ### Exposure to medications in pregnancy can be toxic to a fetus in a gestational age-dependent manner [1] . Medications that are teratogenic at certain stages in the first trimester may be safe later in pregnancy, and medications later in pregnancy may have metabolic effects that interfere with neonatal function. Determination of safe medications for use in pregnancy must take into consideration the relative need for the use of certain medications and the possibility of inadvertent exposure in early pregnancy because of unplanned pregnancies. (Score: 0.7823) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/7d7ae06b934fc522ac85ca03044df16bdeecc287.json)

- ### Background: Infections during pregnancy have the potential to adversely impact birth outcomes. We evaluated the association between receipt of inactivated influenza vaccine during pregnancy and prematurity and small for gestational age (SGA) births. (Score: 0.7803) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/db493d400b682be0385bd1ff034fa718d0c398cb.json)

<hr>

In [9]:
ask_question('Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups')

## Question -> Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ###  assessed the prevalence of comorbidities in infected patients.  comorbidities are risk factors for severe patients compare with Non-severe. (Score: 0.8704) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/4090d8537051fae501355bc567dc2dec620762eb.json)

- ### Interpretation Disease caused by MERS-CoV presents with a wide range of clinical manifestations and is associated with substantial mortality in admitted patients who have medical comorbidities. Major gaps in our knowledge of the epidemiology, community prevalence, and clinical spectrum of infection and disease need urgent defi nition. (Score: 0.8581) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/6824b8522fc9b537dd57e4f11f5b0d4313acfe14.json)

- ### In the critical patient, several factors are combined making them especially vulnerable to infections. (Score: 0.8576) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/d42a4f7842aa6f0d67cad94f1defa6258e9513ad.json)

- ### We assessed the prevalence of comorbidities in the COVID-19 infection patients and found underlying disease, including hypertension, respiratory system disease and cardiovascular, may be a risk factor for severe patients compared with Non-severe patients. (Score: 0.8442) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/4090d8537051fae501355bc567dc2dec620762eb.json)

- ### Presences of signs of severity impose hospitalization: signs of respiratory distress, shock, acute confusion but also fragile patients, insufficient home support or absence of response to initial treatment. (Score: 0.8227) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/d27842f2883b8884d975fe744e1157bb727d6f9b.json)

<hr>

In [10]:
ask_question('Are there socio-economic and behavioral factors that help understand economic impact of the virus COVID-19 and whether there were differences?')

## Question -> Are there socio-economic and behavioral factors that help understand economic impact of the virus COVID-19 and whether there were differences?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### The purpose of the current study is to estimate the economic consequences associated with unintentional, hypothetical releases of foot-and-mouth disease virus (FMDv) from NBAF. Specifically, we assess the economic consequences to agricultural firms and consumers, quantify costs and disruptions to non-agricultural activities in the epidemiologically impacted region, and assess costs of response to the government. Different from previous economic studies of FMD outbreaks in the United States [3-6] and across the world [7], the current study is unique as it examines unintentional aerosol, liquid waste, transference, and tornado releases from an animal research facility. This set of releases provides a nearly complete coverage of the feasible risk space, providing a broad landscape over which release events could translate into a range of economic consequences. Given the inherent uncertainty in an FMD outbreak, economic consequences are assessed over a distribution of epidemiological outcomes. Finally, the extensive and intensive nature of this site specific risk assessment (e.g., plume, epidemiological, socioeconomic data, information, and modeling) over a large study region is unprecedented in previous FMD studies focused on the U.S. (Score: 0.7838) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/77b8307c0de7ad71378665456b43e4185d5e5477.json)

- ### Economic analysis is congressionally mandated as part of the site specific risk assessment, necessary to link outcomes from the plume and epidemiological models to risk outcomes. A focal point of the risk assessment centers on potential accidental releases of viruses from NBAF and the subsequent consequences of such releases. Releases of viruses from research facilities are not necessarily common, but have uncertain economic consequences and have occurred in the past [2] . (Score: 0.7738) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/77b8307c0de7ad71378665456b43e4185d5e5477.json)

- ### The vulnerability approach suggests that disasters such as epidemics have different effects according not only to physical vulnerability but also to economic class (status). This paper examines the effect of the Middle East Respiratory Syndrome epidemic on the labor market to investigate whether vulnerable groups become more vulnerable due to an interaction between the socio-economic structure and physical risk. Methods: This paper examines the effect of the Middle East Respiratory Syndrome epidemic on the labor market by considering unemployment status, job status, working hours, reason for unemployment and underemployment status. In particular, the study investigates whether the U-shaped curve becomes a J-shaped curve due to the interaction between medical vulnerability and labor market vulnerability after an outbreak, assuming that the relative vulnerability in the labor market by age shows a U curve with peaks for the young group and middle aged and old aged groups using the Economically Active Population Survey. We use the difference in difference approach and also conduct a falsification check and robustness check. (Score: 0.7612) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/41f31582cb50c99b10a376f465107d82bacf135c.json)

- ### This paper looks at the role that risk, and especially the perception of risk, its communication and management, played in driving the economic impact of SARS. It considers the public and public health response to SARS, the role of the media and official organisations, and proposes policy and research priorities for establishing a system to better deal with the next global infectious disease outbreak. It is concluded that the potential for the rapid spread of infectious disease is not necessarily a greater threat than it has always been, but the effect that an outbreak can have on the economy is, which requires further research and policy development. r (Score: 0.7401) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/6209bb15edbdcc8d2c9bdd5a29e97231a9744e02.json)

- ### There is concern regarding the impact that a global infectious disease pandemic might have, especially the economic impact in the current financial climate. However, preparedness planning concentrates more upon population health and maintaining a functioning health sector than on the wider economic impact. We developed a single country Computable General Equilibrium model to estimate the economic impact of pandemic influenza (PI) and associated policies. While the context for this development was the United Kingdom, there are lessons to be drawn for application of this methodology, as well as indicative results, to other contexts. (Score: 0.7246) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/022a159b5292e57c599853e11c0e1a8b5f8aee06.json)

<hr>

In [11]:
ask_question('What is the severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups?')

## Question -> What is the severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ###  assessed the prevalence of comorbidities in infected patients.  comorbidities are risk factors for severe patients compare with Non-severe. (Score: 0.8893) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/4090d8537051fae501355bc567dc2dec620762eb.json)

- ### We assessed the prevalence of comorbidities in the COVID-19 infection patients and found underlying disease, including hypertension, respiratory system disease and cardiovascular, may be a risk factor for severe patients compared with Non-severe patients. (Score: 0.8717) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/4090d8537051fae501355bc567dc2dec620762eb.json)

- ### Interpretation Disease caused by MERS-CoV presents with a wide range of clinical manifestations and is associated with substantial mortality in admitted patients who have medical comorbidities. Major gaps in our knowledge of the epidemiology, community prevalence, and clinical spectrum of infection and disease need urgent defi nition. (Score: 0.8571) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/6824b8522fc9b537dd57e4f11f5b0d4313acfe14.json)

- ### Middle East Respiratory Syndrome Coronavirus (MERS-CoV) leads to healthcare-associated transmission to patients and healthcare workers with potentially fatal outcomes. (Score: 0.8309) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/9dcd697b97cd39fbeaf2c5f8be7d0bc139b84629.json)

- ### In the critical patient, several factors are combined making them especially vulnerable to infections. (Score: 0.8300) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/d42a4f7842aa6f0d67cad94f1defa6258e9513ad.json)

<hr>

In [12]:
ask_question('Does rise in pollution increase risk of COVID-19?')

## Question -> Does rise in pollution increase risk of COVID-19?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### Exposure to oxidant air pollution is associated with increased respiratory morbidities and susceptibility to infections. Ozone is a commonly encountered oxidant air pollutant, yet its effects on influenza infections in humans are not known. The greater Mexico City area was the primary site for the spring 2009 influenza A H1N1 pandemic, which also coincided with high levels of environmental ozone. Proteolytic cleavage of the viral membrane protein hemagglutinin (HA) is essential for influenza virus infectivity. Recent studies suggest that HA cleavage might be cell-associated and facilitated by the type II transmembrane serine proteases (TTSPs) human airway trypsin-like protease (HAT) and transmembrane protease, serine 2 (TMPRSS2), whose activities are regulated by antiproteases, such as secretory leukocyte protease inhibitor (SLPI). Based on these observations, we sought to determine how acute exposure to ozone may modulate cellular protease/antiprotease expression and function, and to define their roles in a viral infection. We utilized our in vitro model of differentiated human nasal epithelial cells (NECs) to determine the effects of ozone on influenza cleavage, entry, and replication. We show that ozone exposure disrupts the protease/antiprotease balance within the airway liquid. We also determined that functional forms of HAT, TMPRSS2, and SLPI are secreted from human airway epithelium, and acute exposure to ozone inversely alters their expression levels. We also show that addition of antioxidants significantly reduces virus replication through the induction of SLPI. In addition, we determined that ozone-induced cleavage of the viral HA protein is not cell-associated and that secreted endogenous proteases are sufficient to activate HA leading to a significant increase in viral replication. Our data indicate that pre-exposure to ozone disrupts the protease/antiprotease balance found in the human airway, leading to increased influenza susceptibility. 
(Score: 0.7741) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/24ed22a878649ce74463bf63090563093d002c86.json)

- ### The ongoing outbreak of a new coronavirus (2019-nCoV) causes an epidemic of acute respiratory syndrome in humans. 2019-nCoV rapidly spread to national regions and multiple other countries, thus, pose a serious threat to public health. Recent studies show that spike (S) proteins of 2019-nCoV and SARS-CoV may use the same host cell receptor called angiotensin-converting enzyme 2 (ACE2) for entering into host cells. The affinity between ACE2 and 2019-nCoV S is much higher than ACE2 binding to SARS-CoV S protein, explaining that why 2019-nCoV seems to be more readily transmitted from the human to human. Here, we reported that ACE2 can be significantly upregulated after infection of various viruses including SARS-CoV and MERS-CoV. Basing on findings here, we propose that coronavirus infection can positively induce its cellular entry receptor to accelerate their replication and spread, thus drugs targeting ACE2 expression may be prepared for the future emerging infectious diseases caused by this cluster of viruses. (Score: 0.7430) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/biorxiv_medrxiv/c8b6f2752a842dd9eb9c50a82112748ef10ba259.json)

- ### During the surveillance of influenza pandemics, underreported data are a public health challenge that complicates the understanding of pandemic threats and can undermine mitigation efforts. We propose a method to estimate incidence reporting rates at early stages of new influenza pandemics using 2009 pandemic H1N1 as an example. Routine surveillance data and statistics of travellers arriving from Mexico were used. Our method incorporates changes in reporting rates such as linearly increasing trends due to the enhanced surveillance. From our results, the reporting rate was estimated at 0·46% during early stages of the pandemic in Mexico. We estimated cumulative incidence in the Mexican population to be 0·7% compared to 0·003% reported by officials in Mexico at the end of April. This method could be useful in estimation of actual cases during new influenza pandemics for policy makers to better determine appropriate control measures. (Score: 0.7239) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/b3e1d993fd5eb9e538a5acb4f5933d4bb6209878.json)

- ### Since the mid-19th century, human activities have increased greenhouse gases such as carbon dioxide, methane, and nitrous oxide in the Earth's atmosphere that resulted in increased average temperature. The effects of rising temperature include soil degradation, loss of productivity of agricultural land, desertification, loss of biodiversity, degradation of ecosystems, reduced fresh-water resources, acidification of the oceans, and the disruption and depletion of stratospheric ozone. All these have an impact on human health, causing non-communicable diseases such as injuries during natural disasters, malnutrition during famine, and increased mortality during heat waves due to complications in chronically ill patients. Direct exposure to natural disasters has also an impact on mental health and, although too complex to be quantified, a link has even been established between climate and civil violence. (Score: 0.7204) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/noncomm_use_subset/94af6bf9cae266730bbc0480ab71595c12072891.json)

- ### Background: Concern intensifying that emerging infectious diseases and global environmental changes that could generate major future human pandemics. (Score: 0.7178) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/4209267ef4da6a8206733929e58967faebf8a5b7.json)

<hr>

In [13]:
ask_question('Are there public health mitigation measures that could be effective for control of COVID-19?')

## Question -> Are there public health mitigation measures that could be effective for control of COVID-19?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### Handwashing could prove a useful target for health promotion, but interventions to promote infection control may need to address a number of factors identified within this study as potential barriers to carrying out infection control behaviours. (Score: 0.7956) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/f9a61ae749c3d53492b38119c9fbe5f0e448b52a.json)

- ### Methods: A systematic review of peer-and non-peer-reviewed literature focused on the following questions: 1) What public health systems exist for communicating PHEPR messages from public health agencies to HCPs? 2) Have these systems been evaluated and, if yes, what criteria were used to evaluate these systems? 3) What have these evaluations discovered about characterizations of the most effective ways for public health agencies to communicate PHEPR messages to HCPs? (Score: 0.7887) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/db7db8941a68a14e0b227ce42898ad4ecd40df62.json)

- ###  descriptions of the public health importance of the health event under surveillance; the system under evaluation; the direct costs needed to operate the system; the usefulness of the system; (Score: 0.7784) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/ec9040732c2587bf4776b0953905e8a87828e8cf.json)

- ### Messages regarding health protective behaviours from local health authorities should anticipate the balance between overreacting and underreacting. Also, when protective recommendations from health professionals conflict with company policies, it is unclear how employees will react. (Score: 0.7750) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/c0f7f46699d0ecf0c7ead699d564d952508e6f81.json)

- ### Research that aims to examine priority setting practices in hospitals would benefit from applying a health policy lens to their analysis. (Score: 0.7720) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/d708b6876813c3915edcbdda08c014d90ec694ab.json)

<hr>
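
The `ask_question` implementation is not shown in this part of the notebook, so the following is only a minimal sketch of how a "top 5 by score" ranking like the one above could be produced. It assumes the score is a cosine similarity between the question embedding and each paragraph embedding; the function names `cosine_similarity` and `top_k` are hypothetical, and plain Python lists stand in for the real embedding vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus_vecs, k=5):
    """Return up to k (corpus_index, score) pairs, best score first.

    Hypothetical helper: the notebook itself may rank differently,
    for example via a scipy distance function.
    """
    scored = [(i, cosine_similarity(query_vec, vec))
              for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy usage with 2-dimensional "embeddings".
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for index, score in top_k([1.0, 0.0], corpus, k=2):
    print(index, round(score, 4))
```

A ranking of this kind only measures closeness in embedding space, which may explain why some of the listed answers are topically related to the question without actually answering it.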

In [14]:
ask_question('What do we know about COVID-19 risk factors? What have we learned from epidemiological studies?')

## Question -> What do we know about COVID-19 risk factors? What have we learned from epidemiological studies?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### We forecasted the epidemic of COVID-19 based on current clinical and epidemiological data and built a modified SEIR model to consider both the infectivity during incubation period and the influence on the epidemic from strict quarantined measures. (Score: 0.7791) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/biorxiv_medrxiv/ae61300f4e7a12a68baf114f199eb51940c7aad3.json)

- ### The paper by Scarpino and Petri is a step in this direction. The authors study the information theoretic limits to forecasting infectious disease outbreaks. They use permutation entropy as the method of their choice to study this question. Using diverse time series data on a number of diseases, they investigate this important question. (Score: 0.7791) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/02b20ad26d6b2f05b38712292186edbf3aa05862.json)

- ### It can be argued that the arrival of the This article is part of the ''Genomics of Emerging Infectious Disease'' PLoS Journal collection (http://ploscollections.org/emerginginfectiousdisease/). (Score: 0.7729) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/9a1ed211481f2c4e15f48ec3f712e73901eb628a.json)

- ### 2-18. We compared epidemiological characteristics across periods and different demographic groups. We developed a susceptible-exposed-infectious-recovered model to study the epidemic and evaluate the impact of interventions. (Score: 0.7675) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/biorxiv_medrxiv/9701a8c529cd8c18124da4cd61c1165a64b50281.json)

- ### Background Isolation of cases and contact tracing is used to control outbreaks of infectious diseases, and has been used for coronavirus disease 2019 (COVID-19). Whether this strategy will achieve control depends on characteristics of both the pathogen and the response. Here we use a mathematical model to assess if isolation and contact tracing are able to control onwards transmission from imported cases of COVID-19. (Score: 0.7648) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/noncomm_use_subset/43064e9a5b81ad1ac0743c818cda48383c246c95.json)

<hr>

In [15]:
ask_question('Does social status affect the risk of infection with COVID-19?')

## Question -> Does social status affect the risk of infection with COVID-19?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### Negative public reactions to emerging infectious diseases can adversely affect population health. We assessed whether social support was associated with knowledge of, worry about, and attitudes towards AIDS and severe acute respiratory syndrome. Our findings suggest that social support may be central to our understanding of public responses to emerging infectious diseases. (Score: 0.7627) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/45d5977d22dcb1bc57c9cd4348f63c1cc19eaa84.json)

- ### We explored how different socioeconomic and racial/ethnic groups in the United States might fare in an influenza pandemic on the basis of social factors that shape exposure, vulnerability to influenza virus, and timeliness and adequacy of treatment. We discuss policies that might differentially affect social groups' risk for illness or death. Our purpose is not to establish the precise magnitude of disparities likely to occur; rather, it is to call attention to avoidable disparities that can be expected in the absence of systematic attention to differential social risks in pandemic preparedness plans. Policy makers at the federal, state, and local levels should consider potential sources of socioeconomic and racial/ethnic disparities during a pandemic and formulate specific plans to minimize these disparities. (Score: 0.7612) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/f1446580f1091a33b07aad5ecbe7151c5895cafc.json)

- ### In this paper, we study the interplay between the epidemic spreading and the diffusion of awareness in multiplex networks. In the model, an infectious disease can spread in one network representing the paths of epidemic spreading (contact network), leading to the diffusion of awareness in the other network (information network), and then the diffusion of awareness will cause individuals to take social distances, which in turn affects the epidemic spreading. As for the diffusion of awareness, we assume that, on the one hand, individuals can be informed by other aware neighbors in information network, on the other hand, the susceptible individuals can be self-awareness induced by the infected neighbors in the contact networks (local information) or mass media (global information). Through Markov chain approach and numerical computations, we find that the density of infected individuals and the epidemic threshold can be affected by the structures of the two networks and the effective transmission rate of the awareness. However, we prove that though the introduction of the self-awareness can lower the density of infection, which cannot increase the epidemic threshold no matter of the local information or global information. Our finding is remarkably different to many previous results on single-layer network: local information based behavioral response can alter the epidemic threshold. Furthermore, our results indicate that the nodes with more neighbors (hub nodes) in information networks are easier to be informed, as a result, their risk of infection in contact networks can be effectively reduced. (Score: 0.7496) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/6c008c0ac375184854e2f5e357cc983bbddefc3e.json)

- ### The spread of infectious disease is determined by biological factors, e.g. the duration of the infectious period, and social factors, e.g. the arrangement of potentially contagious contacts. Repetitiveness and clustering of contacts are known to be relevant factors influencing the transmission of droplet or contact transmitted diseases. However, we do not yet completely know under what conditions repetitiveness and clustering should be included for realistically modelling disease spread. (Score: 0.7490) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/b15e513ac2f5696b1e51324fb0a3118c44a6a9e9.json)

- ### Beyond network structure, the rate at which individuals meet with their contacts depends on the individual preemptive measures taken during the course of a disease 16 . Consequently, a number of dynamic models have been developed to assess the effects individual preemptive measures have on infectious disease spread over networks [17] [18] [19] [20] . These models couple behavior and disease dynamics. That is, the state of the disease and the contact network determine the preemptive measures of the individuals which then affect the disease spread. Preemptive measures in these models, which are in the form of social distancing or rewiring of transmissive links, are assumed to be results of simple heuristics that approximate the decision-making of healthy individuals. (Score: 0.7393) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/0234207cba8165f1e1852b92ae0fa0d9e2f870fb.json)

<hr>

In [16]:
ask_question('Which precautions reduce risk of COVID-19?')

## Question -> Which precautions reduce risk of COVID-19?

**Top 5 answers compiled below by running AI algorithm on research text.**<hr>

- ### Consequences of secondary or co-infections for immunity (Score: 0.7449) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/7df426b06ebd25d6570bee0a18a7eb6c5dd673ac.json)

- ### • Results indicate some better hope (such as reducing the connection) may lead to backfire. (Score: 0.7230) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/dcecb3ef49f22680bf9e96142f00142f3b1d0873.json)

- ### Conclusion: This raises the possibility that mismatched AOs could still be therapeutically applicable in some cases, negating the necessity to produce patient-specific compounds. (Score: 0.7145) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/comm_use_subset/612fc352956cba3b9e5e74179125dc5d9aadba23.json)

- ### • The evolutionary vaccination game is considered in a modified activity driven network. • A closeness parameter p which is used to describe the connection between individuals is presented. • The closeness p may have an active role in weakening both the spreading of epidemic and the vaccination. (Score: 0.7075) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/dcecb3ef49f22680bf9e96142f00142f3b1d0873.json)

- ### Conclusion. -Education for infection control programs, hand hygiene campaigns, and antibiotics control programs may decrease the incidence density of AB and HAI, and may help control CRA complex infection. (Score: 0.7062) (filename:/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/custom_license/dd2c8dfeb13971b9878c79641d699c399090c748.json)

<hr>
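
Two things stand out in the output above: the filename shows the data directory prefix twice, and the last query returns two paragraphs from the same file. The sketch below illustrates one possible post-processing step, in line with the "return the paper id" enhancement mentioned at the top of the notebook. It assumes each hit is available as a `(text, score, filename)` tuple, which is an assumption about the notebook's internals; `paper_id_from_filename` and `dedupe_by_paper` are hypothetical helper names.

```python
import os


def paper_id_from_filename(filename):
    """Strip directories and the '.json' extension, leaving the file stem."""
    return os.path.splitext(os.path.basename(filename))[0]


def dedupe_by_paper(hits):
    """Keep only the best-scoring hit per paper.

    hits: iterable of (text, score, filename) tuples (assumed shape).
    Returns (text, score, paper_id) tuples, best score first.
    """
    best = {}
    for text, score, filename in hits:
        paper_id = paper_id_from_filename(filename)
        if paper_id not in best or score > best[paper_id][1]:
            best[paper_id] = (text, score, paper_id)
    return sorted(best.values(), key=lambda hit: hit[1], reverse=True)


# Toy usage with shortened texts and the doubled path seen in the output.
prefix = "/media/sf_Python/CORD19/data/kaggle//media/sf_Python/CORD19/data/kaggle/"
hits = [
    ("first paragraph", 0.7230,
     prefix + "custom_license/dcecb3ef49f22680bf9e96142f00142f3b1d0873.json"),
    ("second paragraph", 0.7075,
     prefix + "custom_license/dcecb3ef49f22680bf9e96142f00142f3b1d0873.json"),
]
for text, score, paper_id in dedupe_by_paper(hits):
    print(paper_id, score, text)
```

Because the paper id is taken from the last path component only, it is unaffected by the repeated directory prefix.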