In [1]:
import json


def reader(json_file_path: str):
    # Load one article JSON file; read it as UTF-8 rather than the platform default.
    with open(json_file_path, encoding="utf-8") as curr_json:
        return json.load(curr_json)


class ArticleReader:

    def __init__(self, json_file_path):
        data = reader(json_file_path)
        self.file = json_file_path
        self.body = data["body_text"]
        self.metadata = data["metadata"]
        self.id = data["paper_id"]
        self.abstract = data["abstract"]
        self.bib = data["bib_entries"]

    def get_text_parts(self):
        return "".join([x['text'] for x in self.body])
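
`get_text_parts` joins the paragraphs with an empty separator, so the last sentence of one paragraph runs straight into the first sentence of the next. A minimal sketch of a variant that keeps a separator, assuming the same `body_text` layout (a list of dictionaries with a `text` key). `join_paragraphs` and its `sep` parameter are hypothetical names, not part of the notebook:

```python
from typing import Dict, List


def join_paragraphs(body: List[Dict[str, str]], sep: str = " ") -> str:
    # Hypothetical helper: join paragraph texts with a separator so that
    # sentence boundaries between paragraphs stay visible.
    return sep.join(part["text"].strip() for part in body)


# Toy data in the assumed body_text shape, not taken from the dataset.
sample_body = [{"text": "First paragraph."}, {"text": "Second paragraph."}]
print(join_paragraphs(sample_body))
```

A single space keeps later splitting on `'. '` workable; a newline separator would preserve paragraph structure instead.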

In [2]:
import os
from typing import Dict, List, Tuple
from ArticleReader import ArticleReader


def gather_docs_by_keyword(data_path: str, keywords: List[str]) -> Tuple[List[str], Dict[str, int]]:
    docs = []
    keyword_counts = {}
    # Walk the data directory recursively and inspect every JSON article.
    for root, _dirs, files in os.walk(data_path):
        for file in files:
            if file.endswith(".json"):
                curr_file_path = os.path.join(root, file)
                article = ArticleReader(curr_file_path)
                text_parts = article.get_text_parts()
                if contains_keywords(text_parts, keywords):
                    docs.append(curr_file_path)
                    # Counts are keyed by file name, not by full path.
                    keyword_counts[file] = get_keyword_count(text_parts, keywords)

    return docs, keyword_counts


def contains_keywords(text_parts: str, keywords: List[str]) -> bool:
    # True as soon as any keyword occurs as a substring of the text.
    return any(keyword in text_parts for keyword in keywords)


def get_keyword_count(text_parts: str, keywords: List[str]) -> int:
    keyword_count = 0
    for keyword in keywords:
        keyword_count += text_parts.count(keyword)

    return keyword_count


def get_top_docs_by_keyword(keyword_counts: dict, num_top_articles: int) -> List[str]:
    # Sort by count, highest first, and keep the file names of the top entries.
    top_doc_counts = sorted(keyword_counts.items(), key=lambda x: x[1], reverse=True)[:num_top_articles]
    return [x[0] for x in top_doc_counts]

def get_sentences_with_keywords(docs: List[str], keywords: List[str], n: int = 30) -> None:
    # Print up to n matching sentences, each with its file path and title.
    for doc in docs:
        article = ArticleReader(doc)
        # Naive sentence split; sentences joined without a space stay together.
        sentences = article.get_text_parts().split('. ')
        for sentence in sentences:
            if n <= 0:
                return
            if any(keyword in sentence for keyword in keywords):
                print(article.file)
                print(article.metadata['title'])
                print()
                print(sentence)
                print()
                print()
                print()
                n -= 1
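
Splitting on `'. '` misses boundaries where no space follows the period, and printing inside the loop makes the matches hard to reuse. A rough sketch of an alternative that splits with a regular expression and returns the matches. `split_sentences` and `find_sentences` are hypothetical names, and the pattern is only a heuristic: it will also split after an abbreviation that is followed by a capital letter.

```python
import re
from typing import List

# Heuristic boundary: '.', '!' or '?' followed by optional whitespace and an
# uppercase letter. An assumed pattern, not a full sentence tokenizer.
_BOUNDARY = re.compile(r"(?<=[.!?])\s*(?=[A-Z])")


def split_sentences(text: str) -> List[str]:
    # Drop empty fragments and surrounding whitespace.
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def find_sentences(text: str, keywords: List[str], limit: int = 30) -> List[str]:
    # Return at most `limit` sentences that contain any of the keywords.
    hits = [s for s in split_sentences(text) if any(k in s for k in keywords)]
    return hits[:limit]


# Made-up sample text with one boundary that lacks a space.
sample = "Schools were closed for two weeks.As cases fell, controls relaxed. Travel resumed."
print(find_sentences(sample, ["closed", "Travel"]))
```

Returning a list leaves the formatting to the caller, and the same result can be filtered or counted without re-reading the files.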

In [3]:
docs, keyword_counts = gather_docs_by_keyword('data', ['R0'])

In [4]:
docs[0]

'data\\biorxiv_medrxiv\\biorxiv_medrxiv\\06c1b3535b83251cf92c01258b5048beeab7a460.json'

In [5]:
article = ArticleReader(docs[0])

In [6]:
article.get_text_parts()

'The basic reproductive number -R 0 -is one of the most common and most commonly misapplied numbers in public health. Nevertheless, estimating R 0 for every transmissible pathogen, emerging or endemic, remains a priority for epidemiologists the world over. Although often used to compare outbreaks and forecast pandemic risk, this single number belies the complexity that two different pathogens can exhibit, even when they have the same R 0 . Here, we show how predicting outbreak size requires both an estimate of R 0 and an estimate of the heterogeneity in the number of secondary infections. To facilitate rapid determination of outbreak risk, we propose a reformulation of a classic result from random network theory that relies on contact tracing data to simultaneously determine the first moment (R 0 ) and the higher moments (representing the heterogeneity) in the distribution of secondary infections. Further, we show how this framework is robust in the face of the typically limited amount
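
The text above spells the symbol as `R 0`, with a space, which a plain substring test for `'R0'` does not match. One possible workaround, sketched on the assumption that such gaps come from text extraction, is to allow optional whitespace between the characters of a short keyword. `loose_pattern` and `loose_count` are hypothetical helpers and would be too permissive for ordinary words:

```python
import re


def loose_pattern(keyword: str):
    # Allow optional whitespace between consecutive non-space characters,
    # so "R0" also matches "R 0". Hypothetical helper for short symbols.
    chars = [re.escape(ch) for ch in keyword if not ch.isspace()]
    return re.compile(r"\s*".join(chars))


def loose_count(text: str, keyword: str) -> int:
    # Count non-overlapping occurrences of the loosely matched keyword.
    return len(loose_pattern(keyword).findall(text))


# Made-up sample text containing both spellings.
sample = "an estimate of R 0 and of the spread when R0 is equal"
print(loose_count(sample, "R0"))
```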

In [7]:
get_sentences_with_keywords(docs, ['school'])

data\biorxiv_medrxiv\biorxiv_medrxiv\36a5f6d55d7c5f67d4344e36da0a72856ad3dda0.json
Title: The Novel Coronavirus, 2019-nCoV, is Highly Contagious and More Infectious Than Initially Estimated

If the value of R0 is as high in other countries, our results suggest that active and strong population-wide social distancing efforts, such as closing down transportation system, schools, discouraging travel, etc., might be needed to reduce the overall contacts to contain the spread of the virus



data\biorxiv_medrxiv\biorxiv_medrxiv\4f53e43ba1bfb84611eaa839994f9297cdd65dc9.json
The effectiveness of full and partial travel bans against COVID-19 spread in Australia for travellers from China

The ban on travel from China has been periodically reviewed, with lifting of restrictions announced on February 23 rd for high school students, who number less than 800



data\biorxiv_medrxiv\biorxiv_medrxiv\734779fad93249a2f6fede6afd10eeff7b37919b.json
Projecting the transmission dynamics of SARS-CoV-2 throu

data\comm_use_subset\comm_use_subset\7c1647ec918ab799e8f2dc782620024844c52a55.json
DO IT Trial: vitamin D Outcomes and Interventions in Toddlers -a TARGet Kids! randomized controlled trial

Viral URTI and asthma exacerbations combined make up over 30% of all emergency department visits for children in Canada [2] .Data from our group [3] and others have repeatedly demonstrated that most urban preschoolers living in North America have vitamin D serum levels significantly lower than values recommended by both the American Academy of Pediatrics (AAP) and the Canadian Paediatric Society (CPS) [4, 5] 



data\comm_use_subset\comm_use_subset\7c1647ec918ab799e8f2dc782620024844c52a55.json
DO IT Trial: vitamin D Outcomes and Interventions in Toddlers -a TARGet Kids! randomized controlled trial

Furthermore, it is not known whether vitamin D supplementation leads to measurable improvement in child health outcomes or what dose (or what vitamin D serum level) is needed to maximize health outcomes i

In [8]:
get_sentences_with_keywords(docs, ['school closure'])

data\biorxiv_medrxiv\biorxiv_medrxiv\c8df44a3612e85e267351e936ddeb8fc5867afa1.json
The timing of one-shot interventions for epidemic control

After a 2 week school closure with many additional interventions in place, the number of cases in Mexico City dropped dramatically and did not significantly increase after relaxing the controls.As the disease was introduced into other geographic regions, many schools closed in response to the first observations of infection



data\biorxiv_medrxiv\biorxiv_medrxiv\c8df44a3612e85e267351e936ddeb8fc5867afa1.json
The timing of one-shot interventions for epidemic control

The school closure provided increased time to prepare, but the overall epidemic was very similar.We can understand these historical patterns by observing that some of the most drastic social distancing interventions are unsustainable



data\biorxiv_medrxiv\biorxiv_medrxiv\c8df44a3612e85e267351e936ddeb8fc5867afa1.json
The timing of one-shot interventions for epidemic control

These ap

In [9]:
get_sentences_with_keywords(docs, ['non-pharmaceutical intervention'])

data\biorxiv_medrxiv\biorxiv_medrxiv\9b5f5119bbfbded3245acc37859cefde967458e7.json
TITLE Containing Emerging Epidemics: a Quantitative Comparison of Quarantine and Symptom Monitoring

Here, we present the first study comparing the effectiveness of the two primary non-pharmaceutical interventions targeted via contact tracing: symptom monitoring and quarantine



data\comm_use_subset\comm_use_subset\239f4596308a1247bf27a034c3cab95e0771215f.json
Fomite-mediated transmission as a sufficient pathway: a comparative analysis across three viral pathogens

Guidance on appropriate non-pharmaceutical interventions (e.g., masks, hand hygiene, and surface decontamination) depend on the dominant model of transmission [5] .In spite of the potential significance of fomitemediated transmission, there is a longer tradition of environmental risk assessment and modeling environmental infection transmission through water [6, 7] , food [8, 9] , and even the air [10] ; only recently has there been published 

In [10]:
get_sentences_with_keywords(docs, ['peak'])

data\biorxiv_medrxiv\biorxiv_medrxiv\164b0678afc42f1923aa100b624ca969a53fb3b2.json
Forecasting the Wuhan coronavirus (2019-nCoV) epidemics using a simple (simplistic) model

/2020 of maximum case (at about 65,000) but remarkably shows a very close time for the peak ( Figure  5 ).Figure 5 Predictions of model using linearization in time (Eq



data\biorxiv_medrxiv\biorxiv_medrxiv\2cc809ed4d5c3640ee2351fdb1876e8ff6c01b51.json
Title: Feasibility of controlling 2019-nCoV outbreaks by isolation of cases and contacts

Tracing 20 contacts per case could mean up to 2,000 contacts in the week of peak incidence.



data\biorxiv_medrxiv\biorxiv_medrxiv\3298d34e643c7a157f4226fde679f71194e417b5.json


This simple and top-down method can provide straightforward insights regarding the status of the epidemics and future scenarios of the outbreak.Usually, an epidemic follows an exponential growth at an early stage (following the law of proportional growth), peaks and then the growth rate decays as coun

In [11]:
get_sentences_with_keywords(docs, ['travel ban'])

data\biorxiv_medrxiv\biorxiv_medrxiv\0fd300aefb704c20f32152b97b6194015f1c74e7.json
Characterizing the transmission and identifying the control strategy for COVID-19 through epidemiological modeling

In conclusion, the aggressive quarantine strategy of building square cabin hospitals has effectively decreased the transmission, whereas the usefulness of the travel ban, home isolation, and personal protection is still unclear.To further investigate the timing of the comprehensive quarantine, given β=0.10 and γ=1/6, simulated isolations were employed to track the numbers of confirmed cases (Q) at various starting dates, January 1 st , 10 th , 20 th , and 30 th 



data\biorxiv_medrxiv\biorxiv_medrxiv\0fd300aefb704c20f32152b97b6194015f1c74e7.json
Characterizing the transmission and identifying the control strategy for COVID-19 through epidemiological modeling

The restrictive travel ban and home isolation enforced in Wuhan was expected to largely decrease the transmission and prevent furthe

data\comm_use_subset\comm_use_subset\a6ce4ce12c7af1cfb4d71764b963285b687e6b51.json
Clinical Medicine Assessing the Impact of Reduced Travel on Exportation Dynamics of Novel Coronavirus Infection (COVID-19)

To our knowledge, such drastic movement restrictions are a historical first.Since Wuhan was placed on lockdown, travel restriction and border control have been implemented by various countries, either as: (i) complete travel bans, (ii) travel restriction and quarantine-which allows for restriction of healthy individuals, (iii) entry screening for all incoming travelers, or some combination thereof





In [12]:
get_sentences_with_keywords(docs, ['distancing'])

data\biorxiv_medrxiv\biorxiv_medrxiv\3028628066ec2401f3981f4e70c5b1acd4cef573.json
Transmission interval estimates suggest pre-symptomatic spread of COVID-19

In both datasets, daily counts decline over time, which is likely a combination of delays to symptom onset and between symptom onset and reporting, combined with the effects of strong social distancing and contact tracing.



data\biorxiv_medrxiv\biorxiv_medrxiv\3028628066ec2401f3981f4e70c5b1acd4cef573.json
Transmission interval estimates suggest pre-symptomatic spread of COVID-19

In Tianjin, social distancing measures took place



data\biorxiv_medrxiv\biorxiv_medrxiv\3028628066ec2401f3981f4e70c5b1acd4cef573.json
Transmission interval estimates suggest pre-symptomatic spread of COVID-19

The means are 7 (6.1, 8.0) days for early cases and 12.4 (10.8, 14.2) days for later cases.Social distancing seems unlikely to change the natural course of infection, but these results might be explained if exposure occurred during group quaran

data\comm_use_subset\comm_use_subset\86239535bbc306bd90a85a6145840c784fa35d82.json
Mitigation Strategies for Pandemic Influenza A: Balancing Conflicting Policy Objectives

In the early days of the H1N1 influenza pandemic in Mexico in 2009 [2] , social distancing measures were implemented with the aim of slowing the epidemic during its early stages



data\comm_use_subset\comm_use_subset\86239535bbc306bd90a85a6145840c784fa35d82.json
Mitigation Strategies for Pandemic Influenza A: Balancing Conflicting Policy Objectives

Should it be as soon as the first cases are discovered, or later in the outbreak when more cases have arisen? Neither has it been acknowledged that planned levels of coverage with antiviral treatment or pre-pandemic vaccines may implicitly determine the magnitude of social distancing interventions required.Studies have shown that during the 1918-19 influenza pandemic public health control strategies and changes in population contact rates lowered transmission rates and r

In [14]:
get_sentences_with_keywords(docs, ['failure'])

data\biorxiv_medrxiv\biorxiv_medrxiv\12920263b2846c61d6f2b6105189367a8ff1bc39.json
A diallel of the mouse Collaborative Cross founders reveals strong strain-specific maternal effects on litter size

The CC resource and its founder strains can also be used to study reproductive ability, which is especially interesting because this population has an established record both of breeding successes and of failures



data\biorxiv_medrxiv\biorxiv_medrxiv\1b9fed2824e97db8cb35912549dd2e9cde7b6c18.json
Strategies for vaccine design for corona virus using Immunoinformatics techniques

In more severe cases, infection can cause pneumonia, severe acute respiratory syndrome, kidney failure and even death



data\biorxiv_medrxiv\biorxiv_medrxiv\221f592feb12c19df1f3e619021205e247d39b46.json
Retrospective Analysis of Clinical Features in 101 Death Cases with COVID-19

Among them, 1 patient died of acute coronary syndrome, while the rest died of respiratory failure and multiple organ failure caused by CO

In [21]:
docs, keyword_counts = gather_docs_by_keyword('data', ['fail','comply'])
get_sentences_with_keywords(docs, ['fail to comply'])

data\comm_use_subset\comm_use_subset\c6e819e9d3fdfd19cafd2ddf160b9ed2cfd9a27a.json
BMC Public Health Public perceptions of quarantine: community-based telephone survey following an infectious disease outbreak

The present findings indicate strong public support for the use of quarantine in the context of an infectious disease outbreak and for serious sanctions against those who fail to comply



data\comm_use_subset\comm_use_subset\e20efa446b9d31196d08c2a0cf51d2c851bfbbad.json


Even the legal requirement for facilities to provide data on patients diagnosed with a notifiable disease does not seem to provide a sufficient incentive, probably because there is little possibility that they will be penalised if they fail to comply



data\comm_use_subset\comm_use_subset\f787f681d002077ccf52bcea98f4e708d71fd41d.json
The Effect of Contact Investigations and Public Health Interventions in the Control and Prevention of Measles Transmission: A Simulation Study

Individuals who refuse MMR PEP or I