# What is known about transmission, incubation, and environmental stability?

## Task Details

What is known about transmission, incubation, and environmental stability? What do we know about natural history, transmission, and diagnostics for the virus? What have we learned about infection prevention and control?

Specifically, we want to know what the literature reports about:

* Range of incubation periods for the disease in humans (and how this varies across age and health status) and how long individuals are contagious, even after recovery.
* Prevalence of asymptomatic shedding and transmission (particularly in children).
* Seasonality of transmission.
* Physical science of the coronavirus (e.g., charge distribution, adhesion to hydrophilic/phobic surfaces, environmental survival to inform decontamination efforts for affected areas and provide information about viral shedding).
* Persistence and stability on a multitude of substrates and sources (e.g., nasal discharge, sputum, urine, fecal matter, blood).
* Persistence of virus on surfaces of different materials (e.g., copper, stainless steel, plastic).
* Natural history of the virus and shedding of it from an infected person.
* Implementation of diagnostics and products to improve clinical processes.
* Disease models, including animal models for infection, disease and transmission.
* Tools and studies to monitor phenotypic change and potential adaptation of the virus.
* Immune response and immunity.
* Effectiveness of movement control strategies to prevent secondary transmission in health care and community settings.
* Effectiveness of personal protective equipment (PPE) and its usefulness to reduce risk of transmission in health care and community settings.
* Role of the environment in transmission.

## Reading Data

In [1]:
import os
import pandas as pd
import json

directories = ['biorxiv_medrxiv', 'comm_use_subset', 'noncomm_use_subset', 'pmc_custom_license']

def get_articles(directories):
    """Load every JSON article found in the given dataset directories."""
    articles = []
    for directory in directories:
        # The articles sit in a nested folder with the same name as the dataset.
        folder = f'{directory}/{directory}'
        for file_name in os.listdir(folder):
            file_path = os.path.join(folder, file_name)
            with open(file_path) as file:
                article = json.load(file)
                articles.append(article)
    return articles

In [2]:
articles = get_articles(directories)

In [68]:
from functools import reduce
metadata_df = pd.read_csv('all_sources_metadata_2020-03-13.csv')

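def get_text(article, section, separator):
    """Join the 'text' of all paragraphs in a section into one string."""
    join_pars = lambda x, y: {'text': x['text'] + separator + y['text']}
    try:
        text = reduce(join_pars, article[section])['text']
    except (KeyError, TypeError):
        # KeyError: the section is missing; TypeError: the section is empty.
        text = ''
    return text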

get_title = lambda article : article['metadata']['title']
get_paper_id = lambda article : article['paper_id']
get_index = lambda article : metadata_df[metadata_df['sha']==article['paper_id']].index[0]
get_doi = lambda article: metadata_df['doi'][get_index(article)]

articles[0].keys()

dict_keys(['paper_id', 'metadata', 'abstract', 'body_text', 'bib_entries', 'ref_entries', 'back_matter'])
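
To make the paragraph-joining step concrete, here is a minimal sketch on a toy article. The `toy_article` dictionary and the `join_section` helper are hypothetical illustrations, not part of the dataset or of the notebook above; the sketch assumes each section is a list of dictionaries with a `'text'` key, as `get_text` does.

```python
def join_section(article, section, separator=' '):
    """Concatenate the 'text' field of every paragraph in one section."""
    # A missing section is treated like an empty one.
    paragraphs = article.get(section, [])
    return separator.join(par['text'] for par in paragraphs)

# Hypothetical toy article that mimics some of the keys printed above.
toy_article = {
    'paper_id': 'abc123',
    'metadata': {'title': 'Toy title'},
    'abstract': [{'text': 'First sentence.'}, {'text': 'Second sentence.'}],
    'body_text': [],
}

print(join_section(toy_article, 'abstract'))
print(repr(join_section(toy_article, 'body_text')))
```

Using `str.join` here is an alternative to `reduce` that handles empty sections without raising an exception.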

In [70]:
articles_df = pd.DataFrame(data = {
    'paper_id': list(map(get_paper_id, articles)),
    'title': list(map(get_title, articles)),
    'doi': list(map(get_doi, articles)),
    'abstract': [get_text(article, 'abstract', ' ') for article in articles],
    'body_text': [get_text(article, 'body_text', ' ') for article in articles]
})
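
The `get_index` lookup takes `.index[0]` of a filtered frame, so it raises an `IndexError` for any article whose `paper_id` has no matching `sha` in the metadata. One possible alternative, sketched below on hypothetical toy frames (`toy_metadata` and `toy_articles` are made up for illustration), is a left merge, which keeps unmatched articles and leaves their `doi` empty.

```python
import pandas as pd

# Hypothetical stand-ins for metadata_df and the parsed articles.
toy_metadata = pd.DataFrame({
    'sha': ['aaa', 'bbb'],
    'doi': ['10.1000/a', '10.1000/b'],
})
toy_articles = pd.DataFrame({
    'paper_id': ['aaa', 'ccc', 'bbb'],
    'title': ['Paper A', 'Paper C', 'Paper B'],
})

# A left merge keeps every article; unmatched paper_ids get NaN for doi.
# Dropping duplicate sha values first avoids duplicating article rows.
toy_merged = toy_articles.merge(
    toy_metadata[['sha', 'doi']].drop_duplicates('sha'),
    how='left',
    left_on='paper_id',
    right_on='sha',
).drop(columns='sha')

print(toy_merged)
```

The merge also replaces one metadata scan per article with a single join, which may matter with thousands of articles.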

In [71]:
articles_df

| | paper_id | title | doi | abstract | body_text |
|---|---|---|---|---|---|
| 0 | 0015023cc06b5362d332b3baf348d11567ca2fbb | The RNA pseudoknots in foot-and-mouth disease ... | doi.org/10.1101/2020.01.10.901801 | word count: 194 22 Text word count: 5168 23 24... | VP3, and VP0 (which is further processed to VP... |
| 1 | 004f0f8bb66cf446678dc13cf2701feec4f36d76 | Healthcare-resource-adjusted vulnerabilities t... | doi.org/10.1101/2020.02.11.20022111 | | The 2019-nCoV epidemic has spread across China... |
| 2 | 00d16927588fb04d4be0e6b269fc02f0d3c2aa7b | Real-time, MinION-based, amplicon sequencing f... | doi.org/10.1101/634600 | Infectious bronchitis (IB) causes significant ... | Infectious bronchitis (IB), which is caused by... |
| 3 | 013d9d1cba8a54d5d3718c229b812d7cf91b6c89 | Assessing spread risk of Wuhan novel coronavir... | doi.org/10.1101/2020.02.04.20020479 | Background: A novel coronavirus (2019-nCoV) em... | In December 2019, a cluster of patients with p... |
| 4 | 01d162d7fae6aaba8e6e60e563ef4c2fca7b0e18 | TWIRLS, an automated topic-wise inference meth... | doi.org/10.1101/2020.02.24.20025437 | Faced with the current large-scale public heal... | The sudden outbreak of the new coronavirus (SA... |
| ... | ... | ... | ... | ... | ... |
| 13197 | ff365ebbc0fc55476886b0abd129e227c1f8a527 | Article focus Hip | http://dx.doi.org/10.1302/2046-3758.59.BJR-201... | We report a systematic review and metaanalysis... | Despite the fact that total hip arthroplasty (... |
| 13198 | ff7d49ac4008f60ef9c5a437e0d504dcefd1246f | | http://dx.doi.org/10.3201/eid1610.100840 | | results of studies conducted in other countrie... |
| 13199 | ffb381668d93248759ca3855425e05722cb9f562 | | http://dx.doi.org/10.3201/eid1108.050110 | | H uman coronaviruses (HCoVs) were first record... |
| 13200 | ffd3a93b927e221ded4cf76536ad31bef2c74b89 | Fatal Respiratory Infections Associated with R... | http://dx.doi.org/10.3201/eid1811.120607 | During an outbreak of severe acute respiratory... | During an outbreak of severe acute respiratory... |


In [72]:
articles_df.to_pickle('pickles/articles_df.pkl')
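
`DataFrame.to_pickle` does not create missing parent folders, so the call above assumes that a `pickles` directory already exists. The following sketch shows a round trip that creates the folder first; it uses a temporary directory and a hypothetical `toy_df` so that it can run anywhere without touching the real data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical small frame standing in for articles_df.
toy_df = pd.DataFrame({'paper_id': ['aaa'], 'title': ['Paper A']})

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_dir = os.path.join(tmp_dir, 'pickles')
    # Create the target folder before writing; a no-op if it already exists.
    os.makedirs(pickle_dir, exist_ok=True)
    pickle_path = os.path.join(pickle_dir, 'articles_df.pkl')

    toy_df.to_pickle(pickle_path)
    restored_df = pd.read_pickle(pickle_path)

# True when the restored frame matches the original in values and dtypes.
print(restored_df.equals(toy_df))
```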