## What do we know about COVID-19 risk factors?
### About this Notebook
In this notebook, we try to answer the questions posed in the task by training a Doc2Vec model on the full body text of papers. Before training the model, we first filter the dataset to keep only papers that are specifically about COVID-19: the dataset contains around 50k papers, and most of them aren't specifically about COVID-19. Special thanks to [xhlulu](https://www.kaggle.com/xhlulu) for the useful notebook [cord-19-eda-parse-json-and-generate-clean-csv](https://www.kaggle.com/xhlulu/cord-19-eda-parse-json-and-generate-clean-csv), which converts the json files to csv and left us more time to spend on the actual task. For filtering the dataset we used the notebook [covid-19-thematic-tagging-with-regular-expressions](https://www.kaggle.com/muhammadhassan/covid-19-thematic-tagging-with-regular-expressions) by [Andy White](https://www.kaggle.com/ajrwhite), who has done wonderful work filtering the papers; all thanks to the author.

**PROS:**
* Makes use of full body text.
* Quite simple approach.

**CONS:**
* Doesn't perform well on very short queries.

**Table of Contents:**
* [Combining csv files corresponding to each folder.](#1)
* [Getting full body text of filtered papers.](#2)
* [Training Doc2Vec.](#3)
* [Neonates and pregnant women.](#4)
* [Public health mitigation measures that could be effective for control.](#5)
* [Transmission dynamics of the virus, including the basic reproductive number, incubation period, serial interval, modes of transmission and environmental factors.](#6)
* [Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups.](#7)
* [Co-infections (determine whether co-existing respiratory/viral infections make the virus more transmissible or virulent) and other co-morbidities.](#8)
* [Socio-economic and behavioral factors to understand the economic impact of the virus and whether there were differences.](#9)

In [None]:
import os
import json
import warnings

import gensim
import numpy as np
import pandas as pd
from nltk import tokenize

warnings.filterwarnings("ignore")

## Combining csv files corresponding to each folder.<a id= 1></a>
The individual csv files were generated by this notebook: [cord-19-eda-parse-json-and-generate-clean-csv](https://www.kaggle.com/xhlulu/cord-19-eda-parse-json-and-generate-clean-csv).

In [None]:
combined = pd.DataFrame()
for file in os.listdir('/kaggle/input/cord-19-eda-parse-json-and-generate-clean-csv/'):
    if file.endswith('.csv'):
        df = pd.read_csv(f'/kaggle/input/cord-19-eda-parse-json-and-generate-clean-csv/{file}')
        combined = pd.concat([combined, df], ignore_index=True)
        print(f'Total documents in {file}: ', len(df))

print('='*80)
combined['title'] = combined['title'].str.lower() 
combined['abstract'] = combined['abstract'].str.lower() 
before = len(combined)
print('Total documents in dataset: ', before)
combined.drop_duplicates(subset=['title', 'abstract'], inplace=True)
print('After removing duplicates based on title and abstract: ', len(combined))
combined.rename(columns={'title': 'combined_title'}, inplace=True)
print('Documents with same title and abstract: ', before - len(combined))
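
The deduplication step above can be illustrated on a toy frame. This is a minimal sketch with made-up rows (not the notebook's data); it shows why titles and abstracts are lower-cased before calling `drop_duplicates`.

```python
import pandas as pd

# Hypothetical toy records: the first two differ only in letter case.
toy = pd.DataFrame({
    'paper_id': ['a1', 'b2', 'c3'],
    'title': ['Clinical Features of COVID-19', 'clinical features of covid-19', 'Another paper'],
    'abstract': ['We describe cases.', 'We Describe Cases.', 'Different abstract.'],
})

# Normalise case so that case-only differences do not hide duplicates.
toy['title'] = toy['title'].str.lower()
toy['abstract'] = toy['abstract'].str.lower()

# Keep the first record of each (title, abstract) pair.
deduped = toy.drop_duplicates(subset=['title', 'abstract'])
print(len(toy), '->', len(deduped))
```

Without the lower-casing, the first two toy rows would be treated as distinct papers.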

#### Helper function taken from the notebook: [cord-19-eda-parse-json-and-generate-clean-csv](https://www.kaggle.com/xhlulu/cord-19-eda-parse-json-and-generate-clean-csv)

In [None]:
def format_body(body_text):
    texts = [(di['section'], di['text']) for di in body_text]
    texts_di = {di['section']: "" for di in body_text}
    
    for section, text in texts:
        texts_di[section] += text

    body = ""

    for section, text in texts_di.items():
        body += section
        body += "\n\n"
        body += text
        body += "\n\n"
    
    return body
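
To make the output format of the helper concrete, here is a small usage sketch. The function is restated in condensed form so the example runs on its own, and the `sample` input is hypothetical, shaped like the `body_text` entries the helper reads.

```python
def format_body(body_text):
    # Same logic as the helper above: concatenate text per section, in first-seen order.
    texts_di = {di['section']: "" for di in body_text}
    for di in body_text:
        texts_di[di['section']] += di['text']
    return "".join(f"{section}\n\n{text}\n\n" for section, text in texts_di.items())

# Hypothetical input: two paragraphs of one section, separated by another section.
sample = [
    {'section': 'Introduction', 'text': 'First paragraph. '},
    {'section': 'Methods', 'text': 'How it was done.'},
    {'section': 'Introduction', 'text': 'Second paragraph.'},
]

body = format_body(sample)
print(repr(body))
```

Note that paragraphs belonging to the same section are joined without a separator, so splitting the result on blank lines later yields section titles and one chunk of text per section.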

## Getting full body text of filtered papers. <a id= 2></a>
covid_19_2020.csv is a subset of metadata.csv and was generated using this notebook: [covid-19-thematic-tagging-with-regular-expressions](https://www.kaggle.com/ajrwhite/covid-19-thematic-tagging-with-regular-expressions)
Here we will encounter two types of records:
* Records that have a **sha** in the metadata: we can match this **sha** against the **paper_id** provided in the json files (or, in this case, in the csv generated from the json files) to get their body text.
* Records that don't have a **sha** in the metadata: we will search for their corresponding files in the pmc folder of [biorxiv_medrxiv, comm_use_subset, custom_license, noncomm_use_subset] using the columns **pmcid** and **full_text_file**.
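
The split described above can be sketched on toy data. The rows and ids below are hypothetical; the column names follow the metadata columns used in this notebook. `indicator=True` is one way to see which records with a **sha** found no matching **paper_id**.

```python
import pandas as pd

# Hypothetical metadata rows and parsed-json rows (all ids are made up).
metadata = pd.DataFrame({
    'sha': ['aaa', 'bbb', None],
    'pmcid': [None, 'PMC1', 'PMC2'],
    'full_text_file': ['comm_use_subset', 'custom_license', 'noncomm_use_subset'],
})
parsed = pd.DataFrame({'paper_id': ['aaa'], 'text': ['body of aaa']})

# Type 1: records with a sha. Type 2: records without one.
with_sha = metadata.dropna(subset=['sha'])
without_sha = metadata[metadata['sha'].isnull()]

# A left merge keeps every metadata row; the _merge column marks rows with no json match.
merged = with_sha.merge(parsed, how='left', left_on='sha', right_on='paper_id', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']

print(len(with_sha), len(without_sha), len(unmatched))
```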


In [None]:
df_covid_19 = pd.read_csv(f'/kaggle/input/filtered-data/covid_19_2020.csv')
print('Total covid-19 papers: ', len(df_covid_19))
df_covid_19_sha = df_covid_19.dropna(subset=['sha'])
df_covid_19_nosha = df_covid_19[df_covid_19.sha.isnull()]

print('Papers with missing sha in metadata: ', len(df_covid_19_nosha))
df_covid_19_sha = df_covid_19_sha.merge(combined[['paper_id', 'text']], how="left", left_on = 'sha', right_on = 'paper_id')
print('Papers that have sha key in metadata but still not found in json files: ', len(df_covid_19_sha[~df_covid_19_sha['sha'].isin(combined['paper_id'])]))

There are 518 records for which the **sha** wasn't matched with any **paper_id**. We will use **full_text_file** to find which of these directories [biorxiv_medrxiv, comm_use_subset, custom_license, noncomm_use_subset] a record belongs to, and will then search for that **sha** in the pdf_json folder of the corresponding directory.

In this way, out of 518 records, we were able to locate the json files of 279 records and get their body text. That still leaves 239 records; for these we will simply use the abstract as body text, if it exists.
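
The lookup described above amounts to building one candidate path per **sha**. The sketch below uses the directory layout that appears in this notebook's code; the helper name `candidate_pdf_json_paths` and the example values are hypothetical, and it assumes that a single field may hold several sha values separated by `;`.

```python
def candidate_pdf_json_paths(sha_field, dir_name,
                             root='/kaggle/input/CORD-19-research-challenge'):
    """Return one candidate json path per sha listed in a metadata row."""
    # Assumption: several sha values may share one field, separated by ';'.
    shas = [s.strip() for s in sha_field.split(';') if s.strip()]
    return [f'{root}/{dir_name}/{dir_name}/pdf_json/{sha}.json' for sha in shas]

# Made-up sha values, for illustration only.
paths = candidate_pdf_json_paths('abc123; def456', 'comm_use_subset')
for p in paths:
    print(p)
```

Each candidate can then be tried in turn until one file opens.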

In [None]:
print('No of Records where sha is not null and has_pdf_parse is False')
print(len(df_covid_19[(~df_covid_19.sha.isnull()) & (df_covid_19['has_pdf_parse']== False)]))
print('No of Records where sha is not null and full_text_file is null')
print(len(df_covid_19[(~df_covid_19.sha.isnull()) & (df_covid_19['full_text_file'].isnull())]))
print('No of Records where sha is not null and has_pmc_xml_parse is False')
print(len(df_covid_19[(~df_covid_19.sha.isnull()) & (df_covid_19['has_pmc_xml_parse'] == False)]))

In [None]:
missing_text_index = df_covid_19_sha[~df_covid_19_sha['sha'].isin(combined['paper_id'])].index.tolist()
not_found = []
for index in missing_text_index:
    dir_name = df_covid_19_sha.loc[index, 'full_text_file']
    # A record may list several sha values separated by ';', so try each one in turn.
    shas = [s.strip() for s in df_covid_19_sha.loc[index, 'sha'].split(';')]

    for sha in shas:
        path = f'/kaggle/input/CORD-19-research-challenge/{dir_name}/{dir_name}/pdf_json/{sha}.json'
        try:
            with open(path, 'rb') as f:
                paper = json.load(f)
            df_covid_19_sha.loc[index, 'text'] = format_body(paper['body_text'])
            break
        except (OSError, KeyError, ValueError):
            # File missing, unreadable, or without a body_text field: try the next sha.
            continue
    else:
        # None of the sha values led to a usable json file.
        not_found.append(index)

print('No of records where sha is given but no corresponding article exists in json: ', len(set(not_found)))

### Getting body text for the papers where sha is null.
Out of **1295** total records where **sha** was null, there were only **13** records where **full_text_file** and **pmcid** were both non-null. So we will search for these records in the **pmc** directory to see whether we can find the corresponding files.

In [None]:
print('Total records in df_covid_19_nosha: ', len(df_covid_19_nosha))
print('Records where full_text_file is also null: ', len(df_covid_19_nosha[df_covid_19_nosha['full_text_file'].isnull()]))
print('Records where full_text_file and pmcid are non null: ', len(df_covid_19_nosha[(~df_covid_19_nosha['full_text_file'].isnull()) & 
                                                                                      (~df_covid_19_nosha['pmcid'].isnull())]))

In [None]:
missing_text_index = df_covid_19_nosha[(~df_covid_19_nosha['full_text_file'].isnull())&(~df_covid_19_nosha['pmcid'].isnull())].index.tolist()
not_found = []
df_covid_19_nosha = df_covid_19_nosha.copy()  # explicit copy: the frame above is a slice of df_covid_19
df_covid_19_nosha['text'] = None  # object column, so body text strings can be assigned below
for index in missing_text_index:
        dir_name = df_covid_19_nosha['full_text_file'].loc[index]
        pmcid = df_covid_19_nosha['pmcid'].loc[index]
               
        try: 
            path = f'/kaggle/input/CORD-19-research-challenge/{dir_name}/{dir_name}/pmc_json/{pmcid}.xml.json'
            file = json.load(open(path, 'rb'))
            text = format_body(file['body_text'])
            df_covid_19_nosha.loc[index, 'text'] = text
            
            if pd.isnull(df_covid_19_nosha['abstract'].loc[index]):
                if 'abstract' in file.keys():
                    df_covid_19_nosha['abstract'].loc[index] = file['abstract']
                    
        except:
            print(f'{dir_name}/{dir_name}/pmc_json/{pmcid}.xml.json not found')
            not_found.append(f'{pmcid}.xml.json')

print('='*80)
print('No of records where pmcid is given but no corresponding article exists in json: ', len(set(not_found)))
print('Out of {} df_covid_19_nosha records, records where body text is missing: {}'.format(len(df_covid_19_nosha), len(df_covid_19_nosha[df_covid_19_nosha.text.isnull()])))

### Combining df_covid_19_sha and df_covid_19_nosha
Adding the abstract as body text where the body text is null.

In [None]:
df_covid_full_text = pd.concat([df_covid_19_sha, df_covid_19_nosha], ignore_index=False)
print('Out of {} records of df_covid_full_text, records where body text is null: {}'.format(len(df_covid_full_text), len(df_covid_full_text[df_covid_full_text.text.isnull()])))
df_covid_full_text['text'] = df_covid_full_text['text'].fillna(df_covid_full_text['abstract'])
print('After substituting abstract as body, records where body text is still missing: ',len(df_covid_full_text[df_covid_full_text.text.isnull()]))
df_covid_full_text.dropna(subset=['text'], inplace=True)
print('Total papers after dropping records where body text is null: ', len(df_covid_full_text))
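
The abstract fallback above can be shown on a toy frame. This is a minimal sketch with made-up values; the column names match the ones used in this notebook.

```python
import pandas as pd

# Hypothetical records: one with body text, one with only an abstract, one with neither.
toy = pd.DataFrame({
    'text': ['full body', None, None],
    'abstract': ['abs a', 'abs b', None],
})

# Use the abstract wherever the body text is missing, then drop rows that are still empty.
toy['text'] = toy['text'].fillna(toy['abstract'])
toy = toy.dropna(subset=['text'])

print(toy['text'].tolist())
```

Assigning the result of `fillna` back to the column avoids relying on in-place modification of a column selection.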

## Training Doc2Vec: <a id= 3></a>
* Here we will train Doc2Vec on paragraphs of the full body texts, i.e. we will use paragraphs as documents.
* Remove unnecessary paragraphs from the dataset, i.e. paragraphs that talk about copyright details. You can uncomment a few lines in the code below to see which paragraphs are being filtered.
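
The paragraph filtering described above can be sketched in isolation. This is a simplified illustration rather than the exact cell below: it splits sentences with a naive regular expression instead of `nltk`, and the helper name `clean_paragraph` and the sample text are hypothetical. The thresholds and marker strings mirror the ones used in this notebook.

```python
import re

# Substrings that mark licence or reference boilerplate.
BOILERPLATE = ('international license', 'copyright', 'https', 'doi')

def clean_paragraph(par, min_raw=300, min_clean=200):
    """Return the cleaned paragraph, or None if it should be skipped."""
    par = par.lower()
    if len(par) < min_raw:
        return None
    # Simplification: naive split after sentence-ending punctuation (not nltk).
    sentences = re.split(r'(?<=[.!?])\s+', par)
    kept = [s for s in sentences if not any(marker in s for marker in BOILERPLATE)]
    cleaned = ' '.join(kept)
    if len(cleaned) <= min_clean or 'no reuse allowed' in cleaned:
        return None
    return cleaned

# Made-up paragraph: four content sentences followed by a copyright notice.
content = 'the incubation period was estimated from travel histories of confirmed cases. ' * 4
cleaned = clean_paragraph(content + 'Copyright held by the authors.')
print(cleaned)
```

The substring test is blunt: a sentence containing a word such as "doing" also matches `'doi'` and is dropped.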

In [None]:
def get_docs(df_covid, is_test=False):
    documents = []
    actual_documents = []
    if is_test:
        text = df_covid['text'].loc[0]
        documents.append((text.lower(), 'no_tag'))
    
    else:
        for row in range(0, len(df_covid)):
            text = df_covid['text'].loc[row]      
            text = text.split('\n\n')
            
            pub_time = df_covid['publish_time'].loc[row]
            authors = df_covid['authors'].loc[row]
            title = df_covid['title'].loc[row]
            
            par_no=1

            for par in text:
                par = par.lower()
                if len(par)>=300:
                    sentences = tokenize.sent_tokenize(par)
                    final_par = ''
                    for sentence in sentences:
                        if 'international license' not in sentence and 'copyright' not in sentence and 'https' not in sentence and 'doi' not in sentence:
                            final_par = ''.join([final_par, sentence, ' '])
                        else:
                            pass

                    if len(final_par)>200 and 'no reuse allowed' not in final_par:
                        tag = ''.join([str(row), '-', str(par_no)])
                        
                        documents.append([final_par, tag, pub_time, authors, title])
                        par_no+=1
                    else:
                        pass

    #                 if 'no reuse allowed' in final_par and len(final_par)<=200:
    #                     print(final_par)
    #                     print('='*70)
    #                     pass
                          
    return documents

def read_corpus(docs, tokens_only=False):
    for index, rec in enumerate(docs):
        doc = rec[0]
        tag = rec[1]
        tag = ''.join([tag, '_', str(index)])
        tokens = gensim.utils.simple_preprocess(doc)
        if tokens_only:
            yield tokens
        else:
            yield gensim.models.doc2vec.TaggedDocument(tokens, [tag])

In [None]:
df_covid_train = df_covid_full_text[['text', 'publish_time', 'authors', 'title']]
df_covid_train.reset_index(drop=True, inplace=True)
train_documents = get_docs(df_covid_train)
df_train = pd.DataFrame.from_records(train_documents, columns = ['document', 'tag', 'publish_time', 'authors', 'title'])
df_train.drop(columns=['document'], axis=1, inplace=True)
print('Train Documents: ', len(train_documents))
train_corpus = list(read_corpus(train_documents))

In [None]:
print(train_corpus[:2])

In [None]:
model = gensim.models.doc2vec.Doc2Vec(vector_size=50, min_count=2, epochs=40)
model.build_vocab(train_corpus)

In [None]:
model.train(train_corpus, total_examples=model.corpus_count, epochs=model.epochs)

### Helper function to get answers to a given query.

In [None]:
def get_answer(query, no_of_results):
    df_test = pd.DataFrame(data={'text':[query]})
    test_documents = get_docs(df_test, is_test=True)
    test_corpus = list(read_corpus(test_documents, tokens_only=True))

    inferred_vector = model.infer_vector(test_corpus[0])
    sims = model.docvecs.most_similar([inferred_vector], topn=len(model.docvecs))

    print('Test Document: «{}»\n'.format(' '.join(test_corpus[0])))
    print(u'SIMILAR/DISSIMILAR DOCS PER MODEL %s:\n' % model)

    results = [(f'TOP {i}', i) for i in range(0,no_of_results)]
    unique_papers = []
    answers = []
    for label, index in results:
        splits = sims[index][0].split('_')
        tag = splits[0]
        doc_index = int(splits[1])
        print(doc_index)
        unique_papers.append(int(splits[0].split('-')[0]))
        excerpt = ' '.join(map(str, train_documents[doc_index]))
        
        print(u'%s %s:\n%s\n' % (label, sims[index], excerpt))
        answers.append([tag, excerpt])
        
        print('='*80)
    
    df_excerpts = pd.DataFrame().from_records(answers, columns= ['tag', 'excerpt'])
    return df_excerpts
    #print(unique_papers)
    #print('Total unique papers: ', len(set(unique_papers)))
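
The tags that `get_answer` takes apart are built in two steps: `get_docs` produces `<paper row>-<paragraph number>`, and `read_corpus` appends `_<document index>`. A minimal sketch of the decoding, with a hypothetical helper name and a made-up tag:

```python
def parse_tag(doc_tag):
    """Split a tag such as '12-3_457' into (paper_row, paragraph_no, doc_index)."""
    paragraph_tag, doc_index = doc_tag.split('_')
    paper_row, paragraph_no = paragraph_tag.split('-')
    return int(paper_row), int(paragraph_no), int(doc_index)

# Paper row 12, paragraph 3, position 457 in train_documents.
print(parse_tag('12-3_457'))
```

The part before the underscore is the value used to merge results with `df_train`, while the document index points back into `train_documents`.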

## Neonates and pregnant women <a id= 4></a>
Key Insights from top 10 results:

* Coronavirus testing, mortality, vaccine development, containment vs mitigation, and more. anthony s. fauci, md discusses the latest developments in the global spread of covid-19 and the sars-cov-2 virus with jama editor howard bauchner, md. - what's the difference between covid-19 and sars-cov-2? (01:15) - what's the status and accuracy of diagnostic testing in the us? (01:58) - what's the case-fatality rate for the virus? (05:31) - scientific advances and vaccine development (25:06) - are current clinical trials providing a picture of treatments? (13:41) - risk communication: how do we present information so there's faith that it's accurate? (15:24) - risk groups (children, the elderly, pregnant women) (16:26) - containment vs mitigation vs quarantine vs isolation (19:10) - protecting the elderly and nursing home resident (23:52) - public health prospects in latin america, africa (26:35) - will coronavirus wane in warmer months like influenza? (27:52) - why is anxiety so high about this disease?- does the us have capacity to care for covid19 infection? (31:03) - what is your daily schedule like?
* Covid-19 is placing significant demands on healthcare resources throughout the world. box 1 outlines recommendations to assist the physiotherapy workforce to plan and respond to this demand. it is recommended that staff who are pregnant avoid exposure to covid-19. it is known that pregnant women are potentially at increased risk of complications from any respiratory disease due to the physiological changes that occur in pregnancy. there is not enough currently available information on the impact of covid-19 on a pregnant woman or her baby.
* Coronaviruses responsible for severe acute respiratory syndrome (sars) and middle east respiratory syndrome (mers) can cause severe adverse pregnancy outcomes, such as miscarriage, premature delivery, intrauterine growth restriction, and maternal death.1 , 2 vertical transmission of the virus responsible for 2019 novel coronavirus disease (covid-19), severe acute respiratory syndrome coronavirus 2 (sars-cov-2), has not yet been detected, whereas perinatal transmission has been suspected in one case.3 consequences of infection with sars-cov-2 for pregnancies are uncertain, with no evidence so far of severe outcomes for mothers and infants; however, the possibility should be considered.4 the recent experience with zika virus suggests that when a new pathogen emerges, the health-care community should be prepared for the worst-case scenario.5 therefore, recommendations for management of pregnant women at risk of sars-cov-2 infection are urgently needed. to this end, we propose a detailed management algorithm for health-care providers (appendix).
* The novel coronavirus (2019-ncov) has rapidly spread throughout china and across the world with more than 60,000 laboratory-confirmed cases. due to the current lack of specific treatment and the risk of transmission during the viral incubation period, infection prevention and control of 2019-ncov are both urgent and critical to global health. in this article, we aim to highlight the necessity of implementing protective measures, and recommend how to set proper emergency management plans for preventing and controlling nosocomial infection of 2019-ncov in dermatology departments.
* In december 2019, the 2019 novel coronavirus disease (covid-19) caused by sars-cov-2 emerged in china and now has spread in many countries. pregnant women are susceptible population of covid-19 which are more likely to have complications and even progresse to severe illness. we report a case of neonatal covid-19 infection in china with pharyngeal swabs tested positive by rrt-pcr assay 36 hours after birth. however, whether the case is a vertical transmission from mother to child remains to be confirmed.
* Since december 2019, the novel coronavirus (2019-ncov) infection has been prevalent in china. due to immaturity of immune function and the possibility of mother-fetal vertical transmission, neonates are particularly susceptible to 2019-ncov. the perinatal-neonatal departments should cooperate closely and take integrated approaches, and the neonatal intensive care unit should prepare the emergency plan for 2019-ncov infection as far as possible, so as to ensure the optimal management and treatment of potential victims. according to the latest 2019-ncov national management plan and the actual situation, the working group for the prevention and control of neonatal 2019-ncov infection in the perinatal period of the editorial committee of chinese journal of contemporary pediatrics puts forward recommendations for the prevention and control of 2019-ncov infection in neonates.

In [None]:
query = 'What is the risk of pregnancy complications in COVID-19 patients? What is the risk of COVID-19 in pregnant women? What is the risk for COVID-19 in neonates? What is the risk for secondary hospital-acquired infections among neonatal COVID-19 patients admitted to critical intensive care?'
df_output = get_answer(query, 15)
df_output = df_output.merge(df_train, how='left', on='tag')
df_output = df_output[['publish_time', 'authors', 'title', 'excerpt']]
df_output.to_csv('./pregnant_neonants.csv', index=False)

## Public health mitigation measures that could be effective for control. <a id= 5></a>
Key Insights from top 10 results:
* It is of course not realistic to shut down all work activities. however, work activities should be reduced to a minimum to reduce the number interpersonal contacts meaning that people should work from home when possible.
* As covid-19 continues to spread, better understanding how to contain it becomes critical.here, using methods we previously developed and the latest epidemiological parameters reported for covid-19, we compare the ability of individual quarantine and active monitoring to reduce the effective reproductive number of covid-19 to below the critical threshold of one.
* The epidemiologic data were taken from an open source repository operated by the johns hopkins university center for systems science and engineering (jhu csse) [10] . the data regard the total number of cases, the recovered cases and the death cases. the number of infected can be estimated by the difference of the total number of cases minus the recovered and the death cases. as it can be seen the chinese outbreak has been almost suppressed, with the peak of infection around the 18 th of february. the italian outbreak is instead still in the fast-growing phase but with the number of deaths already close to the chinese one.
* In light of the covid-19 outbreak in china, a shortage of facemasks and other medical resources can considerably compromise the efficacy of public health measures. effective public health measures should also consider the adequacy and affordability of medical resources.
* Influenza viruses are able to survive on environmental surfaces, particularly hard surfaces, for periods of one to two days. infection can occur through contact with contaminated surfaces then infecting oneself or others by touching eyes, nose or mouth with contaminated hands. as such, regular cleaning and disinfection of surfaces should be undertaken during a suspected or confirmed outbreak to minimise the spread of influenza and other respiratory viruses.
* The coefficient that best controls the spread of infection in the rest of the cities, when this dispersion is by airlines, is the parameter d. therefore, surveillance at airports should be strengthened, with special emphasis on those connecting mexico directly or indirectly with asian countries. finally, this model shows that all these measures can only delay the arrival of sars-cov-2, but if it can be delayed long enough, it would be very important to have as much time as possible to establish the appropriate prevention and control measures.
* To ascertain whether earlier travel restrictions could have prevented the wide-spread increase in cases witnessed in late-january we constructed a simple forecasting model for covid-19. briefly, we forecast the cumulative number of cases in each chinese province by simply doubling the number of cumulative cases reported six days prior. for dates prior to jan. 28th and after feb 3rd, this naive forecast produces an accurate estimate of the cumulative number of cases in each province (fig. s4). however, the cumulative number of cases reported on jan 28th is poorly estimated using this model (fig. s4). in order to accurately forecast the number of cases on jan 28th, we must also include the relative amount of mobility out of wuhan into various provinces in the regression model. in fig. s4, we show how a model including only movement from wuhan on january 22nd fit to the residuals from fig. s4 is once again . this indicates that for any hope of success, movement restrictions must be prompt. [table s1; figure s1: a) dates of symptom onset before date of travel from wuhan. b) incubation period estimates and standard deviation.]

In [None]:
query = 'What public health measures should be taken at government level that could be effective for controling the spread of COVID-19? Also what precautionary measures should people use to avoid coming in contact with covid-19?'
df_output = get_answer(query, 15)
df_output = df_output.merge(df_train, how='left', on='tag')
df_output = df_output[['publish_time', 'authors', 'title', 'excerpt']]
df_output.to_csv('./public_health_mitigation.csv', index=False)

## Transmission dynamics of the virus, including the basic reproductive number, incubation period, serial interval, modes of transmission and environmental factors. <a id= 6></a>

Key insights from top 15 results:
* In december 2019, the 2019 novel coronavirus pneumonia (ncp, officially named coronavirus disease 2019(covid-19) by the world health organization) broke out in wuhan, hubei, and it quickly spread to the whole country and abroad. the situation was at stake. the sudden and serious covid-19 epidemic has brought us a lot of urgent problems. how to effectively control the spread of covid-19? when does the population infection rate rise to its peak? what will eventually be the number of infected patients? how to make early diagnosis? what effective antiviral drugs are available? how to effectively treat with existing drugs? can it successfully improve the survival rate of critically patients? in response to the above questions, we put forward corresponding suggestions and reflections from the perspective of the infectious clinician.
* Lancet 10.1016/s0140-6736(20)30360-3 (2020) the sars-cov-2 coronavirus produces the same clinical symptoms in pregnant women as it does other infected people, and there is currently no evidence for vertical transmission. the recent outbreak of covid-19 pneumonia, caused by sars-cov-2, has been declared a global public-health emergency by the world health organization. sars-cov-2 is highly infectious, and the disease can lead to death; however, it is unknown whether pregnant women have specific support needs after infection and, in particular, if there is a risk of vertical transmission to unborn children. chen et al. studied nine pregnant women with lab-confirmed covid-19 who were admitted to the zhongnan hospital of wuhan university. they found that their clinical symptoms were similar to those of non-pregnant adults and that there was no indication of vertical transmission to children, although the findings need to be confirmed in a larger study.
* Singapore advises that laboratory personnel should record their temperature twice a day, to laboratory measures (see table 1 )given the extraordinary fast spread of the disease and the pace of change in the information about it and guidelines on how to deal with various aspects of fighting it, one can only give general suggestions for the cytology laboratory's response.
* Locking down wuhan, a city of 11 million residents, was an unprecedented measure to contain the spread of the novel coronavirus. an important policy-relevant question is, then, how many covid-19 cases were actually prevented by the wuhan lockdown in china? to answer this question, we must estimate the counterfactual number of covid-19 cases that would have occurred in other cities in the absence of wuhan lockdown, which would, in turn,
* This is the first real-time study to estimate the evolving transmission potential of sars-cov-2 in singapore. our current findings point to temporary sustained transmission of sars-cov-2, with our most recent estimate of the effective reproduction number lying below the epidemic threshold.
* Our analyses showed that rapid diagnosis and isolation of infections based on covid-19 disease alone cannot control outbreaks of sars-cov-2, but that the addition of tracing and isolation of traced cases could in theory be successful (figure 2 ). in practice, however, the potential for containment will be seriously jeopardized by various delays and imperfections.
* The covid-19 outbreak is controllable in the foreseeable future if comprehensive and stringent control measures are taken. our prediction for the world cases is based on the assumption that other countries take effective control measures similar to china, and therefore should be cautiously optimistic.

In [None]:
query = 'What are the Transmission dynamics of the covid-19? What is the basic reproductive number of covid-19? What is the incubation period of covid-19? What is the serial interval of covid-19? What are different modes of transmission of covid-19 and does environmental factors play role in the transmision?'
df_output = get_answer(query, 15)
df_output = df_output.merge(df_train, how='left', on='tag')
df_output = df_output[['publish_time', 'authors', 'title', 'excerpt']]
df_output.to_csv('./transmission_dynamics_etc.csv', index=False)

## Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups. <a id= 7></a>

Key insights from top 15 results:
* Children comprise a special population whose immune response system is distinct from adults. therefore, pediatric patients infected with 2019-ncov have their own clinical features and therapeutic responses. herein, we formulate this recommendation for diagnosis and treatment of 2019-ncov infection in children which is of paramount importance for clinical practice.
* As mentioned in the literature review, the morbidity of covid-19 was reported as 0.9% among children age 0 -14 [1] . however, the clinical and epidemiological characteristics of paediatric patients haven't been determined clearly yet. so far, this is the largest case series to present the clinical and epidemiological characteristics in children with covid-19, as well as the first study to analyze the clinical features in . . . . 
* We model a susceptible-infectious-recovered framework (sirf) [6] to simulate the spread of sars-cov-2. the population is separated into those currently contributing to transmission (y, equation 1) and those not available for infection (z, equation 2). cumulative death counts (λ, equation 3) are obtained by considering that mortality occurs with probability θ, on a proportion of the population that is at risk of severe disease (ρ) among those already exposed (z); we consider the delay between the time of infection and of death (ψ) as a combination of incubation period and time to death after onset of symptoms. the small proportion of the population that is at risk of severe disease (ρ) is an aggregate model parameter, taking into consideration both a potentially lower risk of infection than the rest of the population, as well as the actual risk of severe disease.
* All the family members living with dialysis patients must follow all the precautions and regulations given to patients to prevent person-to-person and within family transmission of the covid-19, which include body temperature measurement, good personal hygiene, handwashing, and prompt reporting of potentially sick people.
* In late december 2019, a cluster of unexplained pneumonia cases has been reported in wuhan, china. a few days later, the causative agent of this mysterious pneumonia was identified as a novel coronavirus. this causative virus has been temporarily named as severe acute respiratory syndrome coronavirus 2 (sars-cov-2) and the relevant infected disease has been named as coronavirus disease 2019 (covid-19) by the world health organization respectively. the covid-19 epidemic is spreading in china and all over the world now. the purpose of this review is primarily to review the pathogen, clinical features, diagnosis, and treatment of covid-19, but also to comment briefly on the epidemiology and pathology based on the current evidences. all rights reserved. 
* In december 2019, the 2019 novel coronavirus disease (covid-19) caused by sars-cov-2 emerged in china and now has spread in many countries. pregnant women are susceptible population of covid-19 which are more likely to have complications and even progresse to severe illness. we report a case of neonatal covid-19 infection in china with pharyngeal swabs tested positive by rrt-pcr assay 36 hours after birth. however, whether the case is a vertical transmission from mother to child remains to be confirmed.


In [None]:
query = 'What do we know about the severity of disease among people of different age groups? Also what is the risk of fatality among symptomatic hospitalized patients and high-risk patient groups? Is covid-19 disease more severe in patients having some underlying disease? If so what are such diseases?'
df_output = get_answer(query, 15)
df_output = df_output.merge(df_train, how='left', on='tag')
df_output = df_output[['publish_time', 'authors', 'title', 'excerpt']]
df_output.to_csv('./severity_of_disease_etc.csv', index=False)
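
The cell above attaches paper metadata to the ranked excerpts with a left merge on `tag`. A left merge silently leaves metadata empty for any excerpt whose `tag` is missing from `df_train`, so it can be worth checking for that before saving. The following is a minimal sketch on toy data; the column names are taken from the cell above, while the example rows and the assumption that `get_answer` returns `tag` and `excerpt` columns are illustrative only.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's df_output and df_train.
df_output = pd.DataFrame({"tag": [0, 1, 7], "excerpt": ["a", "b", "c"]})
df_train = pd.DataFrame({
    "tag": [0, 1, 2],
    "publish_time": ["2020-03-01", "2020-03-05", "2020-03-09"],
    "authors": ["A", "B", "C"],
    "title": ["T0", "T1", "T2"],
})

# indicator=True adds a "_merge" column recording where each row came from.
merged = df_output.merge(df_train, how="left", on="tag", indicator=True)

# Rows marked "left_only" are excerpts with no matching metadata.
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(merged)} excerpts have no metadata")
```

If the count is non-zero, the affected rows would show empty `publish_time`, `authors` and `title` fields in the saved CSV.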

## Co-infections (determine whether co-existing respiratory/viral infections make the virus more transmissible or virulent) and other co-morbidities.<a id= 8></a>

Key insights from top 10 results:
* At the time of writing this article, the risk of coronavirus in india is extremely low. but that may change in the next few weeks. hence the following is recommended: (1) healthcare providers should take travel history of all patients with respiratory symptoms, and any international travel in the past 2 wks as well as contact with sick people who have travelled internationally; (2) they should set up a system of triage of patients with respiratory illness in the outpatient department and give them a simple surgical mask to wear.
* This observational study of the relationship between mtb infection and covid-19 pneumonia suggests that individuals with latent or active tb may be more susceptible to sars-cov-2 infection, and that covid-19 disease progression may be more rapid and severe. given that tb causes more deaths than any other infectious disease (1.45 million deaths and 10 million new cases . . .
* One of the reasons for underdetection. all human covs (hcovs) are mainly of zoonotic origin, and most likely originate from bats [9]. those involved in hunting and management of such wild animals are at high risk of infection, likely live in mountain or rural areas and are more likely to be undetected when having such an infection for various reasons.
* All staff and visitors entering the room of a person with a respiratory illness should wear a single use face mask for close contact, generally within one metre. for further information regarding mask use see section 5.2.1.2.2 single use face masks. gowns, gloves and protective eyewear need only be worn as per standard precautions, that is, if contact or splash with blood or body fluids is anticipated.
* The common (mild) cases were those who only had fever, respiratory symptoms, and pneumonia on chest radiography. severe cases need to meet one of the following criteria: (1) respiratory distress, rr >= 30/min; (2) resting blood oxygen saturation <= 93%; or (3) arterial blood oxygen partial pressure (pao2)/fio2 <= 300 mmhg. critical cases meet one of the following: (1) respiratory failure needing mechanical oxygenation; (2) shock; or (3) development of other organ failure, requiring intensive care unit (icu) care.
* Updates on the respiratory illness that has infected tens of thousands of people. scientists are concerned about a new virus that has infected tens of thousands of people and killed more than 2,000. the virus, which emerged in the chinese city of wuhan in december, is a coronavirus and belongs to the same family as the pathogen that causes severe acute respiratory syndrome, or sars. it causes a respiratory illness called covid-19, which can spread from person to person.
* Respiratory tract viral infection caused by viruses or bacteria is one of the most common diseases in humans worldwide, while those caused by emerging viruses, such as the novel coronavirus, 2019-ncov that caused the pneumonia outbreak in wuhan, china most recently, have posed great threats to global public health. identification of the causative viral pathogens of respiratory tract viral infections is important to select an appropriate treatment, save people's lives, stop the epidemics, and avoid unnecessary use of antibiotics. conventional diagnostic tests, such as the assays for rapid detection of antiviral antibodies or viral antigens, are widely used in many clinical laboratories. with the development of modern technologies, new diagnostic strategies, including multiplex nucleic acid amplification and microarray-based assays, are emerging. this review summarizes currently available and novel emerging diagnostic methods for the detection of common respiratory viruses, such as influenza virus, human respiratory syncytial virus (rsv), coronavirus, human adenovirus (hadv), and human rhinovirus (hrv). multiplex assays for simultaneous detection of multiple respiratory viruses are also described. it is anticipated that such data will assist researchers and clinicians to develop appropriate diagnostic strategies for timely and effective detection of respiratory virus infections.
* Based on the new coronavirus pneumonia prevention and control program (6th edition) published by the national health commission of china [7], those with one of the following pieces of laboratory evidence are considered confirmed covid-19 cases: (1) positive for sars-cov-2 nucleic acid by real-time reverse-transcription polymerase chain reaction (rt-pcr); (2) viral gene sequencing showing high homogeneity to the known sars-cov-2. real-time rt-pcr assays were performed following the protocol established by the who [8].
* A definitive diagnosis of 2019-ncov was acquired by real-time fluorescence-based rt-pcr. as shown in table 3, about 80% of the patients had normal or decreased white blood cell counts, and 72.3% (99/137) of the patients [...] figure 2 shows representative lung images of a patient in which lesions developed in multiple lobes, most of which were dense, and ground-glass opacity co-existed with consolidation or cord-like shadows.

In [None]:
query = 'Is covid-19 more transmissible in case a person carries some co-existing respiratory and viral infections or any of the other co-morbidities?'
df_output = get_answer(query, 15)
df_output = df_output.merge(df_train, how='left', on='tag')
df_output = df_output[['publish_time', 'authors', 'title', 'excerpt']]
df_output.to_csv('./co_infections_etc.csv', index=False)

## Socio-economic and behavioral factors to understand the economic impact of the virus and whether there were differences.<a id= 9></a>
Key insights from top 10 results:
* The infection by the new coronavirus (sars-cov-2) has taken the dimension of a pandemic, affecting more than 160 countries in a few weeks. in colombia, despite the implementation of the rules established by the national government, there is elevated concern both for mortality and for the limited capacity of the health system to respond effectively to the needs of infected patients. for colombia, assuming a case fatality rate among people infected with sars-cov-2 of 0.6% (average data from the information reported for latin american countries for march 18) (table 1), the number of deaths, in one or two weeks, could be 16 and 243, respectively. these estimates differ markedly from those documented in countries such as spain and italy, in which covid-19 case fatality rates exceed 8% (case of italy) and from the percentage of patients who have required intensive care, which has ranged from 9% to 11% of patients in mediterranean european countries. these differences could be explained by: a) the percentage of the population at risk (individuals older than 60 years); b) a higher epidemiological exposure to viral respiratory infections associated with more frequent exposure to them, due to geographic and climatic conditions; c) less spread of the virus by location in the tropical zone; and d) earlier preventive measures to contain the spread of sars-cov-2 infection. therefore, it is possible to establish that the situation in this country will be different from that in the european mediterranean and that colombia could have different endpoints from spain and italy.
* Here we replicate figure 1 in figures s4-s5 and show the variation in fatality rates with varying assumptions about the overall infection rate. japan has a relatively old population, south africa a younger population and the us is more evenly distributed. based on the age-specific mortality rates extracted from the italian data, we project how these different countries will experience deaths attributed to covid-19 by age and sex. the united kingdom, as of march 13 2020, stands out as one of the few european countries not to take stringent actions such as closing schools or stopping large public events [9]. in spite of the comparatively younger population of the uk, the bottom right panel illustrates that the uk could face similar numbers of covid-19 deaths as italy. due to age structure differences, the uk will likely have slightly fewer deaths of those 80+ in comparison to italy, but in the coming weeks is still likely to face considerable pressure on its healthcare system.
* Preventing social contacts and mass gatherings has been used worldwide in the response to reduce transmission of communicable diseases, including to reduce transmission of the coronavirus, sars-cov-2. as of march 2020, multiple countries have banned all gatherings of 1,000 people or more, with some countries such as the czech republic and the usa banning much smaller groups. given knowledge of transmission mechanisms, bringing together large numbers of people into the same space should prove conducive for the spread of close-contact infectious diseases. indeed, mass gatherings have been associated with outbreaks of communicable diseases such as measles [1], influenza [2] and meningitis [3], and public health agencies, including the world health organization (who), have specific guidance for preventing disease outbreaks at mass gatherings [4]. factors such as age of participant [1], zoonotic transmission and presence of animals [5], crowding [6, 7], lack of sanitation [7], location and event duration [6] are associated with the reporting of mass gathering-related outbreaks. despite the evidence of the importance of mass gatherings for disease transmission from intuition and individual outbreaks, the population-level impact of different mass gathering policies has not been established. while systematic reviews have identified outbreak reports involving mass gatherings [5, 6], the overall impact of mass gatherings could not be quantitatively assessed. a detailed modelling study of disease transmission in the state of georgia, usa, found that in extreme scenarios when 25% of the population participated in a 2-day long gathering shortly before the epidemic peak, peak prevalence could increase by up to 10% [8]. more realistic scenarios resulted in minimal population-level changes [8]. here, we use representative data on individuals' daily social contacts, including group contacts, to estimate the population attributable fraction (paf) due to mass gatherings and large numbers of contacts.
* Novel coronavirus, officially covid-19, has been detected since december 2019 and has become a global health concern today. according to statistics from vietnam’s ministry of health, until 13 february 2020, vietnam has had fifteen positive cases of covid-19, one of which is a 3-month-old baby (ministry of health, 2020). it is estimated that the covid-19 outbreak will reach its peak in the next ten days due to excessive worrying and wrong behaviors towards the virus (thu, 2020). in this letter, the author presents three noticeable issues based on the current situation in vietnam and the efforts that nurses should make.
* The rapid spread of covid-19 has revealed the need to understand how population dynamics interact with pandemics now and in the future. population ageing is currently more pronounced in wealthier countries, which mercifully may lessen the impact of this pandemic on poorer countries with weaker health systems but younger age structures. it is plausible that poor general health status and coinfections such as tuberculosis may still increase the danger of covid-19 among younger cases in these countries. thus far, the lower than expected number of cases detected in africa (despite extensive trade and travel links with china) suggests that the young age structure of the continent may be protective of severe and thus detectable cases, or that cases may be going undetected. beyond age structure, there are large sex differences in mortality that need to be understood (with men at higher risk), some of which may be accounted for by the stark differences in smoking rates by sex in asia. distributions of underlying comorbidities such as diabetes, hypertension and copd will likewise refine risk estimates. until these more nuanced data are available, the concentration of mortality risk in the oldest old ages remains one of the best tools we have to predict the burden of critical cases and thus more precise planning of availability of hospital beds, staff and other resources. at this moment, few countries are routinely releasing their covid-19 data with key demographic information such as age, sex, or comorbidities.

In [None]:

query = 'What are the economic impacts of covid-19 pandemic, what are different socio-economic and behavioral factors arised as a result of covid-19 that can affect economy? What is the difference between groups for risk for COVID-19 by education level? by income? by race and ethnicity? by contact with wildlife markets? by occupation? household size? for institutionalized vs. non-institutionalized populations (long-term hospitalizations, prisons)?'
df_output = get_answer(query, 15)
df_output = df_output.merge(df_train, how='left', on='tag')
df_output = df_output[['publish_time', 'authors', 'title', 'excerpt']]
df_output.to_csv('./economic_behavioral_factors.csv', index=False)
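
Each question in this notebook repeats the same four steps: rank excerpts, merge metadata, select columns, save. As a hedged sketch, these steps could be wrapped in one helper. The name `answer_to_csv`, the stub `fake_get_answer` and the demo frame are hypothetical and exist only so the example runs on its own; in the notebook, the real `get_answer` and `df_train` would be passed in instead.

```python
import os
import tempfile

import pandas as pd

# Columns kept in every saved result, as in the cells above.
COLUMNS = ["publish_time", "authors", "title", "excerpt"]


def answer_to_csv(query, path, get_answer, df_train, top_n=15):
    """Rank excerpts for `query`, attach paper metadata, and write a CSV."""
    df_output = get_answer(query, top_n)
    df_output = df_output.merge(df_train, how="left", on="tag")
    df_output = df_output[COLUMNS]
    df_output.to_csv(path, index=False)
    return df_output


# Hypothetical stand-in for the notebook's get_answer (assumed to return
# a DataFrame with "tag" and "excerpt" columns, best match first).
def fake_get_answer(query, top_n):
    ranked = pd.DataFrame({"tag": [1, 0], "excerpt": ["second", "first"]})
    return ranked.head(top_n)


# Hypothetical stand-in for df_train.
df_train_demo = pd.DataFrame({
    "tag": [0, 1],
    "publish_time": ["2020-03-01", "2020-03-05"],
    "authors": ["A", "B"],
    "title": ["T0", "T1"],
})

out_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
result = answer_to_csv("example query", out_path, fake_get_answer, df_train_demo)
```

Passing `get_answer` and `df_train` as arguments keeps the helper free of globals, so it can be exercised on toy data like this before being pointed at the trained model.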