![logo](https://drive.google.com/uc?id=1VrvlBTHH4D7xsrNp74wtLBamMZygG8Sy)

## Table of Contents
1. [Introduction](#introduction)
2. [Task Overview](#task_overview)
3. [Summary](#task_summary)
4. [Risk factors and respective studies](#risk_factors)
5. [Defining workflow](#task_workflow)
6. [Case Study: Heart Disease Risk Factors](#task_heart)
7. [Our Notebooks](#task_notebooks)
8. [Why we did what we did](#task_reason)
9. [Next Goals (June deadline)](#task_next)
10. [Daily calls](#task_calls)
11. [Appendix](#task_appendix)
12. [Credits](#task_credits)





# 1. Introduction <a id="introduction"></a>
This notebook was created through a collaborative effort of CoronaWhy.org, a multidisciplinary global community of volunteers. We present it to address the question **"What do we know about COVID-19 risk factors?"**


- Visit our [website](https://www.coronawhy.org) to learn more.
- Read our [story](https://medium.com/@arturkiulian/im-an-ai-researcher-and-here-s-how-i-fight-corona-1e0aa8f3e714).
- Visit our [main notebook](https://www.kaggle.com/arturkiulian/coronawhy-org-global-collaboration-join-slack) for historical context on how this community started.


# 2. Task Overview <a id="task_overview"></a>

A major topic of interest among researchers is the study of the various risk factors related to COVID-19. A risk factor is anything that increases the chance of infection or affects the severity or survival outcome of the infection. Many of the papers in the dataset study the severity and outcome of the infection, but their findings are not systematically documented in an easily searchable way.

The focus of this study is to extract scientific papers related to risk factors associated with viral diseases and to present them in a meaningful and easily accessible way, using a procedure that is automated as much as possible.

At the current stage, a semi-automated approach is implemented: papers are retrieved automatically and then reviewed manually. Importantly, only a small subset of papers is reviewed by hand, namely those identified as most likely to be relevant to a specific risk factor. This reduces the volume of papers for review from several thousand to roughly 100-200, making the review feasible in a much shorter timeframe.
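The selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the CoronaWhy implementation: the `KEYWORDS` list, the `count_keywords` and `rank_papers` helpers, and the toy `papers` corpus are hypothetical, and matching is plain case-insensitive whole-phrase counting with regular expressions. Each returned row mirrors a row of the tables below: title, matched keywords, and total number of keyword occurrences.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword list for the heart-disease factor; the phrases are
# borrowed from the Keyword/Ngram column of the tables in this notebook.
KEYWORDS = ["heart failure", "heart disease", "cardiac arrest"]


def count_keywords(text: str, keywords: List[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-phrase occurrences of each keyword in text."""
    lowered = text.lower()
    counts: Dict[str, int] = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[kw] = hits
    return counts


def rank_papers(papers: Dict[str, str], keywords: List[str],
                top_k: int = 20) -> List[Tuple[str, List[str], int]]:
    """Return (title, matched keywords, total occurrences) for the top_k papers.

    Papers with no match are dropped; the rest are sorted by total keyword
    occurrences, so only the most probable candidates go to manual review.
    """
    rows = []
    for title, text in papers.items():
        counts = count_keywords(text, keywords)
        if counts:
            rows.append((title, sorted(counts), sum(counts.values())))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_k]


if __name__ == "__main__":
    # Toy corpus (hypothetical): title -> full text
    papers = {
        "Paper A": "Heart failure and heart disease were common; heart failure worsened outcomes.",
        "Paper B": "No cardiac findings were reported.",
        "Paper C": "One patient had a cardiac arrest.",
    }
    for row in rank_papers(papers, KEYWORDS, top_k=2):
        print(row)
    # ('Paper A', ['heart disease', 'heart failure'], 3)
    # ('Paper C', ['cardiac arrest'], 1)
```

In this sketch, overlapping phrases (for example "congestive heart failure" and "heart failure") would each be counted if both were in the keyword list, which inflates the totals; how to handle such overlaps is a design choice.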

At the current stage, paper extraction is also limited to the following factors:
* Environmental: Pollution, Population Density, Humidity, Temperature
* Comorbidity: Heart diseases
* Demographics: Senior age
* Lifestyle: Smoking

The medical community identified the above risk factors as the most important. An extensive list of risk factors is provided in [section 4](#risk_factors) below and is the subject of a future extension of this study.


```python
from IPython.display import HTML

# Embed a Loom video in the notebook output
HTML('<iframe width="640" height="400" src="https://www.loom.com/embed/78d87335e2e0400aa31f77c2ee8876ca" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>')
```

# 3. Summary Page (findings) <a id="task_summary"></a>



## Evidence for the association of risk factors with COVID-19




### Heart Disease Risks (overview)
We identified 139 papers relevant to this risk factor across the CORD-19 dataset; here we present the top 20 by number of keyword occurrences.

**Visualization of n-grams**
![heart-n](https://drive.google.com/uc?id=1w6_SVtb7w6fIBWeZZCFK8aelSMlgzhO_)
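The n-gram visualizations summarize which short word sequences occur most often in the matched papers. Below is a minimal sketch of how such frequencies could be computed, assuming simple lowercase tokenization on letters and digits; the notebook's actual tokenizer, stop-word handling and plotting code are not shown here, and `tokenize`, `ngrams`, `top_ngrams` and its `must_contain` filter are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> Iterator[str]:
    """Yield every window of n consecutive tokens, joined by single spaces."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def top_ngrams(texts: Iterable[str], n: int, k: int = 10,
               must_contain: Optional[str] = None) -> List[Tuple[str, int]]:
    """Return the k most frequent n-grams across all texts.

    If must_contain is given, only n-grams containing that phrase (plain
    substring check) are counted.
    """
    counter: Counter = Counter()
    for text in texts:
        for gram in ngrams(tokenize(text), n):
            if must_contain is None or must_contain in gram:
                counter[gram] += 1
    return counter.most_common(k)


if __name__ == "__main__":
    abstracts = [
        "Heart failure and heart disease were frequent comorbidities.",
        "Congestive heart failure predicted a severe outcome.",
    ]
    print(top_ngrams(abstracts, n=2, k=1))  # [('heart failure', 2)]
```

The `must_contain` filter illustrates one way candidate phrases such as "between air pollution" or "air pollution is" (see the pollution table) could arise from raw trigrams; whether the notebook used this approach is an assumption.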


### Top 20 most relevant papers when it comes to heart disease as a risk factor:


| Risk Factor | Title   | Keyword/Ngram | No. of keyword occurrences | URL |
| ----- | ------------------------- | ------- | ----- | --------------- |
| heart risks | Trends in 30-Day Readmission Rates for Medicare and Non-Medicare Patients in the Era of the Affordable Care Act | ['heart failure'] | 11 | ['https://doi.org/10.1016/j.amjmed.2018.06.013'] | 
| heart risks | Effects of hypertension, diabetes and coronary heart disease on COVID-19 diseases severity: a systematic review and meta-analysis | ['heart disease'] | 11 | ['https://doi.org/10.1101/2020.03.25.20043133'] | 
| heart risks | ACR Appropriateness Criteria® on Acute Respiratory Illness | ['congestive heart failure', 'heart failure'] | 10 | ['https://doi.org/10.1016/j.jacr.2009.06.022'] | 
| heart risks | Design and implementation of needs-specific critical care response teams | ['heart disease', 'cardiac arrest', 'heart failure'] | 10 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4095497/'] | 
| heart risks | Prevalence of comorbidities in cases of Middle East respiratory syndrome coronavirus: a retrospective study | ['congestive heart failure', 'heart disease', 'heart failure'] | 9 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6518603/'] | 
| heart risks | Spontaneous pathology of the common marmoset (Callithrix jacchus) and tamarins (Saguinus oedipus, Saguinus mystax) | ['congestive heart failure', 'heart disease', 'heart failure'] | 8 | ['http://europepmc.org/articles/pmc2740810?pdf=render'] | 
| heart risks | High incidence of respiratory viruses in critically ill adult patients with respiratory failure | ['cardiac arrest', 'heart failure'] | 8 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3642395/'] | 
| heart risks | Clinical Manifestations, Laboratory Findings, and Treatment Outcomes of SARS Patients | ['congestive heart failure', 'cardiac arrhythmia', 'heart disease', 'heart failure'] | 8 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323212/'] | 
| heart risks | Soluble Angiotensin Converting Enzyme 2 in Human Heart Failure: Relation with Myocardial Function and Clinical Outcomes | ['heart failure'] | 7 | ['http://europepmc.org/articles/pmc3179261?pdf=render'] | 
| heart risks | Severe Morbidity and Mortality Associated With Respiratory Syncytial Virus Versus Influenza Infection in Hospitalized Older Adults | ['congestive heart failure', 'heart failure'] | 6 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6603263/'] | 
| heart risks | ESICM LIVES 2016: part one: Milan, Italy. 1-5 October 2016 | ['heart disease', 'cardiac arrest', 'heart failure'] | 6 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5042924/'] | 
| heart risks | Increase in methicillin-resistant Staphylococcus aureus acquisition and change in pathogen pattern associated with outbreaks of severe acute respiratory syndrome (SARS) | ['heart disease', 'cardiac arrest', 'heart failure'] | 6 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4099807/'] | 
| heart risks | Angiotensin Peptides and Nitric Oxide in Cardiovascular Disease | ['heart failure'] | 6 | ['http://europepmc.org/articles/pmc3771546?pdf=render'] | 
| heart risks | Human complement receptor type 1 (CR1) protein levels and genetic variants in chronic Chagas Disease | ['heart failure'] | 6 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765048/'] | 
| heart risks | Association between Serum Angiotensin-converting Enzyme 2 Level with Postoperative Morbidity and Mortality after Major Pulmonary Resection in Non-small Cell Lung Cancer Patients | ['heart disease'] | 5 | ['https://doi.org/10.1016/j.hlc.2013.12.013'] | 
| heart risks | Coronavirus and Other Respiratory Illnesses Comparing Older with Young Adults | ['heart disease'] | 5 | ['https://doi.org/10.1016/j.amjmed.2015.05.034'] | 
| heart risks | Myocarditis and idiopathic dilated cardiomyopathy | ['heart failure'] | 5 | ['https://doi.org/10.1016/s0002-9343(99)80164-8'] | 
| heart risks | Clinical and Laboratory Features of Severe Acute Respiratory Syndrome Vis-À-Vis Onset of Fever | ['congestive heart failure', 'heart failure'] | 5 | ['https://doi.org/10.1378/chest.126.2.509'] | 
| heart risks | The impact of CPR and AED training on healthcare professionals' self-perceived attitudes to performing resuscitation | ['cardiac arrest'] | 5 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3352321/'] | 
| heart risks | Management of the ACC/AHA Stage D Patient Cardiac Transplantation | ['heart disease', 'heart failure'] | 5 | ['https://doi.org/10.1016/j.ccl.2013.09.004'] | 





### Top 10 papers on heart disease as a risk factor (qualified by human input):

| factor | title | number of keyword occurrences | note on relevancy | URL |
|------|------|------|------|------|
| Heart Disease | 36th International Symposium on Intensive Care and Emergency Medicine: Brussels, Belgium. 15-18 March 2016 | 17 | Highlights of an annual emergency medicine symposium | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5493079/
| Heart Disease | Clinical Features, Severity, and Incidence of RSV Illness During 12 Consecutive Seasons in a Community Cohort of Adults ≥60 Years Old | 14 | "The relative risk of a serious outcome was significantly increased in persons aged ≥75 years (vs 60–64 years) and in those with chronic obstructive pulmonary disease or congestive heart failure." | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306566/
| Heart Disease | Pneumonia Pathogen Characterization Is an Independent Determinant of Hospital Readmission | 13 | "Other independent predictors of 90-day readmission for pneumonia patients were Charlson comorbidity score [includes heart disease] > 4, cirrhosis, and chronic kidney disease." | https://doi.org/10.1378/chest.14-2129
| Heart Disease | Network-based analysis of comorbidities risk during an infection: SARS and HIV case studies | 12 | Contains a passage on SARS: "Most of the deaths were attributed to complications related to sepsis, ARDS and multiorgan failure, which occurred commonly in the elderly for comorbidities [34]. Age and comorbidity (e.g. diabetes mellitus, heart disease) were consistently found to be significant independent predictors of various adverse outcomes in SARS [35]." | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4363349/
| Heart Disease | Prevalence of chronic comorbidities in dengue fever and West Nile virus: A systematic review and meta-analysis | 12 | One of the most suitable studies found so far: it specifically examines the prevalence of comorbidities such as heart disease in flavivirus infections. | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6039036/
| Heart Disease | A survey of UK acute clinicians' knowledge of personal protective requirements for infectious diseases and chemical, biological, and radiological warfare agents | 11 | study on PPE knowledge, unrelated to heart disease | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4472707/
| Heart Disease | Human coronavirus ocurrence in different populations of Sao Paulo: A comprehensive nine-year study using a pancoronavirus RT-PCR assay | 11 | direct risk factor check | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3804219/
| Heart Disease | Lower airway sampling greatly increases detection of respiratory viruses in critically ill patients: the COURSE study | 11 | Study compared tracheal aspirates with nasopharyngeal swabs for viral sampling. Note: the attached PDF is from an international symposium on intensive care and emergency medicine and includes many studies. | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4069511/
| Heart Disease | Trends in 30-Day Readmission Rates for Medicare and Non-Medicare Patients in the Era of the Affordable Care Act | 11 | Study is on hospital re-admission rates for those on Medicare | https://doi.org/10.1016/j.amjmed.2018.06.013
| Heart Disease | Effects of hypertension, diabetes and coronary heart disease on COVID-19 diseases severity: a systematic review and meta-analysis | 11 | Effect of heart disease on COVID19 severity | https://doi.org/10.1101/2020.03.25.20043133





### Age (overview)
We identified 77 papers relevant to this risk factor across the CORD-19 dataset; here we present the top 20 by number of keyword occurrences.

**Visualization of 2-grams**
![age-2grams](https://drive.google.com/uc?id=1Jy5bKcINnV1lydRoqXM4WuK6p_NRtoge)

**Visualization of 3-grams**
![age-3grams](https://drive.google.com/uc?id=13oaIS-xpzqZ-eTCqpW4qVaBbljzsIwfd)


### Top 20 most relevant papers when it comes to age as a risk factor:

| Risk Factor | Title   | Keyword/Ngram | No. of keyword occurrences | URL |
| ----- | ------------------------- | ------- | ----- | --------------- |
| age | Estimates of the severity of coronavirus disease 2019: a model-based analysis | ['60 years and over', 'older age group'] | 7 | https://doi.org/10.1016/s1473-3099(20)30243-7 | 
| age | Burden, seasonal pattern and symptomatology of acute respiratory illnesses with different viral aetiologies in children presenting at outpatient clinics in Hong Kong | ['older age group'] | 5 | https://doi.org/10.1016/j.cmi.2015.05.027 | 
| age | Rhinitis, Asthma and Respiratory Infections among Adults in Relation to the Home Environment in Multi-Family Buildings in Sweden | ['65 years old'] | 5 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138153/ | 
| age | Clinical Features, Severity, and Incidence of RSV Illness During 12 Consecutive Seasons in a Community Cohort of Adults ≥60 Years Old | ['60 years old'] | 5 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306566/ | 
| age | Perception of epidemic's related anxiety in the General French Population: a cross-sectional study in the Rhône-Alpes region | ['over 60 years', '60 years old', 'among the elderly'] | 4 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874530/ | 
| age | The aging lung | ['aging population'] | 4 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3825547/ | 
| age | Seasonal influenza vaccine effectiveness against laboratory-confirmed influenza in 2015–2016: a hospital-based test-negative case–control study in Lithuania | ['65 years old'] | 4 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5652622/ | 
| age | Infections in travellers returning to Turkey from the Arabian peninsula: a retrospective cross-sectional multicenter study | ['65 years old'] | 3 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087946/ | 
| age | XXIV World Allergy Congress 2015: Seoul, Korea. 14-17 October 2015 | ['60 years old'] | 3 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896250/ | 
| age | New approach to identifying proper thresholds for a heat warning system using health risk increments | ['above 65 years'] | 3 | https://doi.org/10.1016/j.envres.2018.12.059 | 
| age | Particulate air pollution on cardiovascular mortality in the tropics: impact on the elderly | ['65 years old', 'among the elderly'] | 3 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6471752/ | 
| age | Open drug discovery for the Zika virus | ['over 60 years'] | 3 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4841202/ | 
| age | Influenza-associated Deaths in Tropical Singapore | ['among the elderly'] | 2 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3293465/ | 
| age | The Tsinghua–Lancet Commission on Healthy Cities in China: unlocking the power of cities for a healthy China | ['over 60 years', '60 years old'] | 2 | https://doi.org/10.1016/s0140-6736(18)30486-0 | 
| age | Comparison of Rates of Hospitalization Between Single and Dual Virus Detection in a Mexican Cohort of Children and Adults With Influenza-Like Illness | ['60 years old'] | 2 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6824528/ | 
| age | International Society for Disease Surveillance Conference 2011: Building the Future of Public Health Surveillance: Building the Future of Public Health Surveillance | ['65 years old'] | 2 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261719/ | 
| age | ESICM LIVES 2016: part one: Milan, Italy. 1-5 October 2016 | ['patients older than'] | 2 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5042924/ | 
| age | The Incidence of Respiratory Tract Infection in Adults Requiring Hospitalization for Asthma | ['patients older than'] | 2 | https://doi.org/10.1378/chest.112.3.591 | 
| age | Increase in methicillin-resistant Staphylococcus aureus acquisition and change in pathogen pattern associated with outbreaks of severe acute respiratory syndrome (SARS) | ['patients older than'] | 2 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4099807/ | 
| age | Delays in Service for Non-Emergent Patients Due to Arrival of Emergent Patients in the Emergency Department: A Case Study in Hong Kong | ['over 65 years', '65 years old'] | 2 | https://doi.org/10.1016/j.jemermed.2012.11.102 | 




### Top 10 papers on age as a risk factor (qualified by human input):


| factor | title | number of keyword occurrences | note on relevancy | URL |
|------|------|------|------|------|
| Old Age | Estimates of the severity of coronavirus disease 2019: a model-based analysis | 7 | Substantially higher ratios in older age groups (0·32% [0·27–0·38] in those aged <60 years vs 6·4% [5·7–7·2] in those aged ≥60 years) | https://doi.org/10.1016/s1473-3099(20)30243-7
| Old Age | Burden, seasonal pattern and symptomatology of acute respiratory illnesses with different viral aetiologies in children presenting at outpatient clinics in Hong Kong | 5 | Co-detection of more than one virus was, overall, more frequent in the younger age group than in the older age group (5.6% vs. 2.8%) | https://doi.org/10.1016/j.cmi.2015.05.027
| Old Age | Rhinitis, Asthma and Respiratory Infections among Adults in Relation to the Home Environment in Multi-Family Buildings in Sweden | 5 |  | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138153/
| Old Age | Clinical Features, Severity, and Incidence of RSV Illness During 12 Consecutive Seasons in a Community Cohort of Adults ≥60 Years Old | 5 | The relative risk of a serious outcome was significantly increased in persons aged ≥75 years (vs 60–64 years) | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306566/
| Old Age | Perception of epidemic's related anxiety in the General French Population: a cross-sectional study in the Rhône-Alpes region | 4 |  | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874530/
| Old Age | The aging lung | 4 | COPD has the highest prevalence in the elderly and deserves special consideration in regard to treatment in this fragile population | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3825547/
| Old Age | Seasonal influenza vaccine effectiveness against laboratory-confirmed influenza in 2015–2016: a hospital-based test-negative case–control study in Lithuania | 4 |  | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5652622/
| Old Age | Infections in travellers returning to Turkey from the Arabian peninsula: a retrospective cross-sectional multicenter study | 3 |  | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087946/
| Old Age | XXIV World Allergy Congress 2015: Seoul, Korea. 14-17 October 2015 | 3 |  | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896250/



### Humidity (overview)
We identified 126 papers relevant to this risk factor across the CORD-19 dataset; here we present the top 20 by number of keyword occurrences.


**Visualization of 2-grams**
![humidity-2grams](https://drive.google.com/uc?id=1BZyRD2QhdydS_bb2zqP0VhVHF69E6jpp)

**Visualization of 3-grams**
![humidity-3grams](https://drive.google.com/uc?id=16GjTMqZ1fcGSlTB1yZ_ACAQvHmipvzsU)


### Top 20 most relevant papers when it comes to humidity as a risk factor:

| Risk Factor | Title                                                        | Keyword/Ngram                                | No. of keyword occurrences | URL                                                       |
| ----------- | ------------------------------------------------------------ | -------------------------------------------- | ------------------------ | --------------------------------------------------------- |
| humidity    | Association between viral seasonality and meteorological factors | ['humidity']                                 | 66                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6353886/'] |
| humidity    | Large-scale Lassa fever outbreaks in Nigeria: quantifying the association between disease reproduction number and local rainfall | ['rainy', 'rainfall']                        | 52                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7019145/'] |
| humidity    | Large-scale Lassa fever outbreaks in Nigeria: quantifying the association between disease reproduction number and local rainfall | ['rainfall']                                 | 39                       | ['https://doi.org/10.1101/602706']                        |
| humidity    | Weather-Dependent Risk for Legionnaires’ Disease, United States | ['humidity', 'rainfall']                     | 27                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5652433/'] |
| humidity    | Long-Term Prediction of Emergency Department Revenue and Visitor Volume Using Autoregressive Integrated Moving Average Model | ['humidity', 'rainfall']                     | 26                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3235663/'] |
| humidity    | Effect modification of environmental factors on influenza-associated mortality: a time-series study in two Chinese cities | ['humidity']                                 | 24                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3265445/'] |
| humidity    | The Effects of Temperature and Relative Humidity on the Viability of the SARS Coronavirus | ['humidity']                                 | 20                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3265313/'] |
| humidity    | Effect of temperature and relative humidity on ultraviolet (UV254) inactivation of airborne porcine respiratory and reproductive syndrome virus | ['humidity']                                 | 17                       | ['https://doi.org/10.1016/j.vetmic.2012.03.044']          |
| humidity    | RNA viruses in community-acquired childhood pneumonia in semi-urban Nepal; a cross-sectional study | ['humidity', 'monsoon', 'rainy', 'rainfall'] | 16                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2727531/'] |
| humidity    | The role of absolute humidity on transmission rates of the COVID-19 outbreak | ['humidity']                                 | 14                       | ['https://doi.org/10.1101/2020.02.12.20022467']           |
| humidity    | Short Term Effects of Weather on Hand, Foot and Mouth Disease | ['rainfall']                                 | 13                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3037951/'] |
| humidity    | Decline in temperature and humidity increases the occurrence of influenza in cold climate | ['humidity']                                 | 12                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3978084/'] |
| humidity    | Exploration of the effects of classroom humidity levels on teachers’ respiratory symptoms | ['humidity']                                 | 12                       | ['http://europepmc.org/articles/pmc4873430?pdf=render']   |
| humidity    | Respiratory viral infections and effects of meteorological parameters and air pollution in adults with respiratory symptoms admitted to the emergency room | ['humidity', 'rainfall']                     | 12                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177797/'] |
| humidity    | Seasonal evaluation of bioaerosols from indoor air of residential apartments within the metropolitan area in South Korea | ['humidity']                                 | 11                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087851/'] |
| humidity    | Prevalence, clinical outcomes and rainfall association of acute respiratory infection by human metapneumovirus in children in Bogotá, Colombia | ['rainy', 'rainfall']                        | 11                       | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6785857/'] |
| humidity    | Climate affects global patterns of COVID-19 early outbreak dynamics | ['humidity']                                 | 10                       | ['https://doi.org/10.1101/2020.03.23.20040501']           |
| humidity    | A climatologic investigation of the SARS-CoV outbreak in Beijing, China | ['humidity']                                 | 10                       | ['https://doi.org/10.1016/j.ajic.2005.12.006']            |
| humidity    | Risk Distribution of Human Infections with Avian Influenza H7N9 and H5N1 virus in China | ['humidity']                                 | 9                        | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4686887/'] |






### Pollution (overview)
We identified 15 papers relevant to this risk factor across the CORD-19 dataset; here we present all 15, ranked by number of keyword occurrences.


**Visualization of 2-grams**
![pollution-2grams](https://drive.google.com/uc?id=1OuZbEiwvhWZjRIm-0H7Lv2_aUlpsNyQv)





### Top 15 most relevant papers when it comes to pollution as a risk factor:

| Risk Factor | Title   | Keyword/Ngram | No. of keyword occurrences | URL |
| ----- | ------------------------- | ------- | ----- | --------------- |
| pollution | Air pollution and case fatality of SARS in the People's Republic of China: an ecologic study | ['air pollution and', 'between air pollution', 'of air pollution'] | 9 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC293432/'] | 
| pollution | Risk factors for severe acute lower respiratory infections in children – a systematic review and meta-analysis | ['indoor air pollution'] | 5 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3641871/'] | 
| pollution | Comparison of the Effects of Air Pollution on Outpatient and Inpatient Visits for Asthma: A Population-Based Study in Taiwan | ['air pollution and', 'air pollution is', 'between air pollution', 'of air pollution'] | 4 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4006842/'] | 
| pollution | Particulate air pollution on cardiovascular mortality in the tropics: impact on the elderly | ['of air pollution', 'particulate air pollution', 'air pollutant data'] | 4 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6471752/'] | 
| pollution | Chapter 9 Environmental and Occupational Health | ['water pollution'] | 3 | ['https://doi.org/10.1016/b978-0-12-415766-8.00009-4'] | 
| pollution | Seasonality, ambient temperatures and hospitalizations for acute exacerbation of COPD: a population-based study in a metropolitan area | ['air pollution and', 'between air pollution'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431472/'] | 
| pollution | Characterization of respiratory infection viruses in hospitalized children from Naples province in Southern Italy | ['air pollution and', 'between air pollution'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5958661/'] | 
| pollution | Rapid health transition in China, 1990–2010: findings from the Global Burden of Disease Study 2010 | ['household air pollution'] | 2 | ['https://doi.org/10.1016/s0140-6736(13)61097-1'] | 
| pollution | XXIV World Allergy Congress 2015: Seoul, Korea. 14-17 October 2015 | ['between air pollution', 'of air pollution'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896250/'] | 
| pollution | Burden of lower respiratory infections in the Eastern Mediterranean Region between 1990 and 2015: findings from the Global Burden of Disease 2015 study | ['household air pollution'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5973986/'] | 
| pollution | Investigation of the performance of TiO2 photocatalytic coatings | ['indoor air pollutants'] | 1 | ['https://doi.org/10.1016/j.cej.2010.11.061'] | 
| pollution | Quantifying risks and interventions that have affected the burden of lower respiratory infections among children younger than 5 years: an analysis for the Global Burden of Disease Study 2017 | ['household air pollution'] | 1 | ['https://doi.org/10.1016/s1473-3099(19)30410-4'] | 
| pollution | Public awareness of risk factors for cancer among the Japanese general population: A population-based survey | ['air pollution and'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1351169/'] | 
| pollution | Size-Segregated Particle Number Concentrations and Respiratory Emergency Room Visits in Beijing, China | ['of air pollution'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3080933/'] | 
| pollution | The impact of ambient fine particles on influenza transmission and the modification effects of temperature in China: A multi-city study | ['air pollution is'] | 1 | ['https://doi.org/10.1016/j.envint.2016.10.004'] | 


### Top 10 papers on pollution as a risk factor (qualified by human input):

| factor | title | number of keyword occurrences | note on relevancy | URL |
|------|------|------|------|------|
| Pollution |  | 9 | Reductions in household air pollution globally were significantly associated with declines in under-5 lower respiratory infection mortality rates between 1990 and 2017 (Global Burden of Disease Study) | 
| Pollution |  | 4 | Irrelevant: a study of public awareness of the risk factors for cancer | 
| Pollution |  | 4 | "Associations of respiratory Emergency Room Visits (ERV) with NO2 concentrations and 100–1,000 nm particle number or surface area concentrations were of similar magnitude—that is, approximately 5% increase in respiratory ERV" "Particles < 50 nm were not positively associated with ERV, whereas particles 50–100 nm were adversely associated with respiratory ERV, both being fractions of ultrafine particles" in Beijing. ERVs concerned URIs, pneumonia, acute bronchitis and LRIs. | 
| Pollution |  | 3 | No significant association was found between air pollution and the number of hospitalisations from COPD exacerbations; however, ambient temperature was found to have a significant association | 
| Pollution | Seasonality, ambient temperatures and hospitalizations for acute exacerbation of COPD: a population-based study in a metropolitan area | 2 | Irrelevant: study investigates the performance of a catalyst | 
| Pollution |  | 2 | Irrelevant: a book chapter rather than a study | 
| Pollution |  | 2 | Hypothesises an association between pollution and acute respiratory infections in children, but does not measure it | 
| Pollution |  | 2 | "Ambient particulate matter concentrations with aerodynamic diameter <2.5 μm (PM2.5) was found significantly associated with influenza incidence at lag 2–3 days, with RR of 1.020. The RR of influenza transmission associated with PM2.5 was higher for cold compared with hot days. Overall, 10.7% of incident influenza cases may result from exposure to ambient PM2.5 in China"" | 
| Pollution | Investigation of the performance of TiO2 photocatalytic coatings | 1 | Indoor air pollution was found to have significant association with acute lower respiratory infections in children (OR= 1.57) | 
| Pollution |  | 1 | Ambient air pollution and household air pollution were the fourth and fifth leading risks of the age-standardised DALY rate in China during 2010 (Global Burden of Disease Study) | 



### Temperature (overview)
We identified 624 papers relevant to this risk factor across the CORD-19 dataset; here we present the top 20, ranked by number of keyword occurrences.
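The ranking relies on counting keyword occurrences per paper. Below is a minimal sketch of that step, assuming each paper is a dict with hypothetical `title` and `text` fields and using a subset of the temperature keywords listed in the table below; it is an illustration, not the notebook's actual code. It matches whole words only, whereas the table suggests the original matching may have been substring-based ('tropic' is listed for a paper about *amphotropic* retroviral vectors).

```python
import re
from collections import Counter

# Subset of the temperature keywords shown in the table below (assumption:
# the real keyword list used by the notebook may differ).
TEMPERATURE_KEYWORDS = ["summer", "winter", "spring", "autumn",
                        "weather", "climate", "tropic", "temperate"]


def count_keywords(text, keywords):
    """Count case-insensitive, whole-word occurrences of each keyword."""
    lowered = text.lower()
    counts = Counter()
    for kw in keywords:
        # \b word boundaries stop 'tropic' from matching inside 'amphotropic'
        hits = re.findall(r"\b" + re.escape(kw.lower()) + r"\b", lowered)
        if hits:
            counts[kw] = len(hits)
    return counts


def rank_papers(papers, keywords, top_n=20):
    """Return (title, matched keywords, total occurrences), highest total first."""
    scored = []
    for paper in papers:
        counts = count_keywords(paper["text"], keywords)
        total = sum(counts.values())
        if total:  # papers with no keyword hits are dropped
            scored.append((paper["title"], sorted(counts), total))
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:top_n]


# Hypothetical toy corpus
papers = [
    {"title": "Paper A", "text": "Cases peak in winter; winter climate matters."},
    {"title": "Paper B", "text": "Amphotropic retroviral vectors were used."},
]
ranking = rank_papers(papers, TEMPERATURE_KEYWORDS)
print(ranking)
```

Whole-word matching trades some recall (plurals and other inflections are missed) for fewer false positives; shortlisted papers still go through manual review either way.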


**Visualization of 2-grams**
![temperature-2grams](https://drive.google.com/uc?id=18NY5gF51W_v0-DOQMC0Nx2BuUEUWU9kc)

**Visualization of 3-grams**
![temperature-3grams](https://drive.google.com/uc?id=1Y8UrwZ9mO33ty0F5HvT5K3aKjz1XpvHz)
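The 2-gram and 3-gram visualizations summarize the most frequent word sequences in the retrieved papers. Here is a minimal sketch of how such counts could be computed, assuming plain lowercase tokenization with no stop-word removal or lemmatization (the notebook's actual preprocessing is not shown here). The resulting counts could then be passed to any plotting library.

```python
import re
from collections import Counter


def ngram_counts(texts, n):
    """Count contiguous n-word sequences across a list of texts."""
    counts = Counter()
    for text in texts:
        # Simple tokenizer: lowercase alphanumeric runs, punctuation dropped
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical snippets standing in for paper full texts
texts = [
    "Ambient temperature affects influenza transmission.",
    "Low ambient temperature increased influenza transmission in winter.",
]
bigrams = ngram_counts(texts, 2)
trigrams = ngram_counts(texts, 3)
print(bigrams.most_common(2))
```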



### Top 20 most relevant papers when it comes to temperature as a risk factor:

| Risk Factor | Title | Keyword/Ngram | No. of keyword occurrences | URL |
| ----- | ------------------------- | ------- | ----- | --------------- |
| temperature | Climate Change and Human Health Impacts in the United States: An Update on the Results of the U.S. National Assessment | ['summer', 'winter', 'weather', 'climate'] | 97 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1570072/'] | 
| temperature | The 1918–1919 influenza pandemic in England and Wales: spatial patterns in transmissibility and mortality impact | ['winter', 'autumn'] | 72 | ['http://europepmc.org/articles/pmc2596813?pdf=render'] | 
| temperature | Seasonal Variation of Newly Notified Pulmonary Tuberculosis Cases from 2004 to 2013 in Wuhan, China | ['tropic', 'summer', 'winter', 'spring', 'autumn', 'climate'] | 69 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4193739/'] | 
| temperature | Variability and Diversity of Nasopharyngeal Microbiota in Children: A Metagenomic Analysis | ['winter', 'spring'] | 62 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3046172/'] | 
| temperature | The Incidence of Respiratory Tract Infection in Adults Requiring Hospitalization for Asthma | ['summer', 'winter', 'spring', 'autumn'] | 48 | ['https://doi.org/10.1378/chest.112.3.591'] | 
| temperature | The prevalence of preterm birth and season of conception | ['summer', 'winter', 'spring', 'autumn'] | 40 | ['http://europepmc.org/articles/pmc4288966?pdf=render'] | 
| temperature | Laboratory epidemiology of respiratory viruses in a large children's hospital: A STROBE-compliant article | ['summer', 'winter', 'spring', 'autumn'] | 39 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6078760/'] | 
| temperature | Performance of cows and summer-born calves and economics in semi-confined and confined beef systems | ['summer', 'winter'] | 39 | ['https://doi.org/10.15232/aas.2019-01858'] | 
| temperature | Challenges in developing methods for quantifying the effects of weather and climate on water-associated diseases: A systematic review | ['weather', 'climate'] | 39 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5481148/'] | 
| temperature | Climate effect on COVID-19 spread rate: an online surveillance tool | ['climate'] | 38 | ['https://doi.org/10.1101/2020.03.26.20044727'] | 
| temperature | The threat of climate change to non-dengue-endemic countries: increasing risk of dengue transmission potential using climate and non-climate datasets | ['tropic', 'summer', 'winter', 'climate'] | 31 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6625070/'] | 
| temperature | Apical barriers to airway epithelial cell gene transfer with amphotropic retroviral vectors | ['tropic'] | 29 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7091907/'] | 
| temperature | Seasonal evaluation of bioaerosols from indoor air of residential apartments within the metropolitan area in South Korea | ['summer', 'winter', 'spring', 'autumn'] | 26 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087851/'] | 
| temperature | Seasonal variation of respiratory pathogen colonization in asymptomatic health care professionals: A single-center, cross-sectional, 2-season observational study | ['summer', 'winter'] | 26 | ['https://doi.org/10.1016/j.ajic.2015.04.195'] | 
| temperature | Global networks and global change-induced tipping points | ['climate'] | 25 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7104618/'] | 
| temperature | Respiratory viruses are continuously detected in children with chronic tonsillitis throughout the year | ['summer', 'winter', 'spring', 'autumn'] | 25 | ['https://doi.org/10.1016/j.ijporl.2014.07.015'] | 
| temperature | Effects of school breaks on influenza-like illness incidence in a temperate Chinese region: an ecological study from 2008 to 2015 | ['temperate', 'summer', 'winter'] | 24 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353286/'] | 
| temperature | Transmission Potential of Chikungunya Virus and Control Measures: The Case of Italy | ['temperate', 'tropic', 'summer', 'climate'] | 24 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3086881/'] | 
| temperature | Climate affects global patterns of COVID-19 early outbreak dynamics | ['temperate', 'climate'] | 23 | ['https://doi.org/10.1101/2020.03.23.20040501'] | 
| temperature | Molecular detection of bovine coronavirus in a diarrhea outbreak in pasture-feeding Nellore steers in southern Brazil | ['cold weather', 'tropical weather', 'tropic', 'summer', 'winter', 'weather'] | 23 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7088806/'] | 

_______

### Top 10 papers on temperature as a risk factor (qualified by human input):

| factor | title | number of keyword occurrences | note on relevancy | URL |
|------|------|------|------|------|
| Air Temp | Evidence of human coronavirus HKU1 and human bocavirus in Australian children | 15 |  | https://doi.org/10.1016/j.jcv.2005.09.008
| Air Temp | Investigation of the performance of TiO2 photocatalytic coatings | 7 | laboratory conditions for plating | https://doi.org/10.1016/j.cej.2010.11.061
| Air Temp | Climate affects global patterns of COVID-19 early outbreak dynamics | 23 | climate effects | https://doi.org/10.1101/2020.03.23.20040501
| Air Temp | Modeling respiratory illnesses with change point: A lesson from the SARS epidemic in Hong Kong | 6 |  | https://doi.org/10.1016/j.csda.2012.07.029
| Air Temp | The 12th Edition of the Scientific Days of the National Institute for Infectious Diseases “Prof. Dr. Matei Bals” and the 12th National Infectious Diseases Conference: Bucharest, Romania. 23–25 November 2016 | 3 | whole journal printing | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103241/
| Air Temp | Unique and Conserved Features of Genome and Proteome of SARS-coronavirus, an Early Split-off From the Coronavirus Group 2 Lineage | 1 | not topic of discussion | https://doi.org/10.1016/s0022-2836(03)00865-9
| Air Temp | Mécanismes d’émergence virale et transmission interespèces : l’exemple des rétrovirus Foamy simiens chez l’Homme en Afrique Centrale | 7 | in French; temperature does not appear to be a topic | https://doi.org/10.1016/s0001-4079(19)31387-1
| Air Temp | Systemic Spread and Propagation of a Plant-Pathogenic Virus in European Honeybees, Apis mellifera | 11 | not a variable | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3903276/
| Air Temp | A study of the glycoproteins of Autographa californica nuclear polyhedrosis virus (AcNPV) | 3 | only abstract available, no mention of temperature | https://doi.org/10.1016/0042-6822(83)90548-2
| Air Temp | The epidemiology of hospitalized children with pneumococcal/lobar pneumonia and empyema from 1997 to 2004 in Taiwan | 11 | seasonality | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7086680/

### All papers on heart disease risk validated using crowdsourced medical input

| Risk Factor | Paper summary / key quote | Number of keyword occurrences | URL |
|------|------|------|------|
| Heart Disease | Risk factor of acquiring COVID19 | 11 | https://doi.org/10.1101/2020.03.07.20031393 | 
| Heart Disease | "Univariate analysis showed that a severe outcome was significantly more frequent for patients with comorbidity (OR = 3.9), including prematurity or congenital disease (such as heart disease or cerebral malformation)" | 8 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3771918/ | 
| Heart Disease |  | 6 | https://doi.org/10.1016/j.amjmed.2015.05.034 | 
| Heart Disease | Heart diseases were classified as rapidly or ultimately fatal underlying diseases for SARS patients and shown to be associated with high case fatality rates | 6 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323212/ | 
| Heart Disease | "presence of underlying diseases and congenital heart disease were more frequent in patients admitted to the PICU, and they also had a longer hospital length of stay, " p<0.05 | 4 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546210/ | 
| Heart Disease | analyses comorbidity of heart conditions with viruses | 4 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6603263/ | 
| Heart Disease | Impact of rotating levels of ACE in cardiac disease patients | 4 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7100072/ | 
| Heart Disease | Effect of heart disease on COVID19 severity | 4 | https://doi.org/10.1101/2020.03.25.20043133 | 
| Heart Disease | "The incidence of ... cardiac injury (65% vs. 5.9%), ... in patients who died was significantly higher than those who recovered (all p<0.001)" | 3 | https://doi.org/10.1101/2020.03.19.20033175 | 
| Heart Disease | "Among the comorbidities, DM, HTN, ischemic heart disease (IHD), congestive heart failure (CHF), ... showed significant associations with fatality" | 3 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6518603/ | 
| Heart Disease | Clinical outcomes of treatment for heart failure | 3 | http://europepmc.org/articles/pmc3179261?pdf=render | 
| Heart Disease | Chronic heart diseases had a statistically significant correlation with mortality amongst MERS patients | 3 | https://doi.org/10.1016/j.ejcdt.2015.11.011 | 
| Heart Disease | identified CHF as a risk factor for significant radiographic change on CXR | 2 | https://doi.org/10.1016/j.jacr.2009.06.022 | 
| Heart Disease | direct risk factor check | 2 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3804219/ | 
| Heart Disease | The bulk of death cases had comorbidity (76.8%), including hypertension (56.1%), heart disease (20.7%), diabetes (18.3%), cerebrovascular disease (12.2%), and cancer (7.3%) | 2 | https://doi.org/10.1101/2020.02.26.20028191 | 
| Heart Disease | Children with heart disease at risk of viral infections | 2 | https://doi.org/10.1016/j.ppedcard.2018.09.003 | 
| Heart Disease | "Regarding symptomatology, 185 cases" "Of the 122 cases (47%) with co-morbid conditions, the most common were hypertension (76 cases, 29%), diabetes (72 cases, 28%), heart disease (47 cases, 18%)" It connected with symptom risk but not mortality.  | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074457/ | 
| Heart Disease | "Results: 851 patients met study criteria; 268 (31.5%) with mild, 503 (59.1%) with moderate, and 80 (9.4%) with severe illness. As expected, illness severity was directly associated with young age, prematurity, heart or lung disease, infection with RSV group A, and elevated concentrations of interleukin (IL)-2R, IL-6, CXCL8, tumor necrosis factor (TNF)-α, interferon (IFN)-α, CCL3, CCL4, and CCL2." Patients were all under 5 years old. | 1 | http://europepmc.org/articles/pmc3883981?pdf=render | 
| Heart Disease | Flu season stats "Among patients with severe forms, there was a slight predominance of males over females (58% national data vs 51% in our population). The median age of these patients was 60 years, and almost 84% presented at least one pre-existing risk factor for developing a severe illness (diabetes, cancer, cardiovascular diseases, chronic respiratory diseases, immunosuppression)." | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6797893/ | 
| Heart Disease | "Lastly the study was conducted in a tertiary care hospital with a non-homogeneous sample of patients in which two thirds of the participants had at least one known risk factor for viral pneumonia, such as chronic lung disease (67%), congenital heart disease (29%), and neoplasia (27%), so the results obtained should be taken into account in this context and not necessarily generalized to all pediatric patients with HAP." | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6532533/ | 
| Heart Disease | patient with ischemic heart disease or HTN more likely to require hospitalization | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3522360/ | 
| Heart Disease | "Other factors associated with severe disease were: chronic heart disease" | 1 | https://doi.org/10.1111/1469-0691.12044 | 
| Heart Disease | Cardiac injury found to be correlated with increased risk for being a non-survivor.  | 1 | https://doi.org/10.1016/s2213-2600(20)30079-5 | 
| Heart Disease | "major comorbid diseases for hospitalized patients treated with IABP were identified: acute coronary syndrome (ACS), CS, heart failure, fetal cardiac arrhythmia, acute myocarditis, and valvular heart disease" | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4483178/ | 
| Heart Disease | evaluated heart disease as comorbidity | 1 | https://doi.org/10.1016/j.gene.2017.12.022 | 
| Heart Disease |  | 1 | http://europepmc.org/articles/pmc2740810?pdf=render | 
| Heart Disease | directly discussed heart disease as risk factor for death | 1 | https://doi.org/10.1016/j.jfma.2011.06.010 | 
| Heart Disease | "Common underlying medical conditions reported included hypertension (22%), diabetes (14%), and heart disease (8%)" | 1 | https://doi.org/10.1016/j.jcv.2017.01.010 | 
| Heart Disease | One of the most relevant studies found so far for this factor. Specifically looks at the prevalence of comorbidities like heart disease in flavivirus infections. | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6039036/ | 
| Heart Disease | Another highly relevant study. Looks specifically at risk factors for MERS coronavirus illness in humans. | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4696714/ | 
| Heart Disease | "The relative risk of a serious outcome was significantly increased in persons aged ≥75 years (vs 60–64 years) and in those with chronic obstructive pulmonary disease or congestive heart failure." | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306566/ | 
| Heart Disease | discussed ischemic heart disease comorbidity as risk factor for nonvaccination | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6068768/ | 
| Heart Disease | looked at risk of adverse pregnancy outcomes in women with/without pneumonia. heart disease was included as a variable. | 1 | https://doi.org/10.1016/j.ajog.2012.08.023 | 
| Heart Disease | heart disease was more common in case population than control population | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3667930/ | 
| Heart Disease | unsure on this one, but addresses relationship between comorbidities like heart disease and severity of COVID. "Our systems biology approach offers a possible explanation for increase of COVID-19 severity in patients with certain comorbidities" | 1 | https://doi.org/10.1101/2020.03.21.20040261 | 
| Heart Disease | Leading causes of death in a region of China | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2774646/ | 
| Heart Disease | A systematic analysis of 25 articles surrounding the Middle East respiratory syndrome (MERS), found heart disease to be a clinical predictor of death associated with MERS and to hold the highest OR compared to other predictors | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5129628/ | 
| Heart Disease | Congenital heart disease was significantly higher in child pneumonia cases as opposed to the control  | 1 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6999894/ | 



### Population density (overview)
We identified 28 papers relevant to this risk factor across the CORD-19 dataset; here we present the top 20, ranked by number of keyword occurrences.




**Visualization of 2-grams**
![population-density-2grams](https://drive.google.com/uc?id=17TQxh2uoEsf8lOh3q4CcVcwLFHluGF5O)

**Visualization of 3-grams**
![population-density-3grams](https://drive.google.com/uc?id=1L-rlwgXSzxPyVCYOBpukwbYK3p8HWv1x)




### Top 20 most relevant papers when it comes to population density as a risk factor:

| Risk Factor | Title | Keyword/Ngram | No. of keyword occurrences | URL |
| ----- | ------------------------- | ------- | ----- | --------------- |
| population density | Rhinitis, Asthma and Respiratory Infections among Adults in Relation to the Home Environment in Multi-Family Buildings in Sweden | ['densely populated', 'populated areas'] | 7 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138153/'] | 
| population density | Ascertaining the impact of public rapid transit system on spread of dengue in urban settings | ['densely populated', 'populated areas'] | 5 | ['https://doi.org/10.1016/j.scitotenv.2017.04.050'] | 
| population density | A geographic analysis of population density thresholds in the influenza pandemic of 1918–19 | ['population densities', 'densely populated'] | 3 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3641965/'] | 
| population density | Association of HLA class I with severe acute respiratory syndrome coronavirus infection | ['densely populated', 'populated regions'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC212558/'] | 
| population density | The effect of travel restrictions on the spread of a moderately contagious disease | ['densely populated', 'populated areas'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764026/'] | 
| population density | Identifying Live Bird Markets with the Potential to Act as Reservoirs of Avian Influenza A (H5N1) Virus: A Survey in Northern Viet Nam and Cambodia | ['densely populated', 'populated areas'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3366999/'] | 
| population density | Estimation of Time-Dependent Reproduction Numbers for Porcine Reproductive and Respiratory Syndrome across Different Regions and Production Systems of the US | ['densely populated', 'populated regions'] | 2 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5380673/'] | 
| population density | Spatiotemporal diffusion of influenza A (H1N1): Starting point and risk factors | ['populous'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6122785/'] | 
| population density | Seasonality, ambient temperatures and hospitalizations for acute exacerbation of COPD: a population-based study in a metropolitan area | ['densely populated'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431472/'] | 
| population density | Geographic Distribution and Risk Factors of the Initial Adult Hospitalized Cases of 2009 Pandemic Influenza A (H1N1) Virus Infection in Mainland China | ['populous'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3192122/'] | 
| population density | Efficacy and gastrointestinal risk of aspirin used for the treatment of pain and cold | ['population studies'] | 1 | ['https://doi.org/10.1016/j.bpg.2012.01.008'] | 
| population density | Transmission or Within-Host Dynamics Driving Pulses of Zoonotic Viruses in Reservoir–Host Populations | ['population densities'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4973921/'] | 
| population density | The threat of climate change to non-dengue-endemic countries: increasing risk of dengue transmission potential using climate and non-climate datasets | ['densely populated'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6625070/'] | 
| population density | Mapping road network communities for guiding disease surveillance and control strategies | ['population densities'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5856805/'] | 
| population density | Rapid changes in shape and number of MHC class II expressing cells in rat airways after Mycoplasma pulmonis infection | ['densely populated'] | 1 | ['https://doi.org/10.1016/s0008-8749(03)00026-1'] | 
| population density | Early real-time estimation of the basic reproduction number of emerging or reemerging infectious diseases in a community with heterogeneous contact pattern: Using data from Hong Kong 2009 H1N1 Pandemic Influenza as an illustrative example | ['densely populated'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570805/'] | 
| population density | Trends in Tuberculosis in Taiwan, 2002–2008 | ['populous'] | 1 | ['https://doi.org/10.1016/s0929-6646(11)60076-4'] | 
| population density | Cross sectional survey of human-bat interaction in Australia: public health implications | ['populous'] | 1 | ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908316/'] | 
| population density | Human rhinoviruses: The cold wars resume | ['populous'] | 1 | ['https://doi.org/10.1016/j.jcv.2008.04.002'] | 
| population density | Polio Eradication Initiative (PEI) contribution in strengthening public health laboratories systems in the African region | ['populous'] | 1 | ['https://doi.org/10.1016/j.vaccine.2016.05.055'] | 


_______

### Top 10 papers on population density as a risk factor (qualified by human input):

| factor | title | number of keyword occurrences | note on relevancy | URL |
|------|------|------|------|------|
| Population Density | Ascertaining the impact of public rapid transit system on spread of dengue in urban settings |  |  | https://doi.org/10.1016/j.scitotenv.2017.04.050
| Population Density | A geographic analysis of population density thresholds in the influenza pandemic of 1918–19 |  | Estimates a population density threshold above which policies to socially distance, redistribute or quarantine populations are likely to be more effective than in areas with population densities below the threshold. | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3641965/
| Population Density | Association of HLA class I with severe acute respiratory syndrome coronavirus infection |  | Areas with increased population density are more affected by SARS | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC212558/
| Population Density | Spatiotemporal diffusion of influenza A (H1N1): Starting point and risk factors |  | Population density represented a risk factor for disease intensity | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6122785/
| Population Density | Rhinitis, Asthma and Respiratory Infections among Adults in Relation to the Home Environment in Multi-Family Buildings in Sweden |  | correlation between population density and sinusitis | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138153/
| Population Density | Geographic Distribution and Risk Factors of the Initial Adult Hospitalized Cases of 2009 Pandemic Influenza A (H1N1) Virus Infection in Mainland China |  | discusses distribution of H1N1 cases in China, mentioning the association with high-density areas such as urban settings | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3192122/

# 4. Risk factors and respective studies <a id="risk_factors"></a>



## Groups of factors (in progress) 

It is natural to ask whether there is a taxonomy for classifying these risks. Even non-medical readers should be able to reason within the multi-dimensional space of risk factors we encounter in daily life, so we listed all the risk factors we could identify and grouped them. For most of these factors, we used the [thresher.io](https://thresher.io) tool to find what percentage of papers in the CORD-19 dataset mention each one.
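The "% of papers" columns in the tables that follow give the share of the corpus matched by each search query. thresher.io's own query engine is not reproduced here. The sketch below is a hypothetical stand-in that hand-translates two of the lifestyle queries into Python predicates (whole-word, case-insensitive matching, with no stemming and no proximity operators such as `~3`) and computes the matching count and percentage over a toy corpus.

```python
import re

# Terms many of the queries use to restrict matches to human subjects
HUMAN_TERMS = ["human", "man", "woman", "children", "infants",
               "person", "people", "patients"]


def has(text, term):
    """True if term occurs as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def smoking_query(text):
    # smoking OR cigarette
    return has(text, "smoking") or has(text, "cigarette")


def alcohol_query(text):
    # ("alcohol consumption" OR "alcohol intake" OR "alcohol use") AND (human OR ...)
    phrases = ["alcohol consumption", "alcohol intake", "alcohol use"]
    return (any(has(text, p) for p in phrases)
            and any(has(text, t) for t in HUMAN_TERMS))


def share_of_papers(texts, query):
    """Return (number of matching papers, percentage of the corpus)."""
    matches = sum(1 for text in texts if query(text))
    percent = 100.0 * matches / len(texts) if texts else 0.0
    return matches, percent


# Hypothetical toy corpus standing in for CORD-19 full texts
corpus = [
    "Smoking and alcohol use among hospitalized patients.",
    "Cigarette packaging regulations.",
    "Influenza in children.",
    "Alcohol intake in mice.",
]
smoking_share = share_of_papers(corpus, smoking_query)
alcohol_share = share_of_papers(corpus, alcohol_query)
print(smoking_share, alcohol_share)
```

Full-text search engines typically add their own tokenization, stemming and phrase-proximity handling, so counts from a sketch like this would not exactly reproduce the figures in the tables.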

### 1. Demographic Factors:

| Risk Factor | Search Query | Number of papers | % of papers |
|------|------|------|------|
| Age > 60 + underlying health conditions | "senior citizen" OR ("65 age"~3 OR "65 years") OR elderly | 3300 | 10% | 
| Men/women | (men AND women) OR (gender AND (NOT (cat OR pig))) | 2700 | 8% | 
| Neonates | (newborn OR neonate) AND (children OR babies) AND (NOT (mice OR rat)) | 974 | 3% | 
| Infants | infant | 3600 | 11% | 
| Race/Ethnicity | race OR ethnic OR (asian OR african OR hispanic OR or caucasian) | 6900 | 21% | 
| Sexual Orientation | "sexual orientation" OR homosexual OR transgender OR heterosexual | 250 | 1% | 
| Pregnant Women | (pregnant OR gestation OR trimester) AND (human OR man OR woman OR children OR infants OR people OR patients) | 2700 | 8% | 
| Family composition (elderly, children) | ("family composition" OR household) AND (human OR man OR woman OR children OR people OR patients) | 2300 | 7% | 
| Day Care Attendance | "day care" OR daycare OR ("child care" AND (center OR centre)) | 581 | 2% | 
| Crowding/Siblings | (crowding AND (household OR home)) OR (siblings AND (human OR man OR woman OR children OR infants  OR people OR person)) | 1200 | 4% | 

### 2. Diseases:

| Risk Factor | % of Papers mentioning |
|------|------|
|   Cardiac/Cerebrovascular Disease, CVD, Heart diseases, Chronic heart diseases  |  30% |
|   Acute Myocardial Injury, MI, Myocardial infarction, STEMI, NSTEMI  |  2% |
|   HTN, hypertension, high blood pressure  |  4% |
|   Cardiomyopathy, Heart Failure, LV dysfunction, RV dysfunction  |  30% |
|   Cardiac Arrhythmias, atrial fibrillation, AF, A fib  |  2% |
|   CAD, coronary artery disease  |  n/a |
|   Diabetes Mellitus, DM, DM type 1, DM type 2  |  4% |
|   Chronic pulmonary disease, chronic obstructive pulmonary disease, COPD  |  10% |
|   Asthma  |  7% |
|   Chronic kidney disease, CKD, end-stage kidney disease, ESKD  |  1% |
|   Obesity, BMI ≥ 28 kg/m²  |  5% |
|   HIV, immunodeficiency, immunosuppressive medication, immunocompromised groups  |  36% |
|   Chronic liver disease, liver cirrhosis, cirrhosis  |  3% |
|   Cancer, chemotherapy  |  19% |
|   Neurological conditions: Parkinson, motor neurone disease, sclerosis, cerebral palsy  |  4% |

### 3. Environmental Factors:

| Risk Factor | Search Query | Number of papers | % of papers |
|------|------|------|------|
| Climate | climate OR (tropical OR temperate OR polar OR arid OR subtropical) | 6900 | 21% | 
| Disinfection | disinfection OR sterilization OR decontamination | 5700 | 17% | 
| Latitude | (latitude OR "degrees south" OR "degrees north") AND (human OR man OR woman OR or OR children OR or OR people OR or patient) | 377 | 1% | 
| Pollution | (pollution OR contamination) AND (human OR man OR woman OR children OR patient OR people) | 7100 | 21% | 
| Population Density | "population density" | 764 | 2% | 
| Relative Humidity | "relative humidity" | 735 | 2% | 
| Temperature | (temperature OR temp) AND (human OR man OR woman OR patient OR child OR infant OR people) | 11000 | 33% | 
| Precipitation | (precipitation OR rain OR snow OR sleet OR hail) AND (human OR man OR woman OR children OR infants OR people OR person OR patients) | 4100 | 12% | 
| Airflow and Ventilation | airflow OR "air flow" OR ventilation AND (human OR man OR woman OR children OR infants OR people OR person OR patients) | 3300 | 10% | 
| Nutrients | nutrients AND (human OR man OR woman OR children OR infants OR people OR person OR patients) | 1500 | 5% | 
| Salinity | (("salinity water ~4") OR ("salinity soil ~4") OR "salt content" OR saltiness) AND (NOT phosphate) | 73 | 0% | 
| Arctic/Inlet/Hypoxic | (arctic OR inlet OR hypoxic) AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 863 | 3% | 
| Oxygen (hyperoxia, normoxia, hypoxia) | ("oxygen level" OR normoxia OR normoxic OR hyperoxia OR hyperoxic OR hypoxia OR hypoxic) AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 1300 | 4% | 
| Chlorophyll | chlorophyll AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 94 | 0% | 
| Water supply | ("water supply" OR "water system" OR "water distribution" OR "drinking water") AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 1100 | 3% | 
| Sanitation facilities | "sanitation facility" OR "sanitation solution" OR  "sanitation technique" OR "sanitation infrastructure" OR "sanitation system" | 93 | 0% | 
| Food | ("food supply" OR "food quality ~3" OR "food contamination ~3" OR "food habit" OR "food pattern" OR "eating habits" OR diet) AND (human OR man OR woman OR children OR person OR people OR patients) | 2300 | 7% | 
| Radiation | (radiation OR radioactive) AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 2400 | 7% | 
| Ultraviolet radiation | ("ultraviolet radiation" OR "uv radiation" OR "uv index" OR "ultraviolet index") AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 253 | 1% | 
| Ionised radiation | ("ionized radiation" OR "ionised radiation" OR "ion radiation ~4") AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 155 | 0% | 



### 4. Genetic Factors

| Risk Factor | Search Query | Number of papers | % of papers |
|------|------|------|------|
| Blood type A | "blood type" | 129 | 0% | 
| Allergy | (allergy OR epipen OR anaphylaxis OR allergen OR allergic OR aeroallergen) AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 3000 | 9% | 
| Haplotype | haplotype AND (man OR woman OR children OR infants OR person OR people OR patients) | 477 | 1% | 
| T cell responses (immunodeficient) | tcell AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 339 | 1% | 
| Cell homeostasis | "cell homeostasis ~2" AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 127 | 0% | 
| Cytokine | cytokine AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 6400 | 19% | 
| Transcription factor | "transcription factor" AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 2600 | 8% | 
| Innate host defence genes | ("defense genes" OR "defence genes" OR "antiviral genes ~3" OR "host genes ~2") AND (human OR man OR woman OR children OR infants OR person OR people OR patients) | 942 | 3% | 

### 5. Lifestyle Factors
| Risk Factor | Search Query | Number of papers | % of papers |
|------|------|------|------|
|Smoking Status|smoking OR cigarette|1,900|6%|
|Alcohol Consumption|("alcohol consumption" OR "alcohol intake" OR "alcohol use") AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|452|1%|
|Drug addiction|("drugs addiction" OR "drug addiction" OR "substance abuse") AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|286|0%|
|Level of stress|stress AND (level OR high OR low OR increased OR decreased OR reduced)|10,000|17%|
|Type of society (traditional, modern, individualistic, community-oriented)|("traditional society" OR "modern society" OR "individualistic society" OR "community oriented society") AND (human OR man OR woman OR children OR infants OR person OR people OR patients) |195|0%|
|Starvation|starvation AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|838|1%|
|Travelling|travelling OR travels OR traveled OR travel|9,100|15%|
|Contact with wildlife|(wildlife AND contact) AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|1,600|3%|
|Hygiene|(hygiene OR hygienic) AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|4,600|8%|
|Water drinking|("drinking water" OR "potable water") AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|1,400|2%|
|Eating habits|("eating habits" OR "food habits" OR diet OR nutrition) AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|5,600|9%|
|Face touching|("face touching" OR (hands AND face)) AND (human OR man OR woman OR children OR infants OR person OR people OR patients)|5,100|9%|


### 6. Socioeconomic Factors
| Risk Factor | Search Query | Number of papers | % of papers |
|------|------|------|------|
| Mental hospital patients | "mental hospital" OR ("mental health" AND (service OR institution OR support OR patient)) | 529 | 2% | 
| Access to health services |  access AND (healthcare OR "healh care" OR "health service" OR "medical care" OR "medical service")|  6,300| 10% | 
| Access to testing |  access AND testing|  18,000| 29% | 
| Housing status |  housing OR homeowner OR "home owner" OR homeless OR (property AND ("renter" OR "rental" OR "owner"))|  10,000| 17% | 
| Insurance |  insurance OR insured OR "health insurance" OR "medical insurance" OR "health coverage" OR "medical coverage"|  1,800| 3% | 
| Occupation |  occupation OR employment OR workplace OR "work place"|  5,600| 9% | 
| Religion |  religion OR "religious belief"|  386| 1% | 
| Homeless |  homeless OR housing|  9,700| 16% | 
| Healthcare workers, hospital staff<br>(first responders) |  "healthcare worker" OR "healthcare staff" OR "hospital worker" OR "hospital staff" OR "first responder" OR "first response team"|  3,600| 6% | 
| Long-term care facility residents |  "long term" AND care AND (resident OR facility)|  3,500| 6% | 
| Low-income |  "low income" OR "poor" OR "minimum wage"|  14,000| 24% | 
| Immigration |  "immigration" OR "immigrant"|  1,100| 2% | 
| Prison staff |  "prison staff" OR "correction officer"|  7| 0% | 
| Prisoners |  prisoner OR inmate OR incarcerated|  525| 1% | 
| Level of education |  "education level"|  566| 1% | 
| Governmental finances and priorities |  "government" AND "finance" AND priorities|  503| 1% |

The risk factors identified in the corpus and their respective frequencies give a good high-level picture of the volume of papers in each group:

| Category      | Total Papers per Group | Average Papers per Risk Factor |
|---------------|------------------------|--------------------------------|
| Demographics  | 24,505                 | 2,451                          |
| Diseases      | 66,788                 | 2,154                          |
| Environmental | 50,107                 | 2,505                          |
| Genetic       | 14,014                 | 1,752                          |
| Lifestyle     | 41,071                 | 3,423                          |
| Socioeconomic | 76,116                 | 4,757                          |
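The averages appear to be each group's total divided by the number of risk factors in that group. As a quick sanity check, the Lifestyle row can be reproduced from the per-factor counts in the Lifestyle table. This is a minimal sketch; the `summarize` helper is ours, for illustration only:

```python
# Paper counts for the Lifestyle risk factors, copied from the Lifestyle table.
lifestyle_counts = {
    "Smoking status": 1_900,
    "Alcohol consumption": 452,
    "Drug addiction": 286,
    "Level of stress": 10_000,
    "Type of society": 195,
    "Starvation": 838,
    "Travelling": 9_100,
    "Contact with wildlife": 1_600,
    "Hygiene": 4_600,
    "Water drinking": 1_400,
    "Eating habits": 5_600,
    "Face touching": 5_100,
}


def summarize(counts: dict[str, int]) -> tuple[int, int]:
    """Return (total papers, average papers per risk factor)."""
    total = sum(counts.values())
    # Round half up (Python's built-in round() rounds exact .5 cases to even).
    average = int(total / len(counts) + 0.5)
    return total, average


total, average = summarize(lifestyle_counts)
print(f"Lifestyle: total={total:,}, average={average:,}")  # Lifestyle: total=41,071, average=3,423
```

The same arithmetic reproduces the Socioeconomic row (76,116 papers over 16 risk factors, about 4,757 per factor).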

# 5. Defining Workflow <a id="task_workflow"></a>

Our current workflow is a pipeline that combines our methodologies and strategies to take a single search keyword and, from this simple input, retrieve a few papers (no more than 10) from the CORD-19 dataset that are specific and highly relevant to that keyword. Here is a high-level overview of our pipeline:

![image.png](attachment:image.png)

### The Stages in Detail
-------

**1. Produce N-gram synonyms:**
Starting from the initial input (currently not a single keyword but a custom array of keywords chosen with input from medical doctors), this stage uses an assortment of linguistic and medical packages to expand the input and search the CORD-19 dataset for synonymous or similar N-grams (unigrams, bigrams, and trigrams).
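To make the idea of "N-grams" concrete, here is a deliberately simplified sketch. It is not our package-based synonym expansion: it only collects the unigrams, bigrams and trigrams of a sentence that contain a seed word, where the hypothetical `seeds` set plays the role of the MD-chosen keyword array.

```python
import re


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_ngrams(text: str, seeds: set[str], max_n: int = 3) -> set[str]:
    """Collect unigrams, bigrams and trigrams that contain at least one seed word."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            if seeds & set(gram.split()):
                found.add(gram)
    return found


sentence = "Patients with congestive heart failure had worse outcomes."
print(sorted(candidate_ngrams(sentence, {"heart"})))
# ['congestive heart', 'congestive heart failure', 'heart', 'heart failure',
#  'heart failure had', 'with congestive heart']
```

Noisy candidates such as "heart failure had" show why plain string matching is not enough, and why the heart disease case study below filters candidates through UMLS concepts and ICD codes instead.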

**2. Search for N-grams in papers:**
Using the N-grams produced by the previous stage, this stage searches each paper's Abstract, Results, and Methods sections for them. The rest of the body text is excluded, since it may contain rich content that is less focused on the primary target of the research. For best results, papers must also mention COVID or coronavirus-related infections (the coronavirus keywords are also currently supplied as a custom array). Papers that contain at least one of the N-grams pass through as the output of this stage. The number of papers output for each risk factor usually does not exceed 200.
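A minimal sketch of this filter, assuming each paper is available as a dictionary of section texts and that the coronavirus check is applied to the same searched sections. The section names, keyword list and helper names are illustrative assumptions, not our production code:

```python
import re

# Sections that are searched; the rest of the body text is ignored.
SEARCHED_SECTIONS = ("abstract", "methods", "results")

# Illustrative coronavirus keyword array (assumption: the real array is curated by hand).
CORONAVIRUS_KEYWORDS = ["covid-19", "sars-cov-2", "coronavirus", "mers-cov"]


def contains_any(text: str, phrases: list[str]) -> bool:
    """True if any phrase occurs in text as whole words (case-insensitive)."""
    text = text.lower()
    return any(re.search(r"\b" + re.escape(p.lower()) + r"\b", text) for p in phrases)


def is_candidate(paper: dict[str, str], ngrams: list[str]) -> bool:
    """Keep a paper if its searched sections mention an n-gram and a coronavirus term."""
    searched = " ".join(paper.get(section, "") for section in SEARCHED_SECTIONS)
    return contains_any(searched, ngrams) and contains_any(searched, CORONAVIRUS_KEYWORDS)


papers = [  # toy papers, keyed by section name
    {"abstract": "Congestive heart failure was common among COVID-19 deaths."},
    {"abstract": "Heart failure outcomes in influenza.", "results": "Only influenza A was studied."},
    {"abstract": "A study of SARS-CoV-2 transmission.", "body": "Heart failure was rare."},
]
ngrams = ["congestive heart failure", "heart failure"]
print([is_candidate(p, ngrams) for p in papers])  # [True, False, False]
```

The third toy paper is rejected because its only heart failure mention sits in the body text, which is not searched.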

**3. Quality Assessment - Medical Annotators:**
In this stage, the papers output by the previous stage are manually examined to confirm that they are relevant to the search. We have an amazing team of medical students who are helping with the annotations. The annotation process is as follows:

* Each article is tagged with the risk factor predicted by the code.
* The annotator reads the abstract of the paper.
* If the abstract presents the keyword as a risk factor (e.g., “Population density is correlated with/influences ---“), the annotator deems the paper relevant.
* If not, the annotator reads the methods and results sections of the paper.
* If the methods/results sections present the keyword as a risk factor, the annotator deems the paper relevant.
* If not, the paper is marked as not relevant.
* In addition, the annotators comment on why they made their decision. For example, if they determined that a paper is relevant, they may quote the phrase or sentence that links the risk factor with infection as their reason. These comments will be useful in Phase 2 of our pipeline.
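The decision order above can be written as a small function. This is a hypothetical sketch: `judge` is a placeholder for the human annotator (it returns the supporting sentence, or `None`), `toy_judge` is a toy heuristic for demonstration only, and the `Annotation` record is our own illustration of how decisions and reasons could be stored for Phase 2.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Stand-in for the human annotator: given a section's text, return the sentence
# that presents the keyword as a risk factor, or None if there is none.
Judge = Callable[[str], Optional[str]]


@dataclass
class Annotation:
    paper_id: str
    risk_factor: str
    relevant: bool
    section: Optional[str]  # section where the evidence was found
    reason: Optional[str]   # supporting phrase/sentence quoted by the annotator


def annotate(paper_id: str, risk_factor: str, paper: dict[str, str], judge: Judge) -> Annotation:
    """Check the abstract first, then methods/results; stop at the first evidence found."""
    for section in ("abstract", "methods", "results"):
        evidence = judge(paper.get(section, ""))
        if evidence is not None:
            return Annotation(paper_id, risk_factor, True, section, evidence)
    return Annotation(paper_id, risk_factor, False, None, None)


def toy_judge(text: str) -> Optional[str]:
    """Toy heuristic only: flags a sentence that mentions density and a correlation."""
    for sentence in text.split(". "):
        lowered = sentence.lower()
        if "density" in lowered and "correlat" in lowered:
            return sentence.strip()
    return None


paper = {"abstract": "We describe an outbreak.",
         "results": "Population density correlated with incidence."}
print(annotate("p1", "population density", paper, toy_judge))
```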

### Degree of Automation
--------

Currently, some parts of our workflow are automated and others are not. Here is a summary of the automation currently in place:

**1. Produce N-gram synonyms:**
* Input keyword(s): currently not automated. Custom keywords synonymous with the search topic are supplied as inputs to the code.
* Coronavirus keywords: currently not automated, but this is essentially a static array for COVID applications.

**2. Search for N-grams in papers:**
The search for N-grams is completely automated and will work for any general set of N-grams that are fed in as input.

**3. Quality Assessment - Medical Annotators:**
This part of the process is not automated, but it could be. Our annotators currently comment on why they made each decision. If we can use these justifications to train an NLP model, that model could serve as an artificial annotator and produce our golden papers in the output stage.

### Portability
--------

The clear advantage of our current workflow is that it can be used for a wide range of applications. Our team at CoronaWhy is not only focused on analyzing COVID-19, but is also interested in scaling our COVID research approach to other medical conditions. Simply plug in an array of cancer keywords as input, and our pipeline analyzes a whole new field of research. This is the power of our workflow: one pipeline can fit all.


# 6. Case Study: Heart Disease Risk Factors <a id="task_heart"></a>

This section walks through our entire workflow, using heart disease as an example risk factor.

### Part 1: Produce N-grams

**1. Description:**

Our case study starts with the medical subdomain **“heart disease”**. The first piece of code takes the raw sentences from the v7 release of the CORD-19 dataset and outputs bigrams/trigrams related to risk factors in a particular medical subdomain. Each sentence is passed through **scispaCy, Allen AI's Python NLP package** for processing biomedical, scientific, and clinical text. The package keeps only the parts of each sentence that represent useful named entities (NER: named entity recognition).

These entities are then processed by scispaCy's `UmlsEntityLinker`, which cross-references each named entity with the entities in the **Unified Medical Language System (UMLS)**, developed by the U.S. National Library of Medicine (part of the National Institutes of Health). The purpose is to separate useful entities that have a medical meaning from everything else. If an entity receives a UMLS concept ID, it is then converted into an **ICD code**. ICD, the International Statistical Classification of Diseases and Related Health Problems, is one of many standardized medical terminologies. Its useful property is that it organizes medical terms **hierarchically**, so that everything related to heart disease is grouped together under a common code range.

The notebook then keeps only the ICD codes that fall within the code range of the subdomain for which the user wants bigrams/trigrams. The **advantage** of this more elaborate search is that the output can be traced back and is backed by actual medical terminology. Furthermore, simpler n-gram search methods would find fewer relevant bigrams: medical terminology does not follow conventional naming, so such methods would be hard-pressed to capture this level of granularity.

**2. Input:** 

The medical subdomain “heart disease”. In ICD-10, heart diseases fall under the letter I (diseases of the circulatory system), in the code range I05-I52.

**3. Output:**

A list of n-grams that correspond to heart diseases. Some examples that are generated are “aortic valve stenosis”, “rheumatic heart disease”, and “congestive heart failure”.

Link to code: https://colab.research.google.com/drive/1bWNPqS76YYEKIXBXXC_0-QnuN8y7u3g6
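Once every entity has an ICD code, the range filter itself is simple. A minimal sketch, assuming ICD-10-style codes and our reading of "section 5-52" as the range I05-I52. The `linked` dictionary is a hypothetical stand-in for the output of the UMLS linking and ICD conversion step, and the helper names are ours:

```python
import re

# ICD-10-style code: chapter letter, two-digit category, optional subcode ("I50", "I35.0").
ICD_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.\d+)?$")


def icd_key(code: str) -> tuple[str, int]:
    """Turn an ICD-10-style code into a sortable (letter, category) key."""
    match = ICD_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not an ICD-10-style code: {code!r}")
    return match.group(1), int(match.group(2))


def in_range(code: str, start: str = "I05", end: str = "I52") -> bool:
    """True if the code's category lies within [start, end], inclusive."""
    return icd_key(start) <= icd_key(code) <= icd_key(end)


# Hypothetical linker output: n-gram -> ICD-10 code.
linked = {
    "aortic valve stenosis": "I35.0",
    "rheumatic heart disease": "I09.9",
    "congestive heart failure": "I50.0",
    "cerebral infarction": "I63.9",
    "pneumonia": "J18.9",
}
heart_ngrams = [ngram for ngram, code in linked.items() if in_range(code)]
print(heart_ngrams)
# ['aortic valve stenosis', 'rheumatic heart disease', 'congestive heart failure']
```

Comparing `(letter, category)` tuples keeps the filter correct across chapter boundaries: `J18` sorts after `I52` because `"J" > "I"`.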

### Part 2: Search for Relevant Papers

**1. Description:**

This code takes all the n-grams produced in Part 1 and searches each paper for them. A paper must also mention COVID or another coronavirus-related term to be considered relevant.

**2. Input:**

The n-grams produced from the ICD codes for heart disease (in the previous stage of the pipeline).

**3. Output:**

A list of relevant papers that contain heart disease n-grams and mentions of coronavirus infections. Other statistics, such as the matching n-grams and their number of occurrences, are also displayed in the output.

Link to code: https://www.kaggle.com/hmwang/riskfactors-heartdisease
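The per-paper statistics (matching n-grams and number of occurrences) could be computed along these lines. This is a sketch assuming whole-word, case-insensitive matching; we have not confirmed whether the linked notebook counts nested n-grams separately, as this version does:

```python
import re
from collections import Counter


def ngram_counts(text: str, ngrams: list[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each n-gram in text.

    Nested n-grams are counted independently, so "heart failure" is also
    counted inside "congestive heart failure".
    """
    lowered = text.lower()
    counts: Counter = Counter()
    for gram in ngrams:
        hits = len(re.findall(r"\b" + re.escape(gram.lower()) + r"\b", lowered))
        if hits:
            counts[gram] = hits
    return counts


text = ("Congestive heart failure and heart failure with preserved ejection "
        "fraction were reported in COVID-19 patients.")
counts = ngram_counts(text, ["congestive heart failure", "heart failure", "aortic valve stenosis"])
print(dict(counts), sum(counts.values()))
# {'congestive heart failure': 1, 'heart failure': 2} 3
```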

### Part 3: Medical Input

**1. Description:**

To make sure that the papers output by the heart disease model are actually relevant to medical professionals, the last stage of our pipeline uses medical annotators to read through the papers and check whether each paper does indeed present heart disease as a risk factor. This includes mentions of heart disease in the abstract, heart disease correlating with increased infection susceptibility, etc.

**2. Input:**

A list of papers (< 200) that the heart disease model found to be relevant by keyword search.

**3. Output:**

A list of relevant ("golden") papers that clearly state or imply heart disease as a risk factor for infection. From these (if applicable), a top 10 list is built by sorting the golden papers by their number of keyword occurrences from the previous stage.
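The final ranking step can be sketched as follows. `Candidate` and `top_golden_papers` are hypothetical names, and the example papers are placeholders, not real results:

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    title: str
    keyword_matches: int  # occurrence count from the search stage
    relevant: bool        # annotator decision


def top_golden_papers(candidates: list[Candidate], k: int = 10) -> list[Candidate]:
    """Keep annotator-approved papers and rank them by keyword matches, highest first."""
    golden = [c for c in candidates if c.relevant]
    # sorted() is stable (also with reverse=True), so ties keep their input order.
    return sorted(golden, key=lambda c: c.keyword_matches, reverse=True)[:k]


candidates = [
    Candidate("Paper A", 3, True),
    Candidate("Paper B", 8, False),  # many matches, but rejected by the annotators
    Candidate("Paper C", 6, True),
    Candidate("Paper D", 8, True),
]
print([c.title for c in top_golden_papers(candidates, k=2)])  # ['Paper D', 'Paper C']
```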

### Part 4: A Snapshot of Output

Here are our top 10 papers on heart disease as a risk factor (validated by the medical annotators):

| Rank | Title | Paper URL | # Keyword Matches |
|------|------|------|------|
| 1 |  Outcome Risk Factors during Respiratory Infections in a Paediatric Ward in Antananarivo, Madagascar 2010–2012 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3771918/  |  8 |
| 2 | Coronavirus and Other Respiratory Illnesses Comparing Older with Young Adults |  https://doi.org/10.1016/j.amjmed.2015.05.034  |  6 |
| 3 | Clinical Manifestations, Laboratory Findings, and Treatment Outcomes of SARS Patients | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323212/  |  6 |
| 4 | Human coronavirus alone or in co-infection with rhinovirus C is a risk factor for severe respiratory disease and admission to the pediatric intensive care unit: A one-year study in Southeast Brazil |  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546210/  |  4 |
| 5 | Severe Morbidity and Mortality Associated With Respiratory Syncytial Virus Versus Influenza Infection in Hospitalized Older Adults | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6603263/  |  4 |
| 6 | Relationship between circulating levels of angiotensin-converting enzyme 2-angiotensin-(1–7)-MAS axis and coronary heart disease | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7100072/  |  4 |
| 7 | Effects of hypertension, diabetes and coronary heart disease on COVID-19 diseases severity: a systematic review and meta-analysis | https://doi.org/10.1101/2020.03.25.20043133  |  4 |
| 8 | Characteristics of patients with COVID-19 during epidemic ongoing outbreak in Wuhan, China | https://doi.org/10.1101/2020.03.19.20033175  |  3 |
| 9 | Prevalence of comorbidities in cases of Middle East respiratory syndrome coronavirus: a retrospective study |  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6518603/  |  3 |
| 10 | Soluble Angiotensin Converting Enzyme 2 in Human Heart Failure: Relation with Myocardial Function and Clinical Outcomes | http://europepmc.org/articles/pmc3179261?pdf=render  |  3 |


---------

### And here are some key phrases that demonstrate the effectiveness of our output:

> "Univariate analysis showed that a severe outcome was significantly more frequent for patients with comorbidity (OR = 3.9), including prematurity or congenital disease (such as heart disease or cerebral malformation)." - **Paper #1**

> "This was a prospective, observational study conducted from November 2009 to July 2013 to assess acute respiratory illness in patients aged ≥60 years with chronic lung or heart disease or both (group 1) and in healthy young adults aged 18 to 40 years (group 2)." - **Paper #2**

> “However, use of systemic antibiotics and the presence of underlying diseases and congenital heart disease were more frequent in patients admitted to the PICU, and they also had a longer hospital length of stay” - **Paper #4**

> “Circulating ACE2 activity has been considered to be a marker of cardiovascular disease (CVD), with low levels in healthy individuals, and increased levels in those with CVD such as hypertension, heart failure and myocardial infarction.” - **Paper #6**

> “The incidence of acute respiratory distress syndrome (90% vs. 17.5%), acute liver injury (71% vs. 5.3%), cardiac injury (65% vs. 5.9%), kidney injury (43% vs. 4.6%), and secondary infection (58% vs. 6.3%) in patients who died was significantly higher than those who recovered (all p<0.001).” - **Paper #8**

> “Among the comorbidities, DM, HTN, ischemic heart disease (IHD), congestive heart failure (CHF), end-stage renal disease (ESRD) and chronic kidney disease (CKD) showed significant associations with fatality from MERS-CoV” - **Paper #9**

# 7. Our Notebooks <a id="task_notebooks"></a>

A list of all our notebooks/codebases for the risk factor categories.

Generalised code for extracting papers on the following risk factors:
- **Population Density**
- **Air Temperature**
- **Humidity**
- **Pollution**
- **Age** (specifically older adults)

#### https://www.kaggle.com/pranjalya/coronawhy-risk-factors-analysis/


# 8. Why We Did What We Did <a id="task_reason"></a>

![moon](https://www.azquotes.com/picture-quotes/quote-we-choose-to-go-to-the-moon-in-this-decade-and-do-the-other-things-not-because-they-john-f-kennedy-34-98-64.jpg)

> We choose to go to the Moon in this decade and do the other things, not because they are easy, but because they are hard; because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one we intend to win, and the others, too. 


Our method and results are not ideal, but they allowed us to produce something that we believe is useful. The current method has many drawbacks, but we believe that presenting the current results will raise awareness that answers most probably exist and that we just need some more help to iterate on this.

Here's a full outline of what we did:

1. Initial task exploration
2. Establishing a knowledge base and an ultimate list of all risk factors
3. Realizing that we needed to prioritize and filter out factors that are underrepresented in CORD-19
4. Further prioritization based on MD input
5. Semantic exploration of the n-gram approach
6. Further filtering down to the most relevant papers
7. Establishing steps for annotators to determine paper relevancy
8. Producing the final list of the best papers based on crowdsourced human/medical input


How we did it is another story... probably worth a Netflix documentary on how 900+ members from all over the world figured out a way to work together and produce meaningful results in a very short timeframe.

# 9. Next Goals (June deadline) <a id="task_next"></a>

* Coverage of a wider range of risk factors
* Automation of the paper search process
* Implementation of the similarity-based search
* Improvement of outputs
* Presenting more data and segmentations within articles by focus of study, research method, and more

# 10. Daily calls <a id="task_calls"></a>
We operate with radical transparency: all of our meetings/calls are recorded. Feel free to review our historical progress and how we reached this stage:

https://trello.com/c/aDFxpxPP/12-recorded-daily-calls

# 11. Appendix <a id="task_appendix"></a>


## Data visualization of all the outputs


https://app.powerbi.com/view?r=eyJrIjoiY2E5YjFkZjItN2Q2ZS00MGI5LWFiMWQtZmY0OWRiZTlkNDVmIiwidCI6ImRjMWYwNGY1LWMxZTUtNDQyOS1hODEyLTU3OTNiZTQ1YmY5ZCIsImMiOjEwfQ%3D%3D

Link to notebook:
https://www.kaggle.com/mikehoney/coronawhy-org-task-risk-factors-data-viz


## Geospatial analysis overview

![](https://drive.google.com/uc?id=1R4iTdIj33jn7T4P1XfRfjdSbBMNpMRG2)

A tool (built with CARTO) for exploring geospatial relationships between COVID-19 prevalence (by cases, recoveries, and/or fatalities) and geospatial properties (climate and demographic factors) at different geospatial resolutions (country, region, etc.) and temporal resolutions (dates).

For a more complete overview of the work done by our #task-geo team, please visit this notebook:
https://www.kaggle.com/manuelalvarez/coronawhy-org-task-geographical-factors

## How to use
The climate and demographic visualizations share some common controls and views. In the center of the screen is a map that can be zoomed in and out using the plus and minus buttons at the bottom left of the screen.


On the left side of the map are layers that correspond to COVID-19 prevalence. Select one (and only one) layer from each display group to populate the map with the corresponding data (see the sections below for more details).

Each “display group” populates the map with a different type of data. In this visualization, Group A corresponds to the COVID case statistic type (deaths, recovered, active), Group B to the specific metrics being examined (climatic or demographic, depending on the visualization), and Group C is an option that groups the data by region within the country.

Date selection. Along the bottom of the screen there is a date picker. Selecting a single date populates the map with the cumulative number of COVID cases up to that date. For example, if April 12 is selected, the map displays the cases accumulated up to April 12, not just the new cases/recoveries/deaths on April 12. Note that although the visualization has a built-in slider for selecting a date range, range selection does not work correctly yet; date ranges will be supported in a future version. For climate data, the Group B values for the selected date are also displayed (this does not apply to demographic data).
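To make the cumulative semantics concrete, here is a small sketch with hypothetical data (not the tool's actual code): the value shown for a selected date is the sum of all daily counts up to and including that date, not the count for that day alone.

```python
from datetime import date


def cumulative_until(daily_counts: dict[date, int], selected: date) -> int:
    """Sum the daily new counts on or before the selected date."""
    return sum(n for day, n in daily_counts.items() if day <= selected)


daily_new_cases = {  # hypothetical daily new cases for one region
    date(2020, 4, 10): 120,
    date(2020, 4, 11): 95,
    date(2020, 4, 12): 80,
    date(2020, 4, 13): 60,
}
print(cumulative_until(daily_new_cases, date(2020, 4, 12)))  # 295, not 80
```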


## Climate Metrics

Display group B includes a variety of climate layers, such as average temperature, minimum temperature, and relative humidity. A color spectrum then marks each region according to the value of the selected climate metric. Currently, comprehensive climate data is available for visualization only for Italy; no other countries can be analyzed with this tool yet. The climate data for each region can be viewed by hovering the mouse over one of the colored dots in the map view (the climate summary for the selected layer is displayed above the cursor).

Sources:
* Corona Data Scraper - https://coronadatascraper.com/#home
* NASA Langley Research Center (LaRC) POWER project: https://power.larc.nasa.gov/

**WHAT IT DOES:**

This chart displays COVID-19 information for Italy's municipal regions and the corresponding climate conditions for a given date.

This chart gives the following INFORMATION for each region:
* Number of COVID-19 cases
* Region name
* Number of active cases
* Number of deaths
* Number of tests conducted

In addition, ONE of the following CLIMATE conditions for that region is displayed:
* Average Temperature
* Minimum Temperature
* Maximum Temperature
* Relative Humidity
* Pressure
 
 
## Demographic Metrics
https://juancalvo.carto.com/builder/b1b0b61e-acdc-4cc1-b47f-93c4a97b664d/embed


This visualization shows various population demographic data from regional age and gender distributions to population density.
The options on the left work best when only one selection is made from each group:
* Group A - Age, Gender
* Group B - COVID-19 cases: Total, Active, Recovered and Deaths
* Group C - Regional population density (average number of people per square kilometer)

**Group A**
Age and gender/sex distributions:

9 age distribution brackets: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+

The brackets are ordered from the top left, reading left to right as in a book. If a region had a uniform age distribution, all dots would be white; this example from Emilia-Romagna shows a lower proportion of people under 30 and an overrepresentation of 40-59-year-olds.

Gender/sex distribution is shown as a proportion, where 0.5 means an equal distribution. It is displayed with male on the left and female on the right; an equal split makes both dots white. The deviation is usually small: on average, most countries have slightly more women than men, except places such as China, where long-term policies have affected the ratio.


**Group B**
These options show regional COVID-19 counts: cumulative tested cases, number of active cases, number of recovered patients, and number of deaths.

Each count is displayed as a circle in the centre of the region, and the circle's size reflects the relative count.

**Group C**
This group has a single option, which colours the regions by population density, measured in people per square kilometer.


**Sources**
* Population by different regional levels: http://demo.istat.it/pop2019
* Italian communes to province mapping: https://www.istat.it/it/archivio/6789#Elencodeicodiciedelledenominazionidelleunitterritoriali-0
* Cartographic data: https://www.istat.it/it/archivio/222527


**Errata**

In the climate visualization, humidity data is missing for some data points in specific Italian regions, and the database records each missing humidity value as -999 instead of NaN. We will fix this soon!
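Until the database is fixed, the sentinel can be converted downstream, before any averages or colour scales are computed. A minimal sketch, assuming humidity values arrive as a plain list of numbers; `clean_humidity` is a hypothetical helper:

```python
import math

MISSING_SENTINEL = -999.0  # value the database uses for missing humidity readings


def clean_humidity(values: list[float]) -> list[float]:
    """Replace the -999 sentinel with NaN so missing readings are not treated as real values."""
    return [math.nan if v == MISSING_SENTINEL else v for v in values]


readings = [71.2, -999, 64.5, -999.0]
cleaned = clean_humidity(readings)
print(cleaned)  # [71.2, nan, 64.5, nan]
```

With pandas, the equivalent is `df["humidity"].replace(-999, np.nan)` (the column name is an assumption).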




### Public health mitigation measures that could be effective for control

To address this question, a group of independent members of our community gathered a dataset on the number of hospital beds globally:

https://www.kaggle.com/ikiulian/global-hospital-beds-capacity-for-covid19

and performed a correlation analysis against mortality rates:

![](https://drive.google.com/uc?id=1oVKus7AlAfpuhNZOB4fMnxPk-wPNeQMM)

* green dot: more than 0.2 beds per 1,000 people
* yellow: more than 0.125 (up to 0.2)
* red: less than 0.125
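The colour legend amounts to a threshold function. A sketch; the legend does not say how a value of exactly 0.125 is coloured, so treating it as red is our assumption, as is reading the unit as beds per 1,000 people:

```python
def bed_colour(beds_per_1000: float) -> str:
    """Map hospital beds per 1,000 people to the dot colour used in the chart."""
    if beds_per_1000 > 0.2:
        return "green"
    if beds_per_1000 > 0.125:
        return "yellow"
    return "red"  # assumption: exactly 0.125 is treated as red


print([bed_colour(v) for v in (0.3, 0.2, 0.15, 0.1)])  # ['green', 'yellow', 'yellow', 'red']
```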

![](https://drive.google.com/uc?id=12NbjOFak0C1jwWf-zr_uDRop5LQoNL1n)

The current visualization is hosted here:

https://www.kaggle.com/ikiulian/simple-global-countries-visualization




Some additional correlation analysis is available here:
https://www.kaggle.com/hevalo/correlationanalysis


# 12. Credits <a id="task_credits"></a>

Our Task-Risk team:
https://docs.google.com/document/d/1iD4J8uBgkba9rKD6RftI34JpOc-Ds2ITCNfQZ1rF-_k/