Data for the 2019-nCoV outbreak

On January 30th, 2020, the World Health Organization declared the 2019-nCoV outbreak a Public Health Emergency of International Concern. The outbreak originated in my hometown, Wuhan.

This dataset is collected from public agencies and news media and contains information about the 2019-nCoV cases confirmed in and outside China. It is free to use and share provided that appropriate credit is given (see the license). It can be loaded in R as a package:

devtools::install_github("qingyuanzhao/2019-nCov-Data")  # requires the devtools package
library(nCoV2019.data)
data(cases.in.china)       # cases confirmed in China
data(cases.outside.china)  # cases confirmed outside China

A preliminary report is available as a preprint on medRxiv and is currently being peer-reviewed. It is based on the analysis in Feb6.R and Feb6.Rmd (results available here).

To Contribute

I hope this can become a collaborative and transparent project involving people across the world. You can contribute either by updating the dataset or by forming a team to analyze it.

Dataset

There are two ways to contribute to building this dataset (please only use publicly available information that I can confirm):

  1. You can add comments in this Google Spreadsheet. The easiest way to help is to pick a random row and verify that the information is correct by reading the linked source. Record your result by commenting on the "Verified" cell in that row. I am currently having a hard time obtaining detailed information for cases in Australia, France, Thailand, United Kingdom, United States, and Vietnam.

  2. You can also use GitHub Issues to record information about new cases. Make sure you read the lessons below before posting.

Lessons I learned when building this dataset:

  1. News articles don't always report the cases in the same order. It's useful to record the nationality/residence, gender and age of the cases to distinguish them.
  2. The most useful columns for data analysis are:
     • Outside (whether the case was infected outside Wuhan). "Y" means yes, "L" means likely, empty means (almost certainly) no.
     • Infected (when the case was initially infected). This is rarely available, but anything (for example an interval) can help.
     • Arrive (when the case first arrived in the country/region). This helps to narrow down the infection time.
     • Symptom (when the case first showed symptoms). This is useful because we can impute the infection time if we know the distribution of the incubation period.
  3. Make sure to record the URL of your source so that everyone can verify it.
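
To make the role of the Arrive and Symptom columns concrete, here is a minimal sketch of one way to impute an infection date. It is not the method used in the report. The toy rows, the date format, the helper name impute_infected, and the gamma incubation distribution (mean 5 days) are all assumptions made for illustration; only the column names come from the dataset description above.

```r
set.seed(1)

# Toy rows only: values and date format are illustrative assumptions,
# not taken from the real dataset.
toy <- data.frame(
  Outside = c("", "", "Y"),
  Arrive  = as.Date(c("2020-01-20", "2020-01-22", NA)),
  Symptom = as.Date(c("2020-01-23", "2020-01-21", "2020-01-28")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw incubation periods from an assumed gamma
# distribution (mean = shape / rate = 5 days) and subtract them from the
# symptom onset date to get candidate infection times (in days).
impute_infected <- function(symptom, arrive, outside,
                            n_draws = 1000, shape = 5, rate = 1) {
  incubation <- rgamma(n_draws, shape = shape, rate = rate)
  draws <- as.numeric(symptom) - incubation
  # A case that was not infected outside Wuhan must have been infected
  # before arriving, so keep only draws on or before the arrival date.
  if (isTRUE(outside == "") && !is.na(arrive)) {
    draws <- draws[draws <= as.numeric(arrive)]
  }
  draws
}

# Summarise each case by the median of its candidate infection times.
toy$Infected_imputed <- rep(as.Date(NA), nrow(toy))
for (i in seq_len(nrow(toy))) {
  draws <- impute_infected(toy$Symptom[i], toy$Arrive[i], toy$Outside[i])
  toy$Infected_imputed[i] <- as.Date(floor(median(draws)), origin = "1970-01-01")
}

print(toy)
```

In this sketch the arrival date only matters when it precedes symptom onset (the first toy row); otherwise the symptom date alone bounds the infection time.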

Analysis

If you would like to form a team to analyze this dataset, please register your interest in #1. You are also welcome to use it for your own research.

Please use GitHub Issues for any longer suggestions or discussions. You may want to read the report of a preliminary analysis first. Some problems that we can all think about include:

  • How to impute the missing values (#3).
  • How to better model the dynamics (from infection, international arrival, symptom onset, and initial medical visit to case confirmation) recorded in the dataset (#4).
  • How to incorporate the lockdown on January 23rd in the model (#5).

Update: February 11th

Major update to the GitHub repository

  • The project has been restructured as an R package.

Major update to the dataset

  • Now more than 300 cases in China.

Update: February 3rd

Major update to the dataset

  • Included a new dataset for cases in China (suggested by Cindy Chen). Elsa Yang has recorded 76 cases in Hefei that were confirmed by February 2nd.
  • Updated most countries to February 3rd.

New report

Please click here to view the report. This analysis fits an exponential growth model to infection times imputed from symptom onset and the reported incubation interval. This report has NOT been peer-reviewed and extra caution is required when interpreting the results.
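
For readers unfamiliar with exponential growth fitting, the following is a minimal sketch of the general idea, assuming daily counts of (imputed) infections are already available. It is not the model from the report: the counts are simulated, the true growth rate of 0.2 per day is an arbitrary choice, and Poisson regression is just one common way to estimate the rate.

```r
set.seed(2020)

# Simulated stand-in for daily counts of imputed infections.
# The growth rate 0.2 per day and baseline 2 are arbitrary assumptions.
day    <- 0:20
counts <- rpois(length(day), lambda = 2 * exp(0.2 * day))

# Poisson regression with a log link: log E[counts] = a + r * day,
# so the slope r is the exponential growth rate per day.
fit <- glm(counts ~ day, family = poisson(link = "log"))
growth_rate   <- unname(coef(fit)["day"])
doubling_time <- log(2) / growth_rate

print(c(growth_rate = growth_rate, doubling_time = doubling_time))
```

A real analysis would also need to account for uncertainty in the imputed infection times and for cases that have not yet been confirmed, which this sketch ignores.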

Update: February 1st

Major update to the dataset

  • Added cases in Hong Kong and Macau (suggested by Cindy Chen).
  • The "Hospital" column has been split into "Initial" and "Hospital". The "Initial" column records when the patient first went (or was taken) to an outpatient clinic or emergency room after developing symptoms. The "Hospital" column records when the patient was eventually admitted to a hospital, if he/she was not admitted immediately during the first visit. This split has only been done for Japan, Singapore, Taiwan, Hong Kong, Macau, and South Korea.
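
As a rough illustration of how the split columns might be used, the sketch below computes the delay from symptom onset to the first visit and to admission. The toy rows and date format are assumptions, and so is the reading that an empty "Hospital" entry means the patient was admitted at the first visit; in the real data an empty cell may also mean the date is unknown.

```r
# Toy rows only: values and date format are illustrative assumptions.
toy <- data.frame(
  Symptom  = as.Date(c("2020-01-20", "2020-01-22")),
  Initial  = as.Date(c("2020-01-21", "2020-01-24")),
  Hospital = as.Date(c(NA, "2020-01-26"))
)

# Assumed rule: if "Hospital" is empty, treat the first visit as the admission.
admitted <- toy$Hospital
no_later_admission <- is.na(toy$Hospital)
admitted[no_later_admission] <- toy$Initial[no_later_admission]

# Delays in days from symptom onset.
toy$delay_to_visit     <- as.numeric(toy$Initial - toy$Symptom)
toy$delay_to_admission <- as.numeric(admitted - toy$Symptom)

print(toy)
```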

New preliminary analysis

Please click here to view the report. This report has NOT been peer-reviewed and extra caution is required when interpreting the results.
