Skip to content

jscriptcoder/Coronavirus-2019-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Coronavirus 2019 Analysis

Image from https://www.heywood.org/education/covid-19-coronavirus-updates

The datasets come from Kaggle: Novel Corona Virus 2019 Dataset

Visit Project Report for more details.

Context

From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the google sheets associated and made available here.

Edited: Now data is available as csv files in the Johns Hopkins Github repository. Please refer to the github repository for the Terms of Use details. Uploading it here for using it in Kaggle kernels and getting insights from the broader DS community.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is a time series data and so the number of cases on any given day is the cumulative number.

The data is available from 22 Jan, 2020.

Column Description

Main file in this dataset is covid_19_data.csv and the detailed descriptions are below.

covid_19_data.csv

  • Sno - Serial number
  • ObservationDate - Date of the observation in MM/DD/YYYY
  • Province/State - Province or state of the observation (Could be empty when missing)
  • Country/Region - Country of observation
  • Last Update - Time in UTC at which the row is updated for the given province or country. Not standardised, so please clean before using it
  • Confirmed - Cumulative number of confirmed cases till that date
  • Deaths - Cumulative number of of deaths till that date
  • Recovered - Cumulative number of recovered cases till that date

Added two new files with individual level information:

COVID_open_line_list_data.csv. This file is obtained from this link

COVID19_line_list_data.csv. This files is obtained from this link

Country level datasets

If you are interested in knowing country level data, please refer to the following Kaggle datasets: South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

Acknowledgements

Keeping track of infections

About

Understanding Coronavirus 2019 pandemic

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors