Skip to content

indirection/COVID-19_Unified-Dataset

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Unified COVID-19 Dataset

Copyright: © 2020 JHU) Credits: NASA/NIH Citation: Badr et. al 2020 License: GPL v3 GitHub Commit

This is a unified COVID-19 dataset to fulfill the following objectives:

  • Mapping all geospatial units globally into a unique standardized ID.
  • Standardizing administrative names and codes at all levels.
  • Standardizing dates, data types, and formats.
  • Unifying variable names, types, and categories.
  • Merging data from all credible sources at all levels.
  • Cleaning the data and fixing confusing entries.
  • Integrating hydrometeorological variables at all levels.
  • Integrating population-weighted hydrometeorological variables.
  • Integrating policy data from Oxford government response tracker.
  • Integrating an augmented version from all sources (future releases).
  • Optimizing the data for machine learning applications.

Coverage Map

COVID-19 Coverage

Geospatial ID

COVID-19 ID

Note that COVID-19 data for some European countries from Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) are reported in the global daily reports at province level, which will be replaced by higher-resolution data at NUTS 0-3 levels.

Lookup Table

Column Type Description
ID Character Geospatial ID, unique identifier (described above)
Level Character Geospatial level (e.g., Country, Province, State, County, District, and NUTS 0-3)
ISO1_3N Character ISO 3166-1 numeric code, 3-digit, administrative level 0 (countries)
ISO1_3C Character ISO 3166-1 alpha-3 code, 3-letter, administrative level 0 (countries)
ISO1_2C Character ISO 3166-1 alpha-2 code, 2-letter, administrative level 0 (countries)
ISO2 Character ISO 3166-2 code, principal subdivisions (e.g., provinces and states)
ISO2_UID Character ISO 3166-2 code, principal subdivisions (e.g., provinces and states), full unique ID
FIPS Character Federal Information Processing Standard (FIPS, United States)
NUTS Character Nomenclature of Territorial Units for Statistics (NUTS, Europe)
AGS Character Official municipality key / Amtlicher Gemeindeschlüssel (AGS, German regions only)
IBGE Character Brazilian municipality code, Brazilian Institute of Geography and Statistics
ZTCA Character ZIP Code Tabulation Area (ZCTA, United States)
Longitude Double Geospatial coordinate (centroid), east–west
Latitude Double Geospatial coordinate (centroid), north–south
Population Integer Total population of each geospatial unit
Admin Integer Administrative level (0-3)
Admin0 Character Standard name of administrative level 0 (countries)
Admin1 Character Standard name of administrative level 1 (e.g., provinces, states, groups of regions)
Admin2 Character Standard name of administrative level 2 (e.g., counties, municipalities, regions)
Admin3 Character Standard name of administrative level 3 (e.g., districts and ZTCA)
NameID Character Full name ID of combined administrative levels, unique identifier

COVID-19 Data Structure

Column Type Description
ID Character Geospatial ID, unique identifier (described above)
Date Date Date of data record
Cases Integer Number of cumulative cases
Cases_New Integer Number of new daily cases
Type Character Type of the reported cases
Age Character Age group of the reported cases
Sex Character Sex/gender of the reported cases
Source Character Data source: JHU, CTP, NYC, NYT, SES, DPC, RKI, JRC

Case Types

Type Description
Active Active cases
Confirmed Confirmed cases
Deaths Deaths
Home_Confinement Home confinement / isolation
Hospitalized Total hospitalized cases excluding intensive care units
Hospitalized_Now Currently hospitalized cases excluding intensive care units
Hospitalized_Sym Symptomatic hospitalized cases excluding intensive care units
ICU Total cases in intensive care units
ICU_Now Currently in intensive care units
Negative Negative tests
Pending Pending tests
Positive Positive tests, including hospitalised cases and home confinement
Positive_Dx Positive cases emerged from clinical activity / diagnostics
Positive_Sc Positive cases emerging from surveys and tests
Recovered Recovered cases
Tested Cases tested = Tests - Pending
Tests Total performed tests
Ventilator Total cases receiving mechanical ventilation
Ventilator_Now Currently receiving mechanical ventilation

Hydromet Data Structure

Column Type Unit Description
ID Character Geospatial ID, unique identifier (described above)
Date Date Date of data record
T Double °C Daily average near-surface air temperature
Tmax Double °C Daily maximum near-surface air temperature
Tmin Double °C Daily minimum near-surface air temperature
Td Double °C Daily average near-surface dew point temperature
Tdd Double °C Daily average near-surface dew point depression
RH Double % Daily average near-surface relative humidity
SH Double kg/kg Daily average near-surface specific humidity
MA Double % Daily average moisture availability (NLDAS)
RZSM Double kg/m2 Daily average root zone soil moisture content (NLDAS)
SM Double kg/m2 Daily average soil moisture content (NLDAS)
SM1 Double m3/m3 Daily average volumetric soil water layer 1 (ERA5)
SM2 Double m3/m3 Daily average volumetric soil water layer 2 (ERA5)
SM3 Double m3/m3 Daily average volumetric soil water layer 3 (ERA5)
SM4 Double m3/m3 Daily average volumetric soil water layer 4 (ERA5)
SP Double Pa Daily average surface pressure
SR Double J/m2 Daily average surface downward solar radiation (ERA5)
SRL Double W/m2 Daily average surface downward longwave radiation flux (NLDAS)
SRS Double W/m2 Daily average surface downward shortwave radiation flux (NLDAS)
LH Double J/m2 Daily average surface latent heat flux (ERA5)
LHF Double W/m2 Daily average surface latent heat flux (NLDAS)
PE Double m Daily average potential evaporation / latent heat flux (ERA5)
PEF Double W/m2 Daily average potential evaporation / latent heat flux (NLDAS)
P Double mm/day Daily total precipitation
U Double m/s Daily average 10-m above ground Zonal wind speed
V Double m/s Daily average 10-m above ground Meridional wind speed
HydrometSource Character Data source: ERA5, NLDAS ± CIESIN*
  • The hydromet data with _CIESIN suffix in HydrometSource are population-weighted using Gridded Population of the World (GPW), hosted by Center for International Earth Science Information Network (CIESIN).

Policy Data Structure

Column Type Description
ID Character Geospatial ID, unique identifier (described above)
Date Date Date of data record
PolicyType Character Type of the policy
PolicyValue Double Value of the policy
PolicyFlag Logical Logical flag for geographic scope
PolicyNotes Character Notes on the policy record
PolicySource Character Data source: OxCGRT

Policy Types

Type Description
CX Containment and closure policies
C1 School closing
C2 Workplace closing
C3 Cancel public events
C4 Restrictions on gatherings
C5 Close public transport
C6 Stay at home requirements
C7 Restrictions on internal movement
C8 International travel controls
EX Economic policies
E1 Income support
E2 Debt/contract relief
E3 Fiscal measures
E4 International support
HX health system policies
H1 Public information campaigns
H2 Testing policy
H3 Contact tracing
H4 Emergency investment in healthcare
H5 Investment in vaccines
H6 Investment in vaccines
H7 Vaccination policy
MX Miscellaneous policies
M1 Wildcard
IX Policy indices
I1 Containment health index
I2 Economic support index
I3 Government response index
I4 Stringency index
IC Confirmed cases
ID Confirmed deaths
IXD Policy indices (Display)
IXL Policy indices (Legacy)
IXLD Policy indices (Legacy, Display)

For more details, see OxCGRT's codebook, index methodology, interpretation guide, and subnational interpretation.

Data Sources

Source Description Level
JHU Johns Hopkins University CSSE Global & County/State, United States
CTP The COVID Tracking Project State, United States
NYC New York City Department of Health and Mental Hygiene ZCTA/Borough, New York City
NYT The New York Times County/State, United States
SES Monitoring COVID-19 Cases and Deaths in Brazil Municipality/State/Country, Brazil
DPC Italian Civil Protection Department NUTS 0-3, Italy
RKI Robert Koch-Institut, Germany NUTS 0-3, Germany
JRC Joint Research Centre Global & NUTS 0-3, Europe
ERA5 The fifth generation of ECMWF reanalysis All levels
NLDAS North American Land Data Assimilation System County/State, United States
CIESIN C. for International Earth Science Information Net. Global gridded population
OxCGRT Oxford COVID-19 Government Response Tracker National (global) & subnational (US, UK)

Credits

This work is supported by NASA Health & Air Quality project 80NSSC18K0327, under a COVID-19 supplement, and National Institute of Health (NIH) project 3U19AI135995-03S1 ("Consortium for Viral Systems Biology (CViSB)"; Collaboration with The Scripps Research Institute and UCLA).

Citation

To cite this dataset:

Badr, H. S., B. F. Zaitchik, G. H. Kerr, J. M. Colston, P. Hinson, Y. Chen, N. H. Nguyen, M. Kosek, H. Du, E. Dong, M. Marshall, K. Nixon, and L. M. Gardner, 2020: Unified COVID-19 Dataset.

About

Unified COVID-19 Dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published