Leggilo in italiano (Read it in Italian).
NOTICE (2021/01/23): The workflow for updating the data was disabled, as anticipated in the previous notice.
NOTICE (2020/12/22): ISS has recently started to release daily data in xlsx format. This repository will still be updated by the automatic workflow for some months (or until it stops working).
This repository contains datasets about the number of Italian SARS-CoV-2 confirmed cases and deaths disaggregated by age group and sex. The data is (automatically) extracted from PDF reports (like this) published by Istituto Superiore di Sanità (Italian National Institute of Health), ISS for short. A link to the most recent report can be found on this page under the section "Documento esteso".
Reports were originally published by ISS twice per week; since April, they have been published only once per week.
This repository is automatically updated by a GitHub workflow that is run regularly (see the workflow file for more details).
In iccas-python, you can find a Python package for downloading, processing and visualizing the data. It also contains a bunch of Jupyter notebooks with tables and charts that you can also run on Binder by clicking here or on the badge at the top of the page.
- 2020/11/30:
  - published iccas-python
- 2020/10/07:
  - the `date` column now includes the hour (in ISO format, `yyyy-mm-ddThh:mm`).
  - the `date` column was added to all datasets by date; of course, it contains a single datetime repeated on every row.
- the
The `data` folder is structured as follows:
data
├── by-date
│ └── iccas_{date}.csv Dataset with cases/deaths updated to a specific {date}
├── util
│ ├── italian_population_by_age_2020.csv [1]
│ └── italian_population_by_age_group_2020.csv [1]
└── iccas_full.csv Concatenation of all datasets iccas_{date}.csv
[1] Source: ISTAT.
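Since `iccas_full.csv` is a concatenation of the per-date datasets, the individual snapshots can be recovered by grouping its rows on the `date` column. The following is a minimal sketch using only the Python standard library. The function name `group_by_date` is hypothetical, and the sample rows contain made-up numbers and only a subset of the columns:

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def group_by_date(csv_file):
    """Group the rows of an iccas_full.csv-like file by their `date` value.

    Keys are naive datetime objects parsed from the yyyy-mm-ddThh:mm strings.
    """
    snapshots = defaultdict(list)
    for row in csv.DictReader(csv_file):
        snapshots[datetime.fromisoformat(row["date"])].append(row)
    return dict(snapshots)


# Made-up sample with a subset of columns, for illustration only.
sample = io.StringIO(
    "date,age_group,cases,deaths\n"
    "2020-10-06T11:00,0-9,100,0\n"
    "2020-10-06T11:00,10-19,200,1\n"
    "2020-10-13T11:00,0-9,150,0\n"
)
snapshots = group_by_date(sample)
```

With a local clone of the repository, the same function should work on the real file by passing `open("data/iccas_full.csv", newline="")` instead of the in-memory sample.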
All numerical values are relative to the first two fields: the date and the age group.
Below, `{sex}` can be `male` or `female`.
Column | Description |
---|---|
`date` | Italian local time in ISO-8601 format `yyyy-mm-ddThh:mm` |
`age_group` | Values: "0-9", "10-19", ..., "80-89", ">=90", "unknown" |
`cases` | Number of confirmed cases (including cases of unknown sex) since the start of the pandemic |
`deaths` | Number of deaths (including deaths of unknown sex) since the start of the pandemic |
`{sex}_cases` | Number of {sex} cases since the start of the pandemic |
`{sex}_deaths` | Number of {sex} cases that ended in death since the start of the pandemic |
`cases_percentage` | `100 * cases_in_age_group / all_cases` |
`deaths_percentage` | `100 * deaths_in_age_group / all_deaths` |
`fatality_rate` | `100 * deaths / cases` |
`{sex}_cases_percentage` | `100 * {sex}_cases / (male_cases + female_cases)` |
`{sex}_deaths_percentage` | `100 * {sex}_deaths / (male_deaths + female_deaths)` |
`{sex}_fatality_rate` | `100 * {sex}_deaths / {sex}_cases` |
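The derived columns follow directly from the raw counts, so they can be recomputed as a sanity check. Below is a sketch of the formulas in the table, not the code actually used to build the datasets. The helper names `percentage` and `derived_columns` are hypothetical, the NaN fallback for a zero denominator is an assumption, and the counts are made up. Here `all_cases` and `all_deaths` are the totals over all age groups (including "unknown") for the same date:

```python
import math


def percentage(numerator, denominator):
    """Return 100 * numerator / denominator, or NaN if the denominator is 0.

    The NaN fallback is an assumption of this sketch, not documented behavior.
    """
    return 100 * numerator / denominator if denominator else math.nan


def derived_columns(row, all_cases, all_deaths):
    """Recompute the derived columns for one (date, age_group) row."""
    known_sex_cases = row["male_cases"] + row["female_cases"]
    known_sex_deaths = row["male_deaths"] + row["female_deaths"]
    out = {
        "cases_percentage": percentage(row["cases"], all_cases),
        "deaths_percentage": percentage(row["deaths"], all_deaths),
        "fatality_rate": percentage(row["deaths"], row["cases"]),
    }
    for sex in ("male", "female"):
        out[f"{sex}_cases_percentage"] = percentage(
            row[f"{sex}_cases"], known_sex_cases
        )
        out[f"{sex}_deaths_percentage"] = percentage(
            row[f"{sex}_deaths"], known_sex_deaths
        )
        out[f"{sex}_fatality_rate"] = percentage(
            row[f"{sex}_deaths"], row[f"{sex}_cases"]
        )
    return out


# Made-up counts for a single age group, for illustration only.
row = {
    "cases": 1000, "deaths": 50,
    "male_cases": 480, "female_cases": 500,
    "male_deaths": 30, "female_deaths": 18,
}
result = derived_columns(row, all_cases=20000, all_deaths=400)
```

Note that the per-sex percentages use only cases and deaths of known sex as the denominator, which is why `male_cases_percentage` and `female_cases_percentage` add up to 100 even when some cases have unknown sex.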
- The sum of `male_cases` and `female_cases` is not `cases`, since the latter also includes cases of unknown sex.
- The sum of `male_deaths` and `female_deaths` is not `deaths`, since the latter also includes deaths of unknown sex.
- In computing `cases_percentage`, the denominator (`all_cases`) includes cases of unknown age; if you are interested in estimating the age distribution of cases, you should instead ignore cases of unknown age.
- The same reasoning applies to `deaths_percentage`.
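To estimate the age distribution as suggested in the notes, one option is to drop the "unknown" age group and renormalize over the remaining groups. The sketch below assumes rows for a single date that have already been parsed into dictionaries with integer counts. The function name `age_distribution` is hypothetical and the numbers are made up:

```python
def age_distribution(rows, column="cases"):
    """Percentage of `column` per age group, ignoring the "unknown" group.

    `rows` must all refer to the same date.
    """
    known = [r for r in rows if r["age_group"] != "unknown"]
    total = sum(r[column] for r in known)
    if total == 0:
        raise ValueError(f"no {column} with known age")
    return {r["age_group"]: 100 * r[column] / total for r in known}


# Made-up rows for a single date, for illustration only.
rows = [
    {"age_group": "0-9", "cases": 100, "deaths": 0},
    {"age_group": "10-19", "cases": 300, "deaths": 1},
    {"age_group": "unknown", "cases": 50, "deaths": 0},
]
distribution = age_distribution(rows)
deaths_distribution = age_distribution(rows, column="deaths")
```

In this example the 50 cases of unknown age are excluded, so the percentages are computed over 400 cases rather than 450.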