
Submission Format

Paolo Milano edited this page Feb 26, 2024 · 19 revisions

Each forecast should be stored as a comma-separated values (CSV) file in your model-output/team-model folder.

The CSV file must use a standardised file name and contain specific variable names and values that identify the forecast you are submitting. This allows us to evaluate and compare forecasts. The automatic check validates both the file name and the file contents to ensure the file can be used in the visualisation and in ensemble forecasting.

File name

Each forecast file within the subdirectory should have the following name format:

YYYY-MM-DD-team-model.csv

The date YYYY-MM-DD is the origin date of the forecast (i.e., last day of submission window). The team and model in this file name must match the name of the model-output directory this file is in (and correspond to the team_abbr and model_abbr parameters in the metadata file).
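The naming rule can be checked locally before submitting. The following is a minimal sketch, not the official validation: the function name check_forecast_path and the example team and model names are hypothetical, and the character set allowed in the abbreviations is an assumption.

```python
import datetime
import re
from pathlib import PurePosixPath

# Assumption: team and model abbreviations use only letters, digits and
# underscores, so the hyphens in the file name are unambiguous separators.
FILENAME_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})-([A-Za-z0-9_]+)-([A-Za-z0-9_]+)\.csv$"
)


def check_forecast_path(path):
    """Return (origin_date, team, model) for a valid path, else raise ValueError."""
    p = PurePosixPath(path)
    match = FILENAME_RE.match(p.name)
    if match is None:
        raise ValueError(f"{p.name!r} does not follow YYYY-MM-DD-team-model.csv")
    origin_date, team, model = match.groups()
    # fromisoformat raises ValueError for impossible dates such as 2023-02-30.
    datetime.date.fromisoformat(origin_date)
    # The file must sit in the model-output folder named team-model.
    if p.parent.name != f"{team}-{model}":
        raise ValueError(f"folder {p.parent.name!r} does not match {team}-{model}")
    return origin_date, team, model


print(check_forecast_path(
    "model-output/exampleteam-examplemodel/2023-12-05-exampleteam-examplemodel.csv"
))
```

The sketch does not check the metadata file; comparing the parsed team and model against team_abbr and model_abbr would be a natural extension.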

File format

Required variables

The CSV file must contain only the following columns (in any order). No additional columns are allowed.

| column | column type | description |
| --- | --- | --- |
| origin_date | date | Date in the format YYYY-MM-DD: the last day of the submission window (Wednesday) |
| target | string | Fixed value: "ILI incidence" |
| target_end_date | date | Date in the format YYYY-MM-DD: the last day of the target week (Sunday) |
| horizon | integer | Week ahead, from -1 to 4: the target week of the forecast, counted from the week of the last ERVISS data update |
| location | string | An ISO 3166-1 alpha-2 (ISO-2) or ISO 3166-2:GB country code |
| output_type | string | One of "quantile" or "median" |
| output_type_id | string | When output_type = "quantile", one of the 23 accepted quantiles. When output_type = "median", an empty string |
| value | decimal | The forecasted incidence: a non-negative number of new ILI cases per 100,000 for the target week and output type specified |
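As an illustration of the "only these columns, in any order" rule, the header of a file can be compared with the required set. This is a sketch under stated assumptions: check_columns is a hypothetical helper, not part of the automatic check, and it also rejects repeated column names, which the text above does not mention explicitly.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "origin_date", "target", "target_end_date", "horizon",
    "location", "output_type", "output_type_id", "value",
}


def check_columns(csv_text):
    """Raise ValueError unless the header holds exactly the required columns."""
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if header is None:
        raise ValueError("file is empty")
    missing = REQUIRED_COLUMNS - set(header)
    extra = set(header) - REQUIRED_COLUMNS
    # Assumption: a column name repeated in the header is also an error.
    duplicated = len(header) != len(set(header))
    if missing or extra or duplicated:
        raise ValueError(
            f"missing={sorted(missing)} extra={sorted(extra)} duplicated={duplicated}"
        )


# Column order does not matter, so a shuffled header passes.
check_columns(
    "value,origin_date,target,target_end_date,horizon,location,output_type,output_type_id\n"
)
```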

Notes on each variable

origin_date

This should correspond with the date in the filename: see above. The date must use the format YYYY-MM-DD and represents the origin date of the forecast (i.e., last day of submission window). Note: A file with origin_date and target_end_date for each submission round is provided here

target

Values in the target column must be the character string "ILI incidence".

target_end_date

Values in the target_end_date column must be a date in the format YYYY-MM-DD.

This is the date of the forecast target: the Sunday that ends the target week. We provide a template CSV to convert between an ISO week and its end date.

Note: A file with origin_date and target_end_date for each submission round is provided here

horizon

Values in the horizon column must be an integer indicating how many weeks ahead the forecast refers to. The horizon is computed relative to the week of the last data update. Consult the forecasting_weeks file for the mapping between each origin_date and the dates its horizons refer to. Since forecasting round 11 of the 2023-2024 season, horizons 0 and -1 are also accepted. These horizons are likewise calculated relative to the week of the last available ground truth data point for the target.

We use the ISO week format: each week starts on Monday and ends on Sunday. For more details, see the template CSV file that converts between dates and ISO weeks.
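For teams that prefer to compute week end dates rather than look them up, the ISO week convention can be sketched with the Python standard library. The helper names iso_week_end and end_of_iso_week are hypothetical. The sketch only finds the Sunday that closes a week; which week each horizon refers to should still be taken from the forecasting_weeks file.

```python
import datetime


def iso_week_end(day):
    """Return the Sunday that closes the ISO week containing `day`."""
    # date.weekday() is 0 for Monday and 6 for Sunday.
    return day + datetime.timedelta(days=6 - day.weekday())


def end_of_iso_week(iso_year, iso_week):
    """Return the Sunday (ISO weekday 7) of the given ISO year and week."""
    return datetime.date.fromisocalendar(iso_year, iso_week, 7)


print(iso_week_end(datetime.date(2023, 12, 5)).isoformat())
print(end_of_iso_week(2023, 48).isoformat())
```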

location

Values in the location column must be one of the ISO 3166-1 alpha-2 (ISO-2) geocodes for EU countries or an ISO 3166-2:GB extended geocode for UK countries. We provide a geocode file to convert between country names and ISO-2 codes or, if using R, you can use the countrycode package.

output_type

Values in the output_type column are one of

  • “median”
  • “quantile”

This value indicates whether the row corresponds to a median forecast or a quantile forecast. Median forecasts are used in visualisation as point values, while quantile forecasts are used in visualisation and in ensemble construction, as long as all the quantiles given below are present.

Forecasts must include exactly 1 “median” forecast for each unique combination of location, target, horizon.
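The "exactly one median" rule lends itself to a quick local check. The sketch below assumes the rows have already been read into dictionaries keyed by column name (for example with csv.DictReader); median_count_problems is a hypothetical helper, not the official validation.

```python
from collections import Counter


def median_count_problems(rows):
    """Map each (location, target, horizon) to its median count when it is not 1."""
    rows = list(rows)
    groups = {(r["location"], r["target"], r["horizon"]) for r in rows}
    medians = Counter(
        (r["location"], r["target"], r["horizon"])
        for r in rows
        if r["output_type"] == "median"
    )
    # Counter returns 0 for groups that have no median row at all.
    return {group: medians[group] for group in groups if medians[group] != 1}


rows = [
    {"location": "IT", "target": "ILI incidence", "horizon": "1", "output_type": "median"},
    {"location": "IT", "target": "ILI incidence", "horizon": "1", "output_type": "quantile"},
    {"location": "IT", "target": "ILI incidence", "horizon": "2", "output_type": "quantile"},
]
print(median_count_problems(rows))
```

An empty dictionary means every combination has exactly one median row.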

output_type_id

When output_type is set to “quantile”, output_type_id must be one of the 23 accepted quantiles in the format "0.###". Teams should provide the following 23 quantiles:

c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)

i.e.

0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500 
0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990

When output_type is set to “median”, output_type_id must be an empty string.
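For teams not working in R, the same 23 levels and their "0.###" labels can be produced in Python. This is a sketch: QUANTILE_LEVELS, QUANTILE_IDS and format_output_type_id are hypothetical names, and whether the automatic check also accepts shorter spellings such as 0.99 is not stated here, so the helper always emits three decimals.

```python
# Python counterpart of the R expression above. The 0.05 steps are built from
# integers to avoid accumulating floating-point error.
QUANTILE_LEVELS = (
    [0.01, 0.025] + [step / 100 for step in range(5, 100, 5)] + [0.975, 0.99]
)
QUANTILE_IDS = [f"{level:.3f}" for level in QUANTILE_LEVELS]


def format_output_type_id(output_type, level=None):
    """Return the output_type_id string for a row, or raise ValueError."""
    if output_type == "median":
        return ""  # median rows carry an empty output_type_id
    if output_type == "quantile":
        label = f"{level:.3f}"
        if label not in QUANTILE_IDS:
            raise ValueError(f"{level} is not one of the 23 accepted quantiles")
        return label
    raise ValueError(f"unknown output_type {output_type!r}")


print(len(QUANTILE_IDS), QUANTILE_IDS)
```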

value

Values in this column should be non-negative decimal values.

  • For a “median” prediction, value is simply the value of that point prediction for the target, location, and horizon associated with that row.
  • For a “quantile” prediction, value is the inverse of the cumulative distribution function (CDF) of the predictive distribution, evaluated at the quantile level given in output_type_id, for the target, location, and horizon associated with that row.
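To make the inverse-CDF definition concrete, here is a sketch that evaluates the quantiles of a log-normal predictive distribution. The log-normal family and the parameters mu and sigma are purely illustrative assumptions, chosen because every quantile is then non-negative; they are not a recommended model.

```python
import math
from statistics import NormalDist


def lognormal_quantile(level, mu, sigma):
    """Inverse CDF of a log-normal distribution at `level` (0 < level < 1)."""
    # If log(X) is Normal(mu, sigma), the quantile of X is exp of the normal quantile.
    return math.exp(NormalDist(mu, sigma).inv_cdf(level))


# Hypothetical predictive distribution for one target, location and horizon.
mu, sigma = math.log(0.7), 0.25

levels = [0.025, 0.25, 0.5, 0.75, 0.975]
values = [lognormal_quantile(level, mu, sigma) for level in levels]
for level, value in zip(levels, values):
    print(f"{level:.3f} -> {value:.6f}")
```

The value at level 0.5 is the median of the distribution, and the values increase with the quantile level, as they must for any valid set of quantiles.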

Example

The following shows a few lines from an example CSV file complying with the required format:

origin_date,target,target_end_date,horizon,location,output_type,output_type_id,value
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.975,0.973104
2023-12-05,ILI incidence,2023-12-10,2,IT,quantile,0.975,0.982182
2023-12-05,ILI incidence,2023-12-17,3,IT,quantile,0.975,0.99
2023-12-05,ILI incidence,2023-12-24,4,IT,quantile,0.975,1.084212
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.250,0.5046
2023-12-05,ILI incidence,2023-12-03,1,IT,median,,0.701233
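The row-level rules described on this page can be combined into a small local check and run against lines like those above. This is a sketch, not the automatic check used by the hub: row_problems is a hypothetical helper, and the strict three-decimal match on output_type_id is an assumption that may be stricter than the official validation.

```python
import csv
import io

ACCEPTED_IDS = {
    f"{level:.3f}"
    for level in [0.01, 0.025] + [s / 100 for s in range(5, 100, 5)] + [0.975, 0.99]
}


def row_problems(row):
    """Return a list of human-readable problems found in one CSV row."""
    problems = []
    if row["target"] != "ILI incidence":
        problems.append("target must be 'ILI incidence'")
    if row["horizon"] not in {str(h) for h in range(-1, 5)}:
        problems.append("horizon must be an integer from -1 to 4")
    if row["output_type"] == "median":
        if row["output_type_id"] != "":
            problems.append("median rows need an empty output_type_id")
    elif row["output_type"] == "quantile":
        if row["output_type_id"] not in ACCEPTED_IDS:
            problems.append("output_type_id is not an accepted quantile")
    else:
        problems.append("output_type must be 'median' or 'quantile'")
    try:
        value = float(row["value"])
    except ValueError:
        problems.append("value is not a number")
    else:
        if not value >= 0:  # also rejects NaN
            problems.append("value must be non-negative")
    return problems


EXAMPLE = """origin_date,target,target_end_date,horizon,location,output_type,output_type_id,value
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.975,0.973104
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.250,0.5046
2023-12-05,ILI incidence,2023-12-03,1,IT,median,,0.701233
"""

for row in csv.DictReader(io.StringIO(EXAMPLE)):
    print(row["output_type"], row["output_type_id"], row_problems(row))
```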