---
title: "Package overview"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Package overview}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = FALSE
)
```
## Installing and loading the package
The package can be installed from CRAN, from our [`r-universe`](https://epiforecasts.r-universe.dev/ui) repository, or from GitHub. See the README for details. Once installed, load the package with:
```{r, eval=TRUE}
library(covidregionaldata)
```
## Worldwide data
### Accessing national data
Both the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC) provide worldwide national data. Access national level data for any country using:
```{r, eval=FALSE}
get_national_data()
```
This returns daily new and cumulative (total) cases and, where available, deaths, hospitalisations, and tests. For a complete list of variables returned, see the "Data glossary" section below. See the documentation (`?get_national_data`) for details of optional arguments.
Data are returned as a complete time series for each country: dates for which no data are available are filled with `NA` rather than dropped.
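The gap-free structure described above can be illustrated without the package. The following is a minimal base R sketch, not the package's internal implementation; the data frame `reported` and all of its values are made up for illustration. It builds every country and date combination, then merges the reported rows onto that grid so that unreported days appear as `NA`.

```{r}
# Hypothetical reported data: country "B" has no report for 2020-03-02
reported <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03",
                   "2020-03-01", "2020-03-03")),
  cases_new = c(5, 7, 6, 2, 4),
  stringsAsFactors = FALSE
)

# Every country on every date between the first and last report
grid <- expand.grid(
  country = unique(reported$country),
  date = seq(min(reported$date), max(reported$date), by = "day"),
  stringsAsFactors = FALSE
)

# Left join: grid rows without a matching report get NA in cases_new
complete <- merge(grid, reported, by = c("country", "date"), all.x = TRUE)
complete <- complete[order(complete$country, complete$date), ]
complete
```

Keeping the missing day as an explicit `NA` row, rather than omitting it, means each country has the same number of rows and date-based joins or plots do not silently skip days.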
## Sub-national time-series data
### Accessing sub-national data
Access sub-national level data for a specific country over time using `get_regional_data()`. Use `get_available_datasets()` to explore the currently supported sub-national datasets, then select the dataset of interest using the `country` (selects the country of interest) and `level` (selects the spatial scale of the data) arguments of `get_regional_data()`.
This function returns daily new and cumulative (total) cases and, where available, deaths, hospitalisations, and tests. For a complete list of variables returned, see the "Data glossary" section below. See the documentation (`?get_regional_data`) for details of optional arguments.
As with national level data, any gaps in reported data are filled with NAs.
For example, data for France Level 1 regions over time can be accessed using:
```{r}
get_regional_data(country = "france")
```
This data then has the following format:
```{r, echo=FALSE, eval=TRUE, message=FALSE}
start_using_memoise()
knitr::kable(
tail(get_regional_data(country = "france"), n = 5)
)
```
Alternatively, the same data can be accessed using the underlying class. The resulting `france` object contains the data at each processing step and the methods used at each step:
```{r, eval=FALSE, message=FALSE}
france <- France$new(get = TRUE)
france$return()
```
### Level 1 and Level 2 regions
All countries included in the package (see "Coverage" below) have data for regions at the admin-1 level, the largest administrative unit of the country (e.g. state in the USA). Some countries also have data for smaller areas at the admin-2 level (e.g. county in the USA).
Data for Level 2 units can be returned by using the `level = "2"` argument. The dataset will still show the corresponding Level 1 region.
An example of a country with Level 2 units is France, where Level 2 units are French departments:
```{r}
get_regional_data(country = "france", level = "2")
```
This data again has the following format:
```{r, echo=FALSE, eval=TRUE, message=FALSE}
knitr::kable(
tail(get_regional_data(country = "france", level = "2"), n = 5)
)
```
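Because each Level 2 row keeps its Level 1 region, Level 2 counts can be rolled up to Level 1 after download. The following is a minimal base R sketch with made-up numbers; the data frame `level_2` is hypothetical and only mimics the standard column names from the data glossary, so it is not real output from `get_regional_data()`.

```{r}
# Hypothetical Level 2 data (departments) with their Level 1 region
level_2 <- data.frame(
  date = as.Date(rep(c("2020-03-01", "2020-03-02"), each = 3)),
  level_1_region = rep(c("Bretagne", "Bretagne", "Normandie"), times = 2),
  level_2_region = rep(c("Ille-et-Vilaine", "Morbihan", "Calvados"), times = 2),
  cases_new = c(3, 1, 4, 2, 5, 0),
  stringsAsFactors = FALSE
)

# Sum new cases over departments within each region and date.
# Note: the formula interface of aggregate() drops rows with NA counts.
level_1 <- aggregate(
  cases_new ~ date + level_1_region,
  data = level_2,
  FUN = sum
)
level_1
```

Only the `_new` columns should be summed this way within a date; cumulative `_total` columns can be summed across departments in the same manner, but never across dates.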
### Totals
For data totalled up to the most recent date available, set the `totals` argument to `TRUE`:
```{r}
get_regional_data("france", totals = TRUE)
```
This data now has no date variable and reflects the latest total:
```{r, echo=FALSE, eval=TRUE, message=FALSE}
knitr::kable(
tail(get_regional_data(country = "france", totals = TRUE), n = 5)
)
```
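To make the link between the time series and the totals concrete, the sketch below derives a totals table from a small cumulative series by keeping the row for the latest date in each region and dropping the date. This is an illustration in base R under the assumption that every region shares the same latest date; the data frame `series` and its values are invented, and the package's own `totals = TRUE` processing may differ.

```{r}
# Hypothetical cumulative time series for two regions
series <- data.frame(
  date = as.Date(rep(c("2020-03-01", "2020-03-02", "2020-03-03"), times = 2)),
  level_1_region = rep(c("Region A", "Region B"), each = 3),
  cases_total = c(5, 12, 18, 2, 2, 6),
  stringsAsFactors = FALSE
)

# Keep the most recent date only, then drop the date column
latest <- series[series$date == max(series$date), ]
totals <- latest[, c("level_1_region", "cases_total")]
rownames(totals) <- NULL
totals
```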
## Data glossary
### Sub-national data
The data columns that will be returned by `get_regional_data()` are listed below.
To standardise across countries and regions, the columns returned for each country will _always_ be the same. If the corresponding data were missing from the original source, that data field is filled with NA values (or 0 if accessing totals data).
Note that `date` is not included if the `totals` argument is set to `TRUE`, and `level_2_region` and `level_2_region_code` are not included if `level = "1"`.
* `date`: the date that the counts were reported (YYYY-MM-DD).
* `level_1_region`: the level 1 region name. This column will be named differently for different countries (e.g. state, province).
* `level_1_region_code`: a standard code for the level 1 region. The column name reflects the specific administrative code used. Typically the data use the `iso_3166_2` standard; where this is not available, the column is named to reflect its source.
* `level_2_region`: the level 2 region name. This column will be named differently for different countries (e.g. city, county).
* `level_2_region_code`: a standard code for the level 2 region. The column will be named differently for different countries (e.g. `fips` in the USA).
* `cases_new`: new reported cases for that day.
* `cases_total`: total reported cases up to and including that day.
* `deaths_new`: new reported deaths for that day.
* `deaths_total`: total reported deaths up to and including that day.
* `recovered_new`: new reported recoveries for that day.
* `recovered_total`: total reported recoveries up to and including that day.
* `hosp_new`: new reported hospitalisations for that day.
* `hosp_total`: total reported hospitalisations up to and including that day (note that this is the cumulative total of newly reported hospitalisations, _not_ the number currently in hospital).
* `tested_new`: new reported tests for that day.
* `tested_total`: total tests completed up to and including that day.
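The relationship between the `_new` and `_total` columns listed above can be sketched as a running sum within each region. This is a base R illustration on invented values (the data frame `daily` is hypothetical), and it assumes a complete, unrevised series sorted by date; totals taken directly from a source may not match a running sum exactly when earlier reports are revised.

```{r}
# Hypothetical daily hospital admissions for two regions, sorted by date
daily <- data.frame(
  date = as.Date(rep(c("2020-03-01", "2020-03-02", "2020-03-03"), times = 2)),
  level_1_region = rep(c("Region A", "Region B"), each = 3),
  hosp_new = c(1, 3, 2, 0, 4, 1),
  stringsAsFactors = FALSE
)

# Running sum of new admissions within each region
daily$hosp_total <- ave(daily$hosp_new, daily$level_1_region, FUN = cumsum)
daily
```

Here `hosp_total` only ever increases, which is why it cannot be read as the number of patients currently in hospital.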
### National data
In addition to the above, the following columns are included when using `get_national_data()`.
* `un_region`: country geographical region defined by the United Nations.
* `who_region`: only included when `source = "WHO"`. Country geographical region defined by WHO.
* `population_2019`: only included when `source = "ECDC"`. Total country population estimate in 2019.