-
Notifications
You must be signed in to change notification settings - Fork 5
/
retrieving_cansim_vectors.Rmd
82 lines (62 loc) · 4.89 KB
/
retrieving_cansim_vectors.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
---
title: "Retrieving individual Statistics Canada vectors"
description: >
A worked example showing how to retrieve individual data vectors instead of entire tables.
author: ""
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Retrieving individual Statistics Canada vectors}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = nzchar(Sys.getenv("COMPILE_VIG"))
)
library(cansim)
library(dplyr)
library(ggplot2)
```
### Retrieving individual vectors
Many of the time-series data available from Statistics Canada have individual vector codes. These vector codes follow a naming format of a lower-case "v" and an identifying numbers. Time-series tables will often bundle many series together, resulting in large and sometimes unwieldy files. Many users of Canadian statistical data, who are often concerned with specific time series such as the CPI or international arrivals, will typically know the exact series they need. For this reason, the `cansim` package also provides two functions to make it easier to retrieve individual vectors: `get_cansim_vector()` and `get_cansim_vector_for_latest_periods()`.
### get_cansim_vector()
Running `search_cansim_cubes("consumer price index")` shows `r search_cansim_cubes("consumer price index") %>% nrow()` tables as results. However, if you are tracking the Canadian Consumer Price Index (CPI) over time, you might already know the Statistics Canada vector code the seasonally-unadjusted all-items CPI value: *v41690973*. To retrieve just this data series on its own without all of the additional data available in related tables, we can use the `get_cansim_vector()` function with the vector code and the date onwards from which we want to get vector results for.
```{r}
get_cansim_vector("v41690973","2015-01-01")
```
The call to `get_cansim_vector` takes three inputs: a string code (or codes) for `vectors`, a `start_time` in YYYY-MM-DD format, and an optional value for `end_time`, also in YYYY-MM-DD format. By default, the `start_time` and `end_time` vectors uses Statistics Canada's reference periods ("REF_DATE") for selecting the date range of the data for retrieved vectors. There are a few optional input parameters for this function. If `end_time` is not provided, the call will use the current date as the default series end time. If the optional parameter `use_ref_date` is set to `FALSE`, then vector retrieval will instead filter on the release date of the vector itself.
Vectors can be coerced into a list object in order to retrieve multiple series at the same time. For example, provincial seasonally-unadjusted CPI values have their own vector codes. The vector code for British Columbia all-items CPI is *v41692462*.
The below code retrieves monthly Canadian and BC CPI values for the period January 2015 to December 2017 only. Monthly data series are always dated to the first day of the month.
```{r}
vectors <- c("v41690973","v41692462")
get_cansim_vector(vectors, "2017-01-01")
```
### get_cansim_vectors_for_latest_periods()
Some vectors extend backwards for a significant number of periods that may not be of interest. `get_cansim_vectors_for_lates_periods()` is a wrapper around `get_cansim_vectors` that takes a `periods` input instead of arguments for `start_time` and `end_time`, and provides data for the selected vector(s) for the last `n` periods for which data is available, irrespective of dates.
```{r}
get_cansim_vector_for_latest_periods("v41690973", periods = 60)
```
### Naming vector series
In these examples, we have used *v41690973* for Canada and *v41692462* for BC. This can be hard to remember and can get annoying to work with. Both vector retrieval functions in the `cansim` package allow for named vector extraction. This works by providing a user-determined string directly into a `get_*` call. This may be useful when working with table code and vector codes that do not have any information in their name and become easy to lose track of.
### Normalizing data
Data retrieved as vectors also gains the additional `val_norm` column with normalized values.
### Putting it all together
This quick example uses a list with two named vectors and a starting date as an input value, converts values ("normalizes") on the fly, and prepares a simple `ggplot2` graphic.
```{r}
vectors <- c("Canadian CPI"="v41690973",
"BC CPI"="v41692462")
data <- get_cansim_vector(vectors, "2010-01-01")
library(ggplot2)
ggplot(data,aes(x=Date,y=val_norm,color=label)) +
geom_line() +
labs(title="Consumer Price Index, January 2010 to September 2018",
subtitle = "Seasonally-unadjusted, all-items (2002 = 100)",
caption=paste0("CANSIM vectors ",paste0(vectors,collapse = ", ")),x="",y="",color="")
```
To access metadata for vectors we can use the `get_cansim_vector_info` call
```{r}
get_cansim_vector_info(vectors)
```