A terribly-simple database for numeric time series, written purely in R, so no external database-software is needed. Series are stored in plain-text files (the most-portable and enduring file type) in CSV format. Timestamps are encoded using R’s native numeric representation for ‘Date’/’POSIXct’, which makes them fast to parse, but keeps them accessible with other software. The package provides tools for saving and updating series in this standardised format, for retrieving and joining data, for summarising files and directories, and for coercing series from and to other data types (such as ‘zoo’ series).
- no setup needed, no system dependencies (i.e. external software, such as a database)
- completely portable; moving from one computer to another requires no effort other than copying the files (the only thing to take care of is file encoding if non-ASCII column names are used)
- data usable by other software
- tsdb is potentially slow
- no multi-user support; no access-rights management (other than that provided by the OS)
- no network protocols
We first load the package.
library("tsdb")
Start by creating time-series data.
library("zoo")
z <- zoo(1:5, as.Date("2016-1-1") + 0:4)
z
2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05 1 2 3 4 5
To store these data, we need to enforce a consistent
format, which the functions ts_table
and
as.ts_table
do.
ts <- as.ts_table(z, columns = "A")
ts
5 rows [2016-01-01 -> 2016-01-05]: A
Note that we had to provide a column name (A
) for the
data. This is not optional. It is one of the things
that ts_table
enforces. Another is that timestamps
need to be of class Date
or POSIXct
.
To store the data to a file, use write_ts_table
. The
function will take a directory and file name as
arguments, which mimics the hierarchy of databases and
tables in a classical database.
write_ts_table(ts, dir = "~/tsdb/daily", file = "example1")
The written file will look like this:
"timestamp","A" 16801,1 16802,2 16803,3 16804,4 16805,5
You may notice that the dates have been replaced by numbers. The mapping between these numbers and calendar times is described later, when we discuss the representation of timestamps. (But if you can’t wait: it is the number of days since 1 January 1970.)
Let us write a second file. This time, we use
ts_table
directly.
x <- array(1:20, dim = c(10, 2))
colnames(x) <- c("A", "B")
x
A B [1,] 1 11 [2,] 2 12 [3,] 3 13 [4,] 4 14 [5,] 5 15 [6,] 6 16 [7,] 7 17 [8,] 8 18 [9,] 9 19 [10,] 10 20
ts_table(x, timestamp = as.Date("2016-1-1") + 0:9)
10 rows [2016-01-01 -> 2016-01-10]: A, B
We can also explicitly specify the column names, which will override the column names of the data. In fact, this is the preferred way, since it makes things more explicit (which usually means safer).
ts <- ts_table(x, timestamp = as.Date("2016-1-1") + 0:9,
columns = c("B", "A"))
ts
10 rows [2016-01-01 -> 2016-01-10]: B, A
We write the data to a file example2
.
write_ts_table(ts, dir = "~/tsdb/daily", file = "example2")
The written file looks like this:
"timestamp","B","A" 16801,1,11 16802,2,12 16803,3,13 16804,4,14 16805,5,15 16806,6,16 16807,7,17 16808,8,18 16809,9,19 16810,10,20
Use the function read_ts_tables
.
read_ts_tables("example1", dir = "~/tsdb/daily", columns = "A")
The default return value is a list with components
data
, timestamp
, columns
and file.path
.
$data A [1,] 1 [2,] 2 [3,] 3 [4,] 4 [5,] 5 $timestamp [1] "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" "2016-01-05" $columns [1] "A" $file.path [1] "~/tsdb/daily/example1::A"
More convenient may be to specify a return.class
.
read_ts_tables("example1", dir = "~/tsdb/daily", columns = "A",
return.class = "zoo")
~/tsdb/daily/example1::A 2016-01-01 1 2016-01-02 2 2016-01-03 3 2016-01-04 4 2016-01-05 5
read_ts_tables("example1", dir = "~/tsdb/daily", columns = "A",
return.class = "data.frame")
timestamp ~/tsdb/daily/example1::A 1 2016-01-01 1 2 2016-01-02 2 3 2016-01-03 3 4 2016-01-04 4 5 2016-01-05 5
With tsdb
before version 0.7, read_ts_tables
would
per default only have returned values for non-weekend
days. (tsdb
was written with financial data in mind,
and on weekends there are no prices.) This behaviour is
controlled by argument drop.weekends
, which defaults
to FALSE
.
weekdays(as.Date("2016-1-1")+0:4)
[1] "Friday" "Saturday" "Sunday" "Monday" "Tuesday"
To obtain data for weekends as well, specify the
argument drop.weekends
.
read_ts_tables("example1", dir = "~/tsdb/daily",
columns = "A",
return.class = "data.frame",
drop.weekends = TRUE)
timestamp ~/tsdb/daily/example1::A 1 2016-01-01 1 2 2016-01-04 4 3 2016-01-05 5
You may have noticed a small difference in the names of the functions for reading and writing. We always write a single table, but we read tables.
read_ts_tables(c("example1", "example2"),
dir = "~/tsdb/daily",
columns = "A",
return.class = "data.frame")
timestamp ~/tsdb/daily/example1::A ~/tsdb/daily/example2::A 1 2016-01-01 1 11 2 2016-01-02 2 12 3 2016-01-03 3 13 4 2016-01-04 4 14 5 2016-01-05 5 15 6 2016-01-06 NA 16 7 2016-01-07 NA 17 8 2016-01-08 NA 18 9 2016-01-09 NA 19 10 2016-01-10 NA 20
The column names of the returned object consist of the
filepaths and the column, which may be more information
than we actually want. The argument column.name
specifies the format; its default is
%dir%/%file%::%column%
.
read_ts_tables(c("example1", "example2"),
dir = "~/tsdb/daily",
columns = "A",
return.class = "data.frame",
column.name = "%file%/%column%")
timestamp example1/A example2/A 1 2016-01-01 1 11 2 2016-01-02 2 12 3 2016-01-03 3 13 4 2016-01-04 4 14 5 2016-01-05 5 15 6 2016-01-06 NA 16 7 2016-01-07 NA 17 8 2016-01-08 NA 18 9 2016-01-09 NA 19 10 2016-01-10 NA 20
Missing values are by default set to NA
. That happens
even for missing columns, with a warning though.
read_ts_tables(c("example1", "example2"),
dir = "~/tsdb/daily",
columns = c("A", "B"),
return.class = "data.frame",
column.name = "%file%/%column%")
timestamp example1/A example1/B example2/A example2/B 1 2016-01-01 1 NA 11 1 2 2016-01-02 2 NA 12 2 3 2016-01-03 3 NA 13 3 4 2016-01-04 4 NA 14 4 5 2016-01-05 5 NA 15 5 6 2016-01-06 NA NA 16 6 7 2016-01-07 NA NA 17 7 8 2016-01-08 NA NA 18 8 9 2016-01-09 NA NA 19 9 10 2016-01-10 NA NA 20 10 Warning message: In read_ts_tables(c("example1", "example2"), dir = "~/tsdb/daily", : columns missing
tsdb works with time-series tables (objects of
class ts_table
). A ts_table
is a numeric matrix,
so there is always a dim
attribute. For a
time-series table x
, you get the number of
observations with dim(x)[1L]
.
Attached to this matrix are several attributes:
- timestamp
- a vector: the numeric representation of the timestamp
- t.type
- character: the class of the original
timestamp, either
Date
orPOSIXct
- columns
- a character vector that provides the columns names
There may be other attributes as well, but these three are always present.
A ts_table
is not meant as a time-series class. For
most computations (plotting, calculation of statistics,
etc), the ts_table
must first be coerced to zoo
,
xts
, a data-frame or a similar data
structure. Methods that perform such coercions are
responsible for converting the numeric timestamp vector
to an actual timestamp. For this, they may use the
function ttime
, whose pronounciation may remind you
of a hot beverage, but whose name really stands for
translate time
.
tsdb
can store and load time-series data. The format
it uses is plain CSV. A sample file may look as
follows:
"timestamp","close" 17131,11 17132,12 17133,13 17134,14 17135,15
Thus, the file has a header line that gives the
names of the columns, with the first column always
being named timestamp
.
The advantage of this plain format is that the data
are in no way dependent on tsdb
. The files can be
used and manipulated by other software as well.
Two types of timestamps are supported: Date
and
POSXIct
. As part of a ts_table
, timestamps are
always stored in their numeric representation: daily
timestamps are represented as the number of days
since 1 Jan 1970; intraday timestamps are the number
of seconds since 1 Jan 1970.