-
Notifications
You must be signed in to change notification settings - Fork 2
/
manipulation.Rmd
208 lines (153 loc) · 4.68 KB
/
manipulation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
---
title: "Time series manipulation"
date: "`r Sys.Date()`"
author:
- name: Leo Lahti
email: leo.lahti@iki.fi
- name: Yagmur Simsek
email: yagmur.simsek@hsrw.org
package:
miaTime
output:
BiocStyle::html_document:
fig_height: 7
fig_width: 10
toc: yes
toc_depth: 2
number_sections: true
vignette: >
%\VignetteIndexEntry{miaTime}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
bibliography: references.bib
---
```{r eval=FALSE, include=FALSE}
library(Cairo)
knitr::opts_chunk$set(cache = FALSE,
fig.width = 9,
dpi=300,
dev = "png",
dev.args = list(type = "cairo-png"),
message = FALSE,
warning = FALSE)
```
```{r loadhide, message=FALSE, warning=FALSE, echo=FALSE}
library(mia)
library(miaTime)
library(dplyr)
library(lubridate)
library(SummarizedExperiment)
```
# Introduction
`miaTime` implements tools for time series manipulation based on the `TreeSummarizedExperiment` [@TSE] data container. Much of the functionality is also applicable to the `SummarizedExperiment` [@SE] data objects. This tutorial shows how to use `miaTime` methods as well as the broader R/Bioconductor ecosystem to manipulate time series data.
Check also the related package [TimeSeriesExperiment](https://github.com/nlhuong/TimeSeriesExperiment).
## Installation
Installing the latest development version in R.
```{r, eval=FALSE}
library(devtools)
devtools::install_github("microbiome/miaTime")
```
Loading the package:
```{r load-packages, message=FALSE, warning=FALSE}
library(miaTime)
```
## Sorting samples
The _tidySingleCellExperiment_ package provides handy functions for (Tree)SE manipulation.
```{r sort samples, message=FALSE, warning=FALSE}
library("tidySingleCellExperiment")
data(hitchip1006)
tse <- hitchip1006
tse2 <- tse %>% tidySingleCellExperiment::arrange(subject, time)
```
## Storing time information with Period class
`miaTime` utilizes the functions available in the package `lubridate`
to convert time series field to "Period" class object. This gives access to a
number of readily available [time series manipulation tools](https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html).
Load example data:
```{r}
# Load packages
library(miaTime)
library(lubridate)
library(SummarizedExperiment)
# Load demo data
data(hitchip1006)
tse <- hitchip1006
# Time is given in days in the demo data.
# Convert days to seconds
time_in_seconds <- 60*60*24*colData(tse)[,"time"]
# Convert the time data to period class
Seconds <- as.period(time_in_seconds, unit="sec")
# Check the output
Seconds[1140:1151]
```
## Conversion between time units
The time field in days is now shown in seconds. It can then be
converted to many different units using the lubridate package.
```{r}
Hours <- as.period(Seconds, unit = "hour")
Hours[1140:1151]
```
The updated time information can then be added to the
`SummarizedExperiment` data object as a new `colData` (sample data)
field.
```{r}
colData(tse)$timeSec <- Seconds
colData(tse)
```
## Calculating time differences
The \link[lubridate]{as.duration} function helps to specify time points as duration.
```{r}
Duration <- as.duration(Seconds)
Duration[1140:1151]
```
The difference between subsequent time points can then be calculated.
```{r}
Timediff <- diff(Duration)
Timediff <- c(NA, Timediff)
Timediff[1140:1151]
```
The time difference from a selected point to the other time points
can be calculated as follows.
```{r}
base <- Hours - Hours[1] #distance from starting point
base[1140:1151]
base_1140 <- Seconds - Seconds[1140]
base_1140[1140:1151]
```
## Time point rank
Rank of the time points can be calculated by `rank` function provided in base R.
```{r}
colData(tse)$rank <- rank(colData(tse)$time)
colData(tse)
```
## Operations per unit
Sometimes we need to operate on time series per unit (subject,
reaction chamber, sampling location, ...).
Add time point rank per subject.
```{r}
library(dplyr)
colData(tse) <- colData(tse) %>%
as.data.frame() %>%
group_by(subject) %>%
mutate(rank = rank(time, ties.method="average")) %>%
DataFrame()
```
## Subset to baseline samples
```{r}
library(tidySummarizedExperiment)
# Pick samples with time point 0
tse <- hitchip1006 |> filter(time == 0)
# Or: tse <- tse[, tse$time==0]
# Sample with the smallest time point within each subject
colData(tse) <- colData(tse) %>%
as.data.frame() %>%
group_by(subject) %>%
mutate(rank = rank(time, ties.method="average")) %>%
DataFrame
# Pick the subset including first time point per subject
tse1 <- tse[, tse$rank == 1]
```
# Session info
```{r}
sessionInfo()
```