/
process-results.Rmd
135 lines (101 loc) · 4.81 KB
/
process-results.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
---
title: "Processing results"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Processing results}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
After running the registration function `register()` as shown in the [registering data](https://ruthkr.github.io/greatR/articles/register-data.html) article, users can summarise and visualise the results as illustrated in the figure below.
```{r vis-reg-data, echo=FALSE, fig.align='center', out.width='100%'}
knitr::include_graphics("figures/visualisation_diagram.png")
```
## Summarising registration results
The total number of registered and non-registered genes can be obtained by running the function `summary()` with `registration_results` object as an input.
```{r load-greatR, message=FALSE, include=FALSE}
# Load the package
library(greatR)
library(data.table)
```
```{r brapa-data-results, message=FALSE, warning=FALSE, include=FALSE}
# Load a data frame from the sample data
registration_results <- system.file("extdata/brapa_arabidopsis_registration.rds", package = "greatR") |>
readRDS()
```
The function `summary()` returns a list with S3 class `summary.res_greatR` containing four different objects:
- `summary` is a data frame containing the summary of the registration results (default S3 print).
- `registered_genes` is a vector of gene IDs which are successfully registered.
- `non_registered_genes` is a vector of non-registered gene IDs.
- `reg_params` is a data frame containing the distribution of registration parameters.
```{r get-summary-results, fig.align='center'}
# Get registration summary
reg_summary <- summary(registration_results)
reg_summary$summary |>
knitr::kable()
```
The list of gene IDs which are registered or non-registered can be viewed by calling:
```{r print-accession-of-registered-genes}
reg_summary$registered_genes
```
```{r print-accession-of-non-registered-genes}
reg_summary$non_registered_genes
```
### Plot distribution of registration parameters
The function `plot()` allows users to plot the bivariate distribution of the registration parameters. Non-registered genes can be ignored by selecting `type = "registered"` instead of the default `type = "all"`. Similarly, the marginal distribution type can be changed from `type_dist = "histogram"` (default) to `type_dist = "density"`.
```r
plot(
reg_summary,
type = "registered"
)
```
```{r plot-summary-results, echo=FALSE, fig.align='center', fig.height=4, fig.width=4.5, warning=FALSE}
plot(
reg_summary,
type = "registered",
scatterplot_size = c(4, 3.5)
)
```
## Plotting registration results
The function `plot()` allows users to plot the registration results of the genes of interest (by default only up to the first 25 genes are shown, for more control over this, use the `genes_list` argument).
```{r plot-results, fig.align='center', fig.height=8, fig.width=7, warning=FALSE}
# Plot registration result
plot(
registration_results,
ncol = 2
)
```
Notice that the plot includes a label indicating if the particular genes are registered or non-registered, as well as the registration parameters in case the registration is successful.
For more details on the other function arguments, go to `plot()`.
## Analysing similarity of expression profiles over time before and after registering
### Calculate sample distance
After registering the data, users can compare the overall similarity between datasets before and after registering using the function `calculate_distance()`. By default all genes are considered in this calculation, this can be changed by using the `genes_list` argument.
```{r get-sample-distance}
sample_distance <- calculate_distance(registration_results)
```
The function `calculate_distance()` returns a list with S3 class `dist_greatR` of two data frames:
- `result` is the distance between scaled reference and query expressions using time points after registration.
- `original` is the distance between scaled reference and query expressions using original time points before registration.
### Plot heatmap of sample distances
Each of these data frames above can be visualised using the `plot()` function, by selecting either `type = "result"` (default) or `type = "original"`.
```{r plot-dist-original, fig.align='center', fig.height=4, fig.width=3, warning=FALSE}
# Plot heatmap of mean expression profiles distance before registration process
plot(
sample_distance,
type = "original"
)
```
```{r plot-dist-registered, fig.align='center', fig.height=4, fig.width=4, warning=FALSE}
# Plot heatmap of mean expression profiles distance after registration process
plot(
sample_distance,
type = "result",
match_timepoints = TRUE
)
```
Notice that we use `match_timepoints = TRUE` to match the registered query time points to the reference time points.