/
register-data-manually.Rmd
112 lines (91 loc) · 4.52 KB
/
register-data-manually.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
---
title: "Registering data without optimisation"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Registering data without optimisation}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
This article will show users how to register data using some specified shift and stretch parameters. This demo will use one of the genes from the sample data provided by the package.
## Load sample data
`greatR` provides an example of data frame containing two different species *A. thaliana* and *B. rapa* with two and three different replicates, respectively. This data frame can be read as follows:
```{r load-greatR, message=FALSE}
# Load the package
library(greatR)
library(data.table)
```
```{r brapa-data, message=FALSE, warning=FALSE}
# Load a data frame from the sample data
b_rapa_data <- system.file("extdata/brapa_arabidopsis_data.csv", package = "greatR") |>
data.table::fread()
```
## Registering without optimisation
The illustrative table below shows the major differences between runing `register()` with and without optimisation.
```{r reg-data-manual, echo=FALSE, fig.align='center', out.width='65%'}
knitr::include_graphics("figures/registration_params_table.png")
```
Here, we will only use a single gene with `gene_id = "BRAA03G023790.3C"` from the sample data, but this feature can also be used when registering multiple genes.
```{r get-subset-data, message=FALSE, warning=FALSE}
gene_BRAA03G023790.3C_data <- b_rapa_data[gene_id == "BRAA03G023790.3C"]
```
Before registering, we can use the helper function `get_approximate_stretch()` to approximate the stretch factor between our sample datasets.
```{r get-approximate, message=FALSE, warning=FALSE}
get_approximate_stretch(
gene_BRAA03G023790.3C_data,
reference = "Ro18",
query = "Col0"
)
```
We can now use the estimated stretch calculated above in the registration process below. Users need to set `use_optimisation = FALSE` to disable the automated optimisation process.
```{r register-subset-data, message=FALSE, warning=FALSE}
registration_results <- register(
gene_BRAA03G023790.3C_data,
reference = "Ro18",
query = "Col0",
scaling_method = "z-score",
stretches = 2.25,
shifts = -4.36,
use_optimisation = FALSE
)
#> ── Validating input data ───────────────────────────────────────────────────────
#> ℹ Will process 1 gene.
#> ℹ Using estimated standard deviation, as no `exp_sd` was provided.
#> ℹ Using `scaling_method` = "z-score".
#>
#> ── Starting manual registration ────────────────────────────────────────────────
#> ℹ Using `overlapping_percent` = 50% as a registration criterion.
#> ✔ Applying registration for genes (1/1) [38ms]
```
To check whether the gene is registered or not, we can get the summary results by accessing the `model_comparison` table from the registration result.
```{r get-model-summary-data, warning=FALSE}
registration_results$model_comparison |>
knitr::kable()
```
As we can see, using the given stretch and shift parameter above, the *B. rapa* gene *BRAA03G023790.3C* can be registered.
## Registering multiple gene with different pre-defined registration parameters
Users can also specify a list of parameters rather than a single value. Similar to the registration process above, users need to set `use_optimisation = FALSE` to disable the automated optimisation process.
```{r register-multiple-gene-manual, message=FALSE, warning=FALSE, eval=FALSE}
registration_results <- register(
gene_BRAA03G023790.3C_data,
reference = "Ro18",
query = "Col0",
scaling_method = "z-score",
stretches = seq(1, 3, 0.1),
shifts = seq(0, 4, 0.1),
use_optimisation = FALSE
)
#> ── Validating input data ───────────────────────────────────────────────────────
#> ℹ Will process 1 gene.
#> ℹ Using estimated standard deviation, as no `exp_sd` was provided.
#> ℹ Using `scaling_method` = "z-score".
#>
#> ── Starting manual registration ────────────────────────────────────────────────
#> ℹ Using `overlapping_percent` = 50% as a registration criterion.
#> ✔ Applying registration for genes (1/1) [1.3s]
```