vignettes/process-results.Rmd

---
title: "Processing results"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Processing results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

After running the registration function `register()` as shown in the [registering data](https://ruthkr.github.io/greatR/articles/register-data.html) article, users can summarise and visualise the results as illustrated in the figure below.

```{r vis-reg-data, echo=FALSE, fig.align='center', out.width='100%'}
knitr::include_graphics("figures/visualisation_diagram.png")
```

## Summarising registration results

The total number of registered and non-registered genes can be obtained by calling `summary()` with the `registration_results` object as input.

```{r load-greatR, message=FALSE, include=FALSE}
# Load the package
library(greatR)
library(data.table)
```

```{r brapa-data-results, message=FALSE, warning=FALSE, include=FALSE}
# Load a data frame from the sample data
registration_results <- system.file("extdata/brapa_arabidopsis_registration.rds", package = "greatR") |>
  readRDS()
```

The function `summary()` returns a list with S3 class `summary.res_greatR` containing four different objects:

- `summary` is a data frame containing the summary of the registration results (default S3 print).
- `registered_genes` is a vector of the gene IDs that were successfully registered.
- `non_registered_genes` is a vector of non-registered gene IDs.
- `reg_params` is a data frame containing the distribution of registration parameters.

```{r get-summary-results, fig.align='center'}
# Get registration summary
reg_summary <- summary(registration_results)

reg_summary$summary |>
  knitr::kable()
```

The IDs of the registered and non-registered genes can be viewed by calling:

```{r print-accession-of-registered-genes}
reg_summary$registered_genes
```

```{r print-accession-of-non-registered-genes}
reg_summary$non_registered_genes
```
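
Because both elements are plain vectors, base R is enough to tally the outcome or to check a single gene. The following is a minimal sketch: the helper names `count_registration_outcomes()` and `is_registered()` are hypothetical and not part of greatR, and they rely only on the two list elements described above.

```r
# Hypothetical helper (not part of greatR): tally registration outcomes
# from the two gene ID vectors stored in a summary object.
count_registration_outcomes <- function(reg_summary) {
  n_registered <- length(reg_summary$registered_genes)
  n_non_registered <- length(reg_summary$non_registered_genes)

  c(
    registered = n_registered,
    non_registered = n_non_registered,
    total = n_registered + n_non_registered
  )
}

# Hypothetical helper (not part of greatR): check whether a gene ID
# appears among the registered genes.
is_registered <- function(reg_summary, gene_id) {
  gene_id %in% reg_summary$registered_genes
}
```

Calling `count_registration_outcomes(reg_summary)` on the object created above returns a named vector of counts, and `is_registered(reg_summary, gene_id)` returns `TRUE` or `FALSE` for any gene ID of interest.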

### Plot distribution of registration parameters

The function `plot()` allows users to plot the bivariate distribution of the registration parameters. Non-registered genes can be ignored by selecting `type = "registered"` instead of the default `type = "all"`. Similarly, the marginal distribution type can be changed from `type_dist = "histogram"` (default) to `type_dist = "density"`.

```r
plot(
  reg_summary,
  type = "registered"
)
```

```{r plot-summary-results, echo=FALSE, fig.align='center', fig.height=4, fig.width=4.5, warning=FALSE}
plot(
  reg_summary,
  type = "registered",
  scatterplot_size = c(4, 3.5)
)
```
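
As a variation, the sketch below keeps all genes and switches the marginal distributions to densities. It uses only the `type` and `type_dist` values named above, and reloads the sample registration results shipped with greatR so that it can be run on its own.

```r
library(greatR)

# Load the sample registration results shipped with greatR and summarise them
registration_results <- readRDS(
  system.file("extdata/brapa_arabidopsis_registration.rds", package = "greatR")
)
reg_summary <- summary(registration_results)

# Keep registered and non-registered genes, and draw the marginal
# distributions as density curves instead of histograms
plot(
  reg_summary,
  type = "all",
  type_dist = "density"
)
```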

## Plotting registration results

The function `plot()` also allows users to plot the registration results for the genes of interest. By default, only the first 25 genes are shown; use the `genes_list` argument to control which genes are plotted.

```{r plot-results, fig.align='center', fig.height=8, fig.width=7, warning=FALSE}
# Plot registration result
plot(
  registration_results,
  ncol = 2
)
```

Notice that each panel includes a label indicating whether the gene is registered or non-registered, together with the registration parameters when the registration is successful.

For more details on the other function arguments, see the documentation for `plot()`.
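
To plot a chosen subset of genes, pass their IDs through `genes_list`. The sketch below assumes that `genes_list` accepts a vector of gene IDs as they appear in the registration results; the name `genes_of_interest` and the choice of the first few registered genes are purely illustrative.

```r
library(greatR)

# Sample registration results shipped with greatR
registration_results <- readRDS(
  system.file("extdata/brapa_arabidopsis_registration.rds", package = "greatR")
)

# Illustrative subset: up to the first four registered genes
genes_of_interest <- head(summary(registration_results)$registered_genes, 4)

# Plot only the selected genes, two panels per row
plot(
  registration_results,
  genes_list = genes_of_interest,
  ncol = 2
)
```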

## Analysing similarity of expression profiles over time before and after registering

### Calculate sample distance

After registering the data, users can compare the overall similarity between datasets before and after registration using the function `calculate_distance()`. By default, all genes are included in this calculation; this can be changed with the `genes_list` argument.

```{r get-sample-distance}
sample_distance <- calculate_distance(registration_results)
```

The function `calculate_distance()` returns a list with S3 class `dist_greatR` containing two data frames:

- `result` is the distance between scaled reference and query expressions using time points after registration.
- `original` is the distance between scaled reference and query expressions using original time points before registration.
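
As a sketch of the `genes_list` option, the distance calculation can be restricted to the genes that registered successfully, and the returned list inspected directly. It assumes that `genes_list` takes a vector of gene IDs; the object name `registered_distance` is illustrative.

```r
library(greatR)

# Sample registration results shipped with greatR
registration_results <- readRDS(
  system.file("extdata/brapa_arabidopsis_registration.rds", package = "greatR")
)

# Gene IDs that were successfully registered
registered_genes <- summary(registration_results)$registered_genes

# Compute distances using only the registered genes
registered_distance <- calculate_distance(
  registration_results,
  genes_list = registered_genes
)

# Inspect the two data frames described above
names(registered_distance)
head(registered_distance$result)
head(registered_distance$original)
```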

### Plot heatmap of sample distances

Each of these data frames can be visualised with the `plot()` function by selecting either `type = "result"` (default) or `type = "original"`.

```{r plot-dist-original, fig.align='center', fig.height=4, fig.width=3, warning=FALSE}
# Plot heatmap of distances between mean expression profiles before registration
plot(
  sample_distance,
  type = "original"
)
```

```{r plot-dist-registered, fig.align='center', fig.height=4, fig.width=4, warning=FALSE}
# Plot heatmap of distances between mean expression profiles after registration
plot(
  sample_distance,
  type = "result",
  match_timepoints = TRUE
)
```

Notice that we use `match_timepoints = TRUE` to match the registered query time points to the reference time points.