# KEGG: Get all HSA-associated reactions
### George L. Malone

### Contents
1. Overview
2. Function definitions
3. Typical operations

## 1. Overview

##### Task
The task was to collect all reactions associated with *H. sapiens* (KEGG organism code `hsa`). The required method was to imitate the route used by the MATLAB package [MetaboNetworks](https://au.mathworks.com/matlabcentral/fileexchange/42684-metabonetworks), implemented in *R* using the [`KEGGREST` package](https://bioconductor.org/packages/release/bioc/html/KEGGREST.html):
- Given an organism ID:
  1. Extract all enzymes.
  2. Extract all reactions.

##### Progress
In their current state, the *R* functions can:
- extract all enzymes given an organism ID
- extract all associated reactions
- return a sorted list of unique reactions

##### Notes and comments
- `roxygen2`-style documentation is included for each function definition.
- `split_to_n` has been slightly updated from its original state for inclusion in this document.
- Most of the computation time appears to be spent in `get_enzymedetails_from_enzymelist`, because it makes a large number of requests to KEGG.
- The output of this set of functions could be passed to previously written functions, so that all compounds can be extracted from the list of reactions.
  - In principle, these functions provide a more comprehensive way to collect data on all reactions relating to *H. sapiens* metabolism.
- The operations have not been tested on the entire enzyme list, only on a non-random sample of 40.
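
Because most of the run time is spent on requests to KEGG, one option is to cache the fetched enzyme details on disk so that repeated runs skip the network. The following is only a sketch under that assumption and is not part of the original function set: `with_cache`, `cache_path` and `fetch_fn` are hypothetical names, and `slow_fetch` is a stand-in for the real KEGG call.

```r
#' Return a cached result if present, otherwise compute and store it.
#' (Hypothetical helper, not part of the original function set.)
with_cache <- function(cache_path, fetch_fn) {
    if (file.exists(cache_path)) {
        return(readRDS(cache_path))
    }
    result <- fetch_fn()
    saveRDS(result, cache_path)
    return(result)
}

## Stand-in for the slow KEGG call; counts how often it is invoked.
n_calls <- 0
slow_fetch <- function() {
    n_calls <<- n_calls + 1
    list(a = 1, b = 2)
}

cache_path <- tempfile(fileext = ".rds")
first <- with_cache(cache_path, slow_fetch)   # Computes and writes the cache.
second <- with_cache(cache_path, slow_fetch)  # Reads from the cache instead.
```

In practice `fetch_fn` could be something like `function() get_enzymedetails_from_enzymelist(enzymelist)`, with `cache_path` pointing to a file in the working directory.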

## 2. Function definitions

The definitions of the functions used in Section 3 are provided below.

In order of appearance:
- `split_to_n`
- `get_enzymedetails_from_enzymelist`
- `get_allreac_from_enzymedetails`

`split_to_n`: Splits an object of arbitrary length into groups of at most *n* elements (the last group may be smaller). Required by `get_enzymedetails_from_enzymelist`, because `KEGGREST::keggGet` accepts at most 10 identifiers per call; batching reduces the number of KEGG REST API requests from `length(enzymelist)` to `ceiling(length(enzymelist) / 10)`.

In [None]:
#' Given a vector, split this into a list of entries of group size n.
#' @param obj Vector to split.
#' @param n Size to split vector by.
#' @return List containing elements of `obj`, in groups of `n`.
split_to_n <- function(obj, n) {
    ## Initialise list index and marker;
    k <- 1
    marker <- 0
    ## Prepare the empty list;
    split <- vector(mode = 'list', length = ceiling(length(obj) / n))
    ## Loop over the object;
    for (i in seq_along(obj)) {
        ## Add to the list;
        split[[k]] <- append(split[[k]], obj[i])
        ## Iterate marker;
        marker <- marker + 1
        ## If marker at n, iterate k and reset marker;
        if (marker == n) {
            k <- k + 1
            marker <- 0
        }
    }
    ## Return the list;
    return(split)
}
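
For comparison, the same grouping can be sketched with base *R*'s `split`, which avoids the explicit loop. This is an alternative under the assumption that only the grouping matters; `split_to_n_vec` is a hypothetical name and the IDs are placeholders, not real KEGG entries.

```r
#' Vectorised sketch with the same intent as split_to_n.
#' (Hypothetical name; not part of the original function set.)
split_to_n_vec <- function(obj, n) {
    ## Group index for each element: 1, 1, ..., 2, 2, ...
    group <- ceiling(seq_along(obj) / n)
    ## unname() drops the "1", "2", ... labels that split() adds.
    unname(split(obj, group))
}

ids <- sprintf("id%02d", 1:23)  # Placeholder IDs, not real KEGG entries.
groups <- split_to_n_vec(ids, 10L)
length(groups)   # 3, i.e. ceiling(23 / 10)
lengths(groups)  # 10 10 3
```

With 23 IDs and groups of 10, three requests would be needed instead of 23.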

`get_enzymedetails_from_enzymelist`: Extracts the details of enzymes, given a vector of their IDs. The enzyme ID list can be obtained as shown in Section 3.

In [None]:
#' Given a list of KEGG Enzyme IDs, extract the enzyme details.
#' @param enzymelist Character vector of enzyme IDs.
#' @return List containing all unique enzyme data from the enzyme ID list.
get_enzymedetails_from_enzymelist <- function(enzymelist) {
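    ## Requires split_to_n() to be defined already (see above). Sourcing the
    ## file here would overwrite the version defined in this document.
    stopifnot(exists("split_to_n", mode = "function"))
    ## keggGet accepts at most 10 IDs per request, hence groups of 10.
    enzymelist_spl <- split_to_n(sort(unique(enzymelist)), 10L)
    return(unlist(
        lapply(enzymelist_spl, function(x) { KEGGREST::keggGet(x) }),
        recursive = FALSE
    ))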
}
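
Since the full enzyme list involves many requests, a single failed request would otherwise abort the whole run. The sketch below shows one way to record failed batches and carry on. It is an assumption about how this could be handled, not the method used above: `fetch_batches_safely` and `mock_fetch` are hypothetical, and in practice `fetch_fn` would be `KEGGREST::keggGet`.

```r
#' Apply a fetch function to each batch, recording failures instead of stopping.
#' (Hypothetical helper; not part of the original function set.)
fetch_batches_safely <- function(batches, fetch_fn) {
    results <- vector(mode = "list", length = length(batches))
    failed <- integer(0)
    for (i in seq_along(batches)) {
        results[[i]] <- tryCatch(
            fetch_fn(batches[[i]]),
            error = function(e) {
                failed <<- c(failed, i)  # Remember which batch failed.
                list()  # Empty placeholder so later batches still run.
            }
        )
    }
    list(details = unlist(results, recursive = FALSE), failed = failed)
}

## Stand-in fetcher: fails on any batch containing "bad" (illustration only).
mock_fetch <- function(ids) {
    if ("bad" %in% ids) { stop("simulated request failure") }
    lapply(ids, function(id) list(ENTRY = id))
}

batches <- list(c("e1", "e2"), c("bad", "e3"), c("e4"))
out <- fetch_batches_safely(batches, mock_fetch)
length(out$details)  # 3 entries from the two successful batches
out$failed           # 2
```

The indices in `out$failed` could then be retried separately without repeating the successful requests.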

`get_allreac_from_enzymedetails`: Extracts all reactions from a list of enzyme details. The list of enzyme details should come from `get_enzymedetails_from_enzymelist`.

In [None]:
#' Given a list of enzymes, extract all reactions.
#' @param enzymedetails List of enzyme details. Each element is an enzyme.
#' @return Sorted, unique character vector of KEGG reaction IDs found.
get_allreac_from_enzymedetails <- function(enzymedetails) {
    allreac_raw <- unlist(lapply(enzymedetails, function(x) { x$ALL_REAC }))
    ## No enzyme carried an ALL_REAC field; nothing to extract.
    if (is.null(allreac_raw)) { return(character(0)) }
    allreac_spl <- unlist(strsplit(allreac_raw, " "))
    ## gsub is vectorised: strip everything except 'R' and digits.
    clean <- gsub("[^R0-9]", '', allreac_spl)
    return(sort(unique(clean[clean != ''])))
}
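
The cleaning steps can be checked offline on mock data. The records below are invented for illustration and are not real KEGG entries; the strings only mimic the general shape of the field (space-separated IDs with occasional punctuation), and the exact layout of real records is an assumption.

```r
## Mock enzyme records: invented for illustration, not real KEGG entries.
mock_details <- list(
    list(ENTRY = "mock1",
         ALL_REAC = c("R00001 R00002 > R00003;", "(other) R00004")),
    list(ENTRY = "mock2", ALL_REAC = "R00002(G)"),
    list(ENTRY = "mock3")  # No ALL_REAC field at all.
)

## Same steps as get_allreac_from_enzymedetails, written inline.
raw <- unlist(lapply(mock_details, function(x) { x$ALL_REAC }))
tokens <- unlist(strsplit(raw, " "))
clean <- gsub("[^R0-9]", "", tokens)  # Keep only 'R' and digits.
allreac <- sort(unique(clean[clean != ""]))
allreac  # "R00001" "R00002" "R00003" "R00004"
```

Tokens such as `>` and `(other)` reduce to empty strings and are dropped, and the duplicate `R00002` appears only once in the result.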

## 3. Typical operations

The following code block gives an example of the intended usage of the functions defined in Section 2.

In [None]:
## Set working directory and check for package requirements.
setwd("~/")  # Change to preferred directory.
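## The calls below use the KEGGREST:: prefix, so the package need not be attached.
if (!requireNamespace("KEGGREST", quietly = TRUE)) { stop("KEGGREST is required.") }
## Load split_to_n from file only if it is not already defined (see Section 2).
if (!exists("split_to_n", mode = "function")) { source("./split_to_n.R") }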

## Begin ops.

## Get the enzyme list.
enzymelist <- sort(unique(KEGGREST::keggLink("enzyme", "hsa")))

## Get the enzyme details.
enzymedetails <- get_enzymedetails_from_enzymelist(enzymelist)

## Extract the reactions.
allreac <- get_allreac_from_enzymedetails(enzymedetails)

## End ops.