Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
175 lines (134 sloc) 5.38 KB
---
title: "Manual"
author: "N. Frerebeau"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
number_sections: yes
fig_caption: yes
toc: true
header-includes:
- \usepackage{amsmath}
- \usepackage{amssymb}
vignette: >
%\VignetteIndexEntry{Manual}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r setup, include = FALSE, echo=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(nomnoml)
```
# Definitions
The **arkhe** package provides a set of S4 classes for archaeological data matrices that extend the basic `matrix` data type. These new classes represent different special types of matrix.
* Numeric matrix:
* `CountMatrix` represents absolute frequency data,
* `AbundanceMatrix` represents relative frequency data,
* `OccurrenceMatrix` represents a co-occurrence matrix,
* `SimilarityMatrix` represents a (dis)similarity matrix,
* Logical matrix:
* `IncidenceMatrix` represents presence/absence data,
* `StratigraphicMatrix` represents stratigraphic relationships.
*It assumes that you keep your data tidy*: each variable (taxon/type) must be saved in its own column and each observation (assemblage/sample) must be saved in its own row. Note that missing values are not allowed.
The internal structure of S4 classes implemented in **arkhe** is depicted in the UML class diagram in the following figure.
```{nomnoml uml, echo=FALSE, fig.cap="UML class diagram of the S4 classes structure.", fig.width=7, fig.height=7}
[base::matrix||]
[Matrix|-id: character;+dates: list;+coordinates: list|]
[NumericMatrix||]
[CountMatrix||]
[AbundanceMatrix|+totals: numeric|]
[SimilarityMatrix|+method: character|]
[OccurrenceMatrix||]
[LogicalMatrix||]
[IncidenceMatrix||]
[StratigraphicMatrix||]
[base::matrix] <:- [Matrix]
[Matrix] <:- [NumericMatrix]
[NumericMatrix] <:- [CountMatrix]
[NumericMatrix] <:- [AbundanceMatrix]
[NumericMatrix] <:- [SimilarityMatrix]
[NumericMatrix] <:- [OccurrenceMatrix]
[Matrix] <:- [LogicalMatrix]
[LogicalMatrix] <:- [IncidenceMatrix]
[LogicalMatrix] <:- [StratigraphicMatrix]
```
## Numeric matrix
### Absolute frequency matrix (`CountMatrix`)
We denote the $m \times p$ count matrix by $A = \left[ a_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$ with row and column sums:
\begin{align}
a_{i \cdot} = \sum_{j = 1}^{p} a_{ij} &&
a_{\cdot j} = \sum_{i = 1}^{m} a_{ij} &&
a_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} a_{ij} &&
\forall a_{ij} \in \mathbb{N}
\end{align}
### Relative frequency matrix (`AbundanceMatrix`)
A frequency matrix represents relative abundances.
We denote the $m \times p$ frequency matrix by $B = \left[ b_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$ with row and column sums:
\begin{align}
b_{i \cdot} = \sum_{j = 1}^{p} b_{ij} = 1 &&
b_{\cdot j} = \sum_{i = 1}^{m} b_{ij} &&
b_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} b_{ij} &&
\forall b_{ij} \in \left[ 0,1 \right]
\end{align}
### Co-occurrence matrix (`OccurrenceMatrix`)
A co-occurrence matrix is a symmetric matrix with zeros on its main diagonal, which works out how many times (expressed in percent) each pairs of taxa occur together in at least one sample.
The $p \times p$ co-occurrence matrix $D = \left[ d_{i,j} \right] ~\forall i,j \in \left[ 1,p \right]$ is defined over an $m \times p$ abundance matrix $A = \left[ a_{x,y} \right] ~\forall x \in \left[ 1,m \right], y \in \left[ 1,p \right]$ as:
$$ d_{i,j} = \sum_{x = 1}^{m} \bigcap_{y = i}^{j} a_{xy} $$
with row and column sums:
\begin{align}
d_{i \cdot} = \sum_{j \geqslant i}^{p} d_{ij} &&
d_{\cdot j} = \sum_{i \leqslant j}^{p} d_{ij} &&
d_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} d_{ij} &&
\forall d_{ij} \in \mathbb{N}
\end{align}
## Logical matrix
### Incidence matrix (`IncidenceMatrix`)
We denote the $m \times p$ incidence matrix by $C = \left[ c_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$ with row and column sums:
\begin{align}
c_{i \cdot} = \sum_{j = 1}^{p} c_{ij} &&
c_{\cdot j} = \sum_{i = 1}^{m} c_{ij} &&
c_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} c_{ij} &&
\forall c_{ij} \in \lbrace 0,1 \rbrace
\end{align}
# Usage
```{r packages}
# Load packages
library(arkhe)
```
## Create
These new classes are of simple use, on the same way as the base `matrix`:
```{r create}
set.seed(12345)
## Create a count data matrix
CountMatrix(data = sample(0:10, 100, TRUE),
nrow = 10, ncol = 10)
## Create an incidence (presence/absence) matrix
## Numeric values are coerced to logical as by as.logical
IncidenceMatrix(data = sample(0:1, 100, TRUE),
nrow = 10, ncol = 10)
```
Note that an `AbundanceMatrix` can only be created by coercion (see below).
## Coerce
**arkhe** uses coercing mechanisms (with validation methods) for data type conversions:
```{r coerce}
## Create a count matrix
A0 <- matrix(data = sample(0:10, 100, TRUE), nrow = 10, ncol = 10)
## Coerce to absolute frequencies
A1 <- as_count(A0)
## Coerce to relative frequencies
B <- as_abundance(A1)
## Row sums are internally stored before coercing to a frequency matrix
## (use get_totals() to get these values)
## This allows to restore the source data
A2 <- as_count(B)
all(A1 == A2)
## Coerce to presence/absence
C <- as_incidence(A1)
## Coerce to a co-occurrence matrix
D <- as_occurrence(A1)
```
You can’t perform that action at this time.