vignettes/calcSD.Rmd

---
title: "Calculate sexual dimorphism of faces"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: facefuns.bib
csl: apa-numeric-superscript-brackets.csl
link-citations: yes
vignette: >
  %\VignetteIndexEntry{Calculate sexual dimorphism of faces}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

**WORK IN PROGRESS**

This tutorial will explain how to calculate facial sexual dimorphism (SD) in a set of 2D faces, using a [vector method](#vs) (e.g., [@holzleitner2014; @komori2011; @valenzano2006]) and [linear discriminant analysis](#ds) (e.g., [@scott2010]).

```{r setup}
library(facefuns)
library(ggplot2)
```

## Read and prep landmark data

```{r}
path_to_tem <- system.file("extdata", "tem", package="facefuns")

shapedata <- facefuns(data = read_lmdata(lmdata = path_to_tem,
                                         plot = FALSE),
                      remove_points = "frlgmm",
                      pc_criterion = "broken_stick",
                      plot_sample = FALSE,
                      quiet = TRUE)
```

In addition, we'll also need to know which of the faces in our sample are female, and which are male.

```{r}
data(LondonSet_info)
head(LondonSet_info)
```

## 1. Vector scores: `calc_(shape)vs`{#vs}

`facefuns::calc_vs` calculates sexual dimorphism by computing an n-dimensional vector between the average female and the average male shape, and then projecting each face onto this vector. Scores are scaled so that a score of 0 corresponds to the average female, and a score of 1 to the average male shape; a negative score, e.g., would thus indicate a hyperfeminine face.

You can use `calc_vs` to calculate vector scores from a data frame or matrix of principal component scores (see `help(calc_vs)`), or `calc_shapevs` to calculate them from the Procrustes-aligned templates in the facefuns output.

`calc_shapevs` takes three arguments, all of which should be a data frame or matrix with specimen as rows, and PC scores as columns:

* `facefuns_obj`: Output from `facefuns::facefuns`
* `anchor1_index`: Vector specifying indices of faces that will constitute lower anchor point     
* `anchor2_index`: Vector specifying indices of faces that will constitute upper anchor point    

We get the index of which faces in our array of Procrustes-aligned faces are female/male by testing which IDs have a corresponding entry in the info table for which sex == female/male

```{r}
fem_i <- gsub("^ID=","", dimnames(shapedata$aligned)[[3]]) %in%
  LondonSet_info$face_id[which(LondonSet_info$face_sex == "female")]

mal_i <- gsub("^ID=", "", dimnames(shapedata$aligned)[[3]]) %in%
  LondonSet_info$face_id[which(LondonSet_info$face_sex == "male")]
```

Now, we can run our function:
```{r}
sd_vector <- calc_shapevs(shapedata, fem_i, mal_i)
```

`calc_(shape)vs` will return a tibble that has two columns, `id` and `VS`:

```{r}
head(sd_vector)
```

```{r}
LondonSet_info %>%
  dplyr::left_join(sd_vector, by = c("face_id" = "id")) %>%
  ggplot(aes(x = VS, color=face_sex, fill = face_sex)) +
  geom_histogram(binwidth = .1) +
  theme_bw()
```

### Note: calculating vector scores within- and out-of-set

If your current sample is small, but you have a larger sample at hand that was delineated with the same landmarks or a compatible sub-/superset, you could use that bigger sample to build your shape PCA model and construct the sexual dimorphism vector, and then project faces from the smaller sample onto that vector. You can find an example [here](https://osf.io/s7c2z/) (to be updated and added to this vignette in the future).

### Note: calculating sexual dimorphism from symmetrized faces

Two-dimensional images are strongly affected by head posture, which is reflected in the fact that in 2D sets, the first principal components usually show shape associated with heads being tilted up/down, or heads being turned left/right. While it is not possible to "correct" for up/down head postures (plus shape differences associated with an upward-/downward-tilt overlap with those of sexual dimorphism), slight asymmetries due to a left/right tilt *could* be "remedied" by symmetrizing faces.

You could either symmetrize templates before submitting them to facefuns (using `facefuns::symm_templates`) which has the added benefit that it's very little fuss to plot the PCs on which the scores are based; or you could do so by using the option `symm = TRUE` in `calc_shapevs`.

Symmetrizing works by mirroring each landmark template, and calculating the mean between the original and mirrored template. After mirroring, the landmarks need to be relabeled - what used to be point 0 (right pupil) in the original template, is point 1 (the left pupil) after mirroring.

Unfortunately, this information needs to be manually entered, and will very much depend on the landmark template you want to mirror. For the set of 132 landmarks used throughout the examples in `facefuns`, this info is stored in `mirroredlandmarks`.

If `symm = TRUE`, `calc_shapevs` will first symmetrize the faces, re-conduct a GPA and PCA on these symmetrized templates, and then run `calc_vs`.

```{r}
data("mirroredlandmarks")

sd_vector_symm <- calc_shapevs(shapedata, fem_i, mal_i, symm = TRUE, mirroredlandmarks = mirroredlandmarks)
head(sd_vector_symm)

LondonSet_info %>%
  dplyr::left_join(sd_vector_symm, by = c("face_id" = "id")) %>%
  ggplot(aes(x = VS, color=face_sex, fill = face_sex)) +
  geom_histogram(binwidth = .1) +
  theme_bw()
```

## 2. Discriminant scores: `calc_ds`{#ds}

Another way to calculate sexual dimorphism is by employing a linear discriminant analyses to predict group membership.

`calc_ds` first conducts a (forward) stepwise variable selection and then carries out a linear discriminant analysis. It takes two arguments:

* `data`: A data frame or matrix with specimen as rows, and PCs as columns. Can also contain an additional column that defines group membership.
* `group_info`: Either a data frame with two columns (col 1 = id matching the rownames in data; col 2 = group), or a single numeric value indexing the column in data that contains group membership info

For this example, we'll specify group membership with a separate data frame.

```{r}
group_info <- LondonSet_info %>%
  dplyr::select(face_id, face_sex)

sd_discrim <- calc_ds(shapedata$pc_scores, group_info = group_info)
```

Like `calc_vs`, `calc_ds` will return a tibble that has two columns, in this case `id` and `DS`:

```{r eval=FALSE}
head(sd_discrim)
```

## Compare vector and discriminant scores

```{r message=FALSE}
compareSD <- LondonSet_info %>%
  dplyr::left_join(sd_vector, by = c("face_id" = "id")) %>%
  dplyr::left_join(sd_discrim, by = c("face_id" = "id"))

compareSD %>%
  dplyr::group_by(face_sex) %>%
  dplyr::summarise(mean_vs = round(mean(VS), 3),
                   mean_ds = round(mean(DS), 3))

compareSD %>%
  ggplot(aes(x = VS, y = DS)) +
  geom_point(aes(color = face_sex)) +
  geom_smooth(method='lm', formula = y ~ x, color = "purple") +
  theme_bw()
```

### Visualize differences

Plot difference between lowest-scoring (top) and highest-scoring faces (bottom) on VS (left) and DS (right)

```{r, fig.width=7, fig.height=7, fig.show='hold', fig.align='center', echo=FALSE}

.pardefault <- par(no.readonly = T)
par(mar = c(0, 0, 0, 0))

lo_vs <- compareSD %>%
  dplyr::slice_min(VS, n=5) %>%
  dplyr::pull(face_id)

lo_ds <- compareSD %>%
  dplyr::slice_min(DS, n=5) %>%
  dplyr::pull(face_id)

hi_vs <- compareSD %>%
  dplyr::slice_max(VS, n=5) %>%
  dplyr::pull(face_id)

hi_ds <- compareSD %>%
  dplyr::slice_max(DS, n=5) %>%
  dplyr::pull(face_id)

plotl <- list(geomorph::mshape(shapedata$aligned[,,lo_vs]),
              geomorph::mshape(shapedata$aligned[,,lo_ds]),
              geomorph::mshape(shapedata$aligned[,,hi_vs]),
              geomorph::mshape(shapedata$aligned[,,hi_ds]))

ref <- shapedata$average

plot_list <- rapply(plotl, function(x) {
    function() { geomorph::plotRefToTarget(ref, x, mag=1.5) }
  }, how = "list")

do.call(cowplot::plot_grid, c(plot_list, list(ncol = 2), scale = 1.5))

par(.pardefault)
```


## References