vignettes/dilp.Rmd

---
title: "Leaf physiognomic walkthrough"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Leaf physiognomic walkthrough}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(dilp)
```

This vignette is a quick walkthrough of how to get results from a raw leaf physiognomic dataset.

#### Background

This package contains functions that enable the quick analysis of a quantitative leaf physiognomic dataset.

We provide a function for Digital Leaf Physiognomy (`dilp()`), which estimates paleoclimate via multiple linear regressions calibrated with a modern dataset.

We also provide functions for Leaf Margin Analysis (`temp_slr()`) and Leaf Area Analysis (`precip_slr()`), simple linear regressions that estimate mean annual temperature (MAT) and mean annual precipitation (MAP) respectively.

Lastly, we provide a function for reconstructing fossil leaf mass per area (`lma()`), a functional trait that reflects leaf resource economy.

In this vignette, we'll walk you through the standard workflow for a complete leaf physiognomic dataset, using the included `McAbeeExample` dataset.

For ease of use, a template spreadsheet for data collection can be found here: [DiLP Data Collection Template](https://drive.google.com/file/d/1UYAd0u2fIn2QCLF6aKKzTj6KPaAPv0d1/view?usp=sharing){.uri}.

If you encounter any problems or would like to request a feature, please create an issue on the [GitHub page](https://github.com/mjbutrim/dilp/issues).

```{r The simplest workflow}
# If the dataset is in good shape, this is all you need to do

dilp_results <- dilp(McAbeeExample)
lma_results <- lma(McAbeeExample)

# This just grabs the key data points from the results
data.frame(
  Site = c("McAbee H1", "McAbee H2"),
  MAT_MLR = dilp_results$results$MAT.MLR,
  MAT_SLR = dilp_results$results$MAT.SLR,
  MAP_MLR = dilp_results$results$MAP.MLR,
  MAP_SLR = dilp_results$results$MAP.SLR,
  site_mean_LMA = lma_results$lowe_site_mean_lma$value
)
```

And that's basically it! Climate estimates and associated information can be found in the output generated by `dilp()`, and leaf mass per area reconstructions can be found in the output generated by `lma()`.

Read on for a breakdown of key DiLP and LMA components and helper functions.

#### DiLP Paleoclimate Estimates in Depth

In more depth: if the dataset is correctly formatted, all you need to do is pass it to the `dilp()` function, which takes the following steps to produce paleoclimate estimates.

First, the data is processed using the `dilp_processing()` function, which cleans up the raw dataset and generates derived physiognomic characters based on the raw physiognomic data.

Next, possible errors and outlier measurements are identified using the `dilp_errors()` and `dilp_outliers()` functions.

Finally, Mean Annual Temperature (MAT) and Mean Annual Precipitation (MAP) are estimated using both multiple and simple linear regressions. The default parameters are from the global regressions provided by Peppe et al. (2011).

All this information will be contained within the returned list.

```{r DiLP result elements}
# Elements of DiLP results:
print(paste0("dilp_results$", names(dilp_results)))
```

After generating `dilp()` results, make sure to check whether any common errors were discovered within the dataset.

There are no errors in the `McAbeeExample` dataset, but if there were, this table would identify the specimens in the original dataset that triggered the errors.

```{r Check errors}
dilp_results$errors
```

Similarly, check whether there are any outlier data points. These aren't necessarily errors, but it may be worth double-checking the original measurements.

In the `McAbeeExample` dataset, three specimens are identified as outliers in the tooth count to internal perimeter ratio, three specimens are outliers in leaf area, and four specimens are outliers in perimeter ratio. In this case, each of these was re-examined and found to be an acceptable outlier.

```{r Check outliers}
dilp_results$outliers
```
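
For intuition about what "outlier" can mean here, the sketch below applies the common boxplot rule (values more than 1.5 times the interquartile range beyond the hinges) to a made-up vector of leaf areas. This is an illustration only: the rule is an assumption and may not be the one `dilp_outliers()` applies, and the measurements are hypothetical rather than taken from `McAbeeExample`.

```r
# Hypothetical leaf areas for one site; not taken from McAbeeExample
leaf_area <- c(2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 9.5)

# boxplot.stats() flags values more than 1.5 * IQR beyond the hinges
flagged <- boxplot.stats(leaf_area)$out

# Positions of the flagged measurements, so they can be re-examined
flagged_rows <- which(leaf_area %in% flagged)

flagged       # 9.5
flagged_rows  # 7
```

A flagged value is a prompt to re-check the original measurement, not a reason to discard it.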

Now, let's take a look at the results. Paleoclimate reconstructions will be generated for each unique site found within the dataset.

The `Margin`, `FDR`, `TC.IP`, `Ln.leaf.area`, `Ln.TC.IP`, and `Ln.PR` columns report the site-level values of the parameters used in the DiLP regressions. The `MAT.MLR` and `MAP.MLR` columns report the temperature and precipitation estimates from the multiple linear regressions that use those parameters. The `MAT.SLR` and `MAP.SLR` columns report the temperature and precipitation estimates from the simple linear regressions. Positive and negative errors for all paleoclimate estimates are reported as well.

```{r Check the final results}
dilp_results$results
```

Finally, `dilp_cca()` can be called to make sure that your sites fit within the physiognomic space encompassed by the calibration data.

```{r Check CCA}
dilp_cca(dilp_results)
```

If a site you are testing falls outside the bounds of the calibration data, the DiLP regressions may not be able to accurately reconstruct the paleoclimate of that site.

In this case, both McAbee localities do fall within the bounds of the calibration data; thus, the use of DiLP is appropriate here.

#### Leaf Mass per Area Reconstructions in Depth

Leaf mass per area reconstructions can be generated from a smaller subset of leaf physiognomic data than is needed for DiLP paleoclimate estimates. All you really need is leaf area and petiole width.
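
To see why those two measurements are enough, here is a minimal sketch. It assumes the general log-log form used in fossil LMA work, in which LMA is regressed on a petiole metric (squared petiole width divided by leaf area). The helper names `petiole_metric()` and `predict_lma()` are hypothetical, and the coefficients in the example call are placeholders, not calibrated values; `lma()` supplies its own regressions.

```r
# Petiole metric: squared petiole width scaled by leaf area
# (both measurements must be in consistent units)
petiole_metric <- function(petiole_width, leaf_area) {
  petiole_width^2 / leaf_area
}

# Log-log regression of LMA on the petiole metric.
# `intercept` and `slope` must come from a published calibration.
predict_lma <- function(petiole_width, leaf_area, intercept, slope) {
  10^(intercept + slope * log10(petiole_metric(petiole_width, leaf_area)))
}

# Hypothetical measurements and placeholder coefficients (NOT calibrated values)
example_lma <- predict_lma(
  petiole_width = 0.1,
  leaf_area = 10,
  intercept = 3,
  slope = 0.5
)
example_lma
```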

The standard suite of DiLP traits already includes leaf area and petiole width, so here we will just continue using the included `McAbeeExample` dataset.

```{r LMA results elements}
lma_results <- lma(McAbeeExample)
print(paste0("lma_results$", names(lma_results)))
```

As with `dilp()`, results for `lma()` are saved within a list. `lma_results$species_mean_lma` includes the reconstructed mean LMA for every species-site pair in the dataset. Upper and lower prediction intervals are calculated as well.

```{r}
lma_results$species_mean_lma
```

Three different metrics of site-level LMA are calculated.

`royer_site_mean_lma` and `lowe_site_mean_lma` use slightly different regressions to show the average LMA of all species at a site.

`lowe_site_variance_lma` shows the variance of species-mean LMA values at a site.

```{r}
# Royer Site Mean LMA
lma_results$royer_site_mean_lma
```

```{r}
# Lowe Site Mean LMA
lma_results$lowe_site_mean_lma
```

```{r}
# Lowe Site Variance LMA
lma_results$lowe_site_variance_lma
```

#### Paleoclimate Estimates with Simple Linear Regressions

Sometimes, you may not have a full leaf physiognomic dataset recorded for a site. In that case, simple linear regressions can be used to estimate MAT (`temp_slr()`) and MAP (`precip_slr()`), as long as you have margin state or leaf area data, respectively.

See the documentation for either function to learn about the different regressions that are preloaded into the functions. In this case, we'll use the `Peppe2018` regression for MAT and the `Wilf1998` regression for MAP.

```{r}
temp_slr(McAbeeExample, regression = "Peppe2018")
```

```{r}
precip_slr(McAbeeExample, regression = "Wilf1998")
```

You can also use your own regressions for both of these functions, as long as you provide the slope, the constant, and the standard error.

```{r}
temp_slr(McAbeeExample, slope = 0.290, constant = 1.320, error = 5)
```
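
As a rough illustration of what a custom regression like the one above computes, the sketch below applies the same slope, constant, and error by hand. It assumes the usual leaf margin analysis form, MAT = slope × margin + constant, where margin is the percentage of untoothed species at a site, and it treats the error as a symmetric band around the estimate. The helper `estimate_mat()` and the margin percentages are hypothetical; the actual output of `temp_slr()` may be structured differently.

```r
# Sketch of a custom simple linear regression for MAT.
# Assumes MAT = slope * margin + constant, where `margin` is the
# percentage of untoothed species at a site.
estimate_mat <- function(margin, slope, constant, error) {
  mat <- slope * margin + constant
  data.frame(
    margin = margin,
    MAT = mat,
    lower = mat - error,  # estimate minus the supplied error
    upper = mat + error   # estimate plus the supplied error
  )
}

# Hypothetical site-level margin percentages (not from McAbeeExample)
estimates <- estimate_mat(c(25, 60), slope = 0.290, constant = 1.320, error = 5)
estimates
```

With these inputs, the estimated MAT is 8.57 for the first site and 18.72 for the second, each with a band of 5 on either side.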