#' It returns a tibble with the predictions from all the terms in a \link[mgcv]{gam} or \link[mgcv]{bam} model.
#'
#' @param model A \code{gam} or \code{bam} model object.
#' @param exclude_terms Terms to be excluded from the prediction. Term names should be given as they appear in the model summary (for example, \code{"s(x0,x1)"}).
#' @param length_out An integer indicating how many values along the numeric predictors to use for predicting the outcome term (the default is \code{50}).
#' @param values User supplied values for numeric terms as a named list.
#'
#' @return A tibble with predictions from a \link[mgcv]{gam} or \link[mgcv]{bam} model.
#'
#' @examples
#' library(mgcv)
#' set.seed(10)
#' data <- gamSim(4)
#' model <- gam(y ~ fac + s(x2) + s(x2, by = fac) + s(x0), data = data)
#'
#' # get predictions
#' p <- predict_gam(model)
#'
#' # get predictions excluding x0 (the coefficient of x0 is set to 0)
#' @param facet_terms An unquoted formula with the terms used for faceting.
#' @param conditions A list of quosures with \link[rlang]{quos} specifying the levels to plot from the model terms not among \code{time_series}, \code{comparison}, or \code{facet_terms}.
#' It provides a `geom` for plotting GAM smooths with confidence intervals from the output of \link[tidymv]{predict_gam}. It inherits the following `aes` from a call to `ggplot`:
#' \itemize{
#' \item The term defining the x-axis.
#' \item The fitted values (the \code{fit} column in the tibble returned by \link[tidymv]{predict_gam}).
#' \item The standard error of the fit (the \code{se.fit} column in the tibble returned by \link[tidymv]{predict_gam}).
#' }
#'
#' @param group The optional grouping factor.
#' @param ci_z The z-value for calculating the CIs (the default is \code{1.96} for 95 percent CI).
#' @param ci_alpha Transparency value of CIs (the default is \code{0.1}).
#' @param data The data to be displayed in this layer. If \code{NULL}, it is inherited.
#' @param ... Arguments passed to \code{geom_path()}.
#'
#' @examples
#' library(mgcv)
#' library(ggplot2)
#' set.seed(10)
#' data <- gamSim(4)
#' model <- gam(y ~ fac + s(x2) + s(x2, by = fac), data = data)
# `tidymv`: Plotting for generalised additive models
This is the repository of the `R` package `tidymv`. This package provides functions for the visualisation of GAM(M)s using tidy tools from the `tidyverse`. `tidymv` is based on the `itsadug` package, and indeed it uses some of its functions under the hood.
This is the repository of the `R` package `tidymv`. This package provides functions for the visualisation of GAM(M)s and the generation of model-based predicted values using tidy tools from the `tidyverse`. `tidymv` uses some functions from the `itsadug` package.
## Installation
To install the package, use `devtools::install_github("stefanocoretta/tidymv@v1.5.4", build_opts = c("--no-resave-data", "--no-manual"))`. To learn how to use the package, do `vignette("plot-smooths", package = "tidymv")` after the installation.
To install the package, use `devtools::install_github("stefanocoretta/tidymv@v2.0.0", build_opts = c("--no-resave-data", "--no-manual"))`. To learn how to use the package, check out the vignettes (for example, `vignette("predict-gam", package = "tidymv")`).
If you wish to install the development version, use `devtools::install_github("stefanocoretta/tidymv", build_opts = c("--no-resave-data", "--no-manual"))`.
---
title: "Get model predictions and plot them with `ggplot2`"
author: "Stefano Coretta"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get and plot model predictions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "300px", fig.align = "center", dpi = 300
)
library(ggplot2)
theme_set(theme_bw())
library(dplyr)
library(mgcv)
library(tidymv)
```
While `plot_smooths()` offers a streamlined way of plotting predicted smooths from a GAM (see `vignette("plot-smooths", package = "tidymv")`), it is too constrained for more complex cases.
The most general solution is to get the predicted values of the outcome variable according to all the combinations of terms in the model and use this dataframe for plotting.
This method grants the user maximum control over what can be plotted and how to transform the data (if necessary).
I will illustrate how to use the function `predict_gam()` to create a prediction dataframe and how this dataframe can be used for plotting different cases.
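
To make the idea concrete, here is a minimal sketch of building such a prediction dataframe by hand with `expand.grid()` and the `predict()` method from `mgcv`. It is not the implementation of `predict_gam()`, only an illustration of the general approach. The object names (`sim`, `fit_gam`, `grid`) are made up for this example, and the choice of 50 values per numeric predictor is an assumption.

```r
library(mgcv)

# Simulated data and a GAM with a factor `by` variable.
set.seed(10)
sim <- gamSim(4, 400, verbose = FALSE)
fit_gam <- gam(y ~ fac + s(x2, by = fac), data = sim)

# Every combination of 50 evenly spaced x2 values and the levels of fac.
grid <- expand.grid(
  x2 = seq(min(sim$x2), max(sim$x2), length.out = 50),
  fac = levels(sim$fac)
)

# Predict the outcome on the grid, together with standard errors.
pred <- predict(fit_gam, newdata = grid, se.fit = TRUE)
grid$fit <- as.numeric(pred$fit)
grid$se.fit <- as.numeric(pred$se.fit)

head(grid)
```

The resulting dataframe has one row per combination of predictor values, which is the shape needed for plotting with `ggplot2`.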
## Smooths
First of all, let's generate some simulated data and create a GAM with a factor `by` variable.
```{r model}
library(mgcv)
set.seed(10)
data <- gamSim(4, 400)
model <- gam(
  y ~
    fac +
    s(x2, by = fac),
  data = data
)
summary(model)
```
We can extract the predicted values with `predict_gam()`.
The predicted values of the outcome variable are in the column `fit`, while `se.fit` reports the standard error of the predicted values.
```{r model-p}
model_p <- predict_gam(model)
model_p
```
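
The fitted values and their standard errors are enough to build approximate confidence intervals by hand. The sketch below uses `mgcv` directly and assumes the usual normal-approximation interval, `fit` plus or minus a z-value times `se.fit`, with 1.96 for a 95 percent interval. Whether `geom_smooth_ci()` computes its ribbon in exactly this way is an assumption. The names `sim`, `fit_gam`, `newdata` and `ci` are made up for illustration.

```r
library(mgcv)

set.seed(10)
sim <- gamSim(4, 400, verbose = FALSE)
fit_gam <- gam(y ~ fac + s(x2, by = fac), data = sim)

# Three values of x2 for the first level of fac.
newdata <- data.frame(
  x2 = c(0.25, 0.5, 0.75),
  fac = factor("1", levels = levels(sim$fac))
)
pred <- predict(fit_gam, newdata = newdata, se.fit = TRUE)

# Normal-approximation interval: fit -/+ z * standard error.
ci_z <- 1.96
ci <- data.frame(
  newdata,
  fit = as.numeric(pred$fit),
  se.fit = as.numeric(pred$se.fit)
)
ci$lower <- ci$fit - ci_z * ci$se.fit
ci$upper <- ci$fit + ci_z * ci$se.fit

ci
```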
Now plotting can be done with `ggplot2`.
The convenience function `geom_smooth_ci()` can be used to plot the predicted smooths with confidence intervals.
```{r model-plot}
model_p %>%
  ggplot(aes(x2, fit)) +
  geom_smooth_ci(fac)
```
## Surface smooths
Now let's plot a model that has a tensor product interaction term (`ti()`).
```{r model-2}
model_2 <- gam(
  y ~
    s(x2) +
    s(f1) +
    ti(x2, f1),
  data = data
)
summary(model_2)
```
Let's get the prediction dataframe and produce a contour plot.
We can adjust labels and aesthetics using the usual `ggplot2` methods.
```{r model-2-p}
model_2_p <- predict_gam(model_2)
model_2_p
```
```{r model-2-plot}
model_2_p %>%
  ggplot(aes(x2, f1, z = fit)) +
  geom_raster(aes(fill = fit)) +
  geom_contour(colour = "white") +
  scale_fill_continuous(name = "y") +
  theme_minimal() +
  theme(legend.position = "top")
```
## Smooths at specified values of a continuous predictor
To plot the smooths across a few values of a continuous predictor, we can use the `values` argument in `predict_gam()`.
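
Since `values` is documented as a named list of user-supplied values for numeric terms, a call along the lines of `predict_gam(model_2, values = list(f1 = c(0.5, 1, 1.5)))` should presumably do this; the particular values of `f1` are only an example. The sketch below shows the same idea by hand with `mgcv`, as an illustration rather than the package's implementation. The names `sim`, `fit_2` and `grid` are made up.

```r
library(mgcv)

set.seed(10)
sim <- gamSim(4, 400, verbose = FALSE)
fit_2 <- gam(y ~ s(x2) + s(f1) + ti(x2, f1), data = sim)

# x2 varies along a fine grid, while f1 is fixed at three chosen values.
grid <- expand.grid(
  x2 = seq(min(sim$x2), max(sim$x2), length.out = 50),
  f1 = c(0.5, 1, 1.5)
)
grid$fit <- as.numeric(predict(fit_2, newdata = grid))

# One predicted curve over x2 for each chosen value of f1.
table(grid$f1)
```

Each value of `f1` then defines one curve over `x2`, which can be plotted as a separate smooth.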
`exclude_terms` takes a character vector of term names, as they appear in the output of `summary()` (rather than as they are specified in the model formula).
For example, to remove the term `s(x2, fac, bs = "fs", m = 1)`, `"s(x2,fac)"` should be used since this is how the summary output reports this term.
The output still contains the excluded columns.
The predicted values of the outcome variable are not affected by the value of the excluded terms (the predicted values are repeated for each value of the excluded terms).
In other words, the coefficients for the excluded terms are set to 0 when predicting.
We can filter the predicted dataset to get unique predicted values by choosing any value or level of the excluded terms.\footnote{Alternatively, we can use `slice()`: `group_by(a) %>% slice(1)`. See `?slice`.}
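
The behaviour described above can be checked directly with the `exclude` argument of the `predict()` method in `mgcv`, which sets the named smooth terms to zero. This is a sketch of the concept under my own assumptions, not the code behind `exclude_terms`; the model and the names `sim`, `fit_3`, `grid` and `unique_fit` are made up for illustration.

```r
library(mgcv)

set.seed(10)
sim <- gamSim(4, 400, verbose = FALSE)
fit_3 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = sim)

# A small grid that still contains the column of the excluded term (x0).
grid <- expand.grid(
  x2 = seq(0, 1, length.out = 5),
  x0 = c(0.2, 0.8),
  fac = levels(sim$fac)
)

# Predict with s(x0) set to zero; the term name follows the summary output.
grid$fit <- as.numeric(predict(fit_3, newdata = grid, exclude = "s(x0)"))

# The predictions are repeated across x0, so keeping a single x0 value
# gives the unique predicted values.
unique_fit <- grid[grid$x0 == 0.2, ]
unique_fit
```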