---
title: "Introduction to ggepi"
author: "Luke W. Johnston"
date: "`r Sys.Date()`"
toc: true
vignette: >
  %\VignetteIndexEntry{Introduction to ggepi}
---
```{r setup, include = FALSE}
# Default chunk options for every code chunk in this vignette.
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 7,
  fig.align = "center",
  warning = FALSE
)
```
The main aim of the ggepi package is to provide ggplot2 layers for common
statistical presentations (at least in epidemiology, and at least the ones I use
often). The ggepi package is as hands-off as possible: it assumes that what you
give it is what you want to plot, and it gives you full control over how the
plot looks. All of the work on the data or results you want to plot is done by
you, before plotting.

I created this package because many other packages with similar functionality
assume too much about what data you want to plot and how the plot should look,
and those settings are very difficult to change. Those packages also don't make
use of ggplot2's ability to extend and add layers.

This package creates new geom layers that provide the full customizability of
ggplot2 in a convenient wrapper for common ways of plotting certain types of
results. There are of course trade-offs to this different way of doing things!
After each section there is a list of similar packages. Try the different
packages out yourself and find out what works for you!

# Estimates and confidence intervals from regression models
In most cases, showing results in a figure is more impactful than showing them
in a table. Yet in epidemiology, results from analyses are often put into a
table. This is less than ideal for busy researchers, as it takes longer to
process and understand results from a table than from a figure. So, to present
results from a regression-type analysis (anything that produces an estimate and
a confidence interval), you can use the `geom_estci()` layer.
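
Because the layer only needs an estimate and its interval bounds as columns, the input can come from any source. As a minimal sketch (base R only, not part of ggepi; the model and the column names are assumptions chosen to mirror the ones used in this vignette), such a table could be assembled by hand:

```r
# Fit a model on the built-in swiss data (the variable choice is illustrative).
model <- lm(Fertility ~ Catholic + Education, data = swiss)

# confint() returns a matrix with one row per term: lower and upper bounds.
ci <- confint(model, level = 0.95)

# One row per term: the point estimate, its interval bounds, and the p-value.
est <- data.frame(
  term = names(coef(model)),
  estimate = unname(coef(model)),
  conf.low = unname(ci[, 1]),
  conf.high = unname(ci[, 2]),
  p.value = unname(coef(summary(model))[, "Pr(>|t|)"]),
  stringsAsFactors = FALSE
)
est
```

Any data frame with this shape should be usable as plotting input, whichever modelling function produced the numbers.
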
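```{r lm_model}
library(ggplot2)
library(ggepi)
fit <- lm(Fertility ~ 0 + Catholic + Agriculture + Examination + Education +
              Infant.Mortality, data = swiss)
# conf.int = TRUE adds the conf.low and conf.high columns used for plotting.
fit <- broom::tidy(fit, conf.int = TRUE)
# A quick look at the data:
fit
```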
With the `geom_estci` layer, you have fine-grained control over how the plot looks.

```{r basic_estci}
ggplot(fit, aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) +
    geom_estci()
```
There is a utility function `discrete_pvalue` to convert a numeric p-value into
a factor, which makes it easier to visualize the results. So:
```{r discrete_pval}
fit2 <- transform(fit, p.value = discrete_pvalue(fit$p.value))
p <- ggplot(fit2, aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high,
                      alpha = p.value, size = p.value))
p + geom_estci(fatten = 1, center.linetype = "dotted")
```
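
To give a feel for what discretizing p-values involves, here is a rough base R sketch. The helper `pvalue_to_factor()` is hypothetical and is not ggepi's `discrete_pvalue()`; its cut points and labels are assumptions picked for illustration.

```r
# Hypothetical helper, NOT ggepi's discrete_pvalue(): bins numeric p-values
# into an ordered factor. The cut points and labels are assumptions.
pvalue_to_factor <- function(p,
                             breaks = c(0, 0.001, 0.01, 0.05, 1),
                             labels = c("<0.001", "<0.01", "<0.05", ">=0.05")) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1, na.rm = TRUE))
  # right = FALSE gives [lower, upper) bins; include.lowest keeps p = 1
  # inside the last bin.
  cut(p, breaks = breaks, labels = labels, right = FALSE,
      include.lowest = TRUE, ordered_result = TRUE)
}

pvalue_to_factor(c(0.0004, 0.03, 0.2, 1))
```

Mapping a factor like this to `alpha` or `size` gives a handful of visually distinct steps rather than a continuous gradient that is hard to read.
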
We can add ends to the error bars too.
```{r misc_edits_estci}
p + geom_estci(fatten = 1, center.linetype = "dotted", height = 0.5,
               linetype = "dashed")
```
All of the power of ggplot2 is available to customize the `geom_estci` layer.
Check out the documentation for the `geom_estci` function (via `?geom_estci`) as
well as for related ggplot2 layers, e.g. `geom_pointrange` or `geom_point`, for
full customization. Since the result is a ggplot2 object, you can add to it all
you want, for example with `facet_grid()`.
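
As a sketch of the faceting idea, the example below compares two models side by side. It assumes only ggplot2 is installed, so `geom_pointrange()` stands in for `geom_estci()`. The helper `tidy_by_hand()` and the two models are made up for illustration.

```r
library(ggplot2)

# Helper for this sketch only: estimates and 95% intervals as a data frame.
tidy_by_hand <- function(model, label) {
  ci <- confint(model)
  data.frame(
    model = label,
    term = names(coef(model)),
    estimate = unname(coef(model)),
    conf.low = unname(ci[, 1]),
    conf.high = unname(ci[, 2]),
    stringsAsFactors = FALSE
  )
}

# Two illustrative models on the built-in swiss data, stacked into one table.
results <- rbind(
  tidy_by_hand(lm(Fertility ~ 0 + Catholic + Agriculture, data = swiss),
               "Model 1"),
  tidy_by_hand(lm(Fertility ~ 0 + Catholic + Agriculture + Education,
                  data = swiss),
               "Model 2")
)

# geom_pointrange() stands in for geom_estci(); facet_grid() gives each model
# its own panel (horizontal ranges need ggplot2 >= 3.3.0).
p_facet <- ggplot(results, aes(x = estimate, y = term,
                               xmin = conf.low, xmax = conf.high)) +
  geom_pointrange() +
  facet_grid(rows = vars(model))
p_facet
```

The same pattern should carry over to any layer that reads these aesthetics, because the faceting is driven entirely by the `model` column in the data.
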
## Similar packages
A few packages offer functionality similar to `geom_estci`, but each has its own
specific use cases:

- dotwhisker package
- GGally package

Both of these packages assume you'll plot *directly* from the output of `lm` or
another regression function, so they do a lot of internal data wrangling.

# Dimensionality reduction techniques
## Correlation circle
For dimensionality reduction techniques, it is often useful to look at
correlation circle plots. These plots show the correlation between the original
variables and the scores (also called component scores, components, or factors)
to give a sense of what is going on in the data. So, let's set up the data.
```{r pls_model}
# Set up data to plot. We need to manually calculate the correlations and then
# wrangle the results into the appropriate form (with components columns and the
# variable name column)
library(pls)
data(yarn, package = "pls")
fit <- plsr(density ~ NIR, data = yarn)
fit <- cor(model.matrix(fit), scores(fit)[, 1:2, drop = FALSE])
# cor() returns a matrix; convert it so columns can be added with `$`.
fit <- as.data.frame(fit)
fit$Variables <- rownames(fit)
rownames(fit) <- NULL
colnames(fit)[1:2] <- c("Comp1", "Comp2")
# A quick look at the results:
head(fit)
```
And we can plot it using `geom_corr_circle()`:
```{r corr_circle}
p <- ggplot(fit, aes(x = Comp1, y = Comp2))
p + geom_corr_circle()
```
And to customize it:
```{r custom_corr_circle}
p + geom_corr_circle(
    size = 1,
    colour = "grey50",
    outer.linesize = 0.5,
    inner.linesize = 0.5,
    center.linecolour = "grey",
    center.linesize = 0.3
)
```
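
For readers curious about what such a plot consists of, the sketch below approximates a correlation circle with plain ggplot2 layers. It is not how `geom_corr_circle()` is implemented. It also swaps in a PCA on the built-in `swiss` data for the PLS model above, so that it runs with base R and ggplot2 alone.

```r
library(ggplot2)

# PCA on the built-in swiss data stands in for the PLS model used above.
pca <- prcomp(swiss, scale. = TRUE)

# Correlation of each original variable with the first two component scores.
circ <- as.data.frame(cor(swiss, pca$x[, 1:2]))
colnames(circ) <- c("Comp1", "Comp2")
circ$Variables <- rownames(circ)
rownames(circ) <- NULL

# Points tracing the unit circle. Because the component scores are
# uncorrelated, each variable's pair of correlations lies on or inside it.
angle <- seq(0, 2 * pi, length.out = 200)
unit_circle <- data.frame(x = cos(angle), y = sin(angle))

p_circle <- ggplot(circ, aes(x = Comp1, y = Comp2)) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_vline(xintercept = 0, colour = "grey") +
  geom_path(data = unit_circle, aes(x = x, y = y), inherit.aes = FALSE) +
  geom_point() +
  coord_fixed()  # equal axis scales, so the circle is not drawn as an ellipse
p_circle
```

Variables close to the circle are well represented by the two components, while those near the origin are not.
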
If you want to add text labels to the "highly correlated" variables, I suggest
the ggrepel functions, such as `ggrepel::geom_text_repel()`. They are really
useful!
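
One way to pick out which variables to label is to keep only those far from the origin. The sketch below is an assumption-laden example, not part of ggepi: it uses a PCA on `swiss` rather than the PLS model, and the 0.7 cutoff is arbitrary.

```r
# PCA on swiss stands in for any model that yields component scores.
pca <- prcomp(swiss, scale. = TRUE)
circ <- as.data.frame(cor(swiss, pca$x[, 1:2]))
colnames(circ) <- c("Comp1", "Comp2")
circ$Variables <- rownames(circ)
rownames(circ) <- NULL

# Distance from the origin; the 0.7 cutoff is an arbitrary assumption.
circ$radius <- sqrt(circ$Comp1^2 + circ$Comp2^2)
strong <- circ[circ$radius >= 0.7, ]
strong[, c("Variables", "radius")]

# The subset could then be handed to a text layer, for example:
# ggplot(circ, aes(x = Comp1, y = Comp2)) +
#   geom_corr_circle() +
#   ggrepel::geom_text_repel(data = strong, aes(label = Variables))
```

Filtering before plotting keeps the labelling decision in your hands, which matches the hands-off approach of the package.
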
## Similar packages
These other packages have similar functionality.

- ggbiplot package
- factoextra package
- `biplot.princomp()` function in base R

However, these packages again assume you are plotting directly from `prcomp()`
or `pls::plsr()` output, so they do a lot of internal data wrangling. Sometimes
you may want more control over what data is plotted. Customizing the display of
the plots from these other packages or functions can also be a bit difficult.
Here, ggepi is as hands-off as possible: you have control over what is plotted
and how it looks.