diff --git a/.github/workflows/render_pdf.yaml b/.github/workflows/render_pdf.yaml new file mode 100644 index 0000000..8952903 --- /dev/null +++ b/.github/workflows/render_pdf.yaml @@ -0,0 +1,45 @@ +name: Render PDF + +on: + push: + branches: main + + workflow_dispatch: + + +jobs: + render: + runs-on: ubuntu-latest + steps: + - name: Check out repository + uses: actions/checkout@v3 + + - name: Set up Quarto + uses: quarto-dev/quarto-actions/setup@v2 + with: + tinytex: true + + - name: Set up R + uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - name: Install R packages + run: | + install.packages(c("knitr", "rmarkdown", "quarto", "stringr")) + shell: Rscript {0} + + - name: Render PDF + run: | + quarto::quarto_render("_make_pdf.qmd") + shell: Rscript {0} + + - name: Commit files + run: | + git config --local user.email "actions@github.com" + git config --local user.name "GitHub Actions" + git add RSS-data-vis-guide.pdf + git diff --staged --quiet || git commit -m "Render PDF" + git push + env: + github_token: ${{ secrets.GITHUB_TOKEN }} \ No newline at end of file diff --git a/RSS-data-vis-guide.pdf b/RSS-data-vis-guide.pdf index 102ccbb..33e852d 100644 Binary files a/RSS-data-vis-guide.pdf and b/RSS-data-vis-guide.pdf differ diff --git a/_make_pdf.qmd b/_make_pdf.qmd index 1afae59..32e5719 100644 --- a/_make_pdf.qmd +++ b/_make_pdf.qmd @@ -2,6 +2,9 @@ title: Best Practices for Data Visualisation subtitle: Insights, advice, and examples (with code) to make data outputs more readable, accessible, and impactful author: "Andreas Krause, Nicola Rennie, & Brian Tarran" +bibliography: references.bib +nocite: | + @* format: pdf: toc: true @@ -10,6 +13,10 @@ format: number-depth: 2 output-file: "RSS-data-vis-guide" output-ext: "pdf" + linkcolor: RSSblue + urlcolor: RSSblue + include-in-header: + - "styles.tex" editor: source --- diff --git a/about.qmd b/about.qmd index 3f5dde0..ec5ac38 100644 --- a/about.qmd +++ b/about.qmd @@ -6,7 +6,7 @@ editor: source ## Andreas Krause ::: grid
-::: {.g-col-12 .g-col-md-6} +::: {.g-col-12 .g-col-md-8} Andreas Krause holds a PhD and an MSc in statistics and computer science. His current position is senior director, data science, modeling and simulation, at Idorsia Pharmaceuticals Ltd in Basel, Switzerland. His group provides clinical drug development teams with statistics and model-based analysis and predictions to support decision making. His works include more than 100 peer-reviewed [scientific publications](https://scholar.google.com/citations?hl=en&user=9a--8FUAAAAJ&view_op=list_works&sortby=pubdate). Andreas' scientific activities include positions as co-editor of the American Statistical Association's section on [Statistical Computing and Graphics](https://community.amstat.org/jointscsg-section/home), book review editor and board member of [*Computational Statistics*](https://www.springer.com/journal/180), associate editor and editor-in-chief of [*Pharmaceutical Statistics*](https://onlinelibrary.wiley.com/page/journal/15391612/homepage/editorialboard.html), and, currently, advisory board member of *Pharmaceutical Statistics* and the [*Journal of Pharmacokinetics and Pharmacodynamics*](https://doi.org/10.1007/s10928-023-09855-x). @@ -19,15 +19,17 @@ Graphics and visualisation are long-standing topics for him, including numerous - ORCID: [orcid.org/0000-0002-4686-976X](https://orcid.org/0000-0002-4686-976X) ::: -::: {.g-col-12 .g-col-md-6} -![](images/andreas.png){fig-align="center" fig-alt="Image of Andreas Krause" width="70%"} +::: {.content-visible unless-format="pdf"} +::: {.g-col-12 .g-col-md-4} +![](images/andreas.png){fig-align="center" fig-alt="Image of Andreas Krause"} +::: ::: ::: ## Nicola Rennie ::: grid -::: {.g-col-12 .g-col-md-6} +::: {.g-col-12 .g-col-md-8} Nicola Rennie is an academic, data scientist, and educator with a passion for effectively communicating data. 
She is a Lecturer in Health Data Science at Lancaster Medical School, where her research focuses on the use of statistical models to improve health outcomes. Her teaching experience covers topics including data visualisation, programming in R and Python, and how to effectively communicate the results of statistical analyses. Nicola is the author and maintainer of multiple R packages, and an active member of the R community - regularly presenting at R User Group Meetups and mentoring new members of the community. - Website: [nrennie.rbind.io](https://nrennie.rbind.io/) @@ -37,15 +39,17 @@ Nicola Rennie is an academic, data scientist, and educator with a passion for ef - ORCID: [orcid.org/0000-0003-4797-557X](https://orcid.org/0000-0003-4797-557X) ::: -::: {.g-col-12 .g-col-md-6} -![](images/nicola.jpg){fig-align="center" fig-alt="Image of Nicola Rennie wearing a red jacket" width="70%"} +::: {.content-visible unless-format="pdf"} +::: {.g-col-12 .g-col-md-4} +![](images/nicola.jpg){fig-align="center" fig-alt="Image of Nicola Rennie wearing a red jacket"} +::: ::: ::: ## Brian Tarran ::: grid -::: {.g-col-12 .g-col-md-6} +::: {.g-col-12 .g-col-md-8} Brian Tarran is a writer and editor with 20 years of experience covering the research and data space. He has worked for the Royal Statistical Society (RSS) for the past 8 years, and was editor of [*Significance* Magazine](https://www.significancemagazine.com/) (a joint publication of the RSS, the American Statistical Association and the Statistical Society of Australia) prior to the launch of [Real World Data Science](https://realworlddatascience.net/). Brian is a former editor of Research-Live.com and was launch editor of Impact magazine, both published by the Market Research Society. 
- Twitter: [\@brtarran](https://twitter.com/brtarran) @@ -54,7 +58,9 @@ Brian Tarran is a writer and editor with 20 years of experience covering the res - GitHub: [github.com/brtarran](https://github.com/brtarran) ::: -::: {.g-col-12 .g-col-md-6} -![](images/brian.png){fig-align="center" fig-alt="Image of Brian Tarran" width="70%"} +::: {.content-visible unless-format="pdf"} +::: {.g-col-12 .g-col-md-4} +![](images/brian.png){fig-align="center" fig-alt="Image of Brian Tarran"} +::: ::: ::: diff --git a/docs/index.qmd b/docs/index.qmd index 015faee..5570492 100644 --- a/docs/index.qmd +++ b/docs/index.qmd @@ -57,7 +57,7 @@ toc: true ::: ::: {.g-col-12 .g-col-md-4 style="background-color:#f8f9fa; padding-left:1rem; padding-right:1rem;"} -#### [References and resources](/docs/references.qmd){style="text-decoration:none;"} +#### [References and resources](/docs/references.qmd){style="text-decoration:none;"} - Primary sources - Further reading diff --git a/docs/principles.qmd b/docs/principles.qmd index 2839a37..bf7bbc9 100644 --- a/docs/principles.qmd +++ b/docs/principles.qmd @@ -34,26 +34,27 @@ set.seed(93384) time <- c(0, 0.5, 1, 2, 4, 8, 12, 16, 24) n <- 32 # no of subjects -data <- expand.grid(ID=1:n, time=time) +data <- expand.grid(ID = 1:n, time = time) bw <- data.frame( - ID=sort(unique(data$ID)), - bw=rlnorm(n, log(75), sdlog=0.25) + ID = sort(unique(data$ID)), + bw = rlnorm(n, log(75), sdlog = 0.25) ) bw$bw.category <- cut(bw$bw, - breaks=quantile(bw$bw, c(0, 0.33, 0.66, 1)), - labels=paste(c("low", "medium", "high"), "body weight"), - include.lowest=TRUE) + breaks = quantile(bw$bw, c(0, 0.33, 0.66, 1)), + labels = paste(c("low", "medium", "high"), "body weight"), + include.lowest = TRUE +) data <- merge(data, bw) data <- data[order(data$ID, data$time), ] # Simulate drug concentrations as a function of body weight. -data$conc <- 100/(data$bw^1.0)*exp(-0.085*data$time)* - rlnorm(nrow(data), sdlog=0.25)+ # res. 
error - (data$ID-mean(data$ID))/mean(data$ID)/4 # r. eff +data$conc <- 100 / (data$bw^1.0) * exp(-0.085 * data$time) * + rlnorm(nrow(data), sdlog = 0.25) + # res. error + (data$ID - mean(data$ID)) / mean(data$ID) / 4 # r. eff # --- # Visualisation. @@ -63,25 +64,24 @@ gg <- list() data$ID <- factor(data$ID) -gg[["3x1"]] <- ggplot(data, aes(x=time, y=conc, group=ID, color=ID)) + geom_line() -gg[["3x1"]] <- gg[["3x1"]] + scale_x_continuous(breaks=seq(0, 24, by=4)) -gg[["3x1"]] <- gg[["3x1"]] + theme_bw() + xlab("time [h]") + ylab("drug concentration [ng/mL]") +gg[["3x1"]] <- ggplot(data, aes(x = time, y = conc, group = ID, color = ID)) + + geom_line() +gg[["3x1"]] <- gg[["3x1"]] + scale_x_continuous(breaks = seq(0, 24, by = 4)) +gg[["3x1"]] <- gg[["3x1"]] + theme_bw() + + xlab("time [h]") + + ylab("drug concentration [ng/mL]") gg[["3x1"]] <- gg[["3x1"]] + facet_grid(bw.category ~ .) -gg[["3x1"]] <- gg[["3x1"]] + theme(legend.position="none") +gg[["3x1"]] <- gg[["3x1"]] + theme(legend.position = "none") gg[["1x3"]] <- gg[["3x1"]] + facet_grid(. ~ bw.category) # Add space to the rhs of the first figure for better separation in the cowplot. -gg[["3x1"]] <- gg[["3x1"]] + theme(plot.margin = unit(c(0.5, 4, 0.5, 0.5), "lines")) - +gg[["3x1"]] <- gg[["3x1"]] + + theme(plot.margin = unit(c(0.5, 4, 0.5, 0.5), "lines")) -# print(gg) - -# --- # Both figures into a single output figure. - library(cowplot) -plot_grid(gg[[1]], gg[[2]], rel_widths=c(1.5, 2)) +plot_grid(gg[[1]], gg[[2]], rel_widths = c(1.5, 2)) ``` [![Line plots showing how choice of layout determines ease of comparison across panels.](images/principles-layout.png){fig-alt="Line plots showing how choice of layout determines ease of comparison across panels. Left panel shows 3 plots stacked vertically. 
Right panel shows same three plots arranged horizontally"}](images/principles-layout.png) @@ -102,13 +102,13 @@ Our visual perception of data in a display must not depend on the choice of x- a # Old Faithful Geyser (Yellowstone) data set with eruption duration # and waiting time to the next eruption (both in minutes). data <- data.frame( - x=faithful$eruptions, - y=faithful$waiting + x = faithful$eruptions, + y = faithful$waiting ) # --- # Regression model fit. -fit <- lm(y ~ x, data=data) +fit <- lm(y ~ x, data = data) # Addition of predicted values to the data set. data$pred <- predict(fit) @@ -121,21 +121,22 @@ r <- range(unlist(data[c("y", "pred")])) library(ggplot2) -gg <- ggplot(data, aes(x=pred, y=y)) +gg <- ggplot(data, aes(x = pred, y = y)) -# Adding the line of identity, y=x (note: plotting it first will add points on top). -gg <- gg + geom_abline(intercept=0, slope=1, color="black", linewidth=1) +# Adding the line of identity, y = x +# (note: plotting it first will add points on top). +gg <- gg + geom_abline(intercept = 0, slope = 1, color = "black", linewidth = 1) # Adding points, removing grey background. -gg <- gg + geom_point() + theme_bw() +gg <- gg + geom_point() + theme_bw() # Adding linear OLS regression fit of y~x. -gg <- gg + geom_smooth(method="loess", color="firebrick", se=FALSE) +gg <- gg + geom_smooth(method = "loess", color = "firebrick", se = FALSE) # Adding axis labels. gg <- gg + xlab("predicted") + ylab("observed") -# Transpose (flip coordinates) +# make a second copy gg2 <- gg # + coord_flip() @@ -149,7 +150,7 @@ gg3 <- gg3 + xlim(r) + ylim(r) # Cow (column-wise) plot, combine all figures into one. 
library(cowplot) -plot_grid(gg, gg2, gg3, rel_widths=c(4, 2, 2), nrow=1) +plot_grid(gg, gg2, gg3, rel_widths = c(4, 2, 2), nrow = 1) ``` [![Scatter plots showing how visual perception of model fit can be influenced by aspect ratio.](images/principles-aspect-ratio-2.png){fig-alt="Scatter plots showing how visual perception of model fit can be influenced by aspect ratio. Left panel has a stretched x-axis, middle panel has a stretched y-axis, and right panel has a 1:1 aspect ratio with identical axis limits."}](images/principles-aspect-ratio2.png) @@ -182,34 +183,40 @@ If the data displayed as x and as y are comparable, axis limits should be identi #| layout-ncol: 2 library(ggplot2) -plot_data <- data.frame(type = factor( - c("Our product", "Competitor"), - levels = c("Our product", "Competitor")), - value = c(220, 210)) +plot_data <- data.frame( + type = factor( + c("Our product", "Competitor"), + levels = c("Our product", "Competitor") + ), + value = c(220, 210) +) # Original plot ggplot(plot_data) + - geom_col( - mapping=aes(x=type, y=value), - fill="lightblue", - colour="black" + geom_col( + mapping = aes(x = type, y = value), + fill = "lightblue", + colour = "black" ) + - scale_y_continuous(breaks=seq(0, 220, by=20), expand=c(0, 0)) + - labs(x="", y="") + theme_minimal() + scale_y_continuous(breaks = seq(0, 220, by = 20), expand = c(0, 0)) + + labs(x = "", y = "") + + theme_minimal() # Offset the y axis offset <- 208 ggplot(plot_data) + - geom_col( - mapping = aes(x=type, y=value-offset), - fill="lightblue", - colour="black" + geom_col( + mapping = aes(x = type, y = value - offset), + fill = "lightblue", + colour = "black" + ) + + scale_y_continuous( + breaks = seq(0, 14, by = 2), + labels = seq(0 + offset, 14 + offset, by = 2), + expand = c(0, 0) ) + - scale_y_continuous(breaks=seq(0, 14, by=2), - labels=seq(0+offset, 14 + offset, by=2), - expand=c(0, 0)) + - labs(x="", y="") + - theme_minimal() + labs(x = "", y = "") + + theme_minimal() ``` [![Bar charts 
showing how visual impression can be driven by the choice of axis limits.](images/principles-axes.png){fig-alt="Bar charts showing how visual impression can be driven by the choice of axis limits. Left panel chart has y-axis starting at zero, right panel has y-axis starting at 208, overselling the small 10-unit difference between two bars."}](images/principles-axes.png) @@ -224,59 +231,67 @@ The figure below shows data with, on average, no change compared to a reference #| warning: false #| message: false #| layout-ncol: 2 -# Naïve plot of y vs x. If there is no change (on average), +# Naïve plot of y vs x. If there is no change (on average), # half the data are below the line of no change. # Asymmetric view, and it depends on y/x or x/y. set.seed(33838) x <- data.frame( - x=rlnorm(200, 2, 0.2), - y=rlnorm(200, 0.2, 0.75) + x = rlnorm(200, 2, 0.2), + y = rlnorm(200, 0.2, 0.75) ) # Add an outlier manually. -x <- rbind(x, data.frame(x=quantile(x$x, 0.8), y=max(x$y)*1.5)) +x <- rbind(x, data.frame(x = quantile(x$x, 0.8), y = max(x$y) * 1.5)) # --- # Plotting. library(ggplot2) # Scatterplot of y vs x. 
-gg <- ggplot(x, aes(x=x, y=y)) + geom_point() + theme_bw() -gg <- gg + geom_hline(yintercept=1, color="firebrick", linewidth=2) +gg <- ggplot(x, aes(x = x, y = y)) + + geom_point() + + theme_bw() +gg <- gg + geom_hline(yintercept = 1, color = "firebrick", linewidth = 2) gg <- gg + xlab("x-variable") + ylab("Fold-change") gg # Logarithmic axes, symmetric range (!): -xbr <- c(1/10, 1/5, 1/2, 1, 2, 5, 10) -gg <- gg + scale_y_continuous(breaks=xbr, trans="log10", - limits=max(abs(x$y))^c(-1, 1)) -print(gg) +xbr <- c(1 / 10, 1 / 5, 1 / 2, 1, 2, 5, 10) +gg <- gg + scale_y_continuous( + breaks = xbr, trans = "log10", + limits = max(abs(x$y))^c(-1, 1) +) +gg # Second axis: gg <- gg + scale_y_continuous( - breaks=xbr, - labels=paste(100*xbr, "%", sep=""), - trans="log10", - limits=max(abs(x$y))^c(-1, 1), - sec.axis=sec_axis(trans=~.*1, breaks=xbr, - labels=ifelse(xbr < 1, paste("1/", 1/xbr, sep=""), xbr)) + breaks = xbr, + labels = paste(100 * xbr, "%", sep = ""), + trans = "log10", + limits = max(abs(x$y))^c(-1, 1), + sec.axis = sec_axis( + trans = ~ . * 1, breaks = xbr, + labels = ifelse(xbr < 1, paste("1/", 1 / xbr, sep = ""), xbr) + ) ) # --- # Univariate distribution (histogram). -gg <- ggplot(x, aes(x=y)) + theme_bw() + xlab("Fold-change") -gg <- gg + geom_histogram(color="firebrick", fill="gray") -print(gg) +gg <- ggplot(x, aes(x = y)) + + theme_bw() + + xlab("Fold-change") +gg <- gg + geom_histogram(color = "firebrick", fill = "gray") +gg # Symmetric range, log scale. 
gg <- gg + scale_x_continuous( - breaks=xbr, - labels=ifelse(xbr < 1, paste("1/", 1/xbr), xbr), - trans="log10", - limits=max(abs(x$x))^c(-1, 1) + breaks = xbr, + labels = ifelse(xbr < 1, paste("1/", 1 / xbr, sep = ""), xbr), + trans = "log10", + limits = max(abs(x$y))^c(-1, 1) ) -print(gg) +gg ``` [![Switching from linear axes (left) to logarithmic axes (right) helps clarify changes in values relative to some baseline measure.](images/principles-log.png){fig-alt="Switching from linear axes to logarithmic axes helps clarify changes in values relative to some baseline measure. In the top row are two scatter plots, with a basline value marked with a red horizontal keyline. The top left panel is on a linear axis, while the top right panel is on a logarithmic axis and more clearly shows the even spread of data points around the baseline. In the bottom row, the same data is shown in histogram form, again with linear axis on the left and logarithmic on the right."}](images/principles-log.png) @@ -305,22 +320,22 @@ gg.list <- list() # Prepare the data set (reformat EuStockMarkets that comes with R). x <- EuStockMarkets df <- data.frame( - time=as.vector(time(x)), - index=rep(colnames(x), rep(nrow(x), ncol(x))), - value=as.vector(x), - stringsAsFactors = TRUE + time = as.vector(time(x)), + index = rep(colnames(x), rep(nrow(x), ncol(x))), + value = as.vector(x), + stringsAsFactors = TRUE ) df$index2 <- df$index # For use with labels later. library(ggplot2) # Standard layout and legend. -gg <- ggplot(df, aes(x=time, y=value, group=index, color=index, label=index2)) +gg <- ggplot(df, aes(x = time, y = value, group = index, color = index, label = index2)) gg <- gg + geom_line() + theme_bw() # Nicer axis tick mark settings.
-ax <- pretty(df$time, n=10) -gg <- gg + scale_x_continuous(limits=range(ax), breaks=ax) +ax <- pretty(df$time, n = 10) +gg <- gg + scale_x_continuous(limits = range(ax), breaks = ax) gg <- gg + xlab("year") + ylab("Stock index") gg.list[[1]] <- gg @@ -331,21 +346,21 @@ gg.list[[1]] <- gg y <- df[rev(order(df$time)), ] # descending in time. y <- y[!duplicated(y$index), ] # first entry per index y$index2 <- y$index # Create a copy that contains formatted strings. -levels(y$index2)[levels(y$index2)=="FTSE"] <- "\n\nFTSE" - # Add a newline to separate FTSE from DAX. - # Note that the factor level is modified, not the data. +levels(y$index2)[levels(y$index2) == "FTSE"] <- "\n\nFTSE" +# Add a newline to separate FTSE from DAX. +# Note that the factor level is modified, not the data. # Drop the legend, move labels into figure. -gg <- gg + geom_text(data=y, hjust="left", nudge_x=0.1) +gg <- gg + geom_text(data = y, hjust = "left", nudge_x = 0.1) # aes as before, nudge adds space on the lhs. -gg <- gg + theme(legend.position="none") +gg <- gg + theme(legend.position = "none") gg.list[[2]] <- gg # --- # Both figures into a single output figure. library(cowplot) -plot_grid(gg.list[[1]], gg.list[[2]], rel_widths=c(2.25, 2)) +plot_grid(gg.list[[1]], gg.list[[2]], rel_widths = c(2.25, 2)) ``` [![Line plots showing how direct labelling of figure elements can improve ease of reading.](images/principles-legends.png){fig-alt="Line plots showing how direct labelling of figure elements can improve ease of reading. Left panel uses a legend to identify different coloured lines. Right panel includes labelling within figure, at the end of each line."}](images/principles-legends.png) @@ -365,7 +380,7 @@ library(ggplot2) x <- mpg # miles per gallon data set. 
x$car <- paste(x$manufacturer, x$model) -gg <- ggplot(x, aes(x=car, y=hwy, group=car)) +gg <- ggplot(x, aes(x = car, y = hwy, group = car)) gg <- gg + geom_boxplot() + theme_bw() + xlab("Miles per gallon (highway)") gg.list[["vertical orientation"]] <- gg @@ -375,7 +390,7 @@ gg.list[["horizontal orientation"]] <- gg + coord_flip() # Both figures into a single output figure. library(cowplot) -plot_grid(gg.list[[1]], gg.list[[2]], rel_widths=c(2, 2.5)) +plot_grid(gg.list[[1]], gg.list[[2]], rel_widths = c(2, 2.5)) ``` [![Changes to orientation can improve both readability of labels and the visual interpretation of data.](images/principles-orientation.png){fig-alt="Changes to orientation can improve both readability of labels and the visual interpretation of data. Two boxplots are shown. On the left, the boxplots are vertical: axis labels overlap and visual comparison of values is more difficult than on the right, where boxplots are horizontal and axis labels are now separated and readable."}](images/principles-orientation.png) @@ -394,29 +409,28 @@ A local scatterplot smoother (loess, lowess, polynomial) can be particularly hel #| fig-height: 3 # Function for data set generation. make.data <- function( - x=c(0, 0.5, 1, 2, 4, 8, 12, 16, 24), - y=exp(-0.2*x) - exp(-0.21*x), - sd=0.25, # std dev of y - seed=4384590, - n=50 -) -{ + x = c(0, 0.5, 1, 2, 4, 8, 12, 16, 24), + y = exp(-0.2 * x) - exp(-0.21 * x), + sd = 0.25, # std dev of y + seed = 4384590, + n = 50) { # Setting the random number seed for reproducibility. set.seed(seed) # Creation of x- and y-variables. x2 <- rep(x, n) y2 <- NULL - for(i in 1:n) - y2 <- c(y2, y*(2*(n/4-i)) + rlnorm(length(y), sd=sd)) + for (i in 1:n) { + y2 <- c(y2, y * (2 * (n / 4 - i)) + rlnorm(length(y), sd = sd)) + } # Creation of an identifier for each profile. ID <- factor(rep(1:n, rep(length(y), n))) # Composition of the data set. 
- df <- data.frame(PD=100*y2, time=x, ID=ID) + df <- data.frame(PD = 100 * y2, time = x, ID = ID) # Addition of a baseline variable. - BL <- df[df$time==0, c("ID", "PD")] + BL <- df[df$time == 0, c("ID", "PD")] names(BL) <- c("ID", "BL") df <- merge(df, BL) @@ -436,21 +450,21 @@ x <- make.data() # Figures. library(ggplot2) -gg <- ggplot(x, aes(x=time, y=Change, group=ID, color=ID)) +gg <- ggplot(x, aes(x = time, y = Change, group = ID, color = ID)) gg <- gg + theme_bw() gg <- gg + xlab("Time [h]") + ylab("Change from baseline") -gg <- gg + geom_line(linewidth=1.1) + theme(legend.position="none") -gg <- gg + facet_grid(.~trt) +gg <- gg + geom_line(linewidth = 1.1) + theme(legend.position = "none") +gg <- gg + facet_grid(. ~ trt) # Addition of an auxiliary line at y=0. -gg2 <- gg + geom_hline(yintercept=0, linewidth=1.2) +gg2 <- gg + geom_hline(yintercept = 0, linewidth = 1.2) # Symmetric y-axis limits. -gg3 <- gg2 + ylim(c(-1, 1)*max(abs(x$Change))) +gg3 <- gg2 + ylim(c(-1, 1) * max(abs(x$Change))) # Arranging all plots into one figure. library(cowplot) -plot_grid(gg, gg2, gg3, nrow=1) +plot_grid(gg, gg2, gg3, nrow = 1) ``` [![Adding elements, such as lines to indicate the point of no change, can improve readability.](images/principles-auxiliary.png){fig-alt="Adding elements, such as lines to indicate the point of no change, can improve readability. In this three-part figure, the panel on the left is a line plot showing changes over time. In the middle plot, a keyline is added to mark the baseline -- zero -- on the y-axis, and in the right panel, the axis range is set to be symmetrical around this keyline."}](images/principles-auxiliary.png) diff --git a/docs/references.qmd b/docs/references.qmd index bbe4b2e..682568c 100644 --- a/docs/references.qmd +++ b/docs/references.qmd @@ -1,70 +1,13 @@ --- title: References and resources +nocite: | + @* --- ## Texts referenced in the Guide -Battle-Baptiste W, Rusert B.: W.E.B. 
Du Bois's data portraits: Visualizing black America: The Color Line at the Turn of the Twentieth Century. The W.E.B. Du Bois Center at the University of Massachusetts; 2018. - -Becker RA, Chambers JM. S: An Interactive Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole; 1984.  - -Becker RA, Cleveland WS: S-PLUS Trellis Graphics User's Manual. MathSoft, Seattle WA, 1996. Available from (accessed June 19, 2023). - -Cleveland, W.S. (1993) Visualizing data. Summit NJ: Hobart Press. - -Cleveland, W.S. (1994) The elements of graphing data. Summit NJ: Hobart Press. - -Corbett J. (2001). Charles Joseph Minard, Mapping Napoleon's March, 1861. CSISS Class 2001.   - -Du Bois W.E.B. The Exhibit of American Negroes. Paris; 1900. - -Few S. Show me the numbers. Burlingame CA: Analytics Press; 2004. - -Friendly M. A very brief history of visualization: visions, stories and pictures. Chicago IL; 2018. Available from: - -Friendly M. Remembrances of Things EDA \[Internet\]. 2022. Available from:   - -Friendly M, Denis D. The early origins and development of the scatterplot. J Hist Behav Sci. 2005;41(2):103--30. Available from:   - -Garland, K. (1994) Mr Beck's underground map. Capital Transport.  - -Kent, A.J. (2021) 'When Topology Trumped Topography: Celebrating 90 Years of Beck's Underground Map', The Cartographic Journal, 58(1), pp. 1--12. Available at:   - -Krause A (2013): Concepts and Principles of Clinical Data Graphics. In Krause and O'Connell (2013). - -Krause A, O'Connell M. (editors), 2013. A picture is worth a thousand tables: Graphics in life sciences. Springer. - -Nightingale F. A contribution to the sanitary history of the British army during the late war with Russia. London, UK: Harrison and Sons; 1859. (accessed June 19, 2023). - -Norman, D.A. (1990) The design of everyday things. New York NY: Currency Doubleday. - -R Core Team (2021). R: A language and environment for statistical computing. 
R Foundation for Statistical Computing, Vienna, Austria. [https://www.R-project.org/](https://www.r-project.org/) (accessed June 19, 2023). - -Robbins, N.B. (2006) Creating more effective graphs. Hoboken NJ: Wiley. - -Robinson A.H. (1967). The thematic maps of Charles Joseph Minard. Imago Mundi; 21: 95--108.  - -Sarkar D. Lattice: multivariate data visualization with R. New York NY: Springer; 2008.  - -SAS (Statistical Analysis System). Documentation. (accessed June 19, 2023). - -Snow J. Mode of communication of cholera. 2nd ed. Piccadilly (London), UK: John Churchill; 1854. (accessed June 19, 2023). - -Tufte, E.R. (1990) Envisioning information. Graphics Press. - -Tufte, E.R. (2004) Visual explanations: images and quantities, evidence and narrative. Graphics Press, Cheshire CT. - -Tufte, E.R. (2006) Beautiful Evidence. Graphics Press, Cheshire CT. - -Tufte, E.R. (2001). The visual display of quantitative information. 2nd edn. Graphics Press, Cheshire CT. - -Tukey JW. Exploratory data analysis. Reading MA: Addison-Wesley; 1977. - -Wickham H. ggplot2. Wiley Interdiscip Rev Comput Stat 2011; 3: 180--5.   - -Wickham H. ggplot2. Create elegant data visualisations using the grammar of graphics. Springer International Publishing; 2016 (Use R!; vol. 2). Available from:   - -Wilkinson, L. (2005) The Grammar of Graphics. New York: Springer-Verlag (Statistics and Computing). Available at: . +::: {#refs} +::: ## Further reading @@ -92,11 +35,6 @@ Why we recommend: *We often hear the term "storytelling" when we're talking abou Why we recommend: *Patients are often presented with data about health risk information in the form in charts and tables. This article examines how well patients understand the information presented to them across a range of different chart types. 
It finds that the types of charts that patients prefer, is not always the type of chart that they understand best.* -### References - -::: {#refs} -::: - ## Additional resources ### Training courses @@ -120,8 +58,11 @@ Power BI is rapidly becoming a standard tool for producing interactive reports a This tutor-led virtual course will introduce how the tidyverse and ggplot2 can be used to reproducibly create publication quality charts from R. ### Low-code/no-code visualisation tools + #### Datawrapper -[https://www.datawrapper.de/](https://www.datawrapper.de/) + + #### Flourish -[https://flourish.studio/](https://flourish.studio/) + + diff --git a/docs/styling.qmd b/docs/styling.qmd index 1c85f6d..6158d0b 100644 --- a/docs/styling.qmd +++ b/docs/styling.qmd @@ -58,8 +58,8 @@ ggplot( title = "Tooth Growth", subtitle = "Each of 60 guinea pigs received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: - **orange juice** or - **ascorbic acid**.", + **orange juice** or + **ascorbic acid**.", y = "Dosage (mg/day)" ) + theme_minimal(base_size = 14) + @@ -158,7 +158,9 @@ library(readr) library(dplyr) library(tidyr) library(ggplot2) -wheels <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv") +wheels <- read_csv( + "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv" + ) plot_data <- wheels %>% select(country, height, diameter) %>% drop_na() %>% @@ -334,8 +336,8 @@ g <- ggplot( title = "Tooth Growth", subtitle = "Each of 60 guinea pigs received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: - **orange juice** or - **ascorbic acid**.", + **orange juice** or + **ascorbic acid**.", y = "Dosage (mg/day)" ) + theme_minimal(base_size = 14) + @@ -513,4 +515,4 @@ As you can see, there is no single font recommendation that will ensure your vis Alt text (short for alternative text) is 
text that describes the visual aspects and purpose of an image -- including charts. Though alt text has various uses, its primary purpose is to aid visually impaired users in interpreting images when the alt text is read aloud by screen readers. -In @Green2023, Mine Dogucu discusses the importance of adding alt text to your data visualisations, to ensure those who are blind or visually impaired don't miss out on the content in your charts. @Cesal2020 provides a simple structure to aid you in writing alt text for data visualisations. +In @Green2023, Mine Dogucu discusses the importance of adding alt text to your data visualisations, to ensure those who are blind or visually impaired don't miss out on the content in your charts. @Cesal2020 provides a simple structure to aid you in writing alt text for data visualisations. \ No newline at end of file diff --git a/docs/tools.qmd b/docs/tools.qmd index eeb5bea..371430e 100644 --- a/docs/tools.qmd +++ b/docs/tools.qmd @@ -280,7 +280,8 @@ ax.set_axisbelow(True) ax.xaxis.grid(color = 'lightgrey') ax.yaxis.grid(color = 'lightgrey') # add legend below plot -ax.legend(ncol = 4, loc = 'lower center', bbox_to_anchor = (0.5, -0.15), frameon = False) +ax.legend(ncol = 4, loc = 'lower center', + bbox_to_anchor = (0.5, -0.15), frameon = False) plt.show() ``` diff --git a/docs/why-visualise.qmd b/docs/why-visualise.qmd index d26f792..e41ea4e 100644 --- a/docs/why-visualise.qmd +++ b/docs/why-visualise.qmd @@ -59,39 +59,23 @@ Data visualisation has been, for a long time, both a topic of scientific researc In 1859, [Florence Nightingale](https://en.wikipedia.org/wiki/Florence_Nightingale), the founder of modern nursing, published her findings on the sanitary status of the British army during the war with Russia. She showed raw data as well as summary statistics in tables and charts (Nightingale, 1859). One chart in particular continues to be celebrated today: a polar area chart on the "causes of mortality in the army in the East". 
-[![](images/Nightingale-mortality.jpg){width="500px"}](images/Nightingale-mortality.jpg) - -::: figure-caption -Florence Nightingale's polar area chart, "Diagram of the causes of mortality in the army in the East". Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Nightingale-mortality.jpg). -::: +[![Florence Nightingale's polar area chart, "Diagram of the causes of mortality in the army in the East". Source: Wikimedia Commons.](images/Nightingale-mortality.jpg){width="500px"}](images/Nightingale-mortality.jpg) What made Nightingale's graphs "particularly iconic was their powerful use of visual rhetoric to make an argument about data" [@Hedley]. This quality is also evident in other visualisations produced by Nightingale's contemporaries. A simplistic but rather impactful visualisation of the water pumps in London associated with transmission of cholera paved the way for root cause identification [@Snow1854]. -[![](images/Snow-cholera-map.jpg){width="500px"}](images/Snow-cholera-map.jpg) - -::: figure-caption -Map by John Snow showing clusters of cholera cases in the London epidemic of 1854. Source: [Wikimedia Commons](https://en.wikipedia.org/wiki/File:Snow-cholera-map-1.jpg). -::: +[![Map by John Snow showing clusters of cholera cases in the London epidemic of 1854. Source: Wikimedia Commons.](images/Snow-cholera-map.jpg){width="500px"}](images/Snow-cholera-map.jpg) An early complex visualisation was created by Minard in 1861, depicting [data from Napoleon's march on Moscow](https://academic.oup.com/jrssig/article/5/3/133/7029764) in 1812/13 and his subsequent retreat. -[![](images/minard.png){width="500px"}](images/minard.png) - -::: figure-caption -Charles Minard's 1869 map of "the successive losses in men of the French Army in the Russian campaign 1812--1813". Source: [Wikimedia Commons](https://en.wikipedia.org/wiki/File:Minard.png). 
-::: +[![Charles Minard's 1869 map of "the successive losses in men of the French Army in the Russian campaign 1812--1813". Source: Wikimedia Commons.](images/minard.png){width="500px"}](images/minard.png) The map shows latitude and longitude of the army as it moved. The line shows the direction of movement, and the line width represents the size of the army (the surviving soldiers). Particular locations were marked by the date of the army presence, and the temperature is shown, too. Six variables were elegantly woven into a single display [@Tufte2001; @Corbett2001; @Robinson1967]. The Paris Exposition in 1900 featured W. E. B. Du Bois exhibiting graphs, charts, and maps of how Black Americans were living [@DuBois1900; @BattleBaptiste2018]. -[![](images/du-bois.jpg){width="500px"}](images/du-bois.jpg) - -::: figure-caption -A series of statistical charts illustrating the condition of the descendants of former African slaves now in residence in the United States of America. Drawing, ca. 1900. //hdl.loc.gov/loc.pnp/ppmsca.33899. Source: [Library of Congress](https://blogs.loc.gov/picturethis/2014/02/du-boiss-american-negro-exhibit-for-the-1900-paris-exposition/). -::: +[![A series of statistical charts illustrating the condition of the descendants of former African slaves now in residence in the United States of America. Drawing, ca. 1900. //hdl.loc.gov/loc.pnp/ppmsca.33899. Source: Library of Congress.](images/du-bois.jpg){width="500px"}](images/du-bois.jpg) Groundbreaking work in modern visualisation was provided by Tukey with his book *Exploratory Data Analysis* [@Tukey1977], and Edward Tufte [@Tufte1990; @Tufte2004; @Tufte2006]. 
diff --git a/styles.tex b/styles.tex new file mode 100644 index 0000000..8cfd50c --- /dev/null +++ b/styles.tex @@ -0,0 +1,21 @@ +% load packages +\usepackage{geometry} +\usepackage{xcolor} +\usepackage{eso-pic} + +%% Set A4 page size with equal 25 mm margins +\geometry{a4paper, left=25mm, top=25mm, bottom=25mm, right=25mm} + +%% Let's define some colours +\definecolor{RSSyellow}{HTML}{d3a435} +\definecolor{RSSblue}{HTML}{005573} + +%% Add the RSS logo near the bottom-left corner of every page +\AddToShipoutPicture{% + % logo + \AtPageLowerLeft{% coordinates are measured from the lower-left corner of the page + \put(\LenToUnit{\dimexpr\paperwidth-195mm},1cm){% 195 mm in from the right edge (15 mm from the left on A4), 1 cm up + \color{white}\includegraphics[width=2cm]{images/RSS_logo_colour.png} + }% + }% +} \ No newline at end of file