Fast svg plots in R. Currently just a demonstration of speed relative to svglite, based on four functions:
getlines(n), which generates a series of n random edges tracing a single path with varying colours and line widths;
getpoints(n), which generates n random points with varying colours;
svgplot_lines(), which writes line data to an html-formatted svg file; and
svgplot_points(), which writes point data to an html-formatted svg file.
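As a rough picture of the line data these functions work with, the sketch below builds a data frame with the columns that the ggline() helper further down expects (xfr, yfr, xto, yto, col, lwd). The function fake_lines() is a hypothetical stand-in written for illustration; it is not the package's getlines(), whose internals may differ.

```r
# Hypothetical stand-in for getlines(): column names are inferred from the
# ggline() helper in this README, not taken from the package source.
fake_lines <- function (n, xylim = 1000)
{
    x <- runif (n + 1, 0, xylim)   # n + 1 vertices give n connected edges
    y <- runif (n + 1, 0, xylim)
    data.frame (xfr = x [-(n + 1)], yfr = y [-(n + 1)],
                xto = x [-1], yto = y [-1],
                col = rainbow (n),        # one colour per edge
                lwd = runif (n, 1, 5),    # one line width per edge
                stringsAsFactors = FALSE)
}

set.seed (1)
dat <- fake_lines (5)
```

Each edge starts where the previous one ended, which is what "tracing a single path" means here.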
Comparison is against a ggplot2 object with no embellishments, set up with the following code:
require (ggplot2)
# Minimal theme: no background, grid lines, axes, margins, or legend
ggmin_theme <- function ()
{
    theme <- theme_minimal ()
    theme$panel.background <- element_rect (fill = "transparent",
                                            size = 0)
    theme$line <- element_blank ()
    theme$axis.text <- element_blank ()
    theme$axis.title <- element_blank ()
    theme$plot.margin <- margin (rep (unit (0, 'null'), 4))
    theme$legend.position <- 'none'
    theme$axis.ticks.length <- unit (0, 'null')
    return (theme)
}
# Plot edges as segments, coloured and sized from the data
ggline <- function (dat)
{
    ggplot () + ggmin_theme () +
        geom_segment (aes (x = xfr, y = yfr, xend = xto, yend = yto, size = lwd),
                      col = dat$col, size = dat$lwd / 8, data = dat)
}
# Plot points, coloured from the data
ggpoint <- function (dat)
{
    ggplot () + ggmin_theme () +
        geom_point (aes (x = x, y = y), col = dat$col, data = dat)
}
One set of random lines can then be generated and plotted via ggplot2 like this:
dat <- getlines (n = 1e5, xylim = 1000)
ggline (dat)
The equivalent output of svgplotr can be directly viewed as a .html file, or a .svg file can be converted to any other format using the rsvg package:
svgplot_lines (dat, file = "junk", html = FALSE) # makes junk.svg
require (rsvg)
png::writePNG (rsvg ("junk.svg"), "junk.png")
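For readers unfamiliar with the format, an svg file of line segments is plain text with one element per segment. The sketch below writes such a file with base R only. It is a hand-rolled illustration, not svgplotr's implementation: the function write_svg_lines() and its size parameter are hypothetical, and the column names are assumed to match those used by ggline() above.

```r
# Illustrative only: a hand-rolled writer, not svgplotr's implementation.
# Assumes a data frame with columns xfr, yfr, xto, yto, col, lwd.
write_svg_lines <- function (dat, file, size = 1000)
{
    header <- sprintf (
        '<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">',
        size, size)
    # One vectorised sprintf call builds every <line> element at once
    body <- sprintf (
        '<line x1="%.2f" y1="%.2f" x2="%.2f" y2="%.2f" stroke="%s" stroke-width="%.2f"/>',
        dat$xfr, dat$yfr, dat$xto, dat$yto, dat$col, dat$lwd)
    writeLines (c (header, body, "</svg>"), file)
    invisible (file)
}

# Two segments with different colours and widths
dat <- data.frame (xfr = c (0, 10), yfr = c (0, 10),
                   xto = c (10, 20), yto = c (10, 0),
                   col = c ("#FF0000", "#0000FF"), lwd = c (1, 2),
                   stringsAsFactors = FALSE)
f <- write_svg_lines (dat, tempfile (fileext = ".svg"))
```

The resulting file can be opened in a browser or passed to rsvg in the same way as junk.svg above.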
svgplotr is considerably faster than svglite, but speed differences depend on numbers of edges plotted. The following code quantifies the time taken to plot both lines and points by svglite in comparison to svgplotr as a function of n.
require (svglite)
require (rbenchmark)
# Render a ggplot2 object to an svg file through the svglite device
plotgg <- function (fig)
{
    svglite ("lines.svg")
    print (fig)
    graphics.off ()
}
# Rows are ordered by test name, so relative [1] is the plotgg (svglite) row
testlines <- function (n = 1e3, nreps = 5)
{
    dat <- getlines (n = n)
    fig <- ggline (dat)
    benchmark (
               plotgg (fig),
               svgplot_lines (dat, filename = "lines"),
               order = "test",
               replications = nreps)$relative [1]
}
testpoints <- function (n = 1e3, nreps = 5)
{
    dat <- getpoints (n = n)
    fig <- ggpoint (dat)
    benchmark (
               plotgg (fig),
               svgplot_points (dat, filename = "lines"),
               order = "test",
               replications = nreps)$relative [1]
}
# Sizes from 100 to 1 million objects, evenly spaced on a log scale
n <- 10 ^ (20:60 / 10)
ylines <- sapply (n, testlines)
ypoints <- sapply (n, testpoints)
dat <- data.frame (n = n, lines = ylines, points = ypoints)
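To make explicit what the $relative [1] value extracted above represents: rbenchmark reports each test's elapsed time relative to the fastest test, and order = "test" sorts the rows by test name. The sketch below reproduces that arithmetic with made-up timings; the numbers are purely illustrative and are not measurements.

```r
# Made-up elapsed times in seconds, purely illustrative (not measurements)
elapsed <- c ("plotgg(fig)" = 2.0, "svgplot_lines(dat)" = 0.5)
# Each test's time is expressed relative to the fastest test
relative <- elapsed / min (elapsed)
# order = "test" sorts rows alphabetically by test name
relative <- relative [order (names (relative))]
# First entry is the plotgg row: time (svglite) / time (svgplotr)
relative [[1]]
```

As long as svgplotr is the faster of the two, the first value is therefore the number of times slower svglite is.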
Then plot the results:
# Reshape to long form with columns n, type ("lines" or "points"), and y
dat <- tidyr::gather (dat, key = "type", value = "y", -n)
ggplot (dat, aes (x = n, y = y, group = type)) +
    theme (panel.grid.minor = element_blank ()) +
    scale_x_log10 (breaks = 10 ^ (2:6)) +
    scale_y_log10 (limits = c (1, max (dat$y)), breaks = c (1:5, 10, 50, 100)) +
    scale_colour_manual (values = c ("red", "blue")) +
    geom_point (aes (colour = type)) +
    geom_smooth (aes (colour = type), method = "loess", se = TRUE) +
    ylab ("time (svglite) / time (svgplotr)") +
    labs (title = "relative performance of svgplotr vs svglite")
Efficiency gains initially decrease steeply as n increases, but then flatten out and appear to approach asymptotic limits. Even for the maximum size in this plot of 1 million objects, svgplotr is almost 4 times faster than svglite for lines, and over 8 times faster for points. The right portion of the graph may also be a second regime of decline, but even if so, extrapolating a log-log linear fit to the results for n >= 1e5 suggests that parity for lines is only going to be reached at:
indx <- which (dat$n >= 1e5 & dat$type == "lines")
mod <- as.numeric (lm (log10 (dat$y [indx]) ~ log10 (dat$n [indx]))$coefficients)
format (10 ^ (mod [1] / abs (mod [2])), scientific = TRUE, digits = 2)
#> [1] "1.3e+11"
which is 130 billion edges, and obviously enormously more for points. Parity is therefore not going to be reached at any practical plot size, and svgplotr should remain faster than svglite.
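The parity estimate above comes from fitting a straight line on log-log axes, log10 (y) = a + b * log10 (n), and solving for y = 1, which gives n = 10 ^ (-a / b). With a negative slope b this is the same as the 10 ^ (a / abs (b)) used above. The sketch below checks that arithmetic on synthetic data that follow y = 100 * n ^ -0.2 exactly; these values are assumptions chosen for illustration and are not benchmark results.

```r
# Synthetic data following y = 100 * n ^ -0.2 exactly (not benchmark results)
n <- 10 ^ (50:60 / 10)
y <- 100 * n ^ (-0.2)
# Straight-line fit on log-log axes: log10 (y) = a + b * log10 (n)
mod <- as.numeric (coef (lm (log10 (y) ~ log10 (n))))
# Parity means y = 1, so log10 (y) = 0 and log10 (n) = -a / b
parity <- 10 ^ (-mod [1] / mod [2])
format (parity, scientific = TRUE, digits = 2)
```

Here a = 2 and b = -0.2, so parity falls at n = 10 ^ 10, far beyond the range of the synthetic data, just as the real estimate lies far beyond the benchmarked sizes.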