Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

aesthetic inheritance and annotation_custom #756

Closed
bryanhanson opened this Issue · 21 comments

5 participants

@bryanhanson

I posted this at SO and bring it here since it seems to be a real issue (and not just my confusion). SO post: http://stackoverflow.com/questions/14415073/inheritance-of-aesthetics-in-ggplot2-0-9-3-the-behavior-of-annotation-custom

I looked at the code for annotation_custom and it has a hard-wired inherit.aes = TRUE which I think is the problem. I don't see why this function needs any aesthetic at all. I did try several ways to override it and set inherit.aes = FALSE but I was unable to fully penetrate the namespace and make it stick.

Here is a MWE which illustrates the problem:

library("ggplot2")
library("grid")

# Case 1, works as expected

td1 <- data.frame(x = rnorm(10), y = rnorm(10))

tf1 <- function(df) { # works as expected
    p <- ggplot(aes(x = x, y = y), data = df)
    p <- p + geom_point(color = "red")
    p <- p + annotation_custom(circleGrob())
    print(p)
}

tf1(td1)

# Case 2, no error, but circle is not drawn

tf2 <- function(df) { # circle isn't draw, but no error either
    p <- ggplot()
    p <- p + geom_point(data = df, aes(x = x, y = y), color = "red")        
    p <- p + annotation_custom(circleGrob())
    print(p)
    }

tf2(td1)

# Case 3, works if annotation is commented out

td3 <- data.frame(r = c(rnorm(5, 5, 1.5), rnorm(5, 8, 2)),
    f1 = c(rep("L", 5), rep("H", 5)), f2 = rep(c("A", "B"), 5))

tf3 <- function(df) {
    p <- ggplot()
    p <- p + geom_point(data = df, 
        aes(x = f1, y = r, color = f2, group = f2))     
#   p <- p + annotation_custom(circleGrob()) # comment out and it works
    print(p)
    }

tf3(td3)
@hadley
Owner

Weird. I'm not sure why inherit.aes = TRUE would ever be a good idea.

@bryanhanson

Thanks for looking into this Hadley. The troubleshooting that led to this discovery really helped my understanding of how ggplot2 works under the hood, but this issue sure makes building custom functions with it tough!

A quick look shows inherit.aes is only rarely true:

NEWS:* stat: statistics now also respect layer parameter inherit.aes (thanks to bug
R/annotation-custom.r:    position = "identity", data = NULL, inherit.aes = TRUE)
R/annotation-map.r:    NULL, inherit.aes = FALSE)
R/annotation-raster.r:    stat = "identity", position = "identity", data = NULL, inherit.aes = TRUE)
R/annotation.r:    inherit.aes = FALSE,
R/geom-abline.r:    .super$new(., ..., mapping = mapping, inherit.aes = FALSE)
R/geom-hline.r:    .super$new(., data = data, mapping = mapping, inherit.aes = FALSE, 
R/geom-vline.r:    .super$new(., data = data, mapping = mapping, inherit.aes = FALSE, 
R/layer.r:  inherit.aes <- FALSE
R/layer.r:  new <- function (., geom=NULL, geom_params=NULL, stat=NULL, stat_params=NULL, data=NULL, mapping=NULL, position=NULL, params=NULL, ..., inherit.aes = TRUE, legend = NA, subset = NULL, show_guide = NA) {
R/layer.r:      inherit.aes = inherit.aes,
R/layer.r:    if (.$inherit.aes) {
R/layer.r:    if (.$inherit.aes) {
R/limits.r:  geom_blank(aes_all(names(data)), data, inherit.aes = FALSE)

I assume that inherit.aes does exactly what it sounds like, and thus should be FALSE in the case of annotation_custom but I don't completely trust myself with these things!

@fmitha

Having reviewed Bryan's excellently written question on SO, including the MWE above,
I think this is most probably the same bug I just ran into -
see Positioning two legends independently in a faceted ggplot2 plot.
The corresponding problem in that question is what I refer to as version 2, which is in the code snippet minimal2.R. The relevant code is:

## Add data type legend: version 2 (data type legend should be somewhere in the interior)
plotNew <- stat + annotation_custom(grob = dataleg, xmin = 7, xmax = 10, ymin = 0, ymax = 4)

Here dataleg object is extracted from a ggplot object, unlike in Bryan's example, by using:

## Extract data type legend
dataleg <- gtable_filter(ggplot_gtable(ggplot_build(stat1)), "guide-box")

The error message I get it a bit different from what Bryan gets, but is probably the same underlying issue showing up differently. If I use print(plotNew) I get

Error in if (empty(data)) { : missing value where TRUE/FALSE needed
Calls: simplot ... facet_map_layout -> facet_map_layout.grid -> locate_grid.

The basic issue seems to be that if one creates a ggplot object starting with just ggplot(), which one may need to do to avoid multiple aesthetics from mixing together, then annotation_custom has problems dealing with it. It appears the objects created by adding a ggplot object to the output of annotation_custom gives something that looks like a ggplot object, but is not. At least the sanity checks in print fail early. In my case, the function print.ggplot in plot-render.r errors out in the call to ggplot_build. It looks like Bryan's example fails a little further, in ggplot_gtable.

I won't attempt my own MWE unless someone asks for it, because Bryan's example shows the problem well.

@bryanhanson
@baptiste

since no-one knows why inherit.aes = TRUE would have been used (I certainly don't!), it sounds like a trivial pull request to make (after testing that it works with FALSE).

@fmitha
@baptiste

The procedure is outlined here: https://github.com/hadley/ggplot2/wiki/Improving-the-ggplot2-documentation

if you're familiar with installing ggplot2 with devtools::install_github then in theory you should be all set to test your own fork of ggplot2 with a few clicks.

@fmitha

I patched 0.9.3.1 to change to inherit.aes = FALSE. (I built a Debian package for it.) It didn't make any difference to Bryan's test or mine. I don't think this was the problem.

Does anyone have any other ideas? I am not familar at all with the design of ggplot2, and it looks very complicated, so I
won't attempt to debug it. Transcript with Bryan's code (case 3) follows.

> library(ggplot2)
> annotation_custom
function (grob, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf) 
{
    GeomCustomAnn$new(geom_params = list(grob = grob, xmin = xmin, 
        xmax = xmax, ymin = ymin, ymax = ymax), stat = "identity", 
        position = "identity", data = NULL, inherit.aes = FALSE)
}
<environment: namespace:ggplot2>
> source("bryan.R")
Error in if (nrow(layer_data) == 0) return() : argument is of length zero
@fmitha

This is related to #587, I think

@fmitha
@bryanhanson
@fmitha

[Sending this by email resulted in a post with major formatting problems (not sure why), so I am reposting it, this time directly into the web page.]

Here is an analysis of this issue (#756). Ggplot2 developers, I'd
appreciate comments, corrections, clarifications and other feedback.

Bryan, if you don't want to read the whole thing, you can apply PATCH
1 and PATCH 2, though you really only need PATCH 2, and check if it
fixes things for you. It is intended as a workaround, not a proper
fix.

For the record, the original problem that brought me to this issue is
illustrated by Positioning two legends independently in a faceted
ggplot2 plot
. I checked
that PATCH 1 and PATCH 2 fixed the issue described as Version 2
(Version 1 was user error I think). The resulting (faceted) graph now
has two identical legends, one for each facet, which is a bit
annoying, but it is better than nothing at all.

Consider the following modified version of Bryan's code

 ## EXAMPLE 1
 library("ggplot2")
 library("grid")
 library("proto")
 len = 2
 d <- data.frame(r = c( 6.279072, 2.995998, 8.193851, 11.274669),
      f1 = c(rep("L", len), rep("H", len)), f2 = rep(c("A", "B"), len))

 p <- ggplot(data = d, aes(x = f1, y = r, color = f2, group = f2))
 p <- p + geom_point()
 pbuild = ggplot_build(p)
 pA <- p + annotation_custom(circleGrob())
 pAbuild = ggplot_build(pA)

 pnew <- ggplot()
 pnew <- pnew + geom_point(data = d, aes(x = f1, y = r, color = f2, group = f2))
 pnewbuild = ggplot_build(pnew)
 pnewA <- pnew + annotation_custom(circleGrob())
 pnewAbuild = ggplot_build(pnewA)

PART 1: WHY PRINT GIVES AN ERROR FOR pnewA.

Here pA and pnewA are the objects after the annotation has been added.
Printing pnewA gives the same error as Bryan's example, version 3, namely

 Error in if (nrow(layer_data) == 0) return() : argument is of length zero

It is not difficult to see what is going wrong. A fix is less obvious.

The object pAbuild and pnewAbuild resulting from the call to
ggplot_build look like this. You can call str on them to see the
structure. Here the data frames correspond to e.g. str(pa$data[[1]]) and
str(pa$data[[2]]).

                                      ------------data frame NON NULL
             |                       |
pAbuild --- |                       |
             ------data (list) ------|
             |                       |
             |                       |
                                      ------------data frame NON NULL


                                      ------------data frame NON NULL
             |                       |
pnewAbuild- |                       |
             ------data (list) ------|
             |                       |
             |                       |
                                      ------------data frame NULL

ggplot2 does not like it if the any of the data frames in the data
list are zero, and it gives this error, specifically, from
ggplot_gtable from inside the print function print.ggplot in
"plot-render.r". Detailed analysis follows

##################################################################
Detailed analysis for PART 1 begins
##################################################################

Say p is the object. Then we are passing it to print.ggplot (in
"plot-render.r").

First print.ggplot calls ggplot_build (in "plot-build.r"). This is
the line in print.ggplot:

 data <- ggplot_build(x)

Then it calls ggplot_gtable (also in plot-render.r) on the
resulting object, data. This is the line in print.ggplot:

 gtable <- ggplot_gtable(data)

ggplot_gtable begins with the lines (... means lines omitted)

 gplot_gtable <- function(data) {
   ...
   data <- data$data
   ...
   build_grob <- function(layer, layer_data) {
     if (nrow(layer_data) == 0) return()
   ...
   # List by layer, list by panel
   geom_grobs <- Map(build_grob, plot$layer, data)

So, data$data is passed as the layer_data argument to
geom_grobs. Recall that data here is ggplot_build applied to
p. So the object we are looking at is

 ggplot_build(p)$data

This is a list of data frames. This is where the error is thrown,
because Map calls build_grob on each component of the list
ggplot_build(p)$data, These components are data frames. For problem
object pnewA,

 ggplot_build(pnewA)$data

contains two data frames, one of which is empty (NULL).

The code if (nrow(layer_data) == 0) return() checks if any of those
data frames is empty, and if so, returns to the calling function. In
other words, if the data frame is non-null but has zero rows, the
function will return and the program will continue.

In this case, the data frame is NULL, so the computation is

 if (nrow(NULL) == 0) return()

Since nrow(NULL) is NULL, (NULL == 0) returns logical(0),
However, if can only deal with True or False, so it returns an
error. argument zero refers to the fact that logical(0) is of
length 0. It seems to me that this line should be modified to deal
with NULL objects as well, because there is no effective difference
between NULL and an empty data set. We can do this with the
following patch.

PATCH 1
--- a/R/plot-render.r
+++ b/R/plot-render.r
@@ -22,7 +22,7 @@ ggplot_gtable <- function(data) {
   theme <- plot_theme(plot)

   build_grob <- function(layer, layer_data) {
-    if (nrow(layer_data) == 0) return()
+    if ((is.null(layer_data)) || (nrow(layer_data) == 0)) return()

     dlply(layer_data, "PANEL", function(df) {
       panel_i <- match(df$PANEL[1], panel$layout$PANEL)

##################################################################
Detailed analysis for PART 1 ends
##################################################################

PART 2: ANNOTATION_CUSTOM LAYER HAS A NULL DATA ATTRIBUTE AND IS THAT A PROBLEM?

We now look at what ggplot_build does.

We can diagramatically represent the structure of pA and pnewA as follows.

       ---data (default) D     ----------obj corresponding to geom_point ------- data NULL
      |                       |
      |                       |
pA - |                       |
       ---layers------------- |
                              |
                              |
                              |
                               ------------obj corresponding to geom_point ----- data NULL


        --data (default) NULL -----------obj corresponding to geom_point ------- data D
        |                     |
        |                     |
pnewA- |                     |
        ---layers-------------|
                              |
                              |
                              |
                               ----------obj corresponding to geom_point ------- data NULL

Here the data objects in pnewA correspond to pnewA$data
pnewA$layers[[1]]$data etc.

When ggplot_build takes e.g. pA as an argument, it first selects the
layer data elements as a list, calling this layer_data. If any of
these elements are empty, it replaces them with the top level data
attribute, pnewA$data. In the case of pA, both layer data elements
are replaced with the default D. For pnewA, since the default is
empty, we are still left with one non-empty and one empty.

This happens at the beginning of ggplot_build, in the map_layout
function. I assume it is done like this because each layer is
supposed to be associated with data, and if it isn't, then one falls
back on the default (if it exists), corresponding to
ggplot(data). There isn't really another candidate for default data.

Now, when the remaining functions in gglot_build act on the result
of map_layout the data corresponding to the annotation_custom
layer in pnewA quickly becomes NULL and stays NULL. Whereas in
the case of the data in the annotation_custom for pA, the value for
data returned from gglot_build is

> str(pAbuild$data)
List of 2
  $ :'data.frame':  4 obs. of  5 variables:
   ..$ colour: chr [1:4] "#F8766D" "#00BFC4" "#F8766D" "#00BFC4"
   ..$ x     : int [1:4] 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ group : int [1:4] 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1
  $ :'data.frame':  4 obs. of  5 variables: <-- corresponds to annotation_custom
   ..$ colour: chr [1:4] "#F8766D" "#00BFC4" "#F8766D" "#00BFC4"
   ..$ x     : int [1:4] 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ group : int [1:4] 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1

where the second data frame corresponds to annotation_custom.
However, it does not appear that this data is actually used for the
custom_annotation. Let's assume that it is not. The problem we had
with pnewA was that there was no default data to replace the NULL
annotation_custom layer data with. How about if we replace it with
dummy data? If the data is not actually used it might work. We can do
this as follows:

PATCH 2
--- a/R/panel.r
+++ b/R/panel.r
@@ -47,6 +47,8 @@ train_layout <- function(panel, facet, d
  # @param data list of data frames (one for each layer)
  # @param plot_data default plot data frame
  map_layout <- function(panel, facet, data, plot_data) {
+  if (is.waive(plot_data) || empty(plot_data))
+    plot_data = data.frame(dummy=c(1))
    lapply(data, function(data) {
      if (is.waive(data)) data <- plot_data
      facet_map_layout(facet, data, panel$layout)

This patch fixes pnewA in EXAMPLE 1, which now shows the circle
custom_annotation correctly. This confirms that the actual data used
does not matter, as long as it is non-empty. Furthermore, note that
the difference between pnewAbuild with and without PATCH 2 is just

List of 3
  $ data :List of 2
   ..$ :'data.frame':   4 obs. of  5 variables:
   .. ..$ colour: chr [1:4] "#F8766D" "#00BFC4" "#F8766D" "#00BFC4"
   .. ..$ x     : int [1:4] 2 2 1 1
   .. ..$ y     : num [1:4] 6.28 3 8.19 11.27
   .. ..$ group : int [1:4] 1 2 1 2
   .. ..$ PANEL : int [1:4] 1 1 1 1
   ..$ : NULL

vs

List of 3
  $ data :List of 2
   ..$ :'data.frame':   4 obs. of  5 variables:
   .. ..$ colour: chr [1:4] "#F8766D" "#00BFC4" "#F8766D" "#00BFC4"
   .. ..$ x     : int [1:4] 2 2 1 1
   .. ..$ y     : num [1:4] 6.28 3 8.19 11.27
   .. ..$ group : int [1:4] 1 2 1 2
   .. ..$ PANEL : int [1:4] 1 1 1 1
   ..$ :'data.frame':   1 obs. of  2 variables:
   .. ..$ PANEL: int 1
   .. ..$ group: int 1

i.e. the rest of the structure is the same.

So, we return to the question, if it doesn't matter what data is used,
why is any data at all needed here?

Recall from our detailed discussion of PART 1, we observed at the end
that

 if (nrow(layer_data) == 0) return()

was the reason that print was rejecting ggplot_build(pnewA) =
pnewAbuild
. This can be easily fixed by allowing NULL values. As
discussed earlier, the current code can deal with data frames with 0
rows, but not NULL values. PATCH 1 fixes this. If we apply this
instead of PATCH 2, we see that EXAMPLE 1 does not error out any more,
but it does not render the annotation either.

So, we have the curious situation that ggplot does not actually use
the data associated with the annotation custom layer (at least in
EXAMPLE 1), but still expects it to be there for rendering to
happen. I think this requires looking into the rendering code to see
why this is so. It looks to me like a bug.

Actually, if we take the pAbuild object and replace each of the two
data frames in pAbuild$data with empty data frames, or just NULL
values, and the plot still renders, however, if we zero out
pAbuild$plot$data, then the plot fails to render. So, in fact, it
looks like the data attribute is not used at all for rendering,
which makes it doubly puzzling that pnewAbuild does not work without
PATCH 2.

It seems that the function grid.draw from the grid package is used
for the actual rendering, but I'm not sure how it works.

##################################################################
Detailed analysis for Part 2 begins
##################################################################

The output of layer_data for ggplot_build run on pA for the
first couple of functions is:

 [1] "map_layout(panel, plot$facet, layer_data, plot$data) finished"
 List of 2
  $ :'data.frame':       4 obs. of  4 variables:
   ..$ r    : num [1:4] 6.28 3 8.19 11.27
   ..$ f1   : Factor w/ 2 levels "H","L": 2 2 1 1
   ..$ f2   : Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ PANEL: int [1:4] 1 1 1 1
  $ :'data.frame':       4 obs. of  4 variables:
   ..$ r    : num [1:4] 6.28 3 8.19 11.27
   ..$ f1   : Factor w/ 2 levels "H","L": 2 2 1 1
   ..$ f2   : Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ PANEL: int [1:4] 1 1 1 1

 [1] "dlapply (function(d, p) p$compute_aesthetics(d, plot)) finished"
 List of 2
  $ :'data.frame':       4 obs. of  5 variables:
   ..$ x     : Factor w/ 2 levels "H","L": 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ colour: Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ group : Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1
  $ :'data.frame':       4 obs. of  5 variables:
   ..$ x     : Factor w/ 2 levels "H","L": 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ colour: Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ group : Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1

The final result for ggplot_build on pA is

 List of 2
  $ :'data.frame':       4 obs. of  5 variables:
   ..$ colour: chr [1:4] "#F8766D" "#00BFC4" "#F8766D" "#00BFC4"
   ..$ x     : int [1:4] 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ group : int [1:4] 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1
  $ :'data.frame':       4 obs. of  5 variables:
   ..$ colour: chr [1:4] "#F8766D" "#00BFC4" "#F8766D" "#00BFC4"
   ..$ x     : int [1:4] 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ group : int [1:4] 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1

The output for ggplot_build run on problem object pnewA for the
first couple of functions is:

 [1] "map_layout(panel, plot$facet, layer_data, plot$data) finished"
 List of 2
  $ :'data.frame':       4 obs. of  4 variables:
   ..$ r    : num [1:4] 6.28 3 8.19 11.27
   ..$ f1   : Factor w/ 2 levels "H","L": 2 2 1 1
   ..$ f2   : Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ PANEL: int [1:4] 1 1 1 1
  $ : list()
   ..- attr(*, "dim")= int [1:2] 0 2
   ..- attr(*, "dimnames")=List of 2
   .. ..$ : NULL
   .. ..$ : chr [1:2] "data" "PANEL"

 [1] "dlapply (function(d, p) p$compute_aesthetics(d, plot)) finished"
 List of 2
  $ :'data.frame':       4 obs. of  5 variables:
   ..$ x     : Factor w/ 2 levels "H","L": 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ colour: Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ group : Factor w/ 2 levels "A","B": 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1
  $ :'data.frame':       0 obs. of  0 variables

The final result for ggplot_build on pnewA is

 List of 2
  $ :'data.frame':       4 obs. of  5 variables:
   ..$ colour: chr [1:4] "#F8766D" "#00BFC4" "#F8766D" "#00BFC4"
   ..$ x     : int [1:4] 2 2 1 1
   ..$ y     : num [1:4] 6.28 3 8.19 11.27
   ..$ group : int [1:4] 1 2 1 2
   ..$ PANEL : int [1:4] 1 1 1 1
  $ : NULL

As can clearly be seen from this, things appear to go wrong from
map_layout. I'm assuming if they go wrong, then they continue to go
wrong. Already by the second transformation on pnewA, dlapply
(function(d, p) p$compute_aesthetics(d, plot))
the data frame is
empty.

In the case of pA, the data is close to its final form after the
second transformation - call to dlapply (function(d, p)
p$compute_aesthetics(d, plot))
.

map_layout is in "panel.r". This mostly calls facet_map_layout in
"facet-.r". This calls some facet_map_layout method. There are
methods in "facet-grid-.r", "facet-null.r", and "facet-wrap.r". In
this case it turns out to call the method in "facet-null.r", since
there is no actual faceting here.

map_layout does the following operations:

STEP 1: For each data attribute in layers, pA$layers[[i]]$data, it
checks if it is a waiver object, basically meaning the calling
function should use the default value.

 if (is.waive(data))

If so, it then replaces the pA$layers[[i]]$data with the top level
data attribute, e.g. pA$data, and passes down to
facet_map_layout.null (in this case).

STEP 2:If the data object constructed from above is either a waiver object or
empty, then it does a cbind and then returns the result

 if (is.waive(data) || empty(data))
   return(cbind(data, PANEL = integer(0)))

otherwise it adds a PANEL attribute to data.

 data$PANEL <- 1L

and returns.

STEP 1 is the important step here.

So to summarize what happens

First, extract the layer data.

        ggplot_build
 pnewA ----------------> layer_data (lapply(pnewA$layers, function(y) y$data))

Second, pass layer_data to map_layout, replacing empty elements with
pnewA$data.

In this case of pnewA, this results in one non-empty element, and one
empty element.

In this case of pA, this results in two non-empty elements.

##################################################################
Detailed analysis for Part 2 ends
##################################################################

@wch
Collaborator

@fmitha Thanks for the very detailed analysis. I don't get errors running your example code -- was that addressed by your fix for #587? So that seems to make PATCH 1 unnecessary.

As for the second problem, the circle not rendering for pnewA, PATCH 2 does seem to make the circle render properly - but I think adding in the dummy data at that stage isn't quite the right solution. This is because the data object in ggplot_build() will from that point on have some incorrect data in it. Also, I believe it doesn't handle cases where the inherited data set has columns, but no rows. For example:

p3 <- ggplot(data.frame(foo = numeric(0)))
p3 <- p3 + geom_point(data = d, aes(x = f1, y = r, color = f2, group = f2))
p3 + annotation_custom(circleGrob())

I don't know the exact source of the bug, but I think it should be possible to have a more targeted bug fix.

When digging into this, I did find that there is a bug in facet_map_layout.null where, when the input is a waiver object, it returns a list instead of a data frame. This should fix it (though it doesn't change the final result at all):

--- a/R/facet-null.r
+++ b/R/facet-null.r
@@ -22,7 +22,9 @@ facet_train_layout.null <- function(facet, data) {
 facet_map_layout.null <- function(facet, data, layout) {
   # Need the is.waive check for special case where no data, but aesthetics
   # are mapped to vectors, like qplot(1:5, 1:5)
-  if (is.waive(data) || empty(data))
+  if (is.waive(data))
+    return(data.frame(PANEL = integer(0)))
+  else if (empty(data))
     return(cbind(data, PANEL = integer(0)))
   data$PANEL <- 1L
   data
@fmitha
@fmitha

Hi Winston,

Thanks for the reply.

On Thu, 23 May 2013, Winston Chang wrote:

@fmitha Thanks for the very detailed analysis. I don't get errors
running your example code -- was that addressed by your fix for
#587? So that seems to make PATCH 1 unnecessary.

By my example code, I assume you mean what I call EXAMPLE 1?

If so, yes, the fix for #587 makes PATCH 1 unnecessary. They have the
same effect, but in different ways. PATCH 1 allows NULL values for
elements of the top level data attribute to go through in the
rendering code, whereas the #587 fix changes apparently the only place
in the ggplot_build code where an empty data frame is changed to NULL,
hence making PATCH 1 unnecessary.

Of course, with just the #587 fix, the circle does not render for my
example, or for Bryans similar one, which is really the main point of this issue.

Let me take this opportunity to ask - is it considered undesirable to
have a NULL as a list value in the top level data? It seems clear
from the code that an empty data set is Ok.

As for the second problem, the circle not rendering for pnewA, PATCH
2 does seem to make the circle render properly - but I think adding
in the dummy data at that stage isn't quite the right solution. This
is because the data object in ggplot_build() will from that point on
have some incorrect data in it. Also, I believe it doesn't handle
cases where the inherited data set has columns, but no rows. For
example:

p3 <- ggplot(data.frame(foo = numeric(0)))
p3 <- p3 + geom_point(data = d, aes(x = f1, y = r, color = f2, group = f2))
p3 + annotation_custom(circleGrob())

I don't know the exact source of the bug, but I think it should be
possible to have a more targeted bug fix.

Oh, this clearly isn't the correct solution for this bug. I hoped it
might help someone who knows the code better than I do to get a proper
fix. Generally fixes by non-experts are not proper fixes, but can
still save the experts time.

I'll take a look at your counterexample, and see if I can think of
anything else.

Did you see my comments about the fact that the rendering is
not affected by zeroing out the top level data attribute, at least in my
example? It is quite odd. Perhaps the top level data attribute is just
not used in my case. But I don't understand how the rendering happens at all.
Any pointers? I think this may be done by the grid package, is that right?

When digging into this, I did find that there is a bug in
facet_map_layout.null where, when the input is a waiver object, it
returns a list instead of a data frame. This should fix it (though
it doesn't change the final result at all):

Ok. Are you going to commit this?

BTW, I'd appreciate it if you could help me figure out why my tests
are failing. I'm not sure who else to ask. I posted to the general
list about this earlier. I can create an issue for this if you
want. If I can't run the tests, it is difficult to test patches,
obviously.
Regards, Faheem.

> --- a/R/facet-null.r
> +++ b/R/facet-null.r
> @@ -22,7 +22,9 @@ facet_train_layout.null <- function(facet, data) {
>  facet_map_layout.null <- function(facet, data, layout) {
>    # Need the is.waive check for special case where no data, but 
> aesthetics
>    # are mapped to vectors, like qplot(1:5, 1:5)
> -  if (is.waive(data) || empty(data))
> +  if (is.waive(data))
> +    return(data.frame(PANEL = integer(0)))
> +  else if (empty(data))
>      return(cbind(data, PANEL = integer(0)))
>    data$PANEL <- 1L
>    data 
@fmitha

I spent some more time looking at the code. The following patch fixes
at least EXAMPLE 1, as well as Bryan's similar code at the beginning
of this issue. Also, it passes the unit tests (I ran them using R CMD
CHECK .
, which is apparently how one runs the unit tests, though it
seems nobody could be bothered to tell me). It also passes the visual
tests.

Note: I noticed the unit tests failing far more often than the visual
tests. In fact, I tried a few different patches (including PATCH 2
above), and I did not see any failures. It is possible I am not
running the visual tests correctly. Patches that break the visual
tests in interesting ways would be appreciated.

NB: If this patch or any later one is used, I want credit for it.

Discussion follows.

--- a/R/panel.r
+++ b/R/panel.r
@@ -43,10 +43,18 @@ train_layout <- function(panel, facet, d
 # caused problems when they had names of aesthetics (like colour or group).
 # 
 # @param panel a trained panel object
-# @param the facetting specification
+# @param facet the facetting specification
 # @param data list of data frames (one for each layer)  
 # @param plot_data default plot data frame
 map_layout <- function(panel, facet, data, plot_data) {
+  ## if plot_data is itself a waiver object, then select the first
+  ## non-waiver element of data as the new default
+  if(is.waive(plot_data))
+    {
+      firstnonwaivepos = match(TRUE, lapply(data, function(x) !is.waive(x)))
+      if(!is.na(firstnonwaivepos))
+        plot_data = data[[firstnonwaivepos]]
+    }
   lapply(data, function(data) {
     if (is.waive(data)) data <- plot_data
     facet_map_layout(facet, data, panel$layout)

DISCUSSION:

I should first say I don't really understand the details of this code;
it is very complex. However, we can break down the basic structure.

(1) Create ggplot object by adding layers. This corresponds
approximately to add_ggplot in "plot-construction.r".

(2) Convert the ggplot object to an object than can be rendered. This
corresponds approximately to the function ggplot_build in
"plot-build.r".

(3) Build the pieces (grobs) of the plot, so it can be passed to
grid.draw(). At the end of this step the plot has been completely
described programatically. This corresponds approximately to
ggplot_gtable in "plot-render.r".

Let us consider EXAMPLE 1. Getting rid of the bug described in
#587 helped to simplify
things, so now if we apply the patch for that issue, we don't get an
error. However, the circle still does not render for pnewA.

Now, the function ggplot_build starts with the function
build_grob. This seems to be where the basic grobs are built for
rendering, and we can see the circle fails to build there already.

Consider the code inside ggplot_gtable:

[...]
ggplot_gtable <- function(data) {

  plot <- data$plot
  panel <- data$panel
  data <- data$data
  theme <- plot_theme(plot)
  # loop over layers eg. one layer is points, second layer is data
  build_grob <- function(layer, layer_data) {
    if (nrow(layer_data) == 0) return()
    # dlply loops over facets, e.g. panel 1 and panel 2
    dlply(layer_data, "PANEL", function(df) {
      panel_i <- match(df$PANEL[1], panel$layout$PANEL)
      layer$make_grob(df, scales = panel$ranges[[panel_i]], cs = plot$coord)
    }, .drop = FALSE)
  }

  # List by layer, list by panel
  geom_grobs <- Map(build_grob, plot$layer, data)

We see that the geom_grob call corresponds to build_grob looping
over the layers using Map. The first argument to build_grob
(layer) is the layer. This corresponds to
pnewAbuild$plot$layer[[i]], where i=1, 2 for a total of two
layers, the points and the circle.

The second argument is (layer_data), which corresponds to the data
pAbuild$data[[i]] for i=1, 2.

As discussed earlier, pAbuild$data[[2]], which corresponds to the
data for the circle, is an empty data frame. This forces an immediate
exit from build_grob via the code

if (nrow(layer_data) == 0) return()

That means the result returned for the circle layer is empty. Even if
we disregard this line, we see that the following dlply loop, which
loops over facets, splits layer_data into sub data frames via the
PANEL variable. Then matches are looked for in
pnewAbuild$panel$layout$PANEL. I think this corresponds to different
facets. Of course, this is completely fails if pAbuild$data[[i]] is
empty. The upshot is that if the data is empty for any layer at the
rendering stage, it will not render, and this does not seem like
something that one should try to change.

Backing up a step, the pAbuild$data is constructed in
ggplot_build. The first step, namely map_layout is where the
initial construction is made. The code is

map_layout <- function(panel, facet, data, plot_data) {
  lapply(data, function(data) {
    if (is.waive(data)) data <- plot_data
    facet_map_layout(facet, data, panel$layout)

So, plot_data = pnewA$data is the default data. The data
corresponds to pnewA$layer[[i]]$data for i=1, 2. This code replaces
pnewA$layer[[i]]$data with pnewA$data if pnewA$layer[[i]]$data
is a waiver object. If pnewA$data is itself a waiver object, then
the result is still a waiver object, as is the case for pnewA. If
data is still empty at the point of being passed to
facet_map_layout, no PANEL information will get added, and it seems
likely that the data frame will end up as empty at the end of
ggplot_build. So our options here are limited, it seems, to changing
things at the point where map_layout is called.

The situation is that the annotation_custom circle does not have any
data associated at the construction step. This seems
reasonable. However, it also seems clear that the annotation_custom
layer does need some data in general, if only in the case of
faceting. The faceting is determined by aesthetics data, and this is
done by another layer. But the annotation_custom circle needs to sit
inside the panel. Now the faceting happens relative to some data
set. So the annotation_custom layer needs to get access to that data
for the facet information.

The bottom line is that it seems we need to copy some data to the
annotation_custom layer at the map_layout step. What data should
we copy? Well, the closest thing to the default data is the first data
set that appears in the construction step - this would the first data
set in the layers. Now, see the previous para. I don't know what
transformation ggplot2 does to the data. Since the faceting could
happen at some later step, at first glance the annotation_custom
layer could get data that does not tell it about the faceting, but the
same objections apply to the original setup where the default data
appears first, i.e. the ggplot(data=d) situation, and it does the
right thing there. I'm guessing that ggplot2 copies the aesthetics
over from other layers as necesary in later steps.

ADDENDUM: I notice that annotation_custom puts the same object
in every facet of a faceted plot. It would be nice to have a version
where one could choose which facet to have it in.

Comments, corrections, and any other feedback welcome, of course.

@fmitha

For the record, the patch given at the beginning of the previous posting breaks Winston Chang's example earlier in the thread. This example is:

library(ggplot2)                                                               
library(grid)                                                                                                           
len = 2                                                                                                                 
d <- data.frame(r = c( 6.279072, 2.995998, 8.193851, 11.274669),                                                        
                 f1 = c(rep("L", len), rep("H", len)), f2 = rep(c("A", "B"), len))                                      
p3 <- ggplot(data.frame(foo = numeric(0)))                                                                              
p3 <- p3 + geom_point(data = d, aes(x = f1, y = r, color = f2, group = f2))                                             
p3 = p3 + annotation_custom(circleGrob())
@hadley hadley closed this
@fmitha

This issue was closed without explanation, and I don't see anything in the repository that fixes it. Note there is a patch that (appears) to fix the problem, though I don't know if it is completely correct. If you want a pull request, please say so.

@hadley
Owner

Please resubmit a PR with minimal discussion. Unfortunately I don't have time to read this massive thread.

@fmitha

Ok, will do. Should I do so on this issue, or create a new issue?

@hadley
Owner

New issue please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.