Skip to content

facet_grid drops duplicate cases #443

Closed
wch opened this Issue Mar 15, 2012 · 6 comments

3 participants

@wch
Collaborator
wch commented Mar 15, 2012

This is from the discussion thread here: http://groups.google.com/group/ggplot2/browse_thread/thread/5213ac35da6b36d4/bdbb8be0a658bfe5

dat <- data.frame(x=rep(1:4, 10), g=rep(c("A","B"),each=20))

p <- ggplot(dat, aes(x,x)) + geom_point(position="jitter")

# OK: 20 points in each facet
p + facet_wrap(~g)

# Not OK: 4 points in each facet
p + facet_grid(.~g)
@wch
Collaborator
wch commented Apr 6, 2012

I found the guilty line, in facet-locate.r, line 17:

  # Workaround for bug in reshape
  data <- unique(data)

I commented out this line and it appears to generate the correct result image. However, it fails one of the tests: Facetting (location): margins add extra data. Here's the relevant test code, using the original (uncommented) version of facet-locate:

df <- expand.grid(a = 1:2, b = 1:2)
panel <- layout_grid(list(df), "a", "b", margins = "b")
loc <- locate_grid(df, panel, "a", "b", margins = "b")
panel
#   PANEL ROW COL a     b
# 1     1   1   1 1     1
# 2     2   1   2 1     2
# 3     3   1   3 1 (all)
# 4     4   2   1 2     1
# 5     5   2   2 2     2
# 6     6   2   3 2 (all)

loc
#   a     b PANEL
# 1 1     1     1
# 2 1     2     2
# 3 1 (all)     3
# 4 2     1     4
# 5 2     2     5
# 6 2 (all)     6

expect_that(nrow(loc), equals(nrow(df) + 2))  
# This works

With the modified (commented) version, there are a couple extra lines in loc:

loc
#   a     b PANEL
# 1 1     1     1
# 2 1     2     2
# 3 1 (all)     3
# 4 1 (all)     3
# 5 2     1     4
# 6 2     2     5
# 7 2 (all)     6
# 8 2 (all)     6

expect_that(nrow(loc), equals(nrow(df) + 2))  
# Fails

@hadley what is the meaning of (all)? When I comment out the line, it produces the correct jittered graph, all the visual tests are unchanged (about 12 of the vtests use facets), and all the non-visual tests pass except the one above. Perhaps the test is wrong?

@BrianDiggs

I think the test is wrong. (all) is the category that is the margin. Effectively, there should be a copy of the data where any margined value is replaced with (all). I think that the test should be

expect_that(nrow(loc), equals(2 * nrow(df)))

Or in general, for n faceting variables,

expect_that(nrow(loc), equals((2 ^ n) * nrow(df)))
@wch
Collaborator
wch commented Apr 6, 2012

@BrianDiggs that makes sense. I don't think that the (all) rows are actually used anywhere in ggplot2 code.

I wonder - what is the bug in reshape2 that is worked around by that line of code?

@BrianDiggs

The (all) rows contain the data that is to be displayed/used in the panels that are the marginals.

I don't know what the bug is/was that was being worked around. It was introduced in 9fb2a97 by Hadley. There was later discussion of this bug on that commit.

@hadley
Owner
hadley commented Apr 11, 2012

Hmmm, I think I must have fixed the bug because the output without unique looks fine. I'll fix the test.

@hadley hadley added a commit that closed this issue Apr 11, 2012
@hadley Remove spurious unique in facet_grid.
Fixes #443
653ca15
@hadley hadley closed this in 653ca15 Apr 11, 2012
@BrianDiggs

This bug fix warrants a mention in the NEWS file.

@hadley hadley added a commit that referenced this issue Apr 20, 2012
@hadley NEWS item for #443 692265c
@kohske kohske added a commit to kohske/ggplot2 that referenced this issue Apr 27, 2012
@hadley Remove spurious unique in facet_grid.
Fixes #443
72a564e
@kohske kohske added a commit to kohske/ggplot2 that referenced this issue Apr 27, 2012
@hadley NEWS item for #443 87fb4ed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.