Stacking for geom_area doesn't properly handle missing entries #280

Closed
wch opened this Issue · 4 comments

3 participants

@wch
Collaborator
wch commented

A stacked area graph requires an entry at every x for every group. If one group has no entry in the data frame for a given x value, the plot behaves as though that group's y value at that x is zero.

The pictures will illustrate better:

library(ggplot2)

dat <- data.frame(
        g=rep(LETTERS[1:3], each=4),
        x=rep(1:4, 3),
        y=rep(3:14))

# Remove row with g=B, x=3 
dat <- dat[-7,] 
dat

# Lines all look straight
ggplot(dat, aes(x=x, y=y, colour=g)) + geom_line()

# With a stacked area graph, there's a dip at x=3 
ggplot(dat, aes(x=x, y=y, fill=g)) + geom_area()

Test code:

test_that("Stacked area graph interpolates missing values", {
  dat <- data.frame(
           g=rep(LETTERS[1:3], each=4),
           x=rep(1:4, 3),
           y=rep(3:14))

  # Remove row with g=B, x=3 
  dat <- dat[-7,] 

  p <- ggplot_build(ggplot(dat, aes(x=x, y=y, fill=g)) + geom_area())

  topgroup_y <- with(p$data[[1]], y[x==3 & group==3] )
  expect_equal(topgroup_y, 27)  
})

I think fixing this one would require doing some interpolation. Perhaps solving this one is better left to the large changes to stacking code in the future?

@kohske
Collaborator

Just a note: although this is useful in some cases, I don't think this kind of automatic interpolation is a good idea.
The purpose of visualization is to inspect the data as it actually is.
Automatic interpolation will cause users to overlook the missing values.
Furthermore, there is no reason to prefer linear interpolation. Why not smoothing, or some other filtering?
So, in my view, interpolation should be done by the user, by hand.
Or, at least, an explicit mechanism, such as stat_interpolate, should be provided. But maybe this is beyond the scope of "plotting."
Another option is simply to raise an error or a warning when missing values are detected.
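
For illustration, the "by hand" interpolation mentioned above could look roughly like the following. This is only a sketch under assumptions: `fill_missing_x` is a hypothetical helper (not part of ggplot2), it assumes the columns are named `g`, `x` and `y` as in the example data, and it uses linear interpolation via base R's `approx()` purely because it is the simplest choice, not because it is the right one for every dataset.

```r
# Hypothetical helper: give every group a y value at every observed x,
# filling gaps by linear interpolation within each group.
fill_missing_x <- function(df) {
  all_x <- sort(unique(df$x))
  pieces <- lapply(split(df, df$g), function(d) {
    # approx() interpolates linearly at each x in all_x; x values outside
    # the group's own range come back as NA rather than being extrapolated.
    data.frame(g = d$g[1],
               x = all_x,
               y = approx(d$x, d$y, xout = all_x)$y)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

dat <- data.frame(
  g = rep(LETTERS[1:3], each = 4),
  x = rep(1:4, 3),
  y = 3:14)
dat <- dat[-7, ]   # remove row with g=B, x=3

filled <- fill_missing_x(dat)
filled[filled$g == "B", ]
```

Passing `filled` instead of `dat` to `ggplot(..., aes(x=x, y=y, fill=g)) + geom_area()` should then give every group an entry at x=3. Doing this explicitly keeps the choice of interpolation method visible in the user's own code.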

@hadley
Owner

Yeah, I'm totally with @kohske on this one. It gets even more complicated if you consider longitudinal data where possibly none of the time points align.

But I think it's worth having some tool that will do this, just not automatically. Something to consider for 1.0

@wch
Collaborator
wch commented

That makes sense. I think it would be a good idea to have an informative warning message so that people know how to deal with the issue if they encounter it.
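
As a rough idea of what such a check might do, here is a user-side sketch in base R. It is not how ggplot2 implements anything: `warn_if_incomplete` is a hypothetical name, the column names `g` and `x` are taken from the example data, and the wording of the message is only a suggestion.

```r
# Hypothetical helper: warn when some group has no entry at some x value.
warn_if_incomplete <- function(df) {
  counts <- table(df$g, df$x)   # rows: groups, columns: x values
  missing <- which(counts == 0, arr.ind = TRUE)
  if (nrow(missing) > 0) {
    combos <- paste0("g=", rownames(counts)[missing[, 1]],
                     " x=", colnames(counts)[missing[, 2]])
    warning("Missing group/x combinations: ",
            paste(combos, collapse = ", "),
            ". A stacked area may dip at these points; ",
            "consider filling them in before plotting.",
            call. = FALSE)
  }
  # Invisibly return TRUE when the data is complete, FALSE otherwise
  invisible(nrow(missing) == 0)
}

dat <- data.frame(
  g = rep(LETTERS[1:3], each = 4),
  x = rep(1:4, 3),
  y = 3:14)
dat <- dat[-7, ]   # remove row with g=B, x=3

# Capture the warning text instead of emitting it, to show what it says
msg <- tryCatch(warn_if_incomplete(dat),
                warning = function(w) conditionMessage(w))
msg
```

A message along these lines tells the user which combinations are absent and points toward a fix, without silently altering the data.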

@hadley
Owner

This sounds like a great feature, but unfortunately we don't currently have the development bandwidth to support it. If you'd like to submit a pull request that implements this feature, please follow the instructions in the development vignette.

@hadley hadley closed this