
binwidth for dates #219

Closed
jeroenooms opened this Issue · 2 comments

4 participants

@jeroenooms

This happens in ggplot2 0.8.9; not sure if it's better in 0.9.0.

library(ggplot2)
x <- as.Date(Sys.time()) + c(0,7)
y <- c(7,9)
qplot(x,y, geom="bar", stat="identity")

The bars are too wide, and it's not clear which date each bar refers to. A better default for dates might be to make each bar exactly one day wide.
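A minimal sketch of one way to request day-wide bars explicitly, assuming a current ggplot2 release: `geom_col()` (added in ggplot2 2.2.0, after this thread) is shorthand for `geom_bar(stat = "identity")`, and `width` is measured in x-axis units, which on a Date axis are days. The fixed start date and the use of `layer_data()` to inspect the drawn rectangles are illustration choices, not part of the original report.

```r
library(ggplot2)

# Two observations one week apart, as in the report above
# (a fixed start date instead of Sys.time() so the example is reproducible).
x <- as.Date("2011-08-08") + c(0, 7)
y <- c(7, 9)
df <- data.frame(x = x, y = y)

# On a Date axis the underlying numeric unit is one day, so width = 1
# asks for bars exactly one day wide, centred on each date.
p <- ggplot(df, aes(x, y)) +
  geom_col(width = 1)

# Inspect the computed rectangles: xmax - xmin should be 1 (day).
rects <- layer_data(p)
print(rects[, c("xmin", "xmax")])
```

With the versions discussed in this thread, passing `width = 1` to `geom_bar(stat = "identity")` should have a similar effect; treat that as an assumption.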

@BrianDiggs

It's tricky, and I don't think that narrower bars are necessarily "better." Breaking it down: the x scale is a date scale, which is a continuous scale. Bars are generally used in two contexts: on a discrete scale to show counts, or on a continuous scale to show an aggregation over some region of the scale (effectively, a histogram). The latter is what you are doing, since dates are on a continuous scale. So the interpretation is that each date value represents the midpoint of some bin of values. Without more specific information, ggplot assumes the width of the bins can be inferred from the spacing of the midpoints, so the bars cover the space between the points. That is what you see. And you are right that it isn't clear which date each bar refers to, because each bar is meant to correspond to a range of dates.
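The width rule described above can be sketched in base R. This is a hypothetical helper (`default_bar_width()` is not a ggplot2 function), assuming that recent ggplot2 releases default the bar width to 0.9 times the smallest gap between distinct x values (the quantity ggplot2's `resolution()` computes); older versions may differ in detail.

```r
# Hypothetical helper approximating how ggplot2 picks a default bar width:
# take the smallest gap between distinct x values and use 90% of it.
# Dates are stored as day counts, so for a Date vector the result is in days.
default_bar_width <- function(x, fraction = 0.9) {
  v <- sort(unique(as.numeric(x)))
  if (length(v) < 2) return(fraction)  # single value: fall back to 1 unit
  fraction * min(diff(v))
}

x <- as.Date("2011-08-08") + c(0, 7, 21)
w <- default_bar_width(x)
print(w)  # 6.3: each bar spans most of a week, not a single day
```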

The behavior you suggest assumes that dates implicitly have a day-sized discrete nature, but a day is not necessarily the scale over which they are aggregated. So if you want to indicate that your data are day-sized bins, you need to supply data that is binned at the day level.

Here are several examples that try different combinations of continuous and discrete scales (note that the discrete scales are not true date scales, but discrete scales whose values are formatted like dates).

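library(ggplot2)
library(plyr)  # for rbind.fill()

x <- as.Date("2011-08-08") + c(0,7,21)
y <- c(7,9,11)

# 1. Continuous date scale: bar widths are derived from the spacing of the dates
DF1 <- data.frame(x,y)
p <- ggplot(DF1, aes(x,y)) +
    geom_bar(stat="identity")
p

# 2. Discrete scale: one level per observed date
DF2 <- data.frame(x=factor(x), y)
p %+% DF2

# 3. Discrete scale whose levels cover every day in the range
DF3 <- data.frame(x=factor(as.character(x), levels=as.character(seq(min(x),max(x),by="1 day"))), y)
p %+% DF3

# 4. As 3, plus a zero-height row for every level
DF4 <- rbind.fill(DF3, data.frame(x=levels(DF3$x), y=0))
p %+% DF4

# 5. Continuous date scale, plus a zero-height row for every day in the range
DF5 <- rbind(DF1, data.frame(x=seq(min(x), max(x), by="1 day"), y=0))
p %+% DF5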

The last one is most like what you want. It has a proper date scale on the x axis, but the data itself is binned at the day level (there is an entry for each day).
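A base R variant of DF5, sketched under the assumption that you want exactly one row per day: instead of appending zero rows next to the original observations (which leaves two rows on each observed date), left-join the observations onto a full daily calendar. The names `all_days` and `DF5b` are hypothetical.

```r
x <- as.Date("2011-08-08") + c(0, 7, 21)
y <- c(7, 9, 11)
DF1 <- data.frame(x, y)

# One row per calendar day in the observed range.
all_days <- data.frame(x = seq(min(x), max(x), by = "1 day"))

# Left-join the observations onto the full calendar; days without an
# observation get NA, which is then replaced with 0.
DF5b <- merge(all_days, DF1, by = "x", all.x = TRUE)
DF5b$y[is.na(DF5b$y)] <- 0

nrow(DF5b)       # 22 rows: 2011-08-08 through 2011-08-29
sum(DF5b$y > 0)  # 3 days carry data
```

`p %+% DF5b` can then be used exactly like `p %+% DF5`.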

Also, compare that with what happens when geom_bar actually aggregates the data by binning:

DF6 <- data.frame(x=rep(as.Date(c("2011-08-07", "2011-08-08", "2011-08-09",
    "2011-08-14", "2011-08-15", "2011-08-16", 
    "2011-08-28", "2011-08-29", "2011-08-30")),c(2,3,2,3,3,3,4,3,4)))
ggplot(DF6, aes(x)) + geom_bar(binwidth=7)
# In ggplot2 >= 2.0.0 binning moved out of geom_bar(); use geom_histogram(binwidth=7) there
@hadley hadley closed this
@dedcode

Not sure I agree with this workaround. My x is already binned at the day level, yet geom_bar() still rebins it, and now I don't know what binwidth I should use.
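A sketch of one way to handle data that is already binned by day, assuming each raw observation is a single Date (the dates below are reused from DF6; `events` and `daily` are hypothetical names): count per day first, then plot the counts directly so no further binning happens.

```r
# Raw observations: one Date per event (the same dates and repeats as DF6).
events <- rep(as.Date(c("2011-08-07", "2011-08-08", "2011-08-09",
                        "2011-08-14", "2011-08-15", "2011-08-16",
                        "2011-08-28", "2011-08-29", "2011-08-30")),
              c(2, 3, 2, 3, 3, 3, 4, 3, 4))

# Count events per calendar day; the result is already "binned" by day.
counts <- table(events)
daily <- data.frame(x = as.Date(names(counts)),
                    n = as.vector(counts))

print(daily)
```

With one row per day, `ggplot(daily, aes(x, n)) + geom_col(width = 1)` should draw one-day-wide bars without any binwidth. If you bin raw dates instead, the binwidth is in x-axis units, which on a Date axis are days, so `geom_histogram(binwidth = 1)` corresponds to one bin per day.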
