# Small cleanup of stat density#554

Closed
wants to merge 3 commits into from
+5 −8

### 2 participants

To keep the code clear and the documentation in line with the code.

 jiho Remove double spaces 229bfc0 jiho Remove unused code b5aa707 jiho Allow density to expand scale bounds In the previous implementation, density was estimated only within the range of the scale, which is determined by default by the range of the data. So, to see the effect of not trimming the data, the user had to explicitly expand the scale. In the new implementation, with trim = FALSE, the density is indeed estimated up to 3 bandwidths after the bounds of the data, and this expands the scale. But trim = TRUE by default (as was said in the documentation but not respected in the code) so the final aspect should stay the same in most circumstances. 538d75a

The most important change is explained in defail in the commit message of 538d75a

Owner
commented on R/stat-density.r in 538d75a

I'm pretty sure you have to explicitly supply the range of the density, otherwise stacked density don't work because they're not estimated on the same points.

hum, probably indeed. But in that case, should we extend this range by cut bandwidth beyond the range of the full dataset, to allow the "normal" behaviour of density?
Otherwise, in the current implementation, there's no difference between trim=TRUE or FALSE (and this probably explains the discrepancy between the documentation and the code regarding the default value of trim)

Owner

Hmmm, I don't think that will work either because we're not setting the bandwidth and so it will vary by group. Maybe trim should just be deprecated?

Alternatively, we could switch to some other density method that would work with all the data simultaneously.

Or it could be computed with bw.nrd and then passed to density(); but that probably adds another layer of computation and complexity. On the other hand, I can't remember a situation where I needed trim indeed.

If trim is suppressed, the from and to arguments must be reinstated and there probably should be a message in the help indicating that, in that case, the area under the curve is not actually 1 (since this is what you would expect from a density estimation).

Owner

Maybe we should move to a model more like stat_bin - i.e. compute the bandwidth based on the complete data, and output a message saying what was used. Then trim could actually work because there would be a common bandwidth across all groups.

Owner

See discussion of #595 - any fix is going to require backward incompatible changes.

Owner

Unfortunately that change would break existing code so isn't suitable for inclusion in a very mature package like ggplot2.

closed this
deleted the jiho:fix/clean-stat-density branch
Commits on May 11, 2012
1. jiho committed
2. jiho committed
3. jiho committed
In the previous implementation, density was estimated only within the
range of the scale, which is determined by default by the range of the
data. So, to see the effect of not trimming the data, the user had to
explicitly expand the scale.

In the new implementation, with trim = FALSE, the density is indeed
estimated up to 3 bandwidths after the bounds of the data, and this
expands the scale. But trim = TRUE by default (as was said in the
documentation but not respected in the code) so the final aspect should
stay the same in most circumstances.
Showing with 5 additions and 8 deletions.
1. +5 −8 R/stat-density.r
13 R/stat-density.r
 @@ -4,11 +4,11 @@ #' @param kernel kernel used for density estimation, see #' \code{\link{density}} for details #' @param trim if \code{TRUE}, the default, densities are trimmed to the -#' actually range of the data. If \code{FALSE}, they are extended by the +#' actually range of the data. If \code{FALSE}, they are extended by the #' default 3 bandwidths (as specified by the \code{cut} parameter to #' \code{\link{density}}) #' @param na.rm If \code{FALSE} (the default), removes missing values with -#' a warning. If \code{TRUE} silently removes missing values. +#' a warning. If \code{TRUE} silently removes missing values. #' @inheritParams stat_identity #' @return data.frame with additional columns: #' \item{density}{density estimate} @@ -85,7 +85,7 @@ #' qplot(length, data=movies, geom="density", weight=rating/sum(rating)) #' } stat_density <- function (mapping = NULL, data = NULL, geom = "area", position = "stack", -adjust = 1, kernel = "gaussian", trim = FALSE, na.rm = FALSE, ...) { +adjust = 1, kernel = "gaussian", trim = TRUE, na.rm = FALSE, ...) { StatDensity$new(mapping = mapping, data = data, geom = geom, position = position, adjust = adjust, kernel = kernel, trim = trim, na.rm = na.rm, ...) } @@ -93,18 +93,15 @@ adjust = 1, kernel = "gaussian", trim = FALSE, na.rm = FALSE, ...) { StatDensity <- proto(Stat, { objname <- "density" - calculate <- function(., data, scales, adjust=1, kernel="gaussian", trim=FALSE, na.rm = FALSE, ...) { + calculate <- function(., data, scales, adjust=1, kernel="gaussian", trim=TRUE, na.rm = FALSE, ...) { data <- remove_missing(data, na.rm, "x", name = "stat_density", finite = TRUE) n <- nrow(data) if (n < 3) return(data.frame()) if (is.null(data$weight)) data$weight <- rep(1, n) / n - - range <- scale_dimension(scales$x, c(0, 0)) - xgrid <- seq(range[1], range[2], length=200) - dens <- density(data$x, adjust=adjust, kernel=kernel, weight=data$weight, from=range[1], to=range[2]) + dens <- density(data$x, adjust=adjust, kernel=kernel, weight=data$weight) densdf <- as.data.frame(dens[c("x","y")]) densdf$scaled <- densdf$y / max(densdf\$y, na.rm = TRUE)