Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rescale() works with Date, POSIXct and POSIXlt objects. #75

Merged
merged 2 commits into from Jun 26, 2017

Conversation

@zeehio
Copy link
Contributor

@zeehio zeehio commented Mar 1, 2016

This pull request deals with #74 as discussed.

Changes:

  • documents the change in NEWS.md
  • adds tests for rescaling dates (simple tests but enough to show and test the feature).
  • edits R/bounds.r, checking for Date, POSIXct and POSIXlt inputs; converting them to numeric.

Thanks for your time. Feel free to tell me if there is anything that does not convince you.

@zeehio zeehio force-pushed the scales_dates branch 2 times, most recently from 6864ea5 to 1fc1920 Mar 1, 2016
@codecov-io
Copy link

@codecov-io codecov-io commented Nov 1, 2016

Current coverage is 61.71% (diff: 100%)

Merging #75 into master will increase coverage by 0.57%

@@             master        #75   diff @@
==========================================
  Files            26         26          
  Lines           803        815    +12   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            491        503    +12   
  Misses          312        312          
  Partials          0          0          

Powered by Codecov. Last update d58d83a...8044799

Loading

R/bounds.r Outdated
#' If needed, converts input from classes that lack division and multiplication
#' to classes that can be scaled.
#' @param x vector of values to convert, if needed
adapt_input_class <- function(x) {
Copy link
Member

@hadley hadley Nov 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be documented

Loading

Copy link
Contributor Author

@zeehio zeehio Nov 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the comments as conventional R comments, so it does not appear under man/. I can remove the comments completely if you prefer.

Loading

R/bounds.r Outdated
#' to classes that can be scaled.
#' @param x vector of values to convert, if needed
adapt_input_class <- function(x) {
if (any(class(x) %in% c("Date", "POSIXct", "POSIXlt", "POSIXt"))) {
Copy link
Member

@hadley hadley Nov 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use inherits() here.

A better formulation would be:

if (...) {
  ...
} else {
 ...
}

as then you no longer need explicit return() statements..

Loading

Copy link
Contributor Author

@zeehio zeehio Nov 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you for your quick review.

Loading

@karawoo
Copy link
Collaborator

@karawoo karawoo commented Jun 23, 2017

This would be useful for some of the scales stuff I'm working on for ggplot2 – it would allow scaling alpha with datetimes, for one. But I think we need a way to go back to datetimes after the conversion, otherwise you get legends that look like this:

dat <- data.frame(
  x = sample(1:10, 10),
  y = (1:10)^2,
  dt = seq(
    as.POSIXct("2017-06-01 15:28:44 PDT"),
    as.POSIXct("2017-06-12 15:28:44 PDT"),
    length.out = 10
  )
)

ggplot(dat, aes(x, y)) +
  geom_point(aes(alpha = dt))

scale_alpha_datetimes

Loading

@zeehio
Copy link
Contributor Author

@zeehio zeehio commented Jun 23, 2017

@karawoo Thanks for reviewing this. I believe that what you are asking for fits better in the ggplot2 package and not in the scales package.

Here is your example, with a draft of what the ggplot2::scale_alpha_datetime could be:

library(ggplot2)
dat <- data.frame(
  x = sample(1:10, 10),
  y = (1:10)^2,
  dt = seq(
    as.POSIXct("2017-06-01 15:28:44 PDT"),
    as.POSIXct("2017-06-12 15:28:44 PDT"),
    length.out = 10
  )
)


scale_alpha_datetime <- function(name = waiver(), origin = "1970-01-01 00:00.00 UTC", ...) {
  origin <- force(origin)
  label_fun <- function(breaks) {
    as.POSIXct(breaks, origin = origin)
  }
  scale_alpha_continuous(name = name, labels = label_fun, ...)
}


ggplot(dat, aes(x, y)) +
  geom_point(aes(alpha = dt)) +
  scale_alpha_datetime()

imatge

The actual scale_alpha_datetime function should be closer to the other scale_*_datetime functions, to provide better label formatting and a more reasonable choice of date breaks.

Link to the relevant ggplot2 issue: tidyverse/ggplot2#1526

Loading

R/bounds.r Outdated
# If needed, converts input from classes that lack division and multiplication
# to classes that can be scaled.
# @param x vector of values to convert, if needed
adapt_input_class <- function(x) {
Copy link
Member

@hadley hadley Jun 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the reasoning here. Why not just unconditionally coerce to numeric?

Loading

Copy link
Contributor Author

@zeehio zeehio Jun 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scales::rescale mainly depends on elementary arithmetic operations (addition, subtraction, multiplication and division). Coercing to numeric is a good way to define multiplication and division operations for POSIXct objects, but it is not the way to go for all R objects. For instance integer64 numbers, from the bit64 package, define their own arithmetic operators to prevent loss of precision.

If the coercion to numeric is done, there will be a warning when using some integer64 objects:

x <- bit64::as.integer64(c(2^50, 2^54+1, 2^60+2))
from <- range(x, na.rm = TRUE)
# I need to compute the `from` range here due to an unrelated bit64 bug triggered
# when range(..., finite = TRUE) is used. (this bug has already been
# reported to the bit64 maintainer)

# No warning given now:
scales::rescale(x = x, from = from)
# [1] 0.00000000 0.01466276 1.00000000

# If we force coercion to numeric there is a warning:
scales::rescale(x = as.numeric(x), from = as.numeric(from))
# Warning messages:
# 1: In as.double.integer64(from) :
#   integer precision lost while converting to double
# 2: In as.double.integer64(x) :
#   integer precision lost while converting to double
# [1] 0.00000000 0.01466276 1.00000000

I wanted to provide an example were the as.numeric approach is wrong, but scales::rescale calls scales::zero_range that has a default tolerance too large for integer64 data. I could work towards letting the user give a custom tolerance value, but for the sake of the explanation you can set tol to zero manually in the zero_range definition in r/bounds.r and rebuild, in case you want to reproduce this example:

x <- bit64::as.integer64(2^60) + 1:3 
# x has three consecutive numbers, we expect the result to be c(0, 0.5, 1)
x_range <- range(x, na.rm = TRUE)
scales::rescale(x = x, from = x_range) # Correct result
# [1] 0.0 0.5 1.0
scales::rescale(x = as.numeric(x), from = as.numeric(x_range)) # loss of precision
# [1] 0.5 0.5 0.5

Even if we have tolerance issues in this case, I prefer to work towards making scales work with other objects, and therefore not coercing everything to numeric.

But you have the last word, though 😃

Loading

Copy link
Member

@hadley hadley Jun 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I'd prefer to use an S3 generic.

Loading

Copy link
Contributor Author

@zeehio zeehio Jun 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an S3 generic is the best approach, my apologies for not thinking about them when I wrote this patch. I was now able to provide a method (and test case) for integer64 data as well.

Loading

Provide support for Date, POSIXct, POSIXlt and integer64 types, as well as numeric.
@hadley hadley merged commit 0101575 into r-lib:master Jun 26, 2017
3 checks passed
Loading
@hadley
Copy link
Member

@hadley hadley commented Jun 26, 2017

Thanks!

Loading

zeehio added a commit to zeehio/scales that referenced this issue Oct 17, 2017
The idea behind r-lib#74 and r-lib#75 was to use an S3 method so we never coerced to numeric
unless we had to. The aim was to prevent future cases similar to the integer64 one
where the coercion to numeric implies a loss of precision.

Reality is that there are many more numeric types out there that do not inherit from the numeric
class (like the dist object mentioned in r-lib#102), compared to the number of cases similar to integer64, where the numeric coercion is wrong.

So there is a trade-off:

- Not providing the `default` methods has the advantage that we only rescale the classes that we know can be rescaled, and the disadvantage that we get issues like r-lib#102 when we find a new object. This is what was implemented in r-lib#75

- Providing `rescale.default` and `rescale_mid.default` has the advantage of catching all the numeric-like types that do not inherit the numeric class (such as the dist object) and working as expected. However, we risk to deliver wrong results without an error if a case similar to integer64 appeared. This is what this commit implements.
@zeehio zeehio deleted the scales_dates branch Feb 18, 2018
zeehio added a commit to zeehio/scales that referenced this issue Jun 9, 2018
The idea behind r-lib#74 and r-lib#75 was to use an S3 method so we never coerced to numeric
unless we had to. The aim was to prevent future cases similar to the integer64 one
where the coercion to numeric implies a loss of precision.

Reality is that there are many more numeric types out there that do not inherit from the numeric
class (like the dist object mentioned in r-lib#102), compared to the number of cases similar to integer64, where the numeric coercion is wrong.

So there is a trade-off:

- Not providing the `default` methods has the advantage that we only rescale the classes that we know can be rescaled, and the disadvantage that we get issues like r-lib#102 when we find a new object. This is what was implemented in r-lib#75

- Providing `rescale.default` and `rescale_mid.default` has the advantage of catching all the numeric-like types that do not inherit the numeric class (such as the dist object) and working as expected. However, we risk to deliver wrong results without an error if a case similar to integer64 appeared. This is what this commit implements.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants