Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rescale() works with Date, POSIXct and POSIXlt objects. #75

Merged
merged 2 commits into from Jun 26, 2017
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Expand Up @@ -9,6 +9,8 @@
* `train_discrete()` gets a new `na.rm` argument which controls whether
`NA`s are preserved or dropped.

* `rescale()` works with `Date`, `POSIXct` and `POSIXlt` objects (@zeehio, #74).

# scales 0.4.0

* Switched from `NEWS` to `NEWS.md`
Expand Down
18 changes: 18 additions & 0 deletions R/bounds.r
@@ -1,3 +1,14 @@
# If needed, converts input from classes that lack division and multiplication
# to classes that can be scaled.
# @param x vector of values to convert, if needed
adapt_input_class <- function(x) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be documented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the comments as conventional R comments, so it does not appear under man/. I can remove the comments completely if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the reasoning here. Why not just unconditionally coerce to numeric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scales::rescale mainly depends on elementary arithmetic operations (addition, subtraction, multiplication and division). Coercing to numeric is a good way to define multiplication and division operations for POSIXct objects, but it is not the way to go for all R objects. For instance integer64 numbers, from the bit64 package, define their own arithmetic operators to prevent loss of precision.

If the coercion to numeric is done, there will be a warning when using some integer64 objects:

x <- bit64::as.integer64(c(2^50, 2^54+1, 2^60+2))
from <- range(x, na.rm = TRUE)
# I need to compute the `from` range here due to an unrelated bit64 bug triggered
# when range(..., finite = TRUE) is used. (this bug has already been
# reported to the bit64 maintainer)

# No warning given now:
scales::rescale(x = x, from = from)
# [1] 0.00000000 0.01466276 1.00000000

# If we force coercion to numeric there is a warning:
scales::rescale(x = as.numeric(x), from = as.numeric(from))
# Warning messages:
# 1: In as.double.integer64(from) :
#   integer precision lost while converting to double
# 2: In as.double.integer64(x) :
#   integer precision lost while converting to double
# [1] 0.00000000 0.01466276 1.00000000

I wanted to provide an example were the as.numeric approach is wrong, but scales::rescale calls scales::zero_range that has a default tolerance too large for integer64 data. I could work towards letting the user give a custom tolerance value, but for the sake of the explanation you can set tol to zero manually in the zero_range definition in r/bounds.r and rebuild, in case you want to reproduce this example:

x <- bit64::as.integer64(2^60) + 1:3 
# x has three consecutive numbers, we expect the result to be c(0, 0.5, 1)
x_range <- range(x, na.rm = TRUE)
scales::rescale(x = x, from = x_range) # Correct result
# [1] 0.0 0.5 1.0
scales::rescale(x = as.numeric(x), from = as.numeric(x_range)) # loss of precision
# [1] 0.5 0.5 0.5

Even if we have tolerance issues in this case, I prefer to work towards making scales work with other objects, and therefore not coercing everything to numeric.

But you have the last word, though 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I'd prefer to use an S3 generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an S3 generic is the best approach, my apologies for not thinking about them when I wrote this patch. I was now able to provide a method (and test case) for integer64 data as well.

if (inherits(x, c("Date", "POSIXct", "POSIXlt", "POSIXt"))) {
as.numeric(x)
} else {
x
}
}

#' Rescale numeric vector to have specified minimum and maximum.
#'
#' @param x numeric vector of values to manipulate.
Expand All @@ -11,6 +22,8 @@
#' rescale(runif(50))
#' rescale(1)
rescale <- function(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE)) {
x <- adapt_input_class(x)
from <- adapt_input_class(from)
if (zero_range(from) || zero_range(to)) {
return(ifelse(is.na(x), NA, mean(to)))
}
Expand All @@ -31,6 +44,9 @@ rescale <- function(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE
#' rescale_mid(runif(50), mid = 0.5)
#' rescale_mid(1)
rescale_mid <- function(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid = 0) {
x <- adapt_input_class(x)
from <- adapt_input_class(from)
mid <- adapt_input_class(mid)
if (zero_range(from) || zero_range(to)) return(rep(mean(to), length(x)))

extent <- 2 * max(abs(from - mid))
Expand All @@ -49,6 +65,8 @@ rescale_mid <- function(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid = 0)
#' rescale_max(runif(50))
#' rescale_max(1)
rescale_max <- function(x, to = c(0, 1), from = range(x, na.rm = TRUE)) {
x <- adapt_input_class(x)
from <- adapt_input_class(from)
x / from[2] * to[2]
}

Expand Down
28 changes: 21 additions & 7 deletions tests/testthat/test-bounds.r
Expand Up @@ -2,13 +2,13 @@ context("Bounds")

test_that("rescale_mid returns correct results", {
x <- c(-1, 0, 1)

expect_equal(rescale_mid(x), c(0, 0.5, 1))
expect_equal(rescale_mid(x, mid = -1), c(0.5, 0.75, 1))
expect_equal(rescale_mid(x, mid = 1), c(0, 0.25, 0.5))

expect_equal(rescale_mid(x, mid = 1, to = c(0, 10)), c(0, 2.5, 5))
expect_equal(rescale_mid(x, mid = 1, to = c(8, 10)), c(8, 8.5, 9))
expect_equal(rescale_mid(x, mid = 1, to = c(8, 10)), c(8, 8.5, 9))
})


Expand All @@ -23,13 +23,27 @@ test_that("resacle_max returns correct results", {

test_that("zero range inputs return mid range", {
expect_that(rescale(0), equals(0.5))
expect_that(rescale(c(0, 0)), equals(c(0.5, 0.5)))
expect_that(rescale(c(0, 0)), equals(c(0.5, 0.5)))
})


test_that("censor and squish ignore infinite values", {
expect_equal(squish(c(1, Inf)), c(1, Inf))
expect_equal(censor(c(1, Inf)), c(1, Inf))


})


})

test_that("scaling is possible with dates and times", {
dates <- as.Date(c("2010-01-01", "2010-01-03", "2010-01-05", "2010-01-07"))
expect_equal(rescale(dates, from = c(dates[1], dates[4])), seq(0,1,1/3))
expect_equal(rescale_mid(dates, mid = dates[3])[3], 0.5)

dates <- as.POSIXct(c("2010-01-01 01:40:40",
"2010-01-01 03:40:40",
"2010-01-01 05:40:40",
"2010-01-01 07:40:40"))
expect_equal(rescale(dates, from = c(dates[1], dates[4])), seq(0, 1, 1/3))
expect_equal(rescale_mid(dates, mid = dates[3])[3], 0.5)

})