New approach to variable rescaling #339

hadley · 2022-03-25T21:47:01Z

@davidchall I started getting worried that the change to label_number_si() was likely to affect too much existing code, and after much refactoring this is what I ended up with:

number() gets a new argument called scale_cut, equivalent to dollar()'s rescale_large.
rescale_short_scale() and rescale_long_scale() are renamed to cut_short_scale() and cut_long_scale(), respectively.
Add new cut_si() and cut_bytes() to pull out the code previously nested inside label_number_si() and label_bytes(). This involved refactoring scale_cut() to take a vector of break points, not log10 break points.
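
To make the "vector of break points, not log10 break points" idea concrete, here is a minimal base R sketch. It is not the code in this PR: `sketch_scale_cut()` and `short_breaks` are hypothetical names, and the number formatting is deliberately naive.

```r
# Hypothetical helper for illustration; not the scales implementation.
# `breaks` is a named numeric vector on the original scale (1e3, not 3);
# each name is the suffix applied from that break point upwards.
sketch_scale_cut <- function(x, breaks) {
  breaks <- sort(breaks)
  # findInterval() gives 0 below the smallest break; clamp to the first bin.
  idx <- pmax(findInterval(abs(x), breaks), 1L)
  divisor <- unname(breaks[idx])
  divisor[divisor == 0] <- 1  # a break at 0 means "leave unscaled"
  paste0(x / divisor, names(breaks)[idx])
}

# Unnamed first element: values below 1e3 get no suffix.
short_breaks <- c(0, K = 1e3, M = 1e6, B = 1e9, T = 1e12)
sketch_scale_cut(c(1, 1500, 2e6, 1e15), short_breaks)
```

In this sketch, values beyond the largest break stay in the last bin, so 1e15 is still expressed in units of T rather than falling off the end of the scale.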

Along the way, I also fixed #264 by refactoring number() to allow accuracy to be a vector (which then requires some manual looping over nsmall).
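
A rough base R illustration of why a vector of accuracies leads to looping: `format()` applies a single `nsmall` to everything it formats, so each value has to be formatted with its own number of decimal places. The variable names below are made up for the example, and this is only a sketch of the idea, not how number() is written.

```r
# Illustration only; assumes one accuracy per value.
x <- c(1.234, 12.34, 123.4)
accuracy <- c(0.01, 0.1, 1)

# Decimal places implied by each accuracy (never negative).
nsmall <- pmax(-floor(log10(accuracy)), 0)

# Round each value to its own accuracy, then format it with its own nsmall.
labels <- mapply(
  function(value, acc, n) format(round(value / acc) * acc, nsmall = n),
  x, accuracy, nsmall
)
labels
```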

I think overall this is a cleaner separation of concerns and should be a little more flexible because it's implemented in number() rather than the wrappers.

hadley · 2022-03-25T21:47:49Z

@davidchall I'll probably start the release process next Friday, so if you have time before then I'd love to get your feedback.

davidchall · 2022-03-26T00:12:59Z

Hi @hadley - I remember the design being quite tricky when I wrote this code a couple of years ago, so I really appreciate you taking another look at it before release. I wondered about surfacing the long and short scales in number(), so it's nice to see this here. I'll try to review over the weekend. Thanks!

davidchall

I think this is absolutely fantastic! 🎉 It feels much more natural, and I'm sure it will be easier to maintain as a result. Most of my feedback was very minor.

My only remaining concern is about assigning units to the number zero. My previous solution would output "0 B" from the bytes labeler, "0 m" from the SI unit labeler, and "0" from the currency labeler (which has no unit). I think these zeros now always omit the unit. I think this has a couple of disadvantages, and I wonder if others would report this later?

On plot axes, it'll look weird to have inconsistent tick labels (e.g. 0, 0.5 m, 1.0 m, 1.5 m, 2.0 m).
From a scientific standpoint, I think there is a difference between "0" and "0 m". The latter implies that the measured quantity has a dimension of length. To omit the unit is odd because it suggests the dimension of the axis is inconsistent (length and dimensionless).
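
One way to picture the zero case is the base R sketch below. When the unit is baked into each suffix, zero naturally lands in the smallest bin, so it needs explicit handling to come out as "0 m". `sketch_unit_label()` and `metre_breaks` are hypothetical names; this is an assumption about how such a rule could look, not the PR's implementation.

```r
# Hypothetical helper; the unit is part of every suffix.
sketch_unit_label <- function(x, breaks) {
  breaks <- sort(breaks)
  # Values below the smallest break are clamped into the first bin.
  idx <- pmax(findInterval(abs(x), breaks), 1L)
  # Zero would otherwise land in the smallest bin ("0 mm" here);
  # send it to the break at 1 so it gets the base unit instead.
  idx[x == 0] <- which(breaks == 1)
  paste0(x / unname(breaks[idx]), names(breaks)[idx])
}

metre_breaks <- c(" mm" = 1e-3, " m" = 1, " km" = 1e3)
sketch_unit_label(c(0, 1, 1500), metre_breaks)
```

With this rule every tick label on an axis carries a unit, including the zero.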

In conclusion, I can't wait to start using these functions! 😄

davidchall · 2022-03-26T15:14:09Z

tests/testthat/test-label-number.r

+})
+
+test_that("handles out-of-range inputs", {
+  expect_equal(number(1e15, scale_cut = cut_short_scale()), "1 000T")


I think we should also test the low end, because cut() is directional.

Suggested change

-  expect_equal(number(1e15, scale_cut = cut_short_scale()), "1 000T")
+  expect_equal(number(1e15, scale_cut = cut_short_scale()), "1 000T")
+  expect_equal(number(1e-25, scale_cut = cut_si("")), "0.1y")

davidchall · 2022-03-26T15:28:04Z

tests/testthat/test-label-number.r

+test_that("cut_si() adds space before unit", {
+  expect_equal(number(c(1, 1000), scale_cut = cut_si("m")), c("1 m", "1 km"))
+})
+


I think we should test how cut_si("m") handles zero. It seems like an edge case due to how cut_si() bakes the units into the scale suffix. IMHO the expected result should be "0 m".

Ok, should be fixed now.

davidchall · 2022-03-26T15:40:14Z

NEWS.md

+  for billion). It can be used with `cut_short_scale()` when billion =
+  thousand million and `cut_long_scale()` when billion = million million.
+  Additionally, the accuracy is now computed per scale category, so rescaled
+  values can have different numbers of decimal places.


I know I didn't implement the final version, but I did spend quite a bit of time thinking about this problem. Could I possibly request something like (with help from @davidchall)?

Yes, absolutely! Many apologies for accidentally editing out your contributions to this feature.

No worries!

Co-authored-by: David C Hall <davidchall@users.noreply.github.com>

hadley

Thanks for the review! I really appreciate your help making this code higher quality.

hadley · 2022-03-28T14:04:27Z

NEWS.md

+  for billion). It can be used with `cut_short_scale()` when billion =
+  thousand million and `cut_long_scale()` when billion = million million.
+  Additionally, the accuracy is now computed per scale category, so rescaled
+  values can have different numbers of decimal places.


Yes, absolutely! Many apologies for accidentally editing out your contributions to this feature.

hadley · 2022-03-28T14:11:21Z

tests/testthat/test-label-number.r

+    number(1, scale_cut = "x")
+    number(1, scale_cut = c(x = 0, y = 1))
+  })
+


This doesn't seem worth testing to me, as it's obvious via inspection of the code, and any test would effectively just be testing that base R generates a useful error message here.

hadley · 2022-03-28T14:24:31Z

tests/testthat/test-label-number.r

+test_that("cut_si() adds space before unit", {
+  expect_equal(number(c(1, 1000), scale_cut = cut_si("m")), c("1 m", "1 km"))
+})
+


Ok, should be fixed now.

hadley · 2022-03-28T14:33:28Z

tests/testthat/test-label-dollar.R

    c("$1", "$1K", "$1M", "$1,000M", "$1B", "$1,000B", "$1T")
  )
  expect_equal(
-    label_dollar(rescale_large = c(k = 3L, m = 6L, bn = 9L, tn = 12L))(x),
+    label_dollar(scale_cut = c(k = 1e3, m = 1e6, bn = 1e9, tn = 1e12))(x),
    c("$1", "$1k", "$1m", "$1bn", "$1tn", "$1,000tn", "$1,000,000tn")
  )
 })


Good catch. I generally simplified these tests to focus on what label_dollar() now implements.

davidchall · 2022-03-28T15:24:16Z

NEWS.md

+  auto units (@davidchall, #235) and leaves `0` as is (instead of formatting to 
+  "0 B") for consistency with `label_number_si()`.


@hadley I think this addition should be removed, now that we format as "0 B".

Thanks, I'll fix

hadley added 2 commits March 25, 2022 09:22

Ground work

6a50f31

Major overhaul

e244bb5

davidchall reviewed Mar 26, 2022

hadley and others added 8 commits March 28, 2022 09:08

Apply suggestions from code review

e8ed5aa

Co-authored-by: David C Hall <davidchall@users.noreply.github.com>

Tweaks from code review

091b1ab

Update docs + snapshots

ce7a591

Unit now always required

1eed63d

Show units on 0

2945a5f

More news tweaking

48407dc

Refine tests

9b56c81

Example fixes

366d689

hadley commented Mar 28, 2022

hadley merged commit b3c8173 into main Mar 28, 2022

hadley deleted the rescaled-accuracy branch March 28, 2022 14:53

hadley mentioned this pull request Mar 28, 2022

Is label_number_si() change too aggressive? #331

Closed

davidchall reviewed Mar 28, 2022

davidchall mentioned this pull request Mar 29, 2022

Release scales 1.2.0 #340

Closed

27 tasks
