Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New approach to variable rescaling #339

Merged
merged 10 commits into from
Mar 28, 2022
Merged

New approach to variable rescaling #339

merged 10 commits into from
Mar 28, 2022

Conversation

hadley
Copy link
Member

@hadley hadley commented Mar 25, 2022

@davidchall I started getting worried that the change to label_number_si() was likely to affect too much existing code, and after much refactoring this is what I ended up with:

  • number() gets a new argument called scale_cut, equivalent to dollar()'s rescale_large.
  • rescale_long_scale() and rescale_short_scale() are renamed to cut_short_scale() and cut_long_scale()
  • Add new cut_si() and cut_bytes() to pull out the code previously nested inside label_number_si() and label_bytes(). This involved refactoring scale_cut() to take a vector of break points, not log10 break points.

Along the way, I also fixed #264 by refactoring number() to allow accuracy to be a vector (which then requires some manual looping over nsmall).

I think overall this is a cleaner separation of concerns and should be a little more flexible because it's implemented in number() rather than the wrappers.

@hadley
Copy link
Member Author

hadley commented Mar 25, 2022

@davidchall I'll probably start the release process next Friday, so if you have time before then I'd love to get your feedback.

@davidchall
Copy link
Contributor

Hi @hadley - I remember the design being quite tricky when I wrote this code a couple of years ago, so I really appreciate you taking another look at it before release. I wondered about surfacing the long and short scales in number(), so it's nice to see this here. I'll try to review over the weekend. Thanks!

Copy link
Contributor

@davidchall davidchall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is absolutely fantastic! 🎉 It feels much more natural, and I'm sure it will be easier to maintain as a result. Most of my feedback was very minor.

My only remaining concern is about assigning units to the number zero. My previous solution would output "0 B" from the bytes labeler, "0 m" from the SI unit labeler, and "0" from the currency labeler (which has no unit). I think these now zeros always omit the unit. I think this has a couple of disadvantages, and I wonder if others would report this later?

  1. On plot axes, it'll look weird to have inconsistent tick labels (e.g. 0, 0.5 m, 1.0 m, 1.5 m, 2.0 m).

  2. From a scientific standpoint, I think there is a difference between "0" and "0 m". The latter implies that the measured quantity has a dimension of length. To omit the unit is odd because it suggests the dimension of the axis is inconsistent (length and dimensionless).

In conclusion, I can't wait to start using these functions! 😄

R/label-number-si.R Outdated Show resolved Hide resolved
R/label-number.r Show resolved Hide resolved
R/label-number.r Outdated Show resolved Hide resolved
R/label-number.r Outdated Show resolved Hide resolved
R/label-number.r Outdated Show resolved Hide resolved
})

test_that("handles out-of-range inputs", {
expect_equal(number(1e15, scale_cut = cut_short_scale()), "1 000T")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also test the low end, because cut() is directional.

Suggested change
expect_equal(number(1e15, scale_cut = cut_short_scale()), "1 000T")
expect_equal(number(1e15, scale_cut = cut_short_scale()), "1 000T")
expect_equal(number(1e-25, scale_cut = cut_si("")), "0.1y")

tests/testthat/test-label-number.r Show resolved Hide resolved
test_that("cut_si() adds space before unit", {
expect_equal(number(c(1, 1000), scale_cut = cut_si("m")), c("1 m", "1 km"))
})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test how cut_si("m") handles zero. It seems like an edge case due to how cut_si() bakes the units into the scale suffix. IMHO the expected result should be "0 m".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, should be fixed now.

tests/testthat/test-label-bytes.R Outdated Show resolved Hide resolved
NEWS.md Outdated
for billion). It can be used with `cut_short_scale()` when billion =
thousand million and `cut_long_scale()` when billion = million million.
Additionally, the accuracy is now computed per scale category, so rescaled
values can have different numbers of decimal places.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I didn't implement the final version, but I did spend quite a bit of time thinking about this problem. Could I possibly request something like (with help from @davidchall)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely! Many apologies for accidentally editing out your contributions to this feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries!

Copy link
Member Author

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I really appreciate your help making this code higher quality.

NEWS.md Outdated
for billion). It can be used with `cut_short_scale()` when billion =
thousand million and `cut_long_scale()` when billion = million million.
Additionally, the accuracy is now computed per scale category, so rescaled
values can have different numbers of decimal places.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely! Many apologies for accidentally editing out your contributions to this feature.

R/label-number.r Outdated Show resolved Hide resolved
R/label-number.r Outdated Show resolved Hide resolved
number(1, scale_cut = "x")
number(1, scale_cut = c(x = 0, y = 1))
})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem worth testing to me, as it's obvious via inspection of the code, and any test would effectively just be testing that base R generates a useful error message here.

test_that("cut_si() adds space before unit", {
expect_equal(number(c(1, 1000), scale_cut = cut_si("m")), c("1 m", "1 km"))
})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, should be fixed now.

c("$1", "$1K", "$1M", "$1,000M", "$1B", "$1,000B", "$1T")
)
expect_equal(
label_dollar(rescale_large = c(k = 3L, m = 6L, bn = 9L, tn = 12L))(x),
label_dollar(scale_cut = c(k = 1e3, m = 1e6, bn = 1e9, tn = 1e12))(x),
c("$1", "$1k", "$1m", "$1bn", "$1tn", "$1,000tn", "$1,000,000tn")
)
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I generally simplified these tests to focus on what label_dollar() now implements.

@hadley hadley merged commit b3c8173 into main Mar 28, 2022
@hadley hadley deleted the rescaled-accuracy branch March 28, 2022 14:53
Comment on lines +55 to +56
auto units (@davidchall, #235) and leaves `0` as is (instead of formatting to
"0 B") for consistency with `label_number_si()`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hadley I think this addition should be removed, now that we format as "0 B".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll fix

@davidchall davidchall mentioned this pull request Mar 29, 2022
27 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

label_number_si accuracy when labelling 'K' and 'M'
2 participants