label_number_si accuracy when labelling 'K' and 'M' #264

samuelhuerga · 2020-04-01T07:41:01Z

Hello!

I've been trying to scale my axis, which goes from 0 to 1.2M. Ideally, I would like numbers below 1M labelled with K (and no decimals) and numbers above 1M labelled with one decimal (that is, 1.0M, 1.2M).

With the default accuracy, the labels for numbers above 1M are confusing:
demo_continuous(c(1, 1.2e6), labels = label_number_si())

But if I change the accuracy, I also get decimals for numbers below 1M:
demo_continuous(c(1, 1.2e6), labels = label_number_si(accuracy = 0.1))

It would be awesome to have some parameters to control this kind of options.

Thanks!


SimonDedman · 2020-11-30T22:23:20Z

Similarly, if this level of control existed, (elements of) it could feasibly be enabled by default. With the caveat that I produced the plots below in a loop, and could exert more control by doing them individually and editing the accuracy parameter, I wonder if the existing heuristic:

If NULL, the default, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values

could be extended to implement higher accuracy when labels deviate too far from the real values? By way of examples:

Y scale 1500 Km labelled as 2K but the next item up is actually 2K so they both have the same label.

Here one can infer that the 2K is 2.5K but that's simply because a linear scale is more common than a log or similar.

In other instances it works out fine. Possible checks that could be implemented:

* If more than one tick has the same label, increase accuracy by 1 and recheck.
* If the percentage difference between label and tick value is too high, increase accuracy by 1 and recheck. E.g. abs(1500 - 2000) = 500, 500/1500 = 0.33, so the label is off by 33%.

I'm not sure what the thresholds should be, of course, and I strongly suspect this will only be an issue at the start of small ranges, since long ranges should cause the problematic midpoints to disappear and reduce the relative discrepancies.
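The two checks proposed above can be sketched in base R. This is only an illustration of the idea, not how scales works internally: `label_k_adaptive()`, `max_rel_error`, and `max_digits` are hypothetical names, the 10% threshold is an arbitrary assumption, and the sketch only handles the "K" suffix.

```r
# Hypothetical helper: label values in thousands ("K"), starting with zero
# decimals and adding one decimal at a time until both checks pass.
label_k_adaptive <- function(x, max_rel_error = 0.1, max_digits = 3) {
  scaled <- x / 1e3
  for (digits in 0:max_digits) {
    rounded <- round(scaled, digits)
    # Check 1: no two distinct tick values end up with the same label.
    unique_ok <- !anyDuplicated(rounded[!duplicated(scaled)])
    # Check 2: the label is close enough to the tick value (zeros skipped
    # to avoid dividing by zero).
    nonzero <- scaled != 0
    rel_error <- abs(rounded[nonzero] - scaled[nonzero]) / abs(scaled[nonzero])
    error_ok <- all(rel_error <= max_rel_error)
    if (unique_ok && error_ok) break
  }
  paste0(formatC(rounded, format = "f", digits = digits), "K")
}

# Ticks from a short range: with zero decimals, 1500 and 2000 would collide.
label_k_adaptive(c(1000, 1500, 2000, 2500))
#> [1] "1.0K" "1.5K" "2.0K" "2.5K"

# A longer range needs no decimals at all.
label_k_adaptive(c(0, 5000, 10000))
#> [1] "0K"  "5K"  "10K"
```

Note that every label in a call gets the same number of decimals here; choosing decimals separately for each suffix group is a different problem.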

Cheers, as ever.

hadley · 2022-03-17T16:01:00Z

With the current dev version:

library(scales)
x <- c(0, 250, 500, 750, 1250, 1500) * 1e3
label_number_si("")(x)
#> [1] "0.00"    "250.00k" "500.00k" "750.00k" "1.25M"   "1.50M"

Created on 2022-03-17 by the reprex package (v2.0.1)

So it looks like we've solved one problem and created another (smaller) one 😄

hadley · 2022-03-17T16:18:52Z

I think probably we need to compute accuracy individually for each scale group (something like accuracy <- ave(x, rescale$scale, FUN = precision)) but it seems like that needs some downstream fixes to fully work.

...

Of course, because the nsmall argument to format() is not vectorised.
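A minimal base R sketch of the idea, assuming three suffix groups and a simplified stand-in for `precision`: `digits_needed()` is a hypothetical helper (not the scales internal), and the break points and suffixes are hard-coded for illustration. The number of decimals is computed per group with `ave()`, and `format()` is then called once per value because it accepts only a single number of decimals at a time.

```r
# Hypothetical stand-in for a per-group precision function: the smallest
# number of decimals (up to max_digits) that represents every value in the
# group without rounding error.
digits_needed <- function(x, max_digits = 3) {
  for (d in 0:max_digits) {
    if (all(abs(round(x, d) - x) < 1e-8)) return(rep(d, length(x)))
  }
  rep(max_digits, length(x))
}

x <- c(0, 250, 500, 750, 1250, 1500) * 1e3

# Assign each value to a suffix group and rescale it accordingly.
breaks <- c(0, 1e3, 1e6)
suffix <- c("", "k", "M")
group <- findInterval(abs(x), breaks)   # 1 = no suffix, 2 = "k", 3 = "M"
scaled <- x / breaks_scale <- c(1, 1e3, 1e6)[group]

# Compute the number of decimals separately within each group.
digits <- ave(scaled, group, FUN = digits_needed)

# Loop over the values, since each one may need a different nsmall.
labels <- mapply(
  function(value, d) format(round(value, d), nsmall = d),
  scaled, digits
)
labels <- paste0(labels, suffix[group])
labels
#> [1] "0"     "250k"  "500k"  "750k"  "1.25M" "1.50M"
```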

This needs fixes everywhere that rescale_by_suffix() is used, i.e. also in label_bytes() and label_dollar().

So maybe we need to move rescale_by_suffix() into number()?

* `number()` gets a new argument called `scale_cut`, equivalent to `dollar()`'s `rescale_large`.
* `rescale_short_scale()` and `rescale_long_scale()` are renamed to `cut_short_scale()` and `cut_long_scale()`.
* Add new `cut_si()` and `cut_bytes()` to pull out the code previously nested inside `label_number_si()` and `label_bytes()`. This involved refactoring `scale_cut()` to take a vector of break points, not `log10` break points.

Along the way, I also fixed #264 by refactoring `number()` to allow `accuracy` to be a vector (which then requires some manual looping over `nsmall`).
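For readers landing on this issue, a short usage sketch of the new argument, assuming scales 1.2.0 or later is installed. The input vector is the one from the reprex earlier in the thread; the unit `"m"` is an arbitrary choice for illustration, and no output is shown because the exact labels depend on the installed version.

```r
library(scales)  # assumes scales >= 1.2.0, where scale_cut was introduced

x <- c(0, 250, 500, 750, 1250, 1500) * 1e3

# Short-scale suffixes (K, M, B, T) chosen per value via scale_cut.
short_labels <- label_number(scale_cut = cut_short_scale())(x)
short_labels

# SI prefixes attached to a unit, here metres.
si_labels <- label_number(scale_cut = cut_si("m"))(x)
si_labels
```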

hadley added the feature label (a feature request or enhancement) Apr 3, 2020

hadley added this to the v1.2.0 milestone Mar 19, 2022

hadley mentioned this issue Mar 25, 2022

New approach to variable rescaling #339

Merged

hadley closed this as completed in #339 Mar 28, 2022
