label_number_si accuracy when labelling 'K' and 'M' #264

Closed
samuelhuerga opened this issue Apr 1, 2020 · 3 comments · Fixed by #339
Labels
feature a feature request or enhancement

@samuelhuerga

Hello!

I've been trying to label an axis that goes from 0 to 1.2M. Ideally, I would like numbers below 1M labelled with K (and no decimals), but numbers of 1M and above labelled with one decimal (that is, 1.0M, 1.2M).

With the default accuracy, the labels above 1M become confusing:
demo_continuous(c(1, 1.2e6), label = label_number_si())
image

But if I change accuracy, then I get decimals too for numbers below 1M:
demo_continuous(c(1, 1.2e6), label = label_number_si(0.1))

image

It would be awesome to have some parameters to control this kind of option.

Thanks!

@hadley hadley added the feature a feature request or enhancement label Apr 3, 2020
@SimonDedman

SimonDedman commented Nov 30, 2020

Similarly, if this level of control existed, (elements of) it could feasibly be enabled by default. One caveat: I made the plots below in a loop, and I could exert more control by making them individually and editing the accuracy parameter. Still, I wonder if the existing heuristic described in the docs:

If NULL, the default, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values

could be extended to implement higher accuracy when labels deviate too far from the real values? By way of examples:

2020-11-25_Stock_DistanceToShoreKm
On the Y scale, 1500 km is labelled as 2K, but the next tick up really is 2K, so both ticks get the same label.

Here one can infer that the 2K is 2.5K but that's simply because a linear scale is more common than a log or similar.
2020-11-25_Stock_EddySpeedAmp

In other instances it works out fine. Possible checks that could be implemented:

  • If >1 ticks have the same label, increase accuracy by 1 and recheck
  • If the percentage difference between label & tick value is too high, increase accuracy by 1 and recheck. E.g. abs(1500 - 2000) = 500, and 500/1500 = 0.33, so the label is off by 33%.
    I'm not sure what the threshold values should be, of course. I strongly suspect this will only be an issue at the start of small ranges, since long ranges should make the problematic midpoints disappear and reduce the relative discrepancies.
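The two checks above could be sketched roughly as follows, in base R. This is only an illustration of the idea, not how scales works: `format_k()` and `refine_labels()` are hypothetical helpers, the 5% threshold is an arbitrary assumption, and the ticks are assumed to be distinct values labelled in thousands.

```r
# Hypothetical helper (not part of scales): format values in thousands
# with `digits` decimal places and a "K" suffix.
format_k <- function(x, digits) {
  paste0(formatC(x / 1e3, format = "f", digits = digits), "K")
}

# Add decimals until (a) no two ticks share a label and (b) no label
# deviates from its tick value by more than `max_rel_err`.
# Assumes the tick values in `x` are distinct.
refine_labels <- function(x, max_rel_err = 0.05, max_digits = 6) {
  for (digits in 0:max_digits) {
    labels <- format_k(x, digits)
    # Value implied by the label, used to measure the relative error.
    shown <- round(x / 1e3, digits) * 1e3
    rel_err <- ifelse(x == 0, 0, abs(shown - x) / abs(x))
    if (anyDuplicated(labels) == 0 && all(rel_err <= max_rel_err)) break
  }
  labels
}

ticks <- c(500, 1000, 1500, 2000, 2500)
refined <- refine_labels(ticks)
refined
#> [1] "0.5K" "1.0K" "1.5K" "2.0K" "2.5K"
```

With zero decimals, 1500 and 2000 would both be labelled "2K", so the loop moves on to one decimal, where every label is unique and exact.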

Cheers, as ever.

@hadley
Member

hadley commented Mar 17, 2022

With the current dev version:

library(scales)
x <- c(0, 250, 500, 750, 1250, 1500) * 1e3
label_number_si("")(x)
#> [1] "0.00"    "250.00k" "500.00k" "750.00k" "1.25M"   "1.50M"

Created on 2022-03-17 by the reprex package (v2.0.1)

So it looks like we've solved one problem and created another (smaller) one 😄

@hadley
Member

hadley commented Mar 17, 2022

I think probably we need to compute accuracy individually for each scale group (something like accuracy <- ave(x, rescale$scale, FUN = precision)) but it seems like that needs some downstream fixes to fully work.

...

Of course, because the nsmall argument to format() is not vectorised.


This needs fixes everywhere that rescale_by_suffix() is used, i.e. also in label_bytes() and label_dollar().

So maybe we need to move rescale_by_suffix() into number()?
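To make the per-group idea concrete, here is a rough base R sketch. It is not the scales implementation: `precision_sketch()` is a simplified, hypothetical stand-in for the internal precision heuristic, and the break points and suffixes are assumptions chosen for the example. It computes one accuracy per scale group with `ave()`, then loops over the elements when formatting, since `format()` takes a single `nsmall` per call.

```r
# Hypothetical, simplified stand-in for the internal precision heuristic:
# one decimal place finer than the smallest gap between values, capped at 1.
precision_sketch <- function(x) {
  x <- unique(x[is.finite(x)])
  if (length(x) <= 1) return(1)
  smallest_diff <- min(diff(sort(x)))
  min(10^(floor(log10(smallest_diff)) - 1), 1)
}

x <- c(0, 250, 500, 750, 1250, 1500) * 1e3

# Assign each value to a scale group and rescale it (assumed suffixes).
breaks <- c(0, 1e3, 1e6)
suffix <- c("", "K", "M")
group <- findInterval(abs(x), breaks)
scaled <- x / pmax(breaks[group], 1)  # the first break is 0, so divide by 1

# One accuracy per scale group, repeated for every element of that group.
accuracy <- ave(scaled, group,
                FUN = function(v) rep(precision_sketch(v), length(v)))

# format() accepts a single nsmall, so format element by element.
nsmall <- pmax(round(-log10(accuracy)), 0)
labels <- mapply(
  function(value, acc, n, s) {
    paste0(format(round(value / acc) * acc, nsmall = n), s)
  },
  scaled, accuracy, nsmall, suffix[group],
  USE.NAMES = FALSE
)
labels
```

Here the "K" group gets an accuracy of 1 and the "M" group an accuracy of 0.01, so the trailing ".00" no longer leaks into the smaller values.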

@hadley hadley added this to the v1.2.0 milestone Mar 19, 2022
hadley added a commit that referenced this issue Mar 28, 2022
* `number()` gets a new argument called `scale_cut`, equivalent to `dollar()`'s `rescale_large`.
* `rescale_long_scale()` and `rescale_short_scale()` are renamed to `cut_long_scale()` and `cut_short_scale()`
* Add new `cut_si()` and `cut_bytes()` to pull out the code previously nested inside `label_number_si()` and `label_bytes()`. This involved refactoring `scale_cut()` to take a vector of break points, not `log10` break points.

Along the way, I also fixed #264 by refactoring `number()` to allow `accuracy` to be a vector (which then requires some manual looping over `nsmall`).
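
For readers landing here later, a minimal usage sketch of the API described in this commit message, assuming scales 1.2.0 or later is installed. The unit `"m"` passed to `cut_si()` is just an example value.

```r
library(scales)  # assumes scales >= 1.2.0

x <- c(0, 250, 500, 750, 1250, 1500) * 1e3

# Short-scale suffixes (K, M, B, ...) via the scale_cut argument.
labels <- label_number(scale_cut = cut_short_scale())(x)

# SI prefixes followed by a unit; cut_si() takes the unit as its argument.
labels_si <- label_number(scale_cut = cut_si("m"))(x)

labels
labels_si
```

Because `accuracy` can now vary per element, values below and above 1M should no longer be forced to share the same number of decimals.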