Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix label_number_si() to use SI prefixes #235

Merged
merged 30 commits into from Mar 26, 2021
Merged

Conversation

davidchall
Copy link
Contributor

@davidchall davidchall commented Nov 20, 2019

Fixes #234. Fixes #301.

TLDR: label_number_si() is currently providing short scale abbreviations instead of SI prefixes.

Both SI prefixes and long/short scale abbreviations are solutions to the same problem – how to specify large values more concisely. SI prefixes are now favored in most domains except finance. Abbreviations of long/short scales are not standardized (e.g. million could be "M", "m", "mn" in different circles) and depends on locale.

I've done my best to adjust the interface, but would love to hear feedback on how I could improve!

Here's what changed:

  • Changes to label_number_si()
    • Now uses SI prefixes
    • The unit argument was promoted to a required argument
    • Added scale argument
    • Default accuracy changed to NULL (i.e. auto precision)
    • Removed sep argument (wasn't actually doing anything)
  • Changes to label_dollar()
    • Added rescale_large argument
    • Added rescale_short_scale() and rescale_long_scale() built-in values, to use with this argument
  • Changes to label_bytes()
    • Handle scale argument (would've been passed by ...)
  • Added examples and tests

@davidchall davidchall changed the title Adjust label_number_si() to use SI prefixes Fix label_number_si() to use SI prefixes Feb 20, 2020
Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable overall, but will need to wait for a bigger release since it will need some decent marketing around the change.

R/label-dollar.R Outdated Show resolved Hide resolved
@hadley hadley added the breaking change ☠️ API change likely to affect existing code label May 7, 2020
R/label-number-si.R Outdated Show resolved Hide resolved
@hadley
Copy link
Member

hadley commented Mar 19, 2021

It's been a while since I looked at this, so could you please give me a brief summary of the overall goals and what's changed since last time I looked?

@davidchall
Copy link
Contributor Author

Hi @hadley - sure thing!

Overall Goals

label_number_si() scales each input and assigns a corresponding suffix. For example, 1e3 -> 1 K (1 thousand) and 1e9 -> 1 B (1 billion). The problem is that these suffixes do not correspond to the SI unit prefixes used in science. They actually correspond to the short scale abbreviations used more colloquially (e.g. finance, populations).

I think it'd be good to retain support for the short scale, but there are some additional complexities to consider. First, many countries use the alternative long scale or another system (e.g. Indian numbering system). Because it's particularly common to use the short scale in finance (e.g. $1M), I decided it makes most sense to add this feature to label_dollar(). Other use cases (e.g. populations) could simply disable the dollar prefix.

The goal of this PR is twofold:

  1. Adjust label_number_si() so it uses the intended SI unit prefixes.
  2. Adjust label_dollar() so it supports custom scaling-by-suffix. The most common scenarios are supported by short_scale() and long_scale().

Changes since last review

  • You requested the billion_scale argument of label_dollar() be renamed to something more generic, and suggested rescale. I've renamed this as rescale_large, which hints at when it should be used.
  • I adjusted label_dollar() so the accuracy and scale arguments work well with rescale_large.
  • I changed a default argument of label_number_si() to accuracy = NULL so that the displayed precision is automatically chosen by scales::number(). This is a more natural choice for this labeler following these changes.
  • I refactored logic into rescale_by_suffix(), so both label_number_si() and label_dollar() use the same underlying scaling code.
  • I requested that we elevate unit to a required argument of label_number_si(). I don't think it makes sense to use an SI prefix without a unit (though @sjackman thinks this sounds like a reasonable use case). From my perspective, it doesn't make sense to say "5 milli-" without following up with "meter" or "gram" or something else. I was hoping we could help the user to remember to specify the unit. But I don't feel particularly strongly about this.

@sjackman
Copy link

a dimensionless quantity is a quantity to which no physical dimension is assigned, also known as a bare, pure, or scalar quantity or a quantity of dimension one,[1] with a corresponding unit of measurement in the SI of the unit one (or 1),[2][3] which is not explicitly shown [emphasis added]

https://en.wikipedia.org/wiki/Dimensionless_quantity

It's entirely reasonable to use SI prefixes to represent dimensionless quantities, such as counts and ratios.

@sjackman
Copy link

Thank you very much for this work, David! It'll make label_number_si much more useful to me. Have a good weekend!

@davidchall
Copy link
Contributor Author

@sjackman Thanks for your feedback about dimensionless quantities, though my concern was specifically about using SI prefixes with dimensionless quantities. I've never seen this done before and I couldn't find any examples on that Wikipedia page.

This might be getting a little pedantic now, but I looked in the official publication of the Bureau International des Poids et Mesures titled "The International System of Units (SI)". In section 3.1, they state:

Prefix symbols can neither stand alone nor be attached to the number 1, the symbol for the unit one. Similarly, prefix names cannot be attached to name of the unit one, that is, the word "one".

In the vast majority of cases, people will want to set the unit argument. I was hoping that elevating it to a required argument would make the function more user-friendly. It would still be possible to set unit = "" to ascribe SI prefixes to dimensionless quantities if desired.

Just to reiterate that I don't mind being overruled on this, if people feel strongly about this. I know it must sound like I'm very passionate about SI units right now! 😆

@sjackman
Copy link

I'm glad that you're passionate about SI units! My preference is for unit to remain optional, but if the consensus preference is to make it mandatory, then explicitly setting unit = "" isn't the worst thing in the world.

Does the official publication make any recommendation as to how to represent dimensionless quantities, other than don't? Which is actually pretty funny. 😆

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, and for the detailed explanation of what's changed. Overall, it looks great 😀 I have a few minor comments below, but can you please also add a NEWS bullet making it clear that this is a breaking change, and briefly justifying why (i.e. it previously didn't actually use SI units)

R/label-dollar.R Outdated Show resolved Hide resolved
R/label-dollar.R Show resolved Hide resolved
R/si-prefixes.R Outdated Show resolved Hide resolved
@davidchall
Copy link
Contributor Author

@sjackman

Does the official publication make any recommendation as to how to represent dimensionless quantities, other than don't?

There are some types of dimensionless quantities, where the recommendation is to use a derived unit (not a base SI unit). For example, a measurement of plane angle would use radians (which expressed in terms of base SI units is meter / meter). In this situation, we can use the SI prefixes (e.g. milliradians or mrad).

They don't give explicit recommendation for situations where there is no appropriate unit (e.g. count) but the number is significantly different from 1. I think a reasonable approach would be to use scientific notation. There is already the scales::label_scientific() formatter for this purpose.

@davidchall
Copy link
Contributor Author

Hi @hadley, I've updated with your feedback. Before updating the NEWS, I was just hoping for your feedback on this point:

I requested that we elevate unit to a required argument of label_number_si().

The official recommendation from the standards organization is not to use SI prefixes without a unit, so I think 99% of usage will require that this argument is set. My hope is to reduce user error by prompting for this argument. They could still set unit = "" if they desire no unit.

@sjackman is in favor of retaining the default value of NULL (i.e. no unit). This has the advantage that existing code still produces a plot (though the result is changed). I think we each have moderate strength of feeling for our own preference, but are both willing to accept the alternative option. So I'm really just looking for a decision on how you'd like {scales} to handle this. There's a bit of discussion above, if you'd like to read more context.

@hadley
Copy link
Member

hadley commented Mar 24, 2021

Making it required has the nice property that existing users will get an error, forcing them to look up the docs and discover that its changed.

@davidchall
Copy link
Contributor Author

Hi @hadley - I think I'm finished with this PR now. I tried to follow the NEWS style guide, but please let me know if I need to update anything. Thanks!

NEWS.md Outdated Show resolved Hide resolved
@sjackman
Copy link

Thank you, David! ✨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ☠️ API change likely to affect existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possibly confusing default for label_number_si? label_number_si does not use SI prefixes (Billion vs Giga)
3 participants