a generic number formatter #142

larmarange · 2018-07-04T16:42:51Z

cf. discussion in #77

hadley

Basic idea looks great. I gave you a bunch of style comments because I think this is a good idea, and I wanted to make the implementation as strong as possible. Thanks for all your work on this!

hadley · 2018-07-05T13:03:31Z

R/formatter.r

+#'                            suffix = "\u2030",
+#'                            accuracy = .1)
+#' per_mille(v)
+number_format <- function(accuracy = 1, scale = 1, prefix = "",


I think it would make more sense to default big.mark to ",", and set decimal.mark = NULL

Then we can do:

if (is.null(decimal.mark)) { decimal.mark = if (identical(big.mark, ",")) "." else "," }

And then document big.mark and decimal.mark together.

Probably because I come from a different culture (I'm French, and the comma is our decimal separator), I'm personally not in favour of using a comma as the default thousands separator, for the following reasons:

using a comma as the thousands separator is the purpose of the dedicated comma_format formatter;

from an international perspective, a space would be a more relevant thousands separator, as used for example by The Lancet (a space for thousands and a dot for decimals is a good compromise that an international audience readily understands);

since 2003, the use of spaces as separators (for example: 20 000 and 1 000 000 for "twenty thousand" and "one million") has been officially endorsed by the SI/ISO 31-0 standard, as well as by the International Bureau of Weights and Measures and the International Union of Pure and Applied Chemistry (IUPAC), the American Medical Association's widely followed AMA Manual of Style, and the Metrication Board, among others. (cf. https://www.wikiwand.com/en/Decimal_separator#/Digit_grouping)

If a space is used as the thousands separator, I would suggest simply setting "." as the default value for decimal.mark rather than NULL, since that avoids relying on an implicit rule.

I understand those reasons, but unfortunately it would be inconsistent with every other number formatting function in the tidyverse.

OK, I have changed the default behaviour.

By default, big.mark is now NULL. When it is NULL, a comma is used as the thousands separator, unless decimal.mark is already a comma, in which case a space is used instead.
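The rule described above can be sketched in a few lines of base R. This is only an illustration of the logic as stated in the comment, not the code from the PR; the helper name `resolve_big_mark` is hypothetical.

```r
# Hypothetical helper (not part of scales): choose a thousands separator
# when big.mark is left as NULL, following the rule stated in the thread.
resolve_big_mark <- function(big.mark = NULL, decimal.mark = ".") {
  if (is.null(big.mark)) {
    # Fall back to a space only when the comma is already taken
    # by the decimal mark.
    big.mark <- if (identical(decimal.mark, ",")) " " else ","
  }
  big.mark
}

resolve_big_mark()                    # ","
resolve_big_mark(decimal.mark = ",")  # " "
resolve_big_mark(big.mark = "'")      # "'" (an explicit value is kept)
```

An explicit big.mark always wins; the implicit rule only applies to the NULL default.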

hadley · 2018-07-05T13:04:04Z

R/formatter.r

+#' my_format(v)
+#'
+#' # Per mille
+#' per_mille <- number_format(scale = 1000,


Can you please use tidyverse style guide here? http://style.tidyverse.org/syntax.html#long-lines

Sorry, I forgot that styler does not check documentation examples.

It is corrected.

hadley · 2018-07-05T13:04:23Z

R/formatter.r

@@ -1,3 +1,72 @@
+#' Number formatter: a generic formatter for numbers
+#'
+#' @return \code{number_format} returns a function with single parameter


Can you please use markdown formatting, i.e. `number_format`?

hadley · 2018-07-05T13:04:38Z

R/formatter.r

+#' decimal point
+#' @param trim logical, if \code{FALSE}, values are right-justified to a common
+#' width (see \code{\link[base]{format}})
+#' @param ... other arguments passed on to \code{\link[base]{format}}


Markdown link syntax is [base::format()]

hadley · 2018-07-05T13:06:05Z

R/formatter.r

+  ret <- paste0(
+    prefix,
+    format(
+      scale * x,


Shouldn't this be x / scale? I don't think I understand the name scale.

Originally, in the first version of this generic formatter, I used the word multiplier, feeling it was more explicit.

However, when checking the other formatters available in scales, I saw that unit_format already had such an argument, called scale, and returned x * scale.

Therefore, I renamed that argument of number_format to scale, to be consistent with unit_format.

We can decide to use another name. However, my feeling is that we should keep names and behaviours consistent across all formatters, as much as possible.

That said, I do not know whether you want to maintain backward compatibility with previous versions of scales or would prefer to reorganise all formatters, even if that introduces some breaking changes.

Ah ok, leave as scale then

Maybe emphasise that x is multiplied by scale in the docs?
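To make the direction of scale concrete, here is a minimal base R sketch in which x is multiplied by scale before formatting. `sketch_number` is a hypothetical stand-in written for this illustration, not the function from the PR, and it omits accuracy handling.

```r
# Hypothetical stand-in for the formatter under review (base R only).
sketch_number <- function(x, scale = 1, prefix = "", suffix = "") {
  # x is MULTIPLIED by scale before formatting, as in unit_format.
  ret <- format(scale * x, trim = TRUE, scientific = FALSE)
  paste0(prefix, ret, suffix)
}

sketch_number(0.125, scale = 100, suffix = "%")          # "12.5%"
sketch_number(0.0125, scale = 1000, suffix = "\u2030")   # per mille
```

With this convention a proportion becomes a percentage with scale = 100 and a per mille value with scale = 1000.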

hadley · 2018-07-05T13:06:16Z

R/formatter.r

+    prefix,
+    format(
+      scale * x,
+      big.mark = big.mark, decimal.mark = decimal.mark,


Please put each named argument on a separate line

hadley · 2018-07-05T13:07:12Z

tests/testthat/test-formatter.r

@@ -8,6 +8,33 @@ test_that("time_format formats hms objects", {
  expect_equal(time_format(format = "%H")(hms::as.hms(a_time, tz = "UTC")), "11")
 })

+
+test_that("number format works correctly", {


Can you please break this up into smaller tests? Like one for accuracy, one for decimal vs big mark, one for scale, one for prefix/suffix?

Tests have been simplified and broken into smaller tests.

hadley · 2018-07-05T13:07:56Z

R/formatter.r

+                          suffix = "", big.mark = " ", decimal.mark = ".",
+                          trim = TRUE, ...) {
+  function(x) number(
+      x, accuracy, scale, prefix,


Can you please name arguments here (apart from x). That makes it safer if we later change the arguments to number().

hadley · 2018-07-05T13:09:15Z

R/formatter.r

+                   suffix = "", big.mark = " ", decimal.mark = ".",
+                   trim = TRUE, ...) {
+  if (length(x) == 0) return(character())
+  if (is.null(accuracy)) {


Would it make the code simpler to do

accuracy <- accuracy %||% precision(x)
x <- round_any(x, accuracy / scale)
nsmall <- -floor(log10(accuracy))
nsmall <- min(max(nsmall, 0), 20)

?

yes, clearly
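The nsmall part of that suggestion can be tried in isolation. The sketch below uses base R only and leaves out `%||%`, `precision()` and `round_any()`, which are helpers referenced in the review; the function name `accuracy_to_nsmall` is hypothetical.

```r
# Hypothetical helper: derive the number of decimal places from accuracy.
accuracy_to_nsmall <- function(accuracy) {
  # accuracy = 0.01 -> log10 = -2 -> 2 decimal places
  nsmall <- -floor(log10(accuracy))
  # format() accepts nsmall between 0 and 20, so clamp to that range.
  min(max(nsmall, 0), 20)
}

accuracy_to_nsmall(0.1)   # 1
accuracy_to_nsmall(1)     # 0
accuracy_to_nsmall(100)   # 0 (negative values are clamped to 0)

format(3, nsmall = accuracy_to_nsmall(0.01))  # "3.00"
```

The clamp matters at both ends: an accuracy of 10 or more would otherwise give a negative nsmall, and a very small accuracy would exceed the limit of 20 that format() allows.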

hadley · 2018-07-05T13:09:38Z

R/formatter.r

+  nsmall <- min(max(nsmall, 0), 20)
+  ret <- paste0(
+    prefix,
+    format(


Might make the code a bit simpler to pull this out into a variable?

Sorry, I'm not sure I understand your comment.

Do you mean something like:

ret <- format(
  scale * x,
  big.mark = big.mark,
  decimal.mark = decimal.mark,
  trim = trim,
  nsmall = nsmall,
  scientific = FALSE,
  ...
)
ret <- paste0(prefix, ret, suffix)

Yes, exactly.

hadley

Looking really good - thanks. Just a few more minor suggestions.

hadley · 2018-07-05T18:00:07Z

R/formatter.r

@@ -1,3 +1,84 @@
+#' Number formatter: a generic formatter for numbers
+#'
+#' @return `number_format` returns a function with single parameter


Can you please indent subsequent lines with two spaces? http://style.tidyverse.org/documentation.html#indents-and-line-breaks

hadley · 2018-07-05T18:00:40Z

R/formatter.r

+#'
+#' @return `number_format` returns a function with single parameter
+#' `x`, a numeric vector, that returns a character vector
+#' @param x a numeric vector to format


Can you please make parameter descriptions into sentences? i.e. start with a capital letter and end with a full stop?

ok. I will update the PR tonight or tomorrow.

hadley · 2018-07-05T18:02:03Z

R/formatter.r

+  nsmall <- -floor(log10(accuracy))
+  nsmall <- min(max(nsmall, 0), 20)
+
+  if (is.null(big.mark)) {


Hmmmm, now this logic seems like it's the wrong way around. Maybe it's decimal.mark that should be NULL?

This is making me feel that you're correct that the default big.mark should be " "

It's not so easy to find an appropriate behaviour if we set decimal.mark to NULL.

Your previous proposal was:

if (is.null(decimal.mark)) { decimal.mark = if (identical(big.mark, ",")) "." else "," }

But this behaviour is probably not the best. If I need to present results in an international format, I have to specify big.mark = " " but I want to keep a dot as a decimal separator.

If I want to present results in French, the first thing I want to set is the decimal.mark as a comma, and the change of big.mark is a consequence of this previous change.

Therefore, there is no general rule to derive decimal.mark from big.mark.

It is true that using a space as the default thousands separator is more generic, since it works with different decimal marks.

In addition, for those who need to present results in an American style, there is still comma_format, which will simply be a shortcut for number_format with a comma as the default thousands separator.

Ok, let's go back to your original proposal with big.mark = " "
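The argument that a space works with either decimal mark can be checked directly with base R's `format()`. This is only an illustration of the point made in the thread, not code from the PR.

```r
# A space as thousands separator combines cleanly with either decimal mark.
x <- c(20000, 1e6)
intl <- format(x, big.mark = " ", trim = TRUE, scientific = FALSE)
intl     # "20 000" "1 000 000"

english <- format(1234.5, big.mark = " ", decimal.mark = ".", trim = TRUE)
french  <- format(1234.5, big.mark = " ", decimal.mark = ",", trim = TRUE)
english  # "1 234.5"
french   # "1 234,5"
```

Switching to a French presentation then only requires changing decimal.mark, with no dependent change to big.mark.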

hadley · 2018-07-05T18:04:30Z

And can you please add a bullet to NEWS? It should briefly describe the change and end with (@yourname, #issuenumber).

cf. r-lib#142

larmarange · 2018-07-06T15:15:42Z

I applied all requested corrections and I squashed all PR commits into one

hadley · 2018-07-06T15:58:52Z

Thanks for all your hard work on this!

larmarange · 2018-07-06T16:29:52Z

You are welcome.

Let me know how you want to deal with the existing formatters now, and whether you want to base them on the generic formatter.

It could be an opportunity to harmonise arguments between the different formatters.

comma_format could be an alias of number_format with just a different default value for big.mark, keeping the same list of arguments.

dollar_format could also be based on number_format, with handling for its two additional arguments, largest_with_cents and negative_parens.

percent_format could simply be an alias of number_format, just changing default values.

scientific_format: it does not seem appropriate to base that function on number_format. But we could consider adding to that formatter some of the arguments implemented in number_format, like prefix, suffix...

ordinal_format: it's possible to use number_format in that formatter, and also to allow the function to be customised for other languages.

Is format_format still relevant?

unit_format could be an alias of number_format. But we need to decide whether the unit argument should be kept for backward compatibility (otherwise, it could just be the suffix argument).

Let me know if you want to reorganise these formatters using number_format.

If yes, would you prefer separate PRs or a single PR?

Would you prefer to keep separate documentation pages or to present the different aliases on the same page?

hadley · 2018-07-06T16:35:03Z

Maybe start with a PR that uses number_format() for the most obvious cases? i.e. comma_format(), percent_format(), and unit_format(). Then we can discuss the other ones on a case-by-case basis (but generally, the more shared code, the better).

I'd prefer to have them all documented together.
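As a rough picture of what "aliases with different defaults" could look like, here is a base R sketch. All function names prefixed with `sketch_` are hypothetical, and the generic formatter is reduced to a few arguments; the actual follow-up work was done in #146 and may differ.

```r
# Hypothetical, simplified generic formatter: returns a function of x.
sketch_number_format <- function(scale = 1, suffix = "",
                                 big.mark = " ", decimal.mark = ".") {
  function(x) {
    ret <- format(
      scale * x,
      big.mark = big.mark,
      decimal.mark = decimal.mark,
      trim = TRUE,
      scientific = FALSE
    )
    paste0(ret, suffix)
  }
}

# Aliases that only change default values and forward everything by name.
sketch_comma_format <- function(big.mark = ",", ...) {
  sketch_number_format(big.mark = big.mark, ...)
}
sketch_percent_format <- function(scale = 100, suffix = "%", ...) {
  sketch_number_format(scale = scale, suffix = suffix, ...)
}

sketch_comma_format()(1234567)   # "1,234,567"
sketch_percent_format()(0.5)     # "50%"
```

Forwarding arguments by name, as requested earlier in the review, keeps the aliases safe if the argument order of the generic formatter changes later.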

larmarange mentioned this pull request Jul 4, 2018

More options for Percent format #77

Closed

hadley reviewed Jul 5, 2018


larmarange added a commit to larmarange/scales that referenced this pull request Jul 6, 2018

a generic number formatter

610e850

cf. r-lib#142

a generic number formatter

31c327f

cf. r-lib#142

larmarange force-pushed the number_format branch from 610e850 to 31c327f Compare July 6, 2018 15:14

hadley merged commit 8257643 into r-lib:master Jul 6, 2018

dpseidel mentioned this pull request Jul 6, 2018

percent gives NaN% for all values in column if any one is NaN #97

Closed

larmarange mentioned this pull request Jul 6, 2018

updating comma_format, percent_format and unit_format #146

Merged

larmarange deleted the number_format branch July 10, 2018 08:37

larmarange mentioned this pull request Jul 25, 2018

Change big.mark default in number() #162

Closed

larmarange mentioned this pull request Jun 29, 2020

Add European Number Styling Theme ddsjoberg/gtsummary#557

Closed
