a generic number formatter #142

Merged
hadley merged 1 commit into r-lib:master from larmarange:number_format on Jul 6, 2018

@larmarange
Contributor

larmarange commented Jul 4, 2018

cf. discussion in #77

@hadley

Basic idea looks great. I gave you a bunch of style comments because I think this is a good idea, and I wanted to make the implementation as strong as possible. Thanks for all your work on this!

#' suffix = "\u2030",
#' accuracy = .1)
#' per_mille(v)
number_format <- function(accuracy = 1, scale = 1, prefix = "",

@hadley

hadley Jul 5, 2018

Member

I think it would make more sense to default big.mark to ",", and set decimal.mark = NULL

Then we can do:

if (is.null(decimal.mark)) {
  decimal.mark = if (identical(big.mark, ",")) "." else ","
}

And then document big.mark and decimal.mark together.

@larmarange

larmarange Jul 5, 2018

Contributor

Probably because I come from a different culture (I'm French, and the comma is used as the decimal separator in French), I'm personally not in favour of using a comma as the default thousands separator, for the following reasons:

  • using a comma as the thousands separator is the purpose of the dedicated comma_format formatter;
  • from an international perspective, it would be more relevant to use a space as the thousands separator, as done for example by the Lancet journal (a space as thousands separator with a dot as decimal separator is a good compromise that an international audience readily understands);
  • since 2003, the use of spaces as separators (for example: 20 000 and 1 000 000 for "twenty thousand" and "one million") has been officially endorsed by the SI/ISO 31-0 standard, as well as by the International Bureau of Weights and Measures and the International Union of Pure and Applied Chemistry (IUPAC), the American Medical Association's widely followed AMA Manual of Style, and the Metrication Board, among others. (cf. https://www.wikiwand.com/en/Decimal_separator#/Digit_grouping)

If a space is used as the thousands separator, I would suggest simply setting "." as the default value for decimal.mark rather than NULL, as that avoids having to rely on an implicit rule.

@hadley

hadley Jul 5, 2018

Member

I understand those reasons, but unfortunately it would be inconsistent with every other number formatting function in the tidyverse.

@larmarange

larmarange Jul 5, 2018

Contributor

OK, I have changed the default behaviour.

By default, big.mark is now NULL. When it is NULL, a comma is used as the thousands separator, unless decimal.mark is already a comma, in which case a space is used instead.
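
The fallback rule described in this comment could be sketched roughly as follows. `resolve_big_mark()` is a hypothetical helper name used only for illustration; it is not a function in scales, and this default was dropped again later in the thread.

```r
# Hypothetical helper (assumption): choose a thousands separator when the
# caller did not supply one, based on the decimal mark in use.
resolve_big_mark <- function(big.mark = NULL, decimal.mark = ".") {
  if (is.null(big.mark)) {
    # A comma cannot serve as both marks, so fall back to a space.
    big.mark <- if (identical(decimal.mark, ",")) " " else ","
  }
  big.mark
}

resolve_big_mark()                    # decimal mark is ".", so "," is chosen
resolve_big_mark(decimal.mark = ",")  # decimal mark is ",", so " " is chosen
resolve_big_mark(big.mark = "'")      # an explicit value is kept unchanged
```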

#' my_format(v)
#'
#' # Per mille
#' per_mille <- number_format(scale = 1000,

@hadley

hadley Jul 5, 2018

Member

Can you please use tidyverse style guide here? http://style.tidyverse.org/syntax.html#long-lines

@larmarange

larmarange Jul 5, 2018

Contributor

Sorry, I forgot that styler does not check documentation examples.

It is corrected.

@@ -1,3 +1,72 @@
#' Number formatter: a generic formatter for numbers
#'
#' @return \code{number_format} returns a function with single parameter

@hadley

hadley Jul 5, 2018

Member

Can you please use markdown formatting, i.e. `number_format`

@larmarange

larmarange Jul 5, 2018

Contributor

corrected

#' decimal point
#' @param trim logical, if \code{FALSE}, values are right-justified to a common
#' width (see \code{\link[base]{format}})
#' @param ... other arguments passed on to \code{\link[base]{format}}

@hadley

hadley Jul 5, 2018

Member

Markdown link syntax is [base::format()]

@larmarange

larmarange Jul 5, 2018

Contributor

corrected

ret <- paste0(
prefix,
format(
scale * x,

@hadley

hadley Jul 5, 2018

Member

Shouldn't this be x / scale? I don't think I understand the name scale.

@larmarange

larmarange Jul 5, 2018

Contributor

Originally, in the first version of this generic formatter, I used the word multiplier, which felt more explicit.

However, when I checked the other formatters available in scales, it turned out that unit_format already had such an argument, called scale, and computed x * scale.

Therefore, I renamed that argument of number_format to scale, to be consistent with unit_format.

We can decide to use another name. However, my feeling is that we should keep names and behaviours consistent across all formatters as much as possible.

However, I do not know whether you want to maintain backward compatibility with previous versions of scales, or whether you would prefer to reorganise all formatters even if that introduces some breaking changes.

@hadley

hadley Jul 5, 2018

Member

Ah ok, leave as scale then

@hadley

hadley Jul 5, 2018

Member

Maybe emphasise that x is multiplied by scale in the docs?

@larmarange

larmarange Jul 5, 2018

Contributor

Corrected
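
To make the direction of `scale` concrete, here is a minimal base R sketch in which the values are multiplied by `scale` before formatting. The variable names and the percent example are illustrative assumptions, not code from the PR.

```r
# Proportions to be shown as percentages (illustrative data).
x <- c(0.123, 0.5)
scale <- 100

# scale is a multiplier: 0.123 becomes 12.3 and 0.5 becomes 50.
scaled <- x * scale
out <- paste0(format(scaled, trim = TRUE, scientific = FALSE), "%")
out
```

Dividing instead (x / scale) would turn 0.123 into 0.00123, which is why the docs were asked to state the multiplication explicitly.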

prefix,
format(
scale * x,
big.mark = big.mark, decimal.mark = decimal.mark,

@hadley

hadley Jul 5, 2018

Member

Please put each named argument on a separate line

@larmarange

larmarange Jul 5, 2018

Contributor

corrected

@@ -8,6 +8,33 @@ test_that("time_format formats hms objects", {
expect_equal(time_format(format = "%H")(hms::as.hms(a_time, tz = "UTC")), "11")
})


test_that("number format works correctly", {

@hadley

hadley Jul 5, 2018

Member

Can you please break this up into smaller tests? Like one for accuracy, one for decimal vs big mark, one for scale, one for prefix/suffix?

@larmarange

larmarange Jul 5, 2018

Contributor

tests have been simplified and broken into smaller tests

suffix = "", big.mark = " ", decimal.mark = ".",
trim = TRUE, ...) {
function(x) number(
x, accuracy, scale, prefix,

@hadley

hadley Jul 5, 2018

Member

Can you please name arguments here (apart from x). That makes it safer if we later change the arguments to number().

@larmarange

larmarange Jul 5, 2018

Contributor

corrected
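
The shape being asked for, a factory that forwards every argument to the worker by name, might look like the sketch below. Both functions are simplified stand-ins assembled from fragments quoted in this thread, not the merged scales code, and `round_any()` here is an assumed minimal replacement for the helper the package uses.

```r
# Assumed stand-in: round x to the nearest multiple of `accuracy`.
round_any <- function(x, accuracy) round(x / accuracy) * accuracy

# Simplified sketch of the worker function.
number <- function(x, accuracy = 1, scale = 1, prefix = "", suffix = "",
                   big.mark = " ", decimal.mark = ".", trim = TRUE, ...) {
  if (length(x) == 0) return(character())
  x <- round_any(x, accuracy / scale)
  nsmall <- min(max(-floor(log10(accuracy)), 0), 20)
  ret <- format(
    scale * x,
    big.mark = big.mark,
    decimal.mark = decimal.mark,
    trim = trim,
    nsmall = nsmall,
    scientific = FALSE,
    ...
  )
  paste0(prefix, ret, suffix)
}

# Every argument except x is passed by name, so changing the parameter
# order of number() later cannot silently mismatch values.
number_format <- function(accuracy = 1, scale = 1, prefix = "", suffix = "",
                          big.mark = " ", decimal.mark = ".", trim = TRUE,
                          ...) {
  function(x) number(
    x,
    accuracy = accuracy,
    scale = scale,
    prefix = prefix,
    suffix = suffix,
    big.mark = big.mark,
    decimal.mark = decimal.mark,
    trim = trim,
    ...
  )
}

euro <- number_format(accuracy = 0.1, suffix = " EUR")
euro(1234.56)
```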

suffix = "", big.mark = " ", decimal.mark = ".",
trim = TRUE, ...) {
if (length(x) == 0) return(character())
if (is.null(accuracy)) {

@hadley

hadley Jul 5, 2018

Member

Would it make the code simpler to do

accuracy <- accuracy %||% precision(x)
x <- round_any(x, accuracy / scale)

nsmall <- -floor(log10(accuracy))
nsmall <- min(max(nsmall, 0), 20)

?

@larmarange

larmarange Jul 5, 2018

Contributor

yes, clearly
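
As a quick check of the suggested nsmall computation, the sketch below wraps the two lines in a hypothetical helper, `nsmall_for()`, and evaluates it for a few accuracies. The clamp to the range 0 to 20 matches the range that base::format() accepts for nsmall.

```r
# Hypothetical helper (assumption): number of decimal places implied by
# an accuracy, using the two lines suggested in the review.
nsmall_for <- function(accuracy) {
  nsmall <- -floor(log10(accuracy))
  min(max(nsmall, 0), 20)
}

nsmall_for(100)    # accuracies of 1 or more need no decimals
nsmall_for(1)
nsmall_for(0.1)    # one decimal place
nsmall_for(0.25)   # floor(log10(0.25)) is -1, so one decimal place
nsmall_for(1e-30)  # capped at 20
```

Note that an accuracy such as 0.25 yields a single decimal place under this rule, so values rounded to quarters would not display both of their decimals.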

nsmall <- min(max(nsmall, 0), 20)
ret <- paste0(
prefix,
format(

@hadley

hadley Jul 5, 2018

Member

Might make the code a bit simpler to pull this out into a variable?

@larmarange

larmarange Jul 5, 2018

Contributor

Sorry, I'm not sure I understand your comment.

Do you mean something like:

  ret <- format(
    scale * x,
    big.mark = big.mark,
    decimal.mark = decimal.mark,
    trim = trim,
    nsmall = nsmall,
    scientific = FALSE,
    ...
  )
  ret <- paste0(prefix, ret, suffix)

@hadley

hadley Jul 5, 2018

Member

Yes, exactly.

@hadley

Looking really good - thanks. Just a few more minor suggestions.

@@ -1,3 +1,84 @@
#' Number formatter: a generic formatter for numbers
#'
#' @return `number_format` returns a function with single parameter

@hadley

hadley Jul 5, 2018

Member

Can you please indent subsequent lines with two spaces? http://style.tidyverse.org/documentation.html#indents-and-line-breaks

@larmarange

larmarange Jul 6, 2018

Contributor

corrected

#'
#' @return `number_format` returns a function with single parameter
#' `x`, a numeric vector, that returns a character vector
#' @param x a numeric vector to format

@hadley

hadley Jul 5, 2018

Member

Can you please make parameter descriptions into sentences? i.e. start with a capital letter and end with a full stop?

@larmarange

larmarange Jul 6, 2018

Contributor

ok. I will update the PR tonight or tomorrow.

@larmarange

larmarange Jul 6, 2018

Contributor

done

nsmall <- -floor(log10(accuracy))
nsmall <- min(max(nsmall, 0), 20)

if (is.null(big.mark)) {

@hadley

hadley Jul 5, 2018

Member

Hmmmm, now this logic seems like it's the wrong way around. Maybe it's decimal.mark that should be NULL?

This is making me feel that you're correct that the default big.mark should be " "

@larmarange

larmarange Jul 6, 2018

Contributor

It's not so easy to find an appropriate behaviour if we set decimal.mark to NULL.

Your previous proposal was:

if (is.null(decimal.mark)) {
  decimal.mark = if (identical(big.mark, ",")) "." else ","
}

But this behaviour is probably not the best. If I need to present results in an international format, I have to specify big.mark = " ", but I still want to keep a dot as the decimal separator. With the rule above, setting big.mark = " " would silently switch decimal.mark to a comma.

If I want to present results in French, the first thing I want to set is decimal.mark to a comma; the change of big.mark is a consequence of that choice.

Therefore, there is no general rule to derive decimal.mark from big.mark.

It is true that a space as the default thousands separator is more generic, since it works with either decimal mark.

In addition, for those who need to present results in an American style, there is still comma_format, which would simply be a shortcut for number_format with a comma as the default big.mark.

@hadley

hadley Jul 6, 2018

Member

Ok, let's go back to your original proposal with big.mark = " ".

@larmarange

larmarange Jul 6, 2018

Contributor

done
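
The argument that a space works with either decimal mark can be illustrated with base::format() alone. The wrapper `fmt()` and the sample value are illustrative assumptions, not part of the PR.

```r
x <- 12345.5

# Thin illustrative wrapper around base::format().
fmt <- function(x, big.mark, decimal.mark) {
  format(
    x,
    big.mark = big.mark,
    decimal.mark = decimal.mark,
    nsmall = 1,
    trim = TRUE,
    scientific = FALSE
  )
}

international <- fmt(x, big.mark = " ", decimal.mark = ".")  # "12 345.5"
french <- fmt(x, big.mark = " ", decimal.mark = ",")         # "12 345,5"
american <- fmt(x, big.mark = ",", decimal.mark = ".")       # "12,345.5"
```

With a space as the default big mark, switching decimal.mark between "." and "," never requires touching big.mark, which is the independence the discussion settles on.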

@hadley

Member

hadley commented Jul 5, 2018

And can you please add a bullet to NEWS? It should briefly describe the change and end with (@yourname, #issuenumber).

larmarange added a commit to larmarange/scales that referenced this pull request Jul 6, 2018

@larmarange larmarange force-pushed the larmarange:number_format branch from 610e850 to 31c327f Jul 6, 2018

@larmarange

Contributor

larmarange commented Jul 6, 2018

I applied all requested corrections and I squashed all PR commits into one

@hadley hadley merged commit 8257643 into r-lib:master Jul 6, 2018

3 checks passed

codecov/patch 96.77% of diff hit (target 65.69%)
codecov/project 66.66% (+0.96%) compared to bf6a630
continuous-integration/travis-ci/pr The Travis CI build passed
@hadley

Member

hadley commented Jul 6, 2018

Thanks for all your hard work on this!

@larmarange

Contributor

larmarange commented Jul 6, 2018

You are welcome.

Let me know how you would now like to deal with the existing formatters, and whether you want to base them on the generic formatter.

It could be an opportunity to harmonise arguments between the different formatters.

comma_format could be an alias of number_format with just a different default value for big.mark, keeping the same list of arguments.

dollar_format could also be based on number_format, with additional handling of its two extra arguments, largest_with_cents and negative_parens.

percent_format could simply be an alias of number_format, just changing default values.

scientific_format: it does not seem appropriate to base that function on number_format. But we could consider adding to that formatter some of the arguments implemented in number_format, such as prefix and suffix.

ordinal_format: it would be possible to use number_format in that formatter, and also to allow customising the function so that it can be adapted to other languages.

Is format_format still relevant?

unit_format could be an alias of number_format. But we need to decide whether the unit argument should be maintained for backward compatibility (otherwise, it could simply be replaced by the suffix argument).

Let me know if you want to reorganise these formatters using number_format.

If yes, would you prefer separate PRs or a single PR?

Would you prefer to keep separate documentation pages, or to present the different aliases on the same page?

@hadley

Member

hadley commented Jul 6, 2018

Maybe start with a PR that uses number_format() for the most obvious cases? i.e. comma_format(), percent_format(), and unit_format(). Then we can discuss the other ones on a case-by-case basis (but generally, the more shared code, the better).

I'd prefer to have them all documented together.
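
One way the three "obvious cases" could be expressed as thin wrappers is sketched below. Everything here is an assumption made for illustration: the `_sketch` names, their default values, and the compact `number_format()` stand-in are not the signatures that scales ended up with.

```r
# Compact stand-in for the generic formatter (assumption, not scales code).
number_format <- function(accuracy = 1, scale = 1, prefix = "", suffix = "",
                          big.mark = " ", decimal.mark = ".", trim = TRUE,
                          ...) {
  function(x) {
    if (length(x) == 0) return(character())
    # Round in the unscaled units, then multiply by scale for display.
    x <- round(x / (accuracy / scale)) * (accuracy / scale)
    nsmall <- min(max(-floor(log10(accuracy)), 0), 20)
    ret <- format(
      scale * x,
      big.mark = big.mark,
      decimal.mark = decimal.mark,
      trim = trim,
      nsmall = nsmall,
      scientific = FALSE,
      ...
    )
    paste0(prefix, ret, suffix)
  }
}

# Hypothetical wrappers: each one only changes default values.
comma_sketch <- function(big.mark = ",", ...) {
  number_format(big.mark = big.mark, ...)
}

percent_sketch <- function(accuracy = 1, scale = 100, suffix = "%", ...) {
  number_format(accuracy = accuracy, scale = scale, suffix = suffix, ...)
}

# Keeps a `unit` argument and maps it onto `suffix`.
unit_sketch <- function(unit = "m", sep = " ", ...) {
  number_format(suffix = paste0(sep, unit), ...)
}

comma_sketch()(1234567)
percent_sketch()(0.256)
unit_sketch("km", scale = 1e-3, accuracy = 0.1)(12500)
```

Keeping the wrappers this thin means any shared behaviour, such as rounding or separators, lives in one place, which is the "more shared code" direction suggested above.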

@larmarange larmarange deleted the larmarange:number_format branch Jul 10, 2018
