Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a generic number formatter #142

Merged
merged 1 commit into from
Jul 6, 2018
Merged

Conversation

larmarange
Copy link
Contributor

cf. discussion in #77

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basic idea looks great. I gave you a bunch of style comments because I think this is a good idea, and I wanted to make the implementation as strong as possible. Thanks for all your work on this!

R/formatter.r Outdated
#' suffix = "\u2030",
#' accuracy = .1)
#' per_mille(v)
number_format <- function(accuracy = 1, scale = 1, prefix = "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make more sense to default big.mark to ",", and set decimal.mark = NULL

Then we can do:

if (is.null(decimal.mark)) {
  decimal.mark = if (identical(big.mark, ",")) "." else ","
}

And then document big.mark and decimal.mark together.

Copy link
Contributor Author

@larmarange larmarange Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably because I'm coming from a different culture (I'm French and comma is used as a decimal separator), I'm personally not in favour of using a comma as the default thousands separator, for the following reasons:

  • using comma as the thousands separator is the purpose of comma_format specific formatter;
  • in an international perspective, it would be more relevant to use a space as a thousands separator, as used for example by Lancet journal (a space for thousands separator and a dot for decimal separator being a good compromise for a good understanding for an international audience);
  • since 2003, the use of spaces as separators (for example: 20 000 and 1 000 000 for "twenty thousand" and "one million") has been officially endorsed by SI/ISO 31-0 standard, as well as by the International Bureau of Weights and Measures and the International Union of Pure and Applied Chemistry (IUPAC), the American Medical Association's widely followed AMA Manual of Style, and the Metrication Board, among others. (cf. https://www.wikiwand.com/en/Decimal_separator#/Digit_grouping)

If a space is used for thousands separator, I would suggest to simply state "." as the default value for decimal.mark, rather than NULL as it avoids to have to consider an implicit rule.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand those reasons, but it would be unfortunately inconsistent with every other number formatting function in the tidyverse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have changed the default behaviour,.

By default, big.mark is NULL. If NULL, a comma will be used except if decimal.mark is already a comma and in such case a space will be used for thousands separators.

R/formatter.r Outdated
#' my_format(v)
#'
#' # Per mille
#' per_mille <- number_format(scale = 1000,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use tidyverse style guide here? http://style.tidyverse.org/syntax.html#long-lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot that styler do not check documentation examples.

It is corrected.

R/formatter.r Outdated
@@ -1,3 +1,72 @@
#' Number formatter: a generic formatter for numbers
#'
#' @return \code{number_format} returns a function with single parameter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use markdown formatting, i.e. `number_format`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

R/formatter.r Outdated
#' decimal point
#' @param trim logical, if \code{FALSE}, values are right-justified to a common
#' width (see \code{\link[base]{format}})
#' @param ... other arguments passed on to \code{\link[base]{format}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Markdown link syntax is [base::format()]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

R/formatter.r Outdated
ret <- paste0(
prefix,
format(
scale * x,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be x / scale? I don't think I understand the name scale.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, in the first version of this generic formatter, I used the word multiplier, feeling it was more explicit.

However, when checking the other existing formatters available in scales, it appeared that unit_format already used such argument, called scale and returning x * scale.

Therefore, I changed the name of that argument of number_format to scale to be consistent with unit_format.

We can decide to use another wording. However, my feeling is that we should try to have consistent names and behaviours between all formatters, as much as possible.

However, I do not know if you want to maintain backward compatibility with previous version of scales or if you prefer to reorganize all formatters, even if it's introducing some breaking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, leave as scale then

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe emphasise that x is multiplied by scale in the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected

R/formatter.r Outdated
prefix,
format(
scale * x,
big.mark = big.mark, decimal.mark = decimal.mark,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put each named argument on a separate line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

@@ -8,6 +8,33 @@ test_that("time_format formats hms objects", {
expect_equal(time_format(format = "%H")(hms::as.hms(a_time, tz = "UTC")), "11")
})


test_that("number format works correctly", {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please break up in to smaller tests? Like one for accuracy, one for decimal vs big mark, one for scale, one for prefix/suffix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests have been simplified and broken into smaller tests

R/formatter.r Outdated
suffix = "", big.mark = " ", decimal.mark = ".",
trim = TRUE, ...) {
function(x) number(
x, accuracy, scale, prefix,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please name arguments here (apart from x). That makes it safer if we later change the arguments to number().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

R/formatter.r Outdated
suffix = "", big.mark = " ", decimal.mark = ".",
trim = TRUE, ...) {
if (length(x) == 0) return(character())
if (is.null(accuracy)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make the code simpler to do

accuracy <- accuracy %||% precision(x)
x <- round_any(x, accuracy / scale)

nsmall <- -floor(log10(accuracy))
nsmall <- min(max(nsmall, 0), 20)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, clearly

R/formatter.r Outdated
nsmall <- min(max(nsmall, 0), 20)
ret <- paste0(
prefix,
format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make the code a bit simpler to pull this out into a variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I'm not sure to understand your comment.

Do you mean something like:

  ret <- format(
    scale * x,
    big.mark = big.mark,
    decimal.mark = decimal.mark,
    trim = trim,
    nsmall = nsmall,
    scientific = FALSE,
    ...
  )
  ret <- paste0(prefix, ret, suffix)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good - thanks. Just a few more minor suggestions.

@@ -1,3 +1,84 @@
#' Number formatter: a generic formatter for numbers
#'
#' @return `number_format` returns a function with single parameter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please ident subsequent lines with two spaces? http://style.tidyverse.org/documentation.html#indents-and-line-breaks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

R/formatter.r Outdated
#'
#' @return `number_format` returns a function with single parameter
#' `x`, a numeric vector, that returns a character vector
#' @param x a numeric vector to format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please make parameter descriptions into sentences? i.e. start with a capital letter and end with a full stop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I will update the PR tonight or tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

R/formatter.r Outdated
nsmall <- -floor(log10(accuracy))
nsmall <- min(max(nsmall, 0), 20)

if (is.null(big.mark)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, now this logic seems like it's the wrong way around. Maybe it's decimal.mark that should be NULL?

This is making me feel that you're correct that the default big.mark should be " "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so easy to find an appropriate behaviour if we set decimal.mark to NULL.

Your previous proposal was:

if (is.null(decimal.mark)) {
  decimal.mark = if (identical(big.mark, ",")) "." else ","
}

But this behaviour is probably not the best. If I need to present results in an international format, I have to specify big.mark = " " but I want to keep a dot as a decimal separator.

If I want to present results in French, the first thing I want to set is the decimal.mark as a comma, and the change of big.mark is a consequence of this previous change.

Therefore, there is no general rule to derive decimal.mark from big.mark.

It is true that using a space as the default thousands separator is more generic as it is working with different decimal marks.

In addition, for those in need to present results in an American style, there is still the comma_format that will be simply a short cut for number_format with a comma as the default decimal mark.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, lets go back to your original proposal with big.mark = " "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@hadley
Copy link
Member

hadley commented Jul 5, 2018

And can you please add a bullet to NEWS? It should briefly describe the change and end with (@yourname, #issuenumber).

larmarange added a commit to larmarange/scales that referenced this pull request Jul 6, 2018
@larmarange
Copy link
Contributor Author

I applied all requested corrections and I squashed all PR commits into one

@hadley hadley merged commit 8257643 into r-lib:master Jul 6, 2018
@hadley
Copy link
Member

hadley commented Jul 6, 2018

Thanks for all your hard work on this!

@larmarange
Copy link
Contributor Author

You are welcome.

Let me know how you want to deal now with the existing formatters and if you want to base them on the generic formatter.

It could be an opportunity to harmonise arguments between the different formatters.

comma_format could be an alias of number_format with just a different default value for big.mark, keeping the same list of arguments.

dollar_format could also be based on number_format with the management of its 2 additional arguments largest_with_cents and negative_parens.

percent_format could simply be an alias of number_format, just changing default values.

scientific_format: seems not appropriate to base that function on number_format. But we could consider to add to that formatter some arguments implemented in number_format like prefix, suffix...

ordinal_format: it's possible to use number_format in that formatter and also to allow to customise the function to be adapted to other languages.

Is format_format still relevant?

unit_format could be an alias of number_format. But we need to decide if the unit argument should be maintained for backward compatibility (otherwise, it could be just the suffix argument)

Let me know if you want to reorganise these formatters using number_format.

If yes, would you prefer separate PR or only one PR?

Would you prefer to keep a separate documentation page or to present the different alias on the same page?

@hadley
Copy link
Member

hadley commented Jul 6, 2018

Maybe start with PR that uses number_format() for the most obvious cases? i.e. comma_format(), percent_format(), and unit_format(). Then we can discuss the other ones on a case-by-case basis (but generally, the more shared code, the better).

I'd prefer to have them all documented together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants