Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abbreviations for timezones, #25 #53

Closed
wants to merge 91 commits into from
Closed

Conversation

ijlyttle
Copy link

@ijlyttle ijlyttle commented Sep 16, 2017

This is a first pass to address #25.

Things I still need to sort out:

  • document the function better
  • make sure this does something sensible on non-Mac platforms
  • work out some details on dictionaries
  • tests

Questions

  • abbreviate_olson() has an argument to specify the maximum width of what is returned. I am confident it will work for any value 14 or greater. It may not work at less than 13. For now, I throw a warning if width < 14; is this the "right" thing to do?
  • should abbreviate_olson() be exported? - I am doing so just to motivate documentation, but perhaps it does not need to be.
  • is there a better file to put these functions in?

I'm sure you will have more suggestions. I will signal when I think it will be worth looking at, but will welcome your feedback in the meantime.

@ijlyttle
Copy link
Author

Hi @krlmlr and @hadley, I think this is ready for your input.

I have made some tests that make sure that all Olson abbreviations are in fact 14 characters or fewer, and that all Olson abbreviations are unique. These tests pass on Mac, Windows, and Linux.

I also had a look at the abbreviations themselves - they seem OK to me, but you may have different opinions :)

The function takes only one timezone at a time, so to see them all at once I use (unsurprisingly)

purrr::map_chr(OlsonNames(), abbreviate_olson)

Earlier in this thread, I have a few questions. Thanks for having a look!

@krlmlr
Copy link
Member

krlmlr commented Oct 25, 2017

Thanks. I've uploaded a copy of the list of abbreviations from my system: https://gist.github.com/krlmlr/cc981acfd19931f9d56061848ff2447d

I wonder if we always should be using the abbreviation for the first component, even if the long form fits, so that Africa/Bissau becomes Afr/Bissau just like Afr/Blantyre. This would give a more consistent display.

We shouldn't give a warning, but maybe indicate non-uniqueness with a special symbol such as clisymbols::symbol$ellipsis. The pillar code will do this for you automatically, but maybe we should consider adding an ellipsis if and only if the abbreviation is not unique?

@krlmlr
Copy link
Member

krlmlr commented Oct 25, 2017

The filename seems ok, exporting seems fine as well. (This is a fairly low-level package, users are mostly other packages and may have legitimate uses for this functionality.)

@ijlyttle
Copy link
Author

Thanks! I will remove the warning and see what I can do with clisymbols::symbol$ellipsis.

I spent some time arguing with myself on the abbreviation-consistency question you raised, when I put the function together. Maybe what I can do is make that an option in the function call, e.g. be_consistent = TRUE (I'll find a better name for the argument).

Perhaps pillar could propose a default, and it could be overriden with an option?

@ijlyttle
Copy link
Author

ijlyttle commented Oct 30, 2017

I think this may be ready for another look.

I changed the abbreviation-function so that its default is to make a consistent abbreviation of the first and second elements, e.g. Africa/Bissau becomes Afr/Bissau

I have removed the warning.

I have made it possible to request a width of as small as 8. A few things to keep in mind as the width gets smaller:

  • at 14 characters, I can verify consistency and uniqueness
  • at 10 characters, I can verify uniqueness, but not consistency
  • below 10 characters, I can verify neither uniqueness nor consistency

To test uniqueness at runtime would require evaluating all the timezones whenever the function is called - I don't know if this would make this function too "heavy".

Could this function be called using width = 14, then have pillar use an ellipsis whenever it requires something shorter (as you suggest, if I understand correctly)?

Here are the current results (Mac) for using width = 14:

https://gist.github.com/ijlyttle/58152793e0d7854961fdfc86d5cf60b9

@krlmlr
Copy link
Member

krlmlr commented Oct 31, 2017

Thanks. I think it's important to have consistent display across all invocations of printing a tibble (for the same width), so I suggest to compute a dictionary for the abbreviations of all OlsonNames(), and then just do plain lookup for displaying. It seems unlikely that the Olson names change while an R session is run, we could use memoise to avoid expensive computation all over.

I think the abbreviations will be better if one-, two- and three-component time zones are processed separately. We need to do a bit of gymnastics with map_int(..., length) and perhaps by() to achieve this. With strict = FALSE the abbreviate() function guarantees uniqueness, but we still may end up with non-unique (or too long) strings, which we then shorten with an ellipsis (which may be three characters wide on some systems). The shortening is applied to the dictionary of time zones, and available automatically for all subsequent displays.

Would that work?

@ijlyttle
Copy link
Author

ijlyttle commented Nov 5, 2017

I think I get the idea - let me wrestle with the newly-vectorized function so that it will do what you describe. In the short term, I can get us to a non-memoised version. I need to do some reading on memoise, as I have not yet used it.

@ijlyttle
Copy link
Author

ijlyttle commented Nov 6, 2017

@krlmlr - Another question for you: your vectorization uses purrr functions, but pillar does not import purrr. Does this present a difficulty?

I have started to see what I can do using only base - I think there may be a way through, but I would sure like to have dplyr::summarise() :) (I think I can get aggregate() to work).

@ijlyttle
Copy link
Author

ijlyttle commented Jan 11, 2018

Hi @krlmlr,

I am embarrassed to say that I have not used memoise yet, so I am on shaky ground. My idea is to:

  • add memoise to Suggests
  • add abbreviate_olson <- memoise::memoise(abbreviate_olson) (if available) to utils-olson.R
  • modify type_sum.POSIXct() in type_sum.R to call abbreviate_olson() then return the element corresponding to attr(x, "tzone") (prepending "dttm-")

Is this a plausible way?

It is not evident to me how type_sum.POSIXct() will know the available width. Can you point me in the right direction here?

@krlmlr
Copy link
Member

krlmlr commented Jan 11, 2018

I totally forgot type_sum() doesn't have a width argument yet. Even if we add it now, not all implementers will support it right away, and we'll need to work around this. We could also show the time zone information in a separate row, but I'm not sure we want this (#73).

Maybe we should stick to a hard-coded width of 12 (width of date minus width of <dttm >), or maybe a width of 14 (width of date minus width of <dt >), and leave dynamic width for later?

Your suggested approach looks like the safest way to avoid a build-time dependency on memoise, I keep forgetting about this problem. Would it work to just overwrite abbreviate_olson with memoise(abbreviate_olson)? Also, maybe we just suggest memoise and do this transformation only if the package is installed?

@ijlyttle
Copy link
Author

ijlyttle commented Jan 11, 2018

I think I get the idea here. Let me see what I can do.

Crazy question: how should we approach the case where the tzone is not set? On my computer (MacOS), it uses the system timezone for the display.

The simplest thing to do, for now, would be to use the default <dttm> to indicate to timezone is not set, which I will do unless you want to do something else.

@krlmlr
Copy link
Member

krlmlr commented Jan 11, 2018

Thanks. Agree to use <dttm> if time zone is unset.

@ijlyttle
Copy link
Author

ijlyttle commented Jan 11, 2018

@krlmlr I have everything done - I imagine that the last thing that I did, updating type_sum.POSIXct(), broke a bunch of tests.

I can poke around to amend the tests, but any guidance you can provide will be very welcome.

@ijlyttle
Copy link
Author

ijlyttle commented Jan 11, 2018

Aside from tests, here's what happens on my computer:

library("lubridate")
library("pillar")
now_tz <- with_tz(now(), "America/New_York")
type_sum(now_tz)
[1] "dtm-Amer/New_York"
library("tibble")
as_tibble(now_tz)
# A tibble: 1 x 1
  value              
  <dttm-Amr/New_York>
1 2018-01-11 15:04:43

Now that things are "working", I thought I might raise some options because 12 characters seems a little confining.

  1. Leave as is.
  2. Use "dtm" rather than "dttm" as the prefix - buys us a character.
  3. Keep "dttm" as header when no timezone, use only abbreviated-timezone when there is a timezone, e.g. <Amer/New_York> - buys us five characters.
  4. Something else.

Thoughts?

@ijlyttle
Copy link
Author

ijlyttle commented Jan 11, 2018

One last question - can you have a look at the implementation of memoize:
https://github.com/ijlyttle/pillar/blob/d7e0ca5c3c0cb7f2d19b1f9cba742b8a6d4b53df/R/utils-olson.R#L70

I could not get this to work in .onLoad (maybe I need to use <<-).

It works for me this way, but I have no idea if this is a good practice or a horrible practice.

Thanks!

@krlmlr
Copy link
Member

krlmlr commented Jan 11, 2018

Some tests create files with output, and fail the first time they are run if the output changes. They should work the second time, but you may need to rerun all tests a few times until convergence. (testtthat aborts early if there are too many failures.) Can you show the output for the various options you're suggesting?

The current memoise implementation installs the function during build time, it's better to do in .onLoad(). Have you tried to assign() to the package environment?

@ijlyttle
Copy link
Author

ijlyttle commented Jan 11, 2018

I will fiddle with tests, as well as get things working with .onLoad().

Here are the assumed outputs resulting from the options.

library("lubridate")
library("pillar")
library("tibble")

now_tz <- with_tz(now(), "America/New_York")
as_tibble(now_tz)

As is:

# A tibble: 1 x 1
  value              
  <dttm-Amr/New_York>
1 2018-01-11 15:04:43

"dttm" -> "dtm":

# A tibble: 1 x 1
  value              
  <dtm-Amer/New_York>
1 2018-01-11 15:04:43

Discarding "dttm-" if timezone present:

# A tibble: 1 x 1
  value              
  <Amer/New_York>
1 2018-01-11 15:04:43

Same option, but run with timezone absent:

as_tibble(now())
# A tibble: 1 x 1
  value              
  <dttm>
1 2018-01-11 14:04:43

@ijlyttle
Copy link
Author

Hi @krlmlr,

Good news: I think I have .onLoad() worked out, and it behaves well for me. code

Bad news: I am having trouble to get the tests to do what you describe - I keep getting the same errors again and again, no convergence.

Error for this test:

> devtools::test(filter = "time")
Loading pillar
Testing pillar
format_time: 1................

Failed ------------------------------------------------------------------------------------------------
1. Error: output test (@test-format_time.R#4) ---------------------------------------------------------
attempt to select less than one element in get1index
1: expect_pillar_output(as.POSIXct("2017-07-28 18:04:35 +0200"), filename = "time.txt") at /Users/ijlyttle/Documents/git/github/public_forked/pillar/tests/testthat/test-format_time.R:4
2: expect_pillar_output_utf8(object_quo, filename, output_width) at /Users/ijlyttle/Documents/git/github/public_forked/pillar/tests/testthat/helper-output.R:21
3: expect_known_display(object = !(!object_quo), file = file.path("out", filename), crayon = TRUE, width = output_width) at /Users/ijlyttle/Documents/git/github/public_forked/pillar/tests/testthat/helper-output.R:27
4: testthat::expect_output_file(print(eval_tidy(object)), file, update = TRUE) at /Users/ijlyttle/Documents/git/github/public_forked/pillar/R/testthat.R:53
5: capture_output_as_vector(object)
6: with_sink(temp, withVisible(code))
7: withVisible(code)
8: print(eval_tidy(object))
9: eval_tidy(object)
10: overscope_eval_next(overscope, expr)
11: get_pillar_output_object(x, ..., xp = xp, xf = xf)
12: pillar(xp, ...) at /Users/ijlyttle/Documents/git/github/public_forked/pillar/tests/testthat/helper-output.R:52
13: pillar_type(x, ...) at /Users/ijlyttle/Documents/git/github/public_forked/pillar/R/pillar.R:38
14: as_character(type_sum(x) %||% "") at /Users/ijlyttle/Documents/git/github/public_forked/pillar/R/type.R:10
15: coerce_type_vec(x, friendly_type("character"), string = , character = set_chr_encoding(set_attrs(x, NULL), 
       encoding))
16: type_of(.x)
17: type_sum(x) %||% ""
18: is_null(x)
19: type_sum(x)
20: type_sum.POSIXct(x) at /Users/ijlyttle/Documents/git/github/public_forked/pillar/R/type-sum.R:11

DONE ==================================================================================================

@ijlyttle
Copy link
Author

ijlyttle commented Jan 14, 2018

Hi @krlmlr,

I found that the tests were not passing because I had an actual problem in my code; I have taken care of it.

The tests run OK now, but now I am having a problem running devtools::document().

Edit - having updated to latest github version of hadley/devtools, I get:

> devtools::document()
Updating pillar documentation
Loading pillar
Writing expect_known_display.Rd
Error in usethis::use_build_ignore(file_name, base_path = base_path) : 
  unused argument (base_path = base_path)

I see this is addressed in r-lib/pkgapi#21.

Once that is sorted out, it remains to see if you want to change the styling of the header (above).

@krlmlr krlmlr added this to To Do in krlmlr Jan 15, 2018
@krlmlr
Copy link
Member

krlmlr commented May 23, 2018

Thanks for your patience. I suspect this should live elsewhere (perhaps in lubridate?), tidyverse/tibble#411 (or a variant thereof) will provide the necessary infrastructure.

@ijlyttle
Copy link
Author

I agree, I will look elsewhere :)

@ijlyttle ijlyttle closed this May 23, 2018
@krlmlr
Copy link
Member

krlmlr commented May 23, 2018

Thanks again! Let me know when you find a home for this useful functionality.

@krlmlr krlmlr moved this from To Do to Done in krlmlr May 24, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
krlmlr
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants