New implementations of times.parse & times.format #8094

GULPF · 2018-06-23T14:37:39Z

This PR contains new improved implementations of times.parse and times.format.

More powerful mini-language
Better performance
Uses static[string] to validate layout at compile time when possible
More consistent error handling, should now always raise a ValueError when something goes wrong
Several minor bugs/limitations fixed

Changes in the layout mini-language

All z* patterns now output 'Z' for UTC
Added g pattern for era (AD or BC)
Added zzzz pattern for UTC offset including seconds
Added uuuu pattern for astronomical year padded to four digits
Added UUUU pattern for astronomical year without padding
Added YYYY pattern for year without padding
Deprecated y, yyy and yyyyy. These patterns are not useful and complicate the mini-language for no good reason.

The uuuu/yyyy patterns now prepend a '+' when the number of digits in the year is more than four (unless it's uuuu and the year is negative). This way, the ISO format yyyyMMdd works as long as the year is in the range 1..9999. This behavior is consistent with Java (and maybe other languages as well).

Non-patterns that aren't separators must now always be surrounded by '. This was the documented behavior before as well, but the old implementation allowed non-quoted text anyway.
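
As an illustration of the quoting rule and the 'Z' output for UTC, here is a minimal sketch. The sample date is made up, and `initDateTime` is used with the argument order seen in this PR's tests (newer Nim versions may prefer a different constructor).

```nim
import times

# Hypothetical sample value; any UTC DateTime would do.
let dt = initDateTime(1, mJan, 2000, 12, 0, 0, utc())

# 'T' is literal text, so it is quoted; '-' and ':' are separators.
let s = dt.format("yyyy-MM-dd'T'HH:mm:sszzz")
doAssert s == "2000-01-01T12:00:00Z"   # zzz prints 'Z' for UTC

# The same layout parses the string back to the same point in time.
let parsed = parse(s, "yyyy-MM-dd'T'HH:mm:sszzz", utc())
doAssert parsed == dt
```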

Benchmark

The new implementations perform significantly better, especially parse. Naive benchmark.

Result:

  Before:
    parse(x, "yyyy-MM-dd'T'HH:mm:sszzz", utc())     303 ms
    parse(x, "yyyy-MM-dd'T'HH:mm:sszzz", local())   394 ms
    format(x, "yyyy-MM-dd'T'HH:mm:sszzz")           147 ms

  After:
    parse(x, "yyyy-MM-dd'T'HH:mm:sszzz", utc())     130 ms
    parse(x, "yyyy-MM-dd'T'HH:mm:sszzz", local())   224 ms
    format(x, "yyyy-MM-dd'T'HH:mm:sszzz")           112 ms
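
The benchmark code itself is not reproduced here. A measurement of this kind could be sketched roughly as follows; the input string, the iteration count and the helper name `benchParse` are assumptions for illustration, not the code that produced the numbers above.

```nim
import times

proc benchParse(iterations: int): float =
  ## Returns the CPU seconds spent parsing one timestamp `iterations` times.
  let x = "2018-06-23T14:37:39+02:00"   # hypothetical input
  let start = cpuTime()
  for _ in 1 .. iterations:
    discard parse(x, "yyyy-MM-dd'T'HH:mm:sszzz", utc())
  cpuTime() - start

echo "parse: ", benchParse(100_000), " s"
```

Timings from a loop like this depend heavily on compiler flags (for example -d:release), so they are only useful for before/after comparisons on the same machine.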

Fixes #7017
Fixes #7189

GULPF · 2018-06-23T14:52:01Z

I forgot that the code must survive bootstrapping, so there is still some work remaining.

jyapayne · 2018-06-24T05:40:30Z

This is awesome! I love the new mini-language!

Varriount

Very impressive work! This is awesome! I feel like the times module is shaping up to be a real gem.

General Suggestions:

Prudent use of toOpenArray might improve performance even more. That being said, there might be term-rewriting macros in the future that will do this automatically.
Personally, I would break up some of the parsing/formatting routines into multiple sub-procedures (parsePattern in particular). Thoughts?
I wonder if some of the internal type names are too generic (Token, etc.).

Varriount · 2018-06-24T07:05:57Z

lib/pure/times.nim

@@ -7,35 +7,127 @@
 #    distribution, for details about the copyright.
 #

+##[
+  This module contains routines and types for dealing with time using a proleptic Gregorian calendar.
+  It's is available for the `JavaScript target <backends.html#the-javascript-target>`_.


Typo: Should be "it's also available"

Varriount · 2018-06-24T07:07:35Z

lib/pure/times.nim

+  This module contains routines and types for dealing with time using a proleptic Gregorian calendar.
+  It's is available for the `JavaScript target <backends.html#the-javascript-target>`_.
+
+  The types uses nanosecond time resolution, but the underlying resolution used by ``getTime()``


"Although the types use nanosecond"
", the underlying"

Varriount · 2018-06-24T07:08:23Z

lib/pure/times.nim

+    echo "An hour from now      : ", now() + 1.hours
+    echo "An hour from (UTC) now: ", getTime().utc + initDuration(hours = 1)
+
+  Parsing and formatting dates


Needs title-case

Varriount · 2018-06-24T07:09:18Z

lib/pure/times.nim

+  =============  =================================================================================  ================================================
+  Pattern        Description                                                                        Example
+  =============  =================================================================================  ================================================
+  ``d``          Numeric value of the day of the month, it will be one or two digits long.          | ``1/04/2012 -> 1``


"will be either one or"

"Numeric value representing the day"

Varriount · 2018-06-24T07:09:39Z

lib/pure/times.nim

+  =============  =================================================================================  ================================================
+  ``d``          Numeric value of the day of the month, it will be one or two digits long.          | ``1/04/2012 -> 1``
+                                                                                                    | ``21/04/2012 -> 21``
+  ``dd``         Same as above, but always two digits.                                              | ``1/04/2012 -> 01``


"but is always"

Varriount · 2018-06-24T07:30:05Z

tests/js/ttimes.nim

@@ -36,8 +36,8 @@ let utcPlus2 = Timezone(zoneInfoFromUtc: staticZoneInfoFromUtc, zoneInfoFromTz:
 block timezoneTests:
  let dt = initDateTime(01, mJan, 2017, 12, 00, 00, utcPlus2)
  doAssert $dt == "2017-01-01T12:00:00+02:00"
-  doAssert $dt.utc == "2017-01-01T10:00:00+00:00"


Why are there so fewer tests here?

Do you mean why there are so few tests in tests/js/ttimes.nim? The times tests should probably just be merged into a single file that runs for both C & JS. I can fix it in a separate PR.

Varriount · 2018-06-24T07:34:59Z

lib/pure/times.nim

+
+proc toDateTime(p: ParsedTime, zone: Timezone, f: TimeLayout,
+                input: string): DateTime =
+  var month = mJan


Wouldn't it be better to merge the declarations and assignments here?

I did that originally, but then the compiler can't prove that month is initialized for some reason

Varriount · 2018-06-24T07:35:37Z

lib/pure/times.nim

+    else:
+      result = false
+  of y, yyy, yyyyy:
+    raise newException(ValueError, "The pattern '" & $pattern & "' " &


strformat's fmt/& can be used here.
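
A sketch of what the suggestion could look like. The `Pattern` enum is a hypothetical stand-in for the module's pattern type, and the message wording after the pattern name is invented for the example, since the original message is cut off in the diff.

```nim
import strformat

type Pattern = enum   # hypothetical stand-in for the module's pattern enum
  y, yyy, yyyyy

proc deprecationMessage(pattern: Pattern): string =
  # The text after the pattern name is made up for this example.
  &"The pattern '{$pattern}' is deprecated"

doAssert deprecationMessage(yyy) == "The pattern 'yyy' is deprecated"
```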

Varriount · 2018-06-24T07:37:24Z

lib/pure/times.nim

+    else:
+      result = false
+  of g:
+    if input[i..i+1].cmpIgnoreCase("BC") == 0:


(Suggestion) These could possibly be optimized through use of the new toOpenArray proc.

Varriount · 2018-06-24T07:38:25Z

lib/pure/times.nim

+    if result:
+      i.inc 3
+  of dddd:
+    if input.substr(i, i+5).cmpIgnoreCase("sunday") == 0:


These could possibly be optimized through use of toOpenArray.

The string comparisons could definitely be optimized further, but I'll leave it for another time.
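
For reference, a comparison based on toOpenArray might look like the sketch below. The helper names `eqIgnoreCase` and `matchesAt` are hypothetical and not part of the times module; the point is only that the slice is viewed in place instead of being copied by `substr` or `input[i..j]`.

```nim
import strutils

proc eqIgnoreCase(a, b: openArray[char]): bool =
  ## ASCII case-insensitive equality on character views.
  if a.len != b.len: return false
  for k in 0 ..< a.len:
    if toLowerAscii(a[k]) != toLowerAscii(b[k]): return false
  true

proc matchesAt(input: string, i: int, word: string): bool =
  ## True if `word` occurs at `input[i]`, ignoring case.
  ## The bounds are checked first, so the view never runs past the input.
  i >= 0 and word.len > 0 and i + word.len <= input.len and
    eqIgnoreCase(input.toOpenArray(i, i + word.len - 1), word)

doAssert matchesAt("Sunday, 1 July", 0, "sunday")
doAssert not matchesAt("Sun", 0, "sunday")
```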

GULPF · 2018-06-24T13:34:19Z

Thanks for the review, it should now be addressed :) Some code could probably be extracted from parsePattern into helpers, but I don't know if it would improve readability much since parsePattern will still have a huge case-statement.

Araq · 2018-06-25T14:54:13Z

Fails with

./koch nimsuggest
bin/nim c --noNimblePath -d:release -p:compiler nimsuggest/nimsuggest.nim
Hint: used config file '/home/travis/build/nim-lang/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/travis/build/nim-lang/Nim/nimsuggest/nimsuggest.nim.cfg' [Conf]
Hint: system [Processing]
Hint: nimsuggest [Processing]
Hint: strutils [Processing]
Hint: parseutils [Processing]
Hint: math [Processing]
Hint: bitops [Processing]
Hint: algorithm [Processing]
Hint: unicode [Processing]
Hint: os [Processing]
Hint: times [Processing]
Hint: options [Processing]
Hint: typetraits [Processing]
Hint: strformat [Processing]
Hint: macros [Processing]
Hint: posix [Processing]
lib/pure/times.nim(2201, 53) template/generic instantiation from here
lib/pure/times.nim(1657, 11) Error: can raise an unlisted exception: ref ValueError
FAILURE

skilchen · 2018-06-27T17:16:29Z

lib/pure/times.nim

+    result.add $dt.second
+  of ss:
+    result.add dt.second.intToStr(2)
+  of fff, ffffff, fffffffff:


That's not a good solution. What happens if you don't have as many nanosecond digits as requested in the format string?

No idea what I thought when I implemented it like that... Thanks for catching it

dom96

Looks good, but there are some things that I would like to see changed, mainly bikeshedding :)

dom96 · 2018-07-05T14:04:34Z

lib/pure/times.nim

+
+  TimeLayout* = object ## Represents a format for parsing and printing
+                       ## time types.
+    patterns: seq[byte] ## \


Why is this encoded as bytes? Wouldn't seq[LayoutPattern] make more sense?

Also, I think LayoutPattern should be called TimePattern. It doesn't have much to do with the layout of the pattern so I'm not sure why you named it this way.

I now noticed that you are referring to these format specifiers as "layout patterns" which is just confusing to me. Layout to me means "add 5 spaces before this string" or "indent and wrap these two lines so that they fit 80 characters", it's not about formatting time.

Please rename all of these types and use the word "Format" instead of "Layout"

The reason I wanted to avoid using "format" is because of the ambiguity (since it can be both a verb and a noun in this context). But English isn't my first language and you're probably right that using "format" anyway is better :)

Why is this encoded as bytes? Wouldn't seq[LayoutPattern] make more sense?

See the doc comment for this field. Basically TimeLayout.patterns not only contains LayoutPattern values, but also arbitrary bytes that are treated as text. This is a bit hackish, but it seems to perform well.

See the doc comment for this field. Basically TimeLayout.patterns not only contains LayoutPattern values, but also arbitrary bytes that are treated as text. This is a bit hackish, but it seems to perform well.

Isn't this ambiguous? dddd.byte == 3.byte? What if I want \3 in my string?

Isn't this ambiguous? dddd.byte == 3.byte? What if I want \3 in my string?

Each literal sequence is prefixed by LayoutPattern.Lit and the length of the literal sequence
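
To make the encoding concrete, here is a small sketch of a length-prefixed literal scheme of the kind described. The `Pattern` enum is a cut-down hypothetical stand-in, and `addLiteral`/`decode` are illustrative helpers, not the module's actual procs.

```nim
type Pattern = enum   # hypothetical subset of the real pattern enum
  d, dd, yyyy, Lit

proc addLiteral(buf: var seq[byte], text: string) =
  ## Appends a marker byte, a length byte, then the raw text bytes.
  ## This sketch assumes literals shorter than 256 bytes.
  buf.add byte(Lit)
  buf.add byte(text.len)
  for c in text: buf.add byte(c)

proc decode(buf: seq[byte]): seq[string] =
  ## Walks the buffer; bytes inside a literal are never read as patterns.
  var i = 0
  while i < buf.len:
    let p = Pattern(buf[i])
    inc i
    if p == Lit:
      let n = int(buf[i])
      inc i
      var s = ""
      for k in 0 ..< n: s.add char(buf[i + k])
      i += n
      result.add "lit:" & s
    else:
      result.add "pat:" & $p

var buf: seq[byte]
buf.add byte(yyyy)
buf.addLiteral("\x02")   # same byte value as the yyyy pattern in this enum
doAssert decode(buf) == @["pat:yyyy", "lit:\x02"]
```

Because the decoder skips the literal bytes using the stored length, a literal byte that happens to equal a pattern's ordinal is not ambiguous.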

dom96 · 2018-07-05T14:10:40Z

lib/pure/times.nim

+      ## be encoded as ``@[Lit.byte, 3.byte, 'f'.byte, 'o'.byte, 'o'.byte]``.
+    layout: string
+
+const LayoutPatternSeperators = { ' ', '-', '/', ':', '(', ')', '[', ']', ',' }


To me this means that these characters separate the time format pattern from a different layout pattern that can be used to lay out the time string (similar to how floats can be indented etc.)

Please change this naming scheme. These should be called PatternLiterals or something.

dom96 · 2018-07-05T14:12:12Z

lib/pure/times.nim


-      currentF = ""
+  template yieldcurrToken() =


Nitpick: yieldCurrToken

dom96 · 2018-07-05T14:20:36Z

lib/pure/times.nim

+
+  yieldcurrToken()
+
+proc stringToPattern(str: string): LayoutPattern =


This can be simplified into parseEnum[LayoutPattern](str).

parseEnum is case insensitive, which doesn't work for this enum

parseEnum[LayoutPattern](str.toLower())? :)

What I mean is that parseEnum doesn't care about case at all, see #7686. LayoutPattern has values that only differs in case.
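
A case-sensitive mapping can be written with a plain `case` on the string, as in the sketch below. The enum here is a hypothetical four-value subset chosen to show values that differ only in case; the proc name and the error text are likewise made up.

```nim
type Pattern = enum   # hypothetical subset with case-only differences
  mm, MM, hh, HH

proc toPattern(s: string): Pattern =
  ## String comparison in `case` is exact, so "mm" and "MM" stay distinct.
  case s
  of "mm": mm
  of "MM": MM
  of "hh": hh
  of "HH": HH
  else: raise newException(ValueError, "'" & s & "' is not a valid pattern")

doAssert toPattern("MM") == MM
doAssert toPattern("mm") == mm
```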

dom96 · 2018-07-05T14:23:50Z

lib/pure/times.nim

+  var year: int
+  var monthday: int
+  (year, month, monthday) =
+    if p.year.isNone or p.month.isNone or p.monthday.isNone:


This if is unnecessary, you can just use the first branch or is this just an optimisation to prevent calling now unnecessarily?

If so, please add a comment.

or is this just an optimisation to prevent calling now unnecessarily?

Bingo, now is quite expensive. I'll add a comment.
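
The optimisation being discussed can be sketched as follows. `resolveDate` is a hypothetical helper that takes the parsed fields as plain `Option[int]` values; the real code works on the parser's own types.

```nim
import options, times

proc resolveDate(year, month, day: Option[int]): (int, int, int) =
  ## Fills missing fields from the current date.
  if year.isNone or month.isNone or day.isNone:
    # now() is comparatively expensive, so it is only called
    # when at least one field actually needs a default.
    let n = now()
    (year.get(n.year), month.get(n.month.int), day.get(n.monthday.int))
  else:
    (year.get, month.get, day.get)

doAssert resolveDate(some(2018), some(6), some(23)) == (2018, 6, 23)
```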

dom96 · 2018-07-05T14:28:11Z

lib/pure/times.nim

-    result = format(dt, "yyyy-MM-dd'T'HH:mm:sszzz") # todo: optimize this
-  except ValueError: assert false # cannot happen because format string is valid
+    doAssert $dt == "2000-01-01T12:00:00Z"
+  result = format(dt, "yyyy-MM-dd'T'HH:mm:sszzz") # todo: optimize this


Does format no longer raise? Also, maybe we can just remove that "TODO" now.

The static[T] overloads mean that errors in the format string are caught at compile time, so if the format string is known at compile time, format won't raise any exception
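
The difference between the two cases can be seen in a sketch like this one. The helper `tryFormat` and the sample strings are assumptions for illustration; inside the helper the format string is an ordinary runtime parameter, so validation happens at run time and a bad string surfaces as a ValueError.

```nim
import times

let dt = initDateTime(1, mJan, 2000, 12, 0, 0, utc())

# Literal format string: validated when the program is compiled.
doAssert dt.format("yyyy-MM-dd") == "2000-01-01"

proc tryFormat(dt: DateTime, f: string): bool =
  ## Returns false if `f` is rejected as a format string at run time.
  try:
    discard dt.format(f)
    true
  except ValueError:
    false

let bad = "yyyy-MM-dd a"   # unquoted 'a' is not a known pattern
doAssert not tryFormat(dt, bad)
doAssert tryFormat(dt, "yyyy-MM-dd 'a'")
```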

timotheecour · 2018-07-07T10:09:18Z

A bit late to the party but just wanted to mention this to make sure this was considered:

This design is not as flexible as could be, eg, inserting a runtime-defined character inside the format string is a bit awkward
eg:

var str : string = getString()
# to use a runtime `str` instead of fixed string  `'T'` in format(x, "yyyy-MM-dd'T'HH:mm:sszzz") we'd need:
format(x, "yyyy-MM-dd'" & std.escapeSingleQuote & "'HH:mm:sszzz")

Also, it feels more magical (harder to distinguish special date variables in the string) and is inconsistent with strformat strings: fmt"hello my name is {str}"
Common libraries also use a technique analogous to strformat, where the special variables (e.g. MM) are marked as special instead of the other way around, e.g. in Python d.strftime("%d/%m/%y")

My suggestion was instead doing this:

var some_variable : string = getString()
format(x, " some_inline_string {yyyy}-{MM}-{dd}{some_variable}{HH}:{mm}:{sszzz}")

which could be implemented in terms of strformat.fmt

GULPF · 2018-07-07T10:46:08Z

This design is not as flexible as could be, eg, inserting a runtime-defined character inside the format string is a bit awkward

It's definitely awkward, but what's the use case? times.format should not be used for general string formatting, that's what strformat is for. I can't imagine a date time format that requires interpolation with a runtime string.

IMO times.format should be used from strformat.fmt, not the other way around. This is already possible:

import strformat, times
let dt = now()
echo fmt"Date: {dt:MMMM yyyy}"

Araq · 2018-07-07T21:01:38Z

lib/pure/times.nim

+    case f[i]
+    of '\'':
+      yieldcurrToken()
+      if f[i.succ] == '\'':


Missing index check.

Araq · 2018-07-07T21:01:56Z

lib/pure/times.nim

-          inc(i)
-      else: result.add(f[i])
-
+        while f[i] != '\'' and i < f.high:


Check in the wrong order.
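
The point of both remarks is that the index has to be tested before the string is indexed, relying on `and` short-circuiting. A minimal sketch, using a hypothetical `readQuoted` helper that ignores the '' escape handled by the real tokenizer:

```nim
proc readQuoted(f: string, start: int): (string, int) =
  ## `start` is the index of the opening quote.
  ## Returns the literal text and the index just past the closing quote.
  var i = start + 1
  var lit = ""
  # Bounds check first: `and` short-circuits, so f[i] is only
  # evaluated when i is a valid index.
  while i < f.len and f[i] != '\'':
    lit.add f[i]
    inc i
  if i >= f.len:
    raise newException(ValueError, "unterminated literal in format string")
  (lit, i + 1)

doAssert readQuoted("'T'HH", 0) == ("T", 3)
```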

dom96 · 2018-07-09T13:09:02Z

Isn't this ambiguous? dddd.byte == 3.byte? What if I want \3 in my string?

Each literal sequence is prefixed by LayoutPattern.Lit and the length of the literal sequence

Right, I would say an object variant would be better. Is there a reason you can't use that?

It's cool though, we can merge this and fix this later if necessary.

GULPF · 2018-07-09T14:14:44Z

Right, I would say an object variant would be better. Is there a reason you can't use that?

The nice thing about the current design is that it will never use more than a single ref. An object variant would require additional refs. It would affect performance, but maybe not by much.

Araq · 2018-07-09T14:43:31Z

The nice thing about the current design is that it will never use more than a single ref. An object variant would require additional refs. It would affect performance, but maybe not by much.

These packed representations based on seq[byte] are the future, please keep it this way.

Araq · 2018-07-09T18:04:13Z

Unrelated CI failures. Merging.

dom96 · 2018-07-10T17:17:28Z

These packed representations based on seq[byte] are the future, please keep it this way.

A data structure/type/DSL that maps to a seq[byte] perhaps, what I really dislike is the lack of type safety in the current approach.

GULPF force-pushed the new-parse-format branch 4 times, most recently from a5d62bb to 639814e Compare June 23, 2018 21:54

GULPF force-pushed the new-parse-format branch from 639814e to 3c36c20 Compare June 24, 2018 07:17

Varriount requested changes Jun 24, 2018

Varriount assigned Varriount and dom96 Jun 25, 2018

GULPF force-pushed the new-parse-format branch from 1006e4a to 30852b3 Compare June 25, 2018 17:10

skilchen reviewed Jun 27, 2018

New implementations of times.parse & times.format

40146be

GULPF force-pushed the new-parse-format branch from 30852b3 to 40146be Compare June 27, 2018 18:52

dom96 requested changes Jul 5, 2018

skilchen mentioned this pull request Jul 5, 2018

times.format: should allow writing arbitrary strings inside format string, like python strftime("D%Y%m%dT%H%M%S") #8214

Closed

Araq reviewed Jul 7, 2018

Review fixes

d9b971a

Araq merged commit 3b310e9 into nim-lang:devel Jul 9, 2018

timotheecour mentioned this pull request Jul 11, 2018

[regression] [times.format] Error: attempting to call undeclared routine: 'format' #8273

Closed

timotheecour mentioned this pull request Jul 14, 2018

fix #8273 times format regression, and fix inconsistent ordering in 1 format overload #8290

Merged

This was referenced Jul 14, 2018

Small/large dates #6467

Closed

times module: table repeated twice #8385

Closed

