-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New implementations of times.parse & times.format #8094
Conversation
I forgot that the code must survive bootstraping, so still some work remaining |
a5d62bb
to
639814e
Compare
This is awesome! I love the new mini-language! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very impressive work! This is awesome! I feel like the times module is shaping up to be a real gem.
General Suggestions:
- Prudent use of
toOpenArray
might improve performance even more. That being said, there might be term-rewriting macros in the future that will do this automatically. - Personally, I would break up some of the parsing/formatting routines into multiple sub-procedures (
parsePattern
in particular). Thoughts? - I wonder if some of the internal type names are too generic (
Token
, etc.).
lib/pure/times.nim
Outdated
@@ -7,35 +7,127 @@ | |||
# distribution, for details about the copyright. | |||
# | |||
|
|||
##[ | |||
This module contains routines and types for dealing with time using a proleptic Gregorian calendar. | |||
It's is available for the `JavaScript target <backends.html#the-javascript-target>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Should be "it's also available"
lib/pure/times.nim
Outdated
This module contains routines and types for dealing with time using a proleptic Gregorian calendar. | ||
It's is available for the `JavaScript target <backends.html#the-javascript-target>`_. | ||
|
||
The types uses nanosecond time resolution, but the underlying resolution used by ``getTime()`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Although the types use nanosecond"
", the underlying"
lib/pure/times.nim
Outdated
echo "An hour from now : ", now() + 1.hours | ||
echo "An hour from (UTC) now: ", getTime().utc + initDuration(hours = 1) | ||
|
||
Parsing and formatting dates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs title-case
lib/pure/times.nim
Outdated
============= ================================================================================= ================================================ | ||
Pattern Description Example | ||
============= ================================================================================= ================================================ | ||
``d`` Numeric value of the day of the month, it will be one or two digits long. | ``1/04/2012 -> 1`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"will be either one or"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Numeric value representing the day"
lib/pure/times.nim
Outdated
============= ================================================================================= ================================================ | ||
``d`` Numeric value of the day of the month, it will be one or two digits long. | ``1/04/2012 -> 1`` | ||
| ``21/04/2012 -> 21`` | ||
``dd`` Same as above, but always two digits. | ``1/04/2012 -> 01`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"but is always"
@@ -36,8 +36,8 @@ let utcPlus2 = Timezone(zoneInfoFromUtc: staticZoneInfoFromUtc, zoneInfoFromTz: | |||
block timezoneTests: | |||
let dt = initDateTime(01, mJan, 2017, 12, 00, 00, utcPlus2) | |||
doAssert $dt == "2017-01-01T12:00:00+02:00" | |||
doAssert $dt.utc == "2017-01-01T10:00:00+00:00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there so fewer tests here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean why there are so few tests in tests/js/ttimes.nim
? The times tests should probably just be merged into a single file that runs for both C & JS. I can fix it in a separate PR.
lib/pure/times.nim
Outdated
|
||
proc toDateTime(p: ParsedTime, zone: Timezone, f: TimeLayout, | ||
input: string): DateTime = | ||
var month = mJan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to merge the declarations and assignments here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that originally, but then the compiler can't prove that month
is initialized for some reason
lib/pure/times.nim
Outdated
else: | ||
result = false | ||
of y, yyy, yyyyy: | ||
raise newException(ValueError, "The pattern '" & $pattern & "' " & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strformat's fmt
/&
can be used here.
lib/pure/times.nim
Outdated
else: | ||
result = false | ||
of g: | ||
if input[i..i+1].cmpIgnoreCase("BC") == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Suggestion) These could possibly be optimized through use of the new toOpenArray
proc.
if result: | ||
i.inc 3 | ||
of dddd: | ||
if input.substr(i, i+5).cmpIgnoreCase("sunday") == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could possibly be optimized through use of toOpenArray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string comparisons could definitely be optimized further, but I'll leave it for another time.
Thanks for the review, it should now be addressed :) Some code could probably be extracted from |
Fails with
|
lib/pure/times.nim
Outdated
result.add $dt.second | ||
of ss: | ||
result.add dt.second.intToStr(2) | ||
of fff, ffffff, fffffffff: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats not a good solution. What happens, if you don't have as many nanosecond-digits as requested in the format-string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea what I though when I implemented it like that... Thanks for catching it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but some things that I would like to see changed, mainly bikeshedding :)
lib/pure/times.nim
Outdated
|
||
TimeLayout* = object ## Represents a format for parsing and printing | ||
## time types. | ||
patterns: seq[byte] ## \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this encoded as bytes? Wouldn't seq[LayoutPattern]
make more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think LayoutPattern
should be called TimePattern
. It doesn't have much to do with the layout of the pattern so I'm not sure why you named it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now noticed that you are referring to these format specifiers as "layout patterns" which is just confusing to me. Layout to me means "add 5 spaces before this string" or "indent and wrap these two lines so that they fit 80 characters", it's not about formatting time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename all of these types and use the word "Format" instead of "Layout"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I wanted to avoid using "format" is because of the ambiguity (since it can be both a verb and a noun in this context). But English isn't my first language and you're probably right that using "format" anyway is better :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this encoded as bytes? Wouldn't seq[LayoutPattern] make more sense?
See the doc comment for this field. Basically TimeLayout.patterns
not only contains LayoutPattern
values, but also arbitrary bytes that are treated as text. This is a bit hackish, but it seems to performs well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the doc comment for this field. Basically TimeLayout.patterns not only contains LayoutPattern values, but also arbitrary bytes that are treated as text. This is a bit hackish, but it seems to performs well.
Isn't this ambiguous? dddd.byte == 3.byte
? What if I want \3
in my string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this ambiguous? dddd.byte == 3.byte? What if I want \3 in my string?
Each literal sequence is prefixed by LayoutPattern.Lit
and the length of the literal sequence
lib/pure/times.nim
Outdated
## be encoded as ``@[Lit.byte, 3.byte, 'f'.byte, 'o'.byte, 'o'.byte]``. | ||
layout: string | ||
|
||
const LayoutPatternSeperators = { ' ', '-', '/', ':', '(', ')', '[', ']', ',' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this means that these characters separate the time format pattern from a different layout pattern that can be used to lay out the time string (similar to how floats can be indented etc.)
Please change this naming scheme. These should be called PatternLiterals
or something.
lib/pure/times.nim
Outdated
|
||
currentF = "" | ||
template yieldcurrToken() = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: yieldCurrToken
lib/pure/times.nim
Outdated
|
||
yieldcurrToken() | ||
|
||
proc stringToPattern(str: string): LayoutPattern = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified into parseEnum[LayoutPattern](str)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseEnum
is case insensitive, which doesn't work for this enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseEnum[LayoutPattern](str.toLower())
? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that parseEnum
doesn't care about case at all, see #7686. LayoutPattern
has values that only differs in case.
lib/pure/times.nim
Outdated
var year: int | ||
var monthday: int | ||
(year, month, monthday) = | ||
if p.year.isNone or p.month.isNone or p.monthday.isNone: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if is unnecessary, you can just use the first branch or is this just an optimisation to prevent calling now
unnecessarily?
If so, please add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or is this just an optimisation to prevent calling now unnecessarily?
Bingo, now
is quite expensive. I'll add a comment.
lib/pure/times.nim
Outdated
result = format(dt, "yyyy-MM-dd'T'HH:mm:sszzz") # todo: optimize this | ||
except ValueError: assert false # cannot happen because format string is valid | ||
doAssert $dt == "2000-01-01T12:00:00Z" | ||
result = format(dt, "yyyy-MM-dd'T'HH:mm:sszzz") # todo: optimize this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does format
no longer raise? Also, maybe we can just remove that "TODO" now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The static[T]
overloads means that errors in the format string are cough at compile time, so if the format string is known at compile time format
wont raise any exception
A bit late to the party but just wanted to mention this to make sure this was considered:
var str : string = getString()
# to use a runtime `str` instead of fixed string `'T'` in format(x, "yyyy-MM-dd'T'HH:mm:sszzz") we'd need:
format(x, "yyyy-MM-dd'" & std.escapeSingleQuote & "'HH:mm:sszzz")
My suggestion was instead doing this: var some_variable : string = getString()
format(x, " some_inline_string {yyyy}-{MM}-{dd}{some_variable}{HH}:{mm}:{sszzz}") which could be implemented in terms of |
It's definitely awkward, but what's the use case? IMO import strformat, times
let dt = now()
echo fmt"Date: {dt:MMMM yyyy}" |
lib/pure/times.nim
Outdated
case f[i] | ||
of '\'': | ||
yieldcurrToken() | ||
if f[i.succ] == '\'': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing index check.
lib/pure/times.nim
Outdated
inc(i) | ||
else: result.add(f[i]) | ||
|
||
while f[i] != '\'' and i < f.high: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check in the wrong order.
Right, I would say an object variant would be better. Is there a reason you can't use that? It's cool though, we can merge this and fix this later if necessary. |
The nice thing about the current design is that it will never use more than a single |
These packed representations based on |
Unrelated CI failures. Merging. |
A data structure/type/DSL that maps to a |
This PR contains new improved implementations of
times.parse
andtimes.format
.static[string]
to validate layout at compile time when possibleValueError
when something goes wrongChanges in the layout mini-language
g
pattern for era (AD or BC)zzzz
pattern for UTC offset including secondsuuuu
pattern for astronomical year padded to four digitsUUUU
pattern for astronomical year without paddingYYYY
pattern for year without paddingy
,yyy
andyyyyy
. These patterns are not useful and complicate the mini-language for no good reason.The
uuuu
/yyyy
patterns now prepend a '+' when the number of digits in the year is more than four (unless it'suuuu
and the year is negative). This way, the iso formatyyyyMMdd
works as long as the year is in the range 1..9999. This behavior is consistent with Java (and maybe other languages as well).Non-patterns that aren't separators must now always be surrounded by
'
. This was the document behavior before as well, but the old implementation allowed non-quoted text anyway.Benchmark
The new implementations perform significantly better, especially parse. Naive benchmarch.
Result:
Fixes #7017
Fixes #7189