Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for DateTime types #287

Merged
merged 22 commits into from
Feb 9, 2024
Merged

Conversation

eminence
Copy link
Contributor

@eminence eminence commented Dec 31, 2023

This is an initial draft at adding DateTime handling into numbat. This is incomplete, but I'm opening the PR now to start a discussion about the general approach.

Some review notes:

  • DateTimes didn't really feel like any of the existing types, so they are an entirely new thing (so the parser, tokenize, AST, etc all are updated accordingly)

  • At the moment, the only way to construct a DateTime object is with the new now() function, but I envision some new date parsing functions (like a strptime) at a minimum, and maybe dedicated handling in the parser for well-formatted dates

  • Two operations on DateTimes have been implemented:

    • Adding/Subtracting a duration (something of the Time dimension) to a DateTime to produce another DateTime
    • Subtracting two DateTimes to produce something of type Time.

    Because of this dependency on having a Time dimension, the typechecker has some special code to check for times, which feels a bit ugly (but might be unavoidable)

    What other operations can we perform on dates?

  • Controlflow in typechecker is a bit of a mess as written, and deserves to be improved somehow

  • The BinaryOperatorForDate variant needs to be redesigned, I think.

  • Tests and error handling is not added yet

  • I'm using chrono as the underlying date/time library. time is another good choice. I picked chrono just because I have more personal experience with it. Both crates support wasm as far as I know.

Small demo:

>>> let x = now()

  let x: DateTime = now()

>>> x

  x

    = 2023-12-31T17:09:48.505882820Z    [DateTime]

>>> now() - x -> milliseconds

  now() - x ➞ millisecond

    = 9088.02 ms    [Time]

>>> x + 2 days + 4 seconds

  (x + 2 day) + 4 second

    = 2024-01-02T17:09:52.505882820Z    [DateTime]

>>> now() + (180 miles / 55 mph)

  now() + (180 mile / 55 mph)

    = 2023-12-31T20:26:54.595755947Z    [DateTime]

>>>

CC #181

@sharkdp
Copy link
Owner

sharkdp commented Jan 2, 2024

This is just awesome — thank you very much for looking into this!

  • DateTimes didn't really feel like any of the existing types, so they are an entirely new thing (so the parser, tokenize, AST, etc all are updated accordingly)

Exactly what I envisioned 👍

some new date parsing functions (like a strptime) at a minimum

👍

and maybe dedicated handling in the parser for well-formatted dates

I think this could be a second step maybe (and potentially needs some thought), to keep this PR manageable?

Two operations on DateTimes have been implemented:

* Adding/Subtracting a duration (something of the `Time` dimension) to a `DateTime` to produce another `DateTime`

* Subtracting two `DateTime`s to produce something of type `Time`.

Because of this dependency on having a Time dimension, the typechecker has some special code to check for times, which feels a bit ugly (but might be unavoidable)

Ah, right. That is a bit unfortunate, yes. So far, we managed to decouple the unit- and dimension system (the prelude) fairly well from the language implementation. Maybe we can still achieve this if we 'register' the Time dimension somehow (like with a @-decorator when defining dimension Time)? Or maybe it helps to introduce a hidden __Time dimension (I currently don't see how 😄)?

I also have a somewhat related question: are the current quantities of dimension Time really suitable as durations? A year, for example, is defined as an average (Gregorian) year with a length of 365.2425 days. So running now() + 1 year today (Jan 2nd, 2024... not too late) will actually show me the Jan 1st, 2025, approximately 6 hours later than it is now. Because (a) 2024 is a leap year, i.e. has 366 days and (b) those 0.2425 days will shift the time by 5.82 hours.

A similar problem appears when we add/subtract months. But most users would probably expect the datetime to be shifted by a year from now (i.e. the exact same datetime, but with 2025 instead of 2024) or by a month from now. So we probably need to distinguish between years and calendar years. And similar for months.

I'm bringing this up here because we might need another special type anyway (maybe Duration vs Time).

A much more conservative approach would be to only have functions like add_months(dt: DateTime, num_months: Scalar) -> DateTime. But this would also render this much less useful.

What do you think?

What other operations can we perform on dates?

Despite what I said above, I think what you have already is really powerful! Calculations like now() + 300 minutes or now() + 7.5 hours come up all the time and already work as expected. And the other big thing (for me) are differences, which also work... and would be really useful if we had a special syntax in the parser like @17:15 - @8:40 ("how many hours have I worked today?") or @2023-12-14 - @2023-07-25 ("how many days are there between Dec 14th and July 25th?").

The other thing I do frequently is to convert from/to UNIX timestamps, but that should be fairly easy to support with FFI functions.

And then there are timezones. I would certainly like to see datetimes in the local timezone by default. But it would be great if we could also convert to other timezones somehow.

  • I'm using chrono as the underlying date/time library. time is another good choice. I picked chrono just because I have more personal experience with it. Both crates support wasm as far as I know.

👍

numbat/src/typechecker.rs Outdated Show resolved Hide resolved
numbat/src/typechecker.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
@eminence
Copy link
Contributor Author

eminence commented Jan 2, 2024

Thanks for the initial review!

and maybe dedicated handling in the parser for well-formatted dates

I think this could be a second step maybe (and potentially needs some thought), to keep this PR manageable?

Agreed.

Maybe we can still achieve this if we 'register' the Time dimension somehow (like with a @-decorator when defining dimension Time)?

I do think something like this is needed, eventually (but maybe for a follow-up PR). Currently, if someone is building their own prelude that has a time-like dimension that isn't called "Time" (or has a different base-unit than Second), DateTimes won't work properly. But I'd like this to be possible.

I also have a somewhat related question: are the current quantities of dimension Time really suitable as durations?

Great question. In your now() + 1 year example, I agree that adding 365.243 days is probably unexpected. As another example, consider what a user might intend with now() + 1 month ? Probably most people mean something like now.month += 1 with some unspecific behavior about what happens when the next month doesn't have the same number of days as the current month.

BTW, there is an argument here that the current prelude definitions for year and month are not obvious to most people, and maybe should be renamed (gregorian_year?)

I will say, however, that the current behavior of now() + 1 year being identical to now() + 365.2425 days is entirely expected and consistent given the definition of 1 year. So if we wanted to keep the current behavior, I think that is defensible. In fact, I think this could be combined with your add_months(dt: DateTime, num_months: Scalar) -> DateTime idea. The documentation for add_months could be a good place to describe the nuances here and why now() + 1 month behaves as it does and how if you want something else, then use add_months

would be really useful if we had a special syntax in the parser like @17:15 - @8:40 ("how many hours have I worked today?") or @2023-12-14 - @2023-07-25 ("how many days are there between Dec 14th and July 25th?").

I think these two examples suggest that we also need Date and Time objects (in addition to DateTime). Does that sound right?

And then there are timezones. I would certainly like to see datetimes in the local timezone by default. But it would be great if we could also convert to other timezones somehow.

For this, I'm wondering if we can re-purpose the conversion operator? For example, something like now() -> GMT-0500 ?

@ion1
Copy link

ion1 commented Jan 3, 2024

The same concern also applies to days: either it's always equal to 24 hours or you take DST and leap seconds into account.

@eminence
Copy link
Contributor Author

eminence commented Jan 7, 2024

I added a few more commits that add some new datetime functions:

  • parse_datetime
  • format_datetime
  • to_unixtime
  • from_unixtime

I also added experimental timezone support using the -> conversion operator. It uses strings as the right-hand-side because that allowed for easy implementation/experimentation, but I'm not necessarily sold on this syntax. Here's what it looks like:

>>> now()

  now()

    = Sat, 6 Jan 2024 23:29:25 -0500    [DateTime]

>>> now() -> "CET"

  now() ➞ "CET"

    = Sun, 7 Jan 2024 05:29:34 +0100    [DateTime]

>>> now() -> "Asia/Tokyo"

  now() ➞ "Asia/Tokyo"

    = Sun, 7 Jan 2024 13:29:45 +0900    [DateTime]

>>> parse_datetime("Wed, 20 Jul 2022 21:52:05 +0200") -> "local"

  parse_datetime("Wed, 20 Jul 2022 21:52:05 +0200") ➞ "local"

    = Wed, 20 Jul 2022 14:52:05 -0500    [DateTime]

@eminence
Copy link
Contributor Author

I've been continuing to test drive this over the past week, and continue to find it very useful.

  • I've simplified the error handling on typecheck errors. Doing something like now() + 4 meters results in a generic Operator + can not be applied to these types error. Unfortunately 4 meters + now() results in a different error message, but I'm hoping that can be fixed later and won't block this PR
  • So far, 100% of my uses of adding durations to dates have been with short durations (measured in days or hours), so I've not personally run into any annoyances with the strange "month" or "year" units.
    • Doing things like now() - 8 hours - 42 minutes feels fine. If syntax like now() - 8h42m existed I would probably use it, but I don't feel like it's needed for this PR
  • Using the -> conversion operation (with strings to represent timezones) still feels like a bit of an abuse of this operator, it's starting to feel pretty natural after a week of use (and it's exceedingly useful)
  • I tested in wasm and everything seemed to work.

At this point, I think this PR is ready for another review and wider testing.

@eminence eminence marked this pull request as ready for review January 14, 2024 04:32
@eminence eminence changed the title Draft: Initial support for DateTime types Initial support for DateTime types Jan 14, 2024
@sharkdp
Copy link
Owner

sharkdp commented Jan 17, 2024

Thank you very much for the updates. I currently don't have the energy to review this PR, but will hopefully do so soon. Concerning the question with calendar time units vs 'averaged' time units, I think it would be cool if someone could do some investigation how other tools handle this (e.g. Qalculate, Frink, GNU units, …)

@eminence
Copy link
Contributor Author

Thank you for the update. Please do not feel any rush or pressure to review this.

I've opened a new discussion thread in #292 to track the research task into how other tools handle time units

Comment on lines +32 to +33
chrono = "0.4.31"
chrono-tz = "0.8.5"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this has a quite drastic effect on the WASM size.

Before: 872.45 KB (872,448 bytes)
After: 1.89 MB (1,892,352 bytes)

I'm not saying this is a no-go, but maybe we could look into reducing the size of those dependencies (via feature flags?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I wasn't able to reduce the WASM size, even when disabled default features. chrono-tz has a way to limit the size of the TZ table which we might consider, but I'm not quite sure what ones we might omit (and I'm not currently sure how much space that would save anyway, compared to the size of chrono). It also recommends using LTO, but we're already using that in our release profile.

numbat/src/ffi.rs Outdated Show resolved Hide resolved
numbat/src/vm.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/typed_ast.rs Outdated Show resolved Hide resolved
numbat/src/vm.rs Outdated Show resolved Hide resolved
Suggestions from code review

Co-authored-by: David Peter <sharkdp@users.noreply.github.com>
numbat/src/typechecker.rs Outdated Show resolved Hide resolved
@sharkdp sharkdp mentioned this pull request Feb 8, 2024
17 tasks
Copy link
Contributor

@triallax triallax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks cool, just a few minor nitpicks.

numbat-cli/src/completer.rs Outdated Show resolved Hide resolved
numbat/src/ffi.rs Outdated Show resolved Hide resolved
numbat/src/ffi.rs Outdated Show resolved Hide resolved
numbat/src/vm.rs Outdated Show resolved Hide resolved
@sharkdp sharkdp added this to the 1.10 milestone Feb 8, 2024
numbat/src/vm.rs Outdated Show resolved Hide resolved
eminence and others added 2 commits February 9, 2024 08:21
This lets us convert a Duration into an exact number of seconds
without any unwraps or overflows
Co-authored-by: David Peter <sharkdp@users.noreply.github.com>
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much!

@sharkdp sharkdp merged commit 21d5491 into sharkdp:master Feb 9, 2024
15 checks passed
@eminence eminence deleted the datetimes branch February 9, 2024 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants