Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PRELIMINARY REVIEW: 0004 Date-time values must comply with IETF RFC-3339 #13

Merged
merged 7 commits into from
Jan 17, 2024

Conversation

zburke
Copy link
Member

@zburke zburke commented Oct 6, 2023

  • Validation of date-time values must assert compliance with IETF RFC-3339.
  • Date-time values in storage must be compliant with IETF RFC-3339 Section 5.6.
  • APIs must respect date-time values when retrieving a range of values by date-time in a query.

@craigmcnally craigmcnally changed the title 0004 Date-time values must comply with IETF RFC-3339 PRELIMINARY REVEIW: 0004 Date-time values must comply with IETF RFC-3339 Oct 11, 2023
@marcjohnson-kint marcjohnson-kint changed the title PRELIMINARY REVEIW: 0004 Date-time values must comply with IETF RFC-3339 PRELIMINARY REVIEW: 0004 Date-time values must comply with IETF RFC-3339 Oct 16, 2023

## Summary

* Validation of date-time values must assert compliance with IETF RFC-3339.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project informally made the RFC-3339 decision back in 2017

## Summary

* Validation of date-time values must assert compliance with IETF RFC-3339.
* Date-time values in storage must be compliant with IETF RFC-3339 Section 5.6.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC-3339 describes how date time values can be represented over the internet, how does that apply to the internal storage a module uses?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Section 5.6 is specifically about the format of those date-time values. So I think it's valid, but I get the quibble. Maybe just edit to

Date-time values in storage must be compliant with the date/time format specified in IETF RFC-3339 Section 5.6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm missing is, why it is relevant for us to establish a policy on the storage format, when what is important is the external behaviour of the module?

For example, lets say I choose to store date / time as a numeric offset from 1st Jan 1970 UTC. AFAIK, that isn't compatible with representations defined in RFC-3339 however does allow the module to provide the intended behaviour of handling RFC-3339 formatted date / times from outside the module

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's both really, but @maccabeelevine has really put his finger on it :)

WRT data at rest, IMNSHO, a date-time value in storage that does not include a timezone causes more harm than good because of its potential ambiguity. Our documentation says we should consider it to be UTC. ISO-8601 says it is considered to be localtime. My vote is that we consider it to be underspecified. It is easy to remove that ambiguity. We should seize the opportunity to do so.

WRT data in transit, this is essentially the problem identified in MODAUD-172 where we do provided an RFC-3339 formatted value and mod-audit ignores it. This is not acceptable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that this proposal is, in fact, faithful to the comments on the DMOD-256 ticket @marcjohnson-kint linked to above where @jakub-id notes, "APIs should store and return timestamps in UTC and assume UTC on input UNLESS the timezone is indicated in the datestring".

A significant benefit of implementing a standard like RFC-3339 is that these concerns about "storing UTC" or "returning UTC" become completely irrelevant. If a backend module wants spend cycles converting input values like 2023-10-16T12:52:01-04:00 to UTC that's fine but it doesn't matter. On the UI side, every value will be localized to the tenant's timezone no matter whether it comes in as UTC or UTC-4 or UTC+14. Maybe there's some benefit to converting to UTC on output so that humans reading through sorted query output have an easier time identifying simultaneous events.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "block my calendar" example you provide is an example of times unattached to dates and should be implemented as "between start date-time A and end date-time B, block my calendar from time X to time Y". i.e. it corresponds to times X and Y, not to date-times A and B.

MODAUD-172 is about mod-audit failing to correctly handle a value that is compliant with RFC-3339, i.e. a date-time value that has a timezone. It is a straight up bug: given unambiguous information, it returns the wrong result. Even if this RFC hand been written and accepted before MODAUD-1 were filed, it would still be a bug.

My whole point in writing this RFC is that we can avoid a whole class of bugs related to making assumptions about date-times values without time zones if we do not accept date-time values without timezones. We can just choose not to have those problems.

I understand the value of avoiding words like "must" that create rules that need exceptions be of the need to make context-sensitive decisions. There is real value in that. But an alternative problem is that without rules, people make context-ignorant decisions and that result in bugs and then push back against fixing the bugs because the guidelines (not "rules") say code should (not "must") act in a particular way, and besides the code is already in production and fixing it is expensive and other things probably rely on the implementation as-is so fixing the problem may actually cause more problems. All true, hence my very strong desire to do everything possible not to have these problems in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a straight up bug: given unambiguous information, it returns the wrong result. Even if this RFC hand been written and accepted before MODAUD-1 were filed, it would still be a bug

Agreed

I understand the frustration around that bug and how that has driven that RFC

My whole point in writing this RFC is that we can avoid a whole class of bugs related to making assumptions about date-times values without time zones if we do not accept date-time values without timezones

I understand the motivation. I'm all for improving our standards and documentation

My point about the PostgreSQL part is that I don't think it does provide that protection

an alternative problem is that without rules, people make context-ignorant decisions and that result in bugs and then push back against fixing the bugs because the guidelines (not "rules") say code should (not "must") act in a particular way

Fair enough

The similar situation I'm trying to avoid is where the rules get so diluted by exceptions that folks stop caring about them (we seem to be encountering this in the module reviews at the moment)

Ultimately, this comes down to what culture FOLIO wants to have (and requires a heck of a lot more than an RFC, because the councils aren't authorities). In my experience, if a team wants to push back on an external request, they will find a way to justify doing so

I'm getting the sense that continuing this conversation is only increasing your frustration. I generally support your motivation and what you are trying to achieve

Given that, and how I've once again been given feedback that much of the input I give at these stages is inappropriate, I suggest we park this conversation and let the TC decide on moving this forward later today

I'm currently intending on (trying) not participating in preliminary reviews of RFCs going forward

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good points, @marcjohnson-kint, especially that if "rules get so diluted by exceptions that folks stop caring ... Ultimately, this comes down to what culture FOLIO wants to have". This is bang on target and puts words to an unspoken part of this RFC that (IMO) needs to be said out loud. Thank you for articulating it so clearly!

Personally, I trust a proposal more when it has withstood widespread attack than when it was been met with no objection. IOW, I greatly value the criticisms you have raised and regret that my pushback has come across as frustration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL stores "timestamp with time zone" (= timestamptz) as 8 bytes with microsecond resolution: https://www.postgresql.org/docs/current/datatype-datetime.html

"timestamptz" is used 90 times in FOLIO: https://github.com/search?q=org%3Afolio-org+timestamptz&type=code

I don't see a reason why each timestamptz should be changed so that it is stored as a string in RFC-3339 format in database tables, what problem does this solve?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Timestamp with time zone values" are fine. This RFC is aimed at avoiding the transmission or storage of date-time values without an accompanying time zone.

This RFC solves the problem of a time zone-less value being misinterpreted. Based on applicable standards, a date-time without a time zone should be interpreted as localtime. This could be the time zone configured by end-user's computer, or stored for the tenant, or of the server where Okapi is running, or the server where PostgreSQL is running, all of which will be wrong in at minimum 23/24 occurrences when a such a value is instead interpreted as UTC.


## Unresolved Questions

None. Long live IETF RFC-3339.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a meta-"unresolved question" that would be appropriate for many RFCs, but starting with what flower release (or other deadline) does the RFC propose that this change happen? I assume it requires a review of all date/time storage across all modules, which is not trivial. To be clear I support the RFC, but I think it will be easier to discuss & agree to if the timing is clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For discussion separately: #16


Existing documentation states that, "The APIs assume UTC on input, and store and return timestamps in UTC". Thus, APIs may assume date-time values without a `time-offset` attributes are UTC, which may be incorrect given that ISO-8601 states that [when no UTC offset is provided, "the time is assumed to be in local time"](https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)).

Some APIs may accept date-time values that lack a time-offset attribute. For such APIs, this will be a breaking change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All RMB based modules require date-times values in CQL expressions to be in UTC.
Currently the CQL to SQL converter doesn't have access to the schema and therefore doesn't known whether some JSON string property is a date-time.
All comparisons are implemented as string comparisons.
This RFC requires non-trivial breaking changes to RMB and all RMB based modules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the CQL to SQL converter doesn't have access to the schema and therefore doesn't known whether some JSON string property is a date-time. All comparisons are implemented as string comparisons.

I think I finally understand your perspective but want to check my understanding: JSONB stores date-time values as plain strings and does not have a “native” way to compare them other than as strings. I had been thinking only about PostgreSQL storage where comparing "timestamp with time zone" values across different time zones works just fine. But given we are dealing with plain strings, it is necessary for all values to be in the same time zone. Is that correct?

IOW, I can understand why requiring modules to be able to store non-UTC string values is needlessly disruptive.

Still, I think all APIs need to respect RFC 3339 values in requests (i.e. do some conversion to UTC on input) and present RFC 3339 values in responses (i.e. add Z to all date-time values on output). I'd be happy to constrain this RFC to make that clear. @marcjohnson-kint said this would work for him. Would it work for you, too, @julianladisch?

Aside, I think we should add the Z suffix (to indicate UTC) to all values in string-storage. Though it may seem wasteful since RMB modules assume values without time zones are in UTC, the benefits of standards compliance and reduced ambiguity are worth it, but this is not the most important thing we are discussing here.


Existing documentation states that, "The APIs assume UTC on input, and store and return timestamps in UTC". Thus, APIs may assume date-time values without a `time-offset` attributes are UTC, which may be incorrect given that ISO-8601 states that [when no UTC offset is provided, "the time is assumed to be in local time"](https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)).

Some APIs may accept date-time values that lack a time-offset attribute. For such APIs, this will be a breaking change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All folio-vertx-lib based modules require date-time values in CQL expressions to be in UTC without time zone:
https://github.com/folio-org/folio-vertx-lib/blob/v3.1.0/core/src/main/java/org/folio/tlib/postgres/cqlfield/PgCqlFieldTimestamp.java
https://github.com/folio-org/folio-vertx-lib/blob/v3.1.0/core/src/test/java/org/folio/tlib/postgres/PgCqlQueryTest.java#L235-L255
This RFC requires non-trivial breaking changes to folio-vertx-lib and all folio-vertx-lib based modules.


## Risks and Drawbacks

Existing documentation states that, "The APIs assume UTC on input, and store and return timestamps in UTC". Thus, APIs may assume date-time values without a `time-offset` attributes are UTC, which may be incorrect given that ISO-8601 states that [when no UTC offset is provided, "the time is assumed to be in local time"](https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When "APIs assume UTC on input" then local time is UTC for APIs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the strongest words possible, I disagree with this statement. Do not assume. Do not ever assume. Accept values that follow RFC-3339; reject all other values.


## Risks and Drawbacks

Existing documentation states that, "The APIs assume UTC on input, and store and return timestamps in UTC". Thus, APIs may assume date-time values without a `time-offset` attributes are UTC, which may be incorrect given that ISO-8601 states that [when no UTC offset is provided, "the time is assumed to be in local time"](https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The APIs assume UTC on input", therefore APIs that allow a time-offset can require the time-offset to be zero. This is the case for CQL queries in RMB and all RMB based modules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APIs that allow a time-offset can require the time-offset to be zero

APIs that require a time-offset to be zero do not, in fact, allow a time-offset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pursue this implementation, then values with non-zero offsets must be rejected. Under this scenario, MODAUD-172 is still a legit bug but the appropriate fix is to return a 400 instead of a 200 with the correct content. I think that's a worse fix, but I also think it's better than the current implementation that sends an incorrect response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julianladisch Are you describing the current situation or proposing a solution?

APIs that allow a time-offset can require the time-offset to be zero

What do you mean by require the time offset to be zero? Does that mean that a client MUST provide an time-offset (which is the case for RFC-3339) AND that offset MUST be 0 (or Z)?

FWIW, my interpretation of The APIs assume UTC on input is likely different to many folks as I was involved in the conversations that led to that (rather unfortunate) turn of phrase.

I took that to mean that the module typically expected the input to be provided in UTC, rather than it required that it MUST be provided in UTC

That interpretation is based upon the context that the documentation also states that the date-time is provided in the RFC-3339 format, which includes a mandatory time offset, thus the assumption cannot be an inference, unless we are not complying with that specification

It feels like folks have emphasised the The APIs assume UTC on input part of the documentation and de-emphasised the FOLIO uses [RFC 3339](https://www.ietf.org/rfc/rfc3339.txt) as the standard to represent date and time format (i.e. a profile of ISO 8601) part (which is IMO by far the more important part)


Existing documentation states that, "The APIs assume UTC on input, and store and return timestamps in UTC". Thus, APIs may assume date-time values without a `time-offset` attributes are UTC, which may be incorrect given that ISO-8601 states that [when no UTC offset is provided, "the time is assumed to be in local time"](https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)).

Some APIs may accept date-time values that lack a time-offset attribute. For such APIs, this will be a breaking change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All folio-spring-support/folio-spring-cql based modules require date-time values in CQL expressions to be in UTC without time zone:
https://github.com/folio-org/folio-spring-support/blob/v7.2.0/folio-spring-cql/src/main/java/org/folio/spring/cql/Cql2JpaCriteria.java#L383-L397
This RFC requires non-trivial breaking changes to folio-spring-cql and all folio-spring-cql based modules.


## Rationale and Alternatives

This RFC aims to increase the clarity and precision of our guidance while at the same time bringing it into compliance with IETF RFC-3339.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative: Keep the current requirement that all clients that pass a CQL query to an API need to convert date-time values into UTC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, the challenge that @zburke is trying to address with this RFC is that a module misinterpreted a date-time without a time zone

How would leaving that ambiguous help with that challenge?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative: Keep the current requirement that all clients that pass a CQL query to an API need to convert date-time values into UTC.

This provides the clarity that is missing from our current documentation, but is the least-functional, least-standards-compliant outcome.

a module misinterpreted a date-time without a time zone

No, a module misinterpreted a date-time with a time zone. At present, mod-audit accepts date-time values that include UTC offsets but ignores the offset. IOW, you can send queries like "show me the events that happened on October 24th in UTC-4" and mod-audit will respond, "I understand. Here are the events on October 24th in UTC". I would consider acceptable responses to be either "I understand. Here are the events on October 24th in UTC-4" or "Sorry. I do not understand UTC-4". I would strongly prefer the former.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, a module misinterpreted a date-time with a time zone

Thank you for correcting my misunderstanding

@zburke
Copy link
Member Author

zburke commented Oct 25, 2023

@julianladisch, I recognize there are significant implications to many of the suggestions here due to legacy implementations that are ignorant of time zones. If we could separate the recommendations in this RFC from the impact it would have for current RMB code, do you think we would be able to find agreement on how to transmit and store date-time values?

It's clear that we aren't seeing eye to eye here and I'm trying to figure out if that's because we have different ideas about the ideal implementation or because the suggestions here do not adequately consider the impact on existing services.

@zburke
Copy link
Member Author

zburke commented Jan 3, 2024

The merit of the ideas here notwithstanding, the sad sad truth is that backend tooling has no concept of "date-time values": they're all just strings 😮 😢 😭 . I opened with this PR with the expectation that the work would amount to "tweak existing date-time validation to make it better" but without the concept of a date-time in the first place, it turns out this work would be harder and its ripple effects both incredibly broad and disruptive. Rats.

@zburke zburke closed this Jan 3, 2024
@zburke zburke reopened this Jan 17, 2024
This RFC was voluntarily closed during preliminary review. As noted at the time:

> The merit of the ideas here notwithstanding, the sad sad truth is that
> backend tooling has no concept of "date-time values": they're all just
> strings 😮 😢 😭 . I opened with this PR with the expectation that the
> work would amount to "tweak existing date-time validation to make it
> better" but without the concept of a date-time in the first place, it
> turns out this work would be harder and its ripple effects both
> incredibly broad and disruptive. Rats.
@craigmcnally craigmcnally merged commit 75aba0c into folio-org:master Jan 17, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants