Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust Date/Time format patterns parsing #7826

Merged
merged 79 commits into from Sep 22, 2023

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Sep 18, 2023

Pull Request Description

  • Closes Specifying YYYY instead of yyyy in date format yields to parsing wrong year #7461 by introducing a Date_Time_Formatter type and making parsing date time formats more robust and safer.
  • The default ('simple') set of patterns is slightly simplified and made case insensitive (except for M/m and H/h) to avoid the YYYY vs yyyy issues and make it less error prone.
  • The YYYY now has the same meaning as yyyy in simple mode. The old meaning (week-based year) is moved to a separate mode, triggered by Date_Time_Formatter.from_iso_week_date_pattern.
  • Full Java syntax, as well as custom-built Java DateTimeFormatter can also be used by Date_Time_Formatter.from_java.
  • Text-based constants (e.g. ISO_ZONED_DATE_TIME) have now become methods on Date_Time_Formatter, e.g. Date_Time_Formatter.iso_zoned_date_time).

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd
Copy link
Member Author

The new format selectors:
image
image
image

Time_Of_Day is mostly unchanged. Date has got 3 week-based selector examples and Date_Time got only one, since it seems likely less common on Date_Time and that one has alraedy lots of formats anyway. They are mostly to show how to use them.

If the user selects the week-based selector, this is what is generated (then the user can tweak the format string too):
image
image

The 'simple' patterns are plain Text that is converted using a Date_Time_Formatter.from (that:Text) conversion.


The locale option was decoupled from the parse/format methods - it is now part of the Date_Time_Formatter - as Locale is indeed an integral part of the formatter.

As shown above, it can be customized for the week-based year. For simple patterns, we need to add additional arguments to the from conversion or use the explicit method:

image
image

Later, I hope we will be detecting these conversions and will have some way to display in the IDE a way to add these additional arguments (essentially, we will need to promote the auto-converted expression like 'dd/MM/yyyy' into explicit one: Date_Time_Formatter.from 'dd/MM/yyyy' which will then be able to show any additional arguments; cc: @JaroslavTulach I guess this may be interesting for you?).


Most importantly - regarding the original bug - most patterns are case insensitive - i.e. I can now put either YYYY or yyyy in the Simple pattern and they will mean the same (regular year). Similarly, in the week-based pattern both YY and yy will also mean (week based year, in this special mode).

Mixed case - like YyYy will not work though - this will be interpreted as 4 separate y patterns (so currently for formatting this will print the year 4 times and for parsing this will most likely fail). TODO: I guess maybe we should print an error instead?

The only case sensitive ones left are H/h for 24/12 hour time formats and M/m for months vs minutes. I plan to add warnings for these - if minutes are used between date parts or months are used between time parts (that's almost always a mistake), as well as if 12h time format (h) is used without AM/PM (a) and vice-versa if a shows up together with 24h format (H).


There is also possibility to construct a Date_Time_Formatter from the original Java pattern or a raw Java DateTimeFormatter instance. I decided to not expose these in the widgets though - as these are advanced features.

image

Advanced users can discover it in the Component Browser.

@radeusgd
Copy link
Member Author

radeusgd commented Sep 18, 2023

One thing that I didn't like about this new approach is that the Locale setting becomes a bit more hidden in the GUI. I hope we can make it better with better visualization of automatic conversions, but I assume it is not the easiest thing and together with an IDE visual component for it - we will not get it very soon.

So, to improve the discoverability of 'how does one customize the locale', I decided to add an example with custom locale to Date and Date_Time dropdowns:

image
image

Does that seem OK?

@radeusgd
Copy link
Member Author

And of course all the same dropdowns show for parse as well.

image

@jdunkerley
Copy link
Member

On the mixed case, I think we should error if we detect mixed cases on Y and H. Months and Minutes a harder cases but if they are mixed in together makes sense to error.

We could keep the Locale attached to the format/parse method? That would keep it as visible as before.

@radeusgd radeusgd force-pushed the wip/radeusgd/7461-safer-date-format-parser branch from f2e31bd to e421654 Compare September 18, 2023 14:25
@radeusgd
Copy link
Member Author

On the mixed case, I think we should error if we detect mixed cases on Y and H. Months and Minutes a harder cases but if they are mixed in together makes sense to error.

Ok, can do - I guess we would error on any two same letters with mixed case then - I don't think Y and H are any special here.

We could keep the Locale attached to the format/parse method? That would keep it as visible as before.

I wanted to remove it, because now the locale has to be part of the formatter - since it is set at construction of the Java DateTimeFormatter.

However, if you think that makes sense, I can re-add the locale argument and simply mean it updates/overrides the locale that is set in the formatter before. I'd make it default to Nothing meaning that it doesn't override the current locale, but if the user picks an explicit choice, it could do the override.

Does that sound OK?

@jdunkerley
Copy link
Member

Ok, can do - I guess we would error on any two same letters with mixed case then - I don't think Y and H are any special here.

I was meaning that we can mix 24hr and 12hr so that should be an error (2 hours).

I'm happy for the locale to be embedded in the Date_Time_Formatter - most of the time the default is the correct one anyway.

@radeusgd radeusgd force-pushed the wip/radeusgd/7461-safer-date-format-parser branch 4 times, most recently from 9b6b7a6 to 39db2eb Compare September 20, 2023 15:19
Comment on lines +125 to +158
// Allow Year and Quarter to be parsed without a day (use first day of the quarter).
if (parsed.isSupported(ChronoField.YEAR) && parsed.isSupported(IsoFields.QUARTER_OF_YEAR)) {
int dayOfQuarter =
parsed.isSupported(IsoFields.DAY_OF_QUARTER) ? parsed.get(IsoFields.DAY_OF_QUARTER) : 1;
int year = parsed.get(ChronoField.YEAR);
int quarter = parsed.get(IsoFields.QUARTER_OF_YEAR);
int monthsToShift = 3 * (quarter - 1);
LocalDate firstDay = LocalDate.of(year, 1, 1);
return firstDay.plusMonths(monthsToShift).plusDays(dayOfQuarter - 1);
}

// Allow Month and Day to be parsed without a year (use current year).
if (parsed.isSupported(ChronoField.DAY_OF_MONTH)
&& parsed.isSupported(ChronoField.MONTH_OF_YEAR)) {
return LocalDate.of(
LocalDate.now().getYear(),
parsed.get(ChronoField.MONTH_OF_YEAR),
parsed.get(ChronoField.DAY_OF_MONTH));
}

if (parsed.isSupported(IsoFields.WEEK_BASED_YEAR) && parsed.isSupported(IsoFields.WEEK_OF_WEEK_BASED_YEAR)) {
// Get the day of week or default to first day if not present.
long dayOfWeek = parsed.isSupported(ChronoField.DAY_OF_WEEK) ? parsed.get(ChronoField.DAY_OF_WEEK) : 1;
HashMap<TemporalField, Long> fields = new HashMap<>();
fields.put(IsoFields.WEEK_BASED_YEAR, parsed.getLong(IsoFields.WEEK_BASED_YEAR));
fields.put(IsoFields.WEEK_OF_WEEK_BASED_YEAR, parsed.getLong(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
fields.put(ChronoField.DAY_OF_WEEK, dayOfWeek);

TemporalAccessor resolved = IsoFields.WEEK_OF_WEEK_BASED_YEAR.resolve(fields, parsed, ResolverStyle.SMART);
if (resolved.isSupported(ChronoField.EPOCH_DAY)) {
return LocalDate.ofEpochDay(resolved.getLong(ChronoField.EPOCH_DAY));
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have replicated the logic we had originally for parsing 'partial matches' and even extended it a bit to new cases.

However, for longer term - I'm not sure if this is the right approach. Doing the week-based-year proved to be challenging this way and it seems there is a much simpler way, preferred by the JDK APIs - that is setting defaults for missing fields using parseDefaulting. I think ideally we should switch towards this direction - of detecting missing fields at the point of constructing the formatter and adding their defaults there - instead of fighting with the parser here.

I did it for the quarter as there was no other way to do it apparently:
https://github.com/enso-org/enso/pull/7826/files#diff-9d5efa451e02ef4abbd6920679d8933499c9d53fe9a94cd5e404df8d0ffe8506R50-R54

But I think we could move all of this logic into it, and it would be cleaner.
@jdunkerley what do you think?

I'd like to do it but this PR is huge already so I don't want to introduce any further changes. I'd do it as a followup. It's also low priority since the current solution works OK, it just could be a bit cleaner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - good to do this in the more standard way.

And yes a separate PR would be good!

@radeusgd radeusgd self-assigned this Sep 22, 2023
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Sep 22, 2023
@mergify mergify bot merged commit 12c4f29 into develop Sep 22, 2023
25 checks passed
@mergify mergify bot deleted the wip/radeusgd/7461-safer-date-format-parser branch September 22, 2023 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specifying YYYY instead of yyyy in date format yields to parsing wrong year
4 participants