Robust string to date convertor #114

Closed
frizbog opened this Issue Jul 15, 2016 · 6 comments

frizbog commented Jul 15, 2016

It would be helpful to have a utility to take string representations of dates and turn them into java.util.Date values, for doing sorting, age calculations (with varying degrees of precision), etc.

Date strings can be imprecise. For example, "Aug 2016" could be interpreted as 2016-08-01, 2016-08-31, or something in between. "Btw 1946 and 1948" could be interpreted as 1946-01-01, 1948-01-01, or something in between.

This interpreter of date strings should have some hints for how to resolve imprecise dates - whether to favor earliest, latest, or midpoints.
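The earliest/latest/midpoint idea can be sketched roughly as below. This is only an illustration, not the project's implementation: the class and method names are hypothetical, the enum constant FAVOR_EARLIEST is assumed, and java.time.LocalDate is used instead of java.util.Date to keep the arithmetic short.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

/** Hypothetical hint for resolving an imprecise date to a single day. */
enum ImpreciseDatePreference { FAVOR_EARLIEST, FAVOR_MIDPOINT, FAVOR_LATEST }

/** Hypothetical helper, not part of any real library. */
final class ImpreciseDateResolver {
    private ImpreciseDateResolver() { }

    /** Picks one date from the inclusive range [earliest, latest] according to the hint. */
    static LocalDate resolve(LocalDate earliest, LocalDate latest, ImpreciseDatePreference pref) {
        switch (pref) {
            case FAVOR_EARLIEST:
                return earliest;
            case FAVOR_LATEST:
                return latest;
            default:
                // Midpoint: half the day count, rounded down, added to the start.
                long days = ChronoUnit.DAYS.between(earliest, latest);
                return earliest.plusDays(days / 2);
        }
    }

    /** Resolves a month-and-year date such as "Aug 2016" (already split into numbers). */
    static LocalDate resolveMonth(int year, int month, ImpreciseDatePreference pref) {
        YearMonth ym = YearMonth.of(year, month);
        return resolve(ym.atDay(1), ym.atEndOfMonth(), pref);
    }
}
```

With these assumptions, resolveMonth(2016, 8, ...) gives 2016-08-01, 2016-08-16, or 2016-08-31 depending on the hint.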

Formats of dates that should be supported:

  • yyyymmdd
  • ddMMMyyyy (e.g., 28SEP2012)
  • yyyy-mm-dd
  • dd/mm/yyyy
  • mm/dd/yyyy
  • mm/dd/yy (assume last 100 yrs)
  • mm/yyyy
  • mm/yy (assume last 100 yrs)
  • yyyy
  • yy (assume last 100 yrs)
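One possible reading of "assume last 100 yrs" for two-digit years is a sliding window that ends at the current year. The sketch below assumes that reading; the class name and the choice to pass the current year in as a parameter are illustrative only.

```java
/** Hypothetical helper: expands a two-digit year into the 100 years ending at currentYear. */
final class TwoDigitYear {
    private TwoDigitYear() { }

    static int expand(int twoDigitYear, int currentYear) {
        if (twoDigitYear < 0 || twoDigitYear > 99) {
            throw new IllegalArgumentException("Expected 0..99 but got " + twoDigitYear);
        }
        // Start of the current century, e.g. 2000 when currentYear is 2016.
        int century = currentYear - Math.floorMod(currentYear, 100);
        int candidate = century + twoDigitYear;
        // A year in the future is pushed back one century.
        return candidate > currentYear ? candidate - 100 : candidate;
    }
}
```

Assuming the current year is 2016, "16" expands to 2016 and "17" to 1917.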

Months should allow numbers, three-letter abbreviations, certain four-letter abbreviations, and full spellings.
Abbreviations should allow periods to be present or omitted.
Date separators should be allowed to be slashes, periods, hyphens, commas, or whitespace.
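A rough sketch of the month rules, for English month names only. The class name is hypothetical, and because the issue does not say which four-letter abbreviations qualify, this version simply accepts any three- or four-letter prefix of a month name, which is an assumption.

```java
import java.time.Month;
import java.util.Locale;

/** Hypothetical helper: maps a month token to 1..12. */
final class MonthTokens {
    private MonthTokens() { }

    static int parseMonth(String token) {
        String t = token.trim();
        if (t.endsWith(".")) {
            t = t.substring(0, t.length() - 1); // periods may be present or omitted
        }
        t = t.toUpperCase(Locale.ROOT);

        // Numeric months: one or two digits in the range 1..12.
        if (!t.isEmpty() && t.length() <= 2 && t.chars().allMatch(Character::isDigit)) {
            int n = Integer.parseInt(t);
            if (n >= 1 && n <= 12) {
                return n;
            }
            throw new IllegalArgumentException("Month out of range: " + token);
        }

        // Names: full spelling, or a 3- or 4-letter prefix (assumed rule).
        if (t.length() >= 3) {
            for (Month m : Month.values()) {
                String full = m.name(); // e.g. "SEPTEMBER"
                if (full.equals(t) || (t.length() <= 4 && full.startsWith(t))) {
                    return m.getValue();
                }
            }
        }
        throw new IllegalArgumentException("Unrecognized month: " + token);
    }
}
```

Under these assumptions "9", "Sep", "Sept.", and "September" all map to 9.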

Dates prefixed with "Abt.", "About", "Appx", or "Approximately" and missing either a day value or a month value should be interpreted as a range of dates, then returned based on the earliest/midpoint/latest hint.
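Detecting the approximation prefixes could look something like this. The class and method names are made up for illustration, and only the four prefixes listed above are recognized; turning the remaining text into a range is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: recognizes "Abt.", "About", "Appx", and "Approximately" prefixes. */
final class ApproximatePrefix {
    private static final Pattern PREFIX = Pattern.compile(
            "^\\s*(?:approximately|about|appx|abt)\\.?\\s+(.*)$",
            Pattern.CASE_INSENSITIVE);

    private ApproximatePrefix() { }

    /** True when the string starts with one of the approximation prefixes. */
    static boolean isApproximate(String dateString) {
        return PREFIX.matcher(dateString).matches();
    }

    /** Returns the date text without the prefix, or the trimmed input if there is none. */
    static String stripPrefix(String dateString) {
        Matcher m = PREFIX.matcher(dateString);
        return m.matches() ? m.group(1).trim() : dateString.trim();
    }
}
```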

If two dates are supplied,

  • with or without prefixes such as "Between" or "Btw"
  • separated by slashes, hyphens, or the word "or" or "to"

the dates should again be interpreted as a range, then returned based on the earliest/midpoint/latest hint.
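Splitting a two-date string into its halves might be sketched as follows. The names are hypothetical. This version only handles the word separators ("or", "to", plus "and" as used in the "Btw 1946 and 1948" example); slashes and hyphens are left out because they are ambiguous with the separators inside a single date.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits "Btw 1946 and 1948" into its two date strings. */
final class DateRangeSplitter {
    private static final Pattern RANGE = Pattern.compile(
            "^\\s*(?:(?:between|btw)\\.?\\s+)?(.+?)\\s+(?:and|or|to)\\s+(.+?)\\s*$",
            Pattern.CASE_INSENSITIVE);

    private DateRangeSplitter() { }

    /** Returns the two date strings, or empty when the input is not a two-date range. */
    static Optional<String[]> split(String input) {
        Matcher m = RANGE.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }
}
```

Each half would then be parsed on its own, and the earliest date of the first half and latest date of the second would bound the range passed to the earliest/midpoint/latest hint.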

There would be no way to convert an interpreted date back into string form and get the original form.

frizbog commented Jul 17, 2016

There is a lot of material in both the GEDCOM 5.5 and 5.5.1 specs on the formats of dates. Amazing what you discover when you read. 😄 It's remarkably and unexpectedly specific. In particular, dates in the form mm/dd/yyyy or yyyy-mm-dd or dd.mm.yyyy are not compliant with the spec. Dates should look like 17 JUL 2016.
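For a single, fully specified date in that day-month-year form, a parse with the standard java.time formatter could look like the sketch below. The class name is hypothetical, it covers only this one shape (no ranges, approximations, or other calendars), and it is not how the project's DateParser is implemented.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

/** Hypothetical helper: parses dates shaped like "17 JUL 2016". */
final class SpecDateExample {
    // Case-insensitive so that "JUL", "Jul", and "jul" are all accepted.
    private static final DateTimeFormatter DAY_MONTH_YEAR = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("d MMM uuuu")
            .toFormatter(Locale.ENGLISH);

    private SpecDateExample() { }

    /** Throws java.time.format.DateTimeParseException for any other shape, e.g. "2016-07-17". */
    static LocalDate parse(String dateString) {
        return LocalDate.parse(dateString.trim(), DAY_MONTH_YEAR);
    }
}
```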

It's also pretty specific about how to do ranges, approximate dates, etc.

Since the spec is so specific (no pun intended), this will simplify things significantly and I can limit support to spec-compliant date values, at least for a first pass. Later, I could possibly expand/relax things if the need is really there. A quick non-scientific scan of a large number of GEDCOM files seems to indicate that most dates are being written correctly.

frizbog added a commit that referenced this issue Jul 18, 2016

Issue #114 - more work on DateParser - fixing broken tests
There was work in progress so I commented out the incomplete tests,
temporarily.  Remaining work: French Revolutionary calendar, Hebrew
calendar, and support around the English calendar change of 1752 (where
the years of dates can have slashes).

frizbog commented Jul 18, 2016

Remaining work: French Republican calendar, Hebrew calendar, and support around the English calendar change of 1752 (where the years of dates can have slashes).
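For the slashed years, a double-dated year such as 1731/32 gives the old-style year first and the new-style year after the slash. A minimal sketch of reading the new-style year from such a token follows; the class and method names are hypothetical, and it assumes the part after the slash is a shortened form of the following year.

```java
/** Hypothetical helper: reads the new-style year from a token like "1731/32". */
final class DoubleDatedYear {
    private DoubleDatedYear() { }

    static int newStyleYear(String yearToken) {
        int slash = yearToken.indexOf('/');
        if (slash < 0) {
            return Integer.parseInt(yearToken.trim()); // ordinary single year
        }
        int oldStyle = Integer.parseInt(yearToken.substring(0, slash).trim());
        String suffix = yearToken.substring(slash + 1).trim();
        int suffixValue = Integer.parseInt(suffix);

        // 10 for a one-digit suffix, 100 for two digits, and so on.
        int modulus = 1;
        for (int i = 0; i < suffix.length(); i++) {
            modulus *= 10;
        }

        // Replace the trailing digits of the old-style year with the suffix.
        int candidate = oldStyle - Math.floorMod(oldStyle, modulus) + suffixValue;
        if (candidate < oldStyle) {
            candidate += modulus; // rollover, e.g. 1699/00 means 1700
        }
        return candidate;
    }
}
```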

frizbog added a commit that referenced this issue Jul 18, 2016

frizbog added a commit that referenced this issue Jul 20, 2016

Issue #114 - Started support for Hebrew calendar
It now can parse a single Hebrew date (day month year, month year, or
year only) and convert it to a Gregorian date, but cannot deal yet with
imprecise date prefixes (like BET, EST, etc) on Hebrew dates; does not
honor the imprecise date preferences (particularly FAVOR_LATEST and
FAVOR_MIDPOINT) for Hebrew dates yet; and does not deal with Hebrew date
ranges and periods yet.

frizbog added a commit that referenced this issue Jul 20, 2016

frizbog commented Jul 20, 2016

Hebrew calendar complete, English Gregorian Calendar change and double-date support complete.
Remaining: French Republican Calendar.

frizbog added a commit that referenced this issue Jul 20, 2016

frizbog added a commit that referenced this issue Jul 21, 2016

Issue #114 - Starting French Republican Calendar support
Also renamed HebrewCalendar class to HebrewCalendarParser (and renamed
the corresponding test)

frizbog added a commit that referenced this issue Jul 21, 2016

Issue #114 - French Republican cal support - single dates about done
Still need to do ranges and periods

frizbog commented Jul 21, 2016

Remaining work: Ranges and Periods for French Republican calendar.

frizbog added a commit that referenced this issue Jul 21, 2016

frizbog commented Jul 21, 2016

3.0.1-SNAPSHOT as of 2016-07-21T19:33:44-04:00

frizbog commented Jul 22, 2016

Released in v3.0.1

frizbog closed this Jul 22, 2016
