# Undate: computing with uncertain and partially-unknown dates

`Undate` is an ambitious, in-progress effort to develop a pragmatic Python library for computation and analysis of temporal information in humanistic and cultural data, with a particular emphasis on uncertain, incomplete, or imprecise dates and with support for multiple calendars.

Researchers in the humanities often work with historical or cultural data, and knowing when particular materials were created or events happened is important for understanding the context, interpreting them correctly, and determining relationships and sequencing. However, these kinds of materials rarely have full-precision dates with a known year, month, and day. In some contexts, scholars may be happy if they can determine even just a century based on handwriting or mentions of historic coins.

Humanistic and cultural data also often includes dates in different calendars, or even a mix of calendars within the same project or system. It's important to preserve the original date and calendar information, but it's also valuable to convert dates to a standard calendar so they can be compared and sorted together. `Undate` objects are calendar aware and calendar explicit, with a default of the Gregorian calendar. Currently, we support parsing and calendar conversion for dates in the Hebrew _Anno Mundi_ calendar and Islamic _Hijri_ calendar.

This notebook demonstrates current use and functionality of the core `Undate` and `UndateInterval` objects, along with some examples and use-cases from specific projects.

## Basic functionality

Like Python's builtin `datetime.date` object, an `Undate` can be initialized by specifying numeric values for year, month, and day.

We can print them using the default serialization (ISO8601, or YYYY-MM-DD), and we can compare them.

In [19]:
import datetime

from undate.undate import Undate

# these are equivalent
dt_november7 = datetime.date(2000, 11, 7)
november7 = Undate(2000, 11, 7)

Printing either object displays the date in ISO 8601 format (YYYY-MM-DD) by default.

In [20]:
print(dt_november7)

2000-11-07


In [21]:
print(november7)

2000-11-07


We can also compare them. Is this the same date?

In [22]:
bool(november7 == dt_november7) 

True

Unlike Python's `datetime.date`, an `Undate` can be initialized without providing all values for year, month, and day.

We can create Undate instances for the month of November in 2000, for the year 2000, or even for November 7th in some unknown year.

`Undate` also has an optional `label` field, since it's sometimes useful to attach a label to a date.

In [23]:
# November 2000
november = Undate(2000, 11, label="November 2000")
# Year 2000
year2k = Undate(2000, label="Y2K")
# November 7 in an unknown year
november7_some_year = Undate(month=11, day=7, label="Some November 7")
# let's reinitialize our first date with a label too
november7 = Undate(2000, 11, 7, label="November 7, 2000")

# sometimes names are important
easter1916 = Undate(1916, 4, 23, label="Easter 1916")

Each of these `Undate` objects can be displayed in a standard format, and each also reports its precision and its duration.

In [24]:
for example_date in [november, year2k, november7_some_year, november7, easter1916]:
    print(f"\n{example_date.label}: {example_date}")
    print(f"Date precision: {example_date.precision}")
    print(f"Duration in days: {example_date.duration().days}")


November 2000: 2000-11
Date precision: MONTH
Duration in days: 30

Y2K: 2000
Date precision: YEAR
Duration in days: 366

Some November 7: --11-07
Date precision: DAY
Duration in days: 1

November 7, 2000: 2000-11-07
Date precision: DAY
Duration in days: 1

Easter 1916: 1916-04-23
Date precision: DAY
Duration in days: 1
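
The durations above follow from each date's precision: a day lasts one day, a month lasts as long as that calendar month, and a year lasts 365 or 366 days. The sketch below reproduces those numbers with Python's standard `calendar` module. `duration_in_days` is a hypothetical helper written for illustration, not part of `undate`, and it assumes the Gregorian calendar.

```python
import calendar
from typing import Optional


def duration_in_days(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> int:
    """Hypothetical helper (not the undate API): days covered by a
    Gregorian date known to year, month, or day precision."""
    if day is not None:
        # day precision: a single day, even if the year is unknown
        return 1
    if year is None:
        raise ValueError("a year is needed to measure a month or a year")
    if month is not None:
        # month precision: number of days in that month of that year
        return calendar.monthrange(year, month)[1]
    # year precision: leap years have one extra day
    return 366 if calendar.isleap(year) else 365


print(duration_in_days(2000, 11))         # November 2000 -> 30
print(duration_in_days(2000))             # the year 2000 -> 366
print(duration_in_days(month=11, day=7))  # November 7, unknown year -> 1
```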


We can also do some simple calculations, such as checking whether one date falls within another.

In [25]:
november in year2k

True

In [26]:
november7 in year2k

True

In [27]:
november7 in november

True

In [28]:
easter1916 in year2k

False

In [29]:
november7_some_year in year2k

False
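
One way to reason about these containment checks is to treat every date as a range of days, from the earliest to the latest day it could refer to. The following is a minimal standard-library sketch of that idea. `DayRange` is a hypothetical class for illustration and is not a description of how `undate` is implemented.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class DayRange:
    """Hypothetical stand-in for a date of any precision (not the undate API)."""

    earliest: datetime.date
    latest: datetime.date

    def __contains__(self, other: "DayRange") -> bool:
        # `other in self` holds when other's whole range falls inside ours
        return self.earliest <= other.earliest and other.latest <= self.latest


year_2000 = DayRange(datetime.date(2000, 1, 1), datetime.date(2000, 12, 31))
november_2000 = DayRange(datetime.date(2000, 11, 1), datetime.date(2000, 11, 30))
easter_1916 = DayRange(datetime.date(1916, 4, 23), datetime.date(1916, 4, 23))

print(november_2000 in year_2000)  # True
print(easter_1916 in year_2000)    # False
```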

## Partially unknown values

We can also initialize an `Undate` object with string values when a date is only partially known. We use the character `X` to indicate an unknown digit, following the notation used in the [Extended Date Time Format (EDTF)](https://www.loc.gov/standards/datetime/).

In [30]:
someyear_1900s = Undate("19XX", label="1900s")
late2022 = Undate(2022, "1X", label="late 2022")

# FIXME: duration isn't right for year! and assumes max for month
# can we get UnInt duration for both of these?

for example_date in [someyear_1900s, late2022]:
    print(f"\n{example_date.label}: {example_date}")
    print(f"Date precision: {example_date.precision}")
    # print(f"Duration in days: {example_date.duration().days}")   # inaccurate! fix or omit?


1900s: 19XX
Date precision: YEAR

late 2022: 2022-1X
Date precision: MONTH


When an `Undate` instance is initialized, internally the class calculates earliest and latest possible values for that date in the Gregorian calendar.

This means that some comparisons are possible even without precise information.

For instance, is a year sometime during the 1900s before a month in late 2022?

In [31]:
someyear_1900s < late2022

True

But uncertain dates with the same initial values aren't equal, since they are uncertain:

In [32]:
late2022 == Undate(2022, "1X")

False
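
The earliest and latest bounds described in this section can be illustrated without the library. The sketch below is an assumption about how such bounds could be derived from `X` placeholders, not `undate`'s actual implementation: `digit_bounds`, `date_bounds`, and `definitely_before` are hypothetical helpers, and only the Gregorian calendar is considered.

```python
import calendar
import datetime


def digit_bounds(pattern: str, lowest: int, highest: int) -> tuple[int, int]:
    """Smallest and largest values a pattern such as '1X' could stand for."""
    low = int(pattern.replace("X", "0"))
    high = int(pattern.replace("X", "9"))
    # clamp to the valid range (for example, months run from 1 to 12)
    return max(low, lowest), min(high, highest)


def date_bounds(year: str, month: str = "XX") -> tuple[datetime.date, datetime.date]:
    """Earliest and latest possible day for a partially known year and month."""
    min_year, max_year = digit_bounds(year, 1, 9999)
    min_month, max_month = digit_bounds(month, 1, 12)
    last_day = calendar.monthrange(max_year, max_month)[1]
    return (
        datetime.date(min_year, min_month, 1),
        datetime.date(max_year, max_month, last_day),
    )


def definitely_before(first, second) -> bool:
    """True when every possible day of `first` precedes every day of `second`."""
    return first[1] < second[0]


print(date_bounds("19XX"))        # bounds 1900-01-01 and 1999-12-31
print(date_bounds("2022", "1X"))  # bounds 2022-10-01 and 2022-12-31
print(definitely_before(date_bounds("19XX"), date_bounds("2022", "1X")))  # True
```

Comparing the latest possible day of one date with the earliest possible day of the other is what lets an ordering succeed even though neither date is fully known. Under this model, two dates with identical bounds still need not refer to the same day, which is consistent with the equality result shown above.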

## Date Intervals

Like many other date libraries, `undate` includes support for intervals. An `UndateInterval` is a date range between two `Undate` objects. Intervals can be open-ended, can carry an optional label, and can calculate their duration when enough information is known.

In [34]:
from undate import UndateInterval

twentieth_c = UndateInterval(Undate(1900), Undate(2000), label="20th century")
twentieth_c

<UndateInterval '20th century' (1900/2000)>

In [35]:
twentieth_c.duration().days

36890
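
The figure of 36890 days is consistent with counting every day from the first day of 1900 to the last day of 2000, endpoints included. The sketch below checks that arithmetic with the standard library. `interval_days` is a hypothetical helper, and returning `None` for an open-ended interval is an assumption made for illustration rather than a description of what `UndateInterval.duration()` does.

```python
import datetime
from typing import Optional


def interval_days(
    earliest: Optional[datetime.date], latest: Optional[datetime.date]
) -> Optional[int]:
    """Hypothetical helper (not the undate API): inclusive length in days."""
    if earliest is None or latest is None:
        # open-ended interval: not enough information for a duration
        return None
    # add one so that both endpoints are counted
    return (latest - earliest).days + 1


print(interval_days(datetime.date(1900, 1, 1), datetime.date(2000, 12, 31)))  # 36890
print(interval_days(datetime.date(2000, 1, 1), datetime.date(2000, 1, 31)))   # 31
print(interval_days(None, datetime.date(2000, 12, 31)))                       # None
```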

More interval examples, including open-ended intervals:

```python
>>> UndateInterval(Undate(1900), Undate(2000), label="20th century")
<UndateInterval '20th century' (1900/2000)>
>>> UndateInterval(latest=Undate(2000))  # before 2000
<UndateInterval ../2000>
>>> UndateInterval(Undate(1900))  # after 1900
<UndateInterval 1900/>
>>> UndateInterval(Undate(1900), Undate(2000), label="20th century").duration().days
36890
>>> UndateInterval(Undate(2000, 1, 1), Undate(2000, 1, 31)).duration().days
31
```

## Parsing dates in supported formats

## Calendars