# Work with Dates and Times


In [51]:
import riptable as rt
import numpy as np

In [52]:
# Display all Dataset columns -- the default max is 9.
rt.Display.options.COL_ALL = True

# Render up to 100MM before showing in scientific notation.
rt.Display.options.E_MAX = 100_000_000

# Truncate small decimals, rather than showing infinitesimal scientific notation.
rt.Display.options.P_THRESHOLD = 0

# Put commas in numbers.
rt.Display.options.NUMBER_SEPARATOR = True

# Turn on Riptable autocomplete (start typing, then press Tab to see options).
rt.autocomplete()

In Riptable, there are three fundamental date and time classes:

- `rt.Date`, used for date information with no time attached to it
- `rt.DateTimeNano`, used for data with both date and time information (including time zone), to nanosecond precision
- `rt.TimeSpan`, used for "time since midnight data," with no date information attached

Here, we'll cover how to create date and time objects, how to extract data from these objects, how to use date and time arithmetic to build useful date and time representations, and how to reformat date and time information for display.

## `Date` Objects

A Date object stores an array of dates with no time data attached. You can create Date arrays from strings, integer date values, or MATLAB ordinal dates. Creating Date arrays from strings is fairly common.

If your string dates are in YYYYMMDD format, you can simply pass the list of strings to `rt.Date()`.

In [53]:
rt.Date(['20210101', '20210519', '20220308'])

Date(['2021-01-01', '2021-05-19', '2022-03-08'])

If your string dates are in another format, you can tell `rt.Date()` what to expect using Python `strptime` format codes.

In [54]:
rt.Date(['12/31/19', '6/30/19', '02/21/19'], format='%m/%d/%y')

Date(['2019-12-31', '2019-06-30', '2019-02-21'])

For a list of format codes, see [Python's documentation](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior).
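To get a feel for the format codes themselves, here is a minimal sketch that parses the same strings with Python's standard-library `datetime.strptime`. It uses plain Python rather than Riptable; the variable names are illustrative only.

```python
from datetime import datetime

# Same strings and format as the rt.Date() example above.
raw_dates = ['12/31/19', '6/30/19', '02/21/19']

# %m = month, %d = day of month, %y = two-digit year.
parsed = [datetime.strptime(s, '%m/%d/%y').date() for s in raw_dates]

# Render each date in ISO format (YYYY-MM-DD).
iso_dates = [d.isoformat() for d in parsed]
print(iso_dates)  # ['2019-12-31', '2019-06-30', '2019-02-21']
```

Note that `%m` accepts both `6` and `06`, which is why the unpadded `'6/30/19'` parses.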

Note: Under the hood, dates are stored as integers -- specifically, as the number of days since the Unix epoch, 01-01-1970.

In [55]:
date_arr = rt.Date(['19700102', '19700103', '19700212'])
date_arr._fa

FastArray([ 1,  2, 42])
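As a cross-check on those integers, the day counts can be reproduced with the standard library. This is a sketch in plain Python, not Riptable code; `days_since_epoch` is a hypothetical helper name.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)

def days_since_epoch(d):
    """Number of whole days between the Unix epoch and d."""
    return (d - UNIX_EPOCH).days

days = [days_since_epoch(d) for d in (date(1970, 1, 2),
                                      date(1970, 1, 3),
                                      date(1970, 2, 12))]
print(days)  # [1, 2, 42]
```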

Dates have various properties (a.k.a. attributes) that give you information about a Date.

Let's create a Dataset with a column of Dates, then use Date properties to extract information into new columns.

In [56]:
ds = rt.Dataset()

# Generate a range of dates, spaced 15 days apart
ds.Dates = rt.Date.range('2019-01-01', '2019-02-28', step=15)

# Some useful Date properties
ds.Year = ds.Dates.year
ds.Month = ds.Dates.month  # 1=Jan, 12=Dec
ds.Day_of_Month = ds.Dates.day_of_month
ds.Day_of_Week = ds.Dates.day_of_week  # 0=Mon, 6=Sun
ds.Day_of_Year = ds.Dates.day_of_year

ds

#,Dates,Year,Month,Day_of_Month,Day_of_Week,Day_of_Year
0,2019-01-01,2019,1,1,1,1
1,2019-01-16,2019,1,16,2,16
2,2019-01-31,2019,1,31,3,31
3,2019-02-15,2019,2,15,4,46


The following two properties are particularly useful when you want to group data by month or week. We'll see some examples when we talk about Categoricals and Accums.

In [57]:
ds.Start_of_Month = ds.Dates.start_of_month
ds.Start_of_Week = ds.Dates.start_of_week  # Returns the date of the previous Monday

ds

#,Dates,Year,Month,Day_of_Month,Day_of_Week,Day_of_Year,Start_of_Month,Start_of_Week
0,2019-01-01,2019,1,1,1,1,2019-01-01,2018-12-31
1,2019-01-16,2019,1,16,2,16,2019-01-01,2019-01-14
2,2019-01-31,2019,1,31,3,31,2019-01-01,2019-01-28
3,2019-02-15,2019,2,15,4,46,2019-02-01,2019-02-11
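For intuition about what these two properties compute, here is a rough standard-library equivalent. It is a sketch of the idea, not Riptable's implementation; `start_of_week` and `start_of_month` are hypothetical helper functions.

```python
from datetime import date, timedelta

def start_of_week(d):
    # date.weekday() is 0 for Monday through 6 for Sunday,
    # so stepping back that many days lands on Monday.
    return d - timedelta(days=d.weekday())

def start_of_month(d):
    return d.replace(day=1)

dates = [date(2019, 1, 1), date(2019, 1, 16),
         date(2019, 1, 31), date(2019, 2, 15)]

week_starts = [start_of_week(d).isoformat() for d in dates]
month_starts = [start_of_month(d).isoformat() for d in dates]
print(week_starts)   # ['2018-12-31', '2019-01-14', '2019-01-28', '2019-02-11']
print(month_starts)  # ['2019-01-01', '2019-01-01', '2019-01-01', '2019-02-01']
```

Because every date in a week (or month) maps to the same start date, these values work well as grouping keys.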


We used Python's `strptime` format codes above to tell `rt.Date()` how to parse our data. Riptable date and time objects can also use the `strftime()` method to format data for display.

In [58]:
ds.MonthYear = ds.Dates.strftime('%b%y')

ds

#,Dates,Year,Month,Day_of_Month,Day_of_Week,Day_of_Year,Start_of_Month,Start_of_Week,MonthYear
0,2019-01-01,2019,1,1,1,1,2019-01-01,2018-12-31,Jan19
1,2019-01-16,2019,1,16,2,16,2019-01-01,2019-01-14,Jan19
2,2019-01-31,2019,1,31,3,31,2019-01-01,2019-01-28,Jan19
3,2019-02-15,2019,2,15,4,46,2019-02-01,2019-02-11,Feb19


You can do some arithmetic with date and time objects. For example, we can get the number of days between two dates by subtracting one date from another.

In [59]:
date_span = ds.Dates.max() - ds.Dates.min()

date_span

DateSpan(['45 days'])

This returns a DateSpan object, which is a way to represent the delta, or duration, between two dates. You can convert it to an integer if you prefer.

In [60]:
date_span.astype(int)

FastArray([45])

If you add a DateSpan to a Date, you get a Date.

In [61]:
ds.Dates.min() + date_span

Date(['2019-02-15'])
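The same date arithmetic can be sketched with the standard library, where `timedelta` plays a role loosely similar to `DateSpan`. This is an analogy in plain Python, not a description of how Riptable implements it.

```python
from datetime import date

first = date(2019, 1, 1)
last = date(2019, 2, 15)

span = last - first        # date - date gives a timedelta
span_days = span.days      # the duration as a plain integer
round_trip = first + span  # date + timedelta gives a date again

print(span_days)   # 45
print(round_trip)  # 2019-02-15
```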

Subtracting an array of dates from an array of dates gives you an array of DateSpans. The two Date arrays must be the same length.

In [62]:
ds.DateDiff = ds.Dates - ds.Start_of_Month

ds

#,Dates,Year,Month,Day_of_Month,Day_of_Week,Day_of_Year,Start_of_Month,Start_of_Week,MonthYear,DateDiff
0,2019-01-01,2019,1,1,1,1,2019-01-01,2018-12-31,Jan19,0 days
1,2019-01-16,2019,1,16,2,16,2019-01-01,2019-01-14,Jan19,15 days
2,2019-01-31,2019,1,31,3,31,2019-01-01,2019-01-28,Jan19,30 days
3,2019-02-15,2019,2,15,4,46,2019-02-01,2019-02-11,Feb19,14 days


Or you can subtract one Date from every record in a Date array.

In [63]:
ds.Dates2 = ds.Dates - rt.Date('20190102')

ds

#,Dates,Year,Month,Day_of_Month,Day_of_Week,Day_of_Year,Start_of_Month,Start_of_Week,MonthYear,DateDiff,Dates2
0,2019-01-01,2019,1,1,1,1,2019-01-01,2018-12-31,Jan19,0 days,-1 days
1,2019-01-16,2019,1,16,2,16,2019-01-01,2019-01-14,Jan19,15 days,14 days
2,2019-01-31,2019,1,31,3,31,2019-01-01,2019-01-28,Jan19,30 days,29 days
3,2019-02-15,2019,2,15,4,46,2019-02-01,2019-02-11,Feb19,14 days,44 days


## `DateTimeNano` Objects

A `DateTimeNano` object stores data that has both date and time information, with the time specified to nanosecond precision. 

Like `Date` objects, `DateTimeNano` objects can be created from strings. Strings are common when the data is from, say, a CSV file. 

Unlike `Date` objects, `DateTimeNano`s are time-zone-aware. When you create a `DateTimeNano`, you need to specify the time zone of origin with the `from_tz` argument. Since Riptable is mainly used for financial market data, its time zone options are limited to NYC, DUBLIN, and (as of Riptable 1.3.6) Australia/Sydney, plus GMT and UTC (which is an alias for GMT).

(If you're wondering why 'Australia/Sydney' isn't abbreviated, it's because Riptable uses the standard time zone name from the [tz database](https://en.wikipedia.org/wiki/Tz_database). In the future, Riptable will support only the [standard names](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) in the tz database.)

In [64]:
rt.DateTimeNano(['20210101 09:31:15', '20210519 05:21:17'], from_tz='GMT')

DateTimeNano(['20210101 04:31:15.000000000', '20210519 01:21:17.000000000'], to_tz='NYC')

Notice that the `DateTimeNano` is returned with `to_tz='NYC'`. This is the time zone the data is displayed in; NYC is the default. You can change the display time zone when you create the `DateTimeNano` by using `to_tz`.

In [65]:
time_arr = rt.DateTimeNano(['20210101 09:31:15', '20210519 05:21:17'], 
                           from_tz='GMT', to_tz='GMT')

time_arr

DateTimeNano(['20210101 09:31:15.000000000', '20210519 05:21:17.000000000'], to_tz='GMT')
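In the NYC display earlier, the January time is five hours behind GMT and the May time only four, because New York observes daylight saving time in May. The sketch below reproduces those display times in plain Python. As a simplifying assumption, it uses fixed UTC offsets (`EST`, `EDT`) rather than a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for a real time zone database.
EST = timezone(timedelta(hours=-5), 'EST')  # New York standard time
EDT = timezone(timedelta(hours=-4), 'EDT')  # New York daylight time

january = datetime(2021, 1, 1, 9, 31, 15, tzinfo=timezone.utc)
may = datetime(2021, 5, 19, 5, 21, 17, tzinfo=timezone.utc)

# The stored instant is unchanged; only the displayed wall-clock time differs.
january_nyc = january.astimezone(EST).strftime('%Y%m%d %H:%M:%S')
may_nyc = may.astimezone(EDT).strftime('%Y%m%d %H:%M:%S')

print(january_nyc)  # 20210101 04:31:15
print(may_nyc)      # 20210519 01:21:17
```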

And as with Dates, you can specify the format of your string data.

In [66]:
rt.DateTimeNano(['12/31/19', '6/30/19'], format='%m/%d/%y', from_tz='NYC')

DateTimeNano(['20191231 00:00:00.000000000', '20190630 00:00:00.000000000'], to_tz='NYC')

When you're dealing with large amounts of data, it's more typical to get dates and times that are represented as nanoseconds since the Unix epoch (01-01-1970). In fact, that is how `DateTimeNano` objects are stored (it's much more efficient to store numbers than strings).

In [67]:
time_arr._fa

FastArray([1609493475000000000, 1621401677000000000], dtype=int64)
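To see where those integers come from, the following sketch recomputes them with the standard library. It is plain Python, not Riptable; `to_epoch_nanos` is a hypothetical helper, and it is exact only to microsecond resolution because that is all `datetime` stores.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_nanos(dt):
    """Nanoseconds since the Unix epoch, using integer arithmetic only."""
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros * 1_000

stamps = [
    datetime(2021, 1, 1, 9, 31, 15, tzinfo=timezone.utc),
    datetime(2021, 5, 19, 5, 21, 17, tzinfo=timezone.utc),
]
nanos = [to_epoch_nanos(dt) for dt in stamps]
print(nanos)  # [1609493475000000000, 1621401677000000000]
```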

If your data comes in this way, `rt.DateTimeNano()` can convert it easily. Just supply the time zone.

In [68]:
rt.DateTimeNano([1609511475000000000, 1621416077000000000], from_tz='NYC')

DateTimeNano(['20210101 14:31:15.000000000', '20210519 09:21:17.000000000'], to_tz='NYC')

To split the date off a DateTimeNano, use `rt.Date()`.

In [69]:
rt.Date(time_arr)

Date(['2021-01-01', '2021-05-19'])

To get the time, use `time_since_midnight()`.

In [70]:
time_arr.time_since_midnight()

TimeSpan(['09:31:15.000000000', '05:21:17.000000000'])

Note that the result is a TimeSpan. We'll look at these more in the next section.


You can also get the time in nanoseconds since midnight.

In [71]:
time_arr.nanos_since_midnight()

FastArray([34275000000000, 19277000000000], dtype=int64)

`DateTimeNano`s can be reformatted for display using `strftime()`.

In [72]:
time_arr.strftime('%m/%d/%y %H:%M:%S')  # Date and time

array(['01/01/21 09:31:15', '05/19/21 05:21:17'], dtype=object)

Just the time:

In [73]:
time_arr.strftime('%H:%M:%S')

array(['09:31:15', '05:21:17'], dtype=object)

You can also do arithmetic with `DateTimeNano`s. First, we'll create two arrays to work with.

In [74]:
# Create two DateTimeNano arrays
time_arr1 = rt.DateTimeNano(['20220101 12:00:00', '20220301 13:00:00'], from_tz='NYC', to_tz='NYC')
time_arr2 = rt.DateTimeNano(['20190101 11:00:00', '20190301 11:30:00'], from_tz='NYC', to_tz='NYC')

`DateTimeNano` - `DateTimeNano` = `TimeSpan`

In [75]:
timespan1 = time_arr1 - time_arr2

timespan1

TimeSpan(['1096d 01:00:00.000000000', '1096d 01:30:00.000000000'])

`DateTimeNano` + `TimeSpan` = `DateTimeNano`

In [76]:
dtn1 = time_arr1 + timespan1

dtn1

DateTimeNano(['20250101 13:00:00.000000000', '20250301 14:30:00.000000000'], to_tz='NYC')

`DateTimeNano` - `TimeSpan` = `DateTimeNano`

In [77]:
dtn2 = dtn1 - timespan1   

dtn2

DateTimeNano(['20220101 12:00:00.000000000', '20220301 13:00:00.000000000'], to_tz='NYC')
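The three rules above have close analogues in the standard library, which can help when checking results by hand. The sketch below uses naive `datetime` values (no time zone handling, an assumption made for brevity) and the first element of each array.

```python
from datetime import datetime

a = datetime(2022, 1, 1, 12, 0, 0)
b = datetime(2019, 1, 1, 11, 0, 0)

span = a - b         # datetime - datetime gives a timedelta
later = a + span     # datetime + timedelta gives a datetime
back = later - span  # datetime - timedelta gives a datetime

print(span.days, span.seconds)  # 1096 3600
print(later)                    # 2025-01-01 13:00:00
print(back == a)                # True
```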

## `TimeSpan` Objects

You saw above how a `TimeSpan` represents a duration of time between two `DateTimeNano`s. You can also think of it as a representation of a time of day. 

Recall that you can split a `TimeSpan` off a `DateTimeNano` using `time_since_midnight()`. Just keep in mind that a `TimeSpan` by itself has no absolute reference to midnight of any day in particular.

As an example, let's say you want to find out which trades were made before a certain time of day (on any day). If your data has `DateTimeNano`s, you can split off the `TimeSpan`, then filter for the times you're interested in.

In [78]:
rng = np.random.default_rng(seed=42)

ds = rt.Dataset()
N = 100  # Length of the Dataset

ds.Symbol = rt.FA(rng.choice(['AAPL', 'AMZN', 'TSLA', 'SPY', 'GME'], N))
ds.Size = rng.random(N) * 100

# Create a column of randomly generated DateTimeNanos
ds.TradeDateTime = rt.DateTimeNano.random(N)

ds.TradeTime = ds.TradeDateTime.time_since_midnight()
ds

#,Symbol,Size,TradeDateTime,TradeTime
0,GME,77.40,20071022 00:04:41.326607217,00:04:41.326607217
1,AAPL,43.89,19791101 05:14:23.956732255,05:14:23.956732255
2,SPY,85.86,19840121 21:33:31.540623842,21:33:31.540623842
3,GME,69.74,19720315 15:05:04.360183527,15:05:04.360183527
4,GME,9.42,19830306 05:58:52.000155286,05:58:52.000155286
5,TSLA,97.56,20180518 12:11:17.238038067,12:11:17.238038067
6,AMZN,76.11,19941017 15:09:44.825344623,15:09:44.825344623
7,SPY,78.61,19831027 06:05:05.649265806,06:05:05.649265806
8,AMZN,12.81,19860618 16:49:16.405217903,16:49:16.405217903
9,SPY,45.04,19820208 12:54:04.032099791,12:54:04.032099791


If we want to find the trades that happened before 10:00 a.m., we need a TimeSpan that represents 10:00 a.m. Then we can compare our TradeTimes against it.

To construct a TimeSpan from scratch, you can pass time strings in `%H:%M:%S` format (as the example shows, the seconds can be omitted):

In [79]:
rt.TimeSpan(['09:00', '10:45', '02:30', '15:00', '23:10'])

TimeSpan(['09:00:00.000000000', '10:45:00.000000000', '02:30:00.000000000', '15:00:00.000000000', '23:10:00.000000000'])

Or from an array of numerics, along with a unit, like hours:

In [80]:
rt.TimeSpan([9, 10, 12, 14, 18], unit='h')

TimeSpan(['09:00:00.000000000', '10:00:00.000000000', '12:00:00.000000000', '14:00:00.000000000', '18:00:00.000000000'])

For our purposes, this will do:

In [81]:
tenAM = rt.TimeSpan(10, unit='h')

tenAM

TimeSpan(['10:00:00.000000000'])

Now we can compare the TradeTime values against it. We'll put the results of the comparison into a column so we can spot check them.

In [82]:
ds.TradesBefore10am = (ds.TradeTime < tenAM)

ds

#,Symbol,Size,TradeDateTime,TradeTime,TradesBefore10am
0,GME,77.40,20071022 00:04:41.326607217,00:04:41.326607217,True
1,AAPL,43.89,19791101 05:14:23.956732255,05:14:23.956732255,True
2,SPY,85.86,19840121 21:33:31.540623842,21:33:31.540623842,False
3,GME,69.74,19720315 15:05:04.360183527,15:05:04.360183527,False
4,GME,9.42,19830306 05:58:52.000155286,05:58:52.000155286,True
5,TSLA,97.56,20180518 12:11:17.238038067,12:11:17.238038067,False
6,AMZN,76.11,19941017 15:09:44.825344623,15:09:44.825344623,False
7,SPY,78.61,19831027 06:05:05.649265806,06:05:05.649265806,True
8,AMZN,12.81,19860618 16:49:16.405217903,16:49:16.405217903,False
9,SPY,45.04,19820208 12:54:04.032099791,12:54:04.032099791,False


And of course, we can use the Boolean array to filter the Dataset.

In [83]:
ds.filter(ds.TradesBefore10am)

#,Symbol,Size,TradeDateTime,TradeTime,TradesBefore10am
0,GME,77.40,20071022 00:04:41.326607217,00:04:41.326607217,True
1,AAPL,43.89,19791101 05:14:23.956732255,05:14:23.956732255,True
2,GME,9.42,19830306 05:58:52.000155286,05:58:52.000155286,True
3,SPY,78.61,19831027 06:05:05.649265806,06:05:05.649265806,True
4,TSLA,92.68,20120319 01:33:32.346477845,01:33:32.346477845,True
5,SPY,64.39,19940605 00:44:16.452040573,00:44:16.452040573,True
6,GME,55.46,19820710 04:12:42.406084379,04:12:42.406084379,True
7,GME,82.76,20060509 08:33:44.801775530,08:33:44.801775530,True
8,AMZN,75.81,20080818 03:45:47.742051033,03:45:47.742051033,True
9,TSLA,35.45,19831003 06:33:02.345767650,06:33:02.345767650,True
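The compare-then-filter pattern used here can be sketched in plain Python with `timedelta` standing in for `TimeSpan`. The trades below are the first five rows of the Dataset, truncated to whole seconds for readability; this illustrates the logic and is not Riptable code.

```python
from datetime import timedelta

# (symbol, time since midnight) for the first five trades.
trades = [
    ('GME', timedelta(hours=0, minutes=4, seconds=41)),
    ('AAPL', timedelta(hours=5, minutes=14, seconds=23)),
    ('SPY', timedelta(hours=21, minutes=33, seconds=31)),
    ('GME', timedelta(hours=15, minutes=5, seconds=4)),
    ('GME', timedelta(hours=5, minutes=58, seconds=52)),
]

ten_am = timedelta(hours=10)

# Step 1: build a Boolean mask by comparing each time of day to 10:00.
before_10am = [t < ten_am for _, t in trades]

# Step 2: keep only the rows where the mask is True.
early_symbols = [sym for (sym, _), keep in zip(trades, before_10am) if keep]

print(before_10am)    # [True, True, False, False, True]
print(early_symbols)  # ['GME', 'AAPL', 'GME']
```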


If we only want to see certain columns of the Dataset, we can combine the filter with slicing:

In [84]:
ds[ds.TradesBefore10am, ['Symbol', 'Size']]

#,Symbol,Size
0,GME,77.40
1,AAPL,43.89
2,GME,9.42
3,SPY,78.61
4,TSLA,92.68
5,SPY,64.39
6,GME,55.46
7,GME,82.76
8,AMZN,75.81
9,TSLA,35.45


Or if we just want the total size of AAPL trades before 10am:

In [85]:
aapl10 = (ds.Symbol == 'AAPL') & (ds.TradesBefore10am)

ds.Size.nansum(filter=aapl10)

274.92741837733035

### Other Useful Things to Do with TimeSpans

We can compare two `DateTimeNano` columns to find times that are close together -- for example, those less than 10ms apart.

To illustrate this, we'll create some randomly generated small `TimeSpan`s to add to our column of `DateTimeNano`s.

In [86]:
# Create TimeSpans from 1 millisecond to 19 milliseconds
some_ms = rt.TimeSpan(rng.integers(low=1, high=20, size=N), 'ms')

# Offset each original DateTimeNano by its TimeSpan
ds.TradeDateTime2 = ds.TradeDateTime + some_ms

ds.head()

#,Symbol,Size,TradeDateTime,TradeTime,TradesBefore10am,TradeDateTime2
0,GME,77.4,20071022 00:04:41.326607217,00:04:41.326607217,True,20071022 00:04:41.333607217
1,AAPL,43.89,19791101 05:14:23.956732255,05:14:23.956732255,True,19791101 05:14:23.974732255
2,SPY,85.86,19840121 21:33:31.540623842,21:33:31.540623842,False,19840121 21:33:31.550623842
3,GME,69.74,19720315 15:05:04.360183527,15:05:04.360183527,False,19720315 15:05:04.374183527
4,GME,9.42,19830306 05:58:52.000155286,05:58:52.000155286,True,19830306 05:58:52.009155286
5,TSLA,97.56,20180518 12:11:17.238038067,12:11:17.238038067,False,20180518 12:11:17.244038067
6,AMZN,76.11,19941017 15:09:44.825344623,15:09:44.825344623,False,19941017 15:09:44.840344623
7,SPY,78.61,19831027 06:05:05.649265806,06:05:05.649265806,True,19831027 06:05:05.668265806
8,AMZN,12.81,19860618 16:49:16.405217903,16:49:16.405217903,False,19860618 16:49:16.411217903
9,SPY,45.04,19820208 12:54:04.032099791,12:54:04.032099791,False,19820208 12:54:04.047099791


Now we can find the trades that occurred within 10ms of each other, and again put the results into a new column for a spot check.

In [87]:
ds.Within10ms = (abs(ds.TradeDateTime.time_since_midnight() 
                     - ds.TradeDateTime2.time_since_midnight())) < rt.TimeSpan(10, 'ms')
ds.head()

#,Symbol,Size,TradeDateTime,TradeTime,TradesBefore10am,TradeDateTime2,Within10ms
0,GME,77.4,20071022 00:04:41.326607217,00:04:41.326607217,True,20071022 00:04:41.333607217,True
1,AAPL,43.89,19791101 05:14:23.956732255,05:14:23.956732255,True,19791101 05:14:23.974732255,False
2,SPY,85.86,19840121 21:33:31.540623842,21:33:31.540623842,False,19840121 21:33:31.550623842,False
3,GME,69.74,19720315 15:05:04.360183527,15:05:04.360183527,False,19720315 15:05:04.374183527,False
4,GME,9.42,19830306 05:58:52.000155286,05:58:52.000155286,True,19830306 05:58:52.009155286,True
5,TSLA,97.56,20180518 12:11:17.238038067,12:11:17.238038067,False,20180518 12:11:17.244038067,True
6,AMZN,76.11,19941017 15:09:44.825344623,15:09:44.825344623,False,19941017 15:09:44.840344623,False
7,SPY,78.61,19831027 06:05:05.649265806,06:05:05.649265806,True,19831027 06:05:05.668265806,False
8,AMZN,12.81,19860618 16:49:16.405217903,16:49:16.405217903,False,19860618 16:49:16.411217903,True
9,SPY,45.04,19820208 12:54:04.032099791,12:54:04.032099791,False,19820208 12:54:04.047099791,False
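The threshold test can be checked by hand with a small standard-library sketch. The offsets below are the gaps between `TradeDateTime` and `TradeDateTime2` in the first five rows (7, 18, 10, 14, and 9 ms); the variable names are illustrative.

```python
from datetime import timedelta

# Gap between the two timestamps in each of the first five rows.
offsets_ms = [7, 18, 10, 14, 9]
threshold = timedelta(milliseconds=10)

# abs() makes the test symmetric; the strict < excludes a gap of exactly 10 ms.
within_10ms = [abs(timedelta(milliseconds=ms)) < threshold
               for ms in offsets_ms]

print(within_10ms)  # [True, False, False, False, True]
```

Row 2 is the one to watch: its gap is exactly 10 ms, so the strict comparison marks it `False`.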


And again we can use the result as a mask array.

In [88]:
ds[ds.Within10ms, ['Symbol', 'Size']]

#,Symbol,Size
0,GME,77.40
1,GME,9.42
2,TSLA,97.56
3,AMZN,12.81
4,GME,37.08
5,AAPL,82.28
6,SPY,22.72
7,GME,55.46
8,AMZN,6.38
9,GME,82.76


A common situation is having dates as date strings and times in nanos since midnight. You can use some arithmetic to build a DateTimeNano: `Date` + `TimeSpan` = `DateTimeNano`.

In [89]:
ds = rt.Dataset({
    'Date': ['20111111', '20200202', '20220222'],
    'Time': [44_275_000_000_000, 39_287_000_000_000, 55_705_000_000_000]
})

# Convert the date strings to rt.Date objects
ds.Date = rt.Date(ds.Date)

# Convert the times to rt.TimeSpan objects
ds.Time = rt.TimeSpan(ds.Time)

ds

#,Date,Time
0,2011-11-11,12:17:55.000000000
1,2020-02-02,10:54:47.000000000
2,2022-02-22,15:28:25.000000000


At this point, you might want to simply add `ds.Date` and `ds.Time` to get a `DateTimeNano`.

In [90]:
ds.DateTime = ds.Date + ds.Time

ds

#,Date,Time,DateTime
0,2011-11-11,12:17:55.000000000,20111111 12:17:55.000000000
1,2020-02-02,10:54:47.000000000,20200202 10:54:47.000000000
2,2022-02-22,15:28:25.000000000,20220222 15:28:25.000000000


And that seems to work. However, remember that `DateTimeNano`s need to have a time zone. Here, GMT was assumed.

In [91]:
ds.DateTime

DateTimeNano(['20111111 12:17:55.000000000', '20200202 10:54:47.000000000', '20220222 15:28:25.000000000'], to_tz='GMT')

Specify your desired time zone so you don't end up with unexpected results down the line.

In [92]:
ds.DateTime2 = rt.DateTimeNano((ds.Date + ds.Time), from_tz='NYC')

ds.DateTime2

DateTimeNano(['20111111 12:17:55.000000000', '20200202 10:54:47.000000000', '20220222 15:28:25.000000000'], to_tz='NYC')
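To verify the combined values independently, the same date-plus-offset construction can be sketched with the standard library. Here `combine` is a hypothetical helper, and the result is a naive `datetime` with no time zone attached, which is the ambiguity that supplying `from_tz` resolves in Riptable.

```python
from datetime import date, datetime, time, timedelta

def combine(day, nanos_since_midnight):
    """Return a naive datetime for `day` plus an offset in nanoseconds."""
    midnight = datetime.combine(day, time.min)
    # timedelta resolves to microseconds, so sub-microsecond nanos are dropped.
    return midnight + timedelta(microseconds=nanos_since_midnight // 1_000)

rows = [
    (date(2011, 11, 11), 44_275_000_000_000),
    (date(2020, 2, 2), 39_287_000_000_000),
    (date(2022, 2, 22), 55_705_000_000_000),
]

combined = [combine(d, ns).strftime('%Y%m%d %H:%M:%S') for d, ns in rows]
print(combined)
# ['20111111 12:17:55', '20200202 10:54:47', '20220222 15:28:25']
```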

Warning: Given that `TimeSpan + Date = DateTimeNano`, and also that you can use `rt.Date(my_dtn)` to get a `Date` from a `DateTimeNano`, you might reasonably think you can get the `TimeSpan` from a `DateTimeNano` using `rt.TimeSpan(my_dtn)`. 

However, that result includes the number of days since January 1, 1970. To get the `TimeSpan` from a `DateTimeNano`, use `time_since_midnight()` instead.
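The arithmetic behind this warning is easy to see with plain integers. The sketch below takes the first stored value from `time_arr._fa` shown earlier and splits it into whole days since the epoch and nanoseconds since midnight. It illustrates the idea only and does not call Riptable.

```python
NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000

# 2021-01-01 09:31:15 GMT, as nanoseconds since the Unix epoch.
epoch_nanos = 1_609_493_475_000_000_000

# Interpreting the raw value as a duration keeps all of these days...
days_since_epoch, nanos_since_midnight = divmod(epoch_nanos, NANOS_PER_DAY)

# ...whereas the time of day is only the remainder.
print(days_since_epoch)      # 18628
print(nanos_since_midnight)  # 34275000000000
```

The remainder matches the first value returned by `nanos_since_midnight()` above.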


| **Datetime Arithmetic**     | **Result**   |
|-----------------------------|--------------|
| Date + Date                 | TypeError    |
| Date + DateTimeNano         | TypeError    |
| Date + DateSpan             | Date         |
| Date + TimeSpan             | DateTimeNano |
| Date - Date                 | DateSpan     |
| Date - DateSpan             | Date         |
| Date - DateTimeNano         | TimeSpan     |
| Date - TimeSpan             | DateTimeNano |
| DateTimeNano - DateTimeNano | TimeSpan     |
| DateTimeNano - TimeSpan     | DateTimeNano |
| DateTimeNano + TimeSpan     | DateTimeNano |
| TimeSpan - TimeSpan         | TimeSpan     |
| TimeSpan + TimeSpan         | TimeSpan     |

Next, we'll look at Riptable's vehicle for group operations: [Perform Group Operations with Categoricals](tutorial_categoricals.ipynb).

<br>
<br>

---

Questions or comments about this guide? Email RiptableDocumentation@sig.com.