## Adjusting datetimes

By the end of this lecture you will be able to:

- add an offset to a datetime
- truncate a datetime to the start of an interval
- round a datetime to an interval

In [None]:
from datetime import date,datetime,timedelta

import polars as pl

We create a `DataFrame` with a monthly datetime range running from the start of 2020 to the start of 2021

In [None]:
start = datetime(2020,1,1)
stop = datetime(2021,1,1)
df = (
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="1mo",eager=True)
        }
    )
)
df

These datetimes fall on the first day of each month. Later in this lecture we see how to move them to a month-end basis with `dt.month_end`

For fixed-length adjustments such as hours or days we can add a `pl.duration` to a datetime in an expression

In [None]:
(
    df
    .with_columns(
        (pl.col("date") + pl.duration(hours=1,minutes=10)).alias("add_hour_ten")
    )
    .head()
)

`pl.duration` only supports fixed-length units, with weeks as the largest, and so it cannot, for example, move forward by a calendar month.
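
As a rough illustration of why a fixed-length duration cannot stand in for a calendar month, the sketch below uses the standard library's `timedelta` rather than Polars. The 30-day step is an arbitrary choice for the example.

```python
from datetime import datetime, timedelta

# Month starts for the first quarter of 2020 (a leap year)
month_starts = [datetime(2020, 1, 1), datetime(2020, 2, 1), datetime(2020, 3, 1)]

# A fixed 30-day step lands on a different day of the month each time,
# because calendar months have different lengths
shifted = [d + timedelta(days=30) for d in month_starts]

for before, after in zip(month_starts, shifted):
    print(before.date(), "->", after.date())
```

None of the shifted values lands on the first day of the following month, which is why a calendar-aware offset is needed for monthly steps.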

A more flexible way to adjust datetimes is to use `dt.offset_by` with a string interval offset

In [None]:
(
    df
    .with_columns(
        pl.col("date").dt.offset_by("1mo").alias("add_month")
    )
    .head()
)

The second row, where `2020-02-01` moves to `2020-03-01`, demonstrates that `dt.offset_by` can handle leap years: the step covers the 29 days of February 2020.

We use the standard interval strings for `dt.offset_by`
- `"1ns"`: 1 nanosecond
- `"1us"`: 1 microsecond
- `"1ms"`: 1 millisecond
- `"1s"`: 1 second
- `"1m"`: 1 minute
- `"1h"`: 1 hour
- `"1d"`: 1 day
- `"1w"`: 1 week
- `"1mo"`: 1 calendar month
- `"1y"`: 1 calendar year

If we want to get a date series on a month-end basis we use `dt.month_end`

In [None]:
start = datetime(2020,1,1)
stop = datetime(2021,1,1)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="1mo",eager=True)
        }
    )
    .with_columns(
        pl.col('date').dt.month_end()
    )
)
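
For intuition, moving a datetime to the last day of its month can be sketched in plain Python with the standard library's `calendar` module. The `month_end` helper below is a hypothetical name for illustration and is not how Polars implements `dt.month_end`.

```python
import calendar
from datetime import datetime


def month_end(d: datetime) -> datetime:
    """Hypothetical helper: move d to the last day of its month, keeping the time."""
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=last_day)


month_starts = [datetime(2020, m, 1) for m in range(1, 4)]
month_ends = [month_end(d) for d in month_starts]

for d in month_ends:
    print(d)
```

The time of day is left unchanged, so midnight inputs stay at midnight on the last day of the month.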

## Truncating datetimes
In this example we create a datetime series over two hours at 20 minute intervals.

We truncate these to one hour bins with `dt.truncate`

In [None]:
start = datetime(2020,1,1)
stop = datetime(2020,1,1,2)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="20m",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.truncate("1h").alias("truncate")
    )
    .head()
)

When we call `dt.truncate`, Polars bins the datetimes into windows whose length is the truncation period.

All datetimes in a window are mapped to the datetime **at the start of the window**.
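
The mapping to the start of each hourly window can be mimicked in plain Python. This is only a sketch of the idea using the standard library, assuming the same 20-minute series as above, and not a description of Polars internals.

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 1)

# Datetimes at 20-minute intervals from 00:00 to 02:00 inclusive
dates = [start + i * timedelta(minutes=20) for i in range(7)]

# Zeroing everything below the hour maps each datetime
# to the start of its one-hour window
truncated = [d.replace(minute=0, second=0, microsecond=0) for d in dates]

for d, t in zip(dates, truncated):
    print(d.time(), "->", t.time())
```

The values at `00:00`, `00:20` and `00:40` all map to `00:00`, and the value at `01:00` opens the next window.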

## How are the windows created?

To illustrate how the windows are created we:
- create a datetime range at 5 minute intervals over two hours
- use a truncation period of 11 minutes that does not divide evenly into 60 minutes
- add the physical (microsecond) representation of the truncated datetime

In [None]:
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="5m",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.truncate("11m").alias("truncate")
    )
    .with_columns(
        pl.col("truncate").to_physical().alias("truncate_physical")
    )
    .head()
)

We see that the first datetime of `00:00:00` in `date` is mapped to `23:51:00` on the previous day by `dt.truncate`.

If we divide the value in microseconds from the `truncate_physical` column by 11 minutes (in microseconds) we get a whole number, so the truncated value is an exact multiple of 11 minutes.

In [None]:
1577836260000000/(11*60*1e6)

So `2019-12-31 23:51:00` is the last multiple of 11 minutes before `2020-01-01 00:00:00` when the 11 minute windows are counted from the start of the Unix epoch (`1970-01-01 00:00:00`).
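
The same check can be done with exact integer arithmetic in plain Python. This is a sketch of the window arithmetic described here using only the standard library, not Polars' implementation, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
every = timedelta(minutes=11)
t = datetime(2020, 1, 1)

# Number of complete 11-minute windows between the epoch and t
n_windows = (t - EPOCH) // every

# Start of the window that contains t
window_start = EPOCH + n_windows * every

# Gap between the window start and t
remainder = t - window_start

print(n_windows, window_start, remainder)
```

The 9 minute remainder is the gap between the window start at `23:51:00` and midnight.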

We can adjust the start of the windows with the `offset` argument. In this example we offset to start the first window at `00:00:00`

In [None]:
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="5m",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.truncate("11m").alias("truncate")
    )
    .with_columns(
        pl.col("date").dt.truncate("11m",offset="9m").alias("truncate_offset")
    )
    .head()
)

## Rounding datetimes
We use `dt.round` to do something similar, except that each datetime is rounded either down to the start of its window or up to the end of its window, whichever is closer.

In this example we have datetimes at 10-minute intervals and round them to hourly intervals

In [None]:
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="10m",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.round("1h").alias("round")
    )
    .head()
)

We see that a datetime of `00:30:00`, halfway through the window, is rounded up to the end of the window at `01:00:00`
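
The rounding behaviour seen here can be sketched in plain Python. The `round_to` helper is a hypothetical name that reproduces the outcome described in this section (the halfway point rounds up) and is not Polars' own implementation.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def round_to(t: datetime, every: timedelta) -> datetime:
    """Hypothetical helper: round t to the nearest multiple of every, halves up."""
    # Shifting by half a window before flooring turns truncation into rounding
    n_windows = (t - EPOCH + every / 2) // every
    return EPOCH + n_windows * every


start = datetime(2020, 1, 1)
dates = [start + i * timedelta(minutes=10) for i in range(7)]
rounded = [round_to(d, timedelta(hours=1)) for d in dates]

for d, r in zip(dates, rounded):
    print(d.time(), "->", r.time())
```

Datetimes up to `00:20` map back to `00:00`, while `00:30` and later map forward to `01:00`.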

## Exercises
In the exercises you will develop your understanding of:
- adding an offset to a datetime
- truncating a datetime
- rounding a datetime

### Exercise 1
Use `truncate` to map the values in the `pickup` column to the start of weekly intervals.

Apply an `offset` to ensure that the first mapped datetime is `2021-12-31 00:00:00`

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    <blank>
    .head()
)

Map the values in the `pickup` column into weekly windows based on the closest window boundary using `round`

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    <blank>
    .head()
)

### Exercise 2
Add 12 hours to each date so the datetimes are midday **on the last day of the month** instead of midnight

In [None]:
start = datetime(2020,1,1)
stop = datetime(2021,1,1)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="1mo",eager=True)
        }
    )
    <blank>
)

## Solutions

### Solution to exercise 1

Map the values in the `pickup` column to weekly intervals where the values are mapped to the start of the interval.

Ensure that the first mapped datetime is 2021-12-31 00:00:00

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    pl.read_csv(csv_file,try_parse_dates=True)
    .with_columns(
        pl.col("pickup").dt.truncate("1w",offset="4d")
    )
    .head(2)
)

Map the values in the `pickup` column into weekly windows based on the closest window boundary

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    pl.read_csv(csv_file,try_parse_dates=True)
    .with_columns(
        pl.col("pickup").dt.round("1w")
    )
    .head(2)
)

### Solution to exercise 2
Add 12 hours to each date so the datetimes are midday **on the last day of the month** instead of midnight

In [None]:
start = datetime(2020,1,1)
stop = datetime(2021,1,1)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="1mo",eager=True)
        }
    )
    .select(
        pl.col("date").dt.month_end().dt.offset_by("12h")
    )
)