## Adjusting datetimes

By the end of this lecture you will be able to:

- add an offset to a datetime
- truncate a datetime to the start of an interval
- round a datetime to an interval

In [None]:
from datetime import datetime, timedelta

import polars as pl

We create a `DataFrame` with a monthly datetime range where each value falls on the 1st of the month.

In [None]:
start = datetime(2020,1,1)
stop = datetime(2020,4,1)
df = (
    pl.DataFrame(
        {
            "datetime":pl.datetime_range(start,stop,interval="1mo",eager=True)
        }
    )
)
df

We can adjust a datetime using `pl.duration`, a `datetime.timedelta` or the `dt.offset_by` expression

In [None]:
(
    df
    .with_columns(
        (pl.col("datetime") + pl.duration(hours=1,minutes=10)).alias("with_duration"),
        pl.col("datetime").dt.offset_by("1h10m").alias("with_offset_by"),    
        (pl.col("datetime") + timedelta(hours=1,minutes=10)).alias("with_timedelta"),
    )
)

There are some subtle differences between `pl.duration` and `dt.offset_by`:
- `pl.duration` is a fixed amount of time so `pl.duration(days=1)` is 24 hours
- `dt.offset_by` works with the calendar, so `dt.offset_by("1d")` can span 23 or 25 hours, for example across a daylight saving time change

We illustrate this below. We create a `DataFrame` with a single row that has a datetime just before the daylight saving time change in London. We then offset this by one day and by 24 hours, using both `dt.offset_by` and `pl.duration`.

In [None]:
(
    pl.DataFrame(
        {
            "datetime_before_clocks_change":[datetime(2020,3,29,0)]
        }
    )
    .with_columns(
        pl.col("datetime_before_clocks_change").dt.convert_time_zone("Europe/London")
    )
    .with_columns(
        pl.col("datetime_before_clocks_change").dt.offset_by("1d").alias("offset_by_one_day"),       
        pl.col("datetime_before_clocks_change").dt.offset_by("24h").alias("offset_by_24_hours"),        
        (pl.col("datetime_before_clocks_change") + pl.duration(days=1)).alias("duration_one_day"),
        (pl.col("datetime_before_clocks_change") + pl.duration(hours=24)).alias("duration_24_hours"),
    )
)

We see that the result for `offset_by_one_day` is different to the other adjusted columns. For `offset_by_one_day` we offset by 23 hours to go from midnight to midnight while for the other columns we adjust by 24 hours to go from midnight to 1 AM.
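The 23-hour figure can be checked outside Polars. The sketch below uses only the standard library and assumes Python 3.9+ with time zone data available for `zoneinfo`. Note that in plain Python, adding a `timedelta` to a time-zone-aware `datetime` moves the wall clock, so here it plays the role of `dt.offset_by("1d")`, while the fixed 24-hour step is computed in UTC.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")

# Midnight in London on the day the clocks go forward
before_change = datetime(2020, 3, 29, 0, 0, tzinfo=london)

# Calendar-style step: same wall-clock time on the next day
calendar_day = before_change + timedelta(days=1)

# Fixed-length step: exactly 24 elapsed hours, computed in UTC
fixed_24h = (
    before_change.astimezone(timezone.utc) + timedelta(hours=24)
).astimezone(london)

# Elapsed time is measured in UTC so the clock change is accounted for
elapsed = calendar_day.astimezone(timezone.utc) - before_change.astimezone(timezone.utc)

print(calendar_day.isoformat())  # 2020-03-30T00:00:00+01:00
print(fixed_24h.isoformat())     # 2020-03-30T01:00:00+01:00
print(elapsed)                   # 23:00:00
```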

The largest unit accepted by both `pl.duration` and `timedelta` is weeks, so neither can, for example, move forward by a calendar month. We can do this with `dt.offset_by`

In [None]:
(
    df
    .with_columns(
        pl.col("datetime").dt.offset_by("1mo").alias("add_month")
    )
)
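A calendar month has no fixed length, which is why a fixed duration cannot express it. This standard-library sketch (not Polars code) lists the number of days in each month of the range and shows that a fixed 30-day step from 1st January does not land on 1st February.

```python
import calendar
from datetime import datetime, timedelta

# Number of days in January to April 2020 (2020 is a leap year)
days_in_month = [calendar.monthrange(2020, month)[1] for month in range(1, 5)]
print(days_in_month)  # [31, 29, 31, 30]

# A fixed 30-day step falls one day short of the next month start
thirty_days_later = datetime(2020, 1, 1) + timedelta(days=30)
print(thirty_days_later)  # 2020-01-31 00:00:00
```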

If we want to move a datetime series to a month-end basis we use the `dt.month_end` expression (and similarly we have `dt.month_start`). 

If we call `offset_by` on a month-end date it moves forward, but not necessarily to the following month end. For example, here we add a `month_end` column and then offset it by 1 month

In [None]:
(
    df
    .with_columns(
        pl.col('datetime').dt.month_end().alias("month_end")
    )
    .with_columns(
        pl.col('month_end').dt.offset_by("1mo").alias("offset")
    )

)

We see:
- in the first row we move from 31st January to 29th February but
- in the second row we move from 29th February to 29th March rather than 31st March

We must instead call `dt.month_end` after the offset of 1 month

In [None]:
(
    df
    .with_columns(
        pl.col('datetime').dt.month_end().alias("month_end")
    )
    .with_columns(
        pl.col('month_end').dt.offset_by("1mo").dt.month_end().alias("offset")
    )

)
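The ordering issue can also be reproduced in plain Python. This is a rough sketch of the calendar logic, using two hypothetical helpers: `add_one_month` clamps the day to the length of the target month (matching the 31st January to 29th February result above), and `month_end` moves to the last day of the month. It is not how Polars implements these expressions.

```python
import calendar
from datetime import datetime


def month_end(dt: datetime) -> datetime:
    """Move to the last day of dt's month, keeping the time of day."""
    last_day = calendar.monthrange(dt.year, dt.month)[1]
    return dt.replace(day=last_day)


def add_one_month(dt: datetime) -> datetime:
    """Move forward one calendar month, clamping the day if the month is shorter."""
    year = dt.year + (1 if dt.month == 12 else 0)
    month = 1 if dt.month == 12 else dt.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))


feb_end = month_end(datetime(2020, 2, 1))

# Offsetting a month end keeps the day number, so we stop at the 29th
print(add_one_month(feb_end))             # 2020-03-29 00:00:00

# Taking the month end after the offset lands on the 31st
print(month_end(add_one_month(feb_end)))  # 2020-03-31 00:00:00
```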

## Binning datetimes
In this example we create a datetime series over 90 minutes at 20-minute intervals.

We want to transform these datetimes into 30-minute bins. We do this with `dt.truncate`

In [None]:
start = datetime(2020,1,1)
stop = datetime(2020,1,1,1,30)
(
    pl.DataFrame(
        {
            "datetime":pl.datetime_range(start,stop,interval="20m",eager=True)
        }
    )
    .with_columns(
        pl.col("datetime").dt.truncate("30m").alias("truncate"),
    )
)

All datetimes in a window are mapped to the datetime **at the start of the bin** and not the earliest time that occurs in the bin.
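One way to picture truncation is as flooring to a multiple of the interval. The plain-Python sketch below is an illustration of that arithmetic rather than Polars's implementation. It assumes bins are aligned to 1970-01-01 00:00 (the hypothetical `EPOCH` constant), which puts 30-minute boundaries on the hour and half hour.

```python
from datetime import datetime, timedelta

# Assumed alignment point for the bins (an assumption of this sketch)
EPOCH = datetime(1970, 1, 1)


def truncate(dt: datetime, every: timedelta) -> datetime:
    """Floor dt to the start of its bin, with bins counted from EPOCH."""
    return dt - (dt - EPOCH) % every


# Same 20-minute series as in the example above
times = [datetime(2020, 1, 1) + i * timedelta(minutes=20) for i in range(5)]
truncated = [truncate(t, timedelta(minutes=30)) for t in times]

for t, b in zip(times, truncated):
    print(t.time(), "->", b.time())
# 00:00:00 -> 00:00:00
# 00:20:00 -> 00:00:00
# 00:40:00 -> 00:30:00
# 01:00:00 -> 01:00:00
# 01:20:00 -> 01:00:00
```

Note that `00:40:00` maps to `00:30:00` even though no datetime in the data has that value.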

## Rounding datetimes
We use `dt.round` to do something similar, except that each datetime is either rounded down to the start of its window or up to the end of its window, whichever is closer.

In this example we have a 20-minute interval and both truncate and round to 30-minute intervals

In [None]:
(
    pl.DataFrame(
        {
            "datetime":pl.datetime_range(start,stop,interval="20m",eager=True)
        }
    )
    .with_columns(
        pl.col("datetime").dt.truncate("30m").alias("truncate"),
        pl.col("datetime").dt.round("30m").alias("round"),
    )
)

We see that a datetime of `00:20:00` on the second row is truncated to `00:00:00` but rounded to `00:30:00`
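Rounding can be pictured as picking the nearer of the two window boundaries. The plain-Python sketch below illustrates this; it is not Polars's implementation. It assumes boundaries are aligned to 1970-01-01 00:00 (the hypothetical `EPOCH` constant) and sends exact ties upwards, which is a choice made for this sketch only and does not arise with this data.

```python
from datetime import datetime, timedelta

# Assumed alignment point for the window boundaries (an assumption of this sketch)
EPOCH = datetime(1970, 1, 1)


def round_to(dt: datetime, every: timedelta) -> datetime:
    """Round dt to the nearest boundary; exact ties go up in this sketch."""
    floor = dt - (dt - EPOCH) % every
    return floor if (dt - floor) < every / 2 else floor + every


# Same 20-minute series as in the example above
times = [datetime(2020, 1, 1) + i * timedelta(minutes=20) for i in range(5)]
rounded = [round_to(t, timedelta(minutes=30)) for t in times]

for t, r in zip(times, rounded):
    print(t.time(), "->", r.time())
# 00:00:00 -> 00:00:00
# 00:20:00 -> 00:30:00
# 00:40:00 -> 00:30:00
# 01:00:00 -> 01:00:00
# 01:20:00 -> 01:30:00
```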

## Exercises
In the exercises you will develop your understanding of:
- adding an offset to a datetime
- truncating a datetime
- rounding a datetime

### Exercise 1
Use `truncate` to map the values in the `pickup` column to the start of weekly intervals.

Apply an offset with `dt.offset_by` to ensure that the first mapped datetime is `2021-12-31 00:00:00`

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    <blank>
    .head()
)

Map the values in the `pickup` column into weekly windows based on the closest window boundary using `round`

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    <blank>
    .head()
)

### Exercise 2
Add 12 hours to each date so the datetimes are midday **on the last day of the month** instead of midnight

In [None]:
start = datetime(2020,1,1)
stop = datetime(2021,1,1)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="1mo",eager=True)
        }
    )
    <blank>
)

## Solutions

### Solution to exercise 1

Map the values in the `pickup` column to weekly intervals where the values are mapped to the start of the interval.

Ensure that the first mapped datetime is 2021-12-31 00:00:00

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    pl.read_csv(csv_file,try_parse_dates=True)
    .with_columns(
        pl.col("pickup").dt.truncate("1w").dt.offset_by("4d")
    )
    .head(2)
)
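The `4d` offset can be sanity-checked with the standard library. This sketch assumes weekly windows start on a Monday, which is consistent with the offset used in the solution, and uses a hypothetical pickup time on 1st January 2022 rather than a value read from the CSV.

```python
from datetime import datetime, timedelta


def truncate_to_week(dt: datetime) -> datetime:
    """Floor dt to midnight on the Monday of its week."""
    monday = dt - timedelta(days=dt.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


# Hypothetical pickup time, not taken from the CSV
pickup = datetime(2022, 1, 1, 0, 35)

week_start = truncate_to_week(pickup)
shifted = week_start + timedelta(days=4)

print(week_start)  # 2021-12-27 00:00:00
print(shifted)     # 2021-12-31 00:00:00
```

Monday 27th December plus four days gives Friday 31st December, the target start of the first window.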

Map the values in the `pickup` column into weekly windows based on the closest window boundary

In [None]:
csv_file = "../data/nyc_trip_data_1k.csv"
(
    pl.read_csv(csv_file,try_parse_dates=True)
    .with_columns(
        pl.col("pickup").dt.round("1w")
    )
    .head(2)
)

### Solution to exercise 2
Add 12 hours to each date so the datetimes are midday **on the last day of the month** instead of midnight

In [None]:
start = datetime(2020,1,1)
stop = datetime(2021,1,1)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,interval="1mo",eager=True)
        }
    )
    .select(
        pl.col("date").dt.month_end().dt.offset_by("12h")
    )
)