## Time series preliminaries
By the end of this lecture you will be able to:
- create Python `datetime` objects
- create a date range in Polars

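Specifying dates and times as strings is error-prone because a given string can map to different dates depending on the format assumed. For example, "01/02/2023" is 1st February in day-first notation but 2nd January in month-first notation. For this reason the Polars developers decided not to accept strings for specifying dates and times in the library.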

Instead, dates and times in Polars are specified with objects from Python's built-in `datetime` module, which we import here.

In [1]:
from datetime import datetime,date,time,timedelta

import polars as pl

### Datetime
A datetime is a combination of a date and a time that can be specified down to microseconds.

We create a `datetime` object by specifying at least the year, month and day and optionally the hour, minute, second and microsecond. Here we create a datetime for 2023/02/01 12:00:03.000001

In [2]:
dtime = datetime(2023,2,1,12,0,3,1)
dtime

datetime.datetime(2023, 2, 1, 12, 0, 3, 1)

We can extract components of the datetime from the object. For example, to extract the date component we use the `date` method.

In [3]:
dtime.date()

datetime.date(2023, 2, 1)
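
As a small sketch of other ways to pull components out of the same object, the standard library also exposes the individual fields as attributes and the time-of-day part through the `time` method. The variable names below are only for illustration.

```python
from datetime import datetime

dtime = datetime(2023, 2, 1, 12, 0, 3, 1)

# Individual fields are available as plain attributes
year, month, day = dtime.year, dtime.month, dtime.day
microsecond = dtime.microsecond

# The time-of-day part is returned by the time() method
time_part = dtime.time()
```

Here `time_part` is a `datetime.time` object, the same type introduced in the Time section of this lecture.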

All datetime objects are stored internally as counts from the start of some period. We can get this underlying representation for a `datetime` object with the `timestamp` method.

In [4]:
dtime.timestamp()

1675252803.000001

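This value is the number of seconds since the start of the UNIX/POSIX epoch on 1st January 1970. For a `datetime` without time zone information, Python interprets the value in the machine's local time zone, so the output above corresponds to a machine set to UTC.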
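
The sketch below shows one way to make the timestamp independent of the machine's settings by attaching UTC explicitly, and then converts the count of seconds back into a `datetime`. The names `dtime_utc`, `ts` and `round_trip` are illustrative and not part of the lecture.

```python
from datetime import datetime, timezone

# Attach UTC explicitly so the result does not depend on the local time zone
dtime_utc = datetime(2023, 2, 1, 12, 0, 3, 1, tzinfo=timezone.utc)
ts = dtime_utc.timestamp()

# Convert the count of seconds back into a time-zone-aware datetime
round_trip = datetime.fromtimestamp(ts, tz=timezone.utc)
```

Because the timestamp is a float, very fine sub-second detail can in principle be affected by rounding, so it is safest to treat the round trip as exact only to the second.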

### Date
We create a `date` object by specifying the year, month and day

In [5]:
date(2023,2,1)

datetime.date(2023, 2, 1)

### Time
We create a `time` object by specifying the hour and optionally the minute, second and microsecond

In [6]:
time(14,2)

datetime.time(14, 2)

### Duration / time difference
We create a `timedelta` object by specifying the time difference in days, seconds, microseconds, milliseconds, minutes, hours or weeks

In [7]:
timedelta(days=1,hours=2)

datetime.timedelta(days=1, seconds=7200)

We can do arithmetic with `timedeltas`.

Here we define a half-hourly timedelta and then multiply it by 2

In [8]:
dt = timedelta(minutes = 30)

In [9]:
dt * 2

datetime.timedelta(seconds=3600)

Note that the largest unit a `timedelta` stores is days (weeks are converted to days). This means `timedelta` does not have to deal with tricky things like months. For example, if we added one month to 1st February we would expect to get 1st March. But if we add one month to 28th February, do we expect to get 28th March or 31st March? Polars has ways to deal with this ambiguity that we will see later.
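
To make this concrete, here is a minimal standard-library sketch. It shows that a `timedelta` is normalised to days, seconds and microseconds, and that a fixed 30-day offset is not the same thing as a calendar month. The choice of 30 days is an arbitrary illustration.

```python
from datetime import date, timedelta

# timedelta stores only days, seconds and microseconds internally
delta = timedelta(weeks=1, hours=36)
delta_days, delta_seconds = delta.days, delta.seconds  # 8 days and 43200 seconds

# A fixed 30-day offset lands on different days of the month
from_feb = date(2023, 2, 1) + timedelta(days=30)  # 2023-03-03
from_mar = date(2023, 3, 1) + timedelta(days=30)  # 2023-03-31
```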

Polars also has its own string intervals:
- "ns"
- "us"
- "ms"
- "s"
- "m"
- "h"
- "d"
- "w"
- "mo"
- "y"

So one week would be "1w".

These can also be concatenated so 1 day 3 hours is "1d3h"

We learn more about these intervals later in the time series section.

## Creating a datetime range
There are a number of ways to create a datetime range in Polars. We introduce the simplest way here.

We first specify our start, end and interval with `datetime` module objects

In [10]:
start_datetime = datetime(2023,1,1)
end_datetime = datetime(2023,1,1,4)
hourly_interval = timedelta(hours=1)

We create a datetime range `Series` using `pl.datetime_range`. Note that we have to specify `eager=True` to get a `Series` back rather than an unevaluated expression. We explore why this is in a later lecture on date ranges.

In [11]:
pl.datetime_range(
    start=start_datetime,
    end=end_datetime,
    interval=hourly_interval,
    eager=True
)

datetime
datetime[μs]
2023-01-01 00:00:00
2023-01-01 01:00:00
2023-01-01 02:00:00
2023-01-01 03:00:00
2023-01-01 04:00:00


The output is a Polars `Series`. The dtype in this case is `pl.Datetime`. We learn more about Polars datetime dtypes in the next lecture.

There are other options we can pass to `pl.datetime_range` including:
- how the date range is closed (on both sides by default) and
- whether to specify a time zone

In [12]:
pl.datetime_range(
    start=start_datetime,
    end=end_datetime,
    interval=hourly_interval,
    eager=True,
    closed="left",
)

datetime
datetime[μs]
2023-01-01 00:00:00
2023-01-01 01:00:00
2023-01-01 02:00:00
2023-01-01 03:00:00


We can also pass `date` objects rather than `datetime` objects as the start and end if the interval is a whole number of days. The output of `pl.datetime_range` is still a datetime `Series`.

In [13]:
start_date = date(2023,1,1)
end_date = date(2023,1,23)
weekly_interval = timedelta(weeks=1)

In [14]:
pl.datetime_range(
    start=start_date,
    end=end_date,
    interval=weekly_interval,
    eager=True,
)

datetime
datetime[μs]
2023-01-01 00:00:00
2023-01-08 00:00:00
2023-01-15 00:00:00
2023-01-22 00:00:00


## Exercises
In the exercises you will learn to:
- use `datetime` objects
- create a date range in Polars

### Exercise 1
Create `date` objects for the 1st and 2nd January 2020 along with a 3 hour time interval using a `timedelta`

Create a `DataFrame` with a date range column called `date` using these parameters

In [15]:
df = pl.DataFrame(
    {
        <blank>
    }
)
df


Create the `DataFrame` again using Polars string intervals at 2 hour 30 minute intervals

## Solutions

### Solution to exercise 1

Create `date` objects for the 1st and 2nd January 2020 along with a 3 hour time interval

In [None]:
start_date = date(2020,1,1)
end_date = date(2020,1,2)
interval = timedelta(hours=3)

Create a `DataFrame` with a date range column called `date` using these parameters

In [None]:
df = pl.DataFrame(
    {
        "date":pl.datetime_range(
            start=start_date,
            end=end_date,
            interval=interval,
            eager=True
        )
    }
)
df

Note the `eager=True` argument: it is not the default for `pl.datetime_range`, so it must be passed explicitly.

Create the `DataFrame` again using Polars string intervals at 2 hour 30 minute intervals

In [None]:
df = pl.DataFrame(
    {
        "date":pl.datetime_range(
            start=start_date,
            end=end_date,
            interval="2h30m",
            eager=True
        )
    }
)
df