## Working with time zones

By the end of this lecture you will be able to:
- add a time zone to a datetime
- change the time zone
- explain the use case of the different time zone functions

Working with time zones can be tricky. In this lecture we break it down to understand how the different time zone functions work.

In [None]:
from datetime import date,datetime

import polars as pl

## Creating a simple `DataFrame`

We create a `DataFrame` that has a single value in the `datetime` column: 1970-01-01 00:00:00.

This date is the origin point for Unix timestamps. If we instead use a contemporary datetime it can be tricky to track changes in the integer representations as we are looking for small differences in large numbers.

To make things easier we will convert the integer representations from microseconds to hours with the following conversion factor:

In [None]:
# Conversion factor to convert integer timestamps to hours
microseconds_per_hour = 3600 * 1e6

In the `DataFrame` we also add a column `hours` with the physical integer representation converted from (integer) microseconds to (floating point) hours since the start of the epoch.

In [None]:
df = (
    pl.DataFrame(
        {
            "datetime":[datetime(1970,1,1)]
        }
    )
    .with_columns(
        pl.col("datetime").to_physical().alias("hours")/microseconds_per_hour
    )
)
df

By default a `pl.Datetime` is **time zone-naive**: it has no time zone attached. Its physical representation nonetheless behaves as if the value were in the UTC time zone, which is why 1970-01-01 00:00:00 corresponds to a timestamp of 0.
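
As a rough cross-check that uses only the standard library (not Polars), the sketch below converts UTC datetimes into hours since the epoch with the same conversion factor. The helper name `hours_since_epoch` is hypothetical and introduced only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Same conversion factor as above: microseconds in one hour
MICROSECONDS_PER_HOUR = 3600 * 1e6


def hours_since_epoch(dt_utc):
    """Return the hours between the Unix epoch and a UTC-aware datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer number of microseconds since the epoch
    microseconds = (dt_utc - epoch) // timedelta(microseconds=1)
    return microseconds / MICROSECONDS_PER_HOUR


epoch_hours = hours_since_epoch(datetime(1970, 1, 1, tzinfo=timezone.utc))
five_am_hours = hours_since_epoch(datetime(1970, 1, 1, 5, tzinfo=timezone.utc))
print(epoch_hours, five_am_hours)
```

Midnight UTC on 1970-01-01 sits at 0 hours, and 05:00 UTC on the same day sits at 5 hours.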

## Specify a time zone for a given datetime
If we know that the datetimes are not UTC but actually record a local datetime in a time zone we can specify the time zone with `dt.replace_time_zone`

The names of the time zone locations come from the Rust library chrono-tz. <a href="https://docs.rs/chrono-tz/latest/chrono_tz/enum.Tz.html" target="_blank"> See here for the full list of supported time zone names and locations</a>.


We tell Polars that this datetime is actually a local time in New York. We do this in a new column `tz_local` and also add the physical representation in `tz_local_hours`

In [None]:
(
    df
    .with_columns(
        [
            pl.col("datetime").dt.replace_time_zone("America/New_York").alias("tz_local"),
            pl.col("datetime").dt.replace_time_zone("America/New_York").to_physical().alias("tz_local_hours")/microseconds_per_hour
        ]
    )
)

By calling `dt.replace_time_zone`:
- the datetime hasn't changed from 1970-01-01 00:00:00 but now has the EST timezone
- the physical representation **has changed** from 0 to 5 hours

The physical representation must change by 5 hours because 1970-01-01 00:00:00 EST occurred 5 hours into the Unix epoch.
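
The 5-hour shift can be cross-checked with Python's built-in `zoneinfo` module. This is a standard-library sketch rather than Polars code, and the variable names are illustrative.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Midnight on 1970-01-01 as a local wall-clock time in New York
midnight_nyc = datetime(1970, 1, 1, tzinfo=ZoneInfo("America/New_York"))

# New York's distance from UTC at that moment (negative means behind UTC)
offset = midnight_nyc.utcoffset()

# The same instant expressed in UTC
as_utc = midnight_nyc.astimezone(ZoneInfo("UTC"))

# Seconds since the epoch converted to hours
hours_into_epoch = as_utc.timestamp() / 3600
print(offset, as_utc, hours_into_epoch)
```

New York is 5 hours behind UTC at that moment, so local midnight falls at 05:00 UTC, which is 5 hours into the epoch.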

> Terminology: we refer to the difference in hours between time zones as the *offset*. For example, the offset between UTC and EST is 5 hours: 1970-01-01 00:00:00 UTC is the same instant as 1969-12-31 19:00:00 EST.

## Change the time zone for a given Unix timestamp 
In this scenario we know that the original data was recorded in Unix timestamps and so is in the UTC timezone. We now want to know what local time that UTC timestamp corresponds to in New York. 

In this case we use `dt.convert_time_zone` (after we have applied an explicit UTC time zone).

In [None]:
(
    df
    .with_columns(
        # Make UTC time zone explicit
        pl.col("datetime").dt.replace_time_zone("UTC")
    )
    .with_columns(
        [
            pl.col("datetime").dt.convert_time_zone("America/New_York").alias("with_tz"),
            pl.col("datetime").dt.convert_time_zone("America/New_York").to_physical().alias("with_tz_hours")/microseconds_per_hour
        ]
    )
)

By calling `dt.convert_time_zone`:
- the datetime **has been offset** by -5 hours to 1969-12-31 19:00:00 as EST is 5 hours behind UTC
- the physical representation has not changed from 0


We can remove a time zone by replacing the time zone with `None`

In [None]:
(
    df
    .with_columns(
        pl.col("datetime").dt.replace_time_zone("UTC")
    )
    # Remove the time zone
    .with_columns(
        [
            pl.col("datetime").dt.replace_time_zone(None).alias("no_tz"),
            pl.col("datetime").dt.replace_time_zone(None).to_physical().alias("no_tz_hours")/microseconds_per_hour
        ]
    )
)

### Summary of the methods
We summarise these methods here. The Datetime column reflects whether the datetime changes e.g. 1970-01-01 00:00:00 to 1969-12-31 19:00:00

| Method | Datetime | Timestamp |
|---|---|---|
| `dt.replace_time_zone` | No change | Changes by offset |
| `dt.convert_time_zone` | Changes by offset | No change |

Example use cases:
- `dt.replace_time_zone` when your datetimes record when things happened in local time and you want to capture the right time zone and Unix timestamp
- `dt.convert_time_zone` when your data records when things happened in Unix timestamps and you want to know what this was in a local time zone


## Filtering time zone datetimes
To filter a datetime column that has a time zone we need to specify the time zone in the `filter`.

We use the `zoneinfo` library that is built into Python (from version 3.9) to specify the time zone.

In this example we:
- create a `DataFrame` with the first few hours of the Unix epoch
- add a `nyc` column in the New York time zone
- keep only the times before 06:00 in the New York time zone

In [None]:
from zoneinfo import ZoneInfo
start = datetime(1970,1,1)
stop = datetime(1970,1,1,7)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,"1h",eager=True)
        }
    )
    .with_columns(pl.col("date").dt.replace_time_zone("America/New_York").alias("nyc"))
    .filter(
        pl.col("nyc") < datetime(1970,1,1,6,tzinfo=ZoneInfo("America/New_York"))
    )
)

### Extracting the date/time for a datetime with a timezone
In the lecture on Timeseries Features later in this section we see how to extract the date/time from a datetime with a timezone with `dt.date` and `dt.time`

## Exercises
In the exercises you will develop your understanding of:
- setting the time zone
- changing the time zone
- getting the time difference between time zones


### Exercise 1
Create a `DataFrame` with a `date` column at monthly intervals from 1st September 2020 to 1st December 2020

In [None]:
start = datetime(2020,9,1)
stop = datetime(2020,12,1)
(
    pl.DataFrame(
        {
            "date":<blank>
        }
    )
)

The dates in the `date` column record events that happened in a factory in Johannesburg in South Africa.

Transform the `date` column so that the datetimes are local to Johannesburg.

Continue with your query from above in each step of this exercise

Add a column with the integer representation called `date_p`

You want to know what time it was in the Dublin office when the events happened in Johannesburg. 

Add a column called `date_dublin` with the local time in Dublin for these events

Add a column called `offset` that shows the offset between Johannesburg and Dublin. Do this by subtracting columns with the same datetimes but different timezones

Why does the offset change over the months?

### Exercise 2
You have a weather station that records temperature at hourly intervals. The device records data in UTC.

In [None]:
pl.Config.set_tbl_rows(25)
import numpy as np
start = datetime(2020,9,1)
stop = datetime(2020,9,2)
(
    pl.DataFrame(
        {
            "date": pl.datetime_range(start, stop, "1h",eager=True)
        }
    )
    .with_columns(
        # We use a cosine function with a period of 24 hours to generate a fake temperature cycle
        (25 + 4*((2*np.pi*pl.col("date").to_physical()/(24*60*60*1e6))).cos()).alias("temperature")    
    )
)

From the output we can see that the device is not located in the UTC time zone as the highest temperature is at night and the lowest is in the afternoon.

Change the time zone to correspond with a location that has higher temperatures in the late afternoon and lower temperatures in the early night (<a href="https://docs.rs/chrono-tz/latest/chrono_tz/enum.Tz.html" target="_blank">there are obviously many such locations, you mainly need to figure out whether to go east or west!</a>).

In [None]:

(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,"1h",eager=True)
        }
    )
    .with_columns(
        # Use a cosine function with a period of 24 hours to generate a fake temperature cycle
        (25 + 4*((2*np.pi*pl.col("date").to_physical()/(24*60*60*1e6))).cos()).alias("temperature")
    )
    <blank>
)

## Solutions

### Solution to Exercise 1

Create a `DataFrame` with a `date` column at monthly intervals from 1st September 2020 to 1st December 2020

In [None]:
start = datetime(2020,9,1)
stop = datetime(2020,12,1)
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,"1mo",eager=True)
        }
    )
)

The dates in the `date` column actually record events that happened in a factory in Johannesburg in South Africa.

Transform the `date` column so that the datetimes are local to Johannesburg.

In [None]:
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,"1mo",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
    )
)

Add a column with the integer representation called `date_p`

In [None]:
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,"1mo",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
    )
    .with_columns(
        pl.col("date").to_physical().alias("date_p")
    )
)

You want to know what time it was in the Dublin office when the events happened in Johannesburg. 

Add a column called `date_dublin` with the local time in Dublin for these events

In [None]:
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,"1mo",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
    )
    .with_columns(
        pl.col("date").to_physical().alias("date_p")
    )
    .with_columns(
        pl.col("date").dt.convert_time_zone("Europe/Dublin").alias("date_dublin")
    )
)

Add a column called `offset` that shows the offset between Johannesburg and Dublin.

In [None]:
(
    pl.DataFrame(
        {
            "date":pl.datetime_range(start,stop,"1mo",eager=True)
        }
    )
    .with_columns(
        pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
    )
    .with_columns(
        pl.col("date").to_physical().alias("date_p")
    )
    .with_columns(
        pl.col("date").dt.convert_time_zone("Europe/Dublin").alias("date_dublin")
    )
    .with_columns(
        (pl.col("date") - pl.col("date_dublin").dt.replace_time_zone("Africa/Johannesburg"))
        .alias("offset")
    )
)

Why does the offset change over the months?

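Because daylight saving time (Irish Summer Time, IST) applies in Dublin for the September and October dates but not for the November and December dates. South Africa does not observe daylight saving time, so the offset is 1 hour for the first two rows and 2 hours for the last two.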
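
The changing offset can be cross-checked with the standard library's `zoneinfo` module. This is a sketch outside Polars, and the helper name `utc_offset_gap` is hypothetical.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def utc_offset_gap(local_dt, tz_from, tz_to):
    """Return how far tz_from is ahead of tz_to at the given local time in tz_from."""
    aware = local_dt.replace(tzinfo=ZoneInfo(tz_from))
    # Same instant viewed in the other time zone
    other = aware.astimezone(ZoneInfo(tz_to))
    return aware.utcoffset() - other.utcoffset()


september_gap = utc_offset_gap(datetime(2020, 9, 1), "Africa/Johannesburg", "Europe/Dublin")
december_gap = utc_offset_gap(datetime(2020, 12, 1), "Africa/Johannesburg", "Europe/Dublin")
print(september_gap, december_gap)
```

Johannesburg is 1 hour ahead of Dublin on 1st September 2020 and 2 hours ahead on 1st December 2020.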

### Solution to Exercise 2
You have a weather station that records temperature at hourly intervals. The device records data in UTC.

In [None]:
pl.Config.set_tbl_rows(25)
import numpy as np
start = datetime(2020,9,1)
stop = datetime(2020,9,2)
(
    pl.DataFrame(
        {
            "date": pl.datetime_range(start, stop, "1h",eager=True)
        }
    )
    .with_columns(
        (25 + 4*((2*np.pi*pl.col("date").to_physical()/(24*60*60*1e6))).cos()).alias("temperature")    
    )
)

From the output we can see that the device is not located in the UTC timezone as the highest temperature is at night and the lowest is in the afternoon.

Change the time zone to a location that has higher temperatures in the late afternoon and lower temperatures in the early night (<a href="https://docs.rs/chrono-tz/latest/chrono_tz/enum.Tz.html" target="_blank">there are obviously many such locations, you mainly need to figure out whether to go east or west!</a>).

In [None]:
(
    pl.DataFrame(
        {
            "date": pl.datetime_range(start, stop, "1h",eager=True)
        }
    )
    .with_columns(
        (25 + 4*((2*np.pi*pl.col("date").to_physical()/(24*60*60*1e6))).cos()).alias("temperature")    
    )
    .with_columns(pl.col("date").dt.replace_time_zone("UTC").dt.convert_time_zone("Brazil/West"))
)