# Generate a list of DST transition dates

We need a list of the dates on which daylight saving transitions happened.

Python can convert between timezones with DST and ones without (e.g. `Australia/Sydney` to `Australia/Brisbane`). Therefore Python's timezone libraries (here `pytz`) must contain the raw data about when timezone changes happen.
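
As a quick illustration of that conversion, here is a minimal sketch. It uses the standard-library `zoneinfo` module rather than `pytz` (an assumption made only to keep the example dependency-free), and the two sample timestamps are arbitrary.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

sydney = ZoneInfo("Australia/Sydney")      # observes DST
brisbane = ZoneInfo("Australia/Brisbane")  # no DST, always UTC+10

# Noon in Sydney, once during DST (January) and once outside it (July).
summer = datetime(2023, 1, 15, 12, 0, tzinfo=sydney)
winter = datetime(2023, 7, 15, 12, 0, tzinfo=sydney)

summer_in_brisbane = summer.astimezone(brisbane)
winter_in_brisbane = winter.astimezone(brisbane)

print(summer_in_brisbane.isoformat())  # 2023-01-15T11:00:00+10:00
print(winter_in_brisbane.isoformat())  # 2023-07-15T12:00:00+10:00
```

The one-hour difference in summer only comes out right because the library knows when Sydney is on DST.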

This script grabs that data and writes it out in a convenient form.

We do this instead of downloading from an official site such as [this](https://www.nsw.gov.au/about-nsw/daylight-saving), because the official site has one page per year. Going through each one would be a pain, and we'd probably make a mistake doing so.

Note that over the last decade or two, all of the regions with DST have moved their clocks ("transitioned") on the same day.
They all shift forward/back by the same amount (1 hour),
although SA is permanently half an hour behind VIC, NSW and TAS.
(So SA actually moves its clocks half an hour after the others. For now we're just looking at dates and ignoring that, since in practice people physically change their clocks just after dinnertime. If anything, people may go to bed earlier/later a few hours before a DST transition. So the exact 2am transition time is not really relevant for us.)
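
To make the half-hour point concrete, here is a small sketch comparing Sydney and Adelaide around the October 2023 transition. It uses the standard-library `zoneinfo` instead of `pytz`, and the three sample instants are chosen by hand for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

sydney = ZoneInfo("Australia/Sydney")
adelaide = ZoneInfo("Australia/Adelaide")

# Three hand-picked UTC instants on 30 Sep 2023, around 2am local standard time.
instants = {
    "before both": datetime(2023, 9, 30, 15, 59, tzinfo=timezone.utc),
    "between": datetime(2023, 9, 30, 16, 15, tzinfo=timezone.utc),
    "after both": datetime(2023, 9, 30, 17, 0, tzinfo=timezone.utc),
}

gaps = {}
local_dates = set()
for label, instant in instants.items():
    syd = instant.astimezone(sydney)
    adl = instant.astimezone(adelaide)
    # Difference between the two regions' UTC offsets at this instant.
    gaps[label] = syd.utcoffset() - adl.utcoffset()
    local_dates.update({syd.date(), adl.date()})
    print(label, syd.isoformat(), adl.isoformat(), gaps[label])
```

The offset gap is 30 minutes before and after, and briefly 90 minutes in between, while the local calendar date is 1 October throughout. That is why recording only the date is enough here.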

Note that the dates are saved to CSV as `yyyy-mm-dd`. I have verified manually that R's `read_csv` interprets this correctly.
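
For reference, a minimal sketch of what that `yyyy-mm-dd` format looks like on a round trip. It uses the standard-library `csv` module rather than pandas, and the two rows are made-up samples. It does not test R's behaviour, only that the text written is ISO 8601.

```python
import csv
import datetime as dt
import io

# Two sample rows in the same shape as the output file.
rows = [
    {"date": dt.date(2023, 4, 2), "direction": "stop"},
    {"date": dt.date(2023, 10, 1), "direction": "start"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "direction"])
writer.writeheader()
writer.writerows(rows)  # date objects are written via str(), i.e. yyyy-mm-dd
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back and parse the date column as ISO dates.
buffer.seek(0)
parsed = [dt.date.fromisoformat(r["date"]) for r in csv.DictReader(buffer)]
```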

In [1]:
import datetime as dt
import os

import pytz
import pandas as pd

In [2]:
base_data_dir = '/home/matthew/data/'
output_path = os.path.join(base_data_dir, '02-dst-dates.csv')

In [3]:
# AEMO data is in "market time", which is Brisbane time (no DST), UTC+10.
MARKET_TIME_OFFSET = 10 

In [4]:
data = []
tz = pytz.timezone('Australia/Sydney')
# pytz keeps two parallel private lists: the naive UTC datetime of each transition,
# and a (utc offset, DST offset, abbreviation) tuple describing the period that follows.
assert len(tz._utc_transition_times) == len(tz._transition_info)
start = dt.datetime(year=2000, month=1, day=1)
end = dt.datetime.now() + dt.timedelta(days=365)
for (transition_utc, (utc_offset, dst_offset, tz_acronym)) in zip(tz._utc_transition_times, tz._transition_info):
    if start < transition_utc < end:
        data.append({
            # transition_utc is naive but expressed in UTC, so attach UTC before converting.
            # Calling astimezone() on the naive value would treat it as system-local time.
            'date': pytz.utc.localize(transition_utc).astimezone(tz).date(),
            'direction': 'start' if (dst_offset > dt.timedelta(0)) else 'stop',
        })

df = pd.DataFrame(data)

In [5]:
# do a unit test
# to make sure we're not off by 1
# https://www.nsw.gov.au/about-nsw/daylight-saving
# "Daylight saving begins at 2am, Eastern Standard Time on Sunday 1 October 2023."
expected_date = dt.date(2023, 10, 1)
expected_direction = 'start'

assert df[(df['date'] == expected_date) & (df['direction'] == expected_direction)].shape[0] == 1
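
The transition instants that `pytz` stores are naive datetimes expressed in UTC, so the calendar date has to be taken after converting to local time. Here is a minimal sketch of the difference, using only the standard library (`zoneinfo` stands in for `pytz`, and the hard-coded instant is the October 2023 transition quoted in the unit test):

```python
import datetime as dt
from zoneinfo import ZoneInfo

# Naive UTC instant of the October 2023 transition:
# 2am AEST (UTC+10) on 1 October is 16:00 UTC on 30 September.
transition_utc = dt.datetime(2023, 9, 30, 16, 0)

# Taking the date straight from the UTC value is off by one day.
utc_date = transition_utc.date()

# Attach UTC explicitly, then convert to Sydney time before taking the date.
local = transition_utc.replace(tzinfo=dt.timezone.utc).astimezone(ZoneInfo('Australia/Sydney'))
local_date = local.date()

print(utc_date)    # 2023-09-30
print(local)       # 2023-10-01 03:00:00+11:00
print(local_date)  # 2023-10-01
```

Calling `astimezone` directly on the naive value would instead treat it as the machine's local time, so the result would depend on where the notebook runs.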

In [None]:
df.to_csv(output_path, index=False)