## 2.1 Trading dates and hours
The goal is to create a CSV file that lists every trading date together with the pre-market open, the regular open, the regular close and the post-market close.

In [1]:
###
from polygon.rest import RESTClient
from datetime import datetime, date, time, timedelta
from pytz import timezone
from functools import lru_cache
import os
import pytz
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import mplfinance as mpf

POLYGON_DATA_PATH = "../data/polygon/"

START_DATE = date(2019, 1, 1)
END_DATE = date(2023, 9, 7)

with open(POLYGON_DATA_PATH + "secret.txt") as f:
    KEY = next(f).strip()

client = RESTClient(api_key=KEY)

As an aside, the earliest date for which Polygon has data is 2003-09-10.

In [2]:
ticker_list_iterator = client.list_tickers(date="2003-09-10", active=True, market='stocks', limit=1000)
df = pd.DataFrame(ticker_list_iterator)
len(df)

8204

We can infer the trading dates from daily SPY ETF data, but daily bars carry no information about early closes. Luckily, the early closes are listed on [QuantConnect](https://www.quantconnect.com/docs/v2/writing-algorithms/securities/asset-classes/us-equity/market-hours). I have copied them below.

So we will first get a list of all trading dates from the daily SPY data. Regular hours always start at 9:30 ET, and according to QuantConnect there are no early or late opens. Pre-market should start at 4:00 ET; we will check this against the 1-minute SPY data. The regular close is 16:00 ET, or 13:00 ET on an early-close day. The post-market should end 4 hours after the regular close, which we will also check against the SPY minute data. Because 1-minute bars are labeled by their start time, the calendar stores each close as the timestamp of the last bar of the session (for example 15:59 rather than 16:00).
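
The rules in this paragraph can be written down as a small function. This is only a sketch under the assumption that the rules hold (they are tested later in the notebook); `session_bounds` is a hypothetical helper for illustration and is not part of `utils.py`.

```python
from datetime import date, datetime, time, timedelta


def session_bounds(day, early_closes):
    """Return the assumed ET session boundaries for one trading day.

    Hypothetical helper: encodes the rules stated in the text, nothing more.
    """
    close_time = time(13, 0) if day in early_closes else time(16, 0)
    regular_close = datetime.combine(day, close_time)
    return {
        "premarket_open": datetime.combine(day, time(4, 0)),
        "regular_open": datetime.combine(day, time(9, 30)),
        "regular_close": regular_close,
        # Post-market is assumed to end 4 hours after the regular close.
        "postmarket_close": regular_close + timedelta(hours=4),
    }


early = {date(2023, 7, 3)}  # a single early-close date, for illustration
bounds = session_bounds(date(2023, 7, 3), early)
print(bounds["regular_close"].time(), bounds["postmarket_close"].time())

# The last 1-minute bar of a session starts one minute before the session ends.
last_bar = bounds["postmarket_close"] - timedelta(minutes=1)
print(last_bar.time())
```

On an early-close day this gives a 13:00 close and a 17:00 end of post-market, so the last 1-minute bar would be stamped 16:59.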

In [2]:
###
early_closes = ['1999-11-26','2000-07-03','2000-11-24','2001-07-03','2001-11-23',
'2001-12-24','2002-07-05','2002-11-29','2002-12-24','2003-07-03','2003-11-28',
'2003-12-24','2003-12-26','2004-11-26','2005-11-25','2006-07-03','2006-11-24',
'2007-07-03','2007-11-23','2007-12-24','2008-07-03','2008-11-28','2008-12-24',
'2009-11-27','2009-12-24','2010-11-26','2011-11-25','2012-07-03','2012-11-23',
'2012-12-24','2013-07-03','2013-11-29','2013-12-24','2014-07-03','2014-11-28',
'2014-12-24','2015-11-27','2015-12-24','2016-11-25','2017-07-03','2017-11-24',
'2018-07-03','2018-11-23','2018-12-24','2019-07-03','2019-11-29',
'2019-12-24','2020-11-27','2020-12-24','2021-11-26','2022-11-25','2023-07-03',
'2023-11-24','2024-07-03',]
early_closes = [datetime.strptime(d, "%Y-%m-%d").date() for d in early_closes]
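
A hand-copied list is easy to get wrong, so a quick sanity check may be worthwhile: an early close can only fall on a weekday. The sketch below is illustrative only; `weekend_entries` and the three sample strings are assumptions and not part of the notebook.

```python
from datetime import date, datetime


def weekend_entries(date_strings):
    """Return the parsed dates that fall on a Saturday or Sunday."""
    parsed = [datetime.strptime(s, "%Y-%m-%d").date() for s in date_strings]
    return [d for d in parsed if d.weekday() >= 5]  # 5 = Saturday, 6 = Sunday


# Illustrative sample: a Monday, a Friday and a Sunday.
sample = ["2017-07-03", "2017-11-24", "2017-12-24"]
print(weekend_entries(sample))  # [datetime.date(2017, 12, 24)]
```

Any date this returns cannot be a trading day and can be dropped from the list.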

In [3]:
###
path = POLYGON_DATA_PATH + "../market/market_calendar.csv"

# Get list of trading dates
if not os.path.isfile(path):
    SPY_d1 = pd.DataFrame(client.get_aggs(ticker = "SPY", multiplier = 1, timespan = "day", from_ = START_DATE, to = END_DATE, limit=50000))
    SPY_d1["timestamp"] = pd.to_datetime(SPY_d1["timestamp"], unit="ms").dt.date
    SPY_d1.rename(columns = {"timestamp": "date"}, inplace=True)
    trading_days = SPY_d1[["date"]]
else:
    # If we already have it we just need to update
    old_market_hours = pd.read_csv(
            path,
            parse_dates=True,
        )
    old_market_hours["date"] = pd.to_datetime(old_market_hours["date"]).dt.date
    old_trading_days = old_market_hours[["date"]]
    last_day_old = old_trading_days['date'].values[-1]

    SPY_d1 = pd.DataFrame(client.get_aggs(ticker = "SPY", multiplier = 1, timespan = "day", from_ = (last_day_old + timedelta(days=1)).isoformat(), to = END_DATE, limit=50000))
    SPY_d1["timestamp"] = pd.to_datetime(SPY_d1["timestamp"], unit="ms").dt.date
    SPY_d1.rename(columns = {"timestamp": "date"}, inplace=True)
    new_trading_days = SPY_d1[["date"]]

    trading_days = pd.concat([old_trading_days, new_trading_days])
    trading_days.drop_duplicates(inplace=True)
    trading_days.sort_values(by='date', inplace=True)

# Fill with pre-market/regular session
trading_hours = trading_days.copy()
trading_hours["premarket_open"] = time(4)
trading_hours["regular_open"] = time(9, 30)

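# Store the close as the timestamp of the last 1-minute bar of the session
# (12:59 on early closes, 15:59 otherwise), because bars are labeled by their start time.
trading_hours["regular_close"] = trading_hours["date"].isin(early_closes)
trading_hours["regular_close"] = trading_hours["regular_close"].apply(lambda is_early: time(12, 59) if is_early else time(15, 59))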

trading_hours["postmarket_close"] = pd.to_datetime(trading_hours['date'].astype(str) + " " + trading_hours["regular_close"].astype(str)) + timedelta(hours=4)
trading_hours["postmarket_close"] = trading_hours["postmarket_close"].dt.time

trading_hours.set_index("date", inplace=True)
trading_hours.to_csv(path)

In [34]:
# 2023-07-03 was an early close day
trading_hours[trading_hours.index == date(2023, 7, 3)]

date,premarket_open,regular_open,regular_close,postmarket_close
2023-07-03,04:00:00,09:30:00,12:59:00,16:59:00


We will also create some handy functions for handling and getting trading dates. These are also in the <code>utils.py</code> script so other notebooks can access them.

In [4]:
# Some handy functions
def datetime_to_unix(dt):
    """Converts an ET-naive datetime object to a Unix millisecond timestamp

    Args:
        dt (datetime): datetime to convert

    Returns:
        int: Unix millisecond timestamp
    """

    if isinstance(dt, datetime):
        time_ET = timezone("US/Eastern").localize(dt)
        return int(time_ET.timestamp() * 1000)
    else:
        raise TypeError("No datetime object specified.")

def download_m1_raw_data(ticker, from_, to, client, columns=['open', 'high', 'low', 'close', 'volume']):
    """Downloads raw 1-minute data from Polygon and converts to ET-time

    Args:
        ticker (str): the ticker symbol to download, e.g. "SPY"
        from_ (date/datetime): the starting date(time)
        to (date/datetime): the ending date(time)
        client (RESTClient): the client object
        columns (list): list of column names to keep

    Returns:
        DataFrame: the result
    """
    
    # If no time is specified, fill in the start of premarket/end of postmarket.
    # datetime is a subclass of date, so the datetime case must be checked first.
    if all(isinstance(value, datetime) for value in (from_, to)):
        start_unix = datetime_to_unix(from_)
        end_unix = datetime_to_unix(to)
    elif all(type(value) is date for value in (from_, to)):
        start_unix = datetime_to_unix(dt=datetime.combine(from_, time(4)))
        end_unix = datetime_to_unix(dt=datetime.combine(to, time(20)))
    else:
        raise TypeError("No datetime or date object specified.")
    
    try:
        m1 = pd.DataFrame(
            client.list_aggs(
                ticker=ticker,
                multiplier=1,
                timespan="minute",
                from_=start_unix,
                to=end_unix,
                limit=50000,
                adjusted=False,
            )
        )
    except Exception as e:
        print(ticker)
        print(e)
        return

    if not m1.empty:
        m1["timestamp"] = pd.to_datetime(
            m1["timestamp"], unit="ms"
        )  # Convert timestamp to UTC
        m1.rename(columns={"timestamp": "datetime"}, inplace=True)
        m1["datetime"] = m1["datetime"].dt.tz_localize(
            pytz.UTC
        )  # Make UTC aware (in order to convert)
        m1["datetime"] = m1["datetime"].dt.tz_convert("US/Eastern")  # Convert UTC to ET
        m1["datetime"] = m1["datetime"].dt.tz_localize(None)  # Make timezone naive
        m1.set_index("datetime", inplace=True)
        m1 = m1[columns]

        return m1

    else:
        print(
            f"There is no data for {ticker} from {from_.isoformat()} to {to.isoformat()}"
        )

@lru_cache
def get_market_calendar():
    """Retrieves the market hours

    Returns:
        DataFrame: the index contains Date objects and the columns Time objects.
    """
    market_hours = pd.read_csv(
        POLYGON_DATA_PATH + "../market/market_calendar.csv", index_col=0
    )
    market_hours.index = pd.to_datetime(market_hours.index).date
    market_hours.premarket_open = pd.to_datetime(
        market_hours.premarket_open, format="%H:%M:%S"
    ).dt.time
    market_hours.regular_open = pd.to_datetime(
        market_hours.regular_open, format="%H:%M:%S"
    ).dt.time
    market_hours.regular_close = pd.to_datetime(
        market_hours.regular_close, format="%H:%M:%S"
    ).dt.time
    market_hours.postmarket_close = pd.to_datetime(
        market_hours.postmarket_close, format="%H:%M:%S"
    ).dt.time
    return market_hours

def get_market_dates():
    """Get a list of market days from the market calendar

    Returns:
        list: list of Date objects
    """
    market_hours = get_market_calendar()
    return list(market_hours.index)

def first_trading_date_after_equal(dt):
    """Gets first trading day after or equal to input date. Return the input if out of range.

    Args:
        dt (Date): Date object to compare. Can be a non-trading date.

    Returns:
        Date: the trading date
    """
    trading_days = get_market_dates()
    if dt < trading_days[0] or dt > trading_days[-1]:
        print("Out of range! Returning input.")
        return dt
    while dt not in trading_days:
        dt = dt + timedelta(days=1)
    return dt

def last_trading_date_before_equal(dt):
    """Gets last trading day before or equal to input date. Return the input if out of range.

    Args:
        dt (Date): Date object to compare. Can be a non-trading date.

    Returns:
        Date: the trading date
    """
    trading_days = get_market_dates()
    if dt < trading_days[0] or dt > trading_days[-1]:
        print("Out of range! Returning input.")
        return dt
    while dt not in trading_days:
        dt = dt - timedelta(days=1)
    return dt

def first_trading_date_after(day):
    """Gets first trading date after the specified trading date.

    Args:
        day (date): MUST be a trading date
    
    Returns:
        date: the next trading date
    """
    trading_days = get_market_dates()
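    idx = trading_days.index(day)
    if idx + 1 >= len(trading_days):
        raise ValueError(f"{day} is the last date in the market calendar.")
    return trading_days[idx + 1]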


def last_trading_date_before(day):
    """Gets last trading date before the specified trading date.

    Args:
        day (date): MUST be a trading date
    
    Returns:
        date: the previous trading date
    """
    trading_days = get_market_dates()
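    idx = trading_days.index(day)
    if idx == 0:
        # Without this check, index -1 would silently return the LAST calendar date
        raise ValueError(f"{day} is the first date in the market calendar.")
    return trading_days[idx - 1]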
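
The date lookups above walk one day at a time and test list membership on every step. Since the calendar is sorted, a binary search is a possible alternative. The sketch below uses the standard-library `bisect` module; the function names are hypothetical, and unlike the functions above it returns `None` when the input is out of range instead of returning the input.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def first_on_or_after(trading_days, day):
    """First trading date >= day, or None if day is past the end of the calendar."""
    i = bisect_left(trading_days, day)
    return trading_days[i] if i < len(trading_days) else None


def last_on_or_before(trading_days, day):
    """Last trading date <= day, or None if day precedes the calendar."""
    i = bisect_right(trading_days, day)
    return trading_days[i - 1] if i > 0 else None


# Toy calendar: a Friday, then Tuesday and Wednesday (weekend and Labor Day in between).
days = [date(2023, 9, 1), date(2023, 9, 5), date(2023, 9, 6)]
print(first_on_or_after(days, date(2023, 9, 2)))  # 2023-09-05
print(last_on_or_before(days, date(2023, 9, 4)))  # 2023-09-01
```

Both functions assume `trading_days` is sorted in ascending order, which holds for the list returned by `get_market_dates()` as long as the CSV was written sorted.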
    

In [33]:
get_market_calendar().tail(5)

date,premarket_open,regular_open,regular_close,postmarket_close
2023-08-28,04:00:00,09:30:00,15:59:00,19:59:00
2023-08-29,04:00:00,09:30:00,15:59:00,19:59:00
2023-08-30,04:00:00,09:30:00,15:59:00,19:59:00
2023-08-31,04:00:00,09:30:00,15:59:00,19:59:00
2023-09-01,04:00:00,09:30:00,15:59:00,19:59:00


## 2.2 Testing
According to [Schwab](https://www.schwab.com/public/schwab/nn/qq/about_extended_hours_trading.html) there are no extended-hours sessions on trading days with early closes. I could not find confirmation of this, so I will test the claim against the SPY minute data.

In [13]:
data = download_m1_raw_data("SPY", from_ = date(2019, 1, 2), to = date(2023, 8, 18), client=client)
# Save as CSV so that the file matches its extension and the pd.read_csv call in the next cell
data.to_csv(POLYGON_DATA_PATH + "../market/SPY.csv")

In [21]:
SPY_m1 = pd.read_csv(
        POLYGON_DATA_PATH + "../market/SPY.csv",
        parse_dates=True,
        index_col="datetime",
        usecols=["datetime"]
    )

market_calendar = get_market_calendar()
market_dates = get_market_dates()

# Test for duplicated dates in the market calendar, should be False
print(market_calendar.index.duplicated().any())

SPY_open_dates = set(SPY_m1.index.date)

# Test if SPY dates are contained in the market days
print(SPY_open_dates.issubset(set(market_dates)))

False
True


Test whether the first bar of each day is always at 4:00 ET. Because there is not always a trade in that exact minute, we will only look at the hour.

In [22]:
SPY_m1["date"] = SPY_m1.index.date
SPY_m1["hour"] = SPY_m1.index.hour

In [23]:
SPY_open = SPY_m1[~SPY_m1["date"].duplicated()] # By removing duplicates we get the first 1-minute bar of each day.
SPY_open[SPY_open["hour"] != 4]

datetime,date,hour


Test whether the post-market ends 4 hours after the regular close on an early-close day (2023-07-03).

In [26]:
SPY_m1[SPY_m1['date'] == date(2023, 7, 3)].tail(1)

datetime,date,hour
2023-07-03 16:59:00,2023-07-03,16


Test if this is always the case for early closes.

In [30]:
SPY_close = SPY_m1[~SPY_m1["date"].duplicated(keep="last")]
SPY_early_close = SPY_close[SPY_close["hour"] < 19] # Remove regular closes
SPY_early_close

datetime,date,hour
2019-07-03 16:59:00,2019-07-03,16
2019-11-29 16:59:00,2019-11-29,16
2019-12-24 16:59:00,2019-12-24,16
2020-11-27 16:59:00,2020-11-27,16
2020-12-24 16:59:00,2020-12-24,16
2021-11-26 16:59:00,2021-11-26,16
2022-11-25 16:59:00,2022-11-25,16
2023-07-03 16:59:00,2023-07-03,16


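This matches the QuantConnect list. It also contradicts the Schwab claim: on early-close days SPY still trades in the post-market, with the last bar at 16:59, i.e. 4 hours after the 13:00 close.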
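
The comparison with the QuantConnect list can also be done mechanically instead of by eye. A minimal sketch, with the dates typed in by hand from the list in 2.1 and from the output above; `early_closes_in_range` is a hypothetical helper, and the range is the one used for the SPY download.

```python
from datetime import date


def early_closes_in_range(early_closes, start, end):
    """Return the listed early closes that fall inside [start, end]."""
    return {d for d in early_closes if start <= d <= end}


# Entries of the early-close list from 2019 onwards.
listed = [
    date(2019, 7, 3), date(2019, 11, 29), date(2019, 12, 24),
    date(2020, 11, 27), date(2020, 12, 24), date(2021, 11, 26),
    date(2022, 11, 25), date(2023, 7, 3), date(2023, 11, 24), date(2024, 7, 3),
]

# Dates whose last SPY bar came before 19:00, copied from the output above.
observed = {
    date(2019, 7, 3), date(2019, 11, 29), date(2019, 12, 24),
    date(2020, 11, 27), date(2020, 12, 24), date(2021, 11, 26),
    date(2022, 11, 25), date(2023, 7, 3),
}

expected = early_closes_in_range(listed, date(2019, 1, 2), date(2023, 8, 18))
print(observed == expected)  # True
print(sorted(expected - observed), sorted(observed - expected))  # [] []
```

An entry in the first difference would be a listed early close that the data does not show; an entry in the second would be a short day missing from the list.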

To summarize (all times ET): pre-market always starts at 4:00 and the market always opens at 9:30. The market closes at 16:00, except on early-close days, when it closes at 13:00. The post-market always ends 4 hours after the regular close.

So far, we have learned some quirks of the Polygon API and established the market hours. We will now loop through all trading dates and get a list of all tickers.

## 2.3 Updates
1. Update END_DATE.
2. Update the list of <code>early_closes</code>.
3. Run the three cells marked ### in 2.1.