# 3.1 Tickers
The goal is to solve the problems with the Polygon ticker lists described in the introduction. Before we do that, we download the ticker lists for all days from Polygon and store them in the <code>tickers</code> folder.

In [1]:
#####
from polygon.rest import RESTClient
from datetime import datetime, date, time, timedelta
from pytz import timezone
from functools import lru_cache
from utils import get_market_dates
import os
import pytz
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import mplfinance as mpf

POLYGON_DATA_PATH = "../data/polygon/"

START_DATE = date(2019, 1, 1)
END_DATE = date(2023, 9, 1)

with open(POLYGON_DATA_PATH + "secret.txt") as f:
    KEY = next(f).strip()

client = RESTClient(api_key=KEY)

First, I will create a function to download the ticker list for a specific date.

In [2]:
###
def download_tickers(date_):
    """Retrieve the ticker list for a specific date

    Args:
        date_ (Date): the Date for which to download the ticker list

    Returns:
        DataFrame: the ticker list
    """
    
    date_iso = date_.isoformat()

    ticker_list_iterator_active = client.list_tickers(type="CS", date=date_iso, active=True, market='stocks', limit=1000)
    ticker_list_iterator_delisted = client.list_tickers(type="CS", date=date_iso, active=False, market='stocks', limit=1000)
    ticker_list_iterator_active_adr = client.list_tickers(type="ADRC", date=date_iso, active=True, market='stocks', limit=1000)
    ticker_list_iterator_delisted_adr = client.list_tickers(type="ADRC", date=date_iso, active=False, market='stocks', limit=1000)
    tickers_active = pd.DataFrame(ticker_list_iterator_active)
    tickers_delisted = pd.DataFrame(ticker_list_iterator_delisted)
    tickers_active_adr = pd.DataFrame(ticker_list_iterator_active_adr)
    tickers_delisted_adr = pd.DataFrame(ticker_list_iterator_delisted_adr)

    tickers_all = pd.concat([tickers_active, tickers_delisted, tickers_active_adr, tickers_delisted_adr])
    tickers_all.sort_values(by = "ticker", inplace=True)
    tickers_all.reset_index(inplace=True, drop=True)
    return tickers_all[['ticker', 'name', 'active', 'delisted_utc', 'last_updated_utc', 'cik', 'composite_figi', 'type']]

Then all ticker lists are downloaded and stored in the <code>raw/tickers/</code> folder. Dates for which we already have a file are skipped, so only the missing lists are downloaded.

In [3]:
###
# Get a list of what we already have
files = os.listdir(POLYGON_DATA_PATH + "raw/tickers")
available_dates = [date.fromisoformat(file.replace(".csv", "")) for file in files]

trading_dates = get_market_dates()
for day in trading_dates:
    # Only download what we do not have
    if START_DATE <= day <= END_DATE and day not in available_dates:
        tickers = download_tickers(day)
        tickers.to_csv(POLYGON_DATA_PATH + f"raw/tickers/{day.isoformat()}.csv")
        print(f"Downloaded tickers for {day.isoformat()}")

Downloaded tickers for 2023-08-29
Downloaded tickers for 2023-08-30
Downloaded tickers for 2023-08-31
Downloaded tickers for 2023-09-01


A random ticker list:

In [12]:
pd.read_csv(POLYGON_DATA_PATH + f"raw/tickers/2022-06-09.csv", index_col=0).head(3)

Unnamed: 0,ticker,name,active,delisted_utc,last_updated_utc,cik,composite_figi,type
0,A,Agilent Technologies Inc.,True,,2022-06-14T00:00:00Z,1090872.0,BBG000C2V3D6,CS
1,AA,"Alcoa, Inc.",False,2016-11-01T00:00:00Z,2016-11-01T00:00:00Z,4281.0,,CS
2,AA,ALCOA INC,False,2016-10-07T00:00:00Z,2016-10-07T00:00:00Z,4281.0,,CS


We observe that <code>last_updated_utc</code> does not match the date of the ticker list. For "A", for example, this date is *after* 2022-06-09. The value is therefore not point-in-time, which makes it useless for us. We do not need <code>delisted_utc</code> either, because we will determine the <code>end_date</code> from the ticker lists themselves. We will also determine the <code>start_date</code>, which Polygon does not provide at all.
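The observation above can be checked mechanically. The following is a minimal sketch that rebuilds the three rows shown in the output and flags every <code>last_updated_utc</code> that lies after the date of the list. The names <code>sample</code>, <code>list_date</code> and <code>updated_after_list_date</code> are only illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# The three rows from the 2022-06-09 ticker list shown above
# (only the columns needed for the check).
sample = pd.DataFrame(
    {
        "ticker": ["A", "AA", "AA"],
        "last_updated_utc": [
            "2022-06-14T00:00:00Z",
            "2016-11-01T00:00:00Z",
            "2016-10-07T00:00:00Z",
        ],
    }
)

# The date of the ticker list itself, as a timezone-aware timestamp.
list_date = pd.Timestamp("2022-06-09", tz="UTC")
last_updated = pd.to_datetime(sample["last_updated_utc"], utc=True)

# A value later than the list date could not have been known on that date,
# so such rows show that the column is not point-in-time.
sample["updated_after_list_date"] = last_updated > list_date
print(sample)
```

Only the row for "A" is flagged here, which is enough to rule the column out as a point-in-time field.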

Later, once we have data, we will add two new columns, <code>start_data</code> and <code>end_data</code>, which give the first and last dates of the available data.

# 3.2 Building the tickers loop
Now we can finally create our own ticker list, which includes all tickers. The process loops over all Polygon ticker lists and updates our own list along the way. First some notation: T is our ticker list, which we update iteratively using Polygon's ticker lists. P(i) is the Polygon ticker list from day *i*.

1. On day 1, our ticker list is the same as the one from Polygon, but with some extra columns. We create a column <code>start_date</code>, which is set to day 1, and a column <code>end_date</code>, which is empty. We are only interested in stocks that were active on that day.
2. For all *i = 2 ... n* days, for the active stocks:
    * **Delistings**: The stocks that are in T but not in P(i) are the stocks that are removed by Polygon (e.g. FB). For these tickers we set the <code>end_date</code> in T to day *i*. 
    * **New listings**: The stocks that are in P(i) but not in T are the new listings. We append each new stock to T and set its <code>start_date</code> to day *i*.
    * **Everything else**: The stocks that are both in P(i) and T are the stocks that 'continue their listings'. We do nothing.

Two tickers are the 'same' if all fields other than <code>last_updated_utc</code> and <code>delisted_utc</code> are equal.
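The update step described above can be sketched as follows. This is a minimal illustration on made-up data, not the final implementation: the tickers <code>AAA</code>, <code>OLD</code> and <code>NEW</code>, the helper <code>row_keys</code>, the function <code>update_tickers</code> and its <code>delist_date</code> argument are all hypothetical. By default the sketch follows step 2 literally and sets <code>end_date</code> to day *i*. Passing the previous trading day as <code>delist_date</code> would instead record the last day on which the ticker was still listed.

```python
import pandas as pd

# Fields that define whether two tickers are the 'same'.
KEY_COLS = ["ticker", "name", "active", "cik", "composite_figi", "type"]


def row_keys(df):
    """Build one comparable key per row (hypothetical helper).

    Values are compared as strings so that missing values match each other.
    """
    return pd.MultiIndex.from_frame(df[KEY_COLS].astype(str))


def update_tickers(ours, polygon_day, day, delist_date=None):
    """Apply one iteration: merge Polygon list P(i) into our list T."""
    delist_date = day if delist_date is None else delist_date
    ours = ours.copy()
    ours_keys = row_keys(ours)
    their_keys = row_keys(polygon_day)

    # Delistings: rows of T that are still open but no longer in P(i).
    still_open = ours["end_date"].isna().to_numpy()
    delisted = still_open & ~ours_keys.isin(their_keys)
    ours.loc[delisted, "end_date"] = delist_date

    # New listings: rows of P(i) that match no open row of T.
    is_new = ~their_keys.isin(ours_keys[still_open])
    new_rows = polygon_day.loc[is_new, KEY_COLS].copy()
    new_rows["start_date"] = day
    new_rows["end_date"] = pd.NaT

    # Everything else continues its listing and is left untouched.
    return pd.concat([ours, new_rows], ignore_index=True)


# Made-up lists for two consecutive days: OLD is renamed to NEW on day 2.
day_1 = pd.Timestamp("2022-06-08")
day_2 = pd.Timestamp("2022-06-09")
p1 = pd.DataFrame(
    [["AAA", "Alpha Corp", True, 1.0, "FIGI-AAA", "CS"],
     ["OLD", "Renamed Corp", True, 2.0, "FIGI-REN", "CS"]],
    columns=KEY_COLS,
)
p2 = pd.DataFrame(
    [["AAA", "Alpha Corp", True, 1.0, "FIGI-AAA", "CS"],
     ["NEW", "Renamed Corp", True, 2.0, "FIGI-REN", "CS"]],
    columns=KEY_COLS,
)

# Day 1: T equals P(1) plus the two extra columns.
ours = p1.copy()
ours["start_date"] = day_1
ours["end_date"] = pd.NaT

# Day 2: one update step.
ours = update_tickers(ours, p2, day_2)
print(ours[["ticker", "start_date", "end_date"]])
```

Restricting the comparison to rows without an <code>end_date</code> means that a ticker that is reused after a delisting gets a fresh row instead of reopening the old one.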

For testing, we start with 2022-06-08 and update to 2022-06-09. Both FB and META should then be included with the correct start and end dates. FB should have both its start date and its end date on 2022-06-08. META should have 2022-06-09 as its start date and an empty end date.

In [13]:
day_1 = date(2022, 6, 8)
day_2 = date(2022, 6, 9)

our_tickers = pd.read_csv(
    POLYGON_DATA_PATH + f"raw/tickers/{day_1.isoformat()}.csv",
    index_col=0,
)
our_tickers = our_tickers[["ticker", "name", "active", "cik", "composite_figi", "type"]]
our_tickers = our_tickers[our_tickers["active"] == True]
our_tickers.reset_index(inplace=True, drop=True)

our_tickers["start_date"] = day_1
our_tickers["end_date"] = pd.NaT

tickers_day_2 = pd.read_csv(
    POLYGON_DATA_PATH + f"raw/tickers/{day_2.isoformat()}.csv",
    index_col=0,
)
tickers_day_2 = tickers_day_2[["ticker", "name", "active", "cik", "composite_figi", "type"]]
tickers_day_2 = tickers_day_2[tickers_day_2["active"] == True]
tickers_day_2.reset_index(inplace=True, drop=True)

In [15]:
our_tickers.head(2)

Unnamed: 0,ticker,name,active,cik,composite_figi,type,start_date,end_date
0,A,Agilent Technologies Inc.,True,1090872.0,BBG000C2V3D6,CS,2022-06-08,NaT
1,AA,Alcoa Corporation,True,1675149.0,BBG00B3T3HD3,CS,2022-06-08,NaT


In [16]:
tickers_day_2.head(2)

Unnamed: 0,ticker,name,active,cik,composite_figi,type
0,A,Agilent Technologies Inc.,True,1090872.0,BBG000C2V3D6,CS
1,AA,Alcoa Corporation,True,1675149.0,BBG00B3T3HD3,CS


Preliminary check for duplicates

In [17]:
# Preliminary check: no duplicates.
# duplicated() never flags the first occurrence of a row, so .any() is needed
# here; .all() would only be True if every single row were flagged.
if our_tickers[["ticker", "name", "active", "cik", "composite_figi", "type"]].duplicated().any():
    raise Exception("There are duplicates!")

if tickers_day_2[["ticker", "name", "active", "cik", "composite_figi", "type"]].duplicated().any():
    raise Exception("There are duplicates!")
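If the check ever fails, it helps to see which rows are involved. The sketch below uses a small made-up frame (the tickers and the name <code>offenders</code> are hypothetical) to show how <code>duplicated</code> behaves: by default it flags only the repeats after the first occurrence, so <code>.any()</code> detects a duplicate, while <code>keep=False</code> flags every copy so the offending rows can be inspected.

```python
import pandas as pd

KEY_COLS = ["ticker", "name", "active", "cik", "composite_figi", "type"]

# Made-up list in which the AAA row appears twice.
toy = pd.DataFrame(
    [["AAA", "Alpha Corp", True, 1.0, "FIGI-AAA", "CS"],
     ["AAA", "Alpha Corp", True, 1.0, "FIGI-AAA", "CS"],
     ["BBB", "Beta Corp", True, 2.0, "FIGI-BBB", "CS"]],
    columns=KEY_COLS,
)

# Default keep="first": only the second AAA row is flagged.
flags = toy[KEY_COLS].duplicated()
has_duplicates = flags.any()

# keep=False flags every copy, which is useful for inspection.
offenders = toy[toy[KEY_COLS].duplicated(keep=False)]

print(has_duplicates)
print(offenders)
```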
