# Adjustment factors
Adjustment factors are stored in the Polygon folder as one file per ticker, for example <code>raw/adjustments/AAPL.csv</code>. If a ticker is recycled, this file holds the adjustment factors for all the companies that have used the ticker, so there is no need to use the ID here. To get the adjustments, we simply use the <code>Stock Splits</code> and <code>Dividends</code> endpoints through the SDK. These are <code>list_splits</code> and <code>list_dividends</code>.

It is virtually certain that some information is missing; just look around on Reddit for Polygon data issues. When that happens, we will fix it manually. To find the gaps, we either have to cross-check against another source (which I am not going to do) or discover them during backtesting.

The ex-dividend date is the first date on which a buyer of the stock does *not* get the dividend. An investor who held the stock before that date does. So on the ex-dividend date the stock price drops, on average, by the dividend amount. To correct for this, we either add the dividend back to all prices from the ex-dividend date onward, or subtract it from all prices before the ex-dividend date. Most platforms adjust backwards, so I will do the same. The advantage is that the most recent price in the data is unadjusted and thus equal to the actual market price.
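A minimal sketch of the backward adjustment described above, assuming daily prices in a pandas Series indexed by date. The function name `back_adjust_dividend` and the sample prices are made up for illustration, and only the simple subtractive form from the text is shown.

```python
import pandas as pd


def back_adjust_dividend(prices: pd.Series, ex_date: str, amount: float) -> pd.Series:
    """Subtract a cash dividend from all prices strictly before the ex-dividend date."""
    adjusted = prices.copy()
    before = adjusted.index < pd.Timestamp(ex_date)
    adjusted.loc[before] -= amount
    return adjusted


# Hypothetical prices: the stock drops by the 0.5 dividend on the ex-dividend date.
prices = pd.Series(
    [10.0, 10.0, 9.5, 9.5],
    index=pd.to_datetime(["2021-03-26", "2021-03-29", "2021-03-30", "2021-03-31"]),
)
adjusted = back_adjust_dividend(prices, "2021-03-30", 0.5)
print(adjusted)
```

Prices on and after the ex-dividend date are left untouched, so the last row still equals the actual market price.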

The execution date of a split is the date on which the stock has just been split, before market open. So all prices before the execution date should be adjusted. If the split is a 10-to-1 reverse split, 10 shares have become 1, so all prices before the execution date must be multiplied by 10. If the split is 1-to-5, 1 share is split into 5 pieces, so all prices before the execution date have to be divided by 5.
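The two cases above can be sketched as follows, assuming the split is expressed as a single price multiplier (10 for the 10-to-1 reverse split, 0.2 for the 1-to-5 split). The function name `back_adjust_split`, the dates and the prices are hypothetical.

```python
import pandas as pd


def back_adjust_split(prices: pd.Series, execution_date: str, amount: float) -> pd.Series:
    """Multiply all prices strictly before the execution date by the split factor."""
    adjusted = prices.copy()
    before = adjusted.index < pd.Timestamp(execution_date)
    adjusted.loc[before] *= amount
    return adjusted


dates = pd.to_datetime(["2022-04-01", "2022-04-04", "2022-04-05"])

# 10-to-1 reverse split: the price jumps from 2 to 20 on the execution date.
reverse = back_adjust_split(pd.Series([2.0, 2.0, 20.0], index=dates), "2022-04-05", 10)

# 1-to-5 split: the price falls from 100 to 20 on the execution date.
forward = back_adjust_split(pd.Series([100.0, 100.0, 20.0], index=dates), "2022-04-05", 0.2)

print(reverse)
print(forward)
```

The price on the execution date itself is not changed, because the split has already happened before that day's market open.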

The files in the <code>adjustments</code> folder have the following columns: <code>['ticker', 'date', 'type', 'subtype', 'amount']</code>. The date is the ex-dividend date or the execution date. Type is 'DIV' or 'SPLIT'. Subtype is 'CD', 'SC', 'LT' or 'ST' for dividends and 'R' (reverse) or 'N' (regular) for splits. Amount is the USD amount for dividends and the price multiplier for splits: a 10-to-1 reverse split is 10 and a 1-to-5 split is 0.2.
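As a small illustration of how the split columns are filled, the sketch below maps a split ratio to its subtype and amount. It mirrors the `split_from / split_to` convention used in the download cell of this notebook; the helper name `split_to_adjustment` is made up.

```python
def split_to_adjustment(split_from: float, split_to: float) -> tuple:
    """Return (subtype, amount) for a split in which split_from shares become split_to shares."""
    subtype = "R" if split_from > split_to else "N"  # reverse if the share count shrinks
    amount = split_from / split_to  # multiplier for prices before the execution date
    return subtype, amount


print(split_to_adjustment(10, 1))  # 10-to-1 reverse split
print(split_to_adjustment(1, 5))   # 1-to-5 regular split
```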

In [1]:
from polygon.rest import RESTClient
from datetime import datetime, date, time, timedelta
from pytz import timezone
from utils import first_trading_day_after, last_trading_day_before, get_tickers_v3
import os
import pytz
import pandas as pd
import numpy as np

In [14]:
DATA_PATH = "../../../data/polygon/"

START_DATE = first_trading_day_after(datetime(2020, 1, 1).date())
END_DATE = last_trading_day_before(datetime(2023, 8, 18).date())
# print(START_DATE)
# print(END_DATE)

CLEAN_DOWNLOAD = True # If False, only update existing data to the END_DATE. If True, make sure to remove all files in the adjustments folder before a clean download.

with open(DATA_PATH + "secret.txt") as f:
    KEY = next(f).strip()

client = RESTClient(api_key=KEY)

In [19]:
# Loop through all tickers
tickers_v3 = get_tickers_v3()

"""
To keep track of what the date range is from our adjustments folder, we save the start and end date in start_and_end_date.txt.
The first line is the start date and the second is the end date.
"""

# Use a different start date if we only want to update
if not CLEAN_DOWNLOAD:
    if os.path.isfile(DATA_PATH + "raw/adjustments/start_and_end_date.txt"):
        with open(DATA_PATH + "raw/adjustments/start_and_end_date.txt") as f:
            ORIGINAL_START_DATE = next(f).strip()
            
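            # Stored as an ISO string (YYYY-MM-DD); convert to a date, assuming first_trading_day_after expects one
            end_date_of_data = date.fromisoformat(next(f).strip())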
            START_DATE = first_trading_day_after(end_date_of_data)
    else:
        raise Exception('There is no start_and_end_date.txt file!')
        
for index, row in tickers_v3.iterrows():
    id = row["ID"]
    ticker = row["ticker"]
    # Debug filter: only TSLA is processed. Remove these two lines to loop through all tickers.
    if ticker != "TSLA":
        continue

    start_date = max(row["start_date"], START_DATE) # Trim to START_DATE if ticker existed for longer
    end_date = min(row["end_date"], END_DATE) # Trim to END_DATE. But use end date of the ticker if delisted.

    # Tickers that do not need downloading/updating. This happens when the ticker is delisted before START_DATE.
    if end_date < START_DATE:
        continue

    # Get data
    try:
        splits = pd.DataFrame(client.list_splits(ticker=ticker, execution_date_gte=start_date, execution_date_lte=end_date))
        dividends = pd.DataFrame(client.list_dividends(ticker=ticker, ex_dividend_date_gte=start_date, ex_dividend_date_lte=end_date))
    except Exception as e:
        print(repr(e))
        continue

    # Get correct format
    if not dividends.empty:
        dividends = dividends.rename(columns={'ex_dividend_date': 'date', 'dividend_type':'subtype', 'cash_amount': 'amount'})
        dividends['type'] = 'DIV'
        dividends = dividends[['ticker', 'date', 'type', 'subtype', 'amount']]

    if not splits.empty:
        splits = splits.rename(columns={'execution_date': 'date'})
        splits['type'] = 'SPLIT'
        splits['subtype'] = np.where(splits['split_from'] > splits['split_to'], 'R', 'N')
        splits['amount'] = splits['split_from'] / splits['split_to']
        splits = splits[['ticker', 'date', 'type', 'subtype', 'amount']]
    
    # Skip loop if no data
    if splits.empty and dividends.empty:
        continue

    # Merge dividends and splits
    adjustments = pd.concat([dividends, splits])
    adjustments = adjustments.sort_values(by='date').reset_index(drop=True)
    adjustments['date'] = pd.to_datetime(adjustments['date']).dt.date

    # Stop if any field is missing, so incomplete rows are never written to disk
    if adjustments.isnull().values.any():
        raise Exception(f"There are missing values for {ticker} at index {index}.")

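    # Save or update. Files are named by ticker (e.g. TSLA.csv), not by ID.
    path = DATA_PATH + f'raw/adjustments/{ticker}.csv'

    # When updating, append to the existing file. With ticker recycling, that file may also hold adjustments of earlier companies.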
    if not CLEAN_DOWNLOAD and os.path.isfile(path):
        old_adjustments = pd.read_csv(
            path,
            parse_dates=True,
        )
        all_adjustments = pd.concat([old_adjustments, adjustments])
        all_adjustments.to_csv(path, index=False)
    else:
        adjustments.to_csv(path, index=False)

# Create start_and_end_date.txt file that has the start/end dates.
with open(DATA_PATH + "raw/adjustments/start_and_end_date.txt", 'w') as f:
    if CLEAN_DOWNLOAD:
        f.write(f'{START_DATE}\n')
        f.write(f'{END_DATE}\n')
    else:
        f.write(f'{ORIGINAL_START_DATE}\n')
        f.write(f'{END_DATE}\n')

Example: MFA had a reverse split and some dividends.

In [118]:
pd.read_csv(DATA_PATH + "raw/adjustments/MFA.csv")

Unnamed: 0,ticker,date,type,subtype,amount
0,MFA,2021-03-30,DIV,CD,0.3
1,MFA,2021-06-29,DIV,CD,0.4
2,MFA,2021-09-29,DIV,CD,0.4
3,MFA,2021-12-30,DIV,CD,0.44
4,MFA,2022-03-21,DIV,CD,0.44
5,MFA,2022-04-05,SPLIT,R,4.0
6,MFA,2022-06-29,DIV,CD,0.44
7,MFA,2022-09-29,DIV,CD,0.44
8,MFA,2022-12-29,DIV,CD,0.35
9,MFA,2023-03-30,DIV,CD,0.35


In [119]:
pd.read_csv(DATA_PATH + "raw/adjustments/TSLA.csv")

Unnamed: 0,ticker,date,type,subtype,amount
0,TSLA,2022-08-25,SPLIT,N,0.333333


In [None]:
# # Code to remove the saved index column ('Unnamed: 0') from existing files
# import glob
# path = DATA_PATH + 'raw/adjustments/*.csv'
# for fname in glob.glob(path):
#     print(fname)
#     data = pd.read_csv(fname)
#     data[['ticker', 'date', 'type', 'subtype', 'amount']].to_csv(fname, index=False)

### Updates
Simply rerun the notebook with the new <code>END_DATE</code> and with <code>CLEAN_DOWNLOAD</code> set to <code>False</code>.
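An update run depends on <code>start_and_end_date.txt</code>, which has the start date on the first line and the end date on the second. Below is a minimal sketch of reading it back, assuming both dates were written in ISO format (YYYY-MM-DD). The helper name `read_date_range` and the demo dates are made up, and the demo uses a temporary folder rather than the real data folder.

```python
import tempfile
from datetime import date
from pathlib import Path


def read_date_range(path):
    """Return (start_date, end_date) from a two-line file of ISO dates."""
    with open(path) as f:
        start = date.fromisoformat(next(f).strip())
        end = date.fromisoformat(next(f).strip())
    return start, end


# Demo on a temporary file with illustrative dates.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "start_and_end_date.txt"
    demo_path.write_text("2020-01-02\n2023-08-17\n")
    start, end = read_date_range(demo_path)

print(start, end)
```

The update then starts from the first trading day after `end`, while the original start date is kept so it can be written back to the file.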