# Merging ticker changes
In section 1.7 we acquired the list of renamings. Now we will use it to merge our data. We have to be aware that a ticker can be used multiple times, so <code>ticker_changes.csv</code> may contain the same ticker several times in the 'from' and 'to' columns.

After processing the ticker changes we will create <code>tickers_v5.csv</code>, which will be our definitive ticker list. It contains a column 'tickers_old', which holds a list of (date_of_change, ticker) pairs. So if A changes to B on day 2, and B changes to C on day 5, tickers_old for C will contain [[2, A], [5, B]].
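To make the 'tickers_old' format concrete, here is a minimal sketch that answers "which ticker did this stock trade under on a given day?". The helper `ticker_on` is hypothetical (it is not part of `utils`), and it assumes that date_of_change is the first trading day under the *next* ticker, which is how the merge loop records `[start_data_to, ticker_from]`. Dates are plain integers for brevity.

```python
def ticker_on(current_ticker, tickers_old, day):
    """Return the ticker a stock traded under on `day`.

    tickers_old holds [date_of_change, old_ticker] pairs; date_of_change is the first
    day under the next ticker, so old_ticker was used strictly before that date.
    """
    # Walk the history in date order; the first change after `day` tells us the ticker in use
    for change_date, old_ticker in sorted(tickers_old, key=lambda pair: pair[0]):
        if day < change_date:
            return old_ticker
    # No change after `day`: the stock already trades under its current ticker
    return current_ticker


# A -> B on day 2, B -> C on day 5; the surviving row is C
history = [[2, "A"], [5, "B"]]
print([ticker_on("C", history, day) for day in range(1, 7)])
# prints ['A', 'B', 'B', 'B', 'C', 'C']
```

Sorting inside the helper means it does not rely on the pairs having been appended in chronological order.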

The process will be as follows:
* As long as we have ticker changes to process
    * Loop through <code>tickers_v4.csv</code>.
        * Get the next trading date after 'end_data'.
        * Search <code>ticker_changes.csv</code> for a change from this ticker on that date.
        * If there is one:
            * The stock data will be merged.
            * In <code>tickers_v4.csv</code> we will change "ticker" to the new ticker and add a list [date, ticker] to "tickers_old".
            * The other columns, such as "start_date", will be merged as well. For identifiers such as the name we keep the most recent value available. We do NOT change the ID, because we chose the ID to be never-changing.
            * The row of the old ticker will be deleted
            * **We need to restart the loop!** If we don't, the following can happen: let's assume that a ticker was renamed A -> B -> C -> D, but that the order in which these tickers appear in our ticker list is C, D, A, B. Using our loop, C gets merged with D. Then the loop checks D, which has no renamings. Then A gets merged with B. Then B gets merged with C; however, that is incorrect! B should be merged with the new D, which contains C. Any chain of two or more renamings risks being processed in the 'wrong order'. Restarting also keeps the row indices valid, since we drop a row and reset the index after every merge.
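The steps above can be sketched on toy data. This is a simplified, hypothetical pure-Python version, not the notebook's implementation: there are no parquet files, rows are dicts, and trading days are consecutive integers, so the next trading day after `end` is assumed to be `end + 1`. It restarts the scan after every merge and looks up the row to merge into by the ticker it *first* traded under, so that B can find the new D that already contains C.

```python
def first_ticker(row):
    """Ticker a row traded under on its first day: the earliest entry in its history, if any."""
    if row["tickers_old"]:
        return min(row["tickers_old"], key=lambda pair: pair[0])[1]
    return row["ticker"]


def merge_renamings(rows, changes):
    """Merge rows linked by ticker changes, restarting the scan after every merge.

    rows: list of dicts with keys "ticker", "start", "end" and "tickers_old".
    changes: dict mapping (change_date, from_ticker) -> to_ticker.
    """
    # Work on copies so the caller's rows are left untouched
    rows = [dict(r, tickers_old=list(r["tickers_old"])) for r in rows]
    while True:
        for i, row_from in enumerate(rows):
            change_date = row_from["end"] + 1  # toy calendar: next trading day
            ticker_to = changes.get((change_date, row_from["ticker"]))
            if ticker_to is None:
                continue
            # The target starts on the change date and first traded as ticker_to
            targets = [r for r in rows
                       if r["start"] == change_date and first_ticker(r) == ticker_to]
            if not targets:
                continue
            row_to = targets[0]
            # Combine both histories plus this change, kept in date order
            history = row_from["tickers_old"] + [[change_date, row_from["ticker"]]] + row_to["tickers_old"]
            row_to["tickers_old"] = sorted(history, key=lambda pair: pair[0])
            row_to["start"] = row_from["start"]
            del rows[i]
            break  # rows changed, so restart the scan from the top
        else:
            return rows  # a full pass without any merge: we are done


# A -> B -> C -> D, listed in the "wrong" order C, D, A, B
rows = [
    {"ticker": "C", "start": 6, "end": 8, "tickers_old": []},
    {"ticker": "D", "start": 9, "end": 12, "tickers_old": []},
    {"ticker": "A", "start": 1, "end": 2, "tickers_old": []},
    {"ticker": "B", "start": 3, "end": 5, "tickers_old": []},
]
changes = {(3, "A"): "B", (6, "B"): "C", (9, "C"): "D"}

merged = merge_renamings(rows, changes)
# merged holds one row: ticker D, start 1, end 12, tickers_old [[3, 'A'], [6, 'B'], [9, 'C']]
```

Feeding the same rows in the order A, B, C, D yields the same single row, which is the property the restart and the first-ticker lookup are meant to guarantee.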

Note: if a ticker A goes OTC, then comes back and changes to B, we will have two files: one for A before it went OTC, and one for A+B after it came back, named B.
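The note follows from how rows are linked: a change is only applied when its date is the first trading date after a row's 'end_data', and changes are looked up by (date, 'from' ticker), which also keeps reused tickers apart. Below is a minimal sketch of that lookup, assuming a sorted Python list of `datetime.date` trading days; the calendar and the change are made up for illustration. It uses `bisect_right` rather than `market_dates.index(...) + 1`, which also works when the given day is not itself a trading day.

```python
from bisect import bisect_right
from datetime import date


def next_trading_date(market_dates, day):
    """Return the first trading date strictly after `day`; market_dates must be sorted."""
    i = bisect_right(market_dates, day)
    if i == len(market_dates):
        raise ValueError(f"no trading date after {day}")
    return market_dates[i]


# Made-up calendar and change, for illustration only
market_dates = [date(2021, 1, 4), date(2021, 1, 5), date(2021, 1, 6),
                date(2021, 3, 1), date(2021, 3, 2), date(2021, 3, 3), date(2021, 3, 4)]
changes = {(date(2021, 3, 4), "A"): "B"}  # keyed by (change date, 'from' ticker)

# A trades until 2021-01-05, goes OTC, is relisted as A until 2021-03-03, then becomes B
for end_data in [date(2021, 1, 5), date(2021, 3, 3)]:
    change_date = next_trading_date(market_dates, end_data)
    print(end_data, "->", changes.get((change_date, "A")))
# prints:
# 2021-01-05 -> None
# 2021-03-03 -> B
```

Only the relisted A is linked to B; the pre-OTC A finds no change on its next trading date and stays a separate row.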

In [1]:
from utils import get_tickers, get_market_dates, get_ticker_changes
from datetime import datetime, date, time
import mplfinance as mpf
import pandas as pd
import numpy as np
import os
DATA_PATH = "../../../data/polygon/"
END_DATE = date(2023, 9, 1)

In [127]:
tickers_v4 = get_tickers(v=4)
market_dates = get_market_dates()
ticker_changes = get_ticker_changes()

tickers_v4.insert(loc = 2, column = 'tickers_old', value = [[] for _ in range(len(tickers_v4))])

while True:
    # tickers_v4 shrinks by one row every time a merge happens; we then restart the scan.
    for index_from, row_from in tickers_v4.copy().iterrows():
        # Get values
        type_from = row_from['type']
        if type_from == "INDEX":
            continue
        id_from = row_from['ID']
        ticker_from = row_from['ticker']
        start_date_from = row_from['start_date']
        end_date_from = row_from['end_date']
        start_data_from = row_from['start_data']
        end_data_from = row_from['end_data']
        if end_data_from == END_DATE:
            continue

        start_data_to = market_dates[market_dates.index(end_data_from) + 1]

        # Get ticker changes 
        change = ticker_changes[(ticker_changes.index == start_data_to) & (ticker_changes['from'] == ticker_from)]
        if change.empty:
            continue
        elif len(change) > 1:
            raise Exception("Duplicate!")
        ticker_to = change['to'].values[0]

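        # Find the row of the new ticker: it starts on the change date and first traded as ticker_to.
        # After earlier merges that row may already carry a newer ticker (e.g. D, which contains C),
        # so compare against the earliest ticker in its history instead of only the current one.
        first_tickers = tickers_v4.apply(
            lambda r: min(r['tickers_old'], key=lambda pair: pair[0])[1] if r['tickers_old'] else r['ticker'],
            axis=1,
        )
        row_to = tickers_v4[(tickers_v4['start_data'] == start_data_to) & (first_tickers == ticker_to)]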
        if row_to.empty:
            continue
        index_to = row_to.index[0]
        id_to = row_to['ID'].values[0]
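        # Carry over the history of the old row, add this change, and keep the pairs sorted by date
        tickers_old = row_from['tickers_old'] + [[start_data_to, ticker_from]] + row_to['tickers_old'].values[0]
        tickers_v4.at[index_to, "tickers_old"] = sorted(tickers_old, key=lambda pair: pair[0])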
        tickers_v4.loc[index_to, "start_date"] = start_date_from
        tickers_v4.loc[index_to, "start_data"] = start_data_from

        # Do the actual merging
        from_ = pd.read_parquet(DATA_PATH + f"processed/m1/{id_from}.parquet")
        to = pd.read_parquet(DATA_PATH + f"processed/m1/{id_to}.parquet")
        pd.concat([from_, to]).to_parquet(DATA_PATH + f"processed/m1/{id_to}.parquet", engine="pyarrow", compression = 'brotli')
        # os.remove(path = DATA_PATH + f"processed/m1/{id_from}.parquet") # Removal of old renamed ticker

        # Delete the old row right away; this is safe because we iterate over a copy and restart after this merge
        tickers_v4.drop(index_from, inplace=True)
        tickers_v4.reset_index(inplace=True, drop=True)
        print(f"Ticker change {ticker_from} -> {ticker_to} on {start_data_to} has been processed")
        print(f"{index_from/len(tickers_v4)*100:.1f}% | Length of tickers_v4 is {len(tickers_v4)}")
        break  # restart the scan from the top, since a row was deleted and the index was reset
    else:
        # The for-loop completed without a single merge, so every ticker change has been processed.
        break

Ticker change AACQ -> ORGN on 2021-06-25 has been processed
0.1% | Length of tickers_v4 is 7569
Ticker change AAXN -> AXON on 2021-01-26 has been processed
0.3% | Length of tickers_v4 is 7568
Ticker change ACAC -> MYPS on 2021-06-22 has been processed
0.6% | Length of tickers_v4 is 7567
Ticker change ACAM -> LOTZ on 2021-01-22 has been processed
0.7% | Length of tickers_v4 is 7566
Ticker change ACEV -> TMPO on 2022-11-23 has been processed
0.9% | Length of tickers_v4 is 7565
Ticker change ACIC -> ACHR on 2021-09-17 has been processed
1.0% | Length of tickers_v4 is 7564
Ticker change ACND -> MKTW on 2021-07-22 has been processed
1.1% | Length of tickers_v4 is 7563
Ticker change ACTC -> PTRA on 2021-06-15 has been processed
1.3% | Length of tickers_v4 is 7562
Ticker change ACTD -> OPAL on 2022-07-22 has been processed
1.3% | Length of tickers_v4 is 7561
Ticker change ADF -> HGTY on 2021-12-03 has been processed
1.5% | Length of tickers_v4 is 7560
Ticker change ADGI -> IVVD on 2022-09-13 

In [None]:
tickers_v4.to_csv("../../../data/tickers_v5.csv")
tickers_v5 = get_tickers(v=5)
tickers_v5[tickers_v5["tickers_old"].str.len() == 0].head(2)

In [115]:
tickers_v5.iloc[2000:2005]

| Unnamed: 0 | ID | ticker | tickers_old | name | active | start_date | end_date | start_data | end_data | type | cik | composite_figi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A-2019-01-01 | A | [] | Agilent Technologies Inc. | True | 2019-01-01 | 2023-09-01 | 2019-01-02 | 2023-09-01 | CS | 1090872.0 | BBG000C2V3D6 |
| 1 | AA-2019-01-01 | AA | [] | Alcoa Corporation | True | 2019-01-01 | 2023-09-01 | 2019-01-02 | 2023-09-01 | CS | 1675149.0 | BBG00B3T3HD3 |
| 2 | AAC-2021-03-25 | AAC | [] | Ares Acquisition Corporation | True | 2021-03-25 | 2023-09-01 | 2021-03-25 | 2023-09-01 | CS | 1829432.0 | |
| 3 | AACG-2019-01-01 | AACG | [] | ATA Creativity Global American Depositary Shares | True | 2019-01-01 | 2023-09-01 | 2019-10-17 | 2023-09-01 | ADRC | 1420529.0 | BBG000V2S3P6 |
| 4 | AACI-2021-11-10 | AACI | [] | Armada Acquisition Corp. I Common Stock | True | 2021-11-10 | 2023-09-01 | 2021-11-10 | 2023-09-01 | CS | 1844817.0 | BBG011XR7306 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7587 | XLP-2003-09-10 | | [] | | | | | 2019-01-02 | 2023-09-01 | | | |
| 7588 | XLRE-2015-10-08 | | [] | | | | | 2019-01-02 | 2023-09-01 | | | |
| 7589 | XLU-2003-09-10 | | [] | | | | | 2019-01-02 | 2023-09-01 | | | |
| 7590 | XLV-2003-09-10 | | [] | | | | | 2019-01-02 | 2023-09-01 | | | |
