# Fundamentals
Recycled tickers: data not available.

I don't need exact point-in-time fundamental data, because that would mean updating my data at every earnings release. My main use for fundamentals is filtering and segmentation analysis, e.g. filtering profitable/non-profitable companies for a specific mean-reversion strategy, or getting the 3000 largest stocks by market cap. I am not interested in pure fundamental strategies, as their holding periods are months to years, and 2003-2024 is simply not enough data for that.

I will simply sample every quarter (after earnings releases) and use that. Instead of having one file per ticker, we only need a single fundamentals.csv file that contains everything. This works because the data is only updated quarterly, so the file stays small.

The fundamentals.csv file:
* Column 1: date (at Feb 1, May 1, Aug 1, Nov 1)
* Column 2: current ticker (we may still need to query the API with the old ticker, which luckily is available in tickers v5)
* Other columns:
    * Market-cap
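
As a minimal sketch of the sampling schedule described above, the quarterly dates can be generated with the standard library alone. The helper name `quarterly_sample_dates` is hypothetical and not part of the notebook; it only illustrates the "1st of Feb, May, Aug or Nov" constraint.

```python
from datetime import date

QUARTER_MONTHS = (2, 5, 8, 11)  # Feb, May, Aug, Nov


def quarterly_sample_dates(start: date, end: date) -> list[date]:
    """Return every Feb 1 / May 1 / Aug 1 / Nov 1 from start to end (inclusive)."""
    if start.day != 1 or start.month not in QUARTER_MONTHS:
        raise ValueError("start must be the 1st of Feb, May, Aug or Nov")

    dates = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        dates.append(date(year, month, 1))
        month += 3
        if month > 12:  # roll over into the next year
            month -= 12
            year += 1
    return dates


# Example: the first few sample dates of the data set
print(quarterly_sample_dates(date(2003, 11, 1), date(2004, 8, 1)))
```

Validating the start date up front avoids silently sampling on the wrong months if the range is ever changed.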

Using market-cap, I can create historical constituents for S&P100, S&P500, S&P1500, Russell 3000 and Russell 3000E index.
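
A rough sketch of how such constituent lists could be derived: for every sample date, rank the stocks by market cap and keep the largest N. Actual index membership also depends on rules beyond size, so this is only an approximation. The function name `top_n_by_market_cap` and the example IDs are hypothetical; the row layout assumes dictionaries with `date`, `ID` and `market_cap_M` keys.

```python
from collections import defaultdict
from datetime import date


def top_n_by_market_cap(rows, n):
    """Map each sample date to the IDs of the n largest stocks on that date."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row)

    constituents = {}
    for day, group in by_date.items():
        ranked = sorted(group, key=lambda r: r["market_cap_M"], reverse=True)
        constituents[day] = [r["ID"] for r in ranked[:n]]
    return constituents


# Hypothetical example rows (IDs and values are made up for illustration)
example_rows = [
    {"date": date(2004, 2, 1), "ID": "AAA", "market_cap_M": 500},
    {"date": date(2004, 2, 1), "ID": "BBB", "market_cap_M": 1500},
    {"date": date(2004, 2, 1), "ID": "CCC", "market_cap_M": 900},
    {"date": date(2004, 5, 1), "ID": "AAA", "market_cap_M": 2000},
    {"date": date(2004, 5, 1), "ID": "BBB", "market_cap_M": 1200},
]
print(top_n_by_market_cap(example_rows, n=2))
```

Calling this with n=100, 500, 1500 or 3000 would give size-based proxies for the respective indices.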

In [1]:
from datetime import datetime, date, time, timedelta
from times import get_market_dates, get_market_calendar, last_trading_date_before
from data import get_data
from tickers import get_tickers
from polygon.rest import RESTClient
import json
import numpy as np
import ast
import pandas as pd

DATA_PATH = "../data/polygon/"

START_DATE = date(2003, 11, 1) # MUST BE 1st of FEB, MAY, AUG or NOV
END_DATE = date(2024, 3, 1)

with open(DATA_PATH + "secret.txt") as f:
    KEY = next(f).strip()

client = RESTClient(api_key=KEY)

In [None]:
rows = [] # List of dictionaries, to eventually create a DataFrame
tickers = get_tickers(types=['CS', 'ADRC'])
for index, row in tickers.iterrows():
    id = row['ID']
    ticker = id[:-11]

    country = row['country']
    sic = row['sic']
    start_data = row['start_data']
    end_data = row['end_data']

    # Get fundamental data for each quarter
    for day in pd.date_range(START_DATE, END_DATE, freq='3MS').date:
        # Stock must be active for the day we try to query fundamentals
        if day < start_data or day > end_data:
            continue

        ticker_on_day = ticker
        ticker_changes = ast.literal_eval(row['tickers_old'])

        # 'tickers_old' maps each change date to the ticker that was used *before* that change.
        # The point-in-time ticker is therefore the old ticker of the first change strictly after `day`.
        # E.g. with {'2022-10-17': 'BTX', '2021-03-26': 'NTN'} the ticker on 2021-03-26 is BTX,
        # because on 2021-03-26 the ticker changed away from NTN.
        if ticker_changes:
            changes_after_day = [d for d in ticker_changes if date.fromisoformat(d) > day]
            if changes_after_day:
                # ISO dates sort chronologically as strings, so min() is the earliest later change
                ticker_on_day = ticker_changes[min(changes_after_day)]

        try:
            ticker_details = client.get_ticker_details(ticker=ticker_on_day, date=day)
        except Exception:
            continue

        # Skip quarters for which no (non-zero) market cap is reported
        market_cap = getattr(ticker_details, 'market_cap', None)
        if not market_cap:
            continue

        market_cap_M = int(market_cap / 1_000_000)  # Market cap in millions

        # Add data to our list
        rows.append({'date': day,
                     'ID': id,
                     'market_cap_M': market_cap_M,
                     'country': country,
                     'sic': sic})
    
    # market_cap_df = pd.DataFrame(rows)
    # market_cap_df = market_cap_df.groupby('ID').agg({'market_cap_M': 'last'})
    # print(f'{index+1} | {len(market_cap_df)}')
    print(f'{index}')

market_cap_df = pd.DataFrame(rows)
market_cap_df.to_csv(DATA_PATH + 'processed/fundamentals.csv')

In [None]:
market_cap_df = pd.DataFrame(rows)
market_cap_df.to_csv(DATA_PATH + 'processed/fundamentals.csv')
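
The point-in-time ticker lookup inside the loop is the subtle part, so here it is isolated as a small, testable sketch. It assumes, as the example in the loop's comment suggests, that the `tickers_old` dictionary maps each change date to the ticker used before that change. The function name `ticker_on` and the current ticker `XYZ` are hypothetical.

```python
from datetime import date


def ticker_on(day, current_ticker, ticker_changes):
    """Return the ticker in use on `day`.

    ticker_changes maps an ISO change date to the ticker used *before* that change
    (assumed layout). If no change happened after `day`, the current ticker applies.
    """
    later_changes = [d for d in ticker_changes if date.fromisoformat(d) > day]
    if not later_changes:
        return current_ticker
    # The earliest change after `day` tells us which ticker was still in use
    return ticker_changes[min(later_changes)]


changes = {"2022-10-17": "BTX", "2021-03-26": "NTN"}
print(ticker_on(date(2021, 3, 26), "XYZ", changes))
```

Taking the minimum of the later change dates does not rely on the order in which the dictionary was written.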

### Analysis
96.5% of stocks have market cap data and 83% have SIC codes. Let's look at the stocks that don't. (We will filter on stocks that have a history longer than 60 days, because for short listings there won't be fundamental data.)

In [None]:
tickers = get_tickers(types=['CS', 'ADRC'])
# tickers = tickers[tickers['end_data'] - tickers['start_data'] > timedelta(days=60)] 

In [None]:
fundamentals = pd.read_csv(DATA_PATH + 'processed/fundamentals.csv', index_col=0)
grouped_by_marketcap = fundamentals.groupby('ID').agg({'market_cap_M': 'last'}).dropna()
grouped_by_sic = fundamentals.groupby('ID').agg({'sic': 'last'}).dropna()

print(f'Amount of tickers: {len(tickers)}')
print(f"Amount of stocks with marketcap: {len(grouped_by_marketcap)}")
print(f"Amount of stocks with SIC code: {len(grouped_by_sic)}")
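
The counts printed by the cell above can be turned into coverage percentages and a list of missing IDs with plain set arithmetic. This is only a sketch; `coverage_report` is a hypothetical helper and the example IDs are made up.

```python
def coverage_report(all_ids, ids_with_data):
    """Return (coverage in percent, sorted list of IDs without data)."""
    all_ids = set(all_ids)
    missing = sorted(all_ids - set(ids_with_data))
    covered = len(all_ids) - len(missing)
    pct = 100 * covered / len(all_ids) if all_ids else 0.0
    return pct, missing


# Hypothetical example: four listed IDs, three of which have data
pct, missing = coverage_report(["A", "B", "C", "D"], ["A", "B", "C"])
print(f"{pct:.1f}% covered, missing: {missing}")
```

In the notebook, `all_ids` would be `tickers['ID']` and `ids_with_data` the index of `grouped_by_marketcap` or `grouped_by_sic`.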

In [None]:
# IDs that never appear in the fundamentals file with a market cap / SIC code
no_market_cap_tickers = tickers.loc[~tickers['ID'].isin(grouped_by_marketcap.index), 'ID'].tolist()
no_SIC_tickers = tickers.loc[~tickers['ID'].isin(grouped_by_sic.index), 'ID'].tolist()

In [None]:
no_market_cap = tickers[tickers['ID'].isin(no_market_cap_tickers)]\
    [['ID', 'name', 'start_data', 'end_data', 'type', 'cik', 'composite_figi']]
no_market_cap.to_csv('../output/no_marketcap.csv')

no_SIC = tickers[tickers['ID'].isin(no_SIC_tickers)]\
    [['ID', 'name', 'start_data', 'end_data', 'type', 'cik', 'composite_figi']]
no_SIC.to_csv('../output/no_SIC.csv')

**A manual look through the tickers with no SIC codes shows that almost all of them are foreign companies (but not ADRs).** It is not clear why they lack SIC codes. Also, the location is always 'us', even for foreign-headquartered corporations.

### Updates
<!-- 1. There is no need to process ticker changes because fundamentals is point-in-time. If we get fundamental data for a ticker that had a ticker change, the code automatically searches for the old ticker.
2. Same procedure to update -->

In [None]:
from polygon.rest import RESTClient

with open(DATA_PATH + "secret.txt") as f:
    KEY = next(f).strip()

client = RESTClient(api_key=KEY)

# 'AAPL' is a ticker symbol, not a CIK, so it is passed via the `ticker` parameter
data = pd.DataFrame(client.vx.list_stock_financials(ticker='AAPL', filing_date_gte=date(2014, 1, 1), filing_date_lte=date(2016, 1, 1)))