# Acquisition of CIK List

This Jupyter notebook acquires the CIK (Central Index Key) numbers of the companies currently in the S&P 500 Index. These CIKs are required for the subsequent data extraction from the SEC's EDGAR database.

## Import packages

In [None]:
import pandas as pd
import os 

## Get the link from Wikipedia and download the CIK list

The CIKs are read from Wikipedia, which maintains an up-to-date table of the S&P 500 companies, including their CIKs.

In [None]:
link = (
    "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies#S&P_500_component_stocks"
)

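try:
    # Take the first table that pandas finds on the page
    df = pd.read_html(link, header=0)[0]
except Exception as e:
    print(f"An error occurred while trying to read the HTML: {e}")
    raise  # stop here: the code below needs df

# Keep only the CIK column
cik = df['CIK']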

# Ensure the directory exists before trying to write the file
output_dir = '../../data/00_raw/'
os.makedirs(output_dir, exist_ok=True)
cik.to_csv(os.path.join(output_dir, 'cik.csv'), index=False)
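
Two properties of the downloaded column are worth keeping in mind. pandas parses the CIKs as integers, so leading zeros are lost, and a company listed with more than one share class appears in the table more than once with the same CIK. The sketch below shows one way to deduplicate the values and pad them to the 10-digit form used by EDGAR. The helper `normalize_ciks` and the sample values are hypothetical and not part of this notebook, and whether the downstream tool needs padded CIKs is an assumption.

```python
import pandas as pd

# Hypothetical sample values standing in for df['CIK']; the repeated value
# mimics a company that is listed with more than one share class.
sample = pd.Series([1652044, 1652044, 66740, 1800], name="CIK")


def normalize_ciks(ciks: pd.Series, width: int = 10) -> pd.Series:
    """Drop repeated CIKs and return them as zero-padded strings."""
    unique = ciks.drop_duplicates().reset_index(drop=True)
    return unique.astype(int).astype(str).str.zfill(width)


normalized = normalize_ciks(sample)
print(normalized.tolist())
# ['0001652044', '0000066740', '0000001800']
```

If the padded form is written to `cik.csv`, the file should later be read with `dtype=str`, otherwise pandas converts the values back to integers.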

## Next step

The file ```cik.csv``` is the input for extracting 10-K filings with the [EDGAR Crawler](https://github.com/nlpaueb/edgar-crawler). The 10-K filings are saved in the data folder and are then used as input for the subsequent notebook ```01_extract_sentences.ipynb```.
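
Before handing the file to the next step, it can help to read it back and confirm that it contains what is expected. The following is a minimal sketch of such a check. The function `load_ciks` is a hypothetical helper, and the demonstration uses a temporary file with made-up sample values rather than the real `cik.csv`.

```python
import os
import tempfile

import pandas as pd


def load_ciks(path: str) -> list:
    """Read a one-column CIK file and return the values as integers."""
    ciks = pd.read_csv(path)["CIK"]
    if ciks.isna().any():
        raise ValueError("The CIK file contains missing values")
    return ciks.astype(int).tolist()


# Demonstration on a temporary file with hypothetical sample values,
# written with the same to_csv call that this notebook uses.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cik.csv")
    pd.Series([66740, 1800], name="CIK").to_csv(demo_path, index=False)
    loaded = load_ciks(demo_path)

print(loaded)
# [66740, 1800]
```

For the real file, the call would be `load_ciks('../../data/00_raw/cik.csv')`, assuming the notebook is run from its original location.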