Load the SEC API key from a local key file. The file is excluded from the git repo, so the key stays on my machine.

In [3]:
# read the api key; strip the trailing newline so it is not sent with requests
with open("sec_api.key", "r") as f:
    api_key = f.readline().strip()

Use the sec-api library's QueryApi to search SEC filings filed between January 1, 2023 and December 31, 2023. The query returns the 20 most recent 10-K filings and excludes NT 10-K and 10-K/A filings.

In [2]:
from sec_api import QueryApi

query_api = QueryApi(api_key=api_key)

query = {
  "query": """filedAt:[2023-01-01 TO 2023-12-31] \
    AND formType:(\"10-K\") \
    AND NOT formType:(\"NT\", \"10-K/A\")
  """,

  "from": "0",
  "size": "20",
  "sort": [{ "filedAt": { "order": "desc" } }]
}

response = query_api.get_filings(query)
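
The query above returns a single page of 20 results. A minimal sketch of how the same payload could be generated for several pages, assuming "from" is an offset into the result set. The helper name build_query is hypothetical (it is not part of sec-api), and the query string simply mirrors the syntax used in the cell above.

```python
def build_query(start: str, end: str, form_type: str = "10-K",
                exclude: tuple = ("NT", "10-K/A"),
                offset: int = 0, size: int = 20) -> dict:
    """Assemble a QueryApi payload (hypothetical helper, not part of sec-api)."""
    # Quote each excluded form type, as in the hand-written query.
    excluded = ", ".join(f'"{name}"' for name in exclude)
    query_string = (
        f"filedAt:[{start} TO {end}] "
        f'AND formType:("{form_type}") '
        f"AND NOT formType:({excluded})"
    )
    return {
        "query": query_string,
        "from": str(offset),   # assumed to be the offset of the first result
        "size": str(size),     # number of results per page
        "sort": [{"filedAt": {"order": "desc"}}],
    }


# Payloads for the first two pages of 20 results each.
pages = [build_query("2023-01-01", "2023-12-31", offset=o) for o in (0, 20)]
print(pages[0]["query"])
```

Each payload could then be passed to query_api.get_filings, one page at a time.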

In [4]:
import pandas as pd
metadata = pd.DataFrame.from_records(response['filings'])
print('number of filings:', len(metadata))
print('counts of each filing type:', metadata['formType'].value_counts())
print()

df_10k = metadata.loc[(metadata['formType'] == '10-K')]
print('number of 10-K filings:', len(df_10k))

# df_10k.to_csv('10-K.csv', index=False)

display(df_10k)

res = df_10k.to_json()
print(res)

number of filings: 20
counts of each filing type: formType
10-K    20
Name: count, dtype: int64

number of 10-K filings: 20


Unnamed: 0,id,accessionNo,cik,ticker,companyName,companyNameLong,formType,description,filedAt,linkToTxt,linkToHtml,linkToXbrl,linkToFilingDetails,entities,documentFormatFiles,dataFiles,seriesAndClassesContractsInformation,periodOfReport
0,ac1c82ba56935d6024a57453939051ce,0000090168-23-000083,90168,SIF,SIFCO INDUSTRIES INC,SIFCO INDUSTRIES INC (Filer),10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T17:53:23-05:00,https://www.sec.gov/Archives/edgar/data/90168/...,https://www.sec.gov/Archives/edgar/data/90168/...,,https://www.sec.gov/Archives/edgar/data/90168/...,[{'companyName': 'SIFCO INDUSTRIES INC (Filer)...,"[{'sequence': '1', 'description': '10-K 09.30....","[{'sequence': '11', 'description': 'XBRL TAXON...",[],2023-09-30
1,f9fd5115ba2dd2655322b3089229ca68,0001558370-23-020031,1967306,MSBB,"Mercer Bancorp, Inc.","Mercer Bancorp, Inc. (Filer)",10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T17:14:34-05:00,https://www.sec.gov/Archives/edgar/data/196730...,https://www.sec.gov/Archives/edgar/data/196730...,,https://www.sec.gov/Archives/edgar/data/196730...,"[{'companyName': 'Mercer Bancorp, Inc. (Filer)...","[{'sequence': '1', 'description': '10-K', 'doc...","[{'sequence': '11', 'description': 'EX-101.SCH...",[],2023-09-30
2,f15f12e631ec88c7aae95205fd1e48b5,0001599916-23-000295,1787412,WBBA,"WB Burgers Asia, Inc.","WB Burgers Asia, Inc. (Filer)",10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T17:09:26-05:00,https://www.sec.gov/Archives/edgar/data/178741...,https://www.sec.gov/Archives/edgar/data/178741...,,https://www.sec.gov/Archives/edgar/data/178741...,"[{'fiscalYearEnd': '0731', 'stateOfIncorporati...","[{'sequence': '1', 'size': '818616', 'document...","[{'sequence': '7', 'size': '27886', 'documentU...",[],2023-07-31
3,0b50523519b16d23b406384674b1d357,0001174947-23-001489,12040,BDL,FLANIGANS ENTERPRISES INC,FLANIGANS ENTERPRISES INC (Filer),10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T16:51:14-05:00,https://www.sec.gov/Archives/edgar/data/12040/...,https://www.sec.gov/Archives/edgar/data/12040/...,,https://www.sec.gov/Archives/edgar/data/12040/...,[{'companyName': 'FLANIGANS ENTERPRISES INC (F...,"[{'sequence': '1', 'description': '10-K', 'doc...","[{'sequence': '6', 'description': 'XBRL SCHEMA...",[],2023-09-30
4,6433bcdfc18c7a6c96ba82f4c99a2d74,0001929589-23-000010,1929589,MRDB,MariaDB plc,MariaDB plc (Filer),10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T16:46:24-05:00,https://www.sec.gov/Archives/edgar/data/192958...,https://www.sec.gov/Archives/edgar/data/192958...,,https://www.sec.gov/Archives/edgar/data/192958...,"[{'companyName': 'MariaDB plc (Filer)', 'cik':...","[{'sequence': '1', 'description': '10-K', 'doc...","[{'sequence': '10', 'description': 'XBRL TAXON...",[],2023-09-30
5,7673af245cef97a9da0bf04d5c21e189,0001359687-23-000028,1359687,REGX,"RED TRAIL ENERGY, LLC","RED TRAIL ENERGY, LLC (Filer)",10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T16:38:07-05:00,https://www.sec.gov/Archives/edgar/data/135968...,https://www.sec.gov/Archives/edgar/data/135968...,,https://www.sec.gov/Archives/edgar/data/135968...,"[{'fiscalYearEnd': '0930', 'stateOfIncorporati...","[{'sequence': '1', 'size': '1275617', 'documen...","[{'sequence': '6', 'size': '34294', 'documentU...",[],2023-09-30
6,ce0958b7418a37399cd03c1a0b93a9f9,0001213900-23-099889,1506251,CTXR,"Citius Pharmaceuticals, Inc.","Citius Pharmaceuticals, Inc. (Filer)",10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T16:05:44-05:00,https://www.sec.gov/Archives/edgar/data/150625...,https://www.sec.gov/Archives/edgar/data/150625...,,https://www.sec.gov/Archives/edgar/data/150625...,"[{'companyName': 'Citius Pharmaceuticals, Inc....","[{'sequence': '1', 'description': 'ANNUAL REPO...","[{'sequence': '9', 'description': 'XBRL SCHEMA...",[],2023-09-30
7,de95dcb9693157dcc42b815335f5f394,0000072633-23-000016,72633,NRT,NORTH EUROPEAN OIL ROYALTY TRUST,NORTH EUROPEAN OIL ROYALTY TRUST (Filer),10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T10:47:42-05:00,https://www.sec.gov/Archives/edgar/data/72633/...,https://www.sec.gov/Archives/edgar/data/72633/...,,https://www.sec.gov/Archives/edgar/data/72633/...,[{'companyName': 'NORTH EUROPEAN OIL ROYALTY T...,"[{'sequence': '1', 'documentUrl': 'https://www...",[],[],2023-10-31
8,aed08d80a232f8798a065dd7b2926bb1,0001967097-23-000004,1967097,,"Atmos Energy Kansas Securitization I, LLC","Atmos Energy Kansas Securitization I, LLC (Filer)",10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-29T09:07:12-05:00,https://www.sec.gov/Archives/edgar/data/196709...,https://www.sec.gov/Archives/edgar/data/196709...,,https://www.sec.gov/Archives/edgar/data/196709...,[{'companyName': 'Atmos Energy Kansas Securiti...,"[{'sequence': '1', 'description': '10-K', 'doc...","[{'sequence': '8', 'description': 'XBRL TAXONO...",[],2023-09-30
9,72f26f1f67ca7584e42365e9b152ab55,0001493152-23-046428,932021,GTLL,GLOBAL TECHNOLOGIES LTD,GLOBAL TECHNOLOGIES LTD (Filer),10-K,Form 10-K - Annual report [Section 13 and 15(d...,2023-12-28T17:55:31-05:00,https://www.sec.gov/Archives/edgar/data/932021...,https://www.sec.gov/Archives/edgar/data/932021...,,https://www.sec.gov/Archives/edgar/data/932021...,[{'companyName': 'GLOBAL TECHNOLOGIES LTD (Fil...,"[{'sequence': '1', 'documentUrl': 'https://www...","[{'sequence': '16', 'description': 'XBRL SCHEM...",[],2023-06-30


{"id":{"0":"ac1c82ba56935d6024a57453939051ce","1":"f9fd5115ba2dd2655322b3089229ca68","2":"f15f12e631ec88c7aae95205fd1e48b5","3":"0b50523519b16d23b406384674b1d357","4":"6433bcdfc18c7a6c96ba82f4c99a2d74","5":"7673af245cef97a9da0bf04d5c21e189","6":"ce0958b7418a37399cd03c1a0b93a9f9","7":"de95dcb9693157dcc42b815335f5f394","8":"aed08d80a232f8798a065dd7b2926bb1","9":"72f26f1f67ca7584e42365e9b152ab55","10":"25c62cc8276f9a0ed30f459366884a73","11":"4a8d010ac372f4c4e5d410d2e6a769e1","12":"be9180d914da0f7ff31d95a15a4e865d","13":"dde3660bffc6993b5428762b9c2a83b2","14":"a76e6c2d1847d358d2689e327c46569f","15":"00a7aa5ff5c04af6c1395e6df1f62efb","16":"29e0e10a4e36eafaa60316c4a05674aa","17":"9a6c98109ca876c4a11a2892379cdb05","18":"85076eea2a50ee560efb8a6d9d2fc711","19":"952692190c4c124bc2a479492be8f361"},"accessionNo":{"0":"0000090168-23-000083","1":"0001558370-23-020031","2":"0001599916-23-000295","3":"0001174947-23-001489","4":"0001929589-23-000010","5":"0001359687-23-000028","6":"0001213900-23-099889

In [24]:
print(df_10k['linkToHtml'][0])
print(df_10k['linkToTxt'][0])
print(df_10k['linkToXbrl'][0])
print(df_10k['linkToFilingDetails'][0])

https://www.sec.gov/Archives/edgar/data/90168/000009016823000083/0000090168-23-000083-index.htm
https://www.sec.gov/Archives/edgar/data/90168/000009016823000083/0000090168-23-000083.txt

https://www.sec.gov/Archives/edgar/data/90168/000009016823000083/sif-20230930.htm
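
The index and .txt links printed above follow a visible pattern: the CIK, then the accession number without dashes as a folder, then the accession number with dashes as the file stem. A small sketch that rebuilds both links from the cik and accessionNo columns, assuming the pattern seen in this one filing holds for others. The function name edgar_urls is mine, and the main document link (sif-20230930.htm) cannot be derived this way.

```python
def edgar_urls(cik, accession_no: str) -> dict:
    """Rebuild the index and full-text URLs for a filing (illustrative helper)."""
    # The folder name is the accession number with the dashes removed.
    folder = accession_no.replace("-", "")
    base = f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}"
    return {
        "index": f"{base}/{accession_no}-index.htm",
        "txt": f"{base}/{accession_no}.txt",
    }


# Values taken from the first row of the metadata table (SIFCO INDUSTRIES INC).
urls = edgar_urls(90168, "0000090168-23-000083")
print(urls["index"])
print(urls["txt"])
```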


Build a list with one entry per filing. Each entry holds the company name, the filing date, and the extracted text of Items 1A, 7, and 7A.

In [4]:
from sec_api import ExtractorApi
extractor_api = ExtractorApi(api_key=api_key)


texts = []
for idx in df_10k.index:
    url = df_10k['linkToFilingDetails'][idx]

    item_1a = extractor_api.get_section(url, "1A", "text")
    item_7 = extractor_api.get_section(url, "7", "text")
    item_7a = extractor_api.get_section(url, "7A", "text")
    texts.append({
        'companyName': df_10k['companyName'][idx],
        'filedAt': df_10k['filedAt'][idx],
        'item_1a': item_1a,
        'item_7': item_7,
        'item_7a': item_7a
    })

In [9]:
# save the extracted data to disk as .json to reduce API usage
import json
import os

os.makedirs('./raw_data', exist_ok=True)  # create the output folder if it is missing
for chunk in texts:
    date = chunk['filedAt'].split('T')[0]
    name = './raw_data/{0}----{1}.json'.format(chunk['companyName'], date)
    with open(name, 'w') as outfile:
        json.dump(chunk, outfile)
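
Saving to disk only reduces API usage if later runs read the files back instead of calling the extractor again. A minimal sketch of that step, assuming the files were written to ./raw_data as above, one JSON object per file. The function name load_cached_sections is hypothetical.

```python
import json
from pathlib import Path


def load_cached_sections(directory: str = "./raw_data") -> list:
    """Read every .json file in the directory back into a list of dicts."""
    records = []
    # Sort the paths so the load order is stable between runs.
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, "r", encoding="utf-8") as infile:
            records.append(json.load(infile))
    return records
```

Calling texts = load_cached_sections() would then stand in for the extraction loop on later runs.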

In [44]:
from bs4 import BeautifulSoup

# translate special HTML characters
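# name the parser explicitly so the result does not depend on which parsers are installed
txt1a = texts[0]['item_1a']
decoded1 = BeautifulSoup(txt1a, 'html.parser')
txt7 = texts[0]['item_7']
decoded2 = BeautifulSoup(txt7, 'html.parser').contents[0]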
# txt7a = texts[0]['item_7a']
# decoded3 = BeautifulSoup(txt7a).contents[0]
print(decoded2)


 Item 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations 

SIFCO is engaged in the production of forgings and machined and sub-assembled components primarily for the Aerospace and Defense, Energy and Commercial Space markets. The processes and services include forging, heat-treating, chemical processing and machining. The Company operates under one business segment. 

When planning and evaluating its business operations, the Company takes into consideration certain factors, including the following: (i) the projected build rate for commercial, business and military aircraft, as well as the engines that power such aircraft; (ii) the projected build rate for industrial steam and gas turbine engines; and (iii) the projected maintenance, repair and overhaul schedules for commercial, business and military aircraft, as well as the engines that power such aircraft. 

The Company operates within a cost structure that includes a significant fixed component. 
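
If the only goal is to translate HTML character references, the standard library can do it without building a parse tree. A small sketch using html.unescape. The sample string is made up for illustration and is not taken from the extractor output.

```python
import html

# Illustrative input containing a numeric and a named character reference.
raw = "Management&#8217;s Discussion &amp; Analysis"

# html.unescape converts character references to the characters they stand for.
decoded = html.unescape(raw)
print(decoded)
```

Unlike BeautifulSoup, this leaves any tags in the text untouched, so it only fits input that is already plain text.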

In [31]:
print(len(decoded2.contents))

1


In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
# split_text expects a plain string, so convert the decoded text explicitly
splits = text_splitter.split_text(str(decoded2))
for x in splits:
    print(x)
    print()
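
To make chunk_size and chunk_overlap concrete, here is a plain-Python sliding window with the same two parameters. This is a deliberate simplification and not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences). The function name sliding_chunks is mine.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 2000-character sample: windows start at 0, 800 and 1600.
sample = "".join(str(i % 10) for i in range(2000))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

With these settings each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk most of the time.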

In [50]:
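def extract_table(in_string: str):
    """Split the first ##TABLE_START ... ##TABLE_END block out of in_string.

    Returns [table_string, cut_string], or None if no complete table is found.
    """
    start_marker, end_marker = '##TABLE_START', '##TABLE_END'
    start_idx = in_string.find(start_marker)
    if start_idx == -1:
        return None

    # look for the end marker only after the start marker
    end_idx = in_string.find(end_marker, start_idx + len(start_marker))
    if end_idx == -1:
        return None

    table_string = in_string[start_idx + len(start_marker):end_idx]
    # replace the whole marked block (markers included) with a placeholder
    cut_string = in_string[:start_idx] + 'NOTE: ' + in_string[end_idx + len(end_marker):]
    return [table_string, cut_string]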

out = extract_table(decoded2)
print(out[0])
print('==========================================')
print(out[1])

 (Dollars in millions) Years Ended September 30, Year Over Year Increase 

(Decrease) 

Net Sales 2023 2022 Aerospace components for: Fixed wing aircraft $ 40.1 $ 39.5 $ 0.6 Rotorcraft 16.4 15.6 0.8 Energy components for power generation units 23.0 17.4 5.6 Commercial product and other revenue 7.5 11.4 (3.9) Total $ 87.0 $ 83.9 $ 3.1 
 Item 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations 

SIFCO is engaged in the production of forgings and machined and sub-assembled components primarily for the Aerospace and Defense, Energy and Commercial Space markets. The processes and services include forging, heat-treating, chemical processing and machining. The Company operates under one business segment. 

When planning and evaluating its business operations, the Company takes into consideration certain factors, including the following: (i) the projected build rate for commercial, business and military aircraft, as well as the engines that power such aircr
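
extract_table handles only the first table in a section, and Item 7 will often contain more than one. A sketch of a variant that pulls out every marked block in one pass with a regular expression, assuming tables are never nested. The function name extract_all_tables and the sample string are illustrative.

```python
import re

# Non-greedy match so each start marker pairs with the nearest end marker.
TABLE_PATTERN = re.compile(r"##TABLE_START(.*?)##TABLE_END", re.DOTALL)


def extract_all_tables(in_string: str, placeholder: str = "NOTE: "):
    """Return (list of table bodies, text with each table replaced by placeholder)."""
    tables = TABLE_PATTERN.findall(in_string)
    # A function replacement keeps the placeholder literal, even with backslashes.
    cut_string = TABLE_PATTERN.sub(lambda _match: placeholder, in_string)
    return tables, cut_string


sample = "Intro ##TABLE_START a 1 b 2 ##TABLE_END middle ##TABLE_START c 3 ##TABLE_END end"
tables, rest = extract_all_tables(sample)
print(len(tables))
print(rest)
```

When no complete table is present, tables is an empty list and the text comes back unchanged, so the caller does not need a None check.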