# Datasets:

## Financial Statements

The financial dataset was obtained from the SEC EDGAR Financial Statement Data Sets, which include each company's balance sheet, income statement, and statement of cash flows. The data is published quarterly and covers January 2009 through June 2024, the most recent release as of the writing of this proposal (SEC, January 2009 - June 2024). The SEC derives these data sets from eXtensible Business Reporting Language (XBRL) filings and splits them across several separate tables (SEC, 2024). To give the large language model a single set of tables, we use a helper tool (HansjoergW, 2024) to process the dataset into a single data frame. From these statements we will then calculate a comprehensive set of financial ratios, which yields a dataset similar to the one used in Kim et al. (2024).

Github Repo: https://github.com/HansjoergW/sec-fincancial-statement-data-set/tree/main

### Initial Setup

In [13]:
# Run this cell so that logging statements are shown in the Jupyter output
import logging
import pandas as pd
from secfsdstools.update import update

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Ensure that all columns are shown and that column content is not truncated
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width',1000)

# Ensure the local database is up to date with SEC releases.
# On the first run this takes a few minutes because the dataset is downloaded.
update()


2024-11-28 16:18:22,692 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


For our first milestone, we focused on obtaining information for the following companies: Apple, Johnson & Johnson, JPMorgan Chase, Exxon Mobil, Lockheed Martin, and NVIDIA. To query the database, we need to search by each company's Central Index Key (CIK). We use the following object to store the relevant information for each company.

In [14]:
from secfsdstools.c_index.companyindexreading import CompanyIndexReader

class Company:
    def __init__(self, cik):
        self.cik = cik
        self.report_reader = CompanyIndexReader.get_company_index_reader(cik=self.cik)
    
    def get_cik(self):
        return self.cik

    def get_report_reader(self):
        return self.report_reader
    
    def getAvailableReports(self):
        return list(self.report_reader.get_all_company_reports_df()['form'].unique()) 

    def getFilingList(self, reportType, startDate, endDate):
        if reportType == 'All':
            unfilteredDF = self.report_reader.get_all_company_reports_df()
        else:
            unfilteredDF = self.report_reader.get_all_company_reports_df(forms=reportType)
            
        filteredDF = unfilteredDF[(unfilteredDF.period >= startDate) & (unfilteredDF.period <= endDate)]
        return filteredDF
    

In [15]:
from secfsdstools.c_index.searching import IndexSearch

companyNames = [
    "Apple Inc",
    "Johnson & Johnson",
    "JPMorgan Chase",
    "Exxon",
    "Lockheed Martin",
    "NVIDIA CORP"
]

companyObjDict = dict()
index_search = IndexSearch.get_index_search()
for c in companyNames:
    results = index_search.find_company_by_name(c)
    print(results)
    companyObjDict[c] = Company(cik=results.iloc[0]['cik'])


2024-11-28 16:18:26,843 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:18:27,312 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:18:27,404 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:18:27,490 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


        name     cik
0  APPLE INC  320193
                name     cik
0  JOHNSON & JOHNSON  200406
                  name    cik
0  JPMORGAN CHASE & CO  19617


2024-11-28 16:18:27,606 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:18:27,710 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


               name    cik
0  EXXON MOBIL CORP  34088
                   name     cik
0  LOCKHEED MARTIN CORP  936468


2024-11-28 16:18:27,832 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


          name      cik
0  NVIDIA CORP  1045810


### Types of Reports

1. **Form 8-K**:
   - This is a current report filed whenever a reportable event occurs, generally within four business days of the event, rather than on a fixed schedule.
   - It's used for the disclosure of material events that occur between regular reporting periods (like Form 10-Q or 10-K).
   - Examples include significant transactions, changes in control, bankruptcy filings, etc.

2. **Form 10-Q**:
   - This is a quarterly report filed by public companies to disclose their financial performance and position for each of the first three fiscal quarters; the fourth quarter is covered by Form 10-K.
   - It's due 40 days after the end of the fiscal quarter for large accelerated and accelerated filers, and 45 days for all other filers.
   - It includes unaudited financial statements and management discussion and analysis.

3. **Form DEF 14A**:
   - This is used by companies that are having a shareholder meeting to solicit proxies from shareholders.
   - The form must be filed at least 21 calendar days before the date of the meeting or the adjourned date.
   - It includes information about the meeting, management's recommendations for voting on proposals, and other relevant details.

4. **Form 10-K**:
   - This is an annual report that provides a comprehensive overview of a company's business and financial condition.
   - It must be filed within 60 to 90 days after the end of the fiscal year (depending on the size of the company).
   - Form 10-K includes audited financial statements, management discussion and analysis, executive compensation, governance information, etc.

5. **Form 8-K/A**:
   - This is an amended version of Form 8-K.
   - Companies use it to correct or update previously filed material information that has changed or become inaccurate.
   - The 'A' stands for "amended." For example, if a company filed a Form 8-K stating it had acquired another company, and later realized there was an error in the purchase price disclosed, it would file an amended Form 8-K/A to correct the mistake.
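
The form names above follow a simple convention that can be handled in code. The sketch below is illustrative only: the helper names (`is_amendment`, `base_form`, `is_periodic_report`) are hypothetical and not part of any library, and treating only 10-K and 10-Q as periodic reports is an assumption made for this project.

```python
def is_amendment(form: str) -> bool:
    """Return True for amended filings such as '8-K/A' or '10-Q/A'."""
    return form.strip().upper().endswith("/A")


def base_form(form: str) -> str:
    """Strip the '/A' amendment suffix, e.g. '10-K/A' -> '10-K'."""
    form = form.strip().upper()
    return form[:-2] if form.endswith("/A") else form


# Assumption for this sketch: only these forms carry full periodic statements.
PERIODIC_FORMS = {"10-K", "10-Q"}


def is_periodic_report(form: str) -> bool:
    """True when the form, ignoring any amendment suffix, is a 10-K or 10-Q."""
    return base_form(form) in PERIODIC_FORMS


forms = ["8-K", "10-Q", "DEF 14A", "10-K", "8-K/A", "10-K/A"]
periodic = [f for f in forms if is_periodic_report(f)]
amended = [f for f in forms if is_amendment(f)]
```

A helper like this could be used to decide which filings are passed on to the statement-processing steps and which are only kept for reference.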


In [16]:
# companyObjDict["NVIDIA CORP"].getAvailableReports()
companyObjDict["NVIDIA CORP"].getFilingList(reportType=['8-K', '10-Q', 'DEF 14A', '10-K', '8-K/A'], startDate=0, endDate=20241231)

Unnamed: 0,adsh,cik,name,form,filed,period,fullPath,originFile,originFileType,url
0,0001045810-24-000262,1045810,NVIDIA CORP,8-K,20240828,20240831,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q3.zip,2024q3.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000262/0001045810-24-000262-index.htm
1,0001045810-24-000264,1045810,NVIDIA CORP,10-Q,20240828,20240731,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q3.zip,2024q3.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000264/0001045810-24-000264-index.htm
2,0001045810-24-000104,1045810,NVIDIA CORP,DEF 14A,20240514,20240630,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q2.zip,2024q2.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000104/0001045810-24-000104-index.htm
3,0001045810-24-000206,1045810,NVIDIA CORP,8-K,20240702,20240630,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q3.zip,2024q3.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000206/0001045810-24-000206-index.htm
4,0001045810-24-000144,1045810,NVIDIA CORP,8-K,20240607,20240531,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q2.zip,2024q2.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000144/0001045810-24-000144-index.htm
...,...,...,...,...,...,...,...,...,...,...
107,0001045810-10-000029,1045810,NVIDIA CORP,10-Q,20100830,20100731,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2010q3.zip,2010q3.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581010000029/0001045810-10-000029-index.htm
108,0001045810-10-000018,1045810,NVIDIA CORP,10-Q,20100521,20100430,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2010q2.zip,2010q2.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581010000018/0001045810-10-000018-index.htm
109,0001045810-10-000006,1045810,NVIDIA CORP,10-K,20100318,20100131,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2010q1.zip,2010q1.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581010000006/0001045810-10-000006-index.htm
110,0001045810-09-000036,1045810,NVIDIA CORP,10-Q,20091119,20091031,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2009q4.zip,2009q4.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581009000036/0001045810-09-000036-index.htm
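
The `url` column in the output above appears to follow a regular pattern built from the CIK and the accession number (`adsh`). The sketch below reproduces that pattern; it is inferred from the rows shown here rather than from a documented guarantee, and the function name `edgar_index_url` is hypothetical.

```python
def edgar_index_url(cik: int, adsh: str) -> str:
    """Build the filing index URL from a CIK and an accession number (adsh)."""
    # The folder name is the accession number with its dashes removed.
    folder = adsh.replace("-", "")
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{folder}/{adsh}-index.htm"
    )


# First row of the table above: NVIDIA 8-K filed on 2024-08-28.
url = edgar_index_url(1045810, "0001045810-24-000262")
```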


In [17]:
filings_10k_10Q = companyObjDict["NVIDIA CORP"].getFilingList(reportType=["10-K","10-Q"], startDate=0, endDate=20241231)

filings_10k_10Q

Unnamed: 0,adsh,cik,name,form,filed,period,fullPath,originFile,originFileType,url
0,0001045810-24-000264,1045810,NVIDIA CORP,10-Q,20240828,20240731,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q3.zip,2024q3.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000264/0001045810-24-000264-index.htm
1,0001045810-24-000124,1045810,NVIDIA CORP,10-Q,20240529,20240430,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q2.zip,2024q2.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000124/0001045810-24-000124-index.htm
2,0001045810-24-000029,1045810,NVIDIA CORP,10-K,20240221,20240131,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2024q1.zip,2024q1.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581024000029/0001045810-24-000029-index.htm
3,0001045810-23-000227,1045810,NVIDIA CORP,10-Q,20231121,20231031,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2023q4.zip,2023q4.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581023000227/0001045810-23-000227-index.htm
4,0001045810-23-000175,1045810,NVIDIA CORP,10-Q,20230828,20230731,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2023q3.zip,2023q3.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581023000175/0001045810-23-000175-index.htm
...,...,...,...,...,...,...,...,...,...,...
56,0001045810-10-000029,1045810,NVIDIA CORP,10-Q,20100830,20100731,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2010q3.zip,2010q3.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581010000029/0001045810-10-000029-index.htm
57,0001045810-10-000018,1045810,NVIDIA CORP,10-Q,20100521,20100430,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2010q2.zip,2010q2.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581010000018/0001045810-10-000018-index.htm
58,0001045810-10-000006,1045810,NVIDIA CORP,10-K,20100318,20100131,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2010q1.zip,2010q1.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581010000006/0001045810-10-000006-index.htm
59,0001045810-09-000036,1045810,NVIDIA CORP,10-Q,20091119,20091031,/Users/joseluistejada/secfsdstools/data/parquet/quarter/2009q4.zip,2009q4.zip,quarter,https://www.sec.gov/Archives/edgar/data/1045810/000104581009000036/0001045810-09-000036-index.htm


# Filtering strategy 

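Strategy: keep only rows whose `ddate` equals the filing's period end date, drop rows whose `qtrs` is not 0 (point-in-time balances) or 1 (single-quarter values), and use the first indexed value for each tag.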
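
The strategy can be sketched in plain Python on a handful of rows. This is a minimal illustration, not the notebook's implementation: the function name `filter_filing_rows` is hypothetical, it keeps `qtrs` values 0 and 1 as the filtering cell further down does, and the prior-period row is made up to exercise the filter. The other values are taken from outputs in this section.

```python
FILING_PERIOD = 20231031  # period end date of the NVIDIA 10-Q, as YYYYMMDD

rows = [
    {"tag": "AccountsPayableCurrent", "ddate": 20231031, "qtrs": 0, "value": 2.380e9},
    {"tag": "AccruedLiabilitiesCurrent", "ddate": 20231031, "qtrs": 0, "value": 5.472e9},
    {"tag": "Revenues", "ddate": 20231031, "qtrs": 1, "value": 1.8120e10},
    # Nine-month cumulative value: dropped because qtrs is neither 0 nor 1.
    {"tag": "Revenues", "ddate": 20231031, "qtrs": 3, "value": 3.8819e10},
    # Hypothetical prior-period row: dropped because ddate differs.
    {"tag": "AccountsPayableCurrent", "ddate": 20230131, "qtrs": 0, "value": 1.0e9},
]


def filter_filing_rows(rows, period, allowed_qtrs=(0, 1)):
    """Keep rows dated at the filing period; the first row per tag wins."""
    selected = {}
    for row in rows:
        if row["ddate"] != period or row["qtrs"] not in allowed_qtrs:
            continue
        # setdefault keeps the first row seen for each tag.
        selected.setdefault(row["tag"], row)
    return selected


kept = filter_filing_rows(rows, FILING_PERIOD)
```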


In [21]:
from secfsdstools.e_collector.multireportcollecting import MultiReportCollector
nvdia_Q3_2023_adsh = "0001045810-23-000227"
periodFiled = filings_10k_10Q[filings_10k_10Q.adsh == nvdia_Q3_2023_adsh]



# Load the complete NVIDIA 10-Q for the period ending 2023-10-31 by its accession number (adsh).
# A statement filter such as IS, BS, or CF could be applied to narrow the result.

collector: MultiReportCollector = MultiReportCollector.get_reports_by_adshs(
                                              adshs=[nvdia_Q3_2023_adsh])
rawdatabag = collector.collect()

# The raw data bag exposes the submission (sub_df), presentation (pre_df), and numeric (num_df) tables
# print(rawdatabag.sub_df, '\n')
dataFrame = rawdatabag.pre_df

dataFrame


2024-11-28 16:19:32,978 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:19:32,981 [INFO] parallelexecution      items to process: 1
2024-11-28 16:19:33,100 [INFO] parallelexecution      commited chunk: 0


Unnamed: 0,adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating
0,0001045810-23-000227,1,29,CP,0,H,AmendmentFlag,dei/2023,Amendment Flag,0
1,0001045810-23-000227,1,13,CP,0,H,CityAreaCode,dei/2023,City Area Code,0
2,0001045810-23-000227,1,3,CP,0,H,DocumentPeriodEndDate,dei/2023,Document Period End Date,0
3,0001045810-23-000227,1,1,CP,0,H,DocumentType,dei/2023,Document Type,0
4,0001045810-23-000227,1,9,CP,0,H,EntityAddressAddressLine1,dei/2023,"Entity Address, Address Line One",0
...,...,...,...,...,...,...,...,...,...,...
155,0001045810-23-000227,2,20,IS,0,H,WeightedAverageNumberOfSharesOutstandingDilutedDisclosureItemsAbstract,us-gaap/2023,Weighted average shares used in per share computation:,0
156,0001045810-23-000227,7,8,CF,0,H,BusinessCombinationAdvancedConsiderationWrittenOff,0001045810-23-000227,Acquisition termination cost,0
157,0001045810-23-000227,2,7,IS,0,H,BusinessCombinationAdvancedConsiderationWrittenOff,0001045810-23-000227,Acquisition termination cost,0
158,0001045810-23-000227,7,27,CF,0,H,NetProceedsPaymentsRelatedToEmployeeStockPlans,0001045810-23-000227,Proceeds related to employee stock plans,0
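
The presentation table shown above carries labels and statement positions but no values; the numeric facts live in a separate table. The sketch below shows how the two can be linked, assuming the join key is (`adsh`, `tag`, `version`). It uses plain Python on toy rows: the numeric values are copied from the filtered output in this section, while the `stmt`, `line`, and `plabel` values of the presentation row are hypothetical placeholders, and `join_num_pre` is a hypothetical helper name.

```python
ADSH = "0001045810-23-000227"

# Numeric facts (num): values copied from the filtered output in this section.
num_rows = [
    {"adsh": ADSH, "tag": "AccountsPayableCurrent", "version": "us-gaap/2023",
     "ddate": 20231031, "qtrs": 0, "value": 2.380e9},
    {"adsh": ADSH, "tag": "AccruedLiabilitiesCurrent", "version": "us-gaap/2023",
     "ddate": 20231031, "qtrs": 0, "value": 5.472e9},
]

# Presentation rows (pre): stmt, line, and plabel are hypothetical placeholders.
pre_rows = [
    {"adsh": ADSH, "tag": "AccountsPayableCurrent", "version": "us-gaap/2023",
     "stmt": "BS", "line": 1, "plabel": "Accounts payable"},
]


def join_num_pre(num_rows, pre_rows):
    """Inner-join numeric facts to presentation rows on (adsh, tag, version)."""
    index = {}
    for pre in pre_rows:
        key = (pre["adsh"], pre["tag"], pre["version"])
        index.setdefault(key, []).append(pre)
    joined = []
    for num in num_rows:
        key = (num["adsh"], num["tag"], num["version"])
        for pre in index.get(key, []):
            joined.append({**num, **pre})
    return joined


joined = join_num_pre(num_rows, pre_rows)
```

Facts without a matching presentation row, such as the second numeric row here, drop out of an inner join.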


In [16]:
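# ddate, qtrs, and value are columns of the numeric table (num_df), not of pre_df
dataFrame = rawdatabag.num_df
print(list(dataFrame.tag))
dataFrame.ddate.value_counts()
dataFrame.ddate.dtype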

['AccountsPayableCurrent', 'AccountsPayableCurrent', 'AccruedLiabilitiesCurrent', 'AccruedLiabilitiesCurrent', 'AccumulatedOtherComprehensiveIncomeLossNetOfTax', 'AccumulatedOtherComprehensiveIncomeLossNetOfTax', 'AdditionalPaidInCapital', 'AdditionalPaidInCapital', 'AdjustmentsToAdditionalPaidInCapitalSharebasedCompensationRequisiteServicePeriodRecognitionValue', 'AdjustmentsToAdditionalPaidInCapitalSharebasedCompensationRequisiteServicePeriodRecognitionValue', 'AdjustmentsToAdditionalPaidInCapitalSharebasedCompensationRequisiteServicePeriodRecognitionValue', 'AdjustmentsToAdditionalPaidInCapitalSharebasedCompensationRequisiteServicePeriodRecognitionValue', 'AllocatedShareBasedCompensationExpense', 'AllocatedShareBasedCompensationExpense', 'AllocatedShareBasedCompensationExpense', 'AllocatedShareBasedCompensationExpense', 'AntidilutiveSecuritiesExcludedFromComputationOfEarningsPerShareAmount', 'AntidilutiveSecuritiesExcludedFromComputationOfEarningsPerShareAmount', 'AntidilutiveSecuri

dtype('int64')

In [17]:
import numpy as np
# Turn the one-element period array (printed as "[20231031]") into an int64 scalar array
filingPeriod = np.asarray((str(periodFiled["period"].values)[1:-1]), dtype='int64')
filingPeriod

array(20231031)
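
Dates in this dataset are stored as integers in `YYYYMMDD` form, which compare correctly but cannot be used for date arithmetic directly. The following sketch converts between the integer form and `datetime.date`; the helper names are hypothetical, and the filed date used is the one listed for this 10-Q in the filings table above.

```python
import datetime


def period_to_date(period: int) -> datetime.date:
    """Convert an integer such as 20231031 to a date object."""
    return datetime.datetime.strptime(str(period), "%Y%m%d").date()


def date_to_period(day: datetime.date) -> int:
    """Convert a date object back to the integer YYYYMMDD form."""
    return int(day.strftime("%Y%m%d"))


period_end = period_to_date(20231031)  # period end of the 10-Q
filed = period_to_date(20231121)       # date the 10-Q was filed
days_to_file = (filed - period_end).days
```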

In [18]:
dataFrame[(dataFrame.ddate == filingPeriod) & (dataFrame.qtrs.isin([0, 1]))]
#Saved

Unnamed: 0,adsh,tag,version,coreg,ddate,qtrs,uom,value,footnote
1,0001045810-23-000227,AccountsPayableCurrent,us-gaap/2023,,20231031,0,USD,2.380000e+09,
3,0001045810-23-000227,AccruedLiabilitiesCurrent,us-gaap/2023,,20231031,0,USD,5.472000e+09,
5,0001045810-23-000227,AccumulatedOtherComprehensiveIncomeLossNetOfTax,us-gaap/2023,,20231031,0,USD,-8.800000e+07,
7,0001045810-23-000227,AdditionalPaidInCapital,us-gaap/2023,,20231031,0,USD,1.299100e+10,
10,0001045810-23-000227,AdjustmentsToAdditionalPaidInCapitalSharebasedCompensationRequisiteServicePeriodRecognitionValue,us-gaap/2023,,20231031,1,USD,9.830000e+08,
...,...,...,...,...,...,...,...,...,...
438,0001045810-23-000227,ProductWarrantyAccrualsAndReturnProvisionsCurrent,0001045810-23-000227,,20231031,0,USD,2.990000e+08,
439,0001045810-23-000227,PurchaseObligationAndOtherCommitments,0001045810-23-000227,,20231031,0,USD,4.430000e+09,
440,0001045810-23-000227,PurchaseObligationInventoryPurchaseAndSupplyAndCapacityCommitmentRemainingMinimumAmountsCommitted,0001045810-23-000227,,20231031,0,USD,1.711000e+10,
441,0001045810-23-000227,PurchaseObligationToBePaidAfterYearFour,0001045810-23-000227,,20231031,0,USD,3.540000e+08,


# Advanced Statement Standardization


In [9]:
from secfsdstools.e_collector.reportcollecting import SingleReportCollector
from secfsdstools.e_filter.rawfiltering import ReportPeriodRawFilter, StmtRawFilter
from secfsdstools.e_presenter.presenting import StandardStatementPresenter
from secfsdstools.u_usecases.bulk_loading import default_postloadfilter
from secfsdstools.e_filter.joinedfiltering import StmtJoinedFilter
from secfsdstools.f_standardize.bs_standardize import BalanceSheetStandardizer
from secfsdstools.f_standardize.is_standardize import IncomeStatementStandardizer
from secfsdstools.f_standardize.cf_standardize import CashFlowStandardizer

bs_standardizer = BalanceSheetStandardizer()
is_standardizer = IncomeStatementStandardizer()
cf_standardizer = CashFlowStandardizer()

# initialize the search class
search = IndexSearch.get_index_search()

# create a list with all known forms
forms_list = ['10-12B', '10-12G', '10-12G/A', '10-D', '10-K', '10-K/A', '10-KT', '10-KT/A', '10-Q', '10-Q/A', '10-QT', '10-QT/A', '18-K', '20-F', '20-F/A', '20FR12B', '20FR12G', '40-F', '40-F/A', '424B1', '424B2', '424B3', '424B4', '424B5', '424B7', '425', '6-K', '6-K/A', '8-K', '8-K/A', '8-K12B', '8-K12B/A', '8-K12G3', 'ARS', 'DEF 14A', 'DEF 14C', 'DEFA14A', 'DEFC14A', 'DEFM14A', 'DEFM14C', 'DEFR14A', 'F-1', 'F-1/A', 'F-3', 'F-3/A', 'F-3ASR', 'F-4', 'F-4/A', 'N-2', 'N-2/A', 'N-2ASR', 'N-2MEF', 'N-4', 'N-4/A', 'N-6/A', 'N-CSR', 'N-CSR/A', 'N-CSRS', 'N-CSRS/A', 'NT 10-Q', 'POS 8C', 'POS AM', 'POS AMI', 'POS EX', 'POSASR', 'PRE 14A', 'PREC14A', 'PREM14A', 'PRER14A', 'PRER14C', 'S-1', 'S-1/A', 'S-11', 'S-11/A', 'S-1MEF', 'S-3', 'S-3/A', 'S-3ASR', 'S-4', 'S-4/A', 'SP 15D2']
stmt_list = ['BS', 'CF', 'CI', 'CP', 'EQ', 'IS', 'SI', 'UN']


2024-10-26 15:05:08,641 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


In [10]:
adsh = "0001045810-23-000227"
reader = SingleReportCollector.get_report_by_adsh(adsh=adsh, stmt_filter=['BS', 'IS', 'CF'])
raw_data = reader.collect()
raw_data

2024-10-26 15:05:08,714 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


<secfsdstools.d_container.databagmodel.RawDataBag at 0x11eff9a90>

In [11]:
# Keep only the values that belong to the period of the report itself
filtered_data = raw_data.filter(ReportPeriodRawFilter())
raw_stmts_data = filtered_data.filter(StmtRawFilter(stmts=stmt_list))
joined_df = filtered_data.join()
report_data = joined_df.present(StandardStatementPresenter(invert_negating=True))
# Load the standardized view of BS, IS, and CF
std_joined_df = default_postloadfilter(raw_stmts_data).join()

# Standardized balance sheet
bs_joined_df = std_joined_df[StmtJoinedFilter(stmts=['BS'])]
bs_standardized = bs_joined_df.present(bs_standardizer)
# Hide the *_error helper columns for display
cols = [x for x in bs_standardized.columns.tolist() if not x.endswith('error')]
bs_standardized[cols]


2024-10-26 15:05:08,863 [INFO] standardizing  start PRE processing ...
2024-10-26 15:05:08,872 [INFO] standardizing  start MAIN processing ...
2024-10-26 15:05:08,938 [INFO] standardizing  start POST processing ...
2024-10-26 15:05:08,943 [INFO] standardizing  start FINALIZE ...


Unnamed: 0,adsh,cik,name,form,fye,fy,fp,date,filed,coreg,report,ddate,qtrs,Assets,AssetsCurrent,Cash,AssetsNoncurrent,Liabilities,LiabilitiesCurrent,LiabilitiesNoncurrent,Equity,HolderEquity,RetainedEarnings,AdditionalPaidInCapital,TreasuryStockValue,TemporaryEquity,RedeemableEquity,LiabilitiesAndEquity,AssetsCheck_cat,LiabilitiesCheck_cat,EquityCheck_cat,AssetsLiaEquCheck_cat
0,0001045810-23-000227,1045810,NVIDIA CORP,10-Q,131,2024.0,Q3,2023-10-31,20231121,,4,20231031,0,54148000000.0,32658000000.0,5519000000.0,21490000000.0,20883000000.0,9101000000.0,11782000000.0,33265000000.0,33265000000.0,20360000000.0,12991000000.0,0.0,0.0,0.0,54148000000.0,0.0,0.0,0.0,0.0
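
The standardized balance sheet can be sanity-checked against basic accounting identities. The sketch below does this with the values from the row above; it is an independent check written for illustration (the function name `balance_sheet_gaps` is hypothetical), not a description of how the `*_cat` check columns are computed. Temporary and redeemable equity are zero for this filing and are therefore left out.

```python
# Values copied from the standardized balance sheet row above (USD).
bs = {
    "Assets": 54_148_000_000,
    "AssetsCurrent": 32_658_000_000,
    "AssetsNoncurrent": 21_490_000_000,
    "Liabilities": 20_883_000_000,
    "LiabilitiesCurrent": 9_101_000_000,
    "LiabilitiesNoncurrent": 11_782_000_000,
    "Equity": 33_265_000_000,
}


def balance_sheet_gaps(bs):
    """Return the difference for each identity; 0 means the identity holds."""
    return {
        "assets_split": bs["Assets"]
        - (bs["AssetsCurrent"] + bs["AssetsNoncurrent"]),
        "liabilities_split": bs["Liabilities"]
        - (bs["LiabilitiesCurrent"] + bs["LiabilitiesNoncurrent"]),
        "assets_vs_liabilities_and_equity": bs["Assets"]
        - (bs["Liabilities"] + bs["Equity"]),
    }


gaps = balance_sheet_gaps(bs)
```

All three gaps are zero for this filing, so the standardized totals are internally consistent.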


In [12]:
# Standardized income statement
is_joined_df = std_joined_df[StmtJoinedFilter(stmts=['IS'])]
is_standardized = is_joined_df.present(is_standardizer)
cols = [x for x in is_standardized.columns.tolist() if not x.endswith('error')]
is_standardized[cols]

2024-10-26 15:05:09,002 [INFO] standardizing  start PRE processing ...
2024-10-26 15:05:09,012 [INFO] standardizing  start MAIN processing ...
2024-10-26 15:05:09,189 [INFO] standardizing  start POST processing ...
2024-10-26 15:05:09,197 [INFO] standardizing  start FINALIZE ...


Unnamed: 0,adsh,cik,name,form,fye,fy,fp,date,filed,coreg,report,ddate,qtrs,Revenues,CostOfRevenue,GrossProfit,OperatingExpenses,OperatingIncomeLoss,IncomeLossFromContinuingOperationsBeforeIncomeTaxExpenseBenefit,AllIncomeTaxExpenseBenefit,IncomeLossFromContinuingOperations,IncomeLossFromDiscontinuedOperationsNetOfTax,ProfitLoss,NetIncomeLossAttributableToNoncontrollingInterest,NetIncomeLoss,OutstandingShares,EarningsPerShare,RevCogGrossCheck_cat,GrossOpexpOpil_cat,ContIncTax_cat,ProfitLoss_cat,NetIncomeLoss_cat,EPS_cat
0,0001045810-23-000227,1045810,NVIDIA CORP,10-Q,131,2024.0,Q3,2023-10-31,20231121,,2,20231031,1,18120000000.0,4720000000.0,13400000000.0,2983000000.0,10417000000.0,10522000000.0,1279000000.0,9243000000.0,0.0,9243000000.0,0.0,9243000000.0,2468000000.0,3.75,0.0,0.0,0.0,0.0,0.0,1.0
1,0001045810-23-000227,1045810,NVIDIA CORP,10-Q,131,2024.0,Q3,2023-10-31,20231121,,2,20231031,3,38819000000.0,11309000000.0,27510000000.0,8152000000.0,19358000000.0,19712000000.0,2237000000.0,17475000000.0,0.0,17475000000.0,0.0,17475000000.0,2470000000.0,7.07,0.0,0.0,0.0,0.0,0.0,1.0
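
The proposal mentions deriving financial ratios from these statements. As a minimal sketch, the code below computes a few common ratios from the standardized balance sheet and the single-quarter (`qtrs` = 1) income statement row shown in this section. The ratio definitions are standard textbook ones and are an assumption here; they are not necessarily the set used in Kim et al. (2024). The function names are hypothetical.

```python
# Values copied from the standardized outputs in this section (USD).
balance_sheet = {
    "AssetsCurrent": 32_658_000_000,
    "LiabilitiesCurrent": 9_101_000_000,
    "Liabilities": 20_883_000_000,
    "Equity": 33_265_000_000,
}
income_statement = {  # single-quarter row (qtrs == 1)
    "Revenues": 18_120_000_000,
    "GrossProfit": 13_400_000_000,
    "OperatingIncomeLoss": 10_417_000_000,
    "NetIncomeLoss": 9_243_000_000,
}


def safe_div(numerator, denominator):
    """Divide, returning None when the denominator is missing or zero."""
    if not denominator:
        return None
    return numerator / denominator


def compute_ratios(bs, inc):
    """Compute a small set of liquidity, leverage, and margin ratios."""
    return {
        "current_ratio": safe_div(bs["AssetsCurrent"], bs["LiabilitiesCurrent"]),
        "debt_to_equity": safe_div(bs["Liabilities"], bs["Equity"]),
        "gross_margin": safe_div(inc["GrossProfit"], inc["Revenues"]),
        "operating_margin": safe_div(inc["OperatingIncomeLoss"], inc["Revenues"]),
        "net_margin": safe_div(inc["NetIncomeLoss"], inc["Revenues"]),
    }


ratios = compute_ratios(balance_sheet, income_statement)
```

Returning `None` for a zero or missing denominator keeps the ratio table well-defined for companies that do not report a given line item.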


In [13]:
cols

['adsh',
 'cik',
 'name',
 'form',
 'fye',
 'fy',
 'fp',
 'date',
 'filed',
 'coreg',
 'report',
 'ddate',
 'qtrs',
 'Revenues',
 'CostOfRevenue',
 'GrossProfit',
 'OperatingExpenses',
 'OperatingIncomeLoss',
 'IncomeLossFromContinuingOperationsBeforeIncomeTaxExpenseBenefit',
 'AllIncomeTaxExpenseBenefit',
 'IncomeLossFromContinuingOperations',
 'IncomeLossFromDiscontinuedOperationsNetOfTax',
 'ProfitLoss',
 'NetIncomeLossAttributableToNoncontrollingInterest',
 'NetIncomeLoss',
 'OutstandingShares',
 'EarningsPerShare',
 'RevCogGrossCheck_cat',
 'GrossOpexpOpil_cat',
 'ContIncTax_cat',
 'ProfitLoss_cat',
 'NetIncomeLoss_cat',
 'EPS_cat']

In [14]:
# Standardized cash flows
cf_joined_df = std_joined_df[StmtJoinedFilter(stmts=['CF'])]
cf_standardized = cf_joined_df.present(cf_standardizer)
cols = [x for x in cf_standardized.columns.tolist() if not x.endswith('error')]
cf_standardized[cols]

2024-10-26 15:05:09,362 [INFO] standardizing  start PRE processing ...
2024-10-26 15:05:09,377 [INFO] standardizing  start MAIN processing ...
2024-10-26 15:05:09,424 [INFO] standardizing  start POST processing ...
2024-10-26 15:05:09,438 [INFO] standardizing  start FINALIZE ...


Unnamed: 0,adsh,cik,name,form,fye,fy,fp,date,filed,coreg,report,ddate,qtrs,NetCashProvidedByUsedInOperatingActivitiesContinuingOperations,NetCashProvidedByUsedInFinancingActivitiesContinuingOperations,NetCashProvidedByUsedInInvestingActivitiesContinuingOperations,NetCashProvidedByUsedInOperatingActivities,NetCashProvidedByUsedInFinancingActivities,NetCashProvidedByUsedInInvestingActivities,CashProvidedByUsedInOperatingActivitiesDiscontinuedOperations,CashProvidedByUsedInInvestingActivitiesDiscontinuedOperations,CashProvidedByUsedInFinancingActivitiesDiscontinuedOperations,EffectOfExchangeRateFinal,CashPeriodIncreaseDecreaseIncludingExRateEffectFinal,CashAndCashEquivalentsEndOfPeriod,DepreciationDepletionAndAmortization,DeferredIncomeTaxExpenseBenefit,ShareBasedCompensation,IncreaseDecreaseInAccountsPayable,IncreaseDecreaseInAccruedLiabilities,InterestPaidNet,IncomeTaxesPaidNet,PaymentsToAcquirePropertyPlantAndEquipment,ProceedsFromSaleOfPropertyPlantAndEquipment,PaymentsToAcquireInvestments,ProceedsFromSaleOfInvestments,PaymentsToAcquireBusinessesNetOfCashAcquired,ProceedsFromDivestitureOfBusinessesNetOfCashDivested,PaymentsToAcquireIntangibleAssets,ProceedsFromSaleOfIntangibleAssets,ProceedsFromIssuanceOfCommonStock,ProceedsFromStockOptionsExercised,PaymentsForRepurchaseOfCommonStock,ProceedsFromIssuanceOfDebt,RepaymentsOfDebt,PaymentsOfDividends,BaseOpAct_cat,BaseFinAct_cat,BaseInvAct_cat,NetCashContOp_cat,CashEoP_cat
0,0001045810-23-000227,1045810,NVIDIA CORP,10-Q,131,2024.0,Q3,2023-10-31,20231121,,7,20231031,3,16591000000.0,-10004000000.0,-4457000000.0,16591000000.0,-10004000000.0,-4457000000.0,0.0,0.0,0.0,0.0,2130000000.0,5519000000.0,1121000000.0,-2411000000.0,2555000000.0,1250000000.0,,,4676000000.0,,,,8001000000.0,-83000000.0,,,,,,-6874000000.0,,-1250000000.0,-296000000.0,0.0,0.0,0.0,0.0,0.0
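
The cash flow row above covers three quarters (`qtrs` = 3) and can be checked against the identity that the net change in cash equals the sum of operating, investing, and financing cash flows plus the exchange-rate effect. The sketch below is an illustrative check with values copied from that row; the function name `cash_flow_gap` is hypothetical.

```python
# Values copied from the standardized cash flow row above (USD, qtrs == 3).
cf = {
    "NetCashProvidedByUsedInOperatingActivities": 16_591_000_000,
    "NetCashProvidedByUsedInInvestingActivities": -4_457_000_000,
    "NetCashProvidedByUsedInFinancingActivities": -10_004_000_000,
    "EffectOfExchangeRateFinal": 0,
    "CashPeriodIncreaseDecreaseIncludingExRateEffectFinal": 2_130_000_000,
}


def cash_flow_gap(cf):
    """Reported net change in cash minus the sum of its components."""
    components = (
        cf["NetCashProvidedByUsedInOperatingActivities"]
        + cf["NetCashProvidedByUsedInInvestingActivities"]
        + cf["NetCashProvidedByUsedInFinancingActivities"]
        + cf["EffectOfExchangeRateFinal"]
    )
    return cf["CashPeriodIncreaseDecreaseIncludingExRateEffectFinal"] - components


gap = cash_flow_gap(cf)
```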


In [1]:
import datetime
from secfsdstools.update import update
from secfsdstools.c_index.companyindexreading import CompanyIndexReader
from secfsdstools.c_index.searching import IndexSearch
from secfsdstools.e_collector.reportcollecting import SingleReportCollector

#List of All Forms
FORMS_LIST = ['10-12B', '10-12G', '10-12G/A', '10-D', '10-K', '10-K/A', '10-KT', '10-KT/A', '10-Q', '10-Q/A', '10-QT', '10-QT/A', '18-K', '20-F', '20-F/A', '20FR12B', '20FR12G', '40-F', '40-F/A', '424B1', '424B2', '424B3', '424B4', '424B5', '424B7', '425', '6-K', '6-K/A', '8-K', '8-K/A', '8-K12B', '8-K12B/A', '8-K12G3', 'ARS', 'DEF 14A', 'DEF 14C', 'DEFA14A', 'DEFC14A', 'DEFM14A', 'DEFM14C', 'DEFR14A', 'F-1', 'F-1/A', 'F-3', 'F-3/A', 'F-3ASR', 'F-4', 'F-4/A', 'N-2', 'N-2/A', 'N-2ASR', 'N-2MEF', 'N-4', 'N-4/A', 'N-6/A', 'N-CSR', 'N-CSR/A', 'N-CSRS', 'N-CSRS/A', 'NT 10-Q', 'POS 8C', 'POS AM', 'POS AMI', 'POS EX', 'POSASR', 'PRE 14A', 'PREC14A', 'PREM14A', 'PRER14A', 'PRER14C', 'S-1', 'S-1/A', 'S-11', 'S-11/A', 'S-1MEF', 'S-3', 'S-3/A', 'S-3ASR', 'S-4', 'S-4/A', 'SP 15D2']
STATEMENT_LIST = ['BS', 'CF', 'CI', 'CP', 'EQ', 'IS', 'SI', 'UN']


## Company Class: Stores information from a given CIK
class Company:
    def __init__(self, cik):
        self.cik = cik
        self.report_reader = CompanyIndexReader.get_company_index_reader(cik=self.cik)

    def get_cik(self):
        return self.cik

    def get_report_reader(self):
        return self.report_reader

    def getAvailableReports(self):
        return list(self.report_reader.get_all_company_reports_df()['form'].unique())

    def getFilingList(self, reportType, startDate, endDate):
        if reportType == 'All':
            unfilteredDF = self.report_reader.get_all_company_reports_df()
        else:
            unfilteredDF = self.report_reader.get_all_company_reports_df(forms=reportType)

        filteredDF = unfilteredDF[(unfilteredDF.period >= startDate) & (unfilteredDF.period <= endDate)]
        return filteredDF

#Downloads complete set of 10K/Q forms
if __name__ == '__main__':
    #Update DB
    print("Updating SEC DB...")
    # update()
    print("---Done.")

    #Get CIK for Each of Companies
    companyNames = [
        "Apple Inc",
        "Johnson & Johnson",
        "JPMorgan Chase",
        "Exxon",
        "Lockheed Martin",
        "NVIDIA CORP"
    ]

    #Determine Company CIK from Name
    companyObjDict = dict()
    index_search = IndexSearch.get_index_search()
    for c in companyNames:
        results = index_search.find_company_by_name(c)
        if len(results) == 1:
            print("CIK for {} : {}".format(c, results.iloc[0]['cik']))
            companyObjDict[c] = Company(cik=results.iloc[0]['cik'])
        else:
            print("-------------------------------------------------")
            print("Multiple CIK for company name {} found:".format(c))
            for index, row in results.iterrows():
                print(index, row['cik'], row['name'])
            selectedIndex = int(input("Select company index from list: "))
            companyObjDict[results.iloc[selectedIndex]['name']] = Company(cik=results.iloc[selectedIndex]['cik'])

2024-11-28 16:09:52,345 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


Updating SEC DB...
---Done.


2024-11-28 16:09:52,824 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:09:52,911 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:09:52,997 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


CIK for Apple Inc : 320193
CIK for Johnson & Johnson : 200406
CIK for JPMorgan Chase : 19617


2024-11-28 16:09:53,114 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg
2024-11-28 16:09:53,218 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


CIK for Exxon : 34088
CIK for Lockheed Martin : 936468


2024-11-28 16:09:53,340 [INFO] configmgt  reading configuration from /Users/joseluistejada/.secfsdstools.cfg


CIK for NVIDIA CORP : 1045810
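
The lookup above falls back to an interactive prompt when a name matches several companies, and it does not handle the case of no match. For unattended runs, one possible alternative is to resolve the choice in code. The sketch below is a hypothetical helper (`pick_company`) operating on plain `(cik, name)` tuples; the multi-match example names are made up.

```python
def pick_company(matches, query):
    """Pick one (cik, name) pair from search results, or raise LookupError.

    matches: list of (cik, name) tuples returned by a name search.
    query:   the name that was searched for.
    """
    if not matches:
        raise LookupError(f"No company found for {query!r}")
    if len(matches) == 1:
        return matches[0]
    # Several hits: accept only a case-insensitive exact name match.
    wanted = query.strip().upper()
    exact = [m for m in matches if m[1].strip().upper() == wanted]
    if len(exact) == 1:
        return exact[0]
    names = ", ".join(name for _, name in matches)
    raise LookupError(f"Ambiguous name {query!r}; candidates: {names}")


# Single hit, as in the output above.
single = pick_company([(1045810, "NVIDIA CORP")], "NVIDIA CORP")

# Hypothetical multi-hit case resolved by exact name.
multi = pick_company(
    [(1, "EXAMPLE CORP"), (2, "EXAMPLE CORP HOLDINGS")], "Example Corp"
)
```

Raising an error on an ambiguous name makes the problem visible instead of silently selecting the first row.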


In [12]:
# Process numerical financial information using the 10-K/10-Q filings
name = 'NVIDIA CORP'
obj = companyObjDict[name]
# Sort ascending by period so that the latest filings come last (convenient for appending to an np array)
filingList = obj.getFilingList(reportType=['10-Q','10-K'], startDate=0, endDate=int(datetime.date.today().strftime('%Y%m%d'))).sort_values('period', ascending=True)
print("Company {} has {} available 10K/Q reports, processing...".format(name, filingList.shape[0]))
# Select the filing by accession number (adsh) instead of indexing by column name
row = filingList[filingList.adsh == "0001045810-23-000227"].iloc[0]
collector: SingleReportCollector = SingleReportCollector.get_report_by_adsh(adsh=row.adsh)
rawdatabag = collector.collect()
rawdatabag