# SEC Finance Data Analysis

Our group chose to study SEC financial filings from at least ten companies over a five-year period. For each company, we plan to extract 15 to 17 key data elements. We will design the database so that it can be expanded if necessary.

SEC Form **10-Q** is a comprehensive report of financial performance that all public companies must submit quarterly to the U.S. Securities and Exchange Commission (SEC).

SEC Form **10-K** is a comprehensive report of financial performance that a publicly traded company must file annually with the SEC.



In [426]:
import requests
import pandas as pd

In [427]:

# Base URL to retrieve company facts from SEC Data API
base_url = "https://data.sec.gov/api/xbrl/companyfacts/"

# Headers to be set to receive an appropriate response from SEC Data API
headers = {
    'User-Agent' : 'ramkumarpj@gmail.com',
    'Host' : 'data.sec.gov'
}


In [428]:
# CIK - Central Index Key - Unique key that identifies a company in SEC Database

# List of CIKs under analysis
cik_list = ['808362', '1652044', '1637459' ]

# Data elements to be explored for each CIK

data_elements = [ 'Revenues',
                      'SalesRevenueGoodsNet',
                      'SalesRevenueServicesNet',
                      'RevenueFromContractWithCustomerIncludingAssessedTax',
                      'GrossProfit',
                      'OperatingIncomeLoss',
                      'NetIncomeLoss',
                      'ResearchAndDevelopmentExpense',
                      'SellingAndMarketingExpense',
                      'ShareBasedCompensation',
                      'Depreciation',
                      'AllocatedShareBasedCompensationExpense',
                      'CostsAndExpenses',
                      'GeneralAndAdministrativeExpense',
                      'InterestExpense',
                      'LeaseAndRentalExpense',
                      'MarketingAndAdvertisingExpense',
                      'OtherAccruedLiabilitiesCurrent',
                      'EntityCommonStockSharesOutstanding',
                      'EntityPublicFloat']
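
The company facts endpoint takes the CIK zero-padded to ten digits and prefixed with `CIK`, which is why the request loop in this notebook calls `zfill(10)`. A minimal sketch of that URL construction as a standalone helper follows; the name `company_facts_url` is hypothetical and not part of the original notebook.

```python
BASE_URL = "https://data.sec.gov/api/xbrl/companyfacts/"


def company_facts_url(cik):
    """Build the company facts URL for a CIK given as str or int (hypothetical helper)."""
    # The endpoint expects exactly ten digits, so shorter CIKs are left-padded with zeros
    return f"{BASE_URL}CIK{str(cik).zfill(10)}.json"


print(company_facts_url("808362"))
# https://data.sec.gov/api/xbrl/companyfacts/CIK0000808362.json
```

Accepting both strings and integers avoids a failure if the CIK list is later loaded from a numeric column.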




In [429]:

# Function to count the 10-Q and 10-K entries for each tracked data element
# found in a dictionary of items (us-gaap or dei)
# Parameter - tenQ_tenK_list collects the counts for one CIK
# Parameter - items dictionary (us-gaap or dei)
#
# Each record appended to tenQ_tenK_list has the form:
#{
#'key'   : name of the data element
#'units' : unit the values are reported in
#'10Qs'  : number of entries filed on Form 10-Q
#'10Ks'  : number of entries filed on Form 10-K
#}

def extractData(tenQ_tenK_list, items):

    for key in items.keys():

        # Skip elements that are not under analysis
        if key not in data_elements:
            continue

        # A data element can be reported in more than one unit
        for key2 in items[key]['units'].keys():

            fin_list = items[key]['units'][key2]
            tenQCount = len([fact for fact in fin_list if fact['form'] == '10-Q'])
            tenKCount = len([fact for fact in fin_list if fact['form'] == '10-K'])

            tenQ_tenK_list.append({
                'key' : key,
                'units' : key2,
                '10Qs' : tenQCount,
                '10Ks' : tenKCount,
            })
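
To make the nested structure that `extractData` walks easier to follow, here is a small self-contained sketch run against a hand-made dictionary. The sample values are invented for illustration and are not real SEC data; only the layout (element name, then `units`, then a list of entries that each carry a `form` field) mirrors what the function above reads. The name `count_forms` is hypothetical.

```python
def count_forms(items, wanted):
    """Count 10-Q and 10-K entries per (element, unit) for the wanted elements."""
    counts = []
    for key, item in items.items():
        if key not in wanted:
            continue
        for unit, entries in item["units"].items():
            counts.append({
                "key": key,
                "units": unit,
                "10Qs": sum(1 for e in entries if e["form"] == "10-Q"),
                "10Ks": sum(1 for e in entries if e["form"] == "10-K"),
            })
    return counts


# Invented sample, shaped like response['facts']['us-gaap']
sample_items = {
    "NetIncomeLoss": {
        "units": {
            "USD": [
                {"form": "10-Q", "val": 100},
                {"form": "10-Q", "val": 120},
                {"form": "10-K", "val": 450},
                {"form": "8-K", "val": 450},  # other form types are ignored
            ]
        }
    },
    "Goodwill": {"units": {"USD": [{"form": "10-K", "val": 10}]}},
}

result = count_forms(sample_items, wanted={"NetIncomeLoss"})
print(result)
# [{'key': 'NetIncomeLoss', 'units': 'USD', '10Qs': 2, '10Ks': 1}]
```

Unlike `extractData`, this variant returns a new list and takes the wanted elements as an argument instead of reading a global, which makes it easier to test in isolation.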

In [430]:
    
# List that holds finance data for each CIK

finance_data_analysis = []

for cik in cik_list:
    
    # Create the URL to retrieve data for specific CIK
    url = base_url + f'CIK{cik.zfill(10)}.json'

    print(url)
    
    # Fetch the data from SEC Data API and stop early on an HTTP error
    http_response = requests.get(url, headers=headers)
    http_response.raise_for_status()
    response = http_response.json()

    print(f"received data for company- {response['entityName']}, cik = {response['cik']}")
    
    # Get DEI Items from response
    dei = response['facts']['dei']

    # Get US-GAAP Items from response
    us_gaap = response['facts']['us-gaap']
    
    tenQ_tenK_list = []

    extractData(tenQ_tenK_list, us_gaap)
    extractData(tenQ_tenK_list, dei)
    
    
    finance_data_analysis.append(
    {
        'cik' : response['cik'],
        'company' : response['entityName'],
        'data_elements_analysis' : tenQ_tenK_list
    })


https://data.sec.gov/api/xbrl/companyfacts/CIK0000808362.json
received data for company- Baker Hughes Holdings LLC, cik = 808362
https://data.sec.gov/api/xbrl/companyfacts/CIK0001652044.json
received data for company- Alphabet Inc., cik = 1652044
https://data.sec.gov/api/xbrl/companyfacts/CIK0001637459.json
received data for company- Kraft Heinz Co, cik = 1637459
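
Once the CIK list grows toward the planned ten companies, it may be worth pausing between requests. The sketch below assumes that SEC's fair-access guidance caps automated clients at roughly 10 requests per second; that figure should be verified against the current SEC developer documentation. The function names and the `pause_seconds` parameter are hypothetical, and the HTTP getter is passed in so the loop can be exercised without network access.

```python
import time


def fetch_company_facts(cik, get, headers):
    """Fetch one company's facts; `get` is a requests.get-compatible callable."""
    url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{str(cik).zfill(10)}.json"
    resp = get(url, headers=headers)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return resp.json()


def fetch_all(ciks, get, headers, pause_seconds=0.2):
    """Fetch facts for every CIK, sleeping between consecutive requests."""
    facts_by_cik = {}
    for n, cik in enumerate(ciks):
        if n > 0:
            time.sleep(pause_seconds)  # assumed to stay well under SEC's rate limit
        facts_by_cik[cik] = fetch_company_facts(cik, get, headers)
    return facts_by_cik
```

With the notebook's own names, a call would look like `fetch_all(cik_list, requests.get, headers)`.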


In [431]:
# Create Company DataFrame
company_df = pd.DataFrame(finance_data_analysis)
company_df = company_df.drop('data_elements_analysis', axis=1)
company_df

       cik                    company
0   808362  Baker Hughes Holdings LLC
1  1652044              Alphabet Inc.
2  1637459             Kraft Heinz Co


In [432]:
# Count the 10-Q and 10-K entries available
# for each data element, across all the CIKs

data_elements_analysis = []

for element in data_elements:
    element_analysis = {'data_element' : element}
    for fin_data in finance_data_analysis:
        cik = str(fin_data['cik'])
        for data in fin_data['data_elements_analysis']:
            if data['key'] == element:
                element_analysis[cik + '_10Q'] = data['10Qs']
                element_analysis[cik + '_10K'] = data['10Ks']
        # Default to zero when the company does not report this element
        if cik + '_10Q' not in element_analysis:
            element_analysis[cik + '_10Q'] = 0
            element_analysis[cik + '_10K'] = 0
    data_elements_analysis.append(element_analysis)
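
The nested loops above scan every record for every element, and when an element is reported in several units only the last unit's counts are kept. An alternative sketch builds a lookup keyed by CIK and element first, under the assumption that counts from different units should be added together. The function name `pivot_counts` and the sample input are hypothetical.

```python
from collections import defaultdict


def pivot_counts(finance_data, elements):
    """Return one row per element with <cik>_10Q and <cik>_10K columns."""
    # (cik, element) -> [10-Q count, 10-K count], summed over all units
    totals = defaultdict(lambda: [0, 0])
    for company in finance_data:
        cik = str(company["cik"])
        for rec in company["data_elements_analysis"]:
            totals[(cik, rec["key"])][0] += rec["10Qs"]
            totals[(cik, rec["key"])][1] += rec["10Ks"]

    rows = []
    for element in elements:
        row = {"data_element": element}
        for company in finance_data:
            cik = str(company["cik"])
            # .get does not create entries; missing pairs count as zero
            q_count, k_count = totals.get((cik, element), (0, 0))
            row[cik + "_10Q"] = q_count
            row[cik + "_10K"] = k_count
        rows.append(row)
    return rows


# Invented sample in the shape of finance_data_analysis
sample = [
    {"cik": 1, "data_elements_analysis": [
        {"key": "Revenues", "units": "USD", "10Qs": 4, "10Ks": 1},
        {"key": "Revenues", "units": "EUR", "10Qs": 2, "10Ks": 0},
    ]},
    {"cik": 2, "data_elements_analysis": []},
]
rows = pivot_counts(sample, ["Revenues", "GrossProfit"])
print(rows[0])
# {'data_element': 'Revenues', '1_10Q': 6, '1_10K': 1, '2_10Q': 0, '2_10K': 0}
```

Because every CIK column is filled explicitly, the resulting rows contain no gaps for `fillna` to patch.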
                


In [433]:
df = pd.DataFrame(data_elements_analysis)
df = df.set_index('data_element')
df = df.fillna(0)
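
Since the goal is to settle on 15 to 17 elements that can be compared across companies, one useful check is which elements have at least one 10-Q or 10-K entry for every CIK. A minimal sketch over rows shaped like `data_elements_analysis` follows; the function name `covered_by_all` is hypothetical, and the three sample rows are copied from this notebook's output table.

```python
def covered_by_all(rows, ciks):
    """Return the data elements with at least one 10-Q or 10-K entry for every CIK."""
    covered = []
    for row in rows:
        if all(row.get(f"{cik}_10Q", 0) + row.get(f"{cik}_10K", 0) > 0 for cik in ciks):
            covered.append(row["data_element"])
    return covered


sample_rows = [
    {"data_element": "Revenues",
     "808362_10Q": 0, "808362_10K": 0,
     "1652044_10Q": 42, "1652044_10K": 15,
     "1637459_10Q": 0, "1637459_10K": 0},
    {"data_element": "GrossProfit",
     "808362_10Q": 0, "808362_10K": 100,
     "1652044_10Q": 0, "1652044_10K": 0,
     "1637459_10Q": 88, "1637459_10K": 76},
    {"data_element": "NetIncomeLoss",
     "808362_10Q": 130, "808362_10K": 119,
     "1652044_10Q": 85, "1652044_10K": 27,
     "1637459_10Q": 88, "1637459_10K": 52},
]

print(covered_by_all(sample_rows, ["808362", "1652044", "1637459"]))
# ['NetIncomeLoss']
```

Elements that fail this check, such as `Revenues` here, are candidates for mapping to an equivalent tag per company before loading the database.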

In [434]:
print(company_df)
df

       cik                    company
0   808362  Baker Hughes Holdings LLC
1  1652044              Alphabet Inc.
2  1637459             Kraft Heinz Co


data_element,808362_10Q,808362_10K,1652044_10Q,1652044_10K,1637459_10Q,1637459_10K
Revenues,0,0,42,15,0,0
SalesRevenueGoodsNet,90,27,0,0,34,30
SalesRevenueServicesNet,90,27,0,0,0,0
RevenueFromContractWithCustomerIncludingAssessedTax,8,0,0,0,54,46
GrossProfit,0,100,0,0,88,76
OperatingIncomeLoss,148,42,84,27,88,36
NetIncomeLoss,130,119,85,27,88,52
ResearchAndDevelopmentExpense,88,42,84,27,0,24
SellingAndMarketingExpense,0,0,84,27,0,0
ShareBasedCompensation,32,24,58,27,54,30
