# EDGAR API

## Description 

The code below queries the EDGAR API and gathers up to 20 years (2000-2019) of annual balance sheet, income statement, and cash flow data for companies listed in the Wilshire 5000 Index. <br>

## Data Gathering and Prep

In [38]:
# Import packages
import pandas as pd
import requests
# Project helper functions (log_progress, used below, is assumed to be defined here)
from ipynb.fs.full.Preprocessing_Functions import *

In [67]:
# Import Wilshire 5000 Index holdings data frame
wil_df = pd.read_csv("Data/Wilshire_5000_All_Holdings.csv")
# Keep only the ticker and company name columns
wil_df = wil_df[["Ticker","Name"]]

# List of company ticker symbols
ticker = wil_df['Ticker']
# list of company names
name = wil_df['Name']

# Check data Summary
print('\n')
wil_df.info()
wil_df.head()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3221 entries, 0 to 3220
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Ticker  3221 non-null   object
 1   Name    3221 non-null   object
dtypes: object(2)
memory usage: 50.5+ KB


|   | Ticker | Name |
|---|--------|------|
| 0 | A | Agilent Technologies Inc. |
| 1 | AA | Alcoa Corp |
| 2 | AAL | American Airlines Group Inc |
| 3 | AAME | Atlantic American Corp. |
| 4 | AAN | Aarons Company Inc (The) |


The Wilshire 5000 holdings file is imported and reduced to a data frame of 3,221 stock tickers and their company names.<br>

### API Check Functions

In [35]:
# EDGAR API key and guide
# key : fb1243ed5400fc7c2a39dba8a1df73d6 , User Guide : https://developer.edgar-online.com/live
appkey = 'fb1243ed5400fc7c2a39dba8a1df73d6'

# Function to check whether the EDGAR API returns BalanceSheetConsolidated data for a ticker
# (an empty list means the ticker is not searchable)
def check_company(stock):
    
    # API url
    url = "https://datafied.api.edgar-online.com/v2/corefinancials/ann"
    # Query parameters
    query = {"appkey": appkey,
             "fields": "BalanceSheetConsolidated",
             "primarysymbols": stock,
             "numperiods": "20",}

    response = requests.get(url, query)
    data = response.json()
    data = data["result"]
    data = data["rows"]
    data = [x["values"] for x in data]
    
    return data

In [41]:
# Function to check whether the EDGAR API returns IncomeStatementConsolidated data for a ticker
# (reuses the appkey defined in the balance sheet check cell)
def check_company_in(stock):
    
    # API url
    url = "https://datafied.api.edgar-online.com/v2/corefinancials/ann"
    # Query parameters
    query = {"appkey": appkey,
             "fields": "IncomeStatementConsolidated",
             "primarysymbols": stock,
             "numperiods": "20",}

    response = requests.get(url, query)
    data = response.json()
    data = data["result"]
    data = data["rows"]
    data = [x["values"] for x in data]
    
    return data

In [6]:
# Function to check whether the EDGAR API returns CashFlowStatementConsolidated data for a ticker
def check_company_cf(stock):
    
    # API url
    url = "https://datafied.api.edgar-online.com/v2/corefinancials/ann"
    # Query parameters
    query = {"appkey": appkey,
             "fields": "CashFlowStatementConsolidated",
             "primarysymbols": stock,
             "numperiods": "20",}

    response = requests.get(url, query)
    data = response.json()
    data = data["result"]
    data = data["rows"]
    data = [x["values"] for x in data]
    
    return data

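These functions request up to 20 annual periods of one statement (balance sheet, income statement, or cash flow statement) for a given ticker and return the list of value rows. An empty list means the EDGAR API has no data for that ticker. <br>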
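The three check functions differ only in the `fields` value they send. As a minimal sketch, they could be collapsed into one parameterized function. The name `check_statement` and its `get` parameter are hypothetical, introduced here so the HTTP call can be swapped for a stub; the response shape (`result` → `rows` → `values`) is taken from the functions above, not from the API documentation.

```python
# Endpoint used by the original check functions
URL = "https://datafied.api.edgar-online.com/v2/corefinancials/ann"

def check_statement(stock, field, appkey, get=None):
    """Return the 'values' rows for one statement type (hypothetical helper).

    field is e.g. "BalanceSheetConsolidated", "IncomeStatementConsolidated",
    or "CashFlowStatementConsolidated". An empty list means no data was returned.
    get defaults to requests.get and can be replaced by a stub for testing.
    """
    if get is None:
        import requests  # imported lazily so a stub can be used without requests installed
        get = requests.get
    query = {"appkey": appkey,
             "fields": field,
             "primarysymbols": stock,
             "numperiods": "20"}
    response = get(URL, params=query)
    rows = response.json()["result"]["rows"]
    return [row["values"] for row in rows]
```

Under these assumptions, `check_statement(ticker, "IncomeStatementConsolidated", appkey)` should behave like `check_company_in(ticker)`.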

### API Check Loops

In [7]:
# List of all 3,221 Wilshire 5000 tickers to check
tickers = wil_df['Ticker'].tolist()

# List of tickers that are compatible with EDGAR API queries
searchable_tickers = []
# List of tickers that are not compatible with EDGAR API queries
non_searchable_tickers = []

# Sort each ticker into one of the lists above based on whether the balance sheet check returns any rows
for ticker in log_progress(tickers):
    result = check_company(ticker)
    if len(result) == 0:
        non_searchable_tickers.append(ticker)
        
    else:
        searchable_tickers.append(ticker)


In [8]:
# Save search results as data frames
searchable_tickers_data = pd.DataFrame(searchable_tickers, columns = ['tickers'])
searchable_tickers_data.to_csv('Data/Data/searchable_tickers_data.csv')
non_searchable_tickers_data = pd.DataFrame(non_searchable_tickers, columns = ['tickers'])
non_searchable_tickers_data.to_csv('Data/Data/non_searchable_tickers_data.csv')

In [42]:
# List of all 3,221 Wilshire 5000 tickers to check
tickers = wil_df['Ticker'].tolist()

# List of tickers that are compatible with EDGAR API queries
searchable_tickers_in = []
# List of tickers that are not compatible with EDGAR API queries
non_searchable_tickers_in = []

# Sort each ticker into one of the lists above based on whether the income statement check returns any rows
for ticker in log_progress(tickers):
    result = check_company_in(ticker)
    if len(result) == 0:
        non_searchable_tickers_in.append(ticker)
        
    else:
        searchable_tickers_in.append(ticker)


In [43]:
# Save search results as data frames
searchable_tickers_in_data = pd.DataFrame(searchable_tickers_in, columns = ['tickers'])
searchable_tickers_in_data.to_csv('Data/Data/searchable_tickers_in_data.csv')
non_searchable_tickers_in_data = pd.DataFrame(non_searchable_tickers_in, columns = ['tickers'])
non_searchable_tickers_in_data.to_csv('Data/Data/non_searchable_tickers_in_data.csv')

In [9]:
# List of all 3,221 Wilshire 5000 tickers to check
tickers = wil_df['Ticker'].tolist()

# List of tickers that are compatible with EDGAR API queries
searchable_tickers_cf = []
# List of tickers that are not compatible with EDGAR API queries
non_searchable_tickers_cf = []

# Sort each ticker into one of the lists above based on whether the cash flow check returns any rows
for ticker in log_progress(tickers):
    result = check_company_cf(ticker)
    if len(result) == 0:
        non_searchable_tickers_cf.append(ticker)
        
    else:
        searchable_tickers_cf.append(ticker)


In [10]:
# Save search results as data frames
searchable_tickers_cf_data = pd.DataFrame(searchable_tickers_cf, columns = ['tickers'])
searchable_tickers_cf_data.to_csv('Data/Data/searchable_tickers_cf_data.csv')
non_searchable_tickers_cf_data = pd.DataFrame(non_searchable_tickers_cf, columns = ['tickers'])
non_searchable_tickers_cf_data.to_csv('Data/Data/non_searchable_tickers_cf_data.csv')

For loops check every ticker against each of the three statement queries and sort it into a searchable or a non-searchable list. <br>
Querying only searchable tickers later lets the data-gathering loop run without interruption. <br>
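The three loops above apply the same rule with different check functions. As a minimal sketch, the rule can be written once as a hypothetical `partition_tickers` helper that takes the check function as a parameter. It treats an empty result as non-searchable, matching the `len(result) == 0` test in the loops.

```python
def partition_tickers(tickers, check):
    """Split tickers into (searchable, non_searchable) lists (hypothetical helper).

    check is any callable that returns a list of rows for a ticker, such as
    check_company; an empty list marks the ticker as non-searchable.
    """
    searchable, non_searchable = [], []
    for ticker in tickers:
        if check(ticker):          # non-empty list of rows
            searchable.append(ticker)
        else:                      # empty list: no data for this ticker
            non_searchable.append(ticker)
    return searchable, non_searchable
```

For example, `searchable_tickers_cf, non_searchable_tickers_cf = partition_tickers(log_progress(tickers), check_company_cf)` would reproduce the cash flow loop.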

In [45]:
# Total tickers to query the EDGAR API 
print('\n' + 'Total searchable tickers balance sheet: ' + str(len(searchable_tickers)) + '\n')
print('Total searchable tickers income statement: ' + str(len(searchable_tickers_in)) + '\n')
print('Total searchable tickers cash flow statement: ' + str(len(searchable_tickers_cf)) + '\n')


Total searchable tickers balance sheet: 3130

Total searchable tickers income statement: 3129

Total searchable tickers cash flow statement: 3129



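The balance sheet check returns one more searchable ticker (3,130) than the income statement and cash flow checks (3,129 each). To keep the data consistent across statements, only tickers that are searchable for all three statements are kept.<br>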
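A toy illustration of how the merges below resolve the mismatch: with pandas' default inner join, only tickers present in every list survive, and a set difference names the ticker that was dropped. The ticker lists here are made up for illustration and are not the real search results.

```python
import pandas as pd

# Made-up stand-ins for the three searchable-ticker data frames
bs = pd.DataFrame({"tickers": ["A", "AA", "AAL", "XYZ"]})
inc = pd.DataFrame({"tickers": ["A", "AA", "AAL"]})
cf = pd.DataFrame({"tickers": ["A", "AA", "AAL"]})

# merge defaults to how="inner", so only tickers present in all three frames remain
common = inc.merge(bs, on="tickers").merge(cf, on="tickers")
print(common["tickers"].tolist())  # ['A', 'AA', 'AAL']

# Tickers searchable for the balance sheet but dropped by the intersection
dropped = set(bs["tickers"]) - set(common["tickers"])
print(dropped)  # {'XYZ'}
```

On the real lists, `set(searchable_tickers) - set(data['tickers'])` should identify the one balance sheet ticker that is excluded.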

In [46]:
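# Merge searchable ticker results into one data frame; merge performs an inner join by default,
# so only tickers searchable for all three statements are kept
data = searchable_tickers_in_data.merge(searchable_tickers_data, on = 'tickers')
data = data.merge(searchable_tickers_cf_data, on = 'tickers')
# Rename wil_df's columns to match, then attach company names
wil_df.columns = ['tickers', 'name']
data = data.merge(wil_df, on = 'tickers')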

# Print data summary
print('\n')
data.info()
data.head()



<class 'pandas.core.frame.DataFrame'>
Int64Index: 3129 entries, 0 to 3128
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   tickers  3129 non-null   object
 1   name     3129 non-null   object
dtypes: object(2)
memory usage: 73.3+ KB


|   | tickers | name |
|---|---------|------|
| 0 | A | Agilent Technologies Inc. |
| 1 | AA | Alcoa Corp |
| 2 | AAL | American Airlines Group Inc |
| 3 | AAME | Atlantic American Corp. |
| 4 | AAOI | Applied Optoelectronics Inc |


### API Query Functions

In [55]:
# Function to query EDGAR API for company annual balance sheet data over 20 years (2000-2019)
def fin_1(stock, company):

    url = "https://datafied.api.edgar-online.com/v2/corefinancials/ann"
    query = {"appkey": appkey,
             "fields": "BalanceSheetConsolidated",
             "primarysymbols": stock,"numperiods": "20",}

    response = requests.get(url, query)
    data = response.json()
    data = data["result"]
    data = data["rows"]
    data = [x["values"] for x in data]

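    # Label rows 2019, 2018, ... in the order returned (assumes the first row is
    # fiscal year 2019 and that rows are consecutive years)
    year = 2020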
    flat_list = []
    
    for sublist in data:
        year -= 1
        for item in sublist:
            item.update({"year":year})
            flat_list.append(item)

    data = pd.DataFrame(flat_list)
    data["ticker"] = stock
    data['name'] = company
    data['statement'] = 'BalanceSheet'
    data.columns = ["field", "value", "year", "company_ticker", "company_name", 'statement']
    data = data[['year','company_ticker','company_name','statement','field','value']]

    return data

In [56]:
# Function to query EDGAR API for company annual income statement data over 20 years (2000-2019)
def fin_2(stock, company):

    url = "https://datafied.api.edgar-online.com/v2/corefinancials/ann"
    query = {"appkey": appkey,
             "fields": "IncomeStatementConsolidated",
             "primarysymbols": stock,"numperiods": "20",}

    response = requests.get(url, query)
    data = response.json()
    data = data["result"]
    data = data["rows"]
    data = [x["values"] for x in data]

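    # Label rows 2019, 2018, ... in the order returned (assumes the first row is
    # fiscal year 2019 and that rows are consecutive years)
    year = 2020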
    flat_list = []
    
    for sublist in data:
        year -= 1
        for item in sublist:
            item.update({"year":year})
            flat_list.append(item)

    data = pd.DataFrame(flat_list)
    data["ticker"] = stock
    data['name'] = company
    data['statement'] = 'IncomeStatement'
    data.columns = ["field", "value", "year", "company_ticker", "company_name", 'statement']
    data = data[['year','company_ticker','company_name','statement','field','value']]

    return data

In [57]:
# Function to query EDGAR API for company annual cash flow statement data over 20 years (2000-2019)
def fin_3(stock, company):

    url = "https://datafied.api.edgar-online.com/v2/corefinancials/ann"
    query = {"appkey": appkey,
             "fields": "CashFlowStatementConsolidated",
             "primarysymbols": stock,"numperiods": "20",}

    response = requests.get(url, query)
    data = response.json()
    data = data["result"]
    data = data["rows"]
    data = [x["values"] for x in data]

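    # Label rows 2019, 2018, ... in the order returned (assumes the first row is
    # fiscal year 2019 and that rows are consecutive years)
    year = 2020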
    flat_list = []
    
    for sublist in data:
        year -= 1
        for item in sublist:
            item.update({"year":year})
            flat_list.append(item)

    data = pd.DataFrame(flat_list)
    data["ticker"] = stock
    data['name'] = company
    data['statement'] = 'CashFlowStatement'
    data.columns = ["field", "value", "year", "company_ticker", "company_name", 'statement']
    data = data[['year','company_ticker','company_name','statement','field','value']]

    return data

Three functions query and return up to 20 years of balance sheet, income statement, and cash flow data. Each returns a long-format data frame with one row per year and field. <br>
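The three query functions share the same flattening step. A minimal sketch of that step as a standalone, hypothetical `rows_to_long` helper follows. It assumes each item carries `field` and `value` keys (which the original column renaming implies) and keeps the original year labelling, so it inherits the same assumption that rows arrive as consecutive fiscal years, most recent (2019) first. Unlike the original, it builds new dictionaries instead of mutating the API response, and it returns an empty frame with the expected columns when a ticker has no rows rather than failing on the column assignment.

```python
import pandas as pd

COLUMNS = ["year", "company_ticker", "company_name", "statement", "field", "value"]

def rows_to_long(rows, ticker, company, statement, latest_year=2019):
    """Flatten API 'values' rows into a long data frame (hypothetical helper).

    Row i is labelled latest_year - i, mirroring fin_1/fin_2/fin_3.
    """
    records = []
    for offset, row in enumerate(rows):
        for item in row:
            records.append({"year": latest_year - offset,
                            "company_ticker": ticker,
                            "company_name": company,
                            "statement": statement,
                            "field": item["field"],
                            "value": item["value"]})
    # Passing columns= fixes the column order and handles the no-rows case
    return pd.DataFrame(records, columns=COLUMNS)
```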

### API Query Loops

In [63]:
# List of tickers
tickers = data['tickers'].tolist()
# List of company names
companies = data['name'].tolist()

# List of queried company data
company_data = []

# For loop to step through the searchable tickers, query the EDGAR API, and collect
# annual balance sheet, income, and cash flow statement data over 20 years (2000-2019)
for ticker, company in zip(log_progress(tickers), companies): 
    try:
        result1 = fin_1(ticker, company)
        result2 = fin_2(ticker, company)
        result3 = fin_3(ticker, company)
        company_data.append(result1)
        company_data.append(result2)
        company_data.append(result3)
    except Exception as err:
        # Stop at the first failure (e.g. an API error) but report which ticker failed
        print(f'Query failed for {ticker}: {err!r}')
        break

# Convert company_data list to data frame   
df = pd.concat(company_data)

# Print data summary
print('\n')
df.head()

|   | year | company_ticker | company_name | statement | field | value |
|---|------|----------------|--------------|-----------|-------|-------|
| 0 | 2019 | A | Agilent Technologies Inc. | BalanceSheet | accountspayableandaccruedexpenses | 1006000000.0 |
| 1 | 2019 | A | Agilent Technologies Inc. | BalanceSheet | additionalpaidincapital | 5311000000.0 |
| 2 | 2019 | A | Agilent Technologies Inc. | BalanceSheet | cashandcashequivalents | 1441000000.0 |
| 3 | 2019 | A | Agilent Technologies Inc. | BalanceSheet | cashcashequivalentsandshortterminvestments | 1441000000.0 |
| 4 | 2019 | A | Agilent Technologies Inc. | BalanceSheet | commonstock | 3000000.0 |


A for loop iterates over the 3,129 searchable tickers and queries the EDGAR API for up to 20 years of balance sheet, income statement, and cash flow data per company. The resulting per-statement data frames are concatenated into a single long-format data frame with one row per company, year, statement, and field.<br><br> 
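The loop above stops at the first exception, so one failing ticker ends the whole run. If skipping failures is preferred, a sketch like the hypothetical `collect_financials` below records the failing ticker and moves on. Whether that is desirable depends on why queries fail: if the cause is an API rate limit, continuing would likely just fail the remaining tickers, and the original `break` may be the safer choice.

```python
import pandas as pd

def collect_financials(tickers, companies, fetchers):
    """Query every fetcher for every ticker, skipping tickers that raise (hypothetical helper).

    fetchers is a list of callables such as [fin_1, fin_2, fin_3]. Failed tickers
    are recorded with the error instead of stopping the loop.
    """
    frames, failed = [], []
    for ticker, company in zip(tickers, companies):
        try:
            # Fetch all statements first so a company is either complete or absent
            results = [fetch(ticker, company) for fetch in fetchers]
        except Exception as err:  # e.g. a network error or unexpected JSON
            failed.append((ticker, repr(err)))
            continue
        frames.extend(results)
    # ignore_index=True gives the combined frame a unique 0..n-1 index
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed
```

Called as `df, failed = collect_financials(log_progress(tickers), companies, [fin_1, fin_2, fin_3])`, the `failed` list could be re-queried later.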

In [64]:
# Export df data frame to the project data directory
df.to_csv('Data/2000_2019_Wilshire_5000_Financials.csv', sep = ',', encoding = 'utf-8')

The final data frame is exported to the project directory for further exploratory data analysis.<br>
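The exported file is in long format, with one row per company, year, statement, and field. For exploratory work it can be easier to view one statement as a table of fields by year. A minimal sketch with a hypothetical `statement_table` helper is shown below; it uses `pivot_table` with `aggfunc="first"` so that a duplicated field/year pair keeps its first value instead of raising, as `pivot` would.

```python
import pandas as pd

def statement_table(df, ticker, statement):
    """Pivot one company's statement from long format into a field-by-year table (hypothetical helper)."""
    subset = df[(df["company_ticker"] == ticker) & (df["statement"] == statement)]
    # Rows are fields, columns are years; missing field/year pairs become NaN
    return subset.pivot_table(index="field", columns="year", values="value", aggfunc="first")
```

Because `to_csv` also wrote the index, reading the file back with `pd.read_csv('Data/2000_2019_Wilshire_5000_Financials.csv', index_col=0)` restores the same columns, after which `statement_table(fin, 'A', 'BalanceSheet')` would show Agilent's balance sheet by year.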