# Data Collection: Financial Statements Data

Financial statements are essential for analyzing any business. Companies publish them every quarter as well as annually, both for the relevant authorities and for their own stakeholders. Before analyzing any data, it has to be collected. There are various sources of data; the most reliable are the companies' official websites and the stock exchanges. For my purposes, though, the data available on Yahoo Finance was sufficient.
Several Python packages can pull data from the Yahoo Finance website. Two of the most widely used are:
1. [yahoofinancials](https://pypi.org/project/yahoofinancials/)
2. [yfinance](https://pypi.org/project/yfinance/)

Even though **yahoofinancials** provides more functionality, **yfinance** wins on simplicity. The two libraries also return financial statements in different formats: **yahoofinancials** returns JSON-style nested data, while **yfinance** returns a pandas DataFrame. I need the output as a pandas DataFrame, so I will be using **yfinance**.
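To make the format difference concrete, here is a minimal sketch of how the nested output could be flattened into a DataFrame if needed. It assumes the layout that appears in the yahoofinancials output later in this notebook (statement key, then ticker, then a list of one-date dictionaries). The helper name `statements_to_frame` is hypothetical, and the sample keeps only a few values from that output.

```python
import pandas as pd


def statements_to_frame(raw, statement_key, ticker):
    """Flatten a yahoofinancials-style nested dict into a DataFrame.

    Rows are line items and columns are period-end dates.
    """
    columns = {}
    # Each list entry is a one-key dict: {period_end_date: {line_item: value}}
    for period in raw[statement_key][ticker]:
        for date, items in period.items():
            columns[date] = items
    return pd.DataFrame(columns)


# Sample trimmed from the CDSL.NS output; only a few line items are kept.
sample = {
    "balanceSheetHistory": {
        "CDSL.NS": [
            {"2021-03-31": {"totalLiab": 1639945000, "totalAssets": 10839990000}},
            {"2020-03-31": {"totalLiab": 974756000}},
        ]
    }
}

frame = statements_to_frame(sample, "balanceSheetHistory", "CDSL.NS")
print(frame)
```

Line items missing for a given period show up as `NaN`, which is close to how the yfinance output treats unavailable values.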

In [1]:
# Import necessary packages
import os
import yfinance as yf
import pandas as pd

Since I am using the data for learning purposes only, I chose the Nifty 50 stocks from Indian equities. The file *"nifty_50.csv"* in the folder *"annual_reports"* contains information about those equities.
Also note that Yahoo Finance provides data for only the latest 4 years.

In [2]:
df = pd.read_csv('./data/annual_reports/nifty_50.csv')

In [3]:
df.head()

Unnamed: 0,Company Name,Industry,Symbol,Series,ISIN Code
0,ACC Ltd.,CEMENT & CEMENT PRODUCTS,ACC,EQ,INE012A01025
1,Adani Enterprises Ltd.,METALS,ADANIENT,EQ,INE423A01024
2,Adani Green Energy Ltd.,POWER,ADANIGREEN,EQ,INE364U01010
3,Adani Transmission Ltd.,POWER,ADANITRANS,BE,INE931S01010
4,Ambuja Cements Ltd.,CEMENT & CEMENT PRODUCTS,AMBUJACEM,EQ,INE079A01024
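The `Unnamed: 0` column in the output above usually means the CSV was written with its row index included. As a small sketch, assuming that is the case here, passing `index_col=0` to `pd.read_csv` keeps that column as the index instead of a data column. The in-memory CSV below is a stand-in built from the first two rows shown above.

```python
import io

import pandas as pd

# Stand-in for nifty_50.csv, using the first two rows shown above.
csv_text = (
    ",Company Name,Industry,Symbol,Series,ISIN Code\n"
    "0,ACC Ltd.,CEMENT & CEMENT PRODUCTS,ACC,EQ,INE012A01025\n"
    "1,Adani Enterprises Ltd.,METALS,ADANIENT,EQ,INE423A01024\n"
)

# index_col=0 tells pandas to use the first column as the row index
nifty = pd.read_csv(io.StringIO(csv_text), index_col=0)
symbols = nifty["Symbol"].tolist()
print(symbols)
```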


## Balance Sheets 

In [4]:
## Supporting functions

def get_balance_sheet(ticker):
    """
    Function to get balance sheet for a ticker.
    """
    ticker = yf.Ticker(f"{ticker}.NS")
    return ticker.balance_sheet

def save_balance_sheet(ticker, balance_sheet):
    """
    Function to save balance sheet for the given ticker.
    """
    # Create the ticker folder if it does not exist yet
    os.makedirs(f"./data/annual_reports/{ticker}", exist_ok=True)
    # Save the balance sheet to the ticker folder
    balance_sheet.to_csv(f"./data/annual_reports/{ticker}/balance_sheet.csv")

On Yahoo Finance, Indian equity tickers carry an exchange suffix. To access data from the NSE, append ".NS" to the symbol; to get data from the BSE, append ".BO". For example, the ticker for CDSL is "CDSL.NS" on the NSE and "CDSL.BO" on the BSE.
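The suffix rule can be wrapped in a small helper so that the exchange is chosen in one place. This is only a sketch: `to_yahoo_ticker` and `EXCHANGE_SUFFIX` are hypothetical names, and only the two exchanges mentioned above are covered.

```python
# Suffixes described above: ".NS" for the NSE and ".BO" for the BSE.
EXCHANGE_SUFFIX = {"NSE": "NS", "BSE": "BO"}


def to_yahoo_ticker(symbol, exchange="NSE"):
    """Return the Yahoo Finance ticker for an Indian equity symbol."""
    try:
        suffix = EXCHANGE_SUFFIX[exchange.upper()]
    except KeyError:
        raise ValueError(f"Unsupported exchange: {exchange}") from None
    return f"{symbol.strip().upper()}.{suffix}"


print(to_yahoo_ticker("CDSL"))         # CDSL.NS
print(to_yahoo_ticker("CDSL", "BSE"))  # CDSL.BO
```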

In [6]:
# Get balance sheet for one ticker
get_balance_sheet('ACC')

Unnamed: 0,2020-12-31,2019-12-31,2018-12-31,2017-12-31
Intangible Assets,459800000.0,342700000.0,374200000.0,400300000.0
Capital Surplus,8450300000.0,8450300000.0,8450300000.0,8450300000.0
Total Liab,54978600000.0,55890500000.0,55210200000.0,54870100000.0
Total Stockholder Equity,126991300000.0,115437700000.0,105319000000.0,93558500000.0
Minority Interest,32400000.0,31600000.0,30300000.0,28800000.0
Other Current Liab,21085200000.0,20110700000.0,16136600000.0,23137600000.0
Total Assets,182002300000.0,171359800000.0,160559500000.0,148457400000.0
Common Stock,1877900000.0,1877900000.0,1877900000.0,1877900000.0
Other Current Assets,4184600000.0,6042500000.0,8275200000.0,9089300000.0
Retained Earnings,116628200000.0,105101200000.0,94988800000.0,83228300000.0
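A quick sanity check on collected data is to see whether the balance sheet balances. In the ACC rows shown above, Total Assets equals Total Liab plus Total Stockholder Equity plus Minority Interest. The sketch below tests that on the two most recent years. Whether Yahoo Finance fields reconcile this way for every ticker is an assumption, so a mismatch is a prompt to look closer rather than proof of bad data.

```python
import pandas as pd

# Rows copied from the ACC output above (two most recent years only).
balance_sheet = pd.DataFrame(
    {
        "2020-12-31": [54978600000.0, 126991300000.0, 32400000.0, 182002300000.0],
        "2019-12-31": [55890500000.0, 115437700000.0, 31600000.0, 171359800000.0],
    },
    index=["Total Liab", "Total Stockholder Equity", "Minority Interest", "Total Assets"],
)

# Sum the claims on assets, per year
claims = balance_sheet.loc[
    ["Total Liab", "Total Stockholder Equity", "Minority Interest"]
].sum()
gap = (balance_sheet.loc["Total Assets"] - claims).abs()

# Allow a small tolerance for rounding in the reported figures
balanced = gap < 1.0
print(balanced)
```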


In [7]:
# Get balance sheet data for nifty 50
for _, row in df.iterrows():
    ticker = row["Symbol"]
    try:
        ticker_balance_sheet = get_balance_sheet(ticker)
        save_balance_sheet(ticker, ticker_balance_sheet)
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print(f"Could not fetch data for: {ticker}")
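The loop above only reports tickers whose request raises an exception. Depending on the yfinance version, a missing statement may instead come back as an empty DataFrame (an assumption worth checking against the installed version), which would be saved silently as an empty CSV. The sketch below records both kinds of failure. `collect_statements` is a hypothetical helper, and the fetcher is a stand-in so the example runs without network access.

```python
import pandas as pd


def collect_statements(symbols, fetch, save):
    """Fetch and save a statement per symbol; return the symbols that failed."""
    failed = []
    for symbol in symbols:
        try:
            statement = fetch(symbol)
        except Exception as exc:  # keep going, but record why it failed
            print(f"Could not fetch data for {symbol}: {exc}")
            failed.append(symbol)
            continue
        if statement is None or statement.empty:
            print(f"No data returned for {symbol}")
            failed.append(symbol)
            continue
        save(symbol, statement)
    return failed


# Stand-in fetcher so the sketch runs offline
def fake_fetch(symbol):
    if symbol == "BAD":
        raise ValueError("unknown ticker")
    if symbol == "EMPTY":
        return pd.DataFrame()
    return pd.DataFrame({"2020-12-31": [1.0]}, index=["Total Assets"])


saved = {}
failed = collect_statements(["ACC", "BAD", "EMPTY"], fake_fetch, saved.__setitem__)
print(failed)
```

With the functions defined in this notebook, the call would look like `collect_statements(df["Symbol"], get_balance_sheet, save_balance_sheet)`, and the returned list can be retried later.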

## Income Statements

In [8]:
## Supporting functions

def get_income_stmt(ticker):
    """
    Function to get income statement for a ticker.
    """
    ticker = yf.Ticker(f"{ticker}.NS")
    return ticker.financials

def save_income_stmt(ticker, income_stmt):
    """
    Function to save income statement for the given ticker.
    """
    # Create the ticker folder if it does not exist yet
    os.makedirs(f"./data/annual_reports/{ticker}", exist_ok=True)
    # Save the income statement to the ticker folder
    income_stmt.to_csv(f"./data/annual_reports/{ticker}/income_stmt.csv")
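The two save functions differ only in the file name, so they could be merged into one. The following is a sketch of that idea rather than a replacement for the code above: `save_statement` and its `base_dir` parameter are hypothetical, and the demo writes to a temporary directory so it leaves nothing behind. The Net Income value is the ACC figure for 2020 from the income statement output in this notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_statement(ticker, statement, name, base_dir="./data/annual_reports"):
    """Save a statement DataFrame as <base_dir>/<ticker>/<name>.csv."""
    folder = Path(base_dir) / ticker
    # Create the ticker folder (and any missing parents) if needed
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.csv"
    statement.to_csv(path)
    return path


# Demo: write one row to a temporary folder and read it back
demo = pd.DataFrame({"2020-12-31": [14301800000.0]}, index=["Net Income"])
with tempfile.TemporaryDirectory() as tmp:
    out = save_statement("ACC", demo, "income_stmt", base_dir=tmp)
    restored = pd.read_csv(out, index_col=0)
    print(restored)
```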

In [9]:
get_income_stmt('ACC')

Unnamed: 0,2020-12-31,2019-12-31,2018-12-31,2017-12-31
Research Development,,,,
Effect Of Accounting Charges,,,,
Income Before Tax,17088500000.0,20525200000.0,15101100000.0,13100600000.0
Minority Interest,32400000.0,31600000.0,30300000.0,28800000.0
Net Income,14301800000.0,13774100000.0,15204700000.0,9244100000.0
Selling General Administrative,43827100000.0,51483200000.0,50246200000.0,44905400000.0
Gross Profit,85683400000.0,96518000000.0,92960700000.0,84331000000.0
Ebit,17363200000.0,18019500000.0,15112400000.0,12335500000.0
Operating Income,17363200000.0,18019500000.0,15112400000.0,12335500000.0
Other Operating Expenses,18104700000.0,20950900000.0,21569900000.0,20653900000.0


In [10]:
# Get income statements data for nifty 50
for _, row in df.iterrows():
    ticker = row["Symbol"]
    try:
        ticker_income_stmt = get_income_stmt(ticker)
        save_income_stmt(ticker, ticker_income_stmt)
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print(f"Could not fetch data for: {ticker}")

## Using the yahoofinancials package

In [11]:
from yahoofinancials import YahooFinancials

ticker = 'CDSL.NS'
yahoo_financials = YahooFinancials(ticker)

# Note: despite the "_qt" suffix in the variable names, these are annual statements
balance_sheet_data_qt = yahoo_financials.get_financial_stmts('annual', 'balance')
income_statement_data_qt = yahoo_financials.get_financial_stmts('annual', 'income')

In [12]:
balance_sheet_data_qt

{'balanceSheetHistory': {'CDSL.NS': [{'2021-03-31': {'intangibleAssets': 178100000,
     'totalLiab': 1639945000,
     'totalStockholderEquity': 8772211000,
     'minorityInterest': 427834000,
     'otherCurrentLiab': 1257318000,
     'totalAssets': 10839990000,
     'commonStock': 1045000000,
     'otherCurrentAssets': 67478000,
     'retainedEarnings': 7727211000,
     'otherLiab': 47522000,
     'otherAssets': 158379000,
     'cash': 418878000,
     'totalCurrentLiabilities': 1591358000,
     'deferredLongTermAssetCharges': 229000,
     'propertyPlantEquipment': 784054000,
     'totalCurrentAssets': 8054239000,
     'longTermInvestments': 1665218000,
     'netTangibleAssets': 8594111000,
     'shortTermInvestments': 1858093000,
     'netReceivables': 567963000,
     'accountsPayable': 126989000}},
   {'2020-03-31': {'intangibleAssets': 28333000,
     'totalLiab': 974756000,
     'totalStockholderEquity': 7239946000,
     'minorityInterest': 418523000,
     'otherCurrentLiab': 543520

In [13]:
income_statement_data_qt

{'incomeStatementHistory': {'CDSL.NS': [{'2021-03-31': {'researchDevelopment': None,
     'effectOfAccountingCharges': None,
     'incomeBeforeTax': 2595234000,
     'minorityInterest': 427834000,
     'netIncome': 2003405000,
     'sellingGeneralAdministrative': 445024000,
     'grossProfit': 3988116000,
     'ebit': 2674039000,
     'operatingIncome': 2674039000,
     'otherOperatingExpenses': 777048000,
     'interestExpense': -230000,
     'extraordinaryItems': None,
     'nonRecurring': None,
     'otherItems': None,
     'incomeTaxExpense': 582561000,
     'totalRevenue': 3988116000,
     'totalOperatingExpenses': 1314077000,
     'costOfRevenue': 0,
     'totalOtherIncomeExpenseNet': -78805000,
     'discontinuedOperations': None,
     'netIncomeFromContinuingOps': 2012673000,
     'netIncomeApplicableToCommonShares': 2003405000}},
   {'2020-03-31': {'researchDevelopment': None,
     'effectOfAccountingCharges': None,
     'incomeBeforeTax': 1364478000,
     'minorityInterest': 