# Value Investing with Python - Part 02

In part two of the **Value Investing with Python** series, the `BeautifulSoup` module is used to scrape stock and company data from https://morningstar.com. The scraped data consists of useful metrics (such as revenue, return on equity, and current ratio) covering up to 10 years.

Example of a scraped webpage: https://financials.morningstar.com/ratios/r.html?t=0P000000GY&culture=en&platform=sal

The function `get_morningstar_finkeys` has two inputs:
* *stock_ticker*: stock ticker symbol as indicated on Morningstar
* *exchange_ticker*: exchange ticker symbol as indicated on Morningstar

The function returns the scraped data as a pandas DataFrame.
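The two inputs are combined into a single `exchange_ticker:stock_ticker` symbol that is appended to each request URL. A minimal sketch of that step, assuming the endpoint paths used in this notebook's function; the helper name `build_urls` and the `ENDPOINTS` mapping are hypothetical and only serve as illustration:

```python
from urllib.parse import quote

# Endpoint paths as used in this notebook; BASE, ENDPOINTS and build_urls
# are illustrative names, not part of any Morningstar API.
BASE = "http://financials.morningstar.com/finan/financials/"
ENDPOINTS = {
    "financials": "getFinancePart.html",
    "key_ratios": "getKeyStatPart.html",
}


def build_urls(stock_ticker, exchange_ticker, callback="xxx"):
    """Return one URL per endpoint for the given stock and exchange ticker."""
    # Morningstar identifies a listing as EXCHANGE:STOCK, e.g. XNAS:AAPL.
    symbol = quote(f"{exchange_ticker}:{stock_ticker}", safe=":")
    return {
        name: f"{BASE}{page}?&callback={callback}&t={symbol}"
        for name, page in ENDPOINTS.items()
    }


urls = build_urls("AAPL", "XNAS")
print(urls["financials"])
print(urls["key_ratios"])
```

Building the URLs in one place makes it easy to check the exact request before any scraping happens.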

In [1]:
def get_morningstar_finkeys(stock_ticker, exchange_ticker):
    
    """Scrape financial and key ratio data available on morningstar.com.

    The function uses the BeautifulSoup package.

    Input: stock and exchange ticker (as indicated on Morningstar.com) as strings.
    Output: pandas DataFrame with financial as well as key ratio data.
    """

    # Import required modules:
    from bs4 import BeautifulSoup
    import pandas as pd
    import numpy as np
    import requests
    import json
    import time
    import re
    
    # Define first parts of the URLs for both financials as well as key ratios:
    financials_url_first_part = 'http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=xxx&t='
    key_ratio_url_first_part = 'http://financials.morningstar.com/finan/financials/getKeyStatPart.html?&callback=xxx&t='
    
    # Define complete URLs:
    financials_url = financials_url_first_part + exchange_ticker + ':' + stock_ticker
    key_ratio_url = key_ratio_url_first_part + exchange_ticker + ':' + stock_ticker

    # Create empty list to collect relevant data:
    list_data = []

    for part in [financials_url, key_ratio_url]:
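        # Fetch the JSONP response, strip the xxx(...) callback wrapper,
        # and parse the HTML fragment stored under 'componentData':
        response = requests.get(part, timeout=30)
        response.raise_for_status()
        payload = json.loads(re.findall(r'xxx\((.*)\)', response.text, flags=re.DOTALL)[0])
        soup = BeautifulSoup(payload['componentData'], 'lxml')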

        for tr in soup.select('tr'):
            row_data = [td.text for td in tr.select('td, th') if td.text]

            # Skip empty rows:
            if not row_data:
                continue

            # Rows with fewer than 12 entries (such as the header row of period
            # labels, which has no label for its first column) get "Year" prepended:
            if len(row_data) < 12:
                row_data = ["Year"] + row_data

            list_data.append(row_data)

        # Pause before the next request to avoid overloading the server:
        time.sleep(10.0)

    # Create the DataFrame from both parts (first row as column names) and
    # replace the "—" placeholder for missing values with NaN:
    df = pd.DataFrame(list_data[1:], columns=list_data[0])
    df = df.replace("—", np.nan)
            
    return df
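
The `callback=xxx` parameter in the URLs makes the server wrap its JSON answer in an `xxx(...)` call (JSONP), which is why the function strips that wrapper with a regular expression before calling `json.loads`. A minimal sketch of this step in isolation, using a hand-written stand-in for the response body (the real payload is not reproduced here); the helper name `unwrap_jsonp` is hypothetical:

```python
import json
import re


def unwrap_jsonp(text, callback="xxx"):
    """Strip the callback(...) wrapper from a JSONP response and parse the JSON inside."""
    # re.DOTALL lets the wrapped payload span several lines.
    match = re.search(rf"{re.escape(callback)}\((.*)\)", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))


# Hand-written stand-in for a response body (an assumption for illustration).
fake_response = 'xxx({"componentData": "<table><tr><th>Revenue</th><td>1</td></tr></table>"})'

payload = unwrap_jsonp(fake_response)
html_fragment = payload["componentData"]
print(html_fragment)
```

Raising an explicit error when the wrapper is missing gives a clearer message than the `IndexError` that indexing an empty `re.findall` result would produce.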

Example:

In [2]:
apple_data = get_morningstar_finkeys("AAPL", "XNAS")

In [3]:
apple_data

| | Year | 2012-09 | 2013-09 | 2014-09 | 2015-09 | 2016-09 | 2017-09 | 2018-09 | 2019-09 | 2020-09 | 2021-09 | TTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Revenue USD Mil | 156508 | 170910 | 182795 | 233715 | 215639 | 229234 | 265595 | 260174 | 274515 | 365817 | 386017 |
| 1 | Gross Margin % | 43.9 | 37.6 | 38.6 | 40.1 | 39.1 | 38.5 | 38.3 | 37.8 | 38.2 | 41.8 | 43.3 |
| 2 | Operating Income USD Mil | 55241 | 48999 | 52503 | 71230 | 60024 | 61344 | 70898 | 63930 | 66288 | 108949 | 119379 |
| 3 | Operating Margin % | 35.3 | 28.7 | 28.7 | 30.5 | 27.8 | 26.8 | 26.7 | 24.6 | 24.1 | 29.8 | 30.9 |
| 4 | Net Income USD Mil | 41733 | 37037 | 39510 | 53394 | 45687 | 48351 | 59531 | 55256 | 57411 | 94680 | 101935 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 91 | Cash Conversion Cycle | -52.13 | -44.50 | -48.64 | -52.97 | -67.29 | -75.91 | -78.92 | -73.76 | -60.54 | -52.90 | -49.92 |
| 92 | Receivables Turnover | 19.20 | 14.22 | 11.96 | 13.62 | 13.23 | 13.63 | 12.94 | 11.28 | 14.06 | 17.26 | 19.64 |
| 93 | Inventory Turnover | 112.12 | 83.45 | 57.94 | 62.82 | 58.64 | 40.37 | 37.17 | 40.13 | 41.52 | 40.03 | 40.98 |
| 94 | Fixed Assets Turnover | 13.48 | 10.67 | 9.82 | 10.85 | 8.71 | 7.54 | 7.07 | 6.61 | 7.40 | 9.60 | 10.01 |
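
Because every cell is taken from `td.text`, the values in the returned DataFrame are strings rather than numbers. Before doing any arithmetic they can be converted, for example as in the following sketch. It runs on a small hand-made frame in the same shape as the scraped data; the helper name `to_numeric_frame` is hypothetical, and the removal of thousands separators is a defensive assumption about the raw text:

```python
import pandas as pd


def to_numeric_frame(df):
    """Index by the metric column and convert every remaining cell to a number."""
    out = df.set_index("Year")
    # Strip thousands separators (an assumption about the raw text), then
    # coerce anything non-numeric, such as the "—" placeholder, to NaN.
    return out.apply(
        lambda col: pd.to_numeric(
            col.astype(str).str.replace(",", "", regex=False), errors="coerce"
        )
    )


# Tiny hand-made frame shaped like the scraped data (all values are strings).
sample = pd.DataFrame(
    [["Revenue USD Mil", "156508", "170910"], ["Gross Margin %", "43.9", "—"]],
    columns=["Year", "2012-09", "2013-09"],
)

numeric = to_numeric_frame(sample)
print(numeric)
```

With `errors="coerce"`, any cell that cannot be parsed becomes NaN instead of raising, which suits tables that mix numbers with placeholders.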


Example for downloading data for several stocks in one run (the stocks are fetched one after another):

In [4]:
company_information_list = [["amazon_data", "AMZN", "XNAS"], ["google_data", "GOOG", "XNAS"]]

In [5]:
data_dict = {}

In [6]:
for company_info in company_information_list:
    temporary_df = get_morningstar_finkeys(company_info[1], company_info[2])
    data_dict[company_info[0]] = temporary_df
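
In a loop like the one above, a single failed request stops the whole run. One possible variant, sketched under the assumption that failures should be recorded and skipped, is shown here. The names `fetch_many` and `fake_fetch` are hypothetical; the stub stands in for `get_morningstar_finkeys` so the example runs without network access:

```python
import time


def fetch_many(companies, fetch, pause_seconds=0.0):
    """Call fetch(stock_ticker, exchange_ticker) per company and collect results and errors."""
    results, errors = {}, {}
    for name, (stock_ticker, exchange_ticker) in companies.items():
        try:
            results[name] = fetch(stock_ticker, exchange_ticker)
        except Exception as exc:
            # Record the failure and continue with the next company.
            errors[name] = repr(exc)
        # Optional extra pause between companies.
        time.sleep(pause_seconds)
    return results, errors


def fake_fetch(stock_ticker, exchange_ticker):
    """Stub used only for this example; it fails for the ticker 'FAIL'."""
    if stock_ticker == "FAIL":
        raise ValueError("no data")
    return f"{exchange_ticker}:{stock_ticker}"


companies = {"amazon_data": ("AMZN", "XNAS"), "broken_data": ("FAIL", "XNAS")}
results, errors = fetch_many(companies, fake_fetch)
print(results)
print(errors)
```

In the notebook, `fake_fetch` would be replaced by `get_morningstar_finkeys`, which already waits between its own requests.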

In [7]:
data_dict["amazon_data"]

| | Year | 2012-12 | 2013-12 | 2014-12 | 2015-12 | 2016-12 | 2017-12 | 2018-12 | 2019-12 | 2020-12 | 2021-12 | TTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Revenue USD Mil | 61093 | 74452 | 88988 | 107006 | 135987 | 177866 | 232887 | 280522 | 386064 | 469822 | 477748 |
| 1 | Gross Margin % | 6.8 | 6.9 | 7.0 | 8.8 | 10.3 | 10.2 | 13.3 | 13.8 | 13.3 | 14.1 | 13.4 |
| 2 | Operating Income USD Mil | 676 | 745 | 178 | 2233 | 4186 | 4106 | 12421 | 14541 | 22899 | 24879 | 19683 |
| 3 | Operating Margin % | 1.1 | 1.0 | 0.2 | 2.1 | 3.1 | 2.3 | 5.3 | 5.2 | 5.9 | 5.3 | 4.1 |
| 4 | Net Income USD Mil | -39 | 274 | -241 | 596 | 2371 | 3033 | 10073 | 11588 | 21331 | 33364 | 21413 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 91 | Cash Conversion Cycle | -25.33 | -19.58 | -13.72 | -13.72 | -16.10 | -14.96 | -12.38 | -11.63 | -19.70 | -20.55 | -10.79 |
| 92 | Receivables Turnover | 20.59 | 18.31 | 17.15 | 17.78 | 18.42 | 16.54 | 15.61 | 14.96 | 17.02 | 16.36 | 16.82 |
| 93 | Inventory Turnover | 10.33 | 10.32 | 10.54 | 10.53 | 11.24 | 11.62 | 12.16 | 12.83 | 15.11 | 14.30 | 14.06 |
| 94 | Fixed Assets Turnover | 10.65 | 8.27 | 6.38 | 5.52 | 5.34 | 4.56 | 4.21 | 3.51 | 3.11 | 2.56 | 2.48 |


In [8]:
data_dict["google_data"]

| | Year | 2012-12 | 2013-12 | 2014-12 | 2015-12 | 2016-12 | 2017-12 | 2018-12 | 2019-12 | 2020-12 | 2021-12 | TTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Revenue USD Mil | 50175 | 59825 | 66001 | 74989 | 90272 | 110855 | 136819 | 161857 | 182527 | 257637 | 270334 |
| 1 | Gross Margin % | 58.9 | 56.8 | 61.1 | 62.4 | 61.1 | 58.9 | 56.5 | 55.6 | 53.6 | 56.9 | 56.9 |
| 2 | Operating Income USD Mil | 12760 | 13966 | 16496 | 19360 | 23716 | 28882 | 31392 | 35928 | 41224 | 78714 | 82371 |
| 3 | Operating Margin % | 25.4 | 23.3 | 25.0 | 25.8 | 26.3 | 26.1 | 22.9 | 22.2 | 22.6 | 30.6 | 30.5 |
| 4 | Net Income USD Mil | 10737 | 12920 | 14444 | 16348 | 19478 | 12662 | 30736 | 34343 | 40269 | 76033 | 74539 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 91 | Cash Conversion Cycle | 34.36 | 26.21 | NaN | NaN | 34.10 | 36.80 | 34.91 | 32.17 | 35.95 | 33.74 | 32.96 |
| 92 | Receivables Turnover | 7.54 | 7.14 | 7.23 | 7.16 | 7.03 | 6.83 | 6.99 | 7.01 | 6.49 | 7.34 | 8.62 |
| 93 | Inventory Turnover | 40.86 | 55.55 | NaN | NaN | 131.11 | 89.64 | 64.17 | 68.28 | 98.13 | 116.90 | 103.18 |
| 94 | Fixed Assets Turnover | 4.68 | 4.22 | 3.27 | 2.84 | 2.85 | 2.89 | 2.68 | 2.24 | 2.01 | 2.48 | 2.49 |
