In [2]:
from investment_dataset_builder import DataScraper, DataParser, DatasetBuilder
from pathlib import Path

# Read the FMP API key from a local text file, stripping any stray whitespace or newline
key_path = Path.home()/'desktop'/'FinancialModellingPrep_API.txt'
with open(key_path) as file:
    api_key = file.read().strip()

## DataScraper

You can create a DataScraper object by calling the constructor with a valid ticker symbol and your FMP API key, as follows.

In [5]:
a = DataScraper('MSFT', api_key)

# All scraped data is stored in the .data_dictionary
a.data_dictionary

# dictionary keys are as follows:
a.data_dictionary.keys()

# 'ratios' and 'metrics' are financial metrics which have been pulled from the FMP API
# 'is' is an abbreviation for income statement
# 'price' is the stock price, pulled as daily values from Yahoo Finance

[*********************100%***********************]  1 of 1 completed


dict_keys(['info', 'ratios', 'metrics', 'is', 'price'])

## DataParser
The DataParser class takes the DataScraper.data_dictionary as its only argument and parses the data it contains. It combines the relevant data from the financial ratios, metrics, income statements, and price into a single dataframe, indexed by a unique identifier that encodes the company ticker and the financial quarter the data came from.

In [7]:
b = DataParser(a.data_dictionary)

The info, ratios, metrics, income statements and price data can be accessed individually through the attributes listed below. The combined dataset is accessed via the .final_data attribute.

In [10]:
# b.info
# b.ratios
# b.metrics
# b.is_
# b.price
b.final_data

Unnamed: 0,date,period,assetTurnover,capitalExpenditureCoverageRatio,cashConversionCycle,cashFlowCoverageRatios,cashFlowToDebtRatio,cashPerShare,cashRatio,companyEquityMultiplier,...,S&P500PriceHigh,S&P500PriceLow,snpPriceRatio_1Q,snpPriceRatio_2Q,snpPriceRatio_3Q,snpPriceRatio_4Q,priceRatioRelativeToS&P_1Q,priceRatioRelativeToS&P_2Q,priceRatioRelativeToS&P_3Q,priceRatioRelativeToS&P_4Q
MSFT-Q2-2022,2022-12-31,Q2,0.144690,-1.780842,-2.541032,0.185864,0.185864,13.353241,0.191463,1.990608,...,4100.959961,3491.580078,,,,,,,,
MSFT-Q1-2022,2022-09-30,Q1,0.139311,-3.692185,-15.714859,0.384825,0.384825,14.381655,0.261864,2.072894,...,4325.279785,3610.399902,0.966226,,,,0.938877,,,
MSFT-Q4-2022,2022-06-30,Q4,0.142158,-3.584486,-6.780149,0.401975,0.401975,14.015119,0.146516,2.190679,...,4603.069824,3636.870117,0.968170,0.935471,,,1.003701,0.942352,,
MSFT-Q3-2022,2022-03-31,Q3,0.143236,-4.753933,-14.247269,0.414242,0.414242,13.967703,0.161392,2.115140,...,4818.620117,4114.649902,0.920647,0.891343,0.861239,,0.979427,0.983052,0.922965,
MSFT-Q2-2021,2021-12-31,Q2,0.151967,-2.468883,-6.924248,0.226137,0.226137,16.704730,0.265824,2.127298,...,4808.930176,4278.939941,0.972408,0.895245,0.866749,0.837476,0.958205,0.938492,0.941966,0.884390
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MSFT-Q3-1987,1987-03-31,Q3,0.000000,0.000000,,0.000000,0.000000,,0.000000,0.000000,...,302.720001,241.279999,1.054537,1.147700,0.919926,0.928744,1.374315,1.276296,1.571690,1.702142
MSFT-Q2-1986,1986-12-31,Q2,0.000000,0.000000,,0.000000,0.000000,,0.000000,0.000000,...,254.869995,231.320007,1.141926,1.204204,1.310588,1.050487,1.513584,2.080141,1.931780,2.378885
MSFT-Q1-1986,1986-09-30,Q1,0.000000,0.000000,,0.000000,0.000000,,0.000000,0.000000,...,254.240005,228.080002,1.009327,1.152578,1.215436,1.322813,1.426013,2.158390,2.966308,2.754743
MSFT-Q4-1986,1986-06-30,Q4,0.362624,0.000000,,,,0.014443,3.481356,1.225413,...,250.130005,226.300003,1.003927,1.013291,1.157103,1.220209,0.938053,1.337675,2.024683,2.782553
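
The row labels in the output above appear to join the ticker, the reporting period, and the calendar year of the period end date with hyphens (for example, `MSFT-Q2-2022` for the quarter ending 2022-12-31). Below is a minimal sketch of building such a label, assuming that reading is correct. `make_row_id` is a hypothetical helper for illustration, not part of `investment_dataset_builder`.

```python
from datetime import date


def make_row_id(ticker: str, period: str, period_end: str) -> str:
    """Join ticker, period and the calendar year of the period end date.

    Hypothetical helper: the label format is inferred from the output above.
    """
    year = date.fromisoformat(period_end).year
    return f"{ticker.upper()}-{period.upper()}-{year}"


print(make_row_id("MSFT", "Q2", "2022-12-31"))
```

A label like this is handy when filtering the combined dataframe for one company and quarter, for example with `b.final_data.loc[...]`.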


## DatasetBuilder
The two classes above are useful if you only need data on a single company. If you want to analyse companies on a company-by-company basis, however, I recommend using the `Company` class from [investment-tools](https://github.com/oldhiltonian/investment-tools) instead, as it was specifically designed for analysing single companies and provides good visual tools.

The DatasetBuilder class combines the DataScraper and DataParser classes and leverages further FMP API functionality to cycle through all companies on user-specified exchanges, collecting and parsing all their data and combining all the data into a single dataframe for downstream use. 

The DatasetBuilder class can be instantiated with an optional argument `exchanges`, which must be a list of the stock exchanges to scrape for company data. If `exchanges` is not passed, it defaults to `exchanges = ['New York Stock Exchange']`. If you do not know which exchange names are valid, simply instantiate the class and inspect the `.possible_exchange_names` attribute to see which exchanges can be scraped.

In [25]:
c = DatasetBuilder()

# remove the [:5] to see the full list
c.possible_exchange_names[:5]

['AMEX', 'ASE', 'ASX', 'American Stock Exchange', 'Amsterdam']

The exchanges can then be set by passing a list of valid names to the `.set_exchanges()` method.

In [21]:
c.set_exchanges(['NASDAQ'])
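
Before calling `.set_exchanges()`, it can help to check the requested names against the list of possible names. Below is a minimal sketch, not the library's own validation. `check_exchanges` is a hypothetical helper, and the hard-coded sample of names stands in for `c.possible_exchange_names`.

```python
def check_exchanges(requested, possible):
    """Return the requested names, raising ValueError if any are unknown."""
    possible_set = set(possible)
    unknown = [name for name in requested if name not in possible_set]
    if unknown:
        raise ValueError(f"Unknown exchange name(s): {unknown}")
    return list(requested)


# Sample copied from the earlier output; in practice use c.possible_exchange_names
sample_names = ['AMEX', 'ASE', 'ASX', 'American Stock Exchange', 'Amsterdam']
valid = check_exchanges(['AMEX', 'ASX'], sample_names)
# c.set_exchanges(valid)
```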

The dataset can then be built by simply calling `.build()` on the DatasetBuilder object. Give the method time to run, as there are often thousands of companies that must be analysed. 

In [22]:
c.build()

HUBC
item: 4900/68848
BNIXU
item: 59632/68848
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- BNIXU: No timezone found, symbol may be delisted


AttributeError: 'NoneType' object has no attribute 'drop'
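
In the run above, the build stopped with an `AttributeError` shortly after the price download for `BNIXU` failed. One plausible reading is that a step returned `None` for that symbol. Below is a minimal sketch of a loop that records such failures and carries on. It is not the library's code: `process_all` and `fake_process` are hypothetical stand-ins for the real scrape-and-parse step.

```python
def process_all(tickers, process_one):
    """Apply process_one to each ticker, collecting results and failures."""
    results, failures = {}, {}
    for ticker in tickers:
        try:
            result = process_one(ticker)
            if result is None:
                raise ValueError("no data returned")
            results[ticker] = result
        except Exception as exc:
            # Broad on purpose: one bad ticker should not stop a long run
            failures[ticker] = repr(exc)
    return results, failures


def fake_process(ticker):
    # Stand-in for the real step; returns None for 'BNIXU' to mimic a delisted symbol
    return None if ticker == "BNIXU" else {"ticker": ticker}


results, failures = process_all(["HUBC", "BNIXU"], fake_process)
print(sorted(results), sorted(failures))
```

Keeping the failures in a dictionary makes it easy to revisit problem tickers once the run has finished.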

The dataset can then be inspected by calling the `.dataset` attribute.

In [24]:
c.dataset

The dataset can be saved by calling the `.save_dataset()` method and passing the desired path, including the filename. Note that, for efficiency, only parquet files are accepted, so your path must have a `.parquet` extension.
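
Below is a minimal sketch of preparing such a path. `as_parquet_path` and the filename are hypothetical, and the commented-out call assumes `.save_dataset()` accepts a `Path` or string, which is not confirmed here.

```python
from pathlib import Path


def as_parquet_path(path):
    """Return path as a Path, raising ValueError unless it ends in .parquet."""
    path = Path(path)
    if path.suffix.lower() != ".parquet":
        raise ValueError(f"Expected a .parquet file, got: {path.name!r}")
    return path


out_path = as_parquet_path(Path.home() / "desktop" / "nasdaq_dataset.parquet")
# c.save_dataset(out_path)  # assumption: save_dataset accepts a Path or string
```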

In [None]:
'''
Development notes:

Check the following tickers in DataParser as they seem to misbehave:
    WRB 
    HLM
    RXO
    MER-PK
'''