Standardised Financial Data #103

Open
firmai opened this issue Jan 19, 2023 · 5 comments
Labels
discussion It is unclear whether this is a bug that really should be fixed

Comments

@firmai

firmai commented Jan 19, 2023

How can this be used to develop standardised financial data? The tool looks promising, but I am struggling to find a good example. Thanks so much for your work :)

@manusimidt
Owner

Hey there,

the goal of this tool is certainly not to standardize financial data. This is basically the goal of the XBRL Standard itself. How well the data is standardized solely depends on the financial regulators and the creator of the XBRL document.

I guess what you are really asking is: "How can I use this tool to collect and compare data from different companies?"

With py-xbrl you can extract essentially any information that is tagged in an XBRL or iXBRL document. If you are not familiar with XBRL, have a look at this iXBRL viewer: every value that is "clickable" is tagged with XBRL and can be read with py-xbrl:
https://www.sec.gov/ix?doc=/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm
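To make "tagged" a bit more concrete: in an iXBRL document a number is wrapped in an ix:nonFraction element whose name attribute holds the concept. The following is a rough standard-library sketch run on a tiny hand-written fragment (the contextRef "c-1" and unitRef "usdPerShare" are made up; 6.15 is Apple's EPS from the output below). It only reads ix:nonFraction tags and ignores the scale, sign and format attributes that a real iXBRL parser has to apply; py-xbrl does the real work.

```python
from html.parser import HTMLParser

# Tiny, hand-written iXBRL fragment (hypothetical contextRef/unitRef values).
# Real filings also contain an ix:header with contexts and units.
SAMPLE = """
<p>Basic earnings per share:
  <ix:nonFraction name="us-gaap:EarningsPerShareBasic" contextRef="c-1"
      unitRef="usdPerShare" decimals="2">6.15</ix:nonFraction>
</p>
"""


class TaggedValueCollector(HTMLParser):
    """Collects (concept, contextRef, displayed text) for each ix:nonFraction."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._current = None  # [name, contextRef, text] of the open tag, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names.
        if tag == "ix:nonfraction":
            a = dict(attrs)
            self._current = [a.get("name"), a.get("contextref"), ""]

    def handle_data(self, data):
        # Only text inside an open ix:nonFraction tag is part of the fact.
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if tag == "ix:nonfraction" and self._current is not None:
            name, ctx, text = self._current
            self.facts.append((name, ctx, text.strip()))
            self._current = None


collector = TaggedValueCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.facts)  # [('us-gaap:EarningsPerShareBasic', 'c-1', '6.15')]
```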

For example, the following code extracts basic earnings per share (EPS) for Apple and Microsoft:

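from xbrl.cache import HttpCache
from xbrl.instance import XbrlParser, XbrlInstance

# Downloaded files are cached locally in ./cache
cache: HttpCache = HttpCache('./cache')
# The SEC rejects requests without a declared User-Agent; replace with your own contact details
cache.set_headers({'From': 'YOUR@EMAIL.com', 'User-Agent': 'Company Name AdminContact@<company-domain>.com'})
xbrlParser = XbrlParser(cache)

subs = {
    "AAPL": "https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm",
    "MSFT": "https://www.sec.gov/Archives/edgar/data/789019/000156459022035087/msft-10q_20220930.htm"
}

for ticker, url in subs.items():
    # Parse the iXBRL filing into an XbrlInstance
    inst: XbrlInstance = xbrlParser.parse_instance(url)

    # Print every basic EPS fact together with the end of its reporting period
    for fact in inst.facts:
        if fact.concept.name == 'EarningsPerShareBasic':
            print(f"On {fact.context.end_date} {ticker} had an EPS of {fact.value}")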

output:

On 2022-09-24 AAPL had an EPS of 6.15
On 2021-09-25 AAPL had an EPS of 5.67
On 2020-09-26 AAPL had an EPS of 3.31

On 2022-09-30 MSFT had an EPS of 2.35
On 2021-09-30 MSFT had an EPS of 2.73
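The loop above reads fact.context.end_date, which exists only for duration contexts. A hedged sketch of a slightly more defensive filter follows. It assumes py-xbrl-style contexts where duration contexts expose end_date, instant contexts expose instant_date, and dimensional breakdowns appear as a non-empty segments list; check these attribute names against your installed version. The helpers period_end and undimensioned_facts are hypothetical, and the test facts are SimpleNamespace stand-ins with placeholder values, not real filing data.

```python
from datetime import date
from types import SimpleNamespace


def period_end(context):
    """Date a fact refers to: end_date for durations, instant_date for instants.

    Assumes py-xbrl-style attribute names; returns None if neither exists.
    """
    for attr in ("end_date", "instant_date"):
        value = getattr(context, attr, None)
        if value is not None:
            return value
    return None


def undimensioned_facts(facts, concept_name):
    """Yield (period end, value) for one concept, skipping dimensional facts.

    Facts whose context has segment members (e.g. a breakdown by segment or
    share class) are skipped, leaving only company-wide values.
    """
    for fact in facts:
        if fact.concept.name != concept_name:
            continue
        if getattr(fact.context, "segments", None):
            continue
        yield period_end(fact.context), fact.value


def _fact(name, value, segments=(), **period):
    """Build a stand-in object shaped like a py-xbrl fact (for testing only)."""
    ctx = SimpleNamespace(segments=list(segments), **period)
    return SimpleNamespace(concept=SimpleNamespace(name=name), context=ctx, value=value)


# Placeholder facts; with py-xbrl you would pass inst.facts instead.
facts = [
    _fact("EarningsPerShareBasic", 6.15, end_date=date(2022, 9, 24)),
    _fact("EarningsPerShareBasic", 1.0, segments=["hypothetical-member"], end_date=date(2022, 9, 24)),
    _fact("Assets", 100.0, instant_date=date(2022, 9, 24)),
]
print(list(undimensioned_facts(facts, "EarningsPerShareBasic")))
```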

With py-xbrl you can extract thousands of different facts from thousands of companies directly from the source (the actual financial report filed by the company) instead of going through an API.
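Once facts are extracted, comparing companies is mostly a matter of reshaping them. A minimal plain-Python sketch, using the EPS values from the output above as input; pivot and latest are hypothetical helpers, not part of py-xbrl. One caveat worth keeping in mind: the Apple URL is its fiscal-2022 annual report, while the Microsoft URL is a 10-Q (per its file name), so these EPS figures cover periods of different length and a real comparison should also track the period start.

```python
from collections import defaultdict

# (ticker, period end, basic EPS) as printed by the example above
records = [
    ("AAPL", "2022-09-24", 6.15),
    ("AAPL", "2021-09-25", 5.67),
    ("AAPL", "2020-09-26", 3.31),
    ("MSFT", "2022-09-30", 2.35),
    ("MSFT", "2021-09-30", 2.73),
]


def pivot(rows):
    """Group (ticker, period_end, value) rows into {ticker: {period_end: value}}."""
    table = defaultdict(dict)
    for ticker, period_end, value in rows:
        table[ticker][period_end] = value
    return dict(table)


def latest(table):
    """Return {ticker: (period_end, value)} for each company's latest period.

    ISO date strings sort chronologically, so max() picks the newest one.
    """
    return {ticker: max(periods.items()) for ticker, periods in table.items()}


print(latest(pivot(records)))
```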

@manusimidt manusimidt added the discussion It is unclear whether this is a bug that really should be fixed label Jan 19, 2023
@firmai
Author

firmai commented Feb 7, 2023

Pretty damn cool! What would be the difference between what you are doing and what Ties de Kok did with https://github.com/TiesdeKok/fast_xbrl_parser ?

It seems that you are parsing the htm file https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm

And that he is parsing the xml file: https://www.sec.gov/Archives/edgar/data/1652044/000165204423000016/goog-20221231_def.xml

Do you know whether these datasets are meant to contain the same information (facts/concepts)? I wonder what the advantages and disadvantages of using one over the other would be.

@rayniervanegmond
Copy link

rayniervanegmond commented Feb 7, 2023 via email

@manusimidt
Owner

@rayniervanegmond Thank you for the great explanation! I agree entirely with everything you said!

It is true that the SEC also provides XBRL files for iXBRL submissions. However, these are converted from the original iXBRL filings; the SEC provides this as a service for compatibility reasons.


But I would always prefer to parse iXBRL since it has several benefits.
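To illustrate what "converted" means for a single fact, here is a rough sketch that reads the same fact from a hand-written iXBRL fragment and from a hand-written XBRL instance fragment using only the standard library, and checks that both yield the same (concept, context, value) triple. The fragments, contextRef and us-gaap namespace year are illustrative assumptions; the ix and xbrli namespace URIs are the standard Inline XBRL 1.1 and XBRL 2.1 ones. The plain string comparison only works for simple values like this one: for a number displayed in millions, the iXBRL scale and format attributes must be applied before it matches the converted instance.

```python
import xml.etree.ElementTree as ET

# Hand-written fragments (illustrative only; real documents are far larger).
IXBRL = """<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
  xmlns:us-gaap="http://fasb.org/us-gaap/2022">
  <body><p>Basic EPS:
    <ix:nonFraction name="us-gaap:EarningsPerShareBasic" contextRef="c-1"
        unitRef="usdPerShare" decimals="2">6.15</ix:nonFraction></p></body>
</html>"""

XBRL = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
  xmlns:us-gaap="http://fasb.org/us-gaap/2022">
  <us-gaap:EarningsPerShareBasic contextRef="c-1" unitRef="usdPerShare"
      decimals="2">6.15</us-gaap:EarningsPerShareBasic>
</xbrl>"""

IX_NS = "{http://www.xbrl.org/2013/inlineXBRL}"


def ixbrl_facts(text):
    """(local concept name, contextRef, displayed value) per ix:nonFraction."""
    root = ET.fromstring(text)
    return {
        (el.get("name").split(":")[-1], el.get("contextRef"), "".join(el.itertext()).strip())
        for el in root.iter(IX_NS + "nonFraction")
    }


def xbrl_facts(text):
    """(local concept name, contextRef, value) per top-level fact element.

    Children without contextRef (contexts, units) are not facts and are skipped.
    """
    root = ET.fromstring(text)
    return {
        (el.tag.split("}")[-1], el.get("contextRef"), (el.text or "").strip())
        for el in root
        if el.get("contextRef") is not None
    }


print(ixbrl_facts(IXBRL) == xbrl_facts(XBRL))
```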

Regarding your second question (@firmai):
TBH, I have not tried the "fast_xbrl_parser" from "TiesdeKok". It seems to be written in Rust, while 'py-xbrl' is pure Python.
Another great open-source library for parsing XBRL is Arelle. It offers many more features than 'py-xbrl'. However, this wide range of features also increases complexity. The goal of 'py-xbrl' was always to parse filings and get all of the data as easily as possible, never XBRL validation, which is also a huge part of a proper XBRL processor.
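As a rough idea of what "validation" involves beyond parsing: one classic XBRL check is calculation consistency, where a total must equal the weighted sum (weights of +1 or -1 from the calculation linkbase) of its contributing items. The sketch below is a simplified version that rounds everything to one shared decimals value; the XBRL 2.1 and Calculations 1.1 specifications define the exact rounding rules, and this is not how Arelle implements it. The function names and example numbers are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def rounded(value, decimals):
    """Round a value (str or int) to `decimals` places; negative means tens, hundreds, ..."""
    quantum = Decimal(1).scaleb(-decimals)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


def calculation_consistent(total, children, decimals):
    """Check total == sum(weight * child) after rounding to `decimals`.

    `children` is a list of (weight, value) pairs, as on calculation arcs.
    Simplified: one shared precision instead of per-fact decimals.
    """
    expected = sum(Decimal(w) * rounded(v, decimals) for w, v in children)
    return rounded(total, decimals) == rounded(expected, decimals)


# Hypothetical numbers: gross profit = revenue - cost of revenue
print(calculation_consistent("40", [(1, "100"), (-1, "60")], 0))  # True
print(calculation_consistent("41", [(1, "100"), (-1, "60")], 0))  # False
```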

@rayniervanegmond

rayniervanegmond commented Feb 7, 2023 via email
