Be nicer to submissions that do not follow the XBRL standard 100% #84

Open
manusimidt opened this issue Jun 24, 2022 · 6 comments

@manusimidt
Owner

Implement functionality that also allows parsing XBRL reports that violate the XBRL standard. Maybe just issue a warning and continue parsing instead of crashing completely.

(from discussion:)
Hey,

The concepts are defined in the different taxonomy schemas imported by the instance document.

For example:
The first submission you provided failed at the concept:
"in-ca:WhetherApprovalTakenFromBoardForMaterialContractsorArrangementsorTransactionsWithRelatedParty"
which uses the XML namespace prefix "in-ca". This prefix refers to the taxonomy with the namespace "http://www.icai.org/xbrl/taxonomy/2016-03-31/in-ca".
That namespace is linked to the schema file located at https://www.mca.gov.in/XBRL/2016/07/26/Taxonomy/CnI/IN-CA/in-ca-2016-03-31.xsd.
There you can verify that the above-mentioned concept is indeed not defined.

=> Thus the creator of this filing used a non-existent concept, which is why py-xbrl crashes.
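
The manual check described above (open the schema, look for the concept) can also be scripted. The following is only a minimal sketch: it uses the standard library's `xml.etree.ElementTree` rather than py-xbrl, and the inline schema, its namespace, and the helper name `defined_concept_names` are made up for illustration. For a real filing you would load the downloaded .xsd file instead.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Tiny stand-in for a taxonomy schema (hypothetical content, not the real in-ca schema).
SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.com/taxonomy/demo">
  <xsd:element id="demo_Revenue" name="Revenue" type="xsd:decimal"/>
  <xsd:element id="demo_Assets" name="Assets" type="xsd:decimal"/>
</xsd:schema>
"""


def defined_concept_names(xsd_text: str) -> set:
    """Return the names of all top-level elements declared in the schema."""
    root = ET.fromstring(xsd_text)
    return {
        el.attrib["name"]
        for el in root.findall(f"{XSD_NS}element")
        if "name" in el.attrib
    }


names = defined_concept_names(SAMPLE_XSD)
print("Revenue" in names)               # concept declared in the schema
print("WhetherApprovalTaken" in names)  # concept the schema does not declare
```

A concept name that is missing from this set is one that an instance document should not be using with that namespace.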

The problematic line is the following:

concept: Concept = tax.concepts[tax.name_id_map[concept_name]]

Here I simply expect tax.name_id_map to contain the given concept (which it should, according to the XBRL standard).

There were several discussions before about "How to treat incorrect XBRL", because many users of py-xbrl just want to get data out of the reports and do not care whether the report could be parsed 100%.

I plan to implement functionality that would allow you to parse submissions that are incorrect (and maybe just issue a warning).
But I will not be able to work on py-xbrl until mid-July (due to university stuff).

So in the meantime I would suggest just putting a "try-except" block around the line where it's failing.
Like the following (untested):

# get the concept object from the taxonomy
tax = taxonomy.get_taxonomy(taxonomy_ns)
if tax is None:
    tax = _load_common_taxonomy(cache, taxonomy_ns, taxonomy)

try:
    concept: Concept = tax.concepts[tax.name_id_map[concept_name]]
    context: AbstractContext = context_dir[fact_elem.attrib['contextRef'].strip()]
except KeyError:
    # a missing dictionary key raises KeyError, not ValueError
    print(f"All facts with concept {concept_name} will be ignored, due to invalid concept definition")
    continue
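
To illustrate the "warn and continue" idea in isolation, here is a small self-contained sketch. It does not use py-xbrl: the two dictionaries merely imitate `tax.name_id_map` and `tax.concepts`, and the function name `collect_known_facts` and its `strict` flag are hypothetical. Note that a missing dictionary key raises `KeyError`, so that is the exception such a handler would need to catch.

```python
import warnings

# Toy stand-ins for the taxonomy lookup tables (hypothetical data, not py-xbrl objects).
name_id_map = {"Revenue": "demo_Revenue"}
concepts = {"demo_Revenue": {"id": "demo_Revenue", "name": "Revenue"}}


def collect_known_facts(fact_names, strict=False):
    """Resolve each fact's concept; skip unknown ones unless strict is set."""
    resolved, skipped = [], []
    for concept_name in fact_names:
        try:
            concept = concepts[name_id_map[concept_name]]
        except KeyError:
            if strict:
                raise  # standard-conformant behaviour: fail on the first bad concept
            warnings.warn(f"Ignoring facts with undefined concept {concept_name!r}")
            skipped.append(concept_name)
            continue
        resolved.append(concept["id"])
    return resolved, skipped


resolved, skipped = collect_known_facts(["Revenue", "NotInTaxonomy"])
print(resolved)
print(skipped)
```

With `strict=True` the same call would raise instead of skipping, which keeps the current crash-on-error behaviour available to users who want it.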

Originally posted by @manusimidt in #83 (reply in thread)

@manusimidt manusimidt self-assigned this Jun 24, 2022
@manusimidt manusimidt added the enhancement New feature or request label Jun 24, 2022
@manusimidt
Owner Author

This would mainly affect the instance module. Both XBRL and iXBRL parsing are affected there, since they are implemented as separate functions.

@PotatoProgrammer20

Hi,

Thanks for the idea. I made some code changes: I added a try-except block as you suggested and used Beautiful Soup to fetch the wrongly filed data directly from the XML file.

Here is my code:

    try:
        concept: Concept = tax.concepts[tax.name_id_map[concept_name]]
        context: AbstractContext = context_dir[fact_elem.attrib['contextRef'].strip()]
    except KeyError:
        print(f"\nAll facts with concept \t{concept_name}\t will be ignored, due to invalid concept definition\n")
        # print("this is the path \n", instance_path)

        # fall back to reading the raw value straight from the instance file
        from bs4 import BeautifulSoup

        # the with-block closes the file even if parsing fails
        with open(instance_path, "r", encoding="utf-8") as file:
            soup = BeautifulSoup(file.read(), 'xml')
        for tag in soup.find_all():
            if tag.name == concept_name:
                print(f"This is the wrongly filed concept:\n{concept_name}\nThis is its data:\n{tag.text}")
        continue

Now I am getting the values in the terminal, but the final result is in a dataframe.
How do I append this result to the final dataframe? Can you please help?

[image: screenshot of the terminal output]

(This is my terminal output; I hope you can see it.)

Thanks and regards.
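
One possible way to approach the question above is to collect the recovered values as rows instead of printing them. The sketch below is an assumption-laden illustration, not py-xbrl code: it uses the standard library's `xml.etree.ElementTree` instead of Beautiful Soup, the inline instance document is invented, and the helper name `recover_raw_facts` and the row keys (`concept`, `context_ref`, `value`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented mini instance document standing in for the real filing.
SAMPLE_INSTANCE = """<xbrl xmlns:demo="http://example.com/taxonomy/demo">
  <demo:Revenue contextRef="c1">1000</demo:Revenue>
  <demo:UndefinedConcept contextRef="c1">Yes</demo:UndefinedConcept>
  <demo:UndefinedConcept contextRef="c2">No</demo:UndefinedConcept>
</xbrl>
"""


def recover_raw_facts(instance_text: str, concept_name: str) -> list:
    """Return one row (dict) per element whose local name equals concept_name."""
    rows = []
    for elem in ET.fromstring(instance_text).iter():
        # ElementTree tags look like "{namespace}LocalName"
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == concept_name:
            rows.append({
                "concept": concept_name,
                "context_ref": elem.attrib.get("contextRef", "").strip(),
                "value": (elem.text or "").strip(),
            })
    return rows


rows = recover_raw_facts(SAMPLE_INSTANCE, "UndefinedConcept")
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` and then combined with an existing dataframe, provided the column names are aligned with the ones that dataframe already uses.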

@PotatoProgrammer20

Is there a way I can integrate this result for wrongly filed concept names and add them to the "facts"?

There seem to be many changes in the parameters of the functions, so I would rather wait for an update from you on this.

Thanks

manusimidt added a commit that referenced this issue Jul 11, 2022
@manusimidt
Owner Author

This change is now live in version 2.2.0

@manusimidt
Owner Author

This could also apply to context IDs (see #86).

@manusimidt manusimidt reopened this Aug 21, 2022
@manusimidt
Owner Author

This could also apply to taxonomies that are missing or cannot be located (see #112 and #76).
