unresolved schemas #76

mrx23dot · 2021-10-28T17:56:52Z

Got some more unresolved schemas.

As I understand it, these are not real URIs, so what's the official way to resolve them?
There must be a way to look these up instead of hard coding them. Let me ask SEC.

 parsing cache/www.sec.gov/Archives/edgar/data/0000779544/000093041317004111/arkr-20170930.xml  error The taxonomy with namespace http://xbrl.sec.gov/stpr/2011-01-31 could not be found. Please check if it is imported in the schema file
 parsing cache/www.sec.gov/Archives/edgar/data/0001213660/000121390021043138/f10q0621_bimiinter_htm.xml  error The taxonomy with namespace http://xbrl.sec.gov/currency/2021 could not be found. Please check if it is imported in the schema file
 parsing cache/www.sec.gov/Archives/edgar/data/0000721693/000121390021043080/f10q0621_chinarecycling_htm.xml  error The taxonomy with namespace http://xbrl.sec.gov/currency/2021 could not be found. Please check if it is imported in the schema file
 parsing cache/www.sec.gov/Archives/edgar/data/0001066923/000121390021021762/ftft-20201231.xml  error The taxonomy with namespace http://xbrl.sec.gov/currency/2020-01-31 could not be found. Please check if it is imported in the schema file
 parsing cache/www.sec.gov/Archives/edgar/data/0000754811/000118518516005657/grow-20160930.xml  error The taxonomy with namespace http://xbrl.sec.gov/country/2016-01-31 could not be found. Please check if it is imported in the schema file
 parsing cache/www.sec.gov/Archives/edgar/data/0001316517/000121390021040913/f10q0621_kanditech_htm.xml  error The taxonomy with namespace http://xbrl.sec.gov/naics/2021 could not be found. Please check if it is imported in the schema file
 parsing cache/www.sec.gov/Archives/edgar/data/0001464790/000121390021039630/f10q0621_brileyfin_htm.xml  error The taxonomy with namespace http://xbrl.sec.gov/currency/2021 could not be found. Please check if it is imported in the schema file
 parsing cache/www.sec.gov/Archives/edgar/data/0001422892/000121390021050437/f10k2021_sinoglobalship_htm.xml  error The taxonomy with namespace http://xbrl.sec.gov/currency/2021 could not be found. Please check if it is imported in the schema file


mrx23dot · 2021-11-01T20:32:35Z

SEC replied:

All XBRL taxonomies currently accepted in EDGAR filings are posted on https://www.sec.gov/info/edgar/edgartaxonomies.shtml. An XML version can be found at https://www.sec.gov/info/edgar/edgartaxonomies.xml. We do not maintain anything similar for what has historically been accepted, but you can find the information in the latest Release Notes, for example Figure 1 in https://xbrl.sec.gov/doc/releasenotes-2022-draft.pdf
Please note that STPR-2011 is no longer in use. STPR-2018 and 2021 versions are available in the first link mentioned above.

So does this mean we can poll a website once after startup to get a list, instead of hardcoding? (maybe on demand for non SEC users)

manusimidt · 2021-11-03T09:11:58Z

It would be possible to query the XML version of the EDGAR taxonomies list instead of hardcoding it into the library.

Keep in mind, however, that this only applies to SEC Edgar submissions. I deliberately did not include this functionality because I did not want to optimize py-xbrl for a specific XBRL source, but wanted to keep it as general as possible for all XBRL documents. If I were to include such functionality, I would decouple it modularly from the xbrl parser core modules (the core modules being instance.py, taxonomy.py and linkbase.py).
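
To make the idea of querying the XML list concrete, here is a minimal sketch of turning that file into a namespace-to-schema mapping. It is not part of py-xbrl. The function name `parse_edgar_taxonomies` is hypothetical, and the element names (`Loc`, `Href`, `Namespace`) and the sample entries are assumptions about the layout of edgartaxonomies.xml that should be checked against the live file.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_edgar_taxonomies(xml_text: str) -> Dict[str, str]:
    """Return {namespace: schema_url} for the schema entries in the listing.

    Assumes each entry is a <Loc> element with <Href> and <Namespace>
    children; entries without a namespace or a .xsd href are skipped.
    """
    root = ET.fromstring(xml_text)
    mapping: Dict[str, str] = {}
    for loc in root.iter("Loc"):
        namespace = (loc.findtext("Namespace") or "").strip()
        href = (loc.findtext("Href") or "").strip()
        # Linkbase entries (labels, definitions, ...) are not schemas.
        if namespace and href.endswith(".xsd"):
            mapping[namespace] = href
    return mapping


# Illustrative sample only; the real file is much longer.
SAMPLE = """<Erxl version="2.0">
  <Loc>
    <Family>CURRENCY</Family><Version>2021</Version>
    <Href>https://xbrl.sec.gov/currency/2021/currency-2021.xsd</Href>
    <Namespace>http://xbrl.sec.gov/currency/2021</Namespace>
  </Loc>
  <Loc>
    <Family>CURRENCY</Family><Version>2021</Version>
    <Href>https://xbrl.sec.gov/currency/2021/currency-2021_lab.xml</Href>
  </Loc>
</Erxl>"""

schemas = parse_edgar_taxonomies(SAMPLE)
```

Downloading the file is left out on purpose, so the parsing step stays independent of the network and of the parser core modules.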

mrx23dot · 2021-11-03T11:19:27Z

Unfortunately the SEC drops old items from the XML, but this would still give us about 10 years to copy anything from the XML to GitHub, which is better than doing it every year.

My proposal would be to keep existing items in taxonomy.ns_schema_map but provide a utility function that grabs the latest ones from SEC and user can extend the dict (also with custom ones).

taxonomy.ns_schema_map  # contains historical/common items

# extend base schemas, download can fail, new ones override old ones
ns_schema_map.update(get_SEC_schemas())

This saves the user from writing the download/parsing code, and saves us from having to update the lib every year.
We could put get_SEC_schemas into taxonomy.py, or util_sec.py
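
The "download can fail, new ones override old ones" behaviour from the snippet above could look roughly like this. It is a sketch only: `extend_schema_map` is a hypothetical helper, the `loader` argument stands in for something like `get_SEC_schemas`, and the example.com entries are placeholders.

```python
from typing import Callable, Dict


def extend_schema_map(
    base: Dict[str, str], loader: Callable[[], Dict[str, str]]
) -> bool:
    """Merge loader() into base in place; return False if the loader failed.

    On failure the base mapping is left untouched, so the hardcoded
    entries keep working while offline.
    """
    try:
        fresh = loader()
    except Exception:
        # Deliberately broad for this sketch: network errors, malformed XML, ...
        return False
    base.update(fresh)  # newer entries override older ones
    return True


# Stand-in for the hardcoded mapping (placeholder entries).
ns_schema_map = {"http://example.com/ns/old": "http://example.com/old.xsd"}


def offline_loader() -> Dict[str, str]:
    raise OSError("SEC server not reachable")


ok = extend_schema_map(
    ns_schema_map, lambda: {"http://example.com/ns/new": "http://example.com/new.xsd"}
)
failed = extend_schema_map(ns_schema_map, offline_loader)
```

Returning a flag rather than raising lets the caller decide whether a failed refresh matters for their use case.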

I will chase SEC up to get a list with older items, so there might be more urls in get_SEC_schemas.

My implementation: grab.txt

mrx23dot · 2021-11-08T21:07:47Z

The SEC also suggested this file, which contains all the old schemas. We should add these to the hardcoded ones and poll the new ones less frequently.

https://github.com/Arelle/Arelle/blob/cef828d1f14e23a24fa1971477679c471079b48f/arelle/plugin/validate/EFM/resources/edgartaxonomies/edgartaxonomies-all-years.xml

manusimidt · 2021-11-10T23:19:26Z

Yes, that's a good resource. The XML file also contains the namespace-schema mapping. It is probably best to create an abstraction layer above the core parsing modules and implement both ways simultaneously.
So basically:

First look into the taxonomy schema for the schema location
If the schema URL could not be located, look into the hardcoded mapping in the parser
If the namespace could still not be resolved query the mapping provided by the SEC
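
The three steps above can be sketched as a single lookup function. This is an illustration under assumptions, not py-xbrl code: `resolve_schema_url` and its parameters are hypothetical names, and the SEC mapping is passed in as a callable so that it is only fetched when the first two steps fail.

```python
from typing import Callable, Dict, Optional


def resolve_schema_url(
    namespace: str,
    imported: Dict[str, str],
    hardcoded: Dict[str, str],
    load_sec_map: Callable[[], Dict[str, str]],
) -> Optional[str]:
    """Resolve a namespace to a schema URL, or None if nothing matches."""
    # 1. Schema locations declared by the taxonomy schema itself.
    if namespace in imported:
        return imported[namespace]
    # 2. The hardcoded mapping shipped with the parser.
    if namespace in hardcoded:
        return hardcoded[namespace]
    # 3. Last resort: the mapping published by the SEC (loaded lazily).
    return load_sec_map().get(namespace)


sec_calls = []


def fake_sec_map() -> Dict[str, str]:
    # Placeholder for a real download; records how often it is called.
    sec_calls.append(1)
    return {"http://example.com/ns/sec": "http://example.com/sec.xsd"}


imported = {"http://example.com/ns/a": "http://example.com/a.xsd"}
hardcoded = {"http://example.com/ns/b": "http://example.com/b.xsd"}

url_a = resolve_schema_url("http://example.com/ns/a", imported, hardcoded, fake_sec_map)
url_b = resolve_schema_url("http://example.com/ns/b", imported, hardcoded, fake_sec_map)
calls_before_fallback = len(sec_calls)
url_sec = resolve_schema_url("http://example.com/ns/sec", imported, hardcoded, fake_sec_map)
url_none = resolve_schema_url("http://example.com/ns/unknown", imported, hardcoded, fake_sec_map)
```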

mrx23dot · 2021-11-11T10:33:06Z

Sounds good, I would still use my API suggestion:

hardcoded is default, unless user clears the dictionary
user can add custom (not SEC) ones
latest SEC ones can be loaded on demand (not automatically), since it can be used for non SEC too, maybe a mode='sec' for automatic fetching

I parsed arelle hardcoded ones into python dict format:
hardcoded_arelle.txt

mrx23dot · 2021-11-12T15:25:05Z

As a first step, I added these hardcoded entries in #77.

manusimidt · 2021-11-18T08:21:11Z

Thank you! I will check and merge it in the following days.

mrx23dot · 2021-11-18T09:30:16Z

I only used the https protocol for the URLs that would redirect from http to https anyway; this avoids wasting time on redirects, and on certificate checks where https isn't needed.

ajmedeio · 2022-10-24T16:54:41Z

Now that we're in 2022, filings are using the 2022 taxonomies. This means the library may fail to parse filings that assume presence of those common taxonomies. @mrx23dot would you suggest using the function you wrote above (get_SEC_schemas) to fill the gaps?

manusimidt · 2022-10-24T20:31:48Z

@Ajmed I think the suggestion of @mrx23dot is quite good, and I am planning to implement a similar function that queries the common taxonomy file for SEC submissions.

Normally, if every submission strictly followed the XBRL standard, we wouldn't need this function. But I understand that it is really helpful if you want to parse SEC submissions from all companies (including those that do not fully comply with the standard). It's also probably better than keeping a static list for the namespace-to-schemaUrl mapping.

The reason I haven't implemented it yet is that I want to implement more advanced caching for this file. The library should not automatically download this file from the SEC servers whenever the taxonomy module is imported. On the other hand, the library should also not download it once and never update the local copy.
It should only be downloaded when really needed and maybe cached for an hour or something like that before it gets updated by a new version from the SEC servers.
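
One possible shape for such a cache is sketched below, assuming an in-memory cache with a one-hour time-to-live. `CachedSchemaMap` is a hypothetical name, the loader is whatever function downloads and parses the file, and persisting the copy to disk is not covered here.

```python
import time
from typing import Callable, Dict, Optional


class CachedSchemaMap:
    """Load a mapping lazily and reload it once it is older than ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, str]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._data: Optional[Dict[str, str]] = None
        self._loaded_at = 0.0

    def get(self) -> Dict[str, str]:
        now = self._clock()
        # Nothing is downloaded at import time; the first get() triggers it.
        if self._data is None or now - self._loaded_at >= self._ttl:
            self._data = self._loader()
            self._loaded_at = now
        return self._data


# Demonstration with a fake clock and a loader that counts its calls.
load_count = []
fake_now = [0.0]


def counting_loader() -> Dict[str, str]:
    load_count.append(1)
    return {"http://example.com/ns": "http://example.com/schema.xsd"}


cache = CachedSchemaMap(counting_loader, ttl_seconds=3600.0, clock=lambda: fake_now[0])
cache.get()
cache.get()              # still fresh, served from memory
loads_within_ttl = len(load_count)
fake_now[0] = 3600.0     # an hour later
cache.get()              # expired, loader runs again
loads_after_ttl = len(load_count)
```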

ajmedeio · 2022-10-25T00:06:09Z

@manusimidt makes sense. No rush, wouldn't want extra things entering the library that become cruft. In the meantime I've used the above code and it's successfully parsing the ill-formed filings.

manusimidt mentioned this issue May 20, 2023

Solution to frequently missing taxonomy specifications in UK submissions #112

Open

This was referenced Nov 20, 2023

xbrl.TaxonomyNotFound #120

Open

Be nicer to submissions that do not follow the XBRL standard 100% #84

Open
