add parse_ixbrl_diskcache_version to accelarate access parsed ixbrls. #90

fhopecc · 2022-10-20T14:21:41Z

The logical content of a public XBRL filing can be treated as immutable once the organization has submitted or published it, so its parsed result can be cached on disk to speed up later access. Because HttpCache caches the XML files on disk rather than Python object dumps, parse_ixbrl is very slow: it has to parse the XML again to rebuild the Python object (the XBRL instance) on every call. I added a parse_ixbrl_diskcache_version function, which uses diskcache to store the parse result as a Python object dump. Loading the dump file is faster than rebuilding the object by parsing the XML, so parse_ixbrl_diskcache_version is faster than parse_ixbrl. I recommend using diskcache instead of HttpCache to cache parse results. In addition, diskcache hides the caching details, so the user can focus on parsing the XBRL file or URL.
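
To illustrate the idea, here is a minimal sketch, not the code in this PR. It assumes the parse result can be pickled, and it uses the standard library's pickle module in place of diskcache so that it runs without extra dependencies. The names disk_cached and slow_parse are hypothetical; slow_parse only stands in for an expensive parser such as parse_ixbrl.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def disk_cached(cache_dir: Path, parse: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap `parse` so each result is pickled into `cache_dir`, keyed by its argument."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    def wrapper(source: str) -> Any:
        # Derive a stable file name from the file path or URL being parsed.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        path = cache_dir / f"{key}.pickle"
        if path.exists():
            # Cache hit: load the object dump instead of parsing again.
            with path.open("rb") as fh:
                return pickle.load(fh)
        # Cache miss: parse once, then store the object dump for later calls.
        result = parse(source)
        with path.open("wb") as fh:
            pickle.dump(result, fh)
        return result

    return wrapper


calls = []  # records how often the expensive parser really runs


def slow_parse(source: str) -> dict:
    """Hypothetical stand-in for an expensive parser (assumption, not py-xbrl code)."""
    calls.append(source)
    return {"source": source, "facts": [1, 2, 3]}


cached_parse = disk_cached(Path(tempfile.mkdtemp()), slow_parse)
first = cached_parse("https://example.com/report.xhtml")   # parsed and stored
second = cached_parse("https://example.com/report.xhtml")  # loaded from disk
```

The second call never reaches slow_parse. The same holds across separate Python processes as long as the cache directory is kept, which is the difference from a purely in-memory cache.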

manusimidt · 2022-10-21T16:52:14Z

Hey, thanks for your pull request.

I wonder what the use case for this is. py-xbrl currently uses a memory-based LRU-cache for caching taxonomies.
You can see this in the taxonomy module:

py-xbrl/xbrl/taxonomy.py, lines 509 to 510 in 61a518e:

    @lru_cache(maxsize=60)
    def parse_taxonomy_url(schema_url: str, cache: HttpCache) -> TaxonomySchema:

This speeds up parsing dramatically because submissions from the same year and country usually use the same taxonomies. If you parse, for example, 1,000 XBRL documents from the SEC, the US-GAAP taxonomy is loaded from the file system only once and then kept in memory.
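
As a rough illustration of that effect, here is a minimal sketch. The function load_taxonomy and the URL are hypothetical stand-ins, not the real parse_taxonomy_url, which also takes an HttpCache argument.

```python
from functools import lru_cache

load_count = 0  # counts how often the taxonomy is really loaded


@lru_cache(maxsize=60)
def load_taxonomy(schema_url: str) -> dict:
    """Hypothetical stand-in for an expensive taxonomy parse."""
    global load_count
    load_count += 1
    return {"schema_url": schema_url}


US_GAAP = "https://example.com/us-gaap-2021.xsd"  # illustrative URL only

# Simulate 1,000 filings that all reference the same taxonomy.
for _ in range(1000):
    load_taxonomy(US_GAAP)

info = load_taxonomy.cache_info()  # hit and miss counters kept by lru_cache
```

Only the first call runs the function body; the other 999 are served from memory. The cache lives inside one Python process and is gone when it exits.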

However, if I understand correctly, you want to store the Python object of the XBRL instance on the file system, correct?
Most people using this library just use it to extract the facts from the incredibly big XML files and then store them in a JSON file, a CSV file, a database, or pandas DataFrames. So normally you would not need the XbrlInstance again.

manusimidt · 2022-11-18T18:16:45Z

@fhopecc Thanks for your interest in py-xbrl and your proposed contributions.
I have a hard time understanding what this pull request is about. The title "add parse_ixbrl_diskcache_version to accelerate access parsed ixbrls." suggests that you want to cache something with diskcache.
However, you added many more commits to this pull request that (correct me if I am wrong) have nothing to do with the description of the pull request.

If you want me to merge your changes, I would ask you to use separate pull requests for thematically different changes.

Thanks,
Manu

fhopecc added 2 commits October 20, 2022 14:46

cache parse_ixbrl result in disk

2f88216

The logical content of a public XBRL filing can be treated as immutable, so its parsed result can be cached on disk to speed up later access, because the XML does not have to be parsed into a Python object again.

fhopecc changed the title ~~add~~ add parse_ixbrl_diskcache_version to get accelarate access parsed ixbrl Oct 21, 2022

Update requirements.txt

4dc322b

fhopecc changed the title ~~add parse_ixbrl_diskcache_version to get accelarate access parsed ixbrl~~ add parse_ixbrl_diskcache_version to accelarate access parsed ixbrls. Oct 21, 2022

fhopecc added 4 commits October 22, 2022 22:00

Update instance.py

fe30093

Add treeview for linkbase.

4252dcd

Add infer the path of tifrs schema function

fd20cfb

Update instance.py

ee82b31

Remove disk cache

9230aa0
