Not sure why `silx.io.dictdump.dicttoh5` takes so long to export a (simple, non-nested) dict.
From what I understand (and as the following profiling suggests), a new dataset is recursively created for each dict key, to handle nested dicts. If so, we could circumvent the problem by not creating a dataset for the "tree leaves", i.e. at the last recursion level.
In this case I need to store an "associative array" to keep the mapping between keys and values. Maybe I should switch to something different, like an array of tuples, before dumping?
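A minimal sketch of that workaround in plain Python (no silx; the helper name is made up): split the flat dict into two parallel key/value arrays, which can then be written as two datasets, or zipped into an array of tuples, instead of one dataset per key.

```python
def dict_to_arrays(d):
    """Split a flat dict into two parallel lists: keys and values.

    The pair can be dumped as two HDF5 datasets (or zipped into an
    array of tuples), avoiding the creation of one dataset per key.
    """
    keys = list(d)
    values = [d[k] for k in keys]
    return keys, values

# Example with keys shaped like the "%05d" pattern from the discussion.
metadata = {"%05d" % i: "value_%d" % i for i in range(7500)}
keys, values = dict_to_arrays(metadata)

# Round-trip check: nothing is lost by the conversion.
assert dict(zip(keys, values)) == metadata
```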
I am pretty sure there is no choice: leaves are still datasets. Or are you thinking about using something else, like attributes?
Maybe there is a better data structure for your data? For example, do you really need a key/value structure? Could you instead use a few datasets with attributes, or an associative array with columns?
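One way to read the "associative array with columns" suggestion (illustrative only, not silx API): store the mapping as two equal-length column arrays and rebuild the dict on load.

```python
# Hypothetical mapping used for illustration.
mapping = {"00001": "a", "00002": "b", "00003": "c"}

# Column-oriented layout: one "key" column and one "value" column,
# kept in the same order so that row i pairs key[i] with value[i].
columns = {
    "key": list(mapping),
    "value": [mapping[k] for k in mapping],
}

# Rebuilding the dict from the columns restores the original mapping.
restored = dict(zip(columns["key"], columns["value"]))
assert restored == mapping
```

Each column could then be written as a single dataset, so the cost no longer scales with the number of keys.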
For example, if your `%05d` keys are well known, all of this data could live inside a single dataset, with the key serving as an index.
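A sketch of that index idea, assuming the `%05d` keys encode dense integers starting at 0 (hypothetical data): the whole mapping collapses into one flat list, and a key lookup becomes integer indexing into a single dataset.

```python
# Hypothetical mapping whose keys follow the "%05d" pattern.
mapping = {"%05d" % i: i * i for i in range(100)}

# One flat array, ordered by the integer encoded in each key.
values = [mapping["%05d" % i] for i in range(len(mapping))]

# Lookup by key is now a plain index operation (int() drops the
# leading zeros), so a single dataset replaces 100 small ones.
assert values[int("00007")] == mapping["00007"]
```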
Yes, I think the simplest solution is for me to change my data structure before dumping to HDF5. Although there is no predictable pattern in the values (the `%05d` was just an example), another structure would be easy to implement without a dict.
The approach of dicttoh5 is quite conservative and it should be kept as is.
I have to export some metadata in an HDF5 file. Within this metadata is a quite large `dict` of `str`. It takes 9 seconds to export this dict of 7500 keys.
Now if I export arrays instead, it takes 38 ms.