Lazy DictionaryTreeBrowser #2623

ericpre · 2021-01-19T13:58:51Z

Make the DictionaryTreeBrowser lazy by default to reduce the overhead of creating a signal. The dictionary is stored unprocessed and is only converted into attributes the first time an attribute is accessed.

Closes #368.

Progress of the PR

Minimal example of the bug fix or the new feature

from hyperspy.misc.elements import elements
from hyperspy.misc.utils import DictionaryTreeBrowser

%timeit DictionaryTreeBrowser(elements) # 4.2 us
%timeit DictionaryTreeBrowser(elements, lazy=False) # 26 ms

Note that this example can be useful to update the user guide.
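For readers who want to see the idea without installing HyperSpy, here is a minimal, self-contained sketch of deferred processing. The class `LazyTree` and its attributes `_pending` and `_nodes` are hypothetical names chosen for illustration; this is not the code in hyperspy/misc/utils.py, only an assumption about how "process on first attribute access" can be arranged.

```python
class LazyTree:
    """Hypothetical stand-in for a lazily processed attribute tree."""

    def __init__(self, dictionary=None, lazy=True):
        # Keep a reference to the raw dictionary; nothing is converted yet,
        # so construction cost does not depend on the dictionary size.
        self._pending = dictionary if dictionary is not None else {}
        self._nodes = {}
        if not lazy:
            self._process()

    def _process(self):
        # Convert the stored dictionary into nodes, one level at a time.
        pending, self._pending = self._pending, {}
        for key, value in pending.items():
            if isinstance(value, dict):
                # Children stay lazy until they are accessed themselves.
                value = LazyTree(value, lazy=True)
            self._nodes[key] = value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._pending:
            self._process()  # first access triggers the conversion
        try:
            return self._nodes[name]
        except KeyError:
            raise AttributeError(name) from None


data = {"Sample": {"elements": ["Fe", "O"]}, "title": "demo"}
tree = LazyTree(data)        # cheap: nothing processed yet
print(tree._nodes)           # {}
print(tree.title)            # demo (processing happens here)
print(tree.Sample.elements)  # ['Fe', 'O']
```

The trade-off in this sketch is that the cost is moved rather than removed: the first attribute access pays for one level of conversion, which only helps when many trees are created and few are browsed.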

codecov · 2021-01-27T22:16:23Z

Codecov Report

Merging #2623 (456dddb) into RELEASE_next_patch (a7343c6) will increase coverage by 0.02%.
The diff coverage is 98.18%.

@@                  Coverage Diff                   @@
##           RELEASE_next_patch    #2623      +/-   ##
======================================================
+ Coverage               76.90%   76.93%   +0.02%     
======================================================
  Files                     201      201              
  Lines                   29668    29706      +38     
  Branches                 6503     6514      +11     
======================================================
+ Hits                    22817    22854      +37     
- Misses                   5104     5105       +1     
  Partials                 1747     1747

Impacted Files	Coverage Δ
hyperspy/misc/utils.py	`85.44% <98.18%> (+0.90%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jlaehne

Found just a minor typo.

hyperspy/misc/utils.py

francisco-dlp · 2021-03-09T05:30:48Z

I am starting to wonder if we are not taking the DTB too far. It is really a shame that something that we added in the early days just for convenience has ended up being the major performance bottleneck in HyperSpy. At the end of the day, the main issue is that we have to process the dictionary in order to transform it into a DTB, and that takes time. A different approach would be to create a subclass of dict with the required features. That would eliminate the need to go back and forth between DTB and dict. This is the approach taken by addict. We cannot simply replace DTB with addict, at least not until v2.0, because we would miss many features (invalid attr name browsing, hidden attributes, tree printing...). However, would subclassing addict to add the missing features be feasible / desirable?

By the way, there is a related thread on StackOverflow. The top answer suggests using dataclasses, a new feature in Python 3.7, to implement something like this.

In the meantime, I agree that this PR is a good improvement of the situation, so I think that it would be worth merging it in time for v1.6.2. Could you merge RELEASE_next_patch in? I'll try to find time to review it later this week.
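The dict-subclass idea raised in the comment above can be sketched roughly as follows. `AttrDict` is a hypothetical name; this is neither addict's implementation nor a proposed replacement for the DTB, and it deliberately omits the DTB features listed above (invalid attribute name browsing, hidden attributes, tree printing).

```python
class AttrDict(dict):
    """Hypothetical dict subclass exposing keys as attributes."""

    def __getattr__(self, name):
        # Called only when normal lookup fails, so dict methods such as
        # ``keys`` still win over a key with the same name.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested plain dicts on access, so nothing is converted up front.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


md = AttrDict({"General": {"title": "demo"}})
print(md.General.title)               # demo
md.Signal = {"signal_type": "EELS"}
print(md["Signal"]["signal_type"])    # EELS
print(isinstance(md, dict))           # True
```

Because the object is itself a dict, no conversion step is needed when a plain dictionary is expected. The cost is that keys clashing with dict method names (for example `items`) are only reachable through item access in this sketch.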

ericpre · 2021-03-09T09:47:01Z

Yes, I agree, it would be good to explore a more standard alternative for 2.0.

Overall, the DictionaryTreeBrowser has very useful features and I don't think it has been touched much in many years, so it is doing a good job! This performance issue is noticeable when stacking many signals that have very large original_metadata, and I guess it didn't get fixed because it was not a big issue!

jlaehne · 2021-03-16T08:45:22Z

Does this solve #2045? Would this PR be a chance to also add the requested option for making the reading of original_metadata optional (with default True)?

ericpre · 2021-03-16T12:47:37Z

No, it doesn't fix the memory issue mentioned in #2045, because the original_metadata is still copied; this PR "only" speeds it up.
I agree that having the option not to read original_metadata would be good, and the same goes for hs.stack. Shall we do this in a separate PR?

jlaehne · 2021-03-16T13:41:49Z

Yes, maybe it makes more sense to go for a separate PR.

@francisco-dlp still wanted to have a look at this one though.

ericpre added type: enhancement status: needs review release: next patch labels Jan 19, 2021

jlaehne reviewed Mar 4, 2021

francisco-dlp added this to the v1.6.2 milestone Mar 9, 2021

ericpre mentioned this pull request Mar 13, 2021

Add functions to search DictionaryTreeBrowser (e.g. metadata) #2633

Merged

10 tasks

jlaehne linked an issue Mar 15, 2021 that may be closed by this pull request

Large (original_) metadata significantly slowing hyperspy methods #2536

Closed

ericpre mentioned this pull request Mar 24, 2021

Luminescence specific metadata LumiSpy/lumispy#53

Closed

ericpre added 10 commits March 24, 2021 21:48

Rename hyperspy.tests.signal to hyperspy.tests.signals to avoid error when running pytest in tests folder.

6e6af65

Tidy up python2 syntax.

e78a005

Add lazy support to DictionaryBrowserTree.

290c510

Fix __getitem__ method for lazy DTB.

a29518b

Fix length and iteration DTB

95ef74a

Avoid calling slugify when not necessary.

013793c

Fix setup.py.

13c4dd1

Add comments.

96752fb

Add entry to changelog.

3cb78ab

Fix typo.

456dddb

ericpre force-pushed the fix_slow_DTB branch from c068476 to 456dddb Compare March 24, 2021 21:50

francisco-dlp merged commit 228e707 into hyperspy:RELEASE_next_patch Mar 25, 2021

ericpre deleted the fix_slow_DTB branch March 27, 2021 10:58

jlaehne removed the status: needs review label Apr 11, 2021

ericpre removed the release: next patch label Apr 20, 2021
