Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy DictionaryTreeBrowser #2623

Merged
merged 10 commits into from Mar 25, 2021

Conversation

ericpre
Copy link
Member

@ericpre ericpre commented Jan 19, 2021

Make the DictionaryTreeBrowser lazy by default to reduce overhead when creating signal. The lazy attributes are processed when the first attribute is accessed.

Closes #368.

Progress of the PR

  • Make DictionaryTreeBrowser lazy by default,
  • update docstring,
  • add entry to CHANGES.rst,
  • add tests,
  • ready for review.

Minimal example of the bug fix or the new feature

from hyperspy.misc.elements import elements
from hyperspy.misc.utils import DictionaryTreeBrowser

%timeit DictionaryTreeBrowser(elements) # 4.2 us
%timeit DictionaryTreeBrowser(elements, lazy=False) # 26 ms

Note that this example can be useful to update the user guide.

@codecov
Copy link

codecov bot commented Jan 27, 2021

Codecov Report

Merging #2623 (456dddb) into RELEASE_next_patch (a7343c6) will increase coverage by 0.02%.
The diff coverage is 98.18%.

Impacted file tree graph

@@                  Coverage Diff                   @@
##           RELEASE_next_patch    #2623      +/-   ##
======================================================
+ Coverage               76.90%   76.93%   +0.02%     
======================================================
  Files                     201      201              
  Lines                   29668    29706      +38     
  Branches                 6503     6514      +11     
======================================================
+ Hits                    22817    22854      +37     
- Misses                   5104     5105       +1     
  Partials                 1747     1747              
Impacted Files Coverage Δ
hyperspy/misc/utils.py 85.44% <98.18%> (+0.90%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7343c6...456dddb. Read the comment docs.

Copy link
Contributor

@jlaehne jlaehne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found just a minor typo.

hyperspy/misc/utils.py Outdated Show resolved Hide resolved
@francisco-dlp
Copy link
Member

I am starting to wonder if we are not taking the DTB too far. It is really a shame that something that we added in the early days just for convenience has ended up being the major performance bottleneck in HyperSpy. At the end of the day, the main issue is that we have to process the dictionary in order to transform it into a DTB, and that takes time. A different approach would be to create a subclass of Dict with the required features. That would eliminate the need of going back and forth from DTB to dict. This is the approach taken by addict. We cannot simply replace DTB with addict, at least not until v2.0, because we would miss many features (invalid attr name browsing, hidden attributes, tree printing...). However, would subclassing addict to add the missing features be feasible / desirable?

By the way, there is a related thread in StackOverflow. The top answer suggest using a new 3.7 feature to implement something like this, dataclasses.

In the meantime, I agree that this PR is a good improvement of the situation, so I think that it would be worth to merge it in time for v1.6.2. Could you merge Rnp in? I'll try to find time to review it later this week.

@francisco-dlp francisco-dlp added this to the v1.6.2 milestone Mar 9, 2021
@ericpre
Copy link
Member Author

ericpre commented Mar 9, 2021

Yes, I agree, it would be good to explore more standard alternative for 2.0.

Overall, the DictionaryTreeBrowser has very useful features and I don't think it was touch much in many years, so it is doing good job! This performance issue is noticeable when stacking many signals having very large original_metadata and I guess it didn't get fixed because it was not a big issue!

@jlaehne
Copy link
Contributor

jlaehne commented Mar 16, 2021

Does this solve #2045? Would this PR be a chance to also add the requested option for making reading of original_metadata optional (with default True).

@ericpre
Copy link
Member Author

ericpre commented Mar 16, 2021

No it doesn't fix the memory issue mentioned of #2045, because the original_metadata are still copied, the "only" speed it up.
I agree that having the option not to read original_metadata would be good and doing the same for hs.stack, shall we do this in a separate PR?

@jlaehne
Copy link
Contributor

jlaehne commented Mar 16, 2021

Yes, maybe it makes more sense to go for a separate PR.

@francisco-dlp still wanted to have a look at this one though.

@francisco-dlp francisco-dlp merged commit 228e707 into hyperspy:RELEASE_next_patch Mar 25, 2021
@ericpre ericpre deleted the fix_slow_DTB branch March 27, 2021 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large (original_) metadata significantly slowing hyperspy methods
3 participants