
Large (original_) metadata significantly slowing hyperspy methods #2536

Closed
thomasaarholt opened this issue Sep 5, 2020 · 4 comments · Fixed by #2623 or #2691

Comments

@thomasaarholt
Contributor

I experience rather significant slowdowns when I use hyperspy to either index or plot signals with large original_metadata. On my datasets times go from 280 ms to 30 sec when plotting.

I've tried creating a minimal working example, but the slowdowns there are nowhere near as large, and appear to be equal to the time (40 ms) needed to copy the array I'm adding to the metadata:

import numpy as np
import hyperspy.api as hs
s = hs.signals.Signal1D([[1,2,3], [2,3,4]])
%time s.inav[0] # Wall time: 1.97 ms
s.original_metadata.test = np.random.random((80, 500, 500))
%time s.inav[0] # Wall time: 45 ms
s.original_metadata.test2 = np.random.random((80, 500, 500))
%time s.inav[0] # Wall time: 87 ms
s.metadata.test3 = np.random.random((80, 500, 500))
%time s.inav[0] # Wall time: 143 ms

I'm not particularly concerned with these timings, but should inav really be copying the original_metadata, rather than just referencing the existing original_metadata?
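To make the question concrete, here is a minimal standard-library sketch (not HyperSpy code) of why each added array adds a roughly fixed cost to inav, assuming indexing deep-copies the metadata as the timings suggest: a deep copy of a nested dict costs time proportional to the data stored in it, while passing a reference costs essentially nothing. A bytearray stands in for the NumPy arrays, and build_metadata and time_copy are hypothetical helpers.

```python
import copy
import time


def build_metadata(n_bytes: int) -> dict:
    # Hypothetical nested dict standing in for original_metadata; the
    # bytearray plays the role of a large stored NumPy array.
    return {"ImageTags": {"Name": "HAADF", "Store": bytearray(n_bytes)}}


def time_copy(metadata: dict, deep: bool) -> tuple:
    """Return (result, seconds) for a deep copy or a plain reference."""
    start = time.perf_counter()
    result = copy.deepcopy(metadata) if deep else metadata
    return result, time.perf_counter() - start


small = build_metadata(1_000)
large = build_metadata(20_000_000)  # ~20 MB

for label, md in (("small", small), ("large", large)):
    _, t_deep = time_copy(md, deep=True)
    _, t_ref = time_copy(md, deep=False)
    print(f"{label}: deepcopy {t_deep * 1e3:.2f} ms, reference {t_ref * 1e3:.4f} ms")
```

The deep-copy time should grow with the size of the stored buffer, while the reference time stays flat regardless of how much metadata there is.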

Here are some examples on a <40 | 500, 500> dataset that was non-rigidly corrected for drift with Smart Align. Smart Align (SA) appears to save the previously processed datasets (the raw data and the rigidly corrected data) in the original metadata.

Link to datasets here, with and without the processing done by Smart Align. I've included a large (~200 MB) and a small (~16MB) test case.

The problem is "taken care of" by calling s.original_metadata = s.original_metadata.__class__() to overwrite it with an empty metadata object, but it would be good to handle this better in the future. In particular, s.plot() is very slow.
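A less drastic variant of that workaround would be to drop only the oversized entries and keep the small tags. The sketch below does this on a plain nested dict; prune_large_leaves is a hypothetical helper, the size check assumes leaves expose nbytes (as NumPy arrays do) or are bytes-like, and the Tx_store contents are made up.

```python
def prune_large_leaves(tree: dict, max_bytes: int) -> dict:
    """Return a pruned copy of a nested dict, dropping oversized leaves.

    A leaf's size is read from an `nbytes` attribute when present (as on
    NumPy arrays) or from len() for bytes/bytearray; leaves of any other
    type are always kept. The input dict is not modified.
    """
    pruned = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            pruned[key] = prune_large_leaves(value, max_bytes)
            continue
        size = getattr(value, "nbytes", None)
        if size is None and isinstance(value, (bytes, bytearray)):
            size = len(value)
        if size is not None and size > max_bytes:
            continue  # skip the oversized leaf
        pruned[key] = value
    return pruned


metadata = {
    "ImageTags": {
        "Name": "HAADF",  # small tag: kept
        "SmartAlign": {"Tx_store": bytearray(10_000_000)},  # large stand-in: dropped
    }
}
slim = prune_large_leaves(metadata, max_bytes=1_000_000)
print(slim)  # {'ImageTags': {'Name': 'HAADF', 'SmartAlign': {}}}
```

With HyperSpy, the input would presumably come from s.original_metadata.as_dictionary() and the result be wrapped back with s.original_metadata.__class__(slim), mirroring the workaround above; this assumes the DictionaryTreeBrowser constructor accepts a plain dict, so check it against your installed version.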

Large example (~200 MB file)

import hyperspy.api as hs
s = hs.load("HAADF.dm4")
%time s.inav[0] # Wall time: 8.67 s
s.original_metadata = s.original_metadata.__class__()
%time s.inav[0] # Wall time: 4 ms
s = hs.load("Non-rigid Aligned HAADF.dm4")
%time s.plot() # Wall time: 32.4 s
s.original_metadata = s.original_metadata.__class__()
%time s.plot() # Wall time: 287 ms

In the attached small file, some metadata tags take much longer to display than others:
r.original_metadata.ImageList.TagGroup0.ImageTags.Parent is instant, and r.original_metadata.ImageList.TagGroup0.ImageTags.SmartAlign.Tx_store takes forever.
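Assuming, as seems likely, that the slow tags are simply the large ones, a quick way to find them is to walk the nested dict and rank leaves by approximate size. This is a standard-library sketch with hypothetical helpers iter_leaf_sizes and largest_leaves; the paths mirror the ones above but the contents are invented. On a real signal one would presumably pass r.original_metadata.as_dictionary().

```python
import sys
from typing import Iterator, List, Tuple


def iter_leaf_sizes(tree: dict, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (dotted_path, approx_bytes) for every non-dict leaf."""
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            yield from iter_leaf_sizes(value, path)
        else:
            # NumPy arrays report their buffer size via nbytes; for anything
            # else fall back to sys.getsizeof, which is shallow (a list's
            # size counts its pointers, not the objects they refer to).
            size = getattr(value, "nbytes", None)
            if size is None:
                size = sys.getsizeof(value)
            yield path, size


def largest_leaves(tree: dict, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n largest leaves, biggest first."""
    return sorted(iter_leaf_sizes(tree), key=lambda item: item[1], reverse=True)[:n]


metadata = {
    "ImageList": {
        "TagGroup0": {
            "ImageTags": {
                "Parent": "HAADF",
                "SmartAlign": {"Tx_store": bytearray(5_000_000)},
            }
        }
    }
}
for path, size in largest_leaves(metadata, n=2):
    print(f"{size:>12,d}  {path}")
```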

I'm not familiar with the DictionaryTreeBrowser, so I'm not sure what to suggest.

@ericpre
Member

ericpre commented Sep 7, 2020

As mentioned in previous issues on original_metadata (#1398, #2045), there may be two different issues here:

  1. Copying large original_metadata, which is most likely what happens when plotting or indexing. For plotting, copying original_metadata is not necessary, so this should be fairly easy to fix; for indexing/slicing, copying the metadata is required.
  2. DictionaryTreeBrowser is slow; see Speed up DictionaryBrowser #368.
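Point 1 amounts to making the metadata copy optional. Below is a minimal sketch of that design using a hypothetical MiniSignal class and copy_metadata flag; neither is HyperSpy's actual API, and the real change in the linked PRs is not reproduced here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MiniSignal:
    """Hypothetical stand-in for a signal: data plus nested metadata dicts."""
    data: list
    metadata: dict = field(default_factory=dict)
    original_metadata: dict = field(default_factory=dict)

    def derive(self, data: list, copy_metadata: bool = True) -> "MiniSignal":
        """Build a new signal from `data`.

        copy_metadata=True deep-copies both metadata trees (needed when the
        result may be modified independently, e.g. after slicing).
        copy_metadata=False shares the parent's trees by reference, which is
        enough for read-only uses such as building a plot.
        """
        if copy_metadata:
            return MiniSignal(data, copy.deepcopy(self.metadata),
                              copy.deepcopy(self.original_metadata))
        return MiniSignal(data, self.metadata, self.original_metadata)


parent = MiniSignal([[1, 2, 3], [2, 3, 4]],
                    metadata={"General": {"title": "test"}},
                    original_metadata={"store": bytearray(1_000_000)})
sliced = parent.derive(parent.data[0])                        # independent copy
plotted = parent.derive(parent.data[0], copy_metadata=False)  # shared reference
```

The trade-off is that a shared tree must be treated as read-only: anything written to plotted.metadata would also change the parent.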

@tjof2 tjof2 added this to the Discussion milestone Sep 9, 2020
@ericpre ericpre modified the milestones: Discussion, v1.6.2 Mar 13, 2021
@jlaehne jlaehne linked a pull request Mar 15, 2021 that will close this issue
@jlaehne
Contributor

jlaehne commented Mar 28, 2021

@ericpre can we close this issue now?

@ericpre
Member

ericpre commented Mar 28, 2021

Not yet: we still need to make copying metadata optional and disable it when it is not necessary, for example when plotting. I am looking at this at the moment.

@jlaehne jlaehne linked a pull request Mar 28, 2021 that will close this issue
@ericpre ericpre removed this from the v1.6.2 milestone Apr 9, 2021
@jlaehne jlaehne added this to the v1.6.2 milestone Apr 11, 2021
@francisco-dlp
Member

Fixed in #2691.
