
Memory usage of dss_line() #56

Closed · eort opened this issue on Nov 30, 2021 · 13 comments · Fixed by #58

Labels: enhancement (New feature or request)

Comments

eort (Contributor) commented on Nov 30, 2021

Hi,

I came across this repository when I was looking for an implementation of ZapLine. Really nice that you took the effort to translate the original MATLAB package. I have played around with it on my data, and for most of it, it works really nicely.

However, one thing that worries me a bit is the memory usage of dss_line(). So far I have tried to run it on MEG data (300 channels × 1.8×10⁶ samples), but there was no chance that my laptop (24 GB) could offer enough memory for dss_line() to finish. It only worked once I cropped the data considerably.

Of course, the scripts will eventually run on an HPC, where memory shouldn't be a big problem, but it is still quite annoying that I can't run and check them locally before exporting them to the HPC. So I was wondering whether there are some magic tricks I can use to lower the memory footprint. Settings that I have missed, perhaps? I traced the memory consumption of the function (where I was sufficiently confident that I wouldn't affect functionality) and could indeed reduce the usage here and there, but the peak is still higher than what my PC can handle.

If helpful, I can provide more information of any kind.

nbara (Owner) commented on Nov 30, 2021

Hi @eort,

You are 100% right, and this is something I've identified already (see #50). One obvious thing to try is to compute the covariance by blocks. I'll try to get around to it soon (but if you want to open a PR sooner, you're welcome 😉).
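
For illustration only, here is a rough sketch of what block-wise covariance accumulation could look like (a hypothetical helper, not meegkit's actual code): each iteration only touches a blocksize-long slice of the data, and the accumulator stays at a small (n_channels, n_channels) size.

    import numpy as np

    def blockwise_cov(data, blocksize=4096):
        # Accumulate data.T @ data over blocks of samples (hypothetical helper).
        # data: (n_samples, n_channels)
        n_samples, n_channels = data.shape
        cov = np.zeros((n_channels, n_channels))
        for start in range(0, n_samples, blocksize):
            block = data[start:start + blocksize]  # only this slice is in flight
            cov += block.T @ block                 # small (n_channels, n_channels) update
        return cov / n_samples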

nbara added the enhancement label on Nov 30, 2021
nbara mentioned this issue on Nov 30, 2021
eort (Contributor, Author) commented on Nov 30, 2021

Ah, great! Well, I doubt I can produce a reasonable PR, but I shall try to look further into which parts of the code are particularly memory-hungry. Thanks!

nbara (Owner) commented on Dec 1, 2021

Hi @eort, can you try the code in #57? I added a blocksize parameter to compute the covariances by blocks. This should speed up the computation significantly.

Let me know.
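
For reference, a minimal usage sketch, under the assumption that the #57 branch exposes blocksize as a keyword argument of dss_line (all values below are illustrative):

    import numpy as np
    from meegkit.dss import dss_line

    sfreq = 1000.0                        # sampling rate in Hz (example value)
    data = np.random.randn(100_000, 300)  # (n_samples, n_channels), MEG-like

    # Remove 50 Hz line noise; covariances are accumulated over blocks of
    # `blocksize` samples to cap peak memory (per the PR description).
    clean, artifact = dss_line(data, fline=50., sfreq=sfreq, blocksize=8192)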

nbara self-assigned this on Dec 1, 2021
eort (Contributor, Author) commented on Dec 1, 2021

Speed might be up somewhat. It's hard to say, because I don't have a good reference point for a complete data file of mine: so far I have always run out of memory before it could finish. This time, though, my memory filled up quite quickly, so I guess it is somewhat faster :)

nbara (Owner) commented on Dec 1, 2021

In that case, I'll leave the PR open until you have had enough time to test it, OK?

eort (Contributor, Author) commented on Dec 1, 2021

Sure! While testing, it has already become quite clear that the slowest part of the process (so far) is the FFTs in gaussfilter:

    import numpy as np

    # `data` and `fx` come from the enclosing function: `data` is the input
    # array and `fx` is the frequency-domain gain of the Gaussian filter.
    tmp = np.fft.fft(data, axis=0)
    if data.ndim == 2:
        tmp *= fx[:, None]
    elif data.ndim == 3:
        tmp *= fx[:, None, None]
    filtdat = 2 * np.real(np.fft.ifft(tmp, axis=0))

nbara (Owner) commented on Dec 1, 2021

Yes, I came to the same conclusion. I have to do some FFTs here to compute the bias covariance, but at least now they are done over (hopefully) smaller matrices, so it should not hog your memory as much 🤞
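
As a purely speculative aside (an assumption, not something #57 implements): since the data are real-valued, np.fft.rfft stores only the one-sided spectrum, which roughly halves the size of the complex intermediate. A minimal sketch of the 2-D case, assuming a hypothetical gain fx already sampled at the rfft frequencies:

    import numpy as np

    def gaussfilt_lowmem(data, fx):
        # data: (n_samples, n_channels), real-valued
        # fx: one-sided gain with n_samples // 2 + 1 points (hypothetical layout)
        spec = np.fft.rfft(data, axis=0)  # one-sided spectrum only
        spec *= fx[:, None]               # apply gain in place, no extra copy
        return np.fft.irfft(spec, n=data.shape[0], axis=0)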

eort (Contributor, Author) commented on Dec 1, 2021

I'm not sure it really saved memory, though. But I managed to edit the code here and there to save memory, and it finally finished! (Beautiful PSDs afterwards!) Tomorrow I shall have a somewhat more systematic look at which of my edits actually saved memory, and compose a PR from them. Then you can decide which parts you want to keep 😄

nbara (Owner) commented on Dec 1, 2021

Sounds good

eort (Contributor, Author) commented on Dec 2, 2021

Quick question: does it make more sense to base the PR on #57, or on master?

nbara (Owner) commented on Dec 2, 2021

You can base it on #57.

nbara (Owner) commented on Dec 2, 2021

Closed via #57 (for now)

nbara closed this as completed on Dec 2, 2021
nbara (Owner) commented on Dec 2, 2021

Thanks @eort!

nbara linked a pull request on Dec 2, 2021 that will close this issue