Reading large EDF files with preload = False raises memory error #10634

Closed
arnaumanasanch opened this issue May 16, 2022 · 11 comments · Fixed by #10638
@arnaumanasanch

Issue/Bug

When reading a large .edf file (10 GB on disk: 4 hours, 150 channels, 2048 Hz sampling frequency) in an IPython notebook with:

raw_edf = mne.io.read_raw_edf('file.edf', preload=False)

the kernel crashes because the machine's 12 GB of RAM fills up completely.

I thought that with preload=False only the metadata would be loaded (which should be no more than a few hundred MB).

Is there a way to read the raw EDF (metadata only) and then use the get_data() method to load just a small piece of data (n channels over a given time range) into memory, without the system failing because the RAM is full?
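For reference, the workflow I have in mind is roughly this (just a sketch; the file name and channel picks are placeholders):

import mne

raw = mne.io.read_raw_edf("file.edf", preload=False)  # metadata only
sfreq = raw.info["sfreq"]

# pull a 60 s window from two channels into memory
data = raw.get_data(
    picks=["EEG Fp1", "EEG Fp2"],  # hypothetical channel names
    start=int(0 * sfreq),
    stop=int(60 * sfreq),
)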

If I try the same on a machine with more RAM (32 GB) there is no problem, and once the file is loaded the RAM usage goes back to normal (roughly 1 GB before loading, up to 15 GB while loading, then back to about 1 GB once loaded). We need to make this work, if possible, with the 12 GB of RAM.

For privacy reasons, I can’t share the file that I am using.

Thanks in advance.

Additional information

MNE version: 1.0.3
Operating system: Windows 10


@cbrnr
Contributor

cbrnr commented May 16, 2022

Would you be able to run mprof on your machine with 32 GB of RAM while loading the EDF file? Once we have the result, we can think about where to place the @profile decorator to see where this is happening.

Oh, and before that, does the problem also occur outside of a notebook (e.g. plain Python script or Python interactive interpreter, or even IPython)?

@arnaumanasanch
Author

Thanks for the rapid response.

Yes, it also occurs outside a notebook, both in a plain Python script and in the Python interactive interpreter.

I have run memory_profiler on the 32 GB RAM machine and the result is the following:

[memory_profiler line-by-line output for reading the EDF file]

The final increment is only 19 MiB (the size of the metadata, I assume), but the process to get there requires not just these 19 MiB but roughly 10 GB. Should we dig deeper into the source code?

Let me know if I can help with anything else.

@agramfort
Member

agramfort commented May 16, 2022 via email

@cbrnr
Contributor

cbrnr commented May 16, 2022

Can you do a time-based memory profile (https://github.com/pythonprofilers/memory_profiler#time-based-memory-usage)? This should at least show the 10 GB spike at some point.
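If the mprof command line is inconvenient, memory_profiler's Python API can do roughly the same thing; here's a sketch (the file name is a placeholder):

from memory_profiler import memory_usage
from mne.io import read_raw_edf

# sample resident memory every 50 ms while the call runs
samples = memory_usage(
    (read_raw_edf, ("file.edf",), {"preload": False}),
    interval=0.05,
)
print(f"peak during the call: {max(samples):.0f} MiB")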

@arnaumanasanch
Author

Here it is:

[time-based memory usage plot showing the spike while reading the file]

@cbrnr
Contributor

cbrnr commented May 16, 2022

Thanks! I guess the next thing to do would be to sprinkle some @profile decorators on functions that might be the culprit. This should then be reflected by time stamps in the diagram, plus the function name.
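For example, something along these lines (a sketch at the call site; for the real hunt the decorator would go directly on the suspected functions inside mne/io/edf/edf.py):

from mne.io import read_raw_edf

try:
    profile  # injected as a builtin by "mprof run"
except NameError:
    from memory_profiler import profile  # fallback for a plain "python script.py"


@profile
def load_header_only(fname):
    # with preload=False, only the metadata should end up in memory
    return read_raw_edf(fname, preload=False)


load_header_only("file.edf")  # placeholder file name

Running the script with mprof run should then mark the decorated function's entry and exit in the plot.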

@arnaumanasanch
Author

arnaumanasanch commented May 16, 2022

So, I ran memory_profiler over the source code, and the RAM gets filled in the _read_segment_file function in edf.py.
I won't paste the entire log because it is too long, just the meaningful part:

[memory_profiler output for _read_segment_file in edf.py]

The RAM fills up inside the loop over ai, more specifically when the variable many_chunk is assigned:

many_chunk = _read_ch(fid, subtype, ch_offsets[-1] * n_read, dtype_byte, dtype).reshape(n_read, -1)

If I track the RAM before and after this line executes (using psutil), it increases by around 10 MB at every iteration.
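The check looks roughly like this (a simplified, self-contained version; a random array of about the same size stands in for the _read_ch(...) call):

import os

import numpy as np
import psutil

proc = psutil.Process(os.getpid())

rss_before = proc.memory_info().rss
chunk = np.random.randn(1300, 1024)  # ~10 MB, standing in for the many_chunk assignment
rss_after = proc.memory_info().rss

print(f"RSS grew by {(rss_after - rss_before) / 2**20:.1f} MiB")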

Let me know if you need any other information. Thanks for the help.

@cbrnr
Contributor

cbrnr commented May 16, 2022

Thanks @arnaumanasanch, I'll take a look to see why this is necessary without preload (likely it isn't).

@cbrnr
Contributor

cbrnr commented May 17, 2022

If anyone wants to reproduce the problem, here's a reprex that generates a large EDF file (944 MB on disk, 3.6 GB in RAM, but the values can be adapted) and reads it with read_raw_edf():

import numpy as np
from mne.io import read_raw_edf
from pyedflib.highlevel import write_edf_quick


def write_large_edf():
    # ~944 MB on disk, ~3.6 GB in RAM; adapt the values as needed
    n_chans = 64
    length = 2 * 60 * 60  # two hours, in seconds
    fs = 1024
    write_edf_quick("large.edf", np.random.randn(n_chans, length * fs), fs)


# call write_large_edf() once beforehand to create the file, then profile this line:
raw = read_raw_edf("large.edf", preload=False)

Running that script with mprof run -T 0.05 test.py (assuming that the script is stored in test.py and that write_large_edf() has been called separately before) followed by mprof plot -t "" produces the following graph:

[mprof plot: memory usage while reading large.edf with preload=False]

@cbrnr
Contributor

cbrnr commented May 17, 2022

I think I found one place where we accidentally create a view on an array, which prevents garbage collection and therefore fills up memory. This line creates a reference to many_chunk, which should be overwritten in the next loop iteration, but it isn't because of that reference. If I make a copy, memory consumption goes down a lot:
[mprof plot after making a copy: memory consumption is much lower]
I don't think I can bring it down further (but I will check), because we need to go through the file when annotations are present. Even in the toy data file from my example, there's a channel called "EDF Annotations", and this triggers the reading.
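The mechanism in isolation looks like this (a toy sketch, not the MNE code):

import numpy as np

# a small view keeps the entire parent buffer alive
parent = np.random.randn(1_000_000)  # ~8 MB
view = parent[:10]
print(view.base is parent)  # True: parent cannot be garbage collected
del parent  # the 8 MB buffer is still held alive by `view`

# a copy detaches from the parent buffer
parent = np.random.randn(1_000_000)
small = parent[:10].copy()
print(small.base is None)  # True: no reference back to the parent
del parent  # now the 8 MB buffer can actually be freed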

I'll submit a PR so that you can test with your file @arnaumanasanch.
