Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix memory leak in SIOReader #99

Merged
merged 1 commit into from
Aug 11, 2020
Merged

fix memory leak in SIOReader #99

merged 1 commit into from
Aug 11, 2020

Conversation

gaede
Copy link
Contributor

@gaede gaede commented Aug 11, 2020

BEGINRELEASENOTES

ENDRELEASENOTES

 - delete previous Event and RunHeader before reading a new one
 - this restores the old expected behavior of LCIO
 - for parallel event reading one needs to use the MT::LCReader directly and
   handle the memory accordingly (e.g. via use of std::unique_ptr)
@gaede gaede merged commit d4baf6c into iLCSoft:master Aug 11, 2020
@gaede gaede deleted the fix_memory branch August 11, 2020 10:00
Copy link

@jstrube jstrube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That fixes the memory leak, but now it's not thread safe. Not sure if it was before, I haven't had a chance to test, yet.

@gaede
Copy link
Contributor Author

gaede commented Aug 12, 2020

The SIOReader implementing the IO:LCReader interface cannot be thread-safe and handle the memory for the user at the same time.
For parallel processing one needs to use the MT::LCReader directly and handle the event memory in the user code ( see the release notes of #99)...

@Romendakil
Copy link

Apparently exactly this commit breaks the LCIO Interface of WHIZARD. Trying to delete the _currentEvent leads to a crash. Not sure whether the commit is wrong, or this is an error in our interface since the beginning.

@jstrube
Copy link

jstrube commented Sep 15, 2020

Hmm, I don't observe a crash, but the commit surely doesn't guard against double deletes. The if statements are redundant (a delete on a nullptr is a no-op, see also https://stackoverflow.com/questions/6731331/is-it-still-safe-to-delete-nullptr-in-c0x). To guard against double deletes, one could set the pointer to nullptr after the delete. If I understand correctly, the segfault is only after reading the last event?
Maybe the destructor calls a delete on an already deleted event? Is that consistent with your error message?

@Romendakil
Copy link

Actually, it crashes whenever we try to read the first event, not the last one. You are right, the if-clause around the delete is unnecessary. I'm note sure it really is a double delete. Maybe the crash is because this pointer comes out of an external code. The error message is:

*** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007fa74103e6e7 in __GI___waitpid (pid=4737, stat_loc=stat_loc
entry=0x7fffbdeb7fe8, options=options
entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007fa740fa9107 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x00007fa73f026643 in TUnixSystem::StackTrace() () from /home/reuter/local/lib/libCore.so
#3  0x00007fa73f029034 in TUnixSystem::DispatchSignals(ESignals) () from /home/reuter/local/lib/libCore.so
#4  <signal handler called>
#5  0x00007fa741b23115 in SIO::SIOReader::readNextEvent (this=0x563b56417ed0, accessMode=0) at /home/reuter/local/packages/LCIO-02-15-03/src/cpp/src/SIO/SIOReader.cc:108
#6  0x00007fa744307f56 in read_lcio_event (lcRdr=0x563b56417ed0) at ../../../src/lcio/LCIOWrap.cpp:123
#7  0x00007fa743f7a669 in lcio_interface_MP_lcio_read_event (lcrdr_=0x7fa73c576f50, evt_=0x7fa73c576f60, ok_=0x7fffbdebab58) at lcio_interface.f90:937
#8  0x00007fa743fa2e85 in eio_lcio_MP_eio_lcio_input_i_prc (eio_=0x7fa73c576d50, eio_Sig=0x7fa7446d0288 <eio_lcio_DT_eio_lcio_tHEADER+8>, eio_Dtp=0x0, i_prc_=0x7fffbdebabc0, iostat_=0x7fffbdebabbc) at eio_lcio.f90:305
#9  0x0000563b555728dc in eio_lcio_uti_MP_eio_lcio_2 (u_=0x7fffbdebae80) at eio_lcio_uti.f90:265
#10 0x00007fa7442e932a in unit_tests_MP_test (test_proc_=0x563b555705e4 <eio_lcio_uti_MP_eio_lcio_2>, name_=0x563b557d55e4 "eio_lcio_2", description_=0x563b557d55d0 "read event contents", u_log_=0x7fffbdebaee4, results_=0x563b55a07fd0 <test_results_>, name_Len=10, description_Len=19) at unit_tests.f90:196
#11 0x0000563b55570553 in eio_lcio_ut_MP_eio_lcio_test (u_=0x7fffbdebaee4, results_=0x563b55a07fd0 <test_results_>) at eio_lcio_ut.f90:46
#12 0x0000563b54d9f35e in main_ut_IP_whizard_check (check_=0x563b55a08000 <check_>, results_=0x563b55a07fd0 <test_results_>) at main_ut.f90:676
#13 0x0000563b54d9c221 in main_ut_ (__NAGf90_main_args=0x0) at main_ut.f90:306
#14 0x0000563b54d9a488 in main (argc=4, argv=0x7fffbdebbce8) at main_ut.f90:29
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  0x00007fa741b23115 in SIO::SIOReader::readNextEvent (this=0x563b56417ed0, accessMode=0) at /home/reuter/local/packages/LCIO-02-15-03/src/cpp/src/SIO/SIOReader.cc:108
#6  0x00007fa744307f56 in read_lcio_event (lcRdr=0x563b56417ed0) at ../../../src/lcio/LCIOWrap.cpp:123
#7  0x00007fa743f7a669 in lcio_interface_MP_lcio_read_event (lcrdr_=0x7fa73c576f50, evt_=0x7fa73c576f60, ok_=0x7fffbdebab58) at lcio_interface.f90:937
#8  0x00007fa743fa2e85 in eio_lcio_MP_eio_lcio_input_i_prc (eio_=0x7fa73c576d50, eio_Sig=0x7fa7446d0288 <eio_lcio_DT_eio_lcio_tHEADER+8>, eio_Dtp=0x0, i_prc_=0x7fffbdebabc0, iostat_=0x7fffbdebabbc) at eio_lcio.f90:305
#9  0x0000563b555728dc in eio_lcio_uti_MP_eio_lcio_2 (u_=0x7fffbdebae80) at eio_lcio_uti.f90:265
#10 0x00007fa7442e932a in unit_tests_MP_test (test_proc_=0x563b555705e4 <eio_lcio_uti_MP_eio_lcio_2>, name_=0x563b557d55e4 "eio_lcio_2", description_=0x563b557d55d0 "read event contents", u_log_=0x7fffbdebaee4, results_=0x563b55a07fd0 <test_results_>, name_Len=10, description_Len=19) at unit_tests.f90:196
#11 0x0000563b55570553 in eio_lcio_ut_MP_eio_lcio_test (u_=0x7fffbdebaee4, results_=0x563b55a07fd0 <test_results_>) at eio_lcio_ut.f90:46
#12 0x0000563b54d9f35e in main_ut_IP_whizard_check (check_=0x563b55a08000 <check_>, results_=0x563b55a07fd0 <test_results_>) at main_ut.f90:676
#13 0x0000563b54d9c221 in main_ut_ (__NAGf90_main_args=0x0) at main_ut.f90:306
#14 0x0000563b54d9a488 in main (argc=4, argv=0x7fffbdebbce8) at main_ut.f90:29
===========================================================

@jstrube
Copy link

jstrube commented Sep 15, 2020

That message is a bit weird. Line 108, the one with the error is the return. The delete happens in line 104. Is this code identical to tag 02-15-03?
Not sure how a return can segfault.

@Romendakil
Copy link

I just inserted some debug lines, in the original LCIO 02-15-03 it is line 104. BTW, I opened a new issue.

@Romendakil
Copy link

If I keep the delete statement, our code also works if I replace:
_currentEvent = _reader.readNextEvent( accessMode ).release() ;
by
EVENT::LCEvent* _currentEvent = _reader.readNextEvent( accessMode ).release() ;

@jstrube
Copy link

jstrube commented Sep 15, 2020

Not sure why that would work. You are just shadowing the member variable with a local variable with the same name.

I can't say I understand exactly what's going on, but reading this: https://en.cppreference.com/w/cpp/memory/unique_ptr, I get

The object is disposed of, using the associated deleter when either of the following happens:
the managing unique_ptr object is destroyed
the managing unique_ptr object is assigned another pointer via operator= or reset().

If that's the case, then the delete destroys the object, and then the assignment calls the destructor again?
edit/ never mind. _currentEvent is not a unique_ptr, so I misunderstood how that worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

memory issue using pyLCIO in latest version?
3 participants