SEGFAULT on OVIS-4 head #408
Comments
To elaborate, this is completely reproducible. It happens every time for me the first time that I run ldms_ls against the ldmsd. Here is a little more info from another gdb session. I will include all of the threads' backtraces this time. Note that one of the other threads is always in write() under ibv_cmd_dereg_mr(). Maybe that is relevant?
@narategithub could you please take a look at this. This logic is unchanged, but I don't think we have tested recently on a PPC machine. Absent memory corruption of some kind, it's not obvious how this is happening. The initialization code is here:
Also, if the cs_init library function is not being called at fork-time, this could happen.
I'm leaning towards memory corruption. The reason is that from the back trace, the pointer to the mutex supplied to Do we have any PPC machine to test it out?
I can play testing monkey if you want to feed me patches with extra debugging or something.
Hi @morrone, I have compiled, built and tested OVIS-4 top of tree on:
I cannot reproduce this problem, but will keep testing. Can you please send me your sampler config? I just loaded meminfo and ran ldms_ls against it. BTW, one thing you might try is to look at your LD_LIBRARY_PATH, LDMSD_PLUGIN_PATH, and ZAP_LIBPATH and make certain you are picking up the right code.
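A quick way to inspect those three variables is sketched below. It is only an illustration: `show_path_var` is a hypothetical helper, not part of LDMS, and it assumes each variable holds a colon-separated list of directories. It prints every entry on its own line and flags directories that do not exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of LDMS): print one search-path variable,
# one entry per line, and flag entries that are not existing directories.
# Assumes the variable holds a colon-separated directory list.
show_path_var() {
    name=$1
    # Indirectly read the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is unset"
        return 0
    fi
    echo "$name:"
    old_ifs=$IFS
    IFS=:
    for dir in $value; do
        if [ -d "$dir" ]; then
            echo "  $dir"
        else
            echo "  $dir (missing)"
        fi
    done
    IFS=$old_ifs
}

# The three variables named in the comment above.
for var in LD_LIBRARY_PATH LDMSD_PLUGIN_PATH ZAP_LIBPATH; do
    show_path_var "$var"
done
```

Run it in the same environment that starts ldmsd, so the values shown are the ones the daemon actually sees.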
@tom95858 Same kernel version. I don't think paths are related, because the install was at the system level from a packaged rpm. I believe that I was using auth_munge. I think I only had meminfo enabled, but I could be wrong. I was switching between a lot of different configurations. I'll build and retest tomorrow with a new config. I've finally eliminated the genders stuff today, so I'll try with that when I retest.
@tom95858 well, now the problem is gone. There are a whole lot of parts that changed, so it is hard to say which part was the cause. This ticket might not have anywhere to go now, though.
@morrone, can we close this?
Yes, that is fine.
I am trying to test OVIS-4 at commit 8f77cd1, and I'm getting a SEGFAULT in ldmsd as soon as I hit it the first time with ldms_ls.
This is on a ppc64le machine, using rdma transport. Running the ldmsd under gdb I see:
Backtrace of thread 118466: