segfaults in ffi::newctype #27

Closed
niess opened this issue Dec 29, 2020 · 7 comments

@niess

niess commented Dec 29, 2020

Hello,

thank you for the previous patches.

I am currently stuck on the following issue. When running a "complex" program, at some point I get segfaults in ffi::newctype. Below is a typical trace obtained with gdb:

#0  0x0000555555564385 in sweeplist ()
#1  0x0000555555564475 in sweepstep ()
#2  0x00005555555658b4 in singlestep ()
#3  0x0000555555566020 in luaC_step ()
#4  0x000055555555e137 in lua_newuserdatauv ()
#5  0x00007ffff73fb4db in operator new (n=40, L=0x5555557942a8) at ../src/lua.hh:158
#6  0x00007ffff73fa83e in ffi::newctype<ast::c_type>(lua_State *, <unknown type in lib/lua/5.4/cffi.so, CU 0xca9, DIE 0x51d2>) (
    L=0x5555557942a8, args#0=<unknown type in lib/lua/5.4/cffi.so, CU 0xca9, DIE 0x51d2>) at ../src/ffi.hh:282
...

Unfortunately, I could not reproduce this issue with a minimal example. The segfaults happen in several of my use cases and always follow the same sequence as in the trace above: ffi::newctype, then sweeplist. They do not happen on the first call to ffi::newctype but rather after O(100) calls or so.

Sorry, this is not very helpful, but I don't know what to check at this point. Please let me know if there are extra values that would be meaningful to print out, e.g. using gdb.

When using LuaJIT/ffi I have no segfaults.

@q66
Owner

q66 commented Dec 29, 2020

this doesn't even look like our bug, but possibly a bug in Lua itself (in its garbage collector)

you should try different versions as well, and make sure you have the latest version of 5.4 (currently 5.4.2)

@niess
Author

niess commented Dec 30, 2020

Thanks for the hints.
I tried with Lua 5.3.5 and now the segfaults seem to be caught earlier, resulting in aborts. E.g. I get the following error messages:

free(): invalid next size (normal)
Aborted

or

malloc(): invalid size (unsorted)
Aborted

Could these messages be generated by cffi? Or by Lua itself? It would be helpful to see where in the Lua code this happens, e.g. with an error trace. But maybe that's not possible from an abort?

I have not yet been able to narrow the problem down to a minimal example. Maybe it is related to my application mixing Lua and direct C allocations / frees? In principle I only free memory that was allocated with malloc (unless there is a bug in my app). But, for example, I have cases where I do ptr = ffi.new('void *[1]') and then pass ptr to a C library that allocates memory into ptr[0] using malloc and later releases it using free. I use ffi.gc to ensure that the memory is released when ptr is garbage collected.
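For readers unfamiliar with the out-parameter pattern described above, here is a minimal sketch of what the C side might look like. The names lib_create and lib_destroy are hypothetical and do not come from cffi-lua or from the library used in this report; the sketch only assumes that the library fills a caller-provided pointer slot with malloc and releases it with free.

```c
#include <stdlib.h>

/* Hypothetical library API (illustration only, not a real library):
 * the caller passes the address of a pointer slot, e.g. the one-element
 * array created with ffi.new('void *[1]'), and the library fills slot[0]. */
int lib_create(void **slot, size_t size)
{
    if (slot == NULL)
        return -1;
    *slot = malloc(size);           /* memory is owned by the C side */
    return (*slot != NULL) ? 0 : -1;
}

/* Counterpart that a finalizer (for example one registered with ffi.gc)
 * could call when the slot is garbage collected. */
void lib_destroy(void **slot)
{
    if (slot == NULL)
        return;
    free(*slot);                    /* free(NULL) is a no-op */
    *slot = NULL;                   /* makes a second call harmless */
}
```

Resetting the slot to NULL after free is a design choice in this sketch: it keeps the finalizer safe if it ends up running after an explicit release.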

@q66
Owner

q66 commented Dec 30, 2020

well, might be our bug but without a testcase there isn't really anything i can do

these messages are generated by glibc's memory allocator

@niess
Author

niess commented Dec 31, 2020

@q66 I finally found the cause of the memory errors in my application. It looks like structures containing arrays with more than one dimension have the wrong size. E.g. the following currently fails with cffi-lua but works with LuaJIT/ffi:

local ffi = jit and require('ffi') or require('cffi')

ffi.cdef([[
struct transform {
    double matrix[3][3];
};
]])

assert(ffi.sizeof('struct transform') == 3 * 3 * ffi.sizeof('double'))

The structure has a size of 3*8 = 24 instead of 3*3*8 = 72. I think that it really has the wrong size, i.e. ffi.sizeof likely reports the size that is actually allocated, but that allocated size is too small. I think so because I get corrupted memory from overwriting the heap when using such constructs.

Note, however, that ffi.sizeof('double [3][3]') seems to be correct, i.e. outside of a structure.
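As a reference point, the same check can be written directly in C. This is only a sketch of the arithmetic reported above, not of the cffi-lua internals: the helper names are made up for illustration, and the 72 / 24 / 48 byte figures assume an 8-byte double.

```c
#include <stddef.h>

struct transform {
    double matrix[3][3];
};

/* Size a C compiler assigns to the struct: all nine doubles. */
size_t transform_size(void)
{
    return sizeof(struct transform);
}

/* Size obtained when only a single dimension is counted, which matches
 * the 3*8 = 24 figure from the report (hypothetical helper). */
size_t one_dimension_size(void)
{
    return 3 * sizeof(double);
}

/* Bytes that land past the end of the block if a full 3x3 matrix is
 * written into an allocation of the smaller size. */
size_t overrun_bytes(void)
{
    return transform_size() - one_dimension_size();
}
```

With 8-byte doubles, writing the full matrix into the undersized block overruns it by 48 bytes. That kind of overrun into neighbouring heap data would be consistent with the glibc "invalid next size" aborts and the crashes inside the Lua garbage collector seen earlier in the thread.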

@q66
Owner

q66 commented Dec 31, 2020

I see, that would explain it...

@q66 q66 closed this as completed in 7b0b5cc Dec 31, 2020
@q66
Owner

q66 commented Dec 31, 2020

okay, that should be fixed now... thanks for reporting

@niess
Author

niess commented Jan 1, 2021

Thanks for the patch. It works fine now :)
