SEGFAULT calling lua #84

Open
mickeprag opened this issue Apr 20, 2017 · 21 comments · May be fixed by #192

Comments

@mickeprag

mickeprag commented Apr 20, 2017

I am getting segfaults from time to time. It seems to happen when python objects are wrapped and sent to the lua runtime.
Lupa version: 1.4
Lua: tested versions 5.1.5, 5.2.3 and LuaJIT 2.0.4

Backtraces of the crash:
Lua:

#0  0x00007f639189c261 in lua_type () from /usr/lib64/liblua5.2.so.0
#1  0x00007f6391ae1b21 in __pyx_f_4lupa_5_lupa_lua_object_repr ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#2  0x00007f6391adf4ac in __pyx_pf_4lupa_5_lupa_10_LuaObject_14__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#3  0x00007f6391ade33d in __pyx_pw_4lupa_5_lupa_10_LuaObject_15__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#4  0x00007f63994c974a in _PyObject_Str () from /usr/lib64/libpython2.7.so.1.0
#5  0x00007f63994dad98 in PyString_Format () from /usr/lib64/libpython2.7.so.1.0
#6  0x00007f639952735f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#7  0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#8  0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#9  0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#10 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#11 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#12 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#13 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#14 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#15 0x00007f63994b30cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#16 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#17 0x00007f639952365e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#18 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#19 0x00007f63994b30cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#20 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#21 0x00007f639952365e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#22 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#23 0x00007f63995264ae in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#24 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#25 0x00007f63994b30cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#26 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#27 0x00007f639952365e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#28 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#29 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#30 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#31 0x00007f63994b2fec in ?? () from /usr/lib64/libpython2.7.so.1.0
#32 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#33 0x00007f639949c275 in ?? () from /usr/lib64/libpython2.7.so.1.0
#34 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#35 0x00007f639951fd77 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.7.so.1.0
#36 0x00007f6399558ab2 in ?? () from /usr/lib64/libpython2.7.so.1.0
#37 0x00007f639922c444 in start_thread () from /lib64/libpthread.so.0
#38 0x00007f6398f735ed in clone () from /lib64/libc.so.6

Luajit:

#0  0x00007f555d8a3c78 in lua_rawgeti () from /usr/lib64/libluajit-5.1.so.2
#1  0x00007f555db13b3e in __pyx_f_4lupa_5_lupa_10_LuaObject_push_lua_object ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#2  0x00007f555db15736 in __pyx_pf_4lupa_5_lupa_10_LuaObject_14__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#3  0x00007f555db152f7 in __pyx_pw_4lupa_5_lupa_10_LuaObject_15__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#4  0x00007f556550074a in _PyObject_Str () from /usr/lib64/libpython2.7.so.1.0
#5  0x00007f5565511d98 in PyString_Format () from /usr/lib64/libpython2.7.so.1.0
#6  0x00007f556555e35f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#7  0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#8  0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#9  0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#10 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#11 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#12 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#13 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#14 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#15 0x00007f55654ea0cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#16 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#17 0x00007f556555a65e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#18 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#19 0x00007f55654ea0cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#20 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#21 0x00007f556555a65e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#22 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#23 0x00007f556555d4ae in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#24 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#25 0x00007f55654ea0cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#26 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#27 0x00007f556555a65e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#28 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#29 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#30 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#31 0x00007f55654e9fec in ?? () from /usr/lib64/libpython2.7.so.1.0
#32 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#33 0x00007f55654d3275 in ?? () from /usr/lib64/libpython2.7.so.1.0
#34 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#35 0x00007f5565556d77 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.7.so.1.0
#36 0x00007f556558fab2 in ?? () from /usr/lib64/libpython2.7.so.1.0
#37 0x00007f5565263444 in start_thread () from /lib64/libpthread.so.0
#38 0x00007f5564faa5ed in clone () from /lib64/libc.so.6
@kmike
Collaborator

kmike commented May 10, 2017

I know it can be hard, but could you try creating a small-ish reproducible example?

@mickeprag
Author

I have really tried to boil this down to a simple example that reproduces the issue. I am not sure I have succeeded, but in the meantime I have been able to reproduce a similar segfault. I am not sure whether it is the same issue.
This code segfaults roughly once every 500 iterations on my computer.

from lupa import LuaRuntime, unpacks_lua_table
import threading, time

script = """
function callDone(response)
	local json = response:json()
end

function run(arg)
	local http = r()
	local request = http:get('https://httpbin.org/ip', callDone)
end

"""

class DummyResponse():
	def json(self):
		return {}

def dummyCall(**kwargs):
	time.sleep(0.1)

class Request(object):
	@unpacks_lua_table
	def get(self, url, success=None, **kwargs):
		r = PendingRequest(dummyCall, success, {'url': url})
		r.start()
		return r

class PendingRequest(threading.Thread):
	def __init__(self, fn, callback, kwargs):
		super(PendingRequest,self).__init__(name='HTTP request')
		#self.daemon = True
		self.fn = fn
		self.callback = callback
		self.kwargs = kwargs

	def run(self):
		try:
			r = self.fn(**self.kwargs)
		except Exception as e:
			print("Could not execute http request %s" % e)
			return
		if self.callback is not None:
			thread = self.callback.coroutine(DummyResponse())
			try:
				thread.send(None)
			except StopIteration:
				pass
			self.callback = None

request = Request()
def r():
	return request

lua = LuaRuntime(
	unpack_returned_tuples=True,
	register_eval=False,
)
lua.globals().r = r
lua.execute(script)

for i in range(10000):
	fn = getattr(lua.globals(), 'run')
	print("Start call", i)
	thread = fn.coroutine()
	try:
		thread.send(None)
	except StopIteration:
		pass

print("Wait for shutdown")
#time.sleep(2)

@mickeprag
Author

I think I have some more information. I think the cause is that two different threads access the Lua runtime. The example above does this intentionally, but in my software it is a side effect. Let me try to explain.
I have a Lua runtime running in its own isolated thread. When the Lua thread calls some Python code, I send the call over to the Python main thread to avoid concurrency issues. The Python code gets a reference to the Lua runtime (but never accesses it directly) in a wrapped object.
When Python garbage-collects this object, this is done in the main thread. The reference to the Lua runtime is then cleared (by the Python interpreter, not my code), and that is where I get a segfault.

I have not managed to create an isolated example of this but it is fairly reproducible in my project. Here is an example of the wrapper object:

class LuaFunctionWrapper(object):
	def __init__(self, cb):
		self.cb = cb  # A pointer to a lua function

	def __del__(self):
		self.cb = None  # This is where the segfault occurs

The segfault happens even without the destructor. I only added the destructor, which explicitly releases cb, to confirm from the backtrace that this is where the segfault happens.
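The backtrace in the next comment shows `luaL_unref` being called from `LuaObject.__dealloc__`, i.e. the wrapped Lua function appears to release its Lua registry reference when the Python object is deallocated. In CPython, deallocation runs on whichever thread drops the last reference, which would explain why the segfault happens even without the destructor. The sketch below illustrates only that Python-side behaviour; `FakeLuaRef` is a hypothetical stand-in for a lupa object that records which thread finalizes it and does not touch Lua at all.

```python
import threading


class FakeLuaRef:
    """Hypothetical stand-in for a lupa Lua object; records where it is freed."""
    finalized_on = []

    def __del__(self):
        # CPython calls __del__ as soon as the refcount drops to zero,
        # on the thread that dropped the last reference.
        FakeLuaRef.finalized_on.append(threading.current_thread().name)


class LuaFunctionWrapper(object):
    def __init__(self, cb):
        self.cb = cb  # holds the (fake) Lua function


def worker(holder):
    # The worker thread removes the only reference and discards it here,
    # so the wrapper and its cb are deallocated on this thread.
    holder.pop()


holder = [LuaFunctionWrapper(FakeLuaRef())]
t = threading.Thread(target=worker, args=(holder,), name="HTTP request")
t.start()
t.join()
print(FakeLuaRef.finalized_on)  # ['HTTP request']
```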

@mickeprag
Author

Backtrace of the above observations:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff72ffd700 (LWP 32133)]
0x00007fffef133d82 in lua_rawgeti (L=L@entry=0x7fff60004ef0, idx=idx@entry=-1001000, n=n@entry=0) at lapi.c:654
654       setobj2s(L, L->top, luaH_getint(hvalue(t), n));
(gdb) py-bt
Traceback (most recent call first):
  File "/home/micke/Documents/dev/telldus/tellstick-server/lua/src/lua/LuaScript.py", line 99, in __del__
    self.cb = None
  File "/home/micke/Documents/dev/telldus/tellstick-server-plugins/http/http/http.py", line 61, in run
    self.failure = None
  File "/usr/lib64/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()

(gdb) bt
#0  0x00007fffef133d82 in lua_rawgeti (L=L@entry=0x7fff60004ef0, idx=idx@entry=-1001000, n=n@entry=0) at lapi.c:654
#1  0x00007fffef146291 in luaL_unref (L=0x7fff60004ef0, t=-1001000, ref=7) at lauxlib.c:546
#2  0x00007fffef3739f0 in __pyx_pf_4lupa_5_lupa_10_LuaObject_2__dealloc__ () from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#3  0x00007fffef37361d in __pyx_pw_4lupa_5_lupa_10_LuaObject_3__dealloc__ () from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#4  0x00007fffef390ea7 in __pyx_tp_dealloc_4lupa_5_lupa__LuaObject () from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#5  0x00007ffff7a7374f in insertdict_by_entry (mp=0x7fffd40b77f8, key='cb', hash=<optimized out>, ep=<optimized out>, value=<optimized out>)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/dictobject.c:519
#6  0x00007ffff7a751b0 in dict_set_item_by_hash_or_entry (
op=op@entry={'destructionHandlers': [(<instancemethod at remote 0x7fffd407f050>, (), {})], 'cb': None, 'script': <LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-methods'], [....(truncated), key=<optimized out>, hash=<optimized out>, ep=ep@entry=0x0, 
value=value@entry=None) at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/dictobject.c:795
#7  0x00007ffff7a76164 in PyDict_SetItem (
op=op@entry={'destructionHandlers': [(<instancemethod at remote 0x7fffd407f050>, (), {})], 'cb': None, 'script': <LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-methods'], [....(truncated), key=key@entry='cb', value=value@entry=None)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/dictobject.c:848
#8  0x00007ffff7a7b638 in _PyObject_GenericSetAttrWithDict (obj=<optimized out>, name='cb', value=None,
dict={'destructionHandlers': [(<instancemethod at remote 0x7fffd407f050>, (), {})], 'cb': None, 'script': <LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-methods'], [....(truncated))
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/object.c:1529
#9  0x00007ffff7a7b03f in PyObject_SetAttr (
v=v@entry=<LuaFunctionWrapper(destructionHandlers=[(<instancemethod at remote 0x7fffd407f050>, (), {})], cb=None, script=<LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-met...(truncated), name=<optimized out>, name@entry='cb', value=value@entry=None)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/object.c:1252
#10 0x00007ffff7ad3bec in PyEval_EvalFrameEx (
f=f@entry=Frame 0x7fffd407a578, for file /home/micke/Documents/dev/telldus/tellstick-server/lua/src/lua/LuaScript.py, line 99, in __del__ (self=<LuaFunctionWrapper(destructionHandlers=[(<instancemethod at remote 0x7fffd407f050>, (), {})], cb=None, script=<LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitive...(truncated), throwflag=throwflag@entry=0)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Python/ceval.c:2253
#11 0x00007ffff7ada7d0 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x7fffd4168e28, argcount=1, 
kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Python/ceval.c:3582
#12 0x00007ffff7a63ccc in function_call (func=<function at remote 0x7fffef5b8938>,
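One possible mitigation for the threaded case, sketched under the assumption that the thread owning the `LuaRuntime` runs a loop where it can do housekeeping between Lua calls: other threads never drop Lua-backed objects themselves, but park them in a queue that the owning thread empties. `DeferredReleaser`, `release()` and `drain()` are hypothetical names, not part of lupa, and `FakeLuaRef` again stands in for a lupa object. This would only address cross-thread deallocation; the later reproducer without threads suggests there is another cause as well.

```python
import queue
import threading


class DeferredReleaser:
    """Hypothetical helper: hand Lua-backed objects to the Lua-owning thread."""

    def __init__(self):
        self._pending = queue.Queue()

    def release(self, obj):
        # Safe from any thread: the queue keeps obj alive for now.
        self._pending.put(obj)

    def drain(self):
        # Call only from the thread that owns the LuaRuntime, between Lua calls.
        # Dropping the last reference here runs the deallocator on this thread.
        while True:
            try:
                obj = self._pending.get_nowait()
            except queue.Empty:
                return
            del obj


class FakeLuaRef:
    """Hypothetical stand-in that records which thread freed it."""
    finalized_on = []

    def __del__(self):
        FakeLuaRef.finalized_on.append(threading.current_thread().name)


releaser = DeferredReleaser()


def worker(holder):
    # Instead of letting the reference die on this thread, hand it over.
    releaser.release(holder.pop())


holder = [FakeLuaRef()]
t = threading.Thread(target=worker, args=(holder,), name="HTTP request")
t.start()
t.join()
assert FakeLuaRef.finalized_on == []  # nothing was freed on the worker thread
releaser.drain()                      # the "Lua thread" (here: main) frees it
print(FakeLuaRef.finalized_on)        # ['MainThread'] when run as a script
```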

@mickeprag
Author

mickeprag commented May 22, 2017

Finally, a small-ish reproducible example! No threads or anything special... ;)
https://gist.github.com/mickeprag/75a0fbf04cfd06c3fe48b759da22f5ef

@xeor

xeor commented Aug 20, 2018

Did you do any more work on this? I'm new to Lua, but have done my share of Python. Can I help?

@mickeprag
Author

I can only answer for my side. I have tried to find where the crash happens, but I could not fully understand why.
I don't know whether @kmike has done anything more.
If you want to help, start by checking whether you can reproduce the crash on your computer using my test script.

@scoder
Owner

scoder commented Aug 20, 2018

I can reproduce the crash, but the stack trace changes on each run. That suggests some kind of Lua stack corruption that only shows up at a later point. In other words, the place where it crashes is almost certainly not where the problem is.
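Since the crash site moves around between runs, two standard CPython 3 debugging aids may help narrow it down: `faulthandler`, which prints the Python-level stack of all threads at the moment of the fault, and running the reproducer with the environment variable `PYTHONMALLOC=debug` (Python 3.6+), which installs debug hooks on Python's memory allocators so that some heap misuse is reported closer to where it happens. Neither is lupa-specific, and memory that Lua allocates itself is probably not covered by the Python allocator hooks. Putting these lines at the top of the test script is just one assumed way to use them.

```python
import faulthandler
import sys

# Dump the Python traceback of every thread when the process receives a fatal
# signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL).
# Same effect as starting the script with `python -X faulthandler`.
faulthandler.enable(file=sys.stderr, all_threads=True)

print(faulthandler.is_enabled())  # True
```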

@xeor

xeor commented Aug 20, 2018

I can reproduce it using Python 3.7.0 and lupa 1.7.

Going to test a couple of other versions now.

@scoder
Owner

scoder commented Aug 20, 2018 via email

@xeor

xeor commented Aug 20, 2018

Tried a few random versions:

Python 3.7: lupa 1.6, 1.7
Python 3.4.9: lupa 1.0 (without unpacks_lua_table)
Python 2.7.15: lupa 1.7, 1.5

All of them crash at a random iteration between 15 and 400.
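Because each run crashes at a different iteration, comparing versions is easier with a small harness that runs the reproducer several times in fresh interpreters and counts how each run ended. This is a sketch assuming Python 3.5+ on a POSIX system; `run_once` and `summarize` are hypothetical helper names. A segfault shows up as `SIGSEGV`, and a glibc or macOS malloc abort as `SIGABRT`.

```python
import signal
import subprocess
import sys
from collections import Counter


def run_once(args, timeout=300):
    """Run the interpreter with `args` in a fresh process and classify how it ended."""
    proc = subprocess.run(
        # -X faulthandler prints the Python traceback of all threads on SIGSEGV/SIGABRT.
        [sys.executable, "-X", "faulthandler"] + list(args),
        stdout=subprocess.DEVNULL,  # hide the "Start call N" progress output
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            return signal.Signals(-proc.returncode).name
        except ValueError:
            return "signal %d" % -proc.returncode
    return "exit %d" % proc.returncode


def summarize(args, runs=20):
    """Repeat the run and count the outcomes (keys as returned by run_once)."""
    return Counter(run_once(args) for _ in range(runs))
```

For example, `print(summarize(["crashtest.py"]))` with the gist's `crashtest.py` in the current directory.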

@xeor

xeor commented Aug 20, 2018

Running the test and just watching it fail dumps a lot of different errors on the console: mostly segfaults, but also malloc assertion failures (python: malloc.c:3760: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.) and other Python errors.

This might be a stupid question, but if I move thread = fn.coroutine() (https://gist.github.com/mickeprag/75a0fbf04cfd06c3fe48b759da22f5ef#file-crashtest-py-L39) outside the loop, it never fails. Would the script still work as intended then?

@scoder
Owner

scoder commented Aug 20, 2018

Maybe you could try to strip down the test case? It's very complex and uses lots of features: Lua coroutines, the @unpacks_lua_table decorator, runtime options … Any feature that can be avoided will make it easier to find the place where things go wrong.

@xeor

xeor commented Aug 20, 2018

I'm very new to Lua, so I'm not sure where to begin. I'll keep experimenting with it a little, though.

@xeor

xeor commented Aug 20, 2018

Another strange finding: if I assign the callback object passed into PendingRequest to any attribute of self, it crashes.

Example:

class PendingRequest(object):
	def __init__(self, callback):
		super(PendingRequest,self).__init__()
		self.callback = callback
		thread = self.callback.coroutine()
		try:
			thread.send(None)
		except StopIteration:
			pass

is the original, which crashes.

class PendingRequest(object):
	def __init__(self, callback):
		super(PendingRequest,self).__init__()
		# self.callback = callback
		thread = callback.coroutine()
		try:
			thread.send(None)
		except StopIteration:
			pass

does not crash.

But

class PendingRequest(object):
	def __init__(self, callback):
		super(PendingRequest,self).__init__()
		self.xx = callback
		thread = callback.coroutine()
		try:
			thread.send(None)
		except StopIteration:
			pass

does crash.

@mickeprag
Author

> Maybe you could try to strip down the test case? It's very complex and uses lots of features: Lua coroutines, the @unpacks_lua_table decorator, runtime options

I have simplified the test case. Actually, removing the runtime options makes the script crash sooner on my machine.
I cannot reproduce the crash without using coroutines, so my guess is that the issue lies somewhere there.

Two observations:

  1. If I do not return the PendingRequest object from Request.get(), it does not crash.
  2. If the callback variable is not stored in self (in PendingRequest.__init__), it does not crash. Same observation as @xeor.

Maybe this has something to do with when the PendingRequest object is cleaned up by the Python garbage collector and it releases its reference to the Lua function? Just my speculation...

@xeor

xeor commented Aug 21, 2018

I tried turning off the garbage collector with import gc; gc.disable(). It made no difference.
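That result is expected: `gc.disable()` only switches off CPython's cyclic garbage collector. Ordinary reference counting still deallocates an object the moment its last reference disappears, which, judging from the `luaL_unref` call in `LuaObject.__dealloc__` in the backtrace above, is when lupa releases the Lua reference. A small demonstration with a hypothetical `Tracked` class (Python 3.4+, where objects with `__del__` in reference cycles can be collected):

```python
import gc


class Tracked:
    """Hypothetical object that logs when it is deallocated."""
    log = []

    def __del__(self):
        Tracked.log.append("freed")


gc.disable()  # turns off only the cyclic garbage collector
try:
    obj = Tracked()
    del obj  # refcount hits zero: freed immediately, gc or not
    freed_without_gc = list(Tracked.log)

    # A reference cycle, by contrast, is only reclaimed by the cyclic collector.
    a = Tracked()
    a.self_ref = a
    del a
    freed_before_collect = list(Tracked.log)
finally:
    gc.enable()

gc.collect()  # now the cycle is found and finalized
print(freed_without_gc, freed_before_collect, Tracked.log)
# ['freed'] ['freed'] ['freed', 'freed']
```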

@noahcgreen

Has there been any work done on this? I'm running into the same issue with coroutines unpredictably segfaulting.

@mickeprag
Author

From my side, no, unfortunately not.

@scoder
Owner

scoder commented Sep 10, 2020

There is a reproducing script in https://gist.github.com/mickeprag/75a0fbf04cfd06c3fe48b759da22f5ef
It's probably still not minimal and requires more investigation to find the point where things go wrong in the code.
Help with that is welcome.

@noahcgreen

Here's a slightly more minimal reproducing script:

from lupa import LuaRuntime


class PendingRequest:

    def __init__(self, callback):
        self.callback = callback


def make_request(callback):
    return PendingRequest(callback)


lua = LuaRuntime()
lua.globals().make_request = make_request
run = lua.eval("""
function()
    make_request(function() end)
end
""")

for i in range(10000):
    print("Start call", i)
    thread = run.coroutine()
    try:
        thread.send(None)
    except StopIteration:
        pass

print("Finished successfully")

Almost every time I run this, I get an error similar to this:

Python(83285,0x10f966dc0) malloc: Incorrect checksum for freed object 0x7f8de7f2bd78: probably modified after being freed.
Corrupt value: 0x0
Python(83285,0x10f966dc0) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort      python3 crashtest.py

So I do think there is likely an error in garbage collection/deallocation. I'm still not very comfortable debugging Cython, but I'll try to look at this more over the weekend.

@grungy-ado linked a pull request on Aug 31, 2021 that will close this issue

5 participants