Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

closure_callback: Use separate thread for marshalling #181

Closed
wants to merge 1 commit into from

Conversation

psychon
Copy link
Collaborator

@psychon psychon commented Nov 17, 2017

When a thread is yielded (status LUA_YIELD), Lua can get really upset
when one tries to execute Lua code on that thread. When LUA_USE_APICHECK
is enabled and e.g. things run under code coverage analysis (meaning
that debug.sethook() was used to set a hook that runs on various
occasions), this can actually result in a failed assertion which aborts
the process.

Fix this by making closure_callback create a new Lua thread and use that
one for all marshaling. Only the results are then moved back to the
yielded Lua thread's stack and the coroutine is resumed.

Sadly, I cannot provide a nice test case that shows the error before
this commit and the fix afterwards. I only got a complicated one.

Signed-off-by: Uli Schlachter psychon@znc.in


I have the following mess laying around. Before this commit, this caused Lua to fail an assertion and/or segfault. With this commit, everything seems to work fine.

diff --git a/tests/gio.lua b/tests/gio.lua
index 5b4f775..5edb550 100644
--- a/tests/gio.lua
+++ b/tests/gio.lua
@@ -85,3 +85,127 @@ function gio.async_access()
    if jit then jit.on() end
 end
 
+function get_foo()
+	local gio = require("lgi").Gio
+
+	local cache = {}
+
+	local function async_get_mtime(file)
+	    local info = file:async_query_info("time::modified", gio.FileQueryInfoFlags.NONE)
+	    return info and info:get_attribute_uint64("time::modified")
+	end
+
+	local function get_readable_path(gfile)
+	    return gfile:get_path() or gfile:get_uri()
+	end
+
+	local function scan(self, directory)
+	    local query = "standard::name,standard::type"
+	    local enum, err = directory:async_enumerate_children(query, gio.FileQueryInfoFlags.NONE)
+	    if not enum then
+		print(get_readable_path(directory) .. ": " .. tostring(err))
+		return
+	    end
+	    local per_call = 1000 -- Random value
+	    local children = {}
+	    while true do
+		local list, enum_err = enum:async_next_files(per_call)
+		if enum_err then
+		    print(get_readable_path(directory) .. ": " .. tostring(enum_err))
+		    return
+		end
+		for _, info in ipairs(list) do
+		    local file_type = info:get_file_type()
+		    local child = enum:get_child(info)
+		    if file_type == 'REGULAR' then
+			local path = child:get_path()
+			if path then
+			    self.known_paths[path] = true
+			end
+		    elseif file_type == 'DIRECTORY' then
+			table.insert(children, child)
+		    end
+		end
+		if #list == 0 then
+		    break
+		end
+	    end
+	    enum:async_close()
+	    for _, child in ipairs(children) do
+		scan(self, child)
+	    end
+	end
+
+	--- Asynchronously ensure that this cache object is up to date. You do not have
+	-- to call this yourself unless you want to inspect the `.known_paths` table
+	-- directly.
+	function cache:async_update()
+	    local now = os.time()
+
+	    -- Can we still just assume that we are up to date?
+	    if self.last_check + self.timeout >= now then
+		return
+	    end
+	    self.last_check = now
+
+	    -- Did the directory change?
+	    local mtime = async_get_mtime(self.directory)
+	    if self.directory_mtime == mtime then
+		return
+	    end
+	    self.directory_mtime = mtime
+
+	    -- We need to rebuild the cache
+	    self.known_paths = {}
+	    scan(self, self.directory)
+	end
+
+	--- Asynchronously check if a file exists.
+	-- @tparam string path relative path of the file.
+	-- @treturn boolean True if the file exists
+	function cache:async_check_exists(path)
+	    local sane_path = self.directory:get_child(path):get_path()
+	    if not sane_path then
+		return false
+	    end
+	    self:async_update()
+	    return self.known_paths[sane_path] or false
+	end
+
+	--- Asynchronously creates a new cache object.
+	-- @tparam string directory The directory to check.
+	-- @tparam[opt=5] int timeout Number of seconds that cache might be out of date.
+	-- @return A new cache object.
+	function cache.async_new(directory, timeout)
+	    local result = setmetatable({
+		directory = gio.File.new_for_path(directory),
+		timeout = timeout or 5,
+		last_check = 0,
+		directory_mtime = 0,
+		known_paths = {},
+	    }, { __index = cache })
+	    result:async_update()
+	    return result
+	end
+
+	--- Contains all known subfiles
+	-- @table cache.known_paths
+
+	return cache
+end
+
+require("luacov.runner")()
+
+function gio.foo()
+	print("starting foo")
+
+
+	local foo = get_foo()
+	require("lgi").Gio.Async.call(function()
+		foo.async_new(".")
+		foo.async_new("this/does/not/exist")
+		print(require("lgi").Gio.File.new_for_path("."):get_path())
+	end)()
+
+	print("done with foo")
+end

CC @blueyed @Elv13 once again I hope that this fixes all our problems. If one of you has an idea how to run test-menubar.lua against an LGI with this patch, speak up.

When a thread is yielded (status LUA_YIELD), Lua can get really upset
when one tries to execute Lua code on that thread. When LUA_USE_APICHECK
is enabled and e.g. things run under code coverage analysis (meaning
that debug.sethook() was used to set a hook that runs on various
occasions), this can actually result in a failed assertion which aborts
the process.

Fix this by making closure_callback create a new Lua thread and use that
one for all marshaling. Only the results are then moved back to the
yielded Lua thread's stack and the coroutine is resumed.

Sadly, I cannot provide a nice test case that shows the error before
this commit and the fix afterwards. I only got a complicated one.

Signed-off-by: Uli Schlachter <psychon@znc.in>
@psychon
Copy link
Collaborator Author

psychon commented Nov 17, 2017

Okay, I think this really makes things work for awesome.

Just re-enable test-menubar.lua (and have travis build the branch):
psychon/awesome@a8b2ab0
https://travis-ci.org/psychon/awesome/builds/303450398
(No idea why LuaJIT did not fail this time, it definitely ran the test)

And then switch to this PR for LGI:
psychon/awesome@88cc614
https://travis-ci.org/psychon/awesome/builds/303460318
(I think the first failure is unrelated, if I should guess then that's because it failed to upload coverage data)

Also, some more information on the original bug:

lua: ldebug.c:326: lua_getinfo: Assertion `ttisfunction(ci->func)' failed.
==13289== 
==13289== Process terminating with default action of signal 6 (SIGABRT): dumping core
==13289==    at 0x53CC69B: raise (in /usr/lib64/libc-2.26.so)
==13289==    by 0x53CE3B0: abort (in /usr/lib64/libc-2.26.so)
==13289==    by 0x53C4929: __assert_fail_base (in /usr/lib64/libc-2.26.so)
==13289==    by 0x53C49A1: __assert_fail (in /usr/lib64/libc-2.26.so)
==13289==    by 0x40A035: lua_getinfo (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x424333: hookf (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40AF35: luaD_hook (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40A604: luaG_traceexec (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x41AF96: luaV_execute (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BBDE: luaD_call (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BC30: luaD_callnoyield (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x416604: luaT_callTM (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x404F12: auxgetstr (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x5B85800: lgi_type_get_repotype (core.c:194)
==13289==    by 0x5B8BB59: object_type (object.c:80)
==13289==    by 0x5B8C6A0: lgi_object_2lua (object.c:357)
==13289==    by 0x5B8A496: lgi_marshal_2lua (marshal.c:1327)
==13289==    by 0x5B82819: callable_param_2lua (callable.c:766)
==13289==    by 0x5B8302B: closure_callback (callable.c:1193)
==13289==    by 0x6739ABE: ffi_closure_unix64_inner (in /usr/lib64/libffi.so.6.0.2)
==13289==    by 0x6739E85: ffi_closure_unix64 (in /usr/lib64/libffi.so.6.0.2)
==13289==    by 0x6BE62A3: ??? (in /usr/lib64/libgio-2.0.so.0.5400.2)
==13289==    by 0x6BE62D8: ??? (in /usr/lib64/libgio-2.0.so.0.5400.2)
==13289==    by 0x6467596: ??? (in /usr/lib64/libglib-2.0.so.0.5400.2)
==13289==    by 0x646ABB6: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5400.2)
==13289==    by 0x646AF5F: ??? (in /usr/lib64/libglib-2.0.so.0.5400.2)
==13289==    by 0x646B271: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5400.2)
==13289==    by 0x6739D1D: ffi_call_unix64 (in /usr/lib64/libffi.so.6.0.2)
==13289==    by 0x673968E: ffi_call (in /usr/lib64/libffi.so.6.0.2)
==13289==    by 0x5B84705: callable_call (callable.c:922)
==13289==    by 0x40B75A: luaD_precall (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x419794: luaV_execute (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BBDE: luaD_call (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BC30: luaD_callnoyield (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40AB7B: luaD_rawrunprotected (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40C02A: luaD_pcall (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x4082DE: lua_pcallk (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x4229BE: luaB_xpcall (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40B75A: luaD_precall (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x419794: luaV_execute (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BBDE: luaD_call (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BC30: luaD_callnoyield (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40AB7B: luaD_rawrunprotected (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40C02A: luaD_pcall (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x4082DE: lua_pcallk (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x403D16: docall (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x404A06: pmain (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40B75A: luaD_precall (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BBD2: luaD_call (in /tmp/lua-5.3.4/src/lua)
==13289==    by 0x40BC30: luaD_callnoyield (in /tmp/lua-5.3.4/src/lua)

You can see that closure_callback tries to marshal an argument (callable_param_2lua, lgi_marshal_2lua and lgi_object_2lua). This results in a call to object_type which in turn tries to find LGI's internal representation for a type (lgi_type_get_repotype). This function does the equivalent of indexing lgi.core.repo, however lgi/init.lua set a metatable on this table with an __index meta-method. Thus, some Lua code attempts to run. Since the thread that this is happening on is yielded, things go downhill. However, it takes a while before this is noticed.

@blueyed
Copy link

blueyed commented Nov 17, 2017

Yes, the first build failure is due to uploading coverage:

Upload error: Couldn't find a repository matching this job.

Otherwise I cannot really comment, but your comments/description make sense.

@psychon
Copy link
Collaborator Author

psychon commented Nov 17, 2017

Sigh. In the mean time I found http://lua-users.org/lists/lua-l/2016-08/msg00150.html which suggests that doing memory allocation on a suspended thread is also not allowed since it might cause GC activity which in turn can cause __gc metamethods to run. Sounds like LGI has to keep a "useless" Lua thread around just for argument marshaling...

Oh and this PR only fixes half the problem: It only handles parameters, but return values need a similar treatment, too.

@psychon
Copy link
Collaborator Author

psychon commented Nov 19, 2017

Superseded by #183

@psychon psychon closed this Nov 19, 2017
@psychon psychon mentioned this pull request Nov 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants