Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

This program creates garbage faster than the garbage collector can collect it #157

Open
psychon opened this issue Feb 2, 2017 · 9 comments

Comments

@psychon
Copy link
Collaborator

psychon commented Feb 2, 2017

I'm not exactly sure what is happening here, but the following program seems to have ever-increasing memory usage:

local lgi = require("lgi")
local glib = lgi.GLib
local gio = lgi.Gio

local count = 1000

local main = glib.MainLoop()

local function launch(i)
	print(i, collectgarbage("count"))

	local launcher = gio.SubprocessLauncher.new(gio.SubprocessFlags.STDOUT_PIPE)
	local proc = launcher:spawnv({"/bin/cat", "/proc/meminfo"})
	local pipe = proc:get_stdout_pipe()
	pipe = gio.DataInputStream.new(pipe)

	local start_read, finish_read
	start_read = function()
		pipe:read_line_async(glib.PRIORITY_DEFAULT, nil, finish_read)
	end
	finish_read = function(obj, res)
		local line, length = obj:read_line_finish(res)
		if line then
			start_read()
		else
			-- End of file
			assert(obj:close())
			assert(proc:wait())
			assert(proc:get_successful())
			if i < count then
				launch(i+1)
			else
				main:quit()
			end
		end
	end
	start_read()
end

launch(1)

main:run()

When setting more agressive GC settings, this no longer happens. For example:

collectgarbage("setstepmul", 1000)
collectgarbage("setpause", 110)

I looked a bit into "things" and I managed to figure out that the finish_read function is still reachable via, for example. debug.getregistry()[34709]. This shows that lots of things are still referenced via the registry, which means via luaL_ref. This function is only used for callables in lgi: lgi_closure_allocated() and lgi_closure_create().
Callables are "left dangling" after use (lgi_guard_create) and the GC can collect them. Apparently the above program causes some behaviour so that garbage accumulates quicker than it can be collected. Perhaps somehow the callables references each other...? (Although I do not really see how this would happen)

Original issue: awesomeWM/awesome#1490

@psychon
Copy link
Collaborator Author

psychon commented Feb 4, 2017

I cannot reproduce this issue with Lua 5.1 nor 5.2. It seems to be specific to Lua 5.3.
I came up with the following "pure-Lua" program reproducing this behaviour. I will try to get some insight from Lua-people into this.

local registry = {}

local refnil, noref = -1, -2

-- Lua-based reimplementations of luaL_ref and luaL_unref. Yes, this makes no
-- sense to have in Lua, but helps with the following experiments.
local function luaL_ref(t, obj)
	if not obj then
		return refnil
	end
	local next_free = t[0]
	local ref
	if next_free then
		ref = next_free
		t[0] = t[next_free]
	else
		ref = #t + 1
	end
	t[ref] = obj
	return ref
end
local function luaL_unref(t, ref)
	if ref >= 0 then
		-- Push the free reference into the freelist
		t[0], t[ref] = ref, t[0]
	end
end

-- Quick test
do
	local test_reg = {}
	assert(luaL_ref(test_reg, nil) == refnil)
	assert(luaL_ref(test_reg, {}) == 1)
	assert(luaL_ref(test_reg, {}) == 2)
	assert(luaL_ref(test_reg, {}) == 3)
	assert(test_reg[0] == nil)

	luaL_unref(test_reg, 2)
	assert(test_reg[0] == 2)
	assert(test_reg[2] == nil)

	luaL_unref(test_reg, 1)
	assert(test_reg[0] == 1)
	assert(test_reg[1] == 2)
	assert(test_reg[2] == nil)

	assert(luaL_ref(test_reg, {}) == 1)
	assert(test_reg[0] == 2)
	assert(test_reg[2] == nil)

	luaL_unref(test_reg, 3)
	assert(test_reg[0] == 3)
	assert(test_reg[3] == 2)
	assert(test_reg[2] == nil)

	luaL_unref(test_reg, 1)
	assert(test_reg[0] == 1)
	assert(test_reg[1] == 3)
	assert(test_reg[3] == 2)
	assert(test_reg[2] == nil)

	assert(luaL_ref(test_reg, nil) == refnil)
end

-- Some helper functions
local function make_big_object(size)
	if not size then return {} end
	size = size or 3
	if size <= 0 then
		return {}
	end
	return { make_big_object(size-1), make_big_object(size-1) }
end

local function guard_gc_callback(self)
	self.callback(self.arg1, self.arg2)
end
local guard_mt = { __gc = guard_gc_callback }
local function make_guard(callback, arg1, arg2)
	setmetatable({callback = callback, arg1 = arg1, arg2 = arg2}, guard_mt)
end

local free_later = { make_big_object(20) }

-- Now simulate what LGI is doing when running the "leaky program"
local iter = 0
while true do
	print(iter, collectgarbage("count"), #registry, registry[0])
	for i = 1, 1000 do
		local ref = luaL_ref(registry, make_big_object())
		make_guard(luaL_unref, registry, ref)
		iter = iter + 1
	end
	free_later = nil
end

@psychon
Copy link
Collaborator Author

psychon commented Feb 8, 2017

I asked on lua-l about this and got an answer from Roberto. It does not really say so, but I read it more-or-less as "do not do that".

@pavouk Would it be ok with you if I tried to come up with some magic to help Lua's garbage collector here? I am thinking of adding a weak table that tracks the guards which can be cleared and uses this for "finalizing" these objects faster (when a new guard is created, clear all pending guards).

@pavouk
Copy link
Collaborator

pavouk commented Feb 12, 2017

I don't completely grok Roberto's answer too. From what I understood it seems that any use of luaL_ref can wreak havoc on GC if performed sufficiently often. This looks to me more like a bug of GC + luaL_ref. but most probably I'm missing something.

@psychon Of course, if you come with any improvement I'll gladly merge it. And given recent amount of your and mine activity on lgi, I'd be really glad if you could at least co-maintain this project. If you are interested, please let me know.

As for replacing luaL_ref: it used to be used in much more areas of lgi, but at some point I switched to use lua_setuserdata() instead. However, it is not so simple for closures, where more objects need to be assigned to single callback object (thread_ref, target_ref and callable_ref). That's why I left luaL_ref here. Maybe it would be possible to attach simple table as userdata and store all these three objects into it instead of using luaL_ref?

@psychon
Copy link
Collaborator Author

psychon commented Feb 12, 2017

And given recent amount of your and mine activity on lgi, I'd be really glad if you could at least co-maintain this project. If you are interested, please let me know.

Well, this is also how I end up maintaining cairo-xcb and working on libxcb (and a bit like I end up having no clue about Qt, but having reviewer rights), so sure, feel free. I'm used to this. :-)
However, I do not really feel like adequate for the job. I have barely any knowledge about Gtk and GObject "things" and most of my patches start as random poking. So, I'd still want you to look over the mess that I produce and say that it is good.

but at some point I switched to use lua_setuserdata() instead

That's indeed a good idea that could also help the Lua GC to figure out that something is not really reachable.

Maybe it would be possible to attach simple table as userdata and store all these three objects into it instead of using luaL_ref?

Sure, but I read the current use of guards as mostly as "this can be freed as soon as the current function returns"...

I'll try to understand which parts needs which other parts to stay around and reduce the use of luaL_ref where possible.

@pavouk
Copy link
Collaborator

pavouk commented Feb 19, 2017

Sure, I definitely do not plan to retire completely, reviews and smaller-scale contributions are still fine.

I'm a bit confused about you talking about guards. Is it lgi_guard_create you are talking about? But these guards do not use luaL_ref at all. If the culprit is really luaL_ref, then its use is in callable.c, and it is definitely used differently - keeps alive the target and thread in which the callback should be executed.

@psychon
Copy link
Collaborator Author

psychon commented Feb 19, 2017

I'm a bit confused about you talking about guards. Is it lgi_guard_create you are talking about?

Yes. I do not remember the Full Story (tm) right now, so half of the following is made-up. Something like this:

	    /* Add guard which releases closure block after the
	       call. */
	    *lgi_guard_create (L, lgi_closure_destroy) = args[argi].v_pointer;

The above delayed-destroys the closure. The closure however has luaL_ref-style references to "something".

So yes, the guards do not use luaL_ref, but they are used for freeing something that in turn uses luaL_ref.

My new plan would be to replace usage of luaL_ref-references with lua_getuservalue / lua_getfenv (depending on Lua version). That should also help quite a lot with this bug (since then all the garbage is freed together, while currently it takes two "rounds of GC" to free everything).
I'll try to work on this later.

@psychon
Copy link
Collaborator Author

psychon commented Feb 19, 2017

After looking into the code (and mixing up callable and closure for a moment)... ok, this might not be that easy to fix. Closures are not (really) owned by Lua and so I cannot use lua_setfenv on them. Instead they seem to be half-unowned and half-"some destroy notify via C+ffi frees them".

Since I feel like my last answer did not make the problem clear enough:

The problem is that two GC cycles are needed before some(?) callback can be freed. Since Lua uses the amount of memory used as an estimate on when the next GC cycle is needed, this creates a feedback loop:

  • After the GC cycles, the guards are destroyed, but the callbacks (things that were referenced via luaL_ref) are not yet freed, because they were reachable via the registry in the sweep phase. The next GC cycle will free these.
  • Since the callbacks are not yet freed, memory usage is higher and thus the next GC cycle starts later.
  • Since the next GC cycle starts latter, more callables are created until then.
  • Thus, memory usage after the next GC cycle is now higher than after the last GC and the following GC cycle starts even later.

psychon added a commit to psychon/lgi that referenced this issue Apr 9, 2017
Related-to: lgi-devs#157

Signed-off-by: Uli Schlachter <psychon@znc.in>
@psychon
Copy link
Collaborator Author

psychon commented May 2, 2018

Random possibly relevant post on lua-l: http://lua-users.org/lists/lua-l/2018-04/msg00106.html

@hoelzro
Copy link

hoelzro commented Aug 10, 2022

While investigating my own AwesomeWM memory issues, I dug into this issue a bit - I discovered that many of the references in the Lua registry pointed to the same function and thread values. I was wondering - would be be reasonable to have a weakly-keyed table store a shared reference to things like callback functions and threads? That way if you have, say, 1000 marshaled callables, they would all share a reference to a single registry entry rather than creating their own. I imagine that it would work something like this:

Create reference for k

local registry = debug.getregistry()
local shared_references = registry[SHARED_REFERENCE_ID]
if not shared_references then
  shared_references = setmetatable({}, {__mode = 'k'})
  SHARED_REFERENCE_ID = luaL_ref(shared_references)
end
local ref_id = shared_references[k]
if not ref_id or registry[ref_id] ~= k then
  ref_id = luaL_ref(k)
  shared_references[k] = ref_id
end

Clean up reference to k

I don't have a good way to clean up references - I think lgi might need to maintain a reference count or something

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants