
segfault on new_thread and worker.queue #636

Closed
Bilal2453 opened this issue Mar 3, 2023 · 8 comments · Fixed by #638

@Bilal2453
Contributor

The following consistently segfaults on my machine when run with LuaJIT:

local uv = require 'luv'

uv.new_thread(function() end)

uv.run()

The same code does not segfault with Luvit's bundled luv, though. But this does:

local uv = require'uv'

local tbl = {}
for i = 1, 25 do
  tbl[i] = i
end

uv.new_thread(function() end, unpack(tbl))

uv.run()

I am not sure whether the first snippet segfaults because of a bad build of Luv on my end or something else entirely, so if you can't reproduce it, ignore it. The main focus is the second one:

[screenshot: terminal output from repeated Luvit runs of the snippet above, showing intermittent segfaults and garbled error messages]

This is Luvit running that code multiple times: sometimes it works fine, most of the time it segfaults, and sometimes it prints something like "Error: thread arg not support type [random] at 10" or "Uncaught Error: attempt to call a [random] value".

I tried backtracing this with no luck; the second thread receives the segfault, and its backtrace was just garbage when I inspected it with gdb.

@Bilal2453
Contributor Author

Note that there may actually be multiple things going wrong here. The first real issue is that Luv fails to enforce the LUV_THREAD_MAXNUM_ARG limit (which is 9 here), possibly overflowing a struct somewhere. A race condition may also be involved, and could be what actually triggers this segfault.
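
For context, Luv keeps thread arguments in a fixed-size array, so anything written past the limit lands outside it. A rough sketch of the layout (field names approximate, not the verbatim Luv source):

  #define LUV_THREAD_MAXNUM_ARG 9

  typedef struct { int type; /* plus a value union in the real code */ } luv_val_t;

  typedef struct {
    int argc;
    luv_val_t argv[LUV_THREAD_MAXNUM_ARG]; /* fixed capacity of 9 slots */
  } luv_thread_arg_t;                      /* writing argv[9] is already out of bounds */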

@erw7
Contributor

erw7 commented Mar 28, 2023

luv/src/thread.c

Lines 68 to 70 in b9c9b65

while (i <= top && i <= LUV_THREAD_MAXNUM_ARG + idx)
{
  luv_val_t *arg = args->argv + i - idx;

@zhaozg
Member

zhaozg commented Mar 28, 2023

    local tbl = {}
    for i = 1, 25 do
      tbl[i] = i
    end

    uv.new_thread(function(...)
      local a = {...}
      io.write(#a)
      io.write("\n")
      assert(#a == 9)
    end, unpack(tbl)):join()
    uv.run()

@zhaozg
Member

zhaozg commented Mar 28, 2023

Your thread still exists after the main uv loop exits, and that corrupts memory.
Use thread:join() (as in the example above) to avoid that.

@Bilal2453
Contributor Author

Do I understand from this that blocking the main thread is required for this to work? I am deliberately not using :join here so as not to block the main thread; that is the whole point of using threads.

@erw7
Contributor

erw7 commented Mar 29, 2023

@zhaozg Your comment may be right about the first example of #636 (comment). However, the code I pointed to has a typical out-of-bounds problem.

For simplicity, consider the case idx = 0, LUV_THREAD_MAXNUM_ARG <= top. Rewriting the code in an easier-to-follow form, the relevant part eventually looks like this:

  i = 0;
  while (i <= LUV_THREAD_MAXNUM_ARG)
  {
    luv_val_t *arg = &args->argv[i];
    …
    i++;
  }

Obviously the last arg refers to args->argv[LUV_THREAD_MAXNUM_ARG], which is one element past the end of the LUV_THREAD_MAXNUM_ARG-element argv array.
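
A standalone toy program with the same loop shape makes the overrun easy to see (a hypothetical demo, not Luv code; a plain int array stands in for args->argv):

  #include <stdio.h>

  #define LUV_THREAD_MAXNUM_ARG 9

  int main(void) {
    int argv[LUV_THREAD_MAXNUM_ARG];     /* stand-in for args->argv */
    int i = 0;
    while (i <= LUV_THREAD_MAXNUM_ARG) { /* <= runs one iteration too many */
      argv[i] = i;                       /* when i == 9 this writes past the end */
      i++;
    }
    printf("wrote %d values into a %d-slot array\n", i, LUV_THREAD_MAXNUM_ARG);
    return 0;
  }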

By executing the following code, you can see that an impermissible tenth argument is passed to the entry function.

local uv = require'luv'

local tbl = {}
-- for i = 1, tonumber(arg[1]) do
for i = 1, 10 do
  tbl[i] = i
end

local thread = uv.new_thread(function(...)
  -- print(#{...})
  assert(#{...} <= 9)
end, unpack(tbl))
uv.thread_join(thread)

thread = uv.new_thread({stack_size = 2048}, function(...)
  -- print(#{...})
  assert(#{...} <= 9)
end, unpack(tbl))
uv.thread_join(thread)

@zhaozg
Member

zhaozg commented Mar 29, 2023

The <= in while (i <= LUV_THREAD_MAXNUM_ARG) should be <.
My fault, I will fix this.
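
In other words, the corrected condition would look something like this (a sketch against the thread.c excerpt above, not the final patch):

  /* before: touches argv[0] .. argv[LUV_THREAD_MAXNUM_ARG], one past the end */
  while (i <= top && i <= LUV_THREAD_MAXNUM_ARG + idx)

  /* after: touches argv[0] .. argv[LUV_THREAD_MAXNUM_ARG - 1], in bounds */
  while (i <= top && i < LUV_THREAD_MAXNUM_ARG + idx)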

@Bilal2453
Contributor Author

I had actually already noticed this out-of-bounds access; sorry, I seem to have completely forgotten about it as I went deeper into the code. I am still not sure this is all there is to it; at least, that is what I remember prompting me to keep digging. I will test this fix once I can.
