Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random segfault when running with luajit #496

Closed
houqp opened this issue Feb 22, 2016 · 27 comments
Closed

random segfault when running with luajit #496

houqp opened this issue Feb 22, 2016 · 27 comments

Comments

@houqp
Copy link

houqp commented Feb 22, 2016

Hi,

We have been having this issue for awhile. 90% of the time, it segfaults at the end of the test, inside luajit's lua_close function (most of the time it dies at lj_gc_freeall). It kind smells like a race condition. Does busted do multi-threading at all?

@mpeterv
Copy link
Contributor

mpeterv commented Feb 22, 2016

Reduced testsuite reproducing the issue: put this into spec/1_spec.lua and spec/2_spec.lua, run busted.

for i = 1, 1000 do
  loadstring([[
    for j = 1, 20 do
      local x = {0LL}
    end
  ]])()
end

But this looks like a bug in LuaJIT, not busted...

@houqp
Copy link
Author

houqp commented Feb 22, 2016

Interesting, It could be a bug in LuaJIT. But we have been running the same code on many real devices for a long time any never ran into this issue. It only happens quiet frequently when running with busted.

But yeah, if it works for lua5.1 and assuming busted is not using any special magic to run those tests, then I would agree it's more likely a bug in LuaJIT.

@Tieske
Copy link
Member

Tieske commented Feb 23, 2016

we have seen this as well on Travis and CircleCI, where tests crashed at the end (busted with LuaJit). Probably warrants a bug report...

@houqp
Copy link
Author

houqp commented Feb 26, 2016

Sometimes the tests even crash during the execution. But this happens way less frequent.

@vsergeev
Copy link

👍 I am experiencing this issue as well. In my case it occurs maybe 20% of the time.

@Tieske
Copy link
Member

Tieske commented Feb 27, 2016

did someone report this on the LuaJit mailing list?

@houqp
Copy link
Author

houqp commented Feb 28, 2016

I can report it to the list. @Tieske , just to confirm, busted is not doing multi-threading right?

@Tieske
Copy link
Member

Tieske commented Feb 28, 2016

No, it isn't. Please do report, otherwise I will, just wanted to make sure we don't report it multiple times.

@houqp
Copy link
Author

houqp commented Feb 29, 2016

I have sent the report to the mailing list.

@Tieske
Copy link
Member

Tieske commented Feb 29, 2016

👍

For the record, the reported issue; http://www.freelists.org/post/luajit/random-segfaults-when-running-tests-with-busted

@mpeterv
Copy link
Contributor

mpeterv commented Feb 29, 2016

Actually, given that I can reproduce the issue with busted 2.0.rc7 but not 2.0.rc6 this could be an issue in lua-term, a C dependency added in rc7. I'll bisect for exact commit introducing the issue.

@mpeterv
Copy link
Contributor

mpeterv commented Feb 29, 2016

Nevermind, commenting out lua-term import doesn't fix the problem, bisect shows that segfault is introduced in @b3993e6.

@houqp
Copy link
Author

houqp commented Mar 1, 2016

@mpeterv what version of luajit are you using?
EDIT: nvm, I was able to reproduce with the latest LuaJIT.

@mpeterv
Copy link
Contributor

mpeterv commented Mar 1, 2016

@houqp I reproduced the issue using busted with testcase I posted above with luajit 2.0.0, 2.0.4, head of master branch, head of v2.1 branch. I've also reduced the testcase (the whole blob with busted, luarocks, penlight and other dependencies) to one file. I'll finish some more testing with different compilation flags and post it to luajit list.

@mpeterv
Copy link
Contributor

mpeterv commented Mar 1, 2016

So, as expected, this is a duplicate of #369. Some workarounds are posted there.

@houqp
Copy link
Author

houqp commented Mar 2, 2016

Hm... I think the problem now is the discussion in #369 is not documented. @Tieske is it possible to get this documented on the official documentation page?

For us, the workaround does not work. Most of our modules requires ffi, and it's not easier to put all of them into a helper.lua script. It's probably easier for neovim since their application code does not depend on luajit ffi. I guess a cleaner way to fix it is to run each test in its own process...

@mpeterv
Copy link
Contributor

mpeterv commented Mar 2, 2016

@houqp it should be enough to require just the FFI module in helper.lua, ignoring any other modules depending on it.

@houqp
Copy link
Author

houqp commented Mar 2, 2016

@mpeterv I tried that, but got a bunch of redefined errors in the cdef call (link), which is imported by some of the tests.

Error message:

Error → ffi/util.lua @ 54
suite ./spec/front/unit/nickel_conf_spec.lua
ffi/util.lua:54: attempt to redefine '_FILETIME' at line 11

stack traceback:
        ffi/util.lua:54: in main chunk

@mpeterv
Copy link
Contributor

mpeterv commented Mar 2, 2016

@houqp so busted drops and reloads that module, too. So, yes, you'll need to add all modules that can't be safely reloaded in helper.lua.

@houqp
Copy link
Author

houqp commented Mar 2, 2016

That's why I said this workaround is not practical for us ;) Basically all our modules depends on FFI, that means we will have to put all of them in helper.lua...

@vsergeev
Copy link

As a workaround, I've had good results running busted with the option --no-auto-insulate, e.g.

busted --lua=luajit --no-auto-insulate tests

This does sacrifice the test insulation feature of busted, but it seems to prevent the crash caused by the ffi module being unanchored between tests.

@houqp
Copy link
Author

houqp commented Apr 19, 2016

@vsergeev thanks! I can also confirm adding --no-auto-insulate fixes all our random segfaults :)

@houqp
Copy link
Author

houqp commented Apr 19, 2016

I suggest we add this to the documentation.

@DorianGray
Copy link
Contributor

Alongside the documentation, I suggest we add a list of default preloaded libraries. We add common things that don't like to be reloaded, like ffi, by default, and allow users to add modules either in helper.lua or at the command line.

@houqp
Copy link
Author

houqp commented Apr 20, 2016

yep, that will work too, as long as the behavior is documented :)

@Tieske
Copy link
Member

Tieske commented Jun 17, 2017

for reference, a possible fix: lunarmodules/luassert#150

@Tieske
Copy link
Member

Tieske commented Oct 20, 2020

closing this, reloading ffi has been resolved

@Tieske Tieske closed this as completed Oct 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants