Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wreck/lua: fix crash due to Lua stack overflow #1594

Merged
merged 2 commits into from Jul 24, 2018

Conversation

Projects
None yet
4 participants
@grondo
Copy link
Contributor

grondo commented Jul 23, 2018

Problem: The Lua stack is not reset anywhere in luastack.c when
loading and running Lua wrexecd plugins. We got away with this
when each script was run in its own lua_thread (coroutine), because
the stacks were not shared between scripts, but when the use of
coroutines was removed, the stack growth was large enough to cause
crashes in some tests that ran many tasks.

Fix the bug by resetting the Lua stack before executing a callback
function in each lua script.

Fixes #1592

wreck/lua: fix crash due to Lua stack overflow
Problem: The Lua stack is not reset anywhere in luastack.c when
loading and running Lua wrexecd plugins. We got away with this
when each script was run in its own lua_thread (coroutine), because
the stacks were not shared between scripts, but when the use of
coroutines was removed, the stack growth was large enough to cause
crashes in some tests that ran many tasks.

Fix the bug by resetting the Lua stack before executing a callback
function in each lua script.

Fixes #1592
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 23, 2018

@garlick, if you could sanity check this fix on your desktop juuuust to double-check I'd appreciate it! 😉

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 23, 2018

Coverage Status

Coverage increased (+0.07%) to 79.506% when pulling 9806423 on grondo:issue#1592 into 3760c0b on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 23, 2018

Codecov Report

Merging #1594 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1594      +/-   ##
==========================================
- Coverage   79.26%   79.23%   -0.04%     
==========================================
  Files         171      171              
  Lines       31340    31341       +1     
==========================================
- Hits        24842    24833       -9     
- Misses       6498     6508      +10
Impacted Files Coverage Δ
src/modules/wreck/luastack.c 40.76% <100%> (+0.32%) ⬆️
src/modules/connector-local/local.c 72.76% <0%> (-1.43%) ⬇️
src/common/libflux/future.c 90.7% <0%> (-0.45%) ⬇️
src/modules/kvs/kvs.c 65.87% <0%> (-0.17%) ⬇️
src/common/libflux/message.c 80.78% <0%> (-0.12%) ⬇️
src/common/libflux/handle.c 83.66% <0%> (ø) ⬆️
src/broker/modservice.c 81.55% <0%> (+0.97%) ⬆️
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 23, 2018

I de-installed a liblua5.2 package earlier today and seem to have screwed up my lua environment. Many tests fail with e.g.

lua: (command line):1: module 'posix' not found:
	no field package.preload['posix']
	no file './posix.lua'
	no file './posix.so'
	no file '/usr/local/lib/lua/5.1/posix.so'
	no file '/usr/lib/x86_64-linux-gnu/lua/5.1/posix.so'
	no file '/usr/lib/lua/5.1/posix.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
	[C]: in function 'require'
	(command line):1: in main chunk
	[C]: ?
error: failed to find lua posix module in path
ERROR: t0000-sharness.t - missing test plan
ERROR: t0000-sharness.t - exited with status 1

yet I have the lua-posix and lua-posix-dev packages installed. I reverted back to master and that is failing too. Standby...

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 23, 2018

My current guess is that ffdb702 broke make check. I can reproduce this if I unset LUA_PATH in my environment. Maybe a stupid human error! (in case it is not clear, stupid human is me)

t/Makefile.am: fix make check failure when LUA_PATH not set
Problem: In t/Makefile.am, `./?.lua` is prepended to LUA_PATH in
order to ensure that the build tree's `fluxomter.lua` is found,
even if LUA_PATH doesn't contain the current working directly.
This doesn't work if LUA_PATH is currently empty, since you end
up with *only* `./?.lua` in LUA_PATH.

Since the default LUA_PATH already contains the current dir, we
only need to prepend to it if it is already set. So, do nothing
when LUA_PATH not set.
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 23, 2018

Oh, duh. If we prepend ./?.lua to LUA_PATH in an environment when LUA_PATH is not already set, then only the current working directory will be in the path, and no other modules can be found.

Fix the fix to only prepend to LUA_PATH if LUA_PATH is set.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 23, 2018

This works for me - thanks!

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 24, 2018

Restarted one builder that failed with the

make[3]: write error

which I don't think we've seen in a while, but clearly not related to this PR.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 24, 2018

Travis is happy. Thanks @grondo!

@garlick garlick merged commit 4830a90 into flux-framework:master Jul 24, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.07%) to 79.506%
Details
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 24, 2018

Thanks!

@grondo grondo deleted the grondo:issue#1592 branch Jul 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.