Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lua: 5.2 compatibility and other fixes #1586

Merged
merged 15 commits into from Jul 23, 2018

Conversation

Projects
None yet
4 participants
@grondo
Copy link
Contributor

grondo commented Jul 18, 2018

This PR updates the code in src/modules/wreck/luastack.c for compatibility with Lua 5.2.

Additionally, a minor testing fix is included (discovered on c9.io), along with the removal of accidentally committed debug code that used inspect.lua, which we do not install with the flux-core package.

Finally, Travis testing is updated to run at least one builder using lua5.2, to ensure continued support of that version. (have yet to see if that works)

@grondo grondo force-pushed the grondo:lua-5.2 branch from 30936f9 to bbe2113 Jul 18, 2018

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 18, 2018

Coverage Status

Coverage decreased (-0.009%) to 79.38% when pulling 345ccf2 on grondo:lua-5.2 into 03a1ac9 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 18, 2018

Restarted three stalled builders. Two stalled right after rolemask/loop.t and one after t4000-issues-test-driver.t.

Nice that this didn't end up being a lot of change!

Hmm, I thought I had fixed that setenv test before. Maybe it was in a PR that I withdrew? Sorry you had to stumble over it.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 18, 2018

Restarted the first two builders again. They had stalled in the same place as before (right after rolemask/loop.t).

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 18, 2018

Strange, those two are stuck in the same place again. I'll hold off rerunning in case travis is sad right now.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 18, 2018

Hm, that seems to be the last test run before the sharness tests start.
I'm not sure how this PR could have affected those tests as a whole, but perhaps I should try removing the last commit and retesting to see if including 2 lua packages is to blame.

@grondo grondo force-pushed the grondo:lua-5.2 branch from bbe2113 to 8d3a275 Jul 18, 2018

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 18, 2018

Ok, retrying without the travis-ci changes

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 18, 2018

without the lua 5.2 travis changes, a different test hung in a different place. Trying again with the travis-ci changes. If the first 2 builders hang again I'm giving up for now.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 18, 2018

Hangs again in the same spot. Let me try one more thing (not even sure my travis changes were working)

@grondo grondo force-pushed the grondo:lua-5.2 branch from a84f541 to b50a710 Jul 18, 2018

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 18, 2018

Still hanging. Let me make sure this is not a problem with the use of lua 5.2 before trying anything else

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 19, 2018

Ok, I think I found the issue. The version of luarocks in Ubuntu 14.04 only works with Lua 5.1, so the built luaposix module is also for 5.1, which the lua5.2 interpreter will happily load, but then hangs -- which happens in the sharness.d/flux-sharness.sh code here:

#  Test requirements for testsuite
if ! lua -e 'require "posix"'; then
    error "failed to find lua posix module in path"
fi

I'll try to find a different way to test 5.2.

grondo added some commits Jul 17, 2018

wreck/luastack: Lua 5.2 compatibility
Problem: luastack.c is not compatible with Lua 5.2, which removed
the LUA_GLOBALSINDEX special index. Also, per-coroutine globals
no longer seem to be supported, instead coroutines (as created by
luaL_newthread()) share global environment of the main Lua state,
so the hack to use these threads for each Lua script in order to get
a different per-script globals table no longer works.

Solution: Instead of using the hacky coroutine and shadow global table,
create a function environment (setfenv in 5.1, and _ENV/1st upvalue
in 5.2) for each compiled chunk (script), and save *that* table in
the registry, to later use to look up callbacks registered by the Lua
plugins. Dump the use of luaL_newthread(), it is no longer needed --
we weren't really using coroutines anyway.

Fixes #62
t2000-wreck-env.t: improve setenv all test
Problem: t2000-wreck-env.t 'setenv all' test fails if there are environment
variables that contain HOSTNAME, FLUX_URI, or ENVIRONMENT.

Restrict the regex used with `grep -ve` further to avoid false failures of
this test in some environments.
t/scripts/waitfile: remove useless use of setfenv
Remove useless use of setfenv (f, _G) in waitfile.lua.
_G is the default function environment, and setfenv()
is not available in Lua 5.2/5.3.
cmd/flux-wreck: remove debug code
Problem: Debug code in flux-wreck utility uses the 'inspect.lua'
module, which isn't installed along with the flux-core package.

Remove debug code to fix errors in installed package.

Fixes #1582
cmd/flux-exec: remove debug code
Problem: warning in flux-exec when receiving an unexpected messages
uses inspect.lua, which is not installed as part of flux-core.

Remove use of the package, just print a plain warning.
lua: keep existing LUA_PATH/CPATH for tests
Lua test environment in src/bindings/lua/Makefile.am was overwriting
existing LUA_PATH and LUA_CPATH, which thwarted efforts to use a lua
installation from non-standard path. Instead of overwriting these env
vars, prepend to them to allow the side installed Lua installation or
modules to be found.
lua: update test scripts to #!/usr/bin/env lua
A couple test scripts from externally included projects still had
hardcoded /usr/bin/lua, which breaks `make check` when using a
Lua from a different path.

Use #!/usr/bin/env lua instead.
wreck: add -export-dynamic to wrexecd
wrexecd may need to export dynamic symbols to plugins, most notably
if liblua is linked statically with the binary.
testsuite: ensure fluxometer.lua is found
Problem: if './?.lua' is not in LUA_PATH, then fluxometer.lua will not
be found during 'make check'.

Ensure the in-tree fluxometer.lua is found by adding its path explicitly
to LUA_PATH in AM_TESTS_ENVIRONMENT.

Fixes #1585

@grondo grondo force-pushed the grondo:lua-5.2 branch from e6d7fc8 to 385989b Jul 21, 2018

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 21, 2018

Ok, after many, many, many attempts, and a few fixes along the way, I think I have this PR testing against Lua 5.1 and 5.2. The changes are:

  • Don't install the lua apt packages, instead use hererocks, which is a helper script that installs Lua to separate roots, including a copy of luarocks. hererocks is installed by pip and used in the travis-dep-builder.sh
  • Use Lua 5.1 by default, otherwise check $LUA_VERSION exported from the travis-ci build matrix.
  • I had a problem where it seemed like the build matrix env vars (except CC) were not exported to the before_install script, so I moved everything in travis.yml to just the script
  • Other than that, there were just a few places where #!/usr/bin/lua was still hard coded, and some baffling LUA_PATH and LUA_CPATH issues that I think I've worked out for travis anyway.
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 21, 2018

Codecov Report

Merging #1586 into master will decrease coverage by 0.02%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master    #1586      +/-   ##
==========================================
- Coverage   79.21%   79.18%   -0.03%     
==========================================
  Files         171      171              
  Lines       31336    31340       +4     
==========================================
- Hits        24824    24818       -6     
- Misses       6512     6522      +10
Impacted Files Coverage Δ
src/modules/wreck/luastack.c 40.43% <88.88%> (+1.33%) ⬆️
src/common/libflux/mrpc.c 86.11% <0%> (-1.2%) ⬇️
src/broker/content-cache.c 72.88% <0%> (-0.64%) ⬇️
src/common/libkvs/kvs_watch.c 89.69% <0%> (-0.43%) ⬇️
src/broker/overlay.c 73.81% <0%> (-0.32%) ⬇️
src/common/libflux/handle.c 83.66% <0%> (-0.25%) ⬇️
src/common/libflux/message.c 80.9% <0%> (-0.24%) ⬇️
src/common/libutil/dirwalk.c 94.28% <0%> (+0.71%) ⬆️
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 21, 2018

It occurs to me that the travis-dep-builder.sh changes here will affect flux-sched builds, which use the same script.

I should probably make the hererocks setup optional so that it is not active by default.

@grondo grondo force-pushed the grondo:lua-5.2 branch 2 times, most recently from e9db3d4 to e7665e6 Jul 21, 2018

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 21, 2018

Updated the travis-dep-builder.sh script so that it only uses hererocks when LUA_VERSION is set or the -L, --lua-version option is used. This should make it so no changes are required in flux-sched once this PR is merged.

I also figured out why I thought environment variables from the .travis.yml weren't being set in the before_install script (my fault), so I removed the changes to move everything into script:, which reduced the amount of churn in the PR.

We'll see if this actually builds though...

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 21, 2018

Sadly, now getting python test errors in the LUA_VERSION=5.2 builds only. 😠

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 21, 2018

Ok, finally got the error out of Travis for the python build and I'm still mystified. This only happens with LUA_VERSION=5.2, I guess some kind of environment issue? Anyone got any hints on what might be wrong here?

TAP version 13
#Running doctests and unit tests in: /home/travis/build/grondo/flux-core/flux-core-0.7.0-1887-g4ec8b021/src/bindings/python/test/wrapper.py
ok 1 test.wrapper.TestWrapper.test_automatic_unwrapping
PASS: ../t/t9990-python-tests.t 1 test.wrapper.TestWrapper.test_automatic_unwrapping
ok 2 test.wrapper.TestWrapper.test_call_insufficient_arguments
PASS: ../t/t9990-python-tests.t 2 test.wrapper.TestWrapper.test_call_insufficient_arguments
ok 3 test.wrapper.TestWrapper.test_call_invalid_argument_type
PASS: ../t/t9990-python-tests.t 3 test.wrapper.TestWrapper.test_call_invalid_argument_type
ok 4 test.wrapper.TestWrapper.test_call_non_existant
PASS: ../t/t9990-python-tests.t 4 test.wrapper.TestWrapper.test_call_non_existant
ok 5 test.wrapper.TestWrapper.test_masked_function
PASS: ../t/t9990-python-tests.t 5 test.wrapper.TestWrapper.test_masked_function
ok 6 test.wrapper.TestWrapper.test_read_basic_value
PASS: ../t/t9990-python-tests.t 6 test.wrapper.TestWrapper.test_read_basic_value
ok 7 test.wrapper.TestWrapper.test_set_pimpl_handle
PASS: ../t/t9990-python-tests.t 7 test.wrapper.TestWrapper.test_set_pimpl_handle
Traceback (most recent call last):
  File "../../../src/bindings/python/test_commands/test_runner.t", line 53, in <module>
ok 8 test.wrapper.TestWrapper.test_set_pimpl_handle_invalid
    run_under_dir(result_set, '../test', 'test')
  File "../../../src/bindings/python/test_commands/test_runner.t", line 46, in run_under_dir
    run_tests_with_size(result_set, suite, size)
  File "../../../src/bindings/python/test_commands/test_runner.t", line 16, in run_tests_with_size
PASS: ../t/t9990-python-tests.t 8 test.wrapper.TestWrapper.test_set_pimpl_handle_invalid
    with sideflux.run_beside_flux(size):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/travis/build/grondo/flux-core/flux-core-0.7.0-1887-g4ec8b021/src/bindings/python/test_commands/sideflux.py", line 130, in run_beside_flux
    f.start()
  File "/home/travis/build/grondo/flux-core/flux-core-0.7.0-1887-g4ec8b021/src/bindings/python/test_commands/sideflux.py", line 92, in start
    raise EnvironmentError(self.sub.poll())
EnvironmentError: 1
close failed in file object destructor:
IOError: [Errno 9] Bad file descriptor
Exception OSError: (3, 'No such process') in <bound method SideFlux.__del__ of <sideflux.SideFlux object at 0x2af697300ad0>> ignored
#Running doctests and unit tests in: /home/travis/build/grondo/flux-core/flux-core-0.7.0-1887-g4ec8b021/src/bindings/python/test/handle.py
ERROR: ../t/t9990-python-tests.t - missing test plan
ERROR: ../t/t9990-python-tests.t - exited with status 1

grondo added some commits Jul 20, 2018

travis-ci: require luaposix 33.0.4
Don't allow luarocks to install the latest luaposix.
Instead specifically request v33.0.4, which is compatible with Flux.
travis-ci: allow testing with different lua versions
Allow testing with different Lua versions by optionally using the hererocks
package in travis-dep-builder.sh. The hererocks script builds and installs
Lua versions to different roots (including luarocks), which will allow
testing with multiple versions much more tractable.

The use of hererocks is only activated if LUA_VERSION is set, or a new
-L, --lua option is used.
travis-ci: build with Lua 5.1 and 5.2
Update .travis.yml to set LUA_VERSION for every build in order to trigger the
use of hererocks support in `travis-dep-builder.sh`.

Remove the lua* APT packages so they don't accidentally creep back into the
build, which can cause difficult to track down problems, especially when
Lua 5.1 and 5.2 packages are mixed together.
travis-ci: dump all test-suite.log on failure
Find and dump all test-suite.log on failure. Prepend output with a
searchable string to allow for easy finding in the travis-ci logs.
travis-ci: only dump config.log on config failure
The config.log dump in travis-ci after_failure is way too many lines,
and is only useful when ./configure fails. Therefore, check configure
status and only output config.log on failure.
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 22, 2018

I was able to get some more information by setting SIDEFLUX_DEBUG in the travis environment and the failure above is what you get when flux start fails. In this case (not surprisingly) this was a Lua issue. I believe the problem is some Lua modules are being linked statically with liblua.a, and Lua 5.2 especially doesn't like this when liblua.so is also dynamically linked.

#Running doctests and unit tests in: /home/travis/build/grondo/flux-core/flux-core-0.7.0-1887-g797b3840/src/bindings/python/test/handle.py
flux-start: warning: setting --bootstrap=selfpmi due to --size option

2018-07-21T22:53:02.612216Z broker.err[0]: rc1: lua: multiple Lua VMs detected

2018-07-21T22:53:02.612345Z broker.err[0]: rc1: stack traceback:

2018-07-21T22:53:02.612364Z broker.err[0]: rc1: 	[C]: ?

2018-07-21T22:53:02.612368Z broker.err[0]: rc1: 	[C]: in function 'require'

2018-07-21T22:53:02.612371Z broker.err[0]: rc1: 	...887-g797b3840/_build/sub/../../src/cmd/flux-hostlist:30: in main chunk

2018-07-21T22:53:02.612376Z broker.err[0]: rc1: 	[C]: ?

2018-07-21T22:53:02.617863Z broker.err[0]: rc1: lua: multiple Lua VMs detected

2018-07-21T22:53:02.617980Z broker.err[0]: rc1: stack traceback:

2018-07-21T22:53:02.618012Z broker.err[0]: rc1: 	[C]: ?

2018-07-21T22:53:02.618037Z broker.err[0]: rc1: 	[C]: in function 'require'

2018-07-21T22:53:02.618065Z broker.err[0]: rc1: 	...887-g797b3840/_build/sub/../../src/cmd/flux-hostlist:30: in main chunk

2018-07-21T22:53:02.618102Z broker.err[0]: rc1: 	[C]: ?

2018-07-21T22:53:02.618363Z broker.err[0]: Run level 1 Exited with non-zero status (rc=1) 0.4s

flux-start: 0 (pid 21411) exited with rc=1

    return self.gen.next()
  File "/home/travis/build/grondo/flux-core/flux-core-0.7.0-1887-g797b3840/src/bindings/python/test_commands/sideflux.py", line 130, in run_beside_flux
    f.start()
  File "/home/travis/build/grondo/flux-core/flux-core-0.7.0-1887-g797b3840/src/bindings/python/test_commands/sideflux.py", line 92, in start
    raise EnvironmentError(self.sub.poll())
EnvironmentError: 1
close failed in file object destructor:
IOError: [Errno 9] Bad file descriptor
Exception OSError: (3, 'No such process') in <bound method SideFlux.__del__ of <sideflux.SideFlux object at 0x2ad0215aead0>> ignored

I'm not sure why this was a problem in the sideflux environment only (all other tests pass), but the problem was due to some lua APT packages still being installed in Travis. Once I removed those, the issue went away.

@grondo grondo force-pushed the grondo:lua-5.2 branch from e7665e6 to 345ccf2 Jul 22, 2018

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 23, 2018

Nice work running down all these tricky issues!

@garlick garlick merged commit 5ad5396 into flux-framework:master Jul 23, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.009%) to 79.38%
Details
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 23, 2018

Thanks!

@grondo grondo referenced this pull request Jul 23, 2018

Closed

wreck: segfault in wrexecd #1592

@grondo grondo deleted the grondo:lua-5.2 branch Jul 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.