Segmentation Fault #171

Closed
lipp opened this issue Jun 3, 2013 · 19 comments · Fixed by #173

@lipp
Contributor

lipp commented Jun 3, 2013

I know segmentation faults are improbable (supposedly impossible) for pure-Lua modules, but here is what I am observing with 9c221c4:

https://travis-ci.org/lipp/lua-websockets/builds/7674777

This does not happen with busted 1.9.0 from the LuaRocks repository:
https://travis-ci.org/lipp/lua-websockets/builds/7729683
(I know that build is failing anyway...)

@Tieske
Member

Tieske commented Jun 4, 2013

What runtime did you use? Lua 5.1, Lua 5.2, or LuaJIT?

@lipp
Contributor Author

lipp commented Jun 4, 2013

I have Lua 5.1 and LuaJIT installed. The busted call does not specify the interpreter. Calling busted --lua=lua and busted --lua=luajit also gave a segfault.

I looked at the code and didn't find where the CLI argument --lua is processed... does this work right now?

@Tieske
Member

Tieske commented Jun 5, 2013

--lua is handled in the bootstrap code (the shell scripts) located in ./bin.

@lipp
Contributor Author

lipp commented Jun 5, 2013

On OSX the situation is a bit different: the test blocks (hangs) forever instead of segfaulting.
The scenario can be boiled down to running busted spec/ev_common_spec.lua (from the lua-websockets project dir).
I guess the segfault is just another symptom of the same problem with how the async loop processes its pending work.

@Tieske
Member

Tieske commented Jun 5, 2013

Segfaulting Lua is hard; LuaJIT is somewhat easier, so I would suspect it is related to LuaJIT. Have you tried the latest (just released) version?

@lipp
Contributor Author

lipp commented Jun 5, 2013

Just tried this (on OSX):

busted -l lua spec/ev_common_spec.lua

●○/usr/local/bin/busted: line 41: 95435 Segmentation fault: 11  $COMMAND $BOOTSTRAP_PATH $*
busted spec/ev_common_spec.lua

●○^C^C

So Lua 5.1.5 crashes, whereas LuaJIT hangs.

@Tieske
Member

Tieske commented Jun 5, 2013

Beats me...

The first thing that comes to mind would be stack size, due to recursion. But this seems to happen in the first test, so that's unlikely?

I had a quick look, and lua-websockets is pure Lua, isn't it? Maybe it's best to reduce the problem further and post it to the Lua mailing list.

@lipp
Contributor Author

lipp commented Jun 5, 2013

Yes, lua-websockets is pure Lua. I think it is the second test that hangs / crashes. The strange thing is that the official 1.9.0 release does not show this behaviour... I'll investigate deeper.

lipp added a commit to lipp/busted that referenced this issue Jun 5, 2013
@lipp
Contributor Author

lipp commented Jun 5, 2013

Got it: in init.lua, pcall is replaced by copcall!
That's why the different loops provided their own pcall.

#173
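The thread does not show what #173 changes, so the following is only a sketch of the general mechanism behind the comment above: assigning to the global pcall (as init.lua reportedly did with copcall) changes the function seen by every module that looks up pcall at call time, while a module that captured pcall in a local when it loaded keeps the original. `lib_uses_global`, `lib_uses_local`, and the counting replacement are hypothetical stand-ins, not busted or lua-websockets code.

```lua
-- Hypothetical library that looks up the *global* pcall each time it runs.
local function lib_uses_global(f)
  return pcall(f)
end

-- Hypothetical library that captured pcall in a local when it was loaded.
local captured_pcall = pcall
local function lib_uses_local(f)
  return captured_pcall(f)
end

-- Stand-in for a runner replacing the global pcall (busted's init.lua
-- reportedly assigned copcall here). This replacement only counts calls.
local native_pcall = pcall
local replaced_calls = 0
pcall = function(...)
  replaced_calls = replaced_calls + 1
  return native_pcall(...)
end

lib_uses_global(function() end)  -- goes through the replacement
lib_uses_local(function() end)   -- still reaches the native pcall

pcall = native_pcall             -- restore the global
print("calls through replaced pcall:", replaced_calls)
```

Only the library that resolves the global at call time is affected, which is why a global swap can change the behaviour of code (such as an event loop) that never asked for it.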

@Tieske
Member

Tieske commented Jun 5, 2013

IMO this is a workaround. It should not happen in the first place.
What loop do you use for websockets?

coxpcall is (supposed to be) transparent, and a segfault is a failure regardless; Lua is supposed to be a safe language. Would it be possible to reduce the problem further? This really smells like a Lua bug.
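For context on why coxpcall exists: in Lua 5.1 the native pcall is a C-call boundary, so a coroutine cannot yield from inside a function that pcall protects. coxpcall works around this by running the protected function in its own coroutine and forwarding any yields. The sketch below shows that idea in simplified form; `copcall_sketch` is a hypothetical name and this is not coxpcall's actual source. One way it is not fully transparent: the protected function runs in a different coroutine than its caller, so `coroutine.running()` inside it returns that inner coroutine.

```lua
-- Simplified sketch of a coroutine-based pcall (the idea behind coxpcall).
-- Hypothetical name; not coxpcall's actual implementation.
local function copcall_sketch(f, ...)
  -- Run f in its own coroutine instead of under the native pcall.
  local co = coroutine.create(f)

  -- Handle one result of coroutine.resume(co, ...).
  local function step(ok, ...)
    if not ok then
      -- f raised an error: report it the way pcall would.
      return false, ...
    end
    if coroutine.status(co) == "suspended" then
      -- f yielded: pass the yield up to our own caller, then resume f
      -- with whatever values that caller resumes us with.
      return step(coroutine.resume(co, coroutine.yield(...)))
    end
    -- f returned normally.
    return true, ...
  end

  return step(coroutine.resume(co, ...))
end

-- Usage: a protected function that yields across the "pcall".
local outer = coroutine.create(function()
  return copcall_sketch(function()
    local v = coroutine.yield("paused")
    return v * 2
  end)
end)
print(coroutine.resume(outer))      -- the inner yield reaches the outer resume
print(coroutine.resume(outer, 21))  -- resuming continues the protected function
```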

@lipp
Contributor Author

lipp commented Jun 5, 2013

lua-websockets has implementations for ev and copas, but the affected test is ev-based.

@Tieske
Member

Tieske commented Jun 5, 2013

Then ev is the other suspect...

@DorianGray
Contributor

I saw this happen in the first pass at the busted code, in the file loader. It had trouble with deeply nested directories because each level of nesting added a few layers to the stack, and stack sizes in Lua differ from OS to OS. I think I worked around it by moving the directory traversal code into a coroutine.
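A minimal sketch of the approach described above, under stated assumptions: a hypothetical in-memory table stands in for the directory tree (tables are directories, strings are files), since reading real directories needs a filesystem library such as LuaFileSystem. The recursive walk runs inside a coroutine created by `coroutine.wrap`, so its frames live on that coroutine's own Lua stack rather than the caller's, and the caller simply iterates over the yielded paths. `walk` and `files` are illustrative names, not busted's actual loader code.

```lua
-- Hypothetical directory tree: tables are directories, strings are files.
local tree = {
  spec = {
    ["a_spec.lua"] = "",
    sub = { ["b_spec.lua"] = "" },
  },
}

-- Recursive walk that yields each file path instead of collecting them.
local function walk(node, prefix)
  for name, child in pairs(node) do
    local path = prefix .. "/" .. name
    if type(child) == "table" then
      walk(child, path)        -- descend into a subdirectory
    else
      coroutine.yield(path)    -- hand one file back to the caller
    end
  end
end

-- Run the walk inside its own coroutine; the caller gets an iterator.
local function files(root_node, root_path)
  return coroutine.wrap(function() walk(root_node, root_path) end)
end

local found = {}
for path in files(tree, ".") do
  found[#found + 1] = path
end
table.sort(found)  -- pairs() order is unspecified
for _, p in ipairs(found) do print(p) end
```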

@lipp
Contributor Author

lipp commented Jun 6, 2013

I asked on the Lua mailing list for ideas: http://permalink.gmane.org/gmane.comp.lang.lua.general/100273

@Tieske
Member

Tieske commented Jun 6, 2013

Recursion should produce a stack overflow error, never a segfault. I ran a simple test:

C:\Users\Thijs>type test.lua

print(_VERSION)

function f() f() end
f()



C:\Users\Thijs>test.lua
Lua 5.1
C:\Users\Public\Lua\5.1lfw\lua.exe: C:\Users\Thijs\test.lua:4: stack overflow
stack traceback:
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        ...
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:4: in function 'f'
        C:\Users\Thijs\test.lua:5: in main chunk
        [C]: ?

C:\Users\Thijs>
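The same point can be checked from within Lua: running the same runaway recursion under pcall turns the overflow into an ordinary, catchable error value rather than a crash. This is a minimal sketch; the exact message text varies between Lua versions and LuaJIT, so only the "stack overflow" part of it is assumed.

```lua
print(_VERSION)

-- Same unbounded recursion as in test.lua. The call is a plain statement,
-- not "return f()", so it is not a tail call and every call adds a frame.
local function f() f() end

-- Run it under pcall: the overflow comes back as an error value.
local ok, err = pcall(f)
print(ok, err)
```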

@Tieske
Member

Tieske commented Jun 6, 2013

@lipp any chance of running this in a debugger and tracing where it goes wrong? Probably start with a debug build of ev.

@lipp
Contributor Author

lipp commented Jun 10, 2013

The reason for the segfault still needs to be clarified.

@Tieske
Member

Tieske commented Jun 10, 2013

Fabio suggested running the faulty test under valgrind. You may want to give that a try.

@lipp
Contributor Author

lipp commented Jun 11, 2013

This is deep :(
There is too much valgrind output to share yet, but valgrind is definitely complaining.
It will take me some time to figure it out (other high-priority stuff on my desk).
