Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CALL /OUTPUT throws exception "Bus error: 10" and dies #537

Closed
draegtun opened this issue Jun 8, 2017 · 17 comments
Closed

CALL /OUTPUT throws exception "Bus error: 10" and dies #537

draegtun opened this issue Jun 8, 2017 · 17 comments
Assignees
Labels

Comments

@draegtun
Copy link

draegtun commented Jun 8, 2017

Example:

>> x: {}
== {}

>> call/shell/output {netstat -a} x
Bus error: 10

Without the /output refinement it works fine:

>> call/shell {netstat -a}
... get output in console ...

The /output issue is same without /shell refinement:

>> call [%/usr/sbin/netstat]
... again output ok to console ...

>> call/output [%/usr/sbin/netstat "-a"] x
Bus error: 10

Now netstat was deliberately chosen because it does work fine on smaller stuff:

>> call/shell/output {ls} x
;; works - x has the output

So i'm assuming some buffering issue so anything big crashes with this bus error :(

BTW - This is a newish issue because it works fine on r3-0f831cb (22-Mar-2017).

Test built from latest source (default make) on Mac OS/X 10.9.5

UPDATE - Along with the crash I also get a fatal error with this example:

>> call/shell/output {GIT_PAGER=cat git log} x
fatal: write failure on 'stdout': Resource temporarily unavailable
Bus error: 10

Interesting I get same fatal error when using r3-0f831cb but no crash?

>> call/shell/output {GIT_PAGER=cat git log} x
fatal: write failure on 'stdout': Resource temporarily unavailable
== 128

>> length? x
== 5387

So it has copied over partial data to x before throwing that fatal error.

@gchiu
Copy link

gchiu commented Jun 8, 2017

and if you use x: copy {} ?

@draegtun
Copy link
Author

draegtun commented Jun 8, 2017

@gchiu - Still dies with Bus error: 10 :(

Does that mean you don't get this crash on Windows (with or without copy)?

@hostilefork
Copy link
Member

hostilefork commented Jun 9, 2017

I found two problems which were introduced in switching the ALLOC_N macros to malloc(). I fixed those, and also made the operation fail out early if you try to pass it locked series to use for the stdout or stderr capture:

11da999

But with that addressed, it still seems to have problems with outputs that need to be larger than the buffer "chunk" size used to read from the pipe. It seems that long output is clipped, because even when it hasn't processed all the output it's ending the gathering loop because WIFEXITED(status) thinks that the process is done. And it seems there's a sort of process-race-condition, so you don't get the same behavior stepping in the debugger as you do when you don't.

Basically, it seems the child process is not waiting just because the parent isn't done reading the data yet.
@zsx - since you wrote the original logic, can you take a look?

@draegtun
Copy link
Author

draegtun commented Jun 9, 2017

@hostilefork

The good news is that this fixes the Bus error: 10 crash and so now netstat -a does work

However bad news is I'm still getting fatal: write failure on 'stdout': Resource temporarily unavailable on GIT_PAGER=cat git log

Here's couple of other interesting examples...

>> x: make string! 100'000
== ""

>> call/shell/wait/output {git --no-pager log} x
fatal: write failure on 'stdout': Resource temporarily unavailable
== 128

>> length-of x
== 4609

>> clear x
== ""

>> call/shell/wait/output {git log | less -F} x
== 0

>> length-of x
== 4609

>> clear x
== ""

>> call/shell/wait/output {git log} x        
fatal: write failure on 'stdout': Resource temporarily unavailable
== 128

>> length? x
== 4609

So always get first 4609 characters regardless since your fix (was 5387 before). And always get fatal warning/error except when piped (which isn't surprising!).

So at the moment CALL is struggling with something like git which paginates... even if i switch off the pagination :(

@hostilefork
Copy link
Member

hostilefork commented Jun 9, 2017

Well, I knew it wouldn't make it work. I just got rid of a couple crashing mistakes I'd introduced. But it helps to have tests.

One problem with these tests though is that they'd be platform dependent in terms of what commands you run, and perhaps hard to predict what output you got. I guess we could remedy this perhaps by running Rebol itself... based on the executable path... with arguments that ran a script with predictable generative output? If you have interest in helping with that, it would be great.

In any case, @zsx and I talked about changing how the memory buffering is done anyway, so you don't use twice as much memory as you need to for the operation. So maybe he can take a crack at stepping over what's happening with an eye toward that change.

@draegtun
Copy link
Author

draegtun commented Jun 9, 2017

@hostilefork - I've created some tests which hopefully will cover this problem - #541

Test 1 - Passes

Test 2 - Fails. This is where we get partial data (usually first 4609 bytes but it does sometime vary).

Test 3 - Crashes. Somewhere over 50,000 bytes received will this causes the following error:

Assertion failed: (result == 0), function OS_Form_Error, file ../src/os/posix/host-error.c, line 150.
Abort trap: 6

@draegtun
Copy link
Author

draegtun commented Jun 10, 2017

@hostilefork - Update

I've run the CALL tests on Linux (Ubuntu 12.04.5 32-bit) and now test 2 crashes rather than failing returning incomplete data:

  • Test 1 - Passes

  • Tests 2 & 3 - Crashes with following error:

    r3: ../src/os/posix/host-error.c:150: OS_Form_Error: Assertion `result == 0' failed.
    Aborted (core dumped)
    

I have found with little tinkering of test that anything over 5000 bytes received causes the crash.

@zsx
Copy link

zsx commented Jun 12, 2017

@draegtun The test cases should be fixed, but the git log is still failing, and I'm not sure how we can fix that: git was failing because it couldn't write to the output (the pipe is full because the rebol process is not reading as fast as git is writing). A larger buffer could mitigate the issue, it could still be filled up anyway.

@draegtun
Copy link
Author

@zsx - Thanks for fix.

I can confirm that all tests do now PASS on Mac OSX. I'll test on Linux later.

And I added a new test for big git log (see #544) and it PASSES for me! Does this new test FAIL for you?

NB. This new test does require git command-line so not sure if its platform independent test? If it doesn't work or can't be adapted to Windows then we can remove.

@hostilefork
Copy link
Member

@zsx Is that something specific about git log? :-/ I'd assume that it does standard I/O, which should block as long as the pipe isn't ready, shouldn't it?

@draegtun
Copy link
Author

draegtun commented Jun 12, 2017

@hostilefork - For me the git log crash/fail appears to be fixed? Don't know if it's a Windows issue for @zsx or was fixed by another commit post comment?

@zsx - I can now confirm that ALL tests (including new git log test) have succeeded (PASSED) on Linux.

@zsx
Copy link

zsx commented Jun 12, 2017

@draegtun For me, git log failed on Linux due to it can't write to standard output:
fatal: write failure on 'stdout': Resource temporarily unavailable

The strace looks like:

32060 read(4, "m in these cases.\n\ncommit 578f94"..., 4096) = 4096
32060 mmap(NULL, 630784, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa8a0ebe000
32061 write(1, "\ncommit 4492f5811d52e5419b5264ab"..., 538) = 538
32061 write(1, "\ncommit cd316c7d939e5edb89097bb7"..., 657) = 657
32061 write(1, "\ncommit 112da013fe2324d32bdc6a0e"..., 1287) = 1287
32061 write(1, "\ncommit 1daba3ff13f07ddbe9fd016e"..., 364) = 364
32061 write(1, "\ncommit ccac32755180f78b1d798f59"..., 747) = 747
32061 write(1, "\ncommit 2d995cf9c9c0835601b0859f"..., 1316) = -1 EAGAIN (Resource temporarily unavailable)
32061 open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC <unfinished ...>
32060 munmap(0x7fa8a0f58000, 626688 <unfinished ...>
32061 <... open resumed> )              = 4
32061 fstat(4, {st_mode=S_IFREG|0644, st_size=2997, ...}) = 0
32060 <... munmap resumed> )            = 0
32061 read(4,  <unfinished ...>
32060 read(4,  <unfinished ...>
32061 <... read resumed> "# Locale name alias data base.\n#"..., 4096) = 2997
32060 <... read resumed> "ommit b19d4912f340efb07fafddf016"..., 4096) = 4096
32060 mmap(NULL, 634880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa8a0e23000
32061 read(4, "", 4096)                 = 0
32061 close(4)                          = 0
32061 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
32061 write(2, "fatal: write failure on 'stdout'"..., 67) = 67
32061 exit_group(128)                   = ?
32060 munmap(0x7fa8a0ebe000, 630784)    = 0
32060 read(4, "test\n    for if a function is ei"..., 4096) = 4096
32060 mmap(NULL, 638976, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa8a0f55000
32061 +++ exited with 128 +++
32060 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=32061, si_uid=1000, si_status=128, si_utime=3, si_stime=0} ---

Here process 32061 is git, and 32060 is rebol. So git exited right after the writing failure (where write returns EAGAIN)

However, if I increase BUF_SIZE_CHUNK to 40960, it passes.

@zsx
Copy link

zsx commented Jun 12, 2017

git log also passes if I redirect the output to a file instead of a string.

@draegtun
Copy link
Author

draegtun commented Jun 13, 2017

@zsx - Hmmmm, so we're getting different results at the moment :(

For me all tests, including the git log test in /tests/call.test.reb are passing on...

  • Mac OSX 10.9.5 (32-bit & 64-bit)
  • Ubuntu 12.04.5 32-bit (running under VMWare)
  • Ubuntu 14.04.5 64-bit

All the above were built today using latest source.

@zsx
Copy link

zsx commented Jun 13, 2017

@draegtun I think it's a race condition between rebol and git, if git runs slower, it works fine, or it fails.

@zsx
Copy link

zsx commented Jun 15, 2017

I think this one should now be fixed.

@draegtun
Copy link
Author

Well done @zsx

And I can confirm that the 4 call tests do still pass for me on all my platforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants