Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious process manager test failures #2297

Closed
SeanTAllen opened this issue Oct 21, 2017 · 4 comments
Closed

Spurious process manager test failures #2297

SeanTAllen opened this issue Oct 21, 2017 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@SeanTAllen
Copy link
Member

SeanTAllen commented Oct 21, 2017

Some process manager tests (but not all), check the exit code. You can see this here:

https://github.com/ponylang/ponyc/blob/master/packages/process/_test.pony#L460

The problem is that despite the test running to completion in terms of process correct data, sometimes, _kill_child fails. Returning a -1 error which triggers "failed" being called on the notify which fails our test.

It appears that from the perspective of the test, everything worked fine, but for some reason, kill failed. There are two possibilities:

  1. the external process didn't die
  2. it was already shutdown

If its the latter, we want to make _kill_child smarter. If the process is already gone, then there is no need to throw an error. See: https://github.com/ponylang/ponyc/blob/master/packages/process/process_monitor.pony#L454

If the external process didn't die but the test otherwise completed fine, then really, there's nothing we can do on the Pony side and the test should pass.

Someone needs to dig in and determine exactly why these occassionally fail with a "KILL ERROR".

My advice to anyone taking this one, you'll want to run the standard library tests in a loop in order to trigger.

Something like:

while ./stdlib --sequential; do :; done
@mfelsche
Copy link
Contributor

mfelsche commented Nov 18, 2017

Some more tests failing spuriously:

Example outputs from jenkins job https://travis-ci.org/ponylang/ponyc/jobs/303769678 for commit: 398d361 :

LLVM 3.7.1:

**** FAILED: process/WritevOrdering
	Received from stdout: 11 bytes
	Received so far: 11 bytes
	Expecting: 11 bytes
dispose: child exit code: 0
dispose: stdout: 11 bytes
dispose: stderr: 
dispose: received first data after: 	7425822 ns
dispose: total data process_time: 	8143206 ns
dispose: ProcessNotify lifetime: 	15569028 ns
ProcessError: KillError

LLVM 3.9.1:

**** FAILED: process/STDERR
dispose: child exit code: 1
dispose: stdout: 0 bytes
dispose: stderr: cat: file_does_not_exist: No such file or directory
dispose: total data process_time: 	754961948132 ns
dispose: ProcessNotify lifetime: 	45210553 ns
ProcessError: KillError

@jemc
Copy link
Member

jemc commented Nov 19, 2017

There are two possibilities:

  • the external process didn't die
  • it was already shutdown

Hmm.. according to this man page for kill, there are three possibilities:

On success (at least one signal was sent), zero is returned. On error, -1 is returned, and errno is set appropriately.

  • EINVAL - An invalid signal was specified.
  • EPERM - The process does not have permission to send the signal to any of the target processes.
  • ESRCH - The pid or process group does not exist. Note that an existing process might be a zombie, a process which already committed termination, but has not yet been wait(2)ed for.

Of those, I think ESRCH is probably the only one that makes sense as being a sporadic failure, so it's probably that one.

So it seems we just need to update to swallow the error if errno == ESRCH?

@jemc jemc closed this as completed Nov 19, 2017
@jemc jemc reopened this Nov 19, 2017
@SeanTAllen
Copy link
Member Author

@jemc that seems like a reasonable first attempt.

@winksaville
Copy link
Contributor

I'll look into this.

winksaville added a commit to winksaville/ponyc that referenced this issue Dec 26, 2017
This provides a fix for ponylang#2297. I was able to replicate two spurious
failures. The first was KillError thrown by _kill_child, I've changed
the code so KillError is not thrown as there is really no reasonable
recovery that can be performed.

The second spurious error I saw was in _close where @waitpid would
return an error because the _child_pid was -1. This occurs in
packages/process/_test.pony::TestFileExecCapabilityIsRequired and
::_TestNonExecutablePathResultsInExecveError which are designed to fail
and _child_pid is always -1. This causes a failure to be always be
sent asynchronously and sometimes is actually gets reported.
failure is reported
winksaville added a commit to winksaville/ponyc that referenced this issue Dec 26, 2017
This provides a fix for ponylang#2297. I was able to replicate two spurious
failures. The first was KillError thrown by _kill_child, I've changed
the code so KillError is not thrown as there is really no reasonable
recovery that can be performed.

The second spurious error I saw was in _close where @waitpid would
return an error because the _child_pid was -1. This occurs in
packages/process/_test.pony::TestFileExecCapabilityIsRequired and
::_TestNonExecutablePathResultsInExecveError which are designed to fail
and _child_pid is always -1. This causes a failure to be always be
sent asynchronously and sometimes is actually gets reported.
failure is reported
winksaville added a commit to winksaville/ponyc that referenced this issue Dec 26, 2017
This provides a fix for ponylang#2297. I was able to replicate two spurious
failures. The first was KillError thrown by _kill_child, I've changed
the code so KillError is not thrown as there is really no reasonable
recovery that can be performed.

The second spurious error I saw was in _close where @waitpid would
return an error because the _child_pid was -1. This occurs in
packages/process/_test.pony::TestFileExecCapabilityIsRequired and
::_TestNonExecutablePathResultsInExecveError which are designed to fail
and _child_pid is always -1. This causes a failure to be always be
sent asynchronously and sometimes is actually gets reported.
failure is reported
winksaville added a commit to winksaville/ponyc that referenced this issue Dec 30, 2017
This provides a fix for ponylang#2297. I was able to replicate two spurious
failures. The first was KillError thrown by _kill_child, I've changed
the code so KillError is not thrown as there is really no reasonable
recovery that can be performed.

The second spurious error I saw was in _close where @waitpid would
return an error because the _child_pid was -1. This occurs in
packages/process/_test.pony::TestFileExecCapabilityIsRequired and
::_TestNonExecutablePathResultsInExecveError which are designed to fail
and _child_pid is always -1. This causes a failure to be always be
sent asynchronously and sometimes is actually gets reported.
failure is reported
winksaville added a commit to winksaville/ponyc that referenced this issue Dec 30, 2017
This provides a fix for ponylang#2297. I was able to replicate two spurious
failures. The first was KillError thrown by _kill_child, I've changed
the code so KillError is not thrown as there is really no reasonable
recovery that can be performed.

The second spurious error I saw was in _close where @waitpid would
return an error because the _child_pid was -1. This occurs in
packages/process/_test.pony::TestFileExecCapabilityIsRequired and
::_TestNonExecutablePathResultsInExecveError which are designed to fail
and _child_pid is always -1. This causes a failure to be always be
sent asynchronously and sometimes is actually gets reported.
failure is reported
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants