
Fixed the issue Unclear buildbot failure email #71

Merged
merged 1 commit into from
Nov 23, 2023


slydiman
Contributor

#65

Listen to the `header` TTY stream and watch for messages such as `command timed out: 1200 seconds without output running [b'ninja', b'check-clang-unit'], attempting to kill` sent from a worker.
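A minimal sketch of the idea, assuming header text arrives as plain `str` chunks that may split a line anywhere. The class `HeaderTimeoutWatcher` and its `feed` method are hypothetical; the `killed`/`killReason` attribute names follow the ones mentioned in review. This is illustrative, not the code from the actual patch.

```python
from typing import Optional

# Prefix of the message quoted in the description; everything after it is
# treated as the reason for the failure.
TIMEOUT_PREFIX = "command timed out: "


class HeaderTimeoutWatcher:
    """Hypothetical observer for a step's `header` stream.

    Header text can arrive in arbitrary chunks, so a trailing partial line is
    buffered until a later chunk completes it with a newline.
    """

    def __init__(self) -> None:
        self._pending = ""
        self.killed = False
        self.killReason: Optional[str] = None

    def feed(self, chunk: str) -> None:
        data = self._pending + chunk
        # All but the last element are complete lines; the last one is
        # either empty or an unfinished line kept for the next chunk.
        *lines, self._pending = data.split("\n")
        for line in lines:
            if line.startswith(TIMEOUT_PREFIX):
                self.killed = True
                self.killReason = line[len(TIMEOUT_PREFIX):].strip()


watcher = HeaderTimeoutWatcher()
# The worker wraps the message in newlines; here it is split across two chunks.
watcher.feed("\ncommand timed out: 1200 seconds without output running ")
watcher.feed("[b'ninja', b'check-clang-unit'], attempting to kill\n")
print(watcher.killed, watcher.killReason)
```

Buffering by line avoids missing the message when the stream delivers it in pieces.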

@slydiman
Contributor Author

slydiman commented Nov 13, 2023

The email before this patch:

Subject: ☠ Buildbot (<SERVER>): <builder> - failed test (failure) (main)
                
The Buildbot has detected a new failure on builder <builder> while building llvm.
...                
BUILD FAILED: failed test (failure)
               
Step 8 (test-build-unified-tree-check-clang-unit) failure: test (failure)
...

The email after this patch:

Subject: ☠ Buildbot (<SERVER>): <builder> - failed 1200 seconds without output running [b'ninja', b'check-clang-unit'], attempting to kill (main)
                
The Buildbot has detected a new failure on builder <builder> while building llvm.
...                
BUILD FAILED: failed 1200 seconds without output running [b'ninja', b'check-clang-unit'], attempting to kill
                
Step 8 (test-build-unified-tree-check-clang-unit) failure: 1200 seconds without output running [b'ninja', b'check-clang-unit'], attempting to kill
...
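The difference between the two emails comes down to which text follows "failed" in the summary. A minimal sketch, assuming a hypothetical `failure_summary` helper that prefers the captured timeout reason over the generic step result, and a hypothetical `email_subject` that mirrors the subject layout shown above (the `(main)` suffix is assumed here to be the branch name, which buildbot's own formatting appends):

```python
from typing import Optional


def failure_summary(step_text: str, timeout_reason: Optional[str] = None) -> str:
    """Hypothetical helper: use the timeout reason, if one was captured,
    instead of the generic step result text such as "test (failure)"."""
    return "failed " + (timeout_reason or step_text)


def email_subject(server: str, builder: str, summary: str, branch: str) -> str:
    # Mirrors the subject layout of the example emails; the trailing
    # "(branch)" part is assumed to come from buildbot's own formatting.
    return f"☠ Buildbot ({server}): {builder} - {summary} ({branch})"


reason = ("1200 seconds without output running "
          "[b'ninja', b'check-clang-unit'], attempting to kill")
print(email_subject("<SERVER>", "<builder>", failure_summary("test (failure)"), "main"))
print(email_subject("<SERVER>", "<builder>", failure_summary("test (failure)", reason), "main"))
```

The first call reproduces the "before" subject line and the second the "after" one.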

@gkistanova
Contributor

Thanks, Dmitry!

Is it possible to skip/remove the ", attempting to kill (main)" trailing part of the string? This does not seem relevant to the problem summary.

Everything else looks good.

@slydiman
Contributor Author

> Is it possible to skip/remove the ", attempting to kill (main)" trailing part of the string? This does not seem relevant to the problem summary.

Note that `(main)` is part of buildbot's own summary formatting.
Currently, the worker contains the following code:

    def kill(self, msg):
        # This may be called by the timeout, or when the user has decided to abort this build.
        self._cancelTimers()
        msg += ", attempting to kill"
        log.msg(msg)
        self.send_update([('header', "\n" + msg + "\n")])
        ...

We can adjust the regex to skip `, attempting to kill`, but I'd rather keep it. A timeout alone doesn't tell you what finally happened; `attempting to kill` explains why some tests may be incorrectly marked as FAIL.
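Both options discussed here fit in one regex if the suffix appended by `kill()` is an optional named group. A minimal sketch under that assumption; the pattern, `timeout_summary`, and the `keep_kill_suffix` flag are hypothetical, with the default matching the author's preference to keep the suffix:

```python
import re
from typing import Optional

# Hypothetical pattern: a lazy "reason" group followed by an optional
# ", attempting to kill" suffix, anchored to the end of the line.
TIMEOUT_RE = re.compile(
    r"^command timed out: (?P<reason>.+?)(?P<suffix>, attempting to kill)?$",
    re.MULTILINE,
)


def timeout_summary(header_text: str, keep_kill_suffix: bool = True) -> Optional[str]:
    """Return the timeout reason from header text, or None if there is none."""
    match = TIMEOUT_RE.search(header_text)
    if match is None:
        return None
    reason = match.group("reason")
    # The suffix group is None when the worker did not append it.
    if keep_kill_suffix and match.group("suffix"):
        reason += match.group("suffix")
    return reason


header = ("\ncommand timed out: 1200 seconds without output running "
          "[b'ninja', b'check-clang-unit'], attempting to kill\n")
print(timeout_summary(header))
print(timeout_summary(header, keep_kill_suffix=False))
```

Because the reason group is lazy, it stops right before the suffix when the suffix is present, and otherwise extends to the end of the line.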

Contributor

@vvereschaka vvereschaka left a comment

Names like `reachedTimeout` and `timeoutReason` would probably be more accurate than `killed` and `killReason`, but it looks OK for now.

Copy link
Contributor

@gkistanova gkistanova left a comment

LGTM.

@slydiman slydiman merged commit ea1f5a4 into llvm:main Nov 23, 2023
@slydiman slydiman deleted the fix-issue-65 branch November 23, 2023 10:48