1042: Watchdog causing multiple restarts for mlbridge #1157

Closed
wants to merge 4 commits into from


Member

@erikj79 erikj79 commented May 14, 2021

When certain bots start with a fresh scratch area, they currently end up in a restart loop. All the threads immediately get busy cloning repos, which starves the watchdog pings for longer than the hard-coded 10 minutes. This patch changes the watchdog to use the "watchdog" configuration setting as the restart timeout instead. That value is currently used for a log warning that is also driven by the watchdog, so to keep the two values separate I've introduced a new option, "watchdog_warn", which can optionally be set for just the warning.
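To illustrate the split between the two timeouts, here is a minimal Java sketch. It is not the code from this patch: the class `WatchdogSketch`, its methods, the `Action` enum, the ISO-8601 duration strings, and the fallback of "watchdog_warn" to the "watchdog" value when unset are all assumptions made for the example. Only the two option names come from the description above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Hypothetical sketch, not Skara's implementation.
final class WatchdogSketch {
    enum Action { NONE, WARN, RESTART }

    private final Duration restartTimeout; // from "watchdog"
    private final Duration warnTimeout;    // from "watchdog_warn", if present
    private Instant lastPing;

    WatchdogSketch(Duration restartTimeout, Duration warnTimeout, Instant now) {
        this.restartTimeout = restartTimeout;
        this.warnTimeout = warnTimeout;
        this.lastPing = now;
    }

    // Assumption: values are ISO-8601 durations such as "PT30M", and the
    // warning threshold falls back to the restart timeout when unset.
    static WatchdogSketch fromConfig(Map<String, String> config, Instant now) {
        Duration restart = Duration.parse(config.get("watchdog"));
        Duration warn = config.containsKey("watchdog_warn")
                ? Duration.parse(config.get("watchdog_warn"))
                : restart;
        return new WatchdogSketch(restart, warn, now);
    }

    // Called by worker threads to signal that the runner is still alive.
    void ping(Instant now) {
        lastPing = now;
    }

    // Decide what to do based on how long it has been since the last ping.
    Action check(Instant now) {
        Duration silent = Duration.between(lastPing, now);
        if (silent.compareTo(restartTimeout) > 0) {
            return Action.RESTART;
        }
        if (silent.compareTo(warnTimeout) > 0) {
            return Action.WARN;
        }
        return Action.NONE;
    }
}
```

With "watchdog" set to 30 minutes and "watchdog_warn" to 10 minutes, a runner that is busy cloning for 15 minutes would only produce a warning in this sketch, and a restart would be triggered only after more than 30 minutes without a ping.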

In addition, I added a bit more logging to make it easier to follow in logstash when watchdog pings occur and when a new instance of a bot runner is started. Failures to start due to configuration errors are now also reported through proper log messages.


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • SKARA-1042: Watchdog causing multiple restarts for mlbridge

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/skara pull/1157/head:pull/1157
$ git checkout pull/1157

Update a local copy of the PR:
$ git checkout pull/1157
$ git pull https://git.openjdk.java.net/skara pull/1157/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1157

View PR using the GUI difftool:
$ git pr show -t 1157

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/skara/pull/1157.diff


bridgekeeper bot commented May 14, 2021

👋 Welcome back erikj! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title from "SKARA-1042" to "1042: Watchdog causing multiple restarts for mlbridge" May 14, 2021
@openjdk openjdk bot added the rfr label May 14, 2021

mlbridge bot commented May 14, 2021

Webrevs

Member

@kevinrushforth kevinrushforth left a comment

Looks good.

edvbld approved these changes May 17, 2021
Member

@edvbld edvbld left a comment

Looks good. Please see my question/comment in-line.


openjdk bot commented May 17, 2021

@erikj79 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

🔍 One or more changes in this pull request modifies files in areas of the source code that often require two reviewers. Please consider if this is the case for this pull request, and if so, await a second reviewer to approve this pull request before you integrate it.

After integration, the commit message for the final commit will be:

1042: Watchdog causing multiple restarts for mlbridge

Reviewed-by: kcr, ehelin

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 2 new commits pushed to the master branch:

  • f4a18b3: 1036: Extend bot config files to support comments
  • 8ae7a24: Add SKARA_JAVA_OPTS env variable to launchers

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label May 17, 2021
edvbld approved these changes May 17, 2021
Member

@edvbld edvbld left a comment

Looks good!

Member Author

erikj79 commented May 21, 2021

/integrate

@openjdk openjdk bot closed this May 21, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels May 21, 2021

openjdk bot commented May 21, 2021

@erikj79 Since your change was applied there have been 5 commits pushed to the master branch:

  • 3111cc4: 1029: Skara bot should prevent integration of PR with incorrect issue type
  • a3c582b: 1000: mlbridge stuck failing on closed PR targeting a branch that no longer exists
  • d73f9e5: metrics: add metrics for HotSpot
  • f4a18b3: 1036: Extend bot config files to support comments
  • 8ae7a24: Add SKARA_JAVA_OPTS env variable to launchers

Your commit was automatically rebased without conflicts.

Pushed as commit c1f8697.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

3 participants