
Raft worker errors #13325

Merged
merged 1 commit into juju:2.9 from SimonRichardson:raft-worker-errors on Sep 10, 2021

SimonRichardson
Member

To make it easier to see why a raft worker is restarting, the following change
logs the underlying error messages.

The problem is made worse by ErrStartTimeout masking the real error while we
wait for a new raft channel. Because the catacomb is killed with
ErrStartTimeout first, the original error message is lost.

QA steps

$ snap install juju --classic
$ /snap/bin/juju bootstrap lxd test
$ go get github.com/lxc/lxd@172613bf35d31b8e788f00fe93069b8d0d19b947
$ juju upgrade-controller --build-agent
$ juju debug-log -m controller

You should see the following error message:

machine-0: 16:26:49 ERROR juju.worker.raft Failed to setup raft instance, err: failed to get last log at index 52: msgpack decode error [pos 13]: invalid length of bytes for decoding time - expecting 4 or 8 or 12, got 15

Bug reference

https://bugs.launchpad.net/juju/+bug/1943075

We should revisit whether this is the best way to do this!
Contributor

@achilleasa achilleasa left a comment

LGTM.

However, given the potential for the returned errors to be masked, you may want to log them at Critical level so that they stand out in the logs.

@SimonRichardson
Member Author

$$merge$$

@jujubot jujubot merged commit 8d0e74c into juju:2.9 Sep 10, 2021
@achilleasa achilleasa mentioned this pull request Sep 13, 2021
jujubot added a commit that referenced this pull request Sep 13, 2021
#13329

This PR forward ports 2.9 into develop. The following PRs are included in this port:
 - Merge pull request #13324 from achilleasa/2.9-add-ovs-integration-test
 - Merge pull request #13328 from manadart/2.9-assess-series-upgrade
 - Merge pull request #13327 from wallyworld/azure-tests-fix
 - Merge pull request #13325 from SimonRichardson/raft-worker-errors
 - Merge pull request #13274 from SimonRichardson/lxd-network-devices-config-host-name
 - Merge pull request #13323 from jujubot/increment-to-2.9.15
 - Merge pull request #13321 from wallyworld/more-secret-metadata
 - Merge pull request #13319 from manadart/2.9-bridge-policy
 - Merge pull request #13320 from SimonRichardson/revert-lxd-changes
 - Merge pull request #13194 from juanmanuel-tirado/patch-1
 - Merge pull request #13318 from hpidcock/fix-1942948
 - Merge pull request #13314 from simondeziel/snap-ack
 - Merge pull request #13297 from achilleasa/2.9-allow-empty-openvswitch-blocks-in-netplan-config
 - Merge pull request #13317 from jujubot/increment-to-2.9.14
 - Merge pull request #13316 from hpidcock/fix-1942948
 - Merge pull request #13296 from ycliuhw/fix/registry-oauth2
 - Merge pull request #13311 from wallyworld/unitagent-missing-charm
 - Merge pull request #13315 from wallyworld/lxd-not-found-fix
 - Merge pull request #13221 from juanmanuel-tirado/status_watch_flag
 - Merge pull request #13312 from wallyworld/remove-txnwatcher-started
 - Merge pull request #13309 from kot0dama/fix-instrospection-posix-shell-2.9

The following files had merge conflicts that had to be resolved (please double-check the changes in last commit):
- caas/kubernetes/provider/bootstrap_test.go
- feature/flags.go
- scripts/win-installer/setup.iss
- snap/snapcraft.yaml
- state/pool.go
- version/version.go
- worker/uniter/relation/state_test.go
@SimonRichardson SimonRichardson deleted the raft-worker-errors branch June 22, 2023 16:11
3 participants