[libvirt] Failed to resync clusterversion because: error pool master is not ready #579

Closed
praveenkumar opened this issue Mar 26, 2019 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@praveenkumar
Contributor

Description

The machine-config cluster operator has been reporting Available=False for more than an hour because the master pool is not ready.

Steps to reproduce the issue:

  1. Use https://github.com/openshift/installer/blob/master/docs/user/customization.md#control-plane-with-no-taints to remove the taints from the master node.
  2. Edit $CLUSTER_DIR/openshift/99_openshift-cluster-api_master-machines-0.yaml and add worker as a label as well.
  3. Start the installer on the libvirt provider using these manifests.

Describe the results you received:

$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-03-26-031759   False       True          98m     Unable to apply 4.0.0-0.alpha-2019-03-26-031759: the cluster operator machine-config is failing

$ oc get co machine-config -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-03-26T04:58:04Z
  generation: 1
  name: machine-config
  resourceVersion: "35636"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: bd13d9ce-4f83-11e9-bb4d-52fdfc072182
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-03-26T04:58:04Z
    message: Cluster not available for 4.0.0-0.alpha-2019-03-26-031759
    status: "False"
    type: Available
  - lastTransitionTime: 2019-03-26T05:05:25Z
    message: Cluster version is 4.0.0-0.alpha-2019-03-26-031759
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-03-26T04:58:43Z
    message: 'Failed to resync 4.0.0-0.alpha-2019-03-26-031759 because: error pool
      master is not ready, retrying. Status: (total: 1, updated: 0, unavailable: 0)'
    reason: 'error pool master is not ready, retrying. Status: (total: 1, updated:
      0, unavailable: 0)'
    status: "True"
    type: Failing
  extension:
    master: 0 out of 1 nodes have updated to latest configuration rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
    worker: 0 out of 1 nodes have updated to latest configuration rendered-worker-47f6396a8d609a8736b04e759645cf97
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  versions:
  - name: operator
    version: 4.0.0-0.alpha-2019-03-26-031759

$ oc adm release info --commits | grep machine-config
  machine-config-controller                     https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  machine-config-daemon                         https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  machine-config-operator                       https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  machine-config-server                         https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  setup-etcd-environment                        https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221

$ oc logs machine-config-controller-5798685d97-2fs5t -n openshift-machine-config-operator
[...]
E0326 05:32:33.669656       1 reflector.go:134] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1.ControllerConfig: Get https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/controllerconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0326 05:32:44.556050       1 reflector.go:134] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1.MachineConfigPool: Get https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigpools?limit=500&resourceVersion=0: net/http: TLS handshake timeout
E0326 05:32:44.593023       1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.ClusterVersion: the server could not find the requested resource (get clusterversions.config.openshift.io)
E0326 05:32:44.606657       1 reflector.go:134] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1.MachineConfig: the server could not find the requested resource (get machineconfigs.machineconfiguration.openshift.io)
I0326 05:32:51.902071       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 05:32:51.903539       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-worker-47f6396a8d609a8736b04e759645cf97
I0326 05:32:52.235986       1 status.go:159] Node test1-tw7nx-master-0 unavailable: different configs true or node not ready false
I0326 05:32:57.236057       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 05:33:02.279574       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 06:01:37.160915       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 06:01:37.161176       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-worker-47f6396a8d609a8736b04e759645cf97
I0326 06:01:37.232144       1 status.go:159] Node test1-tw7nx-master-0 unavailable: different configs true or node not ready false
I0326 06:01:42.233005       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 06:01:47.278816       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 06:30:22.419391       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-worker-47f6396a8d609a8736b04e759645cf97
I0326 06:30:22.428071       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 06:30:22.521013       1 status.go:159] Node test1-tw7nx-master-0 unavailable: different configs true or node not ready false
I0326 06:30:27.516994       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be
I0326 06:30:32.558405       1 node_controller.go:467] Setting node test1-tw7nx-master-0 to desired config rendered-master-8d4bbd2f64dfc1bc50803d67ff1241be

$ oc get pods
NAME                                         READY   STATUS    RESTARTS   AGE
machine-config-controller-5798685d97-2fs5t   1/1     Running   1          101m
machine-config-daemon-kh4z9                  1/1     Running   0          101m
machine-config-operator-86d787cb5f-gn9vd     1/1     Running   1          103m
machine-config-server-htxwh                  1/1     Running   0          101m

Describe the results you expected:

The machine-config operator's Available status should not be False.

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc adm release info --commits | grep machine-config-operator:

$ oc adm release info --commits | grep machine-config-operator
  machine-config-controller                     https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  machine-config-daemon                         https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  machine-config-operator                       https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  machine-config-server                         https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221
  setup-etcd-environment                        https://github.com/openshift/machine-config-operator                       dc9b354d7b1c87d36c07c538c0387d09c04e8221

Additional environment details (platform, options, etc.):

cc @cgwalters @kikisdeliveryservice @runcom

@runcom
Member

runcom commented Mar 26, 2019

we need machine-config-daemon logs as well

@praveenkumar
Contributor Author

@runcom here you go.

$ oc logs machine-config-daemon-272hn -n openshift-machine-config-operator
[...]
I0326 11:43:37.751926    9465 daemon.go:738] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41973eb774db51c505f91d9a9428de4a578ffe5b8d9a7a48333300862f11af7f
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190322.0 (2019-03-22T20:35:08Z)
  pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/maipo@sha256:c09f455cc09673a1a13ae7b54cc4348cda0411e06dfa79ecd0130b35d62e8670
              CustomOrigin: Provisioned from oscontainer
                   Version: 400.7.20190306.0 (2019-03-06T22:16:26Z)
I0326 11:43:37.752097    9465 daemon.go:673] Current config: rendered-master-c801fea46ed76e4033e43f7eb7619be1
I0326 11:43:37.752112    9465 daemon.go:674] Desired config: rendered-worker-3dac17af8a7134d27be3011a4f3a1466
E0326 11:43:37.780693    9465 daemon.go:1129] content mismatch for file: "/etc/systemd/system/kubelet.service"
E0326 11:43:37.780832    9465 writer.go:119] Marking Degraded due to: unexpected on-disk state
W0326 11:43:37.826624    9465 daemon.go:292] Booting the MCD errored with unexpected on-disk state
I0326 11:43:37.827159    9465 run.go:22] Running captured: rpm-ostree status
I0326 11:43:37.953980    9465 daemon.go:738] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41973eb774db51c505f91d9a9428de4a578ffe5b8d9a7a48333300862f11af7f
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190322.0 (2019-03-22T20:35:08Z)
  pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/maipo@sha256:c09f455cc09673a1a13ae7b54cc4348cda0411e06dfa79ecd0130b35d62e8670
              CustomOrigin: Provisioned from oscontainer
                   Version: 400.7.20190306.0 (2019-03-06T22:16:26Z)
I0326 11:43:37.954280    9465 daemon.go:673] Current config: rendered-master-c801fea46ed76e4033e43f7eb7619be1
I0326 11:43:37.954343    9465 daemon.go:674] Desired config: rendered-worker-3dac17af8a7134d27be3011a4f3a1466
E0326 11:43:37.974997    9465 daemon.go:1129] content mismatch for file: "/etc/systemd/system/kubelet.service"
E0326 11:43:37.975297    9465 writer.go:119] Marking Degraded due to: unexpected on-disk state
W0326 11:43:38.047340    9465 daemon.go:292] Booting the MCD errored with unexpected on-disk state
I0326 11:43:38.047971    9465 run.go:22] Running captured: rpm-ostree status
I0326 11:43:38.175867    9465 daemon.go:738] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41973eb774db51c505f91d9a9428de4a578ffe5b8d9a7a48333300862f11af7f
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190322.0 (2019-03-22T20:35:08Z)
  pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/maipo@sha256:c09f455cc09673a1a13ae7b54cc4348cda0411e06dfa79ecd0130b35d62e8670
              CustomOrigin: Provisioned from oscontainer
                   Version: 400.7.20190306.0 (2019-03-06T22:16:26Z)
I0326 11:43:38.176055    9465 daemon.go:673] Current config: rendered-master-c801fea46ed76e4033e43f7eb7619be1
I0326 11:43:38.176095    9465 daemon.go:674] Desired config: rendered-worker-3dac17af8a7134d27be3011a4f3a1466
E0326 11:43:38.212600    9465 daemon.go:1129] content mismatch for file: "/etc/systemd/system/kubelet.service"
E0326 11:43:38.212714    9465 writer.go:119] Marking Degraded due to: unexpected on-disk state
W0326 11:43:38.268854    9465 daemon.go:292] Booting the MCD errored with unexpected on-disk state
I0326 11:43:38.271283    9465 run.go:22] Running captured: rpm-ostree status
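
The daemon log above repeats the same current/desired pair on every retry. As a rough sketch, the pairs can be pulled out and checked for a pool mismatch with a small filter. It assumes the `Current config:` / `Desired config:` line format shown above and the `rendered-<pool>-<hash>` naming with pool names that contain no dashes (such as master and worker). The function name `config_pairs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads machine-config-daemon log lines on stdin and prints
# each distinct "current desired" pair, marking pairs whose pool names differ.
# Assumes names look like rendered-<pool>-<hash> and <pool> has no dashes.
config_pairs() {
  awk '/Current config: / { cur = $NF }
       /Desired config: / {
         des = $NF
         # The pool name is the second dash-separated field.
         split(cur, c, "-"); split(des, d, "-")
         flag = (c[2] == d[2]) ? "same-pool" : "POOL-MISMATCH"
         print cur, des, flag
       }' | sort -u
}
```

It could be fed from the same command used above, for example `oc logs machine-config-daemon-272hn -n openshift-machine-config-operator | config_pairs`.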

@runcom
Member

runcom commented Mar 26, 2019

I0326 11:43:37.954280    9465 daemon.go:673] Current config: rendered-master-c801fea46ed76e4033e43f7eb7619be1
I0326 11:43:37.954343    9465 daemon.go:674] Desired config: rendered-worker-3dac17af8a7134d27be3011a4f3a1466
E0326 11:43:37.974997    9465 daemon.go:1129] content mismatch for file: "/etc/systemd/system/kubelet.service"
E0326 11:43:37.975297    9465 writer.go:119] Marking Degraded due to: unexpected on-disk state

looks like it's flipping from master to worker 🤔 @cgwalters ptal
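
One way to check the same flip from the controller side is to list, per node, every distinct rendered config the node controller tried to set. This is only a sketch: it assumes the `Setting node <node> to desired config <config>` line format from the machine-config-controller log earlier in this issue, and the function name `desired_configs_per_node` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads machine-config-controller log lines on stdin and
# prints "<node> <config> <count>" for each distinct desired config per node.
desired_configs_per_node() {
  awk '/Setting node .* to desired config / {
         # Pick the words that follow "node" and "config" on the line.
         for (i = 1; i < NF; i++) {
           if ($i == "node")   node = $(i + 1)
           if ($i == "config") cfg  = $(i + 1)
         }
         seen[node " " cfg]++
       }
       END { for (k in seen) print k, seen[k] }' | sort
}
```

For example: `oc logs machine-config-controller-5798685d97-2fs5t -n openshift-machine-config-operator | desired_configs_per_node`. A node that shows up with both a rendered-master and a rendered-worker config is being targeted by both pools.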

@cgwalters cgwalters self-assigned this Mar 26, 2019
@praveenkumar
Contributor Author

@cgwalters ping, let me know if you have anything to test this out.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 5, 2020
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 5, 2020
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
