
Waiting for app ReplicaSet be marked available indefinitely #1978

Status: Closed
Labels: kind/bug (Some behavior is incorrect or out of spec), resolution/fixed (This issue was fixed)

jsravn opened this issue Apr 26, 2022 · 11 comments
jsravn commented Apr 26, 2022

What happened?

A completely healthy deployment gets stuck on "Waiting for app ReplicaSet be marked available" despite all replicas being fully available.

Steps to reproduce

It's not yet clear to me how to reproduce this reliably; it seems to happen sporadically. I also see quite a few existing issues describing the same problem - I guess it's not completely fixed?

Expected Behavior

It should notice the deployment is healthy and proceed.

Actual Behavior

```
Do you want to perform this update? yes
Updating (infra.pre-dev):
     Type                                    Name                 Status       Info
     pulumi:pulumi:Stack                     infra-infra.pre-dev  running      read kubernetes:core/v1:Secret gke-cluster-addons-feed-original-tls
     └─ core_gcp:infra:cluster_addons        gke-cluster-addons
        ├─ core:gcp:lib:traefik              traefik-internal
 ~      │  └─ kubernetes:apps/v1:Deployment  traefik-internal     updating...  [diff: ~spec]; [1/2] Waiting for app ReplicaSet be marked available (3/3 Pods available)
        └─ core:gcp:lib:traefik              traefik-external
 ~         └─ kubernetes:apps/v1:Deployment  traefik-external     updating     [diff: ~spec]; [1/2] Waiting for app ReplicaSet be marked available (1/1 Pods available)
```

ReplicaSet status:

```yaml
status:
  availableReplicas: 3
  fullyLabeledReplicas: 3
  observedGeneration: 11
  readyReplicas: 3
  replicas: 3
```

Deployment status:

```yaml
status:
  availableReplicas: 3
  conditions:
  - lastTransitionTime: "2022-04-26T10:41:05Z"
    lastUpdateTime: "2022-04-26T10:41:05Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-04-13T09:10:26Z"
    lastUpdateTime: "2022-04-26T10:41:26Z"
    message: ReplicaSet "traefik-internal-b28bb768-6547995858" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 24
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3
```
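By Kubernetes' own criteria, these statuses look healthy. For comparison, here is a quick client-side check of the same signals (a sketch using @kubernetes/client-node; this is not Pulumi's actual await implementation, and the namespace and name are placeholders):

```typescript
import * as k8s from "@kubernetes/client-node";

// Sketch: evaluate the same readiness signals shown in the status dumps above.
// Not Pulumi's await logic - just the standard Deployment health signals.
async function isDeploymentAvailable(ns: string, name: string): Promise<boolean> {
  const kc = new k8s.KubeConfig();
  kc.loadFromDefault();
  const apps = kc.makeApiClient(k8s.AppsV1Api);

  const dep = (await apps.readNamespacedDeployment(name, ns)).body;
  const status = dep.status;
  if (!status || !dep.spec) return false;

  // The controller has observed the latest spec,
  const observed = (status.observedGeneration ?? 0) >= (dep.metadata?.generation ?? 0);
  // all replicas are updated and available,
  const replicasOk =
    status.updatedReplicas === dep.spec.replicas &&
    status.availableReplicas === dep.spec.replicas;
  // and the Available condition is True.
  const available = (status.conditions ?? []).some(
    (c) => c.type === "Available" && c.status === "True",
  );
  return observed && replicasOk && available;
}

// Placeholder namespace/name; substitute your own.
isDeploymentAvailable("my-namespace", "traefik-internal").then(console.log);
```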

Versions used

```
CLI
Version      3.30.0
Go Version   go1.17.9
Go Compiler  gc

Plugins
NAME        VERSION
gcp         6.20.0
gcp         6.20.0
kubernetes  3.18.2
kubernetes  3.18.2
nodejs      unknown
random      4.4.2
random      4.4.2

Host
OS       nixos
Version  21.11 (Porcupine)
Arch     x86_64

This project is written in nodejs (/nix/store/46g0dmf6rcpikbzs22y7w4amyg0ciksi-nodejs-16.14.2/bin/node v16.14.2)
```

Additional context

I tried running `pulumi refresh` beforehand, but it made no difference. The cluster is on Kubernetes 1.22.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

jsravn added the kind/bug label Apr 26, 2022
jsravn commented Apr 26, 2022

Unsure why, but after updating my pulumi-kubernetes provider from 3.18.2 to 3.18.3, it started working.

jsravn closed this as completed Apr 26, 2022
jsravn commented May 6, 2022

This is still happening to us. Not sure what triggers it yet.

jsravn reopened this May 6, 2022
stack72 added and removed the kind/bug label May 9, 2022
stack72 added this to the 0.72 milestone May 9, 2022
jsravn commented May 25, 2022

This keeps happening to us and we haven't quite figured out what triggers it. I think it may be related to performing a refresh before doing `pulumi up`, since it happens pretty reliably when I run `pulumi up --refresh --skip-preview --yes`, whereas it never seems to happen if I skip the refresh.

lukehoban modified the milestones: 0.72, 0.73 Jun 3, 2022
stack72 removed this from the 0.73 milestone Jun 27, 2022
Shrooblord commented

Hey there! Thanks for the insight into pulumi refresh. I routinely do pulumi refresh before a pulumi up and seem to be running into the same problem. However, if I do pulumi refresh --> pulumi up --> error --> pulumi up, I still get the same errors you describe in the first post.

Is there anything you do to 'reset' pulumi so the refresh bug doesn't come up?

thanhtoan1196 commented

Any update?

kralikba commented

I'm having the same problem with Pulumi.Kubernetes 3.23.1 (a C#/.NET 7 project). Any ideas or updates?

jsravn commented Jan 16, 2023

It's been a while since I looked at this, but I believe after a lot of debugging we found that this happens if something in the cluster modifies the pod spec after deployment. That seems to break Pulumi's ability to perform an upgrade when the live spec no longer matches what it last applied. It should be easy to reproduce by manually modifying a pod spec and seeing whether Pulumi can still update it.
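If something like a mutating admission webhook is rewriting the pod spec, one escape hatch in the meantime is the provider's `pulumi.com/skipAwait` annotation, which opts the resource out of readiness waiting entirely (or `pulumi.com/timeoutSeconds`, which caps the wait instead). A sketch with placeholder names and image, not the actual traefik config from this stack:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Workaround sketch: opt this Deployment out of Pulumi's await logic.
// Names and image are placeholders.
const labels = { app: "traefik-internal" };

new k8s.apps.v1.Deployment("traefik-internal", {
  metadata: {
    annotations: {
      "pulumi.com/skipAwait": "true",
      // Alternatively, cap the wait instead of skipping it:
      // "pulumi.com/timeoutSeconds": "120",
    },
  },
  spec: {
    replicas: 3,
    selector: { matchLabels: labels },
    template: {
      metadata: { labels },
      spec: { containers: [{ name: "traefik", image: "traefik:v2.6" }] },
    },
  },
});
```

Note that with `skipAwait`, `pulumi up` reports success without waiting for the rollout to complete.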

lblackstone self-assigned this Jul 14, 2023
lblackstone added the resolution/fixed label Jul 14, 2023
lblackstone commented

We fixed some bugs related to refresh in #2445 so I'm hoping this is fixed in v4. I'm going to close as resolved, but please let us know if you're still seeing the error after upgrading.

RouxAntoine commented

Hello,
I'm using github.com/pulumi/pulumi-kubernetes/sdk/v4 v4.5.3 and I'm seeing the same behavior.

[screenshot omitted]

The ReplicaSet status looks good:

```yaml
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
```

There are no events on the ReplicaSet:

```
Events:                            <none>
```

The Deployment reports these conditions:

```
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  ampere-1-deployment-66221e9a-5d8ff544c6 (0/0 replicas created), ampere-1-deployment-66221e9a-d447578c7 (0/0 replicas created), ampere-1-deployment-66221e9a-6f5f89f6f5 (0/0 replicas created), ampere-1-deployment-66221e9a-659c47cc96 (0/0 replicas created), ampere-1-deployment-66221e9a-7996b77fd9 (0/0 replicas created), ampere-1-deployment-66221e9a-5bf5546fd6 (0/0 replicas created)
NewReplicaSet:   ampere-1-deployment-66221e9a-8f74b44bb (1/1 replicas created)
Events:          <none>
```

Hope this helps.

jtmarmon commented

hey @lblackstone @jsravn, I'm running into this very reliably, and I think your assessment

> after a lot of debugging we found this happens if something in the cluster is modifying the pod spec after deployment

is likely accurate. I'm on GCP GKE using Autopilot mode, and I run into this only when I modify my deployment's pod spec resource limits. I can modify resource requests with no problem, but changing resource limits reliably causes the operation to hang. I presume this has something to do with Autopilot magic around resource limits causing the pod spec modification you mentioned.
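For what it's worth, Autopilot is documented to mutate container resources at admission (it generally sets limits equal to requests), so one mitigation to try is declaring them equal up front, which should make that mutation a no-op. A sketch with placeholder names, image, and sizes:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Mitigation sketch for GKE Autopilot: declare limits equal to requests so
// Autopilot's admission-time mutation leaves the pod spec exactly as applied.
// Names, image, and sizes are placeholders.
const resources = {
  requests: { cpu: "500m", memory: "512Mi" },
  limits: { cpu: "500m", memory: "512Mi" }, // intentionally identical
};

const labels = { app: "my-app" };
new k8s.apps.v1.Deployment("my-app", {
  spec: {
    replicas: 1,
    selector: { matchLabels: labels },
    template: {
      metadata: { labels },
      spec: { containers: [{ name: "app", image: "nginx:1.25", resources }] },
    },
  },
});
```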

lblackstone commented

Thanks for the update. I opened #2662 to track the bug with the new info you provided.
