Rolling update not possible with a Persistent Volume on Azure. #52236
Comments
For what it's worth, I'm not sure what kind of clean fix can be accomplished here. An outage is effectively required in order to detach the Page Blob from one VM, and allow it to be re-attached to a new VM in Azure. Perhaps the ideal fix would be for Azure to fix their crusty old Storage system 😈
Honing in on this a bit more, I believe this isn't limited to only when a new agent VM is added to the cluster. I believe that any
/sig azure
Closing as resolved.
/kind bug
What happened:
With an existing cluster running existing workloads, I used the Azure command line tool (`az`) to add a new agent machine to the cluster. Some time later I performed a `rolling-update` on a pod, which never completed.
What you expected to happen:
I expected the `rolling-update` to complete (obviously) and appropriately/successfully move the containers from the first agent to the newly created one.
How to reproduce it (as minimally and precisely as possible):
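1. Add a new agent machine to the cluster (`az acs scale -n my-cluster-name -g my-resource-group --new-agent-count 2`)
2. Perform a rolling update on a pod that uses a Persistent Volume (`kubectl rolling-update my-persistent-pod`)
Anything else we need to know?: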
The gist of what I understand to be happening here is that the Page Blobs in the Azure Storage Account are held in a Leased state by the machine `agent-0`, because `pod-alpha` is using a Persistent Volume.
When the rolling-update occurs, Kubernetes appropriately tries to provision `pod-beta` on `agent-1`, which at this point is not running anything. Unfortunately, because the machine `agent-0` has already Leased the Page Blob in Azure Storage, the Page Blob cannot be attached to the `agent-1` VM.
Work Around
The work-around I applied, which seems to work, was to fully delete the Replication Controller (not the Persistent Volume), which released the Lease on the Page Blob. Then I recreated the Replication Controller, which caused the Page Blob to be leased by `agent-1`.
That seemed to work, but incurred a few unacceptable minutes of downtime 🙁
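For anyone else hitting this, the delete-and-recreate sequence above could be scripted roughly as follows. This is only a sketch, not a tested fix: the `recreate_rc` function, the `KUBECTL` override variable, and the manifest path are hypothetical names introduced for illustration, and it assumes the Replication Controller's manifest is already saved to a file. It still incurs the downtime described above.

```shell
#!/bin/sh
# Sketch of the work-around: delete and recreate the Replication Controller
# so the Lease on the Page Blob is released and can be taken by another agent.
# KUBECTL is a hypothetical override so the sequence can be dry-run with a stub.
KUBECTL="${KUBECTL:-kubectl}"

recreate_rc() {
  rc_name="$1"    # name of the Replication Controller (hypothetical argument)
  manifest="$2"   # path to its saved manifest file (hypothetical argument)

  # Refuse to delete anything unless the manifest needed to recreate it exists.
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi

  # Delete only the Replication Controller, never the Persistent Volume.
  "$KUBECTL" delete rc "$rc_name" || return 1

  # Recreate it from the manifest so a fresh pod gets scheduled.
  "$KUBECTL" create -f "$manifest"
}
```

Example call, with a made-up manifest path: `recreate_rc my-persistent-pod ./my-persistent-pod-rc.yaml`. The manifest check comes first so a missing file cannot leave the workload deleted with nothing to recreate it from.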
Environment:
This environment was deployed on Azure via the Azure Container Service.
FYI @kris-nova