Service Fabric service upgrade not working #595
Comments
Before you actually try to deploy it: does the existing package pass Test-ServiceFabricApplicationPackage if you point the cmdlet at it? |
How are your services configured: how many replicas (min/target)? In the upgrade status you pasted, the upgrade fails because the UD timeout is exhausted. Inside the UD, the upgrade is stuck at PreUpgradeSafetyCheck. This is a check we perform to ensure availability: we don't proceed with the upgrade until we are sure the application has enough replicas to function properly. Two of the partitions mentioned above are stuck. The upgrade set UpgradeReplicaSetCheckTimeout to 20 minutes. This is the time-out period for checking whether the replica set has quorum. After the time-out period, the upgrade proceeds. If you set the UD timeout to a value less than the replica set check timeout (for example, 10 minutes) and moving the replica out of the node could cause quorum loss, the upgrade will fail (which is the correct behavior, since the main purpose of a monitored upgrade is to maintain availability). As a side note, UpgradeReplicaSetCheckTimeout is deprecated; you should use the UpgradeReplicaSetCheckTimeoutSec parameter instead. The articles on upgrade parameters and on troubleshooting app upgrades tell you more. |
@masnider Nope. It gives me the following error, but I don't think it should expect to see the ServiceManifest.xml file if it's a differential packaging update.
|
@oanapl It is a very very simple example... No partitioning, no stateful services.... Just 4 stateless services. Also, the InstanceCount for every service is -1. If I am missing something else, please do let me know :-) |
Pass the ImageStoreConnectionString to Test-ServiceFabricApplicationPackage so that it uses the previously deployed package for validation. Are you using a one-node cluster? If you have one stateless instance and nowhere to move it, the upgrade waits until UpgradeReplicaSetCheckTimeout passes before it continues. |
@haseeb-ahmed-tkxel @oanapl Yes. For the most part this is probably physical layout, so let's make sure that works first. The Test package command needs to succeed before we can expect the actual deployment to work. Passing the image store connection address tells SF to check the image store for a package to delta from if this one is differential. https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade-advanced#upgrade-with-a-diff-package may help a little. Upgrade is a second thing. Let's get there once we are sure that the differential package is correct. |
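The validation step suggested in the two comments above can be sketched roughly as follows. This is a minimal illustration, not the exact command used in the thread: the package path is hypothetical, and the image store connection string is a placeholder that depends on the cluster (`fabric:ImageStore` is the value commonly used on Azure clusters, while a local development cluster typically uses a different, file-based value).

```powershell
# Hypothetical values: adjust both to your environment.
$packagePath = 'C:\Temp\MyApp\pkg\Debug'   # hypothetical path to the (differential) package
$imageStore  = 'fabric:ImageStore'         # placeholder; a local dev cluster usually differs

# With no arguments this connects to the local cluster endpoint.
Connect-ServiceFabricCluster

# Supplying the image store connection string lets validation look up the
# previously provisioned package, which a differential package depends on.
Test-ServiceFabricApplicationPackage `
    -ApplicationPackagePath $packagePath `
    -ImageStoreConnectionString $imageStore
```

If validation still fails with the connection string supplied, the package layout itself is the likelier culprit, which is why the comments above suggest settling this before looking at the upgrade.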
Hello, I'm also experiencing this. I have created a default app with 2 default stateless services. I created a package and deployed it with
Instead, I see the error specified above. Could you help with this? |
@makar-sasha, when you say "updated cluster version", I assume you mean the app manifest version? These are the steps that should work:
Then version 2:
You provision version 1. Can you validate that this is what you did? |
yes I have tried this flow. I have found it works if I'm adding thanks for your comments! |
@dbreshears, can you take a look at the publish profile issue? |
I guess I am not seeing the issue with the publish profile. Deploy-FabricApplication.ps1 is just a wrapper around the scripts installed as part of the SDK. When an upgrade is specified in the publish profile, it calls the Publish-UpgradedServiceFabricApplication script; otherwise it calls Publish-NewServiceFabricApplication. The issue, if I understand correctly, seems to be that Test-ServiceFabricApplicationPackage expects a path to a full package when the Publish-NewServiceFabricApplication script invokes it, but the Publish-UpgradedServiceFabricApplication script does not. |
@makar-sasha, it looks like our SDK script does not pass the ImageStoreConnectionString when calling Test-ServiceFabricApplicationPackage. I opened a tracking issue to improve our scripts. As a mitigation, can you change your local scripts to pass the parameter? Or you can call the PowerShell cmdlets directly if that's more convenient. |
@anamkhalid, have you changed your local scripts to pass the image store connection string, as I mentioned in my previous reply? Without this, the deployment will fail. You can also run all the deployment steps through PowerShell cmdlets. |
@oanapl Yes, I used PowerShell cmdlets to pass the image store connection string. |
Thank you for the update, I am glad you are unblocked. We will change our scripts to pass the ImageStoreConnectionString in our next major release (6.2). |
@oanapl Actually not :) I was just confirming that I did pass the image store connection string, but the results are still the same: the Update Domain timeout issue that Haseeb mentioned in his top comment. Are you able to get rid of this issue on your end by passing the image store connection string? |
My bad, you said in your previous post that you used PowerShell to pass the image store connection string, and I assumed this worked. Did you mean you called PowerShell directly and that worked? @amanbha is fixing this on our end. |
@oanapl Here is the current status: |
The upgrade fails because of safety checks. See my first reply above for more upgrade-related resources. Basically, it can't safely move replicas out of the node to proceed with the upgrade, because moving them out can affect availability. How many nodes are in the cluster? If you have 1 node, there's nowhere to move the app and the upgrade will fail. Since a 1-node cluster is for testing purposes, you can pass a small UpgradeReplicaSetCheckTimeoutSec value to the upgrade command to tell the cluster it's OK to move the replicas immediately. If your cluster has more than 1 node, how many services do you have in the app and how are they configured (number of partitions and number of replicas)? Can you paste the PowerShell command you used to start the upgrade? |
@oanapl It seems that adding UpgradeReplicaSetCheckTimeoutSec to the upgrade command resolves the issue. I copied the command below from Upgrade Using Powershell:

Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/VisualObjects -ApplicationTypeVersion 2.0.0.0 -HealthCheckStableDurationSec 60 -UpgradeDomainTimeoutSec 1200 -UpgradeTimeout 3000 -FailureAction Rollback -Monitored

The upgrade fails unless UpgradeReplicaSetCheckTimeoutSec is specified explicitly. |
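As a rough sketch of the fix described in this comment, the quoted command could be extended with the extra parameter like this. The application name and version come from the command above; the value of 5 seconds is an illustrative assumption for a one-node test cluster, not a value stated in the thread, and an existing `Connect-ServiceFabricCluster` session is assumed.

```powershell
# Parameters collected in a hashtable and splatted, to avoid line continuations.
$upgradeArgs = @{
    ApplicationName                  = 'fabric:/VisualObjects'
    ApplicationTypeVersion           = '2.0.0.0'
    HealthCheckStableDurationSec     = 60
    UpgradeDomainTimeoutSec          = 1200
    UpgradeTimeoutSec                = 3000
    UpgradeReplicaSetCheckTimeoutSec = 5      # assumed value; keep it below the UD timeout
    FailureAction                    = 'Rollback'
    Monitored                        = $true
}

Start-ServiceFabricApplicationUpgrade @upgradeArgs

# Check progress; this is the cmdlet the original poster used to report status.
Get-ServiceFabricApplicationUpgrade -ApplicationName 'fabric:/VisualObjects'
```

A short replica set check timeout trades availability for progress, so it suits a single-node test cluster rather than a production one, where the safety check is the point of a monitored upgrade.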
Ok, this is expected behavior for a one node cluster. Closing the issue based on this. |
So, let me start at the beginning... We recently upgraded our SF project from VS2015 to VS2017 and noticed that the SF application deploys successfully only about one time in three. This happens on multiple machines, not just one. Not being sure whether this was a VS thing or an SF thing, I thought it was time to focus on differential packaging, so that I could do smaller updates instead of full deployments. That should result in faster deployments, and hence my VS2017 wouldn't time out.
Now, switching my focus to differential packaging, I made a very, very simple SF application with only 4 ServiceTypes. Then I did the following:
Now, I've tried multiple variations of it, and I've also tried resetting my local cluster and whatnot, but have had no luck. Here is the result of my "Get-ServiceFabricApplicationUpgrade" command.
I'm not sure what I might be doing wrong here so please help me out, Thanks!