Service Fabric service upgrade not working #595

ghost · 2017-11-02T12:49:35Z

So, let me start at the beginning... We recently upgraded our SF project from VS2015 to VS2017 and noticed that the SF application deploys successfully only about one time in three. This happens on multiple machines, not just one. Since I wasn't sure whether this was a VS issue or an SF issue, I decided it was time to focus on differential packaging. Smaller updates instead of full deployments should make deployments faster, so VS2017 wouldn't time out.

Switching my focus to differential packaging, I made a very simple SF application with only 4 service types. Then I did the following:

Deployed v2.0.0 of the newly created application to the local cluster through PowerShell.
Updated Service3 and the manifest file to v2.0.1.
Repackaged the application and manually removed all service folders except Service3, keeping the ApplicationManifest.xml file.
Performed an upgrade through PowerShell, and it failed.
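The selection rule behind these steps can be sketched in Python. This is a minimal illustration, not SDK code. A differential package keeps ApplicationManifest.xml plus only those service packages whose referenced version differs from what is already provisioned. The function name, the dictionary representation of the manifest references, and the version strings are hypothetical. The sketch works at whole-service-package granularity and does not model omitting unchanged code, config, or data packages inside a service package.

```python
def diff_package_services(provisioned, new):
    """Return the service packages a differential package has to ship.

    `provisioned` and `new` map service package names to the versions
    referenced by the currently provisioned and the new ApplicationManifest.xml.
    A service package must be included if it is new or its version changed;
    unchanged ones can be left out because the cluster already has them.
    (ApplicationManifest.xml itself always goes into the package.)
    """
    return sorted(
        name for name, version in new.items()
        if provisioned.get(name) != version
    )


# Hypothetical versions: only Service3Pkg changes between the two app versions.
old = {"Service1Pkg": "1.0.0", "Service2Pkg": "1.0.0",
       "Service3Pkg": "1.0.0", "Service4Pkg": "1.0.0"}
new = dict(old, Service3Pkg="1.0.1")
print(diff_package_services(old, new))  # ['Service3Pkg']
```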

I've tried multiple variations of this, including resetting my local cluster, but had no luck. Here is the output of my "Get-ServiceFabricApplicationUpgrade" command:

ApplicationName                : fabric:/DifferentialPackaging
ApplicationTypeName            : DifferentialPackagingType
TargetApplicationTypeVersion   : 2.0.1
ApplicationParameters          : {}
StartTimestampUtc              : 11/2/2017 12:36:06 PM
FailureTimestampUtc            : 11/2/2017 12:41:06 PM
FailureReason                  : UpgradeDomainTimeout
UpgradeDomainProgressAtFailure : 0

                                 NodeName            : _Node_0
                                 UpgradePhase        : PreUpgradeSafetyCheck
                                 PendingSafetyChecks :
                                        EnsureAvailability - PartitionId: 7c7bd322-0588-4ae8-a545-1050459990c6
UpgradeState                   : RollingBackInProgress
UpgradeDuration                : 00:12:01
CurrentUpgradeDomainDuration   : 00:07:01
CurrentUpgradeDomainProgress   : 0

                                 NodeName            : _Node_0
                                 UpgradePhase        : PreUpgradeSafetyCheck
                                 PendingSafetyChecks :
                                        EnsureAvailability - PartitionId: 1fd9be7b-747c-4c1e-a337-a0781e6a74f3
NextUpgradeDomain              :
UpgradeDomainsStatus           : { "0" = "InProgress" }
UpgradeKind                    : Rolling
RollingUpgradeMode             : UnmonitoredAuto
ForceRestart                   : False
UpgradeReplicaSetCheckTimeout  : 00:20:00

I'm not sure what I'm doing wrong here, so any help is appreciated. Thanks!

masnider · 2017-11-02T23:24:28Z

Before you actually try to deploy it: does the existing package pass Test-ServiceFabricApplicationPackage when you point the cmdlet at it?

oanapl · 2017-11-03T04:28:07Z

How are your services configured - how many replicas (min/target)?

In the upgrade status you pasted, the upgrade fails because the UD timeout is exhausted. Inside the UD, the upgrade is stuck at PreUpgradeSafetyCheck. This is a check we perform to ensure availability: we don't proceed with the upgrade until we are sure the application has enough replicas to function properly. The two partitions listed above are the ones that are stuck.

The upgrade set UpgradeReplicaSetCheckTimeout to 20 minutes. This is the timeout period for checking whether the replica set has quorum. After it expires, the upgrade proceeds. If you set the UD timeout to a value less than the replica set check timeout (for example, 10 minutes), and moving the replica out of the node could cause quorum loss, the upgrade will fail. That is the correct behavior, since the main purpose of a monitored upgrade is to maintain availability.
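The timeout interaction described here can be expressed as a simplified Python model. It is an assumption-laden sketch, not Service Fabric's actual upgrade logic. It covers only the PreUpgradeSafetyCheck wait inside one upgrade domain and ignores the time the upgrade and health checks themselves take. The function and argument names are hypothetical.

```python
def upgrade_domain_outcome(safety_check_passes_after, replica_set_check_timeout_sec,
                           ud_timeout_sec):
    """Simplified model of one upgrade domain's PreUpgradeSafetyCheck phase.

    safety_check_passes_after: seconds until the availability check would pass
        on its own, or None if it never does (for example, a single node with
        nowhere to move the instance).
    Returns "proceed" if the UD gets past the safety check before its timeout,
    otherwise "UpgradeDomainTimeout".
    """
    # The upgrade stops waiting when the check passes or when the
    # replica set check timeout expires, whichever comes first.
    if safety_check_passes_after is None:
        wait = replica_set_check_timeout_sec
    else:
        wait = min(safety_check_passes_after, replica_set_check_timeout_sec)
    # If the UD timeout runs out before the wait ends, the upgrade fails.
    return "proceed" if wait < ud_timeout_sec else "UpgradeDomainTimeout"


# Blocked check, 20-minute replica set check timeout, 10-minute UD timeout.
print(upgrade_domain_outcome(None, 20 * 60, 10 * 60))  # UpgradeDomainTimeout
# Same blocked check, but a small replica set check timeout.
print(upgrade_domain_outcome(None, 30, 10 * 60))       # proceed
```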

As a side note, UpgradeReplicaSetCheckTimeout is deprecated; use the UpgradeReplicaSetCheckTimeoutSec parameter instead.

These articles cover upgrade parameters and how to troubleshoot application upgrades.

ghost · 2017-11-03T06:20:40Z

@masnider Nope. It gives me the following error, but I don't think it should expect a ServiceManifest.xml file if it's a differential packaging update.

λ  Test-ServiceFabricApplicationPackage -ApplicationPackagePath "C:\Users\Haseeb\Documents\Visual Studio 2017\Projects\DifferentialPackaging\DifferentialPackaging\pkg\Debug"
False
Test-ServiceFabricApplicationPackage : The BuildLayout of the application in C:\Users\Haseeb\AppData\Local\Temp\TestApplicationPackage_2936631255996\cc4awbux.rch\Debug is invalid. ServiceManifest.xml is missing for service Service1Pkg.
At line:1 char:1
+ Test-ServiceFabricApplicationPackage -ApplicationPackagePath "C:\User ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Test-ServiceFabricApplicationPackage], FabricImageBuilderValidationException
    + FullyQualifiedErrorId : TestApplicationPackageErrorId,Microsoft.ServiceFabric.Powershell.TestApplicationPackage

ghost · 2017-11-03T06:22:11Z

@oanapl It's a very simple example: no partitioning, no stateful services, just 4 stateless services. The InstanceCount for every service is -1. If I'm missing something else, please let me know :-)

oanapl · 2017-11-03T17:35:58Z

Pass the ImageStoreConnectionString to Test-ServiceFabricApplicationPackage so that it uses the previously deployed package for validation.

Are you using a one-node cluster? If you have one stateless instance and nowhere to move it, the upgrade waits until UpgradeReplicaSetCheckTimeout passes before it continues.

masnider · 2017-11-03T19:50:03Z

@haseeb-ahmed-tkxel @oanapl Yes. This is most likely a physical layout problem, so let's make sure that works first. The Test package command needs to succeed before we can expect the actual deployment to work. Passing the image store connection string tells SF to check the image store for a package to diff against if this one is differential.

https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade-advanced#upgrade-with-a-diff-package may help a little.

The upgrade is a separate issue. Let's get to it once we're sure the differential package is correct.

makar-sasha · 2017-11-09T14:58:30Z

Hello,

I'm also getting this error:
Test-ServiceFabricApplicationPackage : The BuildLayout of the application in C:\Users\makar\AppData\Local\Temp\TestApplicationPackage_240509860615\53ruzecf.u5e\Release is invalid. ServiceManifest.xml is missing for service Stateless1Pkg.

I created a default app with 2 default stateless services, created a package, and deployed it with:
c:\code\temp\difftest\Scripts\Deploy-FabricApplication.ps1 -ApplicationPackagePath c:\code\temp\difftest\pkg\Release\ -PublishProfileFile c:\code\temp\difftest\PublishProfiles\Cloud.xml -UseExistingClusterConnection:$true -DeployOnly:$false -UnregisterUnusedApplicationVersionsAfterUpgrade $false -OverrideUpgradeBehavior 'None' -OverwriteBehavior 'SameAppTypeAndVersion' -SkipPackageValidation:$false -ErrorAction Stop

After that, I:

Updated the cluster version and the version of one particular service.
Created a package and manually removed the unchanged service from it.
Ran the deployment command again, hoping it would deploy the diff package.

Instead, I get the error above. Could you help with this?

oanapl · 2017-11-09T18:26:59Z

@makar-sasha, when you say "updated cluster version", I assume you mean the app manifest version?

These are the steps that should work:
For version 1, the package contains:

App manifest version 1, references <SP1, version1> and <SP2, version1>
- SP1 version 1
- SP2 version 1

Then version 2:

App manifest version 2, references <SP1, version1> (not included) and <SP2, version2>
- SP2 version 2

You provision version 1.
Test-ServiceFabricApplicationPackage with the correct -ImageStoreConnectionString finds SP1 in the cluster and validates the modified package.

Can you validate that this is what you did?
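The version 1 / version 2 flow above can be modeled in Python as a rough sketch of what validation checks for a differential package. This is a hypothetical model, not the real ImageBuilder validation. The error text is borrowed from the Test-ServiceFabricApplicationPackage output earlier in this thread. The `image_store` argument stands in for what -ImageStoreConnectionString gives the cmdlet access to, and `None` models omitting that parameter.

```python
def validate_diff_package(manifest_refs, package_dirs, image_store=None):
    """Check a (possibly differential) package the way the thread describes.

    manifest_refs: {service package: version} referenced by the new app manifest.
    package_dirs:  {service package: version} of the folders present in the package.
    image_store:   set of (service package, version) pairs already provisioned,
                   or None when no image store connection string is given.
    Returns a list of error messages; an empty list means the package is valid.
    """
    errors = []
    for name, version in sorted(manifest_refs.items()):
        if name in package_dirs:
            if package_dirs[name] != version:
                errors.append(f"{name}: folder has {package_dirs[name]}, "
                              f"manifest references {version}")
        elif image_store is None or (name, version) not in image_store:
            # Omitted from the package and not found in the cluster: invalid layout.
            errors.append(f"ServiceManifest.xml is missing for service {name}.")
    return errors


refs = {"SP1": "1", "SP2": "2"}              # app manifest version 2
pkg = {"SP2": "2"}                           # SP1 left out of the package
provisioned = {("SP1", "1"), ("SP2", "1")}   # version 1 already provisioned

print(validate_diff_package(refs, pkg, provisioned))  # []
print(validate_diff_package(refs, pkg))  # ['ServiceManifest.xml is missing for service SP1.']
```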

makar-sasha · 2017-11-09T20:34:48Z

@oanapl

Yes, I have tried this flow. I found that it works if I add
<UpgradeDeployment Mode="Monitored" Enabled="true"> <Parameters FailureAction="Rollback" Force="True" /> </UpgradeDeployment>
to the publish profile. Otherwise, it fails with the error "ServiceManifest.xml is missing for service Stateless1Pkg".

thanks for your comments!

oanapl · 2017-11-10T04:23:38Z

@dbreshears , can you take a look at the publish profile issue?

dbreshears · 2017-11-10T19:48:55Z

I'm not seeing an issue with the publish profile itself. Deploy-FabricApplication.ps1 is just a wrapper around the scripts installed as part of the SDK. When an upgrade is specified in the publish profile, it calls the Publish-UpgradedServiceFabricApplication script; otherwise, it calls Publish-NewServiceFabricApplication.

If I understand correctly, the issue seems to be that Test-ServiceFabricApplicationPackage expects a path to a full package when the Publish-NewServiceFabricApplication script invokes it, but not when the Publish-UpgradedServiceFabricApplication script does.

oanapl · 2017-11-10T23:49:29Z

@makar-sasha, it looks like our SDK script does not pass the ImageStoreConnectionString when calling Test-ServiceFabricApplicationPackage. I opened a tracking issue to improve our scripts.

As a mitigation, can you change your local scripts to pass the parameter?

Or you can call the PowerShell cmdlets directly if that's more convenient.

anamkhalid · 2017-11-20T10:00:46Z

@oanapl @masnider
Test-ServiceFabricApplicationPackage -ApplicationPackagePath '...' -ImageStoreConnectionString '...' returns True but still the deployment/upgrade fails.

oanapl · 2017-11-20T17:34:21Z

@anamkhalid , have you changed your local scripts to pass the image store connection string as I mentioned in my previous reply? Without this, the deployment will fail.

You can also run all the deployment steps through PowerShell cmdlets.

anamkhalid · 2017-11-27T05:10:15Z

@oanapl Yes, I used PowerShell cmdlets to pass the image store connection string.

oanapl · 2017-11-27T18:07:55Z

Thank you for the update, I am glad you are unblocked. We will change our scripts to pass the ImageStoreConnectionString in our next major release (6.2).

anamkhalid · 2017-11-28T04:36:11Z

@oanapl Actually, no :) I was just confirming that I did pass the image store connection string, but the results are still the same: the Upgrade Domain Timeout issue that Haseeb mentioned in his first comment.

Are you able to get rid of this issue by passing the image store connection string on your end?

oanapl · 2017-11-28T07:53:30Z

My bad: you said in your previous post that you used PowerShell to pass the image store connection string, and I assumed this worked. Did you mean you called PowerShell directly and that worked?
To double check, you changed the scripts and they still don't work? Which step fails this time and what error do you see?

@amanbha is fixing this at our end.

anamkhalid · 2017-11-29T15:57:06Z

@oanapl Here is the current status:
1- I'm using PowerShell cmdlets to connect, copy, register, test, and upgrade the app.
2- I'm passing ImageStoreConnectionString to the Test command, but the upgrade always fails with the Upgrade Domain Timeout issue, as shown in the image below:

oanapl · 2017-11-29T17:26:27Z

The upgrade fails because of safety checks. See my first reply above for more upgrade-related resources. Basically, it can't safely move replicas out of the node to proceed with the upgrade, because moving them out can affect availability.

How many nodes are in the cluster? If you have 1 node, there's nowhere to move the app and the upgrade will fail. Since a one-node cluster is only for testing, you can pass a small UpgradeReplicaSetCheckTimeoutSec to the upgrade command to tell the cluster it's OK to move the replicas immediately.

If your cluster has > 1 node, how many services do you have in the app and how are they configured (number of partitions and number of replicas)?

Can you paste the PowerShell command you used to start the upgrade?

anamkhalid · 2017-12-04T07:38:44Z

@oanapl It seems that adding UpgradeReplicaSetCheckTimeoutSec to the upgrade command resolves the issue. I copied the command below from the Upgrade Using PowerShell article:

Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/VisualObjects -ApplicationTypeVersion 2.0.0.0 -HealthCheckStableDurationSec 60 -UpgradeDomainTimeoutSec 1200 -UpgradeTimeout 3000 -FailureAction Rollback -Monitored

The upgrade fails unless this parameter is specified explicitly.
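For reference, the command above with UpgradeReplicaSetCheckTimeoutSec added can be assembled by a small hypothetical Python helper. It only builds the command line as a string and does not run it. It uses the full -UpgradeTimeoutSec parameter name. The 60-second value in the usage line is an arbitrary small value for a one-node development cluster, not a documented recommendation. The guard against a replica set check timeout at or above the UD timeout is the helper's own assumption, loosely mirroring the explanation earlier in the thread, and is not a rule Service Fabric enforces.

```python
def upgrade_command(app_name, type_version, replica_set_check_timeout_sec,
                    ud_timeout_sec=1200, upgrade_timeout_sec=3000,
                    health_check_stable_sec=60):
    """Build (but do not run) a Start-ServiceFabricApplicationUpgrade command line."""
    # Helper's own sanity check (assumption, not a Service Fabric rule): if the
    # replica set check can outlast the UD timeout, a one-node cluster times out.
    if replica_set_check_timeout_sec >= ud_timeout_sec:
        raise ValueError("replica set check timeout should be below the UD timeout")
    args = [
        "Start-ServiceFabricApplicationUpgrade",
        "-ApplicationName", app_name,
        "-ApplicationTypeVersion", type_version,
        "-HealthCheckStableDurationSec", str(health_check_stable_sec),
        "-UpgradeDomainTimeoutSec", str(ud_timeout_sec),
        "-UpgradeTimeoutSec", str(upgrade_timeout_sec),
        # Bounds how long the UD blocks on availability safety checks.
        "-UpgradeReplicaSetCheckTimeoutSec", str(replica_set_check_timeout_sec),
        "-FailureAction", "Rollback",
        "-Monitored",
    ]
    return " ".join(args)


print(upgrade_command("fabric:/VisualObjects", "2.0.0.0", 60))
```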

oanapl · 2017-12-04T17:16:48Z

Ok, this is expected behavior for a one node cluster. Closing the issue based on this.

masnider assigned masnider and vaishnavk Nov 2, 2017

masnider added the question label Nov 2, 2017

vaishnavk assigned oanapl and unassigned vaishnavk Nov 3, 2017

oanapl assigned dbreshears Nov 10, 2017

oanapl assigned amanbha Nov 10, 2017

oanapl added bug and removed question labels Nov 10, 2017

oanapl unassigned dbreshears and masnider Nov 10, 2017

oanapl added this to the Backlog milestone Nov 27, 2017

oanapl closed this as completed Dec 4, 2017

amanbha modified the milestones: Backlog, Runtime 6.2 Mar 15, 2018

muradkhateeb78 mentioned this issue Apr 28, 2020

Release is invalid. Code is missing for service error in SF Application Upgrade using diffPackage. microsoft/service-fabric#608

Open
