Copy-ServiceFabricApplicationPackage hangs forever #813
Comments
@lanfeust69 , based on the traces you pasted, I can't tell which message is dropped because of size, it may or may not be related to upload. Could you share the traces and the package (or the file count and size information) with me (oanapl at microsoft.com)? |
This is due to a configuration change between Standalone and SFRP clusters. When you deploy a cluster in Azure, MaxMessageSize is set to 4 MB. When you download and install a cluster locally, MaxMessageSize is set to 10 MB (this was a change in 6.1). Now the client sends bigger messages (based on its configured MaxMessageSize) and the server drops them (per its configuration). We are investigating a configuration change to Azure clusters to change the MaxMessageSize to 10 MB to resolve this issue, but that will take some time. To unblock, you can remove the local cluster, which will remove the new configuration. Then upload from that machine will work. Note that every time you load the local cluster (either manually or through Visual Studio), the new configuration (10MB) will take effect, so upload of files bigger than 4 MB will fail again. |
I am trying to publish to a stand-alone cluster and experienced the same issues reported here. Any step-by-step instructions on a workaround for this issue? Need to publish to SF badly. |
@JunRamoneda , in your configuration what is the version for the cluster and what is the version for the client? |
Cluster and VS2017 both have 6.1.456.9494 |
Removing local cluster before publishing seems to work for 6.1.456.9494 |
I can confirm that I can publish to a secure Azure cluster after removing the local cluster. Both clusters are 6.1.456.9494 |
Thank you for confirming! We are working on the fix, and I will update the thread when the updates are pushed. |
Hm... It's magic, but after removing the local cluster, Copy-ServiceFabricApplicationPackage works well. |
Oh wow, I spent an entire day plus all last night reinstalling VS17 and then formatting my PC when it still was not working, thinking it was some VS issue due to the tools update! Glad I found this thread. Closing the local cluster (clicking Exit) didn't work either; it was still stuck. Which steps should I take to "remove" the local cluster and use the workaround? |
@francescocristallo, to remove the local cluster, go to your system tray, right-click the Service Fabric icon, and click "Remove Local Cluster". I confirmed that the workaround worked for me too. Thanks @oanapl! I'll look out for the update. |
@oanapl , seems that removing local cluster didn't help me. |
@alex-1244 , can you give me a few details about your scenario and the steps you tried for mitigation? What is your cluster version and client version? What cmdlet parameters do you run, and what is the output of Copy-ServiceFabricApplicationPackage? Can you try uninstalling SF from your machine, reinstalling it, and then running the copy cmdlet to ensure there's no local cluster? |
I confirm the "Remove cluster" trick works 👍 |
Cluster version (Azure): 6.1.456.9494. To remove the local cluster I used @JohnnyFun's suggestion. |
If it's any help, @oanapl, I noticed the cluster I'm deploying to in Azure is still showing it's at the old version (6.0.232.9494), and I recently updated my local SDK and project NuGet packages to the latest (6.1.456.9494). When I try to update my Azure cluster to the latest from the Azure portal, I get an error. I'm able to deploy to it just fine though, and the app seems to run fine. |
@JohnnyFun , I recommend you follow up separately on the cluster upgrade failing. @vaishnavk can help with that. @alex-1244 , I am sorry for the frustration you are experiencing with this issue. Have you tried uninstalling, reinstalling, and then running the upload (without setting up the local cluster)? And instead of the deploy script, can you use the PowerShell cmdlet directly? |
Same issue here: our CI is not working and all processes are stopped. CI uploads both to local cluster (case when we want to run tests) As we are using gulp scripts that call PowerShell cmdlets directly, I can say that it does not work. We consider it a critical issue. |
very serious one. Our CI is blocked :(((( |
We've fixed it by adding special tasks (we are using Bamboo) just before deploying to staging and just after the deploy. To stop local cluster before the process:
To start local cluster after:
|
I also have this problem after upgrade to 6.1 |
Another way to mitigate the issue is to upgrade the cluster to set the MaxMessageSize to be in sync with the version on the client. This way, there are no changes needed on the client machine.
|
Thanks @oanapl for the suggestion.
|
@alecor191 , the MaxMessageSize determines how big the messages the cluster accepts can be. It's used for queries to determine how much information we put into one message (before we page or simply drop the message because it's too big) and for copying packages to the cluster. It will not have undesirable side effects per our testing. We will upgrade the Azure clusters to use this value for the setting. This config upgrade will take some time to go through all regions. To unblock yourself, I proposed making the upgrade manually instead of waiting for the general change. |
@ezuidema , yours is probably different, please open another issue so we can address that appropriately. Update on the status of the original issue: Last week we started to deploy the config change that updates the clusters' MaxMessageSize to 10 MB. This was meant as a mitigation for this issue, and not a permanent solution. We decided to cancel this deployment, as we have a new build available that reverts the breaking change done on the clients to set MaxMessageSize to 10 MB (plus other features). We plan to make this build available as soon as we validate it properly - as usual, check our team's blog for info on the release. |
Removing local cluster was the workaround that I used before deploying. Lately, I was able to deploy to a stand-alone cluster without first removing the local cluster. I am not sure what changed but deploy/publish seems to be working properly |
It was working for a while after removing the cluster and then restarting VS but now it's stopped again. Any update on when this will be resolved? |
Had the same problem for the last couple of days. I have tried all the resolutions in this thread and none of them allow me to publish successfully to Azure. Then all of a sudden it standard randomly working (local cluster was removed at this point). I then turned on local cluster, publish to Azure failed. Turned off local cluster, restarted VS2017, publish to Azure continues to fail. Can't nail down a reliable way to reproduce and work around this issue, but it does feel like it is related to restarting VS2017 or exiting the local cluster, or this could just be a red herring. P.S. @oanapl trying to do a PUT/PATCH against the SF provider to add the MaxMessageSize was unsuccessful. The setting just disappeared after clicking PUT / PATCH. After doing this, the SF resource in the portal was showing a red failure banner. Had to go back into resources.azure.com and introduce a white-space change to the SF provider configuration to reset the error. The underlying cluster was actually healthy during this time. |
@dotdestroyer, when you say you "turned off local cluster", did you "Remove Local Cluster" or "Stop Local Cluster"? I would guess either would work, but when it worked for me last, I specifically selected "Remove Local Cluster" on the machine I was deploying from. p.s., I like the phrase "standard randomly working" lol, a cynical way of looking at the state of software these days. |
@JohnnyFun I've done both and neither helped conclusively! haha my typo hits a little too close to home right now :P |
The release that reverts the MaxMessageSize change (runtime 6.1.467, SDK 3.0.467) is going to be live soon. Once you update, there will be no changes needed to the Azure clusters and you can upload your packages with local cluster installed. |
@oanapl just to confirm: When upgrading to the patched version of SDK you mentioned, will we have any issues if we did increase the MaxMessageSize in our Azure clusters? Or will that SDK version work with any MaxMessageSize? |
@alecor191 , we recommend you change your cluster back to the initial value (i.e., remove the change you made). Thank you for checking! Application upload will succeed with the mismatched values, but it's better to have them in sync. |
6.1 refresh is available. Please update your local cluster to resolve the copy issues. |
Excellent. It's working for me after I removed the local cluster from the system tray and exited the Service Fabric cluster running locally. |
I was afflicted by this issue as well. Removing the local cluster makes deployment work again. |
When is the fix expected please? |
The fix has been available for about 2 months now. |
I am using runtime 6.2.283.9494 and SDK version 3.1.283.9494, and I am still facing the publish process getting stuck at "Copying application to Image Store". I have tried removing the local cluster, gotten rid of it from system tray altogether, but nothing works. This issue started all of a sudden. |
@suvamM: I'd suggest to make sure this is not a local issue. |
@abatishchev Alright, updating the NuGet packages seems to have fixed it. Thanks a lot for the help. |
@suvamM I have this problem, but I don't understand what you and @abatishchev are referring to when you say NuGet packages. What is the problem, and with which packages? I'm unable to deploy, and just stuck in Copying application to image store... |
Target Service Fabric SDK doesn't match the actual version on the cluster
The Service Fabric NuGet packages you reference from your services within your application. |
@alexgman I take back my comment: I had updated my runtime and SDK, and the NuGet packages for my project, and the publish succeeded. But it's the same problem for every other project I have, including the template SF app which is generated when I create a new SF project. I checked that there is no version mismatch between my local SDK and the version on the cluster. This is really strange, and is taking a lot of time to diagnose. It's just stuck at Copying application to image store... |
I am unable to publish my code to the local cluster using Microsoft Azure Service Fabric SDK 3.1.274 and runtime 6.2.274. Can someone suggest the required changes? |
@H286424 , there can be many reasons for which this can happen. The initial issue that was discussed on this thread has long been fixed. Please open another issue and add details so we can track your issue properly. |
I had this problem with SDK 3.1.301 and Azure Cluster Service Fabric 6.2.301.9494. Changing So I compared the .sfproj file in my failing application with one from a brand new SF application I created, which could be deployed. Changing a few settings in the application .sfproj file fixed the issue. Change " I also changed all "1.6.4" to "1.6.6" to use the current MSBuild version and also changed the version in packages.config to 1.6.6. I did all of the above steps, but I think it would work by just changing the ProjectVersion. |
The same issue persists for me, and the solution here of deleting the local cluster does not solve my problem. I have been wrestling a lot with the Service Fabric setup and seem to be hitting a major roadblock. I am trying to deploy a basic REST Spring Boot application as a guest executable on Service Fabric. I have been able to set it up on a cluster on my local machine, but the deployment on the Azure cluster has been stuck for more than 16 hours. I have made multiple attempts but the problem persists. I have created a single-node Standard_D2s_v3 (1 instance) cluster on Azure and am using Visual Studio 2017 v15.9.17 to deploy the Spring Boot REST application as a guest executable. Local Cluster Azure (Cloud): |
This is possibly related to #732, which has been wrongly closed IMO (following a comment almost unrelated to the original issue).
Following the upgrade of a Windows on-prem cluster from 5.7.198 to 6.1.456, I can no longer deploy with my PowerShell scripts, because Copy-ServiceFabricApplicationPackage hangs forever. I see 'Transport' warning events being dispatched when this happens, with a message along the lines of:
Followed by a few
and then a bunch of
And sure enough, if I manually remove the few files bigger than 4.4MB from the package, then everything is copied normally (but of course the missing files prevent proper operation).
It might be an inconsistent setting when splitting big files, since I could match an error with a frame length of around 5M with a file of almost exactly the same size, but there was no 7.8MB file, only a couple of much bigger ones (32MB zips).
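For anyone hitting this, one quick check is to list the files in a package that are above the size at which the copy stalls. The Python sketch below is only an illustration. The 4 MB default comes from the Azure MaxMessageSize mentioned in this thread; the function name `find_oversized_files` and the script name are made up here. It also assumes that per-file size is a reasonable proxy for what gets dropped, even though the frame lengths above suggest the real splitting may differ.

```python
import os
import sys

# Assumed limit: the 4 MB MaxMessageSize reported for Azure clusters in this thread.
SERVER_MAX_MESSAGE_SIZE = 4 * 1024 * 1024


def find_oversized_files(package_root, limit=SERVER_MAX_MESSAGE_SIZE):
    """Return (relative_path, size_in_bytes) for every file larger than limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(package_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            size = os.path.getsize(full_path)
            if size > limit:
                oversized.append((os.path.relpath(full_path, package_root), size))
    # Largest first, so the most likely offenders are at the top.
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_oversized.py <package-folder>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel_path, size in find_oversized_files(root):
        print(f"{size / (1024 * 1024):8.2f} MB  {rel_path}")
```

Running it against the package folder before calling Copy-ServiceFabricApplicationPackage shows which files are likely to be affected. It only reads file sizes and does not modify the package.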