DNS Issue (fabric v6.1) and unrelated cluster start up issue. #110
Could you share the DNS settings in cluster manifest please, making sure to remove any sensitive information (if any).
What version of service fabric is your cluster? Thanks! |
Fabric Version: 6.1.480.9494
|
I get this error:
|
Are you able to upgrade the cluster to a newer version? This is a known issue which has been fixed in newer releases (6.2 and up). If you are not able to update, you can use the mitigation with ICS mentioned in the thread. This issue will also not appear in cloud clusters, it shows up only in local clusters. Please let me know if you run into issues applying those mitigations. Thanks! |
I've updated - now I am unable to start my cluster... where can I go to find the logs for the reason why, the |
My manifest now:
|
Unfortunately, it's hard to tell just from that. Can you repro the issue and upload your trace (.etl) file from c:\sfcluster? Before uploading, run For the local clusters, I've found that reinstalling the MSI and then creating a new cluster, or rebooting the machine, sometimes helps if you have no data you want to keep. Thanks! |
Where do I upload it to? Plus, I've reinstalled, rebooted, and tried all sorts to get it working, to no avail. |
You can upload it to here or provide a link I can download it from. |
Thanks for the logs. The nodes are unable to come up with error access denied because there is some issue with the certificate. I will update this post with more details / a fix in a bit. |
Cheers much appreciated |
Thanks for your patience :) How did you deploy this cluster (including the upgrade)? The specific error is this: "CertCreateSelfSignCertificate failed: E_ACCESSDENIED" From an elevated PowerShell instance, can you run the following command?
It should list at least these 2 entries: If either is missing, that would explain the failures reported below. The mitigation is to run the following (from an elevated PowerShell instance):
If that doesn't work, please let me know and we can try something else. |
I just used web platform installer to install it |
still not worked out |
Can you provide the output from the command Thanks! |
|
Thanks for the output! Looks like the network service doesn’t have access, there should be an entry like this:
Can you try running the following?
This modifies the first line of the commands from earlier. |
I got this output now:
|
no luck with creating a 5 node cluster |
The good news is that the cert issue looks to be gone! This could potentially be a firewall issue. I will get in touch with the correct people for this error and get back to you. Thanks! |
Some of these traces look pretty old. Can you try cleaning the cluster and re-deploying? You can also try uninstalling and reinstalling (maybe rebooting if possible, but you should not need to). If it still doesn't work after that, can you re-upload the traces again? The traces should be up to date then. Thanks! |
So I can't extract those logs any more. I had to force delete the sfcluster folder because something had removed access from me. I had to reboot in safe mode and delete the directory. Now when I try and zip up the folder using shell zip or zip using 7z it just says access denied on everything. So I'm not sure what is causing that, but it ain't helping |
Hi @no1melman, apologies for the late reply. Are you still having issues with this? |
Yeah, it just isn't working. I've updated Service Fabric, reinstalled, rebooted, and set the permissions as you said. What I found out is that the folder mentioned above just locks up and I can't remove it. |
Are you trying to delete/move the log files while the cluster is running? Can you check if FabricHostSvc is running when you get permission denied? If so, then can you stop FabricHostSvc and then Fabric.exe (if any)? Afterwards, can you try again to see if you can get the log files? |
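For reference, the check-and-stop sequence described above could look roughly like the sketch below. It uses only standard PowerShell cmdlets and assumes the process behind Fabric.exe is named `Fabric`. It is not a command taken from this thread, so adjust it to your setup and run it from an elevated PowerShell instance.

```powershell
# Sketch only: stop the Service Fabric host service before touching the log folder.
# Assumes an elevated PowerShell session.

# Check whether FabricHostSvc exists and is currently running.
$svc = Get-Service -Name FabricHostSvc -ErrorAction SilentlyContinue
if ($svc -and $svc.Status -eq 'Running') {
    # Stop-Service waits for the service to stop before returning.
    Stop-Service -Name FabricHostSvc -Force
}

# Stop any Fabric.exe processes that are still alive afterwards (assumed process name: Fabric).
Get-Process -Name Fabric -ErrorAction SilentlyContinue | Stop-Process -Force

# Confirm the service state before retrying the copy/zip of the log files.
Get-Service -Name FabricHostSvc -ErrorAction SilentlyContinue | Select-Object Name, Status
```

Once the service reports `Stopped` and no `Fabric` process remains, the files under the cluster folder should no longer be held open by the runtime.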
I've managed to perform the logman command again, and zip up the new log files without issue |
It's the same certificate issue. Can you run Thanks! |
@Christina-Kang can we set up some time to get this cluster running - it still all looks good my end... |
Sounds good. Can you send me an email at bikang@microsoft.com? Thanks! |
Thank you @no1melman for your time working with us on this! The workaround below applies to Windows. The startup issue was caused by the network service losing its permission after it had been set, for a reason that is still unknown. While running the PowerShell commands did not work in this instance, going to the directory Permission was changed by right clicking on folder A root cause fix will be implemented in the Service Fabric runtime. No changes are required of sfctl for this issue. |
Using the localhost cluster. I've done a health check output:
I've got issues with the DNS. How do I go about diagnosing it further and fixing it?