DNS Issue (fabric v6.1) and unrelated cluster start up issue. #110

Closed
no1melman opened this issue Jul 19, 2018 · 33 comments

@no1melman

I'm using the localhost cluster. Here is the output of a health check:

{
  "additionalProperties": {},
  "aggregatedHealthState": "Error",
  "applicationHealthStates": [
    {
      "additionalProperties": {},
      "aggregatedHealthState": "Error",
      "name": "fabric:/System"
    }
  ],
  "healthEvents": [],
  "healthStatistics": {
    "additionalProperties": {},
    "healthStateCountList": [
      {
        "additionalProperties": {},
        "entityKind": "Node",
        "healthStateCount": {
          "additionalProperties": {},
          "errorCount": 0,
          "okCount": 5,
          "warningCount": 0
        }
      },
      {
        "additionalProperties": {},
        "entityKind": "Application",
        "healthStateCount": {
          "additionalProperties": {},
          "errorCount": 0,
          "okCount": 0,
          "warningCount": 0
        }
      },
      {
        "additionalProperties": {},
        "entityKind": "Service",
        "healthStateCount": {
          "additionalProperties": {},
          "errorCount": 0,
          "okCount": 0,
          "warningCount": 0
        }
      },
      {
        "additionalProperties": {},
        "entityKind": "Partition",
        "healthStateCount": {
          "additionalProperties": {},
          "errorCount": 0,
          "okCount": 0,
          "warningCount": 0
        }
      },
      {
        "additionalProperties": {},
        "entityKind": "Replica",
        "healthStateCount": {
          "additionalProperties": {},
          "errorCount": 0,
          "okCount": 0,
          "warningCount": 0
        }
      },
      {
        "additionalProperties": {},
        "entityKind": "DeployedApplication",
        "healthStateCount": {
          "additionalProperties": {},
          "errorCount": 0,
          "okCount": 0,
          "warningCount": 0
        }
      },
      {
        "additionalProperties": {},
        "entityKind": "DeployedServicePackage",
        "healthStateCount": {
          "additionalProperties": {},
          "errorCount": 0,
          "okCount": 0,
          "warningCount": 0
        }
      }
    ]
  },
  "nodeHealthStates": [
    {
      "additionalProperties": {},
      "aggregatedHealthState": "Ok",
      "id": {
        "additionalProperties": {},
        "id": "4f4e3698a196896b5efe8156cc4e1351"
      },
      "name": "_Node_4"
    },
    {
      "additionalProperties": {},
      "aggregatedHealthState": "Ok",
      "id": {
        "additionalProperties": {},
        "id": "6b5c3db003a0bd126f7b8a86fc3916a4"
      },
      "name": "_Node_3"
    },
    {
      "additionalProperties": {},
      "aggregatedHealthState": "Ok",
      "id": {
        "additionalProperties": {},
        "id": "876a44d9185bf9416336b22e5d37cde8"
      },
      "name": "_Node_2"
    },
    {
      "additionalProperties": {},
      "aggregatedHealthState": "Ok",
      "id": {
        "additionalProperties": {},
        "id": "a3784be1d81710242ed0a9632647b4f7"
      },
      "name": "_Node_1"
    },
    {
      "additionalProperties": {},
      "aggregatedHealthState": "Ok",
      "id": {
        "additionalProperties": {},
        "id": "bf865279ba277deb864a976fbf4c200e"
      },
      "name": "_Node_0"
    }
  ],
  "unhealthyEvaluations": [
    {
      "additionalProperties": {},
      "healthEvaluation": {
        "additionalProperties": {},
        "aggregatedHealthState": "Error",
        "description": "System application is unhealthy.",
        "kind": "SystemApplication",
        "unhealthyEvaluations": [
          {
            "additionalProperties": {},
            "healthEvaluation": {
              "additionalProperties": {},
              "aggregatedHealthState": "Error",
              "description": "Unhealthy services: 100% (1/1), ServiceType='DnsServiceType', MaxPercentUnhealthyServices=0%.",
              "kind": "Services",
              "maxPercentUnhealthyServices": 0,
              "serviceTypeName": "DnsServiceType",
              "totalCount": 1,
              "unhealthyEvaluations": [
                {
                  "additionalProperties": {},
                  "healthEvaluation": {
                    "additionalProperties": {},
                    "aggregatedHealthState": "Error",
                    "description": "Unhealthy service: ServiceName='fabric:/System/DnsService', AggregatedHealthState='Error'.",
                    "kind": "Service",
                    "serviceName": "fabric:/System/DnsService",
                    "unhealthyEvaluations": [
                      {
                        "additionalProperties": {},
                        "healthEvaluation": {
                          "additionalProperties": {},
                          "aggregatedHealthState": "Error",
                          "description": "Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.",
                          "kind": "Partitions",
                          "maxPercentUnhealthyPartitionsPerService": 0,
                          "totalCount": 1,
                          "unhealthyEvaluations": [
                            {
                              "additionalProperties": {},
                              "healthEvaluation": {
                                "additionalProperties": {},
                                "aggregatedHealthState": "Error",
                                "description": "Unhealthy partition: PartitionId='e71ec636-6eab-4472-829d-0dc596bd7188', AggregatedHealthState='Error'.",
                                "kind": "Partition",
                                "partitionId": "e71ec636-6eab-4472-829d-0dc596bd7188",
                                "unhealthyEvaluations": [
                                  {
                                    "additionalProperties": {},
                                    "healthEvaluation": {
                                      "additionalProperties": {},
                                      "aggregatedHealthState": "Error",
                                      "considerWarningAsError": false,
                                      "description": "Error event: SourceId='System.FM', Property='State'.",
                                      "kind": "Event",
                                      "unhealthyEvent": {
                                        "additionalProperties": {},
                                        "description": "Partition is below target replica or instance count.\r\nDnsService 1 1 e71ec636-6eab-4472-829d-0dc596bd7188\r\n  IB _Node_3 Up 131764963398090115\r\n  (Showing 1 out of 1 replicas. Total available replicas: 0)\r\n\r\nFor more information see: http://aka.ms/sfhealth",
                                        "healthState": "Error",
                                        "isExpired": false,
                                        "lastErrorTransitionAt": "2018-07-18T20:08:28.423000+00:00",
                                        "lastModifiedUtcTimestamp": "2018-07-19T17:52:41.464000+00:00",
                                        "lastOkTransitionAt": "0001-01-01T00:00:00+00:00",
                                        "lastWarningTransitionAt": "0001-01-01T00:00:00+00:00",
                                        "property": "State",
                                        "removeWhenExpired": false,
                                        "sequenceNumber": "393",
                                        "sourceId": "System.FM",
                                        "sourceUtcTimestamp": "2018-07-19T17:52:19.859000+00:00",
                                        "timeToLiveInMilliSeconds": "10675199 days, 2:48:05.477581"
                                      }
                                    }
                                  }
                                ]
                              }
                            }
                          ]
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

I've got issues with the DNS service. How do I go about diagnosing it further and fixing it?
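
One way to narrow down output like this is to walk the nested unhealthyEvaluations until reaching the innermost event, which carries the actual reason. The following is a minimal Python sketch, assuming the JSON has already been loaded into a dict. The helper name leaf_unhealthy_events and the trimmed SAMPLE are illustrative assumptions, not part of sfctl.

```python
def leaf_unhealthy_events(health):
    """Yield (sourceId, property, first line of description) per unhealthy event."""
    for wrapper in health.get("unhealthyEvaluations") or []:
        evaluation = wrapper.get("healthEvaluation") or {}
        event = evaluation.get("unhealthyEvent")
        if event:
            description = event.get("description", "")
            first_line = description.splitlines()[0] if description else ""
            yield event.get("sourceId"), event.get("property"), first_line
        # Each evaluation may carry its own nested unhealthyEvaluations list.
        yield from leaf_unhealthy_events(evaluation)


# Trimmed, hypothetical stand-in for the health output above (two levels only).
SAMPLE = {
    "aggregatedHealthState": "Error",
    "unhealthyEvaluations": [
        {
            "healthEvaluation": {
                "kind": "Partition",
                "unhealthyEvaluations": [
                    {
                        "healthEvaluation": {
                            "kind": "Event",
                            "unhealthyEvent": {
                                "sourceId": "System.FM",
                                "property": "State",
                                "description": (
                                    "Partition is below target replica or instance count.\r\n"
                                    "DnsService 1 1 e71ec636-6eab-4472-829d-0dc596bd7188"
                                ),
                            },
                        }
                    }
                ],
            }
        }
    ],
}

for source, prop, reason in leaf_unhealthy_events(SAMPLE):
    print(f"{source} / {prop}: {reason}")
```

Against the full output, this would surface the single System.FM event at the bottom of the tree without reading through every intermediate level.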

@Christina-Kang
Contributor

Christina-Kang commented Jul 19, 2018

Could you share the DNS settings from the cluster manifest, please? Make sure to remove any sensitive information.
Specifically, we are looking for a section similar to the following:

<Section Name="DnsService">
      <Parameter Name="IsEnabled" Value="true" />
</Section>

What version of Service Fabric is your cluster running? Thanks!

@no1melman
Author

Fabric Version: 6.1.480.9494

<Section Name="DnsService">
  <Parameter Name="InstanceCount" Value="1" />
  <Parameter Name="IsEnabled" Value="True" />
</Section>

@no1melman
Author

I get this error:

Unhealthy event: SourceId='System.FabricDnsService', Property='Socket', HealthState='Warning', ConsiderWarningAsError=false.
DnsService UDP listener is unable to start. Please make sure there are no processes listening on the DNS port 53.
List of processes listening on the DNS port:
UDP 0.0.0.0:53 : 6212
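
The last line of that report names the protocol, the bound address and port, and the ID of the process holding the port. Below is a small Python sketch for pulling those fields out of the text. It assumes every listener line follows the same "PROTO address:port : pid" shape as the single line shown here; parse_listeners is a hypothetical helper name.

```python
import re

# Assumed line shape, based only on the one example in the health event:
#   UDP 0.0.0.0:53 : 6212
LISTENER = re.compile(r"^(UDP|TCP)\s+(\S+):(\d+)\s+:\s+(\d+)$")


def parse_listeners(report):
    """Return (protocol, address, port, pid) tuples found in a health event text."""
    listeners = []
    for line in report.splitlines():
        match = LISTENER.match(line.strip())
        if match:
            protocol, address, port, pid = match.groups()
            listeners.append((protocol, address, int(port), int(pid)))
    return listeners


REPORT = """DnsService UDP listener is unable to start. Please make sure there are no processes listening on the DNS port 53.
List of processes listening on the DNS port:
UDP 0.0.0.0:53 : 6212"""

for protocol, address, port, pid in parse_listeners(REPORT):
    print(f"{protocol} port {port} on {address} is held by process {pid}")
```

The process ID can then be looked up in Task Manager, or with tasklist /FI "PID eq 6212" from a command prompt, to see which program owns the port.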

@no1melman
Author

@Christina-Kang
Contributor

Are you able to upgrade the cluster to a newer version? This is a known issue that has been fixed in newer releases (6.2 and up). If you are not able to upgrade, you can use the mitigation with ICS mentioned in the thread. This issue also does not appear in cloud clusters; it shows up only in local clusters. Please let me know if you run into issues applying those mitigations. Thanks!

@no1melman
Author

I've updated, and now I am unable to start my cluster. Where can I go to find logs that explain why? c:\sfcluster only has .etl logs, which I have no way of understanding.

@no1melman
Author

My manifest now:

<?xml version="1.0" encoding="utf-8"?>
<ClusterManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="DevCluster" Version="0" Description="This is a generated file. Do not modify." xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <NodeTypes>
    <NodeType Name="NodeType0">
      <Endpoints>
        <ClientConnectionEndpoint Port="19000" />
        <LeaseDriverEndpoint Port="19001" />
        <ClusterConnectionEndpoint Port="19002" />
        <HttpGatewayEndpoint Port="19080" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
        <ServiceConnectionEndpoint Port="19006" />
        <ApplicationEndpoints StartPort="30001" EndPort="31000" />
      </Endpoints>
      <PlacementProperties>
        <Property Name="NodeTypeName" Value="NodeType0" />
      </PlacementProperties>
    </NodeType>
  </NodeTypes>
  <Infrastructure>
    <WindowsServer IsScaleMin="true">
      <NodeList>
        <Node NodeName="_Node_0" IPAddressOrFQDN="DESKTOP-ITQGDFK" IsSeedNode="true" NodeTypeRef="NodeType0" FaultDomain="fd:/0" UpgradeDomain="0" />
      </NodeList>
    </WindowsServer>
  </Infrastructure>
  <FabricSettings>
    <Section Name="ApplicationGateway/Http">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="ClusterManager">
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
      <Parameter Name="UpgradeStatusPollInterval" Value="5" />
      <Parameter Name="UpgradeHealthCheckInterval" Value="5" />
      <Parameter Name="FabricUpgradeHealthCheckInterval" Value="5" />
    </Section>
    <Section Name="Diagnostics">
      <Parameter Name="ProducerInstances" Value="ServiceFabricEtlFile,ServiceFabricPerfCtrFolder" />
      <Parameter Name="MaxDiskQuotaInMB" Value="10240" />
      <Parameter Name="EnableCircularTraceSession" Value="true" />
    </Section>
    <Section Name="DnsService">
      <Parameter Name="InstanceCount" Value="-1" />
      <Parameter Name="IsEnabled" Value="True" />
      <Parameter Name="AllowMultipleListeners" Value="true" />
    </Section>
    <Section Name="FabricClient">
      <Parameter Name="HealthReportSendInterval" Value="0" />
    </Section>
    <Section Name="Failover">
      <Parameter Name="NodeUpRetryInterval" Value="1" />
      <Parameter Name="SendToFMTimeout" Value="1" />
    </Section>
    <Section Name="FailoverManager">
      <Parameter Name="ExpectedClusterSize" Value="1" />
      <Parameter Name="IsSingletonReplicaMoveAllowedDuringUpgrade" Value="false" />
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
      <Parameter Name="ClusterStableWaitDuration" Value="0" />
      <Parameter Name="PeriodicStateScanInterval" Value="1" />
      <Parameter Name="ReconfigurationTimeLimit" Value="20" />
      <Parameter Name="BuildReplicaTimeLimit" Value="20" />
      <Parameter Name="CreateInstanceTimeLimit" Value="20" />
      <Parameter Name="PlacementTimeLimit" Value="20" />
      <Parameter Name="ServiceLocationBroadcastInterval" Value="1" />
      <Parameter Name="ServiceLookupTableEmptyBroadcastInterval" Value="1" />
      <Parameter Name="MinRebuildRetryInterval" Value="1" />
      <Parameter Name="MaxRebuildRetryInterval" Value="1" />
    </Section>
    <Section Name="Federation">
      <Parameter Name="NodeIdGeneratorVersion" Value="V4" />
      <Parameter Name="ProcessAssertExitTimeout" Value="86400" />
      <Parameter Name="UnresponsiveDuration" Value="0" />
    </Section>
    <Section Name="Hosting">
      <Parameter Name="CacheCleanupScanInterval" Value="300" />
      <Parameter Name="DeactivationGraceInterval" Value="2" />
      <Parameter Name="DeactivationScanInterval" Value="600" />
      <Parameter Name="DeploymentRetryBackoffInterval" Value="1" />
      <Parameter Name="EnableProcessDebugging" Value="true" />
      <Parameter Name="EndpointProviderEnabled" Value="true" />
      <Parameter Name="RunAsPolicyEnabled" Value="true" />
      <Parameter Name="ServiceTypeRegistrationTimeout" Value="20" />
    </Section>
    <Section Name="HttpGateway">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="ImageStoreService">
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
    </Section>
    <Section Name="Management">
      <Parameter Name="DisableChecksumValidation" Value="true" />
      <Parameter Name="EnableDeploymentAtDataRoot" Value="true" />
      <Parameter Name="ImageCachingEnabled" Value="false" />
      <Parameter Name="ImageStoreConnectionString" Value="file:C:\SfDevCluster\Data\ImageStoreShare" />
    </Section>
    <Section Name="NamingService">
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
      <Parameter Name="PartitionCount" Value="1" />
    </Section>
    <Section Name="PlacementAndLoadBalancing">
      <Parameter Name="MinLoadBalancingInterval" Value="300" />
      <Parameter Name="QuorumBasedReplicaDistributionPerFaultDomains" Value="true" />
      <Parameter Name="TraceCRMReasons" Value="false" />
    </Section>
    <Section Name="ReconfigurationAgent">
      <Parameter Name="IsDeactivationInfoEnabled" Value="true" />
      <Parameter Name="LocalHealthReportingTimerInterval" Value="5" />
      <Parameter Name="MinimumIntervalBetweenRAPMessageRetry" Value="0.5" />
      <Parameter Name="RAPMessageRetryInterval" Value="0.5" />
      <Parameter Name="RAUpgradeProgressCheckInterval" Value="3" />
      <Parameter Name="ServiceApiHealthDuration" Value="20" />
      <Parameter Name="ServiceReconfigurationApiHealthDuration" Value="20" />
    </Section>
    <Section Name="Security">
      <Parameter Name="ClusterCredentialType" Value="None" />
      <Parameter Name="ServerAuthCredentialType" Value="None" />
    </Section>
    <Section Name="ServiceFabricEtlFile">
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
      <Parameter Name="EtlReadIntervalInMinutes" Value="5" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="ProducerType" Value="EtlFileProducer" />
    </Section>
    <Section Name="ServiceFabricPerfCtrFolder">
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
      <Parameter Name="FolderType" Value="ServiceFabricPerformanceCounters" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="ProducerType" Value="FolderProducer" />
    </Section>
    <Section Name="Setup">
      <Parameter Name="FabricDataRoot" Value="C:\SfDevCluster\Data" />
      <Parameter Name="FabricLogRoot" Value="C:\SfDevCluster\Log" />
      <Parameter Name="SkipFirewallConfiguration" Value="true" />
    </Section>
    <Section Name="Trace/Etw">
      <Parameter Name="Level" Value="4" />
    </Section>
    <Section Name="TransactionalReplicator">
      <Parameter Name="CheckpointThresholdInMB" Value="64" />
    </Section>
  </FabricSettings>
</ClusterManifest>
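
To compare a single section across manifest versions without reading the whole file, the parameters can be extracted programmatically. The Python sketch below uses the standard library XML parser and the fabric namespace declared on the ClusterManifest element above. The function name section_parameters and the trimmed SAMPLE are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the ClusterManifest root element.
FABRIC_NS = {"sf": "http://schemas.microsoft.com/2011/01/fabric"}


def section_parameters(manifest_xml, section_name):
    """Return {parameter name: value} for one FabricSettings section, or None."""
    root = ET.fromstring(manifest_xml)
    for section in root.iterfind(".//sf:Section", FABRIC_NS):
        if section.get("Name") == section_name:
            return {
                param.get("Name"): param.get("Value")
                for param in section.iterfind("sf:Parameter", FABRIC_NS)
            }
    return None


# Trimmed copy of the manifest above, keeping only the DnsService section.
SAMPLE = """<ClusterManifest xmlns="http://schemas.microsoft.com/2011/01/fabric" Name="DevCluster" Version="0">
  <FabricSettings>
    <Section Name="DnsService">
      <Parameter Name="InstanceCount" Value="-1" />
      <Parameter Name="IsEnabled" Value="True" />
      <Parameter Name="AllowMultipleListeners" Value="true" />
    </Section>
  </FabricSettings>
</ClusterManifest>"""

print(section_parameters(SAMPLE, "DnsService"))
```

Run against the pre-upgrade manifest, the same call would show InstanceCount as 1 and no AllowMultipleListeners entry, which makes the DNS-related change between the two versions easy to see.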

@Christina-Kang
Contributor

Unfortunately, it's hard to tell just from that. Can you repro the issue and upload your trace (.etl) file from c:\sfcluster? Before uploading, run logman update FabricTraces -fd -ets to update the traces.

For local clusters, I've found that reinstalling the MSI and then creating a new cluster, or rebooting the machine, sometimes helps if you have no data you want to keep.

Thanks!

@no1melman
Author

Where do I upload it to?

Plus, I’ve reinstalled, rebooted, and tried all sorts of things to get it working, to no avail.

@Christina-Kang
Contributor

You can upload it here or provide a link I can download it from.

@no1melman
Author

Log.zip

@Christina-Kang
Contributor

Thanks for the logs.

The nodes are failing to come up with an access denied error because of an issue with the certificate. I will update this post with more details or a fix in a bit.

@no1melman
Author

Cheers, much appreciated.

@Christina-Kang
Contributor

Thanks for your patience :) How did you deploy this cluster (including the upgrade)?

The specific error is this: "CertCreateSelfSignCertificate failed: E_ACCESSDENIED"

From an elevated PowerShell instance, can you run the following command?

(Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access

It should list at least these 2 entries:
- NT Authority\Network Service: Allow: Read, Execute and Synchronize
- Everyone: Allow: Read, Write, Synchronize

If either is missing, that would explain the failure reported above. The mitigation is to run the following (from an elevated PowerShell instance):
 

$sid="*S-1-1-0" # everyone
$path=$env:ProgramData+'\Microsoft\Crypto\Rsa\MachineKeys' # machine key store
$perms="(RX,W)" # read, write, and execute
icacls $path /grant $sid`:$perms

If that doesn't work, please let me know and we can try something else.

@no1melman
Author

I just used the Web Platform Installer to install it.

@no1melman
Author

Log.zip

@no1melman
Author

It's still not working.

@Christina-Kang
Contributor

Christina-Kang commented Jul 27, 2018

Can you provide the output from the command (Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access (from an admin PowerShell) from earlier?

Thanks!

@no1melman
Author

FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : BUILTIN\Administrators
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : BUILTIN\Users
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

@Christina-Kang
Contributor

Thanks for the output!

It looks like Network Service doesn’t have access. There should be an entry like this:

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : NT AUTHORITY\NETWORK SERVICE
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

Can you try running the following?

$sid="*S-1-5-20" # network service
$path=$env:ProgramData+'\Microsoft\Crypto\Rsa\MachineKeys' # machine key store
$perms="(RX,W)" # read, write, and execute
icacls $path /grant $sid`:$perms

These are the same commands as earlier, with only the first line (the SID) changed.
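
The comparison that surfaced the missing entry can also be scripted. The Python sketch below parses the list-style text printed by the Get-Acl command and reports which required identities have no Allow rule. It assumes the output uses the "Key : Value" layout with blank lines between rules, as pasted in this thread. All function names and the trimmed SAMPLE are hypothetical.

```python
def parse_acl_entries(text):
    """Split list-style Get-Acl output into one dict per access rule."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current rule.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def allowed_rights(entries, identity):
    """Collect the rights granted to an identity across all Allow rules."""
    rights = set()
    for entry in entries:
        same_identity = entry.get("IdentityReference", "").lower() == identity.lower()
        if same_identity and entry.get("AccessControlType") == "Allow":
            names = entry.get("FileSystemRights", "").split(",")
            rights.update(name.strip() for name in names if name.strip())
    return rights


def missing_identities(entries, required):
    """Return the required identities that have no Allow rule at all."""
    return [name for name in required if not allowed_rights(entries, name)]


# Two rules trimmed from the output pasted above; NETWORK SERVICE is absent.
SAMPLE = r"""FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
"""

REQUIRED = [r"NT AUTHORITY\NETWORK SERVICE", "Everyone"]
print(missing_identities(parse_acl_entries(SAMPLE), REQUIRED))
```

This only checks that an Allow rule exists for each identity. Whether the granted rights are sufficient still needs to be judged against the entries listed earlier in the thread.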

@no1melman
Author

I got this output now:

FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : NT AUTHORITY\NETWORK SERVICE
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : BUILTIN\Administrators
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : BUILTIN\Users
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

@no1melman
Author

No luck creating a 5-node cluster.
Log.zip

@Christina-Kang
Contributor

The good news is that the cert issue appears to be gone! This now looks like it could be a firewall issue. I will get in touch with the right people about this error and get back to you. Thanks!

@Christina-Kang
Contributor

Some of these traces look pretty old. Can you try cleaning the cluster and redeploying? You can also try uninstalling and reinstalling (and rebooting if possible, though you should not need to). If it still doesn't work after that, can you re-upload the traces? They should be up to date by then. Thanks!

@no1melman
Author

So I can't extract those logs any more. I had to force-delete the sfcluster folder because something had removed my access to it, and I had to reboot into safe mode to delete the directory. Now when I try to zip up the folder, using either shell zip or 7z, it just says access denied on everything. I'm not sure what is causing that, but it isn't helping.

@Christina-Kang
Contributor

Hi @no1melman, apologies for the late reply. Are you still having issues with this?

@no1melman
Author

Yeah, it just isn't working. I've updated Service Fabric, reinstalled, rebooted, and set the permissions as you said. What I found out is that the folder mentioned above just locks up and I can't remove it.

@Christina-Kang
Contributor

Are you trying to delete/move the log files while the cluster is running? Can you check if FabricHostSvc is running when you get permission denied? If so, then can you stop FabricHostSvc and then Fabric.exe (if any)? Afterwards, can you try again to see if you can get the log files?

@no1melman
Author

no1melman commented Aug 21, 2018

I've managed to perform the logman command again, and zip up the new log files without issue

Log.zip

@Christina-Kang
Contributor

It's the same certificate issue. Can you run (Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access (from an admin PowerShell) again to make sure that the permissions have not been removed, possibly by a domain policy? If they have not been, then let's set up a time to chat offline and get the cluster up and running.

Thanks!

@no1melman
Author

@Christina-Kang can we set up some time to get this cluster running? It all still looks good on my end.

@Christina-Kang
Contributor

Sounds good. Can you send me an email at bikang@microsoft.com? Thanks!

Christina-Kang changed the title from "DNS Issue" to "DNS Issue (fabric v6.1) and unrelated cluster start up issue." on Oct 2, 2018
@Christina-Kang
Contributor

Christina-Kang commented Oct 4, 2018

Thank you @no1melman for your time working with us on this!

The workaround below applies to Windows.

The startup issue was caused by Network Service losing its permission after it had been set, for a reason that is still unknown. Running the PowerShell commands did not work in this instance, but going to the directory C:\ProgramData\Microsoft\Crypto\RSA and changing the permissions on the MachineKeys folder allowed the cluster to come up correctly.

The permissions were changed by right-clicking the MachineKeys folder, selecting the Security tab, selecting the NETWORK SERVICE group, and granting it at minimum read, write, and execute permissions.

A root cause fix will be implemented in the Service Fabric runtime. No changes to sfctl are required for this issue.
