Access exception upon certificate renewal attempt #302

Open
InteXX opened this issue Mar 22, 2019 · 20 comments

Comments

@InteXX

InteXX commented Mar 22, 2019

I'm getting an error when the webjob attempts to renew a certificate:

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Cleanup ---> Microsoft.Rest.Azure.CloudException: The client '[Redacted]' with object id '[Redacted]' does not have authorization to perform action 'Microsoft.Web/sites/config/list/action' over scope '/subscriptions/[Redacted]/resourceGroups/[Redacted]/providers/Microsoft.Web/sites/[Redacted]/config/publishingcredentials'.

The full stack report is below.

I've reviewed documentation here and here, but I'm afraid I'm still at a loss.

I've found the Microsoft.Web/sites/config/list/action provider here, but it's not listed in the available roles and there's no indication as to how to give it access to this:

/subscriptions/[Redacted]/resourceGroups/[Redacted]/providers/Microsoft.Web/sites/[Redacted]/config/publishingcredentials

All has been working well for the past year, but it only started failing within the past month or so. I have two websites on which I'm running the job, and suddenly both are failing with like errors. I've changed nothing in my Azure configuration.

{
  "Type": "FunctionCompleted",
  "EndTime": "2019-03-21T03:11:53.1829332+00:00",
  "Failure": {
    "ExceptionType": "Microsoft.Azure.WebJobs.Host.FunctionInvocationException",
    "ExceptionDetails": "Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Cleanup ---> Microsoft.Rest.Azure.CloudException: The client '[Redacted]' with object id '[Redacted]' does not have authorization to perform action 'Microsoft.Web/sites/config/list/action' over scope '/subscriptions/[Redacted]/resourceGroups/[Redacted]/providers/Microsoft.Web/sites/[Redacted]/config/publishingcredentials'.
   at Microsoft.Azure.Management.WebSites.WebAppsOperations.<BeginListPublishingCredentialsWithHttpMessagesAsync>d__210.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.WebAppsOperationsExtensions.<BeginListPublishingCredentialsAsync>d__411.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.WebAppsOperationsExtensions.BeginListPublishingCredentials(IWebAppsOperations operations, String resourceGroupName, String name)
   at LetsEncrypt.Azure.Core.KuduHelper.GetKuduClient(WebSiteManagementClient client, IAzureWebAppEnvironment settings) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.Core\\KuduHelper.cs:line 15
   at LetsEncrypt.Azure.Core.Services.KuduFileSystemAuthorizationChallengeProvider..ctor(IAzureWebAppEnvironment azureEnvironment, IAuthorizationChallengeProviderConfig config) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.Core\\Services\\KuduFileSystemAuthorizationChallengeProvider.cs:line 22
   at LetsEncrypt.Azure.Core.CertificateManager..ctor(AppSettingsAuthConfig config) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.Core\\CertificateManager.cs:line 31
   at LetsEncrypt.SiteExtension.Functions.Cleanup(TimerInfo timerInfo) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.WebJob\\Functions.cs:line 73
   at lambda_method(Closure , Functions , Object[] )
   at Microsoft.Azure.WebJobs.Host.Executors.VoidMethodInvoker`1.InvokeAsync(TReflected instance, Object[] arguments)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`1.<InvokeAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<InvokeAsync>d__22.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithWatchersAsync>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__10.MoveNext()"
  },
  "ParameterLogs": {},
  "FunctionInstanceId": "[Redacted]",
  "Function": {
    "Id": "LetsEncrypt.SiteExtension.Functions.Cleanup",
    "FullName": "LetsEncrypt.SiteExtension.Functions.Cleanup",
    "ShortName": "Functions.Cleanup",
    "Parameters": [
      {
        "Name": "timerInfo",
        "DisplayHints": {
          "Description": "Timer executed on schedule (Daily: 1 occurrences)"
        }
      }
    ]
  },
  "Arguments": {
    "timerInfo": "2019-03-21T03:11:49.9071967+00:00"
  },
  "Reason": "AutomaticTrigger",
  "ReasonDetails": "Timer fired at 2019-03-21T03:11:48.8550732+00:00",
  "StartTime": "2019-03-21T03:11:48.8550732+00:00",
  "OutputBlob": {
    "ContainerName": "azure-webjobs-hosts",
    "BlobName": "output-logs/[Redacted].txt"
  },
  "ParameterLogBlob": {
    "ContainerName": "azure-webjobs-hosts",
    "BlobName": "output-logs/[Redacted].params.txt"
  },
  "HostInstanceId": "[Redacted]",
  "HostDisplayName": "LetsEncrypt.SiteExtension.WebJob",
  "SharedQueueName": "azure-webjobs-host-le-[Redacted]",
  "InstanceQueueName": "azure-webjobs-host-[Redacted]",
  "Heartbeat": {
    "SharedContainerName": "azure-webjobs-hosts",
    "SharedDirectoryName": "heartbeats/le-[Redacted]",
    "InstanceBlobName": "[Redacted]",
    "ExpirationInSeconds": 45
  },
  "WebJobRunIdentifier": {
    "WebSiteName": "[Redacted]",
    "JobType": "Continuous",
    "JobName": "letsencrypt.siteextension.job",
    "RunId": ""
  }
}

@sjkp
Owner

sjkp commented Mar 27, 2019

If you generated the client secret for the service principal from the portal, be aware that the default lifetime is one year, so maybe the secret has simply expired. You can just look up the service principal in Azure AD (using the client id, if you forgot what you named it) and generate a new secret. This time, set the lifetime to non-expiring; then you won't have this problem again later.

@InteXX
Author

InteXX commented Mar 27, 2019

Yes, I ran into that a while back. Since then I've always generated them as non-expiring:

image

@Tsaukpaetra seems to feel that the Service Principal no longer has the role required to access that resource group, but I'm struggling to figure out how to check on that (official documentation is frustrating, to say the least).

Would you concur?

@InteXX
Author

InteXX commented Mar 27, 2019

You can just lookup the service principal in azure ad (using the client id, if you forgot what you named it)

How does one do this?

@sjkp
Owner

sjkp commented Mar 27, 2019

Oh, I can see from the other thread that you don't know how to check whether the service principal secret really has expired. You can use this PowerShell script to get the info:

Connect-AzureAD -TenantId "yourtenantid"

$a = Get-AzureADApplication -All $true -Filter "AppId eq 'your-client-id'"
$a.PasswordCredentials

Then you will see something similar to this
image
If all the end dates are in the past, then you need to create a new password. You can't get this information from the portal UI AFAIK.
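
The "all end dates are in the past" check can also be written down explicitly. The following is only a minimal sketch in Python, assuming the end dates have been copied out of the PowerShell output as ISO 8601 strings; the function name `needs_new_secret` is hypothetical and not part of any Azure tooling.

```python
from datetime import datetime, timezone


def needs_new_secret(end_dates, now=None):
    """Return True when no credential is still valid at `now`.

    `end_dates` is a list of ISO 8601 strings with an explicit UTC offset
    (an assumed export format, e.g. "2019-01-01T00:00:00+00:00").
    """
    now = now or datetime.now(timezone.utc)
    parsed = [datetime.fromisoformat(d) for d in end_dates]
    # A new secret is needed only if every end date is in the past
    # (an empty list also means there is no usable secret).
    return not any(d > now for d in parsed)


if __name__ == "__main__":
    dates = ["2019-01-01T00:00:00+00:00", "2299-12-31T00:00:00+00:00"]
    print(needs_new_secret(dates))
```

If this returns False, at least one secret is still valid and the problem is probably elsewhere, for example in the role assignments.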

@InteXX
Author

InteXX commented Mar 27, 2019

If all the end dates are in the past

Hm, it must be something else:

image

@sjkp
Owner

sjkp commented Mar 27, 2019

Check that the service principal still has access to the resource group?

@InteXX
Author

InteXX commented Mar 27, 2019

Yes, that's what I'm trying to figure out how to do ;-)

Anyway... I opened a support ticket. Expensive, I know, but this is a must-have. Hopefully I'll be receiving a phone call shortly.

@sjkp
Owner

sjkp commented Mar 27, 2019

image
Go to the resource group that contains the web site, click Access Control (IAM), and use the Check Access feature. If you know the name of your service principal/application, you can just search for it and see what permissions it has been assigned. In my case here it is granted access on the subscription, but that is more than required; Contributor on the resource group should be sufficient.
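
The reason a subscription-level grant also works is that role assignments made at a parent scope are inherited by the resources below it. A minimal Python sketch of that scope check follows; `scope_covers` and the sample IDs are hypothetical, and it only compares scopes, not whether the role actually contains the required action.

```python
def scope_covers(assignment_scope, resource_scope):
    """Return True if a role assignment at `assignment_scope` applies to
    `resource_scope`, i.e. the assignment scope is the resource itself
    or one of its parents.

    Scopes are compared segment by segment and case-insensitively.
    """
    a = [s.lower() for s in assignment_scope.split("/") if s]
    r = [s.lower() for s in resource_scope.split("/") if s]
    return r[:len(a)] == a


if __name__ == "__main__":
    # Placeholder IDs, not taken from the issue.
    site = ("/subscriptions/sub-1/resourceGroups/rg-web"
            "/providers/Microsoft.Web/sites/my-site")
    print(scope_covers("/subscriptions/sub-1", site))
    print(scope_covers("/subscriptions/sub-1/resourceGroups/rg-web", site))
    print(scope_covers("/subscriptions/sub-1/resourceGroups/rg-other", site))
```

Comparing whole path segments rather than raw string prefixes keeps a resource group named `rg` from appearing to cover one named `rg-web`.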

@InteXX
Author

InteXX commented Mar 27, 2019

Well shucky darn, that seems to have been it.

I'd come upon this same page in all my searching, but I didn't realize you could search for an application by name. It didn't appear in the pick list, so I figured I was at the wrong screen.

So I added lews to the Contributor role. I'll wait for the job to run again, and we'll see what happens.

Thanks for the screen shot and the tip!

@sjkp
Owner

sjkp commented Mar 27, 2019

NP

@InteXX
Author

InteXX commented Mar 27, 2019

It's set to run again in just over six hours. I'd run it manually, but I want to wait to see what it does under schedule.

I'll report back here. Keep the issue open?

@sjkp
Owner

sjkp commented Mar 27, 2019

Just close it when you have validated that it works :)

@InteXX
Author

InteXX commented Mar 27, 2019

Will do.

@InteXX
Author

InteXX commented Mar 28, 2019

OK, we have a new cert:

image

...but alas we only have partial success. Stack trace below.

This is a different error. Is it an Azure issue or a LEWS issue?

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Cleanup ---> Microsoft.Rest.Azure.CloudException: Operation returned an invalid status code 'Conflict'
   at Microsoft.Azure.Management.WebSites.CertificatesOperations.<DeleteWithHttpMessagesAsync>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.CertificatesOperationsExtensions.<DeleteAsync>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.CertificatesOperationsExtensions.Delete(ICertificatesOperations operations, String resourceGroupName, String name)
   at LetsEncrypt.Azure.Core.Services.WebAppCertificateService.RemoveCertificate(WebSiteManagementClient webSiteClient, Certificate s) in D:\a\1\s\LetsEncrypt.SiteExtension.Core\Services\WebAppCertificateService.cs:line 104
   at LetsEncrypt.Azure.Core.Services.WebAppCertificateService.<>c__DisplayClass4_1.<RemoveExpired>b__1(Certificate s) in D:\a\1\s\LetsEncrypt.SiteExtension.Core\Services\WebAppCertificateService.cs:line 96
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at LetsEncrypt.Azure.Core.Services.WebAppCertificateService.RemoveExpired(Int32 removeXNumberOfDaysBeforeExpiration) in D:\a\1\s\LetsEncrypt.SiteExtension.Core\Services\WebAppCertificateService.cs:line 96
   at LetsEncrypt.SiteExtension.Functions.Cleanup(TimerInfo timerInfo) in D:\a\1\s\LetsEncrypt.SiteExtension.WebJob\Functions.cs:line 73
   at lambda_method(Closure , Functions , Object[] )
   at Microsoft.Azure.WebJobs.Host.Executors.VoidMethodInvoker`1.InvokeAsync(TReflected instance, Object[] arguments)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`1.<InvokeAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<InvokeAsync>d__22.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithWatchersAsync>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__10.MoveNext()

@Tsaukpaetra

Looks like it was trying to delete the old certs but failed? I assume this may have happened because the cert was still associated with a site.

@InteXX
Author

InteXX commented Mar 28, 2019

Maybe so... I replayed it and it succeeded this time.

I also noticed that Cleanup() ran before RenewCertificate(). What do you make of that?

@InteXX
Author

InteXX commented Mar 28, 2019

private async Task RemoveCertificate(WebSiteManagementClient webSiteClient, Certificate s)
{
    await webSiteClient.Certificates.DeleteAsync(azureEnvironment.ServicePlanResourceGroupName, s.Name);
}

@Tsaukpaetra

I think in theory it should have skipped any certs that were still in use (see line 92 above your reference), but somehow it found a letsencrypt cert that was still associated with a site it didn't know about. 🤷‍♂️
I assume the resource group had certs by letsencrypt belonging to another site that wasn't the one the configuration was pointing to.
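
To make the idea concrete: a cleanup that avoids the Conflict would only delete expired certs whose thumbprint is not bound to any site in the resource group, not just the configured one. This is a rough Python sketch of that selection under those assumptions, not the extension's actual code; `certs_safe_to_delete` and the dict fields are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def certs_safe_to_delete(certs, bound_thumbprints, now, days_before_expiry=0):
    """Pick names of certs that are expired (or expire within
    `days_before_expiry` days) and are not bound to any site.

    `certs` is a list of dicts with "name", "thumbprint" and "expires"
    (hypothetical shape); `bound_thumbprints` should hold the thumbprints
    bound on every site in the resource group.
    """
    cutoff = now + timedelta(days=days_before_expiry)
    bound = {t.upper() for t in bound_thumbprints}
    return [
        c["name"]
        for c in certs
        if c["expires"] <= cutoff and c["thumbprint"].upper() not in bound
    ]


if __name__ == "__main__":
    now = datetime(2019, 3, 28, tzinfo=timezone.utc)
    expired = datetime(2019, 3, 1, tzinfo=timezone.utc)
    certs = [
        {"name": "old-a", "thumbprint": "aa11", "expires": expired},
        {"name": "old-b", "thumbprint": "bb22", "expires": expired},
    ]
    # "old-b" is still bound to another site, so it is skipped.
    print(certs_safe_to_delete(certs, {"BB22"}, now))
```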

@InteXX
Author

InteXX commented Mar 28, 2019

That makes sense... that's a little bit of why I'm leaving the issue open for the time being. Simon may want to have a look at that part of it.

It'd be hard to reproduce this one, I think.

@Tsaukpaetra

Yeah, it'll take three months now to repro (since I assume you didn't export the expired cert).
Suggested repro steps:

  1. Have two sites install the letsencrypt extension and get certs for their sites, and both store the certs in the same resource group.
  2. Remove the letsencrypt extension (or otherwise disable the webjob) so that its cert expires or is otherwise not auto-renewed.
  3. The remaining letsencrypt webjob should notice that there's a letsencrypt cert that's expired and try to delete it, but will fail because it hasn't removed/replaced it on the site (because it doesn't know about the site).

In this case, I thought there was some standalone multi-site thing that was added, but I never got around to setting it up. That would probably be a better solution in the long run if multiple sites need to be certed.
