Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency calls not being logged by App Insights when being performed as part of a Polly retry #1923

Open
guysteel opened this issue Jun 16, 2020 · 24 comments
Labels

Comments

@guysteel
Copy link

guysteel commented Jun 16, 2020

Describe your environment. Describe any aspect of your environment relevant to the problem:

  • SDK version: 2.14.0
  • .NET runtime version (.NET or .NET Core, TargetFramework in the .csproj file): .netcore 3.1
  • Hosting Info (IIS/Azure WebApps/etc): Azure WebApp
  • Platform and OS version: Azure / WebApp

Since upgrading from v2.8.2, when using Polly to retry a failing Http request automatically only the first failing call to the dependency is being logged. Subsequent dependency calls are no longer appear to be logged.

Steps to reproduce.
In ConfigureServices - configure app insights and setup a named http client:

  services.AddApplicationInsightsTelemetry();
  
  ...

  var httpClientBuilder = services.AddHttpClient("TestClient", (provider, client) =>
  {
     client.Timeout = TimeSpan.FromSeconds(120);
 });

  var retryPolicy = Policy.Handle<HttpRequestException>()
        .OrResult<HttpResponseMessage>(response => (int)response.StatusCode >= 500)
        .WaitAndRetryAsync(5, retryAttempt => TimeSpan.FromSeconds(1), (result, span) =>
                {
                    Console.WriteLine(result.Result.StatusCode);
                });

  httpClientBuilder.AddPolicyHandler(retryPolicy);

In Configure add a failing call to demonstrate the behaviour:


  ...
  var httpClientFactory = app.ApplicationServices.GetService<IHttpClientFactory>();
  var httpClient = httpClientFactory.CreateClient("TestClient");
  var request = new HttpRequestMessage(HttpMethod.Get, new Uri("http://httpstat.us/500"));
  var result = httpClient.SendAsync(request).Result;

What is the expected behavior?
I would expect each call to the failing endpoint to be logged as a separate dependency call.

What is the actual behavior?
I see only a single dependency call logged - the initial call. In the previous version of App Insights all of the calls were being logged.

Additional context.
The above uses Polly 3.1.3

@guysteel guysteel added the bug label Jun 16, 2020
@karun-verghese
Copy link

karun-verghese commented Sep 15, 2020

I am experiencing a very similar error. The setup is similar to how @guysteel has set it up above. On retries, I do not see a dependency call being logged in app insights. I know the retries are happening because I see the info logs that I am writing within the retry policy handler. The packages/SDK I am using are:
.Net Core SDK 3.1.402
Microsoft.ApplicationInsights.AspNetCore (2.14.0)
Microsoft.Extensions.Http.Polly (3.1.8)

@cijothomas
Copy link
Contributor

The retry calls will likely not be captured if the retries are transparent to the caller. ApplicationInsights collects httpcalls by subscribing to the DiagnosticSource events emitted by a DiagnosticHandler in the HttpClient. If the retry is happening without going through the full pipeline of handlers (esp DiagnosticHandler), then the retries will not be captured by ApplicationInsights.

Don't think I can suggest a workaround at this stage, as the DiagnosticHandler is part of the HttpClient itself, so you may/may not have the ability to make sure it is called when using the Polly package to do the retry.

@karun-verghese
Copy link

@cijothomas I tried the same scenario but with an older version of our service. Basically, I checked out an older branch of my service and ran the same scenario. In app insights I can see the following:
  | timestamp [UTC] | itemType | resultCode |  
|9/24/2020, 8:26:48.803 AM | dependency | 500 |  
  | 9/24/2020, 8:26:32.737 AM | dependency | 500 |  
  | 9/24/2020, 8:26:24.641 AM | dependency | 500 |  
  | 9/24/2020, 8:26:20.550 AM | dependency | 500 |  
  | 9/24/2020, 8:26:18.425 AM | dependency | 500|

I know these are my retry calls, as there are no other services running in this environment at the moment, and I can see from the timestamp that the calls are due to the exponential backoff algorithm I have configured for Polly.
The packages I am using in this version are :

.Net Core SDK 2.2.0
Microsoft.ApplicationInsights.AspNetCore (2.7.1)
Microsoft.Extensions.Http.Polly (2.2.0)

So it appears that this behaviour was available earlier but has since changed for some reason.

My app insights config is as follows:
var aiOptions = new Microsoft.ApplicationInsights.AspNetCore.Extensions.ApplicationInsightsServiceOptions();
aiOptions.EnableAdaptiveSampling = false;
aiOptions.InstrumentationKey = Configuration.GetSection("ApplicationInsights").GetValue("InstrumentationKey");
services.AddApplicationInsightsTelemetry(aiOptions);

My app insights config has not changed with my new version, except that now we configure the connection string instead of the instrumentation key and of course we use the newer packages which I mentioned in my earlier message.

@piotrzbyszynski
Copy link

Also, Application Insights seem to log exceptions handled by Polly

Policy.Handle<HttpRequestException>()
After exception occurs, it is logged although IMO it should be regarded as handled exception. @cijothomas Any thoughts?

@cijothomas
Copy link
Contributor

Also, Application Insights seem to log exceptions handled by Polly

Policy.Handle<HttpRequestException>()
After exception occurs, it is logged although IMO it should be regarded as handled exception. @cijothomas Any thoughts?

In the ExceptionTelemetry, what is the SDKVersion field value.
If it is something starting with "rdd..", then its captured by the DependencyCollectionModule, which I would consider a bug and provide fix.
If it is something starting with "il..", then the exception is captured by Msft.Extensions.Logging.ApplicationInsightsLoggingProvider. This works independent of whether DependencyCollectionModule is enabled or not. By default, this loggingprovider captures all logs Warning or above. This is by design and you can configure logging to not capture this by using instructions from https://docs.microsoft.com/en-us/azure/azure-monitor/app/ilogger#control-logging-level

@piotrzbyszynski
Copy link

@cijothomas SDK Version is azurefunctions: 3.0.14785.0 (forgot to mention that it is Azure Function, sorry for that)

@cijothomas
Copy link
Contributor

@pzbyszynski Ok. For functions, you need to raise this issue in https://github.com/Azure/azure-webjobs-sdk repo. Functions + ApplicationInsights integration is in that repo.

(But if you see same behaviour outside of functions, let us know right here, we can help)

@karun-verghese
Copy link

@cijothomas Any update on the original question? I had posted some tests that I did in the message below:

@cijothomas I tried the same scenario but with an older version of our service. Basically, I checked out an older branch of my service and ran the same scenario. In app insights I can see the following:
  | timestamp [UTC] | itemType | resultCode |  
|9/24/2020, 8:26:48.803 AM | dependency | 500 |  
  | 9/24/2020, 8:26:32.737 AM | dependency | 500 |  
  | 9/24/2020, 8:26:24.641 AM | dependency | 500 |  
  | 9/24/2020, 8:26:20.550 AM | dependency | 500 |  
  | 9/24/2020, 8:26:18.425 AM | dependency | 500|

I know these are my retry calls, as there are no other services running in this environment at the moment, and I can see from the timestamp that the calls are due to the exponential backoff algorithm I have configured for Polly.
The packages I am using in this version are :

.Net Core SDK 2.2.0
Microsoft.ApplicationInsights.AspNetCore (2.7.1)
Microsoft.Extensions.Http.Polly (2.2.0)

So it appears that this behaviour was available earlier but has since changed for some reason.

My app insights config is as follows:
var aiOptions = new Microsoft.ApplicationInsights.AspNetCore.Extensions.ApplicationInsightsServiceOptions();
aiOptions.EnableAdaptiveSampling = false;
aiOptions.InstrumentationKey = Configuration.GetSection("ApplicationInsights").GetValue("InstrumentationKey");
services.AddApplicationInsightsTelemetry(aiOptions);

My app insights config has not changed with my new version, except that now we configure the connection string instead of the instrumentation key and of course we use the newer packages which I mentioned in my earlier message.

@cijothomas
Copy link
Contributor

@karun-verghese Is the only thing different is the SDK version,? need to double confirm this, as you said "checked out older branch of your service".
Please share a minimal repro showing this behavior. Will investigate.

(From what I see, in .Net core, we always relied on DiagnosticSource callbacks to collect http dependency. In very old versions esp. .NET Framework, there were alternate options to collect http calls)

@karun-verghese
Copy link

karun-verghese commented Oct 29, 2020

@cijothomas @guysteel I have invited you to be collaborators on a small repo that I set up to investigate this issue. The repo is simple:
It contains a very basic .Net Core Web service with couple of healthcheck operations. On the 'main' branch, the setup is as following:
.Net Core SDK (3.1.403)
Microsoft.ApplicationInsights.AspNetCore (2.15.0)
Microsoft.Extensions.Http.Polly (3.1.9)

The configuration for Polly and App insights can be found in the Startup class and is exactly the same as we have in our actual services.

On the branch 'OldSDK', I performed the same tests with the following setup:
.Net Core SDK (2.2.208)
Microsoft.ApplicationInsights.AspNetCore (2.7.1)
Microsoft.Extensions.Http.Polly (2.2.0)

The tests performed are as follows:
I have setup an 'isOperable' endpoint such that when I locally host it using kestrel, I can call the endpoint http://localhost:5001/isOperable

The isOperable endpoint intentionally calls a non-existing 'dependent api' which is a variation of some publicly available dummy test api's that I found in the following git repo: https://github.com/public-apis/public-apis#animals. Thus, the expected status code is a 404(Not Found)

The test is very simple and has just 3 steps:

  1. Launch the service
  2. Call the isOperable endpoint
  3. Check the logs in app insights

When I run the test for the latest SDK and packages I see the following in my logs:

timestamp [UTC] message data resultCode itemType
10/29/2020, 7:53:17.863 AM - https://cat-fact.herokuapp.com/factsWrongUrl 404 dependency
10/29/2020, 7:53:19.447 AM Delaying for 2.096 seconds and retrying. Retry attempt :1. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl - - trace
10/29/2020, 7:53:22.807 AM Delaying for 4.084 seconds and retrying. Retry attempt 2 . Request uri : https://cat-fact.herokuapp.com/factsWrongUrl - - trace
10/29/2020, 7:53:28.127 AM Delaying for 8.08 seconds and retrying. Retry attempt :3. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl - - trace

When I run the test for the Old 2.2 SDK with the old Polly and App Insights packages I see the following:

timestamp [UTC] message data resultCode  itemType
10/29/2020, 9:09:32.865 AM - https://cat-fact.herokuapp.com/factsWrongUrl 404 dependency
10/29/2020, 9:09:34.313 AM Delaying for 2.059 seconds and retrying. Retry attempt :1. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl - - trace
10/29/2020, 9:09:36.381 AM - https://cat-fact.herokuapp.com/factsWrongUrl 404 dependency
10/29/2020, 9:09:37.616 AM Delaying for 4.02 seconds and retrying. Retry attempt 2 . Request uri : https://cat-fact.herokuapp.com/factsWrongUrl - - trace
10/29/2020, 9:09:41.636 AM - https://cat-fact.herokuapp.com/factsWrongUrl 404 dependency
10/29/2020, 9:09:42.981 AM Delaying for 8.096 seconds and retrying. Retry attempt :3. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl - - trace
10/29/2020, 9:09:51.089 AM - https://cat-fact.herokuapp.com/factsWrongUrl 404 dependency

The app insights instance I am pointing to is a private instance I created.

As you can see, with a really stripped down service with nothing except the app insights and polly configuration, the behaviour is the same as reported earlier. With the old SDK I see the dependencies being logged for the retries. But with the latest .Net Core SDK that functionality has stopped working.
The difference between the two branches is now really only the differences in SDK, packages and small code changes which you can easily check for yourself

@karun-verghese
Copy link

@cijothomas I also gave this a try with the new .Net 5 SDK and packages. It's the same behaviour as described above. The configuration is as follows:
.Net Core SDK (5.0.100)
Microsoft.ApplicationInsights.AspNetCore (2.16.0)
Microsoft.Extensions.Http.Polly (5.0.0)
Even now, the dependencies due to retries are not logged.

@cijothomas
Copy link
Contributor

There is no special handling in .NET5 which I am aware of, which would help with this issue.

@sebader
Copy link

sebader commented Jan 12, 2021

I came across the same (or very similar?) issue. Running in Azure Functions v3 (netcore 3.1)

        public override void Configure(IFunctionsHostBuilder builder)
        {
            builder.Services
                .AddHttpClient<HttpClient>("HttpClient", client => client.Timeout = TimeSpan.FromSeconds(120)) // default overall request request timeout (includes all polly retries)
                .AddPolicyHandler(GetRetryWithTimeoutPolicy());
        }

        static IAsyncPolicy<HttpResponseMessage> GetRetryWithTimeoutPolicy()
        {
            var retryPolicy = HttpPolicyExtensions
                .HandleTransientHttpError()
                .Or<TimeoutRejectedException>()
                .WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)))
                .WrapAsync(Policy.TimeoutAsync(30)); // per request timeout

            return retryPolicy;
        }

Some of the calls to an external API fail (usually time out after the 30 sec.). They are correctly retried by Polly. But AppInsights does not log another dependency call:
image
You can actually see it in the gap between the failed call and the next one. Based on the trace logs I can see that the call is executed again and succeeds.

I would expect that the retry call shows up as another dependency.

The SDK version on the dependency telemetry is recorded as rdddsc:2.14.0-17971

@kAleksei
Copy link

Any updates?
Maybe someone found a temporary solution for this issue?

@karun-verghese
Copy link

Dear all,

It's been a long time since we've had any update on this issue? Has any investigation been done on this? We even provided a minimal repo (as asked) for you guys to investigate.

Looking forward to hearing from you guys. Thanks.

@sekiyak
Copy link

sekiyak commented May 3, 2021

We experience the original issue described with the following versions:
.Net Core SDK (5.0.202)
Microsoft.ApplicationInsights.AspNetCore (2.17.0)
Microsoft.Extensions.Http.Polly (3.1.7)

@Hollister
Copy link

We experience the same behaviour:
Microsoft.ApplicationInsights.AspNetCore (2.17.0)
Microsoft.Extensions.Http.Polly (5.0.1)

It is disappointing to read through this thread, see the complete reproducible example and full detail provided by @karun-verghese and then the complete silence afterwards.

@SoucianceEqdamRashti
Copy link

We experience the same behavior with Polly version 3.1.13 and Azure functions 3.0.13.
Its frustrating that there is exist no workaround for this.

@cijothomas
Copy link
Contributor

#2556 same/duplicate issue.

@devlie
Copy link
Member

devlie commented Oct 5, 2022

The retry calls will likely not be captured if the retries are transparent to the caller. ApplicationInsights collects httpcalls by subscribing to the DiagnosticSource events emitted by a DiagnosticHandler in the HttpClient. If the retry is happening without going through the full pipeline of handlers (esp DiagnosticHandler), then the retries will not be captured by ApplicationInsights.

Don't think I can suggest a workaround at this stage, as the DiagnosticHandler is part of the HttpClient itself, so you may/may not have the ability to make sure it is called when using the Polly package to do the retry.

@cijothomas Should this be be a runtime bug then? Perhaps they can reorder the DiagnosticHandler? This issue combined with #2556 makes DependencyTelemetry pretty useless when there's retry involved.

@cijothomas
Copy link
Contributor

@devlie No. I do not see an issue with Runtime. #2556 shows it is an application insights sdk behavior.

(Separate Note: OpenTelemetry is fixing it.)

@diyan-penkov-sciant
Copy link

diyan-penkov-sciant commented Dec 19, 2022

We experience the same behavior:
Microsoft.ApplicationInsights (2.21.0)
Microsoft.Extensions.Http.Polly (6.0.11)
Microsoft.NET.Sdk.Functions (4.1.1)
Microsoft.NETCore.App (6.0.11)

Any new on resolving it?

@andrasg
Copy link

andrasg commented Apr 17, 2023

Is there anyone who has found a workaround on this? Wrapping retries somehow and manually generating dependency telemetry?

@Krusty93
Copy link

Is there anyone who has found a workaround on this? Wrapping retries somehow and manually generating dependency telemetry?

We do it using DelegatingHandlers (Add sorting is important):

services.AddTransient<ApplicationInsightsPayloadTracerHandler>();

services
  .AddHttpClient(..)
  .AddPolicyHandler(..)
  .AddHttpMessageHandler<ApplicationInsightsPayloadTracerHandler>();

The custom DelegatingHandler to log requests and responses.

    public class ApplicationInsightsPayloadTracerHandler : DelegatingHandler
    {
        private readonly IServiceProvider _serviceProvider;
        private readonly ILogger<ApplicationInsightsPayloadTracerHandler> _logger;

        public ApplicationInsightsPayloadTracerHandler(...) { ... }

        protected override async Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request,
            CancellationToken cancellationToken)
        {
            HttpResponseMessage response = null;
            var requestTimestamp = DateTime.UtcNow;

            try
            {
                response = await base.SendAsync(request, cancellationToken);
            }
            finally
            {
                await TraceDependencyAsync(request, response, requestTimestamp);
            }

            return response;
        }

        private async Task TraceDependencyAsync(
            HttpRequestMessage request,
            HttpResponseMessage response,
            DateTimeOffset traceTimestamp)
        {
            string traceCorrelationId = ActivityTraceHelper.BuildTraceCorrelationId();

            await TraceRequestAsync(request, traceCorrelationId, traceTimestamp);
            await TraceResponseAsync(request, response, traceCorrelationId);
        }

        private async Task TraceRequestAsync(
            HttpRequestMessage request,
            string traceCorrelationId,
            DateTimeOffset traceTimestamp)
        {
            try
            {
                using var requestScope = _serviceProvider.CreateScope();
                var telemetryClient = requestScope.ServiceProvider.GetRequiredService<TelemetryClient>();

                ...

                var requestTelemetry = new TraceTelemetry
                {
                    SeverityLevel = SeverityLevel.Verbose,
                    Timestamp = traceTimestamp,
                };

                requestTelemetry.Properties.Add("TraceCorrelationId", traceCorrelationId);
                // log headers, status code, etc.
                telemetryClient.TrackTrace(requestTelemetry);
            }
            catch (Exception ex)  {  }
        }

        private async Task TraceResponseAsync(...)
        {
           // very likely as above
        }
    }

Note: To get request headers, use request.Headers.ToString().Split(Environment.NewLine) because ToString` method is internally overridden with a special implementation.

With this setup if the first attemp fails, Polly triggers the retry that will end up in this handler again, so the request will be logged twice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests