Skip to content
This repository has been archived by the owner on Jun 10, 2020. It is now read-only.

Request and dependency telemetry is tracked by each Webhost in the app #621

Closed
lmolkova opened this issue Mar 9, 2018 · 35 comments
Closed
Labels
Milestone

Comments

@lmolkova
Copy link
Member

lmolkova commented Mar 9, 2018

Multiple WebHosts could run in the same process.

This scenario is possible with any ASP.NET Core app and when using the Shared Process hosting model in ServiceFabric

In this case AppInsights services are added to each service provider, however, singletons that track requests and dependencies are static and process-wide - they are not aware of WebHost boundaries.

So the requests and dependencies are tracked by multiple DiagnostiSource subscribers - one per web host.

Repro Steps

Ignored functional test in AspNetCore2.0 app will be added shortly

            using (var server1 = new InProcessServer(assemblyName, this.output))
            using (var server2 = new InProcessServer(assemblyName, this.output))
            {
                this.ExecuteRequest(server1.BaseHost + requestPath);
                var telemetry1 = server1.Listener.ReceiveItems(TestListenerTimeoutInMs);

                this.ExecuteRequest(server2.BaseHost + requestPath);
                var telemetry2 = server2.Listener.ReceiveItems(TestListenerTimeoutInMs); 

                Assert.Single(telemetry1.Where(t => t is TelemetryItem<RequestData>));
                Assert.Single(telemetry1.Where(IsServiceDependencyCall));
                Assert.DoesNotContain(telemetry1, t => t is TelemetryItem<ExceptionData>);

                var request1 = telemetry1.Single(t => t is TelemetryItem<RequestData>);
                Assert.Equal("200", ((TelemetryItem<RequestData>)request1).data.baseData.responseCode);

                // Fails here, we track everything twice
                // it did not happen with the first host because second one has not been really started yet
                Assert.Single(telemetry2.Where(t => t is TelemetryItem<RequestData>));
                Assert.Single(telemetry2.Where(IsServiceDependencyCall));
                Assert.DoesNotContain(telemetry2, t => t is TelemetryItem<ExceptionData>);

                var request2 = telemetry2.Single(t => t is TelemetryItem<RequestData>);
                Assert.Equal("200", ((TelemetryItem<RequestData>)request2).data.baseData.responseCode);
            }

Actual Behavior

each webhost tracks requests and depedencies of all running web hosts

Expected Behavior

We should not re-create diagnostic listeners if there is an active one.

This blocks scenario with web hosts from having different configurations: imagine a different set of ikeys or telmetryinitializers/processors. I'm not sure it's feasible to support this scenario (even if feasible, it's much harder to implement), in this case, there could a listener per webhost, but the listener must only track its own webhost telemetry.

@lmolkova
Copy link
Member Author

lmolkova commented Mar 9, 2018

related to microsoft/ApplicationInsights-dotnet#613

@mkosieradzki
Copy link

@lmolkova Please let me know, if possible, when there is a preview nuget package available containing the fix - I will be happy to battle test it in a real Service Fabric environment.

@lmolkova
Copy link
Member Author

@mkosieradzki we have not decided on the version the fix for double-tracking will be included into.

Once we have fix for NullReferenceException (#622). can we ask you try the preview from our myget feed?

@mkosieradzki
Copy link

@lmolkova I would be happy to try it out.

@lmolkova
Copy link
Member Author

@mkosieradzki cool! I'll let you know once we have it

@cijothomas
Copy link
Contributor

Plan is to have 2.3.0-beta1 (with fix for NRE) released on April 1st week. Let me know if there is a need to get a private version even earlier.

@mkosieradzki
Copy link

Any news on this one?

@cijothomas
Copy link
Contributor

@mkosieradzki Official release of 2.3.0-beta1 will be in 3-4 days. Thank you for your patience!

@cijothomas
Copy link
Contributor

We have published beta1 in myget. It should be in nuget.org Monday. I'll update when i push to nuget.

@mkosieradzki
Copy link

@cijothomas @lmolkova Thanks a lot. I have uploaded it to my test cluster. It seems to be working, but I will have conclusive information in couple of days.

@mkosieradzki
Copy link

@cijothomas @lmolkova This specific bug seems to be fixed. Now I am experiencing different issues with duplicate telemetry.

I am trying to diagnose the exact reason, but long story short I am observing that if I open 4 listeners in my Service Fabric service all requests are reported 4 times.

Each listener has its own WebHostBuilder and UseApplicationInsights is called on each WebHost.

I will try to isolate the issue,

@mkosieradzki
Copy link

@cijothomas @lmolkova

Ok. Got a repro. Below is the telemetry generated by a single request with five listeners registered.

Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Message","time":"2018-04-17T21:46:19.0055361Z","tags":{"ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"DESKTOP-4FMNMG2","ai.user.id":"zjl1q","ai.operation.id":"85f341ab-49b319826c826d2b","ai.operation.parentId":"|85f341ab-49b319826c826d2b.","ai.operation.name":"GET /","ai.location.ip":"::1","ai.internal.sdkVersion":"aspnet5f:2.3.0-beta1","ai.internal.nodeName":"DESKTOP-4FMNMG2"},"data":{"baseType":"MessageData","baseData":{"ver":2,"message":"Request starting HTTP/1.1 GET http://localhost:8300/","severityLevel":"Information","properties":{"CategoryName":"Microsoft.AspNetCore.Hosting.Internal.WebHost","Method":"GET","Scheme":"http","Host":"localhost:8300","Protocol":"HTTP/1.1","Path":"/","AspNetCoreEnvironment":"Production"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Message","time":"2018-04-17T21:46:19.0075351Z","tags":{"ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"DESKTOP-4FMNMG2","ai.user.id":"zjl1q","ai.operation.id":"85f341ab-49b319826c826d2b","ai.operation.parentId":"|85f341ab-49b319826c826d2b.","ai.operation.name":"GET /","ai.location.ip":"::1","ai.internal.sdkVersion":"aspnet5f:2.3.0-beta1","ai.internal.nodeName":"DESKTOP-4FMNMG2"},"data":{"baseType":"MessageData","baseData":{"ver":2,"message":"Request finished in 1.8146ms 200","severityLevel":"Information","properties":{"ElapsedMilliseconds":"1.8146","CategoryName":"Microsoft.AspNetCore.Hosting.Internal.WebHost","StatusCode":"200","AspNetCoreEnvironment":"Production"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Request","time":"2018-04-17T21:46:19.0055361Z","tags":{"ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"DESKTOP-4FMNMG2","ai.user.id":"zjl1q","ai.operation.id":"85f341ab-49b319826c826d2b","ai.operation.name":"GET /","ai.location.ip":"::1","ai.internal.sdkVersion":"aspnet5f:2.3.0-beta1","ai.internal.nodeName":"DESKTOP-4FMNMG2"},"data":{"baseType":"RequestData","baseData":{"ver":2,"id":"|85f341ab-49b319826c826d2b.","name":"GET /","duration":"00:00:00.0035846","success":true,"responseCode":"200","url":"http://localhost:8300/","properties":{"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.0')","httpMethod":"GET","AspNetCoreEnvironment":"Production"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Request","time":"2018-04-17T21:46:19.0055361Z","tags":{"ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"DESKTOP-4FMNMG2","ai.user.id":"zjl1q","ai.operation.id":"85f341ab-49b319826c826d2b","ai.operation.name":"GET /","ai.location.ip":"::1","ai.internal.sdkVersion":"aspnet5f:2.3.0-beta1","ai.internal.nodeName":"DESKTOP-4FMNMG2"},"data":{"baseType":"RequestData","baseData":{"ver":2,"id":"|85f341ab-49b319826c826d2b.","name":"GET /","duration":"00:00:00.0055857","success":true,"responseCode":"200","url":"http://localhost:8300/","properties":{"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0')","httpMethod":"GET","AspNetCoreEnvironment":"Production"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Request","time":"2018-04-17T21:46:19.0055361Z","tags":{"ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"DESKTOP-4FMNMG2","ai.user.id":"zjl1q","ai.operation.id":"85f341ab-49b319826c826d2b","ai.operation.name":"GET /","ai.location.ip":"::1","ai.internal.sdkVersion":"aspnet5f:2.3.0-beta1","ai.internal.nodeName":"DESKTOP-4FMNMG2"},"data":{"baseType":"RequestData","baseData":{"ver":2,"id":"|85f341ab-49b319826c826d2b.","name":"GET /","duration":"00:00:00.0090520","success":true,"responseCode":"200","url":"http://localhost:8300/","properties":{"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0')","httpMethod":"GET","AspNetCoreEnvironment":"Production"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Request","time":"2018-04-17T21:46:19.0055361Z","tags":{"ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"DESKTOP-4FMNMG2","ai.user.id":"zjl1q","ai.operation.id":"85f341ab-49b319826c826d2b","ai.operation.name":"GET /","ai.location.ip":"::1","ai.internal.sdkVersion":"aspnet5f:2.3.0-beta1","ai.internal.nodeName":"DESKTOP-4FMNMG2"},"data":{"baseType":"RequestData","baseData":{"ver":2,"id":"|85f341ab-49b319826c826d2b.","name":"GET /","duration":"00:00:00.0108100","success":true,"responseCode":"200","url":"http://localhost:8300/","properties":{"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0')","httpMethod":"GET","AspNetCoreEnvironment":"Production"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Request","time":"2018-04-17T21:46:19.0055361Z","tags":{"ai.application.ver":"1.0.0.0","ai.cloud.roleInstance":"DESKTOP-4FMNMG2","ai.user.id":"zjl1q","ai.operation.id":"85f341ab-49b319826c826d2b","ai.operation.name":"GET /","ai.location.ip":"::1","ai.internal.sdkVersion":"aspnet5f:2.3.0-beta1","ai.internal.nodeName":"DESKTOP-4FMNMG2"},"data":{"baseType":"RequestData","baseData":{"ver":2,"id":"|85f341ab-49b319826c826d2b.","name":"GET /","duration":"00:00:00.0129843","success":true,"responseCode":"200","url":"http://localhost:8300/","properties":{"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0'); (Name:'Requests', Ver:'1.0')","httpMethod":"GET","AspNetCoreEnvironment":"Production"}}}}

I have also attached a project reproducing this issue.

BTW. If you take a closer looks properties are duplicated inside duplicate telemetries as well ;)

DuplicateTelemetryApplication.zip

@lmolkova
Copy link
Member Author

@mkosieradzki we are aware of it, this #621 is exactly about it.
we do not have a fix or workaround as we do not really support multiple web hosts. our goal with #622 was to not crash a process at least.

@mkosieradzki
Copy link

@lmolkova You are right. Sorry for a duplicate report. Not crashing the process is really a lot ;), so thanks for fixing this so far.

I am also facing duplicate telemetries when using ApplicationInsights serilog sink and .NET Core (Does not happen on .NET Framework). Each duplicate telemetry has a different SDK version, but I believe it's an issue with specific serilog sink, however I will investigate this one as well.

@cijothomas
Copy link
Contributor

Closing this as wontfix as we have no immediate plan to fix this. If the pattern of multiple webhosts in a single process is being used by lot of customers, we will revisit and reprioritize.

@SergeyKanzhelev
Copy link
Contributor

@cijothomas this is quite common for Service Fabric deployments. Consider to re-open and investigate possible solutions

@cijothomas
Copy link
Contributor

I see. @lmolkova I think we had chatted about this where you mentioned this model is not so common/not encouraged in SF? Or was it someone else ...

@mkosieradzki
Copy link

@cijothomas

Some explanation:

  • Shared Process hosting model is currently the only way to enable scalability from for example 5 actor hosts to 200, because without this model hosts responsible in a scale-down state you would have 40 processes on five nodes each instead of 1 process per node hosting multiple partitions.
  • It is a common practice to have multiple Listeners in one service. For example one listener exposed for internal communication only that is not exposed to the public internet and another listener for public access. Each listener = separate WebHost.

So it's actually not only Shared Process, but Shared Process hosting model is really critical to Service Fabric scalability.

Feel free to talk to the Service Fabric team, and they will explain the details.

@lmolkova
Copy link
Member Author

@mkosieradzki we discussed it with SF team. They changed VS templates to default to Exclusive process and they expect most customers to NOT use shared mode.

@mikkelhegn could you please help us re-evaluate importance of this scenario?

@lmolkova
Copy link
Member Author

/cc @brahmnes

@brahmnes
Copy link

@dkkapur

@dkkapur
Copy link

dkkapur commented May 25, 2018

@srrengar

@mkosieradzki
Copy link

Is this dead? Am I supposed to properly rewrite this SDK, but in a way it actually works, or what?

@esbenbach
Copy link

@lmolkova That the SF team change the template to Exclusive process is important how? It might be that this is the most common scenario, but as @mkosieradzki points out, there are legit use cases for the shared process model - and doubly so for multiple listeners inside a single process.

We currently have this issue with multiple listeners (more than two actually), and we were considering having a webhost per tenant for one of our services - but so far we are holding back on it due to the insights concern.

@lmolkova
Copy link
Member Author

@esbenbach @mkosieradzki
I get your point and I agree there are legit scenarios here. We do have some limitations that make support for this scenario limited.

Could you help us with following: assuming multiple hosts run in the same process: would they have the same configuration? Would you need to have different instrumentation keys for them or just any other piece of configuration: TelemetryInitializers, TelemetryProcessors, etc?

@lmolkova lmolkova reopened this Jun 14, 2018
@mkosieradzki
Copy link

mkosieradzki commented Jun 14, 2018

Thanks for reopening the issue. Single telemetry configuration is more than enough. In a perfect solution, telemetry configuration can be set-up outside of the WebHost - because WebHost has a shorter lifetime than Service Fabric host process.

I would be happy to integrate this solution with Service Fabric SDK...

Some scenarios:

  1. Multiple Web Listeners - rather single config (one may want to disable incoming requests telemetry on the specific listener) - but it is exactly the same situation
  2. Actor model scalability - single config
  3. High density multi-tenancy - one might want a dedicated telemetry for every tenant separately, but it is reasonable to say to use exclusive process hosting model then.

However there are scenarios where there are no active listeners and you want to have telemetry enabled… like for example:

  • Disaster Recovery (OnDataLoss)
  • Maintenance tasks
  • Service Initialization

@esbenbach
Copy link

In our particular case a single cfg instance is enough. There is probably a case to be made for separate cfgs but im unable to come up with a sane example where it is useful

@mkosieradzki
Copy link

Bump! It has been three months...

@lmolkova
Copy link
Member Author

Heads up! I'm working on the fix.

@skwasiborski
Copy link

Just wanted to add that this is also problem in my company. We are using shared process hosting extensively to ensure future scalability and this issue is problematic for us. In my case similarly as stated above single configuration is ok.

@pksorensen
Copy link

pksorensen commented Feb 14, 2019

I am thinking this is related:

   at Microsoft.ApplicationInsights.DependencyCollector.Implementation.ApplicationInsightsUrlFilter.get_EndpointLeftPart()
   at Microsoft.ApplicationInsights.DependencyCollector.Implementation.ApplicationInsightsUrlFilter.IsApplicationInsightsUrlImpl(String url)
   at Microsoft.ApplicationInsights.DependencyCollector.Implementation.HttpCoreDiagnosticSourceListener.HttpCoreDiagnosticSourceSubscriber.<OnNext>b__6_0(String evnt, Object r, Object _)
   at System.Diagnostics.DiagnosticListener.IsEnabled(String name, Object arg1, Object arg2)
   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Http.HttpClientWrapper.GetResponseAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Http.AdalHttpClient.GetResponseAsync[T](Boolean respondToDeviceAuthChallenge)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Http.AdalHttpClient.GetResponseAsync[T]()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.SendHttpMessageAsync(IRequestParameters requestParameters)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.SendTokenRequestAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.CheckAndAcquireTokenUsingBrokerAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.RunAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext.AcquireTokenForClientCommonAsync(String resource, ClientKey clientKey)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext.AcquireTokenAsync(String resource, ClientCredential clientCredential)

Null Exception.

image

@mkosieradzki
Copy link

@lmolkova Thank you very much for fixing this. It works really nice now. I can use Application Insights again!

@lmolkova
Copy link
Member Author

lmolkova commented Mar 5, 2019

@pksorensen I believe this is microsoft/ApplicationInsights-dotnet#613. Please update your AppInsights and check if the issue goes away.

@pksorensen
Copy link

Yes @lmolkova - looks like updating solved it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

9 participants