Unexpected Telemetry Logging Disruption in .NET 6 Projects Using OpenTelemetry Within an IIS Context #4936
Comments
What is the bug in OpenTelemetry? As mentioned here (https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#common-issues-that-lead-to-missing-metrics), the MeterProvider must be kept active for metrics to work. That page also links to examples showing how to achieve this in ASP.NET Core style apps as well as plain console apps.
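For reference, the pattern the linked doc describes looks roughly like the following. This is a minimal sketch rather than the reporter's code; the meter name and the default OTLP endpoint are placeholder assumptions.

```csharp
using System.Diagnostics.Metrics;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Minimal console-style sketch: the MeterProvider must stay referenced (and
// undisposed) for the lifetime of the process, otherwise no metrics are exported.
var meter = new Meter("MyCompany.MyApp");           // placeholder meter name
var requestCounter = meter.CreateCounter<long>("requests");

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyApp")
    .AddOtlpExporter()                               // defaults to gRPC on http://localhost:4317
    .Build();

requestCounter.Add(1);
// The "using" declaration keeps the provider alive until the end of Main,
// where Dispose flushes any buffered metrics before the process exits.
```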
The issue, fundamentally, is that metrics are collected intermittently. We have environments where everything works properly and environments where the problem occurs. The problem occurs in applications hosted on AWS DEMO, written with .NET 6.0, and running under IIS.
That seems like an environment issue rather than an issue with OpenTelemetry. It also sounds like you are already fixing your environment to ensure the MeterProvider is not disposed earlier than desired, so I don't see anything actionable for OpenTelemetry to fix. Let me know if I've misunderstood the issue.
Updates
Hello Team and @cijothomas, I've refactored the configuration class as follows and it seems to be a fairly robust implementation:
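The refactored class itself did not survive in this thread, so here is only an illustrative sketch of the general shape such a refactoring typically takes (register the MeterProvider as a long-lived singleton). The class name, extension method name, meter name, and endpoint are all hypothetical, not the reporter's actual code.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Illustrative sketch only -- MetricsConfiguration and AddAppMetrics are hypothetical names.
public static class MetricsConfiguration
{
    public static IServiceCollection AddAppMetrics(this IServiceCollection services)
    {
        var meterProvider = Sdk.CreateMeterProviderBuilder()
            .AddMeter("MyCompany.MyApp")                                    // placeholder meter name
            .AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317"))
            .Build();

        // Holding the provider in the container keeps it rooted for the app's
        // lifetime and disposes it only when the host shuts down.
        services.AddSingleton(meterProvider);
        return services;
    }
}
```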
🚀 Upon deployment to the STG environment on AWS, it functioned well for about 3 hours, but then started throwing the following error in bursts:
Analysis: Even though the collector and the application reside on the same machine (making a network error quite unlikely), this error still appears, which leads me to believe it might be masking another issue underneath.
Additional query: Could this error originate from the collector side? If so, what specifically should we focus on investigating? I'd appreciate it if someone could dig a bit deeper into this. The part where the IOException occurs ("The response ended prematurely while waiting for the next frame from the server") needs special attention. Looking forward to thoughts or suggestions on the implementation and the error. Thank you 🙏.
🚨 Important Update 🚨
Dear Contributors, @cijothomas, thank you for your continued attention and support on this issue. I'd like to share a crucial update on our progress and the current challenges.
Issue: We've identified that the test environment hasn't been logging for over an hour, and a specific type of error keeps appearing in the trace.
Actions taken: We restarted the connection pools on two web servers (in sticky session) sequentially. After restarting the first pool and performing an action to generate a metric (which was not sent to the collector), we restarted the second pool, performed another operation, and then both metrics were sent successfully.
Detailed analysis: A review of the active communication processes with the collector showed that, before the pool restarts, there were no processes attributable to our application; after recycling the pools, the processes are present.
Insights: Metrics are being captured correctly, but there appear to be communication issues with the gRPC service. Interestingly, this only occurs when the application targets .NET 6.0, runs in-process on IIS, and is hosted on AWS.
Seeking suggestions: We're reaching out to the community for insights or suggestions that might help resolve this issue. Your expertise is invaluable, and any input would be greatly appreciated. Thank you for your patience and collaboration!
Sorry, it's not clear what the ask from this repo is. The error below would explain why telemetry is missing; it could indicate the collector is not available, or it could be a transient error. I don't have much expertise in troubleshooting gRPC connection errors. "2023-10-10T07:38:24.9429543Z:Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{http://localhost:5110/}{Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: The request was aborted. IOException: The response ended prematurely while waiting for the next frame from the server.", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.")"
@cijothomas I believe the issue lies within the code, because the collector is gathering data correctly from other applications (which, however, are using .NET Framework 4.8). A networking problem seems extremely unlikely, since the collector and the application reside on the same server. I have attempted to substitute 'localhost' with '127.0.0.1' without any results. Your help in shedding light on this perplexing issue would be profoundly appreciated. Warm regards,
I do not have enough experience to root-cause gRPC exceptions further. One thing to note: on .NET Framework versus .NET 6.0, the exporter uses a different gRPC client library, so the issue (if any) could be in the underlying library.
There is a proposal in #4860 to remove the older gRPC library. Is switching protocols to http/protobuf an option in this case?
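For context, switching the OTLP exporter to http/protobuf is a one-line change on the exporter options. This is a sketch, not the reporter's configuration; it assumes the collector exposes an OTLP/HTTP receiver (conventionally port 4318), and the meter name is a placeholder.

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;

// Sketch of switching the exporter from gRPC to http/protobuf.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyApp")                                // placeholder meter name
    .AddOtlpExporter(options =>
    {
        options.Endpoint = new Uri("http://localhost:4318");   // OTLP/HTTP endpoint (assumed)
        options.Protocol = OtlpExportProtocol.HttpProtobuf;    // instead of the default Grpc
    })
    .Build();
```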
Hi @matt-hensley,
OpenTelemetry Logging Error: Task Cancellation Exception with Grpc.Core
Description: I've been consistently encountering an error when trying to write logs with OpenTelemetry.
This error happens at an unspecified frequency and seems to be related to the gRPC channel state-change mechanism. It may be causing a delay in, or an inability to, log data, which is crucial for our operations. Any assistance or insights on resolving this issue would be greatly appreciated.
From the description, the issue is in .NET 6.0 only, which would be using the new gRPC client!
Found the issue but NOT the solution at all:
Expanding x-dynatrace Header Causing Request Rejection by OpenTelemetry Collector in .NET 6.0 Web Application
I am grappling with an intriguing issue related to the x-dynatrace header while working with an IIS-hosted web application and a custom OpenTelemetry Collector service, particularly during communication over the gRPC protocol. The expanding header eventually makes it impossible to send metrics to the OT Collector, which starts rejecting requests once the header size exceeds a particular threshold (~20 KB). Environment Details:
Issue In-depth:
Seeking Solutions
Your insights, suggestions, or shared experiences with similar scenarios would be greatly valued. Thank you in advance for your expertise and assistance!
I'd recommend asking this in Dynatrace agent support.
This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.
Closed as inactive. Feel free to reopen if this issue is still a concern.
Bug Report
Symptom
Describe the bug
After deployment, telemetry logging in our .NET 6 projects using OpenTelemetry only functions for a limited time. The logging ceases without any error or warning, and a memory dump reveals that the OpenTelemetry instances vanish until a connection pool refresh.
Expected behavior
Continuous and uninterrupted telemetry logging with OpenTelemetry instances remaining persistent without requiring a pool refresh.
Runtime environment:
Additional context
Potential issue: the Garbage Collector may be prematurely deallocating the MeterProvider instance, interrupting telemetry logging. Deployed applications using .NET 6 with slightly varied OpenTelemetry initialization code are affected.
Notably, within our suite of applications, the two experiencing this issue are the only ones hosted on IIS, a possible correlation that may warrant further exploration.
Reproduce
Steps to reproduce the behavior:
Additional Steps Taken:
I added GC.KeepAlive, replacing the previous logic:
services.AddSingleton(meterProvider);
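The actual GC.KeepAlive snippet was not included in the issue, so this is only a hedged sketch of how that pattern is typically applied in a minimal-hosting Program.cs (ASP.NET Core Web SDK with implicit usings assumed; meter name is a placeholder):

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyApp")   // placeholder meter name
    .AddOtlpExporter()
    .Build();

app.Run();

// GC.KeepAlive marks meterProvider as reachable up to this point, so it cannot
// be collected while the web host above is still running.
GC.KeepAlive(meterProvider);
meterProvider.Dispose();           // flush remaining metrics on shutdown
```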
The change seemed to work, but in the long run the behavior oscillated, so the issue is not resolved.
Configuration class code
Here is the code of the configuration class before applying the changes described above:
The class is referenced in Program.cs, written using the Minimal Hosting / Minimal APIs approach:
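The original Program.cs was not reproduced in this thread either; the wiring would typically look something like the following, reusing the hypothetical AddAppMetrics extension sketched earlier (names and the /health endpoint are illustrative, not the reporter's code):

```csharp
// Illustrative minimal-hosting wiring only -- AddAppMetrics is the hypothetical
// extension from the earlier sketch, not an actual API from this project.
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddAppMetrics();   // registers the MeterProvider as a singleton

var app = builder.Build();
app.MapGet("/health", () => Results.Ok());
app.Run();
```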
Log Results:
With logging enabled, the following error comes out, but I'm fairly sure we don't have a network issue:
Clear Indication Towards an IIS Context Issue
After working through the data and memory dumps, here is a deductive path that leads toward the conclusion reached in point 3.
Persistent Memory Instances:
After manually invoking the garbage collector during the memory dump, both instances (the metrics-producing application's and the collector's) remain visible in memory. This consistency between the active instances and their memory representation suggests the issue is tied to the instances themselves, rather than to an irregularity in memory allocation or deallocation.
Network Error Elimination:
Because the metrics producer and the collector run on the same machine, network errors can effectively be ruled out. This suggests that the surface-level error we are seeing is a symptom of an underlying, concealed issue. We also tried replacing localhost:2222 with 127.0.0.1:2222 to force IPv4, without positive results.
The IIS Context Culprit:
When running as a .NET 6 console application, the metrics are dispatched seamlessly, without discrepancies. Conversely, running the application in the IIS context (in-process) brings the issue back. The correlation between the problem and the IIS context is clear.
Conclusion
Considering these aspects, together with the fact that the problem is observed only in applications hosted on AWS using .NET 6.0 (as opposed to those on Azure using .NET Framework 4.8, which run smoothly), it's evident that the issue is tied to the IIS context in which the application runs, and potentially to the interaction with this specific environment and framework version.
Your expertise and insights into how applications behave differently when executed within an IIS context, or any anomalies you've seen previously, especially in an AWS environment on .NET 6.0, would be immensely valuable. The situation is critical for us, so a swift response would be greatly appreciated.
Let's work through this together. Observations on peculiarities or required settings, at either the environment or application level, for this specific hosting scenario would help us reach a solution.
Thank you in advance for your cooperation and looking forward to fruitful discussions ahead.