Intermittent ApplicationInsights:Sender [ [Error: UNABLE_TO_VERIFY_LEAF_SIGNATURE] ] #180
Comments
@annanhan thanks for reporting. So Application Insights fails to communicate with the server. Does it repro on all computers or only on your dev box? Do you think this is related to this issue: #177? Can you please try to open https://dc.services.visualstudio.com/api/ping from the same computer and see whether the cert is trusted? Also please check explicitly whether this page can be opened: https://baltimore-cybertrust-root.digicert.com/
W.r.t. memory leak - if SDK failed to communicate to the backend it should start dropping items from the queue. @KamilSzostak to confirm this behavior. |
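The browser check suggested above can also be scripted, which makes it easier to repeat from the affected machine. The following is only a minimal sketch: `checkCertificate` and `isChainError` are hypothetical helper names, not part of the SDK. It uses Node's built-in `tls` module with `rejectUnauthorized: false` so that the handshake completes and the verification result can be inspected instead of thrown.

```javascript
const tls = require("tls");

// OpenSSL verification codes that Node surfaces when it cannot build a
// trusted chain from the server certificate to a known root.
const CHAIN_ERROR_CODES = new Set([
  "UNABLE_TO_VERIFY_LEAF_SIGNATURE", // message: "unable to verify the first certificate"
  "UNABLE_TO_GET_ISSUER_CERT",
  "UNABLE_TO_GET_ISSUER_CERT_LOCALLY",
]);

function isChainError(code) {
  return CHAIN_ERROR_CODES.has(code);
}

// Hypothetical helper: connects to host:443 and reports whether Node trusts
// the certificate chain the server presented on this particular connection.
function checkCertificate(host, callback) {
  let finished = false;
  const finish = (err, result) => {
    if (finished) return; // report only once, even if an error follows
    finished = true;
    callback(err, result);
  };

  const socket = tls.connect(
    { host, port: 443, servername: host, rejectUnauthorized: false },
    () => {
      const authorized = socket.authorized;
      const code = authorized ? null : String(socket.authorizationError);
      const issuer = socket.getPeerCertificate().issuer;
      socket.end();
      finish(null, { authorized, code, chainError: isChainError(code), issuer });
    }
  );
  socket.once("error", (err) => finish(err));
}

// Example (performs a real network call, so it is left commented out):
// checkCertificate("dc.services.visualstudio.com", (err, result) => console.log(err || result));
```

Because the error is intermittent, a single successful check proves little; calling the helper repeatedly and logging each result with a timestamp is more likely to catch a failing connection.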
I can't reproduce on my box. We are running v0.17.2 on another app that is running in Heroku and that seems to be running fine, but I think the difference is that that isn't using an SSL endpoint. I am able to hit both of those pages from my dev box. |
We only have an SSL endpoint - http is not supported. In which environment does it repro? Does it repro constantly or intermittently? |
It seemed intermittent. Sorry, I should have been more clear. Although, I just noticed that App A is using node v6.9.1 and App B is using node v0.10.28. I'll try updating the node package and see if that helps. |
So it looks like the memory issue was resolved by upgrading Node to v6.9.1, but I'm still getting the cert errors. |
It's good to hear that the memory issues got resolved. Are those cert issues evenly distributed over time, or are they somehow grouped in time? I see that the cert is OK on the endpoint: https://sslanalyzer.comodoca.com/?url=https%3A%2F%2Fdc.services.visualstudio.com%2Fapi%2Fping I also found this thread with the recommendation to do:
|
Also getting this error in my environment hosted in Azure. Following is the error log
|
@cmdkoh would it be possible to try my recommendation above? I wonder if it's a problem of missing SSL intermediate certificate on Ubuntu image in Azure |
@SergeyKanzhelev Sure, will report back the finding... |
@SergeyKanzhelev i made the changes and it did the following
unfortunately the "unable to verify the first certificate" still showing up after I restarted the app... |
Does |
yes, response looks fine (200 OK) |
So can no events be sent at all, or do only some connections end up with this error? |
@cmdkoh I'm actually out of ideas and got to the last page of Bing search =). So if you have any - let us know. @KamilSzostak @OsvaldoRosado could you try an Ubuntu image on Azure for a repro? |
@SergeyKanzhelev It seems I am still able to post telemetry to Application Insights; perhaps it is other connections (other auto collections? but I did turn them off...) that end up with this error. What I have to do is add logging prior to posting telemetry and then cross-check any SSL errors against the telemetry being posted in Azure. |
@cmdkoh I didn't fully understand your solution. We do not have any other endpoints our node.js SDK will talk to. Perhaps it's something node is doing by itself or some other module? (like this issue where somehow extra call was injected to node.js running as Azure Web Site: #144) Can you please explain or even copy/paste the code snippet on what you are logging prior to posting telemetry? |
Still seeing this. Production App Service deployment having intermittent errors... |
@SergeyKanzhelev I've also been running into this issue in my local; the telemetry seems to be showing up in Azure properly, but I see intermittent warnings about ssl cert failures. I tried the recommendation from the SO question you linked above, but that didn't help. However, digging into That section mentions to try adding the intermediate certs on the client to compensate for the server. I tried that by following the cert link you included above to this page and using the So I rolled all that back and added some logging to the location in applicationinsights that is throwing the error. I log the destination URL before the request is made, then log the options and the error message when the cert error is thrown. I get logs that look like the following:
I'm wondering if these requests are being load balanced to different servers and some of them are not configured correctly? It seems odd that these wouldn't fail consistently. I admit I'm not very knowledgeable about SSL certs, but maybe someone could look in the logs for dc.services.visualstudio.com/v2/track at the timestamps I have pasted above and see if there is any consistency in errors coming from specific VMs? I also ran into errors at I'm actually not even sure what these requests are, my local server isn't handling any traffic while I'm logging, but it still seems to ping dc.services.visualstudio.com/v2/track regularly 3 times per minute. If there's anything I can log on my side to help get to a solution let me know; I'd like to get to the bottom of this. |
Ok, I'm able to repro the issue locally with the small loop below - need to wait a while though. I enabled tracing, but it didn't give me much. I doubt that the assumption that the server is misconfigured is correct. We do not see this issue on other SDKs. I'll keep digging. We'll schedule the work to check the errors on every server (there are a lot =)) as well. @OsvaldoRosado @KamilSzostak do you have other ideas how to troubleshoot the issue?
var http = require('http');
var url = require('url');
var sleep = require('system-sleep');
var appInsights = require("applicationinsights");
// Everything got fixed with this setting, but note that it turns off
// TLS certificate verification for the whole process:
//process.env['NODE_TLS_REJECT_UNAUTHORIZED'] = '0';
appInsights.enableVerboseLogging();
var client = appInsights.getClient("key");
// Send every event in its own request instead of batching.
client.config.maxBatchSize = 1;
while (true) {
    console.log("test");
    client.trackEvent("test");
    sleep(300); // wait 300 ms between events
}
I enabled tracing by
Successful
Unsuccessful
|
Update: we found some SSL-related messages on servers. Investigating whether they are relevant to the issue. |
Great, thanks for the update!
|
@KamilSzostak can you please make sure the SDK retries on this error? We are still investigating the server-side issue. |
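To illustrate what retrying on this error could look like, here is a minimal sketch. It is not the SDK's implementation: `sendWithRetry`, its options, and the `send(cb)` shape are all assumptions made for this example. It retries only when the failure carries the certificate error code from this issue and gives up after a fixed number of attempts.

```javascript
// Error codes treated as transient for the purposes of this sketch.
const RETRYABLE_CODES = new Set(["UNABLE_TO_VERIFY_LEAF_SIGNATURE"]);

// Hypothetical helper, not part of the applicationinsights package.
// `send(cb)` performs one attempt and calls cb(err) when it finishes.
// `schedule` defaults to setTimeout and is injectable for testing.
function sendWithRetry(send, options, done) {
  const { maxAttempts = 3, delayMs = 1000, schedule = setTimeout } = options;
  let attempt = 0;

  function tryOnce() {
    attempt += 1;
    send((err) => {
      if (!err) return done(null, attempt);
      const retryable = RETRYABLE_CODES.has(err.code);
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!retryable || attempt >= maxAttempts) return done(err, attempt);
      schedule(tryOnce, delayMs);
    });
  }

  tryOnce();
}
```

The caller learns both the final error (or `null`) and how many attempts were needed, which is also useful data for judging how often the certificate failure occurs.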
Update: current thinking is that the issue may be related to this
We keep investigating the issue. |
We get this on Azure Web Apps also for a Node Server. |
Hack/workaround based on Sergey's investigation, that seems to be working for me:
Like I said, seems to have fixed things by trusting the |
I am getting the same error on a Linux VM (Ubuntu) on azure. |
@SergeyKanzhelev any update on this issue? It seems to come and go. Do you need more information to help with the investigation? |
@OsvaldoRosado said it doesn't repro any longer. Osvaldo? |
It wasn't occurring for some time, but it seems to have returned. Root cause for the missing intermediate certificate is still unclear. |
Closing this as it doesn't appear to occur any longer from my own testing and no new reports have been received in many months. If anyone is still seeing this please re-open! |
This is still happening, see logs below | ApplicationInsights:Sender [ { Error: unable to verify the first certificate |
@OsvaldoRosado
|
Thanks for the reports! I've raised this issue internally for investigation. This error indicates an issue connecting with our backend (in particular the SDK not being able to trust the backend's SSL certificate). For now the only action to do on your applications is to make sure the SDK's disk-backed retries are enabled (this is the default in v1.x). This will allow the SDK to retry sending this telemetry at a time when the backend is able to receive it. |
@OsvaldoRosado any update on this issue? |
My understanding is that an intermittent backend issue has been identified and a long-term fix is being planned. That being said, I'd love to collect some more data on this. If those who see this could mention whether they have any data loss (when disk-backed retries are enabled), how frequently they see the error, and whether there is any regularity as to when it occurs, that would be great! I'm also considering reducing the severity of this error message when retries are enabled, as it is purely informational in this case. Ideally, I think this should only start complaining at error severity if telemetry couldn't be sent some N consecutive times, indicating more than a transient and automatically recovered error. This change would remove this log message (unless verbose logging is enabled) for any cases where the telemetry did eventually make it to the backend. I'm open to thoughts on this as well! Re-opening the issue. |
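The "N consecutive failures" idea can be sketched as a small counter. This is an illustration only, not the SDK's code: `SendFailureTracker` and the severity names `"verbose"` and `"error"` are assumptions chosen for the example.

```javascript
// Hypothetical helper, not the SDK's implementation: picks a log severity
// for a failed send based on how many sends have failed in a row.
class SendFailureTracker {
  constructor(threshold) {
    this.threshold = threshold; // the "N" from the proposal
    this.consecutiveFailures = 0;
  }

  // A successful send means the earlier failures were transient.
  recordSuccess() {
    this.consecutiveFailures = 0;
  }

  // Returns "error" once N sends in a row have failed, otherwise "verbose".
  recordFailure() {
    this.consecutiveFailures += 1;
    return this.consecutiveFailures >= this.threshold ? "error" : "verbose";
  }
}
```

With a threshold of 3, two failed sends followed by a success never surface at error severity, while a third failure in a row does.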
Hello, we see this error on most of our azure app services and we had to setUseDiskRetryCaching to false because that was also throwing storage errors. :/ |
@jkrinsky Could you please open a new issue for the storage errors? Definitely sounds like a bug! Assuming they're issues with the ACL folder protection for Windows that was added in 1.0.2, you can forcefully disable it with:
But be aware that you're running a bit riskier with this disabled (which is why it's hard to disable), as the SDK can't ensure other user accounts on the box are restricted from reading the telemetry stored to disk for retry. |
@OsvaldoRosado We report events from Azure Functions - no retries there I assume since it's a rather short-lived process? |
@yoadsn I don't have any great solutions right now if you can't use retries. You can of course enable retries but this might cause your functions to run longer than you like (and depending on the details of the bug found by @jkrinsky you might also need the config overrides I posted above to get the retries to work). You can use the Flush API to manually send telemetry to get programmatically informed that telemetry sending failed. Eg.
But admittedly this seems rather obtuse. I do know the backend team is working to resolve the root cause - which would remove any need for these workarounds. |
Version 1.0.3 of this SDK has now been released. It includes some changes that might help with this problem.
|
Closing this for now due to a lack of reports after 1.0.3's adjustment to how this situation is handled. Please feel free to reopen if things still don't seem right! |
I am noticing the following issue in v1.0.7, Any suggestion on how to overcome this? ApplicationInsights:Sender [ 'Ingestion endpoint could not be reached. This batch of telemetry items has been lost. Use Disk Retry Caching to enable resending of failed telemetry. Error:', |
It's expected for the backend to have transient failures. As long as you don't see this all of the time, things are working properly. As the error message states, it's highly encouraged for you to enable disk retry caching to ensure these transient backend failures do not result in lost telemetry. You can do that by using
EDIT: As an additional note, when disk retry caching is on, transient errors like these will not be presented to you unless the SDK detects a prolonged problem reaching the backend. |
Thanks @OsvaldoRosado, but this issue seems to be occurring all the time, and after some amount of time it takes down the entire application. Not sure why this occurs; it is easily reproducible. |
@sharath-srinivasan If you remove the SDK from your application, does it still crash? Network failures should never be fatal. Does the crash come with a stack trace? As for the persistent SSL errors:
Can you also provide what version of Node you're using? |
I am seeing same error in the AppCenter build for Xamarin projects. `##[section]Starting: Analyze build logTask : Command Line
|
I've raised this issue internally; a fix is in progress for the endpoint server and should be rolled out soon. No exact ETA to report, but I'll post an update in a few weeks if nothing changes. |
While using applicationinsights@0.17.2, I noticed these errors in our logs. The node app is hosted on Heroku using an SSL endpoint addon. After doing some research, it looks like node isn't accepting the server's cert. The messages were getting to Application Insights as far as I could tell.
Additionally, it caused a huge memory leak. After I disabled the package, the memory profile was back to normal.
Here are the other packages we're using:
"engines": {
"node": "^0.10.28",
"npm": "^1.4.9"
},
"dependencies": {
"amqp.channel": "^0.0.8",
"applicationinsights": "^0.17.2",
"args-list": "^0.3.3",
"bluebird": "^2.1.2",
"cookie-parser": "^1.3.4",
"cool-ascii-faces": "^1.3.3",
"cors": "^2.3.1",
"ddi": "^1.1.0",
"express": "^4.8.0",
"express-winston": "^1.0.0",
"formdefutils": "^0.0.2",
"gift": "^0.6.0",
"github-webhook-handler": "^0.5.0",
"glob": "^4.3.2",
"le_node": "^1.0.12",
"lodash": "^3.10.1",
"mailgun-js": "^0.7.7",
"mapquest": "^0.2.0",
"moment": "^2.10.3",
"multiparty": "^4.1.2",
"node-xlsx": "^0.5.1",
"phone-formatter": "0.0.2",
"request": "^2.34.0",
"rimraf": "^2.2.8",
"romis": "^2.0.0",
"sprintf": "^0.1.4",
"urlencoded-request-parser": "^1.0.1",
"us": "^1.0.3",
"uuid": "^2.0.1",
"winston": "^1.0.0",
"xml2js": "~0.4.4",
"yamljs": "^0.2.1"
}