Logs of "context deadline exceeded", broken pipes, and eventual process death #506
Comments
Upgraded to NextDNS 1.32.3. Issues persist.
Please provide a https://nextdns.io/diag
I tried running it 4 times, but it keeps hanging after the 3rd hop in "Traceroute for ultra low latency primary IPv6". I can run some manual traceroutes to more hosts if it helps. Here's the output I got:
The routing issue should be fixed. Please try a diag again and tell me if the timeouts are still happening.
I'm getting the same behavior where it hangs after the 3rd hop:
What do you get for
{"locationName": "🇺🇸 Los Angeles, United States", "pop": "zepto-lax", "rtt": 22675}
For kicks, I downgraded to NextDNS 1.9.4, as I had been running that for months without issue. It's been about 5 hours and I'm seeing only a fraction of the errors I had before (though I am still seeing them), and the memory has grown to only 42M resident so far. I would have had to restart 1.32.3 multiple times by now. Gonna see what happens overnight (I'm UTC-7).
Can you also try the latest version without the cache enabled?
The 1.9.4 run finally ended (about 15 minutes ago) the same way (oom-killer), but it lasted much longer. Hopefully that's useful data. I'll try running latest without cache now.
I'm about 2 hours into running 1.32.3 configured with
3 more hours and still not a single error message. Resident memory has only grown by ~4.5 MB.
Would you mind running it again with a 10M cache to see if you can reproduce the issue, then trying again with 1M?
Been running 1.32.3 with a 10M cache for about 90 minutes and so far, zero errors and restrained memory growth. Did something change on the backend? Gonna let it run overnight. Do you still want me to test a 1M cache as well?
Ran all night without any errors. Memory is still slowly growing (at 28M resident now), but I'm not sure whether that's on par with previous behavior and nothing more than the cache filling up. Let me know if you still want me to run things with a 1M cache.
Errors are still gone; however, memory continues to grow. Up to 42M resident now.
I've been running into something similar as well, mostly when my internet connection fails over to backup. My ISP disconnects the PPPoE session every 24 hours, so there is usually a failover to backup every 24 hours. When this happens, I see a lot of
Same just happened to me... loads of "context deadline exceeded" messages before these...
I ended up restarting the NextDNS service which restored DNS.
I too have this symptom.
This will be fixed by d0570f3
Thanks @rs, I was having similar issues with the nextdns CLI on OpenWrt. I have no way of confirming whether it's exactly the same issue, but the conditions seem similar: PPPoE tripping every few minutes and nextdns crashing. I have disabled nextdns and it seems to be much more stable, with no disconnects for the past 11 hours.
May I ask when this will be pushed out in a new version?
The new release is currently building and should be ready in a few minutes.
Is this now fixed?
There have been multiple commits attempting to solve this issue. I think it is likely to be fixed in the current version.
On my Synology DSM 7 I am getting this error:
Also on my CentOS 8 machine.
After a few months of everything being fine, it has struck again. I suspect it's the same thing as #586 (which got opened a few days ago). I've been on nextdns v1.36.0 since August 25 without issue. I just upgraded to the latest (1.37.2).
Context
For the past week or so I've been having problems on and off with the client on my router (and on a Windows host on a separate network, but I'm going to focus on the router). Starting yesterday, things have gotten much worse and the client is now repeatedly dying. The home network is becoming unusable. 😢
These are the sorts of messages filling my logs:
After a while, the client is killed by the oom-killer, which #505 is about.
I've upgraded all the things as a first step. I was running v1.9.4 on EdgeOS v2.0.9 (no hotfix) for many months without issue.