Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leaking (when recovering from endpoint failures?) #505

Closed
brainsik opened this issue Jun 2, 2021 · 20 comments
Closed

Memory leaking (when recovering from endpoint failures?) #505

brainsik opened this issue Jun 2, 2021 · 20 comments
Labels
🐞 bug Something isn't working 🖥️ edgeos

Comments

@brainsik
Copy link

brainsik commented Jun 2, 2021

Context

  • nextdns version 1.32.2
  • EdgeRouter X v2.0.9-hotfix.2

For days I've had a lot of problems with the NextDNS client failing. Lots of cached HTTP/2.0: doh resolve: context deadline exceeded in the logs. Starting yesterday, the process is getting killed by the OOM killer. This Issue is going to focus on that.I'll open another to focus on why the failures are happening the first place.

It appears that when the following kind of dance occurs, the NextDNS resident memory size grows:

Jun  2 14:27:50 ubnt nextdns[14380]: message repeated 19 times: [ Received signal: broken pipe (ignored)]
Jun  2 14:27:50 ubnt nextdns[14380]: Endpoint provider failed: &{dns.nextdns.io. https://dns.nextdns.io#45.90.28.0,2a07:a8c0::,45.90.30.0,2a07:a8c1::}: exchange: roundtrip: unexpected EOF
Jun  2 14:27:53 ubnt nextdns[14380]: Connected 104.238.181.28:443 (con=11ms tls=1472ms, TCP, TLS13)

After enough of those, the process eventually is killed:

Jun  2 16:04:00 ubnt kernel: nextdns invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Jun  2 16:04:00 ubnt kernel: CPU: 3 PID: 14383 Comm: nextdns Tainted: P           O    4.14.54-UBNT #1
Jun  2 16:04:00 ubnt kernel: Out of memory: Kill process 14380 (nextdns) score 525 or sacrifice child
Jun  2 16:04:00 ubnt kernel: Killed process 14380 (nextdns) total-vm:671096kB, anon-rss:136880kB, file-rss:0kB, shmem-rss:0kB
Jun  2 16:04:00 ubnt kernel: Process 14383 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14383 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0
Jun  2 16:04:00 ubnt kernel: Process 14392 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Process 14441 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14441 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0
Jun  2 16:04:00 ubnt kernel: Process 14439 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14439 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14392 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0

How fast this occurs seems related to how much DNS traffic is happening on my network. Last night I restarted things before bed and it took about 4 hours. This morning after restarting, it took about 90 minutes.

@rs
Copy link
Contributor

rs commented Jun 2, 2021

Please try the 1.32.3.

@brainsik
Copy link
Author

brainsik commented Jun 2, 2021

This might be worse? In 10' the process was taking up over 100M of resident memory and was well on it's way to getting killed off by the oom-killer.

Worth mentioning: when these broken pipe messages are coming through the nextdns process is also chewing up all available CPU:

Jun  2 17:20:20 ubnt nextdns[18355]: Query 10.19.45.95 UDP A push.services.mozilla.com. (qry=43/res=12) 5588ms : doh resolve: context deadline exceeded
Jun  2 17:20:20 ubnt nextdns[18355]: Query 172.19.47.10 UDP A api.thingspeak.com. (qry=36/res=12) cached HTTP/2.0: doh resolve: context deadline exceeded
Jun  2 17:20:22 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:20:27 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:20:30 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:20:39 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:20:48 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:20:56 ubnt nextdns[18355]: message repeated 2 times: [ Received signal: broken pipe (ignored)]
Jun  2 17:21:03 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
root@ubnt:/config/nextdns# Jun  2 17:21:31 ubnt nextdns[18355]: message repeated 15 times: [ Received signal: broken pipe (ignored)]
Jun  2 17:21:33 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:21:34 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:21:35 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:21:56 ubnt nextdns[18355]: message repeated 31 times: [ Received signal: broken pipe (ignored)]
Jun  2 17:21:56 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:21:57 ubnt nextdns[18355]: Received signal: broken pipe (ignored)
Jun  2 17:22:03 ubnt nextdns[18355]: message repeated 21 times: [ Received signal: broken pipe (ignored)]
Jun  2 17:22:03 ubnt nextdns[18355]: Endpoint provider failed: &{dns.nextdns.io. https://dns.nextdns.io#45.90.28.0,2a07:a8c0::,45.90.30.0,2a07:a8c1::}: exchange: roundtrip: write tcp 73.70.50.17:54282->45.90.30.0:443: write: broken pipe

@rs
Copy link
Contributor

rs commented Jun 2, 2021

What is the output of nextdns config (please redact your config id)

@brainsik
Copy link
Author

brainsik commented Jun 2, 2021

Same as in #506:

$ /config/nextdns/nextdns config
log-queries false
cache-max-age 0s
hardened-privacy false
report-client-info true
bogus-priv true
setup-router true
listen localhost:53
max-ttl 0s
timeout 5s
cache-size 10MB
discovery-dns 
detect-captive-portals false
use-hosts true
auto-activate true
control /var/run/nextdns.sock
config 10.11.12.0/24=abc123
config def456

@Taik
Copy link

Taik commented Jun 3, 2021

I'm also seeing this same issue, running on NextDNS v1.32.3, also EdgeRouter X v2.0.9-hotfix.1.

Diag output: https://nextdns.io/diag/7b4e8710-c46f-11eb-82a5-a3ebfef2ae33

dmesg:

[129215.378449] Out of memory: Kill process 25355 (nextdns) score 497 or sacrifice child
[129215.394891] Killed process 25355 (nextdns) total-vm:671356kB, anon-rss:129732kB, file-rss:0kB, shmem-rss:0kB
[129215.419354] Process 25860 (nextdns) has crashed (parent 1 (systemd) signal 10, code 2, addr 000902e8), coredumps disabled
[129215.422327] Process 25359 (nextdns) has crashed (parent 1 (systemd) signal 10, code 2, addr 0007647c), coredumps disabled
[129215.424257] Process 25358 (nextdns) has crashed (parent 1 (systemd) signal 10, code 2, addr 0006109c), coredumps disabled
[129215.457728] Process 25361 (nextdns) has crashed (parent 1 (systemd) signal 10, code 2, addr 003b8ec0), coredumps disabled
[129215.535242] oom_reaper: reaped process 25355 (nextdns), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

nextdns config:

cache-size 10MB
report-client-info true
discovery-dns
setup-router true
timeout 5s
auto-activate false
listen localhost:53
control /var/run/nextdns.sock
config REDACTED
log-queries false
max-ttl 0s
detect-captive-portals false
hardened-privacy false
bogus-priv true
use-hosts true
cache-max-age 0s

@Tuinslak
Copy link

Tuinslak commented Jun 3, 2021

Facing the same on an EdgeRouter Lite for the past 48hrs:

$ show version
Version:      v2.0.9-hotfix.1
Build ID:     5371034
Build on:     01/22/21 10:14
Copyright:    2012-2020 Ubiquiti Networks, Inc.
HW model:     EdgeRouter Lite 3-Port
HW S/N:       788A207F4291
Uptime:       22:13:10 up  2:13,  1 user,  load average: 0.56, 1.43, 1.61
Jun  3 22:04:36 sg-erl kernel: nextdns invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
Jun  3 22:04:36 sg-erl kernel: CPU: 0 PID: 4900 Comm: nextdns Tainted: P           O    4.9.79-UBNT #1
Jun  3 22:04:36 sg-erl kernel: Out of memory: Kill process 4498 (nextdns) score 695 or sacrifice child
Jun  3 22:04:36 sg-erl kernel: Killed process 4498 (nextdns) total-vm:1190612kB, anon-rss:351372kB, file-rss:120kB, shmem-rss:0kB

Running latest version:

$ nextdns version
nextdns version 1.32.3

Config:

$ cat /config/nextdns/nextdns.conf
listen :53
max-ttl 10m0s
discovery-dns
timeout 5s
hardened-privacy false
use-hosts true
auto-activate true
config 10.33.128.0/24=bd762b
config 10.3.34.0/24=34e8d8
config 10.19.88.0/24=da86c1
config 10.60.99.0/24=da86c1
config 10.60.111.0/24=34e8d8
config da86c1
log-queries false
cache-size 10MB
cache-max-age 0s
detect-captive-portals false
bogus-priv true
setup-router true
control /var/run/nextdns.sock
report-client-info true

Some parts of the log ($ cat messages | grep nextdns):

Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.41 UDP AAAA cdn-0.nflximg.com. (qry=35/res=12) 5800ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.41 UDP A nrdp.prod.ftl.netflix.com. (qry=43/res=12) 5799ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.7 UDP A time.google.com. (qry=33/res=12) 5495ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.38 UDP A www.google.com. (qry=32/res=12) 5496ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.41 UDP A customerevents.netflix.com. (qry=44/res=12) 5834ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.41 UDP A uiboot.netflix.com. (qry=36/res=12) 5837ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.41 UDP A nrdp.prod.cloud.netflix.com. (qry=45/res=12) 5813ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.4 UDP AAAA www.google.com. (qry=32/res=12) 5246ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.38 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 5461ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.60.111.246 UDP A 0.ui.flatturtle.xyz. (qry=37/res=12) 5381ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.9 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 5003ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.41 UDP A cdn-0.nflximg.com. (qry=35/res=12) 6893ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.12 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 6699ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.38 UDP A www.google.com. (qry=32/res=12) 6580ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.38 UDP AAAA www.google.com. (qry=32/res=12) 6580ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.9 UDP A www.google.com. (qry=32/res=12) 6462ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.7 UDP A time.google.com. (qry=33/res=12) 6588ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.60.111.241 UDP A shan.sg. (qry=25/res=12) 6474ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.6 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 6474ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.38 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 6473ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.60.111.241 UDP A 0.ui.flatturtle.xyz. (qry=37/res=12) 6429ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.9 UDP A www.gstatic.com. (qry=33/res=12) 5355ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.38 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 6430ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.60.99.6 UDP A pool.ntp.org. (qry=30/res=12) 6355ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.7 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 5309ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.4 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 5254ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.60.111.89 UDP A chat.google.com. (qry=33/res=12) 6103ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.9 UDP A www.gstatic.com. (qry=33/res=12) 5215ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.41 UDP A ichnaea.netflix.com. (qry=37/res=12) 5902ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.60.111.89 UDP A oauthaccountmanager.googleapis.com. (qry=52/res=12) 6118ms : doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.6 UDP A apresolve.spotify.com. (qry=39/res=12) 5042ms HTTP/2.0: doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.38 UDP AAAA time.google.com. (qry=33/res=12) 5101ms HTTP/2.0: doh resolve: context deadline exceeded
Jun  3 22:09:08 sg-erl nextdns[17709]: Query 10.3.34.4 UDP A connectivitycheck.gstatic.com. (qry=47/res=12) 5141ms HTTP/2.0: doh resolve: unexpected EOF
Jun  3 22:09:09 sg-erl nextdns[17709]: Connected 103.62.48.147:443 (con=16ms tls=1749ms, TCP, TLS13)
Jun  3 22:09:09 sg-erl nextdns[17709]: Switching endpoint: https://dns.nextdns.io.#217.146.9.93,2a00:11c0:26:4::5,103.62.48.147,2401:3bc0:600:101:c2f:a6e6:5afd:1
Jun  3 22:10:30 sg-erl nextdns[17709]: Connected [2a00:11c0:26:4::5]:443 (con=50ms tls=0ms, TCP, )
Jun  3 22:10:53 sg-erl nextdns[17709]: Received signal: interrupt (ignored)
Jun  3 22:11:05 sg-erl nextdns[17709]: Received signal: interrupt (ignored)
Jun  3 22:11:28 sg-erl nextdns[17709]: Connected [2401:3bc0:600:101:c2f:a6e6:5afd:1]:443 (con=4ms tls=912ms, TCP, TLS13)
$ curl https://anycast.dns1.nextdns.io/info
{"locationName": "🇸🇬 Singapore", "pop": "zepto-sin", "rtt": 3872}
$ curl https://anycast.dns2.nextdns.io/info
{"locationName": "🇸🇬 Singapore", "pop": "anexia-sin", "rtt": 4343}

@jfchevrette
Copy link

jfchevrette commented Jun 3, 2021

Facing a very similar issue except that I am not seeing OOM-kills but rather the nextdns process starts using all 4 CPU cores on my poor EdgerouterX and my entire network becomes unresponsive until I reboot the router

nextdns

Jun  3 12:53:57 ubnt nextdns[3440]: Received signal: broken pipe (ignored)
Jun  3 12:53:58 ubnt nextdns[3440]: message repeated 2 times: [ Received signal: broken pipe (ignored)]
Jun  3 12:54:13 ubnt nextdns[3440]: Received signal: broken pipe (ignored)
Jun  3 12:54:23 ubnt nextdns[3440]: message repeated 23 times: [ Received signal: broken pipe (ignored)]
Jun  3 12:54:23 ubnt nextdns[3440]: Endpoint provider failed: &{dns.nextdns.io. https://dns.nextdns.io#45.90.28.0,2a07:a8c0::,45.90.30.0,2a07:a8c1::}: exchange: roundtrip: write tcp 76.69.235.45:50302->45.90.30.0:443: write: broken pipe
Jun  3 12:54:23 ubnt nextdns[3440]: Received signal: broken pipe (ignored)
Jun  3 12:54:26 ubnt nextdns[3440]: Connected 86.106.90.57:443 (con=2ms tls=1437ms, TCP, TLS13)
Jun  3 12:54:26 ubnt nextdns[3440]: Switching endpoint: https://dns.nextdns.io.#51.222.107.153,86.106.90.57
root@ubnt:~# show version
Version:      v2.0.9-hotfix.1
Build ID:     5371034
Build on:     01/22/21 10:14
Copyright:    2012-2020 Ubiquiti Networks, Inc.
HW model:     EdgeRouter X 5-Port
HW S/N:       <redacted>
Uptime:       13:04:09 up  2:35,  1 user,  load average: 1.75, 1.58, 1.60
root@ubnt:~# /config/nextdns/nextdns version
nextdns version 1.32.2
root@ubnt:/var/log# /config/nextdns/nextdns config
cache-max-age 0s
listen localhost:53
cache-size 10MB
max-ttl 5s
hardened-privacy false
discovery-dns
detect-captive-portals false
bogus-priv true
setup-router true
auto-activate false
control /var/run/nextdns.sock
config c913d8
log-queries false
report-client-info true
use-hosts true
timeout 5s
root@ubnt:~# curl https://anycast.dns1.nextdns.io/info
{"locationName": "🇨🇦 Toronto, Canada", "pop": "vultr-yto", "rtt": 20732}

@rs
Copy link
Contributor

rs commented Jun 3, 2021

Can you please run sh -c "$(curl -sL https://nextdns.io/install)" and report the lines starting with "INFO:"? Also, what do you get for cat /proc/cpuinfo

@jfchevrette
Copy link

INFO: OS: edgeos
INFO: GOARCH: mipsle_softfloat
INFO: GOOS: linux
INFO: NEXTDNS_BIN: /config/nextdns/nextdns
INFO: LATEST_RELEASE: 1.32.3
root@ubnt:~# cat /proc/cpuinfo
system type		: MediaTek MT7621 ver:1 eco:3
machine			: Ubiquiti EdgeRouter
processor		: 0
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
package			: 0
core			: 0
VPE			: 0
VCED exceptions		: not available
VCEI exceptions		: not available

processor		: 1
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
package			: 0
core			: 0
VPE			: 1
VCED exceptions		: not available
VCEI exceptions		: not available

processor		: 2
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
package			: 0
core			: 1
VPE			: 0
VCED exceptions		: not available
VCEI exceptions		: not available

processor		: 3
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
package			: 0
core			: 1
VPE			: 1
VCED exceptions		: not available
VCEI exceptions		: not available

@rs
Copy link
Contributor

rs commented Jun 3, 2021

In order to debug this, we rolled back the cert change. Can you plus restart the process and report if it fixes the issue? It would help us a lot. Thanks.

@jfchevrette
Copy link

Thanks Olivier! @rs

I had upgraded to 1.32.3 after sending my last comment. I have now restarted the process using nextdns restart and will report if I get the issue again

@rs
Copy link
Contributor

rs commented Jun 3, 2021

Can you also please show the output of this command: openssl speed ecdsap256 rsa2048 rsa4096

@Taik
Copy link

Taik commented Jun 3, 2021

From my EdgeRouter X:

INFO: OS: edgeos
INFO: GOARCH: mipsle_softfloat
INFO: GOOS: linux
INFO: NEXTDNS_BIN: /config/nextdns/nextdns
INFO: LATEST_RELEASE: 1.32.3
c) Configure NextDNS
r) Remove NextDNS
e) Exit
Choice (default=c):
root@ubnt:~# cat /proc/cpuinfo
system type             : MediaTek MT7621 ver:1 eco:3
machine                 : Ubiquiti EdgeRouter
processor               : 0
cpu model               : MIPS 1004Kc V2.15
BogoMIPS                : 581.63
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 32
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 0
VPE                     : 0
VCED exceptions         : not available
VCEI exceptions         : not available

processor               : 1
cpu model               : MIPS 1004Kc V2.15
BogoMIPS                : 581.63
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 32
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 0
VPE                     : 1
VCED exceptions         : not available
VCEI exceptions         : not available

processor               : 2
cpu model               : MIPS 1004Kc V2.15
BogoMIPS                : 581.63
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 32
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 1
VPE                     : 0
VCED exceptions         : not available
VCEI exceptions         : not available

processor               : 3
cpu model               : MIPS 1004Kc V2.15
BogoMIPS                : 581.63
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 32
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 1
VPE                     : 1
VCED exceptions         : not available
VCEI exceptions         : not available

root@ubnt:~#
Doing 2048 bit private rsa's for 10s: 104 2048 bit private RSA's in 9.98s
Doing 2048 bit public rsa's for 10s: 4034 2048 bit public RSA's in 9.91s
Doing 4096 bit private rsa's for 10s: 16 4096 bit private RSA's in 10.06s
Doing 4096 bit public rsa's for 10s: 1137 4096 bit public RSA's in 9.95s
Doing 256 bit sign ecdsa's for 10s: 425 256 bit ECDSA signs in 9.98s
Doing 256 bit verify ecdsa's for 10s: 690 256 bit ECDSA verify in 9.93s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM -DOPENSSLDIR="\"/usr/lib/ssl\"" -DENGINESDIR="\"/usr/lib/mipsel-linux-gnu/engines-1.1\""
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.095962s 0.002457s     10.4    407.1
rsa 4096 bits 0.628750s 0.008751s      1.6    114.3
                              sign    verify    sign/s verify/s
 256 bit ecdsa (nistp256)   0.0235s   0.0144s     42.6     69.5
root@ubnt:~#

@Tuinslak
Copy link

Tuinslak commented Jun 4, 2021

Took forever to run, but diag completed: https://nextdns.io/diag/4e513a80-c4d0-11eb-ba3e-9583b47c3fcd

@jfchevrette
Copy link

I have not seen the issue again since the cert rollback & upgrade to 1.32.3

I think this issue can be closed

@rs
Copy link
Contributor

rs commented Jun 17, 2021

The issue is still there and a fix will be pushed soon.

@andrew-susanto
Copy link

Is this issue already fixed? I still got this problem on cli 1.37.2. The memory usage grows about 35 MB just in a minute because of endpoint failure (?). The log is full of cache fallback HTTP/2.0: doh resolve: context deadline exceeded.

@rs
Copy link
Contributor

rs commented Oct 11, 2021

It's normal that memory raises, but it should be bounded by the max number of allowed // queries.

@shivv3r
Copy link

shivv3r commented Dec 13, 2021

Still not fixed. Having the same issue on Xiaomi Mi Wifi Mini on latest OpenWRT.

NextDNS on OpenWRT takes 543% of memory and 50% of CPU

Tue Dec 14 00:10:36 2021 daemon.warn nextdns[28407]: Endpoint failed: https://dns.nextdns.io#116.203.147.209,2a01:4f8:c0c:fa3f::1: roundtrip: context deadline exceeded

Tue Dec 14 00:10:41 2021 daemon.warn nextdns[28407]: Endpoint failed: https://dns.nextdns.io#217.146.22.163,2a00:11c0:e:ffff:1::d: roundtrip: context deadline exceeded

Tue Dec 14 00:10:46 2021 daemon.warn nextdns[28407]: Endpoint failed: https://dns1.nextdns.io#45.90.28.0,2a07:a8c0::: roundtrip: context deadline exceeded

Tue Dec 14 00:10:52 2021 daemon.warn nextdns[28407]: Endpoint failed: https://dns2.nextdns.io#45.90.30.0,2a07:a8c1::: roundtrip: context deadline exceeded

@poisonsnak
Copy link

poisonsnak commented Aug 5, 2022

I'm facing this issue as well on EdgeOS on 1.37.11. NextDNS memory usage increases when queries fail, and memory used in this way never seems to get released. There are 2 main issues associated with this

  1. If your internet goes down NextDNS starts using a lot of memory. Here's a graph from UISP showing the router's memory usage. You can see memory usage jump from 58% to 70% due to a 3 minute internet outage around 3am on Aug 2, then it jumps to 80% due to the same type of outage on Aug 3. In the afternoon of Aug 4 I used sudo nextdns restart and you can see the usage drop to 56%. This is on an ER-X with 256MB of RAM. Once you hit 80% on this router it starts behaving strangely because some memory is used by tmpfs that isn't accounted for on this graph.
    1

  2. Over time NextDNS slowly uses more and more memory, as queries naturally get lost on the internet. This graph isn't as clear but on Jul 5 memory usage is 60%. I had just restarted NextDNS on Jul 4. I restarted it again on Jul 22 and you can see it go down from 64% to 59% (when you zoom out to month view in UISP, unfortunately the graph gets smoothed out so it looks like it took most of the day to restart, but it didn't). Then memory usage grew to 66% by today. I just restarted NextDNS and it dropped to 61%. 80% is the danger zone so I need to make sure I restart it every 2 months or so.
    2
    For today's restart, here's how NextDNS looked in top before

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
3222 root      20   0  670168  28548   6408 S   0.0 11.3  58:21.26 nextdns

and after

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2704 root      20   0  669912  13364   6408 S   0.3  5.3   0:11.57 nextdns

As a workaround you can just restart NextDNS. I've been doing it manually but I plan to use the EdgeOS task scheduler. For my one router with the regular ISP outages I think I will restart it daily. I want to restart it when there isn't much activity - I don't want to inconvenience my users, and if a bunch of queries fail during the startup process then I will have high memory usage all over again. I haven't tested it yet but I think this should work to restart it every day at 2:25 AM

set system task-scheduler task nextdns crontab-spec '25 2 * * *'
set system task-scheduler task nextdns executable arguments restart
set system task-scheduler task nextdns executable path /usr/bin/nextdns

edit: tested it last night and it works just fine

@rs rs closed this as not planned Won't fix, can't repro, duplicate, stale Sep 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 bug Something isn't working 🖥️ edgeos
Projects
None yet
Development

No branches or pull requests

8 participants