Using Alibaba DNS, "Connection reset by peer" occurs intermittently #1712

Open
giveup opened this issue Mar 26, 2024 · 38 comments

@giveup

giveup commented Mar 26, 2024

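Symptoms
As the logs show, with Alibaba DNS as the upstream, "Connection reset by peer" and HTTP 500 errors occur intermittently. Since this is Alibaba's own service, an outage or outside interference seems unlikely. (Further evidence: no such errors occurred when using Alibaba's DoT.)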

Mon Mar 25 17:23:49 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 17:24:12 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 17:28:40 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 17:38:53 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 17:43:48 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 17:44:33 2024 user.warn smartdns: Handshake with 223.6.6.6 failed, Connection reset by peer
Mon Mar 25 17:48:50 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 17:49:35 2024 user.warn smartdns: Handshake with 223.6.6.6 failed, Connection reset by peer
Mon Mar 25 17:53:52 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 17:54:14 2024 user.warn smartdns: http server query from 223.6.6.6:443 failed, server return http code : 500, Internal Server Error
Mon Mar 25 17:58:54 2024 user.warn smartdns: Handshake with 223.6.6.6 failed, Connection reset by peer
Mon Mar 25 17:59:38 2024 user.warn smartdns: Handshake with 223.6.6.6 failed, Connection reset by peer
Mon Mar 25 18:04:03 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 18:04:39 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Mon Mar 25 18:14:43 2024 user.warn smartdns: Handshake with 223.6.6.6 failed, Connection reset by peer
Tue Mar 26 01:28:59 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Tue Mar 26 02:19:45 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer
Tue Mar 26 02:24:47 2024 user.warn smartdns: Handshake with 223.6.6.6 failed, Connection reset by peer
Tue Mar 26 10:09:01 2024 user.warn smartdns: http server query from 223.6.6.6:443 failed, server return http code : 500, Internal Server Error
Tue Mar 26 10:09:01 2024 user.warn smartdns: http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
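To compare how often each upstream fails and in which way, the warnings above can be tallied per server and reason. The following is a minimal sketch, not part of smartdns: the function name `tally_failures` is hypothetical, and the two regular expressions assume the log wording stays exactly as in this excerpt.

```python
import re
import sys
from collections import Counter

# Patterns for the two warning formats seen in the excerpt above
# (assumption: the wording is stable across smartdns versions).
HANDSHAKE_RE = re.compile(r"Handshake with (\S+) failed, (.+)$")
HTTP_RE = re.compile(
    r"http server query from ([\d.]+):\d+ failed, server return http code : (\d+)"
)


def tally_failures(lines):
    """Count warnings per (server, reason) pair."""
    counts = Counter()
    for line in lines:
        m = HANDSHAKE_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1
            continue
        m = HTTP_RE.search(line)
        if m:
            counts[(m.group(1), "HTTP " + m.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from standard input and prints the most frequent failures first.
    for (server, reason), n in tally_failures(sys.stdin).most_common():
        print(f"{n:4d}  {server:<15}  {reason}")
```

Feeding it a saved copy of the system log on standard input prints one line per server and failure type, which makes it easier to see whether handshake resets and HTTP 500 responses cluster on one address.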

Environment

  1. ax6s

  2. ISP
    China Unicom

  3. smartdns source and version
    1.2024.v45.0.11-OpenWrt-openssl3

  4. Relevant configuration (be sure to remove personal information)

server-https https://223.5.5.5/dns-query  -no-check-certificate -http-host dns.alidns.com -group domestic -subnet ****
server-https https://223.6.6.6/dns-query  -no-check-certificate -http-host dns.alidns.com -group domestic -subnet ****
server-https https://223.5.5.5/dns-query  -no-check-certificate -http-host dns.alidns.com -group oversea -subnet ****
server-https https://223.6.6.6/dns-query  -no-check-certificate -http-host dns.alidns.com -group oversea -subnet ****

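Steps to reproduce
Occurs intermittently; cannot be reproduced reliably.

Information collection

  1. Upload the /var/log/smrtdns.log log as an attachment (be sure to remove personal information).
  2. If the process crashes, enable the coredump feature and upload the coredump file together with the matching smartdns binary.
    On the custom settings page, enable Settings -> Custom Settings -> Generate coredump configuration, reproduce the problem, then submit the coredump file.
    The coredump file is in the /tmp directory.

@PikuZheng
Contributor

Since #1673, I have been monitoring the availability of Alibaba's DoH from my home broadband (China Unicom) and have never seen an error.
image
For now I still suspect that a single IP exceeded Alibaba's query quota and was blacklisted.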

@giveup
Author

giveup commented Mar 26, 2024

@PikuZheng This error does not occur with DoT. Does the per-IP policy really distinguish between DoH and DoT?
Besides the 500 errors, 400 errors also show up occasionally.

Tue Mar 26 15:10:44 2024 user.warn smartdns: http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
Tue Mar 26 15:19:51 2024 user.warn smartdns: http server query from 223.6.6.6:443 failed, server return http code : 400, Bad Request
Tue Mar 26 15:19:51 2024 user.warn smartdns: http server query from 223.5.5.5:443 failed, server return http code : 400, Bad Request

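@PikuZheng
Contributor

I suggest pinging 223.5.5.5 from time to time and watching whether the TTL changes.
@pymumu Could the route to Alibaba's data center be shifting while smartdns keeps trying to hold the existing HTTPS connection open?

@PikuZheng
Contributor

May I ask which region's China Unicom the original poster is on?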

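@giveup
Author

giveup commented Mar 26, 2024

According to the Alibaba DNS website, there is an Alibaba DNS node in my region, so both stability and latency should be quite good.
I am now pinging manually and will leave it running for a while to see how the TTL behaves.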

@pymumu
Owner

pymumu commented Mar 26, 2024

Please post your configuration.

@giveup
Author

giveup commented Mar 26, 2024

The full configuration is below. The DoH and DoT setups are identical except for the upstream DNS (Alibaba) entries. @pymumu

server-name smartdns
speed-check-mode none
dualstack-ip-selection no
prefetch-domain yes
serve-expired yes
dnsmasq-lease-file /tmp/dhcp.leases
rr-ttl-min 600
log-size 64K
log-num 1
log-level notice
log-syslog yes
audit-size 64K
audit-num 1
cache-persist yes
cache-file /etc/smartdns/smartdns.cache
resolv-file /tmp/resolv.conf.d/resolv.conf.auto
bind :1153@br-lan -group domestic
bind :1153@lo -group domestic
bind :1154@br-lan  -no-speed-check -no-dualstack-selection -force-aaaa-soa -group oversea
bind :1154@lo  -no-speed-check -no-dualstack-selection -force-aaaa-soa -group oversea
server-https https://223.5.5.5/dns-query  -no-check-certificate -http-host dns.alidns.com -group domestic -subnet ***
server-https https://223.6.6.6/dns-query  -no-check-certificate -http-host dns.alidns.com -group domestic -subnet ***
server-https https://223.5.5.5/dns-query  -no-check-certificate -http-host dns.alidns.com -group oversea -subnet ***
server-https https://223.6.6.6/dns-query  -no-check-certificate -http-host dns.alidns.com -group oversea -subnet ***
domain-set -name domain-block-list -file /etc/smartdns/domain-block.list
domain-rules /domain-set:domain-block-list/ -address #
conf-file /etc/smartdns/address.conf
conf-file /etc/smartdns/blacklist-ip.conf
conf-file /etc/smartdns/custom.conf

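/etc/smartdns/custom.conf is empty; there is no special custom configuration.

@giveup
Author

giveup commented Mar 26, 2024

I suggest pinging 223.5.5.5 from time to time and watching whether the TTL changes. @pymumu Could the route to Alibaba's data center be shifting while smartdns keeps trying to hold the existing HTTPS connection open?

I ran it for a while. Apart from a few packets that occasionally timed out, there was nothing abnormal, and the TTL did not change.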

root@AX6S:~# ping 223.5.5.5
PING 223.5.5.5 (223.5.5.5): 56 data bytes
64 bytes from 223.5.5.5: seq=0 ttl=118 time=9.488 ms
64 bytes from 223.5.5.5: seq=1 ttl=118 time=8.927 ms
64 bytes from 223.5.5.5: seq=2 ttl=118 time=9.372 ms
64 bytes from 223.5.5.5: seq=3 ttl=118 time=8.487 ms
64 bytes from 223.5.5.5: seq=4 ttl=118 time=8.218 ms
64 bytes from 223.5.5.5: seq=5 ttl=118 time=8.509 ms
64 bytes from 223.5.5.5: seq=6 ttl=118 time=8.275 ms
64 bytes from 223.5.5.5: seq=7 ttl=118 time=8.500 ms
64 bytes from 223.5.5.5: seq=8 ttl=118 time=8.283 ms
^C
--- 223.5.5.5 ping statistics ---
9 packets transmitted, 9 packets received, 0% packet loss
round-trip min/avg/max = 8.218/8.673/9.488 ms
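Watching a long ping run by eye is error-prone, so the TTL check suggested earlier in the thread can be automated. Below is a minimal sketch under the assumption that replies look like the BusyBox output above (`seq=N ttl=M`); the function name `ttl_changes` is hypothetical.

```python
import re

# Matches both BusyBox ("seq=0 ttl=118") and iputils ("icmp_seq=1 ttl=118") replies.
TTL_RE = re.compile(r"seq=(\d+) ttl=(\d+)")


def ttl_changes(ping_lines):
    """Return (seq, old_ttl, new_ttl) for each reply whose TTL differs from the previous reply."""
    changes = []
    previous = None
    for line in ping_lines:
        m = TTL_RE.search(line)
        if not m:
            continue  # skip the banner, timeouts, and the statistics footer
        seq, ttl = int(m.group(1)), int(m.group(2))
        if previous is not None and ttl != previous:
            changes.append((seq, previous, ttl))
        previous = ttl
    return changes
```

An empty result means every reply carried the same TTL, which is what the capture above shows. A non-empty result would be a hint, not proof, that the path to the server changed during the run.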

Traceroute

root@AX6S:~#  traceroute 223.5.5.5
traceroute to 223.5.5.5 (223.5.5.5), 30 hops max, 46 byte packets
 1  10.139.32.1 (10.139.32.1)  1.233 ms  1.109 ms  0.983 ms
 2  120.80.166.45 (120.80.166.45)  2.901 ms  2.439 ms  2.571 ms
 3  120.80.145.210 (120.80.145.210)  5.863 ms  120.80.144.78 (120.80.144.78)  6.475 ms  120.80.145.210 (120.80.145.210)  6.011 ms
 4  120.80.99.18 (120.80.99.18)  13.989 ms  12.068 ms  120.80.98.202 (120.80.98.202)  10.058 ms
 5  *  112.95.237.198 (112.95.237.198)  6.639 ms  112.95.237.194 (112.95.237.194)  8.785 ms
 6  *  *  116.251.113.146 (116.251.113.146)  8.552 ms
 7  *  *  *
 8  *  *  *
 9  *  *  *
10  *  *  *
11  *  *  *
12  *  *  *
13  *  *  *
14  *  *  *
15  *  *  *
16  *  *  *
17  *  *  *
18  *  *  *
19  *  *  *
20  *  *  *
21  *  *  *
22  *  *  *
23  *  *  *
24  *  *  *
25  *  *  *
26  *  *  *
27  *  *  *
28  *  *  *
29  *  *  *
30  *  *  *

@pymumu
Owner

pymumu commented Mar 26, 2024

Try the release build that I compiled.

@giveup
Author

giveup commented Mar 26, 2024

The 500 errors still occur. @pymumu

Tue Mar 26 23:47:51 2024 user.notice smartdns: smartdns starting...(Copyright (C) Nick Peng <pymumu@gmail.com>, build: 1.2024.02.08-0828 (Release43-141-g9ee27e7))
Tue Mar 26 23:50:00 2024 cron.err crond[5368]: USER root pid 10287 cmd /usr/share/wginstaller/wg.sh cleanup_wginterfaces
Tue Mar 26 23:50:03 2024 user.warn smartdns: http server query from 223.6.6.6:443 failed, server return http code : 500, Internal Server Error
Tue Mar 26 23:50:03 2024 user.warn smartdns: http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
Tue Mar 26 23:50:03 2024 user.warn smartdns: send query ipcelou.com to upstream server failed, total server number 0

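@lalasou

lalasou commented Mar 27, 2024

This has been going on for a while, and not only with Alibaba: Tencent and Google behave the same way, with Alibaba being the worst.
It happens with both DoH and DoT; only plain TCP and UDP seem unaffected.
I don't know why.
Ping is always normal.

@pymumu
Owner

pymumu commented Mar 27, 2024

Try removing the subnet option from the server-https lines.

@lalasou

lalasou commented Mar 27, 2024

Try removing the subnet option from the server-https lines.

I also use edns-client-subnet.
Is that the problem?
I'll try removing it first.

Previously, when I had the cache size set high, there were a lot of these log entries.
I assumed the volume of requests had triggered rate limiting.

@lalasou

lalasou commented Mar 29, 2024

I ran it for a while and the errors are gone. It really does look like a subnet problem. I'll run it for two more days to be sure.

@giveup
Author

giveup commented Mar 29, 2024

@pymumu I ran the release build for two days and have not seen similar errors so far. One other variable is that the cache was cleared: the nightly build had more than 9,000 cached domains, and now there are fewer than 2,000. The rest of the configuration is unchanged. As for rate limiting, I think it is unlikely, because with DoT I had the same cache size and EDNS configured, yet these errors did not occur.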

@PikuZheng
Contributor

[2024-03-29 03:58:04,559][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error

This appeared right at the one-hour mark. Could it be that Alibaba does not allow long-lived connections?

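@lalasou

lalasou commented Mar 30, 2024

After running for two days, I can confirm the problem is caused by using subnet.
With subnet removed, the cache has reached 11m and no error logs have appeared.

@PikuZheng
Contributor

After running for two days, I can confirm the problem is caused by using subnet. With subnet removed, the cache has reached 11m and no error logs have appeared.

I disagree 😃 This is my configuration, which has no subnet option: server-https https://223.5.5.5/dns-query -http-host dns.alidns.com -group mainland -blacklist-ip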

[2024-03-29 03:58:04,559][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 07:09:45,559][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 08:21:47,557][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 08:46:19,555][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 09:17:26,558][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 09:51:10,558][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 09:54:52,753][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 11:58:04,559][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 15:09:45,576][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 16:21:47,559][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 17:17:26,556][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 17:51:10,737][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-03-29 17:54:52,557][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
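One way to test the long-lived-connection hypothesis raised above is to check whether the failures are spaced at regular intervals. The sketch below only extracts the timestamps and computes the gaps between consecutive failures; the function names are hypothetical, and the timestamp pattern assumes the bracketed smartdns log format shown in this comment.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm]" stamp of a smartdns log line.
STAMP_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def failure_times(lines):
    """Extract the timestamp (to the second) of every log line that carries one."""
    times = []
    for line in lines:
        m = STAMP_RE.search(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the gap, in whole seconds, between each pair of consecutive timestamps."""
    ordered = sorted(times)
    return [int((b - a).total_seconds()) for a, b in zip(ordered, ordered[1:])]
```

If the gaps cluster around one value, a server-side idle or lifetime limit becomes more plausible; widely scattered gaps point elsewhere. Either reading is only a hint and needs confirmation with a packet capture.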

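@lalasou

lalasou commented Mar 30, 2024

After running for two days, I can confirm the problem is caused by using subnet. With subnet removed, the cache has reached 11m and no error logs have appeared.

I disagree 😃 This is my configuration, which has no subnet option: server-https https://223.5.5.5/dns-query -http-host dns.alidns.com -group mainland -blacklist-ip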

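OK, mine is resolved for now, at least.

@PikuZheng
Contributor

I found that this is not limited to Alibaba DNS as an upstream. When smartdns itself is used as an HTTPS server, its downstream clients also run into connection problems at the same moments that Alibaba returns 500 or resets the connection: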

DoH server connection error: remote disconnected while in HTTP exchange
DoH server connection error: ERROR parsing http: content-length overflow

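And so far, only Alibaba's HTTPS endpoint triggers it.

@giveup
Author

giveup commented Apr 4, 2024

I switched from release 45 to the nightly build, and Wed Apr 3 18:23:40 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer appeared again. The main difference between the nightly and release builds is the OpenSSL library, right? Could that be related? @PikuZheng @pymumu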

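@lalasou

lalasou commented Apr 4, 2024

I switched from release 45 to the nightly build, and Wed Apr 3 18:23:40 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer appeared again. The main difference between the nightly and release builds is the OpenSSL library, right? Could that be related? @PikuZheng @pymumu

I removed every subnet option and use -k for the certificate.
It has been several days with no errors reported.

@PikuZheng
Contributor

PikuZheng commented Apr 4, 2024

I switched from release 45 to the nightly build, and Wed Apr 3 18:23:40 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, Connection reset by peer appeared again. The main difference between the nightly and release builds is the OpenSSL library, right? Could that be related? @PikuZheng @pymumu

I am using the OpenSSL bundled with the Alpine container, which should be 3.2.1.
Should I try building against 3.0.11? (In theory, one should always use the latest OpenSSL.)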

@PikuZheng
Contributor

I tried switching to OpenSSL 3.0.11 and it got worse: besides Alibaba, Tencent also started resetting connections.

[2024-04-04 06:28:05,764][ WARN][     dns_client.c:3310] Handshake with 1.12.12.12 failed, Connection reset by peer
[2024-04-04 06:28:05,768][ WARN][     dns_client.c:3310] Handshake with 120.53.53.53 failed, Connection reset by peer
[2024-04-04 06:28:43,068][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-04-04 06:28:52,760][ WARN][     dns_client.c:3310] Handshake with 1.12.12.12 failed, Connection reset by peer

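With OpenSSL 3.1.4, however, only Alibaba intermittently returns 500, while Tencent has been fine throughout.

@lalasou

lalasou commented Apr 4, 2024

The author's release build works fine for me.

@giveup
Author

giveup commented Apr 4, 2024

I tried switching to OpenSSL 3.0.11 and it got worse: besides Alibaba, Tencent also started resetting connections.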

[2024-04-04 06:28:05,764][ WARN][     dns_client.c:3310] Handshake with 1.12.12.12 failed, Connection reset by peer
[2024-04-04 06:28:05,768][ WARN][     dns_client.c:3310] Handshake with 120.53.53.53 failed, Connection reset by peer
[2024-04-04 06:28:43,068][ WARN][     dns_client.c:2866] http server query from 223.5.5.5:443 failed, server return http code : 500, Internal Server Error
[2024-04-04 06:28:52,760][ WARN][     dns_client.c:3310] Handshake with 1.12.12.12 failed, Connection reset by peer

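With OpenSSL 3.1.4, however, only Alibaba intermittently returns 500, while Tencent has been fine throughout.

The nightly build should depend on the system's OpenSSL, which is 3.0.13-1 on my device. The release build presumably has its dependencies compiled in, but I cannot tell which version it uses.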

root@AX6S:~# opkg depends smartdns
smartdns depends on:
	libc
	libpthread
	libopenssl3
root@AX6S:~# opkg list-installed | grep 'libopenssl3'
libopenssl3 - 3.0.13-1

@CallMeR

CallMeR commented Apr 10, 2024

This is not unique to SmartDNS. In my test environment, Unbound DNS using AliDNS over DoT shows the same behavior:

image

image

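@lalasou

lalasou commented Apr 10, 2024

It had been fine for several days, but today web pages suddenly felt slow to load. I checked the log and found a pile of these errors.
image

The cache holds roughly 22301 entries.

@crackerfly

Since #1673, I have been monitoring the availability of Alibaba's DoH from my home broadband (China Unicom) and have never seen an error. image For now I still suspect that a single IP exceeded Alibaba's query quota and was blacklisted.

Hey! Could you share that monitoring software?

@PikuZheng
Contributor

Since #1673, I have been monitoring the availability of Alibaba's DoH from my home broadband (China Unicom) and have never seen an error. image For now I still suspect that a single IP exceeded Alibaba's query quota and was blacklisted.

Hey! Could you share that monitoring software?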

https://github.com/louislam/uptime-kuma
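For a quick one-off check without installing a monitoring tool, a single DoH request can be sent by hand. This is a minimal sketch of an RFC 8484 GET probe, not what uptime-kuma does internally. The endpoint `https://dns.alidns.com/dns-query` is an assumption derived from the `-http-host` value in the configuration above, and the function names are hypothetical.

```python
import base64
import struct
import urllib.error
import urllib.request


def build_dns_query(name, qtype=1):
    """Build a minimal DNS query in wire format (ID 0, recursion desired, class IN)."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def doh_get_url(endpoint, name):
    """Encode the query as unpadded base64url in the `dns` parameter (RFC 8484 GET form)."""
    payload = base64.urlsafe_b64encode(build_dns_query(name)).rstrip(b"=")
    return f"{endpoint}?dns={payload.decode('ascii')}"


def probe(endpoint, name="www.example.com", timeout=5.0):
    """Send one DoH GET and return the HTTP status code (needs network access).

    HTTP error statuses such as 400 or 500 are returned as numbers; lower-level
    failures such as a connection reset are raised as exceptions to the caller.
    """
    request = urllib.request.Request(
        doh_get_url(endpoint, name),
        headers={"Accept": "application/dns-message"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling `probe("https://dns.alidns.com/dns-query")` in a loop with a sleep between attempts gives a rough availability log. Note that each call opens a fresh connection, so it does not reproduce the long-lived connection that smartdns keeps open.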

@giveup
Author

giveup commented Apr 15, 2024

Similar errors also occur with the release build when using DoT.

Mon Apr 15 19:16:35 2024 user.notice smartdns: smartdns starting...(Copyright (C) Nick Peng <pymumu@gmail.com>, build: 1.2024.02.08-0828 (Release43-141-g9ee27e7))
Mon Apr 15 19:16:35 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, No error information
Mon Apr 15 19:16:35 2024 user.warn smartdns: Handshake with 223.6.6.6 failed, No error information
Mon Apr 15 19:16:36 2024 user.warn smartdns: Handshake with 223.5.5.5 failed, No error information

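@devioa

devioa commented Apr 22, 2024

I built from the latest source, and every query method for Alibaba DNS (DoH/DoT/HTTPS, etc.) works fine for me. Tencent DNS, on the other hand, started returning Connection reset by peer frequently not long ago and still does, again across DoH/DoT/HTTPS.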

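@PikuZheng
Contributor

Alibaba suddenly went down.
image

@lalasou

lalasou commented Apr 26, 2024

A pile of error codes.
I gave up. For domestic use I now go straight to Alibaba over UDP.

@CallMeR

CallMeR commented Apr 26, 2024

Time to start using 360's DoT and DoH, I guess? (

@lalasou

lalasou commented Apr 26, 2024

Time to start using 360's DoT and DoH, I guess? (

Taiwanese DNS is also pretty good.

@CallMeR

CallMeR commented Apr 26, 2024

Time to start using 360's DoT and DoH, I guess? (

Taiwanese DNS is also pretty good.

Can you point me to one? I'd like to check the latency and the resolution results (

@PikuZheng
Contributor

PikuZheng commented May 7, 2024

Could this be the same problem as #1730? I ask because in my test, after the HTTPS query connection was cut off at the router, the port number smartdns used when reconnecting did not change.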
