Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

强退BUG #106

Closed
Aniark opened this issue May 9, 2018 · 32 comments
Closed

强退BUG #106

Aniark opened this issue May 9, 2018 · 32 comments
Assignees
Labels

Comments

@Aniark
Copy link

Aniark commented May 9, 2018

系统:Ubuntu 14.04 LTS
内核:Linux 3.13.0-24-generic x86_64

客户端偶尔会出现连接不上服务端的情况,此时登录VPS查看进程,发现进程已经不在了

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 9, 2018

日志是什么呢?

@Aniark
Copy link
Author

Aniark commented May 9, 2018

@zonyitoo 请问日志在哪个目录? 本人是一个Linux初学者,只会一些基础操作,多有劳烦还请见谅

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 9, 2018

那就要看你配置的在哪里了。信息太少

@Kabie
Copy link

Kabie commented May 9, 2018

我也遇到类似情况,开启了UDP之后出现过几次,之前只用TCP没有遇见过,错误信息的最后几行,之前都是重复的UDP失败信息。似乎是文件打开太多。

[2018-05-09][09:55:13.645108703][INFO] UDP ASSOCIATE *.*.*.*:* -> *.*.*.*:*, payload length 52 bytes
[2018-05-09][09:55:13.645364292][ERROR] Udp relay error: Too many open files (os error 24)
[2018-05-09][09:55:13.656216553][ERROR] Server run failed: Too many open files (os error 24)
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 9, 2018

说明同时保持的UDP连接太多了,而且大多没有回复。这个需要Fix一下。

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 9, 2018

@Aniark @Kabie 麻烦用这个版本测试一下,打开UDP是否还会出现Too many open files的情况
shadowsocks-v1.7.0-alpha-nightly.x86_64-unknown-linux-musl.tar.gz

@Aniark
Copy link
Author

Aniark commented May 9, 2018

@zonyitoo 我用的是release的已编译文件,解压在/root目录下

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 10, 2018

@Aniark 我的意思是看你配置的输出日志在哪里。没有错误日志,我不知道当时在你的环境下发生了什么错误。像@Kabie 那样的日志

如果确实是有打开UDP Relay的话,试用一下上面我传的文件,看看问题还不会不会出现。

或直接使用alpha.2版本:https://github.com/shadowsocks/shadowsocks-rust/releases/tag/v1.7.0-alpha.2

@zonyitoo zonyitoo self-assigned this May 10, 2018
@zonyitoo zonyitoo added the bug label May 10, 2018
@Kabie
Copy link

Kabie commented May 10, 2018

测试了 1.7.0 alpha2 确实不会崩溃了,但是运行时间长之后还是会出现错误,之后表现为丢包。不知道是否和客户端有关,我用的是SSTap 1.1.0.1

[2018-05-10][07:07:25.680051908][ERROR] Udp associate waiting datagram * -> * timed out in Duration { secs: 300, nanos: 0 }
[2018-05-10][07:07:25.680153572][ERROR] Udp relay error: udp recv timed out

@zonyitoo
Copy link
Collaborator

这个没有丢包。是它发出请求之后,等了很久没有等到对方回包。
这也是为什么之前导致服务端会Crash,因为之前没有设置这个超时,所以很多一直没有回包的Socket就一直在那里,直至FD耗尽。
你可以查一下SSTap为什么一直在向一个不会给它回复的地址发UDP包。
这个错误日志是正常的。

zonyitoo added a commit that referenced this issue May 15, 2018
* Add #102 link in README

* Migrating to tokio, #100

* Migrated to tokio-signal v0.2, reformatted

* Should not call tokio::spawn before tokio::run

* Build nightly and stable with different image tag

* Removed unnecessary rustup install

* Renamed version to v1.7.0-alpha

* Removed deprecated calls of Buf

* Updated dependencies

* [#106] Fixed bug, should not hold UDP connections forever

* Lower info log to debug

* Bump version to v1.7.0-alpha.2

* Implement a simple DNS relay server, bump version to v1.7.0-alpha.3

* Better logging

* Fixed build on travis

* Add elapsed time in log

* Should not exit if handler return error

* reformatted

* Add test for DNS

* Updated ssdns param

* Use `trust-dns` and impl Trait
1. Replaced `ToSocketAddrs` with `trust-dns`
2. Uses impl Trait for functions
3. Updated dependencies
4. Fixed bugs

* Bug fixed, logging crate in ssdns
@zonyitoo
Copy link
Collaborator

目前没有更多的讨论的话,那看起来在新的alpha.4中已经Fix了问题吗?
@Aniark

@Aniark
Copy link
Author

Aniark commented May 15, 2018

@zonyitoo alpha4正在测试中,alpha1-3目前都存在该bug

@zonyitoo
Copy link
Collaborator

把日志提供出来,不然不知道到底错误是什么

@Aniark
Copy link
Author

Aniark commented May 16, 2018

不会。如果可以的话,请告诉我怎么做才能提取日志

@zonyitoo
Copy link
Collaborator

日志直接输出到终端,直接就在你眼前了为什么说要“提取”?
你怎么部署的?有什么东西启动的么?

@Aniark
Copy link
Author

Aniark commented May 16, 2018

我是稍微修改了一下网上的自启脚本,用脚本启动的
#!/bin/bash

BEGIN INIT INFO

Provides: shadowsocks-rust

Required-Start: $network $local_fs $remote_fs

Required-Stop: $network $local_fs $remote_fs

Default-Start: 2 3 4 5

Default-Stop: 0 1 6

Short-Description: Fast tunnel proxy that helps you bypass firewalls

Description: Start or stop the shadowsocks-rust server

END INIT INFO

Author: Teddysun i@teddysun.com

NAME=shadowsocks-rust
BIN=/root/ssserver
PID_DIR=/var/run
PID_FILE=$PID_DIR/shadowsocks-rust.pid
RET_VAL=0

[ -x $BIN ] || exit 0

if [ ! -d $PID_DIR ]; then
mkdir -p $PID_DIR
if [ $? -ne 0 ]; then
echo "Creating PID directory $PID_DIR failed"
exit 1
fi
fi

check_running() {
if [ -r $PID_FILE ]; then
read PID < $PID_FILE
if [ -d "/proc/$PID" ]; then
return 0
else
rm -f $PID_FILE
return 1
fi
else
return 2
fi
}

do_status() {
check_running
case $? in
0)
echo "$NAME (pid $PID) is running..."
;;
1|2)
echo "$NAME is stopped"
RET_VAL=1
;;
esac
}

do_start() {
if check_running; then
echo "$NAME (pid $PID) is already running..."
return 0
fi
$BIN -m aes-256-gcm -k Giuytt667 -s 0.0.0.0:443 -u 2>&1 > /dev/null &
PID=$!
echo $PID > $PID_FILE
sleep 0.3
if check_running; then
echo "Starting $NAME success"
else
echo "Starting $NAME failed"
RET_VAL=1
fi
}

do_stop() {
if check_running; then
kill -9 $PID
rm -f $PID_FILE
echo "Stopping $NAME success"
else
echo "$NAME is stopped"
RET_VAL=1
fi
}

do_restart() {
do_stop
do_start
}

case "$1" in
start|stop|restart|status)
do_$1
;;
*)
echo "Usage: $0 { start | stop | restart | status }"
RET_VAL=1
;;
esac

exit $RET_VAL

@zonyitoo
Copy link
Collaborator

$BIN -m aes-256-gcm -k Giuytt667 -s 0.0.0.0:443 -u 2>&1 > /dev/null &

这里把输出都重定向到了/dev/null

改一下

$BIN -m aes-256-gcm -k Giuytt667 -s 0.0.0.0:443 -u 2>&1 >> /tmp/shadowsocks-rust.log &

日志就输出到了 /tmp/shadowsocks-rust.log 下,如果挂了把这个日志里面相关的报错信息取出来看看(一般最后几行)

@Aniark
Copy link
Author

Aniark commented May 16, 2018

alpha4依旧强退,检查日志发现,没有任何内容

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 16, 2018

你直接手动用终端启动

ssserver -m aes-256-gcm -k Giuytt667 -s 0.0.0.0:443 -u

不要关掉终端,继续保持它直到退出

这个文件里面 /tmp/shadowsocks-rust.log 什么都没有?那进程都启动不了吧

@Aniark
Copy link
Author

Aniark commented May 17, 2018

alpha4也有强退bug,日志如下:

[2018-05-17][10:31:25.128416832][ERROR] Failed to handle client (106.127.16.195:31432): Operation timed out (os error 110)
[2018-05-17][10:31:37.160455401][ERROR] Failed to handle client (106.127.16.195:31794): Operation timed out (os error 110)
[2018-05-17][10:32:37.128463516][ERROR] Failed to handle client (106.127.16.195:31568): early eof
[2018-05-17][10:42:50.924431829][ERROR] Server run failed: Socket not connected (os error 107)
thread 'tokio-runtime-worker-0' panicked at 'Failed to run server, err: Socket not connected (os error 107)', src/relay/server.rs:47:50
note: Run with RUST_BACKTRACE=1 for a backtrace.
[2018-05-17][10:49:30.330088666][ERROR] Failed to handle client (185.156.177.8:2209): failed to decode Address, may be wrong method or key, peer: 185.156.177.8:2209

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 17, 2018

主要是这句

[2018-05-17][10:42:50.924431829][ERROR] Server run failed: Socket not connected (os error 107)
thread 'tokio-runtime-worker-0' panicked at 'Failed to run server, err: Socket not connected (os error 107)', src/relay/server.rs:47:50

猜测是因为Server在Accept之后,想要调用socket.peer_addr()时,连接已经断了。导致出错。
今晚我发一个alpha.6,你试一下。

你发的日志里其它错误都不会导致Server crash.

@zonyitoo
Copy link
Collaborator

@Aniark
Copy link
Author

Aniark commented May 17, 2018

alpha6也存在强退,不过日志不一样
这次的bug发生在用IDM下载YouTube视频时

[2018-05-17][23:52:28.337844649][INFO] ShadowSocks 1.7.0-alpha.6
[2018-05-17][23:52:28.338237020][INFO] ShadowSocks UDP listening on 0.0.0.0:443
[2018-05-17][23:52:28.338578345][INFO] ShadowSocks TCP Listening on 0.0.0.0:443
[2018-05-17][23:56:12.893550259][ERROR] AEAD decrypt failed, nonce=b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", input=b"\x1f$", tag=b"\x1a\x00\xa8\x16\xbe\xb8\x1b\x90l\x00\x00\x1c\xc0\n\xc0\t", err: Unspecified
[2018-05-17][23:56:12.893838098][ERROR] Failed to handle client (155.94.64.90:58728): failed to decode Address, may be wrong method or key, peer: 155.94.64.90:58728
[2018-05-17][23:56:23.588008821][ERROR] Relay payments.amazon.com:443 client <- server aborted: Connection reset by peer (os error 104)
[2018-05-17][23:56:23.588346697][ERROR] Failed to handle client (171.105.123.105:56759): Connection reset by peer (os error 104)
[2018-05-17][23:56:23.588475787][ERROR] Relay payments.amazon.com:443 client <- server aborted: Connection reset by peer (os error 104)
[2018-05-17][23:56:23.588614648][ERROR] Failed to handle client (171.105.123.105:56777): Connection reset by peer (os error 104)
[2018-05-17][23:58:15.902485444][ERROR] Relay co4-client-s.gateway.messenger.live.com:443 client <- server aborted: Connection reset by peer (os error 104)
[2018-05-17][23:58:15.902811642][ERROR] Failed to handle client (171.105.123.105:56989): Connection reset by peer (os error 104)
[2018-05-18][00:03:33.975464495][ERROR] Udp relay error: udp packet too short
[2018-05-18][00:03:34.893423514][ERROR] Failed to handle client (171.105.123.105:56750): early eof
[2018-05-18][00:03:34.928666665][ERROR] Failed to handle client (171.105.123.105:56815): early eof
[2018-05-18][00:03:35.279536605][ERROR] Failed to handle client (171.105.123.105:57016): early eof
[2018-05-18][00:03:35.307013423][ERROR] Failed to handle client (171.105.123.105:56837): early eof
[2018-05-18][00:03:35.333871384][ERROR] Failed to handle client (171.105.123.105:57120): early eof
[2018-05-18][00:03:35.337121073][ERROR] Failed to handle client (171.105.123.105:56739): early eof
[2018-05-18][00:04:31.811982025][ERROR] Relay r1---sn-a5mlrn7y.googlevideo.com:443 client -> server aborted: Connection reset by peer (os error 104)
[2018-05-18][00:04:31.815742673][ERROR] Failed to handle client (171.105.123.105:56793): Connection reset by peer (os error 104)
[2018-05-18][00:06:07.182679309][ERROR] Failed to handle client (171.105.123.105:56752): early eof
[2018-05-18][00:06:07.202874030][ERROR] Failed to handle client (171.105.123.105:56868): early eof
[2018-05-18][00:06:07.893903171][ERROR] Failed to handle client (171.105.123.105:57123): early eof
[2018-05-18][00:06:07.903396546][ERROR] Failed to handle client (171.105.123.105:57110): early eof
[2018-05-18][00:06:07.923738896][ERROR] Failed to handle client (171.105.123.105:56852): early eof
[2018-05-18][00:06:07.923960862][ERROR] Failed to handle client (171.105.123.105:57033): early eof
[2018-05-18][00:06:13.373185923][ERROR] Failed to handle client (171.105.123.105:56742): early eof
[2018-05-18][00:06:13.381198892][ERROR] Failed to handle client (171.105.123.105:56960): early eof
[2018-05-18][00:06:13.381720352][ERROR] Failed to handle client (171.105.123.105:56956): early eof
[2018-05-18][00:06:13.391250361][ERROR] Failed to handle client (171.105.123.105:56849): early eof
[2018-05-18][00:06:13.391445705][ERROR] Failed to handle client (171.105.123.105:56915): early eof
[2018-05-18][00:06:13.393861511][ERROR] Failed to handle client (171.105.123.105:56909): early eof
[2018-05-18][00:06:15.649399114][ERROR] Failed to handle client (171.105.123.105:56694): early eof
[2018-05-18][00:06:15.650136610][ERROR] Failed to handle client (171.105.123.105:57132): early eof
Killed

@zonyitoo
Copy link
Collaborator

[2018-05-17][23:56:12.893550259][ERROR] AEAD decrypt failed, nonce=b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", input=b"\x1f$", tag=b"\x1a\x00\xa8\x16\xbe\xb8\x1b\x90l\x00\x00\x1c\xc0\n\xc0\t", err: Unspecified
[2018-05-17][23:56:12.893838098][ERROR] Failed to handle client (155.94.64.90:58728): failed to decode Address, may be wrong method or key, peer: 155.94.64.90:58728

这种日志是包无法解析,一般原因是method和password不匹配,导致server无法解包

[2018-05-17][23:56:23.588008821][ERROR] Relay payments.amazon.com:443 client <- server aborted: Connection reset by peer (os error 104)

这种错误是对端把TCP connection close掉了,没有正常结束。但也是预计范围内的行为。

[2018-05-18][00:06:07.182679309][ERROR] Failed to handle client (171.105.123.105:56752): early eof
[2018-05-18][00:06:07.202874030][ERROR] Failed to handle client (171.105.123.105:56868): early eof

这种错误也是类似的,只是对端(应该是Client端)在不合适的情况下关掉了连接。

因此这些错误均不可能导致Server crash。唯一一句可能的就是

Killed

这一句看起来像是操作系统强杀了进程。怀疑是内存不足,你使用的机器内存有多少?
进一步确认的话可以看/var/log/messages,看一下在Killed的时间附近,有没有对应的日志,比如关键词"Out of memory", "Kill process"

@Aniark
Copy link
Author

Aniark commented May 18, 2018

我看了一下/var/log目录,没有发现messages文件

内存256m,只运行了ss-rust和锐速(https://github.com/0oVicero0/serverSpeeder_Install

我在另一台机器也有部署ss-rust,但是我没有安装锐速,而是安装了bbr,那台机器用来下YouTube视频没有强退

是否和锐速有关?

@zonyitoo
Copy link
Collaborator

不清楚,256的话,那很大可能就是因为内存不足而被强杀。建议使用至少1G内存的vps

@zonyitoo
Copy link
Collaborator

@Aniark 是否已解决问题?

@Aniark
Copy link
Author

Aniark commented May 21, 2018

并没有

@zonyitoo
Copy link
Collaborator

用了更大内存的机器了吗?

@Aniark
Copy link
Author

Aniark commented May 21, 2018

没有。我的vps纯用来当梯子用的,更大内存的就没有了

@Aniark
Copy link
Author

Aniark commented Jan 21, 2020

1.8.7已经没有强退现象了,测试环境依旧是以前的那个vps

@luodaoyi
Copy link

我也遇到这个问题了

os error 110

2023-05-23T06:15:19.194746146+00:00 INFO  shadowsocks server 1.15.3 build 2023-03-12T16:50:15.384354616+00:00
2023-05-23T06:15:19.195757285+00:00 INFO  shadowsocks tcp server listening on 0.0.0.0:40220, inbound address 0.0.0.0:40220
2023-05-23T06:19:42.067073370+00:00 ERROR tcp tunnel x.x.x.x:64276 -> 117.52.9.2:80 connect failed, error: Operation timed out (os error 110)
2023-05-23T06:19:44.119132801+00:00 ERROR tcp tunnel x.x.x.x:64318 -> 117.52.9.2:80 connect failed, error: Operation timed out (os error 110)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants