Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

每次修改文件重启报这么一堆错误 #432

Closed
zlab opened this issue Feb 22, 2017 · 55 comments · Fixed by node-modules/cluster-client#25
Closed

每次修改文件重启报这么一堆错误 #432

zlab opened this issue Feb 22, 2017 · 55 comments · Fixed by node-modules/cluster-client#25
Assignees

Comments

@zlab
Copy link

zlab commented Feb 22, 2017

每次修改文件重启报这么一堆错误

017-02-22 15:39:37,488 WARN 184300 [master] App Worker#2:183904 started at 7001, remain 0 (15314ms)
2017-02-22 15:39:37,489 INFO 184300 [master] App Worker#1:170060 disconnect, suicide: true, state: disconnected, current workers: ["2"]
[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)] [cfork:master:184300] worker:170060 disconnect (suicide: true, state: disconnected, isDead: false)
[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)] [cfork:master:184300] don't fork new work (refork: false)
2017-02-22 15:39:37,493 ERROR 183332 nodejs.unhandledExceptionError: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TCP.onread (net.js:569:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'unhandledExceptionError'
pid: 183332
hostname: zhanqi-pc


[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)][pid: 183332][Leader] Error: read ECONNRESET 
Error Stack:
  Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TCP.onread (net.js:569:26)
Error Additions:
  code: "ECONNRESET"
  errno: "ECONNRESET"
  syscall: "read"

2017-02-22 15:39:37,494 ERROR 183332 nodejs.ECONNRESETError: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TCP.onread (net.js:569:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 183332
hostname: zhanqi-pc

[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)] [cfork:master:184300] worker:170060 exit (code: null, suicide: true, state: dead, isDead: true, isExpected: true)
@dead-horse
Copy link
Member

可以更详细的描述一下? 代码是否写错了?这里报了一个 ECONNRESET 的错误,是不是在请求其他服务?

@dead-horse
Copy link
Member

这是因为你在开发环境写出了未捕获异常,导致进程退出。

@zlab
Copy link
Author

zlab commented Feb 24, 2017

App Worker#4:300960 disconnect, suicide: true, state: disconnected, current workers: ["5"]
每次重启,都会产生一个新的worker,之前的worker is dead
worker:300960 disconnect (suicide: true, state: disconnected, isDead: false)

@atian25
Copy link
Member

atian25 commented Feb 28, 2017

@zlab 你最后这一段是正常的,开发期重启 worker。

但上面的 read ECONNRESET 就要查查,你能不能用 egg-init --type simple 初始化一个项目,然后最小化代码复现下这个问题。

@AnzerWall
Copy link
Contributor

报错我这边也是会出现的。但是我没去管他。。
环境win10,webstorm,egg-init生成的项目

@AnzerWall
Copy link
Contributor

AnzerWall commented Feb 28, 2017

"C:\Program Files (x86)\JetBrains\WebStorm 2016.3.3\bin\runnerw.exe" "C:\Program Files\nodejs\node.exe" "C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js" run dev

> config-api@1.0.0 dev F:\test
> egg-bin dev

2017-02-28 13:44:25,924 INFO 25088 [master] =================== egg start =====================
2017-02-28 13:44:25,928 INFO 25088 [master] egg version 1.0.0-rc.1
2017-02-28 13:44:25,929 INFO 25088 [master] start with options: {"customEgg":"F:\\test\\node_modules\\egg","baseDir":"F:\\test","port":7001,"workers":1,"plugins":null,"https":false,"key":"","cert":""}
2017-02-28 13:44:25,929 INFO 25088 [master] start with env: isProduction: false, isDebug: false, EGG_SERVER_ENV: undefined, NODE_ENV: development
2017-02-28 13:44:25,974 INFO 25088 [master] Agent Worker:10188 start with ["{\"customEgg\":\"F:\\\\test\\\\node_modules\\\\egg\",\"baseDir\":\"F:\\\\test\",\"port\":7001,\"workers\":1,\"plugins\":null,\"https\":false,\"key\":\"\",\"cert\":\"\",\"clusterPort\":52137}"]
2017-02-28 13:44:26,847 INFO 25088 [master] Agent Worker started (910ms)
2017-02-28 13:44:26,860 INFO 25088 [master] App Worker#1:9832 start, state: none, current workers: ["1"]
2017-02-28 13:44:27,843 WARN 25088 [master] App Worker#1:9832 started at 7001, remain 0 (995ms)
2017-02-28 13:44:27,843 INFO 25088 [master] egg started on http://127.0.0.1:7001 (1914ms)
2017-02-28 13:48:34,400 WARN 10188 [agent:development] reload worker because F:\test\app\controller\home.js change
2017-02-28 13:48:34,401 INFO 25088 [master] reload workers...
2017-02-28 13:48:34,426 INFO 25088 [master] App Worker#2:8372 start, state: none, current workers: ["1","2"]
2017-02-28 13:48:35,182 WARN 25088 [master] App Worker#2:8372 started at 7001, remain 0 (248334ms)
2017-02-28 13:48:35,186 INFO 25088 [master] App Worker#1:9832 disconnect, suicide: true, state: disconnected, current workers: ["2"]
[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)] [cfork:master:25088] worker:9832 disconnect (suicide: true, state: disconnected, isDead: false)
[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)] [cfork:master:25088] don't fork new work (refork: false)
(node:25088) DeprecationWarning: worker.suicide is deprecated. Please use worker.exitedAfterDisconnect.
2017-02-28 13:48:35,191 ERROR 10188 nodejs.unhandledExceptionError: read ECONNRESET
    at exports._errnoException (util.js:1028:11)
    at TCP.onread (net.js:572:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'unhandledExceptionError'
pid: 10188
hostname: USHIO


[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)][pid: 10188][Leader] Error: read ECONNRESET 
Error Stack:
  Error: read ECONNRESET
    at exports._errnoException (util.js:1028:11)
    at TCP.onread (net.js:572:26)
Error Additions:
  code: "ECONNRESET"
  errno: "ECONNRESET"
  syscall: "read"

2017-02-28 13:48:35,193 ERROR 10188 nodejs.ECONNRESETError: read ECONNRESET
    at exports._errnoException (util.js:1028:11)
    at TCP.onread (net.js:572:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 10188
hostname: USHIO

[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)] [cfork:master:25088] worker:9832 exit (code: null, suicide: true, state: dead, isDead: true, isExpected: true)

image

系统:window 10
node: v7.6.0
代码:egg-init simple
还需要哪些信息

应该与webstorm无关,在命令行下通过sublime之类的修改依然会触发。

@atian25
Copy link
Member

atian25 commented Feb 28, 2017

@shaoshuai0102 从巴厘岛回来后再看看 ^_^

@shaoshuai0102
Copy link
Contributor

不好意思 休假几天。

有使用什么插件么?我这里没有成功复现。

@shaoshuai0102
Copy link
Contributor

只有 windows 下有?

@AnzerWall
Copy link
Contributor

@shaoshuai0102 在linux上,我还没投入使用。现在只有在windows上遇到,没有其他插件,用的egg-init生成的项目,没有做改动。

@shaoshuai0102
Copy link
Contributor

win 下没成功复现 谁那里有 windows 机器也帮忙试着复现下吧

@spiroo
Copy link

spiroo commented Mar 8, 2017

我也出现这个问题了,就用egg-init初始化了一个项目,只添加了nunjucks模板,还没做任何修改
image

@dead-horse
Copy link
Member

是否会与 cluster-client 有关系?

@atian25
Copy link
Member

atian25 commented Mar 8, 2017

Leader 这个应该就是 cluster-client 的吧

@popomore
Copy link
Member

popomore commented Mar 8, 2017

主要是 watcher 用了 cluster-client

@shaoshuai0102
Copy link
Contributor

我跟 找找windows机器去

@shaoshuai0102
Copy link
Contributor

我找了台 win 实体机稳定重现。

development 插件在文件改动时,先 fork 一个新 worker 进程然后 kill 原来的进程,kill 的方式是发送 sigterm 信号。按这个方式来看,应该必定会出现 leader read 报错的。我现在的疑问反而是为啥 mac 下没有。。。

现在想的思路是在 worker 里捕获 process.on('SIGTERM', handler) 在 handler 里主动 close。有没有其他想法 @gxcsoccer

@atian25
Copy link
Member

atian25 commented Mar 9, 2017

SIGTERM is not supported on Windows, it can be listened on.

https://nodejs.org/api/process.html#process_signal_events

@shaoshuai0102
Copy link
Contributor

虽然说windows不支持,但确实没有做额外处理的情况下,进程还是会被成功杀的。。

@atian25
Copy link
Member

atian25 commented Mar 9, 2017

这句话没看懂,不支持,然后又 it can be listened on

@shaoshuai0102
Copy link
Contributor

https://github.com/eggjs/egg-cluster/blob/master/lib/app_worker.js#L65 app worker 做了处理 捕获 sigterm 后主动退出了。这样就解决了刚才的疑问,为什么 win 不支持还可以退出,因为是主动退出的。

那么我在这里试着主动 close follwer 连接试试

@shaoshuai0102
Copy link
Contributor

这篇值得一读 http://joseoncode.com/2014/07/21/graceful-shutdown-in-node-dot-js/

@jtyjty99999
Copy link
Member

赞,我来翻译下 @shaoshuai0102

@jtyjty99999
Copy link
Member

Nodejs中的优雅退出

标签(空格分隔): 翻译


UNIX信号 所说

信号是一种用于Unix,类Unix和其他posix兼容的操作系统上的一种有限的进程间通信方式。一个信号是一个异步的通知,他会发送到一个进程,或进程内的一个特定的线程来通知某个事件发生了。

信号有很多种,本文会集中讨论两个信号:

  • SIGTERM 用于终止程序. 它可以友好的要求一个程序终止。程序既可以响应这个信号,清理资源并退出,又可以忽略这个信号。

  • SIGKILL 用于立刻终止程序. 与 SIGTERM 信号不同,进程不能响应或忽略这个信号。

不管你在什么环境部署你的 nodejs 应用,你都需要让你的应用响应这两个信号:

Upstart: 当停止一个服务时,默认情况会发送 SIGTERM 信号,等待5秒后如果进程还在运行,会发送 SIGKILL 信号.
supervisord: 当停止一个服务时,默认情况会发送 SIGTERM 信号,等待10秒后如果进程还在运行,会发送 SIGKILL 信号.
runit: 当停止一个服务时,默认情况会发送 SIGTERM 信号,等待10秒后如果进程还在运行,会发送 SIGKILL 信号.
Heroku dynos shutdown: 在参考链接中也描述了,当停止一个服务时,默认情况会发送 SIGTERM 信号,等待10秒后如果进程还在运行,会发送 SIGKILL 信号.
Docker: 当你在docker容器中运行node应用时,如果你运行 docker stop 命令,容器中的主进程会收到 SIGTERM 信号, 一段时间之后(默认10秒), 收到SIGKILL 信号.

下面我们尝试一个简单的nodejs应用

var http = require('http');

var server = http.createServer(function (req, res) {
  setTimeout(function () { //simulate a long request
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  }, 4000);
}).listen(9090, function (err) {
  console.log('listening http://localhost:9090/');
  console.log('pid is ' + process.pid)
});

你可以看到响应会延迟4秒。 nodejs文档是这么说的:

SIGTERM 和 SIGINT 在 非-Windows 平台上有默认的响应方式,会首先以 128 + 信号的退出代码来退出,之后重置命令行.
如果某个信号在程序中被监听,则默认的处理行为会被覆盖 (node 不会退出).

这里我也不清楚默认行为是什么, 我在处理请求的过程中发送一个 SIGTERM 信号,则这个请求会挂掉,如下图所示:

» curl http://localhost:9090 &
» kill 23703
[2] 23832
curl: (52) Empty reply from server

幸运的是,http 模块带有一个 close 方法 ,可以在处理完所有请求后停止响应新的连接,并触发回调函数。这个方法来自 NET 模块, 因此对于所有类型的tcp连接你都可以很方便的使用这个方法。

现在我修改一下例子

var http = require('http');

var server = http.createServer(function (req, res) {
  setTimeout(function () { //simulate a long request
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  }, 4000);
}).listen(9090, function (err) {
  console.log('listening http://localhost:9090/');
  console.log('pid is ' + process.pid);
});

process.on('SIGTERM', function () {
  server.close(function () {
    process.exit(0);
  });
});

之后键入下面的命令:

» curl http://localhost:9090 &
» kill 23703
Hello World
[1]  + 24730 done       curl http://localhost:9090

你会注意到,进程在处理完最后一个请求后才推出。更有意思的是,收到 SIGTERM 信号之后,进程不会在继续处理请求了:

» curl http://localhost:9090 &
[1] 25072

» kill 25070

» curl http://localhost:9090 &
[2] 25097

curl: (7) Failed connect to localhost:9090; Connection refused
[2]  + 25097 exit 7     curl http://localhost:9090

» Hello World
[1]  + 25072 done       curl http://localhost:9090

某些博客和stackoverflow上的例子中,当server.close比预期花费更长时间时,会使用一个timeout来响应 SIGTERM 信号。上面提到的这些表明这个做法是没有必要的,每个进程管理器都会在SIGTERM 信号响应太长时间后发送SIGKILL信号。

@ghost
Copy link

ghost commented Mar 24, 2017

我在win7环境下,vscode编辑器,没有用egg-init 自己手动搭建的环境。
请问最后解决方案是什么?

@atian25
Copy link
Member

atian25 commented Mar 24, 2017

@shaoshuai0102

@shaoshuai0102
Copy link
Contributor

还在解,优先级不是很高,不会影响实际线上环境运行。再等等哈

@joyeecheung
Copy link

另外我也看不懂

SIGTERM is not supported on Windows, it can be listened on.

是什么个意思……手头有 windows 机器的同学监听 process.on('SIGTERM', ...) 有反应咩?大概文档需要改下措辞……

@gxcsoccer
Copy link
Contributor

@shaoshuai0102 分享下?

@shaoshuai0102
Copy link
Contributor

#654

@okoala
Copy link
Member

okoala commented Mar 28, 2017

所以更新计划是?

@shaoshuai0102
Copy link
Contributor

有点忙 这几天搞定

@a526672351
Copy link

这个问题还没解决吗?同样的错误
2017-04-01 12:27:35,130 ERROR 10604 nodejs.ECONNRESETError: read ECONNRESET
at exports._errnoException (util.js:1022:11)
at TCP.onread (net.js:569:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 10604

@sasbsd
Copy link

sasbsd commented Apr 11, 2017

关注中

@jeffincn
Copy link

@shaoshuai0102 我的问题也是如此 windows 7环境, egg版本是1.1.0,启动了1个worker,app目录下没有任何资源或前端文件,但依旧有这个问题,关注中...

2017-04-17 11:20:15,951 INFO 64116 [master] Agent Worker started (1596ms)
2017-04-17 11:20:15,963 INFO 64116 [master] App Worker#1:58844 start, state: none, current workers: ["1"]
2017-04-17 11:20:17,968 WARN 64116 [master] App Worker#1:58844 started at 7011, remain 0 (2015ms)
2017-04-17 11:20:17,969 INFO 64116 [master] egg started on http://127.0.0.1:7011 (3618ms)
2017-04-17 11:20:48,754 WARN 57900 [agent:development] reload worker because D:\Workspace\playground\egg-packer\showcase\config\config.default.js change
[agent:development] reload worker because D:\Workspace\playground\egg-packer\showcase\config\config.default.js change
2017-04-17 11:20:48,796 INFO 64116 [master] reload workers...
2017-04-17 11:20:49,003 INFO 64116 [master] App Worker#2:64160 start, state: none, current workers: ["1","2"]
2017-04-17 11:20:53,139 WARN 64116 [master] App Worker#2:64160 started at 7011, remain 0 (37186ms)
[Mon Apr 17 2017 11:20:53 GMT+0800 (中国标准时间)] [cfork:master:64116] worker:58844 disconnect (suicide: true, state: disco
nnected, isDead: false)
[Mon Apr 17 2017 11:20:53 GMT+0800 (中国标准时间)] [cfork:master:64116] don't fork new work (refork: false)
2017-04-17 11:20:53,144 INFO 64116 [master] App Worker#1:58844 disconnect, suicide: true, state: disconnected, current workers: ["2"]
2017-04-17 11:20:53,151 ERROR 57900 nodejs.unhandledExceptionError: read ECONNRESET
at exports._errnoException (util.js:953:11)
at TCP.onread (net.js:563:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
[Mon Apr 17 2017 11:20:53 GMT+0800 (中国标准时间)] [cfork:master:64116] worker:58844 exit (code: null, suicide: true, state:
dead,syscall: 'read'
isDead: true, isExpected: true)
name: 'unhandledExceptionError'
pid: 57900
hostname: USER-PL8SMQJLQL

2017-04-17 11:20:53,155 ERROR 57900 nodejs.ECONNRESETError: read ECONNRESET
at exports._errnoException (util.js:953:11)
at TCP.onread (net.js:563:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 57900
hostname: USER-PL8SMQJLQL

@shaoshuai0102
Copy link
Contributor

  • windows 下 sigterm 不能绑定事件,需要换 worker.send('kill-worker') 然后监听事件以便主动 close app,
  • app.close 出发 worker 中的 follower socket close 之后 进行 process.exit(0) 的时候 agent 还是会报ECONNRESET 打印出来在 agent 中已经监听到了 close 事件,在 worker 退出的瞬间还是报错了 帮忙也看看 搞不定 @gxcsoccer @fengmk2

@fengsizi
Copy link

同错:
2017-04-24 10:32:38,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 5488
hostname: PC-20170326ZBVG

@wyqhywjj601
Copy link

2017-04-24 11:12:24,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 5488
hostname: PC-20170326ZBVG

@Abby7826
Copy link

我也遇到了:
2017-04-24 11:02:30,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 5490
hostname: PC-201703288ZBVG

@QuinHan
Copy link

QuinHan commented Apr 24, 2017

我也遇到了:
2017-04-24 11:10:20,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 5471
hostname: PC-20170324ZBVG

@fengsizi
Copy link

@fengmk2
Copy link
Member

fengmk2 commented Apr 24, 2017

已经在看了,请大家再耐心等待。

@1166544
Copy link

1166544 commented Apr 24, 2017

26 days ago@shaoshuai0102说这几天搞定。。。

@shaoshuai0102
Copy link
Contributor

抱歉这些天忙其他事情,上周刚有时间捡起来但没搞定,我们其他同学会投入进来一起看一下。

@fengmk2
Copy link
Member

fengmk2 commented Apr 25, 2017

已经在 node-modules/cluster-client#25 修复,等待合并发布,然后大家重新安装 node_modules 即可。

fengmk2 added a commit to node-modules/cluster-client that referenced this issue Apr 25, 2017
gxcsoccer pushed a commit to node-modules/cluster-client that referenced this issue Apr 25, 2017
@fengmk2
Copy link
Member

fengmk2 commented Apr 25, 2017

image

@shaoshuai0102 可以改一下 windows send message 来退出了。

@shaoshuai0102
Copy link
Contributor

好的

@350469960
Copy link

报错内容!希望尽快解决!谢谢!
[Wed May 03 2017 09:53:00 GMT+0800 (中国标准时间)] [cfork:master:6236] worker:6248 exit (code: null, suicide: true, state: dead, isDead: true, isExpected: true
)
2017-05-03 09:53:00,999 ERROR 6664 nodejs.ECONNRESETError: read ECONNRESET
at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 6664
hostname: PC-20170326ZBVG

@shaoshuai0102
Copy link
Contributor

删除 node_modules,重新 install 了么?应该只有一个 warning 了。

@arliang
Copy link

arliang commented Nov 11, 2017

我也遇到这个问题了,在Mac下,运行NODE Websocket,只有一个同事用android端访问才会出现这个问题,我使用手机的浏览器websocket访问没问题

最后重启电脑,问题消失

好像没什么帮助,我只好(逃

@wenlittleoil
Copy link

wenlittleoil commented Aug 17, 2020

同样的问题,本地开发模式下,在macbook pro机器上稳定重现,无限提示reload worker。

2020-08-17 14:02:28,335 WARN 10211 [agent:development] reload worker because ${baseDir}/app.js change
(node:10413) DeprecationWarning: current Server Discovery and Monitoring engine is deprecated, and will be removed in a future version. To use the new Server Discover and Monitoring engine, pass option { useUnifiedTopology: true } to the MongoClient constructor.
(node:10413) DeprecationWarning: collection.ensureIndex is deprecated. Use createIndexes instead.
[2020-08-17 14:02:29.877] [cfork:master:10206] worker:10404 disconnect (exitedAfterDisconnect: true, state: disconnected, isDead: false, worker.disableRefork: false)
[2020-08-17 14:02:29.878] [cfork:master:10206] don't fork new work (refork: false)
2020-08-17 14:02:29,878 INFO 10206 [master] app_worker#21:10404 disconnect, suicide: true, state: disconnected, current workers: ["22"]
[2020-08-17 14:02:29.895] [cfork:master:10206] worker:10404 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: true, worker.disableRefork: false)

实际上我的app.js并没有了修改什么。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.