Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pool is busy,can not get new client in 5 seconds,wait count is 1 #59

Open
zut opened this issue Jun 22, 2020 · 18 comments
Open

pool is busy,can not get new client in 5 seconds,wait count is 1 #59

zut opened this issue Jun 22, 2020 · 18 comments

Comments

@zut
Copy link
Contributor

zut commented Jun 22, 2020

错误: Trace:pool is busy,can not get new client in 5 seconds,wait count is 1

我应该如何来解决这个问题?

谢谢

@seefan
Copy link
Owner

seefan commented Jun 22, 2020

暂时没有发现你说的情况,你可以在显示错误信息时,调用goerr.Error(err).Trace()来显示更多的内容,我们可以根据仔细的信息来分析。

@zut
Copy link
Contributor Author

zut commented Jun 23, 2020

暂时没有发现你说的情况,你可以在显示错误信息时,调用goerr.Error(err).Trace()来显示更多的内容,我们可以根据仔细的信息来分析。

详细错误是: Trace:pool is busy,can not get new client in 5 seconds,wait count is 1

本地测试, 基本没有并发.配置也是默认的.

c := &conf.Config{
		Host:      host,
		Port:      port,
		AutoClose: true,
	}

client info信息我打印出一些. 请看最后的回复

@seefan
Copy link
Owner

seefan commented Jun 23, 2020

把你的代码发我一份吧,我测试一下

@zut
Copy link
Contributor Author

zut commented Jun 23, 2020

把你的代码发我一份吧,我测试一下

在一个项目中用了. 无法单独出来发.

@zut
Copy link
Contributor Author

zut commented Jun 23, 2020

为了方便测试,我timeout设置成2s
最后的回复,有 client info 更详细的信息.

image
是不是应为 AutoClose 导致的.你看这张图, cc.index 18 的时候没有自动关闭.

这时候响应时间变成了2s.

这个请求之后的 3分钟都没有问题.


3分钟后继续出现
image

再过1分钟,报错

image

如果可以,只指导排查方向,谢谢

@zut
Copy link
Contributor Author

zut commented Jun 23, 2020

有更详细的信息了:

特征是:

  1. 时间变长,
  2. 再过一会,就会错误. Trace:pool is busy,can not get new client in 2 seconds,wait count is 1

image
image

GetClient true 19 available is 8,active is 60,wait is 0,connection count is 60,status is 1
Close 1 true 19 available is 8,active is 60,wait is 0,connection count is 60,status is 1
Close 2 false 19 available is 8,active is 59,wait is 0,connection count is 60,status is 1
GetClient true 19 available is 9,active is 60,wait is 0,connection count is 60,status is 1
2020-06-23 16:15:42.866 [WARN] [elapsed > 100ms] 200 /api/meas/dashboard 2.002142606s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:15:42.867 [DEBU] 200 GET /api/meas/dashboard                       < 1.middleware.Log mwLog.go:17 
GetClient true 19 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 19 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 19 available is 1,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:15:49.364 [DEBU] 200 OPTIONS /api/meas/dashboard                   < 1.middleware.Log mwLog.go:17 
GetClient true 19 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 19 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 19 available is 1,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:15:51.418 [WARN] [elapsed > 100ms] 401 /api/meas/dashboard 2.000636574s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:15:51.418 [WARN] 401 GET /api/meas/dashboard                       < 1.middleware.Log mwLog.go:15 
GetClient true 0 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 0 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 0 available is 1,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:15:56.840 [DEBU] 200 OPTIONS /api/meas/dashboard                   < 1.middleware.Log mwLog.go:17 
GetClient true 0 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 0 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 0 available is 2,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:15:58.865 [WARN] [elapsed > 100ms] 401 /api/meas/dashboard 2.001977589s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:15:58.865 [WARN] 401 GET /api/meas/dashboard                       < 1.middleware.Log mwLog.go:15 
GetClient true 1 available is 1,active is 61,wait is 0,connection count is 80,status is 1
Close 1 true 1 available is 1,active is 61,wait is 0,connection count is 80,status is 1
Close 2 false 1 available is 1,active is 60,wait is 0,connection count is 80,status is 1
2020-06-23 16:16:04.832 [DEBU] 200 OPTIONS /api/meas/dashboard                   < 1.middleware.Log mwLog.go:17 
GetClient true 2 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 2 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 2 available is 1,active is 60,wait is 0,connection count is 100,status is 1
GetClient true 1 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 1 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 1 available is 2,active is 60,wait is 0,connection count is 100,status is 1
GetClient true 3 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 3 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 3 available is 1,active is 60,wait is 0,connection count is 100,status is 1
GetClient true 2 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 2 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 2 available is 2,active is 60,wait is 0,connection count is 100,status is 1
GetClient true 4 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 4 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 4 available is 1,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:16:10.863 [WARN] [elapsed > 100ms] 200 /api/meas/dashboard 4.007022137s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:16:10.863 [DEBU] 200 GET /api/meas/dashboard                       < 1.middleware.Log mwLog.go:17 
GetClient true 3 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 3 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 3 available is 2,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:16:12.899 [DEBU] T2 elapsed =  2.00168s                            < 1.db.HGetTo hash.go:51  2.s.GetPlotData sMeas.go:157  3.handler.(*Handler).GetImg aImg.go:16 
2020-06-23 16:16:12.900 [ERRO] Text: Hget PlotData/100001_101__1_S2401_EIRP_20_LCH_1_1M__NTNV_SISO_2_0_0 error   Trace:pool is busy,can not get new client in 2 seconds,wait count is 1 < 1.db.HGetTo hash.go:53  2.s.GetPlotData sMeas.go:157  3.handler.(*Handler).GetImg aImg.go:16 
2020-06-23 16:16:12.901 [WARN] Hget PlotData/100001_101__1_S2401_EIRP_20_LCH_1_1M__NTNV_SISO_2_0_0 error < 1.handler.(*Handler).GetImg aImg.go:17  2.middleware.Auth mwAuth.go:24  3.middleware.ProcessTime middleware.go:11 
2020-06-23 16:16:12.902 [WARN] [elapsed > 100ms] 400 /api/img/100001_101__1_S2401_EIRP_20_LCH_1_1M__NTNV_SISO_2_0_0 2.004009257s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:16:12.902 [WARN] 400 GET /api/img/100001_101__1_S2401_EIRP_20_LCH_1_1M__NTNV_SISO_2_0_0 < 1.middleware.Log mwLog.go:15 
GetClient true 5 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 5 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 5 available is 1,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:16:18.108 [DEBU] 200 OPTIONS /api/meas/dashboard                   < 1.middleware.Log mwLog.go:17 
GetClient true 4 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 4 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 4 available is 2,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:16:20.183 [WARN] [elapsed > 100ms] 401 /api/meas/dashboard 2.003259654s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:16:20.183 [WARN] 401 GET /api/meas/dashboard                       < 1.middleware.Log mwLog.go:15 
GetClient true 6 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 6 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 6 available is 1,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:16:25.514 [DEBU] 200 OPTIONS /api/meas/dashboard                   < 1.middleware.Log mwLog.go:17 
GetClient true 5 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 5 available is 2,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 5 available is 2,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:16:27.558 [WARN] [elapsed > 100ms] 401 /api/meas/dashboard 2.001992644s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:16:27.558 [WARN] 401 GET /api/meas/dashboard                       < 1.middleware.Log mwLog.go:15 
GetClient true 7 available is 1,active is 61,wait is 0,connection count is 80,status is 1
Close 1 true 7 available is 1,active is 61,wait is 0,connection count is 80,status is 1
Close 2 false 7 available is 1,active is 60,wait is 0,connection count is 80,status is 1
2020-06-23 16:16:32.844 [DEBU] 200 OPTIONS /api/meas/dashboard                   < 1.middleware.Log mwLog.go:17 
GetClient true 6 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 6 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 6 available is 1,active is 60,wait is 0,connection count is 100,status is 1
2020-06-23 16:16:36.894 [DEBU] T2 elapsed =  2.000896s                           < 1.db.HGetTo hash.go:51  2.s.GetMeas sMeas.go:18  3.handler.(*Handler).GetMeasDashboard aMeas.go:48 
2020-06-23 16:16:36.895 [ERRO] Text: Hget Meas/100001 error      Trace:pool is busy,can not get new client in 2 seconds,wait count is 1 < 1.db.HGetTo hash.go:53  2.s.GetMeas sMeas.go:18  3.handler.(*Handler).GetMeasDashboard aMeas.go:48 
2020-06-23 16:16:36.896 [WARN] Hget Meas/100001 error                            < 1.handler.(*Handler).GetMeasDashboard aMeas.go:49  2.middleware.Auth mwAuth.go:24  3.middleware.ProcessTime middleware.go:11 
2020-06-23 16:16:36.896 [WARN] [elapsed > 100ms] 200 /api/meas/dashboard 2.003202975s < 1.middleware.ProcessTime middleware.go:14  2.middleware.CORS mwCors.go:7  3.middleware.Limit mwLimit.go:17 
2020-06-23 16:16:36.896 [DEBU] 200 GET /api/meas/dashboard                       < 1.middleware.Log mwLog.go:17 
GetClient true 8 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 1 true 8 available is 1,active is 61,wait is 0,connection count is 100,status is 1
Close 2 false 8 available is 1,active is 60,wait is 0,connection count is 100,status is 1

然后
我用简单代码测试发现, active 和 connection count 都不会上涨
image

@zut zut changed the title [不定时出现]Hget User/100001 error,cause is [不定时出现] pool is busy,can not get new client in 5 seconds,wait count is 1 Jun 23, 2020
@zut zut changed the title [不定时出现] pool is busy,can not get new client in 5 seconds,wait count is 1 pool is busy,can not get new client in 5 seconds,wait count is 1 Jun 23, 2020
@seefan
Copy link
Owner

seefan commented Jun 23, 2020

我刚测试了一个小时,10个纯程在跑,并没有出现你说的问题。
我想确认以下内容,
1、你是如何获取连接的,是用NewClient还是GetClient
2、是否持有了获取的连接。

@zut
Copy link
Contributor Author

zut commented Jun 23, 2020

@seefan 用了GetClient (是调用gossdb.Client())

请看代码:

func HSet(name, key, value interface{}) error {
	name2, err := cc(name)
	if err != nil {
		return err
	}
	key2, err := cc(key)
	if err != nil {
		return err
	}
	b, err := xx.Pack(value)
	if err != nil {
		return err
	}
	return gossdb.Client().HSet(name2, key2, b)
}


func HGetTo(name, key, value interface{}) error {
	name2, err := cc(name)
	if err != nil {
		return err
	}
	key2, err := cc(key)
	if err != nil {
		return err
	}
	xx.IsPointer(value)
	vPack, err := gossdb.Client().HGet(name2, key2)
	if err != nil {
		xlog.Error(goerr.Error(err).Trace())
		return err
	} else if len(vPack.Bytes()) == 0 {
		return ErrNotFound
	}
	return xx.UnpackTo(vPack.Bytes(), &value)
}
  1. 用单个测试没有这样的问题,
  2. 我是 GoFrame 的ghttp 搭建的 API服务, 你看我打印的日志, active 和 connection count 都不断上涨.

@seefan
Copy link
Owner

seefan commented Jun 24, 2020

1、我看每次出错附近都这类似这样[WARN] [elapsed > 100ms]的提示,能大概描述一下这个错误是什么吗。
2、从日志中可以看到,每秒并发不高,只有1到2个,但有大量的active连接,这个说明连接是直在用,且没有被回收,是有这样的业务逻辑么。还是说是因为连接无法回收导致的active居高不下。
3、每一次获取范围保存的kv的值有多大,是否达到几M

Trace:pool is busy,can not get new client in 5 seconds,wait count is 1
这个提示是说连接池内没有可以使用的连接了,所以我们需要检查一下为什么连接在使用后无法回收。

@zut
Copy link
Contributor Author

zut commented Jun 24, 2020

1 这个错误是每个请求的处理时间,你可以忽略。

2没有长时间连接的业务。

3 大约几百KB?我可以具体打印出来看看。而且是slob/zstd 压缩过的。

@zut
Copy link
Contributor Author

zut commented Jun 24, 2020

请问你是如何判断回收的?
@seefan

@seefan
Copy link
Owner

seefan commented Jun 29, 2020

通过Avaliable来控制,它是一个队列,我会把连接的位置放到队列里管理

@zut
Copy link
Contributor Author

zut commented Jun 30, 2020

通过Avaliable来控制,它是一个队列,我会把连接的位置放到队列里管理

我看了一段时间log, 还没能解决.
目前已经关闭AutoClose, 还没有发现问题.

@seefan
Copy link
Owner

seefan commented Jun 30, 2020

看你说的情况,感觉是是autoclose在释放时有问题,我还在分析
另外,你关闭AutoClose后,用普通方式是否存在问题呢?

@zut
Copy link
Contributor Author

zut commented Jun 30, 2020

看你说的情况,感觉是是autoclose在释放时有问题,我还在分析
另外,你关闭AutoClose后,用普通方式是否存在问题呢?

使用2天情况来看, 没有出现了

@zut
Copy link
Contributor Author

zut commented Jul 25, 2020

@seefan 最近一次的更新, 我看到 closeClient的更新, 是不是回收方面原来有 BUG, (自动回收)

@seefan
Copy link
Owner

seefan commented Jul 27, 2020

只是调整回收后复用的规则,把原来的平均复用改成优先头部利用了,不影响回收。

自动回收方面你出现的问题还没有找到原因。

另外我看到你有Close1和Close2两处打印,能告诉我一下大概的位置么?

@zut
Copy link
Contributor Author

zut commented Jul 27, 2020

image

@seefan 最近我关闭AutoClose,再也没有出现过了问题了, 所以应该是 AutoClose的问题。

我目前的做法是, 做了一个操作(例如Hset),就关闭。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants