Categraf occasionally hangs or stops responding for a long time while collecting MySQL metrics #646

Closed
Jerry-fmcheng opened this issue Sep 9, 2023 · 4 comments

Comments

@Jerry-fmcheng

Relevant config.toml

Key settings from input.mysql/mysql.toml (IP addresses and similar details are replaced with xx):
# collect interval
interval = 300

[[instances]]
 address = "xx:xx:xx:xx:3306"
 username = "xxxx"
 password = "xxxx"

# set tls=custom to enable tls
 parameters = "tls=false"

 extra_status_metrics = true
 extra_innodb_metrics = false
 gather_processlist_processes_by_state = false
 gather_processlist_processes_by_user = false
 gather_schema_size = true
 gather_table_size = true
 gather_system_table_size = false
 gather_slave_status = true

# timeout
 timeout_seconds = 60

# interval = global.interval * interval_times
 interval_times = 1

# Optional TLS Config
 use_tls = false
# Use TLS but skip chain & host verification
 insecure_skip_verify = true

Logs from categraf

2023/09/08 18:06:48 schema_size.go:18: E! failed to get schema size: invalid connection
2023/09/08 18:06:48 table_size.go:26: E! failed to get table size: dial tcp xx:xx:xx:xx:3306: connect: connection refused
2023/09/08 18:06:48 slave_status.go:40: E! failed to query slave status: dial tcp xx:xx:xx:xx:3306: connect: connection refused
2023/09/08 18:06:48 metrics_reader.go:60: D! local.mysql : after gather once, duration: 56m55.887596058s

System info

categraf-0.3.20 linux

Docker

NA

Steps to reproduce

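1. Configure the mysql input as usual to collect metrics.
2. Most collections finish in about 1s, but occasionally one hangs for a long time. In the log above, a single collection took almost 57 minutes (56m55s).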

Expected behavior

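Metric collection should be stable. If a connection fails or times out, the collection should end within the configured time and not affect subsequent collections.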

Actual behavior

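In practice, a collection can apparently run for a long time without timing out. After tracing the code, my initial suspicion is that this is related to ReadTimeout in go-sql-driver/mysql not being set. I am still verifying this.
1. The Config struct in go-sql-driver/mysql/dsn.go defines three timeout-related fields: Timeout, ReadTimeout, and WriteTimeout, which are the connect timeout, the read timeout, and the write timeout respectively.
2. If only Timeout is set, could a read or write that goes wrong after the connection is established end up waiting for a long time?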
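The question in point 2 can be illustrated without MySQL at all. The following is a minimal sketch using only the Go standard library, under the assumption that the driver's read timeout ultimately behaves like a read deadline on the TCP connection. `startSilentServer`, `readOnce`, and `runDemo` are hypothetical names for this illustration, not part of Categraf or go-sql-driver/mysql.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// startSilentServer listens on a free local port, accepts connections, and
// never writes to them, imitating a peer that has stopped responding.
func startSilentServer() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		var held []net.Conn
		defer func() {
			for _, c := range held {
				c.Close()
			}
		}()
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			held = append(held, c)
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

// readOnce dials with a connect timeout and then waits for one byte.
// With readTimeout == 0 no deadline is set, so the read would block for as
// long as the peer stays silent; the connect timeout no longer applies.
func readOnce(addr string, dialTimeout, readTimeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return err
	}
	defer conn.Close()
	if readTimeout > 0 {
		if err := conn.SetReadDeadline(time.Now().Add(readTimeout)); err != nil {
			return err
		}
	}
	_, err = conn.Read(make([]byte, 1))
	return err
}

// runDemo reports whether a read against the silent server was ended by the
// read deadline.
func runDemo() (timedOut bool, err error) {
	addr, stop, err := startSilentServer()
	if err != nil {
		return false, err
	}
	defer stop()
	readErr := readOnce(addr, time.Second, 200*time.Millisecond)
	return errors.Is(readErr, os.ErrDeadlineExceeded), nil
}

func main() {
	timedOut, err := runDemo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("read ended by deadline:", timedOut)
}
```

Calling `readOnce` with a zero `readTimeout` against the same server would not return until the server closes the connection, which matches the kind of long wait described above.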

Here is the code from Init() in categraf inputs/mysql/mysql.go:
	ins.dsn = fmt.Sprintf("%s:%s@%s(%s)/?%s", ins.Username, ins.Password, net, ins.Address, ins.Parameters)
	conf, err := mysql.ParseDSN(ins.dsn)
	if err != nil {
		return err
	}
	if conf.Timeout == 0 {
		if ins.TimeoutSeconds == 0 {
			ins.TimeoutSeconds = 3
		}
		conf.Timeout = time.Second * time.Duration(ins.TimeoutSeconds)
	}
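One possible direction, sketched here only as an assumption and not as the actual Categraf fix, is to apply the same default to the read and write timeouts whenever they are unset. To keep the example self-contained, `driverConfig` is a hypothetical stand-in for the three timeout fields of the driver's Config named above, and `applyTimeouts` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// driverConfig is a hypothetical stand-in for the three timeout fields of
// Config in go-sql-driver/mysql that the issue mentions.
type driverConfig struct {
	Timeout      time.Duration // connect timeout
	ReadTimeout  time.Duration // read timeout
	WriteTimeout time.Duration // write timeout
}

// applyTimeouts fills every unset timeout from timeoutSeconds, falling back
// to 3 seconds as the quoted Init() code does. Values that are already set,
// for example through DSN parameters, are left unchanged.
func applyTimeouts(conf *driverConfig, timeoutSeconds int) {
	if timeoutSeconds <= 0 {
		timeoutSeconds = 3
	}
	d := time.Duration(timeoutSeconds) * time.Second
	if conf.Timeout == 0 {
		conf.Timeout = d
	}
	if conf.ReadTimeout == 0 {
		conf.ReadTimeout = d
	}
	if conf.WriteTimeout == 0 {
		conf.WriteTimeout = d
	}
}

func main() {
	// ReadTimeout is preset, as if it came from the DSN parameters.
	conf := driverConfig{ReadTimeout: 10 * time.Second}
	applyTimeouts(&conf, 60)
	fmt.Println(conf.Timeout, conf.ReadTimeout, conf.WriteTimeout)
}
```

Leaving preset values untouched keeps explicit DSN parameters authoritative, in the same way the quoted code only sets Timeout when it is zero.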

Additional info

No response

@kongfei605
Collaborator

kongfei605 commented Sep 11, 2023

Set the parameters again as follows:

parameters = "tls=false&readTimeout=60s&writeTimeout=60s"
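To make the parameter string concrete, here is a small sketch of how such values can be read. It assumes that readTimeout and writeTimeout are Go-style duration strings with a unit suffix such as "60s"; it does not call the driver itself, and `timeoutsFromParams` is a hypothetical helper for illustration only.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// timeoutsFromParams extracts readTimeout and writeTimeout from a DSN
// parameter string. A missing parameter yields a zero duration, meaning no
// timeout was requested.
func timeoutsFromParams(params string) (read, write time.Duration, err error) {
	values, err := url.ParseQuery(params)
	if err != nil {
		return 0, 0, err
	}
	if v := values.Get("readTimeout"); v != "" {
		if read, err = time.ParseDuration(v); err != nil {
			return 0, 0, fmt.Errorf("readTimeout: %w", err)
		}
	}
	if v := values.Get("writeTimeout"); v != "" {
		if write, err = time.ParseDuration(v); err != nil {
			return 0, 0, fmt.Errorf("writeTimeout: %w", err)
		}
	}
	return read, write, nil
}

func main() {
	read, write, err := timeoutsFromParams("tls=false&readTimeout=60s&writeTimeout=60s")
	if err != nil {
		fmt.Println("bad parameters:", err)
		return
	}
	fmt.Println("read:", read, "write:", write)
}
```

Under this assumption a bare number such as `readTimeout=60` is rejected, because the unit suffix is required.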

@Jerry-fmcheng
Author

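A suggestion for later: could the logging be improved so that error logs record the address or other identifying information of the collection target, making it easy to trace which instance failed?
For example, the log call in mysql/schema_size.go: log.Println("E! failed to get schema size:", err)
could become: log.Println("E! failed to get:%s, schema size:", ins.Address, err)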
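One detail about the proposed line: log.Println does not interpret format verbs, so the %s would be printed literally and the address appended after the message. A minimal sketch of the same idea with a format string follows. `gatherErrorMessage`, `errSample`, and the sample address are hypothetical and only illustrate the suggestion. They are not the change that was made in Categraf.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errSample stands in for an error returned by the driver.
var errSample = errors.New("invalid connection")

// gatherErrorMessage builds an error line that identifies the instance by
// its address, so a failed collection can be traced back to its target.
func gatherErrorMessage(address string, err error) string {
	return fmt.Sprintf("E! failed to get schema size, address: %s, error: %v", address, err)
}

func main() {
	// The address is a placeholder; real code would pass ins.Address.
	log.Println(gatherErrorMessage("10.0.0.1:3306", errSample))
}
```

Calling log.Printf directly with the same format string would work equally well. The helper only makes the message easy to check in isolation.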

@kongfei605
Collaborator

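A suggestion for later: could the logging be improved so that error logs record the address or other identifying information of the collection target, making it easy to trace which instance failed? For example, the log call in mysql/schema_size.go: log.Println("E! failed to get schema size:", err) could become: log.Println("E! failed to get:%s, schema size:", ins.Address, err)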

Set the parameters again as follows:

parameters = "tls=false&readTimeout=60s&writeTimeout=60s"

This change took effect, right? The logging suggestion is a good one. We will add it later.

@kongfei605
Collaborator

Log improvement: #657
