Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU占用异常 #71

Closed
ningnao opened this issue Nov 1, 2023 · 16 comments · Fixed by #72
Closed

CPU占用异常 #71

ningnao opened this issue Nov 1, 2023 · 16 comments · Fixed by #72
Labels
bug Something isn't working

Comments

@ningnao
Copy link
Contributor

ningnao commented Nov 1, 2023

XIU version
matser b2fe9af

Describe the bug
长时间使用后CPU占用率异常升高

配置文件

[rtmp]
enabled = true
port = 1935

[httpnotify]
enabled = true
on_publish = "http://localhost:8085/publish"
on_unpublish = "http://localhost:8085/unpublish"
on_play = "http://localhost:8085/play"
on_stop = "http://localhost:8085/stop"

[log]
level = "info"

在之前的沟通中#66,发现我的设备会在每个I帧发送一次metaData 和 sequence header,也是因此日志中大量打印了metadata相关的日志
此外,日志中还偶尔出现以下错误

[2023-10-31T06:45:04Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: net io error: time out error
----------------------------
[2023-10-31T06:45:15Z WARN  rtmp::handshake::handshake_server] complex handshake failed.. err:digest error: cannot generate digest
----------------------------
[2023-10-31T07:12:11Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: pack error: bytes writer error: not enough bytes to write: io error

这里有一些进程/线程的信息不知道对解决问题有没有一些帮助
image
image
image

@ningnao
Copy link
Contributor Author

ningnao commented Nov 1, 2023

查看了最新的日志,发现其中大量出现推流失败的日志如下

[2023-11-01T02:50:42Z ERROR streamhub] event_loop Publish err: exists

在大量推流错误的日志出现之前打印了如下的错误日志

[2023-11-01T02:50:32Z ERROR rtmp::messages::parser] the msg_type_id is not supported: 10
[2023-11-01T02:50:32Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: message error: unknown message type

服务端还未重启,可以继续提供解决问题必要的信息

@harlanc
Copy link
Owner

harlanc commented Nov 1, 2023

感谢反馈。

[2023-10-31T06:45:04Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: net io error: time out error

这个错误日志是前几天你修复的,读数据超时了。

[2023-10-31T06:45:15Z WARN  rtmp::handshake::handshake_server] complex handshake failed.. err:digest error: cannot generate digest

这个日志是RTMP握手日志,先尝试复杂握手,失败后会用简单握手,如果简单握手成功就没有问题。

[2023-10-31T07:12:11Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: pack error: bytes writer error: not enough bytes to write: io error

这个错误没有遇到过,应该是packetizer.rs里面打包的时候,写发送数据出错了,能复现吗?能够提供录制文件?

[2023-11-01T02:50:42Z ERROR streamhub] event_loop Publish err: exists

这个错误的意思是推流的时候发现服务端已经有这个流ID(app name + stream name)了。有两种可能:
- 一种是确实推流的时候用的流地址中的app name 和 stream name 被占用了;
- 一种是旧的流资源没有被正确释放,导致app name 和 stream name一直占用。

[2023-11-01T02:50:32Z ERROR rtmp::messages::parser] the msg_type_id is not supported: 10
[2023-11-01T02:50:32Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: message error: unknown message type

这个日志看着逻辑有问题,碰到不识别的msg type id 不应该返回错误,否则整个session就退出了,这个我改下。

看你三个xiu进程的CPU使用率都不太正常,流的转发不太应该占用这么多CPU,你的一个进程中在处理多少路流?之前碰到的CPU高的问题,大多数是读/写数据异常,碰到死循环了,你的最后一个线程栈的图左边是CPU使用率吗,看这个图可能是 receive_frame_data_loop 这个函数里面有问题,有可能有死循环。

关于性能问题,也可以看下项目的wiki中关于如何profile一个xiu 进程,生成 profile文件后可以分析一下。

@ningnao
Copy link
Contributor Author

ningnao commented Nov 1, 2023

这个错误没有遇到过,应该是packetizer.rs里面打包的时候,写发送数据出错了,能复现吗?能够提供录制文件?

这个出现的频率不高,不太容易复现.. 我会再想想办法

这个错误的意思是推流的时候发现服务端已经有这个流ID(app name + stream name)了。有两种可能:

这里应该是第二种可能,在出现了后面的 err: message error: unknown message type后,我观察日志发现设备开始尝试重连,但并没有看到之前的连接被关闭

你的一个进程中在处理多少路流?

我目前只在处理一路流,平时CPU占用非常低

关于性能问题,也可以看下项目的wiki中关于如何profile一个xiu 进程,生成 profile文件后可以分析一下。

刚才我想尝试生成 profile 时发现服务器 SSH 已经没反应了.... 图三中左侧是CPU占用率

@harlanc
Copy link
Owner

harlanc commented Nov 2, 2023

有可能CPU高的问题也是由session异常退出导致的,我提交了一次代码,你看下 https://github.com/harlanc/xiu/tree/fix_71

@ningnao
Copy link
Contributor Author

ningnao commented Nov 2, 2023

补充一个火焰图 perf

我去试一下修复后的版本

@harlanc harlanc linked a pull request Nov 4, 2023 that will close this issue
@harlanc harlanc added the bug Something isn't working label Nov 4, 2023
@ningnao
Copy link
Contributor Author

ningnao commented Nov 5, 2023

目前没再复现成功这个问题.. 但出现了新的问题
依旧是设备长时间推流

ffmpeg拉流的时候出现以下错误

DTS 4261412847, next:-46000 st:1 invalid dropping
PTS 4261412907, next:-46000 invalid dropping st:1
DTS 4261412928, next:-13000 st:1 invalid dropping
PTS 4261413009, next:-13000 invalid dropping st:1
[flv @ 0000027de782d040] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly

DTS 4261412973, next:20000 st:1 invalid dropping
PTS 4261413018, next:20000 invalid dropping st:1
DTS 4261413008, next:53000 st:1 invalid dropping
PTS 4261413043, next:53000 invalid dropping st:1
DTS 4261413089, next:86000 st:1 invalid dropping
PTS 4261413170, next:86000 invalid dropping st:1
DTS 4261413146, next:119000 st:1 invalid dropping
PTS 4261413203, next:119000 invalid dropping st:1
DTS 4261413227, next:152000 st:1 invalid dropping
PTS 4261413308, next:152000 invalid dropping st:1
DTS 4261413273, next:185000 st:1 invalid dropping
PTS 4261413319, next:185000 invalid dropping st:1
DTS 4261413371, next:218000 st:1 invalid dropping
PTS 4261413469, next:218000 invalid dropping st:1
DTS 4261413407, next:251000 st:1 invalid dropping
PTS 4261413443, next:251000 invalid dropping st:1
DTS 4261413540, next:284000 st:1 invalid dropping
PTS 4261413673, next:284000 invalid dropping st:1
DTS 4261413679, next:317000 st:1 invalid dropping
PTS 4261413818, next:317000 invalid dropping st:1
DTS 4261413807, next:350000 st:1 invalid dropping
PTS 4261413935, next:350000 invalid dropping st:1
DTS 4261413901, next:383000 st:1 invalid dropping
PTS 4261413995, next:383000 invalid dropping st:1
DTS 4261413974, next:416000 st:1 invalid dropping
PTS 4261414047, next:416000 invalid dropping st:1
DTS 4261414075, next:449000 st:1 invalid dropping
PTS 4261414176, next:449000 invalid dropping st:1
DTS 4261414202, next:482000 st:1 invalid dropping
PTS 4261414329, next:482000 invalid dropping st:1
DTS 4261414240, next:515000 st:1 invalid dropping
PTS 4261414278, next:515000 invalid dropping st:1
DTS 4261414275, next:548000 st:1 invalid dropping
PTS 4261414310, next:548000 invalid dropping st:1
DTS 4261414382, next:581000 st:1 invalid dropping
PTS 4261414489, next:581000 invalid dropping st:1
DTS 4261414441, next:614000 st:1 invalid dropping
PTS 4261414500, next:614000 invalid dropping st:1
DTS 4261414536, next:647000 st:1 invalid dropping
PTS 4261414631, next:647000 invalid dropping st:1
DTS 4261414678, next:680000 st:1 invalid dropping
PTS 4261414820, next:680000 invalid dropping st:1
DTS 4275582106, next:713000 st:1 invalid dropping
PTS 4272972316, next:713000 invalid dropping st:1
DTS 4242027805, next:746000 st:1 invalid dropping
PTS 4242027934, next:746000 invalid dropping st:1
DTS 4208473408, next:779000 st:1 invalid dropping
PTS 4208473441, next:779000 invalid dropping st:1
DTS 4208473541, next:812000 st:1 invalid dropping
PTS 4208473674, next:812000 invalid dropping st:1
DTS 4208473574, next:845000 st:1 invalid dropping
PTS 4208473607, next:845000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19385051; changing to -16775393. This may result in incorrect timestamps in the output file.
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19384885; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4208473667, next:878000 st:1 invalid dropping
PTS 4208473760, next:878000 invalid dropping st:1
DTS 4275582568, next:911000 st:1 invalid dropping
PTS 4275582609, next:911000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19384759; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4275582601, next:944000 st:1 invalid dropping
PTS 4275582634, next:944000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19384685; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4275582636, next:977000 st:1 invalid dropping
PTS 4275582671, next:977000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19384650; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4275582883, next:1010000 st:1 invalid dropping
PTS 4275583130, next:1010000 invalid dropping st:1
DTS 4275583010, next:1043000 st:1 invalid dropping
PTS 4275583137, next:1043000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19384276; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4275583073, next:1076000 st:1 invalid dropping
PTS 4275583136, next:1076000 invalid dropping st:1
DTS 4275583153, next:1109000 st:1 invalid dropping
PTS 4275583233, next:1109000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19384133; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4275583201, next:1142000 st:1 invalid dropping
PTS 4275583249, next:1142000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19384085; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4275583298, next:1175000 st:1 invalid dropping
PTS 4275583395, next:1175000 invalid dropping st:1
DTS 4275583334, next:1208000 st:1 invalid dropping
PTS 4275583370, next:1208000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19383952; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4275583566, next:1241000 st:1 invalid dropping
PTS 4275583798, next:1241000 invalid dropping st:1
DTS 4275583707, next:1274000 st:1 invalid dropping
PTS 4275583848, next:1274000 invalid dropping st:1
DTS 4289751135, next:1307000 st:1 invalid dropping
PTS 4287141345, next:1307000 invalid dropping st:1
[flv @ 0000027de782d040] Non-monotonous DTS in output stream 0:1; previous: -16775393, current: -19383720; changing to -16775393. This may result in incorrect timestamps in the output file.
DTS 4256196834, next:1340000 st:1 invalid dropping
PTS 4256196963, next:1340000 invalid dropping st:1
FFmpeg:
FFmpeg:
[q] command received. Exiting.
FFmpeg:
frame=   44 fps=0.0 q=-1.0 Lsize=     675kB time=00:00:01.34 bitrate=4125.0kbits/s speed=12.1x
video:666kB audio:7kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.251740%

用potplayer播放时,时长不正确
image
image

ffmpeg中断时,服务器出现以下错误

[2023-11-05T12:14:10Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: pack error: bytes writer error: not enough bytes to write: io error

除此之外没有别的错误信息了

@harlanc
Copy link
Owner

harlanc commented Nov 5, 2023

[2023-11-05T12:14:10Z ERROR rtmp::rtmp] session run error: session_type: server, app_name: live, stream_name: 001, err: pack error: bytes writer error: not enough bytes to write: io error

这个错误,我这也复现了,在拉流的时候,直接断开就会报这个错误,意思是直接断开了TCP连接,拉流的session也会退出,这个流程看着是正常的,可以先不管吧,我在想要不要把这个error改成info。

ffmpeg拉流失败,应该是packetize rtmp chunk的时候时间戳写错了,推流多长时间才会报这个错误?最好在出错的时候可以抓个网络包,上下行的都要,我这边再分析分析,谢谢

@ningnao
Copy link
Contributor Author

ningnao commented Nov 6, 2023

我是第三天下午才发现的这个问题,第二天晚上试的时候还是好的

我在服务端抓了推流包和拉流的包,这个传国内网盘吧
链接:https://pan.baidu.com/s/1lp8sTm5qrs2zbbiORm_VZg?pwd=bpwl
提取码:bpwl

@harlanc
Copy link
Owner

harlanc commented Nov 10, 2023

把rtmp 组包和拆包逻辑梳理了一下,提交了一次代码,看看问题还能复现吗?不过看你提供的抓包记录,发现推上来的音频和视频时间戳差别比较大,而且视频的时间戳有点怪:
image

@ningnao
Copy link
Contributor Author

ningnao commented Nov 11, 2023

试了一下最新版本,panic

thread 'tokio-runtime-worker' panicked at protocol/rtmp/src/chunk/unpacketizer.rs:312:21:
assertion `left == right` failed
  left: 1
 right: 0

@ningnao
Copy link
Contributor Author

ningnao commented Nov 14, 2023

试了一下 03737a0,obs和action推流正常,摄像头推流依然 panic
我打断点观察到第一次format_id为0,csid为3,第二次format_id为1,csid为2,放行后panic

@harlanc
Copy link
Owner

harlanc commented Nov 14, 2023

看了下标准文档:
image
一个新的chunk stream 第一个chunk 肯定是 Type 0呀 也就是format==0(csid为2 的第一个 chunk应该是format_id==0,但是你提供的信息是1),可以抓下包吗?再看看具体情况

@harlanc
Copy link
Owner

harlanc commented Nov 14, 2023

不过看样子这个csid 2应该不是音视频包,可以把

assert_eq!(format_id, 0);

这个断言注释掉再试试,这个chunk的msg_streamd_id 就是随机值了,可能不影响功能。

@ningnao
Copy link
Contributor Author

ningnao commented Nov 14, 2023

抓包数据:
链接:https://pan.baidu.com/s/1sv_nokgwnyjUTxjz87ZYsQ?pwd=z9j5
提取码:z9j5

这个断言注释掉再试试,这个chunk的msg_streamd_id 就是随机值了,可能不影响功能。

这个我之前尝试过了,packetizer.rs:51

cur_msg_header.timestamp_delta = cur_msg_header.timestamp - pre_msg_header.timestamp;

这里会有问题.. 减出来一个负数,debug下panic了

然后我也试着求他俩的差值,但是播放的时候不太正常,具体表现是前一两秒正常,后面都是慢放

@harlanc
Copy link
Owner

harlanc commented Nov 14, 2023

加微信吧,github上效率太低,harlancc

@ningnao
Copy link
Contributor Author

ningnao commented Nov 14, 2023

加您了

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants