✨ feat: Implement smooth output #1197
Conversation
Introduces a new feature that keeps character-by-character output smooth and uninterrupted during network fluctuations. The default output rate for the character buffer is 300 ms divided by the buffer length, to improve the user experience. Exception catching and handling has also been added, so that output is not interrupted even when the network is unstable, improving the system's stability and reliability.
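(Editor's note: as a reading aid, the "300 ms / buffer length" rate means the per-character delay shrinks as the backlog grows. A minimal sketch of the implied formula; the function name is illustrative, not from the PR:)

```ts
// Per-character delay implied by "300 ms / buffer length": the longer the
// backlog, the faster characters are emitted, so the queue drains in ~300 ms.
const perCharDelayMs = (queueLength: number): number => 300 / Math.max(queueLength, 1);

// e.g. a 30-character backlog drains at 10 ms/char; a 3-character one at 100 ms/char.
```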
@KaiSiMai is attempting to deploy a commit to the LobeHub Team on Vercel. A member of the Team first needs to authorize it. |
Thank you for raising your pull request and contributing to our Community |
@KaiSiMai The current implementation already has smoothed output: #945. Please give a demo to preview the difference, or explain in detail what the advantages of your implementation are over the existing smooth-output implementation. |
I use one-api + azure openai service, but I don't see smooth output. I have to wait until all the content has been generated before it is output all at once, whereas ChatGPT Next Web and ChatBox have no such problem. |
See if this helps? #531 |
It doesn't seem to work. |
@ShinChven PM me on Discord and I'll help you take a look. |
default.mp4 (demonstrates a scenario where the API response is slow) |
You can take a look at the current core handling after the SSE messages are received:

```ts
await fetchSSE(fetcher, {
  onMessageHandle: async (text) => {
    output += text;
    outputQueue.push(...text.split(''));

    // check whether this message is just a function call
    if (isFunctionMessageAtStart(output)) {
      stopAnimation();
      dispatchMessage({
        id: assistantId,
        key: 'content',
        type: 'updateMessage',
        value: output,
      });
      isFunctionCall = true;
    }

    // if this is the first received message and it is not a function call,
    // start the animation
    if (!isAnimationActive && !isFunctionCall) startAnimation();
  },
});
```

This method already pushes the SSE message chunks into the output queue, and then:

```ts
// define the startAnimation function to display the buffered text smoothly;
// when you need to start the animation, call this function
const startAnimation = (speed = 2) =>
  new Promise<void>((resolve) => {
    if (isAnimationActive) {
      resolve();
      return;
    }

    isAnimationActive = true;

    const updateText = () => {
      // if the animation is no longer active, stop updating the text
      if (!isAnimationActive) {
        clearTimeout(animationTimeoutId!);
        animationTimeoutId = null;
        resolve();
        return; // bail out so a stopped animation does not keep scheduling ticks
      }

      // check whether the queue still has characters waiting to be displayed
      if (outputQueue.length > 0) {
        // take the first `speed` characters from the queue (if present)
        const charsToAdd = outputQueue.splice(0, speed).join('');
        buffer += charsToAdd;

        // update the message content; this may need adjusting to the actual situation
        dispatchMessage({ id, key: 'content', type: 'updateMessage', value: buffer });

        // schedule the next characters; a 16 ms delay simulates the typewriter effect
        animationTimeoutId = setTimeout(updateText, 16);
      } else {
        // once all characters have been displayed, clear the animation state
        isAnimationActive = false;
        animationTimeoutId = null;
        resolve();
      }
    };

    updateText();
  });
```

Your change and the existing implementation follow the same idea, as I understand it? The only difference is whether it is done inside the SSE request method body or at the outer call site. I myself lean towards not doing this in the SSE fetch layer; the optimization should live in the presentation layer, for two considerations:
Another question: is the stutter you mentioned only encountered when the response interval is very long? At the current speed of gpt-4-turbo everything seems fine to me; do we need this further optimization? |
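(Editor's note: the snippet above calls `stopAnimation()` without showing it. A minimal sketch of what it presumably does, inferred from the flags used in `startAnimation`; the shared state is declared here only for completeness:)

```ts
// Shared state from the surrounding closure (declared here for self-containment):
let isAnimationActive = false;
let animationTimeoutId: ReturnType<typeof setTimeout> | null = null;

// Inferred counterpart to startAnimation: flipping the flag makes the next
// updateText tick resolve; clearing the pending timeout stops it immediately.
const stopAnimation = () => {
  isAnimationActive = false;
  if (animationTimeoutId !== null) {
    clearTimeout(animationTimeoutId);
    animationTimeoutId = null;
  }
};
```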
1. I agree with your view. It is indeed only encountered when the response interval is very long; some proxy providers are just slow. Under my proposal, the dynamic rate calculation (300ms / queue.length) would be moved into the animation, with a configuration option added so that the 300ms can be set based on the average response interval. |
Do you think it would be possible to turn this 16ms into a dynamic strategy along the lines of your earlier idea, incorporating the 300ms you expect? Making it user-configurable doesn't seem realistic to me, since this detail is far too fine-grained; it is best implemented programmatically. For example, if the response interval is slow, slow down the output speed accordingly; if responses arrive quickly, output at the 16ms pace. |
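(Editor's note: a sketch of what such a programmatic strategy could look like. This illustrates the idea discussed above and is not code from the PR; all names are assumptions. It tracks the average interval between SSE chunks and stretches the per-tick delay toward it, with 16 ms as the floor and 300 ms as the cap:)

```ts
// Illustrative dynamic pacing: blend the fixed 16 ms typewriter tick with the
// observed chunk arrival rate, so a slow upstream slows the animation instead
// of letting the queue run dry and stall.
class TypingPacer {
  private lastChunkAt = Date.now();
  private avgIntervalMs = 16; // exponential moving average of chunk intervals

  // call whenever a new SSE chunk arrives
  onChunk(): void {
    const now = Date.now();
    const interval = now - this.lastChunkAt;
    this.lastChunkAt = now;
    this.avgIntervalMs = 0.8 * this.avgIntervalMs + 0.2 * interval;
  }

  // per-tick delay: fast upstream -> 16 ms floor; slow upstream -> spread the
  // queued characters over ~one average interval (cf. 300ms / queue.length)
  nextDelayMs(queueLength: number): number {
    const spread = this.avgIntervalMs / Math.max(queueLength, 1);
    return Math.max(16, Math.min(spread, 300));
  }
}
```

With `speed` kept as a small constant, `startAnimation` could then schedule `setTimeout(updateText, pacer.nextDelayMs(outputQueue.length))` instead of the hard-coded 16.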
@KaiSiMai hello, I later moved the smoothing feature into the fetchSSE layer after all, enabled on demand at the upper-layer call sites. Implemented at: https://github.com/lobehub/lobe-chat/blob/main/src/utils/fetch/fetchSSE.ts#L52-L233 Since this PR now differs considerably from the current code, I have closed it for now. Model development has moved very fast over the past six months, and the previous smoothing has even become a negative optimization in some scenarios. I think your idea of outputting everything in the buffer within 300ms is still very meaningful in high-TPS scenarios. If you are interested, feel free to continue the discussion in a follow-up PR. |
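(Editor's note: for anyone landing here later, a hypothetical sketch of the call shape this describes, with smoothing opted into at the call site. The option name and callback shape are assumptions; check the linked fetchSSE.ts for the actual API, which may have changed since:)

```ts
// Hypothetical usage: opt in to smoothing per call rather than making every
// fetchSSE consumer pay for the animation.
await fetchSSE(fetcher, {
  smoothing: true, // assumed option name; see the linked implementation
  onMessageHandle: (text: string) => {
    // chunks arrive already paced by the smoothing layer
  },
});
```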
💻 变更类型 | Change Type
🔀 变更说明 | Description of Change
✨ feat: smooth output
During network fluctuations, output character by character, smoothly and without interruption; default rate: 300ms / character-buffer length.
📝 补充信息 | Additional Information
After using it for a while, I found that for Chinese, most streaming APIs return chunks of variable length; combined with network fluctuations, this makes strings flash onto the screen in bursts, which feels janky.
After this change, instead of outputting each string directly, the strings are pushed into a queue and the queue is output character by character, draining within 300ms by default. Combined with a timer, the output speed varies with the queue length. Once the stream has finished, the remaining queue content is output in one go. The overall speed is the same as before, but the output is smoother and more comfortable.
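(Editor's note: a self-contained sketch of the mechanism just described: chunks are pushed into a queue, a timer drains it at 300 ms divided by the current queue length, and whatever remains when the stream ends is flushed in one go. The class and helper names are illustrative; the real PR wires this into dispatchMessage:)

```ts
// Minimal sketch of the described queue-based smoothing (names are illustrative).
class SmoothOutput {
  private queue: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(private emit: (chars: string) => void) {}

  // push an incoming chunk and make sure the drain loop is running
  push(chunk: string): void {
    this.queue.push(...chunk.split(''));
    if (this.timer === null) this.drain();
  }

  // emit one character, then reschedule so the backlog empties in ~300 ms
  private drain(): void {
    const char = this.queue.shift();
    if (char === undefined) {
      this.timer = null; // queue empty: wait for the next push
      return;
    }
    this.emit(char);
    const delay = 300 / Math.max(this.queue.length, 1);
    this.timer = setTimeout(() => this.drain(), delay);
  }

  // stream finished: output everything left in one go, as described above
  flush(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = null;
    if (this.queue.length > 0) this.emit(this.queue.join(''));
    this.queue = [];
  }
}
```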