
✨ feat: Implement smooth output #1197

Closed · wants to merge 1 commit

Conversation


@KaiSiMai commented Jan 30, 2024

💻 变更类型 | Change Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • ⚡️ perf
  • 📝 docs

🔀 变更说明 | Description of Change

⧗ input: ✨ feat: smooth output

During network fluctuations, output characters one by one, smoothly and without interruption. Default rate: 300 ms / character-buffer length.

📝 补充信息 | Additional Information

After using this for a while, I found that for Chinese, most streaming APIs return strings of unpredictable length; combined with network fluctuations, text flashes onto the screen, which feels like stuttering.
This change replaces direct string output with a queue: incoming strings are pushed onto the queue and emitted character by character, with the queue drained within 300 ms by default. Combined with a timer, the output speed varies with the queue length. Once the stream has been fully received, the remaining queued content is output in one shot. Overall speed is the same as before, but the output is smoother and more comfortable.

This introduces a new feature that keeps character-by-character output smooth and uninterrupted during network fluctuations. The default character-buffer output rate is 300 ms / buffer length, which improves the user experience. It also adds catching and handling of exceptional cases, ensuring the output process is not interrupted even on an unstable network, improving the system's stability and reliability.
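
To make the mechanism concrete, here is a minimal TypeScript sketch of the approach described above (not the PR's actual code; `render` is a hypothetical callback that appends text to the UI):

    // Buffered smooth output: characters are queued and emitted one by one;
    // the per-character delay is recomputed as 300 ms / queue length, so the
    // whole buffer always drains within roughly the same window.
    class SmoothOutputQueue {
      private queue: string[] = [];
      private timer: ReturnType<typeof setTimeout> | null = null;
      private readonly drainWindowMs = 300; // default drain window

      constructor(private render: (chunk: string) => void) {}

      // called for every chunk received from the stream
      push(chunk: string) {
        this.queue.push(...chunk.split(''));
        if (this.timer === null) this.tick();
      }

      private tick = () => {
        const char = this.queue.shift();
        if (char !== undefined) this.render(char);
        if (this.queue.length > 0) {
          // a long queue drains quickly, a short queue drains slowly
          this.timer = setTimeout(this.tick, this.drainWindowMs / this.queue.length);
        } else {
          this.timer = null;
        }
      };

      // when the stream has finished, output whatever is left in one shot
      flush() {
        if (this.timer !== null) clearTimeout(this.timer);
        this.timer = null;
        if (this.queue.length > 0) this.render(this.queue.join(''));
        this.queue = [];
      }
    }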

vercel bot commented Jan 30, 2024

@KaiSiMai is attempting to deploy a commit to the LobeHub Team on Vercel.

A member of the Team first needs to authorize it.

@lobehubbot
Member

👍 @KaiSiMai

Thank you for raising your pull request and contributing to our Community
Please make sure you have followed our contributing guidelines. We will review it as soon as possible.
If you encounter any problems, please feel free to connect with us.

@arvinxx
Contributor

arvinxx commented Jan 31, 2024

@KaiSiMai The current implementation already smooths the output: #945

Please provide a demo to preview the difference, or explain in detail how your implementation improves on the existing smooth output.


@ShinChven

I'm using one-api + Azure OpenAI Service and I don't see smooth output: every time I have to wait for all the content to finish generating before it appears at once. ChatGPT Next Web and ChatBox don't have this problem.


@arvinxx
Contributor

arvinxx commented Jan 31, 2024

I'm using one-api + Azure OpenAI Service and I don't see smooth output: every time I have to wait for all the content to finish generating before it appears at once. ChatGPT Next Web and ChatBox don't have this problem.

See if this helps: #531


@ShinChven

I'm using one-api + Azure OpenAI Service and I don't see smooth output: every time I have to wait for all the content to finish generating before it appears at once. ChatGPT Next Web and ChatBox don't have this problem.

See if this helps: #531

That doesn't seem to work:

        location / {
                proxy_set_header X-Forwarded-Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-Server $host;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

                # ========= edit your app's host here =========
                proxy_pass http://127.0.0.1:8080;

                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                # disable caching/buffering so SSE chunks reach the client immediately
                proxy_cache off;
                proxy_buffering off;
                chunked_transfer_encoding on;
                tcp_nopush on;
                tcp_nodelay on;
                keepalive_timeout 300;
        }

@arvinxx
Contributor

arvinxx commented Jan 31, 2024

@ShinChven DM me on Discord and I'll help you take a look.


@KaiSiMai
Author

  • After the change
default.mp4
  • Before the change
default.mp4

This demonstrates a scenario where the API responds slowly.

  • Before the change
    The smoothing is done in the animation. Although output is character by character, with variable API latency it still arrives in uneven bursts — fast, but visibly intermittent, like "I am" … "an assi" … "stant".

  • After the change
    While receiving the response, each chunk is split into characters and pushed onto a queue; a timer guarantees the queue is drained within 300 ms. After the chunk-reading loop ends, the remaining content is output in one shot. As long as the delay between two received chunks stays within 300 ms, the output stays as steady and uninterrupted as possible: characters appear one at a time at an even pace, speeding up when chunks arrive quickly and slowing down when they arrive slowly (see the sketch after this list).
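
For illustration, here is a sketch of how such a queue could plug into the chunk-reading loop (hypothetical wiring — the PR does this inside the chat store; SmoothOutputQueue refers to the sketch in the description above):

    async function streamWithSmoothing(res: Response, render: (s: string) => void) {
      const queue = new SmoothOutputQueue(render);
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();

      // while there are chunks, split them into characters and enqueue them
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        queue.push(decoder.decode(value, { stream: true }));
      }

      // the stream has ended: output the remaining queued content at once
      queue.flush();
    }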


@arvinxx
Contributor

arvinxx commented Jan 31, 2024

Before the change
The smoothing is done in the animation. Although output is character by character, with variable API latency it still arrives in uneven bursts — fast, but visibly intermittent.

After the change
While receiving the response, each chunk is split into characters and pushed onto a queue; a timer guarantees the queue is drained within 300 ms. After the chunk-reading loop ends, the remaining content is output in one shot. As long as the delay between two received chunks stays within 300 ms, the output stays as steady and uninterrupted as possible, speeding up when chunks arrive quickly and slowing down when they arrive slowly.

You can take a look at the current core handling after the SSE messages are fetched:

    await fetchSSE(fetcher, {
      onMessageHandle: async (text) => {
        output += text;
        outputQueue.push(...text.split(''));

        // is this message just a function call?
        if (isFunctionMessageAtStart(output)) {
          stopAnimation();
          dispatchMessage({
            id: assistantId,
            key: 'content',
            type: 'updateMessage',
            value: output,
          });
          isFunctionCall = true;
        }

        // if this is the first time we receive the message,
        // and the message is not a function call,
        // then start the animation
        if (!isAnimationActive && !isFunctionCall) startAnimation();
      },
    })

This method already pushes the SSE message chunks into the output queue, and then, in the startAnimation part, the message is displayed in the UI character by character. The current character-by-character interval is 16 ms.

    // define startAnimation function to display the text in buffer smoothly
    // when you need to start the animation, call this function
    const startAnimation = (speed = 2) =>
      new Promise<void>((resolve) => {
        if (isAnimationActive) {
          resolve();
          return;
        }

        isAnimationActive = true;

        const updateText = () => {
          // if the animation is no longer active, stop updating the text
          if (!isAnimationActive) {
            clearTimeout(animationTimeoutId!);
            animationTimeoutId = null;
            resolve();
          }

          // if there is still text to display,
          // check whether there are characters waiting in the queue
          if (outputQueue.length > 0) {
            // take the first `speed` characters from the queue (if any)
            const charsToAdd = outputQueue.splice(0, speed).join('');
            buffer += charsToAdd;

            // update the message content; may need adjusting to the actual situation
            dispatchMessage({ id, key: 'content', type: 'updateMessage', value: buffer });

            // schedule the next characters; the 16 ms delay simulates a typewriter effect
            animationTimeoutId = setTimeout(updateText, 16);
          } else {
            // when all characters have been displayed, clear the animation state
            isAnimationActive = false;
            animationTimeoutId = null;
            resolve();
          }
        };

        updateText();
      });

Your change and the existing implementation follow the same idea, as I understand it? The only difference is whether it's done inside the SSE request method or at the call site in the outer layer?

My own preference is that the SSE fetch layer should not do this processing; it belongs in the presentation layer, for two reasons:

  1. Other places that use SSE may not need this processing; doing it directly in the request layer means the application can't customize this logic.
  2. It doesn't suit scenarios like function_call, where a recognized leading string has to switch to a different presentation. I tested this specifically before: a lot of meaningless \n characters show up and then cause errors.

One more question: does the stuttering you mention only occur when the response interval is very long? At the current speed of gpt-4-turbo it all seems fine to me — do we need this further optimization?

@KaiSiMai
Author

KaiSiMai commented Feb 1, 2024

1. I agree with you.
2. I indeed haven't looked into that in depth.

Yes, it only happens when the response interval is very long; some resellers are just slow.
With the current approach, a fixed 16 ms may not be stable. I think a config option could support adjusting the character output interval to match one's own response behavior.

With my approach, the dynamic rate calculation (300ms/queue.length) would be moved into the animation, plus a config option so this 300 ms can be set based on the average response interval.
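
A sketch of what that migration might look like inside the existing updateText (it assumes the declarations from the snippet quoted above; drainWindowMs stands in for the proposed configurable 300 ms):

    const drainWindowMs = 300; // the value the proposal would make configurable

    const updateText = () => {
      if (outputQueue.length > 0) {
        buffer += outputQueue.splice(0, 1).join('');
        dispatchMessage({ id, key: 'content', type: 'updateMessage', value: buffer });

        // dynamic rate: a long queue drains quickly, a short one slowly,
        // so the buffer always empties within roughly drainWindowMs
        const delay = drainWindowMs / Math.max(outputQueue.length, 1);
        animationTimeoutId = setTimeout(updateText, delay);
      } else {
        isAnimationActive = false;
        animationTimeoutId = null;
      }
    };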


@arvinxx
Contributor

arvinxx commented Feb 1, 2024

With the current approach, a fixed 16 ms may not be stable. I think a config option could support adjusting the character output interval to match one's own response behavior.

Do you think this 16 ms could be turned into a dynamic strategy along the lines of your earlier idea, incorporating the 300 ms you want? Making it user-configurable doesn't seem realistic to me — this is far too fine-grained a detail. It's best implemented programmatically.

For example, if the response interval is slow, slow the output down accordingly; if every response arrives quickly, output at 16 ms.
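
One possible shape for that dynamic strategy (an illustrative sketch, not an implementation in the codebase; all names are hypothetical):

    let lastChunkAt = Date.now();
    let avgChunkIntervalMs = 16; // start from the current fixed interval

    // call this whenever an SSE chunk arrives
    function onChunkReceived() {
      const now = Date.now();
      // exponential moving average of the time between chunks
      avgChunkIntervalMs = avgChunkIntervalMs * 0.8 + (now - lastChunkAt) * 0.2;
      lastChunkAt = now;
    }

    // fast streams keep the current 16 ms typing pace; slow streams stretch
    // the delay, capped so the UI never appears to stall entirely
    function charDelayMs(queueLength: number) {
      const target = Math.max(16, avgChunkIntervalMs / Math.max(queueLength, 1));
      return Math.min(target, 300);
    }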


@arvinxx
Contributor

arvinxx commented Oct 28, 2024

@KaiSiMai Hello — I eventually did move the smoothing feature into the fetchSSE layer, enabled on demand where it's used in the upper layer. Implementation: https://github.com/lobehub/lobe-chat/blob/main/src/utils/fetch/fetchSSE.ts#L52-L233

Since this PR now differs substantially from the current code, I'm closing it for now.

Models have advanced very rapidly over the past six months, and the previous smoothing has even become a pessimization in some scenarios. I think your idea of outputting everything in the buffer within 300 ms is still very valuable in high-TPS scenarios. If you're interested, feel free to continue the discussion in a new PR.
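
Usage at the call site might then look roughly like this (an assumption based on the comment above — check the linked fetchSSE.ts for the real option name and shape, which may differ):

    await fetchSSE(fetcher, {
      onMessageHandle: (text) => {
        // append text to the message as before
      },
      smoothing: true, // assumed opt-in flag for fetch-layer smoothing
    });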

@arvinxx arvinxx closed this Oct 28, 2024
