
Support concurrent output of conversation tokens #1380

Open
5 tasks done
beijingtl opened this issue Apr 26, 2024 · 0 comments
Labels
enhancement New feature or request

Comments


beijingtl commented Apr 26, 2024

Routine checks

  • I have confirmed there is no similar existing issue
  • I have confirmed I have upgraded to the latest version
  • I have read the project README in full and confirmed the current version cannot meet this need
  • I understand and am willing to follow up on this issue, helping with testing and providing feedback
  • I understand and accept the above, and I understand the maintainers' time is limited; issues that do not follow the rules may be ignored or closed directly

Feature description

When external API calls reach one-api at the same moment, and one-api contains two identically defined channels (same model name, but backed by two separate large-model Docker services), one-api still emits tokens "sequentially". For example, only after request 1 has finished streaming its tokens does the token output for request 2 begin.

Based on "load balancing" logic, when a single one-api API is called (same name, URL, and authorization), could load distribution be added automatically so that the token streams of both requests are output concurrently?

Use case

When high-concurrency requests hit the same one-api instance, multiple large-language-model instances can be configured underneath it to raise concurrency capacity. But this requires one-api to support "concurrent" token output for the same API.

@beijingtl beijingtl added the enhancement New feature or request label Apr 26, 2024