Describe the bug
Inference during agent work is very slow compared to regular LLM interaction.
I'm using a local setup with an API connection to TextGen WebUI on my local network.
Each TaskWeaver iteration is extremely slow:
generation speed drops drastically to around 1-2 t/s (the usual speed on the same setup is 15-20 t/s).
At this rate the tool is not very useful; a simple coding task like printing numbers takes 20-30 minutes to complete.
Is there any tweak to fix this? I suspect it may be caused by the relatively large context sent with each request.
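To check whether large per-request context explains the slowdown, one option is to time each request and compute generation throughput. A minimal sketch (the helper below is hypothetical, not part of TaskWeaver; the commented endpoint call assumes TextGen WebUI's OpenAI-compatible API):

```python
import time

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Rough generation throughput for a single request."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s

# To profile a real request, wrap the call to your local endpoint, e.g.:
#   t0 = time.monotonic()
#   resp = client.completions.create(model=..., prompt=...)  # OpenAI-compatible API
#   rate = tokens_per_second(resp.usage.completion_tokens, time.monotonic() - t0)
# If the rate only drops when the prompt is large, context size is the likely cause.

print(tokens_per_second(120, 80.0))  # 1.5 — roughly the slow agent-mode rate reported
print(tokens_per_second(300, 20.0))  # 15.0 — roughly the normal rate on the same setup
```

Comparing the measured rate against prompt length across TaskWeaver iterations would confirm or rule out the context-size hypothesis.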
To Reproduce
Steps to reproduce the behavior:
Start the service
Type a user query (any of the queries listed in the example description)
Wait for the response forever
Expected behavior
Similar inference speed as Autogen
Environment Information (please complete the following information):
OS: MacOS
Python Version 3.11
LLM that you're using: a number of different 7B models