Hi! I have read the SOSP'25 paper for this project. It is excellent work that offers great programming flexibility for customizing the inference process of LLM applications. I do have a question, though.

After reading the project documentation, my understanding is that Pie deploys the agent application and the large language model on the same GPU-equipped machine, so that an Inferlet running locally can take the place of the agent application and handle every interaction with the external environment. However, many existing agent applications talk to LLM serving systems hosted on cloud vendors' servers. This is what lets me use a coding agent on my own machine without a GPU or a locally deployed model.

Can Pie support this kind of agent scenario? If so, can it still benefit from reduced network communication overhead? If not, does this limit Pie's applicability to some extent? Users running agents today do not necessarily deploy models locally, and even when they do, a local deployment may see far fewer requests than a cloud one, leaving the locally built LLM serving system underutilized.
Replies: 1 comment
Dear @Gallopm,

Thanks for opening the first discussion thread! Pie’s integrated compute and I/O primarily benefits use cases where the agent logic can be executed without external actions (e.g., evaluating symbolic expressions). In the scenario you mentioned, such as modifying files on the user’s machine, external I/O is required, so Pie does not gain the advantage of reduced boundary crossings in that case.

That said, this "split compute" setup does not limit Pie’s applicability. One can still use the send and receive APIs to interact with the user at runtime, and further improve efficiency through application-specific KV cache management.
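
To make the boundary-crossing argument concrete, here is a toy cost model in Python. It is only a sketch and does not use Pie's actual API: `ToolCall`, `client_driven_round_trips`, and `server_side_round_trips` are hypothetical names, and the model assumes one client-server round trip per LLM request in a conventional client-driven agent loop, versus one launch request plus one message exchange per external action when the loop runs next to the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One agent step. Hypothetical type for illustration, not part of Pie."""
    name: str
    external: bool  # True if the action must run on the user's machine


def client_driven_round_trips(calls):
    # Conventional setup: the agent loop runs on the client, so the initial
    # prompt and every tool result are sent to the server as separate requests.
    return len(calls) + 1


def server_side_round_trips(calls):
    # Loop runs next to the model: one request launches it, and only
    # external actions need an exchange with the client.
    return 1 + sum(1 for call in calls if call.external)


workloads = {
    "symbolic": [ToolCall("eval_expr", external=False)] * 4,
    "coding": [ToolCall("edit_file", external=True)] * 4,
    "mixed": [
        ToolCall("eval_expr", external=False),
        ToolCall("edit_file", external=True),
        ToolCall("eval_expr", external=False),
        ToolCall("run_tests", external=True),
    ],
}

for label, calls in workloads.items():
    print(label, client_driven_round_trips(calls), server_side_round_trips(calls))
```

Under these assumptions, the all-external "coding" workload needs the same number of round trips either way, which matches the point above that reduced boundary crossings do not apply there, while the "symbolic" and "mixed" workloads need fewer. The model deliberately ignores KV cache reuse, which is a separate source of savings.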