How to do partial inference with SGLang #4798
Unanswered
VincentXWD asked this question in Q&A
Replies: 0 comments
I'm wondering how to do partial inference with SGLang. For example, I want to run only the embedding layer plus decoder layers 0 through 5 (7 layers in total) on a single SGLang instance. I have already implemented the sharded model definition. What is the best practice for this?
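To make the setup concrete, here is a minimal sketch of the layer-range bookkeeping a sharded definition like this needs. It is plain Python and not SGLang's API: `StageSpec`, `owns_weight`, and `filter_weights` are hypothetical names, the weight names assume a Hugging Face Llama-style checkpoint layout (`model.embed_tokens`, `model.layers.N`, `model.norm`, `lm_head`), and the total of 32 layers is only an example value.

```python
import re
from dataclasses import dataclass

# Assumed checkpoint naming: "model.layers.<index>.<rest>" (Llama-style layout).
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical description of the slice of the model one instance runs."""

    start_layer: int  # first decoder layer owned (inclusive)
    end_layer: int    # last decoder layer owned (exclusive)
    num_layers: int   # total decoder layers in the full model

    @property
    def has_embed(self) -> bool:
        # Only the stage that starts at layer 0 needs the embedding table.
        return self.start_layer == 0

    @property
    def is_last(self) -> bool:
        # Only the stage that ends at the final layer needs norm + lm_head.
        return self.end_layer == self.num_layers

    def owns_weight(self, name: str) -> bool:
        """Return True if a checkpoint tensor belongs to this stage."""
        match = _LAYER_RE.match(name)
        if match:
            return self.start_layer <= int(match.group(1)) < self.end_layer
        if name.startswith("model.embed_tokens."):
            return self.has_embed
        if name.startswith("model.norm.") or name.startswith("lm_head."):
            return self.is_last
        return False


def filter_weights(spec: StageSpec, names: list[str]) -> list[str]:
    """Keep only the tensor names this stage should load."""
    return [n for n in names if spec.owns_weight(n)]


# Embedding + layers 0..5 of a model with 32 layers (example value).
first_stage = StageSpec(start_layer=0, end_layer=6, num_layers=32)
```

With a spec like this, the weight-loading code of the sharded model can skip every tensor for which `owns_weight` is false, and the forward pass of the first stage would return hidden states instead of logits. How the serving side should consume those hidden states is the part I am unsure about.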