If you want to preload a model on the SGLang nodes but not on the default workers (or vice versa), you cannot.
Adding the model to the preload models setting breaks one node type or the other, so a mixed deployment can never use preload.
The only workaround is to manually patch the nodes after they are created in k8s, and to repeat that after every deploy update, because the helm chart has only one preload variable and it applies to every node type. Yet the two main node types today would never preload the same models, so there really needs to be a separate preload setting per node type.
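
To make the request concrete, here is a minimal sketch of the behaviour I have in mind: a per-node-type preload list that falls back to the existing global list when no override is given. The names `resolve_preload`, `per_type_preload`, and the node type keys `"sglang"` and `"default"` are hypothetical, chosen only for illustration; they are not the chart's actual variables.

```python
from typing import Mapping, Optional, Sequence


def resolve_preload(
    node_type: str,
    global_preload: Sequence[str],
    per_type_preload: Optional[Mapping[str, Sequence[str]]] = None,
) -> list:
    """Return the models to preload for one node type.

    Hypothetical logic: a per-type entry, if present, replaces the
    global list for that node type. An empty per-type list means
    "preload nothing on this node type".
    """
    if per_type_preload is not None and node_type in per_type_preload:
        return list(per_type_preload[node_type])
    # No override for this node type: keep today's behaviour.
    return list(global_preload)


if __name__ == "__main__":
    # Placeholder model name; preload it on SGLang nodes only.
    overrides = {"sglang": ["example-model"], "default": []}
    for node_type in ("sglang", "default"):
        print(node_type, resolve_preload(node_type, ["example-model"], overrides))
```

With something like this, existing deployments that only set the global list would behave as they do now, and a mixed deployment could opt each node type in or out.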