When using greedy search (`do_sample=False`) and `dtype=fp32`, the generated tokens are not shown in the output of the query. I believe the text generation is happening, because different values for `max_new_tokens` lead to different runtimes for the query. See this notebook as a minimal example.
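For context, the behavior described above is consistent with generation actually running: greedy search picks the single highest-probability token at every step, so the decoding loop always executes `max_new_tokens` iterations and runtime grows with it even if the decoded text is later dropped. A minimal sketch of that loop (using a stubbed toy "model" rather than Bloom or the MII API):

```python
import numpy as np

def greedy_decode(next_token_logits, prompt, max_new_tokens):
    """Toy illustration of greedy search (do_sample=False): at each
    step the argmax token is appended, deterministically, so the loop
    runs exactly max_new_tokens times -- which is why runtime scales
    with max_new_tokens even when no output text is displayed."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)      # model forward pass (stubbed here)
        tokens.append(int(np.argmax(logits)))   # no sampling: pick the top token
    return tokens

# Hypothetical stand-in model: next token is (sum of tokens so far) mod 5,
# encoded as a one-hot logits vector over a 5-token vocabulary.
demo = lambda toks: np.eye(5)[sum(toks) % 5]
print(greedy_decode(demo, [1, 2], 3))  # → [1, 2, 3, 1, 2]
```

This is only an illustration of the decoding strategy, not of the MII/DeepSpeed-Inference internals; `next_token_logits` and `demo` are hypothetical stand-ins for the real model forward pass.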
We don't currently support fp32 for the Bloom models in MII & DeepSpeed-Inference. I believe this is because the checkpoints are all in half precision. We correctly check the configs for the Bloom-176B model, but fail to do so for the smaller variants. I added a fix for this in #107.
I just ran your example using fp16 and I see output.