Medusa models seem to be slower than the original base models
System Info
Thank you for adding support for Medusa. In my comparison of Medusa models against the original base models with TGI, the base models appeared to be quicker.
I tested the models below:
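Presumably the pair being compared is the Medusa checkpoint from the reproduction command (text-generation-inference/Mistral-7B-Instruct-v0.2-medusa) and its original base model. As an illustrative sketch only, the baseline could be served with the same image on a second port; the base model id mistralai/Mistral-7B-Instruct-v0.2 and port 8082 here are assumptions inferred from the checkpoint name, not details from this report:

# Assumed baseline server for the non-Medusa model, on a second port
docker run --gpus all --shm-size 1g -p 8082:80 ghcr.io/huggingface/text-generation-inference:1.4.3 --model-id mistralai/Mistral-7B-Instruct-v0.2 --num-shard 1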
Information
Tasks
Reproduction
Command used:
docker run --gpus all --shm-size 1g -p 8081:80 ghcr.io/huggingface/text-generation-inference:1.4.3 --model-id text-generation-inference/Mistral-7B-Instruct-v0.2-medusa --num-shard 1
Hardware:
1xH100
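A rough way to compare the two servers is to time a single non-streaming request against each one. The sketch below assumes the Medusa server from the command above on port 8081 and a baseline server on port 8082 (as described under System Info); the prompt and max_new_tokens value are arbitrary.

# Time one /generate request per server (8081 = Medusa from the command above,
# 8082 = assumed baseline); curl's %{time_total} reports end-to-end seconds.
PROMPT='Write a short story about a robot learning to paint.'
for PORT in 8081 8082; do
  curl -s -o /dev/null -w "port ${PORT}: %{time_total}s\n" \
    -X POST "http://localhost:${PORT}/generate" \
    -H 'Content-Type: application/json' \
    -d "{\"inputs\": \"${PROMPT}\", \"parameters\": {\"max_new_tokens\": 256}}"
done

Averaging several prompts would give a more reliable signal than a single request.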
Expected behavior
Medusa models should be faster than the original non-Medusa models.