Medusa models seem to be slower than the original base models
System Info
Thank you for adding support for Medusa. In my comparison of Medusa models against the original base models with TGI, the base models appeared to be quicker.
I tested the models below:
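Presumably the pair being compared is the Medusa checkpoint from the reproduction command (text-generation-inference/Mistral-7B-Instruct-v0.2-medusa) and its original base model. As an illustrative sketch only, the baseline could be served with the same image on a second port; the base model id mistralai/Mistral-7B-Instruct-v0.2 and port 8082 here are assumptions inferred from the checkpoint name, not details from this report:

# Assumed baseline server for the non-Medusa model, on a second port
docker run --gpus all --shm-size 1g -p 8082:80 ghcr.io/huggingface/text-generation-inference:1.4.3 --model-id mistralai/Mistral-7B-Instruct-v0.2 --num-shard 1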
Information
Tasks
Reproduction
Command used:
docker run --gpus all --shm-size 1g -p 8081:80 ghcr.io/huggingface/text-generation-inference:1.4.3 --model-id text-generation-inference/Mistral-7B-Instruct-v0.2-medusa --num-shard 1
Hardware:
1xH100
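A rough way to compare the two servers is to time a single non-streaming request against each one. The sketch below assumes the Medusa server from the command above on port 8081 and a baseline server on port 8082 (as described under System Info); the prompt and max_new_tokens value are arbitrary.

# Time one /generate request per server (8081 = Medusa from the command above,
# 8082 = assumed baseline); curl's %{time_total} reports end-to-end seconds.
PROMPT='Write a short story about a robot learning to paint.'
for PORT in 8081 8082; do
  curl -s -o /dev/null -w "port ${PORT}: %{time_total}s\n" \
    -X POST "http://localhost:${PORT}/generate" \
    -H 'Content-Type: application/json' \
    -d "{\"inputs\": \"${PROMPT}\", \"parameters\": {\"max_new_tokens\": 256}}"
done

Averaging several prompts would give a more reliable signal than a single request.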
Expected behavior
Medusa models should be faster than the original non-Medusa models.