
Support both medusa v1 and v2 #421

Merged
merged 9 commits into main from medusa-lora on Apr 18, 2024
Conversation

tgaddair (Contributor)

TGI recently introduced a smaller version of Medusa that doesn't require additional LM heads (only a single dense projection per "medusa head"). This makes it a great candidate for dynamic adapter loading, as new variants for 7B-parameter models are under 100MB.

This implementation is taken largely from the one in TGI.
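The key idea behind the size reduction can be sketched in a few lines: each Medusa v2 head is just one residual dense projection over the base model's final hidden state, and all heads reuse the base model's shared LM head for logits instead of carrying their own. The sketch below is illustrative only (NumPy stand-in for the actual PyTorch modules; names like `MedusaV2Head` and `speculate` are hypothetical, not the TGI/LoRAX API):

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

class MedusaV2Head:
    """Illustrative Medusa v2 head: a single residual dense projection.

    Unlike Medusa v1, there is no per-head LM head; logits come from the
    base model's shared lm_head, which is why each adapter stays small
    (roughly hidden_size^2 parameters per head).
    """
    def __init__(self, hidden_size: int, rng: np.random.Generator):
        # The single dense projection is all each head adds.
        self.w = rng.standard_normal((hidden_size, hidden_size)) * 0.02

    def __call__(self, hidden: np.ndarray) -> np.ndarray:
        # Residual connection around the projection.
        return hidden + silu(hidden @ self.w)

def speculate(hidden, heads, shared_lm_head):
    """Head i proposes the token at offset i+1, reusing the shared LM head."""
    return [int(np.argmax(head(hidden) @ shared_lm_head)) for head in heads]

# Toy usage: 3 heads over an 8-dim hidden state and a 16-token vocab.
rng = np.random.default_rng(0)
heads = [MedusaV2Head(8, rng) for _ in range(3)]
lm_head = rng.standard_normal((8, 16))
draft_tokens = speculate(rng.standard_normal(8), heads, lm_head)
```

For scale: at fp16, one 4096x4096 projection is about 34MB, so a few such heads land well under the 100MB figure mentioned above, making per-request dynamic loading practical.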

@tgaddair tgaddair requested a review from noyoshi April 18, 2024 01:52
@tgaddair tgaddair merged commit 0a3c627 into main Apr 18, 2024
1 check passed
@tgaddair tgaddair deleted the medusa-lora branch April 18, 2024 15:54