
feat: medusa v2 #1734

Merged: 3 commits from feat/medusa_v2 into main on Apr 12, 2024
Conversation

OlivierDehaene (Member)

No description provided.

@Narsil (Collaborator) left a comment

Nice.

A few nits.

     super().__init__()
     self.blocks = torch.nn.ModuleList(
         [
             ResBlock(config, prefix=f"{prefix}.{i}", weights=weights)
-            for i in range(config["medusa_num_layers"])
+            for i in range(medusa_config["medusa_num_layers"])
@Narsil (Collaborator):

This should probably take that information from `speculate` instead, to avoid loading layers we're not going to use.

We can do it in another PR.
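
For illustration, a minimal sketch of what that could look like, assuming the number of speculative tokens is exposed as a `speculate` entry on the config (a hypothetical name here; `ResBlock` is the block from the diff above):

```python
import torch

# Sketch only, not the PR's code. Treating "speculate" as a key on
# `config` is an assumption for this example; ResBlock is the module
# from the snippet above.
class MedusaModel(torch.nn.Module):
    def __init__(self, config, medusa_config, prefix, weights):
        super().__init__()
        requested = medusa_config["medusa_num_layers"]
        # Never build more blocks than speculation will actually use.
        num_layers = min(requested, config.get("speculate", requested))
        self.blocks = torch.nn.ModuleList(
            [
                ResBlock(config, prefix=f"{prefix}.{i}", weights=weights)
                for i in range(num_layers)
            ]
        )
```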


     self.act = torch.nn.SiLU()

     self.lm_head = TensorParallelHead.load(config, prefix, weights)
@Narsil (Collaborator):

I thought what I had done was cleverer: passing the lm_head directly, which avoids a reload.

Since we're using sharding here, `Weights` will actually load the LM head n times with the current code (IIRC).
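
As a sketch of the alternative Narsil describes, the constructor could accept an already-loaded `lm_head` instead of reloading it (the extra parameter is hypothetical, not the PR's actual signature; `TensorParallelHead` is the loader from the snippet above):

```python
import torch

# Sketch only: an optional, hypothetical lm_head parameter lets the
# caller pass in an already-loaded head, so the sharded LM head weights
# are not read from disk n times.
class MedusaHeadV2(torch.nn.Module):
    def __init__(self, config, prefix, weights, lm_head=None):
        super().__init__()
        self.act = torch.nn.SiLU()
        if lm_head is None:
            # Fall back to loading it ourselves, as the PR currently does.
            lm_head = TensorParallelHead.load(config, prefix, weights)
        self.lm_head = lm_head
```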

-        medusa = MedusaModel(config, weights)
+        lm_head = None
+        try:
+            medusa = MedusaHeadV2(config, prefix, weights)
@Narsil (Collaborator):

Shouldn't it be the reverse? MedusaV2 will always load properly (since it's only a subset of V1)?
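
For context, the fallback pattern being questioned looks roughly like this (the `except` branch is inferred from the comment, not quoted from the diff):

```python
# Sketch of the fallback pattern under discussion.
lm_head = None
try:
    # Try the V2 checkpoint layout first...
    medusa = MedusaHeadV2(config, prefix, weights)
except Exception:
    # ...and fall back to V1. Narsil's point: because V2's weights are
    # a subset of V1's, V2 can appear to load successfully even for a
    # V1 checkpoint, so the attempt order may need to be flipped.
    medusa = MedusaHeadV1(config, prefix, weights)
```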

@@ -467,46 +467,159 @@ def forward(self, x):
         return x


-class SpeculativeHead(nn.Module):
+class MedusaHeadV1(nn.Module):
@Narsil (Collaborator):

Maybe it's time we put them into their own file (both in the same file is OK, I think)?

Narsil previously approved these changes on Apr 12, 2024
@Narsil (Collaborator) left a comment

LGTM.

Let's move the nits to a follow-up chore.

@OlivierDehaene merged commit eefea5e into main on Apr 12, 2024
1 of 3 checks passed
@OlivierDehaene deleted the feat/medusa_v2 branch on April 12, 2024 at 14:24
kdamaszk pushed a commit to kdamaszk/tgi-gaudi that referenced this pull request Apr 29, 2024
Nilabhra pushed a commit to TII-AI-Research-Center/text-generation-inference that referenced this pull request May 14, 2024
@fxmarty (Collaborator) commented May 19, 2024

It would be helpful to document it.
