Generate to max_total_tokens during warmup #286

Merged
merged 3 commits into main from total-tokens-2 on Feb 28, 2024

Conversation

tgaddair
Contributor

Closes #279.

If the user sets max_total_tokens to a value greater than what the model can support, catch the error during warmup (initialization) rather than at request time. This is important because device-side asserts leave the server in a broken state requiring a reset.
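A minimal sketch of the approach, using the names that appear in the diff below (`generate_token`, `max_new_tokens`, the warmup batch); the helper name and exact loop shape here are assumptions, not the PR's literal code:

```python
from tqdm import tqdm

def warmup_to_max_total_tokens(model, batch, max_new_tokens: int):
    """Generate until max_total_tokens would be reached, so any device-side
    assert (e.g. exceeding the model's supported sequence length) fires at
    initialization instead of on a live request."""
    with tqdm(total=max_new_tokens, desc="Warmup to max_total_tokens") as pbar:
        for _ in range(max_new_tokens):
            _, batch = model.generate_token(batch)
            pbar.update(1)
            if batch is None:
                # All sequences in the warmup batch finished early.
                break
    return batch
```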

tgaddair changed the title from "Generate max_total_tokens during warmup" to "Generate to max_total_tokens during warmup" on Feb 28, 2024
@@ -731,7 +732,11 @@ def warmup(self, batch: FlashCausalLMBatch):
self.dtype,
self.device,
)
_, batch = self.generate_token(batch)

with tqdm(total=max_new_tokens, desc="Warmup to max_total_tokens") as pbar:
Contributor

Nit: could be nice to put the actual value of max_total_tokens in as well.
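For reference, the suggestion amounts to something like the following (hypothetical, since max_total_tokens is not currently available inside warmup, as the reply below notes):

```python
# Hypothetical variant of the tqdm call; assumes max_total_tokens were
# plumbed into warmup, which is not what the PR actually does.
with tqdm(
    total=max_new_tokens,
    desc=f"Warmup to max_total_tokens ({max_total_tokens})",
) as pbar:
    ...
```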

Contributor Author

Good point, though will require a bit more plumbing. We do show the max new tokens from the progress bar counter, so will leave as is for now.

tgaddair merged commit e51f078 into main on Feb 28, 2024
1 of 2 checks passed
tgaddair deleted the total-tokens-2 branch on February 28, 2024 at 01:00

Successfully merging this pull request may close these issues.

LoRAX attempting to serve requests with token length greater than max input tokens