Generate to max_total_tokens during warmup #286

Merged
merged 3 commits into main from total-tokens-2 on Feb 28, 2024

Conversation

tgaddair
Contributor

Closes #279.

If the user sets max_total_tokens to a value greater than what the model can support, catch the error during warmup (initialization) rather than at request time. This is important because device-side asserts leave the server in a broken state requiring a reset.
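A minimal sketch of the approach, using the names that appear in the diff below (`generate_token`, `max_new_tokens`, the warmup batch); the helper name and exact loop shape here are assumptions, not the PR's literal code:

```python
from tqdm import tqdm

def warmup_to_max_total_tokens(model, batch, max_new_tokens: int):
    """Generate until max_total_tokens would be reached, so any device-side
    assert (e.g. exceeding the model's supported sequence length) fires at
    initialization instead of on a live request."""
    with tqdm(total=max_new_tokens, desc="Warmup to max_total_tokens") as pbar:
        for _ in range(max_new_tokens):
            _, batch = model.generate_token(batch)
            pbar.update(1)
            if batch is None:
                # All sequences in the warmup batch finished early.
                break
    return batch
```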

tgaddair changed the title from "Generate max_total_tokens during warmup" to "Generate to max_total_tokens during warmup" on Feb 28, 2024
@@ -731,7 +732,11 @@ def warmup(self, batch: FlashCausalLMBatch):
self.dtype,
self.device,
)
_, batch = self.generate_token(batch)

with tqdm(total=max_new_tokens, desc="Warmup to max_total_tokens") as pbar:
Contributor

Nit: could be nice to put the actual value of max_total_tokens in as well.
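For reference, the suggestion amounts to something like the following (hypothetical, since max_total_tokens is not currently available inside warmup, as the reply below notes):

```python
# Hypothetical variant of the tqdm call; assumes max_total_tokens were
# plumbed into warmup, which is not what the PR actually does.
with tqdm(
    total=max_new_tokens,
    desc=f"Warmup to max_total_tokens ({max_total_tokens})",
) as pbar:
    ...
```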

Contributor Author

Good point, though will require a bit more plumbing. We do show the max new tokens from the progress bar counter, so will leave as is for now.

tgaddair merged commit e51f078 into main on Feb 28, 2024
1 of 2 checks passed
tgaddair deleted the total-tokens-2 branch on February 28, 2024 at 01:00

Successfully merging this pull request may close these issues.

LoRAX attempting to serve requests with token length greater than max input tokens