Batch llama prompt #2111

Closed · wants to merge 5 commits

Conversation

tbogdala

This PR, discussed in #2108, changes mask creation for the Llama model so that a user-supplied prompt can be processed in token batches instead of all at once. The key change is to Cache::mask(): it now takes a second usize and builds an appropriately sized vector to turn into a Tensor there.
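For context, here is a rough sketch of what a two-argument causal mask can look like, not the PR's actual diff: it assumes the second usize is the number of tokens already held in the KV cache, that the mask marks masked-out (future) positions with 1 as the existing single-argument version does, and that the crate is imported under the name candle_core.

```rust
use candle_core::{Device, Result, Tensor};

/// Sketch: causal mask for a batch of `t` new query tokens when `past` tokens
/// are already in the KV cache. Shape is (t, past + t); entry (i, j) is 1 when
/// key position j lies in the future of query position i, 0 otherwise.
fn mask(t: usize, past: usize, device: &Device) -> Result<Tensor> {
    let total = past + t;
    // Query row i sits at absolute position `past + i`, so it may attend to
    // every cached position and to new positions up to and including itself.
    let mask: Vec<u8> = (0..t)
        .flat_map(|i| (0..total).map(move |j| u8::from(j > past + i)))
        .collect();
    Tensor::from_slice(&mask, (t, total), device)
}
```

With past = 0 this reduces to the usual (t, t) lower-triangular mask, so the whole-prompt path keeps its previous behavior.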

The code in candle-examples/examples/llama/main.rs in this PR may need smoothing, but otherwise I've tested the example both with and without the new --prompt-batch-size CLI parameter, and at a variety of batch sizes.
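A minimal sketch of the batched prompt-processing loop the flag enables is below. It is not the PR's code: the helper name is hypothetical, and the forward signature with an explicit Cache argument matches recent candle-transformers versions of the Llama model, which may differ from the exact signature at the time of this PR.

```rust
use candle_core::{Device, Result, Tensor};
use candle_transformers::models::llama::{Cache, Llama};

/// Hypothetical helper: feed the prompt through the model in fixed-size chunks
/// instead of all at once, returning the logits from the final chunk (the only
/// ones needed to sample the first generated token).
fn process_prompt_in_batches(
    llama: &Llama,
    cache: &mut Cache,
    tokens: &[u32],
    prompt_batch_size: usize,
    device: &Device,
) -> Result<Tensor> {
    let mut index_pos = 0;
    let mut last_logits = None;
    for chunk in tokens.chunks(prompt_batch_size) {
        // Build a (1, chunk_len) input tensor, as in the existing llama example.
        let input = Tensor::new(chunk, device)?.unsqueeze(0)?;
        // `index_pos` tells the model how many tokens are already in the cache,
        // which is what the two-argument mask above needs to size itself.
        let logits = llama.forward(&input, index_pos, cache)?;
        index_pos += chunk.len();
        last_logits = Some(logits);
    }
    last_logits.ok_or_else(|| candle_core::Error::Msg("empty prompt".to_string()))
}
```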

@LaurentMazare (Collaborator)

Yeah, the change in the example does seem a bit complex. Maybe this PR should include just the model change, so that users of the candle-transformers crate can benefit from it and we don't need to adapt the example for now.
