Closed
Description
Currently, our top-p (nucleus) sampler is the slowest of all the samplers we offer, likely because of the sort we have to run over the softmax probabilities. See these rough numbers, for example, where top-p is slower than beam search despite needing much less computation from the model itself.
We should see if there is anything we can do to speed up top-p sampling under XLA (e.g. a more XLA-friendly sort op).
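To make the suspected bottleneck concrete, here is a minimal sketch in JAX, not the library's actual sampler. The function names, the default `p`, and the cutoff `k` are all hypothetical. The first function does a full sort over the vocabulary; the second is one possible direction, which restricts the nucleus search to the `k` largest probabilities using `jax.lax.top_k`, so no full sort is needed. The second version is an approximation: if the nucleus needs more than `k` tokens, it is truncated to `k`. Whether it is actually faster under XLA is untested here and would need benchmarking on the target backend.

```python
import jax
import jax.numpy as jnp


def _sample_from_sorted(key, sorted_probs, sorted_ids, p):
    """Sample from the nucleus, given probabilities already in descending order."""
    # Probability mass strictly before each token; the top token always has 0.
    mass_before = jnp.cumsum(sorted_probs, axis=-1) - sorted_probs
    # Keep tokens until the accumulated mass reaches p (top token is always kept).
    keep = mass_before < p
    masked_logits = jnp.where(keep, jnp.log(sorted_probs), -jnp.inf)
    choice = jax.random.categorical(key, masked_logits, axis=-1)
    # Map positions in the sorted order back to vocabulary ids.
    return jnp.take_along_axis(sorted_ids, choice[..., None], axis=-1)[..., 0]


def top_p_sample_full_sort(key, logits, p=0.9):
    """Baseline: full descending sort over the vocabulary axis."""
    probs = jax.nn.softmax(logits, axis=-1)
    order = jnp.argsort(-probs, axis=-1)  # the suspected bottleneck
    sorted_probs = jnp.take_along_axis(probs, order, axis=-1)
    return _sample_from_sorted(key, sorted_probs, order, p)


def top_p_sample_top_k_prefilter(key, logits, p=0.9, k=64):
    """Approximate variant: only the k largest probabilities are considered.

    Assumes k <= vocabulary size. jax.lax.top_k returns values in
    descending order, so no separate sort is required.
    """
    probs = jax.nn.softmax(logits, axis=-1)
    top_probs, top_ids = jax.lax.top_k(probs, k)
    return _sample_from_sorted(key, top_probs, top_ids, p)
```

Both functions take logits of shape `(batch, vocab)` and return one token id per batch row. Comparing the two under `jax.jit` on the target hardware would be one way to check whether the sort is really what dominates the runtime.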