optimize sample_topp by filtering out small value elements up front #276
Conversation
This works because we know that in the worst case only one element will be selected, so the remaining (n-1) elements have to split the remaining (1-topp) probability mass. Any element with probability smaller than (1-topp)/(n-1) can therefore never be part of the top-p set and can be filtered out up front.
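For illustration, here is a minimal C sketch of that cut-off idea, in the spirit of llama2.c's `sample_topp` (the `ProbIndex` struct and parameter names follow llama2.c conventions, but this is a sketch of the approach, not the exact diff of this PR):

```c
#include <stdlib.h>

typedef struct {
    float prob;
    int index;
} ProbIndex;

// sort candidates by probability, descending
static int compare_desc(const void *a, const void *b) {
    const ProbIndex *pa = (const ProbIndex *)a;
    const ProbIndex *pb = (const ProbIndex *)b;
    if (pa->prob > pb->prob) return -1;
    if (pa->prob < pb->prob) return 1;
    return 0;
}

// Top-p (nucleus) sampling: sample from the smallest set of tokens whose
// cumulative probability exceeds topp. `probindex` is caller-provided
// scratch space of size n; `coin` is a uniform random number in [0, 1).
int sample_topp(const float *probabilities, int n, float topp,
                ProbIndex *probindex, float coin) {
    // In the worst case only one token survives the cut, so the other
    // (n-1) tokens share at most (1-topp) probability mass. Tokens below
    // that bound can never be part of the top-p set; drop them up front.
    const float cutoff = (1.0f - topp) / (n - 1);
    int n0 = 0;
    for (int i = 0; i < n; i++) {
        if (probabilities[i] >= cutoff) {
            probindex[n0].index = i;
            probindex[n0].prob = probabilities[i];
            n0++;
        }
    }
    qsort(probindex, n0, sizeof(ProbIndex), compare_desc);

    // truncate to the smallest prefix whose cumulative probability > topp
    float cumulative_prob = 0.0f;
    int last_idx = n0 - 1;
    for (int i = 0; i < n0; i++) {
        cumulative_prob += probindex[i].prob;
        if (cumulative_prob > topp) {
            last_idx = i;
            break;
        }
    }

    // sample from the truncated distribution, rescaled by its total mass
    float r = coin * cumulative_prob;
    float cdf = 0.0f;
    for (int i = 0; i <= last_idx; i++) {
        cdf += probindex[i].prob;
        if (r < cdf) return probindex[i].index;
    }
    return probindex[last_idx].index; // fallback for rounding errors
}
```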
Ah, I didn't see that #270 and #274 came first with the same idea :) For fun, I ran a code generation model on llama2.scala for a while and asked it for suggestions: https://gist.github.com/jrudolph/fb7641ba2406de705c5499280783b55c The suggested algorithms are sometimes of comically bad quality, but some of the ideas seem interesting:
I think after the filtering it doesn't matter much any more; after all, the speed improvement only matters for small models, since sampling speed depends only on vocabulary size, regardless of model size.
Ok, in Scala, scanning improves just the top-p selection step by another 10x (but that's also because the naive idiomatic sorting approach has a high abstraction overhead due to boxing).
Thank you for a nice PR!
Here's a small report on my experiments trying out different top-p algorithms: https://blog.virtual-void.net/2023/08/29/calculating-top-p/
Please also consider #313, constant cut-off.
Refs #246
With the cut-off described above, for p = 0.9 usually only 100-1000 tokens remain, speeding up the rest of the sampling process considerably (e.g. with a vocabulary of 32,000 tokens, the cutoff is 0.1/31999 ≈ 3e-6).
(In llama2.scala, I further improved on that by avoiding the sort in most cases. The observation is that the distribution looks power-law-like and only very few elements are ultimately selected, so iteratively scanning the array for the next-best element (a kind of selection sort) while keeping track of the cumulative probability turns out to be a slightly better solution still; see the sketch below.)
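Since the Scala code isn't shown here, the following is a hedged C sketch of that scanning idea (the function name and the fixed-size buffer are illustrative assumptions, not the llama2.scala implementation):

```c
// Sketch of the scan-based variant: instead of sorting all candidates,
// repeatedly scan for the largest remaining probability and accumulate
// it until the cumulative mass exceeds topp (effectively a partial
// selection sort). For power-law-like distributions only a few scans
// are needed. Note: this modifies `probabilities` in place; the fixed
// cap of 64 selected tokens is an illustrative assumption.
int sample_topp_scan(float *probabilities, int n, float topp, float coin) {
    int selected[64];            // indices of the top tokens, in order
    float probs[64];             // their probabilities
    int n_selected = 0;
    float cumulative_prob = 0.0f;

    while (cumulative_prob <= topp && n_selected < 64) {
        // linear scan for the largest not-yet-selected probability
        int best = 0;
        float best_prob = -1.0f;
        for (int i = 0; i < n; i++) {
            if (probabilities[i] > best_prob) {
                best_prob = probabilities[i];
                best = i;
            }
        }
        probabilities[best] = -1.0f; // mark as taken
        selected[n_selected] = best;
        probs[n_selected] = best_prob;
        n_selected++;
        cumulative_prob += best_prob;
    }

    // sample from the selected prefix, rescaled by its total mass
    float r = coin * cumulative_prob;
    float cdf = 0.0f;
    for (int i = 0; i < n_selected; i++) {
        cdf += probs[i];
        if (r < cdf) return selected[i];
    }
    return selected[n_selected - 1]; // fallback for rounding errors
}
```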