
Min P Test Build (koboldcpp)

@kalomaze released this 28 Oct 13:11
· 1113 commits to concedo since this release
6a4d9c2

Min P sampling added. Setting Top P = 0.69 overrides Top P and scales based on 'Min P' instead.
The way that it works is:

  • Every possible token has a probability (expressed as a percentage) attached to it.
  • The base Min P value is the starting required percentage. (For example, 0.05 = only include tokens that are at least 5% probable.)
  • This threshold is then scaled by the probability of the top token in the list. So if your top token is 90% probable, that 5% gets multiplied by 0.9, giving 4.5%.
  • In other words, if the top token is 90% probable and your base_min_p is set to 0.05, only tokens that are at least 4.5% probable are kept for sampling before temperature is applied.
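
For illustration, here is a minimal sketch of that filtering step in Python. This is not the koboldcpp source; the function name and the dict-based representation of the token probabilities are assumptions made for readability.

```python
# Minimal sketch of Min P filtering (illustrative only, not the koboldcpp code).
# `probs` is assumed to be a dict of token -> probability, already normalized.

def min_p_filter(probs: dict, base_min_p: float = 0.05) -> dict:
    top_prob = max(probs.values())      # probability of the single most likely token
    threshold = base_min_p * top_prob   # e.g. 0.05 * 0.90 = 0.045 (4.5%)
    # Keep only tokens at least as probable as the scaled threshold.
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    # Renormalize the survivors so they can be sampled from (before temperature).
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# With a 90% top token and base_min_p = 0.05, tokens under 4.5% are dropped:
print(min_p_filter({"the": 0.90, "a": 0.06, "an": 0.03, "this": 0.01}))
```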

This method seems more effective at selecting reasonable tokens than either Top P or Top K.

Edit the SamplerBaseMinP.txt file to change the base 'consideration' value. The default is 0.05 (5%), but lower values can work surprisingly well even with a high temperature.
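The exact format of the file isn't shown here, but assuming it simply holds the numeric value, it would look something like:

```
0.05
```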

  • SET TOP K TO ZERO, AND TOP P = 0.69 TO ENABLE THE OVERRIDE

TEMPERATURE OVERRIDES

As with past builds, certain temperature values trigger overrides for Dynamic Temperature. Dynamic Temp is not required in order to use Min P, but the two can be used in tandem. Dynamic Temp is more experimental than Min P.

  • 1.84 Temp
    Entropy Sampling: uses a power function & the SamplerTemp.txt file to control the values.
  • Advantages: This was the one that seemed most reliable, at least before I made Min P, but I've also gotten good results with other samplers like Min P and even without them, because the entropy measurement adapts to the full distribution. If you use this, I'd primarily test with an exponentVal of 1.0. (See the sketch after this list.)
  • 2.0 Temp
    Gini Sampling (actually HHI sampling, the name is a misnomer, sorry): uses a power function & the SamplerTemp.txt file to control the values.
  • Advantages: This sums the squares of the probabilities to cleanly measure how concentrated the distribution is on a 0.0 - 1.0 scale, so the measurement is consistent regardless of the distribution. I had mixed results with this, but maybe it works better now that Min P exists?
  • 1.91 Temp
    Greedy Dynamic Temp (aka DynaTemp), the original implementation: uses a sigmoid & the DynaTemp.txt values. It's the simplest implementation in terms of how it scales, since it only considers the top token and guesses uncertainty from that single metric. Outdated(?)
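
As a rough illustration of the two distribution-wide measurements described above, here is a Python sketch. The power-curve mapping to a temperature and the min/max temperature bounds are assumptions for the example; the real builds read their parameters from SamplerTemp.txt / DynaTemp.txt and the shipped formulas may differ.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of the distribution, scaled to the 0..1 range."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs)) if len(probs) > 1 else 0.0

def hhi_concentration(probs):
    """HHI: sum of squared probabilities (0..1). Higher = more concentrated."""
    return sum(p * p for p in probs)

def dynamic_temperature(probs, min_temp=0.5, max_temp=1.5, exponent_val=1.0):
    """Map uncertainty onto a temperature range via a power curve (assumed form)."""
    uncertainty = normalized_entropy(probs) ** exponent_val
    return min_temp + (max_temp - min_temp) * uncertainty

probs = [0.90, 0.06, 0.03, 0.01]
print(hhi_concentration(probs))    # ~0.81: probability mass is concentrated
print(dynamic_temperature(probs))  # ~0.80: fairly certain, so temperature stays low
```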

Graphic Explanation of Min P:

[Image: graphic explanation of Min P]