Min P sampler implementation [alternative to Top P/Top K] #3841
Conversation
The current implementation:
This is of course suboptimal in a lot of ways, but when drafting sampler ideas, I wanted to avoid touching the existing sampler stack order until I found a solution. What would be the best way to integrate this if the objective was to avoid Top P and Top K's flaws via an improved single sampler that is not intended to be used in tandem with them? (Maybe they should be disabled when this is enabled, like how Mirostat disables other samplers?)
+ fixed 0.0 default for min_p
I like this. It is easy to implement. Regarding defaults: I believe that disabling all samplers and penalties, with temperature == 1, is the best way to avoid token and context repetition and to preserve statistically human-like prediction, because that shows the model's abilities.
If we did something like that, the defaults would have to be cleared as soon as you override any of them. Which might be confusing if a user just wants to override e.g. top-p and leave the other parameters alone. I've tried using models with minimal samplers (e.g. just min-p, or min-p and top-p=.9x), but I had to fall back on my favorite default preset of Midnight Enigma (temp=1 top-p=0.37 rep-pen=1.18 top-k=100) after the models I tried seemed to have a hard time staying on topic and remaining coherent.
Top P 0.37 seems aggressively deterministic to me; you're picking two or three choices at most 99% of the time. Your repetition penalty is pretty high as well, which probably helps counteract the determinism, but I try to avoid rep pen because it's a bit of a 'hacky' solution to the problem of overly high determinism compared to turning up Temp. I would try lowering Rep Pen a bit, turning Top P off (setting it to 1.0), and using a Min P on the deterministic side (e.g. 0.25) to get similar effects to what you want.
In my experience, the TGWUI preset called "Midnight Enigma" (possibly with top-p increased to .57) is very good for Alpaca-style instruction-based prompting with certain models. Free-form prompting with models like Chronoboros does seem to benefit from using e.g. min-p=.25 instead. |
* Update server.cpp with min_p after it was introduced in #3841 * Use spaces instead of tabs * Update index.html.hpp after running deps.sh * Fix test - fix line ending
Closes abetlen#911 Implement min_p sampling as described in ggerganov/llama.cpp#3841 Most of the actual work was already done; I just added the parameters to Llama.sample, Llama.generate, Llama.create_completion, and Llama.create_chat_completion. Tested and working as expected, as far as I can tell.
* Added support for min_p My small contribution to this great project. Ref: ggerganov/llama.cpp#3841 Closes: #911 * Fix for negative temp (sample_softmax)
…gerganov#3841) * Introduce the new Min-P sampler by @kalomaze The Min-P sampling method was designed as an alternative to Top-P, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token. * Min-P enabled and set to 0.05 default --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Some languages do not have full-word tokens, so you will end up penalizing subwords or individual characters.
Having experimented with using strictly
My impression after brief testing: the current default order certainly provides more deterministic output in most cases, including with uninformed tweaking of the sampler settings, but it probably limits the user's control. Relevant: #4091
NMS (Non-Maximum Suppression) from the image object detection task also uses a probability threshold, similarly to Min P.
The way that this sampler works is:
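(The original step list is not reproduced in this excerpt.) Based on the description elsewhere in this thread — the parameter *p* is a minimum probability for a token to be considered, relative to the probability of the most likely token — a minimal Python sketch of the mechanism might look like this (illustrative only, not the actual C++ implementation):

```python
def min_p_filter(probs, min_p=0.05):
    """Keep only tokens whose probability is at least min_p times
    the probability of the most likely token.

    probs: list of (token, probability) pairs after softmax.
    """
    p_max = max(p for _, p in probs)
    threshold = min_p * p_max  # cutoff scales with the top token's probability
    return [(tok, p) for tok, p in probs if p >= threshold]

# With min_p = 0.1 and a top token at 0.6, the cutoff is 0.06,
# so the 0.02 tail token is dropped and the rest survive.
candidates = [("the", 0.6), ("a", 0.3), ("an", 0.08), ("xyz", 0.02)]
print(min_p_filter(candidates, min_p=0.1))
```

Note that when the top token is very confident (high `p_max`), the cutoff rises and few alternatives survive; when the distribution is flat, the cutoff drops and more candidates are kept — which is the intended adaptive behavior.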
Top P has a design flaw: numerous tail-end tokens can be considered if the top tokens' scores aren't concentrated enough to add up to the specified Top P value, while TFS and other novel sampler approaches aren't as easily interpretable or consistent as Top P. The primary purpose of the Min P sampler is to address both of these design flaws.
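To make that flaw concrete with a toy distribution (the numbers are invented purely for illustration): on a flat distribution, Top P must keep walking down the tail until the cumulative mass reaches *p*, while a Min P cutoff is computed relative to the top token and drops the tail immediately:

```python
def top_p_filter(probs, top_p):
    """Keep tokens, most probable first, until cumulative probability reaches top_p."""
    kept, total = [], 0.0
    for p in sorted(probs, reverse=True):
        kept.append(p)
        total += p
        if total >= top_p:
            break
    return kept

def min_p_filter(probs, min_p):
    """Keep tokens with probability >= min_p * (max probability)."""
    cutoff = min_p * max(probs)
    return [p for p in probs if p >= cutoff]

# A flat distribution: no token dominates, so Top P's budget reaches deep
# into the tail, while Min P's cutoff (0.5 * 0.12 = 0.06) excludes it.
flat = [0.12] * 5 + [0.04] * 10  # sums to 1.0
print(len(top_p_filter(flat, 0.9)))   # keeps 13 tokens, including 8 tail tokens
print(len(min_p_filter(flat, 0.5)))   # keeps only the 5 strongest tokens
```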
The current implementation is very rough around the edges code-wise, as I am not very experienced with C++, but I hope to properly polish this implementation to be considered for merging. I have gotten improved results personally and positive feedback from other users, especially with regard to increased coherent creativity.
Mathematically, it is not as complex as TFS or other tail-search algorithms, but importantly, it is easy to understand, both in its design and in how it impacts the probabilities. It is essentially a streamlined linear version of Top A in design. However, it consistently outperforms Top P and Top K at removing tail-end tokens.
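To make the "linear version of Top A" remark concrete (using Top A's cutoff as it is commonly described — a quadratic in the top probability; treat the exact constants here as assumptions for illustration): Min P's cutoff scales linearly with the top token's probability, which makes its behavior easier to predict across both peaky and flat distributions:

```python
def min_p_cutoff(p_max, min_p=0.1):
    # Min P: cutoff is linear in the top token's probability.
    return min_p * p_max

def top_a_cutoff(p_max, a=0.2):
    # Top A (as commonly described): cutoff is quadratic in the top probability,
    # so it collapses much faster as the distribution flattens.
    return a * p_max ** 2

for p_max in (0.9, 0.5, 0.1):
    print(p_max, min_p_cutoff(p_max), top_a_cutoff(p_max))
```

With these example constants the two cutoffs agree at `p_max = 0.5`, but on a flat distribution (`p_max = 0.1`) the quadratic Top A cutoff falls to 0.002 versus Min P's 0.01, so Top A admits far more of the tail there.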