Support Adept Persimmon 8b #3410
Conversation
Let's resolve the CI failures and merge.
Force-pushed from 92acb44 to 5d259d3
The switches in …
@phillip-kravtsov PTAL at @slaren's comment and fix as necessary.
I got tired of seeing the compiler warning and created #3535 (not sure if there are any other issues; I haven't had a chance to test it yet).
Thanks for the fix @KerfuffleV2 -- that PR should be sufficient.
…example

* 'master' of github.com:ggerganov/llama.cpp:
  py : change version of numpy requirement to 1.24.4 (ggerganov#3515)
  quantize : fail fast on write errors (ggerganov#3521)
  metal : support default.metallib load & reuse code for swift package (ggerganov#3522)
  llm : support Adept Persimmon 8B (ggerganov#3410)
  Fix for ggerganov#3454 (ggerganov#3455)
  readme : update models, cuda + ppl instructions (ggerganov#3510)
  server : docs fix default values and add n_probs (ggerganov#3506)
* Produces garbage output
* wip: correct tensors up to RoPE
* correct tensors thru RoPE
* Correct outputs through masked & softmax'd KQ
* fp32 works
* Rename adept->persimmon
* Produces correct outputs
* clean up convert scripts
* remove printing logic from ggml.c
* remove prints from llama.cpp & fix merge
* trivial cleanups
* Add offload funcs
* update conversion script to directly take adept artifacts rather than .saftensors file
* Fix norm eps bug
* Support sqr and concat on metal, persimmon-8b-q4 runs correctly
* Small changes from review
* Formatting changes
* Minor changes to conversion script
* Remove old script
* Fix editorconfig formatting
* Fix build
* add overlooked offload code

ggml-ci
To support partial RoPE and squared ReLU, this PR adds concat and square kernels for Metal.
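For reference, here is a minimal NumPy sketch of the two operations those kernels enable: squared ReLU (the square of a ReLU activation) and partial RoPE, where rotary embedding is applied to only the first `rot_dim` dimensions and the remainder is passed through unchanged, which is why a concat is needed. The function names, the half-split rotation layout, and the frequency base of 10000 are illustrative assumptions, not the exact PR implementation.

```python
import numpy as np

def squared_relu(x):
    # Squared ReLU: max(0, x)^2
    return np.maximum(x, 0.0) ** 2

def partial_rope(x, rot_dim, pos, base=10000.0):
    # Rotate only the first `rot_dim` dims of the last axis;
    # the remaining dims pass through, then are concatenated back.
    rot, rest = x[..., :rot_dim], x[..., rot_dim:]
    half = rot_dim // 2
    freqs = base ** (-2.0 * np.arange(half) / rot_dim)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = rot[..., :half], rot[..., half:]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, rest], axis=-1)
```

At position 0 the rotation is the identity, and the trailing `head_dim - rot_dim` dimensions are always untouched regardless of position.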
I've confirmed agreement between the GGML and HF implementations up to the tensor values in the last layer.
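A check like this is typically done by dumping intermediate tensors from both implementations and comparing them layer by layer. A hypothetical helper for that comparison (not part of the PR) might look like:

```python
import numpy as np

def max_abs_diff(a, b):
    # Largest elementwise discrepancy between two implementations'
    # outputs for the same input; useful for layer-by-layer debugging.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    assert a.shape == b.shape, f"shape mismatch: {a.shape} vs {b.shape}"
    return float(np.max(np.abs(a - b)))
```

Comparing in float64 avoids the comparison itself introducing rounding error; small nonzero differences are still expected when one side runs in fp16 or quantized precision.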