
[Feature] Add llama 3 model #2239

Closed
qiweiii opened this issue Apr 19, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@qiweiii

qiweiii commented Apr 19, 2024

Feature Request

Are we going to support the Llama 3 model? https://github.com/meta-llama/llama3

@qiweiii qiweiii added the enhancement New feature or request label Apr 19, 2024
@woheller69
Contributor

For me it works, but there is an issue: after the first answer, the end of the answer does not seem to be detected and the CPU stays at 100%.
A second question then goes unanswered. If you stop the model's answer before the end, it works.

@davidsilvasmith

Yes, very much hoping for Llama 3 in GPT4All!

@woheller69
Contributor

This fixes the issue for me with GGUFs:
https://www.reddit.com/r/LocalLLaMA/comments/1c7dkxh/tutorial_how_to_make_llama3instruct_ggufs_less/

Problem: Llama 3 uses two different stop tokens, but llama.cpp only supports one. The instruct models end each assistant turn with <|eot_id|> (token id 128009), but the GGUF metadata declares <|end_of_text|> (token id 128001) as the EOS token, so llama.cpp never detects the end of the answer.

Solution: edit the GGUF file so it uses the correct stop token:

./gguf-py/scripts/gguf-set-metadata.py /path/to/llama-3.gguf tokenizer.ggml.eos_token_id 128009
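
If you want to check what a file currently declares before patching it, here is a minimal sketch using the gguf-py package that ships with llama.cpp (the same package the script above comes from). The path is a placeholder, and the field-access pattern assumes gguf-py's reader API:

```python
# Minimal sketch: print the EOS token id a GGUF file declares.
# Assumes `pip install gguf` (llama.cpp's gguf-py package);
# MODEL_PATH is a placeholder, substitute your own file.
from gguf import GGUFReader

MODEL_PATH = "/path/to/llama-3.gguf"

reader = GGUFReader(MODEL_PATH)
field = reader.get_field("tokenizer.ggml.eos_token_id")
if field is None:
    print("no tokenizer.ggml.eos_token_id field in this file")
else:
    # A scalar field stores its single value in the part indexed by data[0].
    eos_id = int(field.parts[field.data[0]][0])
    print(f"tokenizer.ggml.eos_token_id = {eos_id}")
    # Llama 3 instruct wants 128009 (<|eot_id|>), not 128001 (<|end_of_text|>).
```

If this prints 128001, run the gguf-set-metadata.py command above to switch it to 128009.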

@davidsilvasmith

Thank you, very helpful! I'm on an M1 MacBook Air with 16 GB of RAM.

I downloaded this model https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf, put it in the ~/Library/Application\ Support/nomic.ai/GPT4All/ directory, and it's working great at about 4.1–4.4 tokens per second with my RAM full and about 7.5 GB of swap.
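
In case it helps anyone else on macOS, here is a small sketch of that download step using the huggingface_hub Python package instead of the browser. The repo and filename come from the link above, and the target directory is the GPT4All model folder just mentioned:

```python
# Minimal sketch, assuming `pip install huggingface_hub`.
# Downloads the Q5_K_M quant linked above into the directory
# GPT4All scans for .gguf models on macOS.
from pathlib import Path
from huggingface_hub import hf_hub_download

model_dir = Path.home() / "Library" / "Application Support" / "nomic.ai" / "GPT4All"

hf_hub_download(
    repo_id="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
    local_dir=model_dir,
)
```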

@davidsilvasmith

Maybe I spoke too soon: in another test it kept talking and didn't stop until I told it to. So maybe I need to fix the stop token too.

@woheller69
Contributor

No, I think in that case you need to change the maximum number of tokens to generate, or manually press stop.

@dontcryme

Fixed instruct model link: https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
I tested the above model and it works 100%.
