
[Feature] Add llama 3 model #2239

Closed
qiweiii opened this issue Apr 19, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@qiweiii

qiweiii commented Apr 19, 2024

Feature Request

Are we going to support the Llama 3 model? https://github.com/meta-llama/llama3

@qiweiii qiweiii added the enhancement New feature or request label Apr 19, 2024
@woheller69
Contributor

For me it works, but there is an issue: after the first answer, the end of the answer does not seem to be detected and the CPU stays at 100%.
A second question then goes unanswered. If you stop the model's answer before the end, it works.

@davidsilvasmith

Yes, very much hoping for Llama 3 in GPT4All!

@woheller69
Contributor

This fixes the issue for me with GGUFs:
https://www.reddit.com/r/LocalLLaMA/comments/1c7dkxh/tutorial_how_to_make_llama3instruct_ggufs_less/

Problem: Llama 3 uses two different stop tokens, but llama.cpp only supports one. The instruct models end each assistant turn with <|eot_id|> (token id 128009), but the GGUF metadata declares <|end_of_text|> (token id 128001) as the EOS token, so llama.cpp never detects the end of the answer.

Solution: edit the GGUF file so it uses the correct stop token:

./gguf-py/scripts/gguf-set-metadata.py /path/to/llama-3.gguf tokenizer.ggml.eos_token_id 128009
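
If you want to check what a file currently declares before patching it, here is a minimal sketch using the gguf-py package that ships with llama.cpp (the same package the script above comes from). The path is a placeholder, and the field-access pattern assumes gguf-py's reader API:

```python
# Minimal sketch: print the EOS token id a GGUF file declares.
# Assumes `pip install gguf` (llama.cpp's gguf-py package);
# MODEL_PATH is a placeholder, substitute your own file.
from gguf import GGUFReader

MODEL_PATH = "/path/to/llama-3.gguf"

reader = GGUFReader(MODEL_PATH)
field = reader.get_field("tokenizer.ggml.eos_token_id")
if field is None:
    print("no tokenizer.ggml.eos_token_id field in this file")
else:
    # A scalar field stores its single value in the part indexed by data[0].
    eos_id = int(field.parts[field.data[0]][0])
    print(f"tokenizer.ggml.eos_token_id = {eos_id}")
    # Llama 3 instruct wants 128009 (<|eot_id|>), not 128001 (<|end_of_text|>).
```

If this prints 128001, run the gguf-set-metadata.py command above to switch it to 128009.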

@davidsilvasmith

Thank you, very helpful! I'm on an M1 MacBook Air with 16 GB of RAM.

I downloaded this model https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf, put it in the ~/Library/Application\ Support/nomic.ai/GPT4All/ directory, and it's working great at about 4.1–4.4 tokens per second with my RAM full and about 7.5 GB of swap.
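
In case it helps anyone else on macOS, here is a small sketch of that download step using the huggingface_hub Python package instead of the browser. The repo and filename come from the link above, and the target directory is the GPT4All model folder just mentioned:

```python
# Minimal sketch, assuming `pip install huggingface_hub`.
# Downloads the Q5_K_M quant linked above into the directory
# GPT4All scans for .gguf models on macOS.
from pathlib import Path
from huggingface_hub import hf_hub_download

model_dir = Path.home() / "Library" / "Application Support" / "nomic.ai" / "GPT4All"

hf_hub_download(
    repo_id="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
    local_dir=model_dir,
)
```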

@davidsilvasmith

Maybe I spoke too soon: in another test it kept talking and didn't stop until I told it to. So maybe I need to fix the stop token too.

@woheller69
Contributor

No, I think in that case you need to change the maximum number of tokens to generate, or manually press stop.

@dontcryme

Fixed instruct model link: https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
I tested the above model and it works 100%.
