Support for Phi-3 models #58

Closed
retteghy opened this issue Apr 23, 2024 · 7 comments

Comments

@retteghy

See Hugging Face for the models.

@guinmoon
Owner

Hi. Works normally with this template:


<|user|>
{{prompt}}<|end|>
<|assistant|>

And the BOS option enabled.
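
For anyone unsure how the template is applied: below is a minimal Swift sketch (not LLMFarm's actual code; the function name is made up for illustration) of how the {{prompt}} placeholder expands into the string that gets tokenized.

```swift
import Foundation

// Hypothetical illustration of expanding the Phi-3 chat template above.
// The BOS token (<s>) is normally prepended by the tokenizer when the BOS
// option is enabled, so it does not appear in the template text itself.
func renderPhi3Prompt(userMessage: String) -> String {
    let template = """
    <|user|>
    {{prompt}}<|end|>
    <|assistant|>
    """
    return template.replacingOccurrences(of: "{{prompt}}", with: userMessage)
}

print(renderPhi3Prompt(userMessage: "What is a GGUF file?"))
// <|user|>
// What is a GGUF file?<|end|>
// <|assistant|>
```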

@paulilioaica

Hi. How can I make it generate until EOS? If I select the option, the app crashes.

@retteghy
Author

Hi. Works normally with this template:


<|user|>
{{prompt}}<|end|>
<|assistant|>

And the BOS option enabled.

BOS is enabled and I have set that prompt, but I am getting an error as a reply to every message:
Load Model Error: [Error]
modelLoad Error
Load Model Error: [Done]

@jekriske-lilly

@guinmoon, when you say "works normally", are you referring to the development version or the version in the App Store?

The stable version from the App Store isn't honoring the end token, and the app crashes if you try enabling EOS.

@guinmoon
Owner

development version

@Cimplex

Cimplex commented Apr 24, 2024

Hi. Works normally with this template:


<|user|>
{{prompt}}<|end|>
<|assistant|>

And the BOS option enabled.

BOS is enabled and I have set that prompt, but I am getting an error as a reply to every message:
Load Model Error: [Error]
modelLoad Error
Load Model Error: [Done]

In the TestFlight version I’m using ‘Phi-3-mini-4k-instruct-q4.gguf’

When setting up, I used the “Phi 2” setting template and then wrote the recommended prompt. On my iPhone 14 Pro I’m getting around 2-5 tokens per second.

Sometimes the <|end|> tag isn’t handled correctly, and it just skips over it and starts a new answer.
[screenshot: IMG_8352]
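
That behavior looks like the <|end|> tag not being treated as a stop sequence. Purely as an illustration (a hypothetical helper, not LLMFarm's code), a client could trim the streamed output at the first occurrence of the tag instead of letting the model start a new answer:

```swift
import Foundation

// Hypothetical helper: cut generated text at the first <|end|> stop tag.
func truncateAtStopTag(_ generated: String, stopTag: String = "<|end|>") -> String {
    guard let range = generated.range(of: stopTag) else { return generated }
    return String(generated[..<range.lowerBound])
}

let raw = "Paris is the capital of France.<|end|><|user|>Next question"
print(truncateAtStopTag(raw))  // "Paris is the capital of France."
```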

@savkinavmono

savkinavmono commented Apr 25, 2024

Make sure Metal=on, BOS=on, EOS=off, and try setting context size=1024. I got 8-9 tokens/sec.

Officially, Phi-3 is only supported starting with llama.cpp release b2717. The latest LLMFarm commit uses b2692, and the TestFlight version uses b2135, which officially supports only Phi-2.
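
As a quick summary of those suggested settings (illustrative names only, not LLMFarm's actual settings model):

```swift
// Illustrative only: a hypothetical struct mirroring the suggested settings.
struct Phi3ChatSettings {
    var useMetal = true     // Metal = on
    var addBOS = true       // BOS = on
    var stopAtEOS = false   // EOS = off (enabling it crashes the stable build)
    var contextSize = 1024  // smaller context helps speed on-device
}
```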

guinmoon closed this as completed on May 6, 2024.