Description
I'm using your library with phi-2 on an Android device (after updating the llama.cpp version). I've noticed that generation somehow seems to ignore or skip end-of-stream tokens. For example, here's the output from llama.cpp itself:
prompt:
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
output:
<|im_start|>system
You are a helpful assistant
<|im_start|>user
hello
<|im_start|>assistant
Hello! How can I assist you today? [end of text]
When using jllama, it looks like this:
<|im_start|>system
You are a helpful assistant
<|im_start|>user
hello
<|im_start|>assistant
Hello! How can I assist you today?<|im_end|>
<|im_start|>user
[more output]
Note that the output includes the <|im_end|> token as plain text, followed by the start token for the next user prompt, which the model then goes on to fill in itself ;)
Looking at the llama.cpp source, it seems to stop here: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L896. There's a similar condition in jllama here: https://github.com/kherud/java-llama.cpp/blob/master/src/main/cpp/jllama.cpp#L831, but since I'm not that familiar with these bindings or with llama.cpp internals, I haven't figured out how the two conditions differ.
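For reference, the check in main.cpp around that line looks roughly like this (a simplified sketch; the exact variable names and API differ between llama.cpp versions), and it's also where the "[end of text]" marker in the output above appears to come from:

```cpp
// simplified sketch of llama.cpp's stop condition near main.cpp#L896
// (embd holds the tokens from the current decode step; names vary by version)
if (!embd.empty() && embd.back() == llama_token_eos(model)) {
    LOG_TEE(" [end of text]\n");
    break;
}
```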
Debugging this some more, it seems like the token generated when the "faulty" <|im_end|> is emitted is not the end-of-stream token, but the token for <. In other words, the model appears to be spelling the tag out as ordinary text, token by token, so a comparison against a single end-of-stream token id would never match.
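One quick way to check this would be to tokenize the tag directly and compare the result against the model's EOS id. A minimal sketch, assuming the llama.cpp C API of roughly that era (exact signatures vary between versions, and dump_im_end_tokens is just a hypothetical helper name):

```cpp
#include <cstdio>
#include <cstring>
#include <vector>
#include "llama.h"

// Diagnostic sketch: print the model's EOS id and the tokens that
// "<|im_end|>" actually tokenizes to. If the tag splits into several
// ordinary tokens (the first being "<"), the EOS comparison in the
// stop condition can never match what the model generates.
static void dump_im_end_tokens(const llama_model * model) {
    const char * tag = "<|im_end|>";
    std::vector<llama_token> toks(16);
    int n = llama_tokenize(model, tag, (int) strlen(tag),
                           toks.data(), (int) toks.size(),
                           /*add_bos=*/false, /*special=*/true);
    printf("eos id: %d\n", llama_token_eos(model));
    for (int i = 0; i < n; i++) {
        printf("tok[%d] = %d\n", i, toks[i]);
    }
}
```

If the GGUF vocab for phi-2 doesn't map <|im_end|> to a single special token (or special-token parsing is disabled), the tag would come out as several plain tokens here, which would be consistent with the behavior above.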