Hi! I'm running into a problem where the first token of each earlier reply gets repeated at the start of subsequent replies when using streaming. The prompt structure follows the Meta Llama 3 documentation. Could you explain why this is happening?
The output of a simple chat example looks like this:
The model name is meta/meta-llama-3-70b-instruct
You: Hi!
Assistant: Hi! How can I help you today?
You: Recommend me a Hemingway novel, please.
Assistant: Hi
I'd recommend "The Old Man and the Sea". It's a classic, concise, and powerful novel that showcases Hemingway's unique writing style.
You: I read it, please recommend something else.
Assistant: Hi
I
How about "A Farewell to Arms"? It's a romantic and tragic novel set during WWI, and it's considered one of Hemingway's best works.
You: It's great! Thank you! Bye!
Assistant: Hi
I
How
You're welcome! I'm glad you enjoyed the recommendation. Have a great day and happy reading! Bye!
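For reference, the prompt that the code below builds for the second turn should look roughly like this (as produced by the gen_llama3_prompt helper, assuming the first reply was stored exactly as printed), which matches the chat format from the Meta Llama 3 docs; the history is only included once per turn:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant. Answer briefly!<|eot_id|><|start_header_id|>user<|end_header_id|>

Hi!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hi! How can I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|>

Recommend me a Hemingway novel, please.<|eot_id|><|start_header_id|>assistant<|end_header_id|>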
Example code:
import os

from replicate.client import Client

replicate_api_key = os.getenv("REPLICATE_API_TOKEN", 'EMPTY')
replicate_model = os.getenv('REPLICATE_MODEL', 'meta/meta-llama-3-70b-instruct')
replicate_client = Client(api_token=replicate_api_key)

SYSTEM_PROMPT = 'You are a helpful assistant. Answer briefly!'
MESSAGES = []


def gen_llama3_prompt(sys_prompt=None, messages=None):
    # Build a Llama 3 chat prompt from the system prompt and the message history.
    sys_prompt = '' if sys_prompt is None else sys_prompt
    messages = [] if messages is None else messages
    _result = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{sys_prompt}<|eot_id|>"
    for m in messages:
        if m['role'] == 'user':
            _result += f'<|start_header_id|>user<|end_header_id|>\n\n{m["content"]}<|eot_id|>'
        elif m['role'] == 'assistant':
            _result += f'<|start_header_id|>assistant<|end_header_id|>\n\n{m["content"]}<|eot_id|>'
    _result += '<|start_header_id|>assistant<|end_header_id|>\n\n'
    return _result


def print_answer(query=''):
    # Append the user message, stream the reply token by token, then store it.
    message = {'role': 'user', 'content': query}
    answer = ''
    MESSAGES.append(message)
    for event in replicate_client.stream(
            "meta/meta-llama-3-70b-instruct",
            input={
                "top_p": 1e-5,
                "prompt": gen_llama3_prompt(SYSTEM_PROMPT, MESSAGES),
                "max_tokens": 512,
                "min_tokens": 0,
                "temperature": 1e-6
            }):
        token = str(event)
        answer += token
        print(token, end='')
    message = {'role': 'assistant', 'content': answer}
    MESSAGES.append(message)


if __name__ == '__main__':
    print(f'Model name is {replicate_model}')
    while True:
        q = input('\nYou: ')
        print('Assistant: ', end='')
        print_answer(q)
        if 'bye' in q.lower():
            break
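In case it helps with debugging, here is a minimal sketch of how the raw streaming events could be inspected to see whether the stray leading token is actually emitted by the prediction or added on the client side. It assumes the stream yields server-sent-event objects exposing .event and .data attributes (as recent replicate-python versions do); getattr() is used defensively in case the attribute names differ, and the two-turn prompt is inlined only so the snippet runs on its own.

import os
from replicate.client import Client

client = Client(api_token=os.getenv("REPLICATE_API_TOKEN", "EMPTY"))

# Same second-turn prompt as in the example above, inlined for a self-contained repro.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant. Answer briefly!<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\nHi! How can I help you today?<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nRecommend me a Hemingway novel, please.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

for i, event in enumerate(client.stream(
        "meta/meta-llama-3-70b-instruct",
        input={"prompt": prompt, "max_tokens": 64, "min_tokens": 0,
               "temperature": 1e-6, "top_p": 1e-5})):
    # Print each raw event, so an extra leading "Hi" shows up as its own entry.
    print(f"{i:03d} event={getattr(event, 'event', '?')!r} "
          f"data={getattr(event, 'data', str(event))!r}")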
Thanks for your help!
Gusakovskyi