# Introduction 

This notebook demonstrates inference with a simple instruction template. The model can only respond to instructions and is poor at remembering earlier conversation context.
 
This is a Phi 1.5 model fine-tuned on the Open Assistant Guanaco dataset (https://huggingface.co/datasets/timdettmers/openassistant-guanaco). Find the fine-tuning notebook in the `assistant_sft` directory.

**NOTE: The notebook uses a customized streamer for text streaming.**

In [1]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    logging,
)

# Use the customized streamer from the local module instead of transformers.TextStreamer.
from streaming_utils import TextStreamer

In [2]:
# Fine-tuned checkpoint directory (see the `assistant_sft` directory).
model_path = '../assistant_sft/outputs/phi_1_5_oasst_guanaco_qlora_unpck/best_model/'

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
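
The deprecation warning above recommends passing a `BitsAndBytesConfig` object through `quantization_config` instead of `load_in_4bit=True`. A minimal sketch of that form follows. The helper name `load_quantized_model` is hypothetical and not part of this notebook, and the sketch assumes `bitsandbytes` and a CUDA device are available when the helper is called.

```python
def load_quantized_model(model_dir):
    """Load a causal LM and its tokenizer with 4-bit quantization (sketch)."""
    # Imported inside the helper so that defining it has no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Replaces the deprecated `load_in_4bit=True` keyword argument.
    quantization_config = BitsAndBytesConfig(load_in_4bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, quantization_config=quantization_config
    )
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    return model, tokenizer
```

Calling `load_quantized_model` with the checkpoint directory used above should load the same weights without triggering the deprecation message.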


In [3]:
streamer = TextStreamer(
    tokenizer, 
    skip_prompt=True, 
    skip_special_tokens=True, 
    truncate_before_pattern=['###', 'Human:', 'Assistant:'],
    truncate=True
)
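
The `truncate_before_pattern` argument suggests that the customized streamer stops printing once the model starts a new turn marker. The implementation in `streaming_utils` is not shown in this notebook, so the following is only an illustrative sketch of that idea on a plain string. The function name `truncate_before` is hypothetical.

```python
def truncate_before(text, patterns):
    """Return `text` cut at the earliest occurrence of any pattern."""
    cut = len(text)
    for pattern in patterns:
        index = text.find(pattern)
        if index != -1:
            # Keep the earliest match across all patterns.
            cut = min(cut, index)
    return text[:cut]


reply = "Hello! How can I help you today?### Human: Hi again"
print(truncate_before(reply, ['###', 'Human:', 'Assistant:']))
# prints: Hello! How can I help you today?
```

A real streamer receives text incrementally, so it would also have to buffer partial matches at chunk boundaries; the sketch ignores that.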

In [4]:
print(tokenizer.eos_token)

<|endoftext|>


In [5]:
logging.set_verbosity(logging.CRITICAL)

In [6]:
template = "### Human: {prompt}### Assistant:"
eos_string = tokenizer.eos_token
history = None
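
The template and the `history` variable together define how a multi-turn prompt is assembled: each question is formatted with the template, and earlier turns, each closed with the EOS token, are prepended. The sketch below mirrors that logic with plain strings so it can run without the model. The function names `build_prompt` and `update_history` are hypothetical, and the EOS string is hard-coded to the value printed above.

```python
TEMPLATE = "### Human: {prompt}### Assistant:"
EOS = "<|endoftext|>"


def build_prompt(question, history=None):
    """Format one turn and prepend the earlier conversation, if any."""
    turn = TEMPLATE.format(prompt=question)
    return turn if history is None else history + "\n" + turn


def update_history(prompt, answer):
    """Append the answer and an EOS marker to the prompt that produced it."""
    return "\n".join([prompt, answer, EOS])


first = build_prompt("Hi.")
history_so_far = update_history(first, "Hello! How can I help you today?")
second = build_prompt("Who are you?", history_so_far)
print(second)
```

Because the whole history is re-sent on every turn, the prompt grows with each exchange.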

In [7]:
while True:
    question = input("Question: ")

    # Prepend the conversation so far (if any) to the newly formatted turn.
    turn = template.format(prompt=question)
    prompt = turn if history is None else history + '\n' + turn

    prompt_tokenized = tokenizer(prompt, return_tensors='pt').to('cuda')['input_ids']

    output_tokenized = model.generate(
        input_ids=prompt_tokenized,
        max_new_tokens=400,  # cap on generated tokens, independent of prompt length
        temperature=0.7,
        top_k=40,
        top_p=0.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer
    )
    # Decode only the newly generated tokens, skipping the prompt.
    answer = tokenizer.decode(token_ids=output_tokenized[0][len(prompt_tokenized[0]):]).strip()

    # Drop the EOS token and anything the model generated past its own turn.
    if eos_string in answer:
        answer = answer.split(eos_string)[0].strip()
    if '###' in answer:
        answer = answer.split('###')[0].strip()

    # Store the full exchange so the next prompt includes it.
    history = '\n'.join([prompt, answer, eos_string])
    print('#' * 50)

Question:  Hi.


Hello! How can I help you today?
##################################################


Question:  Who are you?


 I am an AI language model, I am trained to understand and respond to questions in a conversational style. I am also trained to provide helpful and informative answers to questions.
##################################################


Question:  What are you?


 I am an AI language model, I am trained to understand and respond to questions in a conversational style. I am also trained to provide helpful and informative answers to questions.
##################################################


KeyboardInterrupt: Interrupted by user