# Introduction 

This notebook implements a simple chat loop for multi-turn conversation. The full conversation history is sent with every new question, so the model can draw on earlier turns, within the limits of its context window and its (small) capacity.
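The following is a minimal sketch of how the conversation history becomes a prompt. It assumes the `[INST] ... [/INST]` format used in the chat loop of this notebook. The helper names `render_prompt` and `fit_to_budget` are hypothetical. The token counter is passed in as a function so the sketch runs without loading a model. With the real model you could pass something like `lambda s: len(tokenizer(s)['input_ids'])` and a budget of `model.config.max_position_embeddings` minus the number of new tokens to generate. Dropping the oldest exchanges first is one simple way to keep a growing history inside the context window. The notebook itself does not trim the history.

```python
from typing import Callable, List, Tuple

EOS = "</s>"  # OPT's EOS token string, as printed by tokenizer.eos_token


def render_prompt(turns: List[Tuple[str, str]], question: str) -> str:
    """Hypothetical helper: build the prompt in the notebook's format.

    Each past (question, answer) pair becomes "[INST] q [/INST] a </s>",
    pairs are joined by spaces, the whole prompt starts with "</s>", and
    the new question is appended as an open "[INST] ... [/INST]" turn.
    """
    segments = [f"[INST] {q} [/INST] {a} {EOS}" for q, a in turns]
    segments.append(f"[INST] {question} [/INST]")
    return EOS + " ".join(segments)


def fit_to_budget(turns: List[Tuple[str, str]], question: str,
                  count_tokens: Callable[[str], int], budget: int) -> str:
    """Hypothetical helper: drop the oldest exchanges until the prompt fits."""
    kept = list(turns)  # copy, so the caller's history is not modified
    prompt = render_prompt(kept, question)
    while kept and count_tokens(prompt) > budget:
        kept.pop(0)  # forget the oldest exchange first
        prompt = render_prompt(kept, question)
    return prompt  # may still exceed the budget if the question alone is too long


if __name__ == "__main__":
    history = [("Hello", "Hi! How can I help?")]  # toy history, not model output
    print(render_prompt(history, "What is 2+2?"))
    # prints: </s>[INST] Hello [/INST] Hi! How can I help? </s> [INST] What is 2+2? [/INST]
```

For the first question (empty history) this produces `</s>[INST] {question} [/INST]`, the same string the chat loop builds on its first turn.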

The model is OPT-125M fine-tuned on the ChatAlpaca dataset (https://huggingface.co/datasets/flpelerin/ChatAlpaca-10k). The fine-tuning notebook is in the `assistant_sft` directory.

**NOTE: The notebook uses a custom streamer (`TextStreamer` from the local `streaming_utils` module) for text streaming.**

In [1]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    logging,
)

# Custom streamer from the local streaming_utils module (not transformers.TextStreamer).
from streaming_utils import TextStreamer

In [2]:
# Best checkpoint saved by the fine-tuning notebook.
model_path = '../examples/assistant_sft/opt_125m_chat_alpaca/outputs/opt_125m_chat_alpaca/best_model/'

model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

In [3]:
streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
    # Regex for the "[/" marker; the raw string avoids an invalid-escape warning.
    truncate_before_pattern=[r'\[\/'],
    truncate=True
)
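`truncate_before_pattern` belongs to the custom `streaming_utils.TextStreamer`, whose source is not shown here. The sketch below is therefore only a guess at its intent: cut the generated text at the first match of any of the given regular expressions, so that markers such as `[/INST]` produced by the model are not printed. The function name `truncate_before` is hypothetical. A real streamer receives text in chunks, so it would also need to hold back a chunk ending in `[` until it knows whether a `/` follows. This sketch works on the complete text instead.

```python
import re
from typing import Iterable


def truncate_before(text: str, patterns: Iterable[str]) -> str:
    """Hypothetical helper: return `text` cut at the earliest match of any regex."""
    cut = len(text)
    for pattern in patterns:
        match = re.search(pattern, text)
        if match is not None:
            cut = min(cut, match.start())  # keep only text before the earliest marker
    return text[:cut]


if __name__ == "__main__":
    generated = "Sure! Deep learning uses neural networks. [/INST] extra text"
    print(repr(truncate_before(generated, [r"\[\/"])))
    # 'Sure! Deep learning uses neural networks. '
```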

In [4]:
print(tokenizer.eos_token)

</s>


In [5]:
logging.set_verbosity(logging.CRITICAL)  # hide transformers warnings; only CRITICAL messages are shown

In [6]:
# template = """</s>[INST] {prompt} [/INST]"""
eos_string = tokenizer.eos_token  # '</s>' (printed above)
history = None  # running conversation; stays None until the first answer

In [7]:
# print(template)

In [8]:
while True:
    question = input("Question: ")
    # Type 'exit' or 'quit' to leave the chat (interrupting the kernel also works).
    if question.strip().lower() in {'exit', 'quit'}:
        break

    # The first turn is prefixed with '</s>'; later turns are appended to the history.
    if history is None:
        prompt = f"</s>[INST] {question} [/INST]"
    else:
        prompt = f"{history} [INST] {question} [/INST]"

    # Pass the attention mask to generate() together with the input IDs.
    encoded = tokenizer(prompt, return_tensors='pt')
    prompt_length = encoded['input_ids'].shape[1]

    output_tokenized = model.generate(
        **encoded,
        max_new_tokens=256,
        temperature=0.7,
        top_k=40,
        top_p=0.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    # Decode only the new tokens; special tokens are kept so we can split on EOS.
    answer = tokenizer.decode(output_tokenized[0][prompt_length:]).strip()

    # Drop anything after the answer: the EOS token or the start of a "[/" marker.
    answer = answer.split(eos_string)[0].strip()
    answer = answer.split('[/')[0].strip()

    # Store the whole exchange so the next prompt includes it as context.
    history = ' '.join([prompt, answer, eos_string])
    print('#' * 50)

Question:  Hello


Hello! I am writing to inform you that the following email is a spam email: I am writing to inform you that I am not able to send you any emails. I am an AI language model and do not have any specific information about you. Please check your email for any spam emails.
##################################################


Question:  Let's talk about deep learning/


Sure! Deep learning is a type of machine learning that uses neural networks to learn from data. It is a powerful tool that can be used to improve the accuracy of machine learning models. Deep learning models can learn from large amounts of data, such as text, images, and other types of data, and then use this data to make predictions or decisions. Deep learning models can also be trained on large datasets, such as large datasets of customer data, to improve their accuracy.
##################################################


Question:  What is 2+2?


2+2 is a term that refers to a set of data that is not directly related to the model. It is often used to describe a set of data that is not directly related to the model. For example, a dataset of customer reviews or customer service interactions can be used to train deep learning models. 

2+2 can also be used to describe a set of data that is not directly related to the model. For example, a dataset of customer reviews or customer service interactions can be used to train deep learning models. 

In summary, 2+2 is a commonly used term in machine learning. It is often used to describe a set of data that is not directly related to the model.
##################################################


KeyboardInterrupt: Interrupted by user
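The `generate` call in the chat loop samples with `temperature=0.7`, `top_k=40` and `top_p=0.1`. The sketch below is a simplified NumPy illustration of these three standard filters applied to one vector of next-token logits. It is not the transformers implementation, and the function name `filter_probs` is hypothetical. It makes one consequence of these settings visible. Nucleus (top-p) filtering keeps the smallest set of most likely tokens whose total probability reaches `top_p`. With `top_p=0.1`, only the single most likely token survives whenever its probability is at least 0.1, and sampling then behaves like greedy decoding.

```python
import numpy as np


def filter_probs(logits, temperature=0.7, top_k=40, top_p=0.1):
    """Hypothetical helper: temperature, top-k and top-p (nucleus) filtering.

    Returns a probability vector over the vocabulary with filtered-out
    tokens set to zero and the remaining mass renormalized to 1.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature

    # Top-k: mask every logit below the k-th largest one.
    if 0 < top_k < scaled.size:
        kth_largest = np.sort(scaled)[-top_k]
        scaled = np.where(scaled < kth_largest, -np.inf, scaled)

    # Softmax over the surviving logits (masked ones get probability 0).
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep_count = int(np.searchsorted(cumulative, top_p)) + 1
    filtered = np.zeros_like(probs)
    filtered[order[:keep_count]] = probs[order[:keep_count]]
    return filtered / filtered.sum()


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, 0.1]  # toy logits for a 4-token vocabulary
    print(filter_probs(logits))             # top_p=0.1: only token 0 survives
    print(filter_probs(logits, top_p=0.9))  # a wider nucleus keeps 3 tokens
    rng = np.random.default_rng(0)
    token = rng.choice(len(logits), p=filter_probs(logits, top_p=0.9))
```

Raising `top_p` (for example to 0.9) widens the nucleus and makes `do_sample=True` actually sample among several candidates.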