# Introduction

This notebook replicates a simple instruction template. The model can only respond to individual instructions and does not remember earlier context well. The chat template is also built manually here instead of with the tokenizer's chat template functionality (NOT RECOMMENDED, JUST FOR TRIAL AND ERROR).
 
This is a Phi 1.5 model fine-tuned on the Chat Alpaca dataset (https://huggingface.co/datasets/flpelerin/ChatAlpaca-10k). The fine-tuning notebook is in the `assistant_sft` directory.

**NOTE: The notebook uses a customized streamer for text streaming.**
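
As a rough illustration of the manual template described above, the prompt layout used by the chat loop later in this notebook can be rebuilt from a list of past turns. This is only a sketch: `build_prompt` and `EOS` are hypothetical names that do not appear in the notebook, and the layout is inferred from the loop's string handling, not taken from the tokenizer's own chat template.

```python
# Hypothetical helper, not part of the notebook or of transformers.
EOS = "<|endoftext|>"  # assumed to match tokenizer.eos_token for this model


def build_prompt(turns, question, eos=EOS):
    """Rebuild the manual [INST] ... [/INST] prompt.

    turns: list of (question, answer) pairs already exchanged.
    question: the new user question.
    """
    segments = []
    for past_question, past_answer in turns:
        # Each finished turn contributes the instruction, the answer, and an EOS marker.
        segments += [f"[INST] {past_question} [/INST]", past_answer, eos]
    # The new question is appended last, without an answer.
    segments.append(f"[INST] {question} [/INST]")
    # The whole prompt starts with EOS, with no space after it.
    return eos + " ".join(segments)


print(build_prompt([], "Let's talk about deep learning."))
# <|endoftext|>[INST] Let's talk about deep learning. [/INST]
```

Because every finished turn stays in the prompt, the prompt grows with each question, which is one reason a model with limited context handling degrades over a long conversation.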

In [1]:
from transformers import (
    AutoTokenizer,
    pipeline,
    logging,
)
from streaming_utils import TextStreamer
from peft import AutoPeftModelForCausalLM

In [2]:
model = AutoPeftModelForCausalLM.from_pretrained(
    '../examples/assistant_sft/phi_1_5_chat_alpaca/outputs/phi_1_5_chat_alpaca/best_model/',
    load_in_4bit=True
)
tokenizer = AutoTokenizer.from_pretrained(
    '../examples/assistant_sft/phi_1_5_chat_alpaca/outputs/phi_1_5_chat_alpaca/best_model/',
)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:
streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
    truncate_before_pattern=[r'\[\/', 'Goodbye'],  # raw string avoids invalid escape sequences
    truncate=True
)
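
The source of the customized `TextStreamer` in `streaming_utils` is not shown here, so what `truncate_before_pattern` does can only be guessed. One plausible reading, given the escaped bracket in the first pattern, is that each entry is a regular expression and the streamed text is cut just before the earliest match. The sketch below illustrates that assumption only; `truncate_before` is a hypothetical function, not the streamer's actual code.

```python
import re


def truncate_before(text, patterns):
    """Return `text` cut just before the earliest match of any pattern.

    Assumption: patterns are regular expressions, as the escaped
    bracket in the notebook's first pattern suggests.
    """
    cut = len(text)
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            # Keep the earliest cut position across all patterns.
            cut = min(cut, match.start())
    return text[:cut]


patterns = [r"\[\/", "Goodbye"]
print(truncate_before("Sure, here it is. [/INST] leftover", patterns))
```

Under this reading, a stray `[/INST]` tag or a trailing "Goodbye" produced by the model would never reach the screen.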

In [4]:
print(tokenizer.eos_token)

<|endoftext|>


In [5]:
logging.set_verbosity(logging.CRITICAL)

In [6]:
eos_string = tokenizer.eos_token
history = None

In [7]:
# print(template)

In [8]:
while True:
    question = input("Question: ")

    # The first turn is prefixed with the EOS token; later turns are appended to the history.
    if history is None:
        template = """<|endoftext|>[INST] {prompt} [/INST]"""
    else:
        template = """[INST] {prompt} [/INST]"""

    turn = template.format(prompt=question)
    prompt = turn if history is None else history + ' ' + turn

    # print(f"PROMPT: {prompt}")

    prompt_tokenized = tokenizer(prompt, return_tensors='pt').to('cuda')['input_ids']

    output_tokenized = model.generate(
        input_ids=prompt_tokenized,
        max_new_tokens=400,  # generate at most 400 tokens beyond the prompt
        temperature=0.7,
        top_k=40,
        top_p=0.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer
    )
    answer = tokenizer.decode(token_ids=output_tokenized[0][len(prompt_tokenized[0]):]).strip()

    # Minimal guardrails: drop anything after the EOS token or a stray '[/' tag.
    if eos_string in answer:
        answer = answer.split(eos_string)[0].strip()
    if '[/' in answer:
        answer = answer.split('[/')[0].strip()

    history = ' '.join([prompt, answer, eos_string])
    # print(f"ANSWER: {answer}\n")
    # print(f"HISTORY: {history}\n")
    print('#' * 50)

Question:  Let's talk about deep learning.




Deep learning is a type of machine learning that uses artificial neural networks to learn from data. It is a type of supervised learning, where the model is trained on a large dataset and can then make predictions on new data. Deep learning is used in a variety of applications, such as image and speech recognition, natural language processing, and computer vision. It is also used in autonomous vehicles, robotics, and other areas where complex tasks are required. Deep learning is a powerful tool that can be used to solve many complex problems. It is an exciting field with many potential applications.

Deep learning is based on the idea of neural networks, which are modeled after the structure and function of the human brain. Neural networks are composed of layers of interconnected nodes, which are used to process and analyze data. The nodes are connected by weights and biases, which are adjusted during training to improve the accuracy of the model. Deep learning algorithms are designed 

KeyboardInterrupt: Interrupted by user
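
The guardrail step in the loop above can be pulled out into a small function, which makes it easy to check without loading the model. The sketch below mirrors the loop's string handling; `clean_answer` and `update_history` are hypothetical names, and the EOS string is assumed to be `<|endoftext|>`, as printed earlier in the notebook.

```python
EOS = "<|endoftext|>"  # assumed value of tokenizer.eos_token


def clean_answer(answer, eos=EOS):
    """Strip the decoded answer and cut it at the EOS token or a stray '[/' tag."""
    answer = answer.strip()
    if eos in answer:
        answer = answer.split(eos)[0].strip()
    if "[/" in answer:
        answer = answer.split("[/")[0].strip()
    return answer


def update_history(prompt, answer, eos=EOS):
    """Append the cleaned answer and an EOS marker to the running prompt."""
    return " ".join([prompt, answer, eos])


raw = " Deep learning uses neural networks.<|endoftext|>[INST] next"
answer = clean_answer(raw)
history = update_history("<|endoftext|>[INST] What is deep learning? [/INST]", answer)
print(history)
```

Keeping the cleanup separate from generation also makes it straightforward to add further stop markers later if the model emits other leftover tags.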