# Introduction 

This notebook demonstrates a simple multi-turn chat loop built on a chat template. The conversation history is fed back into the prompt on every turn, so the model can draw on earlier context up to the limit of its context window. We use the *chat template* format of the tokenizer.
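
As a rough illustration of the prompt format used in this notebook, the sketch below builds a ChatML-style prompt from a list of messages in plain Python. The function name `build_chat_prompt` and the message dictionaries are assumptions made for this example; the spacing mirrors the template string used in the chat loop later in the notebook. In `transformers`, `tokenizer.apply_chat_template` does a similar job when the tokenizer ships a chat template, but whether this fine-tuned tokenizer defines one is not shown here.

```python
def build_chat_prompt(messages, add_generation_prompt=True):
    """Format messages in the ChatML-like layout used in this notebook.

    `messages` is a list of {"role": ..., "content": ...} dicts
    (an assumed structure for this sketch).
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|> {message['role']}\n{message['content']} <|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|> assistant\n")
    return "".join(parts)


prompt = build_chat_prompt([{"role": "user", "content": "How are you?"}])
print(prompt)
```

For a single user message this produces the same string as the hand-written template in the chat loop.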

This is a Phi 1.5 model fine-tuned on the ORPO dataset (https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k). The fine-tuning notebook is in the `assistant_sft` directory.

**NOTE: The notebook uses a customized streamer for text streaming.**
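
The `streaming_utils` module is not reproduced in this notebook, so its internals are unknown here. As a hedged sketch of the kind of post-processing the notebook relies on (the chat loop later cuts the answer at a stray `[/`), the hypothetical helper below truncates text at the earliest match of any stop pattern. The name `truncate_before_patterns` and the sample string are assumptions, not the custom streamer's actual API.

```python
import re


def truncate_before_patterns(text, patterns):
    """Return `text` cut off at the earliest match of any regex in `patterns`."""
    cut = len(text)
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            # Remember the earliest position at which any pattern starts.
            cut = min(cut, match.start())
    return text[:cut].rstrip()


raw = "Deep learning uses neural networks. [/INST] extra text"
print(truncate_before_patterns(raw, [r"\[/", r"Goodbye"]))
```

Text without any match is returned unchanged apart from trailing whitespace.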

In [8]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    logging,
    # TextStreamer
)

from streaming_utils import TextStreamer
from peft import PeftModel

In [9]:
model = AutoModelForCausalLM.from_pretrained('microsoft/phi-1_5')
tokenizer = AutoTokenizer.from_pretrained('../assistant_sft/outputs/phi_1_5_chat_alpaca_orpo/best_model/')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [10]:
model.resize_token_embeddings(len(tokenizer))

Embedding(50297, 2048)

In [11]:
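# Attach the fine-tuned adapter weights to the base model and move it to the GPU.
# Note: `load_in_4bit` is normally passed when loading the base model;
# it likely has no effect when passed here.
model = PeftModel.from_pretrained(
    model,
    '../assistant_sft/outputs/phi_1_5_chat_alpaca_orpo/best_model/',
    load_in_4bit=True
).to('cuda')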

In [12]:
# Custom streamer imported from streaming_utils (not the Hugging Face TextStreamer).
streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
    # truncate_before_pattern=['\[\/', 'Goodbye'],
    truncate=True
)

In [13]:
eos_string = tokenizer.eos_token
history = None

In [14]:
while True:
    question = input("Question: ")

    # ChatML-style template; the assistant turn is left open for generation.
    template = """<|im_start|> user\n{question} <|im_end|>\n<|im_start|> assistant\n"""

    # Prepend the running conversation history, if any, to the new turn.
    turn = template.format(question=question)
    prompt = history + '\n' + turn if history is not None else turn

    prompt_tokenized = tokenizer(
        prompt,
        return_tensors='pt',
        padding=True,
        truncation=True,
        return_attention_mask=True
    ).to('cuda')
    prompt_length = prompt_tokenized['input_ids'].shape[1]

    output_tokenized = model.generate(
        **prompt_tokenized,
        max_new_tokens=400,  # same budget as max_length = prompt length + 400
        temperature=0.8,
        top_k=40,
        top_p=0.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.1,
        streamer=streamer
    )
    # Decode only the newly generated tokens, not the prompt.
    answer = tokenizer.decode(output_tokenized[0][prompt_length:]).strip()

    # Puny guardrails: cut the answer at the EOS token or at a stray '[/' tag.
    if eos_string in answer:
        answer = answer.split(eos_string)[0].strip()
    if '[/' in answer:
        answer = answer.split('[/')[0].strip()

    # Append this turn to the history so the next prompt includes it.
    history = ' '.join([prompt, answer, eos_string])
    print('#' * 50)

Question:  Let's talk about deep learning.


Setting `pad_token_id` to `eos_token_id`:50296 for open-end generation.


Deep learning is a subfield of machine learning that focuses on developing algorithms and neural networks to perform complex tasks, such as image recognition, natural language processing, and speech synthesis. It involves training models with large amounts of data and using them to make predictions or decisions based on new input data. Deep learning has been used in various applications, including computer vision, autonomous vehicles, and voice assistants like Siri and Alexa. Here are some key concepts related to deep learning:
1. Neural Networks: These are the building blocks of deep learning. They consist of interconnected layers of nodes (or neurons) that process information through a series of mathematical operations called activation functions.
2. Training Data: This refers to the set of labeled examples that are used to train a model. The goal is to minimize errors between the predicted output and the actual output by adjusting the weights and biases of the network.
3. Activation

Question:  Hello.


Setting `pad_token_id` to `eos_token_id`:50296 for open-end generation.


Deep learning is a fascinating field that combines artificial intelligence and machine learning techniques to create intelligent systems capable of performing complex tasks. Here are some key concepts related to deep learning:
1. Neural Networks: These are composed of interconnected layers of nodes (neurons) that process information through activation functions.
2. Training Data: This refers to the set of labeled examples used to train a model. The goal is to minimize errors between the predicted output and the actual output.
3. Activation Functions: These are mathematical functions that determine how an input signal is transformed into an output signal. Examples include sigmoid, ReLU, and tanh.
4. Loss Function: This measures the difference between the predicted output and the actual output. Minimizing this loss is the objective of training a deep learning model.
5. Optimization Algorithms: There are several optimization algorithms available for training deep learning models, such as 

Question:  Hi.


Setting `pad_token_id` to `eos_token_id`:50296 for open-end generation.


Deep learning is a powerful tool that allows machines to learn from data and make predictions or decisions based on that knowledge. Here are some key concepts related to deep learning:
1. Neural Networks: These are the building blocks of deep learning. They consist of interconnected layers of nodes (or neurons) that process information through activation functions.
2. Training Data: This refers to the set of labeled examples that are used to train a model. The goal is to minimize errors between the predicted output and the actual output.
3. Activation Functions: These are mathematical functions that determine how an input signal is transformed into an output signal. Examples include sigmoid, ReLU, and tanh.
4. Loss Function: This measures the difference between the predicted output and the actual output. Minimizing this loss is the objective of training a deep learning model.
5. Optimization Algorithms: There are several optimization algorithms available for training deep learning mode

Question:  Hi. Lets' talk about gaming.


Setting `pad_token_id` to `eos_token_id`:50296 for open-end generation.


Gaming is a popular form of entertainment that involves playing video games on electronic devices such as computers, consoles, and mobile phones. Here are some key concepts related to gaming:
1. Video Games: These are interactive electronic games played on a variety of platforms. They can be single-player or multiplayer and involve controlling characters or objects within a virtual world.
2. Platforms: There are several types of gaming platforms, including personal computers (PC), home consoles (Xbox, PlayStation, Nintendo Switch), and mobile phones. Each platform offers unique features and capabilities.
3. Graphics Card: A graphics card is responsible for rendering images and videos in a game. It processes visual data and provides the necessary visuals for gameplay.
4. Audio: Audio components such as speakers, headphones, and microphones are used to provide sound effects and music in games.
5. Multiplayer: Many modern games allow players to connect online and play together. This adds 

Question:  Hi.


Setting `pad_token_id` to `eos_token_id`:50296 for open-end generation.


Sure thing. Here are some additional terms and concepts related to gaming:
1. FPS (Frames per Second): This is the number of frames displayed per second in a game. It determines the smoothness and responsiveness of the game.
2. Mods: Mods are modifications made by players themselves to improve or change aspects of a game. They can add new content, fix bugs, or enhance gameplay.
3. DLC (Downloadable Content): DLC stands for downloadable content. It includes additional levels, characters, weapons, or items that can be purchased and added to a game after it was released.
4. Online Play: Online play refers to multiplayer games where players can compete against each other or work together in teams. It allows for global participation and communication.
5. Virtual Reality: Virtual reality is a technology that simulates a three-dimensional environment. It immerses players in a digital world and can be used for gaming, education, and training purposes.
6. eSports: eSports stands for electronic 

KeyboardInterrupt: 
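
The transcript above shows the accumulated history steering later answers: short follow-ups such as "Hello." and "Hi." continue the previous topic. Because the whole history is sent again on every turn, the prompt also keeps growing toward the limit of the model's context window. One possible mitigation, sketched below under assumptions, keeps only the most recent turns that fit a token budget. `trim_history`, `count_tokens`, and the budget value are hypothetical; a whitespace split stands in for the real tokenizer so that the example runs on its own.

```python
def trim_history(turns, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent turns whose combined token count fits `max_tokens`.

    `turns` is a list of already formatted turn strings, oldest first.
    `count_tokens` is a stand-in; with a real tokenizer it could be
    `lambda text: len(tokenizer(text)["input_ids"])`.
    """
    kept = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order


turns = [
    "user: Let's talk about deep learning.",
    "assistant: Deep learning is a subfield of machine learning.",
    "user: Hi. Let's talk about gaming.",
]
print(trim_history(turns, max_tokens=16))
```

With this toy budget the oldest turn is dropped and the two most recent turns are kept. Dropping whole turns rather than cutting mid-turn keeps the `<|im_start|>` and `<|im_end|>` markers paired in the real prompt.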

## Inference

In [1]:
from transformers import (
    AutoModelForCausalLM, 
    logging, 
    AutoTokenizer,
    TextStreamer
)

from peft import PeftModel

In [2]:
model = AutoModelForCausalLM.from_pretrained('microsoft/phi-1_5')
tokenizer = AutoTokenizer.from_pretrained('../assistant_sft/outputs/phi_1_5_chat_alpaca_orpo/best_model/')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:
model.resize_token_embeddings(len(tokenizer))

Embedding(50297, 2048)

In [4]:
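# Attach the fine-tuned adapter weights to the base model and move it to the GPU.
# Note: `load_in_4bit` is normally passed when loading the base model;
# it likely has no effect when passed here.
model = PeftModel.from_pretrained(
    model,
    '../assistant_sft/outputs/phi_1_5_chat_alpaca_orpo/best_model/',
    load_in_4bit=True
).to('cuda')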

In [5]:
streamer = TextStreamer(tokenizer)

In [6]:
print(tokenizer.eos_token)

<|im_end|>


In [7]:
prompt = """<|im_start|> user
How are you? <|im_end|> 
<|im_start|> assistant
"""

In [8]:
prompt_tokenized = tokenizer(
    prompt,
    return_tensors='pt',
    padding=True,
    truncation=True,
    return_attention_mask=True
).to('cuda')
prompt_length = prompt_tokenized['input_ids'].shape[1]

output_tokenized = model.generate(
    **prompt_tokenized,
    max_new_tokens=400,  # same budget as max_length = prompt length + 400
    temperature=0.7,
    top_k=40,
    top_p=0.95,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    repetition_penalty=1.1,
    streamer=streamer
)

# Decode only the newly generated tokens, not the prompt.
answer = tokenizer.decode(output_tokenized[0][prompt_length:]).strip()

Setting `pad_token_id` to `eos_token_id`:50296 for open-end generation.


<|im_start|> user
How are you? <|im_end|> 
<|im_start|> assistant
I'm doing well, thank you! How about yourself?


In [19]:
print(answer)

I'm doing well, thank you. How about yourself?
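
The chat loop earlier samples with `top_p=0.1`, while this section uses `top_p=0.95`. To give a feel for the difference, the plain-Python sketch below applies nucleus (top-p) filtering to a made-up probability distribution: it keeps the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`. The function `top_p_filter` and the toy probabilities are illustrative assumptions, not the `transformers` implementation.

```python
def top_p_filter(probs, top_p):
    """Return the tokens kept by nucleus sampling for a toy distribution.

    `probs` maps token -> probability. Tokens are added in descending
    probability order until their cumulative probability reaches `top_p`.
    """
    kept = []
    cumulative = 0.0
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= top_p:
            break
    return kept


toy_probs = {"well": 0.5, "fine": 0.3, "great": 0.125, "okay": 0.075}
print(top_p_filter(toy_probs, top_p=0.1))
print(top_p_filter(toy_probs, top_p=0.95))
```

In this toy case a small `top_p` leaves only the single most likely token, which makes sampling behave much like greedy decoding, while `top_p=0.95` keeps all four candidates.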
