<a href="https://colab.research.google.com/github/marziemajidi/AI/blob/LLM/Chapter_1_Introduction_to_Language_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Chapter 1 - Introduction to Language Models</h1>
<i>Exploring the exciting field of Language AI</i>


<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter01/Chapter%201%20-%20Introduction%20to%20Language%20Models.ipynb)

---

This notebook is for Chapter 1 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>


### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the install cell below (the one starting with `# %%capture`) to install the dependencies for this chapter.

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


@ Check that the GPU is enabled in Colab:

In [1]:
import torch
print("GPU available:", torch.cuda.is_available())


GPU available: True


In [None]:
# %%capture
# Quote the version specifiers so the shell does not treat ">" as a redirect
# !pip install "transformers>=4.40.1" "accelerate>=0.27.2"
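
After installing, it can be useful to confirm that the installed versions meet the minimums named in the install command. The following is a rough sketch using only the standard library. `MINIMUMS`, `as_tuple`, and `check` are illustrative names, and the version parsing is deliberately simplistic (it ignores pre-release and build suffixes):

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the install command above
MINIMUMS = {"transformers": "4.40.1", "accelerate": "0.27.2"}


def as_tuple(v: str) -> tuple:
    """Turn '4.40.1' into (4, 40, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(package: str, minimum: str) -> str:
    """Report whether `package` is installed and at least `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package}: not installed"
    status = "OK" if as_tuple(installed) >= as_tuple(minimum) else "too old"
    return f"{package} {installed}: {status} (needs >= {minimum})"


for name, minimum in MINIMUMS.items():
    print(check(name, minimum))
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"4.9.0"` would sort after `"4.40.1"`.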

# Phi-3

The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately (although that isn't always necessary).

@ We can use transformers to load both the tokenizer and the model.
@ Note that we assume you have an NVIDIA GPU (device_map="cuda"), but you
can choose a different device instead. If you do not have access to a GPU, you
can use the free Google Colab notebooks.
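
The device choice described in the note above can also be made at runtime instead of being hard-coded. This is a minimal sketch, assuming PyTorch may or may not be installed. `pick_device` is a hypothetical helper, not part of transformers:

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)
```

The returned string could then be passed as `device_map=device` when loading the model, although running on a CPU will be considerably slower.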


In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")


@ Installing optimized versions of the libraries: make sure that up-to-date versions of Transformers, PyTorch, and Flash Attention are installed:

In [None]:
!pip install --upgrade transformers torch flash-attn

Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

@ There is a nice trick in transformers that simplifies the process, namely
transformers.pipeline. It encapsulates the model, tokenizer, and text generation process into a single function:

---


@ return_full_text: Setting this to False returns only the model's output, not the prompt.


---


@ max_new_tokens: The maximum number of tokens the model will generate. Setting
a limit prevents long and unwieldy output, as some models might continue generating until they reach the end of their context window.


---


@ do_sample: Whether the model uses a sampling strategy to choose the
next token. Setting this to False makes the model always
select the most probable next token.
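
The three parameters above can be illustrated without loading a real model. The sketch below uses a tiny hand-written probability table in place of an LLM. `TOY_MODEL` and `toy_generate` are hypothetical names invented for this illustration, and the behavior is only a simplified analogy to what the pipeline does:

```python
import random

# Hypothetical toy "model": maps the last token to next-token probabilities
TOY_MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"chicken": 0.7, "band": 0.3},
    "a": {"chicken": 0.5, "band": 0.5},
    "chicken": {"sings": 0.8, "<eos>": 0.2},
    "band": {"sings": 0.55, "<eos>": 0.45},
    "sings": {"<eos>": 0.9, "the": 0.1},
}


def toy_generate(prompt, max_new_tokens=5, do_sample=False,
                 return_full_text=False, seed=None):
    """Generate tokens from TOY_MODEL, mimicking the three pipeline parameters."""
    rng = random.Random(seed)
    tokens = list(prompt)
    new_tokens = []
    last = tokens[-1] if tokens else "<start>"
    for _ in range(max_new_tokens):  # max_new_tokens caps the loop
        probs = TOY_MODEL[last]
        if do_sample:
            # Sampling: draw the next token in proportion to its probability
            last = rng.choices(list(probs), weights=list(probs.values()))[0]
        else:
            # Greedy decoding: always take the most probable next token
            last = max(probs, key=probs.get)
        if last == "<eos>":
            break
        new_tokens.append(last)
    # return_full_text decides whether the prompt is included in the result
    return tokens + new_tokens if return_full_text else new_tokens


print(toy_generate(["the"]))                         # greedy, new tokens only
print(toy_generate(["the"], return_full_text=True))  # prompt + new tokens
print(toy_generate(["the"], max_new_tokens=1))       # stops after one token
print(toy_generate(["the"], do_sample=True, seed=0)) # sampled continuation
```

With `do_sample=False` the result is the same on every call, which is why the pipeline in this notebook produces reproducible answers.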


In [4]:
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)

Device set to use cuda


Finally, we create our prompt as a user and give it to the model:

@ Our role is that of “user”, and we use the “content” key to define our prompt:




In [7]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 Why did the chicken join the band? Because it had the drumsticks!
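
Under the hood, the pipeline converts the list of role/content messages into a single prompt string using the model's chat template. The sketch below approximates a Phi-3-style template by hand. The special tokens (`<|user|>`, `<|end|>`, `<|assistant|>`) are written from memory of the Phi-3 prompt format, and `render_chat` is a hypothetical helper, so treat the exact string as an assumption. The tokenizer's own `apply_chat_template` method is the authoritative source.

```python
def render_chat(messages, add_generation_prompt=True):
    """Approximate a Phi-3-style chat template (illustrative, not the exact one)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in a role marker and closed with an end marker
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Cue the model that the assistant's turn comes next
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]
print(render_chat(messages))
```

With the real tokenizer loaded, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` returns the actual prompt string, which may differ in details such as a leading beginning-of-sequence token.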


In [8]:
from pprint import pprint


In [12]:
pprint(output[0]["generated_text"])

' Why did the chicken join the band? Because it had the drumsticks!'
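
`print` and `pprint` show the same string differently: `pprint` displays its `repr`, so the quotes make the leading space in the generated text visible. Here is a small sketch, using a hard-coded copy of the output above in place of a real pipeline call:

```python
from pprint import pformat

# Hard-coded stand-in for the pipeline result shown above (no model call here)
output = [
    {"generated_text": " Why did the chicken join the band? Because it had the drumsticks!"}
]

text = output[0]["generated_text"]  # first result, then its text field
print(text)           # plain text: the leading space is easy to miss
print(pformat(text))  # repr-style: the quotes expose the leading space
print(text.strip())   # strip() removes the surrounding whitespace
```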


In [13]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "my name is marzie"},
    # An earlier reply in the conversation belongs to the "assistant" role, not "system"
    {"role": "assistant", "content": "Hello, Marzie!"},
    {"role": "user", "content": "What is my name?"},
]
# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 Your name is Marzie.


In [14]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "my birthday is 1981"},
    # An earlier reply in the conversation belongs to the "assistant" role, not "system"
    {"role": "assistant", "content": "Hello, how can I help you?"},
    {"role": "user", "content": "What is my age?"},
]
# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 To calculate your age, you would subtract the year of your birth (1981) from the current year. Assuming the current year is 2023, your age would be:

2023 - 1981 = 42

So, you are 42 years old.


In [21]:
# Example `messages` for the conversation history
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot."
        # This sets the initial behavior of the model, instructing it to act as a friendly chatbot.
    },
    {
        "role": "user",
        "content": "Hello, who are you?"
        # This is the user's input, asking a question to the chatbot.
    },
    {
        "role": "assistant",
        "content": "I'm your friendly chatbot, here to help!"
        # This is the chatbot's response, generated based on the user's input and system instructions.
    }
]

# Assuming `generator` is defined and functional
output = generator(messages)  # Generate response based on the conversation history
print(output[0]["generated_text"])  # Print the generated assistant response


 How can I assist you today?


In [22]:
while True:
    user_message = input("User: ")

    # Stop condition to exit the loop
    if user_message.lower() in ["exit", "quit"]:
        print("Chatbot: Goodbye!")
        break

    # Append the user's message to the conversation history
    messages.append({"role": "user", "content": user_message})

    # Get the response from the model
    output = generator(messages)
    llm_message = output[0]["generated_text"]

    # Print the response
    print(f"Chatbot: {llm_message}")

    # Append the model's response to the conversation history
    messages.append({"role": "assistant", "content": llm_message})

User: hello
Chatbot:  Greetings! How can I assist you today?
User: my name is marzie
Chatbot:  Nice to meet you, Marzie! How can I help you today?
User: i love fruit by red color. which fruit have red color?
Chatbot:  Red fruits come in a variety of delicious options! Here are a few popular red fruits:

1. Apples - There are many varieties, such as Red Delicious, Gala, and Fuji.
2. Strawberries - Sweet and juicy, they're a favorite for many.
3. Cherries - Sweet or tart, they're great for snacking or baking.
4. Raspberries - Small and sweet, they're perfect for desserts or eating on their own.
5. Red Grapes - Perfect for snacking or making into wine.
6. Pomegranates - Packed with juicy seeds, they're a healthy and tasty option.
7. Watermelon - A refreshing fruit, especially on a hot day.
8. Cranberries - Often used in sauces and desserts, they're tart and tangy.
9. Red Currants - Small and tart, they're great for jams and jellies.
10. Red Peppers - While technically a vegetable, they're