<h1>Chapter 1 - Introduction to Language Models</h1>

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or another cloud platform), you need to **uncomment and run** the following code block to install the dependencies for this chapter:

In [None]:
# %%capture
# Quote the specifiers so the shell does not treat ">=" as an output redirect
# !pip install "transformers>=4.40.1" "accelerate>=0.27.2"
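To confirm that the installed versions satisfy the minimums above, a small check along these lines may help. It is only a sketch: `parse_version`, `meets_minimum` and `check_dependencies` are hypothetical helpers written for this example, and they only handle plain numeric versions such as `4.40.1`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the install command above
MINIMUM_VERSIONS = {"transformers": "4.40.1", "accelerate": "0.27.2"}


def parse_version(text):
    """Turn '4.40.1' into (4, 40, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Map each package name to 'ok', 'too old' or 'missing'."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "missing"
            continue
        report[package] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


print(check_dependencies())
```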

# Phi-3

The first step is to load the model onto the GPU for faster inference. We load the model and the tokenizer separately here, although that isn't always necessary: a `pipeline` can also load both when it is given only a model name.

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Release cached GPU memory that PyTorch holds but no longer uses
# (this does not unload a model that is still referenced)
torch.cuda.empty_cache()

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

In [None]:
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
) 
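For comparison, calling the model and tokenizer directly, without the pipeline, might look roughly like the sketch below. `generate_directly` is a hypothetical helper, not part of `transformers`; it assumes the `model` and `tokenizer` loaded earlier and mirrors the pipeline settings (`max_new_tokens=500`, `do_sample=False`, and only the newly generated text returned).

```python
def generate_directly(model, tokenizer, messages, max_new_tokens=500):
    """Generate a reply for chat `messages` without using a pipeline."""
    # Apply the model's chat template and tokenize in one step
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # end the prompt where the assistant reply starts
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    # Greedy decoding, matching do_sample=False in the pipeline
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )

    # Drop the prompt tokens so only the new text is returned,
    # which is what return_full_text=False does in the pipeline
    prompt_length = inputs["input_ids"].shape[-1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the objects from the cells above, this would be called as `generate_directly(model, tokenizer, messages)`, where `messages` is a list of role/content dictionaries.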

Finally, we create our prompt as a user and give it to the model:

In [3]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

You are not running the flash-attention implementation, expect numerical differences.


 Why did the chicken join the band? Because it had the drumsticks!
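Behind the scenes, the pipeline converts `messages` into a single prompt string using the tokenizer's chat template before generating. The sketch below approximates that formatting for Phi-3 with a hypothetical `format_phi3_prompt` helper. The `<|user|>`, `<|assistant|>` and `<|end|>` markers follow the chat format published for Phi-3, but the exact template (for example, a leading beginning-of-sequence token) is defined by the tokenizer and may differ between revisions, so treat this as an illustration rather than the actual implementation.

```python
def format_phi3_prompt(messages):
    """Approximate the Phi-3 chat format for a list of role/content dicts."""
    prompt = ""
    for message in messages:
        # Each turn is wrapped as <|role|>\ncontent<|end|>\n
        prompt += f"<|{message['role']}|>\n{message['content']}<|end|>\n"
    # Finish with the assistant marker so the model writes the reply next
    return prompt + "<|assistant|>\n"


messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]
print(format_phi3_prompt(messages))
```

The authoritative string can be inspected with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.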


In [4]:
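# Logic test to check how well the model generalizes.
# Expected answer: Betania, Carina's mother and Deuza's maternal grandmother.
# The model does not identify the family relationship correctly; in the output
# below it names Carina, Deuza's own mother, as the grandmother.
# (The prompt is kept in Portuguese, as originally sent to the model.)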
messages = [
    {"role": "user", "content": "Considere que Alba casou com Alberto e teve um filho chamado Bernardo. Bernardo casou com Betania e teve dois filhos: Carlos e Carina. Carina e Cleber tiveram uma filha: Deuza. Descreva toda a árvore genealógica dessa familia e depois diga quem é a avó de Deuza?"}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 A árvore genealógica da família descrita é a seguinte:


- Alba (mãe de Bernardo)

  - Alberto (pai de Bernardo)

    - Bernardo (pai de Carlos e Carina)

      - Carlos (irmão de Carina)

      - Carina (irmã de Carlos)

        - Carina (mãe de Deuza)

          - Cleber (pai de Deuza)

            - Deuza (filha de Carina e Cleber)


A avó de Deuza é Carina, pois ela é a mãe de Deuza.
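The expected answer can be derived mechanically. The prompt states that Alba and Alberto have a son, Bernardo; Bernardo and Betania have two children, Carlos and Carina; and Carina and Cleber have a daughter, Deuza. The sketch below encodes those facts and looks up grandmothers; the `PARENTS` table and the helper functions are hypothetical names introduced for this example.

```python
# Parent relationships exactly as stated in the prompt
PARENTS = {
    "Bernardo": {"mother": "Alba", "father": "Alberto"},
    "Carlos": {"mother": "Betania", "father": "Bernardo"},
    "Carina": {"mother": "Betania", "father": "Bernardo"},
    "Deuza": {"mother": "Carina", "father": "Cleber"},
}


def mothers_of(people):
    """Return the known mothers of the given people, in order."""
    return [PARENTS[p]["mother"] for p in people if p in PARENTS]


def grandmothers(person):
    """Return the known grandmothers: the mothers of the person's parents."""
    parents = list(PARENTS.get(person, {}).values())
    return mothers_of(parents)


def great_grandmothers(person):
    """Return the known great-grandmothers of a person."""
    parents = list(PARENTS.get(person, {}).values())
    grandparents = [g for p in parents if p in PARENTS for g in PARENTS[p].values()]
    return mothers_of(grandparents)


print(grandmothers("Deuza"))        # ['Betania']
print(great_grandmothers("Deuza"))  # ['Alba']
```

Only Betania is found because the prompt says nothing about Cleber's parents. The model's answer, Carina, is Deuza's mother, which is one generation too close.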


In [5]:
print(f"CUDA is available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name()}")
    print(f"\nModel device: {next(model.parameters()).device}")

CUDA is available: True
Current device: 0
Device name: NVIDIA GeForce RTX 3070 Laptop GPU

Model device: cuda:0
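Knowing which GPU is in use also makes it possible to sanity-check whether the model fits in its memory. The following is a rough, back-of-the-envelope sketch: the figure of about 3.8 billion parameters for Phi-3-mini and the bytes-per-parameter values are assumptions for illustration, and the estimate covers the weights only, not activations or the key-value cache. `estimate_weight_memory_gib` is a hypothetical helper.

```python
def estimate_weight_memory_gib(num_parameters, bytes_per_parameter):
    """Estimate the memory needed for model weights, in GiB."""
    return num_parameters * bytes_per_parameter / 1024**3


# Assumed figure: roughly 3.8 billion parameters
PHI3_MINI_PARAMETERS = 3.8e9

# Compare 32-bit weights (4 bytes each) with 16-bit weights (2 bytes each)
for label, n_bytes in [("float32", 4), ("float16/bfloat16", 2)]:
    gib = estimate_weight_memory_gib(PHI3_MINI_PARAMETERS, n_bytes)
    print(f"{label}: {gib:.1f} GiB")
```

For the model that is actually loaded, `model.get_memory_footprint()` returns the measured size in bytes, and `torch.cuda.get_device_properties(0).total_memory` reports the total memory of the GPU for comparison.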


In [8]:
# Report the device map if one was recorded, instead of assuming "cuda"
print("Device map:", getattr(model, "hf_device_map", "not set"))

Device map: {'': device(type='cuda')}
