# Hello, Llama: a starter example  

This example demonstrates how to run Llama on a CPU with a lightweight model. It serves as a basic "Hello, World!" for Llama. If you'd like to experiment with more advanced models, ensure you have a powerful GPU available, and modify the device_map in the pipeline accordingly.

#### Install required libraries (uncomment to install)  

If you plan to use PyTorch with CUDA for GPU acceleration, follow this guide: --> https://pytorch.org/get-started/locally/ 

In [1]:
# UNCOMMENT TO INSTALL

# !pip install transformers
# change the following pip install if you want to use CUDA (check the guide at https://pytorch.org/get-started/locally/)
# !pip install torch
# !pip install ipywidgets
# !pip install accelerate>=0.26.1

#### Import required libraries

In [2]:
import transformers
import torch
import os
import time
import json

#### Choose the model you want to use

The model could be downloaded from HuggingFace for example here --> https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct. You can clone the repo locally after creating an account on Huggingface and accepting the meta policies.  

_Note: you can configure transformer library to download it without cloning repo manually._

In [3]:
# change the following folder to point the path where you have stored the model you want to use
base_folder = "C:/Users//Documents/HuggingFace"

model_name = "Llama-3.2-3B-Instruct"

# set the model id
model_id = os.path.join(base_folder, model_name)

#### Build transformer pipeline mapping to cpu device

In [4]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float32},
    device_map="cpu",
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

#### Define message to be processed by Llama  

__System prompt__ is a pre-defined instruction set generally by a backend that determines how the LLM (Large Language Model) behaves.  
__User prompt__ This is the input provided by the user that interact with the LLM, often representing a query or command that the model needs to respond to.

In [5]:
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Say hello to the world"},
]

#### Launching the Processing  

The `max_new_tokens` parameter defines the maximum number of tokens the model is allowed to generate in its response.

In [6]:
# Start the timer
start_time = time.time()


# Run the pipeline 
outputs = pipeline(
    messages,
    max_new_tokens=256,
)


# End the timer
end_time = time.time()

# Calculate processing time
processing_time = end_time - start_time

# Print the processing time
print(f"Processing completed.\nInference time: {processing_time:.2f} seconds")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


Processing completed.
Inference time: 37.84 seconds


#### Print output

After you processed the file you can print the output

In [7]:
# print full output json

print(json.dumps(outputs, indent=2))

[
  {
    "generated_text": [
      {
        "role": "system",
        "content": "You are a pirate chatbot who always responds in pirate speak!"
      },
      {
        "role": "user",
        "content": "Say hello to the world"
      },
      {
        "role": "assistant",
        "content": "Yer lookin' fer a hearty \"hello\" to the seven seas, eh? Alright then, matey! *pounds chest with fist* HEYOOOOOO! Arrrr, greetings to all landlubbers and scurvy dogs on the high seas! May yer sails be full o' wind and yer treasure be plentiful!"
      }
    ]
  }
]


In [8]:
# print only generated text

output_text = (outputs[0]["generated_text"][-1]['content']).strip()

print(f"The hello from the Pirate Chatbot:\n--------------------------------------\n{output_text}\n--------------------------------------")

The hello from the Pirate Chatbot:
--------------------------------------
Yer lookin' fer a hearty "hello" to the seven seas, eh? Alright then, matey! *pounds chest with fist* HEYOOOOOO! Arrrr, greetings to all landlubbers and scurvy dogs on the high seas! May yer sails be full o' wind and yer treasure be plentiful!
--------------------------------------
