## Inference on Hugging Face Transformers
Hugging Face Transformers is a popular open-source library that provides an easy-to-use interface for working with state-of-the-art language models, such as BERT, GPT, and Llama variants. These models can be fine-tuned or used off-the-shelf for tasks like text generation, question answering, and sentiment analysis.   

In this tutorial, we’ll demonstrate how to run inference with Hugging Face Transformers models on AMD Instinct™ GPUs. We will cover launching a ROCm-enabled container with GPU access, installing the necessary libraries, and running an LLM (meta-llama/Llama-3.1-8B-Instruct) in that containerized environment.
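
Before launching anything, you can roughly check that the model fits in GPU memory. Later in this tutorial the model is loaded in bfloat16 (`torch_dtype=torch.bfloat16`), which uses 2 bytes per parameter. The weights of a nominal 8-billion-parameter model therefore take about 8 × 10⁹ × 2 bytes ≈ 16 GB. The sketch below is only an estimate. It assumes the weights dominate and ignores the KV cache, activations, and framework overhead, so actual usage will be higher.

```python
# Back-of-the-envelope estimate of the memory needed to hold model weights.
# Assumption: only weights are counted; KV cache, activations, and runtime
# overhead are NOT included, so real GPU memory usage will be higher.

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,  # the dtype used later via torch_dtype=torch.bfloat16
    "float16": 2,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # "8B" is a nominal parameter count; the exact count is slightly higher.
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: ~{weight_memory_gb(8e9, dtype):.0f} GB")
```

Compare the result against the memory of the GPU you plan to use. Halving the bytes per parameter is the main reason the tutorial loads the model in bfloat16 instead of float32.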

### Prepare the Inference Environment
#### 1. Launch the Docker Container
On the host, run the following command to start a container from the prebuilt ROCm PyTorch image, which already contains the necessary dependencies. Docker pulls the image automatically if it is not already present, and the flags give the container access to the GPUs:
```bash
# On the host
docker run -it --rm -p 8888:8888 --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 1G --security-opt seccomp=unconfined --security-opt apparmor=unconfined -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace rocm/pytorch:latest
```
Once the container is running, the shell prompt changes to something like `root@xxx:`. Then move into the mounted workspace:
```bash
# Inside the container
cd /workspace
```
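**Note:** The `-v $(pwd):/workspace` option mounts the current host directory (`$(pwd)`) at **/workspace** in the container, so files are shared between the host and the container. `HUGGINGFACE_HUB_CACHE` also points to `/workspace`. As a result, downloaded model files are stored in that shared directory and persist after the container exits.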
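
If you launch the container from a script rather than by hand, you can assemble the same command as an argument list. This form also makes each flag's purpose explicit. The sketch below uses a hypothetical helper, `rocm_docker_command`, that only builds the list and does not start anything. The flag values mirror the command above, and the per-flag comments are brief summaries rather than official definitions.

```python
import os
import shlex
from typing import List, Optional


def rocm_docker_command(image: str = "rocm/pytorch:latest",
                        host_dir: Optional[str] = None,
                        port: int = 8888) -> List[str]:
    """Return the tutorial's `docker run` invocation as an argument list.

    Hypothetical helper for illustration only; it does not start a container.
    """
    host_dir = host_dir or os.getcwd()  # plays the role of $(pwd) in the shell version
    return [
        "docker", "run", "-it", "--rm",
        "-p", f"{port}:{port}",                # publish the Jupyter port to the host
        "--device=/dev/kfd",                   # ROCm compute interface
        "--device=/dev/dri",                   # GPU render nodes
        "--group-add", "video",                # add the video group for GPU device access
        "--shm-size", "1G",                    # size of /dev/shm inside the container
        "--security-opt", "seccomp=unconfined",
        "--security-opt", "apparmor=unconfined",
        "-v", f"{host_dir}:/workspace",        # bind-mount the host directory
        "--env", "HUGGINGFACE_HUB_CACHE=/workspace",  # keep Hub downloads in the mount
        image,
    ]


# Print a shell-quoted version that can be copied into a terminal.
print(shlex.join(rocm_docker_command()))
```

To actually launch the container from Python, you could pass the list to `subprocess.run(rocm_docker_command(), check=True)` from an interactive terminal, since `-it` expects a TTY.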

#### 2. Install and Launch Jupyter
Inside the Docker container, install and start Jupyter with the following commands:
```bash
pip install --upgrade pip setuptools wheel
pip install jupyter
jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root
```
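**Note:** Save the token or URL printed in the terminal output. Because port 8888 is published with `-p 8888:8888`, you can open the notebook from your host machine at `http://localhost:8888` and enter the token when prompted.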
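
Jupyter may print a URL that uses the container's hostname, which usually does not resolve on the host. Because of the port mapping, replacing the hostname with `localhost` is typically enough. The sketch below does that rewrite and extracts the `token` query parameter using only the standard library. The example URL, hostname, and token are made up to show the shape of the URL; your output will differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit


def host_url(printed_url: str, host: str = "localhost") -> str:
    """Rewrite a Jupyter URL printed inside the container so it can be opened
    on the host (assumes the -p 8888:8888 port mapping from step 1)."""
    parts = urlsplit(printed_url)
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit(parts._replace(netloc=netloc))


def token_of(printed_url: str) -> Optional[str]:
    """Return the value of the ?token=... query parameter, if present."""
    values = parse_qs(urlsplit(printed_url).query).get("token")
    return values[0] if values else None


# Made-up example URL; copy the real one from your terminal output.
url = "http://3f2a1b9c0d4e:8888/tree?token=abc123"
print(host_url(url))  # http://localhost:8888/tree?token=abc123
print(token_of(url))  # abc123
```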

#### 3. Install Required Libraries
Install the libraries needed for this tutorial by running the following command in a cell of the Jupyter notebook running inside the Docker container:

```python
!pip install accelerate transformers
```

Verify the installation:

```python
!pip list | grep transformer
!pip list | grep accelerate
```
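
As an alternative to `grep`, you can check the installed versions from Python with the standard-library `importlib.metadata` module. The sketch below maps each package name to its installed version, or to `None` if the package is missing. The helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["transformers", "accelerate"]))
```

A `None` entry means the corresponding `pip install` did not complete in the current environment.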

#### 4. Provide Your Hugging Face Token
You need a Hugging Face access token to download meta-llama/Llama-3.1-8B-Instruct, which is a gated model. Tokens typically start with "hf_". Generate a token on the [Hugging Face Tokens](https://huggingface.co/settings/tokens) page and request access on the [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model page.

Run the following interactive cell in your Jupyter notebook to enter the token.

**Note:** Uncheck the "Add token as Git credential?" option.

```python
from huggingface_hub import notebook_login, HfApi

# Opens a login widget; paste your token when prompted
notebook_login()
```

Verify that your token was captured correctly:

```python
# Validate the token
try:
    api = HfApi()
    user_info = api.whoami()
    print(f"Token validated successfully! Logged in as: {user_info['name']}")
except Exception as e:
    print(f"Token validation failed. Error: {e}")
```
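
For non-interactive runs, such as scripts, you can skip the login widget. `huggingface_hub` also reads a token from the `HF_TOKEN` environment variable, which you could pass into the container with `--env HF_TOKEN=...`. The sketch below is a hypothetical helper that only checks whether the variable is set and whether it has the usual "hf_" prefix. It never prints the token, and the prefix check is a sanity check, not real validation. Use `whoami()` as shown above for that.

```python
import os
from typing import Optional


def token_from_env(var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `var`, or None if it is unset or empty.

    Hypothetical helper: the "hf_" prefix check only mirrors the observation
    that tokens typically start with "hf_"; it does not validate the token.
    """
    token = os.environ.get(var, "").strip()
    if not token:
        return None
    if not token.startswith("hf_"):
        print(f"Warning: {var} does not start with 'hf_'; double-check its value.")
    return token


print("HF_TOKEN set:", token_from_env() is not None)
```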

### Run LLM Inference Using Hugging Face Transformers
In the Jupyter notebook running inside the Docker container, run the following code:

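```python
import torch
import transformers

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# Build a text-generation pipeline: weights are loaded in bfloat16, and
# device_map="auto" lets Accelerate place the model on the available GPU(s).
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

query = "Explain the concept of AI."
# Chat-style input; the pipeline applies the model's chat template to these messages.
messages = [
    {"role": "system", "content": "You are an expert in the field of AI. Make sure to provide an explanation in a few sentences."},
    {"role": "user", "content": query},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,  # upper bound on the number of generated tokens
    do_sample=True,      # sample, so that temperature and top_p take effect
    top_p=0.7,
    temperature=0.2,
)

# generated_text holds the whole conversation; the last message is the model's reply.
response = outputs[0]["generated_text"][-1]["content"]
print('-------------------------------')
print('Query:\n', query)
print('-------------------------------')
print('Response:\n', response)
```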
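
The expression `outputs[0]["generated_text"][-1]['content']` depends on the shape of the pipeline's output for chat input. `generated_text` contains the input conversation with the new assistant turn appended last. The sketch below uses a hypothetical helper, `last_assistant_reply`, on a mocked output with that shape, so it runs without a GPU or model download. It adds an explicit role check, and the message texts are placeholders.

```python
from typing import Any, Dict, List


def last_assistant_reply(outputs: List[Dict[str, Any]]) -> str:
    """Return the content of the final message in the first generation.

    Hypothetical helper; assumes the chat-input output shape described above.
    """
    conversation = outputs[0]["generated_text"]
    last = conversation[-1]
    if last.get("role") != "assistant":
        raise ValueError(f"expected an assistant turn, got {last.get('role')!r}")
    return last["content"]


# Mocked pipeline output with the same structure; texts are placeholders.
mock_outputs = [{
    "generated_text": [
        {"role": "system", "content": "You are an expert in the field of AI."},
        {"role": "user", "content": "Explain the concept of AI."},
        {"role": "assistant", "content": "AI is ..."},
    ]
}]
print(last_assistant_reply(mock_outputs))  # AI is ...
```

The role check catches an unexpected output shape early instead of silently returning the wrong message.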

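After successful execution, the output should look similar to the following. The exact wording may vary between runs because generation uses sampling:
```text
-------------------------------
Query: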
 Explain the concept of AI.
-------------------------------
Response:
 Artificial Intelligence (AI) refers to the development of computer systems that can perform tasks that typically require human intelligence, such as learning, problem-solving, decision-making, and perception. These systems use algorithms and data to simulate human-like behavior, enabling them to adapt to new situations and improve their performance over time. AI can be categorized into two main types: Narrow or Weak AI, which is designed to perform a specific task, and General or Strong AI, which aims to replicate human intelligence and reasoning across a wide range of tasks.
```
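
The `temperature=0.2` and `top_p=0.7` settings shape the distribution from which each next token is sampled. Temperature divides the logits before the softmax, and values below 1 sharpen the distribution. Top-p (nucleus) filtering then keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p`. The sketch below is a simplified plain-Python illustration over a toy vocabulary, not the Transformers implementation. The library applies these steps to logit tensors and handles details such as ties and a minimum number of kept tokens.

```python
import math
from typing import Dict


def sampling_distribution(logits: Dict[str, float], temperature: float,
                          top_p: float) -> Dict[str, float]:
    """Simplified sketch: temperature scaling followed by top-p filtering."""
    # 1) Temperature: scale logits by 1/T; T < 1 makes the distribution peakier.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # 2) Top-p: keep the most likely tokens until their cumulative
    #    probability reaches top_p.
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # 3) Renormalize the surviving probabilities so they sum to 1.
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


toy_logits = {"AI": 2.0, "machine": 1.5, "the": 0.5}  # made-up values
print(sampling_distribution(toy_logits, temperature=0.2, top_p=0.7))  # {'AI': 1.0}
print({t: round(p, 3) for t, p in sampling_distribution(toy_logits, 1.0, 0.7).items()})
# {'AI': 0.622, 'machine': 0.378}
```

With the low temperature used in this tutorial, the most likely token dominates, so outputs stay focused while still varying slightly from run to run.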