## Fine-tune Llama 3.1 (8B parameters) using the Ray framework on Hopsworks
This tutorial demonstrates how to fine-tune Llama 3.1 (8B) with LoRA and DeepSpeed using the Ray framework on Hopsworks. Ray is an open-source distributed computing framework. The tutorial was run on an OVH cluster, but you can use any cloud provider of your choice.

### Pre-requisites
To follow this tutorial, you need a Hopsworks Kubernetes cluster with Ray enabled. The fine-tuning task in this example requires at least the following resources:
* 1 x <b>B3-64</b> (16 CPU, 64 GB RAM) for the Ray head
* 8 x <b>T2-LE-90</b> (30 CPU, 90 GB RAM, 2x Tesla V100S with 32 GB of GPU memory each) for the workers

Let's get started!

### Step 1: Dataset preparation
We are going to fine-tune the model for question answering, so the dataset must be prepared in a format suitable for supervised fine-tuning. The pre-trained Llama 3.1 models do not require a specific prompt format, so the preprocessing can follow any prompt-completion style. The instruction-tuned models (Meta-Llama-3.1-{8,70,405}B-Instruct), in contrast, use a multi-turn conversation prompt format that structures the conversation between the user and the model.

The dataset for QA typically includes the following fields:

* Question: The input question to the model.
* Context (optional): A passage or text providing information the model should use to answer.
* Answer: The correct response.

This example is configured to fine-tune the Llama 3.1 8B pre-trained model on the GSM8K dataset.
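
As a minimal sketch of the prompt-completion style used in this tutorial, the snippet below wraps one question/answer pair in the four delimiter tokens that Step 1 writes to `tokens.json`. The helper name `format_example` and the sample record are illustrative assumptions (the record is made up in the GSM8K style, where an answer ends with `#### <number>`); they are not part of the tutorial's job script.

```python
# Delimiter tokens used in this tutorial to mark question and answer spans.
SPECIAL_TOKENS = ["<START_Q>", "<END_Q>", "<START_A>", "<END_A>"]


def format_example(question: str, answer: str) -> str:
    """Wrap one question/answer pair in the delimiter tokens (hypothetical helper)."""
    return f"<START_Q>{question}<END_Q><START_A>{answer}<END_A>"


# Made-up record in the GSM8K style, for illustration only.
sample = format_example("What is 2 + 3?", "2 + 3 = 5\n#### 5")
print(sample)
```

Keeping the formatting in one small function makes it easier to apply the same template at inference time, when only the question part is available.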

In [20]:
from datasets import load_dataset
import tempfile
import os
import json
import shutil

In [21]:
llama_dir = "Resources/llama_finetuning"
# PROJECT_PATH must be set; indexing raises a clear KeyError if it is missing
HOPSFS_STORAGE_PATH = os.path.join(os.environ["PROJECT_PATH"], llama_dir)
os.makedirs(HOPSFS_STORAGE_PATH, exist_ok=True)

In [22]:
dataset = load_dataset("openai/gsm8k", "main")
dataset_splits = {"train": dataset["train"], "test": dataset["test"]}
dataset_dir = os.path.join(HOPSFS_STORAGE_PATH, "datasets")
os.makedirs(dataset_dir, exist_ok=True)

# Save the delimiter tokens used in the prompts
with open(os.path.join(dataset_dir, "tokens.json"), "w") as f:
    tokens = {"tokens": ["<START_Q>", "<END_Q>", "<START_A>", "<END_A>"]}
    f.write(json.dumps(tokens))

# Write each split as JSON Lines, one delimited question/answer string per line
for key, ds in dataset_splits.items():
    with open(os.path.join(dataset_dir, f"{key}.jsonl"), "w") as f:
        for item in ds:
            newitem = {
                "input": (
                    f"<START_Q>{item['question']}<END_Q>"
                    f"<START_A>{item['answer']}<END_A>"
                )
            }
            f.write(json.dumps(newitem) + "\n")
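
Before launching a long training job, it can be worth checking that the written files parse cleanly. The following is a sketch of such a check, assuming the `{"input": ...}` record layout produced above; the function name `validate_jsonl` is hypothetical, and the demo runs on a temporary file so that it is self-contained.

```python
import json
import os
import tempfile

REQUIRED_MARKERS = ["<START_Q>", "<END_Q>", "<START_A>", "<END_A>"]


def validate_jsonl(path):
    """Return the number of records; raise ValueError on a malformed one."""
    count = 0
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            text = json.loads(line)["input"]
            # All markers must be present and their first occurrences in order.
            positions = [text.find(marker) for marker in REQUIRED_MARKERS]
            if -1 in positions or positions != sorted(positions):
                raise ValueError(f"{path}:{line_no}: malformed record")
            count += 1
    return count


# Demo on a temporary file; in the tutorial you would instead pass
# os.path.join(dataset_dir, "train.jsonl").
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w") as f:
        record = {"input": "<START_Q>1+1?<END_Q><START_A>2<END_A>"}
        f.write(json.dumps(record) + "\n")
    demo_count = validate_jsonl(demo_path)

print(demo_count)
```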


### Step 2: Download the pre-trained model
The next step is to download the pre-trained Llama model from Hugging Face. For this, you need a Hugging Face access token.

In [23]:
from transformers.utils.hub import TRANSFORMERS_CACHE
from transformers import AutoTokenizer, AutoModelForCausalLM

In [24]:
os.environ["HF_TOKEN"] = "<YOUR HUGGING FACE TOKEN>"
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

In [6]:
# download the pre-trained model from Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)


In [25]:
local_model_dir = os.path.join(TRANSFORMERS_CACHE, f"models--{model_id.replace('/', '--')}")
snapshots_dir = os.path.join(local_model_dir, "snapshots")
blobs_dir = os.path.join(snapshots_dir, next(d for d in os.listdir(snapshots_dir) if os.path.isdir(os.path.join(snapshots_dir, d))))
os.listdir(blobs_dir)

['special_tokens_map.json',
 'model-00002-of-00004.safetensors',
 'model-00001-of-00004.safetensors',
 'model-00003-of-00004.safetensors',
 'config.json',
 'tokenizer.json',
 'model-00004-of-00004.safetensors',
 'tokenizer_config.json',
 'generation_config.json',
 'model.safetensors.index.json']
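
The cell above picks the first directory that `os.listdir` returns under `snapshots`, which is arbitrary if the cache holds more than one revision. The sketch below is one possible alternative that selects the most recently modified snapshot instead. It assumes the same `models--<org>--<name>/snapshots/<revision>` cache layout that the cell relies on; the function name `find_snapshot_dir` and the demo directory names are hypothetical.

```python
import os
import tempfile


def find_snapshot_dir(cache_dir, model_id):
    """Return the most recently modified snapshot directory for model_id."""
    model_dir = os.path.join(cache_dir, f"models--{model_id.replace('/', '--')}")
    snapshots_dir = os.path.join(model_dir, "snapshots")
    candidates = [
        os.path.join(snapshots_dir, name)
        for name in os.listdir(snapshots_dir)
        if os.path.isdir(os.path.join(snapshots_dir, name))
    ]
    if not candidates:
        raise FileNotFoundError(f"No snapshot found under {snapshots_dir}")
    return max(candidates, key=os.path.getmtime)


# Demo on a fake cache layout with a single snapshot named "abc123".
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "models--org--model", "snapshots", "abc123"))
    found_name = os.path.basename(find_snapshot_dir(tmp, "org/model"))

print(found_name)
```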

In [15]:
hopsfs_model_dir = os.path.join(HOPSFS_STORAGE_PATH, f"models--{model_id.replace('/', '--')}")
if not os.path.exists(hopsfs_model_dir):
    os.mkdir(hopsfs_model_dir)

In [26]:
# copy the downloaded model to hopsfs (-L follows the cache's symlinks)
cp_cmd = f"cp -L -r {blobs_dir}/* {hopsfs_model_dir}"
result = os.system(cp_cmd)
# os.system returns 0 when the command succeeds
assert result == 0, "Failed to copy pre-trained model files to hopsfs"
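
If you prefer to avoid shelling out, the copy can also be sketched with `shutil.copytree`, which raises an exception on failure instead of returning an exit code. This is an alternative to the `cp` command, not what the tutorial ran. The function name `copy_model_files` is hypothetical, and the demo assumes a platform where `os.symlink` is available (for example Linux).

```python
import os
import shutil
import tempfile


def copy_model_files(src_dir, dst_dir):
    """Copy src_dir into dst_dir, materializing symlinked files."""
    # symlinks=False copies the files that symlinks point to (like `cp -L`);
    # dirs_exist_ok=True allows dst_dir to exist already (Python 3.8+).
    shutil.copytree(src_dir, dst_dir, symlinks=False, dirs_exist_ok=True)
    return sorted(os.listdir(dst_dir))


# Demo: a snapshot directory whose file is a symlink to a blob, mimicking
# the layout of the Hugging Face cache.
with tempfile.TemporaryDirectory() as tmp:
    blob = os.path.join(tmp, "blob.bin")
    with open(blob, "w") as f:
        f.write("weights")
    snapshot = os.path.join(tmp, "snapshot")
    os.mkdir(snapshot)
    os.symlink(blob, os.path.join(snapshot, "config.json"))
    dst = os.path.join(tmp, "dst")
    os.mkdir(dst)

    copied = copy_model_files(snapshot, dst)
    copied_is_link = os.path.islink(os.path.join(dst, "config.json"))
    with open(os.path.join(dst, "config.json")) as f:
        copied_text = f.read()

print(copied, copied_is_link, copied_text)
```

In the tutorial, the call would be `copy_model_files(blobs_dir, hopsfs_model_dir)`.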

### Step 3: Create the Ray job for the fine-tuning task
We use the Hopsworks Jobs API to create and run the job for the fine-tuning task.

In [27]:
import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

app_file_path = dataset_api.upload("ray_llm_finetuning.py", llama_dir, overwrite=True)
environment_config_yaml_path = dataset_api.upload("llama_fine_tune_runtime_env.yaml", llama_dir, overwrite=True)

Connection closed.
2025-01-09 07:01:22,956 INFO: Python Engine initialized.

Logged in to project, explore it here https://hopsworks.ai.local/p/119

### About the runtime environment file
The runtime environment file declares what the Ray job needs at run time, including files, packages, and environment variables. It is useful when a particular Ray job needs specific packages or environment variables that are not part of the base environment. It must be provided as a YAML file. In this example, the runtime environment file has the following configuration.
```
pip:
  - transformers==4.44.0
  - accelerate==0.31.0
  - peft==0.11.1
  - deepspeed==0.16.2
env_vars:
  LIBRARY_PATH: "$CUDA_HOME/lib64:$LIBRARY_PATH"
  PROJECT_DIR: "/home/yarnapp/hopsfs"
  TRAINED_MODEL_STORAGE_PATH: "${PROJECT_DIR}/Resources/llama_finetuning/fine-tuned-model" # Where the fine-tuned model will be saved
  TRAINING_DATA_DIR: "${PROJECT_DIR}/Resources/llama_finetuning/datasets" # dataset location
  TRAINING_CONFIGURATION_DIR: "${PROJECT_DIR}/Resources/llama_finetuning/configs" # location for deepspeed and lora configuration files
```
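
The following sketch shows one way a job script could read these variables. It is an assumption about how `ray_llm_finetuning.py` might consume them, not a copy of that script. Whether `${PROJECT_DIR}` is already expanded by the time the job sees the values is not stated here, so the hypothetical helper `resolve_training_paths` expands such references defensively with `string.Template`.

```python
from string import Template

PATH_KEYS = [
    "TRAINED_MODEL_STORAGE_PATH",
    "TRAINING_DATA_DIR",
    "TRAINING_CONFIGURATION_DIR",
]


def resolve_training_paths(env):
    """Return the three training paths with ${VAR} references expanded."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return {key: Template(env[key]).safe_substitute(env) for key in PATH_KEYS}


# Demo with the values from the YAML file; inside the job you would pass
# os.environ instead of this dictionary.
demo_env = {
    "PROJECT_DIR": "/home/yarnapp/hopsfs",
    "TRAINED_MODEL_STORAGE_PATH": "${PROJECT_DIR}/Resources/llama_finetuning/fine-tuned-model",
    "TRAINING_DATA_DIR": "${PROJECT_DIR}/Resources/llama_finetuning/datasets",
    "TRAINING_CONFIGURATION_DIR": "${PROJECT_DIR}/Resources/llama_finetuning/configs",
}
paths = resolve_training_paths(demo_env)
print(paths["TRAINING_DATA_DIR"])
```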

In [48]:
jobs_api = project.get_jobs_api()

ray_config = jobs_api.get_configuration("RAY")
pretrained_path = "/home/yarnapp" + hopsfs_model_dir
ray_config['appPath'] = os.path.join('/Projects/'+project.name, app_file_path)
ray_config['environmentName'] = "ray-torch-training-pipeline"
ray_config['driverCores'] = 8
ray_config['driverMemory'] = 34816
ray_config['workerCores'] = 28
ray_config['workerMemory'] = 34816
ray_config['minWorkers'] = 8
ray_config['maxWorkers'] = 8
ray_config['workerGpus'] = 2
ray_config['runtimeEnvironment'] = os.path.join('/Projects/'+project.name, environment_config_yaml_path)
ray_config['defaultArgs'] = f"--model-name models-meta-llama-Meta-Llama-3.1-8B-Instruct --mx fp16 --lora --num-devices=16 --num-epochs=1 --lr=5e-4 --batch-size-per-device=16 --eval-batch-size-per-device=16 --pre-trained-path {pretrained_path}"

job_name = "ray_llama_finetuning"  # name that appears in the job URL
job = jobs_api.create_job(job_name, ray_config)

Job created successfully, explore it at https://hopsworks.ai.local/p/119/jobs/named/ray_llama_finetuning
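
The `--num-devices=16` argument matches the cluster size: 8 workers with 2 GPUs each. A small check like the sketch below can catch a mismatch before the job is submitted. The function name `check_gpu_budget` is hypothetical, and the global batch size it reports assumes that `--batch-size-per-device` is a per-GPU batch size with no gradient accumulation, which is an interpretation of the flag name rather than something confirmed by the job script.

```python
def check_gpu_budget(max_workers, gpus_per_worker, num_devices, batch_size_per_device):
    """Verify that the requested devices fit the cluster and report batch size."""
    total_gpus = max_workers * gpus_per_worker
    if num_devices > total_gpus:
        raise ValueError(
            f"Requested {num_devices} devices, but only {total_gpus} GPUs are configured"
        )
    return {
        "total_gpus": total_gpus,
        # Assumes one batch per device per step and no gradient accumulation.
        "global_batch_size": num_devices * batch_size_per_device,
    }


# Values taken from the job configuration above.
budget = check_gpu_budget(
    max_workers=8, gpus_per_worker=2, num_devices=16, batch_size_per_device=16
)
print(budget)
```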


### Step 4: Run the job

In [49]:
finetuning_job = jobs_api.get_job(job_name)

In [None]:
finetuning_job.run()

After starting the job, you can monitor its execution in the Hopsworks UI. From the executions page, you can open the Ray dashboard, which shows the resources used by the job, the number of workers, the logs, and the running tasks.

After the job finishes successfully, the fine-tuned model is saved in the directory specified by the TRAINED_MODEL_STORAGE_PATH variable defined in the runtime environment file.