## Fine-Tuning an LLM and Running Predictions

### Setup:
- Install the USC VPN: https://itservices.usc.edu/vpn/
- Go to https://ondemand.carc.usc.edu/pun/sys/dashboard and log in
- Start JupyterLab by clicking the JupyterLab icon, filling out the form, and clicking Launch. Below is an example of working parameters; you may need to change the account to the one your resources are charged to:
    * Cluster: Discovery
    * JupyterLab version: 4.0.5
    * Modules to load (optional): gcc/11.3.0 python/3.9.12 git
    * Account: ll_774_951 (replace with the account your resources will be charged to)
    * Partition: main
    * Number of CPUs: 1
    * Memory (GB): 16
    * Number of hours: 2
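- Once JupyterLab starts, click the upload icon at the top left and select this notebook. The notebook will appear in the file list in the left column. Open it by double-clicking it.
- Clone the hiyouga/LLaMA-Efficient-Tuning repository by running the next code block. (The project has since been renamed LLaMA-Factory, and newer versions replace the `src/train_bash.py` entry point used below with a `llamafactory-cli` command, so an older release may be needed.)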

In [1]:
!git clone https://github.com/hiyouga/LLaMA-Efficient-Tuning.git

Cloning into 'LLaMA-Efficient-Tuning'...
remote: Enumerating objects: 2669, done.
remote: Counting objects: 100% (68/68), done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 2669 (delta 24), reused 55 (delta 20), pack-reused 2601
Receiving objects: 100% (2669/2669), 173.98 MiB | 13.62 MiB/s, done.
Resolving deltas: 100% (1831/1831), done.
Updating files: 100% (108/108), done.


### Generate and upload train and test sets:
- Generate a `train.json` and a `test.json` file to train and test the model. Each file must be a JSON list of records with this structure:
```json
[
    {
        "instruction": "Is the text of this tweet an information operation: iPhone'da parmak izi uyarısıhttp://t.co/ETXCTbgWcR?",
        "input": "",
        "output": "True"
    },
    {
        "instruction": "Is the text of this tweet an information operation: @libanews Israël fournit-il des armes à Al-Qaïda en Syrie ? http://t.co/q3UfgvSSaO?",
        "input": "",
        "output": "True"
    }
]
```
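A minimal sketch for producing both files, assuming your labeled tweets are already in memory as `(text, label)` pairs with boolean labels (for example, read from your own CSV). The helper names `to_records` and `split_and_save`, the 80/20 split, and the fixed seed are illustrative assumptions, not part of the original workflow; the prompt wording and the `"True"`/`"False"` outputs follow the example records above.

```python
import json
import random

# Hypothetical helpers for building train.json/test.json; the prompt wording
# matches the example records above.
PROMPT = "Is the text of this tweet an information operation: {text}?"


def to_records(rows):
    """Convert (text, label) pairs into instruction/input/output records."""
    # label must be a bool: bool("False") would be True for string labels
    return [
        {"instruction": PROMPT.format(text=text), "input": "", "output": str(bool(label))}
        for text, label in rows
    ]


def split_and_save(rows, train_path="train.json", test_path="test.json",
                   test_fraction=0.2, seed=0):
    """Shuffle the rows, split off a test set, write both files, and return their sizes."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed makes the split reproducible
    n_test = int(len(rows) * test_fraction)
    test_rows, train_rows = rows[:n_test], rows[n_test:]
    for path, subset in ((train_path, train_rows), (test_path, test_rows)):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(to_records(subset), f, ensure_ascii=False, indent=4)
    return len(train_rows), len(test_rows)
```

`ensure_ascii=False` keeps non-English tweets, like the Turkish and French examples above, human-readable in the files.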
- Open a new browser tab, go to https://ondemand.carc.usc.edu/pun/sys/dashboard, click "Files" in the top bar, then "Home Directory". Open the `LLaMA-Efficient-Tuning` folder, then its `data` folder.
- Click the blue "Upload" button to upload the JSON files you just created.
- In the next code block, set the path of the `LLaMA-Efficient-Tuning` folder you created earlier by cloning the repository, and the names of the JSON files you want the model to be aware of.
- Run the code that follows to register the files in `data/dataset_info.json`.

In [19]:
efficient_finetuning_folder = "/home1/pante/UntitledFolder/LLaMA-Efficient-Tuning" #absolute path to your cloned repository
train = "train.json"
test = "test.json"

In [20]:
import json
import os

def add_json_file(efficient_finetuning_folder, json_file_name):
    """Register a JSON dataset in the repo's data/dataset_info.json under its base name."""
    data_info_file = os.path.join(efficient_finetuning_folder, "data", "dataset_info.json")

    # Load the existing dataset registry
    with open(data_info_file, 'r', encoding='utf-8') as f:
        data_info = json.load(f)

    # Use the file name without its .json extension as the dataset name (train.json -> train)
    new_key = os.path.splitext(json_file_name)[0]

    # Add (or overwrite) the entry pointing at the file in the data folder
    data_info[new_key] = {
        'file_name': json_file_name
    }

    # Save the updated registry
    with open(data_info_file, 'w', encoding='utf-8') as f:
        json.dump(data_info, f, indent=4)

    print(f'Added {new_key} to data_info.json')

add_json_file(efficient_finetuning_folder, train)
add_json_file(efficient_finetuning_folder, test)

Added train to data_info.json
Added test to data_info.json
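Before launching a GPU job, it can help to confirm that each registered name points at an uploaded file in the expected format, since a missing file or key may otherwise only surface after the job starts. A minimal sketch; `check_registered_dataset` and `REQUIRED_KEYS` are hypothetical names, and the check only covers the three fields shown in the example JSON above.

```python
import json
import os

# Hypothetical validation helper; checks only the instruction/input/output fields.
REQUIRED_KEYS = ("instruction", "input", "output")


def check_registered_dataset(efficient_finetuning_folder, name):
    """Return a list of problems for dataset `name` (an empty list means it looks usable)."""
    data_dir = os.path.join(efficient_finetuning_folder, "data")
    with open(os.path.join(data_dir, "dataset_info.json"), encoding="utf-8") as f:
        info = json.load(f)
    if name not in info or "file_name" not in info[name]:
        return [f"'{name}' is not registered with a file_name in dataset_info.json"]
    path = os.path.join(data_dir, info[name]["file_name"])
    if not os.path.isfile(path):
        return [f"{path} does not exist (was the file uploaded to the data folder?)"]
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        return [f"{path}: top-level value must be a list of records"]
    problems = []
    for i, record in enumerate(records):
        for key in REQUIRED_KEYS:
            if not isinstance(record, dict) or not isinstance(record.get(key), str):
                problems.append(f"{path}: record {i} has a missing or non-string '{key}'")
    return problems
```

For example, `for name in (train, test): print(name, check_registered_dataset(efficient_finetuning_folder, name.replace(".json", "")) or "OK")`.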


- Run the following block to define the `generate_ftune_job` function.

In [44]:
def generate_ftune_job(account, cpus_per_task, mem, run_time, gpu, hf_token,
                       efficient_finetuning_folder, stage, model_name_or_path, dataset,
                       template, finetuning_type, lora_target, output_dir,
                       per_device_train_batch_size, gradient_accumulation_steps,
                       lr_scheduler_type, logging_steps, save_steps, learning_rate,
                       num_train_epochs, plot_loss, fp16, filename):
    """Write a Slurm job file that fine-tunes a model with src/train_bash.py."""
    # In this non-raw f-string, each "\\" becomes the single backslash bash uses to continue a line.
    text = f'''#!/bin/bash
#SBATCH --account={account}
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task={cpus_per_task}
#SBATCH --mem={mem}
#SBATCH --time={run_time}
#SBATCH --partition=gpu
#SBATCH --gres=gpu:{gpu}

module purge
module load gcc/11.3.0
module load python/3.9.12
module load nvidia-hpc-sdk
module load git/2.36.1
module load cuda/11.8.0

cd {efficient_finetuning_folder}
pip install --upgrade pip
pip install -r requirements.txt

# Log in after installing the requirements so that huggingface-cli is available
export HF_TOKEN={hf_token}
huggingface-cli login --token $HF_TOKEN

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \\
    --stage {stage} \\
    --model_name_or_path {model_name_or_path} \\
    --do_train \\
    --dataset {dataset} \\
    --template {template} \\
    --finetuning_type {finetuning_type} \\
    --lora_target {lora_target} \\
    --output_dir {output_dir} \\
    --overwrite_cache \\
    --per_device_train_batch_size {per_device_train_batch_size} \\
    --gradient_accumulation_steps {gradient_accumulation_steps} \\
    --lr_scheduler_type {lr_scheduler_type} \\
    --logging_steps {logging_steps} \\
    --save_steps {save_steps} \\
    --learning_rate {learning_rate} \\
    --num_train_epochs {num_train_epochs} \\
    {'--plot_loss' if plot_loss else ''} \\
    {'--fp16' if fp16 else ''}
'''

    with open(filename, 'w') as f:
        f.write(text)

- Update the parameters in the `generate_ftune_job` call below. If you are unsure which values to use, keep the defaults or consult the LLaMA-Efficient-Tuning repository or the relevant literature.
- After updating the parameters, run the following code to generate the job file that fine-tunes the model.
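The batch-related parameters interact: each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps × number of GPUs` examples (4 × 4 × 1 = 16 with the values below). The sketch estimates the total number of optimizer steps, assuming the Hugging Face Trainer's usual counting (batches per epoch divided by accumulation steps, rounded down) and no example packing. The function name is hypothetical, and exact counts can differ between library versions.

```python
import math


def estimate_training_steps(num_examples, per_device_train_batch_size=4,
                            gradient_accumulation_steps=4, num_train_epochs=3.0,
                            num_gpus=1):
    """Hypothetical helper: rough effective batch size and total optimizer steps."""
    effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
    # Forward/backward batches per epoch (the last batch may be partial)
    batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_gpus))
    # One optimizer step per `gradient_accumulation_steps` batches, at least one per epoch
    steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    total_steps = math.ceil(num_train_epochs * steps_per_epoch)
    return effective_batch, total_steps
```

For example, `estimate_training_steps(1000)` returns `(16, 186)`. With 1,000 training examples and `save_steps=1000`, training would end before the first intermediate checkpoint, so only the model saved at the end of training would be written to `output_dir`.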

In [42]:
generate_ftune_job(
    account="ll_774_951", #the account you are charging resources to
    cpus_per_task=8, #default
    mem="64GB", #default
    run_time="05:00:00", #update according to how long fine-tuning takes; it varies with the dataset size, number of epochs, and resources chosen
    gpu="a40", #a40 or a100, a100 is faster but usually busy
    hf_token="<your_hugging_face_token>", #create a Hugging Face account, generate an access token, and paste it here; never share it, since it is written into the job file in plain text
    efficient_finetuning_folder="/home1/pante/UntitledFolder/LLaMA-Efficient-Tuning", #the cloned repo folder

    stage="sft", #default
    model_name_or_path="meta-llama/Llama-2-7b-hf", #any compatible Hugging Face model; Llama 2 is gated, so your account needs approved access
    dataset="train", #update with the name of the training dataset you uploaded
    template="default", #default
    finetuning_type="lora", #default
    lora_target="q_proj,v_proj", #default
    output_dir="/home1/pante/UntitledFolder/train", #output directory for the trained model; use a separate, empty folder for each training dataset
    per_device_train_batch_size=4, #default
    gradient_accumulation_steps=4, #default
    lr_scheduler_type="cosine", #default
    logging_steps=10,
    save_steps=1000,
    learning_rate="5e-5",
    num_train_epochs=3.0,
    plot_loss=True,
    fp16=True,

    filename="ftune.job", #the name of the job file to start the fine-tuning
)
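Because the script is assembled from a string, a stray or missing backslash is easy to miss. A minimal sketch for checking the generated file before submitting it: `bash -n` parses the script without running it, and a second helper flags lines of the `src/train_bash.py` command that do not end with a backslash (a missing continuation is not a syntax error, so `bash -n` alone would not catch it). Both helper names are hypothetical.

```python
import shutil
import subprocess


def check_job_script(filename):
    """Hypothetical helper: run `bash -n` (parse only, no execution); return (ok, stderr)."""
    if shutil.which("bash") is None:
        raise RuntimeError("bash not found on PATH")
    result = subprocess.run(["bash", "-n", filename], capture_output=True, text=True)
    return result.returncode == 0, result.stderr.strip()


def unterminated_continuations(filename):
    """Hypothetical helper: 1-based line numbers in the train_bash.py command lacking a trailing backslash."""
    with open(filename) as f:
        lines = f.read().splitlines()
    start = next(i for i, line in enumerate(lines) if "src/train_bash.py" in line)
    body = lines[start:]
    # Every line of the command except the last must end with a backslash
    return [start + i + 1 for i, line in enumerate(body[:-1])
            if not line.rstrip().endswith("\\")]
```

For a correctly generated `ftune.job`, `check_job_script("ftune.job")` should report `ok` as `True` and `unterminated_continuations("ftune.job")` should return an empty list.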


- Run the following code to start the job for fine-tuning the model

In [43]:
!sbatch ftune.job

Submitted batch job 16604771


Check the status of your job at https://ondemand.carc.usc.edu/pun/sys/dashboard/activejobs
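The submission and status check can also be scripted. This is a minimal sketch that assumes `sbatch` and `squeue` are on the `PATH` of your JupyterLab session (the `!sbatch` cell above relies on the same); `parse_job_id`, `submit`, and `job_state` are hypothetical helper names. It uses `sbatch --parsable`, which prints only the job ID, and `squeue` with a single-field output format.

```python
import subprocess

# Hypothetical helpers around the standard Slurm commands.


def parse_job_id(sbatch_output):
    """Extract the job ID from "Submitted batch job N" or `--parsable` output ("N" or "N;cluster")."""
    token = sbatch_output.strip().split()[-1]
    return int(token.split(";")[0])


def submit(filename):
    """Submit a job file and return its numeric job ID."""
    result = subprocess.run(["sbatch", "--parsable", filename],
                            capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)


def job_state(job_id):
    """Return the job's state (e.g. PENDING, RUNNING), or None once it has left the queue."""
    result = subprocess.run(["squeue", "-h", "-j", str(job_id), "-o", "%T"],
                            capture_output=True, text=True)
    return result.stdout.strip() or None
```

Once a job leaves the queue, `squeue` no longer lists it; `sacct -j <jobid>` reports whether it completed or failed.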

### Prediction:
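- Update the parameters in the `generate_predict_job` call below.
- Run the following code blocks to generate and submit the prediction job. Start it only after the fine-tuning job has finished, since it loads the trained model from `train_dir`.
- The results will be written to the `output_dir` you selected.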

In [45]:
def generate_predict_job(account, cpus_per_task, mem, run_time, gpu, hf_token,
                         efficient_finetuning_folder, dataset, train_dir, output_dir,
                         stage, model_name_or_path, template, finetuning_type,
                         per_device_eval_batch_size, max_samples, filename):
    """Write a Slurm job file that runs predictions with a fine-tuned checkpoint."""
    text = f'''#!/bin/bash
#SBATCH --account={account}
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task={cpus_per_task}
#SBATCH --mem={mem}
#SBATCH --time={run_time}
#SBATCH --partition=gpu
#SBATCH --gres=gpu:{gpu}

module purge
module load gcc/11.3.0
module load python/3.9.12
module load nvidia-hpc-sdk
module load git/2.36.1
module load cuda/11.8.0

cd {efficient_finetuning_folder}
pip install --upgrade pip
pip install -r requirements.txt

# Log in after installing the requirements so that huggingface-cli is available
export HF_TOKEN={hf_token}
huggingface-cli login --token $HF_TOKEN

dataset={dataset}
output_dir={output_dir}

echo "########### PREDICT_FT.JOB ############"
echo "dataset: $dataset"
echo "output_dir: $output_dir"

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \\
    --stage {stage} \\
    --model_name_or_path {model_name_or_path} \\
    --do_predict \\
    --dataset {dataset} \\
    --template {template} \\
    --finetuning_type {finetuning_type} \\
    --checkpoint_dir {train_dir} \\
    --output_dir {output_dir} \\
    --per_device_eval_batch_size {per_device_eval_batch_size} \\
    --max_samples {max_samples} \\
    --predict_with_generate
'''

    with open(filename, 'w') as f:
        f.write(text)


In [46]:
generate_predict_job(
    account="ll_774_951", #the account you are charging resources to
    cpus_per_task=8, #default
    mem="64GB", #default
    run_time="03:00:00", #update according to how long prediction takes; it varies with the number and length of the test prompts and the resources chosen
    gpu="a40", #a40 or a100, a100 is faster but usually busy
    hf_token="<your_hugging_face_token>", #create a Hugging Face account, generate an access token, and paste it here; never share it, since it is written into the job file in plain text
    efficient_finetuning_folder="/home1/pante/UntitledFolder/LLaMA-Efficient-Tuning", #the cloned repo folder
    dataset="test", #name of the test dataset registered in dataset_info.json
    train_dir="/home1/pante/UntitledFolder/train", #the output_dir of the fine-tuning job, where the trained model was saved
    output_dir="/home1/pante/UntitledFolder/test", #be sure to select an empty folder
    stage="sft", #default
    model_name_or_path="meta-llama/Llama-2-7b-hf", #must match the base model used for fine-tuning
    template="default", #default
    finetuning_type="lora", #default
    per_device_eval_batch_size=8, #default
    max_samples=10000,
    filename="predict.job" #the name of the job file to start the prediction
)
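The prediction job reads the fine-tuned model from `train_dir`, so it should only start after the fine-tuning job succeeds. Instead of waiting and submitting by hand, Slurm can hold the job with a dependency. A minimal sketch, assuming you pass the job ID printed by the fine-tuning `sbatch` call; `submit_after` is a hypothetical helper, and `dry_run=True` only returns the command it would run.

```python
import subprocess


def submit_after(filename, dependency_job_id, dry_run=False):
    """Hypothetical helper: submit `filename` to start only if job `dependency_job_id` succeeds."""
    cmd = ["sbatch", "--parsable", f"--dependency=afterok:{dependency_job_id}", filename]
    if dry_run:
        return cmd
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # --parsable prints "jobid" or "jobid;cluster"
    return int(result.stdout.strip().split(";")[0])
```

For example, `submit_after("predict.job", 16604771)`. If the fine-tuning job fails, the dependent job may stay pending with the reason `DependencyNeverSatisfied` and need to be removed with `scancel`.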


In [47]:
!sbatch predict.job

Submitted batch job 16604894


Check the status of your job at https://ondemand.carc.usc.edu/pun/sys/dashboard/activejobs
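Once the prediction job finishes, you can score the predictions. This sketch assumes the run wrote a `generated_predictions.jsonl` file to `output_dir`, with one JSON object per line holding `label` and `predict` fields, which is what versions of the repository from this period appear to produce; check your output folder and adjust the file name or keys if they differ. `accuracy_from_predictions` is a hypothetical helper.

```python
import json


def accuracy_from_predictions(path):
    """Hypothetical helper: accuracy from a JSON Lines file with "label" and "predict" fields."""
    total = correct = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            total += 1
            # Normalize whitespace and case so " true\n" matches "True"
            if record["predict"].strip().lower() == record["label"].strip().lower():
                correct += 1
    return correct / total if total else 0.0
```

For example, `accuracy_from_predictions("/home1/pante/UntitledFolder/test/generated_predictions.jsonl")`.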