## Install required packages

In [1]:
!pip install torch torchao torchtune wandb --upgrade --no-cache-dir | tail -n 1



## Explore torchtune commands

List the commands available in the torchtune CLI.

In [2]:
!tune --help

usage: tune [-h] {download,ls,cp,run,validate} ...

Welcome to the torchtune CLI!

options:
  -h, --help            show this help message and exit

subcommands:
  {download,ls,cp,run,validate}
    download            Download a model from the Hugging Face Hub.
    ls                  List all built-in recipes and configs
    cp                  Copy a built-in recipe or config to a local path.
    run                 Run a recipe. For distributed recipes, this supports
                        all torchrun arguments.
    validate            Validate a config and ensure that it is well-formed.


## List all built-in recipes and configs

Run `tune ls` to view the available fine-tuning recipes and their associated configs. Each recipe consists of three components:

- Configurable parameters: specified through YAML configs and command-line overrides.

- Recipe script: the entry point that ties everything together, including parsing and validating configs, setting up the environment, and invoking the recipe class.

- Recipe class: the core training logic, exposed to users through a set of APIs.
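To make the three components concrete, here is a toy sketch of how they might fit together. It is not torchtune code: `DEFAULT_CONFIG`, `ToyRecipe`, and `recipe_main` are hypothetical names invented for illustration, and the "training" loop only counts steps.

```python
# Toy illustration only; none of these names come from torchtune.

DEFAULT_CONFIG = {"epochs": 1, "batch_size": 2}  # stands in for the YAML config


class ToyRecipe:
    """Recipe class: holds the core training logic behind a small API."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.steps_run = 0
        self.batches = []

    def setup(self):
        # A real recipe would build the model, tokenizer, optimizer, and dataloader here.
        self.batches = [f"batch-{i}" for i in range(3)]

    def train(self):
        for _ in range(self.cfg["epochs"]):
            for _batch in self.batches:
                self.steps_run += 1  # placeholder for forward/backward/optimizer step

    def cleanup(self):
        self.batches = []


def recipe_main(overrides=None):
    """Recipe script: merges config and overrides, then drives the recipe class."""
    cfg = {**DEFAULT_CONFIG, **(overrides or {})}
    recipe = ToyRecipe(cfg)
    recipe.setup()
    recipe.train()
    recipe.cleanup()
    return recipe.steps_run
```

Calling `recipe_main({"epochs": 2})` plays the role of a command-line override: the default config stays untouched and only the overridden key changes for that run.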

In [3]:
!tune ls

RECIPE                                   CONFIG                                  
full_finetune_single_device              llama2/7B_full_low_memory               
                                         code_llama2/7B_full_low_memory          
                                         llama3/8B_full_single_device            
                                         llama3_1/8B_full_single_device          
                                         llama3_2/1B_full_single_device          
                                         llama3_2/3B_full_single_device          
                                         mistral/7B_full_low_memory              
                                         phi3/mini_full_low_memory               
                                         qwen2/7B_full_single_device             
                                         qwen2/0.5B_full_single_device           
                                         qwen2/1.5B_full_single_device           
    

## Download Llama3.2-11B-Vision-Instruct model

Prepare a folder to store the downloaded model:

In [9]:
!mkdir -p /tmp/Llama-3.2-11B-Vision-Instruct

The command below downloads the model together with its tokenizer, which is needed to process both image and text inputs. Once the download finishes, you can set up the fine-tuning configuration.

In [None]:
!tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct --ignore-patterns "original/consolidated*"
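After the download, it can be useful to confirm that every weight shard arrived. The sketch below is one possible check, not something torchtune runs itself. The helper names `expected_shards` and `missing_shards` are hypothetical, and the default format string and shard count are taken from the `checkpoint_files` entry in the resolved config printed later in this notebook.

```python
from pathlib import Path


def expected_shards(filename_format, max_filename):
    """Return the shard names implied by a format string and a zero-padded shard count."""
    width = len(max_filename)
    total = int(max_filename)
    return [
        filename_format.format(str(i).zfill(width), max_filename)
        for i in range(1, total + 1)
    ]


def missing_shards(model_dir,
                   filename_format="model-{}-of-{}.safetensors",
                   max_filename="00005"):
    """List expected shard files that are not present in model_dir."""
    root = Path(model_dir)
    return [
        name
        for name in expected_shards(filename_format, max_filename)
        if not (root / name).is_file()
    ]
```

For example, `missing_shards("/tmp/Llama-3.2-11B-Vision-Instruct")` should return an empty list once the download has completed.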

## Configure Fine-Tuning Setup

We will fine-tune the model with a QLoRA configuration on a single device.

#### Modify config
Copy the existing configuration file template for your fine-tuning setup.

In [14]:
!tune cp llama3_2_vision/11B_qlora_single_device ./custom_config_file.yaml

Copied file to custom_config_file.yaml


In [18]:
from ibm_watson_studio_lib import access_project_or_space

# Access your project or space
wslib = access_project_or_space()

# Open the existing YAML file and read its content
file_path = 'custom_config_file.yaml'  
with open(file_path, 'r') as file:
    file_data = file.read()

file_data_bytes = file_data.encode('utf-8')
wslib.save_data(file_path, file_data_bytes)
print(f"File {file_path} saved successfully.")

File custom_config_file.yaml saved successfully.


In [None]:
## Edit the following fields:

"""
# Dataset
dataset:
  _component_: torchtune.datasets.multimodal.the_cauldron_dataset
  subset: diagram_image_to_text
seed: null
shuffle: True
collate_fn: torchtune.data.padded_collate_tiled_images_and_mask


# Fine-tuning arguments
epochs: 3


# Logging
output_dir: /tmp/lora-llama3.2-vision-finetune
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: llama-3.2-vlm-torchtune
log_every_n_steps: 1
log_peak_memory_stats: True
"""
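As an alternative to editing the YAML by hand, the same fields can be passed as command-line overrides in the `key.subkey=value` form that the config header describes. The sketch below only builds the command string; `to_overrides` is a hypothetical helper, and the config path assumes the file copied earlier with `tune cp`.

```python
def to_overrides(edits, prefix=""):
    """Flatten a nested dict into key.subkey=value override strings."""
    args = []
    for key, value in edits.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as `dataset`.
            args.extend(to_overrides(value, prefix=f"{dotted}."))
        else:
            args.append(f"{dotted}={value}")
    return args


# A subset of the fields listed above.
edits = {
    "dataset": {"subset": "diagram_image_to_text"},
    "epochs": 3,
    "output_dir": "/tmp/lora-llama3.2-vision-finetune",
    "log_every_n_steps": 1,
}

command = [
    "tune", "run", "lora_finetune_single_device",
    "--config", "./custom_config_file.yaml",
    *to_overrides(edits),
]
print(" ".join(command))
```

Overrides apply to a single run only, so the YAML file remains the better place for settings you want to keep.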

#### Validate Configuration
Ensure that the configuration file is properly set up.

In [19]:
!tune validate /project_data/data_asset/custom_config_file.yaml

Config is well-formed!


In [20]:
# Open and read the YAML file
file_path = '/project_data/data_asset/custom_config_file.yaml'

with open(file_path, 'r') as file:
    content = file.read()

print(content)

# Config for single device QLoRA finetuning in lora_finetune_single_device.py
# using a Llama3.2 11B Vision Instruct model
#
# This config assumes that you've run the following command before launching:
#   tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct --ignore-patterns "original/consolidated*"
#
# To launch on a single device, run the following command from root:
#   tune run lora_finetune_single_device --config llama3_2_vision/11B_qlora_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training:
#   tune run lora_finetune_single_device --config llama3_2_vision/11B_qlora_single_device checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on single device.

# Model arguments
model:
  _component_: torchtune.models.llama3_2_vision.qlora_llama3_2_vision_11b
  decoder_trainable: "frozen"
  encoder_trainabl

#### Set Up Weights & Biases (W&B) for Logging
Log into your W&B account to enable training tracking.

In [21]:
# !wandb login <API_KEY>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/wsuser/.netrc


## Running the Fine-Tuning Script

Once the configuration is set, you can run the fine-tuning process using the following command:

In [22]:
!tune run lora_finetune_single_device --config /project_data/data_asset/custom_config_file.yaml

INFO:torchtune.utils._logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 2
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Llama-3.2-11B-Vision-Instruct/
  checkpoint_files:
    filename_format: model-{}-of-{}.safetensors
    max_filename: '00005'
  model_type: LLAMA3_VISION
  output_dir: /tmp/Llama-3.2-11B-Vision-Instruct/
  recipe_checkpoint: null
clip_grad_norm: 1.0
collate_fn: torchtune.data.padded_collate_tiled_images_and_mask
compile: false
dataset:
  _component_: torchtune.datasets.multimodal.the_cauldron_dataset
  subset: diagram_image_to_text
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 8
log_every_n_steps: 1
log_peak_memory_stats: true
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
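The resolved config above sets `batch_size: 2` and `gradient_accumulation_steps: 8`, so each optimizer update sees 16 samples. The short sketch below works through that arithmetic. The function `optimizer_steps_per_epoch` and the sample count of 300 are hypothetical, and the sketch assumes incomplete batches and incomplete accumulation groups are dropped, which may differ from what the recipe actually does.

```python
batch_size = 2                   # from the resolved config
gradient_accumulation_steps = 8  # from the resolved config

# Samples contributing to each optimizer update.
effective_batch_size = batch_size * gradient_accumulation_steps


def optimizer_steps_per_epoch(num_samples):
    """Optimizer updates per epoch, assuming partial batches and groups are dropped."""
    micro_batches = num_samples // batch_size
    return micro_batches // gradient_accumulation_steps


print(effective_batch_size)            # 16
print(optimizer_steps_per_epoch(300))  # 18, for a hypothetical 300-sample dataset
```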

## Uploading your model to the Hugging Face Hub

In [26]:
import os
os.listdir("/tmp/Llama-3.2-11B-Vision-Instruct/")

['.cache',
 '.gitattributes',
 'README.md',
 'config.json',
 'USE_POLICY.md',
 'chat_template.json',
 'generation_config.json',
 'original',
 'LICENSE.txt',
 'model.safetensors.index.json',
 'preprocessor_config.json',
 'special_tokens_map.json',
 'tokenizer_config.json',
 'tokenizer.json',
 'model-00005-of-00005.safetensors',
 'model-00002-of-00005.safetensors',
 'model-00003-of-00005.safetensors',
 'model-00001-of-00005.safetensors',
 'model-00004-of-00005.safetensors',
 'torchtune_config.yaml',
 'logs',
 'hf_model_0001_0.pt',
 'hf_model_0002_0.pt',
 'hf_model_0003_0.pt',
 'hf_model_0004_0.pt',
 'hf_model_0005_0.pt',
 'adapter_0.pt']
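The listing mixes the originally downloaded files with files that appear to have been written by the fine-tuning run (`hf_model_*.pt`, `adapter_0.pt`, `torchtune_config.yaml`). Before uploading the whole directory, you may want to see which is which. The sketch below separates them by file-name pattern; the pattern list is an assumption based on this listing, and `split_outputs` is a hypothetical helper.

```python
from fnmatch import fnmatch

# Patterns assumed to match files produced by the fine-tuning run.
FINETUNE_PATTERNS = ("hf_model_*.pt", "adapter_*.pt", "torchtune_config.yaml")


def split_outputs(filenames, patterns=FINETUNE_PATTERNS):
    """Split file names into (produced by fine-tuning, everything else)."""
    produced, other = [], []
    for name in filenames:
        if any(fnmatch(name, pattern) for pattern in patterns):
            produced.append(name)
        else:
            other.append(name)
    return produced, other
```

In the notebook this could be called as `split_outputs(os.listdir("/tmp/Llama-3.2-11B-Vision-Instruct/"))` to review the fine-tuned artifacts separately from the base model files.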

In [31]:
# !huggingface-cli login --token <HF_TOKEN>

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
The token `model-deploy` has been saved to /home/wsuser/.cache/huggingface/stored_tokens
Your token has been saved to /home/wsuser/.cache/huggingface/token
Login successful.
The current active token is: `model-deploy`


In [None]:
# !huggingface-cli upload 'hf-repo-id' 'checkpoint-dir'
!huggingface-cli upload llama3.2-vlm-torchtune /tmp/Llama-3.2-11B-Vision-Instruct/

#### Load the fine-tuned model for inference

- See the Hugging Face Transformers image-text-to-text guide: https://huggingface.co/docs/transformers/main/en/tasks/image_text_to_text