
# Fine-tuning the multimodal LLaVA model to identify potential male models

In this notebook we'll fine-tune the LLaVA model from [Haotian Liu](https://github.com/haotian-liu/LLaVA) to identify young men who fall under the categories "pretty boy" and "good looking." The goal is a version of LLaVA that male modeling agencies can use to find men who are not yet models but have the potential to become one. When an image is submitted to the fine-tuned model, it tells the user whether the men in the image are "pretty" and describes other aspects, such as their race. In doing so, we hope to diversify the current modeling market and streamline the recruiting process.


## Table of contents 

1. Data Preprocessing
2. Install LLaVA
3. DeepSpeed
4. Weights and Biases
5. Finetuning job
6. Deployment

## Data Preprocessing 

LLaVA requires its training data in a very specific format. Below we use a helper function to build that format from the Hugging Face dataset [jcthehaxer/check](https://huggingface.co/datasets/jcthehaxer/check). This dataset teaches the model what an attractive young man looks like and what to look for. To prepare the data:

1. Download the dataset from Hugging Face.
2. Create a new folder in your workspace called "dataset", and inside it create another folder called "images".
3. Upload all of the image files and the metadata.csv file to "images".
4. Run the cell that contains the helper function. It renames each image to a UUID, splits the data into training and validation subsets, and writes them to two JSON files in the working directory: "train_llava_data.json" and "val_llava_data.json".
5. Create two new folders in "dataset" named "train" and "validation", and move each JSON file into its respective folder.
6. Rename both files to "dataset.json", the file name the training command reads.
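
To make the target format concrete, here is a minimal sketch of one record and a small structural check. The sample values are placeholders, and `validate_llava_records` is a hypothetical helper written for this notebook, not part of LLaVA. The `<image>` placeholder in the first human turn follows the convention used in LLaVA's custom-data documentation.

```python
# Hypothetical sample record; the id, file name, and answer text are placeholders.
sample_record = {
    "id": "example-id",
    "image": "example-id.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat do you see?"},
        {"from": "gpt", "value": "Placeholder answer taken from metadata.csv"},
    ],
}


def validate_llava_records(records):
    """Return a list of problems; an empty list means the records look well-formed."""
    problems = []
    for index, record in enumerate(records):
        # Every record needs an id, an image file name, and a conversation.
        for key in ("id", "image", "conversations"):
            if key not in record:
                problems.append(f"record {index}: missing key '{key}'")

        turns = record.get("conversations", [])
        if len(turns) < 2 or len(turns) % 2 != 0:
            problems.append(f"record {index}: expected human/gpt turn pairs")

        # Turns should alternate human, gpt, human, gpt, ...
        for turn_index, turn in enumerate(turns):
            expected = "human" if turn_index % 2 == 0 else "gpt"
            if turn.get("from") != expected:
                problems.append(
                    f"record {index}, turn {turn_index}: expected 'from' to be '{expected}'"
                )
            if not str(turn.get("value", "")).strip():
                problems.append(f"record {index}, turn {turn_index}: empty 'value'")
    return problems


print(validate_llava_records([sample_record]))
```

Running the same check on the lists loaded from the generated JSON files is a cheap way to catch malformed rows before starting a long training job.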

In [None]:
# Install preprocessing libraries
!pip install datasets
!pip install --upgrade --force-reinstall Pillow

In [None]:
import os
import csv
import json
import random
import shutil
import uuid

def create_llava_finetune_data(image_folder, metadata_file, train_output_file, val_output_file, split_ratio=0.8):
    # Dictionary to hold image-to-answer mapping from metadata.csv
    image_answer_map = {}

    # Read the metadata.csv file to extract the file_name and corresponding answers
    with open(metadata_file, mode='r', encoding='utf-8') as csv_file:
        csv_reader = csv.reader(csv_file)
        next(csv_reader, None)  # Skip the header row (assumes metadata.csv has one)
        for row in csv_reader:
            if len(row) < 2:
                continue  # Skip blank or malformed rows
            file_name, answer = row[0], row[1]
            image_answer_map[file_name] = answer

    # List to hold all the formatted data
    all_data = []

    # Iterate through the image files in the folder
    for filename in os.listdir(image_folder):
        # Only process image files
        if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
            # Generate a unique UUID
            unique_id = str(uuid.uuid4())
            
            # Check if the image has a corresponding answer in the metadata
            if filename in image_answer_map:
                # Retrieve the answer from the CSV data
                answer = image_answer_map[filename]
                
                # Construct the new image file name from the UUID, keeping the original extension
                extension = os.path.splitext(filename)[1].lower()
                new_image_name = f"{unique_id}{extension}"
                new_image_path = os.path.join(image_folder, new_image_name)
                
                # Rename the image file to match the UUID
                old_image_path = os.path.join(image_folder, filename)
                shutil.move(old_image_path, new_image_path)  # Renames the file
                
                # Construct the formatted JSON data for LLaVA
                json_data = {
                    "id": unique_id,  # UUID as the unique ID
                    "image": new_image_name,  # Use the UUID for the image name
                    "conversations": [
                        {
                            "from": "human",
                            # Fixed question; the <image> token marks where LLaVA inserts the image features
                            "value": "<image>\nWhat do you see?"
                        },
                        {
                            "from": "gpt",
                            "value": answer  # Answer from the metadata.csv
                        }
                    ]
                }

                # Append the formatted data to the list
                all_data.append(json_data)

    # Shuffle the data to ensure randomness
    random.shuffle(all_data)

    # Split the data into train and validation/test sets based on the split ratio
    train_size = int(len(all_data) * split_ratio)
    train_data = all_data[:train_size]
    val_data = all_data[train_size:]

    # Write the training data to a JSON file
    with open(train_output_file, 'w', encoding='utf-8') as json_file:
        json.dump(train_data, json_file, ensure_ascii=False, indent=4)

    # Write the validation/test data to a separate JSON file
    with open(val_output_file, 'w', encoding='utf-8') as json_file:
        json.dump(val_data, json_file, ensure_ascii=False, indent=4)

    print(f"Training data saved to {train_output_file}")
    print(f"Validation/Test data saved to {val_output_file}")
    print("Images have been renamed using UUIDs.")

# Example usage
image_folder = 'dataset/images'  # Folder containing images
metadata_file = 'dataset/images/metadata.csv'  # Path to metadata.csv
train_output_file = 'train_llava_data.json'  # Output JSON file for training data
val_output_file = 'val_llava_data.json'  # Output JSON file for validation/test data

create_llava_finetune_data(image_folder, metadata_file, train_output_file, val_output_file)
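
The move-and-rename step described at the top of this section can also be scripted. The sketch below is one way to do it, assuming the two JSON files sit in the working directory and that the training command reads `./dataset/train/dataset.json`. `place_split_files` is a hypothetical helper name, not something LLaVA provides.

```python
import os
import shutil


def place_split_files(train_json, val_json, dataset_dir="dataset"):
    """Move the split files to <dataset_dir>/train and <dataset_dir>/validation as dataset.json."""
    targets = {
        train_json: os.path.join(dataset_dir, "train", "dataset.json"),
        val_json: os.path.join(dataset_dir, "validation", "dataset.json"),
    }
    for source, target in targets.items():
        # Create the destination folder if needed, then move and rename in one step.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.move(source, target)
    return list(targets.values())
```

After the helper cell has run, calling `place_split_files("train_llava_data.json", "val_llava_data.json")` leaves both files where the later cells look for them.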

## Install LLaVA

To use the model's training and inference code, we clone the original LLaVA repository and install it in editable mode. This gives us access to all of its functions and helper scripts.

In [None]:
# The pip install -e . lets us install the repository in editable mode
!git clone https://github.com/haotian-liu/LLaVA.git
!cd LLaVA && pip install --upgrade pip && pip install -e .

## DeepSpeed

DeepSpeed is an open-source deep learning optimization library, developed by Microsoft, that improves the training speed and scalability of large-scale AI models. It targets the challenges of training very large models, reducing compute time and resource usage. By optimizing memory management and introducing novel parallelism techniques, DeepSpeed lets developers and researchers train models with billions of parameters efficiently, even on limited hardware. The DeepSpeed API is a lightweight wrapper around PyTorch: it manages the boilerplate of training, such as distributed training, mixed precision, gradient accumulation, and checkpointing, so you can focus on model development. To learn more about how DeepSpeed works, check out this [article](https://www.deepspeed.ai/2021/03/07/zero3-offload.html) on DeepSpeed and ZeRO.

Using DeepSpeed is simple: you just pip install it. The LLaVA repository contains the setup scripts and configuration files needed to fine-tune in different ways.
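
To give a feel for what such a configuration file contains, here is an illustrative ZeRO stage 3 config built as a Python dictionary. It is a sketch using standard DeepSpeed config keys, not a copy of the file shipped in the LLaVA repository. The `"auto"` values assume the Hugging Face Trainer integration, which fills them in from the command-line flags.

```python
import json

# Illustrative ZeRO stage 3 configuration (an assumption for explanation,
# not the contents of LLaVA/scripts/zero3.json).
zero3_config = {
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 3,  # Partition parameters, gradients, and optimizer states across GPUs
        "overlap_comm": True,  # Overlap communication with computation
        "contiguous_gradients": True,  # Reduce memory fragmentation in the backward pass
        "stage3_gather_16bit_weights_on_model_save": True,  # Gather full weights when saving
    },
}

print(json.dumps(zero3_config, indent=4))
```

The training cell later in this notebook points `--deepspeed` at the repository's own config file, so nothing here needs to be written to disk.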

In [None]:
!cd LLaVA && pip install -e ".[train]"
!pip install flash-attn --no-build-isolation

In [None]:
!pip install deepspeed

## Weights and Biases

Weights and Biases is an industry-standard MLOps tool used to monitor and evaluate training jobs. Make sure you have a Weights and Biases account before continuing. After fine-tuning, you will be able to see a report of the run in your Weights and Biases workspace.

In [None]:
!pip install wandb

In [None]:
import wandb

wandb.login()
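
Optionally, you can choose which Weights and Biases project the run is logged to by setting the `WANDB_PROJECT` environment variable before training starts. The project name below is a hypothetical placeholder. Commands launched from the notebook with `!` inherit the kernel's environment, so the training process picks it up.

```python
import os

# Hypothetical project name; replace it with the project you want the run to appear under.
os.environ["WANDB_PROJECT"] = "llava-finetune"
```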

## Finetuning job

Below we start the DeepSpeed training run for 5 epochs. DeepSpeed automatically detects multiple GPUs and parallelizes across them. Most of the input flags are standard, but you can adjust your training run with the `num_train_epochs` and `per_device_train_batch_size` flags!

In [None]:
!deepspeed LLaVA/llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed LLaVA/scripts/zero3.json \
    --model_name_or_path liuhaotian/llava-v1.5-13b \
    --version v1 \
    --data_path ./dataset/train/dataset.json \
    --image_folder ./dataset/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-13b-task-lora \
    --num_train_epochs 5 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb
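
To see how the batch-size and epoch flags translate into optimizer steps, the sketch below estimates the schedule for a given dataset size and GPU count. The sample count and GPU count are hypothetical inputs, and the arithmetic is an approximation of how the Hugging Face Trainer counts steps, not a guarantee.

```python
import math


def estimate_schedule(num_samples, num_gpus, per_device_batch_size=16,
                      gradient_accumulation_steps=1, num_train_epochs=5,
                      warmup_ratio=0.03):
    """Estimate global batch size, optimizer steps, and warmup steps for a run."""
    # Samples consumed per optimizer step across all GPUs.
    global_batch = per_device_batch_size * num_gpus * gradient_accumulation_steps
    # Batches each GPU sees per epoch (the last partial batch still counts).
    batches_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_gpus))
    steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    total_steps = steps_per_epoch * num_train_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Hypothetical example: 1,000 training samples on 2 GPUs with the flags used above.
print(estimate_schedule(num_samples=1000, num_gpus=2))
```

Comparing `total_steps` with `--save_steps 50000` shows whether any intermediate checkpoint would be written: the step-based save only triggers when the run is longer than that value.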

In [None]:
# merge the LoRA weights with the full model
!python LLaVA/scripts/merge_lora_weights.py --model-path checkpoints/llava-v1.5-13b-task-lora --model-base liuhaotian/llava-v1.5-13b --save-model-path llava-ftmodel
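
Conceptually, merging folds each low-rank update back into its base weight matrix: W' = W + (alpha / r) * B A. With the flags used above (`--lora_r 128`, `--lora_alpha 256`) the scaling factor is 2. The toy sketch below illustrates that arithmetic on tiny nested lists. It is an illustration under those assumptions, not the code the merge script runs, and `merge_lora` is a hypothetical name.

```python
def merge_lora(weight, lora_a, lora_b, lora_alpha):
    """Return W + (alpha / r) * (B @ A), where r is the number of rows in A."""
    rank = len(lora_a)
    scale = lora_alpha / rank
    merged = [row[:] for row in weight]  # Copy so the base weight is left untouched
    for i in range(len(weight)):
        for j in range(len(weight[0])):
            # Entry (i, j) of the low-rank product B @ A.
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# Toy 2x2 example with rank 1 and alpha 2, giving the same scale of 2 as the flags above.
base_weight = [[1, 0], [0, 1]]
lora_a = [[1, 0]]      # Shape (r, in_features)
lora_b = [[0.5], [0]]  # Shape (out_features, r)
print(merge_lora(base_weight, lora_a, lora_b, lora_alpha=2))
```

After merging, the model no longer needs the adapter weights at inference time, which is why the deployment cells load `llava-ftmodel` directly.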

In [None]:
# bump transformers down for gradio/deployment inference if needed
!pip install transformers==4.37.2

## Deployment

LLaVA gives us two ways to deploy the model: via the CLI or a Gradio UI. You can test out the fine-tuned model with the cells below!

In [None]:
# Run the CLI. Replace "image url" with the URL or local path of the image you want to test

!python -m llava.serve.cli \
     --model-path llava-ftmodel \
     --image-file "image url"

In [None]:
# Download the model runner
!wget -L https://raw.githubusercontent.com/brevdev/notebooks/main/assets/llava-deploy.sh 

In [None]:
# Run inference! Use the public link provided in the output to test
!chmod +x llava-deploy.sh && ./llava-deploy.sh