## Fine-tuning Mistral 7B with AutoTrain

### Set Up the Runtime
Fine-tuning Mistral 7B requires a GPU instance. Follow the directions below:

- Go to `Runtime` (located in the top menu bar).
- Select `Change Runtime Type`.
- Choose `T4 GPU` (or a comparable option).
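
After switching the runtime, it can help to confirm that a GPU is actually visible before installing anything. The sketch below is one way to do that with the standard library only; it assumes an NVIDIA runtime where the `nvidia-smi` tool is on the `PATH`, and the helper name `gpu_visible` is made up for this example.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` runs and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is missing, so there is most likely no NVIDIA driver here.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one "GPU <index>: <name> ..." line per device.
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, re-check the runtime type before continuing.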

## Step 1: Set Up the Environment

In [11]:
%pip install pandas autotrain-advanced ipywidgets -q

Note: you may need to restart the kernel to use updated packages.


In [8]:
# The environment variable must be set on the same shell line as the install to take effect.
!DS_BUILD_CPU_ADAM=1 pip install -U deepspeed

Note: you may need to restart the kernel to use updated packages.


In [7]:
!python -m deepspeed.env_report

[2024-03-14 18:38:39,761] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... 

In [1]:
!autotrain setup --update-torch

> INFO    Installing latest xformers
> INFO    Successfully installed latest xformers
> INFO    Installing latest PyTorch
> INFO    Successfully installed latest PyTorch


## Step 2: Connect to Hugging Face for Model Upload

### Logging in to Hugging Face
To upload the model so it can be used for inference, you need to log in to the Hugging Face Hub.

### Getting a Hugging Face token
Steps:

1. Navigate to this URL: https://huggingface.co/settings/tokens
2. Create a write `token` and copy it to your clipboard
3. Run the code below and enter your `token`

In [2]:
from huggingface_hub import notebook_login
notebook_login()


## Step 3: Upload your dataset

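Add your dataset to the Colab under the name `train.csv`. The AutoTrain command looks for a file with that name in the directory given by `--data-path`. The cells in this notebook read `data/train.csv` and pass `--data-path data`, so if you keep the file in the root directory instead, adjust those paths to match.

#### Don't have a dataset and want to try fine-tuning on an example dataset?
If you don't have a dataset, you can run the commands below to fetch an example dataset and save it as `train.csv`.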

In [3]:
!git clone https://github.com/joshbickett/finetune-llama-2.git
%cd finetune-llama-2
%mv train.csv ../train.csv
%cd ..

Cloning into 'finetune-llama-2'...
remote: Enumerating objects: 70, done.
remote: Counting objects: 100% (70/70), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 70 (delta 38), reused 48 (delta 19), pack-reused 0
Receiving objects: 100% (70/70), 25.13 KiB | 5.03 MiB/s, done.
Resolving deltas: 100% (38/38), done.
/home/katopz/book/src/ml/finetuning/sft/finetune-llama-2
/home/katopz/book/src/ml/finetuning/sft


In [3]:
import pandas as pd
df = pd.read_csv("data/train.csv")
df

| Unnamed: 0 | Concept | Funny Description Prompt | text |
|---|---|---|---|
| 0 | A cactus at a dance party | A cactus, wearing disco lights and surrounded ... | ###Human:\nGenerate a midjourney prompt for A ... |
| 1 | A robot on a first date | A robot, with a bouquet of USB cables, nervous... | ###Human:\nGenerate a midjourney prompt for A ... |
| 2 | A snail at a speed contest | A snail, with a mini rocket booster, confident... | ###Human:\nGenerate a midjourney prompt for A ... |
| 3 | A penguin at a beach party | A penguin, with sunscreen and a surfboard, try... | ###Human:\nGenerate a midjourney prompt for A ... |
| 4 | A cloud in a bad mood | A cloud, grumbling and dropping mini lightning... | ###Human:\nGenerate a midjourney prompt for A ... |
| ... | ... | ... | ... |
| 112 | A donut feeling the hole emptiness | A donut, in existential bakery therapy, ponder... | ###Human:\nGenerate a midjourney prompt for A ... |
| 113 | A pineapple with a prickly attitude | A pineapple, in a prickly personality class, s... | ###Human:\nGenerate a midjourney prompt for A ... |
| 114 | A calculator crunching life's problems | A calculator, at a problem-solving workshop, c... | ###Human:\nGenerate a midjourney prompt for A ... |
| 115 | A kite reaching new heights | A kite, in an altitude adjustment session, unt... | ###Human:\nGenerate a midjourney prompt for A ... |


In [4]:
df['text'][15]

'###Human:\nGenerate a midjourney prompt for A book on a mystery adventure\n\n###Assistant:\nA book, wearing detective glasses, flipping through its own pages, trying to solve the cliffhanger it was left on.'
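
The `text` value above follows a fixed template built from the `Concept` and `Funny Description Prompt` columns. As a minimal sketch, assuming that template holds for every row, the helper below (the name `format_example` is made up for illustration) rebuilds the same string, which is useful if you want to prepare your own rows in the same format.

```python
def format_example(concept: str, description: str) -> str:
    """Build one training string in the ###Human / ###Assistant layout shown above."""
    return (
        "###Human:\n"
        f"Generate a midjourney prompt for {concept}\n\n"
        "###Assistant:\n"
        f"{description}"
    )


example = format_example(
    "A book on a mystery adventure",
    "A book, wearing detective glasses, flipping through its own pages, "
    "trying to solve the cliffhanger it was left on.",
)
print(repr(example))
```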

## Step 4: Overview of AutoTrain command

#### Short overview of what the command flags do.

- `!autotrain`: The `!` prefix runs a shell command directly from a Jupyter notebook cell. `autotrain` is the AutoTrain command-line training utility.

- `llm`: A sub-command specifying the type of task, here fine-tuning a large language model.

- `--train`: Initiates the training process.

- `--project-name`: Sets the name of the project.

- `--model mistralai/Mistral-7B-Instruct-v0.2`: Specifies the base model, hosted on Hugging Face as "Mistral-7B-Instruct-v0.2" under "mistralai".

- `--data-path data`: The path to the directory holding the training data. The `train.csv` file needs to be located in this directory.

- `--lr 2e-4`: Sets the learning rate for training to 0.0002.

- `--batch-size 12`: Sets the batch size for training to 12.

- `--epochs 3`: The training process will iterate over the dataset 3 times.

- `--trainer sft`: Selects the supervised fine-tuning (SFT) trainer.

- `--target-modules q_proj,v_proj`: The LoRA target modules to be trained.

- `--quantization int4`: Loads the base model with INT4 quantization, which reduces memory use at the cost of some precision.

- `--peft`: Enables parameter-efficient fine-tuning, so LoRA adapters are trained instead of the full model weights.
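
To make the batch size and epoch settings concrete, the arithmetic below estimates how many optimizer steps a run takes. It is a rough sketch under stated assumptions: all 116 rows of the example dataset (indices 0 to 115) are used for training, the last partial batch is kept, and gradient accumulation is 1. The function name `training_steps` is made up for this example.

```python
import math


def training_steps(num_rows: int, batch_size: int, epochs: int, grad_accum: int = 1):
    """Return (steps_per_epoch, total_steps) for a simple single-device run."""
    # One optimizer step consumes batch_size * grad_accum rows;
    # a final partial batch still counts as a step.
    steps_per_epoch = math.ceil(num_rows / (batch_size * grad_accum))
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = training_steps(num_rows=116, batch_size=12, epochs=3)
print(steps_per_epoch, total_steps)  # 10 30
```

With so few steps, each epoch sees only ten updates, so changes to the batch size noticeably change how often the weights are updated.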

### Steps needed before running
Go to the `!autotrain` code cell below and update it as follows:

1. After `--project-name`, replace `foobar` with the name you'd like to give the project.
2. The cell as written does not upload anything: its logged parameters show `push_to_hub=False` and `repo_id=None`. To upload the model, supply a repository ID of the form `*username*/*repository*`, where `*username*` is your Hugging Face username and `*repository*` is the repository name you'd like the model created under; check `!autotrain llm -h` for the exact flag spelling in your installed version. You don't need to create this repository beforehand; it is created automatically and the model is uploaded once training completes.
3. Confirm that `train.csv` is in the directory passed to `--data-path` (`data` in the cell below). AutoTrain looks for your data there.
4. Make sure to pass the LoRA target modules to be trained, comma-separated with no space: `--target-modules q_proj,v_proj`.
5. Once you've made these changes you're all set, run the command below!
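
Before launching a long training run, a quick pre-flight check on the CSV can catch a missing file or column early. The sketch below uses only the standard library; the helper name `check_training_csv` is an assumption for illustration, and the default column name `text` matches the `text_column='text'` value in the logged parameters.

```python
import csv
from pathlib import Path


def check_training_csv(path, text_column: str = "text"):
    """Return (row_count, empty_text_rows); raise if the file or column is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"training file not found: {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if text_column not in (reader.fieldnames or []):
            raise ValueError(f"column {text_column!r} not found in {path}")
        rows = list(reader)
    # Rows with an empty text field contribute nothing useful to training.
    empty = sum(1 for row in rows if not (row[text_column] or "").strip())
    return len(rows), empty


# Only run the check if the file is present at the assumed location.
candidate = Path("data") / "train.csv"
if candidate.is_file():
    print(check_training_csv(candidate))
```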

In [5]:
!autotrain llm --train \
    --project-name foobar \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --data-path data \
    --lr 2e-4 \
    --batch-size 12 \
    --epochs 3 \
    --trainer sft \
    --target-modules q_proj,v_proj \
    --quantization int4 \
    --peft

> INFO    Running LLM
> INFO    Params: Namespace(version=False, text_column='text', rejected_text_column='rejected', prompt_text_column='prompt', model_ref=None, warmup_ratio=0.1, optimizer='adamw_torch', scheduler='linear', weight_decay=0.0, max_grad_norm=1.0, add_eos_token=False, block_size=-1, peft=True, lora_r=16, lora_alpha=32, lora_dropout=0.05, logging_steps=-1, evaluation_strategy='epoch', save_total_limit=1, save_strategy='epoch', auto_find_batch_size=False, mixed_precision=None, quantization='int4', model_max_length=1024, trainer='sft', target_modules='q_proj,v_proj', merge_adapter=False, use_flash_attention_2=False, dpo_beta=0.1, chat_template=None, padding=None, train=True, deploy=False, inference=False, username=None, backend='local-cli', token=None, repo_id=None, push_to_hub=False, model='mistralai/Mistral-7B-Instruct-v0.2', project_name='foobar', seed=42, epochs=3, gradient_accumulation=1, disable_gradient_checkpointing=False, lr=0.0002, log='none', data_pat
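
The logged parameters show `peft=True`, `lora_r=16`, and `target_modules='q_proj,v_proj'`, so only small LoRA matrices are trained. The sketch below estimates how many trainable parameters that adds. The layer dimensions are assumptions taken from the published Mistral 7B configuration as best understood here (hidden size 4096, 32 layers, and a 1024-wide `v_proj` output from 8 key/value heads of size 128); verify them against the model's `config.json` before relying on the number.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per wrapped layer."""
    return r * (in_features + out_features)


# Assumed Mistral 7B dimensions (not stated in this notebook).
HIDDEN_SIZE = 4096
KV_DIM = 8 * 128  # 8 key/value heads x head size 128
NUM_LAYERS = 32
LORA_R = 16  # from the logged parameters

per_layer = (
    lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_R)  # q_proj
    + lora_param_count(HIDDEN_SIZE, KV_DIM, LORA_R)     # v_proj
)
total_trainable = per_layer * NUM_LAYERS
print(f"{total_trainable:,} trainable LoRA parameters")  # 6,815,744
```

Under these assumptions the adapter is well under one percent of the base model's size, which is why the training fits on a single small GPU.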

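## Step 5: Completed 🎉
Once the command above completes, your model is uploaded to Hugging Face, provided pushing to the Hub was enabled. The logged parameters above show `push_to_hub=False`, so the run as shown does not upload.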

#### Learn more about AutoTrain (optional)
To see which command-line flags are available, run the help command:

In [8]:
!autotrain llm -h

usage: autotrain <command> [<args>] llm [-h] [--text_column TEXT_COLUMN]
                                        [--rejected_text_column REJECTED_TEXT_COLUMN]
                                        [--prompt-text-column PROMPT_TEXT_COLUMN]
                                        [--model-ref MODEL_REF]
                                        [--warmup_ratio WARMUP_RATIO]
                                        [--optimizer OPTIMIZER]
                                        [--scheduler SCHEDULER]
                                        [--weight_decay WEIGHT_DECAY]
                                        [--max_grad_norm MAX_GRAD_NORM]
                                        [--add_eos_token]
                                        [--block_size BLOCK_SIZE] [--peft]
                                        [--lora_r LORA_R]
                                        [--lora_alpha LORA_ALPHA]
                                        [--lora_dropout LORA_DROPOUT]
                            

## Step 6: Inference Engine

In [10]:
%pip install -q peft  accelerate bitsandbytes safetensors

Note: you may need to restart the kernel to use updated packages.


In [None]:
import torch
import transformers
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fine-tuned LoRA adapter on the Hugging Face Hub
adapters_name = "ashishpatel26/mistral-7b-mj-finetuned"
# Sharded copy of the base model; the original is "mistralai/Mistral-7B-Instruct-v0.1"
model_name = "bn22/Mistral-7B-Instruct-v0.1-sharded"

device = "cuda" # the device to load the model onto

In [None]:
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
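
The reason for loading in 4-bit becomes clearer with a rough memory estimate. The arithmetic below is a back-of-the-envelope sketch: the parameter count of about 7.24 billion is an approximate figure for Mistral 7B assumed here, and the estimate ignores quantization metadata, layers that stay in higher precision, activations, and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7.24e9  # assumed approximate parameter count for Mistral 7B

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)
print(f"bf16 weights: ~{bf16_gib:.1f} GiB")
print(f"4-bit weights: ~{nf4_gib:.1f} GiB")
```

The 4-bit figure is a quarter of the bf16 one, which leaves far more headroom on a 16 GB card such as the T4.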

In [None]:
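# 4-bit loading is already requested by bnb_config, so a separate
# load_in_4bit=True argument is redundant (some transformers versions reject it).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    device_map='auto'
)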

Loading checkpoint shards:   0%|          | 0/11 [00:00<?, ?it/s]

## Step 7: Load the PEFT Adapter from the Uploaded Model

In [None]:
model = PeftModel.from_pretrained(model, adapters_name)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1

stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Successfully loaded the model bn22/Mistral-7B-Instruct-v0.1-sharded into memory


In [None]:
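text = "[INST] generate a midjourney prompt for A person walks in the rain [/INST]"

# Tokenize the prompt and move the input tensors to the GPU.
encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
model_input = encoded.to(device)
# The quantized model was already placed by device_map='auto';
# calling model.to(device) on a 4-bit model can raise an error, so it is omitted.
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])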

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST] generate a midjourney prompt for A person walks in the rain [/INST] "As you wander through the pouring rain, you can't help but wonder what the world would be like if things were different. What if the rain was a symbol of the turmoil in your life, and the sunshine promised a brighter future? What if you suddenly found yourself lost in a small town where time stood still, and the people were trapped in a time loop? As you struggle to find your way back to reality, you discover a mysterious stranger who seems to hold the key to unlocking the secrets of the town and your own past."</s>
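
The decoded string above still contains the prompt and the `</s>` end-of-sequence marker. A small post-processing step can isolate just the generated text. The sketch below is plain string handling, assuming the `[INST] ... [/INST]` wrapper and `</s>` marker seen in the output; the helper name `extract_response` is made up for this example.

```python
def extract_response(decoded: str, inst_close: str = "[/INST]", eos: str = "</s>") -> str:
    """Return the text after the last [/INST] tag, without the trailing EOS marker."""
    # rpartition leaves the whole string in the last slot if the tag is absent.
    _, _, response = decoded.rpartition(inst_close)
    response = response.strip()
    if response.endswith(eos):
        response = response[: -len(eos)]
    return response.strip()


sample = '[INST] generate a midjourney prompt for A person walks in the rain [/INST] "As you wander through the pouring rain."</s>'
print(extract_response(sample))
```

In the notebook this would be called as `extract_response(decoded[0])`.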
